MySQL is an outstanding open source transactional database used by most web-based applications and is very good at handling OLTP workloads. However, modern business is very much dependent on analytical data. ClickHouse is a columnar database that handles analytical workloads quickly. I recommend you read our previous blog, Using ClickHouse as an Analytic Extension for MySQL, from Vadim to know more about this.
In this blog post, I will show you how to replicate MySQL data in real-time to ClickHouse. I am going to use the MaterializedMySQL engine for this purpose. The blog post contains the following topics.
- MaterializedMySQL Engine – Overview
- Prerequisites for the Replication
- Replication setup
- Replication testing
- Understanding ReplacingMergeTree
- What happens if the Replication event fails?
- What happens if MySQL or ClickHouse restarted?
- Conclusion
MaterializedMySQL Engine – Overview
- MaterializedMySQL engine is an experimental release from the ClickHouse team.
- By default, it will create a database on ClickHouse with all the tables and their data. (We can do the filter as well.)
- Schema conversion from MySQL to ClickHouse will be taken care of by the engine. No manual work!
- Once we configured the setup, the ClickHouse server will act as a replica.
- ClickHouse will read the MySQL binary logs and perform the replication.
- Replication will work well with DDL and DML statements.
- ReplacingMergeTree will be used to perform the replication.
Prerequisites for the replication
We must meet certain requirements to configure the replication from MySQL to ClickHouse. I have split this into two parts, which have to be done from the MySQL end and the ClickHouse end.
From the MySQL end
MySQL should be configured with the following parameters:
[mysqld] gtid_mode = ON enforce_gtid_consistency = ON binlog_format = ROW binlog_row_image = FULL server-id = 100
To configure the replication, we have to create a MySQL user with “REPLICATION SLAVE, REPLICATION CLIENT, SELECT, RELOAD” privileges. Make sure the user should be created with “mysql_native_password” authentication. When you initiate the replication, ClickHouse will copy the existing data from the MySQL table to the ClickHouse table and then start the replication from Binary logs. That is the reason we need SELECT privilege as well.
percona labs > create user ch_repl@'192.168.64.13' identified with mysql_native_password by 'Percona$331'; Query OK, 0 rows affected (0.01 sec) percona labs > grant replication slave, replication client, reload, select on *.* to ch_repl@'192.168.64.13'; Query OK, 0 rows affected (0.02 sec) percona labs > flush privileges; Query OK, 0 rows affected (0.01 sec)
From the ClickHouse end
Make sure you have MySQL port access from the ClickHouse server to the MySQL server.
root@clickhouse:~# telnet 192.168.64.7 3306 Trying 192.168.64.7... Connected to 192.168.64.7.
MaterializedMySQL engine is the experimental release. So, you should enable the parameter (allow_experimental_database_materialized_mysql) in the ClickHouse config file. This can be enabled in the User profile. (file: /etc/clickhouse-server/users.xml)
#vi /etc/clickhouse-server/users.xml <profiles> <default> <!-- Allowing Experimental feature to enable the replication from MySQL to ClickHouse --> <allow_experimental_database_materialized_mysql>1</allow_experimental_database_materialized_mysql> </default> </profiles>
Activating this parameter does not need a ClickHouse restart. You can verify it using the following command.
clickhouse :) select name, value,changed,description from settings where name = 'allow_experimental_database_materialized_mysql'G Query id: 6ad9c836-1c95-48a0-bb2f-ef89474618bf Row 1: ────── name: allow_experimental_database_materialized_mysql value: 1 changed: 1 description: Allow to create database with Engine=MaterializedMySQL(...). 1 row in set. Elapsed: 0.007 sec.
Now, you can configure the replication from MySQL to ClickHouse.
Replication setup
To configure the replication, I have created two servers with the following hostnames. Both servers are running their respective databases.
- mysql
- clickhouse
At MySQL, I have created the table “percona.herc” which has five records.
percona labs > use percona Database changed percona labs > show tables; +-------------------+ | Tables_in_percona | +-------------------+ | herc | +-------------------+ 1 row in set (0.00 sec) percona labs > select * from herc; +----+------+---------------------+--------+ | id | name | c_time | active | +----+------+---------------------+--------+ | 1 | ab | 2023-08-19 01:28:32 | 1 | | 2 | cd | 2023-08-19 01:28:36 | 1 | | 3 | ef | 2023-08-19 01:28:39 | 1 | | 4 | gh | 2023-08-19 01:28:42 | 1 | | 5 | ij | 2023-08-19 01:28:45 | 1 | +----+------+---------------------+--------+ 5 rows in set (0.00 sec)
To configure the replication, we have to run the following command in ClickHouse instance.
Syntax:
create database <database_name> engine = MaterializeMySQL('source host', 'source database', 'source user', 'source password') settings <options>;
In action:
clickhouse :) create database perconaCH engine = MaterializeMySQL('192.168.64.7', 'percona', 'ch_repl', 'Percona$331') settings allows_query_when_mysql_lost=true, max_wait_time_when_mysql_unavailable=10000, materialized_mysql_tables_list='herc'; CREATE DATABASE perconaCH ENGINE = MaterializeMySQL('192.168.64.7', 'percona', 'ch_repl', 'Percona$331') SETTINGS allows_query_when_mysql_lost = 1, max_wait_time_when_mysql_unavailable = 10000, materialized_mysql_tables_list = 'herc' Query id: 499f2057-e879-43ae-b406-eb7cdecd1e82 Ok. 0 rows in set. Elapsed: 0.046 sec.
Let’s verify the data in ClickHouse.
clickhouse :) use perconaCH USE perconaCH Query id: 2d534f39-871a-4718-8f24-daa855fe3311 Ok. 0 rows in set. Elapsed: 0.002 sec. clickhouse :) show tables; SHOW TABLES Query id: 6b0763ef-d78e-41aa-8e33-1b7bd1b047da ┌─name─┐ │ herc │ └──────┘ 1 row in set. Elapsed: 0.004 sec. clickhouse :) select * from herc; SELECT * FROM herc Query id: 84bed28e-3db6-4711-b5cf-ad858eebdce2 ┌─id─┬─name─┬──────────────c_time─┬─active─┐ │ 1 │ ab │ 2023-08-19 01:28:32 │ 1 │ │ 2 │ cd │ 2023-08-19 01:28:36 │ 1 │ │ 3 │ ef │ 2023-08-19 01:28:39 │ 1 │ │ 4 │ gh │ 2023-08-19 01:28:42 │ 1 │ │ 5 │ ij │ 2023-08-19 01:28:45 │ 1 │ └────┴──────┴─────────────────────┴────────┘ 5 rows in set. Elapsed: 0.005 sec.
As you can see, the table “herc” was created, and the data has been copied. (Note: You can use a different database name in ClickHouse.)
ClickHouse has a different data type structure. The data types will be automatically converted by ClickHouse when we initiate the replication process. The relevant data types will be chosen. For example, below, I am sharing the table structure from MySQL and ClickHouse. You can compare the data types for the table we created. You can find more details here.
MySQL
percona labs > show create table hercG *************************** 1. row *************************** Table: herc Create Table: CREATE TABLE `herc` ( `id` int NOT NULL AUTO_INCREMENT, `name` varchar(16) DEFAULT NULL, `c_time` datetime DEFAULT CURRENT_TIMESTAMP, `active` enum('1','0') DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=18 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci 1 row in set (0.00 sec)
ClickHouse
clickhouse :) show create table hercG Row 1: ────── statement: CREATE TABLE perconaCH.herc ( `id` Int32, `name` Nullable(String), `c_time` Nullable(DateTime), `active` Nullable(Enum8('1' = 1, '0' = 2)), `_sign` Int8 MATERIALIZED 1, `_version` UInt64 MATERIALIZED 1, INDEX _version _version TYPE minmax GRANULARITY 1 ) ENGINE = ReplacingMergeTree(_version) PARTITION BY intDiv(id, 4294967) ORDER BY tuple(id) SETTINGS index_granularity = 8192 1 row in set. Elapsed: 0.003 sec.
Note: There are two new columns, “_sign” and “_version.” They are created to manage the data deduplication by ReplacingmergeTree. We will see about it later in this post.
To understand the mechanism of how ClickHouse copies the existing data from MySQL, I am sharing the logs that were collected from MySQL’s general logs. You can see the SELECT statement, which is used to copy the data. Initially, before copying the data, it used “FLUSH TABLES WITH READ LOCK → REPEATEBLE_READ → START TRANSACTION – UNLOCK TABLES”.
2023-08-19T14:14:38.543090Z 12 Connect ch_repl@192.168.64.13 on percona using SSL/TLS 2023-08-19T14:14:38.544934Z 12 Query SET NAMES utf8mb4 2023-08-19T14:14:38.546775Z 12 Query SHOW VARIABLES 2023-08-19T14:14:38.557433Z 12 Query SHOW VARIABLES 2023-08-19T14:14:38.560367Z 12 Query SHOW GRANTS FOR CURRENT_USER() 2023-08-19T14:14:38.561910Z 12 Query FLUSH TABLES 2023-08-19T14:14:38.568580Z 12 Query FLUSH TABLES WITH READ LOCK 2023-08-19T14:14:38.569364Z 12 Query SHOW MASTER STATUS 2023-08-19T14:14:38.570542Z 12 Query SHOW VARIABLES WHERE Variable_name = 'binlog_checksum' 2023-08-19T14:14:38.573362Z 12 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ 2023-08-19T14:14:38.574353Z 12 Query START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */ 2023-08-19T14:14:38.575577Z 12 Query SELECT TABLE_NAME AS table_name FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE != 'VIEW' AND TABLE_SCHEMA = 'percona' 2023-08-19T14:14:38.577692Z 12 Query SHOW CREATE TABLE percona.herc 2023-08-19T14:14:38.579019Z 12 Query UNLOCK TABLES 2023-08-19T14:14:38.587415Z 12 Query SELECT COLUMN_NAME AS column_name, COLUMN_TYPE AS column_type FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'percona' AND TABLE_NAME = 'herc' ORDER BY ORDINAL_POSITION 2023-08-19T14:14:38.589956Z 12 Query SELECT `id`,`name`,`c_time`,`active` FROM percona.herc 2023-08-19T14:14:38.591538Z 12 Query COMMIT 2023-08-19T14:14:38.592842Z 13 Connect ch_repl@192.168.64.13 on using TCP/IP 2023-08-19T14:14:38.593196Z 13 Query SET @master_binlog_checksum = 'CRC32' 2023-08-19T14:14:38.593536Z 13 Query SET @master_heartbeat_period = 1000000000 2023-08-19T14:14:38.594339Z 13 Binlog Dump GTID Log: '' Pos: 4 GTIDs: '030f9a68-3dfc-11ee-b900-525400edf935:1-44' 2023-08-19T14:14:38.595585Z 12 Query SHOW VARIABLES WHERE Variable_name = 'binlog_checksum'
The following two threads will be initiated at the MySQL end to copy the data and for replication purpose.
| 12 | ch_repl | 192.168.64.13:44142 | percona | Sleep | 479 | | NULL | PRIMARY | | 13 | ch_repl | 192.168.64.13:44152 | NULL | Binlog Dump GTID | 479 | Source has sent all binlog to replica; waiting for more updates | NULL | PRIMARY |
From the ClickHouse end, ClickHouse logs are very verbose trace logs. We can find some useful information there. For example, (file: /var/log/clickhouse-server/clickhouse-server.log).
2023.08.19 03:50:22.755656 [ 4787 ] {} <Debug> mysqlxx::Pool: Connecting to percona@192.168.64.7:3306 as user ch_repl 2023.08.19 03:50:22.755840 [ 4787 ] {} <Debug> mysqlxx::Pool: Creating a new MySQL connection to percona@192.168.64.7:3306 as user ch_repl with settin gs: connect_timeout=60, read_write_timeout=1800 2023.08.19 19:44:38.557707 [ 5088 ] {76b80228-9870-4b41-b1ad-1d47d1cdda65} <Debug> executeQuery: (internal) /*Materialize MySQL step 1: execute MySQL DDL for dump data*/ EXTERNAL DDL FROM MySQL(perconaCH, percona) CREATE TABLE `herc` ( `id` int NOT NULL AUTO_INCREMENT, `name` varchar(16) DEFAULT NUL L, `c_time` datetime DEFAULT CURRENT_TIMESTAMP, `active` enum('1','0') DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=18 DEFAULT CHAR SET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci (stage: Complete) 2023.08.19 19:44:38.558200 [ 5088 ] {76b80228-9870-4b41-b1ad-1d47d1cdda65} <Debug> executeQuery: (internal) /* Rewritten MySQL DDL Query */ CREATE TAB LE perconaCH.herc (`id` Int32, `name` Nullable(String), `c_time` Nullable(DateTime), `active` Nullable(Enum8('1' = 1, '0' = 2)), `_sign` Int8() MATERI ALIZED 1, `_version` UInt64() MATERIALIZED 1, INDEX _version _version TYPE minmax GRANULARITY 1) ENGINE = ReplacingMergeTree(_version) PARTITION BY in tDiv(id, 4294967) ORDER BY tuple(id) (stage: Complete) 2023.08.19 19:44:38.560678 [ 5088 ] {76b80228-9870-4b41-b1ad-1d47d1cdda65} <Debug> perconaCH.herc (14fd18a9-7030-44a8-ad21-13d0b7c6bfd3): Loading data parts
You can see the CREATE TABLE statement from MySQL and how ClickHouse rewrites it from the above logs.
So, all set! We were able to configure the replication, and the data has been copied from MySQL to ClickHouse. Next step, we can start to test the replication.
UPDATE on MySQL
In MySQL
percona labs > select * from herc where id = 5; +----+------+---------------------+--------+ | id | name | c_time | active | +----+------+---------------------+--------+ | 5 | ij | 2023-08-19 01:28:45 | 1 | +----+------+---------------------+--------+ 1 row in set (0.00 sec) percona labs > update herc set name = 'Percona' where id = 5; Query OK, 1 row affected (0.01 sec) Rows matched: 1 Changed: 1 Warnings: 0 percona labs > select * from herc where id = 5; +----+---------+---------------------+--------+ | id | name | c_time | active | +----+---------+---------------------+--------+ | 5 | Percona | 2023-08-19 01:28:45 | 1 | +----+---------+---------------------+--------+ 1 row in set (0.00 sec)
At ClickHouse
clickhouse :) select * from herc where id = 5; SELECT * FROM herc WHERE id = 5 Query id: a087e52e-cc58-4057-b608-ad41b594a937 ┌─id─┬─name────┬──────────────c_time─┬─active─┐ │ 5 │ Percona │ 2023-08-19 01:28:45 │ 1 │ └────┴─────────┴─────────────────────┴────────┘ 1 row in set. Elapsed: 0.006 sec.
We can see the data has been updated in ClickHouse. Let’s try to get more details from the ClickHouse log file regarding the UPDATE statement (file: /var/log/clickhouse-server/clickhouse-server.log).
=== XID_EVENT === Timestamp: 1692455972 Event Type: XID_EVENT Server ID: 100 Event Size: 31 Log Pos: 1906 Flags: 0 XID: 108 2023.08.19 20:09:33.075649 [ 5088 ] {b1733f81-e3a9-40b9-a8f9-6df6ceaceea5} <Debug> executeQuery: (internal) /*Materialize MySQL step 1: execute dump d ata*/ INSERT INTO `herc` (id, name, c_time, active, _sign, _version) VALUES (stage: Complete) 2023.08.19 20:09:33.076804 [ 5088 ] {b1733f81-e3a9-40b9-a8f9-6df6ceaceea5} <Trace> perconaCH.herc (14fd18a9-7030-44a8-ad21-13d0b7c6bfd3): Trying to re serve 1.00 MiB using storage policy from min volume index 0 2023.08.19 20:09:33.078745 [ 5088 ] {} <Information> MaterializedMySQLSyncThread: MySQL executed position: === Binlog Position === Binlog: binlog.000007 Position: 1906 GTIDSets: 030f9a68-3dfc-11ee-b900-525400edf935:1-46
As you see above, the logs were generated for the single UPDATE statement. You can find more useful information related to Binary logs, position, GTID, XID number, etc. This can be used to debug and monitor the replication events.
And, as you see above, the UPDATE statement was converted to an INSERT statement with “_sign” and “_version” columns. To understand this, we have to understand How ReplacingMergeTree works.
Understanding ReplacingMergeTree
ReplacingMergeTree deduplicates the data using the “_version” and “_sign” columns in the background. To understand this better in a practical way:
Creating a new record in MySQL
percona labs > insert into herc values (7,'testRep','2023-08-19 01:29:11',1); Query OK, 1 row affected (0.01 sec)
In ClickHouse, this time, I will use all the columns in the SELECT statement instead of “SELECT *.”
clickhouse :) select id,name,c_time,active,_sign, _version from herc where id = 7; Query id: fc272a7c-9c03-4061-b76d-2728aadbec31 ┌─id─┬─name────┬──────────────c_time─┬─active─┬─_sign─┬─_version─┐ │ 7 │ testRep │ 2023-08-19 01:29:11 │ 1 │ 1 │ 1 │ └────┴─────────┴─────────────────────┴────────┴───────┴──────────┘ 1 row in set. Elapsed: 0.006 sec.
You can see this is the first version of data “_version = 1”.
Let’s do UPDATE on MySQL:
percona labs > update herc set name = 'UPDtestRep' where id = 7; Query OK, 1 row affected (0.01 sec) Rows matched: 1 Changed: 1 Warnings: 0
At ClickHouse:
clickhouse :) select id,name,c_time,active,_sign, _version from herc where id = 7; Query id: f498186c-cc9d-4ce1-9613-af491b3850e8 ┌─id─┬─name────┬──────────────c_time─┬─active─┬─_sign─┬─_version─┐ │ 7 │ testRep │ 2023-08-19 01:29:11 │ 1 │ 1 │ 1 │ └────┴─────────┴─────────────────────┴────────┴───────┴──────────┘ ┌─id─┬─name───────┬──────────────c_time─┬─active─┬─_sign─┬─_version─┐ │ 7 │ UPDtestRep │ 2023-08-19 01:29:11 │ 1 │ 1 │ 2 │ └────┴────────────┴─────────────────────┴────────┴───────┴──────────┘ 2 rows in set. Elapsed: 0.005 sec.
You can see two records for “id = 7”. This means it is keeping the previous version of data “_version = 1, name = testRep”, as well as the latest version “_version = 2, name = UPDtestRep”.
Now, let’s execute the DELETE statement on MySQL.
percona labs > delete from herc where id = 7; Query OK, 1 row affected (0.01 sec)
At ClickHouse:
clickhouse :) select id,name,c_time,active,_sign, _version from herc where id = 7; Query id: e3aab7e8-1ac8-4e7a-8ec7-c1cd2a63a24d ┌─id─┬─name────┬──────────────c_time─┬─active─┬─_sign─┬─_version─┐ │ 7 │ testRep │ 2023-08-19 01:29:11 │ 1 │ 1 │ 2 │ └────┴─────────┴─────────────────────┴────────┴───────┴──────────┘ ┌─id─┬─name───────┬──────────────c_time─┬─active─┬─_sign─┬─_version─┐ │ 7 │ UPDtestRep │ 2023-08-19 01:29:11 │ 1 │ 1 │ 3 │ └────┴────────────┴─────────────────────┴────────┴───────┴──────────┘ ┌─id─┬─name───────┬──────────────c_time─┬─active─┬─_sign─┬─_version─┐ │ 7 │ UPDtestRep │ 2023-08-19 01:29:11 │ 1 │ -1 │ 4 │ └────┴────────────┴─────────────────────┴────────┴───────┴──────────┘ 3 rows in set. Elapsed: 0.013 sec.
Now, you can see three records (three versions).
- The first record is the one we initially created with INSERT ( _sign = 1, _version = 1, name = testRep )
- Second record is created when doing UPDATE ( _sign = 1, _version = 2, name = UPDtestRep)
- The third record is created when doing DELETE. You can see the _sign with negative value when doing DELETE ( _sign = -1, _version = 3, name = UPDtestRep)
So, here, the latest version is “3” with negative value “_sign = -1”. Which means the data got deleted. You can find this by executing the following query.
clickhouse :) select * from herc where id = 7; Query id: f7634bbf-a12a-4d9a-bea1-cd917f72d581 Ok. 0 rows in set. Elapsed: 0.009 sec. clickhouse :) select id, name , c_time, active from herc where id = 7; Query id: bf27b945-ec11-4caa-9942-ef7e35bd2bb4 Ok. 0 rows in set. Elapsed: 0.008 sec.
When doing without “_sign, _version” columns OR “SELECT *”, you can find the latest value 🙂
What happens if the replication event fails?
All the MySQL statements are not supported for ClickHouse with the MaterialisedMySQL engine. So, it is necessary to monitor the replication failure events. (I will create another blog to cover more on this topic.) You can use the ClickHouse server and error logs to monitor the replication failure events. For example, at MySQL:
percona labs > create user 't2'@localhost identified by 'Aer@321'; Query OK, 0 rows affected (0.01 sec)
At ClickHouse logs:
=== GTID_EVENT === Timestamp: 1692394221 Event Type: GTID_EVENT Server ID: 100 Event Size: 79 Log Pos: 10313 Flags: 0 GTID Next: 030f9a68-3dfc-11ee-b900-525400edf935:35 2023.08.19 03:00:21.267928 [ 3861 ] {9fea936d-8f9e-4c7d-9075-39fe4876610c} <Debug> executeQuery: (internal) (stage: Complete) 2023.08.19 03:00:21.268093 [ 3861 ] {9fea936d-8f9e-4c7d-9075-39fe4876610c} <Error> MaterializedMySQLSyncThread(perconaCH): Query EXTERNAL DDL FROM MySQL(perconaCH, percona) CREATE USER 't2'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*A907AFD142E6FA14B293E13D508A8F505B7119AE' wasn't finished successfully: Code: 62. DB::Exception: Syntax error: failed at position 114 ('USER'): USER 't2'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*A907AFD142E6FA14B293E13D508A8F505B7119AE'. Expected one of: TEMPORARY, TABLE. (SYNTAX_ERROR), Stack trace (when copying this message, always include the lines below):
You can see in the log it mentioned: “Query wasn’t finished successfully.” This can be used to monitor the failure events. The replication chain will not be broken if the event is not executed. It will just skip that event.
What happens if MySQL or ClickHouse restarted?
In case of failures of MySQL or ClickHouse, replication will be reinitiated when they are alive. For example, If MySQL restarted, you can find the following logs:
2023.08.19 03:50:12.737040 [ 4787 ] {} <Debug> mysqlxx::Pool: Connecting to percona@192.168.64.7:3306 as user ch_repl 2023.08.19 03:50:12.737204 [ 4787 ] {} <Debug> mysqlxx::Pool: Creating a new MySQL connection to percona@192.168.64.7:3306 as user ch_repl with settin gs: connect_timeout=60, read_write_timeout=1800 2023.08.19 03:50:12.739134 [ 4787 ] {} <Error> mysqlxx::Pool: mysqlxx::ConnectionFailed 2023.08.19 03:50:22.850810 [ 4787 ] {} <Trace> mysqlxx::Pool: Entry(connection 8): sending PING to check if it is alive. 2023.08.19 03:50:22.851138 [ 4787 ] {} <Trace> mysqlxx::Pool: Entry(connection 8): PING ok. 2023.08.19 03:50:22.851155 [ 4787 ] {} <Trace> mysqlxx::Query: Running MySQL query using connection 8 2023.08.19 19:44:38.504900 [ 4533 ] {499f2057-e879-43ae-b406-eb7cdecd1e82} <Debug> mysqlxx::Pool: Connecting to percona@192.168.64.7:3306 as user ch_repl 2023.08.19 19:44:38.504922 [ 4533 ] {499f2057-e879-43ae-b406-eb7cdecd1e82} <Debug> mysqlxx::Pool: Creating a new MySQL connection to percona@192.168.64.7:3306 as user ch_repl with settings: connect_timeout=60, read_write_timeout=1800
From the above logs, you can see that ClickHouse keeps checking if MySQL is alive or not. Once MySQL is alive, it will initiate the connection to MySQL. When the replication starts, the executed events will not be executed again.
If ClickHouse restarted
As you can see below, ClickHouse was terminated. After some time, when it starts again, ClickHouse is trying to initiate the connection to MySQL and then resume the replication.
2023.08.19 20:52:59.656085 [ 4532 ] {} <Trace> BaseDaemon: Received signal 15 2023.08.19 20:52:59.656190 [ 4532 ] {} <Information> Application: Received termination signal (Terminated) ………… 2023.08.19 20:52:59.825884 [ 5672 ] {} <Information> Application: Starting ClickHouse 23.7.4.5 (revision: 54476, git hash: bd2fcd445534e57cc5aa8c170cc 25b7479b79c1c, build id: F73D00BCFC4671837E6C185DB350C813D958BD25), PID 5672 2023.08.19 20:52:59.826045 [ 5672 ] {} <Information> Application: starting up 2023.08.19 20:52:59.952675 [ 5929 ] {} <Trace> mysqlxx::Pool: (percona@192.168.64.7:3306 as user ch_repl): Iterating through existing MySQL connections 2023.08.19 20:52:59.952692 [ 5929 ] {} <Trace> mysqlxx::Pool: Entry(connection 11): sending PING to check if it is alive. 2023.08.19 20:52:59.953015 [ 5929 ] {} <Trace> mysqlxx::Pool: Entry(connection 11): PING ok. 2023.08.19 20:52:59.953023 [ 5929 ] {} <Trace> mysqlxx::Query: Running MySQL query using connection 11
Conclusion
Having data in real-time for analytics is helpful in improving your business. The MaterializedMySQL engine is one of the components that can be used to replicate the MySQL data to ClickHouse. But remember, this is still in the experimental phase. Altinity’s Sink connector is another good solution to explore. I will try to write about that in the future.
Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!