
MySQL is an outstanding open source transactional database used by most web-based applications and is very good at handling OLTP workloads. However, modern business is very much dependent on analytical data. ClickHouse is a columnar database that handles analytical workloads quickly. I recommend you read our previous blog, Using ClickHouse as an Analytic Extension for MySQL, from Vadim to know more about this.
In this blog post, I will show you how to replicate MySQL data in real-time to ClickHouse. I am going to use the MaterializedMySQL engine for this purpose. The blog post contains the following topics.
- MaterializedMySQL Engine – Overview
- Prerequisites for the Replication
- Replication setup
- Replication testing
- Understanding ReplacingMergeTree
- What happens if the Replication event fails?
- What happens if MySQL or ClickHouse restarted?
- Conclusion
MaterializedMySQL Engine – Overview
- MaterializedMySQL engine is an experimental release from the ClickHouse team.
- By default, it will create a database on ClickHouse with all the tables and their data. (We can do the filter as well.)
- Schema conversion from MySQL to ClickHouse will be taken care of by the engine. No manual work!
- Once we configured the setup, the ClickHouse server will act as a replica.
- ClickHouse will read the MySQL binary logs and perform the replication.
- Replication will work well with DDL and DML statements.
- ReplacingMergeTree will be used to perform the replication.
Prerequisites for the replication
We must meet certain requirements to configure the replication from MySQL to ClickHouse. I have split this into two parts, which have to be done from the MySQL end and the ClickHouse end.
From the MySQL end
MySQL should be configured with the following parameters:
[mysqld]
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_format = ROW
binlog_row_image = FULL
server-id = 100
To configure the replication, we have to create a MySQL user with “REPLICATION SLAVE, REPLICATION CLIENT, SELECT, RELOAD” privileges. Make sure the user should be created with “mysql_native_password” authentication. When you initiate the replication, ClickHouse will copy the existing data from the MySQL table to the ClickHouse table and then start the replication from Binary logs. That is the reason we need SELECT privilege as well.
percona labs > create user ch_repl@'192.168.64.13' identified with mysql_native_password by 'Percona$331';
Query OK, 0 rows affected (0.01 sec)
percona labs > grant replication slave, replication client, reload, select on *.* to ch_repl@'192.168.64.13';
Query OK, 0 rows affected (0.02 sec)
percona labs > flush privileges;
Query OK, 0 rows affected (0.01 sec)
From the ClickHouse end
Make sure you have MySQL port access from the ClickHouse server to the MySQL server.
root@clickhouse:~# telnet 192.168.64.7 3306
Trying 192.168.64.7...
Connected to 192.168.64.7.
MaterializedMySQL engine is the experimental release. So, you should enable the parameter (allow_experimental_database_materialized_mysql) in the ClickHouse config file. This can be enabled in the User profile. (file: /etc/clickhouse-server/users.xml)
#vi /etc/clickhouse-server/users.xml
<profiles>
<default>
<!-- Allowing Experimental feature to enable the replication from MySQL to ClickHouse -->
<allow_experimental_database_materialized_mysql>1</allow_experimental_database_materialized_mysql>
</default>
</profiles>
Activating this parameter does not need a ClickHouse restart. You can verify it using the following command.
clickhouse :) select name, value,changed,description from settings where name = 'allow_experimental_database_materialized_mysql'G
Query id: 6ad9c836-1c95-48a0-bb2f-ef89474618bf
Row 1:
──────
name: allow_experimental_database_materialized_mysql
value: 1
changed: 1
description: Allow to create database with Engine=MaterializedMySQL(...).
1 row in set. Elapsed: 0.007 sec.
Now, you can configure the replication from MySQL to ClickHouse.
Replication setup
To configure the replication, I have created two servers with the following hostnames. Both servers are running their respective databases.
At MySQL, I have created the table “percona.herc” which has five records.
percona labs > use percona
Database changed
percona labs > show tables;
+-------------------+
| Tables_in_percona |
+-------------------+
| herc |
+-------------------+
1 row in set (0.00 sec)
percona labs > select * from herc;
+----+------+---------------------+--------+
| id | name | c_time | active |
+----+------+---------------------+--------+
| 1 | ab | 2023-08-19 01:28:32 | 1 |
| 2 | cd | 2023-08-19 01:28:36 | 1 |
| 3 | ef | 2023-08-19 01:28:39 | 1 |
| 4 | gh | 2023-08-19 01:28:42 | 1 |
| 5 | ij | 2023-08-19 01:28:45 | 1 |
+----+------+---------------------+--------+
5 rows in set (0.00 sec)
To configure the replication, we have to run the following command in ClickHouse instance.
Syntax:
create database <database_name> engine = MaterializeMySQL('source host', 'source database', 'source user', 'source password') settings <options>;
In action:
clickhouse :) create database perconaCH engine = MaterializeMySQL('192.168.64.7', 'percona', 'ch_repl', 'Percona$331') settings allows_query_when_mysql_lost=true, max_wait_time_when_mysql_unavailable=10000, materialized_mysql_tables_list='herc';
CREATE DATABASE perconaCH
ENGINE = MaterializeMySQL('192.168.64.7', 'percona', 'ch_repl', 'Percona$331')
SETTINGS allows_query_when_mysql_lost = 1, max_wait_time_when_mysql_unavailable = 10000, materialized_mysql_tables_list = 'herc'
Query id: 499f2057-e879-43ae-b406-eb7cdecd1e82
Ok.
0 rows in set. Elapsed: 0.046 sec.
Let’s verify the data in ClickHouse.
clickhouse :) use perconaCH
USE perconaCH
Query id: 2d534f39-871a-4718-8f24-daa855fe3311
Ok.
0 rows in set. Elapsed: 0.002 sec.
clickhouse :) show tables;
SHOW TABLES
Query id: 6b0763ef-d78e-41aa-8e33-1b7bd1b047da
┌─name─┐
│ herc │
└──────┘
1 row in set. Elapsed: 0.004 sec.
clickhouse :) select * from herc;
SELECT *
FROM herc
Query id: 84bed28e-3db6-4711-b5cf-ad858eebdce2
┌─id─┬─name─┬──────────────c_time─┬─active─┐
│ 1 │ ab │ 2023-08-19 01:28:32 │ 1 │
│ 2 │ cd │ 2023-08-19 01:28:36 │ 1 │
│ 3 │ ef │ 2023-08-19 01:28:39 │ 1 │
│ 4 │ gh │ 2023-08-19 01:28:42 │ 1 │
│ 5 │ ij │ 2023-08-19 01:28:45 │ 1 │
└────┴──────┴─────────────────────┴────────┘
5 rows in set. Elapsed: 0.005 sec.
As you can see, the table “herc” was created, and the data has been copied. (Note: You can use a different database name in ClickHouse.)
ClickHouse has a different data type structure. The data types will be automatically converted by ClickHouse when we initiate the replication process. The relevant data types will be chosen. For example, below, I am sharing the table structure from MySQL and ClickHouse. You can compare the data types for the table we created. You can find more details here.
MySQL
percona labs > show create table hercG
*************************** 1. row ***************************
Table: herc
Create Table: CREATE TABLE `herc` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(16) DEFAULT NULL,
`c_time` datetime DEFAULT CURRENT_TIMESTAMP,
`active` enum('1','0') DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=18 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
ClickHouse
clickhouse :) show create table hercG
Row 1:
──────
statement: CREATE TABLE perconaCH.herc
(
`id` Int32,
`name` Nullable(String),
`c_time` Nullable(DateTime),
`active` Nullable(Enum8('1' = 1, '0' = 2)),
`_sign` Int8 MATERIALIZED 1,
`_version` UInt64 MATERIALIZED 1,
INDEX _version _version TYPE minmax GRANULARITY 1
)
ENGINE = ReplacingMergeTree(_version)
PARTITION BY intDiv(id, 4294967)
ORDER BY tuple(id)
SETTINGS index_granularity = 8192
1 row in set. Elapsed: 0.003 sec.
Note: There are two new columns, “_sign” and “_version.” They are created to manage the data deduplication by ReplacingmergeTree. We will see about it later in this post.
To understand the mechanism of how ClickHouse copies the existing data from MySQL, I am sharing the logs that were collected from MySQL’s general logs. You can see the SELECT statement, which is used to copy the data. Initially, before copying the data, it used “FLUSH TABLES WITH READ LOCK → REPEATEBLE_READ → START TRANSACTION – UNLOCK TABLES”.
2023-08-19T14:14:38.543090Z 12 Connect ch_repl@192.168.64.13 on percona using SSL/TLS
2023-08-19T14:14:38.544934Z 12 Query SET NAMES utf8mb4
2023-08-19T14:14:38.546775Z 12 Query SHOW VARIABLES
2023-08-19T14:14:38.557433Z 12 Query SHOW VARIABLES
2023-08-19T14:14:38.560367Z 12 Query SHOW GRANTS FOR CURRENT_USER()
2023-08-19T14:14:38.561910Z 12 Query FLUSH TABLES
2023-08-19T14:14:38.568580Z 12 Query FLUSH TABLES WITH READ LOCK
2023-08-19T14:14:38.569364Z 12 Query SHOW MASTER STATUS
2023-08-19T14:14:38.570542Z 12 Query SHOW VARIABLES WHERE Variable_name = 'binlog_checksum'
2023-08-19T14:14:38.573362Z 12 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2023-08-19T14:14:38.574353Z 12 Query START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */
2023-08-19T14:14:38.575577Z 12 Query SELECT TABLE_NAME AS table_name FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE != 'VIEW' AND TABLE_SCHEMA = 'percona'
2023-08-19T14:14:38.577692Z 12 Query SHOW CREATE TABLE percona.herc
2023-08-19T14:14:38.579019Z 12 Query UNLOCK TABLES
2023-08-19T14:14:38.587415Z 12 Query SELECT COLUMN_NAME AS column_name, COLUMN_TYPE AS column_type FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'percona' AND TABLE_NAME = 'herc' ORDER BY ORDINAL_POSITION
2023-08-19T14:14:38.589956Z 12 Query SELECT `id`,`name`,`c_time`,`active` FROM percona.herc
2023-08-19T14:14:38.591538Z 12 Query COMMIT
2023-08-19T14:14:38.592842Z 13 Connect ch_repl@192.168.64.13 on using TCP/IP
2023-08-19T14:14:38.593196Z 13 Query SET @master_binlog_checksum = 'CRC32'
2023-08-19T14:14:38.593536Z 13 Query SET @master_heartbeat_period = 1000000000
2023-08-19T14:14:38.594339Z 13 Binlog Dump GTID Log: '' Pos: 4 GTIDs: '030f9a68-3dfc-11ee-b900-525400edf935:1-44'
2023-08-19T14:14:38.595585Z 12 Query SHOW VARIABLES WHERE Variable_name = 'binlog_checksum'
The following two threads will be initiated at the MySQL end to copy the data and for replication purpose.
| 12 | ch_repl | 192.168.64.13:44142 | percona | Sleep | 479 | | NULL | PRIMARY |
| 13 | ch_repl | 192.168.64.13:44152 | NULL | Binlog Dump GTID | 479 | Source has sent all binlog to replica; waiting for more updates | NULL | PRIMARY |
From the ClickHouse end, ClickHouse logs are very verbose trace logs. We can find some useful information there. For example, (file: /var/log/clickhouse-server/clickhouse-server.log).
2023.08.19 03:50:22.755656 [ 4787 ] {} <Debug> mysqlxx::Pool: Connecting to percona@192.168.64.7:3306 as user ch_repl
2023.08.19 03:50:22.755840 [ 4787 ] {} <Debug> mysqlxx::Pool: Creating a new MySQL connection to percona@192.168.64.7:3306 as user ch_repl with settin
gs: connect_timeout=60, read_write_timeout=1800
2023.08.19 19:44:38.557707 [ 5088 ] {76b80228-9870-4b41-b1ad-1d47d1cdda65} <Debug> executeQuery: (internal) /*Materialize MySQL step 1: execute MySQL
DDL for dump data*/ EXTERNAL DDL FROM MySQL(perconaCH, percona) CREATE TABLE `herc` ( `id` int NOT NULL AUTO_INCREMENT, `name` varchar(16) DEFAULT NUL
L, `c_time` datetime DEFAULT CURRENT_TIMESTAMP, `active` enum('1','0') DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=18 DEFAULT CHAR
SET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci (stage: Complete)
2023.08.19 19:44:38.558200 [ 5088 ] {76b80228-9870-4b41-b1ad-1d47d1cdda65} <Debug> executeQuery: (internal) /* Rewritten MySQL DDL Query */ CREATE TAB
LE perconaCH.herc (`id` Int32, `name` Nullable(String), `c_time` Nullable(DateTime), `active` Nullable(Enum8('1' = 1, '0' = 2)), `_sign` Int8() MATERI
ALIZED 1, `_version` UInt64() MATERIALIZED 1, INDEX _version _version TYPE minmax GRANULARITY 1) ENGINE = ReplacingMergeTree(_version) PARTITION BY in
tDiv(id, 4294967) ORDER BY tuple(id) (stage: Complete)
2023.08.19 19:44:38.560678 [ 5088 ] {76b80228-9870-4b41-b1ad-1d47d1cdda65} <Debug> perconaCH.herc (14fd18a9-7030-44a8-ad21-13d0b7c6bfd3): Loading data
parts
You can see the CREATE TABLE statement from MySQL and how ClickHouse rewrites it from the above logs.
So, all set! We were able to configure the replication, and the data has been copied from MySQL to ClickHouse. Next step, we can start to test the replication.
UPDATE on MySQL
In MySQL
percona labs > select * from herc where id = 5;
+----+------+---------------------+--------+
| id | name | c_time | active |
+----+------+---------------------+--------+
| 5 | ij | 2023-08-19 01:28:45 | 1 |
+----+------+---------------------+--------+
1 row in set (0.00 sec)
percona labs > update herc set name = 'Percona' where id = 5;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
percona labs > select * from herc where id = 5;
+----+---------+---------------------+--------+
| id | name | c_time | active |
+----+---------+---------------------+--------+
| 5 | Percona | 2023-08-19 01:28:45 | 1 |
+----+---------+---------------------+--------+
1 row in set (0.00 sec)
At ClickHouse
clickhouse :) select * from herc where id = 5;
SELECT *
FROM herc
WHERE id = 5
Query id: a087e52e-cc58-4057-b608-ad41b594a937
┌─id─┬─name────┬──────────────c_time─┬─active─┐
│ 5 │ Percona │ 2023-08-19 01:28:45 │ 1 │
└────┴─────────┴─────────────────────┴────────┘
1 row in set. Elapsed: 0.006 sec.
We can see the data has been updated in ClickHouse. Let’s try to get more details from the ClickHouse log file regarding the UPDATE statement (file: /var/log/clickhouse-server/clickhouse-server.log).
=== XID_EVENT ===
Timestamp: 1692455972
Event Type: XID_EVENT
Server ID: 100
Event Size: 31
Log Pos: 1906
Flags: 0
XID: 108
2023.08.19 20:09:33.075649 [ 5088 ] {b1733f81-e3a9-40b9-a8f9-6df6ceaceea5} <Debug> executeQuery: (internal) /*Materialize MySQL step 1: execute dump d
ata*/ INSERT INTO `herc` (id, name, c_time, active, _sign, _version) VALUES (stage: Complete)
2023.08.19 20:09:33.076804 [ 5088 ] {b1733f81-e3a9-40b9-a8f9-6df6ceaceea5} <Trace> perconaCH.herc (14fd18a9-7030-44a8-ad21-13d0b7c6bfd3): Trying to re
serve 1.00 MiB using storage policy from min volume index 0
2023.08.19 20:09:33.078745 [ 5088 ] {} <Information> MaterializedMySQLSyncThread: MySQL executed position:
=== Binlog Position ===
Binlog: binlog.000007
Position: 1906
GTIDSets: 030f9a68-3dfc-11ee-b900-525400edf935:1-46
As you see above, the logs were generated for the single UPDATE statement. You can find more useful information related to Binary logs, position, GTID, XID number, etc. This can be used to debug and monitor the replication events.
And, as you see above, the UPDATE statement was converted to an INSERT statement with “_sign” and “_version” columns. To understand this, we have to understand How ReplacingMergeTree works.
Understanding ReplacingMergeTree
ReplacingMergeTree deduplicates the data using the “_version” and “_sign” columns in the background. To understand this better in a practical way:
Creating a new record in MySQL
percona labs > insert into herc values (7,'testRep','2023-08-19 01:29:11',1);
Query OK, 1 row affected (0.01 sec)
In ClickHouse, this time, I will use all the columns in the SELECT statement instead of “SELECT *.”
clickhouse :) select id,name,c_time,active,_sign, _version from herc where id = 7;
Query id: fc272a7c-9c03-4061-b76d-2728aadbec31
┌─id─┬─name────┬──────────────c_time─┬─active─┬─_sign─┬─_version─┐
│ 7 │ testRep │ 2023-08-19 01:29:11 │ 1 │ 1 │ 1 │
└────┴─────────┴─────────────────────┴────────┴───────┴──────────┘
1 row in set. Elapsed: 0.006 sec.
You can see this is the first version of data “_version = 1”.
Let’s do UPDATE on MySQL:
percona labs > update herc set name = 'UPDtestRep' where id = 7;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
At ClickHouse:
clickhouse :) select id,name,c_time,active,_sign, _version from herc where id = 7;
Query id: f498186c-cc9d-4ce1-9613-af491b3850e8
┌─id─┬─name────┬──────────────c_time─┬─active─┬─_sign─┬─_version─┐
│ 7 │ testRep │ 2023-08-19 01:29:11 │ 1 │ 1 │ 1 │
└────┴─────────┴─────────────────────┴────────┴───────┴──────────┘
┌─id─┬─name───────┬──────────────c_time─┬─active─┬─_sign─┬─_version─┐
│ 7 │ UPDtestRep │ 2023-08-19 01:29:11 │ 1 │ 1 │ 2 │
└────┴────────────┴─────────────────────┴────────┴───────┴──────────┘
2 rows in set. Elapsed: 0.005 sec.
You can see two records for “id = 7”. This means it is keeping the previous version of data “_version = 1, name = testRep”, as well as the latest version “_version = 2, name = UPDtestRep”.
Now, let’s execute the DELETE statement on MySQL.
percona labs > delete from herc where id = 7;
Query OK, 1 row affected (0.01 sec)
At ClickHouse:
clickhouse :) select id,name,c_time,active,_sign, _version from herc where id = 7;
Query id: e3aab7e8-1ac8-4e7a-8ec7-c1cd2a63a24d
┌─id─┬─name────┬──────────────c_time─┬─active─┬─_sign─┬─_version─┐
│ 7 │ testRep │ 2023-08-19 01:29:11 │ 1 │ 1 │ 2 │
└────┴─────────┴─────────────────────┴────────┴───────┴──────────┘
┌─id─┬─name───────┬──────────────c_time─┬─active─┬─_sign─┬─_version─┐
│ 7 │ UPDtestRep │ 2023-08-19 01:29:11 │ 1 │ 1 │ 3 │
└────┴────────────┴─────────────────────┴────────┴───────┴──────────┘
┌─id─┬─name───────┬──────────────c_time─┬─active─┬─_sign─┬─_version─┐
│ 7 │ UPDtestRep │ 2023-08-19 01:29:11 │ 1 │ -1 │ 4 │
└────┴────────────┴─────────────────────┴────────┴───────┴──────────┘
3 rows in set. Elapsed: 0.013 sec.
Now, you can see three records (three versions).
- The first record is the one we initially created with INSERT ( _sign = 1, _version = 1, name = testRep )
- Second record is created when doing UPDATE ( _sign = 1, _version = 2, name = UPDtestRep)
- The third record is created when doing DELETE. You can see the _sign with negative value when doing DELETE ( _sign = -1, _version = 3, name = UPDtestRep)
So, here, the latest version is “3” with negative value “_sign = -1”. Which means the data got deleted. You can find this by executing the following query.
clickhouse :) select * from herc where id = 7;
Query id: f7634bbf-a12a-4d9a-bea1-cd917f72d581
Ok.
0 rows in set. Elapsed: 0.009 sec.
clickhouse :) select id, name , c_time, active from herc where id = 7;
Query id: bf27b945-ec11-4caa-9942-ef7e35bd2bb4
Ok.
0 rows in set. Elapsed: 0.008 sec.
When doing without “_sign, _version” columns OR “SELECT *”, you can find the latest value 🙂
What happens if the replication event fails?
All the MySQL statements are not supported for ClickHouse with the MaterialisedMySQL engine. So, it is necessary to monitor the replication failure events. (I will create another blog to cover more on this topic.) You can use the ClickHouse server and error logs to monitor the replication failure events. For example, at MySQL:
percona labs > create user 't2'@localhost identified by 'Aer@321';
Query OK, 0 rows affected (0.01 sec)
At ClickHouse logs:
=== GTID_EVENT ===
Timestamp: 1692394221
Event Type: GTID_EVENT
Server ID: 100
Event Size: 79
Log Pos: 10313
Flags: 0
GTID Next: 030f9a68-3dfc-11ee-b900-525400edf935:35
2023.08.19 03:00:21.267928 [ 3861 ] {9fea936d-8f9e-4c7d-9075-39fe4876610c} <Debug> executeQuery: (internal) (stage: Complete)
2023.08.19 03:00:21.268093 [ 3861 ] {9fea936d-8f9e-4c7d-9075-39fe4876610c} <Error> MaterializedMySQLSyncThread(perconaCH): Query EXTERNAL DDL FROM MySQL(perconaCH, percona) CREATE USER 't2'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*A907AFD142E6FA14B293E13D508A8F505B7119AE' wasn't finished successfully: Code: 62. DB::Exception: Syntax error: failed at position 114 ('USER'): USER 't2'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*A907AFD142E6FA14B293E13D508A8F505B7119AE'. Expected one of: TEMPORARY, TABLE. (SYNTAX_ERROR), Stack trace (when copying this message, always include the lines below):
You can see in the log it mentioned: “Query wasn’t finished successfully.” This can be used to monitor the failure events. The replication chain will not be broken if the event is not executed. It will just skip that event.
What happens if MySQL or ClickHouse restarted?
In case of failures of MySQL or ClickHouse, replication will be reinitiated when they are alive. For example, If MySQL restarted, you can find the following logs:
2023.08.19 03:50:12.737040 [ 4787 ] {} <Debug> mysqlxx::Pool: Connecting to percona@192.168.64.7:3306 as user ch_repl
2023.08.19 03:50:12.737204 [ 4787 ] {} <Debug> mysqlxx::Pool: Creating a new MySQL connection to percona@192.168.64.7:3306 as user ch_repl with settin
gs: connect_timeout=60, read_write_timeout=1800
2023.08.19 03:50:12.739134 [ 4787 ] {} <Error> mysqlxx::Pool: mysqlxx::ConnectionFailed
2023.08.19 03:50:22.850810 [ 4787 ] {} <Trace> mysqlxx::Pool: Entry(connection 8): sending PING to check if it is alive.
2023.08.19 03:50:22.851138 [ 4787 ] {} <Trace> mysqlxx::Pool: Entry(connection 8): PING ok.
2023.08.19 03:50:22.851155 [ 4787 ] {} <Trace> mysqlxx::Query: Running MySQL query using connection 8
2023.08.19 19:44:38.504900 [ 4533 ] {499f2057-e879-43ae-b406-eb7cdecd1e82} <Debug> mysqlxx::Pool: Connecting to percona@192.168.64.7:3306 as user ch_repl
2023.08.19 19:44:38.504922 [ 4533 ] {499f2057-e879-43ae-b406-eb7cdecd1e82} <Debug> mysqlxx::Pool: Creating a new MySQL connection to percona@192.168.64.7:3306 as user ch_repl with settings: connect_timeout=60, read_write_timeout=1800
From the above logs, you can see that ClickHouse keeps checking if MySQL is alive or not. Once MySQL is alive, it will initiate the connection to MySQL. When the replication starts, the executed events will not be executed again.
If ClickHouse restarted
As you can see below, ClickHouse was terminated. After some time, when it starts again, ClickHouse is trying to initiate the connection to MySQL and then resume the replication.
2023.08.19 20:52:59.656085 [ 4532 ] {} <Trace> BaseDaemon: Received signal 15
2023.08.19 20:52:59.656190 [ 4532 ] {} <Information> Application: Received termination signal (Terminated)
…………
2023.08.19 20:52:59.825884 [ 5672 ] {} <Information> Application: Starting ClickHouse 23.7.4.5 (revision: 54476, git hash: bd2fcd445534e57cc5aa8c170cc
25b7479b79c1c, build id: F73D00BCFC4671837E6C185DB350C813D958BD25), PID 5672
2023.08.19 20:52:59.826045 [ 5672 ] {} <Information> Application: starting up
2023.08.19 20:52:59.952675 [ 5929 ] {} <Trace> mysqlxx::Pool: (percona@192.168.64.7:3306 as user ch_repl): Iterating through existing MySQL connections
2023.08.19 20:52:59.952692 [ 5929 ] {} <Trace> mysqlxx::Pool: Entry(connection 11): sending PING to check if it is alive.
2023.08.19 20:52:59.953015 [ 5929 ] {} <Trace> mysqlxx::Pool: Entry(connection 11): PING ok.
2023.08.19 20:52:59.953023 [ 5929 ] {} <Trace> mysqlxx::Query: Running MySQL query using connection 11
Conclusion
Having data in real-time for analytics is helpful in improving your business. The MaterializedMySQL engine is one of the components that can be used to replicate the MySQL data to ClickHouse. But remember, this is still in the experimental phase. Altinity’s Sink connector is another good solution to explore. I will try to write about that in the future.
Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!
Try Percona Distribution for MySQL today!