Complete Walkthrough: MySQL to ClickHouse Replication Using MaterializedMySQL Engine

MySQL to ClickHouse Replication

MySQL is an outstanding open source transactional database used by most web-based applications and is very good at handling OLTP workloads. However, modern businesses depend heavily on analytical data. ClickHouse is a columnar database that handles analytical workloads quickly. To learn more, I recommend reading our previous blog post from Vadim, Using ClickHouse as an Analytic Extension for MySQL.

In this blog post, I will show you how to replicate MySQL data in real time to ClickHouse using the MaterializedMySQL engine. The post covers the following topics:

  • MaterializedMySQL Engine – Overview
  • Prerequisites for the Replication
  • Replication setup
  • Replication testing
  • Understanding ReplacingMergeTree
  • What happens if the replication event fails?
  • What happens if MySQL or ClickHouse is restarted?
  • Conclusion

MaterializedMySQL Engine – Overview

  • The MaterializedMySQL engine is an experimental feature from the ClickHouse team.
  • By default, it creates a database on ClickHouse with all the tables and their data. (We can also filter which tables to replicate.)
  • Schema conversion from MySQL to ClickHouse is taken care of by the engine. No manual work!
  • Once the setup is configured, the ClickHouse server acts as a replica.
  • ClickHouse reads the MySQL binary logs and performs the replication.
  • Replication works well with both DDL and DML statements.
  • The replicated tables use the ReplacingMergeTree engine.

Prerequisites for the replication

We must meet certain requirements to configure the replication from MySQL to ClickHouse. I have split these into two parts: what has to be done on the MySQL end and what has to be done on the ClickHouse end.

From the MySQL end

MySQL should be configured with the following parameters:


gtid_mode = ON
enforce_gtid_consistency = ON
binlog_format = ROW
binlog_row_image = FULL
server-id = 100
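Before moving on, you can sanity-check the running values from the MySQL prompt. (This query is just a verification aid, not part of the required setup.)

```sql
SELECT @@gtid_mode, @@enforce_gtid_consistency, @@binlog_format, @@binlog_row_image, @@server_id;
```

All five values should match the configuration above before you initiate the replication.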

To configure the replication, we have to create a MySQL user with the “REPLICATION SLAVE, REPLICATION CLIENT, SELECT, RELOAD” privileges. Make sure the user is created with “mysql_native_password” authentication. When you initiate the replication, ClickHouse first copies the existing data from the MySQL tables to the ClickHouse tables and then starts replicating from the binary logs. That is why we need the SELECT privilege as well.

percona labs > create user ch_repl@'' identified with mysql_native_password by 'Percona$331';
Query OK, 0 rows affected (0.01 sec)

percona labs > grant replication slave, replication client, reload, select on *.* to ch_repl@'';
Query OK, 0 rows affected (0.02 sec)

percona labs > flush privileges;
Query OK, 0 rows affected (0.01 sec)

From the ClickHouse end

Make sure you have MySQL port access from the ClickHouse server to the MySQL server.

root@clickhouse:~# telnet 3306
Connected to

The MaterializedMySQL engine is an experimental feature, so you have to enable the parameter (allow_experimental_database_materialized_mysql) on the ClickHouse side. This can be enabled in the user profile (file: /etc/clickhouse-server/users.xml).

#vi /etc/clickhouse-server/users.xml
<profiles>
    <default>
        <!-- Allowing Experimental feature to enable the replication from MySQL to ClickHouse -->
        <allow_experimental_database_materialized_mysql>1</allow_experimental_database_materialized_mysql>
    </default>
</profiles>

Activating this parameter does not require a ClickHouse restart. You can verify it using the following command.

clickhouse :) select name, value, changed, description from settings where name = 'allow_experimental_database_materialized_mysql'\G
Query id: 6ad9c836-1c95-48a0-bb2f-ef89474618bf
Row 1:
name:        allow_experimental_database_materialized_mysql
value:       1
changed:     1
description: Allow to create database with Engine=MaterializedMySQL(...).
1 row in set. Elapsed: 0.007 sec.

Now, you can configure the replication from MySQL to ClickHouse.

Replication setup

To configure the replication, I have created two servers with the following hostnames. Both servers are running their respective databases.

  • mysql
  • clickhouse

At MySQL, I have created the table “percona.herc”, which has five records.

percona labs > use percona
Database changed

percona labs > show tables;
| Tables_in_percona |
| herc              |
1 row in set (0.00 sec)

percona labs > select * from herc;
| id | name | c_time              | active |
|  1 | ab   | 2023-08-19 01:28:32 | 1      |
|  2 | cd   | 2023-08-19 01:28:36 | 1      |
|  3 | ef   | 2023-08-19 01:28:39 | 1      |
|  4 | gh   | 2023-08-19 01:28:42 | 1      |
|  5 | ij   | 2023-08-19 01:28:45 | 1      |
5 rows in set (0.00 sec)

To configure the replication, we have to run the following command on the ClickHouse instance.


create database <database_name> engine = MaterializeMySQL('source host', 'source database', 'source user', 'source password') settings <options>;

In action:

clickhouse :) create database perconaCH engine = MaterializeMySQL('', 'percona', 'ch_repl', 'Percona$331') settings allows_query_when_mysql_lost=true, max_wait_time_when_mysql_unavailable=10000, materialized_mysql_tables_list='herc';
ENGINE = MaterializeMySQL('', 'percona', 'ch_repl', 'Percona$331')
SETTINGS allows_query_when_mysql_lost = 1, max_wait_time_when_mysql_unavailable = 10000, materialized_mysql_tables_list = 'herc'
Query id: 499f2057-e879-43ae-b406-eb7cdecd1e82
0 rows in set. Elapsed: 0.046 sec.

Let’s verify the data in ClickHouse.

clickhouse :) use perconaCH
USE perconaCH
Query id: 2d534f39-871a-4718-8f24-daa855fe3311
0 rows in set. Elapsed: 0.002 sec.

clickhouse :) show tables;
Query id: 6b0763ef-d78e-41aa-8e33-1b7bd1b047da
│ herc │
1 row in set. Elapsed: 0.004 sec.

clickhouse :) select * from herc;
FROM herc
Query id: 84bed28e-3db6-4711-b5cf-ad858eebdce2
│  1 │ ab   │ 2023-08-19 01:28:32 │ 1      │
│  2 │ cd   │ 2023-08-19 01:28:36 │ 1      │
│  3 │ ef   │ 2023-08-19 01:28:39 │ 1      │
│  4 │ gh   │ 2023-08-19 01:28:42 │ 1      │
│  5 │ ij   │ 2023-08-19 01:28:45 │ 1      │
5 rows in set. Elapsed: 0.005 sec.

As you can see, the table “herc” was created, and the data has been copied. (Note: You can use a different database name in ClickHouse.)

ClickHouse has its own data type structure. When we initiate the replication, the engine automatically converts the MySQL data types to the most relevant ClickHouse types. For example, below I am sharing the table structure from both MySQL and ClickHouse so you can compare the data types for the table we created. You can find more details here.


percona labs > show create table herc\G
*************************** 1. row ***************************
       Table: herc
Create Table: CREATE TABLE `herc` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(16) DEFAULT NULL,
  `c_time` datetime DEFAULT CURRENT_TIMESTAMP,
  `active` enum('1','0') DEFAULT NULL,
  PRIMARY KEY (`id`)
1 row in set (0.00 sec)


clickhouse :) show create table herc\G
Row 1:
statement: CREATE TABLE perconaCH.herc
    `id` Int32,
    `name` Nullable(String),
    `c_time` Nullable(DateTime),
    `active` Nullable(Enum8('1' = 1, '0' = 2)),
    `_sign` Int8 MATERIALIZED 1,
    `_version` UInt64 MATERIALIZED 1,
    INDEX _version _version TYPE minmax GRANULARITY 1
ENGINE = ReplacingMergeTree(_version)
PARTITION BY intDiv(id, 4294967)
ORDER BY tuple(id)
SETTINGS index_granularity = 8192
1 row in set. Elapsed: 0.003 sec.

Note: There are two new columns, “_sign” and “_version.” They are created to manage data deduplication by ReplacingMergeTree. We will look at them later in this post.

To understand the mechanism ClickHouse uses to copy the existing data from MySQL, I am sharing the logs collected from MySQL’s general log. You can see the SELECT statement that is used to copy the data. Before copying the data, it used “FLUSH TABLES WITH READ LOCK → REPEATABLE READ → START TRANSACTION → UNLOCK TABLES”.

2023-08-19T14:14:38.543090Z   12 Connect ch_repl@ on percona using SSL/TLS
2023-08-19T14:14:38.544934Z   12 Query SET NAMES utf8mb4
2023-08-19T14:14:38.546775Z   12 Query SHOW VARIABLES
2023-08-19T14:14:38.557433Z   12 Query SHOW VARIABLES
2023-08-19T14:14:38.560367Z   12 Query SHOW GRANTS FOR CURRENT_USER()
2023-08-19T14:14:38.561910Z   12 Query FLUSH TABLES
2023-08-19T14:14:38.568580Z   12 Query FLUSH TABLES WITH READ LOCK
2023-08-19T14:14:38.569364Z   12 Query SHOW MASTER STATUS
2023-08-19T14:14:38.570542Z   12 Query SHOW VARIABLES WHERE Variable_name = 'binlog_checksum'
2023-08-19T14:14:38.574353Z   12 Query START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */
2023-08-19T14:14:38.575577Z   12 Query SELECT TABLE_NAME AS table_name FROM INFORMATION_SCHEMA.TABLES  WHERE TABLE_TYPE != 'VIEW' AND TABLE_SCHEMA = 'percona'
2023-08-19T14:14:38.577692Z   12 Query SHOW CREATE TABLE percona.herc
2023-08-19T14:14:38.579019Z   12 Query UNLOCK TABLES
2023-08-19T14:14:38.589956Z   12 Query SELECT `id`,`name`,`c_time`,`active` FROM percona.herc
2023-08-19T14:14:38.591538Z   12 Query COMMIT
2023-08-19T14:14:38.592842Z   13 Connect ch_repl@ on  using TCP/IP
2023-08-19T14:14:38.593196Z   13 Query SET @master_binlog_checksum = 'CRC32'
2023-08-19T14:14:38.593536Z   13 Query SET @master_heartbeat_period = 1000000000
2023-08-19T14:14:38.594339Z   13 Binlog Dump GTID Log: '' Pos: 4 GTIDs: '030f9a68-3dfc-11ee-b900-525400edf935:1-44'
2023-08-19T14:14:38.595585Z   12 Query SHOW VARIABLES WHERE Variable_name = 'binlog_checksum'

The following two threads are initiated at the MySQL end: one to copy the existing data and one to stream the binary logs for replication.

| 12 | ch_repl         | | percona | Sleep            |   479 |                                                                 | NULL                                         | PRIMARY          |
| 13 | ch_repl         | | NULL    | Binlog Dump GTID |   479 | Source has sent all binlog to replica; waiting for more updates | NULL                                         | PRIMARY          |

On the ClickHouse end, the server log is a very verbose trace log, and we can find some useful information there (file: /var/log/clickhouse-server/clickhouse-server.log). For example:

2023.08.19 03:50:22.755656 [ 4787 ] {} <Debug> mysqlxx::Pool: Connecting to percona@ as user ch_repl
2023.08.19 03:50:22.755840 [ 4787 ] {} <Debug> mysqlxx::Pool: Creating a new MySQL connection to percona@ as user ch_repl with settin
gs: connect_timeout=60, read_write_timeout=1800
2023.08.19 19:44:38.557707 [ 5088 ] {76b80228-9870-4b41-b1ad-1d47d1cdda65} <Debug> executeQuery: (internal) /*Materialize MySQL step 1: execute MySQL
DDL for dump data*/ EXTERNAL DDL FROM MySQL(perconaCH, percona) CREATE TABLE `herc` ( `id` int NOT NULL AUTO_INCREMENT, `name` varchar(16) DEFAULT NUL
SET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci (stage: Complete)
2023.08.19 19:44:38.558200 [ 5088 ] {76b80228-9870-4b41-b1ad-1d47d1cdda65} <Debug> executeQuery: (internal) /* Rewritten MySQL DDL Query */ CREATE TAB
LE perconaCH.herc (`id` Int32, `name` Nullable(String), `c_time` Nullable(DateTime), `active` Nullable(Enum8('1' = 1, '0' = 2)), `_sign` Int8() MATERI
ALIZED 1, `_version` UInt64() MATERIALIZED 1, INDEX _version _version TYPE minmax GRANULARITY 1) ENGINE = ReplacingMergeTree(_version) PARTITION BY in
tDiv(id, 4294967) ORDER BY tuple(id) (stage: Complete)
2023.08.19 19:44:38.560678 [ 5088 ] {76b80228-9870-4b41-b1ad-1d47d1cdda65} <Debug> perconaCH.herc (14fd18a9-7030-44a8-ad21-13d0b7c6bfd3): Loading data

You can see the CREATE TABLE statement from MySQL and how ClickHouse rewrites it from the above logs.

So, all set! We were able to configure the replication, and the data has been copied from MySQL to ClickHouse. As the next step, we can start testing the replication.

Replication testing

At MySQL, let’s update an existing record.

percona labs > select * from herc where id = 5;
| id | name | c_time              | active |
|  5 | ij   | 2023-08-19 01:28:45 | 1      |
1 row in set (0.00 sec)

percona labs > update herc set name = 'Percona' where id = 5;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

percona labs > select * from herc where id = 5;
| id | name    | c_time              | active |
|  5 | Percona | 2023-08-19 01:28:45 | 1      |
1 row in set (0.00 sec)

At ClickHouse

clickhouse :) select * from herc where id = 5;
FROM herc
WHERE id = 5
Query id: a087e52e-cc58-4057-b608-ad41b594a937
│  5 │ Percona │ 2023-08-19 01:28:45 │ 1      │
1 row in set. Elapsed: 0.006 sec.

We can see the data has been updated in ClickHouse. Let’s try to get more details from the ClickHouse log file regarding the UPDATE statement (file: /var/log/clickhouse-server/clickhouse-server.log).

=== XID_EVENT ===
Timestamp: 1692455972
Event Type: XID_EVENT
Server ID: 100
Event Size: 31
Log Pos: 1906
Flags: 0
XID: 108

2023.08.19 20:09:33.075649 [ 5088 ] {b1733f81-e3a9-40b9-a8f9-6df6ceaceea5} <Debug> executeQuery: (internal) /*Materialize MySQL step 1: execute dump d
ata*/ INSERT INTO `herc` (id, name, c_time, active, _sign, _version) VALUES (stage: Complete)
2023.08.19 20:09:33.076804 [ 5088 ] {b1733f81-e3a9-40b9-a8f9-6df6ceaceea5} <Trace> perconaCH.herc (14fd18a9-7030-44a8-ad21-13d0b7c6bfd3): Trying to re
serve 1.00 MiB using storage policy from min volume index 0
2023.08.19 20:09:33.078745 [ 5088 ] {} <Information> MaterializedMySQLSyncThread: MySQL executed position:

=== Binlog Position ===
Binlog: binlog.000007
Position: 1906
GTIDSets: 030f9a68-3dfc-11ee-b900-525400edf935:1-46

As you can see above, logs were generated for the single UPDATE statement. You can find more useful information related to the binary log file, position, GTID, XID number, etc. This can be used to debug and monitor the replication events.

And, as you can see, the UPDATE statement was converted to an INSERT statement with the “_sign” and “_version” columns. To understand this, we have to understand how ReplacingMergeTree works.

Understanding ReplacingMergeTree

ReplacingMergeTree deduplicates the data in the background using the “_version” and “_sign” columns. To understand this better, let’s walk through it in a practical way.

Creating a new record in MySQL

percona labs > insert into herc values (7,'testRep','2023-08-19 01:29:11',1);
Query OK, 1 row affected (0.01 sec)

In ClickHouse, this time, I will use all the columns in the SELECT statement instead of “SELECT *.”

clickhouse :) select id,name,c_time,active,_sign, _version from herc where id = 7;
Query id: fc272a7c-9c03-4061-b76d-2728aadbec31
│  7 │ testRep │ 2023-08-19 01:29:11 │ 1      │     1 │        1 │
1 row in set. Elapsed: 0.006 sec.

You can see this is the first version of data “_version = 1”.

Let’s do UPDATE on MySQL:

percona labs > update herc set name = 'UPDtestRep' where id = 7;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

At ClickHouse:

clickhouse :) select id,name,c_time,active,_sign, _version from herc where id = 7;
Query id: f498186c-cc9d-4ce1-9613-af491b3850e8
│  7 │ testRep │ 2023-08-19 01:29:11 │ 1      │     1 │        1 │
│  7 │ UPDtestRep │ 2023-08-19 01:29:11 │ 1      │     1 │        2 │
2 rows in set. Elapsed: 0.005 sec.

You can see two records for “id = 7”. This means it is keeping the previous version of data “_version = 1, name = testRep”, as well as the latest version “_version = 2, name = UPDtestRep”.

Now, let’s execute the DELETE statement on MySQL.

percona labs > delete from herc where id = 7;
Query OK, 1 row affected (0.01 sec)

At ClickHouse:

clickhouse :) select id,name,c_time,active,_sign, _version from herc where id = 7;
Query id: e3aab7e8-1ac8-4e7a-8ec7-c1cd2a63a24d
│  7 │ testRep │ 2023-08-19 01:29:11 │ 1      │     1 │        2 │
│  7 │ UPDtestRep │ 2023-08-19 01:29:11 │ 1      │     1 │        3 │
│  7 │ UPDtestRep │ 2023-08-19 01:29:11 │ 1      │    -1 │        4 │
3 rows in set. Elapsed: 0.013 sec.

Now, you can see three records (three versions).

  • The first record is the one we initially created with the INSERT ( _sign = 1, name = testRep )
  • The second record was created by the UPDATE ( _sign = 1, name = UPDtestRep )
  • The third record was created by the DELETE. Note that _sign has a negative value for a DELETE ( _sign = -1, name = UPDtestRep )

So, here, the latest version carries the negative value “_sign = -1”, which means the row has been deleted. (The exact _version values shown may differ between queries, as _version is an internal counter maintained by the engine.) You can verify the deletion by executing the following queries.

clickhouse :) select * from herc where id = 7;
Query id: f7634bbf-a12a-4d9a-bea1-cd917f72d581
0 rows in set. Elapsed: 0.009 sec.

clickhouse :) select id, name , c_time, active from herc where id = 7;
Query id: bf27b945-ec11-4caa-9942-ef7e35bd2bb4
0 rows in set. Elapsed: 0.008 sec.

When you query without the “_sign” and “_version” columns (or use “SELECT *”), ClickHouse returns only the latest, non-deleted version of each row 🙂
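This works because queries on MaterializedMySQL tables are automatically rewritten to keep only the latest version of each row and to hide deleted rows. Equivalent logic can be expressed by hand with aggregate functions; the following sketch (run in the ClickHouse client against the replicated database) picks the name belonging to the highest _version per id and filters out keys whose latest _sign is negative:

```sql
SELECT id, argMax(name, _version) AS name
FROM herc
GROUP BY id
HAVING argMax(_sign, _version) > 0;
```

This is only an illustration of the deduplication rule; in practice, you can simply rely on the automatic rewrite shown above.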

What happens if the replication event fails?

Not all MySQL statements are supported by ClickHouse’s MaterializedMySQL engine, so it is necessary to monitor for replication failure events. (I will cover this topic in more detail in another blog post.) You can use the ClickHouse server and error logs to monitor the replication failure events. For example, at MySQL:

percona labs > create user 't2'@localhost identified by 'Aer@321';
Query OK, 0 rows affected (0.01 sec)

At ClickHouse logs:

=== GTID_EVENT ===
Timestamp: 1692394221
Event Type: GTID_EVENT
Server ID: 100
Event Size: 79
Log Pos: 10313
Flags: 0
GTID Next: 030f9a68-3dfc-11ee-b900-525400edf935:35
2023.08.19 03:00:21.267928 [ 3861 ] {9fea936d-8f9e-4c7d-9075-39fe4876610c} <Debug> executeQuery: (internal)  (stage: Complete)
2023.08.19 03:00:21.268093 [ 3861 ] {9fea936d-8f9e-4c7d-9075-39fe4876610c} <Error> MaterializedMySQLSyncThread(perconaCH): Query EXTERNAL DDL FROM MySQL(perconaCH, percona) CREATE USER 't2'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*A907AFD142E6FA14B293E13D508A8F505B7119AE' wasn't finished successfully: Code: 62. DB::Exception: Syntax error: failed at position 114 ('USER'): USER 't2'@'localhost' IDENTIFIED WITH 'mysql_native_password' AS '*A907AFD142E6FA14B293E13D508A8F505B7119AE'. Expected one of: TEMPORARY, TABLE. (SYNTAX_ERROR), Stack trace (when copying this message, always include the lines below):

You can see the log mentions: “Query … wasn’t finished successfully.” This can be used to monitor failure events. The replication chain will not be broken if an event cannot be executed; ClickHouse will just skip that event.

What happens if MySQL or ClickHouse is restarted?

If MySQL or ClickHouse fails, replication is reinitiated once the failed server is back online. For example, if MySQL is restarted, you can find the following logs:

2023.08.19 03:50:12.737040 [ 4787 ] {} <Debug> mysqlxx::Pool: Connecting to percona@ as user ch_repl
2023.08.19 03:50:12.737204 [ 4787 ] {} <Debug> mysqlxx::Pool: Creating a new MySQL connection to percona@ as user ch_repl with settin
gs: connect_timeout=60, read_write_timeout=1800
2023.08.19 03:50:12.739134 [ 4787 ] {} <Error> mysqlxx::Pool: mysqlxx::ConnectionFailed
2023.08.19 03:50:22.850810 [ 4787 ] {} <Trace> mysqlxx::Pool: Entry(connection 8): sending PING to check if it is alive.
2023.08.19 03:50:22.851138 [ 4787 ] {} <Trace> mysqlxx::Pool: Entry(connection 8): PING ok.
2023.08.19 03:50:22.851155 [ 4787 ] {} <Trace> mysqlxx::Query: Running MySQL query using connection 8
2023.08.19 19:44:38.504900 [ 4533 ] {499f2057-e879-43ae-b406-eb7cdecd1e82} <Debug> mysqlxx::Pool: Connecting to percona@ as user ch_repl
2023.08.19 19:44:38.504922 [ 4533 ] {499f2057-e879-43ae-b406-eb7cdecd1e82} <Debug> mysqlxx::Pool: Creating a new MySQL connection to percona@ as user ch_repl with settings: connect_timeout=60, read_write_timeout=1800

From the above logs, you can see that ClickHouse keeps checking whether MySQL is alive. Once MySQL is back, ClickHouse initiates the connection and resumes the replication. Events that were already executed will not be executed again.

If ClickHouse is restarted

As you can see below, ClickHouse was terminated. When it starts again, it initiates the connection to MySQL and then resumes the replication.

2023.08.19 20:52:59.656085 [ 4532 ] {} <Trace> BaseDaemon: Received signal 15
2023.08.19 20:52:59.656190 [ 4532 ] {} <Information> Application: Received termination signal (Terminated)
2023.08.19 20:52:59.825884 [ 5672 ] {} <Information> Application: Starting ClickHouse (revision: 54476, git hash: bd2fcd445534e57cc5aa8c170cc
25b7479b79c1c, build id: F73D00BCFC4671837E6C185DB350C813D958BD25), PID 5672
2023.08.19 20:52:59.826045 [ 5672 ] {} <Information> Application: starting up
2023.08.19 20:52:59.952675 [ 5929 ] {} <Trace> mysqlxx::Pool: (percona@ as user ch_repl): Iterating through existing MySQL connections
2023.08.19 20:52:59.952692 [ 5929 ] {} <Trace> mysqlxx::Pool: Entry(connection 11): sending PING to check if it is alive.
2023.08.19 20:52:59.953015 [ 5929 ] {} <Trace> mysqlxx::Pool: Entry(connection 11): PING ok.
2023.08.19 20:52:59.953023 [ 5929 ] {} <Trace> mysqlxx::Query: Running MySQL query using connection 11


Conclusion

Having data available in real time for analytics helps improve your business. The MaterializedMySQL engine is one of the components that can be used to replicate MySQL data to ClickHouse. But remember, it is still in the experimental phase. Altinity’s Sink Connector is another good solution to explore; I will try to write about it in the future.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!




Using ClickHouse as an Analytic Extension for MySQL

MySQL is an outstanding database for online transaction processing. With suitable hardware, it is easy to execute more than 1M queries per second and handle tens of thousands of simultaneous connections. Many of the most demanding web applications on the planet are built on MySQL. With capabilities like that, why would MySQL users need anything else?

Well, analytic queries for starters. Analytic queries answer important business questions like finding the number of unique visitors to a website over time or figuring out how to increase online purchases. They scan large volumes of data and compute aggregates, including sums, averages, and much more complex calculations besides. The results are invaluable but can bog down online transaction processing on MySQL. 

Fortunately, there’s ClickHouse: a powerful analytic database that pairs well with MySQL. Percona is working closely with our partner Altinity to help users add ClickHouse easily to existing MySQL applications. You can read more about our partnership in our recent press release as well as about our joint MySQL-to-ClickHouse solution. 

This article provides tips on how to recognize when MySQL is overburdened with analytics and can benefit from ClickHouse’s unique capabilities. We then show three important patterns for integrating MySQL and ClickHouse. The result is more powerful, cost-efficient applications that leverage the strengths of both databases. 

Signs that indicate MySQL needs analytic help

Let’s start by digging into some obvious signs that your MySQL database is overburdened with analytics processing. 

Huge tables of immutable data mixed in with transaction tables 

Tables that drive analytics tend to be very large, rarely have updates, and may also have many columns. Typical examples are web access logs, marketing campaign events, and monitoring data. If you see a few outlandishly large tables of immutable data mixed with smaller, actively updated transaction processing tables, it’s a good sign your users may benefit from adding an analytic database. 

Complex aggregation pipelines

Analytic processing produces aggregates, which are numbers that summarize large datasets to help users identify patterns. Examples include unique site visitors per week, average page bounce rates, or counts of web traffic sources. MySQL may take minutes or even hours to compute such values. To improve performance it is common to add complex batch processes that precompute aggregates. If you see such aggregation pipelines, it is often an indication that adding an analytic database can reduce the labor of operating your application as well as deliver faster and more timely results for users. 
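As an illustration, an aggregate like “unique site visitors per week” is a single, fast query in ClickHouse, with no precomputation pipeline required. (This is a sketch; the visits table and its columns are hypothetical.)

```sql
SELECT
    toStartOfWeek(visit_time) AS week,
    uniq(visitor_id) AS unique_visitors
FROM visits
GROUP BY week
ORDER BY week;
```

The uniq function computes an approximate distinct count very efficiently, which is typically what such dashboards need.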

MySQL is too slow or inflexible to answer important business questions

A final clue is the in-depth questions you don’t ask about MySQL-based applications because it is too hard to get answers. Why don’t users complete purchases on eCommerce sites?  Which strategies for in-game promotions have the best payoff in multi-player games? Answering these questions directly from MySQL transaction data often requires substantial time and external programs. It’s sufficiently difficult that most users simply don’t bother. Coupling MySQL with a capable analytic database may be the answer. 

Why is ClickHouse a natural complement to MySQL? 

MySQL is an outstanding database for transaction processing. Yet the features that make MySQL work well for transactions (storing data in rows, single-threaded queries, and optimization for high concurrency) are exactly the opposite of those needed to run analytic queries that compute aggregates on large datasets.

ClickHouse on the other hand is designed from the ground up for analytic processing. It stores data in columns, has optimizations to minimize I/O, computes aggregates very efficiently, and parallelizes query processing. ClickHouse can answer complex analytic questions almost instantly in many cases, which allows users to sift through data quickly. Because ClickHouse calculates aggregates so efficiently, end users can pose questions in many ways without help from application designers. 

These are strong claims. To understand them it is helpful to look at how ClickHouse differs from MySQL. Here is a diagram that illustrates how each database pulls in data for a query that reads all values of three columns of a table. 

MySQL stores table data by rows. It must read the whole row to get data for just three columns. MySQL production systems also typically do not use compression, as it has performance downsides for transaction processing. Finally, MySQL uses a single thread for query processing and cannot parallelize work. 

By contrast, ClickHouse reads only the columns referenced in queries. Storing data in columns enables ClickHouse to compress data at levels that often exceed 90%. Finally, ClickHouse stores tables in parts and scans them in parallel.

The amount of data you read, how greatly it is compressed, and the ability to parallelize work make an enormous difference. Here’s a picture that illustrates the reduction in I/O for a query reading three columns.  

MySQL and ClickHouse give the same answer. To get it, MySQL reads 59 GB of data, whereas ClickHouse reads only 21 MB. That’s close to 3000 times less I/O, hence far less time to access the data. ClickHouse also parallelizes query execution very well, further improving performance. It is little wonder that analytic queries run hundreds or even thousands of times faster on ClickHouse than on MySQL. 

ClickHouse also has a rich set of features to run analytic queries quickly and efficiently. These include a large library of aggregation functions, the use of SIMD instructions where possible, the ability to read data from Kafka event streams, and efficient materialized views, just to name a few. 

There is a final ClickHouse strength: excellent integration with MySQL. Here are a few examples. 

  • ClickHouse can ingest mysqldump and CSV data directly into ClickHouse tables. 
  • ClickHouse can perform remote queries on MySQL tables, which provides another way to explore as well as ingest data quickly. 
  • The ClickHouse query dialect is similar to MySQL, including system commands like SHOW PROCESSLIST, for example. 
  • ClickHouse even supports the MySQL wire protocol, so standard MySQL clients can connect directly (enabled via the mysql_port server setting).
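As a quick illustration of the wire protocol support mentioned above, a standard mysql client can connect straight to a ClickHouse server once the mysql_port setting is enabled (9004 is the commonly used default; the hostname and user below are placeholders):

```shell
mysql --protocol tcp --host clickhouse_host --port 9004 --user default
```

From that prompt, familiar commands like SHOW DATABASES work against ClickHouse.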

For all of these reasons, ClickHouse is a natural choice to extend MySQL capabilities for analytic processing. 

Why is MySQL a natural complement to ClickHouse? 

Just as ClickHouse can add useful capabilities to MySQL, it is important to see that MySQL adds useful capabilities to ClickHouse. ClickHouse is outstanding for analytic processing but there are a number of things it does not do well. Here are some examples.  

  • Transaction processing – ClickHouse does not have full ACID transactions. You would not want to use ClickHouse to process online orders. MySQL does this very well. 
  • Rapid updates on single rows – Selecting all columns of a single row is inefficient in ClickHouse, as you must read many files. Updating a single row may require rewriting large amounts of data. You would not want to put eCommerce session data in ClickHouse. It is a standard use case for MySQL. 
  • Large numbers of concurrent queries – ClickHouse queries are designed to use as many resources as possible, not share them across many users. You would not want to use ClickHouse to hold metadata for microservices, but MySQL is commonly used for such purposes. 

In fact, MySQL and ClickHouse are highly complementary. Users get the most powerful applications when ClickHouse and MySQL are used together. 

Introducing ClickHouse to MySQL integration

There are three main ways to integrate MySQL data with ClickHouse analytic capabilities. They build on each other. 

  • View MySQL data from ClickHouse. MySQL data is queryable from ClickHouse using native ClickHouse SQL syntax. This is useful for exploring data as well as joining on data where MySQL is the system of record.
  • Move data permanently from MySQL to ClickHouse. ClickHouse becomes the system of record for that data. This unloads MySQL and gives better results for analytics. 
  • Mirror MySQL data in ClickHouse. Make a snapshot of data into ClickHouse and keep it updated using replication. This allows users to ask complex questions about transaction data without burdening transaction processing. 

Viewing MySQL data from ClickHouse

ClickHouse can run queries on MySQL data using the MySQL database engine, which makes MySQL data appear as local tables in ClickHouse. Enabling it is as simple as executing a single SQL command like the following on ClickHouse:

CREATE DATABASE sakila_from_mysql
ENGINE=MySQL('mydb:3306', 'sakila', 'user', 'password')

Here is a simple illustration of the MySQL database engine in action. 

The MySQL database engine makes it easy to explore MySQL tables and make copies of them in ClickHouse. ClickHouse queries on remote data may even run faster than in MySQL! This is because ClickHouse can sometimes parallelize queries even on remote data. It also offers more efficient aggregation once it has the data in hand. 

Moving MySQL data to ClickHouse

Migrating large tables with immutable records permanently to ClickHouse can give vastly accelerated analytic query performance while simultaneously unloading MySQL. The following diagram illustrates how to migrate a table containing web access logs from MySQL to ClickHouse.

On the ClickHouse side, you’ll normally use the MergeTree table engine or one of its variants, such as ReplicatedMergeTree. MergeTree is the go-to engine for big data on ClickHouse. Here are three important features that will help you get the most out of it.

  1. Partitioning – MergeTree divides tables into parts using a partition key. Access logs and other big data tend to be time ordered, so it’s common to divide data by day, week, or month. For best performance, it’s advisable to pick a partition key that results in 1,000 parts or fewer.
  2. Ordering – MergeTree sorts rows and constructs an index on rows to match the ordering you choose. It’s important to pick a sort order that gives you large “runs” when scanning data. For instance, you could sort by tenant followed by time. That would mean a query on a tenant’s data would not need to jump around to find rows related to that tenant. 
  3. Compression and codecs – ClickHouse uses LZ4 compression by default but also offers ZSTD compression as well as codecs. Codecs reduce column data before turning it over to compression. 

These features can make an enormous difference in performance. We cover them and add more performance tips in Altinity videos (look here and here.) as well as blog articles.
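As a sketch of how the three features fit together, here is a hypothetical access-log table; the table name, columns, and codec choices are illustrative, not prescriptive:

```sql
CREATE TABLE access_log (
    ts        DateTime CODEC(Delta, LZ4),  -- delta encoding suits mostly-monotonic timestamps
    tenant_id UInt32,
    url       String,
    status    UInt16
)
ENGINE = MergeTree
-- Partitioning: one set of parts per month keeps the total part count manageable
PARTITION BY toYYYYMM(ts)
-- Ordering: tenant first, then time, so per-tenant scans read long contiguous runs
ORDER BY (tenant_id, ts)
```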

The ClickHouse MySQL database engine can also be very useful in this scenario. It enables ClickHouse to “see” and select data from remote transaction tables in MySQL. Your ClickHouse queries can join local tables with transaction data whose natural home is MySQL. Meanwhile, MySQL handles transactional changes efficiently and safely. 

Migrating tables to ClickHouse generally proceeds as follows. We’ll use the example of the access log shown above. 

  1. Create a matching schema for the access log data on ClickHouse. 
  2. Dump/load data from MySQL to ClickHouse using any of the following tools:
    1. Mydumper – A parallel dump/load utility that handles mysqldump and CSV formats. 
    2. MySQL Shell – A general-purpose utility for managing MySQL that can import and export tables. 
    3. Copy data using SELECT on MySQL database engine tables.
    4. Native database commands – Use MySQL SELECT OUTFILE to dump data to CSV and read back in using ClickHouse INSERT SELECT FROM file(). ClickHouse can even read mysqldump format.  
  3. Check performance with suitable queries; make adjustments to the schema and reload if necessary. 
  4. Adapt front end and access log ingest to ClickHouse. 
  5. Run both systems in parallel while testing. 
  6. Cut over from MySQL alone to MySQL plus the ClickHouse extension. 
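For instance, the native-command route from step 2 above might look roughly like the following. The file paths are illustrative: MySQL restricts OUTFILE to its secure_file_priv directory, and ClickHouse's file() function reads from the server's user_files directory.

```sql
-- On MySQL: dump the table to CSV
SELECT * FROM access_log
INTO OUTFILE '/var/lib/mysql-files/access_log.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"';

-- On ClickHouse: load the CSV back in (after copying it to user_files)
INSERT INTO access_log
SELECT * FROM file('access_log.csv', 'CSV');
```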

Migration can take as little as a few days but it’s more common to take weeks to a couple of months in large systems. This helps ensure that everything is properly tested and the roll-out proceeds smoothly.  

Mirroring MySQL Data in ClickHouse

The other way to extend MySQL is to mirror the data in ClickHouse and keep it up to date using replication. Mirroring allows users to run complex analytic queries on transaction data without (a) changing MySQL and its applications or (b) affecting the performance of production systems. 

Here are the working parts of a mirroring setup. 

ClickHouse has a built-in way to handle mirroring: the experimental MaterializedMySQL database engine, which reads binlog records directly from the MySQL primary and propagates data into ClickHouse tables. The approach is simple but is not yet recommended for production use. It may eventually be important for 1-to-1 mirroring cases but needs additional work before it can be widely used. 
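For reference, setting up the experimental engine is essentially a single statement. The connection details below are placeholders, and the enabling setting name may vary between ClickHouse versions:

```sql
SET allow_experimental_database_materialized_mysql = 1;

CREATE DATABASE sakila_mirror
ENGINE = MaterializedMySQL('mysql-host:3306', 'sakila', 'user', 'password')
```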

Altinity has developed a new approach to replication using Debezium, Kafka-compatible event streams, and the Altinity Sink Connector for ClickHouse. The mirroring configuration looks like the following. 

The externalized approach has a number of advantages. They include working with current ClickHouse releases, taking advantage of fast dump/load programs like mydumper or direct SELECT using MySQL database engine, support for mirroring into replicated tables, and simple procedures to add new tables or reset old ones. Finally, it can extend to multiple upstream MySQL systems replicating to a single ClickHouse cluster. 

ClickHouse can mirror data from MySQL thanks to the unique capabilities of the ReplacingMergeTree table engine. It has an efficient method of dealing with inserts, updates, and deletes that is ideally suited for use with replicated data. As mentioned already, ClickHouse cannot update individual rows easily, but it inserts data extremely quickly and has an efficient process for merging rows in the background. ReplacingMergeTree builds on these capabilities to handle changes to data in a “ClickHouse way.”  

Replicated table rows use version and sign columns to represent the version of changed rows as well as whether the change is an insert or delete. The ReplacingMergeTree will only keep the last version of a row, which may in fact be deleted. The sign column lets us apply another ClickHouse trick to make those deleted rows inaccessible. It’s called a row policy. Using row policies we can make any row where the sign column is negative disappear.  
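Here is a minimal sketch of such a replicated table. The table name and the _version/_sign columns are hypothetical; in practice the replication pipeline maintains them:

```sql
CREATE TABLE orders_mirror (
    id       UInt64,
    amount   Decimal(10, 2),
    _version UInt64,   -- incremented on every change to a row
    _sign    Int8      -- 1 = live row, -1 = deleted row
)
ENGINE = ReplacingMergeTree(_version)
ORDER BY id;

-- Row policy: hide rows whose latest version is a delete
CREATE ROW POLICY hide_deleted ON orders_mirror
FOR SELECT USING _sign > 0 TO ALL;
```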

Here’s an example of ReplacingMergeTree in action that combines the effect of the version and sign columns to handle mutable data. 

Mirroring data into ClickHouse may appear more complex than migration but in fact is relatively straightforward because there is no need to change MySQL schema or applications and the ClickHouse schema generation follows a cookie-cutter pattern. The implementation process consists of the following steps. 

  1. Create schema for replicated tables in ClickHouse. 
  2. Configure and start replication from MySQL to ClickHouse. 
  3. Dump/load data from MySQL to ClickHouse using the same tools as migration. 

At this point, users are free to start running analytics or build additional applications on ClickHouse whilst changes replicate continuously from MySQL. 

Tooling improvements are on the way!

MySQL to ClickHouse migration is an area of active development, both at Altinity and in the ClickHouse community at large. Improvements fall into three general categories.  

Dump/load utilities – Altinity is working on a new utility that reduces schema creation and transfer of data to a single command. We will have more to say on this in a future blog article. 

Replication – Altinity is sponsoring the Sink Connector for ClickHouse, which automates high-speed replication, including monitoring as well as integration into Altinity.Cloud. Our goal is similarly to reduce replication setup to a single command. 

ReplacingMergeTree – Currently users must include the FINAL keyword on table names to force the merging of row changes. It is also necessary to add a row policy to make deleted rows disappear automatically.  There are pull requests in progress to add a MergeTree property to add FINAL automatically in queries as well as make deleted rows disappear without a row policy. Together they will make handling of replicated updates and deletes completely transparent to users. 
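Today that means queries against a mirrored table look something like this (the table name is illustrative):

```sql
-- FINAL forces merging of row versions at query time,
-- so only the latest version of each row is returned
SELECT count(*)
FROM orders_mirror FINAL
```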

We are also watching carefully for improvements on MaterializedMySQL as well as other ways to integrate ClickHouse and MySQL efficiently. You can expect further blog articles in the future on these and related topics. Stay tuned!

Wrapping up and getting started

ClickHouse is a powerful addition to existing MySQL applications. Large tables with immutable data, complex aggregation pipelines, and unanswered questions on MySQL transactions are clear signs that integrating ClickHouse is the next step to provide fast, cost-efficient analytics to users. 

Depending on your application, it may make sense to mirror data onto ClickHouse using replication or even migrate some tables into ClickHouse. ClickHouse already integrates well with MySQL and better tooling is arriving quickly. Needless to say, all Altinity contributions in this area are open source, released under Apache 2.0 license. 

The most important lesson is to think in terms of MySQL and ClickHouse working together, not as one being a replacement for the other. Each database has unique and enduring strengths. The best applications will build on these to provide users with capabilities that are faster and more flexible than using either database alone. 

Percona, well-known experts in open source databases, partners with Altinity to deliver robust analytics for MySQL applications. If you would like to learn more about MySQL integration with ClickHouse, feel free to contact us or leave a message on our forum at any time. 


Improving OLAP Workload Performance for PostgreSQL with ClickHouse Database

Improving OLAP Workload Performance for PostgreSQL

No database management system is optimized for every workload: each is designed for a specific kind of load and therefore performs best on that workload. Similarly, some kinds of queries work better on some database systems and worse on others. This is the era of specialization, where a product is designed for a specific requirement or a specific set of requirements. We cannot achieve everything performance-wise from a single database system. PostgreSQL is one of the finest object-relational databases and performs really well on OLTP workloads, but I have observed that its performance is not as good as some other database systems for OLAP workloads. ClickHouse is one example that outperforms PostgreSQL for OLAP.

ClickHouse is an open source, column-based database management system which claims to be 100–1,000x faster than traditional approaches, capable of processing more than a billion rows in less than a second.

Foreign Data Wrapper (Clickhousedb_fdw)

If we have a different kind of workload that PostgreSQL is not optimized for, what is the solution? Fortunately, PostgreSQL provides a mechanism to handle such workloads: you create a data proxy within PostgreSQL that internally calls and fetches data from a different database system. This is called a Foreign Data Wrapper, and it is based on SQL/MED. Percona provides a foreign data wrapper for ClickHouse, clickhousedb_fdw, which is available at Percona’s GitHub project repository.
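The setup follows the usual SQL/MED pattern. A rough sketch follows, with connection options and the foreign table definition as placeholders (consult the clickhousedb_fdw documentation for the exact option names it supports):

```sql
CREATE EXTENSION clickhousedb_fdw;

CREATE SERVER clickhouse_srv
    FOREIGN DATA WRAPPER clickhousedb_fdw
    OPTIONS (dbname 'test_database');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER clickhouse_srv;

-- A foreign table acts as the in-PostgreSQL proxy for ClickHouse data
CREATE FOREIGN TABLE fontime (
    "Year"     int,
    "DepDelay" int
) SERVER clickhouse_srv;
```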

Benchmark Machine

  • Supermicro server:
    • Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
    • Memory: 256GB of RAM
    • Storage: Samsung  SM863 1.9TB Enterprise SSD
    • Filesystem: ext4/xfs
  • OS: Linux smblade01 4.15.0-42-generic #45~16.04.1-Ubuntu
  • PostgreSQL: version 11

Benchmark Workload

Ontime (On Time Reporting Carrier On-Time Performance) is an openly available dataset I used for this benchmark. It has a table size of 85GB with 109 columns of different types, and its designed queries closely emulate the real world.


Q# Query Contains Aggregates and Group By
Q1 SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;
Q2 SELECT DayOfWeek, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;
Q3 SELECT Origin, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY Origin ORDER BY c DESC LIMIT 10;
Q4 SELECT Carrier, count(*) FROM ontime WHERE DepDelay>10 AND Year = 2007 GROUP BY Carrier ORDER BY count(*) DESC;
Q5 SELECT a.Carrier, c, c2, c*1000/c2 as c3 FROM ( SELECT Carrier, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year=2007 GROUP BY Carrier ) a
INNER JOIN (  SELECT   Carrier,count(*) AS c2 FROM ontime WHERE Year=2007 GROUP BY Carrier)b on a.Carrier=b.Carrier ORDER BY c3 DESC;
Q6 SELECT a.Carrier, c, c2, c*1000/c2 as c3 FROM ( SELECT Carrier,  count(*) AS c FROM ontime  WHERE DepDelay>10 AND Year >= 2000 AND
Year <= 2008 GROUP BY Carrier) a INNER JOIN ( SELECT  Carrier, count(*) AS c2 FROM ontime  WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier ) b on a.Carrier=b.Carrier ORDER BY c3 DESC;
Q7 SELECT Carrier, avg(DepDelay) * 1000 AS c3 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier;
Q8 SELECT Year, avg(DepDelay) FROM ontime GROUP BY Year;
Q9 select Year, count(*) as c1 from ontime group by Year;
Q10 SELECT avg(cnt) FROM (SELECT Year,Month,count(*) AS cnt FROM ontime WHERE DepDel15=1 GROUP BY Year,Month) a;
Q11 select avg(c1) from (select Year,Month,count(*) as c1 from ontime group by Year,Month) a;
Q12 SELECT OriginCityName, DestCityName, count(*) AS c FROM ontime GROUP BY OriginCityName, DestCityName ORDER BY c DESC LIMIT 10;
Q13 SELECT OriginCityName, count(*) AS c FROM ontime GROUP BY OriginCityName ORDER BY c DESC LIMIT 10;
Query Contains Joins
Q14 SELECT a.Year, c1/c2 FROM ( select Year, count(*)*1000 as c1 from ontime WHERE DepDelay>10 GROUP BY Year) a
INNER JOIN (select Year, count(*) as c2 from ontime GROUP BY Year ) b on a.Year=b.Year ORDER BY a.Year;
Q15 SELECT a."Year", c1/c2 FROM ( select "Year", count(*)*1000 as c1 FROM fontime WHERE "DepDelay">10 GROUP BY "Year") a INNER JOIN (select "Year", count(*) as c2
FROM fontime GROUP BY "Year" ) b on a."Year"=b."Year";

Table-1: Queries used in the benchmark


Benchmark Result

Improving OLAP Workload Performance for PostgreSQL


This graph shows the time taken by each query in PostgreSQL, ClickHouse, and clickhousedb_fdw. You can clearly see that ClickHouse performs really well on its own while PostgreSQL doesn’t. The benchmark was done on a single node with multiple CPU cores and can scale with multiple nodes. Even so, the result is clear enough: ClickHouse is almost 1,000x faster in most of the queries. If you would like to learn more, a previous blog post goes over some of the reasons why certain queries do not perform well through clickhousedb_fdw.


The graph above proves my point that PostgreSQL is not fit for every type of workload and query. However, the beauty of PostgreSQL is its ability to use its foreign data wrappers to create a heterogeneous environment with other database systems, which can then optimally handle specific workloads.

The clickhousedb_fdw clearly gives users the power to use ClickHouse for much-improved OLAP performance while still interacting only with PostgreSQL. This simplifies client applications, which do not need to interact with multiple database systems.


MariaDB Track at Percona Live

MariaDB track at Percona Live 2019

Less than one month is left until Percona Live. This time the Committee work was a bit unusual. Instead of having one big committee for the whole conference, we had a few mini-committees, each responsible for a track. Each independent mini-committee, in turn, had a leader who was responsible for the whole process. I led the MariaDB track. In this post, I want to explain how we worked, which topics we chose, and why.

For MariaDB, we had seven slots – five for 50-minute talks and two for 25-minute talks – and 19 submissions. We had to reject two out of every three proposals. We also had to decide how many topics the program should cover. My aim here was to use the MariaDB track to demonstrate as many unique MariaDB features as possible. I also wanted to have as many speakers as possible, considering the number of slots we had available.

The committee agreed, and we tried our best for the program to cover various topics. If someone sent us two or more proposals, we chose only one to allow more speakers to attend.

We also looked to identify gaps in the submitted sessions. For example, if we wanted a topic to be covered and no one sent a proposal on such a subject, we invited potential speakers and asked them to submit with that topic in mind. Or we asked those who had already submitted similar talks to improve them.

In the end, we had five 50-minute sessions, one MariaDB session in the MySQL track, two 25-minute sessions, one tutorial, and one keynote. All of them are by different speakers.

The Program

The first MariaDB event will be a tutorial: “Expert MariaDB: Harness the Strengths of MariaDB Server” by Colin Charles on Tuesday, May 28.

Colin started his MySQL career as a Community Engineer back in the MySQL AB times. He worked on numerous MySQL events, both big and small, including Percona Live’s predecessor, O’Reilly’s MySQL Conference and Expo. Colin then joined Monty Program Ab and MariaDB Corporation as Chief Evangelist, and later spent two years as Chief Evangelist at Percona. Now he is an independent consultant at his own company, GrokOpen.

Colin will not only talk about unique MariaDB features up to version 10.4, but will also help you try all of them out. This tutorial is a must-attend for everyone interested in MariaDB.

Next day: Wednesday, May 29 – the first conference day – will be the MariaDB Track day.

MariaDB talks will start with the keynote by Vicentiu Ciorbaru about new MariaDB features in version 10.4. He will highlight all the significant additions in this version.

Vicentiu started his career at the MariaDB Foundation as a very talented Google Summer of Code student. His first project was Roles. Then he worked a lot on the MariaDB optimizer, bug fixes, and code maintenance. At the same time, he discovered a talent for public speaking, and now he is the face of the MariaDB Foundation.

We at the committee had a hard choice: either to accept his 50-minute session proposal or to ask him to give a keynote. This decision was not easy, because a keynote is shorter than 50 minutes. At the same time, though, everyone at the conference will be able to see it. The brand new features of version 10.4 are a very important topic. Therefore, we decided that it would be best to have Vicentiu as a keynote speaker.

Morning sessions

Sessions will start with a talk by Alexander Rubin, “Opensource Column Store Databases: MariaDB ColumnStore vs. ClickHouse”. Alex began his MySQL career as a web developer, then joined MySQL AB as a consultant. He then moved to Percona as Principal Architect. It was our loss when he left Percona to start applying his recommendations himself on behalf of a medical startup, VirtualHealth! During his career as a MySQL consultant, he tried all the sexiest database products, loaded terabytes of data into them, and ran deadly intensive loads. He is the one who knows best about database strengths and weaknesses. I would recommend his session to everyone who is considering a column store solution.

The next talk is “Galera Cluster New Features” by Seppo Jaakola. This session is about the long-awaited Galera 4 library. Seppo is one of the three founders of Codership Oy, the company which brought us the Galera library. Before 2007, when the Galera library was first released, MySQL users had to choose between asynchronous replication and asynchronous replication (that’s not a typo). Seppo brought us a solution which allowed us to continue using InnoDB in the style we were used to while writing to all nodes. The Galera library looks after data consistency. After more than ten years, the product is mature and leaves its competitors far behind. The new version brings us streaming replication technology and other improvements which relax usage limitations and make Galera Cluster more stable. I recommend this session for everyone who looks forward to a synchronous replication future.

Afternoon sessions

After the lunch break, we will meet MariaDB users Sandeep Jangra and Andre Van Looveren, who will show how they use MariaDB at Walmart in their talk “Lessons Learned Building a Fully Automated Database Platform as a Service Using Open Source Technologies in the Cloud”. Sandeep and Andre manage more than 6,000 MariaDB installations. In addition to setting up automation, they have experience with migration and upgrade. This talk will be an excellent case study, which I recommend to everyone who is considering implementing automation for a farm of MariaDB or MySQL servers.

The next topic is “MariaDB Security Features and Best Practices” by Robert Bindar. Robert is a Server Developer at the MariaDB Foundation. He will cover best security practices for MariaDB deployment, including the latest security features added in version 10.4.

At 4:15 pm we will have two MariaDB topics in parallel.

“MariaDB and MySQL – What Statistics Optimizer Needs Or When and How Not to Use Indexes” by Sergei Golubchik – a member of the MariaDB Foundation Board – covers optimization techniques which are often ignored in favor of indexes. Sergei worked on MySQL, and then on MariaDB, from their very first days. I have known him since 2006, when I joined the MySQL team. Whenever I struggle to figure out how a particular piece of code works, just a couple of words from Sergei help to solve the issue! He has an encyclopedic knowledge of both the MariaDB and MySQL databases. In this session, Sergei will explain which statistics the optimizer can use in addition to indexes. While he will focus on specific MariaDB features, he will cover MySQL too. Spoiler: these are not only histograms!

Backups in the MySQL track…

In the parallel MySQL track, Iwo Panowicz and Juan Pablo Arruti will speak about backups in “Percona XtraBackup vs. Mariabackup vs. MySQL Enterprise Backup”. Iwo and Juan Pablo are Support Engineers at Percona. Iwo joined Percona two years ago, and now he is one of the most senior engineers in the EMEA team. Linux, PMM, analyzing core files, engineering best practices: Iwo is well equipped to answer all these and many more questions. Juan Pablo works in the American Support team on everything around MariaDB and MySQL: replication, backup, performance issues, data corruption… Through their support work, Iwo and Juan Pablo have had plenty of chances to find out the strengths and weaknesses of different backup solutions.

The three tools they will cover in the talk can all be used to make a physical backup of MySQL and MariaDB databases, which is the fastest and most recommended way to back up an actively used server. But what is the difference? When and why should you prefer one instrument over another? Iwo and Juan Pablo will answer these questions.

At the end of the day, we will have two 25-minute sessions.

Jim Tommaney will present “Tips and Tricks with MariaDB ColumnStore”. Unlike Alex Rubin, who is an end user of ColumnStore databases, Jim comes from the other side: development. Thus his insights into MariaDB ColumnStore should be fascinating. If you are considering ColumnStore, this topic is a must-attend!

Daniel Black will close the day with his talk “Squash That Old Bug”. This is the topic I personally am looking forward to the most! Not only because I work with bugs all the time, but also because the lists of accepted patches which Daniel posts to the MariaDB and MySQL servers are impressive. Especially when you know how strict the quality control for external patches is in MariaDB and MySQL! In his talk, Daniel is going to help you start contributing yourself, and to do it successfully, so that your patches are accepted. This session is very important for anyone who has asked themselves why one or another MariaDB or MySQL bug has not been fixed for a long time. I do not know a single user who has not asked that question!

Conclusion

This blog post about the MariaDB track at Percona Live covers eight sessions, one keynote, one tutorial, 12 speakers, and seven mini-committee members – two of whom are also speakers. We worked hard, and continue to work hard, to bring you a great MariaDB program.

I cannot wait for the show to begin!

Photo by shannon VanDenHeuvel on Unsplash


ClickHouse Performance: UInt32 vs UInt64 vs Float32 vs Float64


While implementing ClickHouse for query execution statistics storage in Percona Monitoring and Management (PMM), we were faced with the question of choosing the data type for the metrics we store. It came down to this: what is the difference in performance and space usage between UInt32, UInt64, Float32, and Float64 column types?

To test this, I created a test table with an abbreviated and simplified version of the main table in our ClickHouse Schema.

The “number of queries” is stored four times in four different columns to be able to benchmark queries referencing different columns.  We can do this with ClickHouse because it is a column store and it works only with columns referenced by the query. This method would not be appropriate for testing on MySQL, for example.

CREATE TABLE test
(
    digest String,
    db_server String,
    db_schema String,
    db_username String,
    client_host String,
    period_start DateTime,
    nq_UInt32 UInt32,
    nq_UInt64 UInt64,
    nq_Float32 Float32,
    nq_Float64 Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(period_start)
ORDER BY (digest, db_server, db_username, db_schema, client_host, period_start)
SETTINGS index_granularity = 8192

When testing ClickHouse performance you need to consider compression. Highly compressible data (for example just a bunch of zeroes) will compress very well and may be processed a lot faster than incompressible data. To take this into account we will do a test with three different data sets:

  • Very Compressible when “number of queries” is mostly 1
  • Somewhat Compressible when we use a range from 1 to 1000 and
  • Poorly Compressible when we use range from 1 to 1000000.

Since it’s unlikely that an application will use the full 32 bit range, we haven’t used it for this test.
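The effect of these value ranges on compression can be sketched outside ClickHouse. The snippet below packs each of the three data sets as 32-bit integers and compresses them with zlib; zlib is a stand-in for ClickHouse's default LZ4 here (absolute ratios differ, but the relative behavior across the data sets is similar):

```python
import random
import struct
import zlib

random.seed(42)
N = 100_000

def compression_ratio(values, fmt):
    """Pack values with the given struct format and return raw/compressed size ratio."""
    raw = b"".join(struct.pack(fmt, v) for v in values)
    return len(raw) / len(zlib.compress(raw))

datasets = {
    "very":     [1] * N,                                   # mostly-constant data
    "somewhat": [random.randint(1, 1000) for _ in range(N)],      # ~10 bits of entropy
    "poorly":   [random.randint(1, 1_000_000) for _ in range(N)], # ~20 bits of entropy
}

for name, data in datasets.items():
    print("%-8s UInt32 compression ratio: %.1fx" % (name, compression_ratio(data, "<I")))
```

As expected, the mostly-constant data set compresses to almost nothing, while the wide-range data set barely compresses at all.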

Another factor which can impact ClickHouse performance is the number of “parts” the table has. After loading the data we ran OPTIMIZE TABLE FINAL to ensure only one part is there on the disk. Note: ClickHouse will gradually delete old files after the optimize command has completed. To avoid these operations interfering with benchmarks, I waited for about 15 minutes to ensure all unused data was removed from the disk.

The amount of memory on the system was enough to cache whole columns in all tests, so this is an in-memory test.

Here is how the table with only one part looks on disk:

root@d01e692c291f:/var/lib/clickhouse/data/pmm/test_lc# ls -la
total 28
drwxr-xr-x 4 clickhouse clickhouse 12288 Feb 10 20:39 .
drwxr-xr-x 8 clickhouse clickhouse 4096 Feb 10 22:38 ..
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 10 20:30 201902_1_372_4
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 10 19:38 detached
-rw-r--r-- 1 clickhouse clickhouse 1 Feb 10 19:38 format_version.txt

When you have only one part it makes it very easy to see the space different columns take:

root@d01e692c291f:/var/lib/clickhouse/data/pmm/test_lc/201902_1_372_4# ls -la
total 7950468
drwxr-xr-x 2 clickhouse clickhouse 4096 Feb 10 20:30 .
drwxr-xr-x 4 clickhouse clickhouse 12288 Feb 10 20:39 ..
-rw-r--r-- 1 clickhouse clickhouse 971 Feb 10 20:30 checksums.txt
-rw-r--r-- 1 clickhouse clickhouse 663703499 Feb 10 20:30 client_host.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 client_host.mrk
-rw-r--r-- 1 clickhouse clickhouse 238 Feb 10 20:30 columns.txt
-rw-r--r-- 1 clickhouse clickhouse 9 Feb 10 20:30 count.txt
-rw-r--r-- 1 clickhouse clickhouse 228415690 Feb 10 20:30 db_schema.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 db_schema.mrk
-rw-r--r-- 1 clickhouse clickhouse 6985801 Feb 10 20:30 db_server.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 db_server.mrk
-rw-r--r-- 1 clickhouse clickhouse 19020651 Feb 10 20:30 db_username.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 db_username.mrk
-rw-r--r-- 1 clickhouse clickhouse 28227119 Feb 10 20:30 digest.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 digest.mrk
-rw-r--r-- 1 clickhouse clickhouse 8 Feb 10 20:30 minmax_period_start.idx
-rw-r--r-- 1 clickhouse clickhouse 1552547644 Feb 10 20:30 nq_Float32.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 nq_Float32.mrk
-rw-r--r-- 1 clickhouse clickhouse 1893758221 Feb 10 20:30 nq_Float64.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 nq_Float64.mrk
-rw-r--r-- 1 clickhouse clickhouse 1552524811 Feb 10 20:30 nq_UInt32.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 nq_UInt32.mrk
-rw-r--r-- 1 clickhouse clickhouse 1784991726 Feb 10 20:30 nq_UInt64.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 nq_UInt64.mrk
-rw-r--r-- 1 clickhouse clickhouse 4 Feb 10 20:30 partition.dat
-rw-r--r-- 1 clickhouse clickhouse 400961033 Feb 10 20:30 period_start.bin
-rw-r--r-- 1 clickhouse clickhouse 754848 Feb 10 20:30 period_start.mrk
-rw-r--r-- 1 clickhouse clickhouse 2486243 Feb 10 20:30 primary.idx

We can see there are two files for every column (plus some extras), and so, for example, the Float32 based “number of queries” metric store takes around 1.5GB.

You can also use the SQL queries to get this data from the ClickHouse system tables instead:

SELECT *
FROM system.columns
WHERE (database = 'pmm') AND (table = 'test') AND (name = 'nq_UInt32')
Row 1:
database: pmm
table: test
name: nq_UInt32
type: UInt32
data_compressed_bytes: 7250570
data_uncompressed_bytes: 1545913232
marks_bytes: 754848
is_in_partition_key: 0
is_in_sorting_key: 0
is_in_primary_key: 0
is_in_sampling_key: 0
1 rows in set. Elapsed: 0.002 sec.
SELECT *
FROM system.parts
WHERE (database = 'pmm') AND (table = 'test')
Row 1:
partition: 201902
name: 201902_1_372_4
active: 1
marks: 47178
rows: 386478308
bytes_on_disk: 1401028031
data_compressed_bytes: 1390993287
data_uncompressed_bytes: 29642900064
marks_bytes: 7548480
modification_time: 2019-02-10 23:26:20
remove_time: 0000-00-00 00:00:00
refcount: 1
min_date: 0000-00-00
max_date: 0000-00-00
min_time: 2019-02-08 14:50:32
max_time: 2019-02-08 15:58:30
partition_id: 201902
min_block_number: 1
max_block_number: 372
level: 4
data_version: 1
primary_key_bytes_in_memory: 4373363
primary_key_bytes_in_memory_allocated: 6291456
database: pmm
table: test
engine: MergeTree
path: /var/lib/clickhouse/data/pmm/test/201902_1_372_4/
1 rows in set. Elapsed: 0.003 sec.

Now let’s look at the queries

We tested with two queries.  One of them – we’ll call it Q1 – is a very trivial query, simply taking the sum across all column values. This query needs only to access one column to return results so it is likely to be the most impacted by a change of data type:

SELECT sum(nq_UInt32)
FROM test

The second query – which we’ll call Q2 – is a typical ranking query which computes the number of queries per period and then shows periods with the highest amount of queries in them:

SELECT
    period_start,
    sum(nq_UInt32) AS cnt
FROM test
GROUP BY period_start
ORDER BY cnt DESC
LIMIT 10

This query needs to access two columns and do more complicated processing so we expect it to be less impacted by the change of data type.

Before we get to results I think it is worth drawing attention to the raw performance we’re getting.  I did these tests on DigitalOcean Droplet with just six virtual CPU cores, yet still I see numbers like these:

SELECT sum(nq_UInt32)
FROM test
┌─sum(nq_UInt32)─┐
│      386638984 │
└────────────────┘
1 rows in set. Elapsed: 0.205 sec. Processed 386.48 million rows, 1.55 GB (1.88 billion rows/s., 7.52 GB/s.)

Processing more than 300M rows/sec per core and more than 1GB/sec per core is very cool!

Query Performance

Results between different compression levels show similar differences between column types, so let’s focus on those with the least compression:

Q1 least compression

Q2 least compression

As you can see, the width of the data type (32 bit vs 64 bit) matters a lot more than the type (float vs integer). In some cases float may even perform faster than integer. This was the most unexpected result for me.

Another metric ClickHouse reports is the processing speed in GB/sec. We see a different picture here:

Q1 GB per second

64 bit data types have a higher processing speed than their 32 bit counterparts, but queries run slower as there is more raw data to process.


Let’s now take a closer look at compression. For this test we use default LZ4 compression. ClickHouse has powerful support for per-column compression codecs, but testing them is outside the scope of this post.

So let’s look at size on disk for UInt32 Column:

On disk data size for UINT32

What you can see from these results is that when data is very compressible ClickHouse can compress it to almost nothing.  The compression ratio for our very compressible data set is about 200x (or 99.5% size reduction if you prefer this metric).

The compression ratio for the somewhat compressible data is 1.4x. That’s not bad, but considering we are only storing the 1–1000 range in this column – which requires 10 bits out of 32 – I would hope for better compression. I guess LZ4 does not compress such data very well.

Now let’s look at compression for a 64 bit integer column:

On disk data size for UINT64

We can see that while the size almost doubled for very compressible data, increases for our somewhat compressible data and poorly compressible data are not that large.  Somewhat compressible data now compresses 2.5x.

Now let’s take a look at Performance depending on data compressibility:

Q1 time for UINT32

Poorly compressible data which takes a larger space on disk is processed faster than somewhat compressible data? This did not make sense. I repeated the run a few times to make sure that the results were correct. When I looked at the compression ratio, though, it suddenly made sense to me.

Poorly compressible data for the UInt32 data type was not compressible by LZ4, so it seems the original data was stored, significantly speeding up the “decompression” process. With somewhat compressible data, compression worked, and so real decompression needed to take place too. This makes things slower.

This is why we can only observe these results with UInt32 and Float32 data types.  UInt64 and Float64 show the more expected results:

Q1 time for UINT64


Here are my conclusions:

  • Even with “slower” data types, ClickHouse is very fast
  • Data type choice matters – but less than I expected
  • Width (32bit vs 64bit) impacts performance more than integer vs float data types
  • Storing a small range of values in a wider column type is likely to yield better compression, though with default compression it is not as good as theoretically possible
  • Compression is interesting. We get the best performance when data can be well compressed. Second best is when we do not have to spend a lot of time decompressing it, as long as it fits in memory.

Should You Use ClickHouse as a Main Operational Database?

First of all, this post is not a recommendation but more like a “what if” story. What if we use ClickHouse (which is a columnar analytical database) as our main datastore? Well, typically, an analytical database is not a replacement for a transactional or key/value datastore. However, ClickHouse is super efficient for time series and provides “sharding” out of the box (scalability beyond one node). So can we use it as our main datastore?

Let’s imagine we are running a webservice and provide a public API. Public API-as-a-service has become a good business model: examples include social networks like Facebook/Twitter, messaging as a service like Twilio, and even credit card authorization platforms like Marqeta. Let’s also imagine we need to store all messages (SMS messages, email messages, etc.) we are sending and allow our customers to get various information about the message. This information can be a mix of analytical (OLAP) queries (i.e. how many messages were sent for some time period and how much it cost) and typical key/value queries like: “return 1 message by the message id”.

Using a columnar analytical database can be a big challenge here. Although such databases can be very efficient with counts and averages, some queries will be slow or simply nonexistent. Analytical databases are optimized for a small number of heavy queries, not a stream of fast point lookups. The most important limitations of analytical databases are:

  1. Deletes and updates are non-existent or slow
  2. Inserts are efficient for bulk inserts only
  3. No secondary indexes means that point selects (select by ID) tend to be very slow

This is all true for ClickHouse, however, we may be able to live with it for our task.

To simulate text messages I have used ~3 billion Reddit comments (10 years, from 2007 to 2017), downloaded from . Vadim published a blog post about analyzing Reddit comments with ClickHouse. In my case, I’m using this data as a simulation of text messages, and will show how we can use ClickHouse as a backend for an API.

Loading the JSON data to ClickHouse

I used the following table in ClickHouse to load all the data:

CREATE TABLE reddit.rc(
body String,
score_hidden Nullable(UInt8),
archived Nullable(UInt8),
name String,
author String,
author_flair_text Nullable(String),
downs Nullable(Int32),
created_utc UInt32,
subreddit_id String,
link_id Nullable(String),
parent_id Nullable(String),
score Nullable(Int16),
retrieved_on Nullable(UInt32),
controversiality Nullable(Int8),
gilded Nullable(Int8),
id String,
subreddit String,
ups Nullable(Int16),
distinguished Nullable(String),
author_flair_css_class Nullable(String),
stickied Nullable(UInt8),
edited Nullable(UInt8)
) ENGINE = MergeTree() PARTITION BY toYYYYMM(toDate(created_utc)) ORDER BY created_utc ;

Then I used the following command to load the JSON data to ClickHouse:

$ bzip2 -d -c RC_20*.bz2 | clickhouse-client --input_format_skip_unknown_fields 1 --input_format_allow_errors_num 1000000 -d reddit -n --query="INSERT INTO rc FORMAT JSONEachRow"

The data on disk in ClickHouse is not significantly larger than compressed files, which is great:

#  du -sh /data/clickhouse/data/reddit/rc/
638G    /data/clickhouse/data/reddit/rc/
# du -sh /data/reddit/
404G    /data/reddit/

We have ~4 billion rows:

│ 2006-01-01 │ 2018-05-31 │ 4148248585 │
1 rows in set. Elapsed: 11.554 sec. Processed 4.15 billion rows, 16.59 GB (359.02 million rows/s., 1.44 GB/s.)

The data is partitioned and sorted by created_utc, so queries which filter on created_utc can use partition pruning and skip the partitions that are not needed. However, let’s say our API needs to support the following features, which are not common for analytical databases:

  1. Selecting a single comment/message by ID
  2. Retrieving the last 10 or 100 of the messages/comments
  3. Updating a single message in the past (e.g. in the case of messages, we may need to update the final price; in the case of comments, we may need to upvote or downvote a comment)
  4. Deleting messages
  5. Text search

With the latest ClickHouse version, all of these features are available, but some of them may not perform fast enough.

Retrieving a single row in ClickHouse

Again, this is not a typical operation in any analytical database; those databases are simply not optimized for it. ClickHouse does not have secondary indexes, and we are using created_utc as the primary key (sort key). So selecting a message by ID alone will require a full table scan:

SELECT id, created_utc
FROM rc
WHERE id = 'dbumnpz'

│ dbumnpz │ 1483228800 │
1 rows in set. Elapsed: 18.070 sec. Processed 4.15 billion rows, 66.37 GB (229.57 million rows/s., 3.67 GB/s.)

Only if we know the timestamp (created_utc) will it be lightning fast, as ClickHouse can then use the primary key:

SELECT id, created_utc
FROM rc
WHERE (id = 'dbumnpz') AND (created_utc = 1483228800)

1 rows in set. Elapsed: 0.010 sec. Processed 8.19 thousand rows, 131.32 KB (840.27 thousand rows/s., 13.47 MB/s.)

Actually, we can simulate an additional index by creating a materialized view in ClickHouse:

create materialized view rc_id_v
ENGINE MergeTree() PARTITION BY toYYYYMM(toDate(created_utc)) ORDER BY (id)
POPULATE AS SELECT id, created_utc from rc;

Here I’m creating a materialized view and populating it initially from the main (rc) table. The view will be updated automatically when there are any inserts into table reddit.rc. The view is actually another MergeTree table sorted by id. Now we can use this query:

SELECT id, created_utc
FROM rc
WHERE (id = 'dbumnpz') AND (created_utc = (
    SELECT created_utc
    FROM rc_id_v
    WHERE id = 'dbumnpz'
))

1 rows in set. Elapsed: 0.053 sec. Processed 8.19 thousand rows, 131.32 KB (153.41 thousand rows/s., 2.46 MB/s.)

This is a single query which uses a subquery against our materialized view to pass the created_utc (timestamp) to the original table. It is a little bit slower, but still under 100 ms response time.

Using this trick (materialized views) we can potentially simulate other indexes.
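As a sketch of how application code could use this trick, here is a small Python helper that builds the two-step lookup. The helper name and the plain string interpolation are illustrative only; real code should escape or parameterize the id before sending the query to ClickHouse.

```python
def index_lookup_sql(comment_id):
    """Build the queries for a point lookup via the rc_id_v
    materialized view: resolve created_utc by id first, then hit
    the primary key of the main table."""
    # NOTE: illustrative only -- escape/parameterize ids in real code.
    lookup = f"SELECT created_utc FROM rc_id_v WHERE id = '{comment_id}'"
    point = (
        "SELECT * FROM rc "
        f"WHERE id = '{comment_id}' AND created_utc = ({lookup})"
    )
    return lookup, point

lookup, point = index_lookup_sql("dbumnpz")
print(point)
```

The `point` query embeds the view lookup as a subquery, exactly as in the example above, so a single round trip to the server is enough.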

Retrieving the last 10 messages

This is where ClickHouse is not very efficient. Let’s say we want to retrieve the last 10 comments:

SELECT id, created_utc
FROM rc
ORDER BY created_utc DESC
LIMIT 10

│ dzwso7l │ 1527811199 │
│ dzwso7j │ 1527811199 │
│ dzwso7k │ 1527811199 │
│ dzwso7m │ 1527811199 │
│ dzwso7h │ 1527811199 │
│ dzwso7n │ 1527811199 │
│ dzwso7o │ 1527811199 │
│ dzwso7p │ 1527811199 │
│ dzwso7i │ 1527811199 │
│ dzwso7g │ 1527811199 │
10 rows in set. Elapsed: 24.281 sec. Processed 4.15 billion rows, 82.96 GB (170.84 million rows/s., 3.42 GB/s.)

In a conventional relational database (like MySQL) this can be done by reading a btree index sequentially from the end, as the index is sorted (like the `tail` command on Linux). In a partitioned massively parallel database system, the storage format and sorting algorithm may not be optimized for that operation, as we are reading multiple partitions in parallel. Currently, an issue has been opened to make this “tailing” based on the primary key much faster: slow order by primary key with small limit on big data. As a temporary workaround we can do something like this:

SELECT count()
FROM rc
WHERE (created_utc > ((
    SELECT max(created_utc)
    FROM rc
) - ((60 * 60) * 24))) AND (subreddit = 'programming')

│ 1248 │
1 rows in set. Elapsed: 4.510 sec. Processed 3.05 million rows, 56.83 MB (675.38 thousand rows/s., 12.60 MB/s.)

It is still a nearly five-second query. Hopefully, this type of query will become faster in ClickHouse.

Updating / deleting data in ClickHouse

The latest ClickHouse version allows running update/delete in the form of “ALTER TABLE .. UPDATE / DELETE” (these are called “mutations” in ClickHouse terms). For example, we may want to upvote a specific comment.

SELECT score
FROM rc_2017
WHERE (id = 'dbumnpz') AND (created_utc = (
    SELECT created_utc
    FROM rc_id_v
    WHERE id = 'dbumnpz'
))

│     2 │
1 rows in set. Elapsed: 0.048 sec. Processed 8.19 thousand rows, 131.08 KB (168.93 thousand rows/s., 2.70 MB/s.)
:) alter table rc_2017 update score = score + 1 where id = 'dbumnpz' and created_utc = (select created_utc from rc_id_v where id = 'dbumnpz');

ALTER TABLE rc_2017
    UPDATE score = score + 1 WHERE (id = 'dbumnpz') AND (created_utc = (
        SELECT created_utc
        FROM rc_id_v
        WHERE id = 'dbumnpz'
    ))

0 rows in set. Elapsed: 0.052 sec.

“Mutation” queries will return immediately and will be executed asynchronously. We can see the progress by reading from the system.mutations table:

:) select * from system.mutations\G

SELECT *
FROM system.mutations
Row 1:
database:                   reddit
table:                      rc_2017
mutation_id:                mutation_857.txt
command:                    UPDATE score = score + 1 WHERE (id = 'dbumnpz') AND (created_utc = (SELECT created_utc FROM reddit.rc_id_v  WHERE id = 'dbumnpz'))
create_time:                2018-12-27 22:22:05
block_numbers.partition_id: ['']
block_numbers.number:       [857]
parts_to_do:                0
is_done:                    1
1 rows in set. Elapsed: 0.002 sec.

Now we can try deleting comments that have been marked for deletion (body showing “[deleted]”):

ALTER TABLE rc_2017
    DELETE WHERE body = '[deleted]'

0 rows in set. Elapsed: 0.002 sec.
:) select * from system.mutations\G

SELECT *
FROM system.mutations
Row 2:
database:                   reddit
table:                      rc_2017
mutation_id:                mutation_858.txt
command:                    DELETE WHERE body = '[deleted]'
create_time:                2018-12-27 22:41:01
block_numbers.partition_id: ['']
block_numbers.number:       [858]
parts_to_do:                64
is_done:                    0
2 rows in set. Elapsed: 0.017 sec.

After a while, we can check the mutation status again:

:) select * from system.mutations\G

SELECT *
FROM system.mutations
Row 2:
database:                   reddit
table:                      rc_2017
mutation_id:                mutation_858.txt
command:                    DELETE WHERE body = '[deleted]'
create_time:                2018-12-27 22:41:01
block_numbers.partition_id: ['']
block_numbers.number:       [858]
parts_to_do:                0
is_done:                    1

As we can see our “mutation” is done.
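In client code this status check is easy to automate. A minimal Python sketch, assuming the rows of system.mutations have already been fetched into dicts (the helper name is mine):

```python
def mutations_done(rows):
    """True when every mutation row (a dict shaped like a row from
    system.mutations) has finished."""
    return all(int(r["is_done"]) == 1 for r in rows)

# Rows shaped like the system.mutations output above:
rows = [
    {"mutation_id": "mutation_857.txt", "parts_to_do": 0, "is_done": 1},
    {"mutation_id": "mutation_858.txt", "parts_to_do": 64, "is_done": 0},
]
print(mutations_done(rows))  # the DELETE mutation is still running -> False
```

A real script would re-run the `SELECT * FROM system.mutations` query in a loop (with a sleep) until this returns True.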

Text analysis

ClickHouse does not offer full text search, however we can use some text functions. In my previous blog post about ClickHouse I used it to find the most popular wikipedia page of the month. This time I’m trying to find the news keywords of the year using all reddit comments: basically I’m calculating the most frequently used new words for the specific year (algorithm based on an article about finding trending topics using Google Books n-grams data). To do that I’m using the ClickHouse function alphaTokens(body) which will split the “body” field into words. From there, I can count the words or use arrayJoin to create a list (similar to MySQL’s group_concat function). Here is the example:

First I created a table word_by_year_news:

create table word_by_year_news ENGINE MergeTree() PARTITION BY y ORDER BY (y) as
select a.w as w, b.y as y, sum(a.occurrences) / any(b.total) as ratio
from
(
 select
  lower(arrayJoin(alphaTokens(body))) as w,
  toYear(toDate(created_utc)) as y,
  count() as occurrences
 from rc
 where body <> '[deleted]'
 and created_utc < toUnixTimestamp('2018-01-01 00:00:00')
 and created_utc >= toUnixTimestamp('2007-01-01 00:00:00')
 and subreddit in ('news', 'politics', 'worldnews')
 group by w, y
 having length(w) > 4
) as a
any inner join
(
 select
  toYear(toDate(created_utc)) as y,
  sum(length(alphaTokens(body))) as total
 from rc
 where body <> '[deleted]'
 and subreddit in ('news', 'politics', 'worldnews')
 and created_utc < toUnixTimestamp('2018-01-01 00:00:00')
 and created_utc >= toUnixTimestamp('2007-01-01 00:00:00')
 group by y
) as b
on a.y = b.y
group by w, y;
0 rows in set. Elapsed: 787.032 sec. Processed 7.35 billion rows, 194.32 GB (9.34 million rows/s., 246.90 MB/s.)

This will store all frequent words (filtered by subreddit; the examples are “news, politics and worldnews” or “programming”) together with their occurrences per year. Actually, I want to store the “relative” occurrence, which is called “ratio” above: for each word I divide its occurrences by the total number of words that year (this is needed as the number of comments grows significantly year over year).
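The same ratio logic can be sketched in pure Python on a toy corpus (the helper names are mine; the SQL version above does the real work at scale):

```python
from collections import Counter, defaultdict

def word_ratios(comments):
    """comments: iterable of (year, text). Returns {year: {word: ratio}},
    where ratio = occurrences of the word / total words that year."""
    counts, totals = defaultdict(Counter), Counter()
    for year, text in comments:
        tokens = text.split()
        # Keep only words longer than 4 characters, as in the SQL HAVING clause.
        counts[year].update(w.lower() for w in tokens if len(w) > 4)
        totals[year] += len(tokens)
    return {y: {w: c / totals[y] for w, c in counts[y].items()} for y in counts}

def words_of_year(ratios, factor=3.0):
    """Words whose relative frequency grew more than `factor` vs the prior year."""
    out = defaultdict(list)
    for y, cur in ratios.items():
        prev = ratios.get(y - 1)
        if not prev:
            continue
        for w, r in cur.items():
            if w in prev and r / prev[w] > factor:
                out[y].append(w)
    return dict(out)

comments = [
    (2015, "kotlin mentioned once among many other plain words here today"),
    (2016, "kotlin kotlin kotlin kotlin released kotlin everywhere kotlin stable"),
]
print(words_of_year(word_ratios(comments)))  # {2016: ['kotlin']}
```

Here “kotlin” makes up 10% of the 2015 words but 75% of the 2016 words, a 7.5x jump, so it is flagged as a word of the year 2016.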

Now we can actually calculate the words of the year:

SELECT
    groupArray(w) AS words,
    y + 1 AS year
FROM
(
    SELECT
        w,
        CAST((y - 1) AS UInt16) AS y,
        ratio AS a_ratio
    FROM word_by_year_news
    WHERE ratio > 0.00001
) AS a
ANY INNER JOIN
(
    SELECT
        w,
        y,
        ratio AS b_ratio
    FROM word_by_year_news
    WHERE ratio > 0.00001
) AS b USING (w, y)
WHERE (y > 0) AND (a_ratio / b_ratio > 3)
GROUP BY y
LIMIT 100;
10 rows in set. Elapsed: 0.232 sec. Processed 14.61 million rows, 118.82 MB (63.01 million rows/s., 512.29 MB/s.)

And the results are (here I’m grouping words for each year):

For “programming” subreddit:

│ 2007 │ ['audio','patents','swing','phones','gmail','opera','devices','phone','adobe','vista','backup','mercurial','mobile','passwords','scala','license','copyright','licenses','photoshop'] │
│ 2008 │ ['webkit','twitter','teacher','android','itunes'] │
│ 2009 │ ['downvotes','upvote','drupal','android','upvoted'] │
│ 2010 │ ['codecs','imgur','floppy','codec','adobe','android'] │
│ 2011 │ ['scala','currency','println'] │
│ 2013 │ ['voting','maven'] │
│ 2014 │ ['compose','xamarin','markdown','scrum','comic'] │
│ 2015 │ ['china','sourceforge','subscription','chinese','kotlin'] │
│ 2016 │ ['systemd','gitlab','autotldr'] │
│ 2017 │ ['offices','electron','vscode','blockchain','flash','collision'] │

For news subreddit:

│ 2008 │ ['michigan','delegates','obama','alaska','georgia','russians','hamas','biden','hussein','barack','elitist','mccain'] │
│ 2009 │ ['stimulus','reform','medicare','franken','healthcare','payer','insurance','downvotes','hospitals','patients','option','health'] │
│ 2010 │ ['blockade','arizona'] │
│ 2011 │ ['protests','occupy','romney','weiner','protesters'] │
│ 2012 │ ['santorum','returns','martin','obamacare','romney'] │
│ 2013 │ ['boston','chemical','surveillance'] │
│ 2014 │ ['plane','poland','radar','subreddits','palestinians','putin','submission','russia','automoderator','compose','rockets','palestinian','hamas','virus','removal','russians','russian'] │
│ 2015 │ ['refugees','refugee','sanders','debates','hillary','removal','participating','removed','greece','clinton'] │
│ 2016 │ ['morons','emails','opponent','establishment','trump','reply','speeches','presidency','clintons','electoral','donald','trumps','downvote','november','subreddit','shill','domain','johnson','classified','bernie','nominee','users','returns','primaries','foundation','voters','autotldr','clinton','email','supporter','election','feedback','clever','leaks','accuse','candidate','upvote','rulesandregs','convention','conduct','uncommon','server','trolls','supporters','hillary'] │
│ 2017 │ ['impeached','downvotes','monitored','accusations','alabama','violation','treason','nazis','index','submit','impeachment','troll','collusion','bannon','neutrality','permanent','insults','violations'] │


ClickHouse is a great massively parallel analytical system. It is extremely efficient and can potentially (with some hacks) be used as a main backend database powering a public API gateway serving both real-time and analytical queries. At the same time, it was not originally designed that way. Let me know in the comments if you are using ClickHouse for this or similar projects.





ClickHouse: Two Years!

historical trend of ClickHouse popularity

Following my post from a year ago, I wanted to review what happened in ClickHouse during this year.
There is indeed some interesting news to share.

1. ClickHouse in DB-Engines Ranking. It did not quite get into the top 100, but the climb from position 174 to 106 is still impressive. Its DB-Engines Ranking score tripled from 0.54 last September to 1.57 this September.

And indeed, in my conversation with customers and partners, the narrative has changed from: “ClickHouse, what is it?” to “We are using or considering ClickHouse for our analytics needs”.

2. ClickHouse changed their versioning schema. Unfortunately it changed from the unconventional …; 1.1.54390; 1.1.54394 naming structure to the still unconventional 18.6.0; 18.10.3; 18.12.13 naming structure, where “18” is the year of the release.

Now to the more interesting technical improvements.

3. Support of the more traditional JOIN syntax. Now if you join two tables you can use SELECT ... FROM tab1 ANY LEFT JOIN tab2 ON tab1_col=tab2_col .

So now, if we take a query from the workload described in

We can write this (the SELECT list below is illustrative; the column names follow the Star Schema Benchmark):

SELECT sum(LO_REVENUE)
FROM lineorder
ANY INNER JOIN customer ON LO_CUSTKEY = C_CUSTKEY

instead of the monstrous:

SELECT sum(LO_REVENUE)
FROM lineorder
ANY INNER JOIN
(
    SELECT C_CUSTKEY AS LO_CUSTKEY
    FROM customer
) USING (LO_CUSTKEY)

4. Support for DELETE and UPDATE operations. This has probably been the most requested feature since the first ClickHouse release.
ClickHouse uses an LSM-tree-like structure (MergeTree), which is not friendly to single-row operations. To highlight this limitation, ClickHouse uses the ALTER TABLE UPDATE / ALTER TABLE DELETE syntax, signaling that these are executed as bulk operations. Updating or deleting rows in ClickHouse should be an exceptional operation, rather than a part of your day-to-day workload.

We can update a column like this: ALTER TABLE lineorder UPDATE LO_DISCOUNT = 5 WHERE LO_CUSTKEY = 199568

5. ClickHouse added a feature which I call Dictionary Compression, but ClickHouse uses the name “LowCardinality”. It is still experimental, but I hope soon it will be production ready. Basically, it allows ClickHouse to internally replace long strings with a short list of enumerated values.

For example, consider the table from our example lineorder which contains 600037902 rows, but has only five different values for the column LO_ORDERPRIORITY:

SELECT DISTINCT LO_ORDERPRIORITY
FROM lineorder

│ 1-URGENT         │
│ 5-LOW            │
│ 2-HIGH           │
│ 3-MEDIUM         │

So we can define our table as:

CREATE TABLE lineorder_dict (
        LO_ORDERKEY             UInt32,
        LO_LINENUMBER           UInt8,
        LO_CUSTKEY              UInt32,
        LO_PARTKEY              UInt32,
        LO_SUPPKEY              UInt32,
        LO_ORDERDATE            Date,
        LO_ORDERPRIORITY        LowCardinality(String),
        LO_SHIPPRIORITY         UInt8,
        LO_QUANTITY             UInt8,
        LO_EXTENDEDPRICE        UInt32,
        LO_ORDTOTALPRICE        UInt32,
        LO_DISCOUNT             UInt8,
        LO_REVENUE              UInt32,
        LO_SUPPLYCOST           UInt32,
        LO_TAX                  UInt8,
        LO_COMMITDATE           Date,
        LO_SHIPMODE             LowCardinality(String)
) ENGINE = MergeTree() PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY);

How does this help? Firstly, it offers space savings. The table will take less space in storage, as it will use integer values instead of strings. And secondly, performance. The filtering operation will be executed faster.
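The idea behind LowCardinality can be sketched in a few lines of Python (a simplified model, not ClickHouse’s actual on-disk format): each distinct string is stored once in a dictionary, and the column itself holds small integer codes.

```python
def dict_encode(values):
    """Replace each string with a small integer code, keeping the
    distinct strings once in a dictionary (simplified LowCardinality)."""
    dictionary, index, codes = [], {}, []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

priorities = ["1-URGENT", "5-LOW", "2-HIGH", "3-MEDIUM"] * 25_000
dictionary, codes = dict_encode(priorities)

raw_size = sum(len(v) for v in priorities)                    # strings stored inline
encoded_size = sum(len(v) for v in dictionary) + len(codes)   # dict + 1-byte codes
print(dictionary)
print(f"raw: {raw_size} bytes, encoded: ~{encoded_size} bytes")
```

With only a handful of distinct values, one byte per row is enough for the codes, so the column shrinks dramatically before general-purpose compression is even applied, and filters can compare small integers instead of strings.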

For example: here’s a query against the table with LO_ORDERPRIORITY stored as String:

SELECT count(*)
FROM lineorder
WHERE LO_ORDERPRIORITY = '1-URGENT'

│ 119995822 │
1 rows in set. Elapsed: 0.859 sec. Processed 600.04 million rows, 10.44 GB (698.62 million rows/s., 12.16 GB/s.)

And now the same query against table with LO_ORDERPRIORITY as LowCardinality(String):

SELECT count(*)
FROM lineorder_dict
WHERE LO_ORDERPRIORITY = '1-URGENT'

│ 119995822 │
1 rows in set. Elapsed: 0.350 sec. Processed 600.04 million rows, 600.95 MB (1.71 billion rows/s., 1.72 GB/s.)

This is 0.859 sec vs 0.350 sec (for the LowCardinality case).

Unfortunately this feature is not optimized for all use cases, and actually in aggregation it performs slower.

An aggregation query against table with LO_ORDERPRIORITY as String:

SELECT
    LO_ORDERPRIORITY,
    count(*)
FROM lineorder
GROUP BY LO_ORDERPRIORITY

│ 1-URGENT         │
│ 2-HIGH           │
│ 3-MEDIUM         │
│ 5-LOW            │
5 rows in set. Elapsed: 1.200 sec. Processed 600.04 million rows, 10.44 GB (500.22 million rows/s., 8.70 GB/s.)

Versus an aggregation query against table with LO_ORDERPRIORITY as LowCardinality(String):

SELECT
    LO_ORDERPRIORITY,
    count(*)
FROM lineorder_dict
GROUP BY LO_ORDERPRIORITY

│ 1-URGENT         │
│ 2-HIGH           │
│ 3-MEDIUM         │
│ 5-LOW            │
5 rows in set. Elapsed: 2.334 sec. Processed 600.04 million rows, 600.95 MB (257.05 million rows/s., 257.45 MB/s.)

This is 1.200 sec vs 2.334 sec (for the LowCardinality case).

6. And the last feature I want to mention is the better support of Tableau Software, which required ODBC drivers. It may not seem significant, but Tableau is the number one software for data analysts, and by supporting it, ClickHouse will reach a much wider audience.

Summing up: ClickHouse definitely became much more user friendly since a year ago!

The post ClickHouse: Two Years! appeared first on Percona Database Performance Blog.


Archiving MySQL Tables in ClickHouse

In this blog post, I will talk about archiving MySQL tables in ClickHouse for storage and analytics.

Why Archive?

Hard drives are cheap nowadays, but storing lots of data in MySQL is not practical and can cause all sorts of performance bottlenecks. To name just a few issues:

  1. The larger the table and index, the slower the performance of all operations (both writes and reads)
  2. Backup and restore for terabytes of data is more challenging, and if we need to have redundancy (replication slave, clustering, etc.) we will have to store all the data N times

The answer is archiving old data. Archiving does not necessarily mean that the data will be permanently removed. Instead, the archived data can be placed into long-term storage (i.e., AWS S3) or loaded into a special purpose database that is optimized for storage (with compression) and reporting. The data is then available.

Actually, there are multiple use cases:

  • Sometimes the data just needs to be stored (i.e., for regulatory purposes) but does not have to be readily available (it’s not “customer facing” data)
  • The data might be useful for debugging or investigation (i.e., application or access logs)
  • In some cases, the data needs to be available for the customer (i.e., historical reports or bank transactions for the last six years)

In all of those cases, we can move the older data away from MySQL and load it into a “big data” solution. Even if the data needs to be available, we can still move it from the main MySQL server to another system. In this blog post, I will look at archiving MySQL tables in ClickHouse for long-term storage and real-time queries.

How To Archive?

Let’s say we have a 650G table that stores the history of all transactions, and we want to start archiving it. How can we approach this?

First, we will need to split this table into “old” and “new”. I assume that the table is not partitioned (partitioned tables are much easier to deal with). For example, if we have data from 2008 (ten years worth) but only need to store data from the last two months in the main MySQL environment, then deleting the old data would be challenging. So instead of deleting 99% of the data from a huge table, we can create a new table and load the newer data into that. Then rename (swap) the tables. The process might look like this:

  1. CREATE TABLE transactions_new LIKE transactions
  2. INSERT INTO transactions_new SELECT * FROM transactions WHERE trx_date > NOW() - INTERVAL 2 MONTH
  3. RENAME TABLE transactions TO transactions_old, transactions_new TO transactions
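A minimal Python sketch that generates these three statements (the function name is mine; run the statements with your MySQL client of choice, and mind that the swap is not atomic with respect to writes arriving between steps 2 and 3):

```python
def archive_swap_statements(table, date_col, keep="2 MONTH"):
    """Build the create/copy/swap statements from steps 1-3 above."""
    new, old = f"{table}_new", f"{table}_old"
    return [
        f"CREATE TABLE {new} LIKE {table}",
        f"INSERT INTO {new} SELECT * FROM {table} "
        f"WHERE {date_col} > NOW() - INTERVAL {keep}",
        f"RENAME TABLE {table} TO {old}, {new} TO {table}",
    ]

for stmt in archive_swap_statements("transactions", "trx_date"):
    print(stmt + ";")
```

The RENAME in step 3 swaps both names in a single statement, so readers never see a missing `transactions` table.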

Second, we need to move the transactions_old into ClickHouse. This is straightforward — we can pipe data from MySQL to ClickHouse directly. To demonstrate I will use the Wikipedia:Statistics project (a real log of all requests to Wikipedia pages).

Create a table in ClickHouse:

CREATE TABLE wikistats (
    id bigint,
    dt DateTime,
    project String,
    subproject String,
    path String,
    hits UInt64,
    size UInt64
) ENGINE = MergeTree
PARTITION BY toYYYYMMDD(dt)
ORDER BY dt;

0 rows in set. Elapsed: 0.010 sec.

Please note that I’m using the new ClickHouse custom partitioning. It does not require that you create a separate date column to map the table in MySQL to the same table structure in ClickHouse.

Now I can “pipe” data directly from MySQL to ClickHouse:

mysql --quick -h localhost wikistats -NBe
"SELECT concat(id,',\"',dt,'\",\"',project,'\",\"',subproject,'\",\"',path,'\",',hits,',',size) FROM wikistats" |
clickhouse-client -d wikistats --query="INSERT INTO wikistats FORMAT CSV"

Third, we need to set up a constant archiving process so that the data is removed from MySQL and transferred to ClickHouse. To do that we can use the “pt-archiver” tool (part of Percona Toolkit). In this case, we can first archive to a file and then load that file to ClickHouse. Here is the example:

Remove data from MySQL and load to a file (tsv):

pt-archiver --source h=localhost,D=wikistats,t=wikistats,i=dt --where "dt <= '2018-01-01 0:00:00'"  --file load_to_clickhouse.txt --bulk-delete --limit 100000 --progress=100000
TIME                ELAPSED   COUNT
2018-01-25T18:19:59       0       0
2018-01-25T18:20:08       8  100000
2018-01-25T18:20:17      18  200000
2018-01-25T18:20:26      27  300000
2018-01-25T18:20:36      36  400000
2018-01-25T18:20:45      45  500000
2018-01-25T18:20:54      54  600000
2018-01-25T18:21:03      64  700000
2018-01-25T18:21:13      73  800000
2018-01-25T18:21:23      83  900000
2018-01-25T18:21:32      93 1000000
2018-01-25T18:21:42     102 1100000

Load the file to ClickHouse:

cat load_to_clickhouse.txt | clickhouse-client -d wikistats --query="INSERT INTO wikistats FORMAT TSV"

The newer version of pt-archiver can use a CSV format as well:

pt-archiver --source h=localhost,D=wikitest,t=wikistats,i=dt --where "dt <= '2018-01-01 0:00:00'"  --file load_to_clickhouse.csv --output-format csv --bulk-delete --limit 10000 --progress=10000

How Much Faster Is It?

Actually, it is much faster in ClickHouse. Even queries that are based on index scans can be much slower in MySQL than in ClickHouse.

For example, in MySQL just counting the number of rows for one year can take 34 seconds (index scan):

mysql> select count(*) from wikistats where dt between '2017-01-01 00:00:00' and '2017-12-31 00:00:00';
| count(*)  |
| 103161991 |
1 row in set (34.82 sec)
mysql> explain select count(*) from wikistats where dt between '2017-01-01 00:00:00' and '2017-12-31 00:00:00'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: wikistats
   partitions: NULL
         type: range
possible_keys: dt
          key: dt
      key_len: 6
          ref: NULL
         rows: 227206802
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

In ClickHouse, it only takes 0.062 sec:

:) select count(*) from wikistats where dt between  toDateTime('2017-01-01 00:00:00') and  toDateTime('2017-12-31 00:00:00');
SELECT count(*)
FROM wikistats
WHERE (dt >= toDateTime('2017-01-01 00:00:00')) AND (dt <= toDateTime('2017-12-31 00:00:00'))
│ 103161991 │
1 rows in set. Elapsed: 0.062 sec. Processed 103.16 million rows, 412.65 MB (1.67 billion rows/s., 6.68 GB/s.)

Size on Disk

In my previous blog on comparing ClickHouse to Apache Spark to MariaDB, I also compared disk size. Usually, we can expect a 5x to 10x decrease in disk size in ClickHouse due to compression. Wikipedia:Statistics, for example, contains actual URIs, which can be quite large due to the article name/search phrase. This can be compressed very well. If we use only integers or use MD5 / SHA1 hashes instead of storing actual URIs, we can expect much smaller compression (e.g., 3x). Even with a 3x compression ratio, it is still pretty good as long-term storage.


As the data in MySQL keeps growing, the performance of all queries will keep decreasing. Typically, queries that originally took milliseconds can start taking seconds (or more), and making them fast again requires a lot of changes (application code, MySQL configuration, etc.).

The main goal of archiving the data is to increase performance (“make MySQL fast again”), decrease costs and improve ease of maintenance (backup/restore, cloning the replication slave, etc.). Archiving to ClickHouse allows you to preserve old data and make it available for reports.


Updating/Deleting Rows From Clickhouse (Part 2)


In this post, we’ll look at updating and deleting rows with ClickHouse. It’s the second of two parts.

In the first part of this post, we described the high-level overview of implementing incremental refresh on a ClickHouse table as an alternative support for UPDATE/DELETE. In this part, we will show you the actual steps and sample code.

Prepare Changelog Table

First, we create the changelog table below. This can be stored on any other MySQL instance, separate from the source of our analytics table. When we run the change capture script, it will record data in this table that we can consume later with the incremental refresh script:

CREATE TABLE `clickhouse_changelog` (
  `db` varchar(255) NOT NULL DEFAULT '',
  `tbl` varchar(255) NOT NULL DEFAULT '',
  `created_at` date NOT NULL,
  `updated_at` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `log_file` varchar(255) NOT NULL,
  `log_pos` int(10) unsigned NOT NULL,
  PRIMARY KEY (`db`,`tbl`,`created_at`),
  KEY `log_file` (`log_file`,`log_pos`)
) ENGINE=InnoDB;
Create ClickHouse Table

Next, let’s create the target ClickHouse table. As a reminder, the corresponding MySQL table is below:

CREATE TABLE `hits` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `type` varchar(100) DEFAULT NULL,
  `user_id` int(11) NOT NULL,
  `location_id` int(11) NOT NULL,
  `created_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `created_at` (`created_at`)
) ENGINE=InnoDB;

Converting this table to ClickHouse looks like below, with the addition of a “created_day” column that serves as the partitioning key:

CREATE TABLE hits (
  id Int32,
  created_day Date,
  type String,
  user_id Int32,
  location_id Int32,
  created_at Int32
) ENGINE = MergeTree PARTITION BY toMonday(created_day)
ORDER BY (created_at, id) SETTINGS index_granularity = 8192;
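The schema conversion is mechanical and can be scripted. Here is a minimal sketch; the mapping table is illustrative and covers only the types used in this example (note that `created_at` is deliberately mapped to Int32 because we store it as a Unix timestamp):

```python
# Illustrative MySQL -> ClickHouse type mapping; only covers the types
# used in the example `hits` table above, not a complete conversion tool.
TYPE_MAP = {
    "int": "Int32",
    "bigint": "Int64",
    "varchar": "String",
    "datetime": "Int32",  # stored as a Unix timestamp, as in the example
    "date": "Date",
}

def mysql_to_clickhouse(column_type: str) -> str:
    """Map a MySQL column type such as 'int(11)' or 'varchar(100)'
    to its ClickHouse counterpart by stripping the length specifier."""
    base = column_type.split("(")[0].strip().lower()
    return TYPE_MAP[base]

print(mysql_to_clickhouse("int(11)"))       # Int32
print(mysql_to_clickhouse("varchar(100)"))  # String
```

A real conversion script would also handle NULLability (ClickHouse's Nullable wrapper) and unsigned integer types, which are omitted here for brevity.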

Run Changelog Capture

Once the tables are ready, the next step is to run the change capture script. An example script can be found in this gist; it is written in Python and uses the python-mysql-replication library. This library acts as a replication client, continuously downloading the binary logs from the source and sifting through them to find any UPDATE/DELETE executed against our source table.

There are a few configuration options that need to be customized in the script:

  • LOG_DB_HOST: The MySQL host where we created the clickhouse_changelog table.
  • LOG_DB_NAME: The database name where the clickhouse_changelog table is created.
  • SRC_DB_HOST: The MySQL host where we will be downloading binary logs from. This can be either a primary or a secondary/replica, as long as it is the same server where our raw table is also located.
  • MYSQL_USER: MySQL username.
  • MYSQL_PASS: MySQL password.
  • TABLE: The table we want to watch for changes.

When the script is successfully configured and running, the clickhouse_changelog table should start populating with data like below.

mysql> select * from mydb.clickhouse_changelog;
| db   | tbl  | created_at | updated_at          | log_file         | log_pos   |
| mydb | hits | 2014-06-02 | 2017-12-23 17:19:33 | mysql-bin.016353 |  18876747 |
| mydb | hits | 2014-06-09 | 2017-12-23 22:10:29 | mysql-bin.016414 |   1595314 |
| mydb | hits | 2014-06-16 | 2017-12-23 02:59:37 | mysql-bin.016166 |  33999981 |
| mydb | hits | 2014-06-23 | 2017-12-23 18:09:33 | mysql-bin.016363 |  84498477 |
| mydb | hits | 2014-06-30 | 2017-12-23 06:08:59 | mysql-bin.016204 |  23792406 |
| mydb | hits | 2014-08-04 | 2017-12-23 18:09:33 | mysql-bin.016363 |  84499495 |
| mydb | hits | 2014-08-18 | 2017-12-23 18:09:33 | mysql-bin.016363 |  84500523 |
| mydb | hits | 2014-09-01 | 2017-12-23 06:09:19 | mysql-bin.016204 |  27120145 |
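The core of such a capture script can be sketched as follows. This is a simplified illustration, not the gist itself: the real script uses python-mysql-replication's BinLogStreamReader, which is only shown in comments here since it needs a live MySQL server. The `week_start` helper mirrors ClickHouse's toMonday(), which our table uses as its partitioning key:

```python
import datetime

def week_start(d: datetime.date) -> datetime.date:
    """Return the Monday of d's week, mirroring ClickHouse's toMonday(),
    which the example table uses as its partitioning key."""
    return d - datetime.timedelta(days=d.weekday())

# Sketch of the capture loop (simplified; the real script from the gist uses
# python-mysql-replication's BinLogStreamReader to read row events):
#
#   for event in stream:                   # UpdateRowsEvent / DeleteRowsEvent
#       for row in event.rows:
#           week = week_start(row["values"]["created_at"].date())
#           upsert_changelog(db, tbl, week, stream.log_file, stream.log_pos)

print(week_start(datetime.date(2014, 6, 5)))  # 2014-06-02, a Monday as in the output above
```

Recording only the Monday of each affected week keeps the changelog small: many modified rows in the same partition collapse into a single changelog entry.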

Full Table Import

So we have our changelog capture in place; the next step is to initially populate the ClickHouse table from MySQL. Normally, we can easily do this with a mysqldump into a tab-separated format, but remember we have to transform the created_at column from MySQL into ClickHouse’s Date format to be used as the created_day partitioning key.

A simple way to do this is by using a simple set of shell commands like below:

SQL=$(cat <<EOF
SELECT
	id, DATE_FORMAT(created_at, "%Y-%m-%d"),
	type, user_id, location_id, UNIX_TIMESTAMP(created_at)
FROM hits
EOF
)
mysql -h source_db_host mydb -BNe "$SQL" > hist.txt
cat hist.txt | clickhouse-client -d mydb --query="INSERT INTO hits FORMAT TabSeparated"

One thing to note about this process is that the mysql client buffers the results for the whole query, so it could eat up all the memory on the server you run this from if the table is really large. To avoid this, chunk the table into several million rows at a time. Since we already have the changelog capture running from the previous step, you do not need to worry about any changes between chunks; we will consolidate those changes during the incremental refreshes.
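Chunking by primary-key range is enough to keep each export bounded. A minimal sketch follows; the chunk size and the exact export command in the comment are assumptions, not part of the original scripts:

```python
def chunk_ranges(max_id, chunk_size=1_000_000):
    """Yield inclusive (start, end) id ranges covering 1..max_id,
    so each export reads at most chunk_size rows."""
    start = 1
    while start <= max_id:
        yield start, min(start + chunk_size - 1, max_id)
        start += chunk_size

# Each range bounds one export/import pair, e.g.:
#   mysql -h source_db_host mydb -BNe \
#     "SELECT ... FROM hits WHERE id BETWEEN {start} AND {end}" \
#     | clickhouse-client -d mydb --query="INSERT INTO hits FORMAT TabSeparated"
for start, end in chunk_ranges(2_500_000):
    print(start, end)
```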

Incremental Refresh

After initially populating the ClickHouse table, we then set up our continuous incremental refresh using a separate script. A template script we use for the table in our example can be found in this gist.

What this script does is twofold:

  • Determines the list of weeks recently modified based on clickhouse_changelog, dumps the rows for those weeks, and re-imports them to ClickHouse.

  • If the current week is not on the list of those with modifications, it also checks for new rows based on the auto-incrementing primary key and appends them to the ClickHouse table.

An example output of this script would be:

ubuntu@mysql~/clickhouse$ bash hits
2017-12-24_00_20_19 incr-refresh Starting changelog processing for hits
2017-12-24_00_20_19 incr-refresh Current week is: 2017-12-18
2017-12-24_00_20_19 incr-refresh Processing week: 2017-12-18
2017-12-24_00_20_20 incr-refresh Changelog import for hits complete
ubuntu@mysql~/clickhouse$ bash hits
2017-12-24_00_20_33 incr-refresh Starting changelog processing for hits
2017-12-24_00_20_33 incr-refresh Current week is: 2017-12-18
2017-12-24_00_20_33 incr-refresh Weeks is empty, nothing to do
2017-12-24_00_20_33 incr-refresh Changelog import for hits complete
2017-12-24_00_20_33 incr-refresh Inserting new records for hits > id: 5213438
2017-12-24_00_20_33 incr-refresh No new rows found
2017-12-24_00_20_33 incr-refresh Incremental import for hits complete
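The decision logic driving those two runs can be sketched as a pure function (a simplified illustration of the gist's behavior; the function and parameter names are mine):

```python
def plan_refresh(modified_weeks, current_week, max_imported_id, max_source_id):
    """Plan one incremental refresh pass: re-import every recently modified
    week; if the current week is not among them, also append any rows past
    the highest id already imported into ClickHouse."""
    weeks_to_reimport = sorted(set(modified_weeks))
    check_new_rows = (current_week not in weeks_to_reimport
                      and max_source_id > max_imported_id)
    return weeks_to_reimport, check_new_rows

# First run above: the current week itself was modified, so it is re-imported.
print(plan_refresh(["2017-12-18"], "2017-12-18", 5213438, 5213438))
# Second run: no modified weeks, so only rows past id 5213438 are checked.
print(plan_refresh([], "2017-12-18", 5213438, 5213560))
```

Re-importing a whole week at a time is what makes UPDATE/DELETE support possible here: the partition is simply dropped and rebuilt from the current MySQL data, so row-level changes never have to be replayed individually.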

Note that if you imported a really large table and the changelog has accumulated a large number of changes to refresh, the initial incremental execution might take some time. After that, though, it should be faster. This script can be run every minute, or at longer or shorter intervals, depending on how often you want the ClickHouse table to be refreshed.

To wrap up, here is a query from MySQL on the same table, versus ClickHouse.

mysql> SELECT COUNT(DISTINCT user_id) FROM hits WHERE created_at
    -> BETWEEN '2016-01-01 00:00:00' AND '2017-01-01 00:00:00';
| COUNT(DISTINCT user_id) |
|                 3023028 |
1 row in set (25.89 sec)

:) SELECT COUNT(DISTINCT user_id) FROM hits WHERE created_at BETWEEN 1451606400 AND 1483228800;
SELECT countDistinct(user_id)
FROM hits
WHERE (created_at >= 1451606400) AND (created_at <= 1483228800)
│            3023028 │
1 rows in set. Elapsed: 0.491 sec. Processed 35.82 million rows, 286.59 MB (73.01 million rows/s., 584.06 MB/s.)



ClickHouse MySQL Silicon Valley Meetup Wednesday, October 25 at Uber Engineering with Percona’s CTO Vadim Tkachenko


I will be presenting at the ClickHouse MySQL Silicon Valley Meetup on Wednesday, October 25, 2017, at 6:30 PM.

ClickHouse is a real-time analytical database system. Even though it is only celebrating one year as open source software, it has already proved itself ready for serious workloads. We will talk about ClickHouse in general, some of its internals, and why it is so fast. ClickHouse works in conjunction with MySQL – traditionally weak for analytical workloads – and the presentation demonstrates how to make the two systems work together.

My talk will cover how we can improve the experience with real-time analytics using ClickHouse, and how we can integrate ClickHouse with MySQL.

I want to thank our friends at Uber Engineering who agreed to host this event.

Please join us here:

Vadim Tkachenko, Percona CTO

Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. Vadim leads Percona Labs, which focuses on technology research and performance evaluations of Percona’s and third-party products. Percona Labs designs no-gimmick tests of hardware, filesystems, storage engines, and databases that surpass the standard performance and functionality scenario benchmarks.

Vadim’s expertise in LAMP performance and multi-threaded programming helps optimize MySQL and InnoDB internals to take full advantage of modern hardware. Oracle Corporation and its predecessors have incorporated Vadim’s source code patches into the mainstream MySQL and InnoDB products.

He also co-authored the book High Performance MySQL: Optimization, Backups, and Replication 3rd Edition. Previously, he founded a web development company in his native Ukraine and spent two years in the High Performance Group within the official MySQL support team.
