In this blog post, we’ll look at ClickHouse on its one-year anniversary.
It’s been a year already since the Yandex team released ClickHouse as open source software. I’ve had an interest in this project from the very start, as I didn’t think there was an open source analytical database that could compete with industry leaders like Vertica (for example).
This was an exciting year for ClickHouse early adopters. Let’s look at what it has accomplished so far.
ClickHouse initially generated interest because of the Yandex name: Yandex runs the most popular search engine in Russia. It wasn’t long before jaw-dropping responses popped up: guys, this thing is crazy fast! Many early adopters who tried ClickHouse were really impressed.
Fast doesn’t mean convenient though. That was the community’s main concern about ClickHouse over the past months. Developed as an internal project for an internal customer (Yandex.Metrica), ClickHouse had a lot of limitations for general community use. It took several months before Yandex could restructure the team and mindset, establish proper communication with the community and start addressing external needs. There are still a lot of things that need to be done. The public roadmap is not easily available, for example, but the wheels are pointed in the right direction. The ClickHouse team has added a lot of the features people were screaming for, and more are in ClickHouse’s future plans.
The Yandex guys are actively attending international conferences, and they were:
- Welcomed speakers at Percona Live Amsterdam 2016 (https://www.percona.com/live/plam16/sites/default/files/slides/ClickHouse%20Percona%20Live%20v2.9.pdf)
- At Percona Live 2017 (https://www.percona.com/live/17/sessions/clickhouse-high-performance-distributed-dbms-analytics, https://www.percona.com/live/17/sessions/clickhouse-time-series-storage-graphite)
- Recently at Data@Scale in Seattle (https://atscaleconference.com/videos/yandex-clickhouse-a-dbms-for-interactive-analytics-at-scale/)
They speak even more often in Russia (no big surprise).
We were very excited by Yandex’s ClickHouse performance claims at Percona, and could not resist making our own benchmarks:
- ClickHouse: New Open Source Columnar Database
- Column Store Database Benchmarks: MariaDB ColumnStore vs. ClickHouse vs. Apache Spark
- ClickHouse in a General Analytical Workload (Based on a Star Schema Benchmark)
ClickHouse did very well in these benchmarks. There are many other benchmarks by other groups as well, including a benchmark against Amazon Redshift by Altinity.
The first ClickHouse production deployments outside of Yandex started in October-November 2016. Today, Yandex reports that dozens of companies around the world are using ClickHouse in production, with the biggest installations operating with up to several petabytes of data. Hundreds of other enterprises are deploying pilot installations or actively evaluating the software.
There are also interesting reports from Cloudflare (How Cloudflare analyzes 1M DNS queries per second) and from Carto (Geospatial processing with ClickHouse).
There are also various community projects around ClickHouse worth mentioning:
- Tabix, Web GUI for ClickHouse: https://github.com/smi2/tabix.ui
- SQLAlchemy integration from Cloudflare: https://github.com/cloudflare/sqlalchemy-clickhouse
- Support for Superset, a data exploration platform, from Airbnb
Percona is also working to adapt ClickHouse to our projects. We are using ClickHouse to handle Query Analytics and as long-term storage for metrics data inside a new version (under development) of Percona Monitoring and Management.
I will also be speaking about ClickHouse at BIG DATA DAY LA 2017 on August 5th, 2017. You are welcome to attend if you are in Los Angeles that day!
ClickHouse has the potential to become one of the leading open source analytical DBMSs – much like MySQL and PostgreSQL are leaders for OLTP workloads. We will see in the next couple of years whether that happens. Congratulations to the Yandex team on their one-year milestone!