PingCAP, the open-source developer behind TiDB, closes $270 million Series D

PingCAP, the open-source software developer best known for NewSQL database TiDB, has raised a $270 million Series D. TiDB handles hybrid transactional and analytical processing (HTAP), and is aimed at high-growth companies, including payment and e-commerce services, that need to handle increasingly large amounts of data.

The round’s lead investors were GGV Capital, Access Technology Ventures, Anatole Investment, Jeneration Capital and 5Y Capital (formerly known as Morningside Venture Capital). It also included participation from Coatue, Bertelsmann Asia Investment Fund, FutureX Capital, Kunlun Capital, Trustbridge Partners, and returning investors Matrix Partners China and Yunqi Partners.

The funding brings PingCAP’s total raised so far to $341.6 million. Its last round, a Series C of $50 million, was announced back in September 2018.

PingCAP says TiDB has been adopted by about 1,500 companies across the world. Some examples include Square; Japanese mobile payments company PayPay; e-commerce app Shopee; video-sharing platform Dailymotion; and ticketing platfrom BookMyShow. TiDB handles online transactional processing (OLTP) and online analytical processing (OLAP) in the same database, which PingCAP says results in faster real-time analytics than other distributed databases.

In June, PingCAP launched TiDB Cloud, which it describes as fully-managed “TiDB as a Service,” on Amazon Web Services and Google Cloud. The company plans to add more platforms, and part of the funding will be used to increase TiDB Cloud’s global user base.


Upcoming Webinar Wed 5/1: Horizontally scale MySQL with TiDB while avoiding sharding

horizontally scale MySQL with TiDB

horizontally scale MySQL with TiDBJoin Percona CEO Peter Zaitsev and PingCAP Senior Product and Community Manager Morgan Tocker as they present How to horizontally scale MySQL with TiDB while avoiding sharding issues on Wednesday, May 1st, 2019, at 11:00 AM PDT (UTC-7) / 2:00 PM EDT (UTC-4).

Register Now

In this joint webinar, PingCAP will provide an introduction and overview of TiDB, tailored for those with a strong background in MySQL. PingCAP will use MySQL as an example to explain various implementation details of TiDB and translate terminology to MySQL/InnoDB terms.

Percona will then discuss sharding issues within MySQL and how TiDB resolves many of those issues through horizontal scalability.

In this webinar, the following topics will be covered:

– TiDB and the problems it resolves
– Issues with MySQL sharding, including cautionary use cases
– Benchmark results from different versions of TiDB
– MySQL compatibility with TiDB
– Summary, and question and answer

Register for this webinar on how to scale MySQL with TiDB.


A Quick Look into TiDB Performance on a Single Server

TiDB MySQL plot

TiDB is an open-source distributed database developed by PingCAP. This is a very interesting project as it is can be used as a MySQL drop-in replacement: it implements MySQL protocol, and basically emulates MySQL. PingCAP defines TiDB is as a “one-stop data warehouse for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) workloads”. In this blog post I have decided to see how TiDB performs on a single server compared to MySQL for both OLTP and OLAP workload. Please note, this benchmark is very limited in scope: we are only testing TiDB and MySQL on a single server – TiDB is a distributed database out of the box.

Short version: TiDB supports parallel query execution for selects and can utilize many more CPU cores – MySQL is limited to a single CPU core for a single select query. For the higher-end hardware – ec2 instances in my case – TiDB can be 3-4 times faster for complex select queries (OLAP workload) which do not use, or benefit from, indexes. At the same time point selects and writes, especially inserts, can be 5x-10x slower. Again, please note that this test was on a single server, with a single TiKV process.


Please note: the following setup is only intended for testing and not for production. 

I installed the latest version of TiDB to take advantage of the latest performance improvements, at the time of writing:

cat make-full-tidb-server
set -x
cd /tidb
tar -xzf tidb-*.tar.gz
cd tidb-*-linux-amd64/
./bin/pd-server  --data-dir=pd --log-file=pd.log &
sleep 5
./bin/tikv-server --pd="" --data-dir=tikv -A --log-file=tikv.log &
sleep 5
cd ~/go/src/
make server
./bin/tidb-server --store=tikv --path=""

The normal installation process is described here (different methods are available).


The main purpose of this test is to compare MySQL to TiDB. As with any distributed database it is hard to design an “apples to apples” comparison: we may compare a distributed workload spanning across many servers/nodes (in this case TiDB) to a single server workload (in this case MySQL). To overcome this challenge, I decided to focus on “efficiency”. If the distributed database is not efficient – i.e. it may require 10s or 100s of nodes to do the same job as the non-distributed database – it may be cost prohibitive to use such database for a small or medium size DB.

The preliminary results are: TiDB is much more efficient for SELECT (OLAP workload) but much less efficient for WRITES and typical OLTP workload. To overcome these limitations it is possible to use more servers.

For this test I was using two types of benchmarks:

  1. OLAP: a set of complex queries on top of an “ontime” database (airline historical flight information database). For this benchmark I used different AWS ec2 instances with CPU cores ranging from 2 to 96. This is response time test (not a throughput test)
  2. OLTP: sysbench (as always): point-select and write-only standard workloads. This is throughput test, increasing the number of threads.

OLAP / analytical queries test

Database size is 70Gb in MySQL and 30Gb in TiDB (compressed). The table has no secondary indexes (except the primary key).

I used the following four queries:

  1. Simple count(*): select count(*) from ontime;
  2. Simple group by: select count(*), year from ontime group by year order by year;
  3. Complex filter for a full table scan: select * from ontime where UniqueCarrier = 'DL' and TailNum = 'N317NB' and FlightNum = '2' and Origin = 'JFK' and Dest = 'FLL' limit 10;
  4. Complex group by and order by query:
    FlightDate, UniqueCarrier as carrier,
    FROM ontime
    DestState not in ('AK', 'HI', 'PR', 'VI')
    and OriginState not in ('AK', 'HI', 'PR', 'VI')
    and flightdate > '2015-01-01'
    and ArrDelay < 15
    and cancelled = 0 and Diverted = 0
    and DivAirportLandings = '0'
    ORDER by DepDelay DESC
    LIMIT 10;

I used five ec2 instances:

  • t2.medium: 2 CPU cores
  • x1e.xlarge: 4 CPU cores
  • r4.4xlarge: 16 CPU cores
  • m4.16xlarge: 64 CPU cores
  • m5.24xlarge: 96 CPU cores

The following graph represents the results (bars represents the query response time, the smaller the better):

As we can see, TiDB scales very well increasing the number of CPU cores, as we go from lower to higher end instances. t2.medium and x1e.xlarge is interesting here thou:

  1. t2.medium has 2 CPU cores and not enough RAM (2Gb) to store database in memory. Both MySQL/InnoDB and TiDB/TiKV performs a lots of disk reads – this is disk bound workload
  2. x1e.xlarge is an example of the opposite instance type: 4 CPU core and  122GB RAM, I’m using memory bound workload here (where both MySQL and TiDB data is cached).

All other instances have enough RAM to cache the database in memory, and with more CPU TiDB can take advantages of query parallelism and provide better response time.

Sysbench test

Select test

I used point select (meaning select one row by primary key, threads ranges from 1 to 128) with Sysbench on an m4.16xlarge instance (memory bound: no disk reads). The results are here.  The bars represent the number of transactions per second, the more the better:

This workload is actually gives a great advantage to MySQL/InnoDB as it retrieves a single row based on the primary key. MySQL is significantly faster here: 5x to 10x faster. Unlike the previous workload – 1 single slow query – for “point select” queries MySQL scales much better than TiDB with more CPU cores.

Write only test

I have used a write-only sysbench workload as well with threads ranging from 1 to 128. The instance has enough memory to cache full datast. Here are the results:

Here we can see that TiDB is also significatly slower than MySQL (for an in-memory workload).

Limitations of the write only test
Running TiDB as a single server is not a recommended (or documented) configuration, so some optimizations for this case may be missing. To create a production level test, we would need to compare TiDB to MySQL with the binlog enabled + some sort of synchronous/virtually synchronous or semi-sync replication  (e.g. Percona XtraDB Cluster, group replication or semi-sync replication).  Both of these changes are known to decrease the write-throughput of MySQL considerably. Some tuning may be done to reduce the effects of that.
Some of the performance characteristics here are also derived from TiDB using RocksDB. The performance of InnoDB should be higher for an in-memory insert, with an LSM tree performing better for data sets that no longer fit in memory.


TiDB scales very well for OLAP / analytical queries (typically complex queries not able to take advantages of indexes) – this is the area where MySQL performs much worse as it does not take advantage of multiple CPU cores. At the same time, there is always a price to pay: TiDB has worse “efficiency” for fast queries (i.e. select by primary key) and writes. TiDB can scale across multiple servers (nodes). However, if we need to archive the same level of write efficiency as MySQL we will have to setup tens of nodes. In my opinion, TiDB can be a great fit for an analytical workload when you need almost full compatibility with MySQL: syntax compatibility, inserts/updates, etc.


This Week in Data with Colin Charles 48: Coinbase Powered by MongoDB and Prometheus Graduates in the CNCF

Colin Charles

Colin CharlesJoin Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

The call for submitting a talk to Percona Live Europe 2018 is closing today, and while there may be a short extension, have you already got your talk submitted? I suggest doing so ASAP!

I’m sure many of you have heard of cryptocurrencies, the blockchain, and so on. But how many of you realiize that Coinbase, an application that handles cryptocurrency trades, matching book orders, and more, is powered by MongoDB? With the hype and growth in interest in late 2017, Coinbase has had to scale. They gave an excellent talk at MongoDB World, titled MongoDB & Crypto Mania (the video is worth a watch), and they’ve also written a blog post, How we’re scaling our platform for spikes in customer demand. They even went driver hacking (the Ruby driver for MongoDB)!

It is great to see there be a weekly review of happenings in the Vitess world.

PingCap and TiDB have been to many Percona Live events to present, and recently hired Morgan Tocker. Morgan has migrated his blog from MySQL to TiDB. Read more about his experience in, This blog, now Powered by WordPress + TiDB. Reminds me of the early days of Galera Cluster and showing how Drupal could be powered by it!


Link List

  • Sys Schema MySQL 5.7+ – blogger from Wipro, focusing on an introduction to the sys schema on MySQL (note: still not available in the MariaDB Server fork).
  • Prometheus Graduates in the CNCF, so is considered a mature project. Criteria for graduation is such that “projects must demonstrate thriving adoption, a documented, structured governance process, and a strong commitment to community sustainability and inclusivity.” Percona benefits from Prometheus in Percona Monitoring & Management (PMM), so we should celebrate this milestone!
  • Replicating from MySQL 8.0 to MySQL 5.7
  • A while ago in this column, we linked to Shlomi Noach’s excellent post on MySQL High Availability at GitHub. We were also introduced to GitHub Load Balancer (GLB), which they ran on top of HAProxy. However back then, GLB wasn’t open; now you can get GLB Director: GLB: GitHub’s open source load balancer. The project describes GLB Director as: “… a Layer 4 load balancer which scales a single IP address across a large number of physical machines while attempting to minimise connection disruption during any change in servers. GLB Director does not replace services like haproxy and nginx, but rather is a layer in front of these services (or any TCP service) that allows them to scale across multiple physical machines without requiring each machine to have unique IP addresses.”
  • F1 Query: Declarative Querying at Scale – a well-written paper.

Upcoming Appearances


I look forward to feedback/tips via e-mail at or on Twitter @bytebot.


The post This Week in Data with Colin Charles 48: Coinbase Powered by MongoDB and Prometheus Graduates in the CNCF appeared first on Percona Database Performance Blog.


What’s Next for SQL Databases?

SQL Databases

SQL DatabasesIn this blog, I’ll go over my thoughts on what we can expect in the world of SQL databases.

After reading Baron’s prediction on databases, here:

I want to provide my own view on what’s coming up next for SQL databases. I think we live in interesting times, when we can see the beginning of the next-generation of RDBMSs.

There are defining characteristics of such databases:

  1. Auto-scaling. The ability to add and use resources depending on the current load and database size. This is done transparently for users and DBAs.
  2. Auto-healing. The automatic handling of node failures.
  3. Multi-regional, cloud-agnostic, geo-distributed. The ability to support multiple data centers and multiple clouds, in different parts of the world.
  4. Transactional. All the above, with the ability to support multi-statements transactional workloads.
  5. Strong consistency. The full definition of strong consistency is pretty involved. For simplicity, let’s say it means that reads (in the absence of ongoing writes) will return the same data, despite what region or data center you are getting it from. A simple counter-example is the famous MySQL asynchronous replication, where (with the slave delay) reading the data on a slave can return very outdated data. I am focusing on reads, because in a distributed environment the consistent reads performance will be affected. This is where network latency (often limited by the speed of light) will define performance.
  6. SQL language. SQL, despite being old and widely criticized, is not going anywhere. This is a universal language for app developers to access data.

With this, I see following interesting projects:

  • Google Cloud Spanner ( Recently announced and still in the Beta stage. Definitely an interesting projects, with the obvious limitation of running only in Google Cloud.
  • FaunaDB ( Also very recently announced, so it is hard to say how it performs. The major downside I see is that it does not provide SQL access, but uses a custom language.
  • Two open source projects:
    • CockroachDB ( This is still in the Beta stage, but definitely an interesting project to follow. Initially, the project planned to support only key-value access, but later they made a very smart decision to provide SQL access via a PostgreSQL-compatible protocol.
    • TiDB ( Right now in RC stages, and the target is to provide SQL access over a MySQL compatible protocol (and later PostgreSQL protocol).

Protocol compatibility is a wise approach, although not strictly necessary. It lowers an entry barrier for the existing applications.

Both CockroachDB and TiDB, at the moment of this writing, still have rough edges and can’t be used in serious deployments (from my experience). I expect both projects will make a big progress in 2017.

What shared characteristics can we expect from these systems?

As I mentioned above, we may see that the read performance is degraded (as latency increases), and often it will be defined more by network performance than anything else. Storage IO and CPU cycles will be secondary factors. There will be more work on how to understand and tune the network traffic.

We may need to get used to the fact that point or small range selects become much slower. Right now, we see very fast point selects for traditional RDBM (MySQL, PostgreSQL, etc.).

Heavy writes will be problematic. The problem is that all writes will need to go through the consistency protocol. Write-optimized storage engines will help (both CockroachDB and TiDB use RocksDB in the storage layer).

The long transactions (let’s say changing 100000 or more rows) also will be problematic. There is just too much network round-trips and housekeeping work on each node, making long transactions an issue for distributed systems.

Another shared property (at least between CockroachDB and TiDB) is the active use of the Raft protocol to achieve consistency. So it will be important to understand how this protocol works to use it effectively. You can find a good overview of the Raft protocol here:

There probably are more NewSQL technologies than I have mentioned here, but I do not think any of them captured critical market- or mind-share. So we are at the beginning of interesting times . . .

What about MySQL? Can MySQL become the database that provides all these characteristics? It is possible, but I do not think it will happen anytime soon. MySQL would need to provide automatic sharding to do this, which will be very hard to implement given the current internal design. It may happen in the future, though it will require a lot of engineering efforts to make it work properly.

Powered by WordPress | Theme: Aeros 2.0 by