Mar
19
2019
--

Upcoming Webinar Thurs 3/21: MySQL Performance Schema in 1 hour

MySQL Performance Schema in 1 hour

MySQL Performance Schema in 1 hourPlease join Percona’s Principal Support Engineer, Sveta Smirnova, as she presents MySQL Performance Schema in 1 hour on Thursday, March 21st, 2019, at 10:00 am PDT (UTC-7) / 1:00 pm EDT (UTC-4).

Register Now

MySQL 8.0 Performance Schema is a mature tool, used by humans and monitoring products. It was born in 2010 as “a feature for monitoring server execution at a low level.” The tool has grown over the years with performance fixes and DBA-faced features. In this webinar, I will give an overview of Performance Schema, focusing on its tuning, performance, and usability.

Performance Schema helps to troubleshoot query performance, complicated locking issues and memory leaks. It can also troubleshoot resource usage, problematic behavior caused by inappropriate settings and much more. Additionally, it comes with hundreds of options which allow for greater precision tuning.

Performance Schema is a potent and very complicated tool. What’s more, it does not affect performance in most cases. However, it collects a lot of data and sometimes this data is hard to read.

In this webinar, I will guide you through the main Performance Schema features, design, and configuration. You will learn how to get the best of it. I will cover its companion sys schema and graphical monitoring tools.

In order to learn more, register for MySQL Performance Schema in 1 hour today.

Mar
07
2019
--

Reducing High CPU on MySQL: a Case Study

CPU Usage after query tuning

In this blog post, I want to share a case we worked on a few days ago. I’ll show you how we approached the resolution of a MySQL performance issue and used Percona Monitoring and Management PMM to support troubleshooting. The customer had noticed a linear high CPU usage in one of their MySQL instances and was not able to figure out why as there was no much traffic hitting the app. We needed to reduce the high CPU usage on MySQL. The server is a small instance:

Models | 6xIntel(R) Xeon(R) CPU E5-2430 0 @ 2.20GHz
10GB RAM

This symptom can be caused by various different reasons. Let’s see how PMM can be used to troubleshoot the issue.

CPU

The original issue - CPU usage at almost 100% during application use

It’s important to understand where the CPU time is being consumed: user space, system space, iowait, and so on. Here we can see that CPU usage was hitting almost 100% and the majority of the time was being spent on user space. In other words, the time the CPU was executing user code, such as MySQL. Once we determined that the time was being spent on user space, we could discard other possible issues. For example, we could eliminate the possibility that a high amount of threads were competing for CPU resources, since that would cause an increase in context switches, which in turn would be taken care of by the kernel – system space.

With that we decided to look into MySQL metrics.

MySQL

Thread activity graph in PMM for MySQL

Queries per second

As expected, there weren’t a lot of threads running—10 on average—and MySQL wasn’t being hammered with questions/transactions. It was running from 500 to 800 QPS (queries per second). Next step was to check the type of workload that was running on the instance:

All the commands are of a SELECT type, in red in this graph

In red we can see that almost all commands are SELECTS. With that in mind, we checked the handlers using 

SHOW STATUS LIKE 'Handler%'

 to verify if those selects were doing an index scan, a full table scan or what.

Showing that the query was a full table scan

Blue in this graph represents

Handler_read_rnd_next

 , which is the counter MySQL increments every time it reads a row when it’s doing a full table scan. Bingo!!! Around 350 selects were reading 2.5 million rows. But wait—why was this causing CPU issues rather than IO issues? If you refer to the first graph (CPU graph) we cannot see iowait.

That is because the data was stored in the InnoDB Buffer Pool, so instead of having to read those 2.5M rows per second from disk, it was fetching them from memory. The stress had moved from disk to CPU. Now that we identified that the issue had been caused by some queries or query, we went to QAN to verify the queries and check their status:

identifying the long running query in QAN

First query, a

SELECT

  on table 

store.clients

 was responsible for 98% of the load and was executing in 20+ seconds.

The initial query load

EXPLAIN confirmed our suspicions. The query was accessing the table using type ALL, which is the last type we want as it means “Full Table Scan”. Taking a look into the fingerprint of the query, we identified that it was a simple query:

Fingerprint of query
Indexes on table did not include a key column

The query was filtering clients based on the status field

SELECT * FROM store.clients WHERE status = ?

 As shown in the indexes, that column was not indexed. Talking with the customer, this turned out to be a query that was introduced as part of a new software release.

From that point, we were confident that we had identified the problem. There could be more, but this particular query was definitely hurting the performance of the server. We decided to add an index and also sent an annotation to PMM, so we could refer back to the graphs to check when the index has been added, check if CPU usage had dropped, and also check Handler_read_rnd_next.

To run the alter we decided to use pt-online-schema-change as it was a busy table, and the tool has safeguards to prevent the situation from becoming even worse. For example, we wanted to pause or even abort the alter in the case of the number of Threads_Running exceeding a certain threshold. The threshold is controlled by

--max-load

  (25 by default) and

--critical-load

  (50 by default):

pmm-admin annotate "Started ALTER store.clients ADD KEY (status)" && \
pt-online-schema-change --alter "ADD KEY (status)" --execute u=root,D=store,t=clients && \
pmm-admin annotate "Finished ALTER store.clients ADD KEY (status)"
Your annotation was successfully posted.
No slaves found. See --recursion-method if host localhost.localdomain has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
analyze_table, 10, 1
copy_rows, 10, 0.25
create_triggers, 10, 1
drop_triggers, 10, 1
swap_tables, 10, 1
update_foreign_keys, 10, 1
Altering `store`.`clients`...
Creating new table...
Created new table store._clients_new OK.
Altering new table...
Altered `store`.`_clients_new` OK.
2019-02-22T18:26:25 Creating triggers...
2019-02-22T18:27:14 Created triggers OK.
2019-02-22T18:27:14 Copying approximately 4924071 rows...
Copying `store`.`clients`: 7% 05:46 remain
Copying `store`.`clients`: 14% 05:47 remain
Copying `store`.`clients`: 22% 05:07 remain
Copying `store`.`clients`: 30% 04:29 remain
Copying `store`.`clients`: 38% 03:59 remain
Copying `store`.`clients`: 45% 03:33 remain
Copying `store`.`clients`: 52% 03:06 remain
Copying `store`.`clients`: 59% 02:44 remain
Copying `store`.`clients`: 66% 02:17 remain
Copying `store`.`clients`: 73% 01:50 remain
Copying `store`.`clients`: 79% 01:23 remain
Copying `store`.`clients`: 87% 00:53 remain
Copying `store`.`clients`: 94% 00:24 remain
2019-02-22T18:34:15 Copied rows OK.
2019-02-22T18:34:15 Analyzing new table...
2019-02-22T18:34:15 Swapping tables...
2019-02-22T18:34:27 Swapped original and new tables OK.
2019-02-22T18:34:27 Dropping old table...
2019-02-22T18:34:32 Dropped old table `store`.`_clients_old` OK.
2019-02-22T18:34:32 Dropping triggers...
2019-02-22T18:34:32 Dropped triggers OK.
Successfully altered `store`.`clients`.
Your annotation was successfully posted.

Results

MySQL Handlers after query tuning MySQL query throughput after query tuning
Query analysis by EXPLAIN in PMM after tuning

As we can see, above, CPU usage dropped to less than 25%, which is 1/4 of the previous usage level. Handler_read_rnd_next dropped and we can’t even see it once pt-osc has finished. We had a small increase on Handler_read_next as expected because now MySQL is using the index to resolve the WHERE clause. One interesting outcome is that the instance was able to increase it’s QPS by 2x after the index was added as CPU/Full Table Scan was no longer limiting performance. On average, query time has dropped from 20s to only 661ms.

Summary:

  1. Applying the correct troubleshooting steps to your problems is crucial:
    a) Understand what resources have been saturated.
    b) Understand what if anything is causing an error.
    c) From there you can divert into the areas that are related to that resource and start to narrow down the issue.
    d) Tackle the problems bit by bit.
  2. Having the right tools for the job key for success. PMM is a great example of a tool that can help you quickly identify, drill in, and fix bottlenecks.
  3. Have realistic load tests. In this case, they had tested the new release on a concurrency level that was not like their production
  4. By identifying the culprit query we were able to:
    a.) Drop average query time from 20s to 661ms
    b.) Increase QPS by 2x
    c.) Reduce the usage of CPU to 1/4 of its level prior to our intervention

Disclosure: For security reasons, sensitive information, such as database, table, column names have been modified and graphs recreated to simulate a similar problem.

Jan
28
2019
--

Monitor and Optimize Slow Queries with PMM and EverSQL – Part 2

percona_pmm_eversql

EverSQL is a platform that intelligently tunes your SQL queries by providing query optimization recommendations, and feedback on missing indexes. This is the second post of our EverSQL series, if you missed our introductory post take a look there first and then come back to this article.

We’ll use the Stackoverflow data set again as we did in our first post.

Diving into query optimization

We’ll grab the worst performing query in the list from PMM and optimize it. This query builds a list of the top 50 most recent posts which have a score greater than two, and involves joining two large tables – posts and comments. The original runtime of that query is above 20 minutes and causes high load on the server while running.

worst-query-in-PMM

Assuming you have EverSQL’s chrome extension installed, you’ll see a new button in the PMM Query Analytics page, allowing you to send the query and schema structure directly to EverSQL, to retrieve indexing and query optimization recommendations.

eversql recommendations

 

eversql-dashboard1

After implementing EverSQL’s recommendations, the query’s execution duration significantly improved:

improved-query-response-time

Optimization Internals

So what was the actual optimization in this specific case? And why did it work so well? Let’s look at the original query:

SELECT
   p.title
FROM
   so.posts p
       INNER JOIN
   so.comments c ON p.id = c.postid
WHERE
c.score > 2
GROUP BY p.id
ORDER BY p.creationdate DESC
LIMIT 100;

The tables’ structure:

CREATE TABLE `posts` (
  `Id` int(11) NOT NULL,
  `CreationDate` datetime NOT NULL,
  ...
  PRIMARY KEY (`Id`),
  KEY `posts_idx_creationdate` (`CreationDate`),
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `comments` (
  `Id` int(11) NOT NULL,
  `CreationDate` datetime NOT NULL,
  `PostId` int(11) NOT NULL,
  `Score` int(11) DEFAULT NULL,
  ....
  PRIMARY KEY (`Id`),
  KEY `comments_idx_postid` (`PostId`),
  KEY `comments_idx_postid_score` (`PostId`,`Score`),
  KEY `comments_idx_score` (`Score`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

This query will return the post title of the latest 100 stackoverflow posts, which had at least one popular comment (with a score higher than two). The posts table contains 39,646,923 records, while the comments table contains 64,510,258 records.

This is the execution plan MySQL (v5.7.20) chose:

original-execution-plan

One of the challenges with this query is that the GROUP BY and ORDER BY clauses contain different fields, which prevent MySQL from using an index for the ORDER BY. As MySQL’s documentation states:

“In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it may still use indexes to find the rows that match the WHERE clause. Examples:  … The query has different ORDER BY and GROUP BY expressions.”.

Now let’s look into the optimized query:

SELECT
   p.title
FROM
   so.posts p
WHERE
   EXISTS( SELECT
           1
       FROM
           so.comments c
       WHERE
           p.id = c.postid AND c.score > 2)
ORDER BY p.creationdate DESC
LIMIT 100;

Since the comments table is joined in this query only to check for existence of matching records in the posts table, we can use an EXISTS subquery instead. This will allow us to avoid inflating the results (by using JOIN) and then deflating them (by using GROUP BY), which are costly operations.

Now that the GROUP BY is redundant and removed, the database can optionally choose to use an index for the ORDER BY clause.

The new execution plan MySQL chooses is:

As mentioned above, this transformation reduced the query execution duration from ~20 minutes to 370ms. We hope you enjoyed this post, please let us know your experiences using the integration between PMM Query Analytics and EverSQL!

As mentioned above, this transformation reduced the query execution duration from ~20 minutes to 370ms.

We hope you enjoyed this post, please let us know your experiences using the integration between PMM Query Analytics and EverSQL!

Co-Author: Tomer Shay

Tomer Shay, EverSQL

 

Tomer Shay is the Founder of EverSQL. He loves being where the challenge is. In the last 12 years, he had the privilege to code a lot and lead teams of developers, while focusing on databases and performance. He enjoys using technology to bring ideas into reality, help people and see them smile.

Jan
23
2019
--

MySQL 8.0.14: A Road to Parallel Query Execution is Wide Open!

road to MySQL parallel query

road to MySQL parallel queryFor a very long time – since when multiple CPU cores were commonly available – I dreamed about MySQL having the ability to execute queries in parallel. This feature was lacking from MySQL, and I wrote a lots of posts on how to emulate parallel queries in MySQL using different methods: from simple parallel bash script to using Apache Spark to using ClickHouse together with MySQL. I have watched parallelism coming to PostgreSQL, to new databases like TiDB, to Amazon Aurora… And finally: MySQL 8.0.14 has (for now limited) an ability to perform parallel query execution. At the time of writing it is limited to select count(*) from table queries as well as check table queries.

MySQL 8.0.14 contains this in the release notes: “As of MySQL 8.0.14, InnoDB supports parallel clustered index reads, which can improve CHECK TABLE performance.” Actually parallel clustered index reads also works for simple count(*) (without a “where” condition). You can control the parallel threads with the innodb_parallel_read_threads parameter.

Here is the simple test (machine has 32 cpu cores):

mysql> set local innodb_parallel_read_threads=1;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from ontime;
+-----------+
| count(*)  |
+-----------+
| 177920306 |
+-----------+
1 row in set (2 min 33.93 sec)
mysql> set local innodb_parallel_read_threads=DEFAULT; -- 4 is default
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from ontime;
+-----------+
| count(*)  |
+-----------+
| 177920306 |
+-----------+
1 row in set (21.85 sec)
mysql> set local innodb_parallel_read_threads=32;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from ontime;
+-----------+
| count(*)  |
+-----------+
| 177920306 |
+-----------+
1 row in set (5.35 sec)

The following graph shows CPU utilization during the execution with 4 threads and 32 threads:

Unfortunately it only works for count(*) from table without a “where” condition.

Conclusion: although this feature is currently limited it is a great start for MySQL and opens a road to real parallel query executions.


Photo by Vidar Nordli-Mathisen on Unsplash

Jan
22
2019
--

Monitor and Optimize Slow Queries with PMM and EverSQL – Part One

PMM EverSQL optimization integration

A common challenge with continuously deployed applications is that new and modified SQL queries are constantly being introduced to the application. Many companies choose to use a database monitoring system (such as PMM) to identify those slow queries. But identifying slow queries is only the start – what about actually optimizing them?

In this post we’ll demonstrate a new way to both identify and optimize slow queries, by utilizing the recent integration of Percona Monitoring and Management with EverSQL Query Optimizer via Chrome browser extension. This integration allows you to identify slow queries using PMM, and optimize them automatically using EverSQL Query Optimizer.

Optimizing queries with PMM & EverSQL

We’re using PMM to monitor our MySQL instance, which was pre-loaded with the publicly available StackOverflow dataset. PMM is configured to monitor for slow queries from MySQL’s slow log file.

monitor slow queries dashboard on PMM

We’ll begin with a basic example of how EverSQL can provide value for  a simple SELECT statement. In a follow-up blog post we’ll go through a more sophisticated multi-table query to show how response time can be reduced from 20 minutes to milliseconds(!) using EverSQL.

Let’s have a look at one of the slow queries identified by PMM:

PMM EverSQL optimization integration

In this example, the table posts contains two indexes by default (in addition to the primary key). One that contains the column AnswerCount, and the other contains the column CreationDate.

CREATE TABLE `posts` (
 `Id` int(11) NOT NULL,
 `AcceptedAnswerId` int(11) DEFAULT NULL,
 `AnswerCount` int(11) DEFAULT NULL,
 `ClosedDate` datetime DEFAULT NULL,
 ….
 `CreationDate` datetime NOT NULL,
  ….
 `ViewCount` int(11) NOT NULL,
 PRIMARY KEY (`Id`),
 KEY `posts_idx_answercount` (`AnswerCount`),
 KEY `posts_idx_creationdate` (`CreationDate`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

As you can see below, EverSQL identifies that a composite index which contains both columns can be more beneficial in this case, and recommends to add an index for posts(AnswerCount, CreationDate).

EverSQL optimization report

After using pt-online-schema-change to apply the schema modification, using PMM we are able to observe that the query execution duration changed from 3m 40s to 83 milliseconds!

execution time improvement with EverSQL

 

Note that this Extension is available for Chrome from the chrome web store:

EverSQL for Database Monitoring Applications

Summary

If you’re looking for an easy way to both monitor for slow queries and quickly optimize them, consider deploying Percona Monitoring and Management and then integrating it with EverSQL’s Chrome extension!

Co-Author: Tomer Shay

Tomer Shay, EverSQL

 

Tomer Shay is the Founder of EverSQL. He loves being where the challenge is. In the last 12 years, he had the privilege to code a lot and lead teams of developers, while focusing on databases and performance. He enjoys using technology to bring ideas into reality, help people and see them smile.

 

Jan
17
2019
--

Using Parallel Query with Amazon Aurora for MySQL

parallel query amazon aurora for mysql

parallel query amazon aurora for mysqlParallel query execution is my favorite, non-existent, feature in MySQL. In all versions of MySQL – at least at the time of writing – when you run a single query it will run in one thread, effectively utilizing one CPU core only. Multiple queries run at the same time will be using different threads and will utilize more than one CPU core.

On multi-core machines – which is the majority of the hardware nowadays – and in the cloud, we have multiple cores available for use. With faster disks (i.e. SSD) we can’t utilize the full potential of IOPS with just one thread.

AWS Aurora (based on MySQL 5.6) now has a version which will support parallelism for SELECT queries (utilizing the read capacity of storage nodes underneath the Aurora cluster). In this article, we will look at how this can improve the reporting/analytical query performance in MySQL. I will compare AWS Aurora with MySQL (Percona Server) 5.6 running on an EC2 instance of the same class.

In Short

Aurora Parallel Query response time (for queries which can not use indexes) can be 5x-10x better compared to the non-parallel fully cached operations. This is a significant improvement for the slow queries.

Test data and versions

For my test, I need to choose:

  1. Aurora instance type and comparison
  2. Dataset
  3. Queries

Aurora instance type and comparison

According to Jeff Barr’s excellent article (https://aws.amazon.com/blogs/aws/new-parallel-query-for-amazon-aurora/) the following instance classes will support parallel query (PQ):

“The instance class determines the number of parallel queries that can be active at a given time:

  • db.r*.large – 1 concurrent parallel query session
  • db.r*.xlarge – 2 concurrent parallel query sessions
  • db.r*.2xlarge – 4 concurrent parallel query sessions
  • db.r*.4xlarge – 8 concurrent parallel query sessions
  • db.r*.8xlarge – 16 concurrent parallel query sessions
  • db.r4.16xlarge – 16 concurrent parallel query sessions”

As I want to maximize the concurrency of parallel query sessions, I have chosen db.r4.8xlarge. For the EC2 instance I will use the same class: r4.8xlarge.

Aurora:

mysql> show global variables like '%version%';
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| aurora_version          | 1.18.0                       |
| innodb_version          | 1.2.10                       |
| protocol_version        | 10                           |
| version                 | 5.6.10                       |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | x86_64                       |
| version_compile_os      | Linux                        |
+-------------------------+------------------------------+

MySQL on ec2

mysql> show global variables like '%version%';
+-------------------------+------------------------------------------------------+
| Variable_name           | Value                                                |
+-------------------------+------------------------------------------------------+
| innodb_version          | 5.6.41-84.1                                          |
| protocol_version        | 10                                                   |
| slave_type_conversions  |                                                      |
| tls_version             | TLSv1.1,TLSv1.2                                      |
| version                 | 5.6.41-84.1                                          |
| version_comment         | Percona Server (GPL), Release 84.1, Revision b308619 |
| version_compile_machine | x86_64                                               |
| version_compile_os      | debian-linux-gnu                                     |
| version_suffix          |                                                      |
+-------------------------+------------------------------------------------------+

Table

I’m using the “Airlines On-Time Performance” database from http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time  (You can find the scripts I used here: https://github.com/Percona-Lab/ontime-airline-performance).

mysql> show table status like 'ontime'\G
*************************** 1. row ***************************
          Name: ontime
        Engine: InnoDB
       Version: 10
    Row_format: Compact
          Rows: 173221661
Avg_row_length: 409
   Data_length: 70850183168
Max_data_length: 0
  Index_length: 0
     Data_free: 7340032
Auto_increment: NULL
   Create_time: 2018-09-26 02:03:28
   Update_time: NULL
    Check_time: NULL
     Collation: latin1_swedish_ci
      Checksum: NULL
Create_options:
       Comment:
1 row in set (0.00 sec)

The table is very wide, 84 columns.

Working with Aurora PQ (Parallel Query)

Documentation: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-mysql-parallel-query.html

Aurora PQ works by doing a full table scan (parallel reads are done on the storage level). The InnoDB buffer pool is not used when Parallel Query is utilized.

For the purposes of the test I turned PQ on and off (normally AWS Aurora uses its own heuristics to determine if the PQ will be helpful or not):

Turn on and force:

mysql> set session aurora_pq = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> set aurora_pq_force = 1;
Query OK, 0 rows affected (0.00 sec)

Turn off:

mysql> set session aurora_pq = 0;
Query OK, 0 rows affected (0.00 sec)

The EXPLAIN plan in MySQL will also show the details about parallel query execution statistics.

Queries

Here, I use the “reporting” queries, running only one query at a time. The queries are similar to those I’ve used in older blog posts comparing MySQL and Apache Spark performance (https://www.percona.com/blog/2016/08/17/apache-spark-makes-slow-mysql-queries-10x-faster/ )

Here is a summary of the queries:

  1. Simple queries:
    • select count(*) from ontime where flightdate > '2017-01-01'
    • select avg(DepDelay/ArrDelay+1) from ontime
  2. Complex filter, single table:
select SQL_CALC_FOUND_ROWS
FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest
FROM ontime
WHERE
  DestState not in ('AK', 'HI', 'PR', 'VI')
  and OriginState not in ('AK', 'HI', 'PR', 'VI')
  and flightdate > '2015-01-01'
   and ArrDelay < 15
and cancelled = 0
and Diverted = 0
and DivAirportLandings = 0
  ORDER by DepDelay DESC
LIMIT 10;

3. Complex filter, join “reference” table

select SQL_CALC_FOUND_ROWS
FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay
FROM ontime_ind o
JOIN carriers c on o.carrier = c.carrier_code
WHERE
  (carrier_name like 'United%' or carrier_name like 'Delta%')
  and ArrDelay > 30
  ORDER by DepDelay DESC
LIMIT 10\G

4. select one row only, no index

Query 1a: simple, count(*)

Let’s take a look at the most simple query: count(*). This variant of the “ontime” table has no secondary indexes.

select count(*) from ontime where flightdate > '2017-01-01';

Aurora, pq (parallel query) disabled:

I disabled the PQ first to compare:

mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (8 min 25.49 sec)
mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (2 min 48.81 sec)
mysql> mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (2 min 48.25 sec)
Please note: the first run was “cold run”; data was read from disk. The second and third run used the cached data.
Now let's enable and force Aurora PQ:
mysql> set session aurora_pq = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> set aurora_pq_force = 1; 
Query OK, 0 rows affected (0.00 sec)
mysql> explain select count(*) from ontime where flightdate > '2017-01-01'\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: ontime
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 173706586
       Extra: Using where; Using parallel query (1 columns, 1 filters, 0 exprs; 0 extra)
1 row in set (0.00 sec)

(from the EXPLAIN plan, we can see that parallel query is used).

Results:

mysql> select count(*) from ontime where flightdate > '2017-01-01';                                                                                                                          
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (16.53 sec)
mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (16.56 sec)
mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (16.36 sec)
mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (16.56 sec)
mysql> select count(*) from ontime where flightdate > '2017-01-01';
+----------+
| count(*) |
+----------+
|  5660651 |
+----------+
1 row in set (16.36 sec)

As we can see the results are very stable. It does not use any cache (ie: innodb buffer pool) either. The result is also interesting: utilizing multiple threads (up to 16 threads) and reading data from disk (using disk cache, probably) can be ~10x faster compared to reading from memory in a single thread.

Result: ~10x performance gain, no index used

Query 1b: simple, avg

set aurora_pq = 1; set aurora_pq_force=1;
select avg(DepDelay) from ontime;
+---------------+
| avg(DepDelay) |
+---------------+
|        8.2666 |
+---------------+
1 row in set (1 min 48.17 sec)
set aurora_pq = 0; set aurora_pq_force=0;  
select avg(DepDelay) from ontime;
+---------------+
| avg(DepDelay) |
+---------------+
|        8.2666 |
+---------------+
1 row in set (2 min 49.95 sec)
Here we can see that PQ gives use ~2x performance increase.

Summary of simple query performance

Here is what we learned comparing Aurora PQ performance to native MySQL query execution:

  1. Select count(*), not using index: 10x performance increase with Aurora PQ.
  2. select avg(…), not using index: 2x performance increase with Aurora PQ.

Query 2: Complex filter, single table

The following query will always be slow in MySQL. This combination of the filters in the WHERE condition makes it extremely hard to prepare a good set of indexes to make this query faster.

select SQL_CALC_FOUND_ROWS
FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest
FROM ontime
WHERE
  DestState not in ('AK', 'HI', 'PR', 'VI')
  and OriginState not in ('AK', 'HI', 'PR', 'VI')
  and flightdate > '2015-01-01'
  and ArrDelay < 15
and cancelled = 0
and Diverted = 0
and DivAirportLandings = '0'
ORDER by DepDelay DESC
LIMIT 10;

Let’s compare the query performance with and without PQ.

PQ disabled:

mysql> set aurora_pq_force = 0;
Query OK, 0 rows affected (0.00 sec)
mysql> set aurora_pq = 0;                                                                                                                                                                  
Query OK, 0 rows affected (0.00 sec)
mysql> explain select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE    DestState not in ('AK', 'HI', 'PR', 'VI') and OriginState not in ('AK', 'HI', 'PR', 'VI') and flightdate > '2015-01-01'     and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0 ORDER by DepDelay DESC LIMIT 10\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: ontime
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 173706586
       Extra: Using where; Using filesort
1 row in set (0.00 sec)
mysql> select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE    DestState not in ('AK', 'HI', 'PR', 'VI') and OriginState not in ('AK', 'HI', 'PR', 'VI') and flightdate > '2015-01-01'     and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0 ORDER by DepDelay DESC LIMIT 10;
+------------+---------+-----------+--------+------+
| FlightDate | carrier | FlightNum | Origin | Dest |
+------------+---------+-----------+--------+------+
| 2017-10-09 | OO      | 5028      | SBP    | SFO  |
| 2015-11-03 | VX      | 969       | SAN    | SFO  |
| 2015-05-29 | VX      | 720       | TUL    | AUS  |
| 2016-03-11 | UA      | 380       | SFO    | BOS  |
| 2016-06-13 | DL      | 2066      | JFK    | SAN  |
| 2016-11-14 | UA      | 1600      | EWR    | LAX  |
| 2016-11-09 | WN      | 2318      | BDL    | LAS  |
| 2016-11-09 | UA      | 1652      | IAD    | LAX  |
| 2016-11-13 | AA      | 23        | JFK    | LAX  |
| 2016-11-12 | UA      | 800       | EWR    | SFO  |
+------------+---------+-----------+--------+------+

10 rows in set (3 min 42.47 sec)

/* another run */

10 rows in set (3 min 46.90 sec)

This query is 100% cached. Here is the graph from PMM showing the number of read requests:

  1. Read requests: logical requests from the buffer pool
  2. Disk reads: physical requests from disk

Buffer pool requests:

Buffer pool requests from PMM

Now let’s enable and force PQ:

PQ enabled:

mysql> set session aurora_pq = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> set aurora_pq_force = 1;                                                                                                                              Query OK, 0 rows affected (0.00 sec)
mysql> explain select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE    DestState not in ('AK', 'HI', 'PR', 'VI') and OriginState not in ('AK', 'HI', 'PR', 'VI') and flightdate > '2015-01-01'     and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0 ORDER by DepDelay DESC LIMIT 10\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: ontime
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 173706586
       Extra: Using where; Using filesort; Using parallel query (12 columns, 4 filters, 3 exprs; 0 extra)
1 row in set (0.00 sec)
mysql> select SQL_CALC_FOUND_ROWS                                                                                                                                                                      -> FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest -> FROM ontime
   -> WHERE
   ->  DestState not in ('AK', 'HI', 'PR', 'VI')
   ->  and OriginState not in ('AK', 'HI', 'PR', 'VI')
   ->  and flightdate > '2015-01-01'
   ->   and ArrDelay < 15
   -> and cancelled = 0
   -> and Diverted = 0
   -> and DivAirportLandings = 0
   ->  ORDER by DepDelay DESC
   -> LIMIT 10;
+------------+---------+-----------+--------+------+
| FlightDate | carrier | FlightNum | Origin | Dest |
+------------+---------+-----------+--------+------+
| 2017-10-09 | OO      | 5028      | SBP    | SFO  |
| 2015-11-03 | VX      | 969       | SAN    | SFO  |
| 2015-05-29 | VX      | 720       | TUL    | AUS  |
| 2016-03-11 | UA      | 380       | SFO    | BOS  |
| 2016-06-13 | DL      | 2066      | JFK    | SAN  |
| 2016-11-14 | UA      | 1600      | EWR    | LAX  |
| 2016-11-09 | WN      | 2318      | BDL    | LAS  |
| 2016-11-09 | UA      | 1652      | IAD    | LAX  |
| 2016-11-13 | AA      | 23        | JFK    | LAX  |
| 2016-11-12 | UA      | 800       | EWR    | SFO  |
+------------+---------+-----------+--------+------+
10 rows in set (41.88 sec)
/* run 2 */
10 rows in set (28.49 sec)
/* run 3 */
10 rows in set (29.60 sec)

Now let’s compare the requests:

InnoDB Buffer Pool Requests

As we can see, Aurora PQ is almost NOT utilizing the buffer pool (there are a minor number of read requests. Compare the max of 4K requests per second with PQ to the constant 600K requests per second in the previous graph).

Result: ~8x performance gain

Query 3: Complex filter, join “reference” table

In this example I join two tables: the main “ontime” table and a reference table. If we have both tables without indexes it will simply be too slow in MySQL. To make it better, I have created an index for both tables and so it will use indexes for the join:

CREATE TABLE `carriers` (
 `carrier_code` varchar(8) NOT NULL DEFAULT '',
 `carrier_name` varchar(200) DEFAULT NULL,
 PRIMARY KEY (`carrier_code`),
 KEY `carrier_name` (`carrier_name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
mysql> show create table ontime_ind\G
...
 PRIMARY KEY (`id`),
 KEY `comb1` (`Carrier`,`Year`,`ArrDelayMinutes`),
 KEY `FlightDate` (`FlightDate`)
) ENGINE=InnoDB AUTO_INCREMENT=178116912 DEFAULT CHARSET=latin1

Query:

select SQL_CALC_FOUND_ROWS
FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay
FROM ontime_ind o
JOIN carriers c on o.carrier = c.carrier_code
WHERE
  (carrier_name like 'United%' or carrier_name like 'Delta%')
  and ArrDelay > 30
  ORDER by DepDelay DESC
LIMIT 10\G

PQ disabled, explain plan:

mysql> set aurora_pq_force = 0;
Query OK, 0 rows affected (0.00 sec)
mysql> set aurora_pq = 0;                                                                                                                                                                  
Query OK, 0 rows affected (0.00 sec)
mysql> explain
   -> select SQL_CALC_FOUND_ROWS
   -> FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay
   -> FROM ontime_ind o
   -> JOIN carriers c on o.carrier = c.carrier_code
   -> WHERE
   ->  (carrier_name like 'United%' or carrier_name like 'Delta%')
   ->  and ArrDelay > 30
   ->  ORDER by DepDelay DESC
   -> LIMIT 10\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: c
        type: range
possible_keys: PRIMARY,carrier_name
         key: carrier_name
     key_len: 203
         ref: NULL
        rows: 3
       Extra: Using where; Using index; Using temporary; Using filesort
*************************** 2. row ***************************
          id: 1
 select_type: SIMPLE
       table: o
        type: ref
possible_keys: comb1
         key: comb1
     key_len: 3
         ref: ontime.c.carrier_code
        rows: 2711597
       Extra: Using index condition; Using where
2 rows in set (0.01 sec)

As we can see MySQL uses indexes for the join. Response times:

/* run 1 – cold run */

10 rows in set (29 min 17.39 sec)

/* run 2  – warm run */

10 rows in set (2 min 45.16 sec)

PQ enabled, explain plan:

mysql> explain
   -> select SQL_CALC_FOUND_ROWS
   -> FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay
   -> FROM ontime_ind o
   -> JOIN carriers c on o.carrier = c.carrier_code
   -> WHERE
   ->  (carrier_name like 'United%' or carrier_name like 'Delta%')
   ->  and ArrDelay > 30
   ->  ORDER by DepDelay DESC
   -> LIMIT 10\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: c
        type: ALL
possible_keys: PRIMARY,carrier_name
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 1650
       Extra: Using where; Using temporary; Using filesort; Using parallel query (2 columns, 0 filters, 1 exprs; 0 extra)
*************************** 2. row ***************************
          id: 1
 select_type: SIMPLE
       table: o
        type: ALL
possible_keys: comb1
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 173542245
       Extra: Using where; Using join buffer (Hash Join Outer table o); Using parallel query (11 columns, 1 filters, 1 exprs; 0 extra)
2 rows in set (0.00 sec)

As we can see, Aurora does not use any indexes and uses a parallel scan instead.

Response time:

mysql> select SQL_CALC_FOUND_ROWS
   -> FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay
   -> FROM ontime_ind o
   -> JOIN carriers c on o.carrier = c.carrier_code
   -> WHERE
   ->  (carrier_name like 'United%' or carrier_name like 'Delta%')
   ->  and ArrDelay > 30
   ->  ORDER by DepDelay DESC
   -> LIMIT 10\G
...
*************************** 4. row ***************************
   FlightDate: 2017-05-04
UniqueCarrier: UA
      TailNum: N68821
    FlightNum: 1205
       Origin: KOA
OriginCityName: Kona, HI
         Dest: LAX
 DestCityName: Los Angeles, CA
     DepDelay: 1457
     ArrDelay: 1459
*************************** 5. row ***************************
   FlightDate: 1991-03-12
UniqueCarrier: DL
      TailNum:
    FlightNum: 1118
       Origin: ATL
OriginCityName: Atlanta, GA
         Dest: STL
 DestCityName: St. Louis, MO
...
10 rows in set (28.78 sec)
mysql> select found_rows();
+--------------+
| found_rows() |
+--------------+
|      4180974 |
+--------------+
1 row in set (0.00 sec)

Result: ~5x performance gain

(this is actually comparing the index cached read to a non-index PQ execution)

Summary

Aurora PQ can significantly improve the performance of reporting queries as such queries may be extremely hard to optimize in MySQL, even when using indexes. With indexes, Aurora PQ response time can be 5x-10x better compared to the non-parallel, fully cached operations. Aurora PQ can help improve performance of complex queries by performing parallel reads.

The following table summarizes the query response times:

Query Time, No PQ, index Time, PQ
select count(*) from ontime where flightdate > ‘2017-01-01’ 2 min 48.81 sec 16.53 sec
select avg(DepDelay) from ontime; 2 min 49.95 sec 1 min 48.17 sec
select SQL_CALC_FOUND_ROWS

FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest

FROM ontime

WHERE

DestState not in (‘AK’, ‘HI’, ‘PR’, ‘VI’)

and OriginState not in (‘AK’, ‘HI’, ‘PR’, ‘VI’)

and flightdate > ‘2015-01-01’

and ArrDelay < 15

and cancelled = 0

and Diverted = 0

and DivAirportLandings = 0

ORDER by DepDelay DESC

LIMIT 10;

3 min 42.47 sec 28.49 sec
select SQL_CALC_FOUND_ROWS

FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay

FROM ontime_ind o

JOIN carriers c on o.carrier = c.carrier_code

WHERE

(carrier_name like ‘United%’ or carrier_name like ‘Delta%’)

and ArrDelay > 30

ORDER by DepDelay DESC

LIMIT 10\G

2 min 45.16 sec 28.78 sec


Photo by Thomas Lipke on Unsplash

Jan
14
2019
--

Should You Use ClickHouse as a Main Operational Database?

ClickHouse as a main operational database

ClickHouse as a main operational databaseFirst of all, this post is not a recommendation but more like a “what if” story. What if we use ClickHouse (which is a columnar analytical database) as our main datastore? Well, typically, an analytical database is not a replacement for a transactional or key/value datastore. However, ClickHouse is super efficient for timeseries and provides “sharding” out of the box (scalability beyond one node).  So can we use it as our main datastore?

Let’s imagine we are running a webservice and provide a public API. Public API as -a-service has become a good business model: examples include social networks like Facebook/Twitter, messaging as a service like Twilio, and even credit card authorization platforms like Marqeta. Let’s also imagine we need to store all messages (SMS messages, email messages, etc) we are sending and allow our customers to get various information about the message. This information can be a mix of analytical (OLAP) queries (i.e. how many messages was send for some time period and how much it cost) and a typical key/value queries like: “return 1 message by the message id”.

Using a columnar analytical database can be a big challenge here. Although such databases can be very efficient with counts and averages, some queries will be slow or simply non existent. Analytical databases are optimized for a low number of slow queries. The most important limitations of the analytical databases are:

  1. Deletes and updates are non-existent or slow
  2. Inserts are efficient for bulk inserts only
  3. No secondary indexes means that point selects (select by ID) tend to be very slow

This is all true for ClickHouse, however, we may be able to live with it for our task.

To simulate text messages I have used ~3 billion of reddit comments (10 years from 2007 to 2017), downloaded from pushshift.io . Vadim published a blog post about analyzing reddit comments with ClickHouse. In my case, I’m using this data as a simulation of text messages, and will show how we can use ClickHouse as a backend for an API.

Loading the JSON data to Clickhouse

I used the following table in Clickhouse to load all data:

CREATE TABLE reddit.rc(
body String,
score_hidden Nullable(UInt8),
archived Nullable(UInt8),
name String,
author String,
author_flair_text Nullable(String),
downs Nullable(Int32),
created_utc UInt32,
subreddit_id String,
link_id Nullable(String),
parent_id Nullable(String),
score Nullable(Int16),
retrieved_on Nullable(UInt32),
controversiality Nullable(Int8),
gilded Nullable(Int8),
id String,
subreddit String,
ups Nullable(Int16),
distinguished Nullable(String),
author_flair_css_class Nullable(String),
stickied Nullable(UInt8),
edited Nullable(UInt8)
) ENGINE = MergeTree() PARTITION BY toYYYYMM(toDate(created_utc)) ORDER BY created_utc ;

Then I used the following command to load the JSON data (downloaded from pushshift.io) to ClickHouse:

$ bzip2 -d -c RC_20*.bz2 | clickhouse-client --input_format_skip_unknown_fields 1 --input_format_allow_errors_num 1000000 -d reddit -n --query="INSERT INTO rc FORMAT JSONEachRow"

The data on disk in ClickHouse is not significantly larger than compressed files, which is great:

#  du -sh /data/clickhouse/data/reddit/rc/
638G    /data/clickhouse/data/reddit/rc/
# du -sh /data/reddit/
404G    /data/reddit/

We have ~4 billion rows:

SELECT
    toDate(min(created_utc)),
    toDate(max(created_utc)),
    count(*)
FROM rc
??toDate(min(created_utc))???toDate(max(created_utc))??????count()??
?               2006-01-01 ?               2018-05-31 ? 4148248585 ?
????????????????????????????????????????????????????????????????????
1 rows in set. Elapsed: 11.554 sec. Processed 4.15 billion rows, 16.59 GB (359.02 million rows/s., 1.44 GB/s.)

The data is partitioned and sorted by created_utc so queries which include created_utc will be able to using partition pruning: therefore skip the not-needed partitions. However, let’s say our API needs to support the following features, which are not common for analytical databases:

  1. Selecting a single comment/message by ID
  2. Retrieving the last 10 or 100 of the messages/comments
  3. Updating a single message in the past (e.g. in the case of messages, we may need to update the final price; in the case of comments, we may need to upvote or downvote a comment)
  4. Deleting messages
  5. Text search

With the latest ClickHouse version, all of these features are available, but some of them may not perform fast enough.

Retrieving a single row in ClickHouse

Again, this is not a typical operation in any analytical database, those databases are simply not optimized for it. ClickHouse does not have secondary indexes, and we are using created_utc as a primary key (sort by). So, selecting a message by just ID will require a full table scan:

SELECT
    id,
    created_utc
FROM rc
WHERE id = 'dbumnpz'
??id????????created_utc??
? dbumnpz ?  1483228800 ?
?????????????????????????
1 rows in set. Elapsed: 18.070 sec. Processed 4.15 billion rows, 66.37 GB (229.57 million rows/s., 3.67 GB/s.)

Only if we know the timestamp (created_utc)… Then it will be lighting fast: ClickHouse will use the primary key:

SELECT *
FROM rc
WHERE (id = 'dbumnpz') AND (created_utc = 1483228800)
...
1 rows in set. Elapsed: 0.010 sec. Processed 8.19 thousand rows, 131.32 KB (840.27 thousand rows/s., 13.47 MB/s.)

Actually, we can simulate an additional index set by creating a materialized view in ClickHouse:

create materialized view rc_id_v
ENGINE MergeTree() PARTITION BY toYYYYMM(toDate(created_utc)) ORDER BY (id)
POPULATE AS SELECT id, created_utc from rc;

Here I’m creating a materialized view and populating it initially from the main (rc) table. The view will be updated automatically when there are any inserts into table reddit.rc. The view is actually another MergeTree table sorted by id. Now we can use this query:

SELECT *
FROM rc
WHERE (id = 'dbumnpz') AND (created_utc =
(
    SELECT created_utc
    FROM rc_id_v
    WHERE id = 'dbumnpz'
))
...
1 rows in set. Elapsed: 0.053 sec. Processed 8.19 thousand rows, 131.32 KB (153.41 thousand rows/s., 2.46 MB/s.)

This is a single query which will join our materialized view to pass the created_utc (timestamp) to the original table. It is a little bit slower but still less than 100ms response time.

Using this trick (materialized views) we can potentially simulate other indexes.

Retrieving the last 10 messages

This is where ClickHouse is not very efficient. Let’s say we want to retrieve the last 10 comments:

SELECT
    id,
    created_utc
FROM rc
ORDER BY created_utc DESC
LIMIT 10
??id????????created_utc??
? dzwso7l ?  1527811199 ?
? dzwso7j ?  1527811199 ?
? dzwso7k ?  1527811199 ?
? dzwso7m ?  1527811199 ?
? dzwso7h ?  1527811199 ?
? dzwso7n ?  1527811199 ?
? dzwso7o ?  1527811199 ?
? dzwso7p ?  1527811199 ?
? dzwso7i ?  1527811199 ?
? dzwso7g ?  1527811199 ?
?????????????????????????
10 rows in set. Elapsed: 24.281 sec. Processed 4.15 billion rows, 82.96 GB (170.84 million rows/s., 3.42 GB/s.)

In a conventional relational database (like MySQL) this can be done by reading a btree index sequentially from the end, as the index is sorted (like “tail” command on linux). In a partitioned massively parallel database system, the storage format and sorting algorithm may not be optimized for that operation as we are reading multiple partitions in parallel. Currently, an issue has been opened to make the “tailing” based on the primary key much faster: slow order by primary key with small limit on big data. As a temporary workaround we can do something like this:

SELECT count()
FROM rc
WHERE (created_utc > (
(
    SELECT max(created_utc)
    FROM rc
) - ((60 * 60) * 24))) AND (subreddit = 'programming')
??count()??
?    1248 ?
???????????
1 rows in set. Elapsed: 4.510 sec. Processed 3.05 million rows, 56.83 MB (675.38 thousand rows/s., 12.60 MB/s.) ```

It is still a five seconds query. Hopefully, this type of query will become faster in ClickHouse.

Updating / deleting data in ClickHouse

The latest ClickHouse version allows running update/delete in the form of “ALTER TABLE .. UPDATE / DELETE” (it is called mutations in ClickHouse terms). For example, we may want to upvote a specific comment.

SELECT score
FROM rc_2017
WHERE (id = 'dbumnpz') AND (created_utc =
(
    SELECT created_utc
    FROM rc_id_v
    WHERE id = 'dbumnpz'
))
??score??
?     2 ?
?????????
1 rows in set. Elapsed: 0.048 sec. Processed 8.19 thousand rows, 131.08 KB (168.93 thousand rows/s., 2.70 MB/s.)
:) alter table rc_2017 update score = score +1 where id =  'dbumnpz' and created_utc = (select created_utc from rc_id_v where id =  'dbumnpz');
ALTER TABLE rc_2017
    UPDATE score = score + 1 WHERE (id = 'dbumnpz') AND (created_utc =
    (
        SELECT created_utc
        FROM rc_id_v
        WHERE id = 'dbumnpz'
    ))
Ok.
0 rows in set. Elapsed: 0.052 sec.

“Mutation” queries will return immediately and will be executed asynchronously. We can see the progress by reading from the system.mutations table:

select * from system.mutations\G
SELECT *
FROM system.mutations
Row 1:
??????
database:                   reddit
table:                      rc_2017
mutation_id:                mutation_857.txt
command:                    UPDATE score = score + 1 WHERE (id = 'dbumnpz') AND (created_utc = (SELECT created_utc FROM reddit.rc_id_v  WHERE id = 'dbumnpz'))
create_time:                2018-12-27 22:22:05
block_numbers.partition_id: ['']
block_numbers.number:       [857]
parts_to_do:                0
is_done:                    1
1 rows in set. Elapsed: 0.002 sec.

Now we can try deleting comments that have been marked for deletion (body showing “[deleted]”):

ALTER TABLE rc_2017
    DELETE WHERE body = '[deleted]'
Ok.
0 rows in set. Elapsed: 0.002 sec.
:) select * from system.mutations\G
SELECT *
FROM system.mutations
...
Row 2:
??????
database:                   reddit
table:                      rc_2017
mutation_id:                mutation_858.txt
command:                    DELETE WHERE body = '[deleted]'
create_time:                2018-12-27 22:41:01
block_numbers.partition_id: ['']
block_numbers.number:       [858]
parts_to_do:                64
is_done:                    0
2 rows in set. Elapsed: 0.017 sec.

After a while, we can do the count again:

:) select * from system.mutations\G
SELECT *
FROM system.mutations
...
Row 2:
??????
database:                   reddit
table:                      rc_2017
mutation_id:                mutation_858.txt
command:                    DELETE WHERE body = '[deleted]'
create_time:                2018-12-27 22:41:01
block_numbers.partition_id: ['']
block_numbers.number:       [858]
parts_to_do:                0
is_done:                    1

As we can see our “mutation” is done.

Text analysis

ClickHouse does not offer full text search, however we can use some text functions. In my previous blog post about ClickHouse I used it to find the most popular wikipedia page of the month. This time I’m trying to find the news keywords of the year using all reddit comments: basically I’m calculating the most frequently used new words for the specific year (algorithm based on an article about finding trending topics using Google Books n-grams data). To do that I’m using the ClickHouse function alphaTokens(body) which will split the “body” field into words. From there, I can count the words or use arrayJoin to create a list (similar to MySQL’s group_concat function). Here is the example:

First I created a table word_by_year_news:

create table word_by_year_news ENGINE MergeTree() PARTITION BY y ORDER BY (y) as
select a.w as w, b.y as y, sum(a.occurrences)/b.total as ratio from
(
select
 lower(arrayJoin(alphaTokens(body))) as w,
 toYear(toDate(created_utc)) as y,
 count() as occurrences
from rc
where body <> '[deleted]'
and created_utc < toUnixTimestamp('2018-01-01 00:00:00')
and created_utc >= toUnixTimestamp('2007-01-01 00:00:00')
and subreddit in ('news', 'politics', 'worldnews')
group by w, y
having length(w) > 4
) as a
ANY INNER JOIN
(
select
 toYear(toDate(created_utc)) as y,
 sum(length(alphaTokens(body))) as total
from rc
where body <> '[deleted]'
and subreddit in ('news', 'politics', 'worldnews')
and created_utc < toUnixTimestamp('2018-01-01 00:00:00')
and created_utc >= toUnixTimestamp('2007-01-01 00:00:00')
group by y
) AS b
ON a.y = b.y
group by
  a.w,
  b.y,
  b.total;
0 rows in set. Elapsed: 787.032 sec. Processed 7.35 billion rows, 194.32 GB (9.34 million rows/s., 246.90 MB/s.)

This will store all frequent words (I’m filtering by subreddits; the examples are: “news, politics and worldnews” or “programming”) as well as its occurrence this year; actually I want to store “relative” occurrence which is called “ratio” above: for each word I divide its occurrence by the number of total words this year (this is needed as the number of comments grows significantly year by year).

Now we can actually calculate the words of the year:

SELECT
    groupArray(w) as words,
    y + 1 as year
FROM
(
    SELECT
        w,
        CAST((y - 1) AS UInt16) AS y,
        ratio AS a_ratio
    FROM word_by_year_news
    WHERE ratio > 0.00001
) AS a
ALL INNER JOIN
(
    SELECT
        w,
        y,
        ratio AS b_ratio
    FROM word_by_year_news
    WHERE ratio > 0.00001
) AS b USING (w, y)
WHERE (y > 0) AND (a_ratio / b_ratio > 3)
GROUP BY y
ORDER BY
    y
LIMIT 100;
10 rows in set. Elapsed: 0.232 sec. Processed 14.61 million rows, 118.82 MB (63.01 million rows/s., 512.29 MB/s.)

And the results are (here I’m grouping words for each year):

For “programming” subreddit:

??year???words??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
? 2007 ? ['audio','patents','swing','phones','gmail','opera','devices','phone','adobe','vista','backup','mercurial','mobile','passwords','scala','license','copyright','licenses','photoshop'] ?
? 2008 ? ['webkit','twitter','teacher','android','itunes']                                                                                                                                     ?
? 2009 ? ['downvotes','upvote','drupal','android','upvoted']                                                                                                                                   ?
? 2010 ? ['codecs','imgur','floppy','codec','adobe','android']                                                                                                                                 ?
? 2011 ? ['scala','currency','println']                                                                                                                                                        ?
? 2013 ? ['voting','maven']                                                                                                                                                                    ?
? 2014 ? ['compose','xamarin','markdown','scrum','comic']                                                                                                                                      ?
? 2015 ? ['china','sourceforge','subscription','chinese','kotlin']                                                                                                                             ?
? 2016 ? ['systemd','gitlab','autotldr']                                                                                                                                                       ?
? 2017 ? ['offices','electron','vscode','blockchain','flash','collision']                                                                                                                      ?
????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

For news subreddit:

??year???words???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
? 2008 ? ['michigan','delegates','obama','alaska','georgia','russians','hamas','biden','hussein','barack','elitist','mccain']                                                                                                                                       ?
? 2009 ? ['stimulus','reform','medicare','franken','healthcare','payer','insurance','downvotes','hospitals','patients','option','health']                                                                                                                           ?
? 2010 ? ['blockade','arizona']                                                                                                                                                                                                                                     ?
? 2011 ? ['protests','occupy','romney','weiner','protesters']                                                                                                                                                                                                       ?
? 2012 ? ['santorum','returns','martin','obamacare','romney']                                                                                                                                                                                                       ?
? 2013 ? ['boston','chemical','surveillance']                                                                                                                                                                                                                       ?
? 2014 ? ['plane','poland','radar','subreddits','palestinians','putin','submission','russia','automoderator','compose','rockets','palestinian','hamas','virus','removal','russians','russian']                                                                      ?
? 2015 ? ['refugees','refugee','sanders','debates','hillary','removal','participating','removed','greece','clinton']                                                                                                                                                ?
? 2016 ? ['morons','emails','opponent','establishment','trump','reply','speeches','presidency','clintons','electoral','donald','trumps','downvote','november','subreddit','shill','domain','johnson','classified','bernie','nominee','users','returns','primaries','foundation','voters','autotldr','clinton','email','supporter','election','feedback','clever','leaks','accuse','candidate','upvote','rulesandregs','convention','conduct','uncommon','server','trolls','supporters','hillary'] ?
? 2017 ? ['impeached','downvotes','monitored','accusations','alabama','violation','treason','nazis','index','submit','impeachment','troll','collusion','bannon','neutrality','permanent','insults','violations']                                                    ?
?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????

Conclusion

ClickHouse is a great massively parallel analytical system. It is extremely efficient and can potentially (with some hacks) be used as a main backend database powering a public API gateway serving both realtime and analytical queries. At the same time, it was not originally designed that way. Let me know in the comments if you are using ClickHouse for this or similar projects.


Photo by John Baker on Unsplash

 

 

Jan
03
2019
--

Upcoming Webinar Friday 1/4: High-Performance PostgreSQL, Tuning and Optimization Guide

High-Performance PostgreSQL, Tuning and Optimization Guide

High-Performance PostgreSQL, Tuning and Optimization GuidePlease join Percona’s Senior Software Engineer, Ibrar Ahmed as he presents his High-Performance PostgreSQL, Tuning and Optimization Guide on Friday, January, 4th, at 8:00 AM PDT (UTC-7) / 11:00 AM EDT (UTC-4).

Register Now

PostgreSQL is one of the leading open-source databases. Out of the box, the default PostgreSQL configuration is not tuned for any workload. Thus, any system with least resources can run it. PostgreSQL does not give optimum performance on high permanence machines because it is not using the all available resource. PostgreSQL provides a system where you can tune your database according to your workload and machine’s specifications. In addition to PostgreSQL, we can also tune our Linux box so that the database load can work optimally.

In this webinar on High-Performance PostgreSQL, Tuning and Optimization, we will learn how to tune PostgreSQL and we’ll see the results of that tuning. We will also touch on tuning some Linux kernel parameters.

 

Dec
20
2018
--

Percona Database Performance Blog 2018 Year in Review: Top Blog Posts

Percona Database Performance Blog

Percona Database Performance BlogLet’s look at some of the most popular Percona Database Performance Blog posts in 2018.

The closing of a year lends itself to looking back. And making lists. With the Percona Database Performance Blog, Percona staff and leadership work hard to provide the open source community with insights, technical support, predictions and metrics around multiple open source database software technologies. We’ve had nearly 4 million visits to the blog in 2018: thank you! We look forward to providing you with even better articles, news and information in 2019.

As 2018 moves into 2019, let’s take a quick look back at some of the most popular posts on the blog this year.

Top 10 Most Read

These posts had the most number of views (working down from the highest):

When Should I Use Amazon Aurora and When Should I use RDS MySQL?

Now that Database-as-a-service (DBaaS) is in high demand, there is one question regarding AWS services that cannot always be answered easily : When should I use Aurora and when RDS MySQL?

About ZFS Performance

ZFS has many very interesting features, but I am a bit tired of hearing negative statements on ZFS performance. It feels a bit like people are telling me “Why do you use InnoDB? I have read that MyISAM is faster.” I found the comparison of InnoDB vs. MyISAM quite interesting, and I’ll use it in this post.

Linux OS Tuning for MySQL Database Performance

In this post we will review the most important Linux settings to adjust for performance tuning and optimization of a MySQL database server. We’ll note how some of the Linux parameter settings used OS tuning may vary according to different system types: physical, virtual or cloud.

A Look at MyRocks Performance

As the MyRocks storage engine (based on the RocksDB key-value store http://rocksdb.org ) is now available as part of Percona Server for MySQL 5.7, I wanted to take a look at how it performs on a relatively high-end server and SSD storage.

How to Restore MySQL Logical Backup at Maximum Speed

The ability to restore MySQL logical backups is a significant part of disaster recovery procedures. It’s a last line of defense.

Why MySQL Stored Procedures, Functions and Triggers Are Bad For Performance

MySQL stored procedures, functions and triggers are tempting constructs for application developers. However, as I discovered, there can be an impact on database performance when using MySQL stored routines. Not being entirely sure of what I was seeing during a customer visit, I set out to create some simple tests to measure the impact of triggers on database performance. The outcome might surprise you.

AMD EPYC Performance Testing… or Don’t get on the wrong side of SystemD

Ever since AMD released their EPYC CPU for servers I wanted to test it, but I did not have the opportunity until recently, when Packet.net started offering bare metal servers for a reasonable price. So I started a couple of instances to test Percona Server for MySQL under this CPU. In this benchmark, I discovered some interesting discrepancies in performance between  AMD and Intel CPUs when running under systemd.

Tuning PostgreSQL Database Parameters to Optimize Performance

Out of the box, the default PostgreSQL configuration is not tuned for any particular workload. Default values are set to ensure that PostgreSQL runs everywhere, with the least resources it can consume and so that it doesn’t cause any vulnerabilities. It is primarily the responsibility of the database administrator or developer to tune PostgreSQL according to their system’s workload. In this blog, we will establish basic guidelines for setting PostgreSQL database parameters to improve database performance according to workload.

Using AWS EC2 instance store vs EBS for MySQL: how to increase performance and decrease cost

If you are using large EBS GP2 volumes for MySQL (i.e. 10TB+) on AWS EC2, you can increase performance and save a significant amount of money by moving to local SSD (NVMe) instance storage. Interested? Then read on for a more detailed examination of how to achieve cost-benefits and increase performance from this implementation.

Why You Should Avoid Using “CREATE TABLE AS SELECT” Statement

In this blog post, I’ll provide an explanation why you should avoid using the CREATE TABLE AS SELECT statement. The SQL statement “create table <table_name> as select …” is used to create a normal or temporary table and materialize the result of the select. Some applications use this construct to create a copy of the table. This is one statement that will do all the work, so you do not need to create a table structure or use another statement to copy the structure.

Honorable Mention:

Is Serverless Just a New Word for Cloud-Based?

Top 10 Most Commented

These posts generated some healthy discussions (not surprisingly, this list overlaps with the first):

Posts Worth Revisiting

Don’t miss these great posts that have excellent information on important topics:

Have a great end of the year celebration, and we look forward to providing more great blog posts in 2019.

Nov
23
2018
--

Compression options in MySQL (part 1)

Over the last year, I have been pursuing a part time hobby project exploring ways to squeeze as much data as possible in MySQL. As you will see, there are quite a few different ways. Of course things like compression ratio matters a lot but, other items like performance of inserts, selects and updates, along with the total amount of bytes written are also important. When you start combining all the possibilities, you end up with a large set of compression options and, of course, I am surely missing a ton. This project has been a great learning opportunity and I hope you’ll enjoy reading about my results. Given the volume of results, I’ll have to write a series of posts. This post is the first of the series. I also have to mention that some of my work overlaps work done by one of my colleague, Yura Sorokin, in a presentation he did in Dublin.

The compression options

  • InnoDB page size in {16k, 32k, 64k} (as references)
  • InnoDB barracuda page compression, block_size in {8k, 4k}
  • InnoDB Transparent page compression with punch holes, page size in {16k, 32k, 64k} * compression algo in {LZ4, Zlib}
  • MyISAM, MyISAM Packed, MyISAM on ZFS with recordsize in {16k, 32k}
  • InnoDB on ZFS, ZFS compression algo in {LZ4, Zlib}, ZFS record size in {16k, 32k, 64k, 128k}, InnoDB page size in {16k, 32k, 64k}
  • TokuDB, TokuDB compression algo in {ZLIB, LZMA, QUICKLZ, SNAPPY}
  • TokuDB on ZFS, TokuDB compression algo set to None, ZFS compression Zlib, ZFS record size in {16k, 32k, 64k, 128k}
  • MyRocks, compression algo in {None, ZSTD}
  • MyRocks on ZFS, MyRocks compression algo set to None, ZFS compression Zlib, ZFS record size in {16k, 32k, 64k, 128k}

In many interesting cases, the ZFS experiments have been conducted with and without a SLOG.

The test datasets

In order to cover these solutions efficiently, I used a lot of automation and I restricted myself to two datasets. My first dataset consists of a set of nearly 1 billion rows from the Wikipedia access stats. The table schema is:

CREATE TABLE `wiki_pagecounts` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `day` date NOT NULL,
  `hour` tinyint(4) NOT NULL,
  `project` varchar(30) NOT NULL,
  `title` text NOT NULL,
  `request` int(10) unsigned NOT NULL,
  `size` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_time` (`day`,`hour`)
);

and here’s a typical row:

mysql> select * from wiki_pagecounts where id = 16\G
*************************** 1. row ***************************
     id: 16
    day: 2016-01-01
   hour: 0
project: aa
  title: 'File:Wiktionary-logo-en.png'
request: 1
   size: 10752
1 row in set (0.00 sec)

The average length of the title columns is above 70 and it often has HTML escape sequences for UTF-8 characters in it. The actual column content is not really important but it is not random data. Loading this dataset in plain InnoDB results in a data file of about 113GB.

The second dataset is from the defunct Percona cloud tool project and is named “o1543”. Instead of a large number of rows, it is made of only 77M rows but this time, the table has 134 columns, mostly using float or bigint. The table definition is:

CREATE TABLE `query_class_metrics` (
   `day` date NOT NULL,
   `query_class_id` int(10) unsigned NOT NULL,
   `instance_id` int(10) unsigned NOT NULL,
   `start_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
   `end_ts` timestamp NOT NULL DEFAULT '1970-01-01 00:00:01',
   `query_count` bigint(20) unsigned NOT NULL,
   `lrq_count` bigint(20) unsigned NOT NULL DEFAULT '0',
   ...
   `Sort_scan_sum` bigint(20) unsigned DEFAULT NULL,
   `No_index_used_sum` bigint(20) unsigned DEFAULT NULL,
   `No_good_index_used_sum` bigint(20) unsigned DEFAULT NULL,
   PRIMARY KEY (`start_ts`,`instance_id`,`query_class_id`),
   KEY `start_ts` (`instance_id`)
 );

When loaded in plain InnoDB, the resulting data file size is slightly above 87GB.

The test queries

The first test query is, of course, the inserts used to load the datasets. These are multi-inserts statements in primary key order generated by the mysqldump utility.

The second test query is a large range select. For the Wikipedia dataset I used:

select `day`, `hour`, max(request), sum(request), sum(size)
from wikipedia_pagecounts.wiki_pagecounts
where day = '2016-01-05'
group by `day`,`hour`;

while for the o1543 dataset, I used:

select query_class_id, sum(Query_time_sum) as Totat_time,sum(query_count), sum(Rows_examined_sum),
  sum(InnoDB_pages_distinct_sum)
from o1543.query_class_metrics
where start_ts between '2014-10-01 00:00:00' and '2015-06-30 00:00:00'
group by query_class_id
order by Totat_time desc limit 10;

In both cases, a significant amount of data needs to be processed. Finally, I tested random access with the updates. I generated 20k distinct single row updates in random primary key order for the Wikipedia dataset like:

update wikipedia_pagecounts.wiki_pagecounts set request = request + 1 where id = 377748793;

For the o1543 dataset, I used the following update statement:

update o1543.query_class_metrics set Errors_sum = Errors_sum + 1
where query_class_id = 472 and start_ts between '2014-10-01 00:00:00' and '2014-12-31 23:59:59';

which ends up updating very close to 20k rows, well spaced in term of primary key values.

The metrics recorded

In order to compare the compression options, I recorded key metrics.

  • Time: that’s simply the execution time of the queries. It is very difficult to minimally tune a server for all the different engines. Some also rely on the OS file cache. Take the execution time as a rough performance indicator which could be modified substantially through targeted tuning.
  • Amount of data read and written by the MySQL process, as reported by /proc/$(pidof mysqld)/io.
  • Amount of data written to the actual physical device where the data is stored, from /sys/block/$datadevice/stat. That matters a lot for flash devices that have a finite endurance. The amount of data written to the storage is the main motivation of Facebook with MyRocks.
  • The actual size of the final datasets

Even with these simple metrics, you’ll see there is quite a lot to discuss and learn.

The procedure

For the benchmarks, I used a LXC virtual machine. My main goal was to simulate a dataset much larger than the available memory. I tried to limit the MySQL buffers to 128MB but in some cases, like with MyRocks, that was pretty unfair and it impacted the performance results. Basically, the procedure was:

  1. Start mysqld (no buffer pool load)
  2. Sync + drop cache
  3. Capture: du -hs the datadir
  4. Capture: cat /proc/$(pidof mysqld)/io
  5. Capture: cat /sys/block/vdb/stat
  6. Capture: show global variables and show global status;
  7. Run the test query
  8. Wait for 30 minutes for any flushing or maintenance to complete
  9. Capture: du -hs the datadir
  10. Capture: cat /proc/$(pidof mysqld)/io
  11. Capture: cat /sys/block/vdb/stat
  12. Capture: show global variables and show global status;
  13. Stop mysqld

As much as possible, I automated the whole procedure. On many occasions, I ran multiple runs of the same benchmark to validate unexpected behaviors.

First results: Traditional storage options

Inserting the data

In this first post of the series, I’ll report on the traditional “built-in” options which are InnoDB with Barracuda compression (IBC) and MyISAM with packing. I debated a bit the inclusion of MyISAM in this post since tables become read-only once packed but still, I personally implemented solutions using MyISAM packed a few times.

On the first figure (above), we have the final table sizes in GB for the different options we are considering in this first round of results. Over all the post series, we’ll use the plain InnoDB results as a reference. The Wikipedia dataset has a final size in InnoDB of 112.6GB while the o1543 dataset is slightly smaller, at 87.4GB.

The MyISAM sizes are smaller by 10 to 15% which is expected since InnoDB is page based, with page and row headers, and it doesn’t fully pack its pages. The InnoDB dataset size could have been up to twice as large if the data was inserted in random order of the primary key.

Adding Barracuda page compression with an 8KB block size (InnoDBCmp8k), both datasets shrink by close to 50%. Pushing to a block size of 4KB (InnoDBCmp4k), the Wikipedia dataset clearly doesn’t compress that much but, the o1543 dataset is very compressible, by more than 75%. Looking at the MyISAM Packed results, the o1543 dataset compressed to only 8.8GB, a mere 11% of the original MyISAM size. That means the o1543 could have done well with IBC using block size of 2KB. Such a compression ratio is really exceptional. I’ve rarely encountered a ratio that favorable in a production database.

With IBC, we know when the block size is too small for the compressibility of the dataset when the final size is no longer following the ratio of compressed block size over the original block size. For example, the Wikipedia dataset started at 112.6GB and a fourth (4KB/16KB) of this number is much smaller than the 46.9GB size of the InnoDB compression with 4k block size. You’ll also see a large number of compression failure in the innodb_cmp table of the information schema.

When the compression fails with IBC, InnoDB splits the page in two and recompresses each half. That process adds an overhead which can observed in the insertion time figure. While the insertion time of the Wikipedia dataset for the InnoDBCmp4k explodes to 2.6 times the uncompressed insertion time, the o1543 dataset takes only 30% more time. I suggest you do not take the times here too formally, the test environment I used was not fully isolated. View these times as trends.

The amount of data written during the inserts, shown on the above figure, has bugged me for a while. Let’s consider, for example, the amount of writes needed to insert all the rows of the Wikipedia dataset in plain InnoDB. The total is 503GB for a dataset of 113GB. From the MySQL status variables, I have 114GB written to the data files (innodb_pages_written * 16kb), 114GB written to the double write buffer (Inndb_dblwr_pages_written * 16kb) and 160GB written to the InnoDB log files (innodb_os_log_written). If you sum all these values, you have about 388GB, a value short by about… 114GB, too close to the size of the dataset to be an accident. What else is written?

After some research, I finally found it! When a datafile is extended, InnoDB first writes zeros to allocate the space on disk. All the tablespaces have to go through that initial allocation phase so here are the missing 114GB.

Back to the data written during the inserts figure, look at the number of writes required for the Wikipedia dataset when using InnoDBCmp4k. Any idea why it is higher? What if I tell you the double write buffer only writes 16KB pages (actually, it depends on innodb_page_size)? So, when the pages are compressed, they are padded with zeros when written to the double write buffer. We remember that, when compressing to 4KB, we had many compression misses so we ended up with many more pages to write. Actually, Domas Mituzas filed a bug back in 2013 about this. Also, by default (innodb_log_compressed_pages), the compressed and uncompressed versions of the pages are written to the InnoDB log files. Actually, only the writes to the tablespace are reduced by IBC.

MyISAM, since it is not a transactional engine, cheats here. The only overhead are the writes to the b-trees of the index. So, to the expense of durability, MyISAM writes much less data in this simple benchmark.

Range selects

So great, we have now inserted a lot of rows in our tables. How fast can we access these rows? The following figure presents the times to perform large range scans on the datasets. In InnoDB, the times are “fairly” stable. For the Wikipedia dataset, InnoDB compression improves the select performance, the time to decompress a page is apparently shorter than the time to read a full page. MyISAM without compression is the fastest solution but, once again, the benchmark offers the most favorable conditions to MyISAM. MyISAMPacked doesn’t fare as well for the large range select, likely too much data must be decompressed.

20k updates

Going to the updates, the time required to perform 20k updates by a single thread is shown on the above figure. For both datasets, InnoDB and InnoDBCmp8k show similar times. There is a divergence with Wikipedia dataset stored on InnoDBCmp4k, the execution time is 57% larger, essentially caused by a large increase in the number of pages read. MyISAM is extremely efficient dealing with the updates of the o1543 dataset since the record size is fixed and the update process is single-threaded.


Finally, let’s examine the number of bytes written per updates as shown on the figure above. I was naively expecting about two pages written per update statement, one for the double write buffer and one for the tablespace file. The Wikipedia dataset shows more like three pages written per update while the o1543 dataset fits rather well with what I was expecting. I had to look at the file_summary_by_instance table of the Performance schema and the innodb_metrics table to understand. Actually, my updates to the Wikipedia dataset are single row updates executed in autocommit mode, while the updates to the o1543 dataset are from a single statement updating 20k rows. When you do multiple small transactions, you end up writing much more to the undo log and to the system tablespace. The worse case is when the updates are in separate transactions and a long time is allowed for MySQL to flush the dirty pages.

Here are the writes associated with 30 updates, in autocommit mode, 20s apart:

mysql> select NAME, COUNT_RESET from innodb_metrics where name like '%writt%' and count_reset > 0 ;
+---------------------------------+-------------+
| NAME                            | COUNT_RESET |
+---------------------------------+-------------+
| buffer_pages_written            |         120 |
| buffer_data_written             |     2075136 |
| buffer_page_written_index_leaf  |          30 |
| buffer_page_written_undo_log    |          30 |
| buffer_page_written_system_page |          30 |
| buffer_page_written_trx_system  |          30 |
| os_log_bytes_written            |       77312 |
| innodb_dblwr_pages_written      |         120 |
+---------------------------------+-------------+
8 rows in set (0.01 sec)

The index leaf write is where the row is stored in the tablespace, 30 matches the number of rows updated. Each update has dirtied one leaf page as expected. The undo log is used to store the previous version of the row for rollback, in the ibdata1 file. I wrongly assumed these undo entries would not be actually written to disk, and would only live in the buffer pool and purged before they needed to be flushed to disk. I don’t clearly enough understand what is written to the system page and trx system to attempt a clear explanation for these ones. The sum of pages to write is 120, four per update but you need to multiply by two because of the double write buffer. So, in this worse case scenario, a simple single row update may cause up to eight pages to be written to disk.

Grouping the updates in a single transaction basically removes the pages written to the system_page and trx_system as these are per transaction.

Here is the result for the same 30 updates, send at a 20s interval, but this time in a single transaction:

mysql> select NAME, COUNT_RESET from innodb_metrics where name like '%writt%' and count_reset > 0 ;
+---------------------------------+-------------+
| NAME                            | COUNT_RESET |
+---------------------------------+-------------+
| buffer_pages_written            |          63 |
| buffer_data_written             |     1124352 |
| buffer_page_written_index_leaf  |          30 |
| buffer_page_written_undo_log    |          31 |
| buffer_page_written_system_page |           1 |
| buffer_page_written_trx_system  |           1 |
| os_log_bytes_written            |       60928 |
| innodb_dblwr_pages_written      |          63 |
+---------------------------------+-------------+

The write load, in terms of the number of pages written, is cut by half, to four per update. The most favorable case will be a single transaction with no sleep in between.

For 30 updates in a single transaction with no sleep, the results are:

mysql> select NAME, COUNT_RESET from innodb_metrics where name like '%writt%' and count_reset > 0 ;
+---------------------------------+-------------+
| NAME                            | COUNT_RESET |
+---------------------------------+-------------+
| buffer_pages_written            |          33 |
| buffer_data_written             |      546304 |
| buffer_page_written_index_leaf  |          30 |
| buffer_page_written_undo_log    |           1 |
| buffer_page_written_system_page |           1 |
| buffer_page_written_trx_system  |           1 |
| os_log_bytes_written            |        4608 |
| innodb_dblwr_pages_written      |          33 |
+---------------------------------+-------------+
8 rows in set (0.00 sec)

Now, the undo log is flushed only once and we are down to approximately two page writes per update. This is what I was originally expecting. The other results falls well into place if you keep in mind that only the index_leaf writes are compressed. The InnoDBCmp4k results for the Wikipedia dataset are higher, essentially because it took much more time and thus more page flushing occurred.

What we learned?

Everything can be a pretext to explore and learn. Just to summarize, what have we learned in this post?

  • The InnoDB log file logs compressed and uncompressed result by default (see innodb_log_compressed_pages)
  • The double write buffer only writes full pages, compressed pages are zero padded
  • With InnoDB the total amount of data written to disk during the inserts is more than 5 times the final size. Compression worsen the ratio.
  • A single row update causes from two up to eight pages to be written to disk

Not bad in term of collateral learning…

Next?

In this post, we reviewed the traditional data compression solutions available with MySQL. In future posts, we’ll start looking at the alternatives. In the next one, I will evaluate InnoDB Transparent page compression with punch hole, a feature available since MySQL 5.7.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com