Jun
26
2018
--

Webinar 6/27: MySQL Troubleshooting Best Practices: Monitoring the Production Database Without Killing Performance

Please join Percona’s Principal Support Escalation Specialist Sveta Smirnova as she presents Troubleshooting Best Practices: Monitoring the Production Database Without Killing Performance on Wednesday, June 27th at 11:00 AM PDT (UTC-7) / 2:00 PM EDT (UTC-4).

During the MySQL Troubleshooting webinar series, I covered many monitoring and logging tools such as:

  • General, slow, audit, binary, error log files
  • Performance Schema
  • Information Schema
  • System variables
  • Linux utilities
  • InnoDB monitors
  • PMM

However, I did not spend much time on the impact these instruments have on overall MySQL performance. And they do have an impact.

And this is the conflict many people face. MySQL Server users try out these monitoring instruments, see that they slow down their installations, and turn them off. This is unfortunate. If the instrument that could help you resolve a problem is OFF, you won’t have the information needed to understand when, how and why the issue occurred. In the best case, you’ll re-enable instrumentation and wait for the next disaster to occur. In the worst case, you’ll try various fixes without any real knowledge of whether they solve the problem.

This is why it is important to understand the impact monitoring tools have on your database, and therefore how to minimize it.

Understanding and controlling the impact of MySQL monitoring tools

In this webinar, I cover why certain monitoring tools affect performance, and how to minimize the impact without turning the instrument off. You will learn how to monitor safely and effectively.

Register Now

Sveta Smirnova

Principal Support Escalation Specialist

Sveta joined Percona in 2015. Her main professional interests are problem-solving, working with tricky issues and bugs, finding patterns that can quickly solve typical issues, and teaching others how to deal with MySQL issues, bugs and gotchas effectively. Before joining Percona, Sveta worked as a Support Engineer in the MySQL Bugs Analysis Support Group at MySQL AB (later Sun, then Oracle). She is the author of the book “MySQL Troubleshooting” and of the JSON UDF functions for MySQL.


Jun
19
2018
--

Chunk Change: InnoDB Buffer Pool Resizing


Since MySQL 5.7.5, we have been able to dynamically resize the InnoDB Buffer Pool. This feature also introduced a new variable — innodb_buffer_pool_chunk_size — which defines the chunk size by which the buffer pool is enlarged or reduced. This variable is not dynamic, and if it is incorrectly configured it can lead to undesired situations.

Let’s first see how innodb_buffer_pool_size, innodb_buffer_pool_instances and innodb_buffer_pool_chunk_size interact:

The buffer pool can hold several instances, and each instance is divided into chunks. There are two limits to keep in mind: the number of instances can go from 1 to 64, and the total number of chunks should not exceed 1000.

So, for a server with 3GB of RAM and a 2GB buffer pool with 8 instances and chunks at the default value (128MB), we get 2 chunks per instance:

This means that there will be 16 chunks in total.
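
You can verify the arithmetic from the client (just an illustrative calculation, not a server command):

mysql> SELECT 2147483648 / 8 / 134217728 AS chunks_per_instance,
    ->        2147483648 / 134217728 AS total_chunks;

This returns 2 chunks per instance and 16 chunks in total, matching the numbers above.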

I’m not going to explain the benefits of having multiple instances, I will focus on resizing operations. Why would you want to resize the buffer pool? Well, there are several reasons, such as:

  • on a virtual server you can add more memory dynamically
  • for a physical server, you might want to reduce database memory usage to make way for other processes
  • on systems where the database size is smaller than available RAM
  • if you expect a huge growth and want to increase the buffer pool on demand

Reducing the buffer pool

Let’s start reducing the buffer pool:

+-------------------------------+------------+
| innodb_buffer_pool_size       | 2147483648 |
| innodb_buffer_pool_instances  | 8          |
| innodb_buffer_pool_chunk_size | 134217728  |
+-------------------------------+------------+
mysql> set global innodb_buffer_pool_size=1073741824;
Query OK, 0 rows affected (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_size';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| innodb_buffer_pool_size | 1073741824 |
+-------------------------+------------+
1 row in set (0.00 sec)

If we try to decrease it to 1.5GB, the buffer pool size will not change and a warning will be shown:

mysql> set global innodb_buffer_pool_size=1610612736;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show warnings;
+---------+------+---------------------------------------------------------------------------------+
| Level   | Code | Message                                                                         |
+---------+------+---------------------------------------------------------------------------------+
| Warning | 1210 | InnoDB: Cannot resize buffer pool to lesser than chunk size of 134217728 bytes. |
+---------+------+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_size';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| innodb_buffer_pool_size | 2147483648 |
+-------------------------+------------+
1 row in set (0.01 sec)

Increasing the buffer pool

When we try to increase the value from 1GB to 1.5GB, the buffer pool is resized but the requested innodb_buffer_pool_size is considered to be incorrect and is truncated:

mysql> set global innodb_buffer_pool_size=1610612736;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show warnings;
+---------+------+-----------------------------------------------------------------+
| Level   | Code | Message                                                         |
+---------+------+-----------------------------------------------------------------+
| Warning | 1292 | Truncated incorrect innodb_buffer_pool_size value: '1610612736' |
+---------+------+-----------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_size';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| innodb_buffer_pool_size | 2147483648 |
+-------------------------+------------+
1 row in set (0.01 sec)

And the final size is 2GB. Yes: you intended to set the value to 1.5GB, and you succeeded in setting it to 2GB. Even if you set it just 1 byte higher (1073741825), you will still end up with a buffer pool of 2GB:

mysql> set global innodb_buffer_pool_size=1073741825;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show global variables like 'innodb_buffer_pool_%size' ;
+-------------------------------+------------+
| Variable_name                 | Value      |
+-------------------------------+------------+
| innodb_buffer_pool_chunk_size | 134217728  |
| innodb_buffer_pool_size       | 2147483648 |
+-------------------------------+------------+
2 rows in set (0.01 sec)

Interesting scenarios

Increasing size in the config file

Let’s suppose that one day you decide to change or tune some variables on your server, and since you have free memory you decide to increase the buffer pool. In this example, we are going to use a server with innodb_buffer_pool_instances = 16 and a 2GB buffer pool, which will be increased to 2.5GB.

So, we set in the configuration file:

innodb_buffer_pool_size = 2684354560

But then, after a restart, we found:

mysql> show global variables like 'innodb_buffer_pool_%size' ;
+-------------------------------+------------+
| Variable_name                 | Value      |
+-------------------------------+------------+
| innodb_buffer_pool_chunk_size | 134217728  |
| innodb_buffer_pool_size       | 4294967296 |
+-------------------------------+------------+
2 rows in set (0.00 sec)

And the error log says:

2018-05-02T21:52:43.568054Z 0 [Note] InnoDB: Initializing buffer pool, total size = 4G, instances = 16, chunk size = 128M

So, after setting innodb_buffer_pool_size in the config file to 2.5GB, the server gives us a 4GB buffer pool, because of the number of instances and the chunk size. What the message doesn’t tell us is the number of chunks, which would be useful for understanding why there is such a huge difference.

Let’s take a look at how that’s calculated.

Increasing instances and chunk size

Changing the number of instances or the chunk size requires a restart, and the server takes the buffer pool size into consideration as an upper limit when setting the chunk size. For instance, with this configuration:

innodb_buffer_pool_size = 2147483648
innodb_buffer_pool_instances = 32
innodb_buffer_pool_chunk_size = 134217728

We get this chunk size:

mysql> show global variables like 'innodb_buffer_pool_%size' ;
+-------------------------------+------------+
| Variable_name                 | Value      |
+-------------------------------+------------+
| innodb_buffer_pool_chunk_size | 67108864   |
| innodb_buffer_pool_size       | 2147483648 |
+-------------------------------+------------+
2 rows in set (0.00 sec)

However, we need to understand how this really works. To get innodb_buffer_pool_chunk_size, the server calculates innodb_buffer_pool_size / innodb_buffer_pool_instances and rounds the result down to a multiple of 1MB.

In our example, the calculation is 2147483648 / 32 = 67108864, and since 67108864 % 1048576 = 0, no rounding is needed. The number of chunks will be one chunk per instance.

When does it consider that it needs to use more chunks per instance? When the configured innodb_buffer_pool_size exceeds the size covered by the rounded-down number of chunks by 1MB or more.

That is why, for instance, if you try to set innodb_buffer_pool_size equal to 1GB + 1MB – 1B, you will get a 1GB buffer pool:

innodb_buffer_pool_size = 1074790399
innodb_buffer_pool_instances = 16
innodb_buffer_pool_chunk_size = 67141632
2018-05-07T09:26:43.328313Z 0 [Note] InnoDB: Initializing buffer pool, total size = 1G, instances = 16, chunk size = 64M

But if you set innodb_buffer_pool_size equal to 1GB + 1MB, you will get a 2GB buffer pool:

innodb_buffer_pool_size = 1074790400
innodb_buffer_pool_instances = 16
innodb_buffer_pool_chunk_size = 67141632
2018-05-07T09:25:48.204032Z 0 [Note] InnoDB: Initializing buffer pool, total size = 2G, instances = 16, chunk size = 64M

This is because the server considers that two chunks will now fit per instance. We can say that this is how the InnoDB Buffer Pool size is calculated:

determine_best_chunk_size{
  if innodb_buffer_pool_size / innodb_buffer_pool_instances < innodb_buffer_pool_chunk_size
  then
    # the configured chunk does not fit once per instance: shrink it,
    # rounded down to a multiple of 1MB
    innodb_buffer_pool_chunk_size = roundDownMB(innodb_buffer_pool_size / innodb_buffer_pool_instances)
  fi
  # note: the chunk size is in any case rounded down to a multiple of 1MB,
  # which is how the configured 67141632 above effectively became 67108864 (64M)
}
determine_amount_of_chunks{
  innodb_buffer_amount_chunks_per_instance = roundDown(innodb_buffer_pool_size / innodb_buffer_pool_instances / innodb_buffer_pool_chunk_size)
  # if at least 1MB of the requested size is left uncovered, add one more chunk per instance
  if innodb_buffer_pool_size - innodb_buffer_amount_chunks_per_instance * innodb_buffer_pool_instances * innodb_buffer_pool_chunk_size >= 1024*1024
  then
    innodb_buffer_amount_chunks_per_instance++
  fi
}
determine_best_chunk_size
determine_amount_of_chunks
innodb_buffer_pool_size = innodb_buffer_pool_instances * innodb_buffer_pool_chunk_size * innodb_buffer_amount_chunks_per_instance
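
As a sanity check, here is the 1GB + 1MB case worked through with the effective 64MB chunk size (illustrative client-side arithmetic only, not a server feature):

mysql> SELECT FLOOR(1074790400 / 16 / 67108864) AS chunks_rounded_down,
    ->        1074790400 - 16 * 67108864 * FLOOR(1074790400 / 16 / 67108864) AS uncovered_bytes,
    ->        16 * 67108864 * 2 AS final_pool_size;

The uncovered 1048576 bytes (exactly 1MB) trigger a second chunk per instance, and 16 × 64MB × 2 gives the 2GB reported in the log.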

What is the best setting?

In order to analyze the best setting, you need to know that there is an upper limit of 1000 chunks. In our example with 16 instances, we can have no more than 62 chunks per instance.

Another thing to consider is what each chunk represents in percentage terms. Continuing with the example, each chunk per instance represents 1.61% (1/62) of the maximum buffer pool size, which means that we can increase or decrease the complete buffer pool size in multiples of this percentage.
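
That percentage comes straight from the chunk limit (again, just client-side arithmetic):

mysql> SELECT FLOOR(1000 / 16) AS max_chunks_per_instance,
    ->        100 / FLOOR(1000 / 16) AS resize_step_pct;

which returns 62 and roughly 1.61.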

From a management point of view, I think that you might want to consider a resize step of at least 2% to 5% of the buffer pool. I performed some tests to see the impact of having small chunks and found no issues, but this is something that needs to be thoroughly tested.


Jan
06
2017
--

Millions of Queries per Second: PostgreSQL and MySQL’s Peaceful Battle at Today’s Demanding Workloads


This blog compares how PostgreSQL and MySQL handle millions of queries per second.

Anastasia: Can open source databases cope with millions of queries per second? Many open source advocates would answer “yes.” However, assertions aren’t enough for well-grounded proof. That’s why in this blog post, we share the benchmark testing results from Alexander Korotkov (CEO of Development, Postgres Professional) and Sveta Smirnova (Principal Technical Services Engineer, Percona). The comparative research of PostgreSQL 9.6 and MySQL 5.7 performance will be especially valuable for environments with multiple databases.

The idea behind this research is to provide an honest comparison of the two popular RDBMSs. Sveta and Alexander wanted to test the most recent versions of both MySQL and PostgreSQL with the same tool, under the same challenging workloads and using the same configuration parameters (where possible). However, because both the PostgreSQL and MySQL ecosystems evolved independently, with standard testing tools (pgbench and SysBench) used for each database, it wasn’t an easy journey.

The task fell to database experts with years of hands-on experience. Sveta worked as a Senior Principal Technical Support Engineer in the Bugs Verification Group of the MySQL Support Group at Oracle for more than eight years, and since 2015 has worked as a Principal Technical Services Engineer at Percona. Alexander Korotkov is a PostgreSQL major contributor and the developer of a number of PostgreSQL features, including the CREATE ACCESS METHOD command, the generic WAL interface, lockfree Pin/UnpinBuffer, index-based search for regular expressions and much more. So we have a pretty decent cast for this particular play!

Sveta: Dimitri Kravtchuk regularly publishes detailed benchmarks for MySQL, so my main task wasn’t confirming that MySQL can do millions of queries per second. As our graphs will show, we’ve passed that mark already. As a Support Engineer, I often work with customers who have heterogeneous database environments in their shops and want to know about the impact of migrating jobs from one database to another. So I saw the chance to work with Postgres Professional and identify both the strong and weak points of the two databases as an excellent opportunity.

We wanted to test both databases on the same hardware, using the same tools and tests. We expected to test base functionality, and then work on more detailed comparisons. That way we could compare different real-world use case scenarios and popular options.

Spoiler: We are far from the final results. This is the start of a blog series.

OpenSource Databases on Big Machines, Series 1: “That Was Close…”

Postgres Professional, together with Freematiq, provided two modern, powerful machines for the tests.

Hardware configuration:

Processors: physical = 4, cores = 72, virtual = 144, hyperthreading = yes
Memory: 3.0T
Disk speed: about 3K IOPS
OS: CentOS 7.1.1503
File system: XFS

I also used a smaller Percona machine.

Hardware configuration:

Processors: physical = 2, cores = 12, virtual = 24, hyperthreading = yes
Memory: 251.9G
Disk speed: about 33K IOPS
OS: Ubuntu 14.04.5 LTS
File system: EXT4

Note that machines with smaller numbers of CPU cores and faster disks are more common for MySQL installations than machines with larger numbers of cores.

The first thing we needed to agree on was which tool to use. A fair comparison only makes sense if the workloads are as close as possible.

The standard PostgreSQL tool for performance tests is pgbench, while for MySQL it’s SysBench. SysBench supports multiple database drivers and scriptable tests in the Lua programming language, so we decided to use this tool for both databases.

The initial plan was to convert pgbench tests into SysBench Lua syntax, and then run standard tests on both databases. After initial results, we modified our tests to better examine specific MySQL and PostgreSQL features.

I converted pgbench tests into SysBench syntax, and put the tests into an open-database-bench GitHub repository.

And then we both faced difficulties.

As I wrote already, I also ran the tests on a Percona machine. For this converted test, the results were almost identical:

Percona machine:

OLTP test statistics:
       transactions:                        1000000 (28727.81 per sec.)
       read/write requests:                 5000000 (143639.05 per sec.)
       other operations:                    2000000 (57455.62 per sec.)

Freematiq machine:

OLTP test statistics:
       transactions:                        1000000 (29784.74 per sec.)
       read/write requests:                 5000000 (148923.71 per sec.)
       other operations:                    2000000 (59569.49 per sec.)

I started investigating. The only place where the Percona machine was better than Freematiq’s was disk speed, so I ran the pgbench read-only test, which is identical to SysBench’s point select test with the full dataset in memory. This time, though, SysBench itself used 50% of the available CPU resources:

PID  USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
4585 smirnova  20   0  0,157t 0,041t   9596 S  7226  1,4  12:27.16 mysqld
8745 smirnova  20   0 1266212 629148   1824 S  7126  0,0   9:22.78 sysbench

Alexander, in turn, had issues with SysBench, which could not create a high load on PostgreSQL when prepared statements were used:

93087 korotkov  20   0 9289440 3,718g   2964 S 242,6  0,1   0:32.82 sysbench
93161 korotkov  20   0 32,904g  81612  80208 S   4,0  0,0   0:00.47 postgres
93116 korotkov  20   0 32,904g  80828  79424 S   3,6  0,0   0:00.46 postgres
93118 korotkov  20   0 32,904g  80424  79020 S   3,6  0,0   0:00.47 postgres
93121 korotkov  20   0 32,904g  80720  79312 S   3,6  0,0   0:00.47 postgres
93128 korotkov  20   0 32,904g  77936  76536 S   3,6  0,0   0:00.46 postgres
93130 korotkov  20   0 32,904g  81604  80204 S   3,6  0,0   0:00.47 postgres
93146 korotkov  20   0 32,904g  81112  79704 S   3,6  0,0   0:00.46 postgres

We contacted SysBench author Alexey Kopytov, and he fixed the MySQL issue. The solution is to:

  • Use SysBench with the options --percentile=0 --max-requests=0 (for reasonable CPU usage)
  • Use the concurrency_kit branch (for better concurrency and Lua processing)
  • Rewrite the Lua scripts to support prepared statements (pull request: https://github.com/akopytov/sysbench/pull/94)
  • Start both SysBench and mysqld with the jemalloc or tcmalloc library preloaded

A fix for PostgreSQL is on the way. For now, Alexander converted a standard SysBench test into pgbench format and we stuck with it. Not much new for MySQL, but at least we had a baseline for comparison.

The next difficulty I faced was with the default operating system parameters. To make a long story short, I changed them to the recommended ones (listed below):

vm.swappiness=1
cpupower frequency-set --governor performance
kernel.sched_autogroup_enabled=0
kernel.sched_migration_cost_ns= 5000000
vm.dirty_background_bytes=67108864
vm.dirty_bytes=536870912
IO scheduler [deadline]

The same parameters were better for PostgreSQL performance as well. Alexander set his machine similarly.

After solving these issues we learned and implemented the following:

  • We cannot use a single tool (for now)
  • Alexander wrote a test for pgbench, imitating the standard SysBench tests
  • We are still not able to write custom tests because we use different tools

But we could use these tests as a baseline. After work done by Alexander, we stuck with the standard SysBench tests. I converted them to use prepared statements, and Alexander converted them into pgbench format.

I should mention that I was not able to get the same results for the Read Only and Point Select tests as Dimitri. They are close, but slightly slower. We need to investigate if this is the result of different hardware, or my lack of performance testing abilities. The results from the Read-Write tests are similar.

Another difference was between the PostgreSQL and MySQL tests. MySQL users normally have many connections. Setting max_connections to allow thousands of parallel connections is not rare nowadays. While not recommended, people use this option even without the thread pool plugin. In real life, most of these connections are sleeping. But there is always a chance they will all be used during a spike in website activity.
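
To see how many of your connections are actually doing work at a given moment, a quick check like this works (illustrative; the PROCESSLIST table is available in any recent MySQL):

mysql> SELECT command, COUNT(*) FROM information_schema.processlist GROUP BY command;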

For MySQL, I tested up to 1024 connections, using powers of two and multiples of the number of cores: 1, 2, 4, 8, 16, 32, 36, 64, 72, 128, 144, 256, 512 and 1024 threads.

For Alexander, it was more important to test in smaller steps. He started from one thread and increased by 10 threads, until 250 parallel threads were reached. So you will see a more detailed graph for PostgreSQL, but no results after 250 threads.

Here are our comparison results.

Point SELECTs

[Graph: Point SELECT results for PostgreSQL and MySQL]

  • pgsql-9.6 is standard PostgreSQL
  • pgsql-9.6 + pgxact-align is PostgreSQL with this patch (more details can be found in this blog post)
  • MySQL-5.7 Dimitri is Oracle’s MySQL Server
  • MySQL-5.7 Sveta is Percona Server 5.7.15

OLTP RO

[Graph: OLTP RO results for PostgreSQL and MySQL]

OLTP RW

[Graph: OLTP RW results for PostgreSQL and MySQL]

Sync commit in PostgreSQL is a feature similar to innodb_flush_log_at_trx_commit=1 in InnoDB, and async commit is similar to innodb_flush_log_at_trx_commit=2.
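
For reference, these are the settings in question. A minimal illustration follows; the values trade durability for speed, so treat them as examples rather than recommendations:

-- PostgreSQL: asynchronous commit (per session, or in postgresql.conf)
SET synchronous_commit = off;
-- MySQL/InnoDB: flush the log once per second instead of at every commit
SET GLOBAL innodb_flush_log_at_trx_commit = 2;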

You see that the results are very similar: both databases are developing very fast and work with modern hardware well.

Here are the MySQL results showing up to 1024 threads, for reference.

Point SELECT and OLTP RO

[Graph: Point SELECT and OLTP RO results for PostgreSQL and MySQL, up to 1024 threads]

OLTP RW with innodb_flush_log_at_trx_commit set to 1 and 2

[Graph: OLTP RW results for PostgreSQL and MySQL with innodb_flush_log_at_trx_commit set to 1 and 2]

After receiving these results, we did a few feature-specific tests that will be covered in separate blog posts.

More Information

MySQL Options for OLTP RO and Point SELECT tests:

# general
table_open_cache = 8000
table_open_cache_instances=16
back_log=1500
query_cache_type=0
max_connections=4000
# files
innodb_file_per_table
innodb_log_file_size=1024M
innodb_log_files_in_group=3
innodb_open_files=4000
# Monitoring
innodb_monitor_enable = '%'
performance_schema=OFF #cpu-bound, matters for performance
#Percona Server specific
userstat=0
thread-statistics=0
# buffers
innodb_buffer_pool_size=128000M
innodb_buffer_pool_instances=128 #to avoid wait on InnoDB Buffer Pool mutex
innodb_log_buffer_size=64M
# InnoDB-specific
innodb_checksums=1 #Default is CRC32 in 5.7, very fast
innodb_use_native_aio=1
innodb_doublewrite= 1 #https://www.percona.com/blog/2016/05/09/percona-server-5-7-parallel-doublewrite/
innodb_stats_persistent = 1
innodb_support_xa=0 #(We are read-only, but this option is deprecated)
innodb_spin_wait_delay=6 #(Processor and OS-dependent)
innodb_thread_concurrency=0
join_buffer_size=32K
innodb_flush_log_at_trx_commit=2
sort_buffer_size=32K
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_max_dirty_pages_pct=90
innodb_max_dirty_pages_pct_lwm=10
innodb_lru_scan_depth=4000
innodb_page_cleaners=4
# perf special
innodb_adaptive_flushing = 1
innodb_flush_neighbors = 0
innodb_read_io_threads = 4
innodb_write_io_threads = 4
innodb_io_capacity=2000
innodb_io_capacity_max=4000
innodb_purge_threads=4
innodb_max_purge_lag_delay=30000000
innodb_max_purge_lag=0
innodb_adaptive_hash_index=0 #depends on workload, always check

MySQL Options for OLTP RW:

#Open files
table_open_cache = 8000
table_open_cache_instances = 16
query_cache_type = 0
join_buffer_size=32k
sort_buffer_size=32k
max_connections=16000
back_log=5000
innodb_open_files=4000
#Monitoring
performance-schema=0
#Percona Server specific
userstat=0
thread-statistics=0
#InnoDB General
innodb_buffer_pool_load_at_startup=1
innodb_buffer_pool_dump_at_shutdown=1
innodb_numa_interleave=1
innodb_file_per_table=1
innodb_file_format=barracuda
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_doublewrite=1
innodb_support_xa=1
innodb_checksums=1
#Concurrency
innodb_thread_concurrency=144
innodb_page_cleaners=8
innodb_purge_threads=4
innodb_spin_wait_delay=12 #good value for RO is 6, for RW and RC it is 192
innodb_log_file_size=8G
innodb_log_files_in_group=16
innodb_buffer_pool_size=128G
innodb_buffer_pool_instances=128 #to avoid wait on InnoDB Buffer Pool mutex
innodb_io_capacity=18000
innodb_io_capacity_max=36000
innodb_flush_log_at_timeout=0
innodb_flush_log_at_trx_commit=2
innodb_flush_sync=1
innodb_adaptive_flushing=1
innodb_flush_neighbors = 0
innodb_max_dirty_pages_pct=90
innodb_max_dirty_pages_pct_lwm=10
innodb_lru_scan_depth=4000
innodb_adaptive_hash_index=0
innodb_change_buffering=none #can be inserts, workload-specific
optimizer_switch="index_condition_pushdown=off" #workload-specific

MySQL SysBench parameters:

LD_PRELOAD=/data/sveta/5.7.14/lib/mysql/libjemalloc.so
 /data/sveta/sbkk/bin/sysbench
 [ --test=/data/sveta/sysbench/sysbench/tests/db/oltp_prepared.lua | --test=/data/sveta/sysbench/sysbench/tests/db/oltp_simple_prepared.lua ]
 --db-driver=mysql --oltp-tables-count=8 --oltp-table-size=10000000
--mysql-table-engine=innodb --mysql-user=msandbox --mysql-password=msandbox
 --mysql-socket=/tmp/mysql_sandbox5715.sock
--num-threads=$i --max-requests=0 --max-time=300
--percentile=0 [--oltp-read-only=on --oltp-skip-trx=on]

PostgreSQL pgbench parameters:

$ git clone https://github.com/postgrespro/pg_oltp_bench.git
$ cd pg_oltp_bench
$ make USE_PGXS=1
$ sudo make USE_PGXS=1 install
$ psql DB -f oltp_init.sql
$ psql DB -c "CREATE EXTENSION pg_oltp_bench;"
$ pgbench -c 100 -j 100 -M prepared -f oltp_ro.sql -T 300 -P 1 DB
$ pgbench -c 100 -j 100 -M prepared -f oltp_rw.sql -T 300 -P 1 DB

Features in MySQL 5.7 that significantly improved performance:

Anastasia: The initial findings of this research were announced at Percona Live Amsterdam 2016. More findings were added to the second version of the same talk given at Moscow HighLoad++ 2016. Hopefully the third iteration of this talk will be available at Percona Live Open Source Database Conference 2017 in Santa Clara. Stay tuned: the Percona Live Committee is working on the program!


Feb
17
2012
--

The relationship between Innodb Log checkpointing and dirty Buffer pool pages

This is a time-honored topic, and there’s no shortage of articles about it on this blog. I wanted to write a post trying to condense and clarify those posts, as it took me a while to really understand this relationship.

Some basic facts

  • Most of us know that writing into Innodb updates buffer pool pages in memory and records page operations in the transaction (redo) log.
  • Behind the scenes, those updated (dirty) buffer pool pages are flushed down to the tablespace.
  • If Innodb stops (read: crashes) with dirty buffer pool pages, Innodb recovery must be done to rebuild the last consistent picture of the database.
  • Recovery uses the transaction log by redoing (hence the name ‘redo log’) the page operations in the log that had not already been flushed to the tablespaces.

Ultimately this mechanism was an optimization for slow drives:  if you can sequentially write all the changes into a log, it will be faster to do on the fly as transactions come in than trying to randomly write the changes across the tablespaces.  Sequential IO trumps Random IO.

However, even today in our modern flash storage world where random IO is significantly less expensive (from a latency perspective, not dollars), this is still an optimization because the longer we delay updating the tablespace, the more IOPs we can potentially conserve, condense, merge, etc.  This is because:

  • The same row may be written multiple times before the page is flushed
  • Multiple rows within the same page can be written before the page is flushed

Innodb Log Checkpointing

So, first of all, what can we see about Innodb log checkpointing and what does it tell us?

mysql> SHOW ENGINE INNODB STATUS\G
---
LOG
---
Log sequence number 9682004056
Log flushed up to   9682004056
Last checkpoint at  9682002296

This shows us the virtual head of our log (Log sequence number), the last place the log was flushed to disk (Log flushed up to), and our last checkpoint (Last checkpoint at). The LSN grows forever, while the actual locations inside the transaction logs are reused in a circular fashion. Based on these numbers, we can determine how many bytes back in the transaction log our oldest uncheckpointed transaction is by subtracting the ‘Last checkpoint at’ value from the ‘Log sequence number’. More on what a checkpoint is in a minute. If you use Percona Server, it does the math for you by including some more output:

---
LOG
---
Log sequence number 9682004056
Log flushed up to   9682004056
Last checkpoint at  9682002296
Max checkpoint age    108005254
Checkpoint age target 104630090
Modified age          1760
Checkpoint age        1760

Probably most interesting here is the Checkpoint age, which is the subtraction I described above.  I think of the Max checkpoint age as roughly the furthest back Innodb will allow us to go in the transaction logs; our Checkpoint age cannot exceed this without blocking client operations in Innodb to flush dirty buffers.  Max checkpoint age appears to be approximately 80% of the total number of bytes in all the transaction logs, but I’m unsure if that’s always the case.
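
Using the numbers above, the checkpoint age is just that subtraction (a trivial check you can run from any client):

mysql> SELECT 9682004056 - 9682002296 AS checkpoint_age;

which gives the 1760 bytes reported by Percona Server.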

Remember our transaction logs are circular, and the checkpoint age represents how far back the oldest unflushed transaction is in the log.  We cannot overwrite that without potentially losing data on a crash, so Innodb does not permit such an operation and will block incoming writes until the space is available to continue (safely) writing in the log.

Dirty Buffer Pool Pages

On the other side, we have dirty buffers.  These two numbers are relevant from the BUFFER POOL AND MEMORY section of SHOW ENGINE INNODB STATUS:

Database pages          65530
...
Modified db pages       3

So we have 3 pages that have modified data in them, and that (in this case) is a very small percentage of the total buffer pool.  A page in Innodb contains rows, indexes, etc., while a transaction may modify 1 or millions of rows.  Also realize that a single modified page in the buffer pool may contain modified data from multiple transactions in the transaction log.

As I said before, dirty pages are flushed to disk in the background. The order in which they are flushed really has little to nothing to do with the transaction they are associated with, nor with the position associated with their modification in the transaction log. The effect of this is that as the thread managing the dirty page flushing goes about its business, it is not necessarily flushing to optimize the checkpoint age; it is flushing to try to optimize IO and to obey the LRU in the buffer pool.

Since buffers can and will be flushed out of order, it may be the case that there are a lot of transactions in the transaction log that are fully flushed to disk (i.e., all the pages associated with said transaction are clean), but there still could be  older transactions that are not flushed.  This, in essence, is what fuzzy checkpointing is.

The checkpoint process is really a logical operation.  It occasionally  (as chunks of dirty pages get flushed) has a look through the dirty pages in the buffer pool to find the one with the oldest LSN, and that’s the Checkpoint.  Everything older must be fully flushed.

The main reason this is important is if the Checkpoint Age is not a factor in dirty buffer flushing, it can get too big and cause stalls in client operations:  the algorithm that decides which dirty pages to flush does not optimize for this [well] and sometimes it is not good enough on its own.

So, how can we optimize here? The short answer: make InnoDB flush more dirty pages. However, I can’t help but wonder if some tweaks could be made to the page flushing algorithm to be more effective at choosing older dirty pages. It is not clear how that algorithm works without reading the source code.

There are a lot of ways to tune this. Here is a list of the most significant, roughly ordered from oldest to newest, and simultaneously from least effective to most effective:

  • innodb_max_dirty_pages_pct:  This attempts to keep the percentage of dirty pages under control, and before the Innodb plugin this was really the only way to tune dirty buffer flushing. However, I have seen servers with 3% dirty buffers that are still hitting their max checkpoint age. The way this increases dirty buffer flushing also doesn’t scale well on high IO subsystems: it effectively just doubles the dirty buffer flushing per second when the % of dirty pages exceeds this amount.
  • innodb_io_capacity: This setting, in spite of all our grand hopes that it would allow Innodb to make better use of our IO in all operations, simply controls the amount of dirty page flushing per second (and other background tasks like read-ahead).  Make this bigger, you flush more per second.  This does not adapt, it simply does that many iops every second if there are dirty buffers to flush.  It will effectively eliminate any optimization of IO consolidation if you have a low enough write workload (that is, dirty pages get flushed almost immediately, we might be better off without a transaction log in this case).  It also can quickly starve data reads and writes to the transaction log if you set this too high.
  • innodb_write_io_threads: Controls how many threads will have writes in progress to the disk.   I’m not sure why this is still useful if you can use Linux native AIO.  These can also be rendered useless by filesystems that don’t allow parallel writing to the same file by more than one thread (particularly if you have relatively few tables and/or use the global tablespaces) *cough ext3 cough*.
  • innodb_adaptive_flushing: An Innodb plugin/5.5 setting that tries to be smarter about flushing more aggressively based on the number of dirty pages and the rate of transaction log growth.
  • innodb_adaptive_flushing_method:  (Percona Server only)  This adds a few new different algorithms, but the more effective ones adjust the amount of dirty page flushing using a formula that considers the Checkpoint age and the Checkpoint age target (something you can set in Percona Server, otherwise it is effectively just a hair under the Max Checkpoint age).  The two main methods here would be ‘estimate’ (good default) and ‘keep_average’ designed for SSD type storage.  Running Percona Server and using this method is my go-to setting for managing Checkpoint age.
To be clear, the best we have today (IMO) is using innodb_adaptive_flushing_method in Percona server.
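Most of these settings can be changed at runtime while you experiment; for example (values purely illustrative, not recommendations):

mysql> SET GLOBAL innodb_io_capacity = 2000;
mysql> SET GLOBAL innodb_max_dirty_pages_pct = 75;
mysql> SET GLOBAL innodb_adaptive_flushing = ON;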
In any case, if you run any kind of production MySQL server, you should be monitoring your Checkpoint age and your Innodb dirty pages and try to see the relationship with those values and your write operations on disk.  The additional controls in 5.5 and Percona server are excellent reasons to consider an upgrade.
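A minimal way to watch this from the client (the dirty/total page counters are standard status variables; checkpoint age itself comes from the LOG section of SHOW ENGINE INNODB STATUS, or from the extra output if you run Percona Server):

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
mysql> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_total';
mysql> SHOW ENGINE INNODB STATUS\G
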
Nov
15
2010
--

Shard-Query adds parallelism to queries

Preamble: On performance, workload and scalability:
MySQL has always been focused on OLTP workloads. In fact, both Percona Server and MySQL 5.5.7rc have numerous performance improvements which benefit workloads that have high concurrency. Typical OLTP workloads feature numerous clients (perhaps hundreds or thousands) each reading and writing small chunks of data. The recent improvements to MySQL make it scale better for this workload when more resources (such as additional CPUs) are added. By scaling better I mean that it is able to take advantage of all available resources in order to handle the workload. One reason that it works best for this workload is that a single query executes in a single thread. MySQL never takes advantage of more than a single CPU when aggregating data and fetching rows from the buffer pool, with respect to a single query, but it can run many queries at once.

There are workloads other than OLTP and the recent optimizations to MySQL still leave a lot of low hanging fruit where these are concerned. This is particularly true for OLAP workloads. While I’m not going to diverge into a discussion of how OLAP varies from OLTP, it suffices to say that a typical OLAP workload features a low number of concurrent queries which each examine large amounts of data. Since a single query is single threaded in MySQL, the new optimizations don’t really help with this workload.

The following tests assume a workload consisting of a small number of concurrent queries (or only one) to demonstrate how much improvement could be made to MySQL so that it could better utilize all available resources (that is, scale better) when running small numbers of queries which examine a lot of data.

What is Shard-Query?
Shard-Query was initially conceived as a utility to add parallelism to horizontally partitioned data sets by running queries against each host in parallel, with the added feature of supporting aggregation. Then I hit upon the idea of taking SQL constructs like IN and BETWEEN and making these queries execute in parallel on each host. If you have a sharded data set, this gives you the opportunity for additional parallelism for each query. If you have only a single server, but it has enough resources to answer queries in parallel, then Shard-Query can be used to add parallelism to queries which use IN or BETWEEN clauses. This added parallelism can have significant performance advantages, as demonstrated in this blog post.
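
Conceptually, Shard-Query decomposes a range predicate into independent sub-queries that run in parallel and then combines the partial results. A simplified sketch (not the exact rewrite the tool produces):

-- the original range query:
SELECT count(*) FROM ontime WHERE FlightDate BETWEEN '1988-01-01' AND '1989-12-31';
-- is executed roughly as independent pieces such as:
SELECT count(*) FROM ontime WHERE FlightDate BETWEEN '1988-01-01' AND '1988-12-31';
SELECT count(*) FROM ontime WHERE FlightDate BETWEEN '1989-01-01' AND '1989-12-31';
-- with the partial counts summed by the coordinator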

Many database servers can add this parallelism natively, but most of those are not open source. In the future, Shard-Query can be extended to other database servers such as PostgreSQL or Firebird fairly easily.

What machine did I use?

I used MySQL 5.5.7rc on a powerful Cisco UCS server with 12 real cores and 384GB of RAM. The amount of RAM is significantly larger than my already hefty 55GB testing data set, so this means that if MySQL could fully utilize all cores for my workload, then this test would be CPU bound.

What data did I use?
I loaded 55GB of the same data used in this earlier post into a partitioned InnoDB table.

Partitioning example:

SQL:

/*!50500 PARTITION BY RANGE COLUMNS(FlightDate)
(PARTITION p2 VALUES LESS THAN ('1988-01-01') ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN ('1988-02-01') ENGINE = InnoDB,
 ...
 PARTITION p253 VALUES LESS THAN ('2008-12-01') ENGINE = InnoDB,
 PARTITION p254 VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */

What queries did I use?
I used a version of the queries in that same blog post. The original queries tend to filter on the Year column. I partitioned the table into months using the FlightDate column using the improved MySQL 5.5 partitioning options which work directly on columns without the need to use TO_DAYS. To accommodate my partitioning schema I modified the queries to use the FlightDate column instead of the Year column. See the “full disclosure” section at the end for the complete SQL.

These tests were done using ‘run_query.php’, which is the example application which comes with Shard-Query. As the name implies, it takes a list of queries in a file (or stdin) and a config file. It runs the SQL via ShardQuery and it prints the results.

Test #1
This set of queries tests the most basic aggregation (count(*)) on a range of records. This table is partitioned by month which means that MySQL can use partition pruning to reduce the amount of data which must be examined. With this in mind, I modified Vadim’s queries to use the FlightDate column in the WHERE clause instead of Year.
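
You can confirm that pruning kicks in with EXPLAIN PARTITIONS; for a one-year range, only that year's monthly partitions should appear in the partitions column (illustrative, output omitted):

mysql> EXPLAIN PARTITIONS SELECT count(*)
    ->   FROM ontime.ontime
    ->  WHERE FlightDate BETWEEN '1988-01-01' AND '1988-12-31'\G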

Each iteration reads an additional year of data. That is, the first query reads one year, and the last query reads 21 years of collected flight data.

For example (the final query):

SQL:

SELECT count(*)
  FROM ontime.ontime
 WHERE FlightDate BETWEEN '1988-01-01' AND '2008-12-31';


[Graph: Shard-Query is more scalable than regular MySQL as the examined date range grows]

Query times in seconds:

Period     No Shard-Query   16 workers   24 workers   32 workers
1 year          4.548          1.044        0.854        0.852
2 years         8.408          1.769        0.980        0.931
3 years        11.970          2.225        0.975        1.276
4 years        16.261          2.980        1.393        1.427
5 years        20.567          3.905        1.834        1.830
6 years        24.186          4.234        1.881        1.920
7 years        27.589          4.833        2.248        2.143
8 years        32.148          5.612        2.447        2.496
9 years        35.904          6.205        2.848        2.739
10 years       40.560          7.024        3.062        3.101
11 years       44.400          7.596        3.139        3.130
12 years       48.726          8.184        3.486        3.464
13 years       53.847          8.929        3.656        3.597
14 years       58.199          9.586        4.301        4.240
15 years       59.211         10.097        4.368        4.400
16 years       64.325         10.826        4.720        4.725
17 years       70.181         11.815        5.061        4.898
18 years       75.424         12.564        5.237        5.134
19 years       80.461         13.304        5.459        5.447
20 years       86.625         14.298        5.749        5.704
21 years       89.091         14.738        6.080        6.032

The reason that Shard-Query performs better is that it turns the OLAP query into something more like OLTP. Instead of getting one big chunk of data in one query, it runs many smaller queries each requesting significantly less data. On the other hand, MySQL 5.5.7 does not do this on its own. This is the low hanging fruit I was talking about. Even though the data is partitioned, MySQL will examine each partition serially. In the end, this means that things get slower as the query has to examine larger volumes of data.

Regarding the performance of Shard-Query, this machine has 12 real cores and 12 virtual cores, so we don’t see any advantage after increasing the number of workers past 24. The query becomes CPU bound at that point. If I needed more performance I could divide the data between two or more shards, or if possible, I could add more CPU cores. Regardless, even with a single server Shard-Query will perform much better than regular MySQL as the volume of data grows. Remember that this workload fits entirely in the buffer pool so adding CPUs will help only until we run out of memory bandwidth.

Test #2
My second test involved the next four queries on Vadim’s list. The purpose of this test is to demonstrate that Shard-Query works with GROUP BY and other constructs.

SQL:

Q1: SELECT count(*) FROM ontime.ontime WHERE FlightDate BETWEEN '1988-01-01' AND '1988-12-31';

Q2: SELECT DayOfWeek, count(*) AS c FROM ontime WHERE FlightDate BETWEEN '2000-01-01' AND '2008-12-31' GROUP BY DayOfWeek ORDER BY c DESC;

Q3: SELECT Origin, count(*) AS c FROM ontime WHERE DepDelay>10 AND FlightDate BETWEEN '2000-01-01' AND '2008-12-31' GROUP BY Origin ORDER BY c DESC LIMIT 10;

Q4: SELECT carrier, count(*) FROM ontime WHERE DepDelay>10 AND FlightDate BETWEEN '2007-01-01' AND '2007-12-31' GROUP BY carrier ORDER BY 2 DESC;


[Graph: Comparing the performance of four queries at 16 workers]

Query times in seconds:

Query   No Shard-Query   16 workers
Q1           4.661          0.860
Q2          84.362         19.355
Q3         128.603         33.271
Q4           8.686          1.929


As you can see, each of the queries runs significantly faster than just running the SQL via MySQL.

The remainder of Vadim’s queries use subqueries in the FROM clause, which Shard-Query does not yet add parallelism to. I plan to add support for those queries though, and I’ll post a follow-up when I do.
-- Full Disclosure --
The complete SQL used for these tests is available in an accompanying text file.


Entry posted by Justin Swanhart

