It is always hard to select a CPU for your own purpose. You could waste hours reviewing different benchmarks, reviews, and bloggers, and in the end, we would limit all our requirements to performance and price. For performance measuring we already have some specific metrics (e.g. in MHz to some specific tool), however, for economic comparison, it is quite hard. Mostly we are limited by our budget. Again, for our personal purposes, we are limited only with the cash in our pockets. It is easy to compare only two or three CPUs; it is required just to compare their price and performance and then create a simple bar plot and then check the results. However, what do you do if you have at least three types of CPU, a different number of CPUs cores on board, and seven different scenarios? It was a challenge to do it for performance, and for economic efficiency, it has become a nightmare. For a one-time purchase, it should be easier than for the long-term and for rent (as we do for renting CPU on AWS).
Since October 2021, there have been three performance reviews for CPUs for MySQL (mostly it was comparing ARM with others):
Comparing Graviton (ARM) Performance to Intel and AMD for MySQL (Part 1)
Comparing Graviton (ARM) Performance to Intel and AMD for MySQL (Part 2)
Comparing Graviton (ARM) Performance to Intel and AMD for MySQL (Part 3)
I thought it was hard to visualize multiple scenarios for multiple processor types and multiple CPU amounts. The real challenge appeared when it was needed to compare the economical efficiency of these CPUs. There were four attempts to write this article. As a professional, at first, I wanted to show all numbers and details, because I didn’t want to be subjective, but rather allow readers to make decisions by themselves. In this case, that variant of those articles became so long and unreadable. So I’ve decided to simplify and present it without all the details (all details graphs and numbers readers could find on our GitHub arm_cpu_comparison_m5, csv_file_with_all_data_m5, arm_cpu_comparison_c5, csv_file_with_all_data_c5, arm_cpu_comparison_m6, csv_file_with_all_data_m6).
The main goal of this post is to show a general picture of the economical efficiency of different CPUs for MySQL in comparison to each other. The main value of this post is to provide performance and economical comparison of different CPUs for MySQL (AWS only). It should help readers to see alternatives for their existing solution in performance and see if it is possible to save some money using a similar EC2 with a different CPU. Also, it will be useful for everyone for planning a migration or planning infrastructure for the long term.
This post contains a lot of technical information based on a large amount of data. It tries to show the efficiency of all instances from previous test runs. So it would be a comparison of m5.* (Intel), m5a.* (AMD). m6g.* (Graviton), c5.* (Intel), c5a.* (AMD), c5g.* (Graviton), m6i.* (Intel), and m6a.* (AMD) types of instances.
In general, there could be a lot of questions about methodology and comparison approach, and I would be glad to hear (read) other opinions and try to do it much better and more efficiently.
The main idea was to find what EC2 (with what CPU type) is more economical to use and more effective from a performance point of view for MySQL For these purposes, we would show two main indicators:
- How many queries we could run during one hour (because all of us pay hourly for EC2 instances and because AWS shows hourly prices).
- How many queries could be done for one US dollar. It was needed to create some universal economical value. So I thought that performance for one USD should be universal for economical comparison.
All other conclusions would be the results of these indicators.
The next few points are required to describe the approach of testing and our test description.
DISCLAIMER
- Tests were run on the M5.*, M6I.*, C5.* (Intel), M5a.*, C5a.*, M6a.* (AMD), M6g.*,C6g.* (Graviton) EC2 instances in the US-EAST-1 region. (List of EC2 see in the appendix). We select only the same class of instances without any additional upgrades like M5n (network) or M5d (with fast SSD). The main goal is to take the same instances with only differences in CPU types. The main goal was to calculate price efficiency with only one variable CPU type.
- Monitoring was done with Percona Monitoring and Management.
- OS: Ubuntu 20.04 LTS
- Load tool (sysbench) and target DB (MySQL) installed on the same EC2 instance. It was done to exclude any network impact. It could have minimal impact on performance results because all instances are in the same conditions, and results are comparable.
- Oracle MySQL Community Server — 8.0.26-0 — installed from official packages (it was installed from Ubuntu repositories).
- Load tool: sysbench — 1.0.18
- innodb_buffer_pool_size=80% of available RAM
- Test duration is five minutes for each thread and then 90 seconds warm down before the next iteration.
- Each test was at least run three times (to smooth outliers / to have more reproducible results). Then results were averaged for graphs.
- We are going to use a “high-concurrency” scenario definition for scenarios when the number of threads would be bigger than the number of vCPU. And “low-concurrent” scenario definition with scenarios where the number of threads would be less or equal to a number of vCPU on EC2.
- We were comparing MySQL behavior on the same class of EC2, not CPU performance. This time we just want to know how many queries could be done for one US dollar and during one payment hour.
- The post is not sponsored by any external company. It was produced using only Percona resources. We do not control what AWS uses as CPU in their instances, we only operate with what they offer.
- Some graphs are simplified, to make them easy to understand. There are too many conditions to visualize in one graph. Each one that is simplified is commented on directly.
TEST case description
Prerequisite:
To use only CPU (without disk and network) we decided to use only read queries from memory. To do this we did the following actions.
Create DB with 10 tables with 10 000 000 rows each table using sysbench tool%
sysbench oltp_read_only –threads=10 –mysql-user=sbtest –mysql-password=sbtest –table-size=10000000 –tables=10 –db-driver=mysql –mysql-db=sbtest prepare
Load all data to LOAD_buffer with reading queries , using sysbench:
sysbench oltp_read_only –time=300 –threads=10 –table-size=1000000 –mysql-user=sbtest –mysql-password=sbtest –db-driver=mysql –mysql-db=sbtest run
Test:
Run in a loop for same scenario but with different concurrency THREADs (2,4,8,16,32,64,128) on each EC2 (again using sysbench tool)
sysbench oltp_read_only –time=300 –threads=${THREAD} –table-size=100000 –mysql-user=sbtest –mysql-password=sbtest –db-driver=mysql –mysql-db=sbtest run
Result Overview
It was decided to visualize results in some specific lollipops graphs. These graphs will show at least both variables – performance per hour and performance per dollar.
Also, there are simple point plots with several dimensions. It will show not only point values but also the type of CPU (by color) and the number of vCPU (by shape).
All CPU colors will be the same throughout this article.
- Graviton – orange
- Intel – blue
- AMD – red
To simplify visualization it was decided to leave only results when the load (in the number of active threads) had been equal to the number of vCPU on an EC2 instance. In most cases, it shows the best results, due to minimal 95th percentile latency. However, there were a few exceptions, which we will talk about later.
Request Per Hour vs. Price For Equal Load
First, let’s review simple dependency price and performance at plot 1.1.
Plot 1.1. Number of requests per hour compared to EC2 instance price
Plot 1.2. Number of requests per hour compared to EC2 instance price with EC2 labels
Plot 1.2. illustrates the same information as plot 1.1. with EC2 labels.
Request Per Dollar vs. Price
Plot 2.1. Number of requests per one USD comparing with instance price for equal load
Plot 2.1. Showed an approximate number of transactions that could be generated for one USD. it shows much more interesting pictures than plot 1.1 Data shows how many queries MySQL could execute for one USD. Looks like the best economic performance shows 16 cores EC2, next goes 32 Intel and AMD, then went eight cores AMD. What is interesting here is that for one USD, two cores AMD could execute a little bit more queries than 64 cores Intels EC2 per one USD. It is definitely that 64 cores Intel could execute more queries for an hour, but not always it is required to do it in one hour.
Plot 2.2. Number of requests per one USD comparing with instance price for equal load with EC2 instance labels
RATING
Plot 3.1. All EC2 sorted by approximate amount of transactions they could generate during one hour.
Plot 3.1. Illustrates two variables. The number of transactions each EC2 could generate for one hour (purple circle) and the number of executed transactions it could execute for one USD. This rating is sorted by performance. And there is no surprise more virtual cores exist in EC2 more transactions it could generate. On the top is the latest m6i.16xlarge (Intel) and m6a.16xlarge (AMD). It is the latest “general-purpose” instance. What is interesting here is that on the third and fourth place it is seen the same Graviton vCPU but in two different instance types c5g.16xlarge (third place) and m6g.16xlarge. Looks like a “compute-optimized” instance really has some internal optimization, because on average it showed better performance than a general-purpose Graviton instance.
Plot 3.2. All EC2 sorted by approximate amount of transaction they could generate for one USD
Plot 3.2. Illustrates two variables. The number of transactions each EC2 could generate for one hour (purple circle) and the number of executed transactions it could execute for one USD. This rating is sorted by economic efficiency. And here we got a surprise: it appears that the best economic efficiency has EC2 instances with 16 and 32 vCPU on board. On the top there is m6a.4xlarge (AMD, 16 vCPU) which is a little more efficient than m6i.4xlarge (Intel, 16 vCPU), however, Intel, was a little bit performance efficient. In third place, m6i.8xlarge (Intel with 32 vCPU) was a little less economically efficient, than EC2 from second place. And Graviton is only in fourth place. However, these results are valid only, when the load was equal to the number of vCPU. Is important because performance Intel and AMD vs Graviton have absolutely different results. In most cases, Intel and AMD had maximum performance when the load was equal (additional visualization would be provided next on plot 5.1. and plot 5.2.).
How I Would Select a CPU For The Next Project
The next words can’t be an official recommendation; just the option of the person to be stuck in performance data of test results and spend a few months here.
To select some vCPU for MySQL I would be oriented on my previous research. At first, I would focus on the load. How many transactions per second (or per hour) my DB should handle. And after that, I would select the cheapest EC2 instance for that load.
For example, my DB should handle 500 Million transactions per hour. In this case, I would build some graphs with the cheapest instances from different CPU developers. And then just select the cheapest one.
Plot 4.1. Cheapest EC2 instances that can handle 500 million transactions per hour
Plot 4.1. Showed the cheapest instance that could handle 500 million transactions per hour. These results could be reached by overloading the system eight times. This load could handle EC2 with 16 vCPU and they easily could handle this load even if there would be a load with 128 active threads. It is talking only about reading translations right now. And we’re talking about an hour because most of us are oriented on hourly price on AWS, so it should be oriented on hourly load, even if it is not constant value during the hour.
However, let’s review the same example for load in transaction per second. The approach would be the same. Take your load and find the cheapest instance that could handle your load. For example, let’s take a load of 10,000 transactions per second (a kindly reminder that we are talking about read transactions).
Plot 4.2. Cheapest EC2 instances that can handle 10,000 transactions per second
Plot 4.2. Showed that 10k transactions per second could be handled by two vCPU compute-optimized instances – c5.large(Intel), c5a.large(AMD), c6g.large(Graviton). Again Graviton became cheaper. It is cheaper than Intel by 20 percent and 9 percent compared with AMD.
The short table you could find in the appendix, full one on Github (with all scenarios).
But if someone doesn’t want to build the graph or analyze the table, I’ve built graph (almost heatmap) plot 4.3 (transactions per hour) and plot 4.4 (transaction per second).
The next plots show the cheapest EC2 instances that could handle some load (on the y-axis) for some specific class of these instances depending on the number of vCPU (on the x-axis). By color, it is easy to identify the type of CPU, but it was labeled the cheapest EC2 instance in the cell. Of course, some other instance could also handle that load but in the cell label of the cheapest one.
Short summary regarding plot 4.3. and plot 4.4. Graviton is the cheapest solution in most cases, however, it can’t handle the maximal load that Intel or AMD can.
Plot 4.3. The cheapest EC2 instances for a particular load in transaction per hour depends on the number of vCPU
Plot 4.4. The cheapest EC2 instances for a particular load in transaction per second depends on the number of vCPU
In case someone wants to identify the cheapest instance for some particular load and doesn’t care about vCPU onboard – welcome to plot 4.4.1., which shows the cheapest EC2 for some load when the load in an active thread was equal to the number of vCPU on board.
Plot 4.4.1. Cheapest EC2 instance for required load with a load that was equal to the number of vCPU on an instance
Plot 4.4.2. Cheapest EC2 instance for required load with a load that was maximal during research
There are not a lot of differences between plot 4.4.1 and plot 4.4.2. However, sometimes Intel overran AMD. But in the overall picture, Graviton is still cheaper in most cases.
Important Exceptions
Next, there were a few exceptions that are required to talk about. Plots and scenarios were taken from its particular research, so it could be that they are not equal to the picture above. All details will be provided.
Plot 5.1. Graviton behavior on higher load
Plot 5.1. Illustrates that Graviton (m6g.16xlarge with 64 vCPU) showed better performance on higher loads. Previously all results were shown when loads were equal to the amount of vCPU. However, most CPUs did not show impressive performance with loads bigger than the number of vCPU. On the other hand, Graviton (most of the time) showed better performance than on equal load. An example of it you could see on the first two lines on plot 5.1. This is a very interesting feature of Gravitons, and this feature is reproducible. On plot 5.1.1. it is seen that Graviton on EC2 with 16, 32, 64 vCPU on board produces more transactions on double load than on equal load. In percentages, it is an additional boost of 10 percent when we overload Graviton EC2 compared with other CPUs, and their result could be a statistical error.
Plot 5.1.1 Performance comparison of high-performance EC2 instances with an equal and double load
Plot 5.1.2 Advantage of high concurrency instances with double load over equal load in percents
Plot 5.2. Economical efficiency of 8 and 16 cores EC2
The next interesting exception is shown in plot 5.2. In the case of different loads (not only when load max or equal to the number of vCPU), Graviton also showed the best economic efficiency compared with all other vCPUs. On plot 5.2. I left only the results with maximal load and we could see that Graviton had the best economical potential. What is more interesting is that all EC2 with 8 and 16 vCPU on board were on the top of this rating. Looks like it is more economically effective to use 8 or 16 core instances than others. If the load near 200k per second (read transaction) is fine for more than 16 cores, EC2 instances are the best economic value for you (look at plot 4.4.2).
Plot 5.4. Economical efficiency of 12 core Intel vs 16 core Graviton and AMD
Sometimes some particular load could handle instances with fewer vCPU, but even in this case, it could be more expensive than using EC2 with more vCPU. Plot 5.4. showed this example. It was the maximum load that all three CPUs could handle over 2.1 billion transactions. EC2 instances that could handle it are c6g.16xlarge (Graviton with 64 vCPU), m6i.12xlarge (Intel with 48 vCPU), and m6a.16xlarge (AMD with 64 vCPU). Here EC2 with AMD appeared to be more expensive than its competitors. Next is Intel with less vCPU onboard and cheaper price, 48 vCPU compared to AMDs 64. However, Gravitons EC2 with 64 cores on board could handle the same load while cheaper than Intel with less vCPU. It could be done few conclusions:
- The number of CPUs does not always correlate with higher performance
- It is possible to find a better price and better conditions
Final Thoughts
I’ve spent a few weeks preparing a script to run benchmark tests. It took a week to run and re-run all the benchmark tests. And it took months to write this article. Multiple attempts to describe everything lead me to this show article with a lot of limitations and ranges. It is a really difficult thing to speak about difficult things in easy matters. It is easy to compare one dimension, but it is harder to compare multiple dimensions like performance for different CPU types which depend on different numbers of vCPU and in different test cases. But even this is easier because the previous time we compared performance to performance. This time it was required to compare multi-dimensional performance to economic efficiency and prices. It is like comparing the calories of different fruits with their prices and identifying the best one, without thinking about personal tastes.
This task became quite difficult for me personally. However, I think this is a good point to start a discussion about it. I’ve started thinking about its unique comparable measurement like the number of transactions for one USD. Based on this measurement, EC2 instances with Graviton CPU become most effective in most cases. It didn’t show equal performance measurements like the latest Intel and AMD, but if it joins the economy and performance it is definitely a good choice to try it in future DB projects.
PS: On our GitHub — there are scripts to reproduce this research and more interesting graphs, that couldn’t be inserted here.
APPENDIX
Simplified table with results and list of EC2 that were used in research
VM_type |
Number_of_threads |
cpu_amount |
avg_qps |
price_usd |
cpu_type |
c5a.large |
2 |
2 |
19287 |
0.077 |
AMD |
m5a.large |
2 |
2 |
12581 |
0.086 |
AMD |
m6a.large |
2 |
2 |
23280 |
0.0864 |
AMD |
c5a.xlarge |
4 |
4 |
29305 |
0.154 |
AMD |
m5a.xlarge |
4 |
4 |
21315 |
0.172 |
AMD |
m6a.xlarge |
4 |
4 |
37681 |
0.1728 |
AMD |
c5a.2xlarge |
8 |
8 |
81575 |
0.308 |
AMD |
m5a.2xlarge |
8 |
8 |
58396 |
0.344 |
AMD |
m6a.2xlarge |
8 |
8 |
98622 |
0.3456 |
AMD |
c5a.4xlarge |
16 |
16 |
158539 |
0.616 |
AMD |
m5a.4xlarge |
16 |
16 |
113172 |
0.688 |
AMD |
m6a.4xlarge |
16 |
16 |
211681 |
0.6912 |
AMD |
m5a.8xlarge |
32 |
32 |
189879 |
1.376 |
AMD |
m6a.8xlarge |
32 |
32 |
376935 |
1.3824 |
AMD |
c5a.16xlarge |
64 |
64 |
482989 |
2.464 |
AMD |
m5a.16xlarge |
64 |
64 |
312920 |
2.752 |
AMD |
m6a.16xlarge |
64 |
64 |
612503 |
2.7648 |
AMD |
c6g.large |
2 |
2 |
17523 |
0.068 |
Graviton |
m6g.large |
2 |
2 |
17782 |
0.077 |
Graviton |
c6g.xlarge |
4 |
4 |
30836 |
0.136 |
Graviton |
m6g.xlarge |
4 |
4 |
31415 |
0.154 |
Graviton |
c6g.2xlarge |
8 |
8 |
61517 |
0.272 |
Graviton |
m6g.2xlarge |
8 |
8 |
65521 |
0.308 |
Graviton |
c6g.4xlarge |
16 |
16 |
156914 |
0.544 |
Graviton |
m6g.4xlarge |
16 |
16 |
155558 |
0.616 |
Graviton |
m6g.8xlarge |
32 |
32 |
298258 |
1.232 |
Graviton |
c6g.16xlarge |
64 |
64 |
542983 |
2.176 |
Graviton |
m6g.16xlarge |
64 |
64 |
534836 |
2.464 |
Graviton |
c5.large |
2 |
2 |
19751 |
0.085 |
Intel |
m5.large |
2 |
2 |
17836 |
0.096 |
Intel |
m6i.large |
2 |
2 |
23012 |
0.096 |
Intel |
c5.xlarge |
4 |
4 |
33891 |
0.17 |
Intel |
m5.xlarge |
4 |
4 |
33937 |
0.192 |
Intel |
m6i.xlarge |
4 |
4 |
40156 |
0.192 |
Intel |
c5.2xlarge |
8 |
8 |
81039 |
0.34 |
Intel |
m5.2xlarge |
8 |
8 |
68327 |
0.384 |
Intel |
m6i.2xlarge |
8 |
8 |
86793 |
0.384 |
Intel |
c5.4xlarge |
16 |
16 |
178295 |
0.68 |
Intel |
m5.4xlarge |
16 |
16 |
162387 |
0.768 |
Intel |
m6i.4xlarge |
16 |
16 |
225371 |
0.768 |
Intel |
m5.8xlarge |
32 |
32 |
313932 |
1.536 |
Intel |
m6i.8xlarge |
32 |
32 |
443327 |
1.536 |
Intel |
m5.16xlarge |
64 |
64 |
483716 |
3.072 |
Intel |
m6i.16xlarge |
64 |
64 |
803180 |
3.072 |
Intel |
My.cnf
My.cnf
[mysqld]
ssl=0
performance_schema=OFF
skip_log_bin
server_id = 7
# general
table_open_cache = 200000
table_open_cache_instances=64
back_log=3500
max_connections=4000
join_buffer_size=256K
sort_buffer_size=256K
# files
innodb_file_per_table
innodb_log_file_size=2G
innodb_log_files_in_group=2
innodb_open_files=4000
# buffers
innodb_buffer_pool_size=${80%_OF_RAM}
innodb_buffer_pool_instances=8
innodb_page_cleaners=8
innodb_log_buffer_size=64M
default_storage_engine=InnoDB
innodb_flush_log_at_trx_commit = 1
innodb_doublewrite= 1
innodb_flush_method= O_DIRECT
innodb_file_per_table= 1
innodb_io_capacity=2000
innodb_io_capacity_max=4000
innodb_flush_neighbors=0
max_prepared_stmt_count=1000000
bind_address = 0.0.0.0
[client]