Jan
17
2022
--

Comparing Graviton (ARM) Performance to Intel and AMD for MySQL (Part 3)

Comparing Graviton (ARM) Performance to Intel and AMD for MySQL

Comparing Graviton (ARM) Performance to Intel and AMD for MySQLRecently we published the first part (m5, m5a, m6g) and the second part (C5, C5a, C6g) of research regarding comparing Graviton ARM with AMD and Intel CPU on AWS. We selected general-purpose EC2 instances with the same configurations (amount of vCPU in the first part). In the second part, we compared compute-optimized EC2 instances with the same conditions. The main goal was to see the trend and make a general comparison of CPU types on the AWS platform only for MySQL. We didn’t set the goal to compare the performance of different CPU types. Our expertise is in MySQL performance tuning. We share research “as is” with all scripts, and anyone interested could rerun and reproduce it.
All scripts, raw logs and additional plots are available on GitHub: 

(arm_cpu_comparison_m5, csv_file_with_all_data_m5,

arm_cpu_comparison_c5csv_file_with_all_data_c5

arm_cpu_comparison_m6, csv_file_with_all_data_m6). 

We were happy to see the reactions from our Percona Blog readers to our research. And we are open to any feedback. If anyone has any ideas on updating our methodology, we would be happy to correct it. 

This post is a continuation of research based on our interest in general-purpose EC2 (and, of course, because we saw that our audience wanted to see it). The main inspiration for this research was the feedback of our readers that we compared different generations of instances, especially old AMD instances (m5a.*), and compared it with the latest Graviton instances (m6g.*).  Additionally, we also decided to use the latest Intels instances (m6i.*) too.

Today, we will talk about (AWS) the latest general-purpose EC2: M6i, M6a, M6g (complete list in appendix). 

Short Conclusion:

  1. In most cases for m6i, m6g, and m6a instances, Intel shows better performance in throughput for MySQL read transactions. However, AMD instances are pretty close to Intel’s results.
  2. Sometimes Intel could show a significant advantage — more than almost 200k rps (almost 45% better) than Graviton. However, AMD’s gap wasn’t as significant as in previous results.
    Unfortunately, we compared Graviton with others. So we didn’t concentrate on comparing AMD with Intel. 
  3. If we could say in a few words: m6i instances (with Intel)  are better in their class than other m6a, m6g instances (in performance for MySql). And this advantage starts from 5%-10% and could be up to 45% compared with other CPUs.
  4. But Gravitons instances are still cheaper

Details, or How We Got Our Short Conclusion:

Disclaimer:

  1. Tests were run  on M6i.* (Intel) , M6a.* (AMD),  M6g.*(Graviton) EC2 instances in the US-EAST-1 region. (List of EC2 see in the appendix). It was selected using only the same class of instances without additional upgrades. The main goal is to take the same instances with only differences in CPU types and identify their performance for MySQL.
  2. Monitoring was done with Percona Monitoring and Management (PMM).
  3. OS: Ubuntu 20.04 LTS 
  4. Load tool (sysbench) and target DB (MySQL) installed on the same EC2 instance.
  5. Oracle MySQL Community Server — 8.0.26-0 — installed from official packages (it was installed from Ubuntu repositories).
  6. Load tool: sysbench —  1.0.18
  7. innodb_buffer_pool_size=80% of available RAM
  8. Test duration is five minutes for each thread and then 90 seconds cool down before the next iteration. 
  9. Tests were run four times independently (to smooth outliers / to have more reproducible results). Then results were averaged for graphs. Also, graphs show min and max values that were during the test, which shows the range of variance. 
  10. We are going to use the “high-concurrency” scenario definition for scenarios when the number of threads would be bigger than the number of vCPU. And “low-concurrent” scenario definition with scenarios where the number of threads would be less or equal to a number of vCPU on EC2.
  11. We are comparing MySQL behavior on the same class of EC2, not CPU performance.
  12. We got some feedback regarding our methodology, and we would update it in the next iteration, with a different configuration, but for this particular research we leave previous to have possibility compare “apples to apples”.
  13. The post is not sponsored by any external company. It was produced using only Percona resources. We do not control what AWS uses as CPU in their instances, we only operate with what they offer. 

Test Case:

Prerequisite:

To use only CPU (without disk and network) we decided to use only read queries from memory. To do this we did the following actions. 

1. Create DB with 10 tables with 10 000 000 rows each table

sysbench oltp_read_only --threads=10 --mysql-user=sbtest --mysql-password=sbtest --table-size=10000000 --tables=10 --db-driver=mysql --mysql-db=sbtest prepare

2. Load all data to LOAD_buffer 

sysbench oltp_read_only --time=300 --threads=10 --table-size=1000000 --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run

Test:

3. Run in a loop for same scenario but  different concurrency THREAD (1,2,4,8,16,32,64,128) on each EC2 

sysbench oltp_read_only --time=300 --threads=${THREAD} --table-size=100000 --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run

Results:

Result reviewing was split into four parts:

  1. For “small” EC2 with 2, 4, and 8 vCPU
  2. For “medium” EC2 with 16  and 32 vCPU
  3. For  “large” EC2 with 48 and 64 vCPU
  4. For all scenarios to see the overall picture.

There would be four graphs for each test:

  1. Throughput (queries per second) that EC2 could perform for each scenario (number of threads)
  2. Latency 95 percentile that  EC2 could perform for each scenario, (number of threads)
  3. Relative comparing Graviton and Intel, Graviton, and AMD
  4. Absolute comparing Graviton and Intel, Graviton and AMD

Validation that all load goes to the CPU, not to DISK I/O or network, was done also using PMM (Percona Monitoring and Management). 

perona monitoring and management

pic 0.1. OS monitoring during all test stages

 

Result for EC2 with 2, 4, and 8 vCPU:

Result for EC2 with 2, 4, and 8 vCPU

Plot 1.1.  Throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 1.2.  Latencies (95 percentile) during the test for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 1.3.1 Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 1.3.2  Percentage comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 1.4.1. Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 1.4.2. Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

OVERVIEW:

  1. Based on plot 1.1. We could say that EC2 with Intel hasn’t an absolute advantage compared with Graviton and AMD. 
  2. Especially Intel and AMD, showing an advantage a little bit over – 20% over Graviton.
  3. In numbers, it is over five thousand and more requests per second. 
  4. AMD showed better results for two vCPU instances. 
  5. And it looks like in M6 class of Gravitons CPUs show the worst result compared with others.

Result for EC2 with 16 and 32 vCPU:

Result for EC2 with 16 and 32 vCPU

Plot 2.1. Throughput (queries per second) for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 2.2. Latencies (95 percentile) during the test for EC2 with 16 and 32 vCPU for scenarios with 1,2 4,8,16,32,64,128 threads

 

Plot 2.3.1 Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 2.3.2  Percentage comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 2.4.1. Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 2.4.2. Numbers comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

OVERVIEW:

  1. Plot 2.1 shows that Intel vCPU is more performance efficient. AMD is the second, and Graviton is the third. 
  2. According to plots 2.3.1 and 2.3.2, Intel is better than Graviton up to 30 % and AMD is better than Graviton up to 20%. Graviton has some exceptional performance advantage over  AMD in some scenarios. But with this configuration and this instance classes, it is an exception according to the plot 2.3.2 scenarios for 8 and 16 concurrent threads. 
  3. In real numbers, Intel could execute up to 140 k read transactions more than Graviton CPUs, and AMD could read more than 70 k read transactions than Graviton. (plot 2.1. , plot 2.4.1.)
  4. In most cases, AMD and Intel are better than Graviton EC2 instances (plot 2.1, plot 2.3.2, plot 2.4.2).

 

Result for EC2 with 48 and 64 vCPU:

Result for EC2 with 48 and 64 vCPU

Plot 3.1. Throughput (queries per second) for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 3.2. Latencies (95 percentile) during the test for EC2 with 48 and 64 vCPU for scenarios with 1,2 4,8,16,32,64,128 threads

 

Plot 3.3.1 Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 3.3.2  Percentage comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 3.4.1. Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 3.4.2. Numbers comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

OVERVIEW:

  1. For “Large” instances, Intel is still better than other vCPU. AMD was still in second place, except when Graviton stole some cases. (plot 3.1.)
  2. According to plot 3.3.1. Intel showed an advantage over Graviton up to 45%. On the other hand, AMD showed an advantage over Graviton up to 20% in the same case.
  3. There were two cases when Graviton showed some better results, but it is an exception. 
  4. In real numbers: Intel could generate over 150k-200k read transactions more than Graviton. And AMD could execute more than 70k – 130k read transactions than Graviton.

 

Full Result Overview:

Plot 4.1.1. Throughput (queries per second) – bar plot for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.1.2. Throughput (queries per second) – line plot for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.2.1. Latencies (95 percentile) during the test – bar plot for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.2.2. Latencies (95 percentile) during the test – line plot for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.3.1 Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.3.2 Percentage comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.4.1. Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.4.2. Numbers comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.5.1. Percentage comparison INTEL and AMD CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.5.2. Numbers comparison INTEL and AMD CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

Final Thoughts

  1. We compare general-purpose EC2 (m6i, m6a, m6g) instances from the AWS platform and their behavior for MySQL.  
  2. In these competitions, Graviton instances (m6g) do not provide any competitive results for MySQL. 
  3. There was some strange behavior. AMD and Intel showed their best performance when loaded (in the number of threads) are equal numbers of vCPU. According to plot 4.1.2. We could see some jump in performance when the load becomes the same as the amount of vCPU. This point was hard to see on the bar chart. But this is very interesting. However, Graviton worked more slightly without any “jumps”, and that’s why it showed exceptionally better results in some scenarios with AMD.
  4. Last point. Everyone wants to see an AMD vs Intel comparison. Plot 4.5.1 and 4.5.2.  The result – Intel is better in most cases.  And AMD was better only in one case with 2 vCPU. So the advantage of Intel compared with AMD could rise up to 96% for “large instances” (in some cases). It is unbelievable. But in most cases, this advantage is that Intel could run in 30% more MySql read transactions than AMD.
  5. It is still an open question regarding the economic efficiency of all this EC2. We would research this topic and answer this question a little bit later.

APPENDIX:

List of EC2 used in research:

CPU type Cpu info:

Model name

EC2 Memory GB Amount vCPU EC2 price per hour (USD)
AMD AMD EPYC 7R13 Processor 2650 MHz m6a.large 8 2 $0.0864
AMD m6a.xlarge 16 4 $0.1728
AMD m6a.2xlarge 32 8 $0.3456
AMD m6a.4xlarge 64 16 $0.6912
AMD m6a.8xlarge 128 32 $1.3824
AMD m6a.12xlarge 192 48 $2.0736
AMD m6a.16xlarge 256 64 $2.7648
Graviton ARMv8 AWS Graviton2 2500 MHz m6g.large 8 2 $0.077 
Graviton m6g.xlarge 16 4 $0.154
Graviton m6g.2xlarge 32 8 $0.308
Graviton m6g.4xlarge 64 16 $0.616
Graviton m6g.8xlarge 128 32 $1.232
Graviton m6g.12xlarge 192 48 $1.848
Graviton m6g.16xlarge 256 64 $2.464
Intel Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz m6i.large 8 2 $0.096000
Intel m6i.xlarge 16 4 $0.192000
Intel m6i.2xlarge 32 8 $0.384000 
Intel m6i.4xlarge 64 16 $0.768000
Intel m6i.8xlarge 128 32 $1.536000
Intel m6i.12xlarge 192 48 $2.304000
Intel m6i.16xlarge 256 64 $3.072000

 

my.cnf:

[mysqld]

ssl=0

performance_schema=OFF

skip_log_bin

server_id = 7




# general

table_open_cache = 200000

table_open_cache_instances=64

back_log=3500

max_connections=4000

 join_buffer_size=256K

 sort_buffer_size=256K




# files

innodb_file_per_table

innodb_log_file_size=2G

innodb_log_files_in_group=2

innodb_open_files=4000




# buffers

innodb_buffer_pool_size=${80%_OF_RAM}

innodb_buffer_pool_instances=8

innodb_page_cleaners=8

innodb_log_buffer_size=64M




default_storage_engine=InnoDB

innodb_flush_log_at_trx_commit  = 1

innodb_doublewrite= 1

innodb_flush_method= O_DIRECT

innodb_file_per_table= 1

innodb_io_capacity=2000

innodb_io_capacity_max=4000

innodb_flush_neighbors=0

max_prepared_stmt_count=1000000 

bind_address = 0.0.0.0

[client]

 

Oct
25
2021
--

Comparing Graviton (ARM) Performance to Intel and AMD for MySQL (Part 2)

Comparing Graviton Performance to Intel and AMD for MySQL

Comparing Graviton Performance to Intel and AMD for MySQLRecently we published the first part of research comparing Graviton (ARM) with AMD and Intel CPU on AWS. In the first part, we selected general-purpose EC2 instances with the same configurations (amount of vCPU).  The main goal was to see the trend and make a general comparison of CPU types on the AWS platform only for MySQL. We didn’t set the goal to compare the performance of different CPU types. Our expertise is in MySQL performance tuning. We share research “as is” with all scripts, and anyone interested could rerun and reproduce it.
All scripts, raw logs and additional plots are available on GitHub: (2021_10_arm_cpu_comparison_c5csv_file_with_all_data).

We were happy to see the reactions from our Percona Blog readers to our research, and we are open to any feedback. If anyone has any ideas on updating our methodology, we would be happy to correct it.

This post is a continuation of research based on our interest in compute-optimized EC2 (and, of course, because we saw that our audience wanted to see it). Today, we will talk about (AWS) Compute Optimized EC2: C5, C5a, C6g (complete list in appendix).

Next time we are going to share our findings on the economic efficiency of m5 and c5 instances.

Short Conclusion:

  1. In most cases for c5, c5a, and c6g instances, Intel shows better performance in throughput for MySQL read transactions.
  2. Sometimes Intel could show a significant advantage — more than almost 100k rps than other CPUs.
  3. If we could say in a few words: c5 instances (with Intel)  are better in their class than other c5a, c6g instances (in performance). And this advantage starts from 5% and could be up to 40% compared with other CPUs.

Details and Disclaimer:

  1. Tests were run  on C5.* (Intel) , C5a.* (AMD),  C6g.*(Graviton) EC2 instances in the US-EAST-1 region. (List of EC2 see in the appendix.)
  2. Monitoring was done with PMM
  3. OS: Ubuntu 20.04 TLS 
  4. Load tool (sysbench) and target DB (MySQL) installed on the same EC2 instance.
  5. Oracle MySQL Community Server — 8.0.26-0 — installed from official packages.
  6. Load tool: sysbench —  1.0.18
  7. innodb_buffer_pool_size=80% of available RAM
  8. Test duration is 5 minutes for each thread and then 90 seconds warm down before the next iteration. 
  9. Tests were run 3 times (to smooth outliers / to have more reproducible results). Then results were averaged for graphs. 
  10. We are going to use a “high-concurrency” scenario definition for scenarios when the number of threads would be bigger than the number of vCPU. And “low-concurrent” scenario definition with scenarios where the number of threads would be less or equal to a number of vCPU on EC2.
  11. We are comparing MySQL behavior on the same class of EC2, not CPU performance.

Test Case:

Prerequisite:

1. Create DB with 10 tables with 10 000 000 rows each table

sysbench oltp_read_only --threads=10 --mysql-user=sbtest --mysql-password=sbtest --table-size=10000000 --tables=10 --db-driver=mysql --mysql-db=sbtest prepare

2. Load all data to LOAD_buffer

sysbench oltp_read_only --time=300 --threads=10 --table-size=1000000 --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run

 

3. Test:

Run in a loop for same scenario but  different concurrency THREAD (1,2,4,8,16,32,64,128) on each EC2 

sysbench oltp_read_only --time=300 --threads=${THREAD} --table-size=100000 --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run

Results:

Result reviewing was split into three parts:

  1. For “small” EC2 with 2, 4, and 8 vCPU
  2. For “medium” EC2 with 16 and for  “large” EC2 with 48 and 64 vCPU (AWS does not have C5 EC2 with 64 vCPU )
  3. For all scenarios to see the overall picture.

There would be four graphs for each test:

  1. Throughput (queries per second) that EC2 could perform for each scenario (amount of threads).
  2. Latency 95 percentile that  EC2 could perform for each scenario (amount of threads).
  3. Relative comparing Graviton and Intel, Graviton, and AMD.
  4. Absolute comparing Graviton and Intel, Graviton, and AMD.

Validation that all load goes to CPU, not to DISK I/O, was done also using PMM (Percona Monitoring and Management). 

pic 0.1. OS monitoring during all test stages (picture is for example)

Result for EC2 with 2, 4, and 8 vCPU:

plot 1.1. Throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 1.2. Latencies (95 percentile) during the test for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 1.3.1 Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 1.3.2  Percentage comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 1.4.1. Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 1.4.2. Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

OVERVIEW:

  1. Based on plot 1.1, we could say that EC2 with Intel has an absolute advantage compared with Graviton and AMD. 
  2. This advantage in most scenarios fluctuates between 10% – 20%.
  3. In numbers, it is over 3,000 requests per second. 
  4. There is one scenario when Graviton becomes better EC2 with 8 vCPU (c6g.2xlarge). But the advantage is so tiny (near 2%) that it could be a statistical error. So we can’t say that benefits are relevant.

Result for EC2 with 16, 48 and 64 vCPU:

plot 2.1.  Throughput (queries per second)  for EC2 with 16, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 2.2.  Latencies (95 percentile) during the test for EC2 with 16, 48 and 64  vCPU for scenarios with 1,2 4,8,16,32,64,128 threads

 

plot 2.3.1 Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 16, 48 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 2.3.2  Percentage comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 16, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 2.4.1. Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 16, 48 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 2.4.2. Numbers comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 16, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

OVERVIEW:

  1. Plot 2.1 shows that it has an advantage over the other vCPU in our conditions (there are no EC2 with 64 Intel’s vCPU to have a full picture of comparison). 
  2. This advantage could be near 20% for EC2 with 16 vCPU and up to 40% for EC2 with 48 vCPU. However, it is possible to see that this advantage decreases with an increasing amount of threads. 
  3. In real numbers, Intel could execute up to 100 k read transactions more than other CPUs (plot 2.1. , plot 2.4.1).
  4. On the other hand, in one high-performance scenario, we could see a small advantage (3%) of Graviton. However, it is so small that it could be a statistical error again (plot  2.3.1.).
  5. In most cases, Graviton shows better results than AMD (plot 2.1, plot 2.3.2, plot 2.4.2).

Whole Result Overview:

plot 3.1. Throughput (queries per second) for EC2 with 2, 4, 8, 16, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 3.2.  Latencies (95 percentile) during the test for EC2 with 2, 4, 8, 16, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 3.3.1. Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16 and 48 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 3.3.2. Percentage comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 3.4.1. Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16 AND 48 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 3.4.2. Numbers comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Final Thoughts

  1. We compare compute-optimized ec2 (c5, c5a, c6g) instances from the AWS platform and their behavior for MySQL.  
  2. It is still an open question regarding the economic efficiency of all this EC2. We will research this topic and answer this question a little bit later.
  3. In these tests, AMD does not provide any competitive results for MySQL. It is possible that in other tasks, it could show much better and competitive results.

APPENDIX:

List of EC2 used in research:

CPU type EC2 Amount vCPU Memory GB EC2 price per hour (USD)
AMD c5a.large 2 4 0.077
AMD c5a.xlarge 4 8 0.154
AMD c5a.2xlarge 8 16 0.308
AMD c5a.4xlarge 16 32 0.616
AMD c5a.12xlarge 48 96 1.848
AMD c5a.16xlarge 64 128 2.464
Intel c5.large 2 4 0.085
Intel c5.xlarge 4 8 0.170
Intel c5.2xlarge 8 16 0.340
Intel c5.4xlarge 16 32 0.680
Intel c5.12xlarge 48 96 2.040
Graviton c6g.large 2 4 0.068
Graviton c6g.xlarge 4 8 0.136
Graviton c6g.2xlarge 8 16 0.272
Graviton c6g.4xlarge 16 32 0.544
Graviton c6g.12xlarge 48 96 1.632
Graviton c6g.16xlarge 64 128 2.176

 

my.cnf:

[mysqld]
ssl=0
performance_schema=OFF
skip_log_bin
server_id = 7

# general
table_open_cache = 200000
table_open_cache_instances=64
back_log=3500
max_connections=4000
 join_buffer_size=256K
 sort_buffer_size=256K

# files
innodb_file_per_table
innodb_log_file_size=2G
innodb_log_files_in_group=2
innodb_open_files=4000

# buffers
innodb_buffer_pool_size=${80%_OF_RAM}
innodb_buffer_pool_instances=8
innodb_page_cleaners=8
innodb_log_buffer_size=64M

default_storage_engine=InnoDB
innodb_flush_log_at_trx_commit  = 1
innodb_doublewrite= 1
innodb_flush_method= O_DIRECT
innodb_file_per_table= 1
innodb_io_capacity=2000
innodb_io_capacity_max=4000
innodb_flush_neighbors=0
max_prepared_stmt_count=1000000 
bind_address = 0.0.0.0
[client]

 

Oct
15
2021
--

Comparing Graviton (ARM) Performance to Intel and AMD for MySQL

Comparing Graviton (ARM) performance to Intel and AMD for MySQL

Comparing Graviton (ARM) performance to Intel and AMD for MySQLRecently, AWS presented its own CPU on ARM architecture for server solutions.

It was Graviton. As a result, they update some lines of their EC2 instances with new postfix “g” (e.g. m6g.small, r5g.nano, etc.). In their review and presentation, AWS showed impressive results that it is faster in some benchmarks up to 20 percent. On the other hand, some reviewers said that Graviton does not show any significant results and, in some cases, showed fewer performance results than Intel.

We decided to investigate it and do our research regarding Graviton performance, comparing it with other CPUs (Intel and AMD) directly for MySQL.

Disclaimer

  1. The test is designed to be CPU bound only, so we will use a read-only test and make sure there is no I/O activity during the test.
  2. Tests were run  on m5.* (Intel) , m5a.* (AMD),  m6g.*(Graviton) EC2 instances in the US-EAST-1 region. (List of EC2 see in the appendix).
  3. Monitoring was done with Percona Monitoring and Management (PMM).
  4. OS: Ubuntu 20.04 TLS.
  5. Load tool (sysbench) and target DB (MySQL) installed on the same EC2 instance.
  6. MySQL– 8.0.26-0 — installed from official packages.
  7. Load tool: sysbench —  1.0.18.
  8. innodb_buffer_pool_size=80% of available RAM.
  9. Test duration is five minutes for each thread and then 90 seconds warm down before the next iteration. 
  10. Tests were run three times (to smooth outliers or have more reproducible results), then results were averaged for graphs. 
  11. We are going to use high concurrency scenarios for those scenarios when the number of threads would be bigger than the number of vCPU. And low concurrent scenario with scenarios where the number of threads would be less or equal to a number of vCPU on EC2.
  12. Scripts to reproduce results on our GitHub.

Test Case

Prerequisite:

1. Create DB with 10 tables with 10 000 000 rows each table

sysbench oltp_read_only --threads=10 --mysql-user=sbtest --mysql-password=sbtest --table-size=10000000 --tables=10 --db-driver=mysql --mysql-db=sbtest prepare

2. Load all data to LOAD_buffer

sysbench oltp_read_only --time=300 --threads=10 --table-size=1000000 --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run

Test:

Run in a loop for same scenario but  different concurrency THREAD (1,2,4,8,16,32,64,128) on each EC2.

sysbench oltp_read_only --time=300 --threads=${THREAD} --table-size=100000 --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run

Results:

Result reviewing was split into 3 parts:

  1. for “small” EC2 with 2, 4, and 8 vCPU
  2. for “medium” EC2 with 16 and 32 vCPU
  3. for  “large” EC2 with 48 and 64 vCPU

This “small”, “medium”, and “large” splitting is just synthetic names for better reviewing depends on the amount of vCPu per EC2
There would be four graphs for each test:

  1. Throughput (Queries per second) that EC2 could perform for each scenario (amount of threads)
  2. Latency 95 percentile that  EC2 could perform for each scenario (amount of threads)
  3. Relative comparing Graviton and Intel
  4. Absolute comparing Graviton and Intel

Validation that all load goes to CPU, not to DISK I/O, was done also using PMM (Percona Monitoring and Management). 

pic 0.1. OS monitoring during all test stages

pic 0.1 – OS monitoring during all test stages

From pic.0.1, we can see that there was no DISK I/O activity during tests, only CPU activity. The main activity with disks was during the DB creation stage.

Result for EC2 with 2, 4, and 8 vCPU

plot 1.1. Throughput (queries per second) for EC2 with 2, 4, and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 1.2.  Latencies (95 percentile) during the test for EC2 with 2, 4, and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 1.3.  Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4, and 8  vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 1.4.  Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4, and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

OVERVIEW:

  1. AMD has the biggest latencies in all scenarios and for all EC2 instances. We won’t repeat this information in all future overviews, and this is the reason why we exclude it in comparing with other CPUs in percentage and numbers values (in plots 1.3 and 1.4, etc).
  2. Instances with two and four vCPU Intel show some advantage for less than 10 percent in all scenarios.
  3. However, an instance with 8 vCPU intel shows an advantage only on scenarios with threads that less or equal amount of vCPU on EC2.
  4. On EC2 with eight vCPU, Graviton started to show an advantage. It shows some good results in scenarios when the number of threads is more than the amount of vCPU on EC2. It grows up to 15 percent in high-concurrency scenarios with 64 and 128 threads, which are 8 and 16 times bigger than the amount of vCPU available for performing.
  5. Graviton start showing an advantage on EC2 with eight vCPU and with scenarios when threads are more than vCPU amount. This feature would appear in all future scenarios – more load than CPU, better result it shows.

 

Result for EC2 with 16 and 32 vCPU

plot 2.1.  Throughput (queries per second)  for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 1.2.  Latencies (95 percentile) during the test for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 2.3.  Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 2.4.  Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

OVERVIEW:

  1. In scenarios with the same load for ec2 with 16 and 32 vCPU, Graviton is continuing to have advantages when the amount of threads is more significant than the amount of available vCPU on instances.
  2. Graviton shows an advantage of up to 10 percent in high concurrency scenarios. However, Intel has up to 20 percent in low concurrency scenarios.
  3. In high-concurrency scenarios, Graviton could show an incredible difference in the number of (read) transactions per second up to 30 000 TPS.

Result for EC2 with 48 and 64 vCPU

plot 3.1.  Throughput (queries per second)  for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 3.2.  Latencies (95 percentile) during the test for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 3.3.  Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 3.4.  Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

OVERVIEW:

  1. It looks like Intel shows a significant advantage in most scenarios when its number of threads is less or equal to the amount of vCPU. It seems Intel is really good for such kind of task. When it has some additional free vCPU, it would be better, and this advantage could be up to 35 percent.
  2. However, Graviton shows outstanding results when the amount of threads is larger than the amount of vCPU. It shows an advantage from 5 to 14 percent over Intel.
  3. In real numbers, Graviton advantage could be up to 70 000 transactions per second over Intel performance in high-concurrency scenarios.

Total Result Overview

 

plot 4.2.  Latencies (95 percentile) during the test for EC2 with 2,4,8,16,32,48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 4.3.  Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2,4,8,16,32,48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 4.4.  Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2,4,8,16,32,48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

Conclusions

  1. ARM CPUs show better results on EC2 with more vCPU and with higher load, especially in high-concurrency scenarios.
  2. As a result of small EC2 instances and small load, ARM CPUs show less impressive performance. So we can’t see its benefits comparing with Intel EC2
  3. Intel is still the leader in the area of low-concurrency scenarios. And it is definitely winning on EC2 with a small amount of vCPU.
  4. AMD does not show any competitive results in all cases.   

Final Thoughts

  1. AMD — we have a lot of questions about EC2 instances on AMD. So it would be a good idea to check what was going on that EC2 during the test and check the general performance of CPUs on those EC2.
  2. We found out that in some specific conditions, Intel and Graviton could compete with each other. But the other side of the coin is economical. What is cheaper to use in each situation? The next article will be about it. 
  3. It would be a good idea to try to use EC2 with Graviton for real high-concurrency DB.  
  4. It seems it needs to run some additional scenarios with 256 and 512 threads to check the hypothesis that Graviton could work better when threads are more than vCPU.

APPENDIX:

List of EC2 used in research:

CPU type EC2 EC2 price per hour (USD) vCPU RAM
Graviton m6g.large 0.077 2 8 Gb
Graviton m6g.xlarge 0.154 4 16 Gb
Graviton m6g.2xlarge 0.308 8 32 Gb
Graviton m6g.4xlarge 0.616 16 64 Gb
Graviton m6g.8xlarge 1.232 32 128 Gb
Graviton m6g.12xlarge 1.848 48 192 Gb
Graviton m6g.16xlarge 2.464 64 256 Gb
Intel m5.large 0.096 2 8 Gb
Intel m5.xlarge 0.192 4 16 Gb
Intel m5.2xlarge 0.384 8 32 Gb
Intel m5.4xlarge 0.768 16 64 Gb
Intel m5.8xlarge 1.536 32 128 Gb
Intel m5.12xlarge 2.304 48 192 Gb
Intel m5.16xlarge 3.072 64 256 Gb
AMD m5a.large 0.086 2 8 Gb
AMD m5a.xlarge 0.172 4 16 Gb
AMD m5a.2xlarge 0.344 8 32 Gb
AMD m5a.4xlarge 0.688 16 64 Gb
AMD m5a.8xlarge 1.376 32 128 Gb
AMD m5a.12xlarge 2.064 48 192 Gb
AMD m5a.16xlarge 2.752 64 256 Gb

 

my.cnf

my.cnf:
[mysqld]
ssl=0
performance_schema=OFF
skip_log_bin
server_id = 7

# general
table_open_cache = 200000
table_open_cache_instances=64
back_log=3500
max_connections=4000
 join_buffer_size=256K
 sort_buffer_size=256K

# files
innodb_file_per_table
innodb_log_file_size=2G
innodb_log_files_in_group=2
innodb_open_files=4000

# buffers
innodb_buffer_pool_size=${80%_OF_RAM}
innodb_buffer_pool_instances=8
innodb_page_cleaners=8
innodb_log_buffer_size=64M

default_storage_engine=InnoDB
innodb_flush_log_at_trx_commit  = 1
innodb_doublewrite= 1
innodb_flush_method= O_DIRECT
innodb_file_per_table= 1
innodb_io_capacity=2000
innodb_io_capacity_max=4000
innodb_flush_neighbors=0
max_prepared_stmt_count=1000000 
bind_address = 0.0.0.0
[client]

 

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com