Jan
17
2022
--

Comparing Graviton (ARM) Performance to Intel and AMD for MySQL (Part 3)


Comparing Graviton (ARM) Performance to Intel and AMD for MySQLRecently we published the first part (m5, m5a, m6g) and the second part (C5, C5a, C6g) of research regarding comparing Graviton ARM with AMD and Intel CPU on AWS. We selected general-purpose EC2 instances with the same configurations (amount of vCPU in the first part). In the second part, we compared compute-optimized EC2 instances with the same conditions. The main goal was to see the trend and make a general comparison of CPU types on the AWS platform only for MySQL. We didn’t set the goal to compare the performance of different CPU types. Our expertise is in MySQL performance tuning. We share research “as is” with all scripts, and anyone interested could rerun and reproduce it.
All scripts, raw logs and additional plots are available on GitHub: 

(arm_cpu_comparison_m5, csv_file_with_all_data_m5,

arm_cpu_comparison_c5, csv_file_with_all_data_c5,

arm_cpu_comparison_m6, csv_file_with_all_data_m6). 

We were happy to see the reactions of our Percona Blog readers to this research, and we are open to any feedback. If anyone has ideas for improving our methodology, we would be happy to incorporate them. 

This post is a continuation of the research, driven by our own interest in general-purpose EC2 (and, of course, because we saw that our audience wanted it). The main inspiration was readers' feedback that we had compared different generations of instances, in particular older AMD instances (m5a.*), against the latest Graviton instances (m6g.*). Additionally, we decided to include the latest Intel instances (m6i.*) as well.

Today, we will talk about the latest AWS general-purpose EC2 instances: M6i, M6a, and M6g (complete list in the appendix). 

Short Conclusion:

  1. In most cases for m6i, m6g, and m6a instances, Intel shows better throughput for MySQL read transactions. However, AMD instances are pretty close to Intel’s results.
  2. Sometimes Intel shows a significant advantage of almost 200k more requests per second (almost 45% better) than Graviton. However, AMD’s gap was not as significant as in previous results. Since we compared Graviton against the others, we did not concentrate on comparing AMD with Intel.
  3. In a few words: m6i instances (with Intel) are better in their class than the m6a and m6g instances (in MySQL performance). The advantage starts at 5%-10% and can reach up to 45% compared with the other CPUs.
  4. But Graviton instances are still cheaper.

Details, or How We Got Our Short Conclusion:

Disclaimer:

  1. Tests were run on m6i.* (Intel), m6a.* (AMD), and m6g.* (Graviton) EC2 instances in the us-east-1 region (the list of EC2 instances is in the appendix). We selected only instances of the same class, without additional upgrades. The main goal was to take the same instances, differing only in CPU type, and identify their performance for MySQL.
  2. Monitoring was done with Percona Monitoring and Management (PMM).
  3. OS: Ubuntu 20.04 LTS 
  4. Load tool (sysbench) and target DB (MySQL) installed on the same EC2 instance.
  5. Oracle MySQL Community Server 8.0.26-0 was installed from the official packages in the Ubuntu repositories.
  6. Load tool: sysbench 1.0.18
  7. innodb_buffer_pool_size=80% of available RAM
  8. Test duration is five minutes for each thread count, followed by a 90-second cool down before the next iteration. 
  9. Tests were run four times independently (to smooth outliers and to get more reproducible results), and the results were averaged for the graphs. The graphs also show the min and max values observed during the tests, which indicates the range of variance. 
  10. We use the “high-concurrency” scenario definition for scenarios where the number of threads is greater than the number of vCPUs, and the “low-concurrency” definition for scenarios where the number of threads is less than or equal to the number of vCPUs on the EC2 instance.
  11. We are comparing MySQL behavior on the same class of EC2, not CPU performance.
  12. We got some feedback regarding our methodology, and we will update it in the next iteration with a different configuration, but for this particular research we kept the previous one so that we can compare “apples to apples”.
  13. The post is not sponsored by any external company. It was produced using only Percona resources. We do not control what AWS uses as CPU in their instances, we only operate with what they offer. 

Test Case:

Prerequisite:

To stress only the CPU (and not disk or network), we decided to use only read queries served from memory. To do this, we performed the following actions. 

1. Create a database with 10 tables of 10,000,000 rows each:

sysbench oltp_read_only --threads=10 --mysql-user=sbtest --mysql-password=sbtest --table-size=10000000 --tables=10 --db-driver=mysql --mysql-db=sbtest prepare

2. Load all the data into the buffer pool:

sysbench oltp_read_only --time=300 --threads=10 --table-size=1000000 --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run

Test:

3. Run the same scenario in a loop with different concurrency levels THREAD (1, 2, 4, 8, 16, 32, 64, 128) on each EC2 instance (a sketch of the full loop is shown after the command):

sysbench oltp_read_only --time=300 --threads=${THREAD} --table-size=100000 --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run
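A minimal sketch of that loop, assuming the four independent runs and 90-second cool down described in the disclaimer (the file names and wrapper are illustrative; the actual scripts are in the GitHub repository linked above):

#!/bin/bash
# Illustrative test loop: four independent runs, all thread counts, 90s cool down.
for run in 1 2 3 4; do
  for THREAD in 1 2 4 8 16 32 64 128; do
    sysbench oltp_read_only --time=300 --threads=${THREAD} --table-size=100000 \
      --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql \
      --mysql-db=sbtest run > "oltp_read_only_run${run}_threads${THREAD}.log"
    sleep 90
  done
done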

Results:

Result reviewing was split into four parts:

  1. For “small” EC2 with 2, 4, and 8 vCPU
  2. For “medium” EC2 with 16  and 32 vCPU
  3. For  “large” EC2 with 48 and 64 vCPU
  4. For all scenarios to see the overall picture.

There are four graphs for each test:

  1. Throughput (queries per second) that the EC2 instance could deliver for each scenario (number of threads)
  2. Latency (95th percentile) during the test for each scenario (number of threads)
  3. Relative (percentage) comparison of Graviton vs. Intel and Graviton vs. AMD (see the sketch after this list for how the comparisons are derived)
  4. Absolute (numbers) comparison of Graviton vs. Intel and Graviton vs. AMD
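As a rough illustration of how the relative and absolute comparisons can be derived from two throughput figures (assuming Graviton is taken as the baseline; the numbers below are placeholders, not measured results):

# Hypothetical throughput values, in queries per second
GRAVITON_QPS=100000
INTEL_QPS=130000

# Absolute difference and relative difference versus the Graviton baseline
awk -v g="$GRAVITON_QPS" -v i="$INTEL_QPS" 'BEGIN {
  printf "absolute: %d qps, relative: %.1f%%\n", i - g, (i - g) / g * 100
}'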

Validation that all of the load went to the CPU, and not to disk I/O or the network, was also done using PMM (Percona Monitoring and Management). 


pic 0.1. OS monitoring during all test stages

 

Result for EC2 with 2, 4, and 8 vCPU:


Plot 1.1.  Throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 1.2.  Latencies (95 percentile) during the test for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 1.3.1 Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 1.3.2  Percentage comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 1.4.1. Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 1.4.2. Numbers comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

OVERVIEW:

  1. Based on plot 1.1, we can say that EC2 instances with Intel do not have an absolute advantage over Graviton and AMD. 
  2. Still, Intel and AMD show an advantage of a little over 20% compared to Graviton.
  3. In absolute numbers, that is more than five thousand requests per second. 
  4. AMD showed better results for the two-vCPU instances. 
  5. It looks like in the M6 class, Graviton CPUs show the worst results compared with the others.

Result for EC2 with 16 and 32 vCPU:


Plot 2.1. Throughput (queries per second) for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 2.2. Latencies (95 percentile) during the test for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 2.3.1 Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 2.3.2  Percentage comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 2.4.1. Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 2.4.2. Numbers comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

OVERVIEW:

  1. Plot 2.1 shows that the Intel vCPUs are the most performance-efficient; AMD is second, and Graviton is third. 
  2. According to plots 2.3.1 and 2.3.2, Intel is better than Graviton by up to 30% and AMD is better than Graviton by up to 20%. Graviton has an exceptional performance advantage over AMD in some scenarios, but with this configuration and these instance classes it is an exception, seen in plot 2.3.2 for the scenarios with 8 and 16 concurrent threads. 
  3. In real numbers, Intel could execute up to 140k more read transactions than the Graviton CPUs, and AMD up to 70k more read transactions than Graviton (plot 2.1, plot 2.4.1).
  4. In most cases, AMD and Intel are better than the Graviton EC2 instances (plot 2.1, plot 2.3.2, plot 2.4.2).

 

Result for EC2 with 48 and 64 vCPU:


Plot 3.1. Throughput (queries per second) for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 3.2. Latencies (95 percentile) during the test for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 3.3.1 Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 3.3.2  Percentage comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 3.4.1. Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 3.4.2. Numbers comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

OVERVIEW:

  1. For the “large” instances, Intel is still better than the other vCPUs. AMD is still in second place, except for a few cases where Graviton took it (plot 3.1).
  2. According to plot 3.3.1, Intel showed an advantage over Graviton of up to 45%. AMD showed an advantage over Graviton of up to 20% in the same cases.
  3. There were two cases where Graviton showed somewhat better results, but they are exceptions. 
  4. In real numbers: Intel could generate 150k-200k more read transactions than Graviton, and AMD could execute 70k-130k more read transactions than Graviton.

 

Full Result Overview:

Plot 4.1.1. Throughput (queries per second) – bar plot for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.1.2. Throughput (queries per second) – line plot for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.2.1. Latencies (95 percentile) during the test – bar plot for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.2.2. Latencies (95 percentile) during the test – line plot for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.3.1 Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.3.2 Percentage comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.4.1. Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.4.2. Numbers comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.5.1. Percentage comparison INTEL and AMD CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Plot 4.5.2. Numbers comparison INTEL and AMD CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16, 32, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

Final Thoughts

  1. We compared general-purpose EC2 instances (m6i, m6a, m6g) on the AWS platform and their behavior for MySQL.  
  2. In this comparison, the Graviton instances (m6g) do not provide competitive results for MySQL. 
  3. There was some interesting behavior: AMD and Intel showed their best performance when the load (number of threads) equaled the number of vCPUs. According to plot 4.1.2, we can see a jump in performance when the load matches the number of vCPUs; this point was hard to see on the bar chart, but it is very interesting. Graviton, however, scaled more smoothly, without any “jumps”, which is why it showed exceptionally better results than AMD in some scenarios.
  4. One last point: everyone wants to see an AMD vs. Intel comparison (plots 4.5.1 and 4.5.2). The result: Intel is better in most cases, and AMD was better only in one case with 2 vCPU. The advantage of Intel over AMD could rise up to 96% for “large” instances (in some cases), which is remarkable, but in most cases the advantage is that Intel can run about 30% more MySQL read transactions than AMD.
  5. The economic efficiency of all these EC2 instances is still an open question. We will research this topic and answer it a little later.

APPENDIX:

List of EC2 used in research:

Columns: CPU type, CPU model name, EC2 instance, memory (GB), number of vCPUs, EC2 price per hour (USD).
AMD AMD EPYC 7R13 Processor 2650 MHz m6a.large 8 2 $0.0864
AMD m6a.xlarge 16 4 $0.1728
AMD m6a.2xlarge 32 8 $0.3456
AMD m6a.4xlarge 64 16 $0.6912
AMD m6a.8xlarge 128 32 $1.3824
AMD m6a.12xlarge 192 48 $2.0736
AMD m6a.16xlarge 256 64 $2.7648
Graviton ARMv8 AWS Graviton2 2500 MHz m6g.large 8 2 $0.077 
Graviton m6g.xlarge 16 4 $0.154
Graviton m6g.2xlarge 32 8 $0.308
Graviton m6g.4xlarge 64 16 $0.616
Graviton m6g.8xlarge 128 32 $1.232
Graviton m6g.12xlarge 192 48 $1.848
Graviton m6g.16xlarge 256 64 $2.464
Intel Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz m6i.large 8 2 $0.096000
Intel m6i.xlarge 16 4 $0.192000
Intel m6i.2xlarge 32 8 $0.384000 
Intel m6i.4xlarge 64 16 $0.768000
Intel m6i.8xlarge 128 32 $1.536000
Intel m6i.12xlarge 192 48 $2.304000
Intel m6i.16xlarge 256 64 $3.072000

 

my.cnf:

[mysqld]

ssl=0

performance_schema=OFF

skip_log_bin

server_id = 7




# general

table_open_cache = 200000

table_open_cache_instances=64

back_log=3500

max_connections=4000

 join_buffer_size=256K

 sort_buffer_size=256K




# files

innodb_file_per_table

innodb_log_file_size=2G

innodb_log_files_in_group=2

innodb_open_files=4000




# buffers

innodb_buffer_pool_size=${80%_OF_RAM}

innodb_buffer_pool_instances=8

innodb_page_cleaners=8

innodb_log_buffer_size=64M




default_storage_engine=InnoDB

innodb_flush_log_at_trx_commit  = 1

innodb_doublewrite= 1

innodb_flush_method= O_DIRECT

innodb_file_per_table= 1

innodb_io_capacity=2000

innodb_io_capacity_max=4000

innodb_flush_neighbors=0

max_prepared_stmt_count=1000000 

bind_address = 0.0.0.0

[client]
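For reference, the ${80%_OF_RAM} placeholder was filled in per instance; a minimal sketch of how the value can be computed (an illustrative snippet, not the exact provisioning script used for the tests):

# Compute 80% of total RAM in MB and print the resulting setting
TOTAL_MB=$(awk '/MemTotal/ { printf "%d", $2 / 1024 }' /proc/meminfo)
BUFFER_POOL_MB=$(( TOTAL_MB * 80 / 100 ))
echo "innodb_buffer_pool_size = ${BUFFER_POOL_MB}M"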

 

Apr
28
2021
--

Compiling Percona XtraBackup for ARM


This blog post will show how to compile the Percona XtraBackup (PXB) tool for ARM. For this, we are going to use an AWS EC2 ARM instance with Ubuntu 20.04 (Focal Fossa).

The motivation for this was born from my interest in the new generation of ARM processors and whether they are a viable option for the future. For security reasons, I do not recommend installing all the packages necessary to compile XtraBackup in a production environment; instead, you can have a dedicated “compiling” server for this purpose and then move the binaries around.

Machine Configuration

For this blog post, I picked a c6g.2xlarge instance. The machine has the following hardware configuration:

Instance Size vCPU Memory (GiB) Instance Storage (GiB) Network Bandwidth (Gbps) EBS Bandwidth (Mbps)
c6g.2xlarge 8 16 EBS-Only Up to 10 Up to 4,750

And the OS specifications:

$ cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Note that the hardware capacity does not affect the final result, but having a higher number of cores allows us to compile PXB faster. With everything set, let's start the compiling process.

Installing Dependencies

First, we will install and upgrade some packages necessary for this task:

# apt-get -y install dirmngr
# apt-get update -y
# apt-get -y install cmake
# apt-get -y install lsb-release wget
# apt-get -y install build-essential flex bison automake autoconf \
libtool cmake libaio-dev mysql-client libncurses-dev zlib1g-dev \
libev-dev libcurl4-gnutls-dev vim-common
# apt-get install git
# apt-get -y install devscripts libnuma-dev openssl libssl-dev libgcrypt20-dev

Cloning XtraBackup Repository

Now, we will get the XtraBackup code in GitHub:

# mkdir -p /compile_xtrabackup
# cd /compile_xtrabackup/
# git clone https://github.com/percona/percona-xtrabackup.git
# cd percona-xtrabackup
# git checkout percona-xtrabackup-8.0.23-16

Note that I’m picking the percona-xtrabackup-8.0.23-16  version. The reason is that the latest MySQL release is MySQL 8.0.24, and PXB will work only with equal or lower versions than the one we currently selected. In this case, it is MySQL 8.0.23.
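If you want to pick a different release, you can list the available version tags before checking one out (a convenience, not part of the original steps):

# List the XtraBackup 8.0 release tags available in the repository
git tag -l "percona-xtrabackup-8.0.*"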

Compiling

Ok, we are ready to begin our first compilation! Let’s fire the necessary commands:

cd /compile_xtrabackup/percona-xtrabackup
mkdir build && cd build
mkdir /boost
cmake .. -DWITH_NUMA=1 -DDOWNLOAD_BOOST=1 -DWITH_BOOST=/boost -DCMAKE_INSTALL_PREFIX=/compile_xtrabackup/percona-xtrabackup/build
make -j 8 && make install

Remember in the beginning when I mentioned that a better machine compiles faster? Note the -j 8 option in the last command above. It tells make to compile using all 8 CPUs available on this instance. You can adjust it to the number of threads you have (ideally less than or equal to the number of vCPUs).
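If you prefer not to hard-code the thread count, a common variant (a convenience, not from the original instructions) is to let make use however many CPUs the instance reports:

make -j "$(nproc)" && make install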

This process will take a while to finish, so we need to be patient.

Testing

Once the process is done, we are ready to run the binaries. They will be located in the /compile_xtrabackup/percona-xtrabackup/build/bin directory.

# ./bin/xtrabackup -version
./bin/xtrabackup version 8.0.23-16 based on MySQL server 8.0.23 Linux (aarch64) (revision id: 934bc8fb1d1)

The backup process is the same as for x86 platforms. For example:

./bin/xtrabackup --user=root --socket=/tmp/n1.sock --backup --target-dir=<target-backup-dir> --datadir=<datadir>
./bin/xtrabackup --prepare --target-dir=<target-dir>
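As a concrete illustration, a full backup followed by the prepare phase might look like this (the paths are hypothetical examples, not the directories used in this post):

./bin/xtrabackup --user=root --socket=/tmp/n1.sock --backup --target-dir=/backups/full --datadir=/var/lib/mysql
./bin/xtrabackup --prepare --target-dir=/backups/full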

At this moment, Oracle does not provide ARM packages for Ubuntu 20.04, so I opened this Feature Request for it:

Add Packages for Ubuntu to ARM architecture

However, if you are enjoying the compiling path and want to give a shot at compiling MySQL for Ubuntu in ARM as well, you can check this blog post:

Compiling MySQL (Desmistifying)

And if you are wondering about PMM for ARM, my colleague Agustin Galego (GitHub) wrote a blog post about compiling the Percona Monitoring and Management client for ARM:

Compiling a Percona Monitoring and Management v2 Client in ARM Architecture

Conclusion

Compiling MySQL and PXB is not recommended for production systems, but for me it is an excellent way to learn more about the code and its internals. If you are interested in ARM, it is a way to start testing in advance to evaluate whether it is the right architecture for your needs.

Useful Resources

Finally, you can reach us through the social networks, our forum, or access our material using the links presented below:

Apr
27
2021
--

Arm launches its latest chip design for HPC, data centers and the edge

Arm today announced the launch of two new platforms, Arm Neoverse V1 and Neoverse N2, as well as a new mesh interconnect for them. As you can tell from the name, V1 is a completely new product and maybe the best example yet of Arm’s ambitions in the data center, high-performance computing and machine learning space. N2 is Arm’s next-generation general compute platform that is meant to span use cases from hyperscale clouds to SmartNICs and running edge workloads. It’s also the first design based on the company’s new Armv9 architecture.

Not too long ago, high-performance computing was dominated by a small number of players, but the Arm ecosystem has scored its fair share of wins here recently, with supercomputers in South Korea, India and France betting on it. The promise of V1 is that it will vastly outperform the older N1 platform, with a 2x gain in floating-point performance, for example, and a 4x gain in machine learning performance.

Image Credits: Arm

“The V1 is about how much performance can we bring — and that was the goal,” Chris Bergey, SVP and GM of Arm’s Infrastructure Line of Business, told me. He also noted that the V1 is Arm’s widest architecture yet. He noted that while V1 wasn’t specifically built for the HPC market, it was definitely a target market. And while the current Neoverse V1 platform isn’t based on the new Armv9 architecture yet, the next generation will be.

N2, on the other hand, is all about getting the most performance per watt, Bergey stressed. “This is really about staying in that same performance-per-watt-type envelope that we have within N1 but bringing more performance,” he said. In Arm’s testing, NGINX saw a 1.3x performance increase versus the previous generation, for example.

Image Credits: Arm

In many ways, today’s release is also a chance for Arm to highlight its recent customer wins. AWS Graviton2 is obviously doing quite well, but Oracle is also betting on Ampere’s Arm-based Altra CPUs for its cloud infrastructure.

“We believe Arm is going to be everywhere — from edge to the cloud. We are seeing N1-based processors deliver consistent performance, scalability and security that customers want from Cloud infrastructure,” said Bev Crair, senior VP, Oracle Cloud Infrastructure Compute. “Partnering with Ampere Computing and leading ISVs, Oracle is making Arm server-side development a first-class, easy and cost-effective solution.”

Meanwhile, Alibaba Cloud and Tencent are both investing in Arm-based hardware for their cloud services as well, while Marvell will use the Neoverse V2 architecture for its OCTEON networking solutions.

Apr
24
2019
--

Docker developers can now build Arm containers on their desktops

Docker and Arm today announced a major new partnership that will see the two companies collaborate in bringing improved support for the Arm platform to Docker’s tools.

The main idea here is to make it easy for Docker developers to build their applications for the Arm platform right from their x86 desktops and then deploy them to the cloud (including the Arm-based AWS EC2 A1 instances), edge and IoT devices. Developers will be able to build their containers for Arm just like they do today, without the need for any cross-compilation.

This new capability, which will work for applications written in JavaScript/Node.js, Python, Java, C++, Ruby, .NET core, Go, Rust and PHP, will become available as a tech preview next week, when Docker hosts its annual North American developer conference in San Francisco.

Typically, developers would have to build the containers they want to run on the Arm platform on an Arm-based server. With this system, which is the first result of this new partnership, Docker essentially emulates an Arm chip on the PC for building these images.
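For readers who want to try this today: the capability described above is now generally exposed through Docker's buildx tooling with QEMU emulation. A minimal sketch (the image name and Dockerfile are hypothetical, and this is not the exact tech preview described in the announcement):

# Create a builder and build a multi-architecture image from an x86 desktop
docker buildx create --use
docker buildx build --platform linux/arm64,linux/amd64 -t myrepo/myapp:latest --push .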

“Overnight, the 2 million Docker developers that are out there can use the Docker commands they already know and become Arm developers,” Docker EVP of Strategic Alliances David Messina told me. “Docker, just like we’ve done many times over, has simplified and streamlined processes and made them simpler and accessible to developers. And in this case, we’re making x86 developers on their laptops Arm developers overnight.”

Given that cloud-based Arm servers like Amazon’s A1 instances are often significantly cheaper than x86 machines, users can achieve some immediate cost benefits by using this new system and running their containers on Arm.

For Docker, this partnership opens up new opportunities, especially in areas where Arm chips are already strong, including edge and IoT scenarios. Arm, similarly, is interested in strengthening its developer ecosystem by making it easier to develop for its platform. The easier it is to build apps for the platform, the more likely developers are to then run them on servers that feature chips from Arm’s partners.

“Arm’s perspective on the infrastructure really spans all the way from the endpoint, all the way through the edge to the cloud data center, because we are one of the few companies that have a presence all the way through that entire path,” Mohamed Awad, Arm’s VP of Marketing, Infrastructure Line of Business, said. “It’s that perspective that drove us to make sure that we engage Docker in a meaningful way and have a meaningful relationship with them. We are seeing compute and the infrastructure sort of transforming itself right now from the old model of centralized compute, general purpose architecture, to a more distributed and more heterogeneous compute system.”

Developers, however, Awad rightly noted, don’t want to have to deal with this complexity, yet they also increasingly need to ensure that their applications run on a wide variety of platforms and that they can move them around as needed. “For us, this is about enabling developers and freeing them from lock-in on any particular area and allowing them to choose the right compute for the right job that is the most efficient for them,” Awad said.

Messina noted that the promise of Docker has long been to remove the dependence of applications from the infrastructure on which they run. Adding Arm support simply extends this promise to an additional platform. He also stressed that the work on this was driven by the company’s enterprise customers. These are the users who have already set up their systems for cloud-native development with Docker’s tools — at least for their x86 development. Those customers are now looking at developing for their edge devices, too, and that often means developing for Arm-based devices.

Awad and Messina both stressed that developers really don’t have to learn anything new to make this work. All of the usual Docker commands will just work.

Mar
20
2019
--

MongoDB on ARM Processors


ARM processors have been around for a while. In mid-2015/2016 there were a couple of attempts by the community to port MongoDB to work with this architecture. At the time, the main storage engine was MMAP and most of the available ARM boards were 32-bits. Overall, the port worked, but the fact is having MongoDB running on a Raspberry Pi was more a hack than a setup. The public cloud providers didn’t yet offer machines running with these processors.

The ARM processors are power-efficient and, for this reason, they are used in smartphones, smart devices and, now, even laptops. It was just a matter of time to have them available in the cloud as well. Now that AWS is offering ARM-based instances you might be thinking: “Hmmm, these instances include the same amount of cores and memory compared to the traditional x86-based offers, but cost a fraction of the price!”.

But do they perform alike?

In this blog, we selected three different AWS instances to compare: one powered by  an ARM processor, the second one backed by a traditional x86_64 Intel processor with the same number of cores and memory as the ARM instance, and finally another Intel-backed instance that costs roughly the same as the ARM instance but carries half as many cores. We acknowledge these processors are not supposed to be “equivalent”, and we do not intend to go deeper in CPU architecture in this blog. Our goal is purely to check how the ARM-backed instance fares in comparison to the Intel-based ones.

These are the instances we will consider in this blog post.

Methodology

We will use the Yahoo Cloud Serving Benchmark (YCSB, https://github.com/brianfrankcooper/YCSB) running on a dedicated instance (c5d.4xlarge) to simulate load in three distinct tests:

  1. a load of 1 billion documents in one collection having only the primary key (which we’ll call Inserts).
  2. a workload comprised of exclusively reads (Reads)
  3. a workload comprised of a mix of 75% reads with 5% scans plus 25% updates (Reads/Updates)

We will run each test with a varying number of concurrent threads (32, 64, and 128), repeating each set three times and keeping only the second-best result.

All instances will run the same MongoDB version (4.0.3, installed from a tarball and running with default settings) and operating system, Ubuntu 16.04. We chose this setup because MongoDB's offering includes an ARM version for Ubuntu-based machines.

All the instances will be configured with:

  • 100 GB EBS with 5000 PIOPS and 20 GB EBS boot device
  • Data volume formatted with XFS, 4k blocks
  • Default swappiness and disk scheduler
  • Default kernel parameters
  • Enhanced cloud watch configured
  • Free monitoring tier enabled

Preparing the environment

We start with the setup of the benchmark software we will use for the test, YCSB. The first task was to spin up a powerful machine (c5d.4xlarge) to run the software and then prepare the environment:

The YCSB program requires Java, Maven, Python, and pymongo, which do not come by default in our Linux version (Ubuntu Server x86). Here are the steps we used to configure our environment:

Installing Java

# On Ubuntu the JDK package is default-jdk (java-devel is a yum/RPM package name)
sudo apt-get install default-jdk

Installing Maven

wget http://ftp.heanet.ie/mirrors/www.apache.org/dist/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz
sudo tar xzf apache-maven-*-bin.tar.gz -C /usr/local
cd /usr/local
sudo ln -s apache-maven-* maven
sudo vi /etc/profile.d/maven.sh

Add the following to maven.sh

export M2_HOME=/usr/local/maven
export PATH=${M2_HOME}/bin:${PATH}
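After saving maven.sh, reload the profile (or log out and back in) so that mvn is on the PATH; a quick check might look like this:

source /etc/profile.d/maven.sh
mvn -version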

Installing Python 2.7

sudo apt-get install python2.7

Installing pip to resolve the pymongo dependency

sudo apt-get install python-pip

Installing pymongo (driver)

sudo pip install pymongo

Installing YCSB

curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.5.0/ycsb-0.5.0.tar.gz
tar xfvz ycsb-0.5.0.tar.gz
cd ycsb-0.5.0

YCSB comes with different workloads, and also allows for the customization of a workload to match our own requirements. If you want to learn more about the workloads have a look at https://github.com/brianfrankcooper/YCSB/blob/master/workloads/workload_template

First, we will edit the workloads/workloada file to perform 1 billion inserts (for our first test) while also preparing it to later perform only reads (for our second test):

recordcount=1000000
operationcount=1000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=1
updateproportion=0.0

We will then change the workloads/workloadb file so as to provide a mixed workload for our third test. We also set it to perform 1 billion operations, but we break them down into 70% read queries, 25% updates, and 5% scans, while also placing a cap on the maximum number of scanned documents (2,000) in an effort to emulate real traffic; workloads are not perfect, right?

recordcount=10000000
operationcount=10000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.7
updateproportion=0.25
scanproportion=0.05
insertproportion=0
maxscanlength=2000

With that, we have the environment configured for testing.

Running the tests

With all instances configured and ready, we run the stress test against our MongoDB servers using the following command :

./bin/ycsb [load/run] mongodb -s -P workloads/workload[ab] -threads [32/64/128] \
 -p mongodb.url=mongodb://xxx.xxx.xxx.xxx.:27017/ycsb0000[0-9] \
 -jvm-args="-Dlogback.configurationFile=disablelogs.xml"

The parameters between brackets varied according to the instance and operation being executed:

  • [load/run] load means insert data while run means perform action (update/read)
  • workload[a/b] reference the different workloads we’ve used
  • [32/64/128] indicate the number of concurrent threads being used for the test
  • ycsb0000[0-9] is the database name we’ve used for the tests (for reference only)
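For example, loading the initial data set into one of the databases with 32 threads would look roughly like this (the host address and database name here are placeholders, not the actual test endpoints):

./bin/ycsb load mongodb -s -P workloads/workloada -threads 32 \
 -p mongodb.url=mongodb://10.0.0.5:27017/ycsb00001 \
 -jvm-args="-Dlogback.configurationFile=disablelogs.xml"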

Results

Without further ado, the table below summarizes the results for our tests:

 

 

 

Performance cost

Considering throughput alone – and in the context of those tests, particularly the last one – you may get more performance for the same cost. That’s certainly not always the case, which our results above also demonstrate. And, as usual, it depends on “how much performance do you need” – a matter that is even more pertinent in the cloud. With that in mind, we had another look at our data under the “performance cost” lens.

As we saw above, the c5.4xlarge instance performed better than the other two instances, for a little over 50% more in terms of cost. Did it deliver 50% more performance as well? Well, sometimes it did even more than that, but not always. We used the following formula to extrapolate the OPS (Operations Per Second) data from our tests into OPH (Operations Per Hour), so we could then calculate how much bang (operations) for the buck (US$1) each instance was able to provide:

transactions/hour/US$1 = (OPS * 3600) / instance cost per hour

This is, of course, an artificial metric that aims to correlate performance and cost. For this reason, instead of plotting the raw values, we have normalized the results using the best-performing instance as the baseline (100%):
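As a rough illustration of the formula above (the OPS and price figures here are placeholders, not the measured results):

# Hypothetical example: 25,000 ops/sec on an instance costing US$0.68 per hour
OPS=25000
PRICE_PER_HOUR=0.68
awk -v ops="$OPS" -v price="$PRICE_PER_HOUR" \
    'BEGIN { printf "transactions/hour per US$1: %.0f\n", (ops * 3600) / price }'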

 

 

The intent behind these was only to demonstrate another way to evaluate how much we’re getting for what we’re paying. Of course, you need to have a clear understanding of your own requirements in order to make a balanced decision.

Parting thoughts

We hope this post awakens your curiosity not only about how MongoDB may perform on ARM-based servers, but also by demonstrating another way you can perform your own tests with the YCSB benchmark. Feel free to reach out to us through the comments section below if you have any suggestions, questions, or other observations to make about the work we presented here.

Feb
20
2019
--

Arm expands its push into the cloud and edge with the Neoverse N1 and E1

For the longest time, Arm was basically synonymous with chip designs for smartphones and very low-end devices. But more recently, the company launched solutions for laptops, cars, high-powered IoT devices and even servers. Today, ahead of MWC 2019, the company is officially launching two new products for cloud and edge applications, the Neoverse N1 and E1. Arm unveiled the Neoverse brand a few months ago, but it’s only now that it is taking concrete form with the launch of these new products.

“We’ve always been anticipating that this market is going to shift as we move more towards this world of lots of really smart devices out at the endpoint — moving beyond even just what smartphones are capable of doing,” Drew Henry, Arms’ SVP and GM for Infrastructure, told me in an interview ahead of today’s announcement. “And when you start anticipating that, you realize that those devices out of those endpoints are going to start creating an awful lot of data and need an awful lot of compute to support that.”

To address these two problems, Arm decided to launch two products: one that focuses on compute speed and one that is all about throughput, especially in the context of 5G.

ARM NEOVERSE N1

The Neoverse N1 platform is meant for infrastructure-class solutions that focus on raw compute speed. The chips should perform significantly better than previous Arm CPU generations meant for the data center and the company says that it saw speedups of 2.5x for Nginx and MemcacheD, for example. Chip manufacturers can optimize the 7nm platform for their needs, with core counts that can reach up to 128 cores (or as few as 4).

“This technology platform is designed for a lot of compute power that you could either put in the data center or stick out at the edge,” said Henry. “It’s very configurable for our customers so they can design how big or small they want those devices to be.”

The E1 is also a 7nm platform, but with a stronger focus on edge computing use cases where you also need some compute power to maybe filter out data as it is generated, but where the focus is on moving that data quickly and efficiently. “The E1 is very highly efficient in terms of its ability to be able to move data through it while doing the right amount of compute as you move that data through,” explained Henry, who also stressed that the company made the decision to launch these two different platforms based on customer feedback.

There’s no point in launching these platforms without software support, though. A few years ago, that would have been a challenge because few commercial vendors supported their data center products on the Arm architecture. Today, many of the biggest open-source and proprietary projects and distributions run on Arm chips, including Red Hat Enterprise Linux, Ubuntu, Suse, VMware, MySQL, OpenStack, Docker, Microsoft .Net, DOK and OPNFV. “We have lots of support across the space,” said Henry. “And then as you go down to that tier of languages and libraries and compilers, that’s a very large investment area for us at Arm. One of our largest investments in engineering is in software and working with the software communities.”

And as Henry noted, AWS also recently launched its Arm-based servers — and that surely gave the industry a lot more confidence in the platform, given that the biggest cloud supplier is now backing it, too.

Aug
02
2018
--

Arm acquires data management service Treasure Data to bolster its IoT platform

Arm, the semiconductor firm you probably still remember as ARM, today announced that it has acquired Treasure Data, a data management platform for large enterprise customers. The companies didn’t announce the financial details of the transaction, but earlier reporting by Bloomberg pegged the price at $600 million.

This move strengthens Arm’s IoT nascent play, given that Treasure Data’s specialty is dealing with the large streams of data that these systems produce (as well as data from CRM, e-commerce systems and other third-party services).

This move follows Arm’s recent acquisition of Stream and indeed, the company calls the acquisition of Treasure Data “the final piece” of its “IoT enablement puzzle.” The result of this completed puzzle is the Arm Pelion IoT Platform, which combines Stream, Treasure Data and the existing Arm Mbed Cloud into a single solution for connecting and managing IoT devices and the data they produce.

Arm says Treasure Data will continue to operate as before and continue to serve new clients as well as its existing users. “It will remain an important part of industry IoT enablement, providing the ability to harness new, complex edge and device data within a comprehensive customer profile to personalize their products and improve their experiences,” the company says.

Nov
24
2017
--

This Week in Data with Colin Charles 16: FOSDEM, Percona Live call for papers, and ARM


Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers, and provides musings on the open source database community.

Hurry up: the call for papers (CFP) for FOSDEM 2018 ends December 1, 2017. I highly recommend submitting, as it's a really fun, free, and technically oriented event.

Don’t forget that the CFP for Percona Live Open Source Database Conference 2018 in Santa Clara closes December 22, 2017, so please also consider submitting as soon as possible. We want to make an early announcement of the talks, so we’ll definitely do the first pass even before the CFP date closes.

Is ARM the new hotness? Marvell confirms $6 billion purchase of chip maker Cavium. This month we’ve seen Red Hat Enterprise Linux for ARM arrive. We’ve also seen a press release from MariaDB about the performance on the Qualcomm Centriq 2400 Processor.

Some new books to add to your bookshelf and read: MariaDB and MySQL Common Table Expressions and Window Functions Revealed by Daniel Bartholomew; the accompanying source code repository will also be useful. And the much-awaited Database Reliability Engineering: Designing and Operating Resilient Database Systems, by Percona Live keynote speakers Laine Campbell and Charity Majors, is now ready to read.

Releases

Link List

Upcoming appearances

  • ACMUG 2017 gathering – Beijing, China, December 9-10 2017 – it was very exciting being there in 2016, I can only imagine it’s going to be bigger and better for 2017 since it is now two days long!

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.
