Oct
25
2021
--

Comparing Graviton (ARM) Performance to Intel and AMD for MySQL (Part 2)

Recently we published the first part of our research comparing Graviton (ARM) with AMD and Intel CPUs on AWS. In that part, we selected general-purpose EC2 instances with the same configuration (number of vCPUs). The main goal was to see the trend and make a general comparison of CPU types on the AWS platform for MySQL only; we did not set out to benchmark the CPUs themselves. Our expertise is in MySQL performance tuning. We share the research “as is”, with all scripts, so anyone interested can rerun and reproduce it.
All scripts, raw logs, and additional plots are available on GitHub (2021_10_arm_cpu_comparison_c5, csv_file_with_all_data).

We were happy to see the reactions of our Percona Blog readers to our research, and we are open to any feedback. If anyone has ideas for improving our methodology, we would be happy to incorporate them.

This post is a continuation of that research, driven by our interest in compute-optimized EC2 instances (and, of course, because our audience wanted to see it). Today we will talk about AWS compute-optimized EC2 instances: C5, C5a, and C6g (the complete list is in the appendix).

Next time we are going to share our findings on the economic efficiency of m5 and c5 instances.

Short Conclusion:

  1. In most cases, for c5, c5a, and c6g instances, Intel shows better throughput for MySQL read transactions.
  2. Sometimes Intel shows a significant advantage: almost 100k more requests per second than the other CPUs.
  3. In a few words: in terms of performance, c5 instances (Intel) are better in their class than the c5a and c6g instances, with an advantage ranging from 5% up to 40% over the other CPUs.

Details and Disclaimer:

  1. Tests were run on C5.* (Intel), C5a.* (AMD), and C6g.* (Graviton) EC2 instances in the US-EAST-1 region. (The list of EC2 instances is in the appendix.)
  2. Monitoring was done with PMM (Percona Monitoring and Management).
  3. OS: Ubuntu 20.04 LTS.
  4. The load tool (sysbench) and the target DB (MySQL) were installed on the same EC2 instance.
  5. Oracle MySQL Community Server 8.0.26-0, installed from official packages.
  6. Load tool: sysbench 1.0.18.
  7. innodb_buffer_pool_size = 80% of available RAM.
  8. Test duration is 5 minutes for each thread count, followed by a 90-second cool-down before the next iteration.
  9. Tests were run 3 times (to smooth outliers and make results more reproducible), and the results were averaged for the graphs.
  10. We use “high-concurrency” for scenarios where the number of threads is greater than the number of vCPUs, and “low-concurrency” for scenarios where the number of threads is less than or equal to the number of vCPUs on the EC2 instance.
  11. We are comparing MySQL behavior on the same class of EC2 instances, not raw CPU performance.

Test Case:

Prerequisite:

1. Create a database with 10 tables of 10,000,000 rows each:

sysbench oltp_read_only --threads=10 --mysql-user=sbtest --mysql-password=sbtest --table-size=10000000 --tables=10 --db-driver=mysql --mysql-db=sbtest prepare

2. Warm up by loading all data into the InnoDB buffer pool:

sysbench oltp_read_only --time=300 --threads=10 --table-size=1000000 --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run
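
Optionally, after this warm-up run, the buffer pool fill level can be checked with a query like the one below. This is our illustration, not part of the original test scripts:

# Check how many buffer pool pages hold data vs. how many are free (rough warm-up indicator)
mysql -usbtest -psbtest -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_%';"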

 

3. Test:

Run the same scenario in a loop with different concurrency levels, THREAD = 1, 2, 4, 8, 16, 32, 64, 128, on each EC2 instance (a minimal loop sketch follows the command below):

sysbench oltp_read_only --time=300 --threads=${THREAD} --table-size=100000 --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run
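
For clarity, here is a minimal sketch of such a loop. It is our illustration of the methodology described above (5-minute run per thread count, 90-second cool-down), not the exact script from the GitHub repository; the log file name is hypothetical:

#!/bin/bash
# Sketch of the test loop: one 5-minute read-only run per thread count, 90-second cool-down in between.
for THREAD in 1 2 4 8 16 32 64 128; do
    sysbench oltp_read_only --time=300 --threads=${THREAD} --table-size=100000 \
        --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run \
        | tee "sysbench_oltp_read_only_${THREAD}_threads.log"   # keep the raw log for averaging later
    sleep 90   # cool-down before the next iteration
done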

Results:

Result review was split into three parts:

  1. “Small” EC2 with 2, 4, and 8 vCPU.
  2. “Medium” EC2 with 16 vCPU and “large” EC2 with 48 and 64 vCPU (AWS does not offer a C5 instance with 64 vCPU).
  3. All scenarios together, to see the overall picture.

There are four graphs for each test:

  1. Throughput (queries per second) that the EC2 instance delivers for each scenario (number of threads).
  2. 95th percentile latency for each scenario (number of threads).
  3. Relative (percentage) comparison of Graviton vs. Intel and Graviton vs. AMD.
  4. Absolute (queries per second) comparison of Graviton vs. Intel and Graviton vs. AMD (a small computation sketch follows this list).
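
For reference, the relative and absolute comparisons are simple derivations from the averaged throughput numbers. A minimal sketch, assuming a results.tsv file with a threads / Graviton QPS / Intel QPS column layout (the file and layout are our assumptions, not artifacts from the repository):

# Print the absolute (qps) and relative (%) difference of Graviton vs. Intel per thread count.
awk 'NR > 1 {
        abs = $2 - $3;                # Graviton QPS minus Intel QPS
        rel = 100 * abs / $3;         # difference as a percentage of the Intel result
        printf "%3d threads: %+8.0f qps (%+.1f%%)\n", $1, abs, rel
     }' results.tsv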

Validation that all load goes to the CPU, and not to disk I/O, was also done using PMM (Percona Monitoring and Management).

pic 0.1. OS monitoring during all test stages (example picture)

Result for EC2 with 2, 4, and 8 vCPU:

plot 1.1. Throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 1.2. Latencies (95 percentile) during the test for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 1.3.1 Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 1.3.2  Percentage comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 1.4.1. Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 1.4.2. Numbers comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 2, 4 and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

OVERVIEW:

  1. Based on plot 1.1, EC2 instances with Intel have a clear advantage over Graviton and AMD.
  2. In most scenarios, this advantage fluctuates between 10% and 20%.
  3. In absolute numbers, it is over 3,000 requests per second.
  4. There is one scenario where Graviton does better, on the EC2 instance with 8 vCPU (c6g.2xlarge), but the advantage is so small (around 2%) that it could be statistical noise, so we cannot call it significant.

Result for EC2 with 16, 48 and 64 vCPU:

plot 2.1.  Throughput (queries per second)  for EC2 with 16, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 2.2. Latencies (95 percentile) during the test for EC2 with 16, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 2.3.1 Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 16, 48 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 2.3.2  Percentage comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 16, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 2.4.1. Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 16, 48 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 2.4.2. Numbers comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 16, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

OVERVIEW:

  1. Plot 2.1 shows that Intel has an advantage over the other CPUs under our conditions (there is no C5 instance with 64 vCPU, so the comparison at that size is incomplete).
  2. This advantage is near 20% for EC2 with 16 vCPU and up to 40% for EC2 with 48 vCPU. However, the advantage decreases as the number of threads increases.
  3. In absolute numbers, Intel executes up to 100k more read transactions per second than the other CPUs (plots 2.1 and 2.4.1).
  4. On the other hand, in one high-concurrency scenario we see a small advantage (3%) for Graviton. However, it is again small enough to be statistical noise (plot 2.3.1).
  5. In most cases, Graviton shows better results than AMD (plots 2.1, 2.3.2, and 2.4.2).

Overall Result Overview:

plot 3.1. Throughput (queries per second) for EC2 with 2, 4, 8, 16, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 3.2.  Latencies (95 percentile) during the test for EC2 with 2, 4, 8, 16, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 3.3.1. Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16 and 48 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 3.3.2. Percentage comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 3.4.1. Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16 AND 48 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 3.4.2. Numbers comparison Graviton and AMD CPU in throughput (queries per second) for EC2 with 2, 4, 8, 16, 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

Final Thoughts

  1. We compared compute-optimized EC2 instances (c5, c5a, c6g) on the AWS platform and their behavior for MySQL.
  2. The economic efficiency of these EC2 instances is still an open question; we will research this topic and answer it a little later.
  3. In these tests, AMD did not provide competitive results for MySQL. It is possible that it could show much better, competitive results for other workloads.

APPENDIX:

List of EC2 used in research:

CPU type   EC2 instance    vCPU   Memory (GB)   Price per hour (USD)
AMD        c5a.large          2             4                  0.077
AMD        c5a.xlarge         4             8                  0.154
AMD        c5a.2xlarge        8            16                  0.308
AMD        c5a.4xlarge       16            32                  0.616
AMD        c5a.12xlarge      48            96                  1.848
AMD        c5a.16xlarge      64           128                  2.464
Intel      c5.large           2             4                  0.085
Intel      c5.xlarge          4             8                  0.170
Intel      c5.2xlarge         8            16                  0.340
Intel      c5.4xlarge        16            32                  0.680
Intel      c5.12xlarge       48            96                  2.040
Graviton   c6g.large          2             4                  0.068
Graviton   c6g.xlarge         4             8                  0.136
Graviton   c6g.2xlarge        8            16                  0.272
Graviton   c6g.4xlarge       16            32                  0.544
Graviton   c6g.12xlarge      48            96                  1.632
Graviton   c6g.16xlarge      64           128                  2.176

 

my.cnf:

[mysqld]
ssl=0
performance_schema=OFF
skip_log_bin
server_id = 7

# general
table_open_cache = 200000
table_open_cache_instances=64
back_log=3500
max_connections=4000
 join_buffer_size=256K
 sort_buffer_size=256K

# files
innodb_file_per_table
innodb_log_file_size=2G
innodb_log_files_in_group=2
innodb_open_files=4000

# buffers
innodb_buffer_pool_size=${80%_OF_RAM}
innodb_buffer_pool_instances=8
innodb_page_cleaners=8
innodb_log_buffer_size=64M

default_storage_engine=InnoDB
innodb_flush_log_at_trx_commit  = 1
innodb_doublewrite= 1
innodb_flush_method= O_DIRECT
innodb_file_per_table= 1
innodb_io_capacity=2000
innodb_io_capacity_max=4000
innodb_flush_neighbors=0
max_prepared_stmt_count=1000000 
bind_address = 0.0.0.0
[client]
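
The ${80%_OF_RAM} placeholder above was filled in per instance type. A minimal sketch of how such a value can be derived on the instance itself; the config path, variable names, and service name are our assumptions, not the exact provisioning script used in the tests:

# Set innodb_buffer_pool_size to roughly 80% of the instance's RAM (illustrative sketch only).
TOTAL_RAM_MB=$(free -m | awk '/^Mem:/ {print $2}')        # total RAM in MB
BUFFER_POOL_MB=$(( TOTAL_RAM_MB * 80 / 100 ))             # 80% of it
sudo sed -i "s|^innodb_buffer_pool_size=.*|innodb_buffer_pool_size=${BUFFER_POOL_MB}M|" /etc/mysql/my.cnf
sudo systemctl restart mysql                              # the setting takes effect after a restart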

 

Oct
15
2021
--

Comparing Graviton (ARM) Performance to Intel and AMD for MySQL

Recently, AWS introduced its own ARM-architecture CPU for server solutions: Graviton. As a result, it updated several of its EC2 instance families with a new “g” suffix (e.g., M6g, R6g, C6g). In its reviews and presentations, AWS showed impressive results, with some benchmarks up to 20 percent faster. On the other hand, some reviewers said that Graviton does not show significant gains and, in some cases, performs worse than Intel.

We decided to investigate it and do our own research on Graviton performance, comparing it directly with other CPUs (Intel and AMD) for MySQL.

Disclaimer

  1. The test is designed to be CPU-bound only, so we use a read-only test and make sure there is no I/O activity during the test.
  2. Tests were run on m5.* (Intel), m5a.* (AMD), and m6g.* (Graviton) EC2 instances in the US-EAST-1 region. (The list of EC2 instances is in the appendix.)
  3. Monitoring was done with Percona Monitoring and Management (PMM).
  4. OS: Ubuntu 20.04 LTS.
  5. The load tool (sysbench) and the target DB (MySQL) were installed on the same EC2 instance.
  6. MySQL 8.0.26-0, installed from official packages.
  7. Load tool: sysbench 1.0.18.
  8. innodb_buffer_pool_size = 80% of available RAM.
  9. Test duration is five minutes for each thread count, followed by a 90-second cool-down before the next iteration.
  10. Tests were run three times (to smooth outliers and make results more reproducible), and the results were averaged for the graphs.
  11. We use “high-concurrency” for scenarios where the number of threads is greater than the number of vCPUs, and “low-concurrency” for scenarios where the number of threads is less than or equal to the number of vCPUs on the EC2 instance.
  12. Scripts to reproduce the results are on our GitHub.

Test Case

Prerequisite:

1. Create a database with 10 tables of 10,000,000 rows each:

sysbench oltp_read_only --threads=10 --mysql-user=sbtest --mysql-password=sbtest --table-size=10000000 --tables=10 --db-driver=mysql --mysql-db=sbtest prepare

2. Warm up by loading all data into the InnoDB buffer pool:

sysbench oltp_read_only --time=300 --threads=10 --table-size=1000000 --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run

Test:

Run the same scenario in a loop with different concurrency levels, THREAD = 1, 2, 4, 8, 16, 32, 64, 128, on each EC2 instance:

sysbench oltp_read_only --time=300 --threads=${THREAD} --table-size=100000 --mysql-user=sbtest --mysql-password=sbtest --db-driver=mysql --mysql-db=sbtest run

Results:

Result review was split into three parts:

  1. “Small” EC2 with 2, 4, and 8 vCPU
  2. “Medium” EC2 with 16 and 32 vCPU
  3. “Large” EC2 with 48 and 64 vCPU

The “small”, “medium”, and “large” labels are just synthetic names for easier review, based on the number of vCPUs per EC2 instance.

There are four graphs for each test:

  1. Throughput (queries per second) that the EC2 instance delivers for each scenario (number of threads)
  2. 95th percentile latency for each scenario (number of threads)
  3. Relative comparison of Graviton and Intel
  4. Absolute comparison of Graviton and Intel

Validation that all load goes to the CPU, and not to disk I/O, was also done using PMM (Percona Monitoring and Management).

pic 0.1. OS monitoring during all test stages

From pic 0.1, we can see that there was no disk I/O activity during the tests, only CPU activity. The main disk activity happened during the DB creation stage.

Result for EC2 with 2, 4, and 8 vCPU

plot 1.1. Throughput (queries per second) for EC2 with 2, 4, and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 1.2.  Latencies (95 percentile) during the test for EC2 with 2, 4, and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 1.3.  Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4, and 8  vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 1.4.  Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2, 4, and 8 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

OVERVIEW:

  1. AMD has the highest latencies in all scenarios and on all EC2 instances. We won’t repeat this in the following overviews, and it is the reason we exclude AMD from the percentage and absolute comparisons with other CPUs (plots 1.3, 1.4, etc.).
  2. On instances with two and four vCPUs, Intel shows an advantage of less than 10 percent in all scenarios.
  3. However, on the instance with 8 vCPUs, Intel shows an advantage only in scenarios where the number of threads is less than or equal to the number of vCPUs.
  4. On EC2 with eight vCPUs, Graviton starts to show an advantage. It performs well in scenarios where the number of threads exceeds the number of vCPUs, growing up to 15 percent in high-concurrency scenarios with 64 and 128 threads, which are 8 and 16 times the number of available vCPUs.
  5. Graviton starts showing an advantage on EC2 with eight vCPUs in scenarios where the thread count exceeds the vCPU count. This pattern appears in all further scenarios: the more the load exceeds the number of vCPUs, the better the result Graviton shows.

 

Result for EC2 with 16 and 32 vCPU

plot 2.1.  Throughput (queries per second)  for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 2.2. Latencies (95 percentile) during the test for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 2.3.  Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 2.4.  Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 16 and 32 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

OVERVIEW:

  1. In the same scenarios on EC2 with 16 and 32 vCPUs, Graviton continues to have an advantage when the number of threads is greater than the number of available vCPUs.
  2. Graviton shows an advantage of up to 10 percent in high-concurrency scenarios, while Intel has up to 20 percent in low-concurrency scenarios.
  3. In high-concurrency scenarios, Graviton shows a remarkable difference in (read) transactions per second, up to 30,000 TPS.

Result for EC2 with 48 and 64 vCPU

plot 3.1.  Throughput (queries per second)  for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 3.2.  Latencies (95 percentile) during the test for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 3.3.  Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 3.4.  Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

OVERVIEW:

  1. Intel shows a significant advantage in most scenarios where the number of threads is less than or equal to the number of vCPUs. Intel seems really good at this kind of task, especially when it has some free vCPUs to spare, and this advantage can be up to 35 percent.
  2. However, Graviton shows outstanding results when the number of threads is larger than the number of vCPUs, with an advantage of 5 to 14 percent over Intel.
  3. In absolute numbers, Graviton’s advantage can be up to 70,000 transactions per second over Intel in high-concurrency scenarios.

Total Result Overview

 

plot 4.2.  Latencies (95 percentile) during the test for EC2 with 2,4,8,16,32,48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 4.3.  Percentage comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2,4,8,16,32,48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

 

plot 4.4.  Numbers comparison Graviton and Intel CPU in throughput (queries per second) for EC2 with 2,4,8,16,32,48 and 64 vCPU for scenarios with 1,2,4,8,16,32,64,128 threads

Conclusions

  1. ARM CPUs show better results on EC2 with more vCPU and with higher load, especially in high-concurrency scenarios.
  2. On small EC2 instances with a small load, ARM CPUs show less impressive performance, so we cannot see their benefits compared with Intel EC2 instances.
  3. Intel is still the leader in low-concurrency scenarios, and it definitely wins on EC2 instances with a small number of vCPUs.
  4. AMD does not show competitive results in any case.

Final Thoughts

  1. AMD: we have a lot of questions about the AMD EC2 instances, so it would be a good idea to check what was happening on those instances during the tests and to check the general CPU performance on them.
  2. We found out that in some specific conditions, Intel and Graviton can compete with each other. But the other side of the coin is economics: what is cheaper to use in each situation? The next article will be about that.
  3. It would be a good idea to try EC2 instances with Graviton for a real high-concurrency database.
  4. It seems we need to run additional scenarios with 256 and 512 threads to check the hypothesis that Graviton works better when the number of threads exceeds the number of vCPUs.

APPENDIX:

List of EC2 used in research:

CPU type   EC2 instance    Price per hour (USD)   vCPU   RAM (GB)
Graviton   m6g.large                      0.077      2          8
Graviton   m6g.xlarge                     0.154      4         16
Graviton   m6g.2xlarge                    0.308      8         32
Graviton   m6g.4xlarge                    0.616     16         64
Graviton   m6g.8xlarge                    1.232     32        128
Graviton   m6g.12xlarge                   1.848     48        192
Graviton   m6g.16xlarge                   2.464     64        256
Intel      m5.large                       0.096      2          8
Intel      m5.xlarge                      0.192      4         16
Intel      m5.2xlarge                     0.384      8         32
Intel      m5.4xlarge                     0.768     16         64
Intel      m5.8xlarge                     1.536     32        128
Intel      m5.12xlarge                    2.304     48        192
Intel      m5.16xlarge                    3.072     64        256
AMD        m5a.large                      0.086      2          8
AMD        m5a.xlarge                     0.172      4         16
AMD        m5a.2xlarge                    0.344      8         32
AMD        m5a.4xlarge                    0.688     16         64
AMD        m5a.8xlarge                    1.376     32        128
AMD        m5a.12xlarge                   2.064     48        192
AMD        m5a.16xlarge                   2.752     64        256

 

my.cnf:
[mysqld]
ssl=0
performance_schema=OFF
skip_log_bin
server_id = 7

# general
table_open_cache = 200000
table_open_cache_instances=64
back_log=3500
max_connections=4000
 join_buffer_size=256K
 sort_buffer_size=256K

# files
innodb_file_per_table
innodb_log_file_size=2G
innodb_log_files_in_group=2
innodb_open_files=4000

# buffers
innodb_buffer_pool_size=${80%_OF_RAM}
innodb_buffer_pool_instances=8
innodb_page_cleaners=8
innodb_log_buffer_size=64M

default_storage_engine=InnoDB
innodb_flush_log_at_trx_commit  = 1
innodb_doublewrite= 1
innodb_flush_method= O_DIRECT
innodb_file_per_table= 1
innodb_io_capacity=2000
innodb_io_capacity_max=4000
innodb_flush_neighbors=0
max_prepared_stmt_count=1000000 
bind_address = 0.0.0.0
[client]

 

Jul
16
2021
--

Intel rumored to be in talks to buy chip manufacturer GlobalFoundries for $30B

When it comes to M&A in the chip world, the numbers are never small. In 2020, four deals involving chip companies totaled $106 billion, led by Nvidia snagging ARM for $40 billion. One surprise from last year’s chip-laced M&A frenzy was Intel remaining on the sidelines. That would change if a rumored $30 billion deal to buy chip manufacturing concern GlobalFoundries comes to fruition.

The rumor was first reported by The Wall Street Journal yesterday.

Patrick Moorhead, founder and principal analyst at Moor Insights & Strategy, who watches the chip industry closely, says that snagging GlobalFoundries would certainly make sense for Intel. The company is currently pursuing a new strategy to manufacture and sell chips both for Intel and for others under CEO Pat Gelsinger, who came on board in January to turn around the flagging chip maker.

“GlobalFoundries has technologies and processes that are specialized for 5G RF, IoT and automotive. Intel with GlobalFoundries would become what I call a ‘full-stack provider’ that could offer a customer everything. This is in full alignment with IDM 2.0 (Intel’s chip manufacturing strategy) and would get Intel there years before it could without GlobalFoundries,” Moorhead told TechCrunch.

It would also give Intel a chip manufacturing facility at a time when there are global chip shortages and huge demand for product from every corner, due in part to the pandemic and the impact it has had on the global supply chain. Intel has already indicated it has plans to spend more than $20 billion to build two fabs (chip manufacturing plants) in Arizona. Adding GlobalFoundries to these plans would give them a broad set of manufacturing capabilities in the coming years if it came to pass, but would also involve a significant investment of tens of billions of dollars to get there.

GlobalFoundries is a worldwide chip manufacturing concern based in the U.S. The company was spun off from Intel’s rival chip maker AMD in 2012, and is currently owned by Mubadala Investment Company, the investment arm of the government of Abu Dhabi.

Investors seem to like the idea of combining these two companies, with Intel stock up 1.59% as of publication. It’s important to note that this deal is still in the rumor stage and nothing is definitive or final yet. We will let you know if that changes.

Apr
13
2021
--

SambaNova raises $676M at a $5.1B valuation to double down on cloud-based AI software for enterprises

Artificial intelligence technology holds a huge amount of promise for enterprises — as a tool to process and understand their data more efficiently; as a way to leapfrog into new kinds of services and products; and as a critical stepping stone into whatever the future might hold for their businesses. But the problem for many enterprises is that they are not tech businesses at their core, so bringing on and using AI will typically involve a lot of heavy lifting. Today, one of the startups building AI services is announcing a big round of funding to help bridge that gap.

SambaNova — a startup building AI hardware and integrated systems that run on it that only officially came out of three years in stealth last December — is announcing a huge round of funding today to take its business out into the world. The company has closed on $676 million in financing, a Series D that co-founder and CEO Rodrigo Liang has confirmed values the company at $5.1 billion.

The round is being led by SoftBank, which is making the investment via Vision Fund 2. Temasek and the government of Singapore Investment Corp. (GIC), both new investors, are also participating, along with previous backers BlackRock, Intel Capital, GV (formerly Google Ventures), Walden International and WRVI, among other unnamed investors. (Sidenote: BlackRock and Temasek separately kicked off an investment partnership yesterday, although it’s not clear if this falls into that remit.)

Co-founded by two Stanford professors, Kunle Olukotun and Chris Ré, and Liang, who had been an engineering executive at Oracle, SambaNova has been around since 2017 and has raised more than $1 billion to date — both to build out its AI-focused hardware, which it calls DataScale, and to build out the system that runs on it. (The “Samba” in the name is a reference to Liang’s Brazilian heritage, he said, but also the Latino music and dance that speaks of constant movement and shifting, not unlike the journey AI data regularly needs to take that makes it too complicated and too intensive to run on more traditional systems.)

SambaNova on one level competes for enterprise business against companies like Nvidia, Cerebras Systems and Graphcore — another startup in the space which earlier this year also raised a significant round. However, SambaNova has also taken a slightly different approach to the AI challenge.

In December, the startup launched Dataflow-as-a-Service as an on-demand, subscription-based way for enterprises to tap into SambaNova’s AI system, with the focus just on the applications that run on it, without needing to focus on maintaining those systems themselves. It’s the latter that SambaNova will be focusing on selling and delivering with this latest tranche of funding, Liang said.

SambaNova’s opportunity, Liang believes, lies in selling software-based AI systems to enterprises that are keen to adopt more AI into their business, but might lack the talent and other resources to do so if it requires running and maintaining large systems.

“The market right now has a lot of interest in AI. They are finding they have to transition to this way of competing, and it’s no longer acceptable not to be considering it,” said Liang in an interview.

The problem, he said, is that most AI companies “want to talk chips,” yet many would-be customers will lack the teams and appetite to essentially become technology companies to run those services. “Rather than you coming in and thinking about how to hire scientists and hire and then deploy an AI service, you can now subscribe, and bring in that technology overnight. We’re very proud that our technology is pushing the envelope on cases in the industry.”

To be clear, a company will still need data scientists, just not the same number, and specifically not the same number dedicating their time to maintaining systems, updating code and other more incremental work that comes with managing an end-to-end process.

SambaNova has not disclosed many customers so far in the work that it has done — the two reference names it provided to me are both research labs, the Argonne National Laboratory and the Lawrence Livermore National Laboratory — but Liang noted some typical use cases.

One was in imaging, such as in the healthcare industry, where the company’s technology is being used to help train systems based on high-resolution imagery, along with other healthcare-related work. The coincidentally-named Corona supercomputer at the Livermore Lab (it was named after the 2014 lunar eclipse, not the dark cloud of a pandemic that we’re currently living through) is using SambaNova’s technology to help run calculations related to some COVID-19 therapeutic and antiviral compound research, Marshall Choy, the company’s VP of product, told me.

Another set of applications involves building systems around custom language models, for example in specific industries like finance, to process data quicker. And a third is in recommendation algorithms, something that appears in most digital services and frankly could always do to work a little better than it does today. I’m guessing that in the coming months it will release more information about where and who is using its technology.

Liang also would not comment on whether Google and Intel were specifically tapping SambaNova as a partner in their own AI services, but he didn’t rule out the prospect of partnering to go to market. Indeed, both have strong enterprise businesses that span well beyond technology companies, and so working with a third party that is helping to make even their own AI cores more accessible could be an interesting prospect, and SambaNova’s DataScale (and the Dataflow-as-a-Service system) both work using input from frameworks like PyTorch and TensorFlow, so there is a level of integration already there.

“We’re quite comfortable in collaborating with others in this space,” Liang said. “We think the market will be large and will start segmenting. The opportunity for us is in being able to take hold of some of the hardest problems in a much simpler way on their behalf. That is a very valuable proposition.”

The promise of creating a more accessible AI for businesses is one that has eluded quite a few companies to date, so the prospect of finally cracking that nut is one that appeals to investors.

“SambaNova has created a leading systems architecture that is flexible, efficient and scalable. This provides a holistic software and hardware solution for customers and alleviates the additional complexity driven by single technology component solutions,” said Deep Nishar, senior managing partner at SoftBank Investment Advisers, in a statement. “We are excited to partner with Rodrigo and the SambaNova team to support their mission of bringing advanced AI solutions to organizations globally.”

Mar
22
2021
--

Google Cloud hires Intel veteran to head its custom chip efforts

There has been a growing industry trend in recent years for large-scale companies to build their own chips. As part of that, Google announced today that it has hired long-time Intel executive Uri Frank as vice president to run its custom chip division.

“The future of cloud infrastructure is bright, and it’s changing fast. As we continue to work to meet computing demands from around the world, today we are thrilled to welcome Uri Frank as our VP of Engineering for server chip design,” Amin Vahdat, Google Fellow and VP of systems infrastructure wrote in a blog post announcing the hire.

With Frank, Google gets an experienced chip industry executive, who spent more than two decades at Intel rising from engineering roles to corporate vice president at the Design Engineering Group, his final role before leaving the company earlier this month.

Frank will lead the custom chip division in Israel as part of Google. As he said in his announcement on LinkedIn, this was a big step to join a company with a long history of building custom silicon.

“Google has designed and built some of the world’s largest and most efficient computing systems. For a long time, custom chips have been an important part of this strategy. I look forward to growing a team here in Israel while accelerating Google Cloud’s innovations in compute infrastructure,” Frank wrote.

Google’s history of building its own chips dates back to 2015, when it launched its first Tensor Processing Units (TPUs) to accelerate TensorFlow workloads. It moved into video processing chips in 2018 and added OpenTitan, an open-source chip with a security angle, in 2019.

Frank’s job will be to continue to build on this previous experience to work with customers and partners to build new custom chip architectures. The company wants to move away from buying motherboard components from different vendors to building its own “system on a chip” or SoC, which it says will be drastically more efficient.

“Instead of integrating components on a motherboard where they are separated by inches of wires, we are turning to “Systems on Chip” (SoC) designs where multiple functions sit on the same chip, or on multiple chips inside one package. In other words, the SoC is the new motherboard,” Vahdat wrote.

While Google was early to the “Build Your Own Chip” movement, we’ve seen other large scale companies like Amazon, Facebook, Apple and Microsoft begin building their own custom chips in recent years to meet each company’s unique needs and give more precise control over the relationship between the hardware and software.

It will be Frank’s job to lead Google’s custom chip unit and help bring it to the next level.

Feb
18
2021
--

Anthony Lin named permanent managing director and head of Intel Capital

When Wendell Brooks stepped down as managing partner and head of Intel Capital last August, Anthony Lin was named to replace him on an interim basis. At the time, it wasn’t clear if he would be given the role permanently, but today, six months later, the answer is known.

In a letter to the firm’s portfolio CEOs published on the company website, Lin mentioned, almost casually, that he had taken on the two roles on a permanent basis. “Personally, I want to share that I have been appointed to managing partner and head of Intel Capital. I have been a member of the investment committee for the past several years and am humbly awed by the talent of our entrepreneurs and our team,” he wrote.

Lin takes over in a time of turmoil for Intel as the company struggles to regain its place in the semiconductor business that it dominated for decades. Meanwhile, Intel itself has a new CEO with Pat Gelsinger returning in January from VMware to lead the organization.

As the corporate investment arm of Intel, it looks for companies that can help the parent company understand where to invest resources in the future. If that is its goal, perhaps it hasn’t done a great job as Intel has lost some of its edge when it comes to innovation.

Lin, who was formerly head of mergers and acquisitions and international investing at the firm, can use the power of the firm’s investment dollars to try to help point the parent company in the right direction and to find new ways to build innovative solutions on the Intel platform.

Lin acknowledged how challenging 2020 was for everyone, and his company was no exception, but the firm invested in 75 startups, including 35 new deals and 40 follow-on deals in companies it had previously backed. It has also made a commitment to invest in companies with more diverse founders. To that end, 30% of new venture-stage dollars went to startups led by diverse leaders, according to Lin.

What’s more, the company made a five year commitment that 15% of all of its deals would go to companies with Black founders. It made some progress towards that goal, but there is still a ways to go. “At the end of 2020, 9% of our new venture deals and 15% of our venture dollars committed were in companies led by Black founders. We know there is more progress to be made and we will continue to encourage, foster and invest in diverse and inclusive teams,” he wrote.

Lin faces a big challenge as he takes over a role that was held by the same leader, Arvind Sodhani, for its first 28 years. His predecessor, Brooks, was there for five years. Now it passes to Lin, and he needs to use the firm’s investment might to help Gelsinger advance the goals of the broader firm while making sound investments.

Jan
13
2021
--

Pat Gelsinger stepping down as VMware CEO to replace Bob Swan at Intel

In a move that could have wide ramifications across the tech landscape, Intel announced that VMware CEO Pat Gelsinger would be replacing Bob Swan as Intel CEO on February 15th. The question is why he would leave his job to run a struggling chip giant.

The bottom line is he has a long history with Intel, working with some of the biggest names in chip industry lore before he joined VMware in 2009. It has to be a thrill for him to go back to his roots and try to jump start the company.

“I was 18 years old when I joined Intel, fresh out of the Lincoln Technical Institute. Over the next 30 years of my tenure at Intel, I had the honor to be mentored at the feet of Grove, Noyce and Moore,” Gelsinger wrote in a blog post announcing his new position.

Certainly, Intel recognized that this history and Gelsinger’s deep executive experience should help as the company attempts to compete in an increasingly aggressive chip industry landscape. “Pat is a proven technology leader with a distinguished track record of innovation, talent development, and a deep knowledge of Intel. He will continue a values-based cultural leadership approach with a hyper focus on operational execution,” Omar Ishrak, independent chairman of the Intel board, said in a statement.

But Gelsinger is walking into a bit of a mess. As my colleague Danny Crichton wrote in his year-end review of the chip industry last month, Intel is far behind its competitors, and it’s going to be tough to play catch-up:

Intel has made numerous strategic blunders in the past two decades, most notably completely missing out on the smartphone revolution and also the custom silicon market that has come to prominence in recent years. It’s also just generally fallen behind in chip fabrication, an area it once dominated and is now behind Taiwan-based TSMC, Crichton wrote.

Patrick Moorhead, founder and principal analyst at Moor Insights & Strategy, agrees with this assertion, saying that Swan was dealt a bad hand, walking in to clean up a mess with years-long timelines. While Gelsinger faces similar issues, Moorhead thinks he can refocus the company. “I am not foreseeing any major strategic changes with Gelsinger, but I do expect him to focus on the company’s engineering culture and get it back to an execution culture,” Moorhead told me.

The announcement comes against the backdrop of massive chip industry consolidation last year, with over $100 billion changing hands in four deals: Nvidia nabbing ARM for $40 billion, the $35 billion AMD-Xilinx deal, Analog Devices snagging Maxim for $21 billion and Marvell grabbing Inphi for a mere $10 billion, not to mention Intel dumping its memory unit to SK Hynix for $9 billion.

As for VMware, it has to find a new CEO now. As Moorhead says, the obvious choice would be current COO Sanjay Poonen, but for the time being, it will be CFO Zane Rowe serving as interim CEO, rather than Poonen. In fact, it appears that the company will be casting a wider net than internal options. The official announcement states, “VMware’s Board of Directors is initiating a global executive search process to name a permanent CEO…”

Holger Mueller, an analyst at Constellation Research, says it will be up to Michael Dell to decide who to hand the reins to, but he believes Gelsinger was stuck at Dell and would not get a broader role, so he left.

“VMware has a deep bench, but it will be up to Michael Dell to get a CEO who can innovate on the software side and keep the unique DNA of VMware inside the Dell portfolio going strong, Dell needs the deeper profits of this business for its turnaround,” he said.

The stock market seems to like the move for Intel, with the company’s stock up 7.26%, but not so much for VMware, whose stock was down close to the same amount, at 7.72%, as of publication.

Nov
12
2020
--

Mirantis brings extensions to its Lens Kubernetes IDE, launches a new Kubernetes distro

Earlier this year, Mirantis, the company that now owns Docker’s enterprise business, acquired Lens, a desktop application that provides developers with something akin to an IDE for managing their Kubernetes clusters. At the time, Mirantis CEO Adrian Ionel told me that the company wants to offer enterprises the tools to quickly build modern applications. Today, it’s taking another step in that direction with the launch of an extensions API for Lens that will take the tool far beyond its original capabilities.

In addition to this update to Lens, Mirantis also today announced a new open-source project: k0s. The company describes it as “a modern, 100% upstream vanilla Kubernetes distro that is designed and packaged without compromise.”

It’s a single optimized binary without any OS dependencies (besides the kernel). Based on upstream Kubernetes, k0s supports Intel and Arm architectures and can run on any Linux host or Windows Server 2019 worker nodes. Given these requirements, the team argues that k0s should work for virtually any use case, ranging from local development clusters to private data centers, telco clusters and hybrid cloud solutions.

“We wanted to create a modern, robust and versatile base layer for various use cases where Kubernetes is in play. Something that leverages vanilla upstream Kubernetes and is versatile enough to cover use cases ranging from typical cloud based deployments to various edge/IoT type of cases,” said Jussi Nummelin, senior principal engineer at Mirantis and founder of k0s. “Leveraging our previous experiences, we really did not want to start maintaining the setup and packaging for various OS distros. Hence the packaging model of a single binary to allow us to focus more on the core problem rather than different flavors of packaging such as debs, rpms and what-nots.”

Mirantis, of course, has a bit of experience in the distro game. In its earliest iteration, back in 2013, the company offered one of the first major OpenStack distributions, after all.

Image Credits: Mirantis

As for Lens, the new API, which will go live next week to coincide with KubeCon, will enable developers to extend the service with support for other Kubernetes-integrated components and services.

“Extensions API will unlock collaboration with technology vendors and transform Lens into a fully featured cloud native development IDE that we can extend and enhance without limits,” said Miska Kaipiainen, the co-founder of the Lens open-source project and senior director of engineering at Mirantis. “If you are a vendor, Lens will provide the best channel to reach tens of thousands of active Kubernetes developers and gain distribution to your technology in a way that did not exist before. At the same time, the users of Lens enjoy quality features, technologies and integrations easier than ever.”

The company has already lined up a number of popular CNCF projects and vendors in the cloud-native ecosystem to build integrations. These include Kubernetes security vendors Aqua and Carbonetes, API gateway maker Ambassador Labs and AIOps company Carbon Relay. Venafi, nCipher, Tigera, Kong and StackRox are also currently working on their extensions.

“Introducing an extensions API to Lens is a game-changer for Kubernetes operators and developers, because it will foster an ecosystem of cloud-native tools that can be used in context with the full power of Kubernetes controls, at the user’s fingertips,” said Viswajith Venugopal, StackRox software engineer and developer of KubeLinter. “We look forward to integrating KubeLinter with Lens for a more seamless user experience.”

Nov
03
2020
--

Intel has acquired Cnvrg.io, a platform to manage, build and automate machine learning

Intel continues to snap up startups to build out its machine learning and AI operations. In the latest move, TechCrunch has learned that the chip giant has acquired Cnvrg.io, an Israeli company that has built and operates a platform for data scientists to build and run machine learning models, which can be used to train and track multiple models and run comparisons on them, build recommendations and more.

Intel confirmed the acquisition to us with a short note. “We can confirm that we have acquired Cnvrg,” a spokesperson said. “Cnvrg will be an independent Intel company and will continue to serve its existing and future customers.” Those customers include Lightricks, ST Unitas and Playtika.

Intel is not disclosing any financial terms of the deal, nor who from the startup will join Intel. Cnvrg, co-founded by Yochay Ettun (CEO) and Leah Forkosh Kolben, had raised $8 million from investors that include Hanaco Venture Capital and Jerusalem Venture Partners, and PitchBook estimates that it was valued at around $17 million in its last round. 

It was only a week ago that Intel made another acquisition to boost its AI business, also in the area of machine learning modeling: it picked up SigOpt, which had developed an optimization platform to run machine learning modeling and simulations.

While SigOpt is based out of the Bay Area, Cnvrg is in Israel, and joins an extensive footprint that Intel has built in the country, specifically in the area of artificial intelligence research and development, banked around its Mobileye autonomous vehicle business (which it acquired for more than $15 billion in 2017) and its acquisition of AI chipmaker Habana (which it acquired for $2 billion at the end of 2019).

Cnvrg.io’s platform works across on-premise, cloud and hybrid environments and it comes in paid and free tiers (we covered the launch of the free service, branded Core, last year). It competes with the likes of Databricks, Sagemaker and Dataiku, as well as smaller operations like H2O.ai that are built on open-source frameworks. Cnvrg’s premise is that it provides a user-friendly platform for data scientists so they can concentrate on devising algorithms and measuring how they work, not building or maintaining the platform they run on.

While Intel is not saying much about the deal, it seems that some of the same logic behind last week’s SigOpt acquisition applies here as well: Intel has been refocusing its business around next-generation chips to better compete against the likes of Nvidia and smaller players like GraphCore. So it makes sense to also provide/invest in AI tools for customers, specifically services to help with the compute loads that they will be running on those chips.

It’s notable that in our article about the Core free tier last year, Frederic noted that those using the platform in the cloud can do so with Nvidia-optimized containers that run on a Kubernetes cluster. It’s not clear if that will continue to be the case, or if containers will be optimized instead for Intel architecture, or both. Cnvrg’s other partners include Red Hat and NetApp.

Intel’s focus on the next generation of computing aims to offset declines in its legacy operations. In the last quarter, Intel reported a 3% decline in its revenues, led by a drop in its data center business. It said that it’s projecting the AI silicon market to be bigger than $25 billion by 2024, with AI silicon in the data center to be greater than $10 billion in that period.

In 2019, Intel reported some $3.8 billion in AI-driven revenue, but it hopes that tools like SigOpt’s will help drive more activity in that business, dovetailing with the push for more AI applications in a wider range of businesses.

Nov
02
2020
--

AWS launches its next-gen GPU instances

AWS today announced the launch of its newest GPU-equipped instances. Dubbed P4d, these new instances are launching a decade after AWS launched its first set of Cluster GPU instances. This new generation is powered by Intel Cascade Lake processors and eight of Nvidia’s A100 Tensor Core GPUs. These instances, AWS promises, offer up to 2.5x the deep learning performance of the previous generation — and training a comparable model should be about 60% cheaper with these new instances.

Image Credits: AWS

For now, there is only one size available, the p4d.24xlarge instance, in AWS slang, and the eight A100 GPUs are connected over Nvidia’s NVLink communication interface and offer support for the company’s GPUDirect interface as well.

With 320 GB of high-bandwidth GPU memory and 400 Gbps networking, this is obviously a very powerful machine. Add to that the 96 CPU cores, 1.1 TB of system memory and 8 TB of SSD storage, and it’s maybe no surprise that the on-demand price is $32.77 per hour (though that price goes down to less than $20/hour for one-year reserved instances and $11.57 for three-year reserved instances).

Image Credits: AWS

On the extreme end, you can combine 4,000 or more GPUs into an EC2 UltraCluster, as AWS calls these machines, for high-performance computing workloads at what is essentially a supercomputer-scale machine. Given the price, you’re not likely to spin up one of these clusters to train your model for your toy app anytime soon, but AWS has already been working with a number of enterprise customers to test these instances and clusters, including Toyota Research Institute, GE Healthcare and Aon.

“At [Toyota Research Institute], we’re working to build a future where everyone has the freedom to move,” said Mike Garrison, Technical Lead, Infrastructure Engineering at TRI. “The previous generation P3 instances helped us reduce our time to train machine learning models from days to hours and we are looking forward to utilizing P4d instances, as the additional GPU memory and more efficient float formats will allow our machine learning team to train with more complex models at an even faster speed.”
