Mar 17, 2021
--

OctoML raises $28M Series B for its machine learning acceleration platform

OctoML, a Seattle-based startup that offers a machine learning acceleration platform built on top of the open-source Apache TVM compiler framework project, today announced that it has raised a $28 million Series B funding round led by Addition. Previous investors Madrona Venture Group and Amplify Partners also participated in this round, which brings the company’s total funding to $47 million. The company last raised in April 2020, when it announced its $15 million Series A round led by Amplify.

The promise of OctoML, which was founded by the team that also created TVM, is that developers can bring their models to its platform and the service will automatically optimize that model’s performance for any given cloud or edge device.

As Brazil-born OctoML co-founder and CEO Luis Ceze told me, since raising its Series A round, the company started onboarding some early adopters to its “Octomizer” SaaS platform.


“It’s still in early access, but we have close to 1,000 early access sign-ups on the waitlist,” Ceze said. “That was a pretty strong signal for us to end up taking this [funding]. The Series B was pre-emptive. We were planning on starting to raise money right about now. We had barely started spending our Series A money — we still had a lot of that left. But since we saw this growth and we had more paying customers than we anticipated, there were a lot of signals like, ‘hey, now we can accelerate the go-to-market machinery, build a customer success team and continue expanding the engineering team to build new features.’”

Ceze tells me that the team also saw strong growth signals in the overall community around the TVM project (with about 1,000 people attending its virtual conference last year). As for its customer base (and the companies on its waitlist), Ceze says it spans a wide range of verticals, from defense contractors to financial services, life science companies, automotive firms and startups in a variety of fields.

Recently, OctoML also launched support for the Apple M1 chip and saw very good performance from it.

The company has also formed partnerships with industry heavyweights like Microsoft (which is also a customer), Qualcomm and AMD to build out the open-source components and optimize its service for an even wider range of models (and larger ones, too).

On the engineering side, Ceze tells me that the team is looking at not just optimizing and tuning models but also the training process. Training ML models can quickly become costly and any service that can speed up that process leads to direct savings for its users — which in turn makes OctoML an easier sell. The plan here, Ceze tells me, is to offer an end-to-end solution where people can optimize their ML training and the resulting models and then push their models out to their preferred platform. Right now, its users still have to take the artifact that the Octomizer creates and deploy that themselves, but deployment support is on OctoML’s roadmap.

“When we first met Luis and the OctoML team, we knew they were poised to transform the way ML teams deploy their machine learning models,” said Lee Fixel, founder of Addition. “They have the vision, the talent and the technology to drive ML transformation across every major enterprise. They launched Octomizer six months ago and it’s already becoming the go-to solution developers and data scientists use to maximize ML model performance. We look forward to supporting the company’s continued growth.”



Oct 27, 2020
--

AMD grabs Xilinx for $35 billion as chip industry consolidation continues

The chip industry consolidation dance continued this morning as AMD has entered into an agreement to buy Xilinx for $35 billion, giving the company access to a broad set of specialized workloads.

AMD sees this deal as combining two companies that complement each other’s strengths without cannibalizing its own markets. CEO Lisa Su believes the acquisition will help make her company the high performance chip leader.

“By combining our world-class engineering teams and deep domain expertise, we will create an industry leader with the vision, talent and scale to define the future of high performance computing,” Su said in a statement.

In an article earlier this year, TechCrunch’s Darrell Etherington described Xilinx’s new satellite-focused chips as offering a couple of industry firsts:

It’s the first 20nm process that’s rated for use in space, offering power and efficiency benefits, and it’s the first to offer specific support for high performance machine learning through neural network-based inference acceleration.

What’s more, the chips are designed to handle radiation and the rigors of launch, using a thick ceramic packaging.

In a call with analysts this morning, Su pointed to these kinds of specialized workloads as one of Xilinx’s strengths. “Xilinx has also built deep strategic partnerships across a diverse set of growing markets in 5G communications, data center, automotive, industrial, aerospace and defense. Xilinx is establishing themselves as a strategic technology partner to a broad set of industry leaders,” she said.

The success of these kinds of mega deals tends to hinge on whether the combined companies can work well together. Su pointed out that the two companies have been partnering for a number of years and already have a relationship, and that the two company leaders share a common vision.

“Both AMD and Xilinx share common culture, focused on innovation, execution and collaborating deeply with customers. From a leadership standpoint, Victor and I have a shared vision of where we can take high performance and adaptive computing in the future,” Su said.

In a nod to shareholders of both companies, she said, “This is truly a compelling combination that will create significant value for all stakeholders, including AMD and Xilinx shareholders who will benefit from the future growth and upside potential of the combined company.”

So far, stockholders aren’t impressed, with AMD stock down over 4% in pre-market trading, while Xilinx stock is up over 11%. Xilinx has a market cap of over $28 billion, compared with AMD’s $96.5 billion, making for a massive combined company.

This deal comes on the heels of last month’s ARM acquisition by Nvidia for $40 billion. With two deals in less than two months totaling $75 billion, the industry seems to be betting on the bigger-is-better theory. Meanwhile, Intel took a hit earlier this month after its earnings report showed weakness in its data center business.

While the deal has been approved by both companies’ boards of directors, it still has to pass muster with shareholders and regulators, and is not expected to close until the end of next year.

When that happens, Su will be chairman of the combined company, while Xilinx president and CEO Victor Peng will join AMD as president, in charge of the Xilinx business and strategic growth initiatives.

It’s worth noting that the Wall Street Journal first reported that a deal between these two companies could be coming together earlier this month.

Aug 7, 2019
--

Google and Twitter are using AMD’s new EPYC Rome processors in their data centers

Google and Twitter are among the companies now using EPYC Rome processors, AMD announced today during a launch event for the 7nm chips. The release of EPYC Rome marks a major step in AMD’s processor war with Intel, which said last month that its own 7nm chips, Ice Lake, won’t be available until 2021 (though it is expected to release its 10nm node this year).

Intel is still the biggest data center processor maker by far, however, and also counts Google and Twitter among its customers. But AMD’s latest releases and its strategy of undercutting competitors with lower pricing have quickly transformed it into a formidable rival.

Google has used other AMD chips before, including in its “Millionth Server,” built in 2008, and says it is now the first company to use second-generation EPYC chips in its data centers. Later this year, Google will also make available to Google Cloud customers virtual machines that run on the chips.

In a press statement, Bart Sano, Google vice president of engineering, said “AMD 2nd Gen Epyc processors will help us continue to do what we do best in our datacenters: innovate. Its scalable compute, memory and I/O performance will expand our ability to drive innovation forward in our infrastructure and will give Google Cloud customers the flexibility to choose the best VM for their workloads.”

Twitter plans to begin using EPYC Rome in its data center infrastructure later this year. Its senior director of engineering, Jennifer Fraser, said the chips will reduce the energy consumption of its data centers. “Using the AMD EPYC 7702 processor, we can scale out our compute clusters with more cores in less space using less power, which translates to 25% lower [total cost of ownership] for Twitter.”

In a comparison test between 2-socket Intel Xeon 6242 and AMD EPYC 7702P processors, AMD claimed that its chips were able to reduce total cost of ownership by up to 50% across “numerous workloads.” The flagship of the EPYC Rome line, the 64-core, 128-thread 7742 chip, with a 2.25 GHz base frequency, 225-watt default TDP and 256MB of total cache, starts at $6,950.

May 26, 2019
--

AMD unveils the 12-core Ryzen 9 3900X, at half the price of Intel’s competing Core i9-9920X chip

AMD CEO Lisa Su gave the Computex keynote in Taipei today, the first time the company has been invited to do so (the event officially starts tomorrow). During the presentation, AMD unveiled news about its chips and graphics processors that will increase pressure on competitors Intel and Nvidia, both in terms of pricing and performance.

Chips

All new third-generation Ryzen CPUs, the first 7-nanometer desktop chips, will go on sale on July 7. The showstopper of Su’s keynote was the announcement of AMD’s 12-core, 24-thread Ryzen 9 3900X chip, the flagship of its third-generation Ryzen family. It will retail starting at $499, about half the price of Intel’s competing Core i9-9920X, which is priced at $1,189 and up.

The 3900X has a 4.6 GHz boost speed, 70MB of total cache and a 105-watt TDP (versus the i9-9920X’s 165 watts), making it more efficient. AMD says that in a Blender demo against the Intel i9-9920X, the 3900X finished about 18 percent faster.

Starting prices for other chips in the family are $199 for the 6-core, 12-thread Ryzen 3600; $329 for the 8-core, 16-thread Ryzen 3700X (with a 4.4 GHz boost, 36MB of total cache and a 65-watt TDP); and $399 for the 8-core, 16-thread Ryzen 3800X (4.5 GHz, 32MB cache, 105W).

GPUs

AMD also revealed that its first Navi graphics processor units will be the Radeon RX 5000 series. Pricing is being closely watched because it may pressure Nvidia to bring down prices on competing products. AMD announced that the GPUs will be available in July, but more details, including pricing, performance and new features, won’t be announced until E3 next month in Los Angeles.

Data processors

AMD announced that its EPYC Rome data center processors, first demoed at CES in January, will launch next quarter, one quarter earlier than previously anticipated, to compete with Intel’s Cascade Lake. AMD says that during a benchmark test, EPYC Rome performed twice as fast as Cascade Lake.

Apr 25, 2019
--

AWS expands cloud infrastructure offerings with new AMD EPYC-powered T3a instances

Amazon is always looking for ways to increase the options it offers developers in AWS, and to that end, today it announced a bunch of new AMD EPYC-powered T3a instances. These were originally announced at the end of last year at re:Invent, AWS’s annual customer conference.

Today’s announcement is about making these chips generally available. They have been designed for a specific type of burstable workload, where you might not always need a sustained amount of compute power.

“These instances deliver burstable, cost-effective performance and are a great fit for workloads that do not need high sustained compute power but experience temporary spikes in usage. You get a generous and assured baseline amount of processing power and the ability to transparently scale up to full core performance when you need more processing power, for as long as necessary,” AWS’s Jeff Barr wrote in a blog post.

These instances are built on the AWS Nitro System, Amazon’s custom networking interface hardware that the company has been working on for the last several years. The primary components of this system include the Nitro Card I/O Acceleration, Nitro Security Chip and the Nitro Hypervisor.

Today’s release comes on top of the announcement last year that the company would be releasing EC2 instances powered by Arm-based AWS Graviton Processors, another option for developers looking for a solution for scale-out workloads.

It also comes on the heels of last month’s announcement that it was releasing EC2 M5 and R5 instances, which use lower-cost AMD chips. These are also built on top of the Nitro System.

The EPYC processors are available starting today in seven sizes, in your choice of spot instances, reserved instances or on-demand, as needed. They are available in the US East (N. Virginia), US West (Oregon), Europe (Ireland), US East (Ohio) and Asia-Pacific (Singapore) regions.

Jul 11, 2018
--

AMD EPYC Performance Testing… or Don’t get on the wrong side of SystemD


Ever since AMD released its EPYC CPU for servers, I had wanted to test it, but I did not have the opportunity until recently, when Packet.net started offering bare metal servers at a reasonable price. So I started a couple of instances to test Percona Server for MySQL on this CPU. In this benchmark, I discovered some interesting discrepancies in performance between AMD and Intel CPUs when running under systemd.

The set up

To test CPU performance, I used a read-only in-memory sysbench OLTP benchmark, as it burns CPU cycles and no IO is performed by Percona Server.

For this benchmark I used Packet.net c2.medium.x86 instances powered by AMD EPYC 7401P processors. The OS is exposed to 48 CPU threads.
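As a rough illustration, the thread sweep used throughout this post could be scripted as follows. This is a hypothetical sketch: the specific sysbench options (host, user, table count, table size, run time) are assumptions for the example, not the author’s exact settings, though the thread counts match the tables below.

```python
# Hypothetical sketch of the read-only in-memory sysbench OLTP sweep.
# Concurrency levels are taken from the result tables in this post;
# all other option values are illustrative assumptions.
THREAD_COUNTS = [1, 2, 4, 8, 12, 16, 20, 24, 30, 36, 42, 48,
                 56, 64, 72, 80, 90, 100, 128, 192, 256, 512]

def sysbench_cmd(threads):
    """Build a read-only OLTP sysbench invocation for one concurrency level."""
    return (
        "sysbench oltp_read_only "
        "--mysql-host=127.0.0.1 --mysql-user=sbtest "  # assumed connection
        "--tables=10 --table-size=1000000 "            # assumed in-memory data set
        f"--threads={threads} --time=300 run"
    )

# Print the command for each step of the sweep (run them against the server
# started either as a systemd service or from a bash shell).
for t in THREAD_COUNTS:
    print(sysbench_cmd(t))
```

Because the benchmark is read-only and in-memory, each command burns CPU cycles with essentially no I/O, which is what makes it a CPU comparison rather than a storage one.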

For the OS, I tried:

  • Ubuntu 16.04 with default kernel 4.4 and upgraded to 4.15
  • Ubuntu 18.04 with kernel 4.15
  • Percona Server started from SystemD and without SystemD (for reasons which will become apparent later)

To have some points for comparison, I also ran a similar workload on my 2-socket Intel CPU server, with an Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz. I recognize this is not the most recent Intel CPU, but it was the best I had at the time, and it also gave 48 CPU threads.

Ubuntu 16

First, let’s review the results for Ubuntu 16, in tabular format:

Threads | Ubuntu 16, kernel 4.4, systemd | Ubuntu 16, kernel 4.4, no systemd | Ubuntu 16, kernel 4.15
1 | 943.44 | 948.70 | 899.82
2 | 1858.58 | 1792.36 | 1774.45
4 | 3533.20 | 3424.05 | 3555.94
8 | 6762.35 | 6731.57 | 7010.51
12 | 10012.18 | 9950.30 | 10062.82
16 | 13063.39 | 13043.55 | 12893.17
20 | 15347.68 | 15347.56 | 14756.27
24 | 16886.24 | 16864.81 | 16176.76
30 | 18150.20 | 18160.07 | 17860.50
36 | 18923.06 | 18811.64 | 19528.27
42 | 19374.86 | 19463.08 | 21537.79
48 | 20110.81 | 19983.05 | 23388.18
56 | 20548.51 | 20362.31 | 23768.49
64 | 20860.51 | 20729.52 | 23797.14
72 | 21123.71 | 21001.06 | 23645.95
80 | 21370.00 | 21191.24 | 23546.03
90 | 21622.54 | 21441.73 | 23486.29
100 | 21806.67 | 21670.38 | 23392.72
128 | 22161.42 | 22031.53 | 23315.33
192 | 22388.51 | 22207.26 | 22906.42
256 | 22091.17 | 21943.37 | 22305.06
512 | 19524.41 | 19381.69 | 19181.71

There are a few conclusions we can draw from this data:

  1. The AMD EPYC CPU scales quite well with the number of CPU threads.
  2. The more recent kernel helps to boost throughput.
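The size of the kernel boost can be read directly off the table; a minimal check using the 48-thread values above (kernel 4.4 systemd run versus kernel 4.15):

```python
# Throughput values copied from the Ubuntu 16 table above, 48 threads.
tps_kernel_44 = 20110.81   # kernel 4.4, systemd
tps_kernel_415 = 23388.18  # kernel 4.15

gain = (tps_kernel_415 - tps_kernel_44) / tps_kernel_44
print(f"kernel 4.15 gain at 48 threads: {gain:.1%}")  # about a 16% gain
```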

Ubuntu 18.04

Now, let’s review the results for Ubuntu 18.04

Threads | Ubuntu 18, systemd | Ubuntu 18, no systemd
1 | 833.14 | 843.68
2 | 1684.21 | 1693.93
4 | 3346.42 | 3359.82
8 | 6592.88 | 6597.48
12 | 9477.92 | 9487.93
16 | 12117.12 | 12149.17
20 | 13934.27 | 13933.00
24 | 15265.10 | 15152.74
30 | 16846.02 | 16061.16
36 | 18488.88 | 16726.14
42 | 20493.57 | 17360.56
48 | 22217.47 | 17906.40
56 | 22564.40 | 17931.83
64 | 22590.29 | 17902.95
72 | 22472.75 | 17857.73
80 | 22421.99 | 17766.76
90 | 22300.09 | 17773.57
100 | 22275.24 | 17646.70
128 | 22131.86 | 17411.55
192 | 21750.80 | 17134.63
256 | 21177.25 | 16826.53
512 | 18296.61 | 17418.72

This is where the result surprised me: on Ubuntu 18.04, with Percona Server for MySQL running as a systemd service, throughput was up to 24% better than when Percona Server for MySQL was started from a bash shell. I do not know exactly what causes this dramatic difference. Systemd uses different slices for services and user commands, and somehow this affects performance.
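As a quick sanity check, the 24% figure follows from the 48-thread row of the table above:

```python
# Values copied from the Ubuntu 18.04 table above, at 48 client threads.
tps_systemd = 22217.47     # Percona Server started as a systemd service
tps_no_systemd = 17906.40  # Percona Server started from a bash shell

gain = (tps_systemd - tps_no_systemd) / tps_no_systemd
print(f"systemd advantage at 48 threads: {gain:.1%}")  # matches the ~24% quoted above
```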

Baseline benchmark

To establish a baseline, I ran the same benchmark on my Intel box, running Ubuntu 16, and I tried two kernels: 4.13 and 4.15.

Threads | Ubuntu 16, kernel 4.13, systemd | Ubuntu 16, kernel 4.15, systemd | Ubuntu 16, kernel 4.15, no systemd
1 | 820.07 | 798.42 | 864.21
2 | 1563.31 | 1609.96 | 1681.91
4 | 2929.63 | 3186.01 | 3338.47
8 | 6075.73 | 6279.49 | 6624.49
12 | 8743.38 | 9256.18 | 9622.60
16 | 10580.14 | 11351.31 | 11984.64
20 | 12790.96 | 12599.78 | 14147.10
24 | 14213.68 | 14659.49 | 15716.61
30 | 15983.78 | 16096.03 | 17530.06
36 | 17574.46 | 18098.36 | 20085.90
42 | 18671.14 | 19808.92 | 21875.84
48 | 19431.05 | 22036.06 | 23986.08
56 | 19737.92 | 22115.34 | 24275.72
64 | 19946.57 | 21457.32 | 24054.09
72 | 20129.70 | 21729.78 | 24167.03
80 | 20214.93 | 21594.51 | 24092.86
90 | 20194.78 | 21195.61 | 23945.93
100 | 20753.44 | 21597.26 | 23802.16
128 | 20235.24 | 20684.34 | 23476.82
192 | 20280.52 | 20431.18 | 23108.36
256 | 20410.55 | 20952.64 | 22775.63
512 | 20953.73 | 22079.18 | 23489.30

Here we see the opposite result with systemd: Percona Server running from a bash shell shows better throughput than the systemd service. So, for some reason, systemd affects AMD and Intel CPUs differently. Please let me know if you have any ideas on how to deal with the impact that systemd has on performance.
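The magnitude of this opposite effect can likewise be read off the Intel table above (kernel 4.15 columns, 48 threads):

```python
# Values copied from the Intel baseline table above, kernel 4.15, 48 threads.
tps_systemd = 22036.06     # started as a systemd service
tps_no_systemd = 23986.08  # started from a bash shell

shell_advantage = (tps_no_systemd - tps_systemd) / tps_systemd
print(f"bash-shell advantage at 48 threads: {shell_advantage:.1%}")  # about 8.8%
```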

Conclusions

So there are some conclusions from these results:

  1. AMD EPYC shows decent performance scalability; the new kernel helps to improve it.
  2. systemd has different effects on throughput for AMD and Intel CPUs.
  3. With AMD, throughput declines under a highly concurrent workload (512 threads), while Intel does not show such a decline.

The post AMD EPYC Performance Testing… or Don’t get on the wrong side of SystemD appeared first on Percona Database Performance Blog.
