Nov
19
2019
--

The Cerebras CS-1 computes deep learning AI problems by being bigger, bigger, and bigger than any other chip

Deep learning is all the rage these days in enterprise circles, and it isn’t hard to understand why. Whether it is optimizing ad spend, finding new drugs to cure cancer, or just offering better, more intelligent products to customers, machine learning — and particularly deep learning — has the potential to massively improve a range of products and applications.

The key word though is ‘potential.’ While we have heard oodles of words sprayed across enterprise conferences the last few years about deep learning, there remain huge roadblocks to making these techniques widely available. Deep learning models are highly networked, with dense graphs of nodes that don’t “fit” well with the traditional ways computers process information. Plus, holding all of the information required for a deep learning model can take petabytes of storage and racks upon racks of processors in order to be usable.

There are lots of approaches underway right now to solve this next-generation compute problem, and Cerebras has to be among the most interesting.

As we talked about in August with the announcement of the company’s “Wafer Scale Engine” — the world’s largest silicon chip according to the company — Cerebras’ theory is that the way forward for deep learning is to essentially just get the entire machine learning model to fit on one massive chip. And so the company aimed to go big — really big.

Today, the company announced the launch of its end-user compute product, the Cerebras CS-1, and also announced its first customer: Argonne National Laboratory.

The CS-1 is a “complete solution” product designed to be added to a data center to handle AI workflows. It includes the Wafer Scale Engine (or WSE, i.e. the actual processing core) plus all the cooling, networking, storage, and other equipment required to operate and integrate the processor into the data center. It’s 26.25 inches tall (15 rack units), and includes 400,000 processing cores, 18 gigabytes of on-chip memory, 9 petabytes per second of on-die memory bandwidth, 12 gigabit ethernet connections to move data in and out of the CS-1 system, and sucks just 20 kilowatts of power.

A cross-section look at the CS-1. Photo via Cerebras

Cerebras claims that the CS-1 delivers the performance of more than 1,000 leading GPUs combined — a claim that TechCrunch hasn’t verified, although we are intently waiting for industry-standard benchmarks in the coming months when testers get their hands on these units.

In addition to the hardware itself, Cerebras also announced the release of a comprehensive software platform that allows developers to use popular ML libraries like TensorFlow and PyTorch to integrate their AI workflows with the CS-1 system.

In designing the system, CEO and co-founder Andrew Feldman said that “We’ve talked to more than 100 customers over the past year and a bit” in order to determine the needs for a new AI system and the software layer that should go on top of it. “What we’ve learned over the years is that you want to meet the software community where they are rather than asking them to move to you.”
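
Cerebras hasn’t published its integration API in this piece, so treat the following as a purely illustrative sketch: the pitch is that an ordinary framework training step — standard PyTorch, nothing Cerebras-specific — is the kind of code the company’s compiler ingests and maps across the wafer’s 400,000 cores.

```python
# Illustrative only: standard PyTorch with no Cerebras-specific code.
# The claim is that models written like this are what the CS-1 toolchain
# compiles onto the wafer, rather than requiring a new framework.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()   # the dense gradient flow a graph compiler must place on-chip
opt.step()
```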

I asked Feldman why the company was rebuilding so much of the hardware to power their system, rather than using already existing components. “If you were to build a Ferrari engine and put it in a Toyota, you cannot make a race car,” Feldman analogized. “Putting fast chips in Dell or [other] servers does not make fast compute. What it does is it moves the bottleneck.” Feldman explained that the CS-1 was meant to take the underlying WSE chip and give it the infrastructure required to allow it to perform to its full capability.

A diagram of the Cerebras CS-1 cooling system. Photo via Cerebras.

That infrastructure includes a high-performance water cooling system to keep this massive chip and platform operating at the right temperatures. I asked Feldman why Cerebras chose water, given that water cooling has traditionally been complicated in the data center. He said, “We looked at other technologies — freon. We looked at immersive solutions, we looked at phase-change solutions. And what we found was that water is extraordinary at moving heat.”

A side view of the CS-1 with its water and air cooling systems visible. Photo via Cerebras.

Why, then, make such a massive chip — one that, as we discussed back in August, carries huge engineering requirements compared to smaller chips with better yield from wafers? Feldman said that “it massively reduces communication time by using locality.”

In computer science, locality means placing data and compute close together — within, say, a cloud — to minimize delays and processing friction. By having a chip that can theoretically host an entire ML model on it, there’s no need for data to flow through multiple storage clusters or ethernet cables — everything the chip needs to work with is available almost immediately.
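
Some back-of-envelope arithmetic with the CS-1’s own published numbers shows what’s at stake; the 100 Gb/s inter-machine link below is our assumption for a conventional cluster, not a Cerebras figure.

```python
# Rough arithmetic, not a benchmark: time to move the CS-1's full 18 GB of
# on-chip state at its quoted 9 PB/s on-die bandwidth, versus shipping the
# same 18 GB between machines over an assumed 100 Gb/s datacenter link.
on_chip_bytes = 18e9
on_die_bandwidth = 9e15           # bytes/sec, per Cerebras' spec
network_bandwidth = 100e9 / 8     # 100 Gb/s in bytes/sec (our assumption)

print(f"on-die sweep:     {on_chip_bytes / on_die_bandwidth * 1e6:.0f} microseconds")
print(f"network transfer: {on_chip_bytes / network_bandwidth:.2f} seconds")
```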

According to a statement from Cerebras and Argonne National Laboratory, Cerebras is helping to power research in “cancer, traumatic brain injury and many other areas important to society today” at the lab. Feldman said that “It was very satisfying that right away customers were using this for things that are important and not for 17-year-old girls to find each other on Instagram or some shit like that.”

(Of course, one hopes that cancer research pays as well as influencer marketing when it comes to the value of deep learning models).

Cerebras itself has grown rapidly, reaching 181 engineers today according to the company. Feldman says that the company is heads-down on customer sales and additional product development.

It has certainly been a busy time for startups in the next-generation artificial intelligence workflow space. Graphcore just announced this weekend that its chips were being installed in Microsoft’s Azure cloud, while I covered the funding of NUVIA, a startup led by former lead chip designers from Apple and Google who hope to apply their mobile backgrounds to solve the extreme power requirements these AI chips force on data centers.

Expect ever more announcements and activity in this space as deep learning continues to find new adherents in the enterprise.

Nov
15
2019
--

Three of Apple and Google’s former star chip designers launch NUVIA with $53M in series A funding

Silicon is apparently the new gold these days, or so VCs hope.

What was once a no-go zone for venture investors, who feared the long development lead times and high technical risk required for new entrants in the semiconductor field, has now turned into one of the hottest investment areas for enterprise and data VCs. Graphcore reached unicorn status after its $200 million series D a year ago, Groq closed $52 million from the likes of Chamath Palihapitiya of Social Capital fame, and Cerebras — which I profiled a bit this summer — raised $112 million from Benchmark and others while announcing that it had produced the first trillion-transistor chip.

Today, we have another entrant with another great technical team at the helm, this time with a Santa Clara, CA-based startup called NUVIA. The company announced this morning that it has raised a $53 million series A venture round co-led by Capricorn Investment Group, Dell Technologies Capital (DTC), Mayfield, and WRVI Capital, with participation from Nepenthe LLC.

Despite only getting started earlier this year, the company currently has roughly 60 employees, another 30 at various stages of accepted offers, and may even crack 100 employees before the end of the year.

What’s happening here is a combination of trends in the compute industry. There has been an explosion in data and, by extension, in the data centers required to store all of that information, just as we have exponentially expanded our appetite for complex machine learning algorithms to crunch through all of those bits. Unfortunately, the growth in computation power is not keeping pace with our demands as Moore’s Law slows. Companies like Intel are hitting the limits of physics and our current know-how to continue to improve computational densities, opening the ground for new entrants and new approaches to the field.

Finding and building a dream team with a “chip” on their shoulder

There are two halves to the NUVIA story. First is the story of the company’s founders, which include John Bruno, Manu Gulati, and Gerard Williams III, who will be CEO. The three overlapped for a number of years at Apple, where they brought their diverse chip skillsets together to lead a variety of initiatives including Apple’s A-series of chips that power the iPhone and iPad. According to a press statement from the company, the founders have worked on a combined 20 chips across their careers and have received more than 100 patents for their work in silicon.

Gulati joined Apple in 2009 as a microarchitect (or SoC architect) after a career at Broadcom, and a few months later, Williams joined the team as well. Gulati explained to me in an interview that, “So my job was kind of putting the chip together; his job was delivering the most important piece of IP that went into it, which is the CPU.” A few years later, around 2012, Bruno was poached from AMD and brought to Apple as well.

Gulati said that when Bruno joined, it was expected he would be a “silicon person” but his role quickly broadened to think more strategically about what the chipset of the iPhone and iPad should deliver to end users. “He really got into this realm of system-level stuff and competitive analysis and how do we stack up against other people and what’s happening in the industry,” he said. “So three very different technical backgrounds, but all three of us are very, very hands-on and, you know, just engineers at heart.”

Gulati would take an opportunity at Google in 2017 focused broadly on the company’s mobile hardware, and he eventually pulled over Bruno from Apple to join him. The two left Google earlier this year, departures first reported by The Information in May. For his part, Williams stayed at Apple for nearly a decade before leaving earlier this year in March.

The company is being stealthy about exactly what it is working on, which is typical in the silicon space because it can take years to design, manufacture, and get a product into market. That said, what’s interesting is that while the troika of founders all have backgrounds in mobile chipsets, they are focused on the data center broadly conceived (i.e. cloud computing) — and specifically, reading between the lines, on finding more energy-efficient designs that can combat the rising climate cost of machine learning workflows and computation-intensive processing.

Gulati told me that “for us, energy efficiency is kind of built into the way we think.”

The company’s CMO did tell me that the startup is building “a custom clean sheet designed from the ground up” and isn’t encumbered by legacy designs. In other words, the company isn’t building on top of ARM or other existing chip architectures.

Building an investor syndicate that’s willing to “chip” in

Outside of the founders, the other half of the NUVIA story is the collective of investors sitting around the table, all of whom not only have deep technical backgrounds but also the deep pockets to handle the technical risk that comes with new silicon startups.

Capricorn specifically invested out of what it calls its Technology Impact Fund, which focuses on funding startups that use technology to make a positive impact on the world. Its portfolio according to a statement includes Tesla, Planet Labs, and Helion Energy.

Meanwhile, DTC is the venture wing of Dell Technologies and its associated companies, and brings a deep background in enterprise and data centers, particularly from the group’s server businesses like Dell EMC. Scott Darling, who leads DTC, is joining NUVIA’s board, although the company is not disclosing the board composition at this time. Navin Chaddha, an electrical engineer by training who leads Mayfield, has invested in companies like HashiCorp, Akamai, and SolarCity. Finally, WRVI has a long background in enterprise and semiconductor companies.

I chatted a bit with Darling of DTC about what he saw in this particular team and their vision for the data center. In addition to liking each founder individually, Darling felt the team as a whole was just very strong. “What’s most impressive is that if you look at them collectively, they have a skillset and breadth that’s also stunning,” he said.

He confirmed that the company is broadly working on data center products, but said the company is going to lie low on its specific strategy during product development. “No point in being specific, it just engenders immune reactions from other players so we’re just going to be a little quiet for a while,” he said.

He apologized for “sounding incredibly cryptic” but said that the investment thesis from his perspective for the product was that “the data center market is going to be receptive to technology evolutions that have occurred in places outside of the data center that’s going to allow us to deliver great products to the data center.”

Interpolating that statement a bit with the mobile chip backgrounds of the founders at Google and Apple, it seems evident that the extreme energy-to-performance constraints of mobile might find some use in the data center, particularly given the heightened concerns about power consumption and climate change among data center owners.

DTC has been a frequent investor in next-generation silicon, including joining the series A investment of Graphcore back in 2016. I asked Darling whether the firm was investing aggressively in the space or sort of taking a wait-and-see attitude, and he explained that the firm tries to keep a consistent volume of investments at the silicon level. “My philosophy on that is, it’s kind of an inverted pyramid. No, I’m not gonna do a ton of silicon plays. If you look at it, I’ve got five or six. I think of them as the foundations on which a bunch of other stuff gets built on top,” he explained. He noted that each investment in the space is “expensive” given the work required to design and field a product, and so these investments have to be carefully made with the intention of supporting the companies for the long haul.

That explanation was echoed by Gulati when I asked how he and his co-founders came to closing on this investor syndicate. Given the reputations of the three, they would have had easy access to any VC in the Valley. He said about the final investors:

They understood that putting something together like this is not going to be easy and it’s not for everybody … I think everybody understands that there’s an opportunity here. Actually capitalizing upon it and then building a team and executing on it is not something that just anybody could possibly take on. And similarly, it is not something that every investor could just possibly take on in my opinion. They themselves need to have a vision on their side and not just believe our story. And they need to strategically be willing to help and put in the money and be there for the long haul.

It may be a long haul, but Gulati noted that “on a day-to-day basis, it’s really awesome to have mostly friends you work with.” With perhaps 100 employees by the end of the year and tens of millions of dollars already in the bank, they have their war chest and their army ready to go. Now comes the fun (and hard) part as we learn how the chips fall.

Oct
08
2019
--

Arm brings custom instructions to its embedded CPUs

At its annual TechCon event in San Jose, Arm today announced Custom Instructions, a new feature of its Armv8-M architecture for embedded CPUs that, as the name implies, enables its customers to write their own custom instructions to accelerate their specific use cases for embedded and IoT applications.

“We already have ways to add acceleration, but not as deep and down to the heart of the CPU. What we’re giving [our customers] here is the flexibility to program your own instructions, to define your own instructions — and have them executed by the CPU,” Arm senior director for its automotive and IoT business, Thomas Ensergueix, told me ahead of today’s announcement.

He noted that Arm has always had a continuum of options for acceleration, starting with its memory-mapped architecture for connecting GPUs and today’s neural processing units over a bus. This allows the CPU and the accelerator to run in parallel, but with the bus as the bottleneck. Customers can also opt for a co-processor that’s directly connected to the CPU, but today’s news essentially allows Arm customers to create their own accelerated algorithms that then run directly on the CPU. That means the latency is low, but it’s not running in parallel, as with the memory-mapped solution.

As Arm argues, this setup allows for the lowest-cost (and risk) path for integrating customer workload acceleration, as there are no disruptions to the existing CPU features and it still allows its customers to use the existing standard tools with which they are already familiar.

For now, custom instructions will only be available in the Arm Cortex-M33 CPUs, starting in the first half of 2020. By default, it’ll also be available for all future Cortex-M processors. There are no additional costs or new licenses to buy for Arm’s customers.

Ensergueix noted that as we’re moving to a world with more and more connected devices, more of Arm’s customers will want to optimize their processors for their often very specific use cases — and often they’ll want to do so because by creating custom instructions, they can get a bit more battery life out of these devices, for example.

Arm has already lined up a number of partners to support Custom Instructions, including IAR Systems, NXP, Silicon Labs and STMicroelectronics.

“Arm’s new Custom Instructions capabilities allow silicon suppliers like NXP to offer their customers a new degree of application-specific instruction optimizations to improve performance, power dissipation and static code size for new and emerging embedded applications,” writes NXP’s Geoff Lees, SVP and GM of Microcontrollers. “Additionally, all these improvements are enabled within the extensive Cortex-M ecosystem, so customers’ existing software investments are maximized.”

In related embedded news, Arm also today announced that it is setting up a governance model for Mbed OS, its open-source operating system for embedded devices that run an Arm Cortex-M chip. Mbed OS has always been open source, but the Mbed OS Partner Governance model will allow Arm’s Mbed silicon partners to have more of a say in how the OS is developed, through tools like a monthly Product Working Group meeting. Partners like Analog Devices, Cypress, Nuvoton, NXP, Renesas, Realtek, Samsung and u-blox are already participating in this group.

Sep
12
2019
--

The mainframe business is alive and well, as IBM announces new z15

It’s easy to think about mainframes as some technology dinosaur, but the fact is these machines remain a key component of many large organizations’ computing strategies. Today, IBM announced the latest in its line of mainframe computers, the z15.

For starters, as you would probably expect, these are big and powerful machines capable of handling enormous workloads. For example, this baby can process up to 1 trillion web transactions a day and handle 2.4 million Docker containers, while offering unparalleled security to go with that performance. This includes the ability to encrypt data once and have it stay encrypted, even when it leaves the system — a huge advantage for companies with a hybrid strategy.
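
IBM hasn’t detailed the mechanics here, but the “encrypt once, stays encrypted” idea itself is easy to sketch. A minimal illustration using the Python cryptography library (purely a stand-in, not IBM’s implementation): the data only ever travels as ciphertext, and only the key holder can open it.

```python
# Minimal illustration (not IBM's implementation): data encrypted once
# remains ciphertext wherever it travels; only the key holder can open it.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # the key stays inside the secure system
token = Fernet(key).encrypt(b"customer record")

# The token can cross networks, clouds, and backups as opaque bytes...
assert b"customer record" not in token

# ...and is only readable again where the key lives.
print(Fernet(key).decrypt(token))  # b'customer record'
```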

Speaking of which, you may recall that IBM bought Red Hat last year for $34 billion. That deal closed in July and the companies have been working to incorporate Red Hat technology across the IBM business including the z line of mainframes.

IBM announced last month that it was making OpenShift, Red Hat’s Kubernetes-based cloud-native tools, available on the mainframe running Linux. This should enable developers, who have been working on OpenShift on other systems, to move seamlessly to the mainframe without special training.

IBM sees the mainframe as a bridge for hybrid computing environments, offering a highly secure place for data that, when combined with Red Hat’s tools, can enable companies to have a single control plane for applications and data wherever they live.

While it could be tough to justify the cost of these machines in the age of cloud computing, Ray Wang, founder and principal analyst at Constellation Research, says it could be more cost-effective than the cloud for certain customers. “If you are a new customer, and currently in the cloud and develop on Linux, then in the long run the economics are there to be cheaper than public cloud if you have a lot of IO, and need to get to a high degree of encryption and security,” he said.

He added, “The main point is that if you are worried about being held hostage by public cloud vendors on pricing, in the long run the z is a cost-effective and secure option for owning compute power and working in a multi-cloud, hybrid cloud world.”

Airlines, financial services firms and other large companies continue to use mainframes, and while they need the power these massive machines provide, they need to use it in a more modern context. The z15 is designed to provide that link to the future, while giving these companies the power they need.

Sep
10
2019
--

Q-CTRL raises $15M for software that reduces error and noise in quantum computing hardware

As hardware makers continue to work on ways of making wide-scale quantum computing a reality, a startup out of Australia that is building software to help reduce noise and errors on quantum computing machines has raised a round of funding to fuel its U.S. expansion.

Q-CTRL is designing firmware for computers and other machines (such as quantum sensors) that perform quantum calculations — firmware that identifies the potential for errors, making the machines more resilient and able to stay working for longer (the Q in its name is a reference to qubits, the basic building blocks of quantum computing).

The startup is today announcing that it has raised $15 million, money that it plans to use to double its team (currently numbering 25) and set up shop on the West Coast, specifically Los Angeles.

This Series A is coming from a list of backers that speaks to the startup’s success to date in courting quantum hardware companies as customers. Led by Square Peg Capital — a prolific Australian VC that has backed homegrown startups like Bugcrowd and Canva, but also those further afield such as Stripe — it also includes new investor Sierra Ventures as well as Sequoia Capital, Main Sequence Ventures and Horizons Ventures.

Q-CTRL’s customers are some of the bigger names in quantum computing and IT, such as Rigetti, Bleximo and Accenture, among others. IBM — which earlier this year unveiled its first commercial quantum computer — singled it out last year for its work in advancing quantum technology.

The problem that Q-CTRL is aiming to address is basic but arguably critical to solve if quantum computing ever hopes to make the leap out of the lab and into wider use in the real world.

Quantum computers and other machines like quantum sensors, which are built on quantum physics architectures, are able to perform computations that go well beyond what can be done by normal computers today, with applications for such technology including cryptography, biosciences, advanced geological exploration and much more. But quantum computing machines are known to be unstable, in part because of the fragility of the quantum state, which introduces a lot of noise and subsequent errors — and those errors result in crashes.

As Frederic pointed out recently, scientists are confident that this is ultimately a solvable issue. Q-CTRL is one of the hopefuls working on that, by providing a set of tools that runs on quantum machines, visualises noise and decoherence and then deploys controls to “defeat” those errors.
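
Q-CTRL’s actual products aren’t shown here, but the flavor of open-loop error suppression they build on can be sketched with the textbook spin echo, which cancels slow dephasing noise by flipping the qubit’s phase accumulation halfway through an experiment. A minimal simulation, with an assumed noise strength:

```python
# A minimal sketch (not Q-CTRL's API) of open-loop quantum control: a Hahn
# spin echo cancels slow dephasing noise that would otherwise destroy a
# qubit's phase coherence. Noise parameters here are assumed for illustration.
import numpy as np

rng = np.random.default_rng(0)
T = 1.0                    # total evolution time (arbitrary units)
trials = 10_000

# Quasi-static noise: a random detuning, constant within each experiment.
detunings = rng.normal(0.0, 5.0, size=trials)

# Free evolution: the qubit accumulates phase = detuning * T.
free_phase = detunings * T

# Hahn echo: a pi-pulse at T/2 flips the sign of later phase accumulation,
# so slow noise cancels exactly: +d*T/2 then -d*T/2.
phase_first_half = detunings * (T / 2)
phase_second_half = -detunings * (T / 2)   # sign flipped by the pi-pulse
echo_phase = phase_first_half + phase_second_half

# Coherence = |<exp(i*phase)>|; 1.0 means the noise is fully cancelled.
print("free evolution coherence:", abs(np.exp(1j * free_phase).mean()))
print("spin-echo coherence:     ", abs(np.exp(1j * echo_phase).mean()))
```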

Q-CTRL currently has four products it offers to the market: Black Opal, Boulder Opal, Open Controls and Devkit — aimed respectively at students/those exploring quantum computing, hardware makers, the research community and end users/algorithm developers.

Q-CTRL was founded in 2017 by Michael Biercuk, a professor of Quantum Physics & Quantum Technology at the University of Sydney and a chief investigator in the Australian Research Council Centre of Excellence for Engineered Quantum Systems. He studied in the U.S., earning a PhD in physics from Harvard.

“Being at the vanguard of the birth of a new industry is extraordinary,” he said in a statement. “We’re also thrilled to be assembling one of the most impressive investor syndicates in quantum technology. Finding investors who understand and embrace both the promise and the challenge of building quantum computers is almost magical.”

Why choose Los Angeles for building out a U.S. presence, you might ask? Southern California, it turns out, has shaped up to be a key area for quantum research and development, with several of the universities in the region building out labs dedicated to the area, and companies like Lockheed Martin and Google also contributing to the ecosystem. This means a strong pipeline of talent and conversation in what is still a nascent area.

Given that it is still early days for quantum computing technology, that gives a lot of potential options to a company like Q-CTRL longer-term: The company might continue to build a business as it does today, selling its technology to a plethora of hardware makers and researchers in the field; or it might get snapped up by a specific hardware company to integrate Q-CTRL’s solutions more closely onto its machines (and keep them away from competitors).

Or, it could make like a quantum particle and follow both of those paths at the same time.

“Q-CTRL impressed us with their strategy; by providing infrastructure software to improve quantum computers for R&D teams and end-users, they’re able to be a central player in bringing this technology to reality,” said Tushar Roy, a partner at Square Peg. “Their technology also has applications beyond quantum computing, including in quantum-based sensing, which is a rapidly-growing market. In Q-CTRL we found a rare combination of world-leading technical expertise with an understanding of customers, products and what it takes to build an impactful business.”

Aug
26
2019
--

IBM’s quantum-resistant magnetic tape storage is not actually snake oil

Usually when someone in tech says the word “quantum,” I put my hands on my ears and sing until they go away. But while IBM’s “quantum computing safe tape drive” nearly drove me to song, when I thought about it, it actually made a lot of sense.

First of all, it’s a bit of a misleading lede. The tape is not resistant to quantum computing at all. The problem isn’t that qubits are going to escape their cryogenic prisons and go interfere with tape drives in the basement of some data center or HQ. The problem is what these quantum computers may be able to accomplish when they’re finally put to use.

Without going too deep down the quantum rabbit hole, it’s generally acknowledged that quantum computers and classical computers (like the one you’re using) are good at different things — to the point where in some cases, a problem that might take incalculable time on a traditional supercomputer could be done in a flash on quantum. Don’t ask me how — I said we’re not going down the hole!

One of the things quantum is potentially very good at is certain types of cryptography: It’s theorized that quantum computers could absolutely smash through many currently used encryption techniques. In the worst-case scenario, that means that if someone got hold of a large cache of encrypted data that today would be useless without the key, a future adversary may be able to force the lock. Considering how many breaches there have been where the only reason your entire life wasn’t stolen was because it was encrypted, this is a serious threat.

IBM and others are thinking ahead. Quantum computing isn’t a threat right now, right? It isn’t being seriously used by anyone, let alone hackers. But what if you buy a tape drive for long-term data storage today, and then a decade from now a hack hits and everything is exposed because it was using “industry standard” encryption?

To prevent that from happening, IBM is migrating its tape storage over to encryption algorithms that are resistant to state-of-the-art quantum decryption techniques — specifically lattice cryptography (another rabbit hole — go ahead) — because these devices are meant to be used for decades if possible, during which time the entire computing landscape can change. It will be hard to predict exactly what quantum methods will emerge in the future, but at the very least you can try not to be among the low-hanging fruit favored by hackers.
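
To peek just inside the lattice rabbit hole: the toy sketch below shows learning-with-errors (LWE), the hardness assumption underpinning these post-quantum schemes. The parameters are far too small to be secure; this is the mechanics only.

```python
# Toy learning-with-errors (LWE) scheme — the hardness assumption behind
# lattice cryptography like the schemes IBM is adopting. Illustrative only:
# these parameters are far too small to be secure.
import numpy as np

rng = np.random.default_rng(1)
q, n, m = 3329, 16, 64           # modulus, secret dimension, sample count

# Key generation: secret s; public key (A, b = A@s + small error mod q).
s = rng.integers(0, q, n)
A = rng.integers(0, q, (m, n))
e = rng.integers(-2, 3, m)       # the small noise that hides s
b = (A @ s + e) % q

def encrypt(bit):
    r = rng.integers(0, 2, m)    # random 0/1 combination of public samples
    u = (r @ A) % q
    v = (r @ b + bit * (q // 2)) % q
    return u, v

def decrypt(u, v):
    d = (v - u @ s) % q          # equals r@e + bit*(q//2); r@e is small
    return int(min(d, q - d) > q // 4)

for bit in (0, 1):
    assert decrypt(*encrypt(bit)) == bit
print("toy LWE round-trips both bits")
```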

The tape itself is just regular tape. In fact, the whole system is pretty much the same as you’d have bought a week ago. All the changes are in the firmware, meaning earlier drives can be retrofitted with this quantum-resistant tech.

Quantum computing may not be relevant to many applications today, but next year who knows? And in 10 years, it might be commonplace. So it behooves companies like IBM that plan to be part of the enterprise world for decades to come to plan for it today.

Aug
20
2019
--

IBM is moving OpenPower Foundation to The Linux Foundation

IBM makes the Power Series chips, and as part of that has open-sourced some of the underlying technologies to encourage wider use of these chips. The open-source pieces have been part of the OpenPower Foundation. Today, the company announced it was moving the foundation under The Linux Foundation, and while it was at it, announced it was open-sourcing several other important bits.

Ken King, general manager for OpenPower at IBM, says that at this point in his organization’s evolution, they wanted to move it under the auspices of the Linux Foundation. “We are taking the OpenPower Foundation, and we are putting it as an entity or project underneath The Linux Foundation with the mindset that we are now bringing more of an open governance approach and open governance principles to the foundation,” King told TechCrunch.

But IBM didn’t stop there. It also announced that it was open-sourcing some of the technical underpinnings of the Power Series chip to make it easier for developers and engineers to build on top of the technology. Perhaps most importantly, the company is open-sourcing the Power Instruction Set Architecture (ISA). These are “the definitions developers use for ensuring hardware and software work together on Power,” the company explained.

King sees open-sourcing this technology as an important step for a number of reasons around licensing and governance. “The first thing is that we are taking the ability to be able to implement what we’re licensing, the ISA instruction set architecture, for others to be able to implement on top of that instruction set royalty free with patent rights,” he explained.

The company is also putting this under an open governance workgroup at the OpenPower Foundation. This matters to open-source community members because it provides a layer of transparency that might otherwise be lacking. What that means in practice is that any changes will be subject to a majority vote, so long as the changes meet compatibility requirements, King said.

Jim Zemlin, executive director at the Linux Foundation, says that making all of this part of the Linux Foundation open-source community could drive more innovation. “Instead of a very, very long cycle of building an application and working separately with hardware and chip designers, because all of this is open, you’re able to quickly build your application, prototype it with hardware folks, and then work with a service provider or a company like IBM to take it to market. So there’s not tons of layers in between the actual innovation and value captured by industry in that cycle,” Zemlin explained.

In addition, IBM made several other announcements around open-sourcing other Power Chip technologies designed to help developers and engineers customize and control their implementations of Power chip technology. “IBM will also contribute multiple other technologies including a softcore implementation of the Power ISA, as well as reference designs for the architecture-agnostic Open Coherent Accelerator Processor Interface (OpenCAPI) and the Open Memory Interface (OMI). The OpenCAPI and OMI technologies help maximize memory bandwidth between processors and attached devices, critical to overcoming performance bottlenecks for emerging workloads like AI,” the company said in a statement.

The softcore implementation of the Power ISA, in particular, should give developers more control and even enable them to build their own instruction sets, Hugh Blemings, executive director of the OpenPower Foundation, explained. “They can now actually try crafting their own instruction sets, and try out new ways of accelerated data processing and so forth at a lower level than previously possible,” he said.

The company is announcing all of this today at The Linux Foundation Open Source Summit and OpenPower Summit in San Diego.

Aug
19
2019
--

The five technical challenges Cerebras overcame in building the first trillion-transistor chip

Superlatives abound at Cerebras, the until-today stealthy next-generation silicon chip company looking to make training a deep learning model as quick as buying toothpaste from Amazon. Launching after almost three years of quiet development, Cerebras introduced its new chip today — and it is a doozy. The “Wafer Scale Engine” is 1.2 trillion transistors (the most ever), 46,225 square millimeters (the largest ever), and includes 18 gigabytes of on-chip memory (the most of any chip on the market today) and 400,000 processing cores (guess the superlative).

Cerebras’ Wafer Scale Engine is larger than a typical Mac keyboard (via Cerebras Systems).

It’s made a big splash here at Stanford University at the Hot Chips conference, one of the silicon industry’s big confabs for product introductions and roadmaps, with various levels of oohs and aahs among attendees. You can read more about the chip from Tiernan Ray at Fortune and read the white paper from Cerebras itself.

Superlatives aside though, the technical challenges that Cerebras had to overcome to reach this milestone are, I think, the more interesting story here. I sat down with founder and CEO Andrew Feldman this afternoon to discuss what his 173 engineers have been building quietly just down the street these past few years, with $112 million in venture capital funding from Benchmark and others.

Going big means nothing but challenges

First, a quick background on how the chips that power your phones and computers get made. Fabs like TSMC take standard-sized silicon wafers and divide them into individual chips by using light to etch the transistors into the chip. Wafers are circles and chips are squares, and so there is some basic geometry involved in subdividing that circle into a clear array of individual chips.
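
As a rough illustration of that geometry, here is the textbook dies-per-wafer approximation (a standard estimate, not any fab’s actual model) applied to a 300 mm wafer:

```python
# A standard dies-per-wafer approximation (illustrative, not any fab's
# exact model): the wafer's area divided by die area, minus edge loss.
import math

def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    radius = wafer_diameter_mm / 2
    whole = math.pi * radius**2 / die_area_mm2          # pure area ratio
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(whole - edge_loss)

# A 300 mm wafer cut into conventional ~100 mm^2 dies:
print(dies_per_wafer(300, 100))   # ~640 chips per wafer
# Cerebras skips the subdividing entirely: one 46,225 mm^2 square per wafer.
```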

One big challenge in this lithography process is that errors can creep into the manufacturing process, requiring extensive testing to verify quality and forcing fabs to throw away poorly performing chips. The smaller and more compact the chip, the less likely any individual chip will be inoperative, and the higher the yield for the fab. Higher yield equals higher profits.

Cerebras throws out the idea of etching a bunch of individual chips onto a single wafer in favor of using the whole wafer itself as one gigantic chip. That allows all of those individual cores to connect with one another directly — vastly speeding up the critical feedback loops used in deep learning algorithms — but comes at the cost of huge manufacturing and design challenges to create and manage these chips.

Cerebras’ technical architecture and design was led by co-founder Sean Lie. Feldman and Lie worked together on a previous startup called SeaMicro, which sold to AMD in 2012 for $334 million (via Cerebras Systems).

The first challenge the team ran into, according to Feldman, was handling communication across the “scribe lines.” While Cerebras’ chip encompasses a full wafer, today’s lithography equipment still has to act like there are individual chips being etched into the silicon wafer. So the company had to invent new techniques to allow each of those individual chips to communicate with each other across the whole wafer. Working with TSMC, they not only invented new channels for communication, but also had to write new software to handle chips with trillion-plus transistors.

The second challenge was yield. With a chip covering an entire silicon wafer, a single imperfection in the etching of that wafer could render the entire chip inoperative. This has been the stumbling block for whole-wafer technology for decades: due to the laws of physics, it is essentially impossible to repeatedly etch a trillion transistors with perfect accuracy.

Cerebras approached the problem using redundancy by adding extra cores throughout the chip that would be used as backup in the event that an error appeared in that core’s neighborhood on the wafer. “You have to hold only 1%, 1.5% of these guys aside,” Feldman explained to me. Leaving extra cores allows the chip to essentially self-heal, routing around the lithography error and making a whole-wafer silicon chip viable.
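
The arithmetic behind that redundancy argument can be sketched with a standard Poisson defect model; the defect density below is an assumption for illustration, not Cerebras’ real number.

```python
# Toy Poisson yield model with an assumed defect density — not Cerebras'
# real figures — showing why ~1.5% spare cores rescues wafer-scale yield.
import math

D = 0.1              # assumed defects per cm^2
die_cm2 = 1.0        # a conventional ~100 mm^2 die
wafer_cm2 = 462.25   # the 46,225 mm^2 Wafer Scale Engine

# Classic yield: probability a region of silicon has zero defects.
print(f"small die yield:        {math.exp(-D * die_cm2):.1%}")    # ~90%
print(f"monolithic wafer yield: {math.exp(-D * wafer_cm2):.2e}")  # ~0

# With 400,000 cores and 1.5% held aside as spares, the wafer survives
# as long as defects knock out no more than ~6,000 cores.
lam = D * wafer_cm2                  # ~46 expected defects per wafer
spares = int(400_000 * 0.015)
term, p_ok = math.exp(-lam), 0.0
for k in range(spares + 1):          # P(defects <= spares) for Poisson(lam)
    p_ok += term
    term *= lam / (k + 1)
print(f"yield with spare cores: {p_ok:.6f}")   # effectively 1.0
```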

Entering uncharted territory in chip design

Those first two challenges — communicating across the scribe lines between chips and handling yield — have flummoxed chip designers studying whole-wafer chips for decades. But they were known problems, and Feldman said that they were actually easier to solve than expected by re-approaching them using modern tools.

He likens the challenge to climbing Mount Everest. “It’s like the first set of guys failed to climb Mount Everest, they said, ‘Shit, that first part is really hard.’ And then the next set came along and said ‘That shit was nothing. That last hundred yards, that’s a problem.’ ”

And indeed, according to Feldman, the toughest challenges for Cerebras were the next three, since no other chip designer had gotten past the scribe line communication and yield challenges to actually discover what happened next.

The third challenge Cerebras confronted was handling thermal expansion. Chips get extremely hot in operation, but different materials expand at different rates. That means the connectors tethering a chip to its motherboard also need to thermally expand at precisely the same rate, lest cracks develop between the two.

As Feldman explained, “How do you get a connector that can withstand [that]? Nobody had ever done that before, [and so] we had to invent a material. So we have PhDs in material science, [and] we had to invent a material that could absorb some of that difference.”

Once a chip is manufactured, it needs to be tested and packaged for shipment to original equipment manufacturers (OEMs) who add the chips into the products used by end customers (whether data centers or consumer laptops). There is a challenge though: Absolutely nothing on the market is designed to handle a whole-wafer chip.

Cerebras designed its own testing and packaging system to handle its chip (via Cerebras Systems).

“How on earth do you package it? Well, the answer is you invent a lot of shit. That is the truth. Nobody had a printed circuit board this size. Nobody had connectors. Nobody had a cold plate. Nobody had tools. Nobody had tools to align them. Nobody had tools to handle them. Nobody had any software to test,” Feldman explained. “And so we have designed this whole manufacturing flow, because nobody has ever done it.” Cerebras’ technology is much more than just the chip it sells — it also includes all of the associated machinery required to actually manufacture and package those chips.

Finally, all that processing power in one chip requires immense power and cooling. Cerebras’ chip uses 15 kilowatts of power to operate — a prodigious amount of power for an individual chip, although relatively comparable to a modern-sized AI cluster. All that power also needs to be cooled, and Cerebras had to design a new way to deliver both for such a large chip.

It essentially approached the problem by turning the chip on its side, in what Feldman called “using the Z-dimension.” The idea was that rather than trying to move power and cooling horizontally across the chip as is traditional, power and cooling are delivered vertically at all points across the chip, ensuring even and consistent access to both.

And so, those were the next three challenges — thermal expansion, packaging and power/cooling — that the company has worked around-the-clock to deliver these past few years.

From theory to reality

Cerebras has a demo chip (I saw one, and yes, it is roughly the size of my head), and it has started to deliver prototypes to customers, according to reports. The big challenge, though, as with all new chips, is scaling production to meet customer demand.

For Cerebras, the situation is a bit unusual. Because it places so much computing power on one wafer, customers don’t necessarily need to buy dozens or hundreds of chips and stitch them together to create a compute cluster. Instead, they may only need a handful of Cerebras chips for their deep-learning needs. The company’s next major phase is to reach scale and ensure a steady delivery of its chips, which it packages as a whole system “appliance” that also includes its proprietary cooling technology.

Expect to hear more details of Cerebras technology in the coming months, particularly as the fight over the future of deep learning processing workflows continues to heat up.

Aug
14
2019
--

Why chipmaker Broadcom is spending big bucks for aging enterprise software companies

Last year Broadcom, a chipmaker, raised eyebrows when it acquired CA Technologies, an enterprise software company with a broad portfolio of products, including a sizable mainframe software tools business. It paid close to $19 billion for the privilege.

Then last week, the company opened up its wallet again and forked over $10.7 billion for Symantec’s enterprise security business. That’s almost $30 billion for two aging enterprise software companies. There has to be some sound strategy behind these purchases, right? Maybe.

Here’s the thing about older software companies. They may not out-innovate the competition anymore, but what they have going for them is a backlog of licensing revenue that appears to have value.

Jul
31
2019
--

Calling all hardware startups! Apply to Hardware Battlefield @ TC Shenzhen

Got hardware? Well then, listen up, because our search continues for boundary-pushing, early-stage hardware startups to join us in Shenzhen, China for an epic opportunity: launch your startup on a global stage and compete in Hardware Battlefield at TC Shenzhen on November 11-12.

Apply here to compete in TC Hardware Battlefield 2019. Why? It’s your chance to demo your product to the top investors and technologists in the world. Hardware Battlefield, cousin to Startup Battlefield, focuses exclusively on innovative hardware because, let’s face it, it’s the backbone of technology. From enterprise solutions to agtech advancements, medical devices to consumer product goods — hardware startups are in the international spotlight.

If you make the cut, you’ll compete against 15 of the world’s most innovative hardware makers for bragging rights, plenty of investor love, media exposure and $25,000 in equity-free cash. Just participating in a Battlefield can change the whole trajectory of your business in the best way possible.

We chose to bring our fifth Hardware Battlefield to Shenzhen because of its outstanding track record of supporting hardware startups. The city achieves this through a combination of accelerators, rapid prototyping and world-class manufacturing. What’s more, TC Hardware Battlefield 2019 takes place as part of the larger TechCrunch Shenzhen that runs November 9-12.

Creativity and innovation know no boundaries, and that’s why we’re opening this competition to any early-stage hardware startup from any country. While we’ve seen amazing hardware in previous Battlefields — like robotic arms, food testing devices, malaria diagnostic tools, smart socks for diabetics and e-motorcycles — we can’t wait to see the next generation of hardware, so bring it on!

Meet the minimum requirements listed below, and we’ll consider your startup:

Here’s how Hardware Battlefield works. TechCrunch editors vet every qualified application and pick 15 startups to compete. Those startups receive six rigorous weeks of free coaching. Forget stage fright. You’ll be prepped and ready to step into the spotlight.

Teams have six minutes to pitch and demo their products, which is immediately followed by an in-depth Q&A with the judges. If you make it to the final round, you’ll repeat the process in front of a new set of judges.

The judges will name one outstanding startup the Hardware Battlefield champion. Hoist the Battlefield Cup, claim those bragging rights and the $25,000. This nerve-wracking thrill-ride takes place in front of a live audience, and we capture the entire event on video and post it to our global audience on TechCrunch.

Hardware Battlefield at TC Shenzhen takes place on November 11-12. Don’t hide your hardware or miss your chance to show us — and the entire tech world — your startup magic. Apply to compete in TC Hardware Battlefield 2019, and join us in Shenzhen!

Is your company interested in sponsoring or exhibiting at Hardware Battlefield at TC Shenzhen? Contact our sponsorship sales team by filling out this form.
