How the data centre moved to custom chips

Feature articles | July 28, 2022

By Nick Flaherty


Data centre providers have been developing their own silicon on 7nm and 5nm processes and driving the use of cloud-based chip design to boost productivity.

Chinese giant ByteDance, the owner of social media site TikTok, is reported this week to be designing its own chips. It has started at least four chip projects, including AI chips that have already taped out, video processing chips and a design based on the RISC-V open instruction set. It is also recruiting chip designers with experience in 7nm process technology, a sign that the activity is ramping up.

This follows Amazon and T-Head, the chip subsidiary of Chinese data centre giant Alibaba, which have both developed AI chips and ARM-based CPUs for use in their data centres.

Nafea Bshara, vice president and distinguished engineer at Amazon, is speaking at the company’s silicon innovation day next week about Annapurna Labs and how the Israeli startup, acquired in 2015, has been key to the development of the custom chips.

Annapurna has developed five generations of the Nitro chip, which has changed the way hypervisor software is used, three generations of the ARM-based Graviton processors that support data-intensive workloads, and the Trainium and Inferentia chips optimized for machine learning training and inference.

“We started Annapurna Labs because we looked at the way the chip industry was investing in infrastructure and data centres; it was minuscule at that time because everybody was going after the gold rush of mobile phones, smartphones, and tablets,” said Bshara.

“We believed the industry was over indexing on investment for mobile, and under investing in the data centre. The data centre market was underserved. That, combined with the fact that there was increasing disappointment with the ineffective and non-productive method of developing chips, especially when compared with software development.”

Productivity in chip design has been a key issue for the industry, one he wanted to address by making more use of IP, re-usable building blocks and the cloud.

“The productivity of software developers had improved significantly in the past 25 years, while the productivity of chip developers hadn’t improved much since the ‘90s. In assessing the opportunity, we saw a data-centre market that was being underserved, and an opportunity to redefine chip development with greater productivity, and with a better business model. Those factors contributed to us starting Annapurna Labs.

“Much in the same way that AWS defined the cloud operating model last decade, we believe it is once again leading in future systems. The secret sauce underpinning these innovations is specialized designs…We believe these moves position AWS to accommodate a diversity of workloads that span cloud, data centre as well as the near and far edge,” he said.

“While Annapurna has been a pioneer in advancing productivity and time to market, many others are following in our footsteps and transitioning to a building-blocks-centric development mindset, similar to how the software industry moved toward object-oriented, and service-oriented software design.

“Chip companies have now transitioned to what we refer to as an intellectual property-oriented, or IP-oriented, correct-by-design approach. Secondly, the chip industry has adopted the cloud. Cloud adoption has led to an explosion of compute power for building chips. Using the cloud, we are able to use compute in a ‘bursty’ way and in parallel. We and our chip-industry colleagues couldn’t deliver the silicon we do today without the cloud. This has led to the creation of a healthy market where chip companies have realized they don’t need to build everything in house, in much the same way software companies have realized they can buy libraries from open source or other library providers. The industry has matured to the point where now there is a healthy business model around buying building blocks, or IPs, from providers like ARM, Synopsys, Alphawave and Cadence.”

Nitro was the first step, re-designing the virtualization infrastructure. Traditionally, hypervisors protect the physical hardware and BIOS, virtualize the CPU, storage and networking, and provide a rich set of management capabilities. With the Nitro System, AWS broke apart those functions and offloaded them to dedicated hardware and software to reduce costs for the EC2 instances.

“We built and delivered multiple generations of Nitro, and we had confidence in our ability to execute on building the chips and manufacturing them at high volume and high quality. Designing for machine learning is one of the most challenging, but also the most rewarding, tasks,” said Bshara.

“Science and machine learning are moving very fast. As an organization that is building hardware, our job is to predict what customers will need three, four, five years down the road because the development cycle for a chip can be two years, and then it gets deployed for three years. The lifecycle is around five years and trying to predict how the needs of scientists and the machine-learning community will evolve over that time span is difficult. Unlike CPU workloads, which aren’t evolving very quickly, machine learning workloads are, and it’s a bit of an art to keep apace,” he added.

“It’s the art of predicting what customers will need three years from now, while still executing on time and budget. These things only come with the experience to strike the right balance between cost, schedule, and future-proofing the product.”

Using the Inferentia chip and Inf1, the EC2 instances that host these chips, Alexa is able to run much more advanced machine learning algorithms at lower cost and with lower latency than on a standard general-purpose chip. “It’s not that the general-purpose chip couldn’t do the job, it’s that it would do so at higher costs and higher latency,” he said. “With Inferentia we deliver lower latency and support much more sophisticated algorithms.”

“What we can do at the chip level, at the EC2 level, is actually work on three vectors, which we’re doing right now. The first is drive to lower power quickly by using more advanced silicon processes. Every time we build a chip in an advanced silicon process we’re utilizing new semiconductor processes with smaller transistors that require less power for the same work. Because of our focus on efficient execution, we can deliver to EC2 customers a new chip based on a more modern, power-efficient silicon process every 18 months or so.

“The second vector is building more technologies, trying to accelerate in hardware and in algorithms, to get training and inference done faster. The faster we can handle training and inference, the less power is consumed. For example, one of the technologies we innovated in the last Trainium chip was something called stochastic rounding which, depending upon which measure you’re looking at for some neural workloads, could accelerate neural network training by up to 30%. When you say 30% less time that translates into 30% less power.
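The article does not describe how Trainium implements stochastic rounding, so the following Python sketch shows only the general technique under assumed values. The grid spacing, update size and step count are hypothetical stand-ins for a low-precision number format and a stream of small weight updates. It compares round-to-nearest with stochastic rounding when many small values are accumulated.

```python
import math
import random

STEP = 1 / 256       # assumed grid spacing, standing in for a low-precision format
UPDATE = STEP / 8    # assumed update, much smaller than the grid spacing
N_UPDATES = 8000     # assumed number of accumulation steps


def round_nearest(x, step):
    """Round x to the nearest multiple of step (deterministic)."""
    return round(x / step) * step


def round_stochastic(x, step, rng):
    """Round x to one of its two neighbouring multiples of step.

    The upper neighbour is chosen with probability equal to how far x sits
    above the lower one, so the result equals x on average.
    """
    scaled = x / step
    lower = math.floor(scaled)
    round_up = rng.random() < (scaled - lower)
    return (lower + round_up) * step


def accumulate(rounder):
    """Add UPDATE to a running total N_UPDATES times, rounding after each add."""
    total = 0.0
    for _ in range(N_UPDATES):
        total = rounder(total + UPDATE)
    return total


rng = random.Random(0)  # fixed seed so the run is repeatable
exact_total = UPDATE * N_UPDATES
nearest_total = accumulate(lambda x: round_nearest(x, STEP))
stochastic_total = accumulate(lambda x: round_stochastic(x, STEP, rng))

print(f"exact:      {exact_total:.4f}")
print(f"nearest:    {nearest_total:.4f}")
print(f"stochastic: {stochastic_total:.4f}")
```

With round-to-nearest, every update is smaller than half the grid spacing, so each one is rounded away and the total never leaves zero. Stochastic rounding keeps the total close to the exact sum on average, which is why it can help low-precision training converge.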


“Another thing we’re doing at the algorithmic level is offering different data types. For example, historically machine learning used 32-bit floating point. Now we’re offering multiple versions of 16-bit and a few versions of 8-bit. When these different data types are used, they not only accelerate machine learning training, they significantly reduce the power for the same amount of workload. For example, doing matrix multiplication with 16-bit floating point takes less than one-third of the total power it would with 32-bit floating point. The ability to add things like stochastic rounding or new data types at the algorithmic level provides a step-function improvement in power consumption for the same amount of workload.
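The article does not name the 16-bit formats on offer, so the sketch below takes IEEE 754 half precision (fp16) and bfloat16 as assumed examples. It illustrates only the storage and precision trade-off against 32-bit floating point, using Python's standard struct module, and says nothing about power. The bfloat16 conversion is emulated by truncation, a simplification of what hardware may do.

```python
import struct


def as_fp32(x):
    """Round-trip x through IEEE 754 single precision (4 bytes)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def as_fp16(x):
    """Round-trip x through IEEE 754 half precision (2 bytes)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def as_bf16(x):
    """Emulate bfloat16 by keeping only the top 16 bits of the fp32 encoding.

    Truncation is used here for simplicity; real hardware may round instead.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


value = 0.1
formats = [("fp32", as_fp32, 4), ("fp16", as_fp16, 2), ("bf16", as_bf16, 2)]
errors = {name: abs(convert(value) - value) for name, convert, _ in formats}

for name, convert, n_bytes in formats:
    print(f"{name}: {n_bytes} bytes, stored as {convert(value)!r}, error {errors[name]:.2e}")
```

Both 16-bit formats halve the storage per value. fp16 keeps more mantissa bits and so represents 0.1 more closely, while bfloat16 keeps the 8-bit exponent of fp32 and therefore its numeric range.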

“The third vector is offering more choice. There are different chips optimized for different workloads, and the best way for customers to save energy is to follow the classic Amazon mantra – the everything store. We offer all different types of chips, including multiple generations of Nvidia GPUs, Intel Habana, and Trainium, and share with the customer the power profile and performance of each of the instances hosting these chips, so the customer can choose the right chip for the right workload, and optimize for the lowest possible power consumption at the lowest cost.”

The second generation Graviton chips on average consume 60% less power than same-generation competitive chips. The third generation Graviton3 processors became available in May, so the performance is still being assessed.

“We have plans for many more generations,” he said. “We have a product roadmap, and a technology and investment strategy that extends to 2032. As much uncertainty as there is in the future, there are a few things we’re highly convicted in, and we’re investing in them, even though they may be ten years out.”

pages.awscloud.com/GLBL-Silicon-Innovation-Day-2022-reg-event.html
