
Power integrity for waferscale AI with 400,000 cores

Technology News
By Nick Flaherty


Cerebras Systems has developed a waferscale accelerator for AI applications using a novel power integrity monitoring infrastructure.

“Our mission is to transform the landscape of compute by accelerating performance for AI by orders of magnitude over today,” said Dhiraj Mallick, VP of engineering at Cerebras Systems. “Both the massive size and shape of the problem make it difficult for today’s infrastructure and preclude the most interesting problems from being tackled. Compute requirements for AI have increased 300,000-fold in the last eight years, a doubling every 3.4 months compared to 24 months for Moore’s Law. So we need new compute solutions,” he said.

The Cerebras approach is to use a whole wafer for the AI processing. The system measures 46,000 sq mm and carries 400,000 cores, each purpose-built for deep learning, alongside 18Gbytes of on-chip SRAM. The cores are connected directly in the silicon by a 2D mesh providing a 100Pbit/s fabric.
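
As a rough back-of-the-envelope check of those figures, the sketch below (plain C, assuming the aggregate fabric bandwidth divides evenly across the cores) works out to roughly 250 Gbit/s of mesh bandwidth per core:

#include <stdio.h>

int main(void)
{
    const double fabric_bps = 100e15;    /* 100 Pbit/s aggregate 2D mesh fabric */
    const double cores      = 400000.0;  /* purpose-built deep-learning cores */

    /* average share of fabric bandwidth per core, roughly 250 Gbit/s */
    printf("per-core bandwidth: %.0f Gbit/s\n", fabric_bps / cores / 1e9);
    return 0;
}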

The power system is vital for such a massive system, and Cerebras used nearly 1000 instances of a power macro from Analog Bits.

“One of the challenges to overcome is power integrity,” said Mallick. “This includes the ability to monitor power events and take corrective actions at very high speeds. We have hundreds of thousands of cores where dynamic current surges can cause catastrophic failures. So we distributed 840 analog glitch detectors across the waferscale chip to provide real time health data.”

The detectors can pick up anomalies with significantly higher bandwidth than other approaches, catching short-duration events, thanks to a sensitivity of 5pV for monitoring the power supply in real time. “This provides a wealth of data to optimise instantaneous current spikes,” he said.
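
To illustrate how such distributed detectors could feed corrective action, here is a minimal firmware-style sketch in C; the register layout, the detector count handling and the throttle_region() hook are hypothetical and do not represent the Cerebras or Analog Bits interface:

/*
 * Hypothetical health-monitoring loop for distributed power-supply
 * glitch detectors. All names and register fields are illustrative.
 */
#include <stdint.h>

#define NUM_DETECTORS 840   /* glitch detectors spread across the wafer */

/* Assumed memory-mapped view of one detector's latched status bit. */
typedef struct {
    volatile uint32_t status;   /* bit 0: latched glitch event */
    volatile uint32_t clear;    /* write 1 to re-arm the latch */
} glitch_detector_t;

extern glitch_detector_t *detectors[NUM_DETECTORS]; /* filled in at bring-up */
extern void throttle_region(int detector_id);       /* hypothetical corrective action */

/* Poll every detector; on a latched event, throttle the affected region
 * and re-arm the detector so the next spike can be caught. */
void poll_power_health(void)
{
    for (int i = 0; i < NUM_DETECTORS; i++) {
        if (detectors[i]->status & 0x1) {
            throttle_region(i);
            detectors[i]->clear = 0x1;
        }
    }
}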

“Our power supply glitch detector has an integrated voltage reference and the macro is easy to integrate with no additional components or special power requirements,” said Mahesh Tirupattur, executive vice president at Analog Bits.

The asynchronous macro is cascadable, with up to five macros running off the same power supply, each independently programmable for glitch levels between 0.675V and 0.935V with a trigger accuracy of ±10mV and 5pV sensitivity. There is no clock, but the outputs are latched to make the macro easier to integrate into the system.
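
A small helper like the one below shows how that stated programming range might map onto a register code; the 10mV step size and the encoding are assumptions for illustration only, not the published Analog Bits interface:

#include <stdint.h>

#define VTRIG_MIN_MV  675   /* lowest programmable glitch level, 0.675V */
#define VTRIG_MAX_MV  935   /* highest programmable glitch level, 0.935V */
#define VTRIG_STEP_MV 10    /* assumed programming step, matching the ±10mV trigger */

/* Convert a requested trigger level in millivolts to an assumed
 * register code; returns -1 if the level is out of range. */
int glitch_trigger_code(unsigned int level_mv)
{
    if (level_mv < VTRIG_MIN_MV || level_mv > VTRIG_MAX_MV)
        return -1;
    return (int)((level_mv - VTRIG_MIN_MV) / VTRIG_STEP_MV);
}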

“The silicon is proven in the 7nm 7FF process and is being ported to N5, which will be available in Q3 2020,” said Tirupattur. “This macro will be available in N3 in the first quarter of 2021.”

“We are also working on a system power supply in N5,” he added. “The difference is that it has a programmable droop detection as well to program the system power. This will be available in Q3 2020 and we have an N7 test chip that is undergoing characterisation.”

Cerebras is planning to build its second generation system on TSMC’s N7 7nm process. This is set to have 850,000 cores with 2.6tn transistors.

The current waferscale system is packaged as the CS-1 and is fully programmable. It installs in a standard rack in a data centre and has the performance of a large cluster of GPUs with the programming benefits of a single node.

The engine is being integrated with the US National Nuclear Security Administration’s (NNSA) 23-petaflop Lassen supercomputer at the Lawrence Livermore National Laboratory (LLNL).

www.cerebras.net; www.analogbits.com
