Measuring 46,225 mm² and optimized for AI work, the Cerebras Wafer Scale Engine (WSE) is 56.7 times larger than the largest graphics processing unit, which measures 815 mm² and contains 21.1 billion transistors. The WSE, says the company, also contains 3,000 times more high-speed, on-chip memory and has 10,000 times more memory bandwidth.
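The size ratio the company cites follows directly from the two die areas quoted above; a quick sanity check:

```python
# Die areas quoted in the article, in mm^2
wse_area = 46_225   # Cerebras Wafer Scale Engine
gpu_area = 815      # largest contemporary GPU die

ratio = wse_area / gpu_area
print(f"{ratio:.1f}x")  # ~56.7x, matching the article's figure
```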
“Designed from the ground up for AI work, the Cerebras WSE contains fundamental innovations that advance the state-of-the-art by solving decades-old technical challenges that limited chip size – such as cross-reticle connectivity, yield, power delivery, and packaging,” says Andrew Feldman, founder and CEO of Cerebras Systems. “Every architectural decision was made to optimize performance for AI work. The result is that the Cerebras WSE delivers, depending on workload, hundreds or thousands of times the performance of existing solutions at a tiny fraction of the power draw and space.”
In AI, says the company, chip size is profoundly important, with bigger chips being able to process information more quickly, producing answers in less time. Reducing the time-to-insight, or “training time” – a major bottleneck to industry-wide progress – allows researchers to test more ideas, use more data, and solve new problems.
The chip’s performance gains are accomplished by accelerating all the elements of neural network training. A neural network is a multistage computational feedback loop. The faster inputs move through the loop, the faster the loop learns or “trains.” The way to move inputs through the loop faster is to accelerate the calculation and communication within the loop.
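The feedback loop described above can be sketched as a toy gradient-descent example; the linear model, synthetic data, and learning rate here are illustrative assumptions, not details from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))         # toy input batch
true_w = np.array([1.5, -2.0, 0.5])  # "ground truth" the loop should learn
y = X @ true_w                       # targets

w = np.zeros(3)                      # model parameters, initially untrained
lr = 0.1                             # learning rate (illustrative choice)

for step in range(100):
    pred = X @ w                     # forward pass: calculation
    grad = X.T @ (pred - y) / len(y) # backward pass: feedback
    w -= lr * grad                   # parameter update closes the loop

loss = float(np.mean((X @ w - y) ** 2))
```

Each iteration is one trip around the loop; accelerating the calculation and the communication of gradients means more such trips per second, i.e. faster training.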
With 56.7 times more silicon area than the largest graphics processing unit, the WSE provides more cores to do calculations and more memory close to those cores so they can operate efficiently. Because this vast array of cores and memory sits on a single chip, says the company, all communication is kept on-silicon, providing breakthrough bandwidth so that groups of cores can collaborate with maximum efficiency and memory bandwidth is no longer a bottleneck.
The Cerebras WSE houses 400,000 AI-optimized, no-cache, no-overhead compute cores and 18 gigabytes of local, distributed, superfast SRAM as the one and only level of the memory hierarchy. Memory bandwidth is 9 petabytes per second. The cores are linked together with a fine-grained, all-hardware, on-chip, mesh-connected communication network that delivers an aggregate bandwidth of 100 petabits per second. More cores, more local memory, and a low-latency, high-bandwidth fabric together create the optimal architecture for accelerating AI work, says the company.
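The quoted totals also imply how much local SRAM each core gets; simple arithmetic on the article's figures (assuming decimal gigabytes):

```python
cores = 400_000
sram_gb = 18                              # total on-chip SRAM quoted
per_core_bytes = sram_gb * 1e9 / cores    # assuming 1 GB = 10^9 bytes
print(per_core_bytes / 1e3)               # ≈ 45 KB of local SRAM per core
```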
The WSE is manufactured by TSMC on its advanced 16-nm process technology.