AI inference engine on 5nm hits 95TOPS/W

Technology News
By Nick Flaherty

Nvidia has developed a test chip for an inference engine for machine learning built on a 5nm process technology.

The chip has been designed for transformer models using data quantized to 4 bits. It can process 1024 4-bit MACs/cycle (or 512 8-bit MACs/cycle) and measures 0.153mm². It runs from a supply voltage of 0.46V to 1.05V at a clock frequency of 152MHz to 1760MHz, and achieves an energy efficiency of up to 95TOPS/W.
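As a rough back-of-the-envelope check, the quoted MAC rate and clock range imply a peak throughput that can be sketched as below. The helper name `peak_tops` is hypothetical, and the factor of two operations per MAC (one multiply plus one add) is an assumed counting convention; the article does not say how the operations were counted, nor at which voltage and frequency the 95TOPS/W figure was measured.

```python
def peak_tops(macs_per_cycle: int, freq_hz: float, ops_per_mac: int = 2) -> float:
    """Peak throughput in tera-operations per second (TOPS).

    ops_per_mac=2 is an assumption: one multiply plus one accumulate.
    """
    return macs_per_cycle * ops_per_mac * freq_hz / 1e12


# Figures quoted in the article: MACs per cycle and the clock range.
int4_max = peak_tops(1024, 1760e6)  # 4-bit mode at the top clock
int4_min = peak_tops(1024, 152e6)   # 4-bit mode at the bottom clock
int8_max = peak_tops(512, 1760e6)   # 8-bit mode at the top clock

print(f"4-bit peak at 1760MHz: {int4_max:.2f} TOPS")  # 3.60 TOPS
print(f"4-bit peak at 152MHz:  {int4_min:.2f} TOPS")  # 0.31 TOPS
print(f"8-bit peak at 1760MHz: {int8_max:.2f} TOPS")  # 1.80 TOPS
```

These are theoretical peaks for full MAC utilisation, not measured results.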

While Nvidia supplies its GPUs for training AI frameworks in the data centre, other chip designers are developing low power devices for inference of the trained frameworks at the edge of the network.

Nvidia’s custom architecture aims to efficiently execute the different computations in transformers and optimise the dataflow to improve data reuse and energy efficiency. The architecture uses a combination of hardware and software techniques to tolerate the quantization error introduced when converting down to 4-bit data widths, which in turn enable low-cost multiply-accumulate (MAC) operations. It also has specialised hardware to improve the efficiency of functions such as Softmax that are heavily used in transformers.
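To illustrate why 4-bit data makes MACs cheap and where quantization error comes from, here is a minimal sketch of generic symmetric per-tensor quantization. It is not Nvidia's scheme, which the article does not detail; the function names `quantize_int4` and `int4_dot` and the sample values are hypothetical.

```python
def quantize_int4(values):
    """Symmetric per-tensor quantization to integers in [-7, 7].

    That range fits in a signed 4-bit value; the scale maps the
    largest magnitude in the tensor onto 7.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-7, min(7, round(v / scale))) for v in values]
    return q, scale


def int4_dot(qa, scale_a, qb, scale_b):
    """Dot product using integer MACs, rescaled once at the end."""
    acc = sum(x * y for x, y in zip(qa, qb))  # small-integer multiply-accumulate
    return acc * scale_a * scale_b


# Illustrative values only.
weights = [0.7, -0.32, 0.12, -0.66]
activations = [1.4, 0.25, -0.55, 0.8]

qw, sw = quantize_int4(weights)
qa, sa = quantize_int4(activations)

exact = sum(w * a for w, a in zip(weights, activations))
approx = int4_dot(qw, sw, qa, sa)

print("quantized weights:    ", qw)
print("quantized activations:", qa)
print(f"exact={exact:.4f} approx={approx:.4f} error={abs(exact - approx):.4f}")
```

The gap between `exact` and `approx` is the quantization error that, according to the article, the chip's hardware and software techniques are designed to tolerate.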

The chip was shown at the recent 2022 VLSI Symposium.
