Nvidia has developed a test chip for a machine learning inference engine built on a 5nm process technology.
The chip has been designed to run transformer models on data quantized to 4 bits. It can process 1024 4-bit MACs per cycle (512 8-bit) and measures 0.153 mm². It operates from a supply voltage of 0.46V to 1.05V at frequencies from 152MHz to 1760MHz, delivering an energy efficiency of up to 95 TOPS/W.
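The quoted figures imply a peak raw throughput that can be sanity-checked with a back-of-the-envelope calculation, assuming the common convention of counting each MAC as two operations (one multiply plus one accumulate):

```python
# Peak 4-bit throughput from the figures quoted above.
# Assumption: each MAC counts as 2 ops (multiply + accumulate).
MACS_PER_CYCLE_4BIT = 1024
MAX_FREQ_HZ = 1760e6  # 1760 MHz

peak_tops = MACS_PER_CYCLE_4BIT * 2 * MAX_FREQ_HZ / 1e12
print(f"Peak 4-bit throughput: {peak_tops:.2f} TOPS")  # about 3.6 TOPS
```

The 95 TOPS/W figure is an efficiency peak, which would be reached at the low end of the voltage and frequency range rather than at maximum throughput.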
While Nvidia supplies its GPUs for training AI frameworks in the data centre, other chip designers are developing low-power devices for inference of the trained frameworks at the edge of the network.
- Perceive launches edge AI inference processor
- IBM shows first dedicated AI inference chip
- Edge AI chip market to overtake cloud by 2025
Nvidia’s custom architecture aims to efficiently execute the different computations in transformers and to optimise the dataflow to improve data reuse and energy efficiency. The architecture uses a combination of hardware and software techniques to tolerate the quantization error introduced when converting down to 4-bit data widths, which enable low-cost multiply-accumulate (MAC) operations. It also has specialised hardware to improve the efficiency of functions such as Softmax that are particular to transformers.
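To illustrate the general technique the article refers to, the sketch below shows minimal symmetric per-tensor 4-bit quantization in NumPy. This is an assumption-laden illustration of 4-bit quantization in general, not Nvidia's actual scheme:

```python
import numpy as np

# Minimal sketch of symmetric per-tensor 4-bit quantization
# (illustrative only; not Nvidia's actual quantization scheme).
def quantize_int4(x):
    """Map float values to signed 4-bit integers in [-8, 7] with one scale."""
    scale = np.max(np.abs(x)) / 7.0          # largest magnitude maps to 7
    q = np.clip(np.round(x / scale), -8, 7)  # signed 4-bit range
    return q.astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_int4(x)
error = np.abs(dequantize(q, s) - x).mean()
print(f"mean absolute quantization error: {error:.4f}")
```

With only 16 representable levels, the per-element error is bounded by half a quantization step, which is why tolerating this error in hardware and software is central to making 4-bit MACs usable.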
The chip was shown at the recent 2022 VLSI Symposium.
