SemiDynamics in Spain has developed a RISC-V Tensor Unit for AI chip design based on its fully customisable 64-bit cores.
The RISC-V Tensor Unit is integrated into the cache sub-system, which SemiDynamics says makes it the first fully coherent unit of its kind for high performance AI chip design in the data centre.
Large language models (LLMs) such as LLaMa-2 or ChatGPT use billions of parameters and require a large amount of compute. The bulk of the computation in the LLM layers can be implemented efficiently as matrix multiplication in the Tensor Unit hardware.
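To illustrate why matrix multiplication accounts for so much of the work, a fully connected layer can be written as a single matrix product plus a bias. The sketch below is plain Python for illustration only and is not SemiDynamics code; the function names `matmul` and `fully_connected` and the sample values are hypothetical.

```python
def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, both given as lists of rows."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    n = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def fully_connected(x, weights, bias):
    """Fully connected layer: y = x @ W + b for a batch of input rows x."""
    y = matmul(x, weights)
    return [[v + bias[j] for j, v in enumerate(row)] for row in y]


# Hypothetical sample data: a batch of 2 inputs, 3 features in, 2 features out.
x = [[1.0, 2.0, 3.0],
     [0.0, -1.0, 1.0]]
w = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
b = [0.5, -0.5]

out = fully_connected(x, w, b)
print(out)
```

The triple loop hidden inside `matmul` is the part that dedicated matrix hardware would replace; everything else in the layer is cheap by comparison.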
The Tensor Unit is built on top of the SemiDynamics RVV1.0 Vector Processing Unit and uses the existing vector registers to store matrices. This allows the Tensor Unit to handle layers that need matrix multiplication, such as Fully Connected and Convolution, while the Vector Unit handles the activation function layers (ReLU, Sigmoid, Softmax, etc). This is a big improvement over stand-alone NPUs, which can struggle with activation layers.
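The division of labour described here can be sketched in software as a matrix stage followed by an element-wise stage. This is a plain-Python illustration under stated assumptions, not the actual hardware interface: `tensor_stage`, `relu`, `softmax` and the sample data are hypothetical names chosen for this example.

```python
import math


def tensor_stage(x, w):
    """Matrix multiply: the kind of work the article assigns to the Tensor Unit."""
    return [[sum(xi * wi for xi, wi in zip(row, col)) for col in zip(*w)] for row in x]


def relu(row):
    """Element-wise ReLU: the kind of work the article assigns to the Vector Unit."""
    return [max(0.0, v) for v in row]


def softmax(row):
    """Numerically stable softmax over one row (subtracts the row maximum first)."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sample data: one input row with 2 features, 3 output features.
x = [[1.0, -2.0]]
w = [[1.0, 0.0, -1.0],
     [0.0, 1.0, 1.0]]

hidden = [relu(r) for r in tensor_stage(x, w)]   # matrix stage, then activation
probs = [softmax(r) for r in hidden]             # output as a probability distribution
print(hidden, probs)
```

Activations such as softmax involve exponentials, reductions and divisions rather than multiply-accumulate operations, which is why they map more naturally onto a general vector unit than onto a fixed matrix engine.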
The Tensor Unit uses both the Vector Unit and the Atrevido-423 Gazzillion CPU to fetch the data it needs from memory. The performance of the 64-bit CPU core means a direct memory access (DMA) engine is not needed to manage the data flow. Because the Tensor Unit stores its data in the vector registers and adds no new architecturally visible state, it can work with any RISC-V vector-enabled Linux without any changes.
“This new Tensor Unit is designed to fully integrate with our other innovative technologies to provide solutions with outstanding AI performance,” said Roger Espasa, founder and CEO of SemiDynamics.
“First, at the heart, is our 64-bit fully customisable RISC-V core. Then our Vector Unit which is constantly fed data by our Gazzillion technology so that there are no data misses. And then the Tensor Unit that does the matrix multiplications required by AI. Every stage of this solution has been designed to be fully integrated with the others for optimal AI performance and very easy programming. The result is a performance increase of 128x compared to just running the AI software on the scalar core.”
The Tensor Unit will be discussed at the RISC-V Summit in the US next month as part of a focus on AI chip design.