SemiDynamics details its all-in-one RISC-V NPU

Technology News |
By Nick Flaherty


SemiDynamics in Spain has developed a fully programmable Neural Processing Unit (NPU) IP that combines CPU, vector, and tensor processing to deliver up to 256 TOPS for large language models and AI recommendation systems.

The Cervell NPU is based on the RISC-V open instruction set architecture and scales from 8 to 64 cores. This allows designers to tune performance to the requirements of the application, from 8 TOPS INT8 at 1GHz for compact edge deployments to 256 TOPS INT4 for high-end AI inference in datacentre chips.

It follows the launch of the all-in-one architecture back in December, detailed in the company's white paper.

“Cervell is designed for a new era of AI compute — where off-the-shelf solutions aren’t enough. As an NPU, it delivers the scalable performance needed for everything from edge inference to large language models. But what really sets it apart is how it’s built: fully programmable, with no lock-in thanks to the open RISC-V ISA, and deeply customizable down to the instruction level. Combined with our Gazillion Misses memory subsystem, Cervell removes traditional data bottlenecks and gives chip designers a powerful foundation to build differentiated, high-performance AI solutions,” says Roger Espasa, CEO of Semidynamics.

Cervell NPUs are purpose-built to accelerate matrix-heavy operations, enabling higher throughput, lower power consumption, and real-time response. By integrating NPU capabilities with standard CPU and vector processing in a unified architecture, designers can eliminate latency and maximize performance across diverse AI tasks, from recommendation systems to deep learning pipelines.

The Cervell cores are tightly integrated with the Gazillion Misses memory management subsystem, which supports up to 128 simultaneous memory requests and sustains over 60 bytes/cycle of data streaming to eliminate latency stalls. It also provides massively parallel access to off-chip memory, essential for large model inference and sparse data processing.
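As a back-of-envelope check (an illustrative calculation based on the figures quoted above, not a vendor specification), 60 bytes/cycle of sustained streaming at the clock rates mentioned in this article works out to:

```python
# Illustrative sustained-bandwidth arithmetic from the quoted figures:
# over 60 bytes/cycle of data streaming at a given clock frequency.

def sustained_bandwidth_gb_s(bytes_per_cycle: int, freq_ghz: float) -> float:
    """Sustained streaming bandwidth in GB/s (1 GB = 1e9 bytes).

    bytes/cycle * (freq_ghz * 1e9 cycles/s) / 1e9 bytes/GB
    simplifies to bytes_per_cycle * freq_ghz.
    """
    return bytes_per_cycle * freq_ghz

print(sustained_bandwidth_gb_s(60, 1.0))  # 60.0 GB/s at 1 GHz
print(sustained_bandwidth_gb_s(60, 2.0))  # 120.0 GB/s at 2 GHz
```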

This maintains full pipeline saturation, even in bandwidth-heavy applications like recommendation systems and deep learning.

The core is fully customizable: designers can add scalar or vector instructions, configure scratchpad memories and custom I/O FIFOs, and define memory interfaces and synchronization schemes, providing differentiated, future-proofed AI hardware.

This deep customization at the RTL level, including the insertion of customer-defined instructions, allows companies to integrate their unique IP directly into the solution, protecting their ASIC investment from imitation and ensuring the design is fully optimized for power, performance, and area. The development model includes early FPGA drops and parallel verification to reduce development time and risk.

Configuration | INT8 @ 1GHz | INT4 @ 1GHz | INT8 @ 2GHz | INT4 @ 2GHz
C8            | 8 TOPS      | 16 TOPS     | 16 TOPS     | 32 TOPS
C16           | 16 TOPS     | 32 TOPS     | 32 TOPS     | 64 TOPS
C32           | 32 TOPS     | 64 TOPS     | 64 TOPS     | 128 TOPS
C64           | 64 TOPS     | 128 TOPS    | 128 TOPS    | 256 TOPS
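The figures above scale linearly with core count and clock frequency, with INT4 packing twice the operations per cycle of INT8. A sketch of that relationship (an inference from the published table, using the C8/INT8/1GHz entry as the baseline, not a vendor formula):

```python
# Peak-TOPS scaling implied by the configuration table: linear in core
# count and frequency, with INT4 doubling throughput relative to INT8.
BASE_TOPS = 8    # C8 configuration, INT8, 1 GHz
BASE_CORES = 8

def peak_tops(cores: int, freq_ghz: float, int4: bool = False) -> int:
    tops = BASE_TOPS * (cores / BASE_CORES) * freq_ghz
    return int(tops * 2) if int4 else int(tops)

print(peak_tops(8, 1.0))              # 8   (C8,  INT8 @ 1GHz)
print(peak_tops(32, 2.0))             # 64  (C32, INT8 @ 2GHz)
print(peak_tops(64, 2.0, int4=True))  # 256 (C64, INT4 @ 2GHz)
```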

www.semidynamics.com

 
