GPNPU processor IP blends NPU, DSP into new category of hybrid SoC

By Rich Pell

General-purpose neural processor IP licensor Quadric has introduced what it says is the first family of general-purpose neural processors (GPNPUs). Chimera is a semiconductor intellectual property (IP) offering that blends the machine learning (ML) performance characteristics of a neural processing accelerator with the full C++ programmability of a modern digital signal processor (DSP).

Chimera GPNPUs provide one unified architecture for ML inference plus pre- and post-processing, says the company, greatly simplifying both system-on-chip (SoC) hardware design by the semiconductor developer today and software programming by application developers months and years later.

“Machine learning is infiltrating nearly all applications everywhere DSPs are traditionally used today for vision, audio, sound, communications, sensors, and so much more,” says Veerbhan Kheterpal, co-founder and CEO of Quadric. “Existing silicon solutions to the ML inference challenge have added accelerators as helper offload cores to existing DSPs or CPUs. The limitation of that approach is the clumsy way the programmer has to partition her code across the different cores in the system and then tune the interaction between those cores to get desired performance goals. The new Chimera GPNPU family creates a unified, single-core architecture for both ML inference and related conventional C++ processing of images, video, radar or other signals, eliminating multicore challenges.”

A significant advantage claimed for this new architecture is that neural network graphs and C++ code are merged into a single software code stream. Only one tool chain is required for scalar, vector, and matrix computations. A single unified compilation stack optimizes memory bandwidth, which the company says helps reduce power consumption significantly.

The QB series of the Chimera family of GPNPUs includes three cores:

  • Chimera QB1 – 1 trillion operations per second (TOPS) machine learning, 64 giga-operations per second (GOPS) DSP capability
  • Chimera QB4 – 4 TOPS machine learning, 256 GOPS DSP
  • Chimera QB16 – 16 TOPS machine learning, 1 TOPS DSP
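
To compare the three cores, the quoted figures can be put on a common scale. The following is a minimal sketch, not anything published by Quadric: the `CoreSpec` type and `ml_to_dsp_ratio` function are hypothetical names, and the conversion of 1 TOPS to 1000 GOPS (decimal prefixes) is an assumption.

```cpp
#include <array>

// Throughput figures as quoted in the list above, expressed in GOPS.
// Assumption: 1 TOPS = 1000 GOPS (decimal prefix).
struct CoreSpec {
    const char* name;
    double ml_gops;   // machine learning throughput
    double dsp_gops;  // DSP throughput
};

// Hypothetical table holding the three QB-series entries.
constexpr std::array<CoreSpec, 3> kQbSeries{{
    {"Chimera QB1", 1000.0, 64.0},
    {"Chimera QB4", 4000.0, 256.0},
    {"Chimera QB16", 16000.0, 1000.0},
}};

// Ratio of ML throughput to DSP throughput for one core.
constexpr double ml_to_dsp_ratio(const CoreSpec& core) {
    return core.ml_gops / core.dsp_gops;
}
```

Under that conversion, the QB1 and QB4 figures both give a ratio of 15.625, and the QB16 figures give 16. The small difference would be consistent with the QB16 DSP figure being rounded, although the announcement does not say so.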

Chimera cores deliver ML inference performance similar in efficiency to dedicated CNN offload engines, but with full programmability, says the company. Unlike conventional accelerators that can only run a handful of predetermined ML operators, Chimera GPNPUs can run any ML operator. Custom operators can be added by the software developer simply by writing a C++ kernel that uses the Chimera Compute Library (CCL) application programming interface (API), then compiling that kernel using the Chimera Software Development Kit (SDK).
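
To give a sense of what a custom operator written as a C++ kernel might look like, here is a rough sketch in plain standard C++. It does not use the CCL API, whose functions and types are not described in this announcement, and it says nothing about how such a kernel is compiled with the Chimera SDK. The choice of hard-swish as the operator and the names `hard_swish_scalar` and `hard_swish` are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical custom operator: hard-swish, y = x * clamp(x + 3, 0, 6) / 6.
// Plain standard C++ only; no CCL calls are used or implied.
inline float hard_swish_scalar(float x) {
    const float gate = std::min(std::max(x + 3.0f, 0.0f), 6.0f);
    return x * gate / 6.0f;
}

// Applies the operator element by element to a buffer of activations
// and returns the result in a new buffer of the same size.
inline std::vector<float> hard_swish(const std::vector<float>& input) {
    std::vector<float> output(input.size());
    std::transform(input.begin(), input.end(), output.begin(),
                   hard_swish_scalar);
    return output;
}
```

On a target with a vendor compute library, the elementwise loop would presumably be expressed through that library's own data types and parallel primitives rather than `std::vector` and `std::transform`.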

Operators added with the Chimera SDK flow can be highly performant, utilizing the full performance of the Chimera GPNPU. Competing architecture solutions rely upon conventional CPUs or DSPs as the “fallback” programmable solution for newly emergent ML operators, but those CPUs or DSPs are 10x to 1000x lower performing than the accelerators they are paired with.
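
The effect of such a fallback on a whole workload can be estimated with simple arithmetic. The sketch below is a simplified model, not a figure from Quadric: it assumes run time is proportional to operation count, ignores the cost of moving data between cores, and uses a hypothetical function name, `relative_runtime`. Only the 10x to 1000x range comes from the announcement; the fallback fractions are illustrative.

```cpp
#include <cassert>

// Relative run time of a workload when a fraction of its operations runs on
// a slower fallback processor. The result is normalized so that 1.0 means
// every operation ran on the accelerator.
//   fallback_fraction: share of operations on the fallback core, in [0, 1]
//   slowdown: how many times slower the fallback core is (>= 1)
inline double relative_runtime(double fallback_fraction, double slowdown) {
    assert(fallback_fraction >= 0.0 && fallback_fraction <= 1.0);
    assert(slowdown >= 1.0);
    return (1.0 - fallback_fraction) + fallback_fraction * slowdown;
}
```

Under this model, if 5% of the operations fall back to a core that is 100x slower, the workload takes about 5.95 times as long; if 1% fall back to a core that is 1000x slower, it takes about 10.99 times as long. Even a small share of unsupported operators can therefore dominate total run time.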

“Automobile market analysts have coined the term ‘Range Anxiety’ to describe consumer wariness about purchasing a battery-powered automobile and getting stranded too far from scarce charging stations, or being stuck without a high-voltage, fast charging option,” says Steve Roddy, Quadric’s chief marketing officer. “In the semiconductor world the term Operator Anxiety has come into vogue to describe the very real angst silicon companies fear in responding to evolving ML workloads. Just as the EV car owner wants to use an 800V fast charger and avoid the slow overnight charging speed of a standard wall socket, a fully programmable Chimera GPNPU solves the Operator Anxiety problem with high-speed custom operator support, not slow CPU support, for new ML operators.”

The benefits to the software developer of having a single architecture to program and fine-tune are obvious, says the company. Far more software developers are comfortable programming a single core than they are dealing with heterogeneous multicore systems.

A significant secondary benefit of combining C++ signal processing with ML graph processing on a single Chimera GPNPU is the area and power saved by not having to shuffle activation data (images, signals) back and forth between two or three processing engines (CPU, DSP, accelerator). For legacy systems with three cores, and three associated memory subsystems used to buffer data transfers between cores, the area and related power savings realized by switching to a Chimera GPNPU solution can be substantial.

The Chimera architecture has already been proven at-speed in silicon, says the company, and is ready for immediate customer engagement by chip design teams looking to start an IP evaluation this fall or winter.

