IP core combines programmable neural processor with DSP

By Nick Flaherty

Quadric in the US has introduced a family of general-purpose neural processors (GPNPUs) that combine a neural processing accelerator with the full C++ programmability of a digital signal processor (DSP).

The Chimera GPNPU IP provides a single unified architecture for machine learning inference plus pre- and post-processing, simplifying both system-on-chip (SoC) hardware design for the semiconductor developer and subsequent software programming by application developers.

“Machine learning is infiltrating nearly all applications everywhere DSPs are traditionally used today for vision, audio, sound, communications, sensors, and so much more,” said Veerbhan Kheterpal, co-founder and CEO of Quadric.

“Existing silicon solutions to the ML inference challenge have added accelerators as helper offload cores to existing DSPs or CPUs. The limitation of that approach is the clumsy way the programmer has to partition her code across the different cores in the system and then tune the interaction between those cores to get desired performance goals. The new Chimera GPNPU family creates a unified, single-core architecture for both ML inference and related conventional C++ processing of images, video, radar or other signals, eliminating multicore challenges.”

A significant advantage of this new architecture is that neural network graphs and C++ code are merged into a single software code stream. Only one tool chain is required for scalar, vector, and matrix computations. Memory bandwidth is optimized by a single unified compilation stack, which significantly reduces power consumption.

The IP scales from 512 MACs to 8K MACs across three cores. The Chimera QB1 delivers 1 tera operations per second (TOPS) of machine learning performance with 64 giga operations per second (GOPS) of DSP capability; the Chimera QB4 delivers 4 TOPS of machine learning and 256 GOPS of DSP; and the Chimera QB16 delivers 16 TOPS of machine learning with 1 TOPS of DSP.
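These throughput figures line up with the MAC counts under the common convention that one multiply-accumulate counts as two operations per clock cycle. A quick sanity check, assuming the 1 GHz clock quoted for these cores:

```cpp
#include <cstdint>

// Peak throughput in TOPS, assuming each MAC performs two operations
// (one multiply plus one accumulate) every clock cycle.
double peak_tops(std::int64_t macs, double clock_ghz) {
    // macs * 2 ops/cycle * clock_ghz * 1e9 cycles/s = ops/s;
    // divide by 1e12 to express the result in TOPS.
    return static_cast<double>(macs) * 2.0 * clock_ghz / 1000.0;
}
```

With this convention, the QB1's 512 MACs at 1 GHz give roughly 1.02 TOPS and the QB16's 8K MACs give roughly 16.4 TOPS, matching the quoted 1 TOPS and 16 TOPS figures.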

Chimera cores can be targeted to any silicon foundry and any process technology. The entire family of QB Series GPNPUs can achieve 1 GHz operation in mainstream 16nm or 7nm processes using conventional standard cell flows and commonly available single-ported SRAM. For applications requiring even greater levels of performance, two or more Chimera cores can be paired together.

The Chimera GPNPU architecture is designed to accelerate convolution layers, providing machine learning inference performance similar to that of dedicated CNN offload engines but with full programmability. Unlike conventional accelerators that can only run a handful of predetermined ML operators, Chimera GPNPUs can run any ML operator. Custom operators can be added by the software developer simply by writing a C++ kernel using the Chimera Compute Library (CCL) application programming interface (API) and then compiling that kernel with the Chimera Software Development Kit (SDK). Operators added with the Chimera SDK flow can be highly performant, utilizing the full performance of the Chimera GPNPU. Competing architectures rely on conventional CPUs or DSPs as the "fallback" programmable solution for newly emergent ML operators, but those CPUs or DSPs are 10x to 1000x slower than the accelerators they are paired with.
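As a concrete illustration of what writing such a custom operator involves, here is a plain-C++ reference implementation of hard-swish, an activation that emerged after many fixed-function accelerators were designed. The actual CCL API is not public documentation available here, so this sketch uses only standard C++; on a Chimera core the inner loop would presumably be expressed through CCL vector primitives rather than scalar code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hard-swish activation: x * relu6(x + 3) / 6.
// An example of a newer ML operator a developer might add as a custom
// C++ kernel when a fixed-function accelerator lacks it. Written here
// in portable scalar C++ for illustration only.
void hard_swish(const std::vector<float>& in, std::vector<float>& out) {
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        // relu6 clamps its argument to the range [0, 6]
        float r6 = std::clamp(in[i] + 3.0f, 0.0f, 6.0f);
        out[i] = in[i] * r6 / 6.0f;
    }
}
```

The point of the unified architecture is that a kernel like this compiles into the same code stream as the rest of the network, rather than being scheduled onto a separate fallback CPU or DSP.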

The benefits to the software developer of having a single architecture to program and fine-tune are obvious. Far more software developers are comfortable programming a single core than they are dealing with heterogeneous multicore systems. But a significant secondary benefit of combining C++ signal processing with ML graph processing on a single Chimera GPNPU is the area and power savings of not requiring activation data (images, signals) to be shuffled back and forth between two or three processing engines (CPU, DSP, accelerator). For legacy systems with three cores – and three associated memory subsystems used to buffer data transfers between cores – the area and related power savings realized by switching to a Chimera GPNPU solution can be substantial.
