Sparsity engine boost for neural network IP core

Ceva has boosted the performance of its machine learning accelerator core by a factor of 5 to 15 with a compression algorithm and sparsity engine.
By Nick Flaherty

The NeuPro-M IP developed by Ceva is aimed at artificial intelligence and machine learning (AI/ML) inference workloads for edge AI and edge compute chips where power efficiency is key.

The heterogeneous architecture uses multiple specialized co-processors and configurable hardware accelerators that simultaneously process diverse Deep Neural Network (DNN) workloads at 5 to 15 times the performance of its predecessor. The gain comes from an increase in the throughput of the core, a compression algorithm that reduces the amount of data needed and a sparsity engine that avoids processing data unnecessarily.

The IP, launched at CES 2022 this week, supports both system-on-chip (SoC) and Heterogeneous SoC (HSoC) chiplet designs for the first time. The NPM11 is a single NeuPro-M engine with up to 20 TOPS at 1.25GHz, and this scales to an eight-core, self-contained IP block with up to 160 TOPS at 1.25GHz.

The single NPM11 core, when processing a ResNet50 convolutional neural network, achieves a 5X performance increase and 6X memory bandwidth reduction versus its predecessor, which results in power efficiency of up to 24 TOPS per watt.

The NeuPro-M architecture can process all known neural network architectures and has integrated native support for next-generation networks such as transformers, 3D convolution, self-attention and all types of recurrent neural networks. It has also been optimized to process more than 250 neural networks, more than 450 AI kernels and more than 50 algorithms.

The embedded vector processing unit (VPU) provides future-proof, software-based support for new neural network topologies and new advances in AI workloads. An offline DNN compression tool can increase the FPS/Watt of the NeuPro-M by a factor of 5 to 10 for common benchmarks, with minimal impact on accuracy.

“The artificial intelligence and machine learning processing requirements of edge AI and edge compute are growing at an incredible rate, as more and more data is generated and sensor-related software workloads continue to migrate to neural networks for better performance and efficiencies. With the power budget remaining the same for these devices, we need to find new and innovative methods of utilizing AI at the edge in these increasingly sophisticated systems,” said Ran Snir, Vice President and General Manager of the Vision Business Unit at Ceva.

“NeuPro-M is designed on the back of our extensive experience deploying AI processors and accelerators in millions of devices, from drones to security cameras, smartphones and automotive systems. Its innovative, distributed architecture and shared memory system controllers reduce bandwidth and latency to an absolute minimum and provide superb overall utilization and power efficiency. With the ability to connect multiple NeuPro-M compliant cores in a SoC or Chiplet to address the most demanding AI workloads, our customers can take their smart edge processor designs to the next level,” he said.

The NeuPro-M heterogeneous architecture is composed of function-specific co-processors and load balancing mechanisms, which are the main contributors to the leap in performance and efficiency compared to its predecessor. By distributing control functions to local controllers and implementing local memory resources in a hierarchical manner, the NeuPro-M achieves data flow flexibility that results in more than 90 percent utilization and protects the different co-processors and accelerators against data starvation at any given time.

The DNN framework obtains optimal load balancing by applying various data flow schemes that are adapted to the specific network, the desired bandwidth, the available memory and the target performance.

A main grid array consists of 4K MACs (Multiply And Accumulate units) with mixed precision of 2 to 16 bits, alongside a Winograd transform engine for weights and activations. The Winograd engine halves convolution time and allows 8-bit convolution processing with a loss of precision of under 0.5 percent. A sparsity engine avoids operations with zero-value weights or activations in each layer, for up to a 4X performance gain, while reducing memory bandwidth and power consumption.
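The zero-skipping idea behind a sparsity engine can be sketched in a few lines of Python. This is only an illustration of the principle, not Ceva's hardware design: the function name `sparse_dot` and the sample values are hypothetical, chosen so that only two of the eight operand pairs need a multiply-accumulate.

```python
def sparse_dot(weights, activations):
    """Dot product that skips any pair containing a zero operand.

    Returns (result, macs_performed). Hypothetical helper for illustration.
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    total = 0
    macs = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue  # the product would be zero, so no MAC is issued
        total += w * a
        macs += 1
    return total, macs


# Illustrative values only: six of the eight pairs contain a zero.
weights = [0, 3, 0, 0, -2, 0, 1, 0]
activations = [5, 0, 7, 0, 4, 2, 6, 0]

result, macs = sparse_dot(weights, activations)
dense_macs = len(weights)
print(result, macs, dense_macs / macs)
```

In this made-up case the skipped pairs cut the work from eight MACs to two. Real gains depend on how many zeros a given layer actually contains.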

The fully programmable Vector Processing Unit handles new, unsupported neural network architectures with all data types, from 32-bit floating point down to 2-bit Binary Neural Networks (BNN). Compression is configurable for weights and data down to 2 bits when storing to memory, with real-time decompression on reading, to reduce memory bandwidth. A dynamically configured two-level memory architecture helps to minimise the power consumption of data transfers to and from an external SDRAM memory.
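To see why storing values at 2 bits cuts memory traffic, consider plain bit packing, where four 2-bit codes share one byte. The article does not describe Ceva's compression format, so the sketch below is an assumption-laden stand-in: `pack_2bit` and `unpack_2bit` are hypothetical names, and the bit order is arbitrary.

```python
def pack_2bit(codes):
    """Pack integers in the range 0..3 four to a byte, lowest bits first."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, code in enumerate(codes[i:i + 4]):
            if not 0 <= code <= 3:
                raise ValueError("each code must fit in 2 bits")
            byte |= code << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack_2bit(packed, count):
    """Recover the first `count` 2-bit codes from packed bytes."""
    codes = []
    for byte in packed:
        for j in range(4):
            codes.append((byte >> (2 * j)) & 0b11)
    return codes[:count]


# Illustrative values only.
codes = [3, 0, 1, 2, 2, 1]
packed = pack_2bit(codes)          # stored form: 2 bytes instead of 6
restored = unpack_2bit(packed, len(codes))
print(len(packed), restored == codes)
```

Unpacking on read restores the original codes exactly, so the saving here comes purely from the narrower storage width.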

As neural network weights and biases, the data set and the network topology become key intellectual property of the owner, there is a strong need to protect them from unauthorized use. The NeuPro-M architecture supports secure access in the form of optional root of trust, authentication and cryptographic accelerators.

For the automotive market, the NeuPro-M cores and the DNN compiler and software toolkit comply with the automotive ISO 26262 ASIL-B functional safety standard and meet the quality assurance standards IATF 16949 and A-SPICE.

The DNN compiler can fully utilize the NeuPro-M customized hardware to optimize power, performance and bandwidth. It includes a memory manager for memory reduction, optimal load balancing algorithms and wide support for network formats including ONNX, Caffe, TensorFlow, TensorFlow Lite, PyTorch and more. It is also compatible with common open-source frameworks, including Glow, TVM, Halide and TensorFlow, and includes model optimization features such as ‘layer fusion’ and ‘post training quantization’, all while using precision conservation methods.
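Post-training quantization, one of the optimization features named above, converts trained floating-point weights to low-precision integers without retraining. The following is a minimal sketch of one common textbook variant, symmetric per-tensor int8 quantization. It is not taken from Ceva's toolkit, and the function names and sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to int8; returns (codes, scale)."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate floating-point values."""
    return [c * scale for c in codes]


# Illustrative weights only.
weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error <= scale / 2 + 1e-12)
```

The rounding error per weight is bounded by half the scale step, which is the kind of small precision loss that a compiler's precision conservation methods aim to keep in check.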

NeuPro-M is available for licensing to lead customers today and for general licensing in Q2 this year. Ceva is also aiming to assist customers with Heterogeneous SoC design services to help integrate and support system design and chiplet development.

www.ceva-dsp.com/product/ceva-neupro-m/.
