AImotive doubles performance in fourth-generation AI core

May 28, 2021 // By Nick Flaherty
The aiWare4 NPU core from AImotive adds wavefront memory processing, upgraded safety and low-power features for both automotive and edge AI applications on 5nm and 3nm chips.

AImotive, a leading AI technology developer based in Hungary, has developed its next-generation neural network processing unit (NPU) design, doubling performance over the previous generation.

The fourth generation of the aiWare automotive NPU hardware IP delivers up to 64 TOPS per core at 2GHz, twice that of the previous design. It handles up to 16,384 INT8 multiply-accumulate (MAC) operations per clock cycle with 32-bit internal accuracy. It is aimed at chip designs built in 5nm and 3nm process technologies. AImotive has a key relationship with chip maker NextChip in South Korea.
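As a rough consistency check on these figures, the sketch below computes peak throughput from MAC count and clock rate. It assumes, as is common industry practice but not stated by AImotive, that the 16,384 figure counts INT8 MACs issued per clock cycle and that each MAC counts as two operations (one multiply, one add). Under those assumptions the result lands close to the quoted 64 TOPS.

```python
# Back-of-envelope peak-throughput check for the aiWare4 figures.
# ASSUMPTIONS (not from AImotive): 16,384 = INT8 MACs issued per clock cycle,
# and one MAC is counted as two operations (multiply + accumulate).

MACS_PER_CYCLE = 16_384  # INT8 MACs per clock (assumed per-cycle figure)
OPS_PER_MAC = 2          # multiply and add counted separately (assumed)
CLOCK_HZ = 2.0e9         # 2 GHz clock quoted for aiWare4


def peak_tops(macs_per_cycle: int, clock_hz: float, ops_per_mac: int = OPS_PER_MAC) -> float:
    """Return peak throughput in TOPS, where 1 TOPS = 1e12 operations per second."""
    return macs_per_cycle * ops_per_mac * clock_hz / 1e12


if __name__ == "__main__":
    tops = peak_tops(MACS_PER_CYCLE, CLOCK_HZ)
    # Prints "peak: 65.5 TOPS", close to the quoted "up to 64 TOPS".
    print(f"peak: {tops:.1f} TOPS")
```

The small gap between 65.5 and 64 is presumably rounding in the marketing figure (65,536 G ops/s is exactly 64 × 1,024 G ops/s); the article does not say which convention is used.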

Power efficiency has also been improved, and ISO 26262 ASIL-B safety support is standard. Configurable safety mechanisms up to ASIL-D let designers balance silicon overhead against functional safety requirements and objectives.

Each core is scalable up to 64 TOPS, and a multi-core cluster up to 256 TOPS, with greater configurability of on-chip memory, hardware safety mechanisms and external/shared memory support.
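The two ceilings imply that a cluster holds the equivalent of four full-size cores. The hypothetical sizing helper below (`cores_needed` is an illustrative name, not an AImotive tool) picks the smallest number of maximum-size cores for a target throughput, assuming ideal linear scaling with core count. Real multi-core efficiency depends on memory bandwidth and workload partitioning, which the article does not quantify.

```python
import math

CORE_MAX_TOPS = 64      # per-core ceiling quoted for aiWare4
CLUSTER_MAX_TOPS = 256  # multi-core cluster ceiling quoted for aiWare4


def cores_needed(target_tops: float) -> int:
    """Hypothetical helper: smallest count of maximum-size cores reaching target_tops.

    ASSUMPTION: throughput scales perfectly linearly with core count.
    """
    if target_tops <= 0:
        raise ValueError("target_tops must be positive")
    if target_tops > CLUSTER_MAX_TOPS:
        raise ValueError("target exceeds the quoted single-cluster ceiling")
    return math.ceil(target_tops / CORE_MAX_TOPS)


if __name__ == "__main__":
    print(CLUSTER_MAX_TOPS // CORE_MAX_TOPS)  # 4 full-size cores per cluster
    print(cores_needed(100))                  # 2 cores cover 100 TOPS
```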

Enhanced standard hardware features and related documentation are intended to make ISO 26262 ASIL-B compliance, and higher levels of compliance, straightforward for both SEooC (Safety Element out of Context) and in-context safety element applications.

The core achieves 8-10 effective TOPS/W on typical CNNs, with a theoretical peak of up to 30 TOPS/W on a 5nm or smaller process node. AImotive says the core is up to 98 percent efficient across a wider range of CNN topologies, in particular the popular VGG16 and YOLO image networks. This was demonstrated on aiWare3 with NextChip.
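To put these efficiency figures in more tangible units, the sketch below converts TOPS/W into energy per operation and applies a utilization factor to peak throughput. Reading "98 percent efficient" as the fraction of peak MAC throughput actually sustained is an assumption; the article does not define the metric.

```python
def picojoules_per_op(tops_per_watt: float) -> float:
    """Energy per operation in picojoules.

    x TOPS/W means x * 1e12 operations per joule, so one operation costs
    1 / (x * 1e12) J, which is exactly 1 / x pJ.
    """
    if tops_per_watt <= 0:
        raise ValueError("tops_per_watt must be positive")
    return 1.0 / tops_per_watt


def sustained_tops(peak_tops: float, utilization: float) -> float:
    """Throughput actually delivered, ASSUMING 'efficiency' means utilization of peak."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_tops * utilization


if __name__ == "__main__":
    for tpw in (8, 10, 30):  # effective range and theoretical peak quoted above
        print(f"{tpw} TOPS/W -> {picojoules_per_op(tpw):.3f} pJ/op")
    print(f"{sustained_tops(64, 0.98):.2f} TOPS sustained")  # 62.72
```

At the quoted figures this works out to roughly 0.1 to 0.125 pJ per operation in practice, and about 0.033 pJ at the theoretical 30 TOPS/W peak.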

This efficiency comes from MAC arrays that are optimized for 2D and 3D convolution and deconvolution without using matrix multipliers. The core also uses a new memory architecture called Wavefront RAM (WFRAM) with interleaved multi-tasking scheduling algorithms. This technique, developed for GPU data processing, allows more parallel execution with improved multi-tasking capability. It substantially reduces memory bandwidth compared to aiWare3 for CNNs that require access to significant external memory resources.
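The article does not describe how aiWare4's MAC arrays are organized. As a generic illustration only, the plain-Python sketch below shows what computing a convolution directly means: the output is built from nested multiply-accumulate loops rather than by lowering the input to a matrix (im2col) and calling a matrix multiply. It is not AImotive's design, and WFRAM scheduling is not modeled. The function name `direct_conv2d` is hypothetical, and like most CNN frameworks it computes cross-correlation (no kernel flip), stride 1, no padding.

```python
from typing import List, Tuple

Tensor3 = List[List[List[int]]]        # [channels][height][width]
Tensor4 = List[List[List[List[int]]]]  # [out_channels][in_channels][k_h][k_w]


def direct_conv2d(x: Tensor3, w: Tensor4) -> Tuple[Tensor3, int]:
    """Stride-1, unpadded 2D convolution as explicit MAC loops (no im2col/GEMM).

    Returns the output feature map and the number of MACs performed.
    """
    in_ch, h, wd = len(x), len(x[0]), len(x[0][0])
    out_ch, kh, kw = len(w), len(w[0][0]), len(w[0][0][0])
    if len(w[0]) != in_ch:
        raise ValueError("kernel input channels must match input channels")
    oh, ow = h - kh + 1, wd - kw + 1
    y = [[[0] * ow for _ in range(oh)] for _ in range(out_ch)]
    macs = 0
    for o in range(out_ch):
        for r in range(oh):
            for c in range(ow):
                # Python ints never overflow; hardware would use a wide
                # accumulator (e.g. 32-bit for INT8 inputs) here.
                acc = 0
                for i in range(in_ch):
                    for dr in range(kh):
                        for dc in range(kw):
                            acc += x[i][r + dr][c + dc] * w[o][i][dr][dc]
                            macs += 1
                y[o][r][c] = acc
    return y, macs


if __name__ == "__main__":
    image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]  # 1 channel, 3x3
    kernel = [[[[1, 1], [1, 1]]]]                # 1 output, 1 input, 2x2
    out, n = direct_conv2d(image, kernel)
    print(out, n)  # [[[12, 16], [24, 28]]] 16
```

The MAC count equals out_channels × output height × output width × in_channels × kernel height × kernel width, which is the quantity a MAC array has to sustain at high utilization for the efficiency figures above to hold.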

This combination enables aiWare4 to execute a wide range of CNN