Imagination Technologies has launched a scalable neural network accelerator IP core optimised for automotive and autonomous systems but also aimed at industrial designs.
The Series4 Neural Network Accelerator (NNA) core is not a general purpose execution unit. Instead it has been optimised for the YOLOv3 object detection neural network and for processing large, rectangular images.
It is aimed at developers of system-on-chip devices for sensor fusion in high performance autonomous vehicles such as robotaxis, last mile delivery vehicles and automated street sweepers.
The NNA core achieves 12.5TOPS of performance through 4096 multiply accumulate (MAC) units in 1mm² on a 5nm process technology, all connected by a 256 network on chip (NOC). That is over 20x faster than an embedded GPU and 1000x faster than an embedded CPU for AI inference, says the company.
Up to eight cores can be combined in a low latency cluster delivering 100TOPS, while multiple clusters can be placed on a chip for even higher performance for Level 3 and Level 4 autonomous operation. The core has been designed as part of an ISO 26262 automotive safety process.
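The headline figures can be cross-checked with some simple arithmetic. The sketch below is an illustration only, not Imagination's own calculation: it assumes each MAC counts as two operations per cycle (one multiply, one add), which is a common convention but is not stated by the company, and the function names are hypothetical.

```python
# Back-of-envelope check of the quoted throughput figures.
# Assumption (not from the company): one MAC = 2 operations per clock cycle.
OPS_PER_MAC = 2


def implied_clock_ghz(tops, macs, ops_per_mac=OPS_PER_MAC):
    """Clock rate in GHz that would be needed to reach `tops` with `macs` units."""
    ops_per_second = tops * 1e12
    return ops_per_second / (macs * ops_per_mac) / 1e9


def cluster_tops(cores, core_tops=12.5):
    """Peak throughput of a cluster, assuming ideal linear scaling across cores."""
    return cores * core_tops


if __name__ == "__main__":
    print(f"Implied clock: {implied_clock_ghz(12.5, 4096):.2f} GHz")
    print(f"8-core cluster: {cluster_tops(8):.1f} TOPS")
```

Under that assumption, 4096 MAC units reaching 12.5TOPS implies a clock of roughly 1.5GHz, and eight cores scale to the quoted 100TOPS only if scaling is perfectly linear.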
“We have already licensed one of these cores into a system on chip design,” said Andrew Grant, Senior Director for Artificial Intelligence at Imagination Technologies.
The core also uses a technique called Tensor Tiling that reduces memory bandwidth by up to 90 percent. It splits input data tensors into multiple tiles and exploits local data dependencies so that intermediate data stays in on-chip memory rather than being written out between layers.
“It’s a tiling algorithm that allows you to group the network layers, looking at the workloads and using the on-chip SRAM tightly coupled to segment the workloads and adjust for the maximum workload,” said Grant.
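The general idea of tiling across grouped layers can be sketched in a few lines. This is a generic illustration and not Imagination's Tensor Tiling implementation: it assumes a toy network of two 3-tap 1-D convolution layers, counts every element moved to or from external memory as one unit of traffic, and all names are hypothetical.

```python
# Toy comparison: layer-by-layer execution vs. tiled execution of two
# grouped layers. "Traffic" counts elements moved to/from external memory.

def conv3(x, w):
    """Valid 1-D convolution with a 3-tap kernel (output is 2 shorter)."""
    return [w[0] * x[i] + w[1] * x[i + 1] + w[2] * x[i + 2]
            for i in range(len(x) - 2)]


def layer_by_layer(x, w1, w2):
    """Run each layer over the whole input; the intermediate goes off-chip."""
    traffic = len(x)             # read the input
    mid = conv3(x, w1)
    traffic += 2 * len(mid)      # write the intermediate, then read it back
    out = conv3(mid, w2)
    traffic += len(out)          # write the output
    return out, traffic


def tiled(x, w1, w2, tile):
    """Run both layers on one tile at a time; the intermediate stays local."""
    out, traffic = [], 0
    n_out = len(x) - 4           # two 3-tap layers shrink the data by 4
    for start in range(0, n_out, tile):
        stop = min(start + tile, n_out)
        chunk = x[start:stop + 4]    # tile plus a 4-element halo
        traffic += len(chunk)        # read the tile of input
        mid = conv3(chunk, w1)       # kept in the local (on-chip) buffer
        res = conv3(mid, w2)
        out.extend(res)
        traffic += len(res)          # write the tile of output
    return out, traffic


if __name__ == "__main__":
    data = list(range(1, 1029))
    ref, base = layer_by_layer(data, [1, 2, 1], [1, 0, -1])
    got, fused = tiled(data, [1, 2, 1], [1, 0, -1], tile=256)
    assert got == ref
    print(f"external traffic: {base} elements vs {fused} elements when tiled")
```

The halo re-read at each tile boundary is the cost of keeping intermediates local. In this toy case the saving is modest because only two layers are grouped; the achievable reduction in a real design depends on how many layers are grouped and how much on-chip SRAM is available.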
For performance above 100TOPS, a chip can use multiple clusters linked via the AXI bus. “You need to minimise the traffic between clusters so it’s more of a system design. When you go to 600TOPS you have to work with the customer to coordinate all the workloads,” said Gilberto Blanco, director of product management.
“With L3, 4 you need to go beyond 100TOPS but not doing the same tasks. The heavy lifting tasks are at 40 to 60 TOPS with multiple tasks and there is not large amounts of data transferred between clusters,” he said.
The optimisation for power at 30TOPS/W on a 5nm process technology also allows it to be used in industrial image processing chip designs. “This can also be used for edge processing designs,” said Grant. “We think there’s a real opportunity right now to introduce a platform that people can deploy at the edge that comes from the embedded industry not the data centre or desktop.”
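To put the efficiency figure in context, the sketch below estimates power draw from throughput. It is a rough illustration under stated assumptions: it treats 30TOPS/W as holding at peak throughput and scaling linearly with core count, and it ignores memory, interconnect and the rest of the chip. None of these assumptions comes from the company, and the function name is hypothetical.

```python
# Rough power estimate from a throughput figure and an efficiency figure.
# Assumption: the quoted TOPS/W applies at peak throughput and scales linearly.

def power_watts(tops, tops_per_watt=30.0):
    """Estimated compute power in watts for a given throughput."""
    if tops_per_watt <= 0:
        raise ValueError("tops_per_watt must be positive")
    return tops / tops_per_watt


if __name__ == "__main__":
    print(f"single core (12.5 TOPS): {power_watts(12.5):.2f} W")
    print(f"8-core cluster (100 TOPS): {power_watts(100.0):.2f} W")
```

On those assumptions a single core would draw well under half a watt for its compute logic, which is consistent with the claim that the design suits edge and industrial chips as well as automotive ones.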
The Series4 NNA core will be available as IP in December and works alongside Imagination’s automotive GPU core. Chip designs have already started, with layout in 2021 and test chips on a 7nm process due in Q3 or Q4 with a lead customer, says Blanco.