Low power edge AI core for robotics kills the heatsink

By Nick Flaherty

Renesas Electronics has developed embedded accelerator technology for low power edge AI without the need for a heatsink.

The dynamically reconfigurable processor (DRP-AI) developed by Renesas was shown at the ISSCC conference in the US this week for processing lightweight edge AI models in vision-based robotic applications.

The heterogeneous architecture uses the DRP as an accelerator working alongside a microcontroller core. Renesas produced a prototype embedded AI-MPU with these technologies and confirmed its high-speed, low-power operation. A test chip achieved 130 TOPS of performance and a power efficiency of 23.9 TOPS/W with a 0.8 V supply, without needing a heatsink.

Significantly reducing heat generation will contribute to the spread of automation into various industries, such as the robotics and smart technology markets. These technologies will be applied to the Renesas RZ/V series of ARM A55-based microcontrollers for vision AI applications.

This is one of several edge AI technologies that Renesas is developing for embedded chips in different end markets. It also has a deal with Brainchip for higher performance edge AI IP.

Severe restrictions on heat generation, particularly in embedded devices, mean that AI chips must deliver both higher performance and lower power consumption.

Renesas optimized the DRP-based AI accelerator (DRP-AI) for pruning, which omits calculations that do not significantly affect recognition accuracy. However, such calculations are typically scattered randomly throughout an AI model. This creates a mismatch between the parallelism of the hardware and the randomness of the pruning, which makes processing inefficient.
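The mismatch between random pruning and parallel hardware can be illustrated with a toy model. The sketch below is not Renesas code: the function names, the weights and the cost model (four lanes running in lockstep, each spending one cycle per surviving weight) are all assumptions chosen only to show the effect.

```python
# Toy illustration only: the weights, names and cost model are assumptions,
# not data or algorithms from Renesas.

def magnitude_prune(weights, threshold):
    """Zero every weight whose absolute value is below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def nonzeros_per_lane(weights, lane_width):
    """Split the weights into consecutive groups, one per hardware lane,
    and count the weights that survived pruning in each group."""
    return [
        sum(1 for w in weights[i:i + lane_width] if w != 0.0)
        for i in range(0, len(weights), lane_width)
    ]

weights = [
    0.9, -0.8, 0.7, 0.6,      # lane 0: every weight matters
    0.05, -0.02, 0.5, 0.01,   # lane 1: one significant weight
    0.03, 0.04, -0.01, 0.02,  # lane 2: nothing significant
    -0.6, 0.02, 0.7, 0.01,    # lane 3: two significant weights
]

pruned = magnitude_prune(weights, threshold=0.1)
counts = nonzeros_per_lane(pruned, lane_width=4)

# Assumed cost model: one cycle per surviving weight, lanes run in lockstep,
# so the whole array waits for the busiest lane.
lockstep_cycles = max(counts)
# With perfectly balanced work the same weights would need only this many cycles.
balanced_cycles = -(-sum(counts) // len(counts))  # ceiling division

print(counts, lockstep_cycles, balanced_cycles)
```

In this toy case nine of the sixteen weights are pruned, yet the lockstep array still needs four cycles, the same as the unpruned model, because one lane kept all of its weights.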

By analyzing how pruning pattern characteristics and pruning methods relate to recognition accuracy in typical image recognition AI models (CNN models), Renesas identified a hardware structure for an AI accelerator that can achieve both high recognition accuracy and an efficient pruning rate, and applied it to the DRP-AI design.

Software was also developed to slim down AI models optimized for this DRP-AI. It converts a randomly pruned model configuration into a form suited to highly efficient parallel computing, resulting in higher-speed AI processing.

This flexible N:M pruning technology can dynamically change the number of cycles in response to changes in the local pruning rate in AI models. It allows fine control of the pruning rate according to the power consumption, operating speed, and recognition accuracy required by users.
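As a rough illustration of the general idea of N:M pruning, the sketch below keeps the N largest-magnitude weights in each group of M and zeroes the rest, then lets N vary from group to group. It is a generic textbook-style sketch, not Renesas's implementation: the function names, the sample weights and the cost model of one cycle per kept weight are assumptions.

```python
# Generic N:M pruning sketch. Names, weights and the cycle model are
# hypothetical; this is not the DRP-AI algorithm.

def prune_n_of_m(weights, n, m):
    """Keep the n largest-magnitude weights in each consecutive group of m."""
    if not 0 <= n <= m:
        raise ValueError("n must be between 0 and m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n weights with the largest absolute value.
        order = sorted(range(len(group)), key=lambda i: abs(group[i]), reverse=True)
        keep = set(order[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned

def prune_flexible(weights, n_per_group, m):
    """Variant in which every group of m weights has its own n."""
    pruned = []
    for g, n in enumerate(n_per_group):
        pruned.extend(prune_n_of_m(weights[g * m:(g + 1) * m], n, m))
    return pruned

def cycles(n_per_group):
    """Assumed cost model: one processing cycle per kept weight."""
    return sum(n_per_group)

weights = [0.9, -0.1, 0.4, 0.05, 0.02, -0.7, 0.03, 0.6]

uniform = prune_n_of_m(weights, n=2, m=4)
flexible = prune_flexible(weights, n_per_group=[1, 3], m=4)

print(uniform)               # [0.9, 0.0, 0.4, 0.0, 0.0, -0.7, 0.0, 0.6]
print(flexible)              # [0.9, 0.0, 0.0, 0.0, 0.0, -0.7, 0.03, 0.6]
print(cycles([2, 2]), cycles([1, 3]))  # 4 4
```

Because every group keeps a known number of weights, the work per group is predictable, and varying N per group lets the same cycle budget be spent where the model needs it most.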

This reduces the number of AI model processing cycles to as few as one-sixteenth of that for models that are not compatible with pruning, and consumes less than one-eighth of the power.

Robot applications require advanced vision AI processing to recognize the surrounding environment. Motion judgment and control, by contrast, require detailed condition programming in response to changes in that environment, for which CPU-based software processing is more suitable than AI-based processing.

The challenge has been that the CPUs in current embedded processors are not fully capable of controlling robots in real time.

The DRP runs an application while dynamically changing the circuit connection configuration between the arithmetic units inside the chip for each operation clock according to the processing details. Since only the necessary arithmetic circuits operate even for complex processing, lower power consumption and higher speeds are possible.

For example, SLAM (Simultaneous Localization and Mapping), a typical robot application, has a complex configuration that requires multiple programs for robot position recognition to run in parallel with environment recognition by vision AI processing.

Renesas demonstrated SLAM running with instantaneous program switching on the DRP and parallel operation of the AI accelerator and CPU, resulting in about 17 times faster operation and about 12 times higher operating power efficiency than the embedded CPU alone.

