Processing-in-Memory supports AI acceleration at 8.8 TOPS/W

By eeNews Europe

A test chip featuring a new AI accelerator has achieved a power efficiency of 8.8 TOPS/W, claims Renesas, which presented the technology in a paper titled “A Ternary Based Bit Scalable, 8.80 TOPS/W CNN Accelerator with Multi-Core Processing-in-Memory Architecture with 896K Synapses/mm²” at the Symposia on VLSI Technology and Circuits in Kyoto, Japan. The Renesas accelerator is based on the processing-in-memory (PIM) architecture, an increasingly popular approach for AI technology in which multiply-and-accumulate (MAC) operations are performed in the memory circuit as data is read out from that memory.
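A power-efficiency figure in TOPS/W converts directly into energy per operation, because 1 W is 1 J/s. The short Python calculation below uses only the 8.8 TOPS/W figure quoted above. It assumes nothing about how Renesas counts an "operation" (for example, whether one MAC counts as one or two operations), so the result is simply per operation as reported.

```python
# Convert a power-efficiency figure in TOPS/W into energy per operation.
# 1 TOPS/W = 1e12 operations per second per watt = 1e12 operations per joule.


def energy_per_op_pj(tops_per_watt: float) -> float:
    """Return the energy per operation in picojoules (pJ)."""
    ops_per_joule = tops_per_watt * 1e12
    joules_per_op = 1.0 / ops_per_joule
    return joules_per_op * 1e12  # 1 J = 1e12 pJ


if __name__ == "__main__":
    print(f"{energy_per_op_pj(8.8):.3f} pJ per operation")  # about 0.114 pJ
```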

To create the new AI accelerator, Renesas developed a PIM technology based on a ternary-valued (-1, 0, 1) SRAM structure that can perform large-scale CNN computations. It then combined the SRAM circuit with comparators that can read out memory data at low power.
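The appeal of ternary weights is that the multiplication disappears: multiplying an activation by -1, 0 or 1 is just a subtraction, a skip or an addition. The Python sketch below models that arithmetic in software only. The function name `ternary_mac` is hypothetical, and the sketch says nothing about how Renesas's SRAM cells or comparators carry out the operation in hardware.

```python
from typing import Sequence


def ternary_mac(activations: Sequence[int], weights: Sequence[int]) -> int:
    """Multiply-and-accumulate with ternary weights in {-1, 0, 1}.

    No multiplier is needed: each weight either adds the activation,
    subtracts it, or skips it.
    """
    if len(activations) != len(weights):
        raise ValueError("activations and weights must have the same length")
    acc = 0
    for a, w in zip(activations, weights):
        if w == 1:
            acc += a
        elif w == -1:
            acc -= a
        elif w != 0:
            raise ValueError(f"weight {w} is not ternary")
        # w == 0: the activation does not contribute
    return acc


if __name__ == "__main__":
    x = [3, 1, 4, 1, 5]
    w = [1, 0, -1, 1, -1]
    print(ternary_mac(x, w))  # 3 - 4 + 1 - 5 = -5
```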

The company also implemented a novel technology that prevents calculation errors caused by manufacturing process variations.

Combining these three approaches reduces both the memory access time in deep learning processing and the power required for the multiply-and-accumulate operations, says Renesas. This enables the new accelerator to achieve the industry’s highest class of power efficiency while maintaining an accuracy of more than 99 percent in a handwritten character recognition test (MNIST).

Figure: MAC (multiply-and-accumulate) operations performed in the memory circuit as data is read out from that memory.

Until now, PIM architectures were unable to achieve adequate accuracy for large-scale CNN computations with single-bit calculations, because the binary (0, 1) SRAM structure could only handle data with values of 0 or 1. In addition, manufacturing process variations reduced the reliability of these calculations, so workarounds were required.

The ternary (-1, 0, 1) SRAM-structure PIM architecture combines a ternary memory with a simple digital calculation block, keeping both the additional hardware and the increase in calculation errors to a minimum. It also allows the number of bits to be switched according to the required accuracy, for example between 1.5-bit (ternary, since a three-valued digit carries log₂3 ≈ 1.58 bits) and 4-bit calculations. Because this supports different accuracy requirements and calculation scales for each user, users can optimize the balance between accuracy and power consumption.
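The article does not explain how the accelerator switches between ternary and multi-bit calculations. One plausible scheme, assumed here purely for illustration, splits each signed multi-bit weight into ternary "digit planes" weighted by powers of two. A ternary MAC runs on each plane, and the digital block recombines the partial sums with shifts and adds. The helper names (`to_ternary_planes`, `bit_scalable_mac`) and the plane count `n_planes` are hypothetical. The sketch does not show Renesas's actual method.

```python
from typing import List, Sequence


def to_ternary_planes(weight: int, n_planes: int) -> List[int]:
    """Split a signed integer weight into n_planes digits in {-1, 0, 1}.

    Digit k is the weight's sign times bit k of its magnitude, so
    weight == sum(d_k * 2**k). Representable range: +/-(2**n_planes - 1).
    """
    limit = 2 ** n_planes - 1
    if abs(weight) > limit:
        raise ValueError(f"{weight} does not fit in {n_planes} planes")
    sign = -1 if weight < 0 else 1
    magnitude = abs(weight)
    return [sign * ((magnitude >> k) & 1) for k in range(n_planes)]


def ternary_mac(activations: Sequence[int], weights: Sequence[int]) -> int:
    """Ternary MAC: each weight adds, subtracts or skips an activation."""
    return sum(a if w == 1 else -a if w == -1 else 0
               for a, w in zip(activations, weights))


def bit_scalable_mac(activations: Sequence[int], weights: Sequence[int],
                     n_planes: int) -> int:
    """Multi-bit MAC built from n_planes ternary MACs plus shift-and-add.

    n_planes = 1 is a plain ternary calculation; more planes give higher
    weight precision at the cost of more ternary passes.
    """
    planes = [to_ternary_planes(w, n_planes) for w in weights]
    total = 0
    for k in range(n_planes):
        plane_k = [p[k] for p in planes]
        # Scale the partial sum of plane k by 2**k (a left shift in hardware).
        total += ternary_mac(activations, plane_k) * (1 << k)
    return total


if __name__ == "__main__":
    print(bit_scalable_mac([2, -1, 3], [5, -3, 0], n_planes=3))  # 10 + 3 + 0 = 13
```

With `n_planes=1` the weights are restricted to {-1, 0, 1}, so the same datapath falls back to a single ternary pass. This mirrors the accuracy-versus-power trade-off described above.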

When a PIM architecture is adopted, memory data is read out by detecting the value of the bit line current in the SRAM structure. A/D converters are effective for high-precision bit line current detection, but they increase chip area and draw more power. Renesas instead combined a comparator (1-bit sense amplifier) with a replica cell, which allows the current to be controlled flexibly, to build a high-precision memory data readout circuit. Furthermore, only a very small fraction of the nodes (neurons), about 1%, are activated during neural network operation. The readout circuits for the nodes that are not activated can therefore be stopped, further reducing power consumption.
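The readout idea can be illustrated with an idealized behavioral model. Each column's bit line current is the sum of its cell currents, and a comparator outputs a single bit by checking that current against a reference. Readout is skipped entirely for neurons that are not activated. That the reference comes from the replica cell is an assumption here. The currents are in arbitrary units, and the `Column` type, `read_columns` function and threshold value are hypothetical. None of this reflects Renesas's actual circuit parameters.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Column:
    """A hypothetical SRAM column feeding one neuron's readout circuit."""
    cell_currents: List[float]  # per-cell current contributions, arbitrary units
    active: bool                # True if this neuron is activated


def read_columns(columns: List[Column],
                 reference_current: float) -> Tuple[List[Optional[int]], int]:
    """Read each column with a 1-bit comparator; skip inactive columns.

    Returns (outputs, comparisons): outputs[i] is 1 or 0 for an active
    column and None for an inactive one whose readout was stopped;
    comparisons counts the comparator operations actually performed.
    """
    outputs: List[Optional[int]] = []
    comparisons = 0
    for col in columns:
        if not col.active:
            outputs.append(None)  # readout circuit stopped: no comparison
            continue
        # Idealized bit line current: the sum of the cell currents on the column.
        bit_line_current = sum(col.cell_currents)
        comparisons += 1
        # Comparator: output 1 if the bit line current exceeds the reference.
        outputs.append(1 if bit_line_current > reference_current else 0)
    return outputs, comparisons


if __name__ == "__main__":
    # 100 columns, of which only column 0 (1%) is activated.
    cols = [Column([0.2, 0.3], active=(i == 0)) for i in range(100)]
    outs, n = read_columns(cols, reference_current=0.4)
    print(n, outs[0])  # one comparison; column 0 reads as 1
```

In this model the number of comparator operations, a rough proxy for readout energy, scales with the number of active neurons rather than with the total number of columns.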

To prevent process variations from causing errors in the bit line current values of the SRAM structure, Renesas placed multiple SRAM calculation circuit blocks across the chip. Because the activated nodes are only a small minority of all nodes, they can be allocated selectively to the blocks with the smallest manufacturing process variations, which then perform the calculations. This reduces calculation errors to a level where they can essentially be ignored.
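The allocation step amounts to ranking the blocks by how much they vary and mapping the few active nodes onto the best ones. The sketch below assumes that a per-block variation score is already available; how Renesas measures or characterizes variation is not described here. It also assumes that each block can host a fixed number of nodes. The function `allocate_active_nodes`, the scores and the capacity are all hypothetical.

```python
from typing import Dict, List, Sequence


def allocate_active_nodes(active_nodes: Sequence[int],
                          block_variation: Sequence[float],
                          nodes_per_block: int) -> Dict[int, int]:
    """Map each active node to a block, preferring low-variation blocks.

    Returns {node_id: block_index}. Raises ValueError if the active
    nodes do not fit into the available blocks.
    """
    # Block indices ordered from smallest to largest variation score.
    ranked_blocks: List[int] = sorted(range(len(block_variation)),
                                      key=lambda b: block_variation[b])
    capacity = len(ranked_blocks) * nodes_per_block
    if len(active_nodes) > capacity:
        raise ValueError("not enough calculation blocks for the active nodes")
    allocation: Dict[int, int] = {}
    for i, node in enumerate(active_nodes):
        # Fill the best block first, then the next best, and so on.
        allocation[node] = ranked_blocks[i // nodes_per_block]
    return allocation


if __name__ == "__main__":
    variation = [0.08, 0.01, 0.05, 0.02]  # hypothetical per-block scores
    print(allocate_active_nodes([7, 42, 99], variation, nodes_per_block=2))
    # {7: 1, 42: 1, 99: 3}
```

Because only around 1% of nodes are active, the active set usually fits into the few best-ranked blocks, and the high-variation blocks are never used for calculation.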

Renesas
