
Compute-in-memory chip runs AI apps at a fraction of the energy
An international team of researchers has designed and built a chip that runs computations directly in memory and can run a wide variety of AI applications, all at a fraction of the energy consumed by general-purpose AI computing platforms. The NeuRRAM neuromorphic chip, the researchers say, brings AI a step closer to running on a broad range of edge devices, disconnected from the cloud, where those devices can perform sophisticated cognitive tasks anywhere and anytime without relying on a network connection to a centralized server.
The chip is presented as the first compute-in-memory chip to demonstrate a wide range of AI applications at a fraction of the energy consumed by other platforms while maintaining equivalent accuracy. Applications for the chip abound in every corner of the world and every facet of our lives, say the researchers, ranging from smartwatches and VR headsets to smart earbuds, smart sensors in factories, and rovers for space exploration.
The researchers report that the NeuRRAM chip is not only twice as energy efficient as state-of-the-art “compute-in-memory” chips, an innovative class of hybrid chips that run computations in memory, but also delivers results just as accurate as those of conventional digital chips. Conventional AI platforms are far bulkier and are typically constrained to large data servers operating in the cloud.
In addition, the NeuRRAM chip is highly versatile and supports many different neural network models and architectures. As a result, it can be used for a wide range of applications, including image recognition, image reconstruction, and voice recognition.
“The conventional wisdom is that the higher efficiency of compute-in-memory is at the cost of versatility, but our NeuRRAM chip obtains efficiency while not sacrificing versatility,” says Weier Wan, first and corresponding author of a paper on the chip and a recent Ph.D. graduate of Stanford University who worked on the chip while at UC San Diego.
By reducing the power needed for AI inference at the edge, say the researchers, the NeuRRAM chip could lead to more robust, smarter, and more accessible edge devices, as well as smarter manufacturing. It could also improve data privacy, since transferring data from devices to the cloud comes with increased security risks.
On AI chips, moving data from memory to computing units is one major bottleneck, say the researchers.
“It’s the equivalent of doing an eight-hour commute for a two-hour work day,” says Wan.
To solve this data-transfer problem, the researchers used resistive random-access memory (RRAM), a type of non-volatile memory that allows computation directly within memory rather than in separate computing units. Computing with RRAM chips is not necessarily new, say the researchers, but it generally leads to a decrease in the accuracy of the computations performed on the chip and a lack of flexibility in the chip’s architecture.
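To make the idea concrete, here is a minimal sketch (in Python/NumPy, not the NeuRRAM design itself) of how an RRAM crossbar performs a matrix-vector multiply in place: weights are stored as device conductances, input voltages drive the rows, and each column wire physically sums the resulting currents. The conductance range, noise level, and sign-handling scheme below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

weights = rng.uniform(-1.0, 1.0, size=(4, 3))  # 4 inputs x 3 outputs (hypothetical)
g_max = 100e-6                                 # assumed maximum device conductance (siemens)
conductances = np.abs(weights) * g_max         # encode weight magnitude as conductance
signs = np.sign(weights)                       # real designs use paired devices for sign

v_in = rng.uniform(0.0, 0.2, size=4)           # input voltages applied to the rows (volts)

# Ohm's law per device, Kirchhoff's current law per column: each column
# current is the in-memory dot product of the inputs and one weight column.
i_out = (signs * conductances).T @ v_in        # ideal column currents (amperes)

# Device non-idealities (drift, noise) perturb the conductances, which is
# one reason naive in-memory computation loses accuracy:
g_noisy = conductances * (1.0 + 0.05 * rng.standard_normal(conductances.shape))
i_noisy = (signs * g_noisy).T @ v_in
print("ideal:", i_out)
print("noisy:", i_noisy)
```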
“Compute-in-memory has been common practice in neuromorphic engineering since it was introduced more than 30 years ago,” says Gert Cauwenberghs, a professor of bioengineering and co-director of the Institute for Neural Computation at UC San Diego. “What is new with NeuRRAM is that the extreme efficiency now goes together with great flexibility for diverse AI applications with almost no loss in accuracy over standard digital general-purpose compute platforms.”
Key to the work was a carefully crafted methodology involving multiple levels of “co-optimization” across the hardware and software abstraction layers, from the design of the chip to its configuration for running various AI tasks. In addition, the researchers made sure to account for constraints spanning memory device physics, circuits, and network architecture.
“This chip now provides us with a platform to address these problems across the stack from devices and circuits to algorithms,” says Siddharth Joshi, an assistant professor of computer science and engineering at the University of Notre Dame, who started working on the project as a Ph.D. student and postdoctoral researcher in Cauwenberghs’ lab at UC San Diego.
The researchers measured the chip’s energy efficiency using a metric known as energy-delay product, or EDP. EDP combines the amount of energy consumed per operation with the time it takes to complete the operation. By this measure, the NeuRRAM chip achieves 1.6 to 2.3 times lower EDP (lower is better) and 7 to 13 times higher computational density than state-of-the-art chips.
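As a toy illustration (the numbers below are made up, not taken from the paper), EDP is simply the product of energy per operation and time per operation, so a chip must be both low-power and fast to score well:

```python
# Hypothetical numbers for illustration only, not measurements from the paper.
energy_per_op_pj = 2.0   # energy per operation, in picojoules
time_per_op_ns = 0.5     # time per operation, in nanoseconds

# Energy-delay product: lower is better; halving either factor halves the EDP.
edp = energy_per_op_pj * time_per_op_ns
print(f"EDP = {edp:.2f} pJ*ns per operation")
```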
The key to NeuRRAM’s energy efficiency is an innovative way of sensing output in memory. Conventional approaches use voltage as input and measure current as the result, which requires more complex and more power-hungry circuits. In NeuRRAM, the researchers engineered a neuron circuit that senses voltage and performs analog-to-digital conversion in an energy-efficient manner. This voltage-mode sensing can activate all the rows and all the columns of an RRAM array in a single computing cycle, allowing much higher parallelism.
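A heavily simplified sketch of the idea (a passive-settling model, not the actual NeuRRAM neuron circuit): if every row drives its input voltage through its device’s conductance onto a shared column wire, the wire settles to the conductance-weighted average of the inputs, so every row contributes in a single cycle rather than being read out one at a time. The conductances and voltages below are illustrative.

```python
import numpy as np

g = np.array([80e-6, 20e-6, 50e-6])  # device conductances on one column (siemens)
v = np.array([0.20, 0.05, 0.10])     # input voltages driven on the rows (volts)

# With no other current path, the column voltage settles where the net
# current is zero: sum(g_i * (v_i - v_out)) = 0, i.e. a weighted average.
v_out = np.sum(g * v) / np.sum(g)
print(f"column settles to {v_out * 1e3:.1f} mV")
```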
The researchers ran various AI tasks on the chip. It achieved 99% accuracy on a handwritten digit recognition task, 85.7% on an image classification task, and 84.7% on a Google speech command recognition task. The chip also achieved a 70% reduction in image-reconstruction error on an image-recovery task. These results are comparable to those of existing digital chips performing computation at the same bit precision, but with drastic savings in energy.
All of the featured results were obtained directly on the hardware, say the researchers. In much previous work on compute-in-memory chips, AI benchmark results were often obtained partly through software simulation.
Next steps include improving architectures and circuits and scaling the design to more advanced technology nodes. The researchers also plan to tackle other applications, such as spiking neural networks.
“We can do better at the device level, improve circuit design to implement additional features and address diverse applications with our dynamic NeuRRAM platform,” says Rajkumar Kubendran, an assistant professor at the University of Pittsburgh, who started work on the project as a Ph.D. student in Cauwenberghs’ research group at UC San Diego.
For more, see “A compute-in-memory chip based on resistive random-access memory.”
