
CEO interview: Charge-based in memory compute at Encharge AI


Interviews |
By Nick Flaherty



Naveen Verma, CEO of EnCharge AI, talks to eeNews Europe about its analog in-memory compute technology for edge AI.

US startup EnCharge AI is reaching the next stage of its development of in-memory compute (IMC) for low-power AI. The company has raised $45m for its capacitor-based analog in-memory computing technology and is actively raising funds to commercialise AI chips for laptops.

“There has been a lot of interest in analog computing as it can be orders of magnitude more energy efficient, but it is sensitive to noise, so the big problem that we have solved is managing analog IMC in a very robust and scalable way,” said Naveen Verma, CEO of EnCharge AI and professor of electrical and computer engineering at Princeton University since 2009.

“AI is moving out of the data centre as it scales up, moving into laptops, desktops and edge servers. These are often battery powered or space constrained, and that’s where we have found a lot of traction.”

“My research group at Princeton has been looking at this for many years, with all sorts of optical and electrical computing. The big fundamental difference is rather than using the current through a semiconductor device we are using charge,” he said.

“We don’t use semiconductor devices, we use capacitors formed from the metal wires. They depend only on the geometry, and that’s something we can control very well.”

“This solves the analog accuracy and scalability problems through geometry control as we scale to more advanced nodes, so the energy efficiency and density improve. The precision of patterning the wires does scale, and what geometry you need is an interesting question. What we need is more than an order of magnitude less [than other approaches], and that allows you to use the upper metal layers.”

EnCharge has built a macro block that performs the multiply-accumulate (MAC) functions very efficiently for AI, using SRAM to provide the addressability for the capacitors in the metal layers that store the weights for the AI models. This approach has been demonstrated in chips ranging from 130nm in 2017 to 16nm in 2021.
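The idea behind a charge-domain MAC can be sketched in a few lines: each SRAM-held weight bit gates whether an input-driven unit capacitor contributes charge, and charge sharing across the capacitors produces a voltage proportional to the dot product. This is an illustrative model only, not EnCharge's actual circuit; the function name and values are assumptions.

```python
def charge_domain_mac(inputs, weights, c_unit=1.0):
    """Model the shared-node voltage of N unit capacitors after charge sharing.

    inputs  : per-column input voltages (analog activations)
    weights : per-column 1-bit weights stored in SRAM
    c_unit  : unit capacitance, set purely by metal-wire geometry
    """
    # Each capacitor is charged to inputs[i] if its weight bit is 1, else to 0 V.
    total_charge = sum(c_unit * v * w for v, w in zip(inputs, weights))
    total_cap = c_unit * len(weights)
    # Charge sharing averages the stored charge over all capacitors,
    # yielding a scaled dot product of inputs and weights.
    return total_charge / total_cap

v_out = charge_domain_mac([0.5, 1.0, 0.25, 0.75], [1, 0, 1, 1])  # (0.5+0.25+0.75)/4
```

The key property the article highlights is visible in the model: the result depends only on capacitor ratios (geometry), not on semiconductor device characteristics.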

However, building a chip is not enough.

“The trick is to re-architect the chip to get the advantages,” he said. “The fundamental technology was a breakthrough in 2017 and the next five years was to build an architecture around it with a software stack and a compiler to map applications to it.”

“Because the fundamental capacitor technology does not require any additional process technology, that allows us to be in the most advanced nodes, and that gives us flexibility to build programmability into this with all the AI models.”

“We defined an architectural unit, a tile, that is complete: highly efficient MAC engines with all the other compute and control, localised vector units and L2 cache memory, and you can build an array of these tiles.”

“Then the problem is to map the software to this architecture to use the units efficiently. This gives high levels of density and power efficiency. Now you can very flexibly choose the parallelism of the AI graph.”
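One way the compiler's mapping choice could be sketched is splitting a layer's output channels across an array of identical tiles, so parallelism follows the AI graph. The function and partitioning scheme below are illustrative assumptions, not EnCharge's compiler.

```python
def map_layer_to_tiles(out_channels, n_tiles):
    """Assign contiguous output-channel ranges to tiles, balancing the load."""
    base, extra = divmod(out_channels, n_tiles)
    assignment, start = [], 0
    for t in range(n_tiles):
        size = base + (1 if t < extra else 0)   # spread the remainder evenly
        assignment.append((start, start + size))  # [start, end) channels on tile t
        start += size
    return assignment

# e.g. 100 output channels over 8 tiles: the first four tiles take 13 channels,
# the remaining four take 12, so no tile sits idle.
plan = map_layer_to_tiles(100, 8)
```

A real mapper would also weigh weight-capacity limits and inter-tile traffic, but the load-balancing step is the core of "choosing the parallelism" flexibly.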

“So we had the full stack in a baseline from where we understood the architecture and the compiler structure, and we spun that out in 2022 for customers that needed the higher efficiency and performance.”

“What we found was the first couple of models broke everything as there were small features in the first couple of AI workloads that were missing. Working with customers, we identified the features for the next spin of the hardware and the software. That’s the journey that we were on,” he said.

“We have been working with partners to prove out the workloads that need to run. The systems need to be optimised from top to bottom. We are building the chips and the software for a couple of reasons. Really getting the full performance advantage requires a top to bottom optimisation and being able to control that allows us to get the most out of the technology.”

The initial form factors for this are simple discrete boards on PCIe cards, and he sees M.2 as well aligned for laptops. The existing technology delivers over 150 TOPS/W at 8bit, while the chip under development for 2025 has an efficiency of 375 TOPS/W and can also handle AI models with billions of parameters.
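A back-of-envelope conversion puts the quoted figures in per-operation terms; the numbers here simply restate the article's TOPS/W values as energy per 8-bit operation.

```python
def energy_per_op_fj(tops_per_watt):
    """Convert an efficiency in TOPS/W to femtojoules per operation."""
    # 1 TOPS/W = 1e12 operations per joule; 1 J = 1e15 fJ.
    return 1e15 / (tops_per_watt * 1e12)

current_gen = energy_per_op_fj(150)  # ~6.7 fJ per 8-bit op
next_gen = energy_per_op_fj(375)     # ~2.7 fJ per 8-bit op
```

The 2025 part's 375 TOPS/W is thus a 2.5x reduction in energy per operation over the existing 150 TOPS/W silicon.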

“Our target is to be 3-4x the capability of laptop processors that have 40TOPS with an NPU, such as Snapdragon or Intel’s Lunar Lake,” he said. “We have a significant advantage at 16nm over 5nm digital, so our preference is to stay in those nodes where we have proven our IP. The preference is to stay in very mainstream technologies that will allow us to scale, and these work very well for us.”

Re-architecting chips with the EnCharge analog AI IP is key to running AI models with billions of parameters. This uses well-established principles of memory virtualisation.

“Virtualised memory takes a capacity-limited L1 memory and couples that with the L2 cache all the way out to DRAM, and we have applied the same principles to IMC, orchestrating the way memory moves across the system so that we can scalably execute models with billions of parameters,” he said.

“What you have to have is some IMC hardware, and then work out how it interacts with the L2, L3 and off-chip memory to get the right amount of data reuse, so that data movement doesn’t become the limiting factor. That drives the architectural design, and our systems are optimised for multiple multibillion-parameter models.”
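The data-reuse argument can be made concrete with a simple ops-per-element-moved metric: once a weight tile is loaded into the IMC array, each extra activation batched against it amortises that one-time transfer. This is a generic roofline-style sketch under assumed tile sizes, not EnCharge's architecture.

```python
def reuse_factor(tile_rows, tile_cols, batch):
    """Operations performed per data element moved for one weight tile.

    tile_rows, tile_cols : dimensions of the weight tile held in the IMC array
    batch                : number of activation vectors reusing that tile
    """
    macs = tile_rows * tile_cols * batch  # MACs executed against the tile
    # Elements moved: the weight tile (once), plus input and output activations.
    moved = tile_rows * tile_cols + tile_cols * batch + tile_rows * batch
    return macs / moved

# Larger batches raise ops per element moved, pushing the limit from
# data movement back toward compute.
low = reuse_factor(256, 256, 1)
high = reuse_factor(256, 256, 64)
```

When the reuse factor exceeds the ratio of compute throughput to memory bandwidth, data movement stops being the limiting factor, which is the balance the architecture has to strike.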

While the latest laptop devices such as Meteor Lake use chiplets, this is not a focus for the in-memory compute architecture, says Verma.

“Chiplets are a very interesting technology but there are constraints, whether using UCIe as an interface or not. I do see opportunities around chiplets.”

www.encharge.ai
