KAN tackles AI power challenge

Technology News |
By Nick Flaherty

Programmable AI chip maker Quadric is highlighting a new AI framework that can dramatically reduce power consumption.

The Kolmogorov-Arnold Network (KAN) removes matrix multiply operations to cut AI power consumption. This is similar to an approach developed at the University of California, Santa Cruz, which also eliminates matrix multiplies and brought the energy consumption of a billion-parameter LLM down to 13W.

The KAN was proposed in a research paper by researchers at MIT and Caltech as a fundamentally new approach to machine learning networks.

Early analysis suggests KANs can be 10% or even 5% of the size of conventional transformer-based large language models (LLMs) while delivering equivalent results. Executing a KAN inference consists of computing a vast number of univariate functions, such as polynomials, and then adding the results, with very few matrix multiplies (MATMULs).
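To illustrate the structure described above, here is a minimal sketch of one KAN layer in NumPy. It is not Quadric's or the paper's implementation: the paper uses learnable B-splines on each edge, whereas this sketch substitutes plain polynomials to keep the example short. The point is that each output is a sum of univariate function evaluations, with no matrix multiply.

```python
import numpy as np

rng = np.random.default_rng(0)

def kan_layer(x, coeffs):
    """One illustrative KAN layer.

    x:      (d_in,) input vector.
    coeffs: (d_in, d_out, degree+1) polynomial coefficients -- one
            univariate polynomial per edge (a stand-in for the
            learnable B-splines used in the KAN paper).
    """
    d_in, d_out, _ = coeffs.shape
    y = np.zeros(d_out)
    for j in range(d_out):
        for i in range(d_in):
            # Evaluate this edge's univariate function at x[i] and add:
            # the only per-edge operations are a polynomial eval and a sum.
            y[j] += np.polyval(coeffs[i, j], x[i])
    return y

x = rng.standard_normal(4)
coeffs = rng.standard_normal((4, 3, 4))  # cubic polynomial on each edge
y = kan_layer(x, coeffs)
print(y.shape)  # (3,)
```

Note that the inner loop is pure function evaluation plus addition, which is why an array of general-purpose ALUs, rather than a MAC array, is the natural execution target.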

Most NPUs (neural processing units) have hard-wired state machines for executing matrix multiplication, as well as state machines to implement common activation functions (such as ReLU and GELU) and pooling operations.

This is aimed both at data centre operators looking to reduce power consumption and at edge AI inference, where it reduces memory requirements. Device makers seeking to run GenAI on-device are also grappling with compute and storage demands. Using a KAN, a 1B-parameter model could fit neatly into an existing platform with only 4 GB of DDR, a small fraction of the 32 GB required by today's models.
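The 4 GB figure is consistent with simple back-of-the-envelope sizing, assuming 4 bytes (FP32) per weight and ignoring activation and runtime overhead; the 32 GB figure would then correspond to roughly an 8B-parameter conventional model, which is an assumption, not a number from the article.

```python
# Rough DDR sizing for model weights (assumption: FP32, 4 bytes per
# parameter, no activation or runtime overhead included).
BYTES_PER_PARAM = 4

def weight_footprint_gb(n_params):
    """Approximate weight-storage footprint in gigabytes."""
    return n_params * BYTES_PER_PARAM / 1e9

print(weight_footprint_gb(1e9))  # 1B-parameter KAN: 4.0 GB
print(weight_footprint_gb(8e9))  # ~8B-parameter conventional LLM: 32.0 GB
```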

Quadric says its Chimera general-purpose NPU can support KANs alongside the matrix-multiplication hardware needed to efficiently run conventional neural networks, using a massively parallel array of general-purpose, C++-programmable ALUs capable of running any machine learning model.

Quadric’s Chimera QB16 processor, for instance, pairs 8192 MACs with a whopping 1024 full 32-bit fixed-point ALUs, giving 32,768 bits of parallelism to run KAN networks.

This time last year Quadric announced support for vision transformer (ViT) machine learning (ML) inference models. ViTs, first described in 2021, repeatedly interleave MAC-heavy operations (convolutions and dense layers) with DSP/CPU-centric code (normalization, SoftMax). This illustrates the timescale for moving a framework from research to implementation.

The emergence of ViT networks broke the underlying assumptions of hardwired NPUs: mapping a ViT workload to a heterogeneous SoC entails repeatedly moving data back and forth between the NPU and the DSP/CPU. The system power wasted on all those data transfers wipes out the matrix-compute efficiency gains of having the NPU in the first place.

“Back in 2018 when we conceived the Chimera GPNPU architecture, we knew the rapidly evolving nature of machine learning meant we had to build processor IP that was both matrix-performance optimized and general purpose,” said Veerbhan Kheterpal, CEO at Quadric. “We knew something would be invented that pushed the ResNet family of networks into the history bin, but we did not know that transformers would be the current winner. And we do not know what state of the art will look like in 2027, but we do know that Chimera GPNPU licensees will be ready to tackle that next challenge.”

