
Kinara boosts transformer edge AI with second generation chip

Technology News

By Nick Flaherty

Kinara has launched its second generation edge AI chip with support for large language and transformer models.

The 6W Ara-2 edge AI chip developed by Kinara aims to provide 10 times the performance per watt of GPUs while still running generative models such as Stable Diffusion at the edge.

“Our first premise was to make the edge AI processors more efficient than GPUs by an order of magnitude,” Ravi Annavajjhala, CEO of Kinara, tells eeNews Europe. “The second premise is that as AI models evolve and grow in size and complexity, the software stack will keep up with them, so that any model can be easily compiled and run on the processor.”

“With Ara-2 we have boosted the performance,” he said. “We started with vision transformers as a baseline and because of that we can add generative workloads including Stable Diffusion and language models shortly.”

This is key for keeping AI local and private, for example in healthcare or retail applications tuned to specific questions with more accuracy, and with lower latency for surveillance applications, he says. The Stable Diffusion AI image generation is only the first stage. “Text to image is only the beginning, text to video will demand a lot of computer processing,” he said.

The Ara-2 chip with eight AI cores is sampling in a 16nm process and is showing 5 to 8 times the performance of the 28nm Ara-1 on large language models.

The cores add int4 and MSFP16 data formats as well as memory encryption, and the chip supports PCIe 4.0 and up to 16GB of low power LPDDR4 memory. Up to eight chips can be connected to a host for larger AI models, managed by load balancing software.
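
By way of illustration, a minimal Python sketch of symmetric int4 weight quantisation is shown below. It is purely illustrative of what the low-precision format involves; it is not based on Kinara's SDK and does not reflect its actual quantiser.

import numpy as np

def quantize_int4(weights):
    # Map float weights to the signed int4 range [-8, 7] with a per-tensor scale.
    scale = np.max(np.abs(weights)) / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    # Recover approximate float weights, e.g. for an accuracy check.
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int4(w)
print("max quantisation error:", np.abs(dequantize_int4(q, s) - w).max())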

“A single Ara-2 chip can access models up to 16GB in size, but we can also support linear scaling of performance and memory capacity by combining multiple chips. With regards to Generative AI, we expect that models will be tuned for specific applications which will lead to better performance and accuracy,” he said.
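
As a rough illustration of what host-side load balancing across several accelerators can look like, the Python sketch below fans inference requests out over a pool of devices in round-robin fashion. The Device class and its run() method are hypothetical placeholders, not Kinara's API.

from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

class Device:
    # Hypothetical stand-in for one accelerator attached to the host.
    def __init__(self, index):
        self.index = index

    def run(self, request):
        # Placeholder for submitting an inference request to one chip.
        return f"device {self.index} handled request {request}"

devices = [Device(i) for i in range(8)]   # up to eight chips per host
rr = cycle(devices)

def dispatch(requests):
    # Assign devices round-robin, then submit the work in parallel.
    # A real scheduler would also weigh queue depth and which chip holds
    # the relevant model partition in its local memory.
    jobs = [(next(rr), r) for r in requests]
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        return list(pool.map(lambda job: job[0].run(job[1]), jobs))

print(dispatch(range(4)))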

“With Ara-2 the philosophy is quite clear with a software first approach and all of the processing is by the neural cores,” said Wajahat Qadeer, Chief architect at Kinara. “We have kept the number of cores the same but enhanced the utilisation. With more arithmetic engines with matrix multiply units and SIMD that allows us to run CNNs and transformers on the same core. We have also added the two FP 32bit and 16bit VPUs for traditional vision algorithms,” he said.

The Ara-2 chip uses a hierarchical memory architecture without caches, controlled by software using the data engines. “This gives the compiler complete control over the chip so it knows the bandwidth, the latency, the buses, so it can manage the data without touching the hardware. The shared memory has doubled for the L2 caches, and 4x for the L1 caches, with the internal bandwidth up by 8 times. We use the same routing fabric but we have doubled the number of buses and added bidirectional operation.”
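
To give a feel for what software-controlled data movement means in practice, the sketch below performs a matrix multiply in explicit tiles, copying each operand block into "local" buffers before computing on it. The copies stand in for the DMA transfers a cache-less design schedules in software; the code is a conceptual sketch only and does not reflect Kinara's actual compiler or toolchain.

import numpy as np

def tiled_matmul(a, b, tile=64):
    # Explicit tiling: the software decides which block lives in local
    # memory at each step, rather than relying on a hardware cache.
    out = np.zeros((a.shape[0], b.shape[1]), dtype=np.float32)
    for i in range(0, a.shape[0], tile):
        a_tile = a[i:i + tile].copy()          # copy stands in for a DMA into a scratchpad
        for j in range(0, b.shape[1], tile):
            b_tile = b[:, j:j + tile].copy()   # second operand tile
            out[i:i + tile, j:j + tile] = a_tile @ b_tile
    return out

a = np.random.randn(128, 128).astype(np.float32)
b = np.random.randn(128, 128).astype(np.float32)
assert np.allclose(tiled_matmul(a, b), a @ b, atol=1e-3)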

“Software is the key differentiator for us. As more complex CNNs came in we added support for them, then there were 3D CNNs, and now the focus is on transformers,” he said.

Ara-2 is available as a stand-alone device, a USB module, an M.2 module or as a PCIe card featuring multiple Ara-2 chips. Kinara will show a live demo of Ara-2 at CES in Las Vegas in January.

www.kinara.ai