How transformers impact edge AI chip design

By Nick Flaherty

This article is also available in French.

Transformers are making their way from generative AI and large language models down to embedded chips.

Avi Baum, chief technology officer and co-founder of Hailo in Israel, talks to Nick Flaherty about the different use cases for transformer AI driving its third generation of chip design.

“There’s a huge span of misuse of terms in this domain,” says Baum. “Transformers are the major building blocks that arrived from natural language processing (NLP), but now they are being used across many fields, for imaging, audio, text, whatever. And this is critical, as they are complementary to CNNs [convolutional neural networks],” he tells eeNews Europe.

The technology is already being used to reduce the bandwidth of 5G radio networks by de-noising the data, and even to re-create digital avatars for video conferencing, with the first-generation Hailo-8 chip. “At Hailo our support for transformers is growing over time, driven by our customers,” he said.

The company is one of the best-funded AI chip designers, having raised $224m according to Crunchbase. Its second-generation chips, launched in March and currently sampling, are being tweaked to add support for transformer AI, and this is a key area for the third-generation architecture that is under development.

While transformers are key to cloud-based generative AI systems such as Dall-E and ChatGPT, they are also being used for image recognition at the edge on embedded chips such as the Hailo-8 processor. The key is the balance of cloud and local processing for individual use cases, he says.

“People are using generative AI for two different things. One is generating content, such as new images from a text prompt, but I see many cases where people are extending GenAI to anything with LLMs or multiple modalities to build AI for the new age. That creates a lot of confusion,” he said.

“Like mobile devices in the early days, I think in general edge and cloud will come together to build a greater whole, so each type of use case will be split in a meaningful way. There are applications where the right split is to put some things in the cloud, as it is more usable, and other parts at the edge.”

He points to AI software for monitoring security cameras that combines generative AI and image processing using transformers.

“You can take huge video management systems such as surveillance cameras and find certain images using a text prompt, perhaps to find a person wearing specific clothes, by combining LLMs with the ability to comprehend the visuals from all the cameras in one setting,” he said.

“The right way to deploy such a use case is to query all the cameras in the venue with the same query, rather than feeding all the video to one central point. However, creating the synopsis and the embedding that the camera can understand is much more efficient in the cloud. For speech translation, on the other hand, you would want to do it on the edge device.”
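The camera-search split Baum describes can be sketched roughly as follows. This is a minimal illustration, not Hailo's implementation: it assumes the text query has already been turned into an embedding vector centrally (in the cloud), that each camera computes embeddings for its own frames in the same vector space, and that cosine similarity against a fixed threshold is an adequate match test. The function names, camera IDs, threshold and toy 3-D vectors are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def camera_match(query: Vector, frame_embeddings: Dict[str, Vector],
                 threshold: float) -> List[Tuple[str, float]]:
    """Runs on one camera: score its own frame embeddings against the broadcast query.

    Only the short list of matching frame IDs leaves the device, not the video.
    """
    hits = []
    for frame_id, emb in frame_embeddings.items():
        score = cosine_similarity(query, emb)
        if score >= threshold:
            hits.append((frame_id, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)


def query_venue(query: Vector, cameras: Dict[str, Dict[str, Vector]],
                threshold: float = 0.8) -> Dict[str, List[Tuple[str, float]]]:
    """Send the same query embedding to every camera and keep only non-empty results."""
    results = {}
    for cam_id, frames in cameras.items():
        hits = camera_match(query, frames, threshold)
        if hits:
            results[cam_id] = hits
    return results


if __name__ == "__main__":
    # Toy 3-D embeddings standing in for real model outputs (hypothetical values).
    query = [1.0, 0.0, 0.0]  # e.g. "person in a red jacket", embedded once centrally
    cameras = {
        "cam_lobby": {"f1": [0.9, 0.1, 0.0], "f2": [0.0, 1.0, 0.0]},
        "cam_exit": {"f7": [0.0, 0.0, 1.0]},
    }
    print(query_venue(query, cameras))
```

Only the small list of matching frame IDs travels back over the network, which is the point of broadcasting the query rather than centralising the video.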

Video conferencing is another potential combination. “Instead of delivering the video, you can deliver an avatar that is detailed enough and, with generative AI on the client, recreate the whole talking head on the local machine,” he said.

“Another use case we will showcase is using AI for de-noising images – the de-noised version creates much lower bandwidth at the encoder,” he said. “In our solution our customers are using object detection that is transformer-based.”

Transformers are being used for standard object detection, particularly in automotive. “It’s most commonly a performance issue: the accuracy you can get for a given workload. A transformer gives better accuracy at a given frame rate, mainly because transformers create this notion of attention to look at particular parts of the image, and that can give better results than a CNN,” said Baum.
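For readers unfamiliar with the mechanism, the attention Baum refers to is usually the scaled dot-product attention of the original transformer, softmax(Q K^T / sqrt(d)) V. The sketch below is a plain-Python illustration of that formula only. It assumes the image has already been split into patches and each patch projected to query, key and value vectors (as in vision transformers); it is not Hailo's implementation, and the function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    q: n_queries x d, k: n_keys x d, v: n_keys x d_v.
    Each output row is a weighted average of the rows of v, where the weights
    say how strongly that query (e.g. one image patch) attends to each key.
    """
    d = len(k[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q_row in q:
        # Similarity of this query with every key, scaled by 1/sqrt(d).
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output column at a time.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v))
                    for j in range(len(v[0]))])
    return out


if __name__ == "__main__":
    # Two toy "patches": the query matches the first key strongly,
    # so the output is dominated by the first value vector.
    print(attention([[10.0, 0.0]], [[10.0, 0.0], [0.0, 10.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

Because every query is scored against every key, the cost grows with the square of the number of patches, which is one reason transformers load edge hardware differently from CNNs.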

“We have demonstrated a single device with a transformer that runs at full HD over 6 cameras in real time on Hailo-8 for automotive as a companion device,” he said. “At the end of the day what limits us is the resources of the whole platform. Eventually we will need somewhere to store all the models. The larger the model the lower the performance, so if I double the model size roughly speaking the performance halves.”
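Baum's rule of thumb, that doubling the model size roughly halves performance, amounts to throughput scaling inversely with model size. A back-of-the-envelope version is sketched below, together with splitting one device's aggregate frame rate across several cameras. It is a first-order approximation only, the function names are hypothetical, and the baseline figures in the usage example are invented for illustration, not Hailo benchmarks.

```python
def estimated_fps(base_fps: float, base_size: float, new_size: float) -> float:
    """First-order rule of thumb: throughput scales inversely with model size,
    so doubling the size roughly halves the frame rate.

    Ignores memory-bound effects, batching and compiler optimisations.
    Sizes can be in any consistent unit (parameters, MB, ...).
    """
    if base_size <= 0 or new_size <= 0:
        raise ValueError("model sizes must be positive")
    return base_fps * base_size / new_size


def per_camera_fps(total_fps: float, num_cameras: int) -> float:
    """Share one accelerator's aggregate frame rate evenly across camera streams."""
    if num_cameras <= 0:
        raise ValueError("need at least one camera")
    return total_fps / num_cameras


if __name__ == "__main__":
    # Hypothetical baseline: 180 frames/s aggregate for a model of relative size 1.0.
    doubled = estimated_fps(180.0, 1.0, 2.0)  # twice the model size
    print(doubled, per_camera_fps(doubled, 6))  # shared over six cameras
```

With these made-up numbers, doubling the model takes the device from 180 to 90 frames per second in total, or 15 per camera across six streams.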

“But we see the algorithm guys shrinking the transformer models and the hardware builders expanding the platforms. The intersection point will grow until we get to the point where there is a nice balance between cloud and edge, but it will take a few years,” he said.

Hailo's is a flexible architecture, with an array of multiply-accumulate units that can be built on any process node and that does not rely on the latest leading-edge process technology for its performance.

“We built Hailo to be flexible, as we built it upfront with a lot of reliance on the compiler and toolchain. Moving forward we are looking at optimisation in hardware, tweaking it a bit to be more friendly to transformers and to what we think transformers will evolve into. It is flexible enough, but when we designed Hailo-8, transformers were not a consideration. There are things we can improve further,” he said.

“There is a lot of stuff that is transformer specific, such as the types of normalisation functions, the nature of the rapid changes of weights on the fly and, most important, the enumeration,” he said. “Each type of neural network has a typical span of tensor sizes in and out of each layer, and this changes with transformers, so some parametrisation may need to be tweaked, and that is what we are doing.”
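Normalisation is one concrete example of the differences Baum lists. A common case, assumed here rather than stated in the interview, is that CNNs typically use batch normalisation, whose statistics are frozen after training so that at inference it reduces to a fixed per-channel scale and shift, while transformers typically use layer normalisation, which must compute a mean and variance for every token at run time. The sketch below contrasts the two; the function names are illustrative and this is not a description of Hailo's hardware.

```python
import math
from typing import List


def batchnorm_inference(x: List[float], mean: List[float], var: List[float],
                        gamma: List[float], beta: List[float],
                        eps: float = 1e-5) -> List[float]:
    """CNN-style batch norm at inference time.

    mean/var were fixed during training, so per channel this is just
    y = a*x + b with constants a, b, which can be folded into the weights
    of the preceding convolution.
    """
    return [g * (xi - m) / math.sqrt(v + eps) + b
            for xi, m, v, g, b in zip(x, mean, var, gamma, beta)]


def layernorm(x: List[float], gamma: List[float], beta: List[float],
              eps: float = 1e-5) -> List[float]:
    """Transformer-style layer norm over one token's feature vector.

    Mean and variance come from the token's own features at run time,
    which needs a reduction, a division and a square root per token.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (xi - mean) * inv_std + b for xi, g, b in zip(x, gamma, beta)]


if __name__ == "__main__":
    token = [1.0, 2.0, 3.0]
    print(layernorm(token, [1.0] * 3, [0.0] * 3))
    print(batchnorm_inference(token, [0.0] * 3, [1.0] * 3, [1.0] * 3, [0.0] * 3))
```

The first form can disappear into precomputed weights, whereas the second adds runtime arithmetic that an accelerator designed around CNNs may not have been optimised for.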

“Some of this we did in the second generation and some of it will be more apparent in the next generation. The roadmap is not fully settled and we are still in discussion. Our core technology is the architecture rather than the technology node, so we have the luxury of doing it at the current geometries; we can choose to do it or not, depending on the target markets.”
