NeuReality preps 7nm data centre AI chip

By Peter Clarke

Israeli startup NeuReality is preparing to ship its NR1 AI-server-on-a-chip ASIC in the second quarter of 2023.

The company, founded in 2018, has been shipping FPGA-based prototypes since May 2021 to partners that include IBM, Lenovo and AMD. The aim is to reduce latency, improve performance and cut the power consumption of AI inference in data centres. The company said such improvements will be necessary to enable demanding applications such as natural language processing (NLP).

The NR1 is not a machine learning accelerator itself, because it does not perform the multiply-accumulate operations of the neural networks. Instead, it implements in hardware the network access software and the pre- and post-processing that are currently a bottleneck in data centre implementations of AI.

The NR1 is a heterogeneous chip based on an AI-server architecture. It improves the utilization of the AI compute resources, which NeuReality calls DLAs (deep learning accelerators), and is essentially agnostic to them. Moshe Tanach, CEO of NeuReality, claims that by removing existing system bottlenecks and lowering the latency of AI operations, the monolithic version of NR1 can provide a 10x improvement in operations per watt and operations per dollar.

NeuReality has long-running partnerships with chip companies developing inference DLAs. Because it takes a platform approach, the company also provides complementary software tools and runtime libraries that let customers deploy AI-based applications and services.

Network attached

“We are building a deep learning processor that is network attached and does not have to go through PCI to a host CPU. It’s a hybrid of multiple types of processor, vector DSPs and vision and audio processing for JPEG, MPEG. It allows the host CPU to offload the complete AI pipeline and not just the machine learning acceleration,” Tanach said.

Until now, functions such as load balancing, job scheduling, queue management, quality of service and monitoring have been done in software, Tanach said. While this is flexible, it is an order of magnitude slower than implementing these functions in NeuReality's AI-hypervisor hardware, he asserted.

The NR1 remains flexible because it is programmable, but once a new application has been ported it offers performance, power and cost benefits, Lior Khermosh, CTO of NeuReality, told eeNews Europe.

The NR1 is essentially agnostic to the DLAs, which could be digital, analog or even photonic, Tanach said. As a result, the NR1 must support numerous data types, and it does, including FP16, FP8 and integer data at 16-, 8-, 4- and 2-bit resolution.

Small models

Moshe Tanach said that AI processing, though already widely deployed, faces issues with rapidly increasing model complexity and the power consumption needed to process it. But he said that market forces and the cost of energy will act as a natural brake.

Face recognition was demonstrated as an academic exercise ten years ago, when it took hours of runtime because it had to be done on general-purpose CPUs. Only now, with special-purpose DLAs, is it becoming a commercial proposition, he said. “Similarly, complex AI models will probably not penetrate the market very much until after the next four or five years. But meanwhile simpler AI models will bring a lot of value – and we are increasing the energy efficiency with NR1.”

He added: “NLP is definitely a use case that is going to explode in the next few years.”
