Israeli startup NeuReality is preparing to ship its NR1 AI-server-on-a-chip ASIC in the second quarter of 2023.
The company, founded in 2018, has been shipping FPGA-based prototypes to partners including IBM, Lenovo and AMD since May 2021. The aim is to cut latency, raise performance and lower the power consumption of AI inference in data centers. The company said such improvements will be necessary to enable demanding applications such as natural language processing (NLP).
NeuReality’s chip is not itself a machine learning accelerator – it does not perform the multiply-accumulates of neural networks – but one that implements in hardware the network access software and the pre- and post-processing that are currently a bottleneck in data center implementations of AI.
The NR1 is a highly detailed heterogeneous chip that is based on an AI-server architecture. It improves the utilization of the AI compute resources – which NeuReality calls DLAs, for deep learning accelerators – and is essentially agnostic to them. Moshe Tanach, NeuReality’s CEO, claims that by removing existing system bottlenecks and lowering the latency of AI operations, the monolithic version of NR1 can provide a 10x improvement in operations per watt and operations per dollar.
NeuReality has long-running partnerships with chip companies that are working on inference DLAs. However, the company takes a platform approach, so it also provides complementary software tools and runtime libraries to enable customers to deploy AI-based applications and services.
“We are building a deep learning processor that is network attached and does not have to go through PCI to a host CPU. It’s a hybrid of multiple types of processor, vector DSPs and vision and audio processing for JPEG, MPEG. It allows the host CPU to offload the complete AI pipeline and not just the machine learning acceleration,” Tanach said.
Until now, functions such as load balancing, job scheduling, queue management, quality of service and monitoring support have been done in software, Tanach said. While this is flexible, it is an order of magnitude slower than putting these functions into NeuReality’s AI-hypervisor hardware, he asserted.
The NR1 still provides flexibility by being programmable, but once the first porting of a new application has been done it offers performance, power and cost benefits, Lior Khermosh, CTO of NeuReality, told eeNews Europe.
The NR1 is essentially agnostic to the DLAs, which could be digital, analog or even photonic, Tanach said. As such, the NR1 needs to support numerous datatypes, and it does, including FP16 and FP8 floating point and integer data at 16-, 8-, 4- and 2-bit resolution.
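To make the software side of that comparison concrete, here is a minimal sketch in Python of a host-side dispatcher that handles queue management, priority-based quality of service and least-loaded balancing across accelerators. Everything in it is hypothetical, including the `SoftwareDispatcher` class, the job names and the cost figures. It illustrates the kind of bookkeeping a host CPU does today and does not represent NeuReality’s software or hardware.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Job:
    priority: int                      # lower value = higher quality-of-service class
    seq: int                           # tie-breaker that preserves arrival order
    name: str = field(compare=False)
    cost: int = field(compare=False)   # estimated work units (hypothetical)


class SoftwareDispatcher:
    """Toy host-side dispatcher: priority queue plus least-loaded balancing."""

    def __init__(self, accelerators):
        self.load = {name: 0 for name in accelerators}  # outstanding work per DLA
        self.queue = []
        self._seq = count()

    def submit(self, name, cost, priority=1):
        # Queue management: jobs are ordered by (priority, arrival order).
        heapq.heappush(self.queue, Job(priority, next(self._seq), name, cost))

    def dispatch_all(self):
        placements = []
        while self.queue:
            job = heapq.heappop(self.queue)              # highest-priority job first
            target = min(self.load, key=self.load.get)   # least-loaded accelerator
            self.load[target] += job.cost
            placements.append((job.name, target))
        return placements


dispatcher = SoftwareDispatcher(["dla0", "dla1"])
dispatcher.submit("batch-embed", cost=5, priority=2)
dispatcher.submit("chat-query", cost=1, priority=0)
dispatcher.submit("image-tag", cost=3, priority=1)
print(dispatcher.dispatch_all())
```

Each of these steps costs CPU cycles per request when run in software, which is the overhead Tanach argues is removed by moving the same functions into hardware.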
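As a rough illustration of why the integer width matters, the Python sketch below maps the same values onto signed integers of the widths listed above and reports the worst-case rounding error. The symmetric scaling scheme, the function names and the sample values are all assumptions chosen for illustration. The article does not describe how the NR1 or any DLA actually handles these datatypes.

```python
def quantize(values, bits):
    """Symmetric signed quantization to `bits`-bit integers (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1                        # 127 for 8-bit, 7 for 4-bit, 1 for 2-bit
    scale = (max(abs(v) for v in values) / qmax) or 1.0  # avoid dividing by zero for all-zero input
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


weights = [0.50, -1.00, 0.25, 0.75]   # hypothetical sample values
for bits in (16, 8, 4, 2):
    codes, scale = quantize(weights, bits)
    restored = dequantize(codes, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{bits:>2}-bit: codes={codes} max error={worst:.4f}")
```

Narrower integers need less memory and bandwidth per value but lose precision, which is one reason a chip that is agnostic to the attached accelerator has to handle several widths.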
Moshe Tanach said that AI processing, already widely deployed, faces problems from rapidly increasing model complexity and from the power needed to process that complexity. But he said that market forces and the cost of energy will provide a natural brake.
Face recognition was demonstrated ten years ago as an academic exercise and it took hours of runtime because of the need to do it on general-purpose CPUs. Only now with special-purpose DLAs is it becoming a commercial proposition, he said. “Similarly, complex AI models will probably not penetrate the market very much until after the next four or five years. But meanwhile simpler AI models will bring a lot of value – and we are increasing the energy efficiency with NR1.”
He added: “NLP is definitely a use case that is going to explode in the next few years.”