Intel aims at Nvidia with Gaudi 3 AI chip

By Nick Flaherty


A year after tapeout, Intel has launched its Gaudi 3 AI accelerator chip to take on Nvidia’s H100 systems.

The Gaudi 3 chip has been re-directed to support the memory-bound transformer-based large language models (LLMs) used for generative AI in the data centre.

The chip will be available by June in configurations ranging from a single node to clusters, super-clusters and mega-clusters with thousands of nodes, supporting inference, fine-tuning and training at scale. Systems will come from Dell Technologies, HPE, Lenovo and Supermicro.

Intel compares the chip directly with the Nvidia H100 and expects it to deliver 50% faster time-to-train on average across Llama2 models with 7bn and 13bn parameters and the GPT-3 175bn parameter model. Intel also projects that Gaudi 3 inference throughput will outperform the H100 by 50% on average, and inference power efficiency by 40% on average, across Llama models with 7bn and 70bn parameters and the Falcon 180bn parameter model.

The Gaudi 3 accelerator is built in a 5nm process, which implies it is built at TSMC. The design allows activation of all engines in parallel — with the Matrix Multiplication Engine (MME), Tensor Processor Cores (TPCs) and Networking Interface Cards (NICs) — to boost the performance.

Each accelerator has a heterogeneous compute engine made up of 64 AI-custom, programmable TPCs and eight MMEs. Each MME can perform 64,000 parallel operations and supports multiple data types, including FP8 and BF16.

LLMs are memory bound, so the chip has 128 gigabytes (GB) of HBM2e memory, 3.7 terabytes (TB) per second of memory bandwidth and 96 megabytes (MB) of on-board static random access memory (SRAM). Intel says this provides ample memory for processing large GenAI datasets on fewer devices.
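As a rough illustration of what "fewer devices" can mean, the sketch below estimates how many 128GB accelerators are needed just to hold a model's weights, and how long one full read of those weights takes at 3.7TB per second. The capacity, bandwidth, data types and model sizes come from the article; the function names are hypothetical, and the calculation assumes weights only (no activations, KV cache or optimiser state) and decimal gigabytes, so real deployments will need more headroom.

```python
import math

# Figures quoted in the article
HBM_CAPACITY_GB = 128          # HBM2e capacity per accelerator
HBM_BANDWIDTH_GB_S = 3700      # 3.7 TB/s expressed in GB/s

# Bytes per parameter for the two data types named in the article
BYTES_PER_PARAM = {"BF16": 2, "FP8": 1}


def weight_footprint_gb(params_bn: float, dtype: str) -> float:
    """Weights only: 1bn parameters at 1 byte each is 1 GB (decimal)."""
    return params_bn * BYTES_PER_PARAM[dtype]


def min_accelerators(params_bn: float, dtype: str) -> int:
    """Smallest device count whose combined HBM holds the weights."""
    return math.ceil(weight_footprint_gb(params_bn, dtype) / HBM_CAPACITY_GB)


def weight_read_ms(params_bn: float, dtype: str) -> float:
    """Time to stream every weight once from a single device's HBM, in ms.

    Assumption: this is an idealised lower bound on per-token latency for
    memory-bound, batch-of-one decoding on one device.
    """
    return weight_footprint_gb(params_bn, dtype) / HBM_BANDWIDTH_GB_S * 1000


if __name__ == "__main__":
    models = {"Llama 7B": 7, "Llama 70B": 70, "GPT-3 175B": 175, "Falcon 180B": 180}
    for name, size_bn in models.items():
        for dtype in BYTES_PER_PARAM:
            print(
                f"{name:12s} {dtype}: "
                f"{weight_footprint_gb(size_bn, dtype):6.0f} GB weights, "
                f"at least {min_accelerators(size_bn, dtype)} device(s)"
            )
```

Under these assumptions, Falcon 180B in BF16 needs 360GB for its weights alone, so at least three devices, while FP8 halves that to 180GB and two devices.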

Each chip has 24 Ethernet ports running at 200 gigabits per second for open-standard networking, to support large compute clusters and avoid the vendor lock-in of proprietary networking fabrics such as Nvidia's.
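The port count translates into aggregate bandwidth with simple arithmetic, sketched below. The port count and speed are from the article; the function names are hypothetical, and the result is raw line rate per direction, ignoring Ethernet protocol overhead and however the ports are split between scale-up and scale-out links.

```python
# Figures quoted in the article
PORTS = 24                  # Ethernet ports per chip
PORT_SPEED_GBIT_S = 200     # line rate per port
HBM_BANDWIDTH_GB_S = 3700   # 3.7 TB/s, for comparison


def aggregate_gbit_s(ports: int = PORTS, speed: int = PORT_SPEED_GBIT_S) -> int:
    """Raw aggregate line rate per direction, in gigabits per second."""
    return ports * speed


def aggregate_gbyte_s(ports: int = PORTS, speed: int = PORT_SPEED_GBIT_S) -> float:
    """Same figure in gigabytes per second (8 bits per byte)."""
    return aggregate_gbit_s(ports, speed) / 8


if __name__ == "__main__":
    print(f"Aggregate network rate: {aggregate_gbit_s()} Gbit/s")
    print(f"                      = {aggregate_gbyte_s():.0f} GB/s per direction")
    ratio = aggregate_gbyte_s() / HBM_BANDWIDTH_GB_S
    print(f"Network / HBM bandwidth ratio: {ratio:.2f}")
```

That works out to 4.8 terabits per second, or 600GB per second per direction, roughly a sixth of the on-package memory bandwidth.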

A PCI Express add-in card is aimed at fine-tuning, inference and retrieval-augmented generation (RAG). It comes in a full-height form factor rated at 600 watts, with a memory capacity of 128GB and a bandwidth of 3.7TB per second.

The Gaudi software integrates the PyTorch framework and provides optimized Hugging Face community-based models – the most common AI framework for GenAI developers today. This allows GenAI developers to work at a high level of abstraction, which improves ease of use and productivity and makes it easier to port models across hardware types.

Intel is also trying to break the stranglehold Nvidia holds through its CUDA architecture by promoting more open, community-based software and industry-standard Ethernet networking.

Bosch is planning to use the chip to explore further opportunities for smart manufacturing, including foundational models that generate synthetic datasets of manufacturing anomalies to provide robust, evenly distributed training sets for automated optical inspection.

Roboflow in the US is running production workloads of YOLOv5, YOLOv8, CLIP, SAM and ViT models for its end-to-end computer vision platform, while Naver is also planning to use the chip.

“Innovation is advancing at an unprecedented pace, all enabled by silicon – and every company is quickly becoming an AI company,” said Intel CEO Pat Gelsinger. “Intel is bringing AI everywhere across the enterprise, from the PC to the data center to the edge. Our latest Gaudi, Xeon and Core Ultra platforms are delivering a cohesive set of flexible solutions tailored to meet the changing needs of our customers and partners and capitalize on the immense opportunities ahead.”

For an open GenAI ecosystem, Intel is working with Anyscale, Articul8, DataStax, Domino, Hugging Face, KX Systems, MariaDB, MinIO, Qdrant, Red Hat, Redis, SAP, VMware, Yellowbrick and Zilliz. The aim is to enable enterprises to use their existing proprietary data sources running on standard cloud infrastructure augmented with open LLM capabilities.

As initial steps in this effort, Intel will release reference implementations for GenAI pipelines on secure Intel Xeon and Gaudi 3 systems, publish a technical conceptual framework, and continue to add infrastructure capacity in the Intel Tiber Developer Cloud for ecosystem development and validation of the current and future pipelines.
