Intel takes on Nvidia, GraphCore with Gaudi2 AI chip

Technology News
By Nick Flaherty

Intel has launched its second generation AI accelerator chip to take on Nvidia’s A100 GPU in data centre applications.

The Intel Gaudi2 and Greco deep learning processors were developed by its Habana Labs division for training and inference in the data centre.

Gaudi2 is built on a 7nm process, compared to 16nm for the first generation, has a similar die size to the Nvidia A100-80GB GPU, and shows twice the training performance on the ResNet-50 computer vision model and the BERT natural language processing model.

However, Nvidia is planning to launch its next generation H100 AI accelerator later this year. Gaudi2 will also be up against the latest AI chip from GraphCore, called Bow.

The Greco chip for edge inference is also built on 7nm and doubles the on-chip SRAM to 128MB from the first generation Goya chip, alongside 16GB of LPDDR5 memory with a bandwidth of 204GB/s. The chip consumes 75W, allowing a single-slot, half-height, half-length (HHHL) PCIe 4.0 inference card.

Gaudi2 has two Matrix Multiplication Engines (MMEs) alongside an array of 24 Tensor Processor Cores, up from eight in Gaudi1, and 48MB of SRAM for the AI models. It also has three times the on-board high speed HBM2E memory at 96GB, running at 2.45TB/s bandwidth, and integrates 24 x 100GbE RoCE RDMA NICs on the chip for scaling up designs using standard Ethernet.

Intel subsidiary Mobileye is using the technology to train AI networks for its driver assistance and driverless car technologies.

“The launch of Habana’s new deep learning processors is a prime example of Intel executing on its AI strategy to give customers a wide array of solution choices – from cloud to edge – addressing the growing number and complex nature of AI workloads. Gaudi2 can help Intel customers train increasingly large and complex deep learning workloads with speed and efficiency, and we’re anticipating the inference efficiencies that Greco will bring,” said Sandra Rivera, Intel executive vice president and general manager of the Datacenter and AI Group.

“Compared with the A100 GPU, implemented in the same process node and roughly the same die size, Gaudi2 delivers clear leadership training performance as demonstrated with apples-to-apples comparison on key workloads,” said Eitan Medina, chief operating officer at Habana Labs. “This deep-learning acceleration architecture is fundamentally more efficient and backed with a strong roadmap.” 

“As training such models is time-consuming and costly, multiple teams across Mobileye have chosen to use Gaudi-accelerated training machines, either on Amazon EC2 DL1 instances or on-prem,” said Gaby Hayon, executive vice president of R&D at Mobileye. “Those teams consistently see significant cost savings relative to existing GPU-based instances across model types, enabling them to achieve much better time-to-market for existing models or training much larger and complex models aimed at exploiting the advantages of the Gaudi architecture.”

Habana is working with server maker Supermicro on a training server with eight Gaudi2 chips for later this year that will be integrated into a rack with the AI400X2 storage from DDN.

The Greco chip will sample to customers in Q3 of this year, with mass production scheduled for the second half of 2022.
