Date: 2020-08-20

Microsoft is to use the latest A100 GPU from Nvidia for its AI supercomputer in the cloud and is opening up the technology to a wide range of applications.

Microsoft announced back in May that it would host an AI supercomputer in the cloud with OpenAI, but did not detail the technology it would use. The virtual machine (VM) developed by Microsoft for AI will combine eight Nvidia A100 Ampere GPUs with an AMD processor.

This architecture, called the ND A100 v4 VM series, can scale up to thousands of GPUs with 1.6 Tbit/s of interconnect bandwidth per VM. Each GPU is provided with its own dedicated topology-agnostic 200 Gbit/s NVIDIA Mellanox HDR InfiniBand connection.
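The per-VM figure follows from the per-GPU links: eight GPUs, each with a 200 Gbit/s connection, give 1.6 Tbit/s. A minimal sketch of that arithmetic is shown here; the function names and the 1,024-GPU job size are illustrative assumptions, not part of Microsoft's announcement.

```python
import math

# Figures taken from the article.
GPUS_PER_VM = 8           # A100 GPUs in one ND A100 v4 VM
LINK_GBIT_PER_GPU = 200   # dedicated HDR InfiniBand link per GPU, in Gbit/s


def vm_interconnect_tbit(gpus=GPUS_PER_VM, link_gbit=LINK_GBIT_PER_GPU):
    """Aggregate interconnect bandwidth of one VM, in Tbit/s."""
    return gpus * link_gbit / 1000


def vms_needed(total_gpus, gpus_per_vm=GPUS_PER_VM):
    """Number of VMs required to supply a given GPU count (rounded up)."""
    return math.ceil(total_gpus / gpus_per_vm)


if __name__ == "__main__":
    print(f"Per-VM bandwidth: {vm_interconnect_tbit()} Tbit/s")
    # Hypothetical job size, chosen only to illustrate scaling.
    print(f"VMs for 1,024 GPUs: {vms_needed(1024)}")
```

Because every GPU has its own link, the aggregate bandwidth grows linearly with the number of VMs in a job.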

These GPU sub-systems will be coupled with AMD’s ‘Rome’ processors. These use a hybrid multi-die architecture that decouples two streams: eight dies for the processor cores to map directly to the GPUs, and one I/O die that supports security and communication outside the processor. The latest 64-core/128-thread version, the EPYC 7H12, is built on a 7nm process from TSMC with a 14nm I/O chip in the package. It is designed for liquid-cooled data centre operation, with a 2.6GHz base frequency and power consumption of up to 280W, and delivers up to 4.2TFLOPS.

All this is needed to handle large machine learning models. “The advantage of large scale models is that they only need to be trained once with massive amounts of data using AI supercomputing, enabling them to then be ‘fine-tuned’ for different tasks and domains with much smaller datasets and resources,” said Ian Finder, Senior Program Manager, Accelerated HPC Infrastructure at Microsoft.

“The more parameters that a model has, the better it can capture the difficult nuances of the data, as demonstrated by our 17-billion-parameter Turing Natural Language Generation (T-NLG) model and its ability to understand language to answer questions from or summarize documents seen for the first time.”

Training models at this scale