
Microsoft announces new supercomputer, large-scale AI vision
Built in collaboration with and exclusively for OpenAI – an artificial intelligence (AI) research laboratory consisting of the for-profit corporation OpenAI LP and its parent organization, the non-profit OpenAI Inc. – the supercomputer hosted in Azure was designed specifically to train that company’s AI models. The supercomputer, says the company, represents a first step toward making the next generation of very large AI models and the infrastructure needed to train them available as a platform for other organizations and developers to build upon.
“The exciting thing about these models is the breadth of things they’re going to enable,” says Microsoft Chief Technical Officer Kevin Scott. “This is about being able to do a hundred exciting things in natural language processing at once and a hundred exciting things in computer vision, and when you start to see combinations of these perceptual domains, you’re going to have new applications that are hard to even imagine right now.”
Traditionally, says the company, machine learning experts have built separate, smaller AI models that use many labeled examples to learn a single task such as translating between languages, recognizing objects, reading text to identify key points in an email, or recognizing speech well enough to deliver today’s weather report when asked. A new class of models developed by the AI research community has proven that some of those tasks can be performed better by a single massive model — one that learns from examining billions of pages of publicly available text, for example.
This type of model can so deeply absorb the nuances of language, grammar, knowledge, concepts, and context, says the company, that it can excel at multiple tasks: summarizing a lengthy speech, moderating content in live gaming chats, finding relevant passages across thousands of legal files, or even generating code it learned by scouring GitHub. As part of its AI at Scale initiative, Microsoft has developed its own family of large AI models – the Microsoft Turing models – which it says it has used to improve many different language understanding tasks across Bing, Office, Dynamics, and other productivity products.
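For illustration only, the one-model-many-tasks idea the article describes can be seen in open-source models such as T5, which handles several language tasks with a single set of pretrained weights. The sketch below uses the Hugging Face `transformers` library and the public `t5-small` checkpoint; neither is the Microsoft Turing family, and the sample text is made up.

```python
# Minimal sketch (not Microsoft's code): one pretrained model, t5-small,
# performs multiple tasks, steered only by the task it is asked to do --
# no separate per-task model is trained from scratch.
from transformers import pipeline

summarizer = pipeline("summarization", model="t5-small")
translator = pipeline("translation_en_to_fr", model="t5-small")

text = ("Microsoft has built a supercomputer in Azure to train very large "
        "AI models that learn from billions of pages of public text.")

print(summarizer(text, max_length=30, min_length=5)[0]["summary_text"])
print(translator("The model learns many tasks at once.")[0]["translation_text"])
```

The same weights serve both calls; only the task framing changes, which is the contrast the article draws against building a separate, smaller model per task.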
Earlier this year, the company also released to researchers the largest publicly available AI language model in the world – the Microsoft Turing model for natural language generation (Turing-NLG). The goal, says the company, is to make its large AI models, training optimization tools, and supercomputing resources available through Azure AI services and GitHub so developers, data scientists, and business customers can easily leverage the power of AI at Scale.
“By now most people intuitively understand how personal computers are a platform,” says Scott. “You buy one and it’s not like everything the computer is ever going to do is built into the device when you pull it out of the box. That’s exactly what we mean when we say AI is becoming a platform.”
“This is about taking a very broad set of data and training a model that learns to do a general set of things and making that model available for millions of developers to go figure out how to do interesting and creative things with,” he says.
Training such massive AI models requires advanced supercomputing infrastructure, or clusters of state-of-the-art hardware connected by high-bandwidth networks. It also needs tools to train the models across these interconnected computers.
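To make the "tools to train the models across these interconnected computers" concrete, here is a minimal data-parallel training sketch using open-source PyTorch's DistributedDataParallel. It is one common tool of this kind, not necessarily the stack Microsoft and OpenAI used; the model and objective are toys.

```python
# Minimal sketch: data-parallel training across GPUs/servers with PyTorch.
# Each process drives one GPU; the NCCL backend carries gradient
# all-reduces over the cluster's high-bandwidth network.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")      # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda()
    model = DDP(model, device_ids=[local_rank])  # sync gradients across ranks
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(10):
        x = torch.randn(32, 1024).cuda()
        loss = model(x).square().mean()          # toy objective
        optimizer.zero_grad()
        loss.backward()                          # cross-node all-reduce here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, for example, `torchrun --nproc_per_node=8 train.py` on each node. Training models at the scale the article describes layers model and pipeline parallelism on top of this basic data-parallel pattern.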
The supercomputer developed for OpenAI is a single system with more than 285,000 CPU cores, 10,000 GPUs, and 400 gigabits per second of network connectivity for each GPU server. Compared with the machines on the TOP500 list of the world's most powerful supercomputers, says the company, it ranks in the top five. And because it is hosted in Azure, it benefits from all the capabilities of a robust modern cloud infrastructure, including rapid deployment, sustainable datacenters, and access to Azure services.
OpenAI CEO Sam Altman says, “As we’ve learned more and more about what we need and the different limits of all the components that make up a supercomputer, we were really able to say, ‘If we could design our dream system, what would it look like?’ And then Microsoft was able to build it.”
Microsoft has also announced that it will soon begin open sourcing its Microsoft Turing models, as well as recipes for training them in Azure Machine Learning, giving developers access to the same family of powerful language models that the company has used to improve language understanding across its products.
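Microsoft had not published those recipes at the time of the announcement, so the following is only a generic sketch of how any training script is submitted with the Azure Machine Learning Python SDK (`azureml-core`); the script name `train_turing.py` and the compute target `gpu-cluster` are hypothetical placeholders, not Microsoft's published recipe.

```python
# Generic sketch: submitting a training script to Azure Machine Learning
# with the v1 azureml-core SDK. The script and cluster names below are
# hypothetical placeholders.
from azureml.core import Workspace, Experiment, ScriptRunConfig

ws = Workspace.from_config()                      # reads a local config.json
experiment = Experiment(workspace=ws, name="large-model-training")

config = ScriptRunConfig(
    source_directory=".",
    script="train_turing.py",                     # hypothetical script name
    compute_target="gpu-cluster",                 # hypothetical GPU cluster
)

run = experiment.submit(config)
run.wait_for_completion(show_output=True)
```

A published recipe would presumably wrap a pattern like this around the open-sourced model and training code.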
“By developing this leading-edge infrastructure for training large AI models,” says Scott, “we’re making all of Azure better. We’re building better computers, better distributed systems, better networks, better datacenters. All of this makes the performance and cost and flexibility of the entire Azure cloud better.”