
Nvidia claims breakthroughs in AI language understanding
The company says its AI platform is the first to train one of the most advanced AI language models – BERT (Bidirectional Encoder Representations from Transformers) – in less than an hour, and to complete AI inference in just over 2 milliseconds. That level of performance makes it possible for developers to use state-of-the-art language understanding in large-scale applications that can serve hundreds of millions of consumers worldwide.
While limited conversational AI services are not new, chatbots, intelligent personal assistants, and search engines have until now struggled to operate with human-level comprehension because very large AI models could not be deployed in real time. The company says it has addressed this problem by adding key optimizations to its AI platform — achieving speed records in AI training and inference and building the largest language model of its kind to date.
“Large language models are revolutionizing AI for natural language,” says Bryan Catanzaro, vice president of Applied Deep Learning Research at Nvidia. “They are helping us solve exceptionally difficult language problems, bringing us closer to the goal of truly conversational AI. Nvidia’s groundbreaking work accelerating these models allows organizations to create new, state-of-the-art services that can assist and delight their customers in ways never before imagined.”
The company says it has fine-tuned its AI platform with key optimizations, resulting in three new natural language understanding performance records:
- Fastest training: Training BERT-Large, the large version of one of the world’s most advanced AI language models, an Nvidia DGX SuperPOD using 92 Nvidia DGX-2H systems with 1,472 Nvidia V100 GPUs slashed the typical training time from several days to just 53 minutes. Nvidia also trained BERT-Large on a single Nvidia DGX-2 system in 2.8 days, demonstrating the scalability of Nvidia GPUs for conversational AI.
- Fastest inference: Using Nvidia T4 GPUs running Nvidia TensorRT, Nvidia performed inference with BERT-Base on the SQuAD dataset in only 2.2 milliseconds – well under the 10-millisecond processing threshold for many real-time applications, and a sharp improvement over the more than 40 milliseconds measured with highly optimized CPU code.
- Largest model: With a focus on developers’ ever-increasing need for larger models, Nvidia Research built and trained the world’s largest language model based on Transformers, the technology building block used for BERT and a growing number of other natural language AI models. Nvidia’s custom model, with 8.3 billion parameters, is 24 times the size of BERT-Large.
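The 24x size ratio can be sanity-checked with a back-of-the-envelope parameter count. The sketch below uses the common approximation of roughly 12·h² weights per Transformer layer (attention projections plus feed-forward), plus a vocabulary embedding; the specific configuration used for the 8.3-billion-parameter model (72 layers, hidden size 3,072, ~51k vocabulary) is an assumption drawn from Nvidia's published Megatron-LM description, not from this article.

```python
def transformer_params(layers, hidden, vocab):
    # Rough per-layer weight count: 4*h^2 for the attention projections
    # plus 8*h^2 for the feed-forward block; biases, layer norms, and
    # position embeddings are ignored in this estimate.
    per_layer = 12 * hidden * hidden
    embedding = vocab * hidden
    return layers * per_layer + embedding

# BERT-Large: 24 layers, hidden size 1,024, ~30k WordPiece vocabulary.
bert_large = transformer_params(24, 1024, 30522)
# Assumed Megatron-LM config for the 8.3B model: 72 layers, hidden 3,072.
megatron = transformer_params(72, 3072, 51200)

print(f"BERT-Large ~ {bert_large / 1e9:.2f}B params")
print(f"Custom model ~ {megatron / 1e9:.2f}B params")
print(f"Size ratio ~ {megatron / bert_large:.1f}x")
```

The estimate lands near the cited figures: about 0.33 billion parameters for BERT-Large and about 8.3 billion for the custom model, a ratio of roughly 25x, in line with the article's 24x.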
The company has made the software optimizations available to developers. Continuous optimizations to accelerate training of BERT and Transformer for GPUs on multiple frameworks are freely available on NGC, the company’s hub for accelerated software. Nvidia’s BERT GitHub repository has code today to reproduce the single-node training performance cited by the company, and in the near future will be updated with the scripts necessary to reproduce cited large-scale training performance numbers.
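For developers picking up these scripts, the 10-millisecond real-time budget cited above is straightforward to measure against. The sketch below is a generic latency-timing loop; `infer` is a placeholder stand-in, not Nvidia's TensorRT pipeline — in a real deployment it would be replaced by a call into the published BERT inference code.

```python
import time

def infer(query):
    # Placeholder for a real BERT inference call (e.g. via the
    # published TensorRT scripts); here just a trivial computation.
    return sum(ord(c) for c in query)

query = "What is conversational AI?"

# Warm up before timing so one-time setup costs are excluded.
for _ in range(10):
    infer(query)

runs = 1000
start = time.perf_counter()
for _ in range(runs):
    infer(query)
latency_ms = (time.perf_counter() - start) / runs * 1000

print(f"mean per-query latency: {latency_ms:.4f} ms (10 ms real-time budget)")
```

The same warm-up-then-average pattern applies whether the model runs on a CPU, a T4 GPU, or a TensorRT engine; only the body of `infer` changes.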
To see the Nvidia Research team’s natural language processing (NLP) code from Project Megatron, which the company launched to investigate billion-plus-parameter Transformer-based networks, visit the Megatron Language Model GitHub repository.
