The system processes requests with ultra-low latency and is designed for real-time AI, which is becoming increasingly important as cloud infrastructures process live data streams such as search queries, videos, sensor streams, and user interactions. According to the company, Project Brainwave achieves both performance and flexibility advancements in this area.
The system comprises three main layers: a high-performance, distributed system architecture; a hardware deep neural network (DNN) engine synthesized onto FPGAs; and a compiler and runtime environment for low-friction deployment of trained models.
By attaching high-performance FPGAs directly to its datacenter network, the company is able to serve DNNs as hardware microservices – where a DNN can be mapped to a pool of remote FPGAs and called by a server with no software in the loop. This approach, says Microsoft, both reduces latency – by relieving the CPU of the need to process incoming requests – and allows very high throughput with the FPGA processing requests as fast as the network can stream them.
System flexibility is enhanced through the use of a “soft” DNN processing unit synthesized onto commercially available FPGAs, as opposed to a hard-coded DNN processing unit approach. This design combines the FPGAs’ hard ASIC digital signal processing blocks with synthesizable soft logic, allowing it to scale across a range of data types, with the desired data type being a synthesis-time decision.
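The idea of a synthesis-time data-type decision can be illustrated in software. The sketch below, a hypothetical analogue and not the actual Brainwave toolchain, specializes a dot-product kernel for one fixed-point format when the kernel is built, the way a soft DPU fixes its arithmetic format when the design is synthesized rather than at run time:

```python
# Hypothetical sketch: the numeric format is chosen once, at "synthesis"
# (build) time, and the resulting kernel is committed to it. Names are
# illustrative, not part of any real Brainwave interface.

def make_dot_product(frac_bits: int):
    """Return a dot-product kernel specialized for one fixed-point format."""
    scale = 1 << frac_bits  # fixed once, like a synthesis-time parameter

    def quantize(x: float) -> int:
        # Convert to fixed point with frac_bits fractional bits.
        return round(x * scale)

    def dot(xs, ws):
        # Accumulate in integer arithmetic, as narrow hardware would,
        # then rescale the result back to a real number.
        acc = sum(quantize(x) * quantize(w) for x, w in zip(xs, ws))
        return acc / (scale * scale)

    return dot

# "Synthesize" two variants of the same kernel at different precisions.
dot8 = make_dot_product(frac_bits=8)
dot16 = make_dot_product(frac_bits=16)

print(dot16([0.5, 0.25], [1.0, 2.0]))  # prints 1.0
```

Once built, each kernel runs at its fixed precision; choosing a different data type means building (synthesizing) a new variant, not branching at run time.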
The system was designed to deliver high sustained performance across a wide range of complex models, with batch-free execution. It can handle complex, memory-intensive models such as long short-term memory (LSTM) networks, says the company, without relying on batching to “juice” throughput.
The Project Brainwave system was demonstrated at the recent Hot Chips conference. Ported to Intel’s new 14-nm Stratix 10 FPGA, the system was able to run a large gated recurrent unit (GRU) model – five times larger than ResNet-50 – with no batching, achieving “unprecedented levels of demonstrated real-time AI performance on extremely challenging models.” Further performance improvements are expected as the system is tuned over the next few quarters.
Project Brainwave is designed to support a wide range of popular deep learning frameworks, including Microsoft Cognitive Toolkit and Google’s TensorFlow. The company is working to bring the system to its Azure cloud computing platform.
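Supporting multiple frameworks typically means lowering each framework’s model into a common graph form, then compiling that graph for the target hardware. The sketch below illustrates that shape of flow; all names and the toy instruction set are invented for illustration, since the article does not describe the actual Brainwave compiler or runtime interfaces:

```python
# Hypothetical sketch of a low-friction deployment flow: a trained model
# is normalized into a framework-neutral graph, then each op is lowered
# to a (made-up) DPU instruction for the FPGA service.

from dataclasses import dataclass, field

@dataclass
class GraphNode:
    op: str
    inputs: list = field(default_factory=list)

@dataclass
class CompiledModel:
    target: str
    instructions: list

def import_model(framework: str, ops: list) -> list:
    """Normalize a framework-specific op list into common graph nodes.

    A real importer would translate CNTK or TensorFlow graph ops here;
    this version just wraps the op names.
    """
    return [GraphNode(op=name) for name in ops]

def compile_for_fpga(graph: list, target: str = "stratix10") -> CompiledModel:
    """Lower each graph op to an instruction in a toy DPU ISA."""
    isa = {"matmul": "MVM", "sigmoid": "ACT_SIG", "tanh": "ACT_TANH"}
    return CompiledModel(target=target,
                         instructions=[isa[node.op] for node in graph])

graph = import_model("tensorflow", ["matmul", "tanh"])
model = compile_for_fpga(graph)
print(model.instructions)  # prints ['MVM', 'ACT_TANH']
```

The point of such a design is that new frameworks only require a new importer; everything from the common graph down to the FPGA stays unchanged.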