Artificial Intelligence in Autonomous Driving
The development of the most advanced driver assistance systems (ADAS) in the industry should be based on integrated and open platforms. A complete solution is required for development, simulation, prototyping, and implementation to enable smarter, more sophisticated ADAS, and to pave the way for the autonomous car. This article summarizes the current status of deep learning architectures based on deep neural networks (DNNs), which run on a supercomputer on wheels and are integrated in platforms to drive the future of autonomous vehicles.
What is deep learning?
Deep learning is currently the most popular approach to developing AI. It is a way to enable machines to recognize and understand the world they are intended to operate in. It relies on neural networks: collections of simple, trainable mathematical units that collectively learn complex functions such as driving.
Deep learning turns data into the decisions a computer program makes. The significant difference from conventional algorithm-based systems is that, once the basic model is established, the deep learning system learns on its own how to fulfill the intended tasks. These tasks range from tagging images and understanding spoken language to enabling drones to carry out independent missions and empowering cars to drive themselves. Deep learning emulates the way the human brain learns about the world: recognizing patterns and relationships, understanding language and coping with ambiguity.
Neural networks are inherently parallel models. They therefore map very well onto multicore GPUs, which are found in products across industries, from PCs to robotics and automotive. GPUs take full advantage of this parallelism and are well suited for the definition, training, optimization and deployment of deep learning systems. According to Popular Science, “the GPU is the workhorse of modern A.I.”
A simple example of the progress of deep learning is the ImageNet Large Scale Visual Recognition Challenge. This challenge evaluates algorithms for object detection and image or scene classification on thousands of images and videos at large scale. Until 2012, the recognition rate achieved with traditional computer vision (CV) algorithms improved only slowly and stayed below 70%. In 2012, the introduction of deep learning caused a major jump to the low 80% range, and the rate has since exceeded the 95% mark. Deep learning has now replaced traditional CV algorithms in this contest (see figure 1).
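To make the idea of a "simple, trainable mathematical unit" concrete, the following minimal sketch shows one such unit and a layer built from several of them. The sigmoid activation, the function names and all numbers are assumptions chosen for illustration; they are not taken from any system described in this article.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def unit_output(inputs, weights, bias):
    """One trainable unit: a weighted sum of its inputs passed through a nonlinearity."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)


def layer_output(inputs, layer_weights, layer_biases):
    """A layer is several units that all look at the same inputs."""
    return [unit_output(inputs, w, b) for w, b in zip(layer_weights, layer_biases)]


# Hypothetical weights and inputs, chosen only for illustration.
hidden = layer_output([0.5, -1.0], [[1.0, 2.0], [-1.0, 0.5]], [0.0, 0.1])
decision = unit_output(hidden, [1.5, -2.0], 0.2)
print(hidden, decision)
```

The weights and biases are the "trainable" part: learning consists of adjusting these numbers until the outputs match the desired behavior.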
Deep Learning in the High-Tech Industry
Facebook was one of the first companies to adopt GPU accelerators to train DNNs. DNNs and GPUs play a key role in the new “Big Sur” computing platform and in the Facebook AI Research (FAIR) purpose-built system, which is specifically designed for neural network training. Facebook describes its goal as advancing the field of machine intelligence and developing technologies that give people better ways to communicate.
Google is also investing heavily in deep learning. TensorFlow is the second generation of Google’s machine learning system, built to work with very large amounts of data and very large models. Its architecture is very flexible, and it has been applied to various kinds of perception and language understanding tasks, such as recognition and classification of images, speech and text, across many applications (email, robotics, natural language processing, maps, etc.). Google uses thousands of GPUs and sees tenfold performance improvements compared to CPU equivalents.
According to Anelia Angelova, a research scientist at Google working on CV and machine learning, her company is also exploring cascaded DNNs for pedestrian detection in its self-driving car project.
Figure 2 shows the major building blocks of the self-driving loop. The goal is to sense 360 degrees around the vehicle via cameras, lidar, radar and ultrasonic sensors. This allows algorithms to accurately understand the full environment around the car and to produce a robust representation of it, including static and dynamic objects. Using DNNs for the detection and classification of objects dramatically increases the accuracy of the resulting sensor data fusion. This data then feeds the perception, localization and path planning steps, which end in a decision on the vehicle trajectory.
The first step, called “perception,” covers sensor fusion (combining the data from the various sensors), object detection (“there is an object”), classification (“it is a pedestrian”), segmentation (“to the right side”) and tracking (“X is moving to the left”).
The second step, called “localization,” includes map fusion (combining different map sources), landmarks and GPS trilateration. The system must know the exact position of the autonomous vehicle in order to drive it safely down the road. The ability to integrate high-definition mapping data, such as that of map leader HERE, is crucial for calculating this exact position.
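The trilateration idea mentioned above can be sketched in a simplified form: given measured distances to three reference points with known positions, the position follows from a small linear system. This is a 2D illustration under idealized assumptions (exact distances, no clock error); real GPS positioning works in 3D and also estimates the receiver clock bias. The function name and the coordinates are hypothetical.

```python
import math


def trilaterate(p1, r1, p2, r2, p3, r3):
    """Return (x, y) from three known 2D points and the distances to each of them.

    Subtracting the circle equations pairwise removes the squared unknowns
    and leaves two linear equations in x and y.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a = 2 * (x2 - x1)
    b = 2 * (y2 - y1)
    c = r1**2 - r2**2 - x1**2 + x2**2 - y1**2 + y2**2
    d = 2 * (x3 - x2)
    e = 2 * (y3 - y2)
    f = r2**2 - r3**2 - x2**2 + x3**2 - y2**2 + y3**2
    det = a * e - b * d
    if abs(det) < 1e-12:
        raise ValueError("reference points are collinear; position is ambiguous")
    return (c * e - b * f) / det, (a * f - c * d) / det


# Hypothetical reference points; the distances correspond to a true position of (3, 4).
position = trilaterate((0, 0), 5.0, (10, 0), math.sqrt(65), (0, 10), math.sqrt(45))
print(position)
```

With noisy measurements the three circles no longer meet in a single point, which is one reason why production systems fuse GPS with map and landmark data rather than relying on it alone.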
Finally, the “path planning” step encompasses trajectory and behavior of the vehicle. A driverless car needs to be able to safely navigate around any potential hazards in a highly dynamic environment. Sophisticated algorithms are employed to calculate free space (where the vehicle can drive safely) as well as anticipate how the environment may change. In addition, the vehicle must move in a smooth manner to avoid disturbing the occupants as well as other vehicles and their drivers. Complex path planning takes all these factors into account to ultimately deliver a safe and enjoyable experience.
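The trade-off described above (stay in free space, but also move smoothly) can be illustrated with a small cost-based selection among candidate paths. This is only a sketch under stated assumptions: the candidate paths, the point obstacles, the safety margin and the cost terms are all hypothetical and far simpler than what a production planner uses.

```python
import math


def min_clearance(path, obstacles):
    """Smallest distance between any path point and any obstacle."""
    if not obstacles:
        return math.inf
    return min(math.dist(p, o) for p in path for o in obstacles)


def roughness(path):
    """Sum of squared second differences; zero for a straight, evenly spaced path."""
    total = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(path, path[1:], path[2:]):
        total += (x2 - 2 * x1 + x0) ** 2 + (y2 - 2 * y1 + y0) ** 2
    return total


def choose_path(candidates, obstacles, safety_margin=1.5, smooth_weight=1.0):
    """Pick the cheapest candidate that stays in free space, or None if none does."""
    best_name, best_cost = None, math.inf
    for name, path in candidates.items():
        clearance = min_clearance(path, obstacles)
        if clearance < safety_margin:
            continue  # path leaves free space, so it is not an option
        # Prefer smooth paths and, among those, paths that keep more distance.
        cost = smooth_weight * roughness(path) + 1.0 / clearance
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name


# Hypothetical candidates as (x, y) points in metres, with x pointing forward.
candidates = {
    "keep_lane": [(0, 0), (10, 0), (20, 0), (30, 0)],
    "change_left": [(0, 0), (10, 1), (20, 3), (30, 3.5)],
}
print(choose_path(candidates, obstacles=[(20, 0.5)]))
```

With an obstacle in the current lane the sketch selects the lane change; on an empty road the straight path wins because it has zero roughness.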
A smart camera alone is not enough, though. Each step requires DNNs: objects need to be detected and classified, landmarks need to be recognized, driving behavior needs to be adapted and decisions need to be made. DNNs also represent an open platform that car OEMs and tier 1 suppliers can extend and maintain to build their own solutions and distinguish themselves from their competitors.
Deep Learning Flow
DNNs consist of multiple processing layers of neurons. In object recognition, the neurons in the first layer detect edges, while the neurons in the second layer recognize more complex shapes, such as triangles or rectangles, built up from those edges. In the third layer, even more complex shapes are distinguished, and so on. Consequently, the right number of layers and the right layer characteristics, together called a neural net framework, must be picked to solve a particular problem.
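What "detecting edges" in a first layer means can be shown with a single filter slid over a tiny image. The filter below is a hand-written, Sobel-like vertical-edge kernel used here as an assumption for illustration; in a trained DNN, comparable filters are learned from data rather than written by hand.

```python
def correlate2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hand-written vertical-edge filter (an assumption; real first-layer filters are learned).
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# Tiny grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

response = correlate2d(image, VERTICAL_EDGE)
print(response)  # [[0, 36, 36, 0], [0, 36, 36, 0]]
```

The response is large only where the window straddles the dark-to-bright boundary and zero in the uniform regions. Later layers combine many such responses into corners, shapes and eventually whole objects.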
The self-driving project is challenging because the driving situation in crowded cities is very complex and unpredictable. Therefore, many different sensors and data need to be combined in order to localize the vehicle, perceive the driving situation, plan the driving path and control the vehicle.
This scenario is well suited to deep learning. As a first step, a neural net framework such as Caffe is chosen for training. Caffe is a deep learning framework developed by the Berkeley Vision and Learning Center and by community contributors. It was developed with expression, speed and modularity in mind, which makes it a good fit for the self-driving challenge.
Next, the selected framework needs to be trained on a task, for example object recognition and classification. Like training in sports, training a DNN requires a coach: someone who instructs the neural network how to respond to a trigger. A scoring function determines the difference between the desired response and the actual response, called the prediction error (figure 4). At each neuron in the network, this error is used to adjust the weight and bias values of the connections between the neurons in the model. As a result, the network makes smaller errors the next time it sees the same inputs. This tuning of the weights happens in response to external stimuli (e.g., driving scenes), without direct intervention by a programmer.
Before training can start, the developer first has to create a database of driving scene images and prepare it by labeling each image with the right object type (e.g., Audi A7) or the correct driving decision. Once the database exists, the framework model can be configured and the training executed.
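The training loop described above can be sketched with the smallest possible model: a single linear unit whose weight and bias are adjusted from the prediction error by gradient descent. The squared-error scoring function, the learning rate, the number of epochs and the toy samples are assumptions for illustration. A real DNN applies the same principle to millions of weights through backpropagation.

```python
def train(samples, learning_rate=0.05, epochs=2000):
    """Fit prediction = weight * x + bias to (x, target) pairs by gradient descent."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            prediction = weight * x + bias
            error = prediction - target  # the prediction error
            # For the score 0.5 * error**2, the gradient is error * x with respect
            # to the weight and error with respect to the bias.
            weight -= learning_rate * error * x
            bias -= learning_rate * error
    return weight, bias


# Labeled toy data (hypothetical): the "coach" knows the targets follow 2 * x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
weight, bias = train(samples)
print(weight, bias)
```

No programmer ever sets the weight to 2 or the bias to 1. The values emerge from repeated exposure to labeled examples, which is the sense in which the tuning happens without direct intervention.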
Next, the trained network is verified in offline driving tests based on recorded or simulated driving scenarios. After its verification, it can be deployed in an autonomous driving ECU and road tested. The same steps have to be followed for other DNN-based parts (e.g., trajectory planning) of the autonomous driving system. Consequently, an end-to-end system is needed to train, test and then deploy the neural network.
Figure 5 shows a realistic driving scene. The upper view is the actual recorded driving scenario on a U.S. highway. This data is fed into the DNN-based autonomous driving system, and the lower window visualizes the results. The white vehicle in the center, which is the one actually driving, recognizes the two vehicles to its left (gray) and the one behind it to the right. Based on their relative speed, position and other data, the path planning system calculates possible trajectories (green lanes) and ultimately decides where to drive (stay in the current lane).
NVIDIA DRIVE™ Solution
NVIDIA offers an integrated platform (figure 6) for training, testing and deployment of autonomous vehicles. NVIDIA DRIVE solutions give automakers, tier 1 suppliers and automotive research institutions the power and flexibility to develop systems that enable cars to see, think and learn. The solution platform starts with NVIDIA DGX-1, a deep learning supercomputer that can be used to train DNNs by exposing them to data collected while a vehicle is driving. At the other end is NVIDIA DRIVE PX 2, an autonomous driving car computer, which draws on this training to make the inferences that enable the car to progress safely down the road. The connection between the two is NVIDIA DriveWorks, a suite of software tools, libraries and modules that accelerate development, simulation and testing of self-driving vehicles, including integration of HD mapping.
DriveWorks enables sensor calibration, acquisition of surround data, synchronization and recording, and it processes streams of sensor data through a complex pipeline of algorithms running on DRIVE PX 2, the supercomputer on wheels.
NVIDIA used the NVIDIA DRIVE solution to develop its own object detection system with a neural network framework called DRIVENet. Within five months, DRIVENet achieved the No. 1 score on the KITTI benchmark while running in real time. The top five entries also run on NVIDIA GPUs. The KITTI vision benchmark suite, developed by Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago, compares different object recognition implementations.
Many automotive companies are already using NVIDIA’s deep learning technology to power their autonomous driving efforts, and they are seeing speed-ups of 30-40X when training their networks compared with conventional technologies. BMW, Daimler and Ford are among them, along with innovative Japanese startups like Preferred Networks and ZMP. Audi was able to train a DNN in four hours that outperformed a smart camera that had taken two years to develop with a third party. Volvo Cars is using DRIVE PX 2 in real cars that will drive around Gothenburg, Sweden, in its Drive Me project.
According to BI Intelligence, an estimated 10 million cars with autonomous driving features will be on the road by 2020. A significant number of them will use AI to perceive the environment, localize their position and navigate the most complex traffic situations.
The competition to master the autonomous driving challenge has just begun, and more and more new companies are entering the race. Common themes among startups such as Future Mobility, Atieva and Faraday Future are significant funding and the goal of building an autonomous electric vehicle around a few centralized ECUs, in contrast to the 100+ ECUs found in vehicles from established OEMs. Ultimately, we will see more and more computational horsepower in the vehicle and a consolidation of distributed ECUs into centralized car computers, creating an autonomous supercomputer on wheels.
- Introduction to deep learning, GTC 2015 Webinar, NVIDIA, July 2015 https://on-demand.gputechconf.com/gtc/2015/webinar/deep-learning-course/intro-to-deep-learning.pdf
- The Crown Jewel of Technology Just Crushed Earnings, Ophir Gottlieb, Feb 17 2016, Capital Market Laboratories https://ophirgottlieb.tumblr.com/post/139506538909/the-crown-jewel-of-technology-just-crushed
- Google’s release of TensorFlow could be a game-changer in the future of AI, David Tuffley, November 13, 2015, PHYS.ORG https://phys.org/news/2015-11-google-tensorflow-game-changer-future-ai.html
- Facebook Open-Sources The Computers Behind Its Artificial Intelligence, Dave Gershgorn, December 10, 2015, Popular Science https://www.popsci.com/facebook-open-source-hardware-behind-artificial-intelligence
- IMAGENET Large Scale Visual Recognition Challenge (ILSVRC), https://www.image-net.org/challenges/LSVRC/
- Facebook AI Research (FAIR), https://research.facebook.com/ai
- Google’s Open Source Machine Learning System: TensorFlow, Mike Schuster, Google, January 15 2016, NVIDIA Conference, Tokyo