Industry experts have defined five levels in the progression of autonomous driving. Each level describes the extent to which a car takes over the task and responsibility from the driver, and how the car and driver interact. A feature such as adaptive cruise control is an example of an Advanced Driver Assistance System (ADAS) and can be considered a Level 1 capability. Currently, some new cars appearing on the market are achieving Level 2 functionality, but as an industry, we have barely scratched the surface of ADAS, let alone full autonomy.
The levels of autonomous driving
As we go through the levels of autonomy, processing power will be vital to achieving the vision of full autonomy, where the driver can have “hands off, eyes off and brain off”. At this stage, the people in the car are just passengers and, as there is no driver, there is no need for a steering wheel. However, before we get there, we should first understand the various levels from non-autonomous to fully autonomous driving.
There are three main elements to ADAS/AV: sensing, compute and actuation.
Sensing captures the current state of the world around the vehicle. This is done using a mix of sensors – radar (long and medium range), lidar (long range), camera (short/medium range), infrared and ultrasonic. Each of these senses and captures its own variant of the surrounding environment that it ‘sees’, locating objects of interest and importance within this view, such as cars, pedestrians, road signs, animals and the curvature of the road.
The compute stage is the decision-making phase. It is where the information from these various views is merged to create a greater understanding of what the car is ‘seeing’. For example: what is happening in the scene? Where are the moving objects? What are their predicted movements, and what course corrections should the car make? Does it need to brake and/or steer into another lane for safety?
Actuation, the final stage, is where the car acts on this decision, potentially overriding the driver – braking, accelerating or steering towards a safer path. This can happen either because the driver has not heeded a warning and a collision is imminent, or as the standard operation of a fully autonomous system.
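The sensing, compute and actuation stages described above can be sketched in code. The following toy example fuses hypothetical detected objects and applies a simple time-to-collision rule to decide whether to brake; all field names, thresholds and values here are illustrative assumptions, not taken from any production system.

```python
# Toy sketch of the compute/actuation decision step.
# All thresholds and object fields are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class DetectedObject:
    kind: str                 # e.g. "car", "pedestrian"
    distance_m: float         # fused range estimate (e.g. radar + camera)
    closing_speed_mps: float  # positive = object approaching us

def time_to_collision(obj: DetectedObject) -> float:
    """Seconds until we reach the object; infinite if it is receding."""
    if obj.closing_speed_mps <= 0:
        return float("inf")
    return obj.distance_m / obj.closing_speed_mps

def decide(objects, brake_ttc_s: float = 2.0) -> str:
    """Brake if any object would be reached within brake_ttc_s seconds."""
    for obj in objects:
        if time_to_collision(obj) < brake_ttc_s:
            return "BRAKE"
    return "CRUISE"

scene = [
    DetectedObject("car", 40.0, 5.0),        # TTC = 8 s, no action needed
    DetectedObject("pedestrian", 6.0, 4.0),  # TTC = 1.5 s, below threshold
]
print(decide(scene))  # -> BRAKE
```

A real system fuses many more attributes per object (lateral position, classification confidence, predicted trajectory) and chooses among several actuations, but the shape of the pipeline – detect, fuse, decide, act – is the same.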
Level 2 is really the start of the ADAS journey, where multiple separate functions can be specified in a safety package, such as automatic emergency braking, lane departure warning or lane-keep assist.
Level 3 is currently the leading edge of production cars; for example, the 2018 Audi A8. This means that the driver can be ‘eyes off’ for a period of time but must be able to take over immediately in case of an issue.
Both Level 4 and Level 5 offer essentially full autonomous driving. The difference between them is that at Level 4, driving would be limited to geofenced areas such as major highways and smart cities, as vehicles would rely heavily on roadside infrastructure to maintain a millimetre-accurate picture of where they are.
Level 5 would allow autonomous driving anywhere. At this stage, the car might not even have a steering wheel and the seats may not all be front facing.
The processing power required for autonomous driving
At each level of autonomous driving, the processing power required to handle all the data increases rapidly. As a rule of thumb, one can expect a 10x processing increase going from one level to the next. For full autonomous driving (Level 4/Level 5) we are looking at tens of teraflops of processing.
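The 10x-per-level rule of thumb can be worked through numerically. The Level 2 baseline figure below is purely an illustrative assumption; only the 10x scaling per level, and the tens-of-teraflops endpoint at Level 4/5, come from the text. With an assumed ~0.1 TFLOPS at Level 2, Level 4 lands at 10 TFLOPS, consistent with that endpoint.

```python
# "10x per level" rule of thumb, using integer GFLOPS to keep the
# arithmetic exact. The Level 2 baseline is an assumption for
# illustration only.
BASELINE_L2_GFLOPS = 100  # assumed: ~0.1 TFLOPS at Level 2

compute_by_level = {
    level: BASELINE_L2_GFLOPS * 10 ** (level - 2)
    for level in range(2, 6)
}

for level, gflops in compute_by_level.items():
    print(f"Level {level}: ~{gflops / 1000:g} TFLOPS")
```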
From a sensor perspective, the table below gives you an indication of what is required. Level 4/5 will require up to eight cameras, though even higher numbers have already been mooted. Image capture will be at two-megapixel resolution at 30-60fps; handling all of this in real time is a huge processing undertaking. Radar could require as many as 10 devices with a mix of short, medium and long range (100m+) in the 24GHz and 77GHz bands. Even at Level 2, there is still a significant amount of processing to do, as data is being captured from both camera and radar.
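A quick back-of-the-envelope calculation shows the scale of the camera data alone. The camera count, resolution and frame rate come from the figures above; the bytes-per-pixel value is an illustrative assumption (24-bit colour), not a figure from the text.

```python
# Raw camera bandwidth for a Level 4/5 sensor suite, worst case.
CAMERAS = 8            # from the text: up to eight cameras
MEGAPIXELS = 2         # from the text: two-megapixel capture
FPS = 60               # from the text: 30-60fps, taking the upper end
BYTES_PER_PIXEL = 3    # assumption: 24-bit RGB

pixels_per_second = CAMERAS * MEGAPIXELS * 1_000_000 * FPS
bytes_per_second = pixels_per_second * BYTES_PER_PIXEL

print(f"{pixels_per_second / 1e9:.2f} Gpixel/s")  # 0.96 Gpixel/s
print(f"{bytes_per_second / 1e9:.2f} GB/s raw")   # 2.88 GB/s
```

Nearly a billion pixels per second must be ingested, processed and acted upon in real time – before radar, lidar and the other sensors are even counted.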
For the processing, we will focus on what the camera is required to do. This is the primary sensor (in conjunction with a front-facing radar) to support Autopilot as used, for example, in a Tesla.
The camera system is normally a wide-angled mono or stereo camera, forward facing or in a surround view (360) configuration on the car. Unlike radar and lidar, camera-sensing equipment is only as good as the software processing the inputs; the camera resolution matters but not as much as you would expect.
To simplify matters, we use a primary algorithm called a Convolutional Neural Network (CNN). The CNN is a highly specialised and efficient way of extracting and classifying information from camera sources. In our car example, it takes input from the camera and identifies lane markings, obstructions, animals and so on. A CNN is not only capable of doing pretty much everything radar and lidar can do, but is also able to do much more – such as reading signs, detecting traffic lights and assessing road composition. Indeed, certain Tier 1s and car OEMs are looking at reducing cost by moving to a camera-plus-radar setup.
CNNs bring the element of machine learning to the car. The structure of the neural network is based broadly on how our own brains are wired. One must first pick the type of network to implement and its depth in terms of the number of layers; each layer is effectively a set of nodes interconnected with the preceding and following layers. To make the neural network smart, sets of training data are applied to it (a highly compute-intensive operation that largely happens offline). With each pass – i.e. each batch of images and video of road situations – the network learns by tweaking coefficients within the various layers. These coefficients may be refined millions of times as the training data is passed through. Once training is done, the network and its coefficients can be deployed onto CPUs, GPUs or dedicated CNN accelerators.
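To make the layer-of-coefficients idea concrete, here is a minimal sketch of what one CNN layer computes at inference time: a small kernel of learned coefficients slid across an image, followed by a non-linearity. The image, kernel values and "lane marking" scenario are illustrative assumptions; a real detection network stacks many such layers with millions of trained coefficients.

```python
# Minimal sketch of one CNN layer: a 3x3 kernel applied across a tiny
# grayscale "image", then a ReLU non-linearity. (Strictly this is
# cross-correlation, the usual CNN convention.) The kernel values stand
# in for learned coefficients; everything here is illustrative.
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid (no-padding) 2D sliding-window application of the kernel."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0)

# A 5x5 image with a vertical bright stripe (think: a lane marking).
image = np.zeros((5, 5))
image[:, 2] = 1.0

# A vertical-edge kernel: the kind of feature detector a trained
# network ends up encoding in its coefficients.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = relu(conv2d(image, kernel))
print(feature_map)  # strong responses where the stripe's left edge lies
```

The nested loops above are exactly the workload that GPUs and dedicated CNN accelerators parallelise, which is where the performance figures below come from.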
One of the beauties of this type of algorithm and network is that it can be updated with newer or better coefficients so it is constantly improving. For broad comparison, we have seen CNNs running on GPUs (compute) 20x faster and at much lower power than current high-end embedded multicore CPUs. Equally, with a move to hardware acceleration of CNNs, we have seen a further 20x performance improvement and yet further improvements in power dissipation.
Looking to the future
As we head towards a driverless-car future, the compute power necessary will expand with the number of sensors, frame rates and resolutions. Convolutional neural networks are emerging as the most efficient way of interpreting image data, from both a performance and a power viewpoint. This enables more processing to be placed at the edge of the network – in the automotive case, within the car itself – rather than offloading it to the cloud and relying on an always-on cellular connection. Opportunities abound in autonomous driving for those delivering the processing capability, algorithms and training data to make it a reality.
About the author:
Bryce Johnstone is Director of Automotive at Imagination Technologies – www.imgtec.com