Computer vision is complemented by deep neural networks
To date, the industry has pursued the traditional computing approach. The data from each sensor, such as lidar, camera or radar, is processed individually in the sensor layer using hardware accelerators, and each sensor produces its own object list. The fusion layer combines the data from all surrounding sensors into a 360° environmental model using grid-based or tracking-based algorithms. This model is then passed to the abstraction layer, where functions such as free space detection or time-to-collision run to determine the key parameters for the driving strategy. The subsequent application layer uses these parameters to perform route or motion planning and to generate the commands for the actuation layer. This traditional approach can be used to implement NCAP applications and applications with an automation level up to Level 3.
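As a rough illustration of the kind of function that runs in the abstraction layer, time-to-collision can be sketched as the remaining gap divided by the closing speed. The `TrackedObject` type, its field names and the `most_critical` helper are assumptions made for this example; they are not taken from any particular fusion stack, and a production system would also account for acceleration and lateral motion.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackedObject:
    """Hypothetical entry of the fused environmental model."""
    distance_m: float         # longitudinal gap to the object, in metres
    closing_speed_mps: float  # positive when the gap is shrinking


def time_to_collision(obj: TrackedObject) -> float:
    """Constant-velocity time-to-collision in seconds.

    Returns infinity when the object is not getting closer.
    """
    if obj.closing_speed_mps <= 0.0:
        return math.inf
    return obj.distance_m / obj.closing_speed_mps


def most_critical(objects):
    """Return the object with the smallest time-to-collision, or None."""
    return min(objects, key=time_to_collision, default=None)
```

Under these assumptions, an object 30 m ahead with a closing speed of 10 m/s yields a time-to-collision of 3 s, which the application layer could then compare against its braking thresholds.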
The revolutionary approach would add deep learning (DL) as a redundant path alongside the traditional computing path described above. DL can perform tasks such as semantic segmentation, remapping (as in SLAM: simultaneous localisation and mapping), data extraction and the determination of the driving strategy. There are two possible approaches: unsupervised DL (end to end) and supervised DL. The DL layer takes control of the vehicle and provides the driver with functions that allow more complex manoeuvres in traffic, while traditional computer vision (CV) methods are retained to monitor the decisions of the DL layer.
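One way to picture this monitoring role is a simple plausibility gate: the command proposed by the DL layer is accepted only if the traditional path still reports a sufficient safety margin. The sketch below is purely illustrative. The `Command` type, the `select_command` function and the 2-second threshold are assumptions for this example, not a description of any real vehicle architecture.

```python
from dataclasses import dataclass


@dataclass
class Command:
    """Hypothetical actuation request."""
    source: str               # which path produced the command
    acceleration_mps2: float  # requested longitudinal acceleration


def select_command(dl_command: Command,
                   cv_ttc_s: float,
                   fallback: Command,
                   min_ttc_s: float = 2.0) -> Command:
    """Accept the DL command only if the traditional path sees enough margin.

    cv_ttc_s is the time-to-collision reported by the traditional CV path;
    min_ttc_s is an assumed threshold chosen for illustration only.
    """
    if cv_ttc_s >= min_ttc_s:
        return dl_command
    return fallback
```

With a gate of this kind, the DL layer drives under normal conditions, and the conservative fallback command takes over whenever the traditional path disagrees about the available margin.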
Deep learning outclasses traditional computer vision – at least sometimes
Since 2011, DL methods, especially deep neural networks (DNNs) and in particular convolutional neural networks (CNNs), have clearly outclassed traditional CV methods in the accuracy of image-based classification. Nevertheless, the question arises as to how far DNNs also bring advantages in the automotive world.
The KITTI Vision Benchmark gives a good indication of the state of the art for given classes of detection: for each class, one can check how many of the top 20 ranked algorithms use traditional CV-based approaches and how many use DL-based approaches.
- KITTI Road Detection benchmark: around 80 percent of the top 20 algorithms are based on CNN methods
- KITTI Automotive Stereovision benchmark: around half of the top 20 algorithms are based on CNN approaches
- KITTI Automotive Optical Flow (which finds movement in the image): 80 to 90 percent of the top 20 algorithms remain with traditional CV methods.