Deep Neural Networks – only in combination with traditional computer vision

By Christoph Hammerschmidt

Renesas is one of the largest MCU/SoC manufacturers for the automotive market. To maintain this position, the company is focusing on ADAS applications and autonomous driving (AD), two areas that promise high growth rates and for which Renesas, thanks to its core competences in control (MCUs) and computing (SoCs), is particularly well positioned.

The SAE International’s J3016 standard identifies six levels (Level 0 to Level 5) of driving automation that range from “no automation” to “full automation”. Since 2010, driver assistance functions (ADAS) such as ACC (Adaptive Cruise Control), LKA (Lane Keep Assist), LCA (Lane Change Assist) or CTA (Cross Traffic Alert) covering Level 1 have been available in vehicles. In the next step, these systems were integrated further to implement functions such as AEB (Automatic Emergency Braking) or TJA (Traffic Jam Assist). The highway pilot is a Level 3 application, where the driver is still responsible and must be able to intervene at short notice.

Fig. 1: The evolution from driver assistance systems to automated driving

We assume that commercial vehicles with the city pilot function will not be available in volume production before 2025 due to the increasing level of complexity. Level 5, in other words fully autonomous vehicles, is a degree of automation that will only be achieved in the long term.

In the area of parking, the situation is similar. Initially, the driver is only assisted, although the vehicle may already be able to manoeuvre itself into a parking space. Renesas and Nissan recently announced their collaboration for ProPILOT Park, which is available in the latest Nissan Leaf. Valet parking at low speed corresponds to Level 4, where the driver no longer needs to control the car. It is interesting to see how the new technologies that make such a feature possible are being made accessible.

Highest expectations encounter low degree of maturity

The advancement of technologies and markets almost always follows a similar pattern. Initially, a new technology emerges as an innovation trigger that enables new applications. This is typically associated with high expectations on the customer side, which are in stark contrast to the maturity level of the technology. This is followed by a disillusionment phase, in which it will be decided whether a technology will be successful at all. After the disillusionment, a phase follows in which the technology not only shows what it can really do, but also achieves high growth rates. This is followed by the time when the technology is established (plateau of productivity) and shows low but stable growth rates (around 3 percent).

Fig. 2: Market adoption of new technologies

ADAS and AD technologies precisely follow this curve. Building on existing control technologies (e.g. steering control, drive control), GPS and HMI technologies, sensor technologies (radar, ultrasound, camera etc.) for Level 1 applications were developed years ago. These applications have already reached the plateau of productivity or are well on the way there. By 2015, the industry was concentrating on the development of technologies, for example cameras and radar with more intelligence at lower cost, which now have the highest growth rates. Vehicles with Level 3 applications will be available for volume production starting in 2019, and although Level 4 and 5 are currently the most talked about, the maturity of the technologies is still quite low. This also applies to deep neural networks (DNNs), which are already considered very promising for the next few years and are expected to be even more relevant for Level 4 and 5 applications. However, their efficacy still needs to be proven across all cases.

Computer vision is complemented by deep neural networks

To date, the industry has pursued the traditional computing approach. The data from different sensors, such as lidar, camera or radar, is processed individually by the sensor layer using hardware accelerators, producing individual object lists. The fusion layer creates a 360° environmental model from all surrounding sensors using grid-based or tracking-based algorithms. This model is then passed to the abstraction layer, where functions such as free space detection or time-to-collision run to determine the key parameters for the driving strategy. The subsequent application layer uses these parameters to perform route or motion planning and to generate the commands for the actuation layer. This traditional approach can be used to implement NCAP applications and applications with an automation level up to Level 3.
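As a rough illustration of the kind of function that runs in the abstraction layer, the following sketch derives a time-to-collision value from a fused object list. The `TrackedObject` type, its field names and the constant-closing-speed model are assumptions made for this example; they are not taken from any Renesas software.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackedObject:
    """Hypothetical entry of a fused object list (names are illustrative)."""
    distance_m: float          # longitudinal gap to the object
    closing_speed_mps: float   # positive when the gap is shrinking


def time_to_collision(obj: TrackedObject) -> float:
    """Constant-velocity TTC: remaining gap divided by closing speed."""
    if obj.closing_speed_mps <= 0.0:
        return math.inf  # the object is not approaching
    return obj.distance_m / obj.closing_speed_mps


def most_critical(objects):
    """Return the object with the smallest TTC, or None for an empty list."""
    return min(objects, key=time_to_collision, default=None)


objects = [
    TrackedObject(distance_m=40.0, closing_speed_mps=8.0),
    TrackedObject(distance_m=30.0, closing_speed_mps=-2.0),  # moving away
    TrackedObject(distance_m=12.0, closing_speed_mps=4.0),
]
critical = most_critical(objects)
print(critical, time_to_collision(critical))
```

A real abstraction layer would work on the full environmental model and account for acceleration and lateral motion; the sketch only shows how a single key parameter for the driving strategy can be derived from object-list data.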

Fig. 3: Computer vision complemented by deep learning

The revolutionary approach is to add deep learning (DL) as a redundant path alongside the traditional computing path described above. DL can perform tasks such as semantic segmentation, remapping (like SLAM: simultaneous localisation and mapping), data extraction and the determination of the driving strategy. There are two possible approaches: unsupervised DL (end to end) and supervised DL. The DL layer takes control of the vehicle and provides the driver with functions that allow more complex manoeuvres in traffic, while traditional CV methods are kept to monitor the decisions of the DL layer.

Deep learning outclasses traditional computer vision – at least sometimes

Since 2011, DL methods have clearly outclassed traditional CV methods in image-based classification in terms of accuracy, especially using DNNs and in particular convolutional neural networks (CNNs). Nevertheless, the question arises as to how far DNNs also bring advantages in the automotive world.

For that purpose, the KITTI Vision Benchmark can give a good indication of the state of the art for given classes of detection, by checking how many of the top 20 ranked algorithms use traditional CV-based or DL-based approaches.

  • KITTI Road Detection benchmark: around 80 percent of the top 20 algorithms are based on CNN methods
  • KITTI Automotive Stereovision benchmark: around half of the top 20 algorithms are based on CNN approaches
  • KITTI Automotive Optical Flow (which finds movement in the image): 80 to 90 percent of the top 20 algorithms remain with traditional CV methods.

CNN learning paradigm

DNNs have four crucial points: accuracy, network topology, data type, and the size of the data layer. These factors directly affect the inference part, which is the actual hardware accelerator. This hardware accelerator is also characterised by four parameters: performance, power consumption, memory bandwidth, and, of course, costs.

Fig. 4: DNN requirements in terms of hardware

In current state-of-the-art neural network design, there are two main paths going forward: increase the accuracy by increasing the compute performance, measured in MAC (multiply-accumulate) operations, or reduce the required performance at the same accuracy level. As a rule of thumb, increasing the accuracy by 5 percent currently requires increasing the performance by a factor of 10.

Another major research area for mitigating the performance increase is to compromise on the data type by adopting integer or even bit-level representations. The applicability of data type reduction strongly depends on the problem to be solved. Nevertheless, the current state of the art shows that 16-bit fixed point loses about 1 percent accuracy compared with 32-bit floating point.
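To make the idea of data type reduction more concrete, the following sketch quantises the operands of a small dot product to 16-bit fixed point and compares the result with the floating-point reference. The Q1.15 format (15 fractional bits), the helper names and the sample values are assumptions chosen for illustration. The sketch shows the numerical error of a single operation and does not reproduce the network-level accuracy figure quoted above.

```python
def to_fixed(x, frac_bits=15, total_bits=16):
    """Quantise a float to a signed fixed-point integer, saturating on overflow."""
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, round(x * scale)))


def fixed_dot(a, b, frac_bits=15):
    """Dot product on quantised operands using a wide integer accumulator."""
    acc = sum(to_fixed(x, frac_bits) * to_fixed(y, frac_bits)
              for x, y in zip(a, b))
    # Each product carries 2 * frac_bits fractional bits; rescale once at the end.
    return acc / float(1 << (2 * frac_bits))


weights = [0.25, -0.5, 0.125, 0.75]
inputs = [0.9, 0.1, -0.3, 0.4]

reference = sum(w * x for w, x in zip(weights, inputs))
approx = fixed_dot(weights, inputs)
print(f"float: {reference:.6f}  fixed: {approx:.6f}  "
      f"error: {abs(reference - approx):.2e}")
```

Keeping the accumulator wide and rescaling only once is a common design choice, because rounding after every multiplication would add error with each tap.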

Looking at the energy consumption in the inference layer, two factors are particularly costly: memory accesses and floating-point computations.

  • A 32-bit read access to DRAM consumes 640 pJ (picojoules), while an SRAM access needs 5 pJ
  • A 32-bit floating-point multiplication consumes 3.7 pJ, while an 8-bit integer multiplication requires only 0.2 pJ.

In order to achieve the lowest possible power consumption for embedded systems, the inference engines will specialize in integer computation (16-bit, possibly 8-bit considering higher loss in accuracy) and a memory-free architecture (minimising access to DDR and to local SRAM).
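The energy figures quoted above can be put into proportion with a few lines of arithmetic. The dictionary keys and the helper function below are illustrative names; the values are the ones given in the text.

```python
# Per-operation energy figures quoted in the text, in picojoules.
ENERGY_PJ = {
    "dram_read_32bit": 640.0,
    "sram_read_32bit": 5.0,
    "fp32_multiply": 3.7,
    "int8_multiply": 0.2,
}


def ratio(expensive, cheap):
    """How many times more energy the first operation needs than the second."""
    return ENERGY_PJ[expensive] / ENERGY_PJ[cheap]


dram_vs_sram = ratio("dram_read_32bit", "sram_read_32bit")
fp32_vs_int8 = ratio("fp32_multiply", "int8_multiply")
dram_vs_fp32 = ratio("dram_read_32bit", "fp32_multiply")

print(f"DRAM read vs SRAM read:      {dram_vs_sram:.0f}x")
print(f"FP32 multiply vs INT8:       {fp32_vs_int8:.1f}x")
print(f"DRAM read vs FP32 multiply:  {dram_vs_fp32:.0f}x")
```

On these numbers a single DRAM read costs more than a hundred floating-point multiplications, which suggests why avoiding external memory traffic matters at least as much as the choice of data type.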

CNN inference paradigm

Traditional computing architectures, such as CPU and GPU, are currently the mainstream for both learning and inference of CNNs, taking advantage of both their high performance and high flexibility. However, these are not effective – especially from a power consumption point of view.

For a 5×5 convolution filter, a total of 50 reads (data and operands), 25 MACs and one write-back are necessary. This means that three instructions are required per MAC, so the instruction efficiency is only around 30 percent. However, this is usually compensated by architectural improvements such as VLIW or superscalar execution.
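The instruction count for the 5×5 filter can be reproduced with a short calculation. The sketch assumes a simple load/store machine with one load per data value, one load per coefficient, one MAC per filter tap and a single write-back per output pixel; the function name is illustrative.

```python
def conv_instruction_mix(kernel_h, kernel_w):
    """Count operations for one output pixel of a kernel_h x kernel_w filter."""
    taps = kernel_h * kernel_w
    reads = 2 * taps   # one data value and one coefficient per tap
    macs = taps        # one multiply-accumulate per tap
    writes = 1         # single write-back of the result
    total = reads + macs + writes
    return {
        "reads": reads,
        "macs": macs,
        "writes": writes,
        "total": total,
        "instructions_per_mac": total / macs,
        "efficiency": macs / total,  # share of instructions doing useful MACs
    }


mix = conv_instruction_mix(5, 5)
print(mix["total"], "instructions,",
      f"{mix['instructions_per_mac']:.2f} per MAC,",
      f"efficiency {mix['efficiency']:.0%}")
```

For the 5×5 case this gives 76 instructions in total, or slightly more than three per MAC, which matches the efficiency of around 30 percent stated above.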

From the energy point of view, this leads to around 425 pJ for floating-point computation, of which 60 percent is due to the actual floating-point MAC operations, assuming the data is in local caches. Moving to 16-bit fixed-point integer, the energy consumption drops to 276 pJ, and only 10 percent of this is then due to the actual MAC operations. As a result, an optimised CNN architecture can provide an improvement of a factor of 20 compared to traditional CPU/GPU architectures.

In addition, future requirements will need significantly higher performance. As noted above, if the accuracy needs to be increased by 5 percent, then the performance of the DNN must increase by a factor of 10.

The number of sensors and the size of the input layer are also increasing. Today’s 1 megapixel (MP) sensors will be replaced by multiple multi-megapixel sensors (8 x 2 MP or 4 x 4 MP). In other words, the performance must be increased by a factor of 10 or more.

Today it is foreseeable that a performance of 4 TOPS (tera operations per second) will be required in 2019 (SoP: start of production). By 2022, 40 TOPS or even more may be required. In other words, an increase by a factor of 10 or more must be achieved.

Fig. 5: More performance with specialised hardware

Staying with traditional CPU/GPU architectures, the best factor that can be achieved is x2 within two years at the same power consumption, by switching to the next process technology. The missing factor of 5 (or more) would have to be achieved through higher power consumption, easily bringing it into the 50 W range and beyond. A new paradigm in computing architecture is required to keep embedded power within a controllable range, by providing more dedicated hardware IP. This would not be as flexible as a CPU/GPU but allows much higher computing efficiency.
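The gap described here follows directly from the figures in the text. The variable names below are illustrative, and the sketch follows the article's own simplification of comparing the 2019 and 2022 requirements against a single process-node step.

```python
tops_required_2019 = 4.0    # requirement at 2019 start of production
tops_required_2022 = 40.0   # lower bound quoted for 2022
process_node_gain = 2.0     # best case from the next process at equal power

required_gain = tops_required_2022 / tops_required_2019
remaining_gain = required_gain / process_node_gain

print(f"required performance gain:  x{required_gain:.0f}")
print(f"covered by process scaling: x{process_node_gain:.0f}")
print(f"left for architecture or additional power: x{remaining_gain:.0f}")
```

The remaining factor has to come either from a higher power budget or from a more efficient architecture, which is the argument for dedicated hardware IP.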

High performance/low power consumption – both are important

Accordingly, Renesas is pursuing a multi-faceted strategy for the R-Car hardware acceleration.

Renesas is initially focusing on architecture-level design to optimise the external memory bandwidth, as it has shown with the R-Car V3M solution. This enables seamless heterogeneous IP integration while meeting the safety requirements of ISO 26262.

The next element of the strategy is to keep improving the existing IP set, for example with the use of more cores, higher frequency and more specialised computing elements.

Revolutionary acceleration can be achieved by integrating dedicated hardware IPs based on the state of the art algorithms to achieve the best performance/power efficiency.

Fig. 6: Architecture improvements leading to new IPs

As a result, Renesas continues to rely on ARM cores for standard computing tasks and on the IMP-X5, a heterogeneous computing subsystem, for specialised processing. The IMP-X5 is composed of a hard-wired part (for the compute-intensive and established parts of the DNN algorithms) and a programmable architecture for future-proof, flexible use with the CV engine.

Fig. 7: Heterogeneous accelerators: efficiency + flexibility – a must-have for efficient performance

The IMP-X5 uses approaches from the IMP core, which the company launched in 2009 and which has been integrated in Renesas products for more than 10 years. One of the first applications for this hardware accelerator was a neural network. As early as 2009, Renesas was able to show that the core performs much better than a standard CPU: to process NN algorithms the CPU needed 204 ms, while the IMP core took only 8.9 ms, a speed-up of roughly 23 times.

The hard-wired IMP core (8-/16-bit integer) is a computing unit with pixel and line interconnects. Thanks to the memory-centric architecture, in which the data is essentially streamed (the pipeline is fed with one pixel per cycle and stores one pixel per cycle), memory accesses are minimised.
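The streaming principle can be illustrated in software with a generic line-buffer filter. This is a sketch of the general technique only and makes no claim about the internal design of the IMP core; the function name, the 3x3 window size and the test image are assumptions for the example. Pixels arrive one at a time in raster order, only the two previous image lines are buffered, and at most one result is produced per incoming pixel.

```python
from collections import deque


def stream_conv3x3(pixels, width, kernel):
    """Consume pixels in raster order; yield one result per complete 3x3 window.

    The kernel is applied without flipping (cross-correlation, as is usual
    for CNNs). Each input pixel is read exactly once from the source.
    """
    previous_lines = deque(maxlen=2)  # line buffers for rows y-2 and y-1
    current_line = []
    for pixel in pixels:
        current_line.append(pixel)
        x = len(current_line) - 1
        if len(previous_lines) == 2 and x >= 2:
            rows = (previous_lines[0], previous_lines[1], current_line)
            yield sum(
                kernel[ky][kx] * rows[ky][x - 2 + kx]
                for ky in range(3)
                for kx in range(3)
            )
        if len(current_line) == width:
            previous_lines.append(current_line)  # the oldest line drops out
            current_line = []


image = range(16)  # 4x4 test image in raster order, values 0..15
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(list(stream_conv3x3(image, width=4, kernel=box)))  # [45, 54, 81, 90]
```

Because the window is assembled from small local buffers, the source image never has to be re-read, which is the property that keeps memory accesses low in a streaming design.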

In the near future, the IMP core and the CV engine will be expanded by additional IPs, which enable, for example, a high-performance, energy-saving implementation of CNNs.

Renesas autonomy

Renesas autonomy provides an end-to-end platform for assisted and autonomous driving, covering requirements in the various market segments: radar, camera, cognitive and connectivity. All applications are based on different semiconductor platforms, but on the same software platform with AUTOSAR and various operating systems.

Fig. 8: Renesas autonomy – the platform for advanced driver assistance systems and automated driving

Renesas and its partners from the R-Car Consortium offer customers comprehensive development support. As a result, Renesas is able to deliver not only the hardware, but complete solutions, including drivers, hardware and software tools, operating systems, middleware, and application software. The goal is to enable developers to create new designs with the shortest time to market.

Renesas autonomy’s approach is to give developers freedom and support them with highly advanced solutions from Renesas and its partners. It has already achieved its first successes, confirming that the path of an open, innovative and secure platform is the right one.

About the author:

Cyril Cordoba is Manager Strategic Marketing, Renesas Global ADAS Center, Renesas Electronics.

