
Processing horsepower for the convergence of data streams
Multimedia navigation systems have been available in all categories of car for a few years now, as the boom in smartphones and mobile internet has led to a significant increase in customer demand for them. New assistance systems, such as traffic sign recognition and camera-based parking, have made their debut in the mass market. Self-driving cars may be a long way off yet, but the trend is obvious: the wealth of functions and computing power found in consumer electronics is moving into our cars.
One common issue facing automotive suppliers is that they need to plan a long way ahead. Unlike suppliers to the consumer technology market, they need to ensure long-term viability, as their products have long development times and a lifecycle of more than ten years. At the same time, the automotive market is subject to very high cost pressure. Electronic component developers usually deal with this by selecting the least expensive processor that just about meets the requirements of their system. To avoid having to redesign and redevelop the software from scratch for each system, they also need scalable application processors – that way, they can use a single processor family for a cost-effective entry-level system as well as a premium device.
The conditions under which application processors operate in a car also differ considerably from those in the mobile phone market. For example, automotive devices must be able to withstand temperatures of over 100°C and have a lifecycle of up to 15 years. High ambient temperatures have a negative effect on almost all system characteristics, and components wear out exponentially faster. Developers need to set electrical parameters to handle a temperature range of more than 150 kelvins – and that is no trivial challenge at GHz frequencies on the circuit board. By way of comparison, a typical smartphone is only designed for a range of about 0°C to 35°C.
Mastering thermal management for a multimedia or driver assistance system is a real challenge. With smartphone application processors, battery life is paramount – but with automotive devices, the focus is on the physical feasibility of integrating the highest computing performance into a car. Developers have a limited choice of where to mount the device, and space is tight: camera systems, for example, may be installed in the rear-view mirror. Fans are not practical either, as they struggle to meet the demands for a long lifecycle.
It is much easier for engineers to solve the thermal management issue if the device has low heat dissipation to start with. The core of the problem is the processor, because it can easily account for 20 to 50 per cent of total heat dissipation. This heat is concentrated on a few square centimetres of circuit board, which usually also carry other high-performance components such as the system memory. This is one reason why thermal aspects are taken into account during the circuit board design process. The device’s casing needs to conduct heat away and is often in direct thermal contact with the processor. All these points contribute to making the end system more expensive.
So what is the best way to optimise the processor’s power consumption? To answer this, we need to look at each of the causes of power dissipation separately. Every time a transistor switches, it charges or discharges a tiny capacitance, and the resulting electrical losses make up the dynamic power consumption. The level of these losses depends on the clock frequency being used: a lower clock frequency generates correspondingly lower losses. Although each new semiconductor generation has reduced the power consumption per transistor, the number of transistors used has increased, as has the clock speed.
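The dependence of dynamic losses on clock frequency can be made concrete with the standard first-order CMOS approximation P = a · C · V² · f. The article does not state this formula, and the activity factor, capacitance and voltage in the sketch below are hypothetical values chosen only for illustration.

```python
def dynamic_power(activity: float, capacitance_f: float,
                  voltage_v: float, freq_hz: float) -> float:
    """First-order CMOS switching power in watts: P = a * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


# Hypothetical values, not taken from any R-Car data sheet.
ACTIVITY = 0.2          # fraction of the capacitance switched per cycle
CAPACITANCE = 1e-9      # total switched capacitance in farads
VOLTAGE = 1.0           # supply voltage in volts

full_speed = dynamic_power(ACTIVITY, CAPACITANCE, VOLTAGE, 1.4e9)
half_speed = dynamic_power(ACTIVITY, CAPACITANCE, VOLTAGE, 0.7e9)

print(f"1.4 GHz: {full_speed:.2f} W, 0.7 GHz: {half_speed:.2f} W")
```

Under this approximation, halving the clock halves the dynamic losses. Because the voltage enters as a square, lowering it along with the clock would save more still.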
While processes with structure widths of about 90 nm and above essentially only cause dynamic losses, deep sub-micron processes also feature static losses – in other words, leakage current. This arises due to unavoidable tunnel effects, regardless of whether the transistor is switching or not. Leakage current is extremely temperature dependent – it doubles every 25 to 30 kelvins. So although leakage current in a smartphone at room temperature is still at a reasonable level, it can be ten times higher in an automotive environment and even exceed dynamic power consumption. A transistor needs to be designed for the maximum clock speed required, yet fast transistors also tend to have high levels of leakage current. That is why it is becoming standard practice in chip design to use libraries of transistors with varying speeds, where the fastest ones are only used if the design really requires them. Reducing the highest clock speed to what the application actually requires therefore plays a big part in reducing power consumption.
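The factor of ten follows from the doubling rule. As a rough sketch, assume the chip is 80 to 100 kelvins hotter in the car than in a phone at room temperature; that temperature rise is an assumption made here, not a figure from the text.

```python
def leakage_factor(delta_t_k: float, doubling_interval_k: float) -> float:
    """Relative leakage increase if leakage doubles every doubling_interval_k."""
    return 2.0 ** (delta_t_k / doubling_interval_k)


# Doubling intervals from the text (25 to 30 K); temperature rises are assumed.
for interval_k in (25.0, 30.0):
    for delta_t_k in (80.0, 100.0):
        factor = leakage_factor(delta_t_k, interval_k)
        print(f"+{delta_t_k:.0f} K, doubling every {interval_k:.0f} K: "
              f"x{factor:.1f}")
```

Across these four combinations the factor lies between about 6 and 16, which brackets the tenfold increase mentioned above.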
Renesas took all these issues into account when designing and developing the R-Car, its new, second-generation application processor family for automotive applications. The company made good use of its many years of chip design experience right from the outset. Renesas’ top priority for this product was to solve the performance vs. power conundrum – in other words, to double performance level while remaining true to its “Design for Power” strategy. Combining these two rather contradictory goals resulted in the need for a few optimisations.
In order to achieve the performance goals, Renesas developed the R-Car product using the latest 28nm process. This process enables the integration of roughly twice the number of transistors on the same chip surface compared to 40nm technology. Despite the smaller structure width, Renesas managed to keep leakage current at the same level while reducing dynamic losses by about 20 per cent per transistor.
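The "roughly twice" figure matches ideal geometric scaling, where transistor density grows with the square of the ratio of structure widths. The sketch below uses that idealisation (real processes deviate from it) and combines it with the 20 per cent saving per transistor quoted above. It assumes, purely for illustration, that all transistors switch as often as before.

```python
def density_gain(old_nm: float, new_nm: float) -> float:
    """Ideal transistor-density gain when shrinking from old_nm to new_nm."""
    return (old_nm / new_nm) ** 2


gain = density_gain(40.0, 28.0)

# 20 % lower dynamic loss per transistor (figure from the text), applied to
# the larger transistor count on the same chip area.
relative_dynamic_power = gain * (1.0 - 0.20)

print(f"density gain: x{gain:.2f}")
print(f"dynamic power on the same area: x{relative_dynamic_power:.2f}")
```

On these assumptions, a fully active chip of the same size would still dissipate noticeably more than its 40nm predecessor, so the process shrink alone does not resolve the power question.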
Implementing the huge processing power of over 25,000 Dhrystone MIPS was only possible with the integration of an ARM Cortex-A15 quad-core at 1.4 GHz. It is joined by a Cortex-A7 quad-core running at almost 800 MHz, which seamlessly takes over software tasks from its larger counterpart when the processing load is low. When the load is high, all eight cores are able to work together in parallel.
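The 25,000 DMIPS figure can be checked with a back-of-the-envelope calculation. The sketch assumes the commonly quoted Dhrystone ratings of about 3.5 DMIPS/MHz for the Cortex-A15 and 1.9 DMIPS/MHz for the Cortex-A7, and an A7 clock of 780 MHz. None of these three values is stated in the text, so they should be treated as assumptions.

```python
def cluster_dmips(cores: int, clock_mhz: float, dmips_per_mhz: float) -> float:
    """Aggregate Dhrystone MIPS of a cluster of identical cores."""
    return cores * clock_mhz * dmips_per_mhz


# Clock of the A15 cluster is from the text; all other ratings are assumptions.
a15 = cluster_dmips(cores=4, clock_mhz=1400.0, dmips_per_mhz=3.5)
a7 = cluster_dmips(cores=4, clock_mhz=780.0, dmips_per_mhz=1.9)
total = a15 + a7

print(f"A15 cluster: {a15:.0f} DMIPS")
print(f"A7 cluster:  {a7:.0f} DMIPS")
print(f"all eight cores: {total:.0f} DMIPS")
```

With these assumed ratings the eight cores together come out slightly above 25,000 DMIPS, and the A15 cluster contributes roughly three quarters of the total.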
This concept, which ARM calls “big.LITTLE”, makes it possible to turn off the large A15 cores completely, reducing their leakage current to zero. Renesas has refined this technique further in the R-Car, where up to 12 voltage domains can be controlled separately (see Fig. 1). In addition, the maximum clock speed of each of these voltage domains can adapt dynamically to the application’s requirements. If system activity is low, it would be possible to run just one A7 core at a further reduced clock speed and turn off the seven other processors, including the caches. This ticks all the boxes – providing plenty of performance while minimising power consumption and leakage.
Fig. 1 Software-controllable power domains
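As a purely illustrative sketch of how software might decide which cores to power, the following policy fills the small cores first and wakes the large ones only when demand exceeds their capacity. It is not the Renesas or ARM scheduler. The per-core DMIPS ratings are assumptions (780 MHz × 1.9 and 1.4 GHz × 3.5 DMIPS/MHz), and the function name `plan_cores` is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    cores: int
    dmips_per_core: float  # assumed rating, not a data-sheet value


LITTLE = Cluster("A7", cores=4, dmips_per_core=1482.0)
BIG = Cluster("A15", cores=4, dmips_per_core=4900.0)


def plan_cores(demand_dmips: float) -> dict:
    """Return how many cores to keep powered in each cluster."""
    little_capacity = LITTLE.cores * LITTLE.dmips_per_core
    if demand_dmips <= little_capacity:
        # Low load: the A15 domain stays off, at least one A7 core stays on.
        needed = max(1, math.ceil(demand_dmips / LITTLE.dmips_per_core))
        return {LITTLE.name: needed, BIG.name: 0}
    # High load: all A7 cores plus as many A15 cores as the remainder needs.
    remaining = demand_dmips - little_capacity
    needed_big = min(BIG.cores, math.ceil(remaining / BIG.dmips_per_core))
    return {LITTLE.name: LITTLE.cores, BIG.name: needed_big}


for demand in (500.0, 5000.0, 10000.0, 25000.0):
    print(f"{demand:>7.0f} DMIPS -> {plan_cores(demand)}")
```

A real governor would also weigh clock scaling, wake-up latency and the cost of migrating tasks between clusters. The sketch only shows how demand maps to power domains.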
Despite the high computing power of the CPU cores, there are still some tasks that need even more. Yet a further increase in clock speed would mean sacrificing low power consumption. The solution is to use dedicated hardware accelerators within the application processor. They have been developed to handle specific tasks and can get by with a fraction of the power. Renesas has implemented several of these accelerators in the R-Car. DSP cores handle audio processing, while accelerator IPs support video processing. Every R-Car includes one or more video decoders, which are used to relieve the CPU of HD video decoding tasks. When a user plays a video, seven of the eight CPU cores can be turned off. Image recognition for high-definition camera images is too demanding for a single CPU. This is why the R-Car includes a special image recognition processor – it combines the real-time functions required with low power consumption (see Fig. 2).
Fig. 2 R-Car H2 hardware accelerators
Of all the hardware accelerators, the 3D graphics processor is the largest in every respect. Graphics processors are often larger than many CPU cores nowadays and run at similar clock speeds. They can easily account for 30 per cent or more of the overall power consumption. Each new generation has about four times the 3D performance of the preceding one because of the growing use of high-definition screens. Choosing the right accelerator for the processor’s performance category makes a significant contribution to reducing power consumption. For the new generation of the R-Car family, Renesas is again using IPs from Imagination Technologies. The smallest product in the range, the R-Car E2, provides about the same performance as the R-Car M1 – the previous mid-range product – while the R-Car H2 delivers eight times as much, which is four times more than the previous generation, while maintaining full software compatibility. A key benefit here is that this increase in performance does not come at the cost of power consumption, which has hardly increased at all.
All these hardware accelerators share their memory with the processors. That enables fast data exchange between the various components while remaining cost-effective. This is known as Unified Memory Architecture (UMA), but its disadvantage is that the available memory bandwidth can turn into a bottleneck. An increase in the performance of the application processors goes hand in hand with an increase in the memory bandwidth required, which has to be delivered within tight constraints. Although other applications could get round this issue by simply using wider memory buses, that solution is not practical here. Developers would have trouble using memory buses wider than 64 bits due to the broad temperature requirements and the high level of cost pressure in the automotive market. Increasing the clock speed also has its limits, as the time for a single bit is less than 600 ps, while the signal delay on the circuit board is two to three times longer. This situation is improved by the use of a multi-level cache concept that avoids unnecessary memory access from the outset.
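To put the 600 ps figure in perspective, the time available for one bit on a data line is simply the inverse of the transfer rate. The sketch below converts between the two. The DDR3 speed grades listed are standard ones picked for illustration, and the text does not say which grade is meant.

```python
def bit_time_ps(transfers_per_s_millions: float) -> float:
    """Duration of one bit on a data line, in picoseconds, for a rate in MT/s."""
    return 1e6 / transfers_per_s_millions


def rate_mtps(bit_time_in_ps: float) -> float:
    """Transfer rate in MT/s that corresponds to a given bit time."""
    return 1e6 / bit_time_in_ps


# Standard DDR3 speed grades, chosen here only as examples.
for grade in (1066.0, 1333.0, 1600.0, 1866.0):
    print(f"DDR3-{grade:.0f}: {bit_time_ps(grade):.0f} ps per bit")

print(f"600 ps per bit corresponds to about {rate_mtps(600.0):.0f} MT/s")
```

A board delay of two to three times the bit time means that several bits are in flight on a trace at once. This is why signal timing, and not only the memory device, limits the clock speed.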
The R-Car family includes an integrated, scalable cache solution. Each CPU has the usual combination of data and instruction caches, and the A15 and A7 clusters each have their own L2 cache. This means that most data and instruction accesses can be kept away from the external memory. Without this type of caching, the eight cores would generate memory transfers of 60 GB/s, while with it this value can be reduced to a much more reasonable 3 GB/s. The R-Car H2 uses an additional system cache that is available to all hardware accelerators, including the image recognition processor and the audio DSPs. The 3D graphics accelerator uses tile-based rendering, which renders graphics in on-chip memory and minimises access to external memory. The external DDR3 SDRAM memory interfaces can be scaled from 16-bit to 64-bit, providing memory bandwidth between 2.7 GB/s and 12.8 GB/s, depending on the performance required and the available power dissipation budget (see Fig. 3).
Fig. 3 Cache architecture
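The memory figures above can be reproduced with two small calculations. Peak DDR bandwidth is the bus width in bytes multiplied by the transfer rate. The rates of 1333 and 1600 MT/s used below are assumptions chosen because they match the quoted 2.7 and 12.8 GB/s, and the text does not name the speed grades. The overall cache hit rate is likewise only what the 60 GB/s and 3 GB/s figures imply.

```python
def peak_bandwidth_gbs(bus_width_bits: int, rate_mtps: float) -> float:
    """Theoretical peak bandwidth in GB/s for a given bus width and rate."""
    return (bus_width_bits / 8) * rate_mtps * 1e6 / 1e9


def implied_hit_rate(uncached_gbs: float, external_gbs: float) -> float:
    """Fraction of traffic the caches must absorb to reach external_gbs."""
    return 1.0 - external_gbs / uncached_gbs


# Transfer rates are assumptions; the bus widths are from the text.
narrow = peak_bandwidth_gbs(16, 1333.0)
wide = peak_bandwidth_gbs(64, 1600.0)
hit_rate = implied_hit_rate(60.0, 3.0)

print(f"16-bit interface: {narrow:.1f} GB/s")
print(f"64-bit interface: {wide:.1f} GB/s")
print(f"implied overall cache hit rate: {hit_rate:.0%}")
```

Seen this way, the caches need to absorb about 95 per cent of the CPU traffic. The remaining 3 GB/s already exceeds the theoretical peak of the narrowest interface, which makes the scalable bus width a useful tuning parameter.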
With all these measures, Renesas has succeeded in developing a scalable application processor family (see Fig. 4) that enables customers to benefit from the performance of the fastest mobile processors available, in devices designed to cope with the challenges of the automotive field. It is now possible to achieve power consumption of under 5 W for typical navigation applications. With the integration of image recognition processors and several video interfaces, the R-Car is well equipped for upcoming driver assistance systems. Because the products in the R-Car range are compatible with one another, developers can choose the appropriate product for their application and simply upgrade if the application requires it.
Fig. 4 R-Car generation 2 – A scalable family
About the author:
Peter Fiedler is Manager for Automotive Information Systems in the MCU Marketing & Engineering division of the Automotive Business Group, Renesas Electronics Europe.
