Managing power in embedded applications using dual operating systems
In this product how-to article, TI’s Loc Truong describes how to use inter-processor communication and state machine design to reduce overall system power in a heterogeneous dual-core system based on the company’s OMAP-L138 C6-Integra DSP + ARM processor running Linux alongside its in-house DSP/BIOS RTOS.
Energy consumption is becoming more of a concern as it accounts for an increasingly large share of overall operating costs. Imagine superstores with row after row of checkout lanes, each with a cash register, a credit-card reader, a scanner and a weighing station.
It is a waste if this equipment is not designed to be energy efficient, with the ability to power down between customers or during non-operating hours. When multiplied by the number of stores, the number of cities and the operating life of the product, the total accumulated portion of the energy bill that could be saved is in the millions of dollars.
Many of today’s operating systems, like Linux, come with power management support. These features have been available in the mainline kernel since Linux made headway into low-power portable devices like smart phones, tablets and ebook readers. So even though your design is a plugged-in appliance, you can embrace the “go green” initiative from the ground up by incorporating the power management features that are already in place.
In this article I will first review power savings techniques available with today’s powered (i.e. plugged-in) system-on-chip (SoC)-based embedded systems and quickly move on to the discussion of how two operating systems (OSes), each with its own power methodologies, can cooperate at the system level to provide power management services.
Chip and system hardware issues
There are two different components to the power equation from a silicon process standpoint: static (sometimes referred to as standby) and active. Static power is mainly caused by leakage and increases with temperature and supply voltage. Since leakage is a natural phenomenon that comes with shrinking process technology, the only way to really eliminate it is to shut that component down. Within the SoC, tactics employed so far include power islands, which enable part of the SoC to shut down completely.
Active power, on the other hand, depends on chip activity; it increases with supply voltage but not with temperature. Strategies here include:
1 – Dynamic voltage and frequency scaling (DVFS), where the voltage and frequency can be dynamically adjusted to adapt to the performance required
2 – Clock domains, where the clocks to unused peripherals can be gated off
3 – Dynamic power switching (DPS), where software can switch between power modes based on system activity. The “software” is usually part of the operating system
4 – Adaptive voltage scaling (AVS), a closed-loop, hardware and software cooperative strategy to maintain performance while using the minimum voltage needed based on the silicon process and temperature
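As a rough illustration of why DVFS pays off, the sketch below uses the common first-order CMOS approximation that switching power scales with C·V²·f. The capacitance value and the two operating points are made-up numbers chosen for illustration; they are not OMAP-L138 data, and the function name is hypothetical.

```python
def active_power_watts(c_eff_farads, voltage, freq_hz, activity=1.0):
    """First-order CMOS switching power: P = activity * C * V^2 * f."""
    return activity * c_eff_farads * voltage ** 2 * freq_hz


# Hypothetical operating points (illustrative values only).
C_EFF = 1.0e-9  # effective switched capacitance in farads (assumed)
full_speed = active_power_watts(C_EFF, 1.2, 300e6)   # 1.2 V at 300 MHz
scaled_down = active_power_watts(C_EFF, 1.0, 100e6)  # 1.0 V at 100 MHz

savings = 1.0 - scaled_down / full_speed
print(f"full speed:  {full_speed:.3f} W")
print(f"scaled down: {scaled_down:.3f} W")
print(f"active power saved: {savings:.0%}")
```

Because voltage enters as a square, lowering voltage together with frequency saves more than lowering frequency alone, which is the motivation for scaling both.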
From the system standpoint, operations needed for power management include the ability to:
1 – Go to standby (user-application- or system-initiated system service)
2 – Hibernate to memory or storage (user-application- or system-initiated system service)
3 – Suspend and resume (user-application-initiated system service)
4 – Transition to different power profiles (user application condition or state, system initiated and controlled)
Power can also be affected by how the application code is designed. For example, input/output (I/O) buffers at the pin, memory controllers and especially double data rate (DDR) memory interfaces need to drive current. Unnecessarily moving data in and out of the SoC can waste energy.
Let’s take a look at the block diagram of a typical modern embedded system as shown in Figure 1 below. The processor is highly integrated and includes several types of processors and accelerators for application-specific needs as well as all the I/O peripherals to get the data in and out.
The system board has external voltage regulators for the different power rails in addition to battery and clock management support integrated circuits (ICs). It also contains external I/O modules and hot swappable devices.
To save energy, the application can take advantage of the internal memories by aligning code and data. In this way, algorithms in the pipeline can reuse buffers locally so that the I/O buffers at the pin level do not have to toggle needlessly.
Other techniques include matching data types to the architecture, aligning data correctly and using powers of two for array sizes to simplify address calculation. These techniques can help reduce power consumption because the lower MIPS required can lower the temperature. Some call this “energy coding,” the third optimization vector besides speed and code size.
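To make the power-of-two point concrete, here is a minimal sketch of a circular-buffer index wrap. With a power-of-two size, the wrap reduces to a single bitwise AND instead of a modulo operation. The buffer size and function names are assumptions for illustration, and the real benefit shows up in compiled code on the target, not in Python itself.

```python
BUFFER_SIZE = 256        # power of two (assumed size for illustration)
MASK = BUFFER_SIZE - 1   # 0xFF: all low bits set


def is_power_of_two(n):
    """True when exactly one bit is set in n."""
    return n > 0 and (n & (n - 1)) == 0


def wrap_modulo(index):
    """General wrap: works for any size but needs a division."""
    return index % BUFFER_SIZE


def wrap_mask(index):
    """Power-of-two wrap: a single AND, valid for non-negative indices."""
    return index & MASK


print(is_power_of_two(BUFFER_SIZE), wrap_modulo(300), wrap_mask(300))
```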
Figure 1: Modern embedded system using complex SoC.
Power management is a concerted effort. Actions such as going into standby mode can involve a series of hardware and software steps. Therefore, to really “do the job right,” power management needs to be a system-level (i.e. where hardware meets software) design goal, especially if the processor is a complex SoC with multiple internal bus masters.
For example, for the “suspend” operation, the software has to take the hardware through the following actions:
1 – Notify drivers and pending tasks that the system is powering down
2 – Wait for the safe state to start the shutdown sequence
3 – Turn off I/Os and accelerators by gating power or clocks
4 – Save system state to memory (shown as mobile mDDR)
5 – Adjust voltage regulators to throttle down
6 – Set up battery management for suspend
7 – Transition clocking to a suspend state (usually involving just the real-time clock and mDDR running)
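The ordering of these actions matters, so one way to think about the sequence is as a list of steps that run strictly in order and stop at the first failure. The sketch below models that idea only; the step names and the `run_suspend` helper are hypothetical and do not correspond to any actual Linux or DSP/BIOS API.

```python
from enum import Enum, auto


class SuspendStep(Enum):
    """Suspend actions, declared in the order they must run."""
    NOTIFY_DRIVERS = auto()
    WAIT_SAFE_STATE = auto()
    GATE_IO_AND_ACCELERATORS = auto()
    SAVE_STATE_TO_MEMORY = auto()
    THROTTLE_REGULATORS = auto()
    SETUP_BATTERY_MANAGEMENT = auto()
    SWITCH_TO_SUSPEND_CLOCKS = auto()


def run_suspend(handlers):
    """Run each step in order; stop at the first handler that fails.

    Returns (completed_steps, failed_step), where failed_step is None
    when the whole sequence succeeded.
    """
    completed = []
    for step in SuspendStep:  # Enum iteration follows declaration order
        if not handlers[step]():
            return completed, step
        completed.append(step)
    return completed, None


# Placeholder handlers that always succeed (real ones would touch hardware).
handlers = {step: (lambda: True) for step in SuspendStep}
completed, failed = run_suspend(handlers)
print([s.name for s in completed], failed)
```

Returning the list of completed steps gives the caller what it needs to undo them in reverse order if a later step fails.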
To get into the details of how power management is implemented, we now need to move our discussion to a real device and real software.
Power management with Linux & DSP/BIOS on real hardware
On SoCs, there are often two on-chip processors: a general purpose processor (GPP) such as an ARM® core as well as a specialized core such as a digital signal processor (DSP) or graphics processing unit (GPU).
The ARM usually runs embedded Linux for I/O and graphical user interface tasks. A processor focused on signal processing needs a more deterministic and lightweight operating system such as the DSP/BIOS software kernel foundation to perform signal processing.
The chip is highly integrated with multiple I/O peripherals like Ethernet, USB, SATA, an LCD controller and more, as you can see in the following block diagram in Figure 2, below.
Each processor has equal access to the on-chip peripherals, enabling I/Os to be distributed between the two depending on system response time requirements.
For support of embedded systems requirements, the Linux kernel version 2.6 implements power management using a network of frameworks and drivers.
Figure 2: Power management implementation for embedded Linux
To make sense of this, keep in mind the key functions (suspend, resume, idle and DVFS) and the mechanisms that implement them: control of the central processing unit (CPU) through CPUIdle for CPU sleep states, control of the clocks through the clock framework and the tickless option, control of the voltage regulators through the regulator framework, and the helper drivers (mainly I2C and SPI).
Specific power targets are defined by operating performance points (OPP), and governors manage how to transition between these OPPs. Recently, the concept of Power Management Quality of Service (PM QoS) was introduced to tie computing resources and capabilities in the hardware with latencies and throughput needs to define the minimum OPP required across platforms.
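The relationship between OPPs and a governor can be sketched as a table lookup: given the current load, pick the lowest operating point that still leaves some headroom. This is a simplified policy written for illustration, not the algorithm used by any actual Linux governor, and the OPP table values, the `Opp` type and the `pick_opp` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opp:
    freq_mhz: int
    volt_mv: int


# Hypothetical OPP table (illustrative values, not from a datasheet).
OPP_TABLE = [Opp(100, 1000), Opp(200, 1100), Opp(300, 1200)]


def pick_opp(load_pct, current, table=OPP_TABLE, headroom_pct=80):
    """Return the lowest OPP that keeps utilization under headroom_pct.

    load_pct is the measured CPU load at the current OPP.
    """
    required_mhz = current.freq_mhz * load_pct / 100
    for opp in sorted(table, key=lambda o: o.freq_mhz):
        if required_mhz <= opp.freq_mhz * headroom_pct / 100:
            return opp
    # Nothing leaves enough headroom: run as fast as possible.
    return max(table, key=lambda o: o.freq_mhz)


top = OPP_TABLE[-1]
for load in (20, 50, 95):
    print(load, pick_opp(load, top))
```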
Going into the details of the application programming interfaces (APIs) and data structures is beyond the scope of this article, but to help clarify the work needed, Figure 2 shows how power management is currently implemented for TI’s OMAP-L138 C6-Integra DSP + ARM processor evaluation module (EVM) via the embedded Linux port.
At the application level, power management policies like OPPs can be accessed via the sysfs interface. In the kernel space, the frameworks with their governors and drivers have Linux generic portions as well as platform (SoC)- and board-specific driver portions that can be customized.
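As an example of the sysfs route, the sketch below reads a few of the standard Linux cpufreq attributes from user space. Whether these files are present depends on the kernel configuration and the platform's cpufreq driver, so the code treats a missing file as "not available" rather than an error. The helper names are hypothetical.

```python
from pathlib import Path

# Standard cpufreq location for CPU 0; present only if cpufreq is enabled.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_attr(name, base=CPUFREQ_DIR):
    """Return the stripped contents of a sysfs attribute, or None."""
    try:
        return (Path(base) / name).read_text().strip()
    except OSError:
        return None


def cpufreq_summary(base=CPUFREQ_DIR):
    """Collect the active governor and the frequencies on offer."""
    freqs = read_attr("scaling_available_frequencies", base)
    return {
        "governor": read_attr("scaling_governor", base),
        "current_khz": read_attr("scaling_cur_freq", base),
        "available_khz": freqs.split() if freqs else [],
    }


print(cpufreq_summary())
```

Passing a different `base` directory makes the same functions usable against a test fixture on a development host.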
On the DSP/BIOS operating system, power management services are consolidated under the PWRM (power manager) framework. Similar to the Linux side, PWRM abstracts the low-level work needed to gate clocks and clock domains on/off, control DSP sleep modes and coordinate with the internal resources to govern the safe switching between OPPs. PWRM sits on top of the Power Scaling Library (PSCL) and a Power Management Interface (PMI) as in Figure 3 below.
Figure 3: DSP/BIOS power management framework.
To coordinate between the Linux and DSP/BIOS environments, a hardware mechanism is needed to communicate. Let’s look again at TI’s OMAP-L138 C6-Integra DSP + ARM processor for an example of how inter-processor communication works.
The inter-processor communication (IPC) hardware mechanism for the OMAP-L138 C6-Integra DSP + ARM processor is very straightforward, as shown in Figure 4 below.
At the SoC level, five CHIPINT bits are allocated in the CHIPSIG memory-mapped register (MMR) located in the SYSCFG system configuration module for the signaling between the DSP and ARM.
Figure 4: OMAP-L138 C6-Integra DSP + ARM processor inter-processor communication (IPC) mechanism
Up to four signals can be mapped for the DSP to notify the ARM and two for the ARM to notify the DSP, with an additional signal just for the DSP non-maskable interrupt (NMI) event. Note that two bits are common to both the DSP and ARM so that they can both be interrupted at the same time.
This is a useful feature for debugging purposes. Writing a “one” to the bit will generate the signal or event. These events are fed to the respective interrupt controller (INTC) to get mapped to the core interrupt inputs.
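The signaling scheme can be modeled in a few lines. The sketch below is a host-side simulation, not driver code: the specific bit assignment (bits 0 to 3 routed to the ARM, bits 2 and 3 also routed to the DSP, bit 4 as the DSP NMI) is an assumption chosen to match the description above, and the class and method names are hypothetical. Consult the device's reference manual for the actual register layout and clearing mechanism.

```python
# Assumed routing masks (illustrative; verify against the device manual).
ARM_BITS = 0b01111     # bits 0-3 can interrupt the ARM
DSP_BITS = 0b01100     # bits 2-3 can also interrupt the DSP
DSP_NMI_BIT = 0b10000  # bit 4 raises the DSP non-maskable interrupt


class ChipSig:
    """Toy model of a five-bit inter-processor signaling register."""

    def __init__(self):
        self.value = 0

    @staticmethod
    def targets(bit):
        """List which interrupt inputs a given bit is routed to."""
        mask = 1 << bit
        hit = []
        if mask & ARM_BITS:
            hit.append("ARM")
        if mask & DSP_BITS:
            hit.append("DSP")
        if mask & DSP_NMI_BIT:
            hit.append("DSP_NMI")
        return hit

    def raise_signal(self, bit):
        """Writing a one to a bit generates the corresponding event."""
        if not 0 <= bit <= 4:
            raise ValueError("only bits 0-4 are defined in this model")
        self.value |= 1 << bit
        return self.targets(bit)

    def acknowledge(self, bit):
        """Clear a pending bit once the receiver has handled it."""
        self.value &= ~(1 << bit)


sig = ChipSig()
print(sig.raise_signal(0), sig.raise_signal(2), sig.raise_signal(4))
```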
To pass data between the processors, any of the internal or external memory areas can be used as shared memory area(s). Mutual exclusion can be controlled using the mutex or semaphore mechanisms provided with the operating system.
The SoC provides a system-level memory protection unit (MPU) that can protect a memory region from being overwritten by internal bus masters like the ARM or DSP cores or the DMAs. This feature can be useful during development to debug the IPC software mechanism or detect ill-behaved programs or memory leaks.
For Linux and DSP/BIOS, the IPC is abstracted by a software component called DSPLink. It consists of several modules to provide DSP control and code loading, buffer passing and control, and message passing and control. On the Linux side, it is a kernel mode driver. On the DSP/BIOS side, it is a regular driver that can be called at the task level.
For applications that use the DSP as an accelerator, DSPLink PROC functions can be used to shut down the DSP if the application no longer needs its service.
This is adequate for most embedded systems where the ARM is running Linux as the master processor and it’s time to disable I/Os and accelerators to go into standby. Disabling the DSP will enable more than 91 percent power savings just on the processor alone, as shown in Figure 5 below.
For other states of DSP idle, power savings can be realized by expanding the Linux kernel space platform drivers or by creating a user space proxy that uses DSPLink as the message transport to communicate with the DSP/BIOS side application, which in turn sends requests to PWRM.
Figure 5: Up to 90 percent power reduction can be achieved with existing PM services
For example, to get to an OPP that only idles the DSP (90 percent power savings), add a service to the Linux suspend framework driver that will send a message to the DSP to initiate a PWRM_sleepDSP operation.
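The message flow in this example can be sketched as a small state machine on the DSP side driven by messages from the ARM side. The code below is a host-side model only: a `queue.Queue` stands in for the DSPLink message transport, a callback stands in for the PWRM sleep request, and the message IDs, class and function names are all hypothetical.

```python
import queue

# Hypothetical message IDs; a real design would define its own protocol.
MSG_SLEEP_DSP = 1
MSG_WAKE_DSP = 2


class DspPowerStateMachine:
    """DSP-side handler with two states: RUNNING and SLEEPING."""

    def __init__(self, sleep_fn):
        self.state = "RUNNING"
        self._sleep_fn = sleep_fn  # stand-in for the PWRM sleep request

    def handle(self, msg):
        if msg == MSG_SLEEP_DSP and self.state == "RUNNING":
            self._sleep_fn()
            self.state = "SLEEPING"
        elif msg == MSG_WAKE_DSP and self.state == "SLEEPING":
            self.state = "RUNNING"
        # Any other message/state combination is ignored.
        return self.state


def arm_suspend_hook(transport):
    """ARM-side service: ask the DSP to idle before Linux suspends."""
    transport.put(MSG_SLEEP_DSP)


def dsp_drain(transport, machine):
    """Process every pending message on the DSP side."""
    while not transport.empty():
        machine.handle(transport.get())
    return machine.state


sleep_calls = []
transport = queue.Queue()  # stand-in for the DSPLink message transport
dsp = DspPowerStateMachine(lambda: sleep_calls.append("sleep requested"))

arm_suspend_hook(transport)
print(dsp_drain(transport, dsp))
```

Keeping the state on the DSP side means a duplicate sleep request is ignored instead of triggering a second sleep call.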
Conclusion
With power management services becoming available in the mainline Linux kernel, it is possible to achieve substantial power savings by just “turning them on.” I hope that this article will encourage you to look into the hardware features of your platform to see if it is capable of reducing operating power to embrace the “go green” initiative.
Loc Truong is a technologist and a senior member of Texas Instruments’ C6000 digital signal processor (DSP) technical staff. He is currently leading an effort to identify system solutions for TI’s portfolio of high-performance single-core, multicore and open source C6000 DSPs.
Truong has long been involved with embedded microprocessor- and DSP-based system design. His past experience includes sales and marketing applications management as well as development engineering management. Truong has also authored and presented many papers related to embedded systems design, signal processing as well as embedded Linux and multicore programming. He is the holder of several US patents.