Early and accurate power analysis: myth or reality?
Power is receiving a mounting share of attention. Innovation, fueled by the information and internet age, poses new challenges for electronic systems across a spectrum of applications. Mobile devices continue to break new frontiers of functional integration. Phones are now your email, social networking interface, video and music player, gaming device, camera, GPS, and more – all rolled into one. Yet the smart phone must survive through the day, and hopefully longer, without having to recharge the battery. Data centers and cloud computing grapple with power and carbon footprints as they move and process incredible amounts of data back and forth, consuming on the order of 1-2% of the electricity that the entire world consumes. Advances in fabrication technology have made it possible for processors and systems-on-chip (SoCs) to boast over three billion transistors, while also pushing the limits of power density, integrity and reliability.
To address these challenges, power-related design decisions are now being made throughout the product development cycle. Yet it is the early decisions that primarily govern the power and energy profile of a product. It is not surprising that key choices include how the design is partitioned into hardware versus software, the design architecture, and even how the software controls the hardware. Once the design architecture is locked in, even its most power-efficient implementation may lose out to an alternative, more power-efficient architecture, even if that alternative is implemented only half as well.
If the ability to influence power diminishes as you move down the levels of design abstraction, then the challenge at the higher levels of abstraction is predicting power accurately. It is unreasonable to expect the power numbers from a mostly untimed transaction-level design model to closely match numbers from a post-implementation design representation. At the same time, early power numbers must offer sufficient accuracy to evaluate design trade-offs relevant to the design abstraction level.
Models to the rescue
The semiconductor industry has long benefited from models that enable reasonable design cycles for the largest and most complex of designs. Functional, electrical, physical, and additional views are created for the building blocks relevant to different levels of design description. The models include enough information to enable decisions at each level while removing details that don’t apply, providing capacity while maintaining the required accuracy. Models have continued to evolve, keeping up with the demands of technology advancements such as shrinking geometries and breakthrough techniques including three-dimensional integrated circuits (3D-ICs).
Power is a significant design consideration with multiple facets encompassing power budgets, power distribution network and integrity, thermal, and reliability. Recently, there have been well-publicized standardization efforts involving the specification and verification of a design’s low-power intent through the Common Power Format (CPF) and Unified Power Format (UPF). However, standard models for analyzing power consumption are targeted at the stage when post-synthesis gate-level netlists are available. Typical models available today are not nearly sufficient to offer reliable accuracy for early analysis. In order to be sufficiently accurate, early analyses must account for and model the power-significant aspects of downstream design transformations and details. Let’s examine the challenges for digital hardware power analysis at the Register Transfer Level (RTL) design stage. But first, why RTL?
Early RTL power analysis
Traditionally, power is analyzed when a design netlist mapped to a particular technology is available. On one hand, modern flows no longer “design” by manipulating gates. On the other hand, early transaction-level power analysis can enable significant architectural power exploration benefits, but that methodology is still evolving for most mainstream users. RTL delivers the best trade-offs between accuracy and the ability to design for low-power. Here are some of the key advantages.
Performance and capacity: RTL power analysis can offer 10X or more productivity as compared to gate-level power analysis; power numbers are available within hours versus days or weeks, even for multi-million instance designs. Some of the performance benefit is a reflection of the fewer design elements that are required to be processed at RTL versus gates. More significantly though, it is from eliminating the overhead to generate the data necessary for gate-level power analysis. It can take over 10 days to go through the implementation flow for a typical microprocessor, versus measuring power at RTL within a couple of hours.
RTL analysis also enables exploration of the power savings achievable through low-power techniques such as power gating within a few hours, without investing effort in actual implementation. Designers often have an idea of a set of modifications they expect will lower power. RTL analysis allows them to perform a what-if qualification of power savings per individual RTL change, which is unrealistic for a gate-level flow where the viable option is to combine multiple changes per run, masking ineffective ones.
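The masking effect of combining changes can be illustrated with a minimal Python sketch. The change names and per-change power deltas below are hypothetical, and the sketch assumes for simplicity that the deltas add linearly, which real designs need not do.

```python
# Hypothetical per-change power deltas in mW (negative means a saving).
# In an RTL flow each value would come from its own quick analysis run.
changes = {
    "gate_fifo_clock": -4.0,
    "isolate_multiplier_operands": -2.5,
    "widen_prefetch_buffer": +0.5,
}

def qualify_individually(deltas):
    """Keep only the changes that save power when measured one at a time."""
    return {name: d for name, d in deltas.items() if d < 0}

def combined_delta(deltas):
    """A single combined run reports only the net effect (assumes deltas add)."""
    return sum(deltas.values())

effective = qualify_individually(changes)
net = combined_delta(changes)

print("effective changes:", sorted(effective))
print("net delta of combined run (mW):", net)
```

The combined run still shows a net saving, so the one change that raises power would go unnoticed unless each change is qualified separately.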
Accuracy: Because RTL is a cycle-accurate hardware design representation, reasonable accuracy can be expected at this level, which should reassure the skeptics. With the right tools and methodology, RTL, as the highest level of cycle-accurate hardware abstraction, can provide power accuracy within a 15% error bound of post-layout gate-level power analysis.
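One way to read the 15% figure is as a bound on relative error against the post-layout number. The Python sketch below checks that bound for a few blocks; the block names and milliwatt values are made up for illustration and do not come from any real design.

```python
def relative_error(rtl_mw, signoff_mw):
    """Relative deviation of the RTL estimate from the post-layout number."""
    return abs(rtl_mw - signoff_mw) / signoff_mw

def within_bound(rtl_mw, signoff_mw, bound=0.15):
    """True if the RTL estimate is within the given error bound."""
    return relative_error(rtl_mw, signoff_mw) <= bound

# Hypothetical (RTL estimate, post-layout) power pairs in mW.
blocks = {
    "cpu_core": (92.0, 100.0),
    "dsp": (41.0, 50.0),
}

for name, (rtl_mw, signoff_mw) in blocks.items():
    status = "ok" if within_bound(rtl_mw, signoff_mw) else "out of bound"
    print(f"{name}: {relative_error(rtl_mw, signoff_mw):.1%} {status}")
```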
Stimulus availability: Activity has a first-order impact on power and it is important to get the right switching scenarios. A design that consumes 1 mW in an idle mode of operation can easily consume 100 mW in the active mode. A key advantage at RTL is the practicality of extending existing functional test harnesses to generate power-related switching scenarios. Gate-level simulations are increasingly hard to bring up, sometimes possible only after tape-out.
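To make the first-order dependence on activity concrete, the sketch below uses the textbook switching-power relation P = alpha * C * V^2 * f. The capacitance, voltage, frequency, and activity values are illustrative assumptions chosen only to reproduce an idle-versus-active gap of the kind described above; they are not taken from a real design.

```python
def dynamic_power_w(activity, cap_farads, vdd_volts, freq_hz):
    """Textbook switching power: P = alpha * C * V^2 * f."""
    return activity * cap_farads * vdd_volts ** 2 * freq_hz

# Hypothetical block: 1 nF total switchable capacitance, 1.0 V, 500 MHz.
CAP_F = 1e-9
VDD_V = 1.0
FREQ_HZ = 500e6

idle_w = dynamic_power_w(0.002, CAP_F, VDD_V, FREQ_HZ)   # clocks mostly gated
active_w = dynamic_power_w(0.2, CAP_F, VDD_V, FREQ_HZ)   # datapath busy

print(f"idle:   {idle_w * 1e3:.1f} mW")
print(f"active: {active_w * 1e3:.1f} mW")
```

Because power scales linearly with the activity factor in this model, a stimulus that misrepresents switching by a factor of 100 misrepresents dynamic power by the same factor.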
Debug ability: Most importantly, RTL enhances the visibility to identify and fix power hotspots and “power bugs” at the micro-architectural level. RTL power analysis allows designers to view power as a function of their native design description language. Consider a block that does not register incoming data but continues to toggle adders and multipliers when the clock is turned off. This phenomenon is straightforward to spot in an RTL analysis tool, which retains the adders and multipliers as a function associated with the corresponding lines of RTL code, in addition to providing the visibility to trace upstream activity. In contrast, a gate-level netlist transforms a multiplier into hundreds of gates, thereby losing the high-level design view.
Factors affecting RTL power accuracy
Achieving reasonable power accuracy with data available during the RTL design stage has multiple challenges. Beyond the availability of well-characterized power models in the technology library, RTL power analysis must accurately model:
- the mapping of the RTL behavioral description to the target technology, frequency and design application, without losing capacity and visibility;
- low-power techniques such as clock-gating, power-gating and multiple supplies, dynamic voltage and frequency scaling, and multiple threshold voltage cells;
- power aspects of physical implementation tools and methodologies such as parasitics and clock distribution tree;
- all components of SoCs including clocks, high fan-out signal nets, datapath, and control logic – in addition to macros such as I/Os and memories, and hard IP.
Without modeling these effects accurately, the power numbers can potentially be unreliable versus the post-layout representation. Shrinking geometries and gigahertz speeds will make some of the above considerations even more critical to model, with parasitic capacitance, clock tree, and cell mapping rising to the top of the list.
Switched capacitance has a first-order impact on power. However, statistical wire load models, which provide a coarse view of capacitance as a function of the logical fan-out of a net, have outlived their utility. Custom wire load models provide an additional level of detail, but are not available early and are still dependent on the simplistic capacitance-per-fan-out model. This graph represents the parasitic capacitance profile for nets grouped by the driven fan-out loads for a video encoder implemented in 28-nanometer (nm) technology. Notice the up to 3X variation between the minimum and maximum capacitance for nets with the same fan-out, and the lack of a consistent increase in capacitance with increased fan-out load, underscoring the need for a better model.
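The kind of profile described above can be computed by grouping extracted net capacitances by fan-out and comparing the extremes within each group. The Python sketch below does this for a handful of invented data points; the capacitance values are hypothetical and are not the video encoder data.

```python
from collections import defaultdict

def spread_by_fanout(nets):
    """Group (fanout, cap_fF) pairs; return {fanout: (min, max, max/min)}."""
    groups = defaultdict(list)
    for fanout, cap_ff in nets:
        groups[fanout].append(cap_ff)
    return {
        fanout: (min(caps), max(caps), max(caps) / min(caps))
        for fanout, caps in sorted(groups.items())
    }

# Hypothetical extracted net capacitances in femtofarads.
nets = [(1, 2.0), (1, 6.0), (2, 3.0), (2, 4.5), (4, 2.5), (4, 7.5)]
spread = spread_by_fanout(nets)

for fanout, (lo, hi, ratio) in spread.items():
    print(f"fan-out {fanout}: min {lo} fF, max {hi} fF, ratio {ratio:.1f}x")
```

In this toy data set a fan-out-4 net can have less capacitance than a fan-out-2 net, which a single capacitance-per-fan-out number cannot represent.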
Clock trees do not exist at RTL, but without modeling these critical, and often highest-frequency, signals, the power accuracy will be highly suspect. Yet a standardized model for clock trees at RTL or above is lacking today. A large percentage of the deviation of RTL power from sign-off is due to approximating the clock distribution network topology chosen during clock tree synthesis and optimization. This imposes a requirement on RTL power analysis to model the clock buffers, clock tree topology, and clock-gating methodology.
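As a rough illustration of what modeling a clock tree before it exists can involve, the sketch below counts the buffers of an idealized balanced tree and converts the resulting capacitance to power. It is a deliberately crude assumption-based model, not the approach of any particular tool: the fan-out limit, per-buffer and per-sink capacitances, and the single `enabled_fraction` knob (which treats clock gating as if applied at the root) are all hypothetical.

```python
import math

def clock_tree_buffers(num_sinks, max_fanout):
    """Count buffers in a balanced tree where each buffer drives at most
    max_fanout loads, building levels from the sinks up to a single root."""
    total = 0
    loads = num_sinks
    while loads > 1:
        loads = math.ceil(loads / max_fanout)
        total += loads
    return total

def clock_power_w(num_sinks, max_fanout, buf_cap_f, sink_cap_f,
                  vdd_volts, freq_hz, enabled_fraction=1.0):
    """Clock nets charge once per cycle, so P = C * V^2 * f, scaled by the
    fraction of time the clock is enabled (crude root-level gating)."""
    buffers = clock_tree_buffers(num_sinks, max_fanout)
    cap = buffers * buf_cap_f + num_sinks * sink_cap_f
    return enabled_fraction * cap * vdd_volts ** 2 * freq_hz

# Hypothetical block: 10,000 flops, fan-out limit 16, 0.9 V, 1 GHz.
full_w = clock_power_w(10_000, 16, 5e-15, 1e-15, 0.9, 1e9)
gated_w = clock_power_w(10_000, 16, 5e-15, 1e-15, 0.9, 1e9,
                        enabled_fraction=0.5)
print(clock_tree_buffers(10_000, 16), "buffers")
print(f"ungated: {full_w * 1e3:.2f} mW, 50% gated: {gated_w * 1e3:.2f} mW")
```

A real clock tree is rarely balanced and gating is applied at multiple levels, which is why the topology and gating methodology have to be modeled rather than assumed.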
For smaller geometries, leakage power is a large portion of total power and requires reasonable RTL behavioral-to-structural mapping in order to track synthesis. The RTL mapping must be sufficiently accurate to target a particular design intent (high-performance, low-power), timing and power constraints, project-specific tools, methodology, and low-power techniques. At the same time, it must retain the high-level functional view; adders must be retained as adders and not as hundreds of standard cell gates.
A potential approach that can address some of the previously mentioned challenges is to abstract models for RTL power analysis by characterizing physical layout data. Augmenting standard models by calibrating design transformations through the implementation flow can deliver consistent power analysis accuracy, from RTL all the way through the pre-route gate-level design abstraction. A model representing a block of a particular application and technology can be generated once and applied to other design blocks. The RTL power models can also enhance the power integrity flow by enabling early chip-package co-design decisions, leveraging RTL vector availability to focus on power-critical switching scenarios rather than relying on estimates and spreadsheets.
Early accurate power analysis is a reality
In designing a product where power is important, the benefits of early analysis are obvious, as the architectural decisions will primarily determine the energy and power profile of a design. Such power-related design decisions require a reliable analysis early in the design flow, even with limited data. RTL power analysis, as the highest level of cycle-accurate hardware design description, offers the right trade-offs between accuracy and the ability to design for lower power. Using new robust models, RTL power analysis can deliver the necessary predictability and consistency with final implementation, enabling power-efficient electronic products.
Preeti Gupta, Director RTL Product Management, Apache Design