Efficient physical-aware timing ECO solution
Today’s challenges
Contemporary IC designs have advanced quickly from 65 and 45 nanometers down to 28, 20, and below. This progression to ever smaller geometries has made it significantly harder to achieve timing closure in time to meet production deadlines and market windows. Engineering teams often struggle to perform late-stage engineering change orders (ECOs) efficiently enough to satisfy their design objectives. Several problems combine to make this a particularly difficult task:
Poor convergence
A major obstacle to closing timing is that the signoff timing analyzer and the ECO tool use different timing engines. This practically guarantees timing differences between the two, making convergence difficult. To compound the problem, many commercially available ECO tools do not consider physical information when determining buffer insertions and routing changes. As a result, a modified circuit submitted to signoff STA may solve a problem in one area but introduce a problem somewhere else.
Too many iterations
Poor convergence forces engineers to make many iterative passes through the ECO process to achieve timing closure. These iterations are not only time-consuming, but may not progress steadily toward a final solution.
Difficult physical issues
Complex chips are typically quite dense, making late-stage routing changes extremely difficult. Moreover, many chips contain several power domains. Specific buffers are needed for routes to traverse alternate power domains, further complicating the routing issue.
Excessive buffer insertion
Power is always a consideration in chip designs. Excessive buffer insertion can increase a chip’s power consumption, making it less competitive or causing it to miss its power goals completely.
Figure 1: Conventional timing ECO flow
As a result, the only efficient way to solve these problems is to consider all of these issues simultaneously.
Addressing the issues
ICScape has addressed these issues with its TimingExplorer tool, which provides an efficient and practical solution for achieving timing closure.
Faster convergence
Converging on a final solution is problematic for many ECO tools. When the signoff STA reports timing violations, an ECO tool is used to create fixes and export them back to the place and route routines. However, this flow suffers from various correlation-related problems that require many iterations to arrive at a final solution, often ten or more passes through the flow. As such, ECOs become the gating item in the design tape-out process.
Timing and layout correlation: There are two things that affect correlation: the timing engine, and layout considerations. When calculating a timing correction, an ECO tool typically works with partial information such as a violation report or a partial timing graph. Working from this kind of partial information often fixes one problem but introduces another. And the fact that the ECO tool has its own timing engine makes it almost impossible to match the results of the original STA tool.
Layout considerations are another aspect that must be accounted for. ECO changes based on the netlist alone will introduce subtle differences in the layout. These layout changes can have an unpredictable impact on timing, producing a correlation issue.
Types of ECO tools: There are three approaches to providing a timing ECO solution. One of the most common is a script applied by the user. This may use crude delay estimations based on a violation report or a partial STA graph, and look for easy ways to fix timing violations. But it’s not feasible to predict layout changes after place & route, so this approach suffers from both timing engine and layout correlation issues.
The second type of ECO solution is an optimizer built on top of the STA tool. But the STA engine does not have physical information, so this approach suffers from a lack of layout correlation.
The third type of tool is built upon place & route software. This approach accounts for the layout, but since it must perform its own timing calculation, it still has trouble matching the STA engine. Often large timing margins are used to compensate for this. But that can lead to too many buffers being inserted, which increases routing congestion and impacts power consumption.
Figure 2: Driving down the number of violations in a typical ECO flow
Figure 2 shows how it can be hard to drive down the number of violations in a traditional ECO flow. Different constraints, scenarios, and objectives are usually processed in separate steps. Combine this aspect with the issue of correlation and the result is that some ECO iterations may cause the design to get worse from a global perspective. As a result, progress towards zero violations may not proceed monotonically, and may require many iterations.
Improving the flow
The flow in Figure 3 illustrates an approach that overcomes the convergence problem. Improvement in the flow is achieved by using full STA information while simultaneously considering the impact of layout changes. The complete STA timing graph can include all scenarios, supporting multi-corner multi-mode (MCMM) ECOs. Reading the layout allows the tool to perform instance legalization, and both a physical ECO and a changed netlist are fed back to place and route.
Figure 3: Physical-aware timing ECO flow
Achieving timing closure is a complex optimization process that needs to consider multiple constraints such as setup, hold, and max transition. The process must also consider the design in different corners and modes, plus optimization objectives such as power and chip area.
Figure 4: Improved flow converges faster
Figure 4 shows how an improved approach can affect ECO processing, both in terms of how many cycles are required and how long each cycle takes. During optimization, the TimingExplorer program simultaneously considers all constraints, scenarios, and objectives. This allows the tool to achieve a high correlation to both the timing engine and the layout. As a result, timing closure proceeds as a monotonic process, requiring a greatly reduced number of iterations to arrive at a final solution.
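The benefit of evaluating every constraint and scenario together can be illustrated with a small sketch. This is not TimingExplorer's algorithm: the data layout, the function names (total_negative_slack, accept_fix), and the acceptance rule are assumptions chosen only to show how a joint check keeps the violation count from getting worse between iterations.

```python
from typing import Dict, Tuple

# Hypothetical data layout: (corner, mode) -> {"setup": slack_ns, "hold": slack_ns}
Slacks = Dict[Tuple[str, str], Dict[str, float]]


def total_negative_slack(slacks: Slacks) -> float:
    """Sum of all negative slacks over every scenario and check type."""
    return sum(min(0.0, s) for checks in slacks.values() for s in checks.values())


def accept_fix(before: Slacks, after: Slacks) -> bool:
    """Accept a candidate ECO only if it helps globally.

    Assumed rule: reject the change if any check in any scenario ends up
    violating and worse than it was; otherwise require that the total
    negative slack strictly improves.
    """
    for scenario, checks in before.items():
        for check, old in checks.items():
            new = after[scenario][check]
            if new < 0.0 and new < old:
                return False
    return total_negative_slack(after) > total_negative_slack(before)


before = {
    ("slow", "func"): {"setup": 0.05, "hold": 0.10},
    ("fast", "func"): {"setup": 0.30, "hold": -0.04},
}
# Candidate A fixes the fast-corner hold violation and keeps setup positive.
candidate_a = {
    ("slow", "func"): {"setup": 0.02, "hold": 0.15},
    ("fast", "func"): {"setup": 0.25, "hold": 0.01},
}
# Candidate B fixes hold but breaks setup in the slow corner.
candidate_b = {
    ("slow", "func"): {"setup": -0.01, "hold": 0.15},
    ("fast", "func"): {"setup": 0.25, "hold": 0.01},
}
print(accept_fix(before, candidate_a), accept_fix(before, candidate_b))
```

A flow that checked only the fast-corner hold report would accept both candidates; the joint check rejects the second one because it creates a setup violation in the slow corner.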
Handling physical issues
Multiple power domains: Figure 5 illustrates a max-transition fix affecting two different power domains. In this example, a net has a driver and sinks in the Always-on domain, but crosses the On/Off domain. To fix violations at the sink pins, the tool must insert buffers on-route and be sure to place them in the proper logical hierarchy.
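A minimal sketch of on-route buffer planning across power domains follows. It makes several assumptions that are not in the original text: unbuffered wire length stands in as a crude proxy for transition time, the route is given as a list of (length, region domain) segments, and the cell names BUF_STD and BUF_AON are hypothetical placeholders for a regular buffer and an always-on buffer.

```python
from typing import List, Tuple


def plan_buffers(
    segments: List[Tuple[float, str]],
    net_domain: str,
    max_unbuffered_um: float,
) -> List[Tuple[float, str, str]]:
    """Return (position_um, region_domain, cell) for each buffer to insert.

    A buffer is placed whenever the unbuffered length would exceed the limit.
    If the buffer lands in a region whose power domain differs from the
    domain that supplies the net, a hypothetical always-on cell is chosen.
    """
    if max_unbuffered_um <= 0:
        raise ValueError("max_unbuffered_um must be positive")
    plan = []
    position = 0.0     # distance from the driver along the route
    since_last = 0.0   # unbuffered length since the driver or last buffer
    for length, region in segments:
        remaining = length
        while since_last + remaining > max_unbuffered_um:
            step = max_unbuffered_um - since_last
            position += step
            remaining -= step
            since_last = 0.0
            cell = "BUF_STD" if region == net_domain else "BUF_AON"
            plan.append((position, region, cell))
        position += remaining
        since_last += remaining
    return plan


# Always-on net that crosses a switchable ("SW") region, as in the example above.
route = [(100.0, "AON"), (300.0, "SW"), (100.0, "AON")]
plan = plan_buffers(route, net_domain="AON", max_unbuffered_um=200.0)
for entry in plan:
    print(entry)
```

In a real flow the chosen buffer would also have to be placed in the correct logical hierarchy and legalized in the layout, which this sketch does not model.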
Figure 5: Max transition fix over power domains
Minimum buffer insertion: Figure 6 shows a typical register construct that needs a hold fix. However, the setup margin on the registers is already very small. Inserting delay buffers at the input pins of the registers would fix the hold violation, but introduce a setup problem. A solution that satisfies both of these constraints requires tracing back through the netlist to find a more strategic location for buffer insertion. This approach also avoids excessive buffer insertion, reducing power consumption.
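The trace-back idea can be sketched as follows. The Pin type, the pin names, and the setup_derate parameter are hypothetical; the sketch assumes each pin on the violating path is annotated with the worst setup slack of any path passing through it, and that an inserted delay reduces that slack by the same amount scaled by the derate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pin:
    name: str
    setup_slack: float  # worst setup slack of any path through this pin (ns)


def pick_hold_fix_pin(
    path_to_endpoint: List[Pin],
    hold_violation: float,
    setup_derate: float = 1.0,
) -> Optional[Pin]:
    """Walk from the endpoint back toward the startpoint and return the
    first pin whose setup slack can absorb the delay needed to fix hold.

    hold_violation is the positive amount of delay (ns) that must be added.
    Returns None if no pin on the path has enough setup margin.
    """
    needed = hold_violation * setup_derate
    for pin in reversed(path_to_endpoint):
        if pin.setup_slack >= needed:
            return pin
    return None


# Path listed from startpoint to endpoint; the endpoint has almost no setup margin.
path = [Pin("FF1/Q", 0.40), Pin("U7/Z", 0.35), Pin("FF2/D", 0.02)]
choice = pick_hold_fix_pin(path, hold_violation=0.08)
print(choice.name if choice else "no safe location")
```

Inserting the delay at the register input would consume more setup margin than is available there, so the search moves one stage back to a pin where the margin is sufficient.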
Figure 6: Efficient hold-fix: preserve setup margin & save power
Timing fix applied to clock network: By the time a design reaches the ECO stage, all of its clock trees have been fully synthesized. But poorly synthesized clock trees can lead to significant setup and hold violations. For designs that use third-party IP, this problem is further complicated by the fact that the IP may introduce clock skew while its internal clock design cannot be changed.
Figure 7 shows a design with a large clock skew between a bank of flops and a memory IP. Typically, such a skew will accumulate in one direction. In this case, if there are setup timing violations, they can be fixed by inserting an appropriate delay at the memory IP clock pin. Trying to fix these violations in the data path would require a greater number of buffers and design resources.
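The arithmetic behind a clock-pin fix can be sketched under simplifying assumptions: only paths captured by the memory IP are modeled, delaying the capture clock by d raises the setup slack of those paths by d, and it lowers their hold slack by the same d. The function name clock_pin_delay and the example numbers are illustrative, not taken from the design in Figure 7.

```python
from typing import Optional


def clock_pin_delay(worst_setup_slack: float, worst_hold_slack: float) -> Optional[float]:
    """Delay (ns) to add at the capture clock pin so that setup is met.

    Both slacks are the worst values over all paths captured at that pin.
    Returns 0.0 if setup is already met, or None if the required delay
    would consume more than the available hold slack.
    """
    needed = max(0.0, -worst_setup_slack)
    if needed == 0.0:
        return 0.0
    if needed > worst_hold_slack:
        return None
    return needed


# Setup fails by 0.12 ns while hold has 0.30 ns of margin: one clock-pin
# delay of 0.12 ns fixes every data path captured at this pin.
print(clock_pin_delay(-0.12, 0.30))
# With only 0.05 ns of hold margin, the same delay would break hold.
print(clock_pin_delay(-0.12, 0.05))
```

A single delay element at the clock pin covers every path into the memory, whereas a data-path fix would need changes on each violating path individually.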
Figure 7: Clock skew with design IPs
Case studies
Reducing violation count
The following is an example of how TimingExplorer significantly reduced the number of ECO iterations in a design containing over 3 million instances:
Figure 8: TimingExplorer hold fix results
Fast run time
For another design containing 5 million instances, the table below shows that TimingExplorer was able to achieve a 10x improvement in processing time, and a much better rate of fixing violations compared to an STA-based ECO tool.
Figure 9: Hold fix of TimingExplorer vs. Signoff STA based ECO tool
Handling difficult fixes
The tables below show that TimingExplorer can clean up max transition violations in designs that use multiple power domains. Conventional place and route tools do a poor job of fixing max transition violations on such designs.
Figure 10: Max transition fix over power domain
Increased performance reduces turn-around time
The cumulative effect of these performance improvements allows an ECO cycle to be completed in one day, which becomes increasingly important as tape-out approaches.
Figure 11: Improved flow shortens turn-around time
Conclusions
With IC designs in the range of 20nm and below, ECO tools need to have a high timing correlation with the signoff STA engine, and be able to provide physical solutions which were previously performed only by the place & route tools. Also, the tool needs to be comprehensive enough to cover all design constraints such as setup, hold, and max transition; to cover all design corners and modes; and to optimize for design objectives such as power and chip area.
TimingExplorer is an ECO tool that provides these capabilities. It has been used successfully to tape out 40nm and 28nm designs in multi-corner multi-mode (MCMM) flows. With an advanced multithreaded, physical-aware optimization algorithm and a highly correlated timing interface to signoff STA tools, TimingExplorer can significantly reduce the number of design ECO iterations.