Synthesis-aware clock analysis and constraints generation
Physical implementation challenges of clock nets
The goal of a clock tree synthesis (CTS) tool is to create a balanced clock network with short insertion delay, small skew, and as few buffers as possible. Long clock insertion delays create large on-chip variation (OCV) on the clock network, which makes timing closure harder to achieve. Large clock skews add to the timing closure problem.
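As a rough illustration of these terms, the sketch below computes insertion delay and skew from per-sink clock latencies and shows why a longer clock path widens the OCV gap. The sink names, latency values, and the flat 5% early/late derates are assumptions chosen for the example, and the derate model is a simplification that ignores any pessimism removal on shared path segments.

```python
def clock_metrics(sink_latencies_ps):
    """Return (insertion_delay, skew) in picoseconds for one clock tree."""
    latest = max(sink_latencies_ps.values())
    earliest = min(sink_latencies_ps.values())
    return latest, latest - earliest


def ocv_spread_ps(path_delay_ps, early_derate=0.95, late_derate=1.05):
    """Gap between the late- and early-derated delay of a non-shared clock path."""
    return path_delay_ps * (late_derate - early_derate)


# Hypothetical root-to-sink latencies in picoseconds.
latencies = {"ff1/CK": 820, "ff2/CK": 790, "ff3/CK": 845}
insertion_delay, skew = clock_metrics(latencies)
print(insertion_delay, skew)  # 845 55

# Doubling the path delay doubles the OCV-induced uncertainty in this model.
print(ocv_spread_ps(2000) > ocv_spread_ps(1000))  # True
```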
Clocks are fast-switching signals by design. In today’s SOC designs, the number and complexity of clock networks require a large number of buffers to sufficiently drive the clock signals around the chip, thereby increasing power consumption. This increase in power consumption is a major problem in wireless and handheld device markets, which are primarily driving today’s semiconductor market.
Modern SOC designs use complex clock structures, and the number of clock trees is growing from a handful to a few hundred. Prior to the availability of CTS, physical designers did not have adequate tools to analyze clock structures. Since CTS tools are not logically aware, clock constraints are used to optimize the clock graph that CTS will work on.
However, the number of clocks in today’s designs has exploded the complexity of clock constraint generation, which for the most part is a manual task. This often results in the generation of improper or non-optimal clock constraints, leading to post-CTS clock structures with long clock latency, large clock skew, and high buffer count.
In addition, modern SOC designs use advanced techniques such as multi-voltage domains to reduce chip power, and such structures make the balancing of clock networks very difficult.
Today’s complex clock structures
The clock network used to be a simple structure, where one clock root drove a list of flip-flops; hence the term clock tree. However, in today’s complex SOC designs, the clock network is often made up of hundreds of primary clocks and several times more generated clocks. It is no longer a clock tree but rather a clock graph. These designs are not only populated with generated clocks created by clock dividers, but the clocks often overlap with each other too. Also, clock gating cells are used to reduce dynamic clock power. All these clock components make the implementation of the clock graph a very complex task.
Modern designs use several design modes, each of which may include several functional and test modes. Different modes typically have different clock definitions and thus different clock networks. Balancing a clock network in one mode will not necessarily balance the clock in the other mode, which is a challenge.
Further, today’s SOC designs include some design IPs where clock networks have been predesigned and are typically un-touchable. Those fixed clock structures additionally complicate the balancing of clock networks.
Gap between front-end and back-end teams
A clock network is always designed by a front-end design team and implemented by a back-end implementation team, and implementation is done using CTS tools. In reality, the front-end team cannot anticipate the physical constraints imposed by implementation tools and hence is unable to define constraints that are implementation aware. With timing and power being orthogonal to each other, the back-end team is unable to fully replicate what the front-end team wanted with the fewest number of buffers. This is a major gap in the clock design-to-implementation process. The two teams end up seeing different views of the clock network. In theory, the clock network seen by a back-end team should be identical to, or at least a superset of, the clock network seen by the corresponding front-end team. When it is a superset, the clock network should be optimized to look as close as possible to the clock network as seen by the front-end team.
Figure 1: Communication gap of front-end and back-end teams
In the design of clock networks, both front-end and back-end teams use SDC files to define clocks and to set up design constraints. Today, with multiple clocks and several more generated clocks in every design, several SDC files are created to accommodate all clock circuits for a given design. With the ability to handle only a limited number of modes, CTS tools operate using a clock specification file and one SDC file only. This SDC file is typically a merged version of many SDC files from the front-end team. The SDC file, however, still does not communicate the logical awareness or intentions of the front-end team in its definition of clocks and generated clocks. With the SDC file that drives the CTS tool being different from the SDC files generated by the front-end team, there is a high risk that the final implementation could be different from what was originally intended and designed. This is typically discovered at the ECO stage of chip design, delaying the chip and/or forcing the production of a lower-performance chip.
Once the clocks and generated clocks are defined in an SDC file, a clock network will be formed, which becomes a sub-graph of the design netlist. This graph will be trimmed further as per the definitions in the clock specification file, which includes ignore pins, pass-through pins, stop pins, etc. During the CTS process, since the consolidated file is devoid of logical awareness, CTS tools use a tracing function to get an expanded view of a clock network. This becomes the superset and the initial clock graph for balancing. This superset is only one view of the clock network, and it differs from the view created by the front-end designers. The expanded network may have non-clock portions in it. Balancing the non-clock portion of a network is not necessary, and if it is balanced, the result is a non-optimal clock tree with unnecessary buffers and hence increased power consumption.
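The tracing and trimming described above can be sketched as a forward traversal of the netlist from a clock root. This is only an illustration under stated assumptions: the `fanout` dictionary, the pin names, and the function `trace_clock_graph` are hypothetical and do not represent the interface of any CTS tool.

```python
from collections import deque


def trace_clock_graph(fanout, root, ignore_pins=(), stop_pins=()):
    """Forward-trace from a clock root and return the set of reached pins.

    fanout      -- dict mapping a pin to the pins it drives (hypothetical netlist view)
    ignore_pins -- pins excluded from the graph; tracing does not enter them
    stop_pins   -- pins kept as leaves; tracing does not continue past them
    """
    ignore, stop = set(ignore_pins), set(stop_pins)
    reached, queue = {root}, deque([root])
    while queue:
        pin = queue.popleft()
        if pin in stop:
            continue  # leaf of the clock graph: keep the pin, trace no further
        for nxt in fanout.get(pin, ()):
            if nxt in ignore or nxt in reached:
                continue
            reached.add(nxt)
            queue.append(nxt)
    return reached


# Toy netlist: CLK drives a buffer and a flop; the buffer also feeds a data path.
fanout = {
    "CLK": ["buf1/A", "ff0/CK"],
    "buf1/A": ["ff1/CK", "gate/EN"],
    "gate/EN": ["ff2/D"],
}
full = trace_clock_graph(fanout, "CLK")
trimmed = trace_clock_graph(fanout, "CLK", ignore_pins=["gate/EN"])
print(sorted(full - trimmed))  # ['ff2/D', 'gate/EN']
```

Without the ignore pin, the traced superset contains the non-clock branch through `gate/EN`; with it, only the true clock pins remain as candidates for balancing.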
Figure 2: Design netlist, SDC files, and clock specifications determine clock networks
Complex physical design constraints
Due to the endless push for high performance and low power, the physical design of modern SOC chips becomes more and more complex. Building clock trees on top of such complex physical designs is a challenge.
Figure 3: When an always-on clock net traverses across on/off power domain, an always-on buffer is needed for buffer insertion
First, as SOC chips become bigger and bigger, it is not unusual to see 100 million gates in a design. Such designs are usually implemented hierarchically. This requires the clock trees to be built hierarchically as well. Balancing hierarchically built clock trees is difficult.
Second, when process technology scales to 28nm, 20nm, and down to 14nm and 10nm, signal integrity (SI) becomes a difficult issue for physical design and timing closure tools to handle. Since clock signals switch at high frequencies, they are very sensitive to SI issues. Such clock signals are usually shielded or routed using non-default spacing rules to avoid SI issues.
Third, the use of multi-voltage domains is a popular implementation choice for lower-power SOC designs. In such designs, chip area is partitioned into several voltage domains, such that individual domains can be turned on and off as needed. Typically, each of these domains uses standard cells from different libraries for implementation. Clock trees are global circuitry that spans these domains, and building a clock tree over voltage domains that are turned on or off as needed is a challenge.
These constraints mean that there is more interdependency between the different clock networks on a chip. Since the sequence in which constraints are fed into a CTS tool determines the structure and balance of the clock network implementation, it is possible for the back-end SDC file to force a clock implementation that is different from what was intended by the front-end team. The physical considerations that one needs to take into account add another dimension to the implementation of clocks in today’s SOC designs.
Addressing clock optimization issues
The issues discussed above are addressed by the ICScape ClockExplorer™ tool, which complements a CTS tool and speeds up the physical design closure of clock networks. Built-in analysis and optimization capabilities of ClockExplorer are:
- A schematic-based clock analysis platform with built-in SDC checking validates the correctness of clock definitions.
- The optimization of SDC constraints for better CTS includes:
  - Invalid clock path checking
  - Defining the appropriate CTS sequence through timing dependency analysis
  - Detecting conflicts in clock pins, and synthesizing with the right sequence and clock specification
Comprehensive clock structure analysis by schematics
ClockExplorer performs clock structure analysis by tracing complex clock nets and visually presenting them to designers. The visual outcome of this analysis, which designers can easily relate to, is a schematic of the clock network. Schematic snapshots of multi-mode clock structures help users to identify clock overlaps and re-convergence. The tracing also helps users to identify the longest and shortest clock paths.
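To make the idea of re-convergence and longest/shortest clock paths concrete, here is a minimal sketch that counts stages from a root to every node of a small clock graph. It assumes an acyclic graph in which every node is reachable from the root; the graph contents and the function `path_stats` are hypothetical and are not ClockExplorer's method.

```python
def path_stats(graph, root):
    """Return {node: (shortest, longest, path_count)} in stages from root.

    Assumes the graph is acyclic and every node is reachable from root.
    """
    # Build the predecessor list of every driven node.
    preds = {}
    for src, dsts in graph.items():
        for dst in dsts:
            preds.setdefault(dst, []).append(src)

    memo = {root: (0, 0, 1)}

    def visit(node):
        if node not in memo:
            stats = [visit(p) for p in preds[node]]
            memo[node] = (
                min(s[0] for s in stats) + 1,
                max(s[1] for s in stats) + 1,
                sum(s[2] for s in stats),
            )
        return memo[node]

    for node in preds:
        visit(node)
    return memo


# Hypothetical clock graph: the divided clock re-converges with the root at mux_a.
graph = {
    "root": ["mux_a", "div"],
    "div": ["mux_a"],
    "mux_a": ["ff1", "buf"],
    "buf": ["ff2"],
}
stats = path_stats(graph, "root")
print(stats["mux_a"])  # (1, 2, 2)
print(stats["ff2"])    # (3, 4, 2)
```

A path count greater than one marks a node that is reached along more than one clock path, and the gap between the shortest and longest stage counts hints at how much balancing would be needed.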
After CTS, clock-timing information is back-annotated on schematics for both front-end and back-end teams to view and analyze. The teams may use cross probing between schematics and design layout to find a variety of issues, including long clock paths, bad placement, or the need for a maximum-load fix.
SDC constraint checking
SDC files define clocks and generated clocks. Inadequately defined clocks may result in incomplete or incorrect clock designs. Improperly defined SDC constraints may also result in unbalanced clock trees that will make timing closure difficult.
The following is a partial list of items that ClockExplorer checks to make sure clocks are defined properly.
1. A clock or a generated clock is defined only on pins in the netlist.
2. Clock root pin is not defined on a chip boundary or a PLL.
In such cases, CTS tools will not check or fix violations on path segments back traced from the clock root pin to the chip boundary or PLL. Timing checks will not find this issue since the inspections are started from the clock root. This may result in an incorrect clock signal at a clock root pin and hence chip failure.
Figure 4: Clock path from PLL to clock root will not be checked or fixed by CTS tools
3. Generated clock root pins are correctly defined; otherwise, they cannot be traced back to their clock root.
4. Un-clocked flip-flops are identified. This is usually the result of a missing clock definition or broken clock paths due to SDC constraints such as case analysis.
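Two of the checks in this list can be sketched in a few lines: confirming that each clock is defined on a pin that exists in the netlist, and finding flops that no defined clock reaches. The data structures, pin names, and the function `check_clock_definitions` are hypothetical and serve only to illustrate the idea; they are not ClockExplorer's implementation.

```python
def check_clock_definitions(netlist_pins, clock_roots, flop_clock_pins, fanout):
    """Return a list of (check, pin) findings for a hypothetical netlist view."""
    findings = []

    # Every clock must be defined on a pin that exists in the netlist.
    for pin in clock_roots:
        if pin not in netlist_pins:
            findings.append(("undefined_pin", pin))

    # Trace forward from every valid clock root.
    reached = set()
    stack = [p for p in clock_roots if p in netlist_pins]
    while stack:
        pin = stack.pop()
        if pin in reached:
            continue
        reached.add(pin)
        stack.extend(fanout.get(pin, ()))

    # A flop whose clock pin is not reached by any clock is un-clocked.
    for pin in flop_clock_pins:
        if pin not in reached:
            findings.append(("unclocked_flop", pin))
    return findings


netlist_pins = {"pll/OUT", "buf/A", "ff1/CK", "ff2/CK"}
fanout = {"pll/OUT": ["buf/A"], "buf/A": ["ff1/CK"]}
findings = check_clock_definitions(
    netlist_pins, ["pll/OUT", "clk_typo"], ["ff1/CK", "ff2/CK"], fanout
)
print(findings)  # [('undefined_pin', 'clk_typo'), ('unclocked_flop', 'ff2/CK')]
```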
Reduce CTS clock graph by removing invalid clock paths
An invalid clock path is a path that CTS traces as part of the clock network even though the signal on it is not a clock signal but a data signal used for clock gating. Since clock networks traced by CTS tools often include invalid clock paths, ClockExplorer identifies such paths and makes sure they are excluded from balancing and hence from buffer insertion.
Figure 5: Invalid clock path
Figure 5 shows an invalid clock path, ABQCG. When a generated clock is defined at pin G, CTS tools will back trace to the clock root A. Since design information is not available, CTS tools will include path ABQCG as part of the clock graph. The net QC carries a data signal rather than a clock signal, so ABQCG is an invalid clock path. The true clock path is AEG. When an invalid clock path exists but is not recognized, CTS tools insert a series of buffers on AEG to match the delay of the flop. This is unnecessary, and every buffer inserted is wasted.
The correct approach is to set pin C to be ignored so the invalid path is broken. The resulting clock graph looks like Figure 6.
Figure 6: Clock graph after setting ignore pin at C
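The effect of the ignore pin can be reproduced on a small graph. The connectivity below is an assumption reconstructed from the two path names given in the text (ABQCG and AEG), not taken from the figure itself, and `clock_paths` is a hypothetical helper.

```python
def clock_paths(fanout, src, dst, ignore_pins=()):
    """Enumerate simple paths from src to dst, skipping ignored pins."""
    ignore = set(ignore_pins)
    paths = []

    def walk(pin, path):
        if pin == dst:
            paths.append("".join(path))
            return
        for nxt in fanout.get(pin, ()):
            if nxt not in ignore and nxt not in path:
                walk(nxt, path + [nxt])

    walk(src, [src])
    return paths


# Assumed connectivity: A drives the flop clock pin B and the true clock branch E.
fanout = {"A": ["B", "E"], "B": ["Q"], "Q": ["C"], "C": ["G"], "E": ["G"]}

print(clock_paths(fanout, "A", "G"))                     # ['ABQCG', 'AEG']
print(clock_paths(fanout, "A", "G", ignore_pins=["C"]))  # ['AEG']
```

With C ignored, only the true clock path remains between A and G, so there is no second path whose delay would have to be matched with buffers.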
Resolving clock specification conflicts between modes
For a given design, typically there are at least two modes—one functional mode and one test mode. Between those modes, conflicting clock pin definitions may exist, as illustrated in Figure 7.
Figure 7: Conflicting clock pin between modes
In the functional mode, a generated clock is defined at pin Q of the clock divider, Div. This will result in a non-stop pin, Div/CLK, for the functional mode. However, in the test mode, a generated clock is not defined at Div/Q. CTS tools will treat Div/CLK as a leaf pin or stop pin and try to balance the path through Div with all of the other flops FFs1 and FFs2. This is neither possible nor something that designers want. Div is a clock divider and should not be tested by scan, and hence, Div/CLK should be ignored.
The right sequence of steps for a CTS tool is to first synthesize FCLK1 and FCLK2 in the functional mode, and then in the test mode, set Div/CLK to be ignored and synthesize SCLK.
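A conflict of this kind can be detected by comparing how each pin is treated in every mode. In the sketch below, the role labels `through` and `leaf`, the pin names `FFs1/CK` and `FFs2/CK`, and the function `find_mode_conflicts` are assumptions made for illustration.

```python
def find_mode_conflicts(pin_roles_by_mode):
    """Return {pin: {mode: role}} for pins whose CTS role differs between modes."""
    roles = {}
    for mode, pin_roles in pin_roles_by_mode.items():
        for pin, role in pin_roles.items():
            roles.setdefault(pin, {})[mode] = role
    # Keep only the pins that have more than one distinct role across modes.
    return {pin: r for pin, r in roles.items() if len(set(r.values())) > 1}


# Div/CLK is traced through in functional mode but looks like a leaf in test mode.
pin_roles_by_mode = {
    "functional": {"Div/CLK": "through", "FFs1/CK": "leaf", "FFs2/CK": "leaf"},
    "test": {"Div/CLK": "leaf", "FFs1/CK": "leaf", "FFs2/CK": "leaf"},
}
conflicts = find_mode_conflicts(pin_roles_by_mode)
print(conflicts)  # {'Div/CLK': {'functional': 'through', 'test': 'leaf'}}
```

Each reported pin is a candidate for an explicit ignore setting in the mode where it would otherwise be balanced as a leaf.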
Timing dependency
In a given design, not all flops are connected to each other through combinational logic. Flops that are connected this way are called timing dependent, and flops that are not are considered timing independent.
Figure 8: Timing-dependent flop groups
During CTS, only dependent flops should be balanced. Balancing independent flops is not necessary, and in addition, it consumes unnecessary power. Therefore, the right CTS flow is to first run timing dependency analysis and then balance the dependent groups separately (refer to Figure 8) as follows:
- Set Y1 as the ignore pin.
- Synthesize the clock tree rooted at Y2.
- Set the clock tree at Y2 as do not touch.
- Synthesize the clock tree rooted at CLK (root).
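The timing dependency analysis that precedes these steps amounts to grouping flops that are linked by timing paths. One way to sketch it is with a union-find structure over launch/capture pairs; the flop names, the path list, and the function `dependency_groups` are hypothetical, and this is not necessarily how any tool performs the analysis.

```python
def dependency_groups(flops, timing_paths):
    """Group flops connected, directly or transitively, by timing paths."""
    parent = {f: f for f in flops}

    def find(f):
        # Walk to the group representative, halving the path on the way.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for launch, capture in timing_paths:
        parent[find(launch)] = find(capture)

    groups = {}
    for f in flops:
        groups.setdefault(find(f), []).append(f)
    return sorted(sorted(g) for g in groups.values())


flops = ["ff1", "ff2", "ff3", "ff4", "ff5"]
timing_paths = [("ff1", "ff2"), ("ff2", "ff3"), ("ff4", "ff5")]
print(dependency_groups(flops, timing_paths))
# [['ff1', 'ff2', 'ff3'], ['ff4', 'ff5']]
```

Each resulting group can then be balanced on its own, since no timing path crosses between groups.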
Bridging the gap between front-end and back-end teams
ClockExplorer is architected to bridge the knowledge gap between front-end design and back-end implementation teams. The front-end team uses this platform to perform clock analysis, SDC, and clock constraints sign-off. The back-end team uses ClockExplorer to complete the sign-in verification and checking.
Figure 9: ClockExplorer bridges communication gap
The front-end clock sign-off includes:
- SDC checking
- Clock specification checking
- Mode conflict checking
The back-end clock sign-in includes:
- SDC checking
- Clock specification checking
- Invalid clock paths
- Timing dependency analysis
- Mode conflict checking
Conclusion
For the best clock tree synthesis with short clock insertion delay and low clock power, a platform for clock analysis as well as constraints verification and generation is crucial. ClockExplorer offers a platform that analyzes and generates a visual clock structure and allows a design team to generate constraints for a CTS tool. These constraints from ClockExplorer result in an implementation that delivers short insertion delays and low clock power. It enables a front-end team to analyze clock structures and generate meaningful constraints for a CTS tool. The platform additionally allows a back-end team to do clock sign-in and optimization of clock constraints for effective clock tree synthesis by a CTS tool.