Moving to SystemC TLM for design and verification of digital hardware

By eeNews Europe



Introduction

Design and verification of new digital hardware blocks is becoming increasingly challenging. Today, designers are confronted with a host of issues, including growing design and verification complexity, time-to-market pressures, power goals, and evolving design specifications.

To tackle these challenges, customers are beginning to make a significant change in design methodology by moving to SystemC transaction-level models (TLM) as the design entry point, and by leveraging high-level synthesis (HLS) in combination with IP reuse. This article presents our experience in working with Fujitsu Semiconductor Ltd. to adopt this new methodology using Cadence® C-to-Silicon Compiler on a data access controller design, along with the very promising results Fujitsu reported at a recent C-to-Silicon user group meeting in Japan. The design selection, modeling work, and results analysis described in this article were performed by Fujitsu Semiconductor with some assistance from Cadence.

Motivation for Moving to a New Approach

While Fujitsu Semiconductor’s design groups are very experienced in design, verification, and synthesis at the register-transfer level (RTL), they have found that this approach has many limitations, which become more serious as design and verification complexity grows.

Some of the key limitations with the RTL entry approach include:

  • Very limited ability to perform algorithmic and architecture exploration, and thus a very limited ability to optimize area, power, and performance. This is because RTL designs are too detailed, and become functional too late, to enable this type of exploration.
  • Limited ability to build configurable models that can be reused in different environments. This is because the detailed structure of the design at the RT level is largely fixed.
  • Inefficient verification. The complexity of RTL increases the chance of bugs and makes isolating and debugging problems more difficult.
  • Long design and verification times, delaying time to market.

Fujitsu Semiconductor expected to address these challenges by moving hardware design and verification to the SystemC transaction level and adopting high-level synthesis. Transaction-level models are more abstract than RTL models because they focus on the behavior of blocks and use function calls, rather than cycle-by-cycle signal activity, to model the transaction communication between blocks. HLS tools such as Cadence C-to-Silicon Compiler take IEEE 1666-2011 SystemC TLM models as input and automate many design tasks, such as operation scheduling.
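To illustrate what modeling communication "with function calls" means, here is a minimal sketch in plain C++ rather than SystemC, so it compiles without the SystemC library. It is not the Cadence library or the TLM-2.0 API, and every name in it (MemoryTarget, copy_word, Status) is hypothetical. The target block exposes read and write transactions as ordinary member functions, with no clock or handshake signals:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical response code, loosely analogous to a bus response.
enum class Status { Okay, AddressError };

// Hypothetical target block modeled at the transaction level: each transfer
// is a single function call. There is no clock, no valid/ready handshake,
// and no per-cycle signal timing.
class MemoryTarget {
public:
    explicit MemoryTarget(uint64_t size) : size_(size) {}

    Status write(uint64_t addr, uint64_t data) {
        if (addr >= size_) return Status::AddressError;
        mem_[addr] = data;
        return Status::Okay;
    }

    Status read(uint64_t addr, uint64_t& data) const {
        if (addr >= size_) return Status::AddressError;
        auto it = mem_.find(addr);
        data = (it == mem_.end()) ? 0 : it->second;  // unwritten locations read as 0
        return Status::Okay;
    }

private:
    uint64_t size_;                                // number of addressable words
    std::unordered_map<uint64_t, uint64_t> mem_;   // sparse storage
};

// Hypothetical initiator behavior: copy one word using two transactions.
Status copy_word(MemoryTarget& target, uint64_t src, uint64_t dst) {
    uint64_t value = 0;
    Status s = target.read(src, value);
    if (s != Status::Okay) return s;
    return target.write(dst, value);
}
```

In an RTL model, the same copy would be spread over several clock cycles of address, handshake, and data signals. At the transaction level, each transfer collapses into one call.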

Fujitsu Semiconductor was looking to realize the following benefits by moving to TLM and HLS:

  • Greater ability to perform algorithmic and architecture exploration and optimization, since the design model is much easier to change and is functional much sooner.
  • Greater ability to build configurable models, by taking advantage of the powerful modeling capabilities in C++, and automatically generating the application-specific implementation with HLS.
  • Much more efficient verification, since there is less chance for the designer to make mistakes, and because debugging at the transaction level is easier.
  • Shorter design and verification times, because of the above benefits.
  • Ability to reuse the model for the hardware block in SystemC TLM2 virtual platform models for embedded systems, thus enabling hardware models and software components to be designed and verified in a more unified process.

For the TLM and HLS approach to be successfully adopted in production, it was necessary that the new approach meet Fujitsu Semiconductor’s quality of results (or "QoR") targets. In particular, the area, power, and performance of the hardware created via the TLM approach had to be nearly equal to, or better than, what is achievable via the hand-written RTL approach. Fortunately, their experience with Cadence C-to-Silicon Compiler has shown that they can exceed the QoR of hand-written RTL designs.

Reusable HLS IP

SystemC TLM and HLS are new to many designers, and successful adoption requires learning new modeling techniques. Cadence supplies a wide range of HLS design examples with C-to-Silicon Compiler, which helps designers learn the new techniques more quickly.

Cadence has developed a library of synthesizable IP for use with C-to-Silicon Compiler, spanning common building blocks such as FIFOs, register files, bus interfaces such as AHB and AXI, and floating- and fixed-point math operations.  These models are designed to be highly configurable, and provide good QoR when synthesized to gates.

The data access controller design described in this article used the AXI3 TLM IP and FIFO models provided by Cadence, which greatly accelerated the design process and delivered good QoR. By starting with the examples and design IP, a designer can quickly get a basic design working and then incrementally modify it to meet all design requirements.

The AXI3 TLM IP provided by Cadence enabled Fujitsu Semiconductor’s models to use a high-level API to access the AXI3 bus, while still providing access to all protocol features of AXI3. In addition, the AXI3 TLM IP can be configured to automatically provide a SystemC TLM2 interface to external models. This makes it possible to use a single model to drive the HLS flow, as well as for high-speed simulation in TLM2 virtual platforms, which is something Fujitsu Semiconductor will explore in the future.

Case Study: A Data Access Controller Design

While earlier HLS tools focused on datapath-oriented designs, Fujitsu Semiconductor applied C-to-Silicon Compiler and the AXI3 TLM models to a control-centric data access controller design. The design is described here in simplified form, since some aspects are proprietary. It contains 64-bit AXI3 target and initiator interfaces and one to eight logical channels for data transfer. Figure 1 shows the block diagram of the design. The AXI3 target interface is used to configure the registers for all logical channels, and an internal FIFO buffers the data as it is moved from the source address to the destination address through the AXI3 initiator interface. “ChReg” is a set of registers that stores the configuration parameters of each logical channel, and “CommonRegs” is a set of registers that stores the configuration parameters common to all logical channels.

Figure 1: Data access controller block diagram
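The split between ChReg and CommonRegs can be pictured with a small address-decode sketch. The real register definitions are proprietary, so the address map, field offsets, and field names below are all hypothetical. The sketch shows only how a write arriving on the AXI3 target interface might be routed either to CommonRegs or to the ChReg of one logical channel:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical per-channel configuration ("ChReg").
struct ChReg {
    uint64_t src_addr = 0;
    uint64_t dst_addr = 0;
    uint64_t length = 0;   // bytes to transfer (assumed field)
    uint64_t control = 0;  // enable/start bits (assumed field)
};

// Hypothetical registers common to all channels ("CommonRegs").
struct CommonRegs {
    uint64_t global_enable = 0;
    uint64_t irq_mask = 0;
};

// Hypothetical address map seen through the AXI3 target interface:
//   0x000 - 0x0FF          : CommonRegs
//   0x100 + ch * 0x40 + n  : field n of the ChReg for channel ch
class RegisterFile {
public:
    static constexpr unsigned kMaxChannels = 8;
    static constexpr uint64_t kChBase = 0x100;
    static constexpr uint64_t kChStride = 0x40;

    explicit RegisterFile(unsigned num_channels) : num_channels_(num_channels) {
        assert(num_channels >= 1 && num_channels <= kMaxChannels);
    }

    // Decodes one register write; returns false for an unmapped offset.
    bool write(uint64_t offset, uint64_t data) {
        if (offset < kChBase) {
            switch (offset) {
                case 0x00: common_.global_enable = data; return true;
                case 0x08: common_.irq_mask = data; return true;
                default: return false;
            }
        }
        const uint64_t ch = (offset - kChBase) / kChStride;
        const uint64_t field = (offset - kChBase) % kChStride;
        if (ch >= num_channels_) return false;  // channel not instantiated
        ChReg& r = channels_[ch];
        switch (field) {
            case 0x00: r.src_addr = data; return true;
            case 0x08: r.dst_addr = data; return true;
            case 0x10: r.length = data; return true;
            case 0x18: r.control = data; return true;
            default: return false;
        }
    }

    const ChReg& channel(unsigned ch) const { return channels_.at(ch); }
    const CommonRegs& common() const { return common_; }

private:
    unsigned num_channels_;
    CommonRegs common_;
    std::array<ChReg, kMaxChannels> channels_{};
};
```

With this layout, a write to offset 0x148 lands in the dst_addr field of channel 1, while offsets belonging to channels that are not instantiated are rejected.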

Table 1 shows the design parameters.

Table 1: Design Parameters

Fujitsu Semiconductor implemented the design in SystemC using the Cadence AXI3 TLM IP. To accurately compare with their existing hand-written RTL design, they implemented the SystemC model as follows:

  • Use the standard AXI3 protocol with the same parameters as the existing hand-written RTL design.
  • Implement the same behavior in SystemC as in the hand-written RTL.
  • Use the SystemC flow to generate micro-architectures that differ from the hand-written RTL, in order to improve QoR and performance.
  • Write the SystemC model so that its micro-architecture can be easily reconfigured using C++ parameters.
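The last point can be sketched with plain C++ templates. This is an assumption about technique, not a specific Cadence construct: the channel count and FIFO depth become compile-time parameters, so one source file yields several micro-architectures. The parameter names and the power-of-two FIFO restriction are illustrative assumptions:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical compile-time configuration of the controller.
template <unsigned NumChannels, std::size_t FifoDepth, unsigned DataBytes = 8>
struct ControllerConfig {
    static_assert(NumChannels >= 1 && NumChannels <= 8,
                  "design supports one to eight logical channels");
    static_assert(FifoDepth > 0 && (FifoDepth & (FifoDepth - 1)) == 0,
                  "FIFO depth assumed to be a power of two");
    static constexpr unsigned kChannels = NumChannels;
    static constexpr std::size_t kFifoDepth = FifoDepth;
    static constexpr unsigned kDataBytes = DataBytes;  // 8 bytes = 64-bit bus
};

// Fixed-capacity ring buffer sized by the configuration, standing in
// for the internal data FIFO.
template <typename Cfg>
class DataFifo {
public:
    bool try_push(uint64_t beat) {
        if (count_ == Cfg::kFifoDepth) return false;  // full
        buf_[(head_ + count_) % Cfg::kFifoDepth] = beat;
        ++count_;
        return true;
    }

    bool try_pop(uint64_t& beat) {
        if (count_ == 0) return false;  // empty
        beat = buf_[head_];
        head_ = (head_ + 1) % Cfg::kFifoDepth;
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }

private:
    std::array<uint64_t, Cfg::kFifoDepth> buf_{};
    std::size_t head_ = 0;   // index of the oldest entry
    std::size_t count_ = 0;  // number of valid entries
};

// Two variants of the same source, selected by parameters alone.
using SmallConfig = ControllerConfig<1, 4>;
using FullConfig = ControllerConfig<8, 32>;
```

Selecting SmallConfig or FullConfig changes the generated hardware without touching the behavioral code. An HLS flow can exploit this kind of parameterization to produce each variant, whereas hand-written RTL would typically need separate edits.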

Figure 2 shows the part of the SystemC source code that configures the AXI3 TLM write address channel of the initiator socket. Line 1 declares the write address payload. Lines 2 to 5 set the payload's address, transfer length, burst type, and data size for the AXI3 TLM initiator. Line 6 uses a while loop to put the write address payload on the write address channel of the initiator socket, “initiator_if.waddr”, using the non-blocking function “nb_put(waddr)”. All of the details of the AXI3 signal-level protocol are hidden inside nb_put() in the AXI3 TLM library, so designers can simply call put/get functions without worrying about these details. This significantly improves the readability of the code and completely separates the behavior from the interface protocols.

Figure 2: SystemC code to access AXI3 TLM initiator
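Because the Figure 2 code itself is not reproduced here, the following plain C++ sketch mirrors the structure the text describes. It declares a write-address payload, sets its address, length, burst type, and size, and then retries a non-blocking put in a while loop until the channel accepts it. The types, the AddrChannel class, and the wait_cycle callback are hypothetical stand-ins, not the Cadence AXI3 TLM API. The field encodings follow the AXI3 specification: AWLEN holds the number of beats minus one, and AWSIZE holds log2 of the bytes per beat:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>

// AXI3 AWBURST encodings.
enum class Burst : uint8_t { Fixed = 0, Incr = 1, Wrap = 2 };

// Hypothetical write-address payload.
struct AxiAddrPayload {
    uint64_t addr = 0;
    uint8_t len = 0;   // AWLEN: beats - 1 (0..15 in AXI3)
    Burst burst = Burst::Incr;
    uint8_t size = 3;  // AWSIZE: log2(bytes per beat); 3 = 8 bytes
};

// Hypothetical stand-in for the write-address channel of an initiator
// socket. nb_put() never blocks: it returns false if the channel cannot
// accept the payload right now.
class AddrChannel {
public:
    explicit AddrChannel(std::size_t capacity) : capacity_(capacity) {}

    bool nb_put(const AxiAddrPayload& p) {
        if (q_.size() >= capacity_) return false;
        q_.push_back(p);
        return true;
    }

    void drain_one() { if (!q_.empty()) q_.pop_front(); }  // downstream accepts one
    std::size_t pending() const { return q_.size(); }
    const AxiAddrPayload& front() const { return q_.front(); }

private:
    std::size_t capacity_;
    std::deque<AxiAddrPayload> q_;
};

// Build the payload, then retry the non-blocking put until it succeeds.
// wait_cycle stands in for advancing simulated time; returns the retry count.
unsigned send_write_address(AddrChannel& waddr_ch, uint64_t addr, unsigned beats,
                            const std::function<void()>& wait_cycle) {
    assert(beats >= 1 && beats <= 16);  // AXI3 bursts are 1 to 16 beats
    AxiAddrPayload waddr;
    waddr.addr = addr;
    waddr.len = static_cast<uint8_t>(beats - 1);
    waddr.burst = Burst::Incr;
    waddr.size = 3;  // 64-bit data bus
    unsigned retries = 0;
    while (!waddr_ch.nb_put(waddr)) {
        wait_cycle();
        ++retries;
    }
    return retries;
}
```

In a synthesizable SystemC model, the wait_cycle role would typically be played by a wait() call inside a clocked thread, so each rejected nb_put() attempt costs one clock cycle.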

Design Results

Fujitsu Semiconductor successfully implemented the design using SystemC and the AXI3 TLM IP models, and used Cadence C-to-Silicon Compiler HLS to generate RTL. The design passed simulation and functional verification, described in more detail below. They compared the implementation, QoR, and performance with the hand-written RTL design.

Line Count

The SystemC model is almost one-third the size of the hand-written RTL code for this design. This is a significant reduction, given that the RTL ran to over 10,000 lines. Note that the SystemC line count includes only customer-written code, since the AXI3 TLM model was provided in a design-independent SystemC library. For the hand-written RTL, no reusable AXI3 code was available. This large reduction in line count significantly reduced coding effort and let designers concentrate on exploring and optimizing the core functionality.

Performance

To compare performance, Fujitsu Semiconductor measured average throughput for six types of data transfers covering the various burst transfers the design needs to perform. In every case, the HLS-generated RTL outperformed the hand-written RTL, and on average its performance was 35% better.
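The article does not state exactly how the per-case results were averaged. One plausible reading, given here only as an assumption, defines the throughput of transfer type i as bytes moved divided by elapsed time, then averages the per-case ratios. The symbols B_i and t_i are illustrative, not quantities named in the source:

```latex
T_i = \frac{B_i}{t_i}, \qquad
\Delta_i = \frac{T_i^{\mathrm{HLS}}}{T_i^{\mathrm{RTL}}} - 1, \qquad
\bar{\Delta} = \frac{1}{6} \sum_{i=1}^{6} \Delta_i
```

Under this definition, "better in all cases" means every per-case improvement is positive, and the reported 35% average corresponds to a mean improvement of about 0.35.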

This was because the higher abstraction level of the SystemC model allowed Fujitsu Semiconductor to explore a range of micro-architecture implementations in C-to-Silicon Compiler, ultimately finding a more efficient micro-architecture than the one implemented in RTL. With traditional RTL-based design entry, this type of exploration is almost impossible.

Area

Fujitsu Semiconductor used Cadence C-to-Silicon Compiler to generate RTL from the SystemC model and Cadence RTL Compiler to generate the gate-level netlist using their own production technology library. Table 2 shows the area comparison between the HLS-generated RTL and the hand-written RTL using the implementation with eight logical channels, across different clock frequencies.

 

Table 2: Area Comparison

Power Consumption

Fujitsu Semiconductor utilized clock gating optimization in both C-to-Silicon Compiler and RTL Compiler to reduce dynamic power, then compared the dynamic power consumption results of each flow by simulating at the gate level. Table 3 shows the dynamic power reduction from the SystemC flow versus the hand-written RTL flow.

 

Table 3: Power Comparison at 400 MHz
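As background, and not as a result from this study, dynamic CMOS power is commonly approximated by a standard first-order model:

```latex
P_{\mathrm{dyn}} \approx \sum_{k} \alpha_k \, C_k \, V_{DD}^{2} \, f
```

Here alpha_k is the switching activity of net k, C_k is its capacitance, V_DD is the supply voltage, and f is the clock frequency. Clock gating lowers the activity term for clock nets and for registers in cycles where their enable is inactive. Both flows were compared at the same 400 MHz clock, and presumably at the same supply voltage, so differences between them would come mainly from activity and switched capacitance.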

Design Summary

Fujitsu Semiconductor was able to realize the following benefits in using SystemC TLM and C-to-Silicon Compiler HLS for this design:

  • Designers did not need to explicitly describe the state machine as they did when hand-writing RTL, enabling a much more efficient description of the design.
  • Using the AXI3 TLM IP library enabled a huge time savings in implementing the complicated AXI3 protocol, so designers could concentrate on realizing better algorithms and micro-architectures.
  • C-to-Silicon Compiler automatically generated RTL for different technology libraries, so designers did not need to fine-tune the timing by hand for each target library.
  • Using SystemC TLM and C-to-Silicon Compiler HLS made it much easier to explore a few different micro-architectures and measure the resulting QoR. This was the key factor in reaching a better implementation than the hand-written RTL design.
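The first point above can be illustrated with a hypothetical copy loop, written twice in plain C++. The first version is an explicit state machine in the style of hand-written RTL. The second is sequential code from which an HLS tool would infer the states. This is a toy illustration, not the Fujitsu design:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Explicit state machine, RTL style: the designer enumerates the states
// and next-state logic; each call to step() models one clock cycle.
class CopyFsm {
public:
    enum class State { Idle, Read, Write, Done };

    CopyFsm(const std::vector<uint64_t>& src, std::vector<uint64_t>& dst)
        : src_(src), dst_(dst) {
        assert(dst.size() >= src.size());
    }

    void start() {
        if (state_ == State::Idle)
            state_ = src_.empty() ? State::Done : State::Read;
    }

    void step() {
        switch (state_) {
            case State::Read:   // fetch one word
                data_ = src_[i_];
                state_ = State::Write;
                break;
            case State::Write:  // store it and choose the next state
                dst_[i_] = data_;
                ++i_;
                state_ = (i_ == src_.size()) ? State::Done : State::Read;
                break;
            case State::Idle:
            case State::Done:
                break;          // nothing to do
        }
    }

    State state() const { return state_; }

private:
    const std::vector<uint64_t>& src_;
    std::vector<uint64_t>& dst_;
    State state_ = State::Idle;
    std::size_t i_ = 0;
    uint64_t data_ = 0;
};

// The same behavior written sequentially: only the order of operations
// is stated, and states and cycle boundaries are left to the tool.
void copy_sequential(const std::vector<uint64_t>& src, std::vector<uint64_t>& dst) {
    assert(dst.size() >= src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        const uint64_t data = src[i];  // read
        dst[i] = data;                 // write
    }
}
```

Both versions produce the same result. In a synthesizable SystemC model, the sequential form would typically also contain wait() calls marking clock boundaries, but the designer still does not write the state encoding or next-state logic.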

In summary, following the same design specification, Fujitsu Semiconductor achieved 35% less area, 51% less power, and 35% better performance with 3x fewer lines of code than the hand-written RTL design. This convinced them that HLS technology is ready not only for datapath designs but also for control-centric designs, including those with complicated bus interfaces such as this data access controller. Raising the abstraction level lets designers focus on design exploration rather than RTL implementation, which can significantly improve both design productivity and QoR.

Verification Methodology

To verify the design and analyze its data throughput and dynamic power consumption, Fujitsu Semiconductor created a comprehensive verification environment using Cadence Verification IP (VIP) for AMBA Protocols, including AXI3. They used this environment to debug and analyze both the SystemC TLM model and the HLS-generated Verilog RTL as shown in Figures 3 and 4.

Figure 3: SystemC Verification Environment

 

Figure 4: HLS-generated RTL Verification Environment

For the verification effort, Fujitsu Semiconductor did not have to write any code at the signal level. All new code was written at the transaction level because the AXI3 VIP provides an abstract API for creating test scenarios, and provides checkers and coverage collectors for all AXI3 signal-level bus traffic.

The SystemC TLM model was created without timing details for any AXI3 bus signals. The team also created a variety of test scenarios with the AXI3 VIP, using both specific and randomized constraints, again without any AXI3 bus signal timing details. The AXI3 TLM IP and AXI3 VIP helped them focus on functional design, verification, and debug, greatly simplifying the design and verification effort.
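The Cadence VIP has its own scenario API, which the article does not show. As a stand-in, the following C++ sketch illustrates the general idea of constrained-random stimulus at the transaction level. Each scenario is an address and burst length drawn at random, subject to AXI3 rules (1 to 16 beats per burst, and no burst crossing a 4 KB boundary) and to assumed test constraints (8-byte beats for the 64-bit bus and a hypothetical address window). No signal timing appears anywhere:

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// One hypothetical write scenario: an INCR burst on a 64-bit AXI3 bus.
struct WriteScenario {
    uint64_t addr;
    unsigned beats;           // 1..16 (AXI3 burst length)
    unsigned bytes_per_beat;  // fixed at 8 here
};

// Hypothetical constrained-random generator.
class ScenarioGenerator {
public:
    ScenarioGenerator(uint64_t base, uint64_t window, uint32_t seed)
        : base_(base), window_(window), rng_(seed) {
        assert(base % 4096 == 0 && window % 4096 == 0 && window > 0);
    }

    WriteScenario next() {
        std::uniform_int_distribution<unsigned> beats_dist(1, 16);
        const unsigned beats = beats_dist(rng_);
        const uint64_t burst_bytes = uint64_t{beats} * 8;
        // Pick a 4 KB page in the window, then an 8-byte-aligned offset
        // that keeps the whole burst inside that page.
        std::uniform_int_distribution<uint64_t> page_dist(0, window_ / 4096 - 1);
        std::uniform_int_distribution<uint64_t> slot_dist(0, (4096 - burst_bytes) / 8);
        const uint64_t addr = base_ + page_dist(rng_) * 4096 + slot_dist(rng_) * 8;
        return WriteScenario{addr, beats, 8};
    }

private:
    uint64_t base_;
    uint64_t window_;
    std::mt19937 rng_;
};

// Scenario-level checker for the same constraints.
bool is_legal(const WriteScenario& s, uint64_t base, uint64_t window) {
    const uint64_t end = s.addr + uint64_t{s.beats} * s.bytes_per_beat;  // exclusive
    return s.beats >= 1 && s.beats <= 16 && s.bytes_per_beat == 8 &&
           s.addr % 8 == 0 && s.addr >= base && end <= base + window &&
           s.addr / 4096 == (end - 1) / 4096;  // no 4 KB crossing
}
```

Fixing the seed makes a failing random scenario reproducible, which is the usual reason to seed the generator explicitly.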

After C-to-Silicon Compiler generated the Verilog RTL, the team used the same verification environment to re-run the tests on the RTL and to analyze dynamic power. The verification environment with AXI3 VIP mimics an interconnect, helping to realistically analyze the data throughput and dynamic power consumption of the design.

Using the same test scenarios within the verification environment, the team considered different coverage items for the respective models. For the SystemC model, the team considered functional coverage (design features) and code coverage (line coverage). For the HLS-generated RTL, they considered functional (AXI3 protocols and design features), code (line and expression), and FSM coverage (state and arc).
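A functional-coverage item for a design feature can be as simple as a set of bins that are marked when observed. The following C++ sketch uses hypothetical bin definitions. It crosses the logical channel used by a transfer with a burst-length bucket and reports the percentage of bins hit, assuming all eight channels are instantiated:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical functional-coverage collector: channel x burst-length bucket.
class TransferCoverage {
public:
    static constexpr std::size_t kChannels = 8;
    static constexpr std::size_t kBuckets = 4;  // 1, 2-4, 5-8, 9-16 beats

    void sample(unsigned channel, unsigned beats) {
        assert(channel < kChannels && beats >= 1 && beats <= 16);
        ++hits_[channel][bucket(beats)];
    }

    // Percentage of (channel, bucket) bins hit at least once.
    double percent() const {
        std::size_t covered = 0;
        for (const auto& row : hits_)
            for (unsigned h : row)
                if (h > 0) ++covered;
        return 100.0 * covered / (kChannels * kBuckets);
    }

private:
    static std::size_t bucket(unsigned beats) {
        if (beats == 1) return 0;
        if (beats <= 4) return 1;
        if (beats <= 8) return 2;
        return 3;
    }

    std::array<std::array<unsigned, kBuckets>, kChannels> hits_{};  // zero-initialized
};
```

Because the bins are defined on transaction fields rather than signals, the same bins could be sampled from both the SystemC model and the HLS-generated RTL.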

For future design projects, Fujitsu Semiconductor expects to see benefits in shifting more verification effort to higher levels of abstraction. In particular, they believe that much of the verification can be completed at the SystemC TLM level without including the signal-level transactors, and that additional coverage metrics can be used at the SystemC TLM level.

Next Steps for TLM and HLS at Fujitsu Semiconductor

In the near term, Fujitsu Semiconductor is proceeding with final verification of the new SystemC data access controller design so that it can replace the hand-written design in their systems and deliver its significantly improved area, power, and performance. They also expect to apply SystemC TLM and C-to-Silicon Compiler HLS to additional design IP projects in the company, because they believe this will make them more productive and help them achieve better QoR.

In the medium term, they are interested in exploring how to use the SystemC TLM model to unify the design and verification flow for hardware and embedded software. The AXI3 TLM IP from Cadence can be configured to automatically provide a SystemC TLM2 interface to external components. Using this feature, they expect to be able to use the synthesizable SystemC TLM models they develop directly in SystemC TLM2 virtual platforms.

In conclusion, Fujitsu Semiconductor’s experience showed that a design team can successfully use SystemC TLM in combination with C-to-Silicon Compiler HLS to significantly improve design and verification productivity, and to exceed the QoR that are achievable using traditional RTL entry.
