Aurix – An enormous increase in performance

By eeNews Europe

TriCore? Some of you are now thinking: Wait a minute, I know that. This is not surprising. After all, TriCore was first introduced more than ten years ago. Since then, the TriCore microcontroller architecture has been continuously developed and has been used in millions of electronic control units (ECUs) for a wide range of automotive applications such as combustion engine control, transmission control units, driver assistance systems, braking systems, airbags and chassis electronics. Furthermore, TriCore devices are used for the control of electric motors and inverters and for battery management in electric vehicles.

What is so revolutionary about the new TriCore-based 32-bit multicore architecture Aurix? This is where it helps to take a look at the existing devices in the AUDO (AUtomotive unifieD processOr) families (see Table).

Table: Devices in the AUDO families contain a 32-bit super-scalar TriCore CPU with 4-stage pipeline in different versions

They contain a 32-bit superscalar TriCore CPU with a 4-stage pipeline in different versions as well as a 32-bit Peripheral Control Processor (PCP2) with its own instruction set. The PCP2 can autonomously transport data in both directions between peripherals and memory and pre-process it, thus relieving the main CPU. Quite often, the PCP2 is used together with the General Purpose Timer Array (GPTA), which allows a range of functions to be realized flexibly, from time measurement and capture/compare of digital input signals up to complex algorithms such as pulse width modulation (PWM). This means that the typical requirements of the application areas mentioned above can generally be implemented efficiently.


The first MCU based on the Aurix architecture, part number TC275T, contains three TriCore processor cores (version 1.6). Two of these are optimized for maximum performance (high-performance TriCore CPU 1.6P) and can execute up to three instructions in one cycle at a maximum clock frequency of 200 MHz. For the third core, a high-efficiency TriCore CPU 1.6E, the lowest possible power consumption and an efficient data exchange with the peripherals are the most important factors. It can execute a maximum of one instruction per cycle and is currently clocked at a maximum of 200 MHz. The three TriCore processor cores are connected over a crossbar that runs at the full CPU speed and avoids hardware contention (Figure 1).

Figure 1: The three TriCore processor cores of the TC275T are connected over a crossbar running at the full CPU speed and avoiding hardware contentions
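
As a rough illustration, the per-core figures quoted above can be added up to a theoretical peak instruction rate. This is only an upper bound derived from the numbers in the text (instructions per cycle and maximum clock), not a measured or sustained value; the core labels are placeholders.

```python
# Theoretical peak (not sustained) instruction rate from the figures in the text.
CORES = [
    # (label, max instructions per cycle, max clock in MHz)
    ("TriCore 1.6P #0", 3, 200),
    ("TriCore 1.6P #1", 3, 200),
    ("TriCore 1.6E", 1, 200),
]


def peak_minstr_per_s(cores):
    """Sum of (instructions per cycle x clock in MHz) over all cores."""
    return sum(ipc * mhz for _, ipc, mhz in cores)


print(peak_minstr_per_s(CORES), "million instructions/s (peak)")
```

Real code rarely sustains three instructions per cycle, so the practical throughput depends on the instruction mix and on memory accesses.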

The total of 4 MByte of program flash memory comprises two 2 MByte banks with independent read interfaces, which allow simultaneous flash access from two CPUs without speed limitations. Boot ROM, data flash banks with EEPROM emulation and SRAM as core-local memory are implemented as additional memories. All memories are equipped with both error detection code (EDC) and error correction code (ECC) and are thus well suited for safety applications in the automotive sector. A Memory Protection Unit (MPU) that covers the whole address space (including that of the peripheral registers) enables a simple separation of software. It is therefore possible to easily integrate several applications or operating system tasks and protect them against mutual interference. Furthermore, the 65 nm flash technology, which is used for the first time, allows programming speeds up to 20 times faster than with devices in the AUDO family. This is especially important due to the increased amount of memory required by applications.
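
The idea of separating software with a memory protection unit can be sketched as a simple range check. The following is a minimal model only: the region layout, the addresses and the function name are invented for illustration and do not reflect the actual TriCore MPU register interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: int      # first address of the region (hypothetical value)
    end: int        # one past the last address
    writable: bool  # False means read-only


def access_allowed(regions, addr, write):
    """An access is allowed if any region covers addr and permits the access type."""
    return any(
        r.start <= addr < r.end and (r.writable or not write)
        for r in regions
    )


# Hypothetical protection set for one task: read-only code, read/write data.
task_a = [
    Region(0x1000, 0x2000, writable=False),
    Region(0x8000, 0x9000, writable=True),
]

print(access_allowed(task_a, 0x8010, write=True))   # inside the task's own data
print(access_allowed(task_a, 0x1010, write=True))   # write to read-only code
print(access_allowed(task_a, 0xA000, write=False))  # outside all regions
```

Giving each application or task its own set of regions is what keeps a faulty task from overwriting the memory or peripheral registers of another.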

The tasks of the General Purpose Timer Array (GPTA) and the Peripheral Control Processor from the AUDO types are taken over by a new, high-performance Generic Timer Module (GTM). The GTM is likewise flexibly programmable with its own optimized instruction set and enables the implementation of all typical task classes for the target fields of application. Apart from the long list of peripherals, most of which have been developed further, the new devices also offer completely new A/D converters, which operate according to the Delta-Sigma conversion method. For signals in the lower kHz range, measurement results with a very high signal-to-noise ratio and thus very high accuracy are therefore achievable.

For safety applications, one of the performance cores and the efficiency core of the TC275T can be operated in lockstep mode. Checker and shadow cores, which are equipped with the same inputs and perform the same operations as the master core, exist for this mode. The cores are maximally separated on the chip. The checker core operates with a delay of two clock cycles in order to achieve a certain time difference in the execution. The results are constantly compared by integrated logic in order to ensure uniform behavior of the two cores. If there is a deviation, it is possible to react flexibly to this event. The possibilities range from triggering an interrupt for any one of the three cores up to a device reset. A microcontroller-based hardware security module (HSM) enables configurable protection of applications against unwanted external attacks.
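
The comparison of a master core with a checker core that runs two cycles behind can be modeled in a few lines. This is a behavioral sketch under simplifying assumptions (one result per cycle, results given as plain lists of equal length); it is not a description of the actual comparator hardware.

```python
from collections import deque

DELAY_CYCLES = 2  # the checker core lags the master by two clock cycles (from the text)


def run_lockstep(master_outputs, checker_outputs, delay=DELAY_CYCLES):
    """Return the cycle numbers at which the comparator sees a mismatch.

    master_outputs[i] is produced in cycle i; checker_outputs[i] is the
    checker's result for the same operation, produced in cycle i + delay.
    Both lists are assumed to have the same length.
    """
    pending = deque()  # master results waiting for the delayed checker
    mismatches = []
    for cycle in range(len(master_outputs) + delay):
        if cycle < len(master_outputs):
            pending.append(master_outputs[cycle])
        if cycle >= delay:
            expected = pending.popleft()
            if expected != checker_outputs[cycle - delay]:
                mismatches.append(cycle)
    return mismatches


print(run_lockstep([1, 2, 3, 4], [1, 2, 3, 4]))  # no deviation
print(run_lockstep([1, 2, 3, 4], [1, 2, 9, 4]))  # deviation in the third result
```

In a real system, each reported mismatch would be mapped to one of the configurable reactions mentioned above, from an interrupt up to a device reset.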

In practice, the characteristics mentioned above bring an enormous increase in performance, but also require a new approach to software development. In other areas of application that use multicore architectures, an operating system often hides the concrete hardware from the applications. In the automotive sector, by contrast, experience must first be gained with the implementation of software for multiple cores. The dynamic distribution of the software to the CPUs cannot be left to an operating system. In view of the usual rigorous real-time demands in the automotive sector, an appropriate partitioning is already needed when generating the binary image. This in turn means that new challenges arise for software testing and troubleshooting in such a system.
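
What a fixed partitioning at build time means can be illustrated with a small table that assigns every task to one core and is checked before the image is generated. The task names, core labels and load figures below are purely hypothetical; loads are given in whole percent to keep the arithmetic exact.

```python
# Hypothetical static partition: task -> (core, CPU load in percent).
PARTITION = {
    "engine_control": ("cpu0", 55),
    "transmission": ("cpu0", 30),
    "diagnostics": ("cpu1", 40),
    "io_handling": ("cpu2", 70),
}


def load_per_core(partition):
    """Add up the load of all tasks assigned to each core."""
    loads = {}
    for core, load in partition.values():
        loads[core] = loads.get(core, 0) + load
    return loads


def overloaded_cores(partition, limit=100):
    """Cores whose summed load exceeds the limit, checked before the build."""
    return sorted(c for c, load in load_per_core(partition).items() if load > limit)


print(load_per_core(PARTITION))
print(overloaded_cores(PARTITION))
```

Because the assignment is fixed in the image, such a check can run as part of the build rather than being left to a scheduler at run time.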

The existing TriCore AUDO architecture is well known for its sophisticated On-Chip Debug System (OCDS). This was optimized further for the Aurix family and adapted to the requirements of multicore debugging. With the new Aurix devices, the following interfaces are provided for debug, test and calibration tools: JTAG with up to 40 MHz serial clock, 2-pin and 3-pin Device Access Ports (DAP) as well as a 3-pin DAP2 with up to 160 MHz serial clock. The block transfer rate of the DAP2 was increased almost three-fold to 30 MByte/s by means of an optimized protocol. Associated tool manufacturers such as PLS supported the development with suitable hardware tools, for example the Universal Access Device (UAD2/UAD3+) family.
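
To put the DAP2 block transfer rate into perspective, the following back-of-the-envelope calculation estimates how long it takes to move an image the size of the complete 4 MByte program flash over the interface. It uses only the two figures from the text and deliberately ignores flash programming time and protocol setup, so it is a lower bound, not a download time.

```python
def transfer_time_s(size_mbyte, rate_mbyte_per_s):
    """Pure interface transfer time, ignoring programming and setup overhead."""
    return size_mbyte / rate_mbyte_per_s


FLASH_MBYTE = 4        # total program flash of the TC275T (from the text)
DAP2_MBYTE_PER_S = 30  # DAP2 block transfer rate (from the text)

t = transfer_time_s(FLASH_MBYTE, DAP2_MBYTE_PER_S)
print(f"{t * 1000:.0f} ms")
```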

The creation of software for an automotive multicore system also requires new compiler approaches. The established quasi-standard ELF/DWARF format for the output files, which contain both the binary patterns for the on-chip flash memory and symbolic information for the debugger, has no notion of multicore aspects. Compiler manufacturers currently follow two general approaches to meet this new requirement.

The distribution of program and data objects to different cores, or the declaration of their common use, takes place either at C level by proprietary language extensions or at linker/locator level in a corresponding description file. It has also not yet been decided whether one ELF file is generated for the entire multicore system or one per core. In practice, it is also possible that both approaches are applied. In order to be ready for this situation, the Universal Debug Engine (UDE) from PLS contains a multicore loader, which can be used in all of the cases referred to above (Figure 2). The binary part and the debug information part of each ELF file can be flexibly assigned to each core.


Figure 2: The multicore loader of the Universal Debug Engine (UDE) supports various compiler approaches.
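
The two packaging variants, one ELF file for the whole system or one per core, can be expressed as a small load plan that assigns a binary part and a debug information part to every core. The sketch below is a generic data model with invented names and file names; it does not represent the configuration format or API of UDE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreLoad:
    core: str         # target core (hypothetical label)
    binary_elf: str   # ELF whose code and data are programmed for this core
    symbols_elf: str  # ELF whose debug information the debugger uses for this core


def single_elf(cores, elf):
    """Variant 1: one ELF file describes the entire multicore system."""
    return [CoreLoad(c, elf, elf) for c in cores]


def elf_per_core(elf_by_core):
    """Variant 2: the toolchain emits one ELF file per core."""
    return [CoreLoad(c, elf, elf) for c, elf in elf_by_core.items()]


def files_to_program(plan):
    """Each distinct binary image only needs to be written once."""
    return sorted({entry.binary_elf for entry in plan})


cores = ["cpu0", "cpu1", "cpu2"]
print(files_to_program(single_elf(cores, "system.elf")))
print(files_to_program(elf_per_core({c: f"{c}.elf" for c in cores})))
```

Keeping the binary and the symbol source as separate fields also covers mixed cases, for example when a core runs code from a shared image but is debugged with its own symbol file.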


Eight hardware-based breakpoints for program addresses or data accesses are possible for each CPU. Each core can be separately stopped and started. However, the control of multiple cores or the entire Aurix device is also possible via a trigger switch. A particular part of the device can be very accurately selected as debug target in the Universal Multicore Workbench from PLS (Figure 3). The concept of run control groups was developed and implemented in the tool for fine-grained control of the individual cores. Their definition allows cores for particular aspects of the run control (start/stop/step) to be quasi-interconnected in order to achieve an almost synchronous execution with help of the trigger switch integrated on the device.

Figure 3: A particular part of the device can be very accurately selected as debug target in the Universal Multicore Workbench from PLS
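
The concept of run control groups can be pictured as sets of cores that follow each other's start and stop requests. The following is a minimal model with assumed class and method names; it illustrates the grouping idea only and says nothing about how the workbench or the on-chip trigger switch implement it.

```python
class RunControl:
    """Toy model: cores in the same group are started and stopped together."""

    def __init__(self, cores):
        self.running = {core: True for core in cores}
        self.groups = []  # each group is a frozenset of core names

    def add_group(self, *cores):
        self.groups.append(frozenset(cores))

    def _affected(self, core):
        # The core itself plus every core that shares a group with it.
        affected = {core}
        for group in self.groups:
            if core in group:
                affected |= group
        return affected

    def stop(self, core):
        for c in self._affected(core):
            self.running[c] = False

    def start(self, core):
        for c in self._affected(core):
            self.running[c] = True


rc = RunControl(["cpu0", "cpu1", "cpu2"])
rc.add_group("cpu0", "cpu1")  # cpu0 and cpu1 are coupled, cpu2 stays independent
rc.stop("cpu0")
print(rc.running)
```

A core that belongs to no group behaves as before and can still be stopped and started on its own.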

For program trace, data trace and even bus trace, Infineon again relies on the proven concept of the Emulation Devices (ED) with integrated Multi-Core Debug Solution (MCDS). The Emulation Devices are pin-compatible with the production chip, but contain sophisticated observation and trigger logic as well as currently up to 2 MByte of emulation memory. A whole series of enhancements was also made here for the Aurix family. For example, two CPUs, two further units connected to the crossbar and the System Peripheral Bus (SPB) can be monitored in parallel in the trace stream. Programming of the Emulation Device logic can be carried out conveniently with the Universal Emulation Configurator (UEC) from PLS, because it offers a graphical configuration of measurement tasks based on the concept of signals and actions linked via a state machine.
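
The principle of signals and actions linked via a state machine can be shown with a small transition table. The states, signal names and actions below are made up for illustration; they are not UEC or MCDS identifiers.

```python
# (current state, observed signal) -> (next state, action to perform)
TRANSITIONS = {
    ("idle", "func_entry"): ("tracing", "trace_on"),
    ("tracing", "func_exit"): ("idle", "trace_off"),
}


def run_trigger(signals, start="idle"):
    """Feed a sequence of signals through the state machine and collect the actions."""
    state, actions = start, []
    for signal in signals:
        transition = TRANSITIONS.get((state, signal))
        if transition is None:
            continue  # signal has no effect in the current state
        state, action = transition
        actions.append(action)
    return state, actions


print(run_trigger(["data_write", "func_entry", "data_write", "func_exit"]))
```

In this example, trace recording is switched on only while a particular function executes, which is a typical way to make good use of limited on-chip trace memory.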


For the first time, an Aurora GigaBit Trace (AGBT) interface was also implemented on the Emulation Device. By connecting external hardware, this enables a significant enlargement of the trace memory and thus high-end trace tasks with large amounts of data, for example code coverage. However, a 2.5 GByte/s Aurora interface requires correspondingly high-performance hardware for signal acquisition, signal conditioning and pre-processing on the target. A corresponding Trace-Pod with AGBT interface is already available for the UAD3+ from PLS. The up to 4 GByte of external trace memory of the UAD3+ is sufficient to support even the most demanding measurement tasks (Figure 4).

Figure 4: Aurora Trace-Pod for the Universal Access Device 3+ (UAD 3+)

The example of the new Aurix family is not only proof of the tremendously high performance of today’s SoCs. It also makes very clear how important close cooperation between semiconductor manufacturers and tool manufacturers is during the development phase. It is not only the hardware that is a challenge during the development of increasingly complex microcontroller architectures, but also the testing and debugging of such products. Experience has shown that, in practice, sophisticated bus, memory and peripheral concepts are only of value when appropriate tools for optimal use of the implemented functions are available at the same time.

About the author:

Heiko Riessland (Dipl.-Ing.) studied information technology at the Dresden University of Technology (TU Dresden) in Germany. After the successful completion of his studies, he worked for 10 years in the design and sales of software development tools and emulators for 16-bit and 32-bit microcontrollers. For the past seven years, Heiko has headed product marketing at PLS Programmierbare Logik & Systeme GmbH in Lauta, Germany.


