Supporting Multicore SoCs in critical embedded systems

By eeNews Europe



Although COTS multicore SoCs (Systems-on-Chip) are today the common approach to meeting these design goals, their use in critical embedded systems raises new questions with respect to safety and certification.

This paper highlights the features of multicore processors, such as the Freescale QorIQ 45nm P-series (P4080 and derivatives) and the newly introduced 28nm T-series (T4240 and derivatives), that are of particular interest for safety applications in avionics, defense and transport. Robust partitioning comprises the set of mechanisms required to consolidate and segregate several independent applications on a single System-on-Chip.

Vector processing (as provided by the AltiVec SIMD unit) can accelerate a wide range of the scientific, imaging and signal-processing workloads found in high-end embedded processing. Power management helps optimize power dissipation according to the actual usage of the SoC, taking into account which blocks are in use and which are not, for both the static and the dynamic components.

The debug and monitoring infrastructure assists in all the HW/SW integration and testing phases and can ease the validation tasks required up to certification. Security and trust features can enforce system integrity by protecting key internal IP and guarding against malicious external tampering attempts.

Last but not least, the MCFA (Multicore for Avionics) working group helps equipment developers and final integrators in civil avionics get to grips with multicore technology and address requirements specific to safety and certification.

Overall architecture

The QorIQ P-series and T-Series processors integrate from 1 to 12 physical cores and up to 24 logical cores (or so-called “hardware threads”) running at up to 2GHz, based on three different types of Power Architecture cores:

– e500mc 32-bit single-threaded

– e5500 32/64-bit single-threaded

– e6500 32/64-bit dual-threaded with vector unit

As an example of the overall architecture principles, figure 1 shows the P5020 2-core SoC block diagram.

Figure 1: QorIQ P5020 block diagram.

Each core has its own private L1 caches for instructions and data. The L2 cache is per-core in all the P-series derivatives and in the T1 members of the T-series, and per-cluster in the latest T2 and T4 devices, which use the e6500 core. L1 and L2 caches are accessed by their respective core through a direct interface, without going through the global SoC interconnect.

In addition to the low-latency L1 and L2 caches, the SoC provides a shared SRAM resource which can be used either as an L3 frontside cache (also referred to as the CoreNet Platform Cache, CPC) or as an internal fixed-address SRAM, for optimizing some data accesses or for keeping internal “secret” data hidden. Internal memory locations are error-protected through either parity (L1) or ECC (single error correction, double error detection), which is key for safety applications.
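To make the "single error correction, double error detection" behaviour concrete, here is a minimal sketch in C of an extended Hamming code protecting a 4-bit value with 4 check bits. It only illustrates the principle: the word width, the bit layout and the names `secded_encode` and `secded_decode` are assumptions made for this example, and the actual ECC organization of the QorIQ memories is not described in this article.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE } ecc_status;

/* XOR of all bits in v: 1 if an odd number of bits are set. */
static unsigned parity8(uint8_t v)
{
    v ^= (uint8_t)(v >> 4);
    v ^= (uint8_t)(v >> 2);
    v ^= (uint8_t)(v >> 1);
    return v & 1u;
}

/* Encode a 4-bit value into an 8-bit codeword (illustrative layout):
   bits 1..7 are Hamming positions 1..7, bit 0 is the overall parity. */
uint8_t secded_encode(uint8_t nibble)
{
    assert(nibble < 16);
    unsigned d1 = nibble & 1u, d2 = (nibble >> 1) & 1u;
    unsigned d3 = (nibble >> 2) & 1u, d4 = (nibble >> 3) & 1u;
    unsigned p1 = d1 ^ d2 ^ d4;   /* covers positions 1, 3, 5, 7 */
    unsigned p2 = d1 ^ d3 ^ d4;   /* covers positions 2, 3, 6, 7 */
    unsigned p4 = d2 ^ d3 ^ d4;   /* covers positions 4, 5, 6, 7 */
    uint8_t cw = (uint8_t)((p1 << 1) | (p2 << 2) | (d1 << 3) | (p4 << 4) |
                           (d2 << 5) | (d3 << 6) | (d4 << 7));
    return (uint8_t)(cw | parity8(cw));   /* force even overall parity */
}

/* Decode a codeword. One flipped bit is repaired, two flipped bits are
   reported as uncorrectable (in which case *nibble is left untouched). */
ecc_status secded_decode(uint8_t cw, uint8_t *nibble)
{
    unsigned syndrome = 0;
    for (unsigned pos = 1; pos <= 7; pos++)
        if ((cw >> pos) & 1u)
            syndrome ^= pos;              /* XOR of the set positions */
    unsigned odd = parity8(cw);           /* 1 if overall parity is broken */
    ecc_status st = ECC_OK;

    if (syndrome != 0 && odd) {           /* single error at 'syndrome' */
        cw ^= (uint8_t)(1u << syndrome);
        st = ECC_CORRECTED;
    } else if (syndrome != 0) {           /* parity even: double error */
        return ECC_UNCORRECTABLE;
    } else if (odd) {                     /* only the parity bit flipped */
        cw ^= 1u;
        st = ECC_CORRECTED;
    }
    *nibble = (uint8_t)(((cw >> 3) & 1u) | (((cw >> 5) & 1u) << 1) |
                        (((cw >> 6) & 1u) << 2) | (((cw >> 7) & 1u) << 3));
    return st;
}
```

Parity alone, as used on L1, corresponds to the `parity8` check only: it detects an odd number of flipped bits but cannot locate them, which is why a corrupted L1 line is typically invalidated and refetched rather than repaired.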

The interconnection between the core modules and the other SoC IP blocks, including shared SRAM, DDR system memory, peripheral units and hardware accelerators, is performed at the physical level through the CoreNet coherent fabric. The CoreNet fabric is a high-bandwidth switch interconnect technology which allows several concurrent transactions to flow in parallel between the various hardware initiators and targets within the SoC. Each core (in the P-series and T1) or cluster (in T4 and T2) is attached to the CoreNet fabric through a dedicated point-to-point connection which, unlike previous system-bus architectures, allows several cores to be fed in parallel with data coming from internal or external memory.

Robust partitioning

Partitioning is becoming a common requirement in various application areas. In high-end, real-time embedded areas, such as avionics and transportation, it is often referred to as “robust partitioning” to reflect the high level of isolation and protection required between partitions, in a static configuration approach with relatively little use of virtualization. (At the other end, in data centers and networking nodes, partitioning has to be implemented in a more dynamic way, with extensive use of virtualization for sharing the underlying physical resources.)

Implementing partitioning in a system based on the Freescale QorIQ multicore architecture relies on several features that are spread over the various IP blocks of the SoC: cores, MMU, caches, interconnect fabric, interrupt controller, peripherals/DMAs, and so on.

In the QorIQ terminology, “partitioning” initially refers to the logical grouping of hardware resources, such as CPU/cores, portions of memory, I/Os, and accelerators, in support of various application requirements. A given resource can be used either privately (by a single partition) or can be shared (by multiple partitions).

The most common use for partitioning hardware resources is to consolidate several independent software environments that were previously implemented on different hardware subsystems. Each partition (that is, set of hardware resources) typically hosts a software environment made of an OS and its applications. For that reason, a partition extends to this software environment, making the partition a combination of hardware and software resources.

A software model with several OSes running concurrently is referred to as Asymmetric Multi-Processing (AMP), as opposed to Symmetric Multi-Processing (SMP), where a single OS instance manages all the resources. The QorIQ architecture supports different partitioning models depending on how the various SW environments need to be segregated. In the cooperative model, all partitions are managed with an equal level of trust. This model requires only two privilege levels: the traditional User/Supervisor model.

The supervised AMP model (shown in figure 2) applies where different OSes with unequal levels of trust must co-exist. In this case, each non-trusted partition must be prevented from using any resources beyond those strictly allocated to it. To provide this level of protection, a third privilege level, called “hypervisor,” is added. Using this third privilege level implies a corresponding hypervisor SW layer running on each core and responsible for ensuring that no so-called “guest” OS violates any HW-enforced partition rule. As such, the hypervisor SW layer must be seen as the manager of errors and of resource virtualization for the partitioned system.

Figure 2: Supervised AMP Partitioning model.

Beyond these two models (cooperative and supervised AMP), some variants are possible: for example, one of the OSes can act as a hypervisor (or virtualizer) for other guest OSes.

Robust partitioning is primarily supported by three key hardware mechanisms inside the SoC:

• Memory Management Unit (MMU) per core

• Peripheral Access Management Unit (PAMU) at the interconnect level

• User/Supervisor/Hypervisor privilege levels (a 3-level hierarchical model)

MMU is the hardware mechanism that controls all the address-based accesses initiated by the cores.

PAMU (figure 3) is a similar mechanism that controls all address-based accesses initiated by the DMA-capable peripherals. MMU and PAMU are the two key security features of a partitioned system. In the supervised AMP model, only the hypervisor SW can physically configure and modify the MMU and PAMU content.

Figure 3: The key role of PAMU.
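The kind of check that a unit like PAMU applies to DMA traffic can be sketched in C as a default-deny lookup in a table of permitted address windows. This is a simplified software model for illustration only: the `dma_window` structure, its fields and the function `dma_access_allowed` are hypothetical and do not represent the PAMU's actual table format or register interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: one address range that one DMA-capable
   initiator is allowed to access. */
typedef struct {
    uint32_t initiator_id;  /* identifies the peripheral issuing the access */
    uint64_t base;          /* first permitted address */
    uint64_t size;          /* window length in bytes */
    bool     allow_write;   /* false means the window is read-only */
} dma_window;

/* Return true only if [addr, addr + len) lies entirely inside a window
   owned by this initiator and the access type is permitted there. */
bool dma_access_allowed(const dma_window *table, size_t n,
                        uint32_t initiator_id,
                        uint64_t addr, uint64_t len, bool is_write)
{
    assert(table != NULL || n == 0);
    if (len == 0)
        return false;
    for (size_t i = 0; i < n; i++) {
        const dma_window *w = &table[i];
        if (w->initiator_id != initiator_id || addr < w->base)
            continue;
        uint64_t off = addr - w->base;
        /* Compare against the remaining space to avoid overflow. */
        if (off >= w->size || len > w->size - off)
            continue;
        if (is_write && !w->allow_write)
            continue;
        return true;
    }
    return false;   /* default deny: no matching window */
}
```

In the supervised AMP model described above, only the hypervisor would be allowed to fill in such a table, so a guest OS cannot program a peripheral to read or write memory belonging to another partition.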

In addition to the three key mechanisms mentioned above, many extensions have been added into the QorIQ P- and T-series to efficiently manage the various aspects of partitioning dealing with:

– Protection & Authorization

– Coherency of memory hierarchy

– Virtualization of Guest-OSes

– Interrupt management

– Inter-core and inter-partition communication and synchronization

In critical, real-time applications, robust partitioning may encompass both spatial and temporal aspects. Temporal partitioning is more challenging in the context of a multicore SoC because of the large amount of physical interaction that happens inside the SoC. To enforce determinism, system designers may consider several options when making architectural hardware and software decisions, including:

– Deactivate some of the optional mechanisms that improve performance; for example, disable the shared cache and use it as normal RAM instead

– Restrict hardware coherency to minimize interconnect traffic overhead

– Make use of several available capabilities, such as cache pre-loading and line-locking

– Restrict the interactions between partitions by implementing time-management strategies at the RTOS or hypervisor level (time scheduling/slicing).
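The last option, time slicing at the RTOS or hypervisor level, is commonly implemented as a static, repeating schedule. The sketch below shows one way to express it in C; the `time_window` structure, the function `partition_at` and the microsecond units are assumptions made for this illustration and are not taken from any particular RTOS or hypervisor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical static schedule entry: one time window inside a
   repeating major frame, owned by one partition. */
typedef struct {
    uint32_t start_us;    /* offset of the window from the frame start */
    uint32_t length_us;   /* duration of the window */
    int      partition;   /* partition that runs during the window */
} time_window;

/* Return the partition that owns the CPU at absolute time now_us,
   or -1 if the time falls in an unallocated gap of the frame. */
int partition_at(const time_window *sched, size_t n,
                 uint32_t major_frame_us, uint64_t now_us)
{
    assert(major_frame_us > 0);
    uint32_t t = (uint32_t)(now_us % major_frame_us);
    for (size_t i = 0; i < n; i++) {
        if (t >= sched[i].start_us &&
            t - sched[i].start_us < sched[i].length_us)
            return sched[i].partition;
    }
    return -1;
}
```

Because the table is fixed at configuration time, the partition running at any instant depends only on the clock and not on the behaviour of the other partitions, which is the property temporal partitioning is after. Leaving a gap between windows, as in a schedule with windows at 0 to 4000, 4000 to 7000 and 8000 to 10000 microseconds, is one way to absorb the cost of switching partitions.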

Vector processing

The aerospace and defence markets require SIMD (single instruction, multiple data) performance within embedded power budgets, for fast signal processing, real-time imaging and other compute-intensive algorithms. The latest QorIQ T-series T4 and T2 families reintroduce the AltiVec vector processing technology originally designed for the e600 Power Architecture core. Radar imaging, cockpit displays and target acquisition can all take advantage of this capability.

The AltiVec vector unit extends the e6500 integer and floating-point units with a SIMD unit that provides 156 vector instructions and works on its own set of 128-bit-wide vector registers, providing DSP- and GPU-like functionality in a synchronous manner. The supported data types are boolean, integer and floating-point. Up to two AltiVec SIMD instructions can execute per cycle, and each instruction can process vectors of up to 16 data elements. In single-precision floating point, AltiVec can sustain up to 192 GFLOPS (billions of floating-point operations per second).
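The idea of one instruction operating on a whole 128-bit register can be illustrated with a short C sketch. It uses the generic vector extensions of GCC and Clang rather than AltiVec-specific intrinsics, so that it compiles on any target; the function name `saxpy4` is an assumption for this example, and whether the compiler actually emits AltiVec instructions depends on the toolchain and target options used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension: four 32-bit floats in one 128-bit value. */
typedef float v4f __attribute__((vector_size(16)));

/* Compute y[i] += a * x[i], four elements per vector operation.
   n must be a multiple of 4 in this simplified sketch. */
void saxpy4(float a, const float *x, float *y, size_t n)
{
    assert(n % 4 == 0);
    const v4f va = { a, a, a, a };
    for (size_t i = 0; i < n; i += 4) {
        v4f vx, vy;
        memcpy(&vx, x + i, sizeof vx);   /* load without alignment assumptions */
        memcpy(&vy, y + i, sizeof vy);
        vy += va * vx;                   /* one multiply and one add on 4 lanes */
        memcpy(y + i, &vy, sizeof vy);
    }
}
```

With 32-bit floats a 128-bit register holds 4 elements, with 16-bit integers 8, and with 8-bit integers 16, which is where the figure of up to 16 data elements per instruction comes from.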

Programmers can exploit AltiVec either by writing and optimizing their own vector library or by building their application on an existing library available in the T-series ecosystem. This library includes the basic and standard mathematical, scientific, DSP and graphics primitives.

Power management

Power consumption is a key issue in embedded applications. Besides common techniques such as clock gating, deactivation of unused blocks and sleep modes, further gains can be obtained by clocking each core (or cluster, in the case of the T-series) independently, at a frequency tuned to the performance actually expected from that core. This is useful in applications with asymmetric core utilization.

It is also possible to adapt the frequency of the cores dynamically during operation to account for varying periods of activity; this has to be controlled by SW. (Non-deterministic frequency adaptation, as found in server applications, is not supported.)

In the latest T-series, Freescale has extended power management by allowing both clock and power gating, in order to bring power dissipation down to a very low level for those internal blocks (core, AltiVec, L2 cache and so on) which show no activity for a significant period of time. The internal state of the IP block is maintained, and the block can be reactivated with low latency. In this way, both the static and the dynamic components of the silicon power dissipation can be minimized.

Debug and monitoring

Because of the overall complexity of these multicore SoCs, and because determinism is a key criterion for certified embedded applications, a multicore processor must provide extended support for monitoring and debugging the activity and interactions internal to the SoC. This was a key area of improvement over past generations when designing the QorIQ P- and T-series. A complete infrastructure including watchpoints, multiple event counters, trace capture logic and buffers has been integrated for this purpose.

Security / trust features

“Security” here relates to the protection of the equipment itself against potential malfunction or denial-of-service threats, and to the protection of internal IP (intellectual property), for example preventing an unauthorized entity from duplicating and re-using critical portions of the code or data in the equipment. It also covers the notion of a “trusted system”: a system that does what its builder (OEM) and users expect it to do, and does not do what the OEM and users consider harmful.

Up to now, this type of protection has been a concern in consumer applications and in some highly critical embedded areas. With the use of multicore processors, it will be considered more and more often, because consolidating multiple, largely independent functions leads to several actors (manufacturers and various suppliers) interacting on the same hardware platform, each with a different level of “trust”.

In the QorIQ P- and T-series, Freescale has paid special attention to these security requirements, and multiple mechanisms have been integrated, including:

– Secured boot

– Periodic re-validation of code and data critical sections through authentication.

– Auto encrypt/decrypt of external secrets

– Tamper detection

– Disabling of debug interfaces

If the QorIQ SoC is configured to perform secure boot, then at power-on reset core 0 is released to begin execution from an internal boot ROM containing the Internal Secure Boot Code (ISBC), which cannot be modified. The instructions executed from the ISBC allow core 0 to determine whether the next code to execute, outside the internal boot ROM, is safe, by checking a signed hash using key values held in an internal fuse block. Starting from this secure boot, system developers can extend the chain of trust, allowing the initial boot code, the hypervisor and/or RTOS, and any further system software running on the multiple partitions to be validated before execution.
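The chain-of-trust idea, in which each stage is validated before control is handed to it, can be sketched in C as follows. This is a deliberately simplified model and not the ISBC algorithm: the `boot_stage` structure and the functions `toy_digest` and `validate_chain` are hypothetical, and the digest is a non-cryptographic FNV-1a hash standing in for the signed cryptographic hash and fuse-based keys that a real secure boot relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical boot stage: an image plus the reference digest that the
   previous, already-validated stage holds for it. */
typedef struct {
    const uint8_t *image;
    size_t         len;
    uint64_t       expected_digest;
} boot_stage;

/* Stand-in digest (64-bit FNV-1a). NOT cryptographically secure; a real
   implementation would verify a signature over a cryptographic hash. */
uint64_t toy_digest(const uint8_t *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;               /* FNV prime */
    }
    return h;
}

/* Validate the stages in boot order and stop at the first mismatch.
   Returns the number of stages that may be executed. */
size_t validate_chain(const boot_stage *stages, size_t n)
{
    assert(stages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (toy_digest(stages[i].image, stages[i].len)
                != stages[i].expected_digest)
            return i;                        /* stage i is not trusted */
    }
    return n;
}
```

The point of the structure is that a modification anywhere in the chain stops the boot at that stage: nothing after the first failed check is executed, so a tampered hypervisor can never go on to validate and launch its own guests.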

Supporting avionics suppliers – MCFA initiative

Freescale is one of the leading processor suppliers in avionics and has always provided strong support to this market. Given the high technical interest shown by this industry in emerging multicore technology, and the integration and certification challenges that come with its complexity, Freescale initiated the MCFA (Multicore for Avionics) working group in late 2010 with top North American and European commercial avionics manufacturers.

This working group was set up to assist the avionics industry in certifying equipment based on multicore SoCs. Through this group, a partnership between the SoC supplier (Freescale) and the avionics equipment manufacturers aims to define the set of data required in the development and certification phases, in areas including technical collateral, SoC design and verification, and service experience.

About the author:

Eric Bost is Senior Field Application Engineer at Freescale Semiconductor (www.freescale.com). He can be reached at eric.bost@freescale.com
