Functional Safety: Predictable reactions in real-time (Part 1)

Technology News | May 17, 2011

By eeNews Europe

Functional safety, as defined in IEC 61508 and, for automotive systems, in ISO 26262, describes the actions to take and the methods to apply in order to develop a safe system. A safe system may still contain faults and bugs. Safety therefore means detecting malfunctions and taking proper action before any harm is done. So it is all about timing: before a hazard occurs, the system has to bring itself into a safe state in time, using automated mechanisms and involving the driver.

Precisely defining the safety requirements, including the time spans within which the system has to respond to faults, is mandatory. It is also crucial for project success to evaluate early in the development process whether these requirements are met. This article introduces the timing aspects of functional safety and describes a model-based methodology, built on a mature tool suite, that helps design embedded systems with the correct dynamic behavior and with robustness to changes and unexpected system states.

1. Motivation

During validation, the manufacturer has to prove that the combined properties and functions of the system, which comprises mechanical, electrical and electronic components controlled by software and by the user, are able to avoid hazards in time. The goal is the “absence of unacceptable risk due to hazards caused by mal-functional behavior of E/E systems”, as stated in ISO 26262 [1].

Many functional aspects, such as voltage levels or mechanical robustness, are quite static and can be designed with sufficient safety margins. In contrast, the dynamic behavior and reaction times depend strongly on the interactions and interference among all system components. Predicting them and ensuring by design that they fulfill the timing requirements calls for a good specification and a proper design of the dynamic behavior, starting in the early design phases.

Modeling the real-time behavior of embedded systems enables engineers to describe the timing properties and interactions of software and hardware. By simulating and validating the model, the reaction, performance and timing of the system can be analyzed and tested against the timing requirements derived from functional safety, starting in the early design phases.

How well do current development projects do? Only one third of all development projects stay within the planned functionality, quality, effort and time, according to the Standish Group’s CHAOS Report [2]. Speculating how this maps to automotive E/E systems, one would tend to claim a better rate, but would also have a rather tight and stressful project in mind. An orderly project start and progression is very often followed by a project finale with huge last-minute efforts to implement late requirements and debug complex errors. In the end the project team makes it before the project crashes, but all parties involved have had to sacrifice some planned features, timelines, quality measures or profits, causing considerable frustration.

The growing complexity of automotive E/E systems and of functional safety requirements demands not only incremental but substantial changes in development methods and processes. There is help: many good, proven processes and methods are available. CMMI, Automotive SPICE and, last but not least, ISO 26262 not only demand higher quality, they also provide proven recipes to address the challenges.

One important factor in managing the complexity is how well the project handles timing within the E/E system. A current automotive E/E system is in fact a distributed, heterogeneous computer system in which application functions are performed by multiple microcontrollers and communication buses. Sensor signals are acquired, transformed and transferred as digital data that is processed by software functions. These asynchronous actions have to result in predictable system reactions at a defined time and with defined performance. The details and implications are elaborated in the course of this article.

2. Functional Safety – avoidance of harm

ISO 26262 defines functional safety as the absence of unacceptable risk. The design of E/E systems includes several measures to achieve this goal. Functional safety means dealing with malfunctioning behavior in such a way that harm is avoided. Timing aspects, among others, are stated explicitly.

The timing model in ISO 26262 is illustrated in Figure 1. The first point on the timeline is a fault, which is defined as an event. It could be, for example, a short circuit due to damaged cable insulation. This fault may lead to a failure (second point on the timeline), in which an ECU stops working correctly.

The system must react to any safety-critical failure. The most common approach is the transition to a safe state (third point on the timeline), which mostly means switching off the system. In some cases, however, a function cannot be abandoned, so a switch-off is not feasible. One example is the braking system, which requires high availability. The ABS functionality may be switched off in case of a failure, while the brakes themselves have to remain fully functional. In any case, the transition to a safe state must be completed before a hazardous event occurs (fourth point on the timeline). The hazardous event is defined as the point from which harm can occur.

The time span between a fault and the hazardous event is called the ‘Fault Tolerant Time Interval’ and defines the worst-case reaction time the system must achieve to be functionally safe. Any countermeasure must be finished within this interval. The time span between fault and safe state is called the ‘Emergency Operation Interval’. Within this time frame the system must be considered unsafe. The system may react to a fault with a delay caused by the scheduling of the functionality. The time span needed for the actual transition to the safe state is called the ‘Fault Reaction Time’; it is a system property that results from design and implementation. The fault reaction begins some time after the fault, possibly even after the failure, when a self-test or plausibility check function detects it. This time span needs to be constructively estimated and validated. Engineering needs to prove that the fault reaction time is shorter than the fault tolerant time interval and that any fault reaction is completed before a hazardous event occurs.
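The proof obligation described above can be sketched as a simple budget check. The following Python fragment is only an illustration, not part of ISO 26262 or of any tool: the class name, the field names and the split into a detection delay and a fault reaction time are assumptions made for this sketch, and the numbers in the usage line are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultTiming:
    """Worst-case time spans in milliseconds along the timeline of Figure 1.

    All field names are hypothetical and chosen for this sketch only.
    """
    detection_delay_ms: int      # fault occurs -> fault reaction begins
    fault_reaction_time_ms: int  # fault reaction begins -> safe state reached
    ftti_ms: int                 # fault occurs -> hazardous event can occur

    def time_to_safe_state_ms(self) -> int:
        # Time from the fault until the safe state is reached.
        return self.detection_delay_ms + self.fault_reaction_time_ms

    def margin_ms(self) -> int:
        # Remaining time before the hazardous event could occur.
        return self.ftti_ms - self.time_to_safe_state_ms()

    def is_safe(self) -> bool:
        # The safe state must be reached before the hazardous event.
        return self.margin_ms() > 0


# Hypothetical values, for illustration only.
timing = FaultTiming(detection_delay_ms=300, fault_reaction_time_ms=200, ftti_ms=3000)
print(timing.time_to_safe_state_ms(), timing.margin_ms(), timing.is_safe())
```

In a real project the two delays would not be single constants but worst-case values obtained from analysis, simulation or measurement of the scheduled system.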

Figure 1: Timing model in ISO 26262

3. Five Steps to design safety

Timing requirements, like any other safety requirement, shall be engineered in five steps [4]. In the following, the green boxes refer to the corresponding parts of ISO 26262. The standard does not explain exactly what to do but provides very helpful guidelines on how to proceed.

The ‘Hazard Analysis and Risk Assessment’ classifies all functionality of the system with regard to safety. Importantly, exact sequences of consecutive events are described in order to define system states, events and time spans for the later detailing of requirements. In the ‘Specification of safety goals’ phase, top-level requirements for designing a safe system are generated. The physical units of the timing are not yet important in these two steps. The ‘Specification of functional safety requirements’ details the previously defined top-level requirements. This step marks the end of the concept phase.

The design phase begins with the ‘Specification of the technical safety requirements’. Physical characteristics shall be considered here for the first time in the design process. This includes timing under all of the aspects described above. The fifth and final step is the co-specification of hardware and software on a detailed level. Here the timing-related requirements are defined for both areas.

Figure 2: Five steps from ISO 26262 to design safety

You may have the impression that functional safety is only an engineering issue. However, the safety lifecycle starts with a concept phase and ends only with decommissioning. It includes many activities on the management side as well as in engineering and production. Among others, you have to consider configuration management as well as verification tests in production. Clearly, the engineering of any safety-related system is based on architectural decisions that are made before the safety lifecycle starts. The architecture consists of several components that are linked to each other and therefore have to be considered in conjunction. The following section elaborates the safety lifecycle steps using a real-world example.

4. Safety design of a memory seat

Memory seats are among the popular comfort functions in modern vehicle designs. An electric motor adjusts the position of the seat and its backrest. The basic models only have buttons to perform the movement. Advanced models have position controllers with a few presets. High-end versions are combined with the personalized remote key of the car: as soon as a person unlocks the car, the seat moves into the position most comfortable for that driver. We have chosen this example because it is easy to understand and does not require deep knowledge of a specific vehicle domain.

The safety lifecycle starts when vehicle development is already running. A specification and a rough system design are the minimum preconditions for the hazard analysis and risk assessment. ISO 26262 requires the definition of the item. In this example the item is defined as the memory seat on the driver’s side.

3-7 Hazard Analysis and Risk Assessment

The major risk is an unexpected movement while driving. First, there is the aspect of shock. Second, a short person may not be able to control the car when the seat moves backwards and the pedals and steering wheel get out of reach. Conversely, a tall and corpulent person may have difficulty controlling the car when the seat moves too far forward. All these considerations are rated with automotive safety integrity levels (ASIL) during the hazard analysis and risk assessment.

3-7 Specification of Safety Goals

Safety goals are the top-level safety requirements. “Any unexpected movement of the seat must be detected and immediately stopped” is the safety goal for this example. While ‘immediately’ denotes a very short time span, neither a specific duration nor the start and end of the time span are defined in this early phase.

3-8 Specification of Functional Safety Requirements

Safety goals need to be refined. What does “unexpected movement” actually mean? From the system’s point of view, a broken switch is the same as a pressed switch. The system therefore cannot differentiate whether the driver or a short circuit causes a movement of the seat. For this case an appropriate functional safety requirement would be: “the driver can interrupt a movement by pressing the switch that leads to a movement in the opposite direction”. In addition, only small adjustments of the memory seat are permitted while driving. The ECU therefore needs vehicle speed information to detect driving, which is why it has a CAN interface. With this step the concept phase ends.
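To make the two functional safety requirements more tangible, the following Python sketch shows one possible decision rule for a single control cycle. It is not the authors' implementation: the function name, the speed threshold and the travel limit are hypothetical values chosen for illustration, and treating two simultaneously active switches as a stop request is an assumption about how the interruption could be realized.

```python
# Hypothetical limits, chosen for this sketch only.
DRIVING_SPEED_THRESHOLD_KMH = 5.0    # above this speed the car counts as driving
MAX_TRAVEL_WHILE_DRIVING_MM = 20.0   # largest adjustment permitted while driving


def motor_command(forward_pressed: bool, backward_pressed: bool,
                  speed_kmh: float, travel_mm: float) -> str:
    """Return "forward", "backward" or "stop" for one control cycle.

    travel_mm is the distance moved since the current movement started.
    """
    # Both switches active: for example a stuck switch plus the driver
    # pressing the opposite direction. The movement is interrupted.
    if forward_pressed and backward_pressed:
        return "stop"
    # No switch active: nothing to do.
    if not forward_pressed and not backward_pressed:
        return "stop"
    # While driving, only small adjustments are permitted.
    if speed_kmh > DRIVING_SPEED_THRESHOLD_KMH and travel_mm >= MAX_TRAVEL_WHILE_DRIVING_MM:
        return "stop"
    return "forward" if forward_pressed else "backward"


# A stuck "forward" switch at 80 km/h, interrupted by the driver:
print(motor_command(True, True, 80.0, 5.0))
```

The rule itself is trivial; the safety-relevant question is how long it takes, from the switch input to the motor actually stopping, once this function runs as one task among many on the ECU.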

Figure 3: Fault tree analysis for unexpected moving seat

One main document used in this phase is the fault tree analysis (Figure 3). In our example the failure “seat moves unexpectedly” has two root causes: either a broken switch or a malfunctioning ECU. Any ECU failure in turn has three possible root causes: power stage, electronics or software.
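The fault tree described here is small enough to write down as a data structure. The Python sketch below is an illustration only; it assumes that every gate in the tree is an OR gate, which matches the "either ... or" wording above but is not confirmed by Figure 3, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Event:
    """A fault tree node; children are combined with OR (an assumption)."""
    name: str
    children: List["Event"] = field(default_factory=list)

    def occurs(self, active_faults: Set[str]) -> bool:
        # A basic event occurs if it is among the active faults;
        # an intermediate event occurs if any of its causes occurs.
        if not self.children:
            return self.name in active_faults
        return any(child.occurs(active_faults) for child in self.children)

    def basic_events(self) -> List[str]:
        # Collect the leaves of the tree, i.e. the root causes.
        if not self.children:
            return [self.name]
        names: List[str] = []
        for child in self.children:
            names.extend(child.basic_events())
        return names


ecu = Event("ECU malfunction",
            [Event("power stage"), Event("electronics"), Event("software")])
top = Event("seat moves unexpectedly", [Event("broken switch"), ecu])

print(top.basic_events())
print(top.occurs({"software"}))
```

Each of the four basic events needs its own detection mechanism and its own timing budget in the later design steps.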

4-6 Specification of Technical Safety Requirements

The design phase starts. Physical parameters such as timing come into play. A detailed analysis of the kinematics showed that the fault tolerant time interval for our memory seat is 3 seconds. ISO 26262 requires proof that this time frame cannot be violated. This leads to time budgets for each hardware and software component. We also have to take into account that the memory seat control functionality is integrated into a body computer and does not run stand-alone. Therefore both the seat tasks and the other tasks running on the body computer have to be considered in the safety analysis.

5-5 and 5-6 Hardware and Software Engineering

We actually have all information needed:

fault tolerant time interval: 3 seconds

moment of shock: 1 second

time to react (find and press button for opposite movement): 1 second

functionality is integrated on a large body computer

speed and position information is measured

speed information is received via CAN

switches are connected via digital IO

plausibility checks are performed.

We have seen in Figure 1 that there is a time gap between the fault (event) and the failure (functional reaction). For this example we assume 100 to 300 ms between the switch breaking and the moment the ECU interprets “move”, which is when the failure occurs. This delay is the time constant of the input stage of the ECU and is scheduler dependent. We also assume 2.5 s as the maximum emergency operation interval.
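As a rough plausibility check of these figures, the following Python sketch adds up the worst-case delays. It assumes, and this is not stated in the article, that the input-stage delay, the moment of shock and the driver's reaction simply follow one another, and that whatever remains is the budget for the ECU to process the opposite switch and stop the motor. The constant and function names are hypothetical.

```python
# All values in milliseconds, taken from the list and paragraph above.
FTTI_MS = 3000               # fault tolerant time interval
MAX_EOI_MS = 2500            # maximum emergency operation interval
SHOCK_MS = 1000              # moment of shock
DRIVER_REACTION_MS = 1000    # find and press the button for opposite movement
INPUT_STAGE_MS = (100, 300)  # fault -> failure, best case and worst case


def worst_case_chain_ms() -> int:
    """Assumption: the three delays occur strictly one after another."""
    return INPUT_STAGE_MS[1] + SHOCK_MS + DRIVER_REACTION_MS


def remaining_budget_ms(limit_ms: int) -> int:
    """Time left within limit_ms once the worst-case chain has elapsed."""
    return limit_ms - worst_case_chain_ms()


print("worst-case chain:", worst_case_chain_ms(), "ms")
print("left within emergency operation interval:", remaining_budget_ms(MAX_EOI_MS), "ms")
print("left within fault tolerant time interval:", remaining_budget_ms(FTTI_MS), "ms")
```

Under these assumptions the chain takes 2300 ms in the worst case, which leaves 200 ms within the emergency operation interval and 700 ms within the fault tolerant time interval.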

Part 2 of this two-part story will describe the challenges of integration tasks as well as timing modeling and critical event chains.

About the authors:

Dr. Ralf Münzenberger (Muenzenberger@INCHRON.com) is Co-Founder and Managing Director Professional Services of INCHRON GmbH, Potsdam, Germany.

Dipl.-Ing. Tapio Kramer (Kramer@INCHRON.com) is Marketing and Product Manager at Inchron GmbH, Garching, Germany.

Juergen Belz (juergen.belz@prometo.de) is CEO of Prometo GmbH, Paderborn, Germany.