Creating highly reliable FPGA designs
Radiation-induced soft errors – "glitches" – became widely known in the 1970s with the introduction of dynamic RAM chips. The problem emerged as a result of radioactive contaminants in chip packaging, which emit alpha particles as they decay and subsequently disturb electrons in the semiconductor. This disturbance can result in an unwelcome change in voltage levels in digital logic.
In combinational logic, the voltage disturbance will most likely be transient; an unwanted transient signal is known as a single event transient (SET). However, sequential logic – such as state machines, registers and memory – can store and propagate the transient error, which is likely to result in hardware failure. Such a stored error is known as a single event upset (SEU).
Figure 1: Single event upset (SEU) results from storing an unwanted transient event.
As far back as 1996, researchers at IBM estimated that each 256 MB of RAM suffers one error per month as a result of soft errors. The error rate grows as logic densities increase, switching voltage levels decrease and switching speeds rise, so today’s bigger, faster FPGAs will suffer from higher soft-error rates.
Beyond aerospace and defense applications
Soft errors still occur today as a result of radiation from space – even within electronic equipment operating at sea level. For many years, design teams working in aerospace and defense have been aware of the need to protect their designs against SEUs. Today, engineers working in other market sectors are adopting techniques to guard against SEUs. We are increasingly dependent on the safe operation of automotive systems and medical equipment, but high reliability is no longer purely a safety-critical issue; it is a growing concern even for networking and industrial automation systems that demand high quality of service and uptime.
Detecting and protecting against SEUs
For some applications, design teams choose to use radiation-hardened ("rad-hard") devices that are physically resistant to soft errors. However, rad-hard devices such as Microsemi’s RT ProASIC3 and Xilinx’s Virtex-5QV FPGAs come at a price premium and, as a consequence, find use mainly in mission-critical space projects.
Fortunately, there are design-based techniques that engineers can use to detect and protect against soft errors in the sequential logic of standard FPGAs. Synopsys’ Synplify Premier enables design teams to automatically apply techniques that build safety into the design. These techniques include triple modular redundancy (TMR) and fault-tolerant Finite State Machine (FSM) implementation.
Safe Finite State Machines
A flipped bit in a state machine’s state register can put the FSM into what the design team assumed would be an "unreachable" state under normal circumstances. The FSM can become stuck in the invalid state, which is potentially disastrous in, for example, a control logic module.
Safe FSM implementation involves using error-detection circuitry to force a state machine into a reset state or into a user-defined error state so the error can be handled in a specific way. The Synplify synthesis software can be instructed to automatically add error-detection circuitry that identifies errors, together with error-mitigation circuitry that returns the FSM to a safe state, so that the chip resumes correct operation.
Figure 2: Additional error detection and mitigation circuitry is created to ensure correct FSM operation in the event of radiation-induced soft errors (SEUs).
For state machines that use "1-hot" state encoding, the error detection circuitry could be a parity checker. A valid 1-hot code has exactly one state register bit high and therefore odd parity; a single bit flip leaves either zero or two bits high, which the checker sees as even parity. Once an error is detected, the state machine is returned to a "safe" or "reset" state.
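A minimal VHDL sketch of such a detector follows. It is written by hand for illustration and is not the circuitry the synthesis tool generates; the entity name `onehot_check`, the generic `N` and the port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical parity-based error detector for a one-hot state register.
entity onehot_check is
  generic (N : positive := 4);            -- number of states (assumed)
  port (
    state_reg : in  std_logic_vector(N-1 downto 0);
    state_err : out std_logic             -- '1' when parity is even
  );
end entity;

architecture rtl of onehot_check is
begin
  process (state_reg)
    variable parity : std_logic;
  begin
    -- XOR all bits together: '1' means an odd number of bits are high.
    parity := '0';
    for i in state_reg'range loop
      parity := parity xor state_reg(i);
    end loop;
    -- A valid one-hot code has odd parity; a single flip makes it even.
    state_err <= not parity;
  end process;
end architecture;
```

A parity check catches any odd number of flipped bits. Two simultaneous flips, for example the hot bit moving to a different position, pass undetected. In a complete design the `state_err` flag would drive the FSM's return to its reset or error state.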
Fault-tolerant FSMs with Hamming-3 encoding use state codes that differ from one another in at least three bit positions, that is, a Hamming distance of 3. After a single-bit error the state register still holds a value that is closer to the original code than to any other valid code, so the error can be both detected and corrected and the FSM continues to operate correctly.
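As an illustration of the distance-3 idea, the package below encodes four states in five bits so that every pair of codes differs in at least three positions, and corrects a register value that is one bit away from a valid code. This is a hand-written sketch, not the encoding the tool produces; the package name, the state names other than IDLE, and the helper functions `distance` and `correct` are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package hamming3_states is
  subtype state_t is std_logic_vector(4 downto 0);
  -- Four codes with pairwise Hamming distance >= 3 (illustrative choice).
  constant IDLE : state_t := "00000";
  constant LOAD : state_t := "00111";
  constant RUN  : state_t := "11001";
  constant DONE : state_t := "11110";

  function distance(a, b : state_t) return natural;
  function correct(s : state_t) return state_t;
end package;

package body hamming3_states is
  -- Number of bit positions in which a and b differ.
  function distance(a, b : state_t) return natural is
    variable d : natural := 0;
  begin
    for i in state_t'range loop
      if a(i) /= b(i) then
        d := d + 1;
      end if;
    end loop;
    return d;
  end function;

  -- Map a possibly corrupted value back to the valid code within one bit.
  function correct(s : state_t) return state_t is
    type code_array is array (natural range <>) of state_t;
    constant CODES : code_array(0 to 3) := (IDLE, LOAD, RUN, DONE);
  begin
    for k in CODES'range loop
      if distance(s, CODES(k)) <= 1 then
        return CODES(k);
      end if;
    end loop;
    return IDLE;  -- more than one bit wrong: fall back to a known state
  end function;
end package body;
```

Because the codes are at least three bits apart, no register value can be within one bit of two different codes, so the correction is unambiguous. The next-state logic would decode `correct(state)` rather than the raw register.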
Deadlock occurs when a state machine enters a state from which it is not able to exit. Design teams can guard against deadlock by automatically inserting timeout counters on critical state machines, so that an FSM that stays in one state for too long is forced back to a known state.
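A timeout counter of this kind can be sketched in VHDL as follows. The entity name `fsm_timeout`, the port names and the default limit of 1000 cycles are all assumptions made for illustration; the counter restarts whenever the FSM changes state and raises a one-cycle `force_reset` pulse if it does not.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical watchdog for a state machine.
entity fsm_timeout is
  generic (TIMEOUT_CYCLES : positive := 1000);  -- assumed limit
  port (
    clk, rst    : in  std_logic;
    state_moved : in  std_logic;  -- '1' in any cycle where the FSM changes state
    force_reset : out std_logic   -- one-cycle pulse when the FSM appears stuck
  );
end entity;

architecture rtl of fsm_timeout is
  signal count : natural range 0 to TIMEOUT_CYCLES := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      force_reset <= '0';
      if rst = '1' or state_moved = '1' then
        count <= 0;                  -- progress was made: restart the count
      elsif count = TIMEOUT_CYCLES then
        force_reset <= '1';          -- no progress for too long
        count <= 0;
      else
        count <= count + 1;
      end if;
    end if;
  end process;
end architecture;
```

A counter like this only suits states with a bounded dwell time. A state that legitimately waits indefinitely, such as an idle state, would need to hold the counter at zero.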
Protecting redundant logic
Synthesis tools are designed to optimize away redundant logic, since the tool seeks to meet timing goals in the smallest possible chip area. Many of the structures that help to mitigate soft errors contain logic that synthesis tools would like to remove. Synopsys provides synthesis tool attributes such as "syn_keep" and "syn_preserve" in order to preserve the error detection and mitigation logic that has been created to improve reliability.
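The sketch below shows how the two attributes named above might be attached in VHDL. The design itself (`dup_reg`, `q_a`, `q_b`, `diff`) is hypothetical: two registers capture the same input and a comparator flags any disagreement. The attribute declarations use the ordinary VHDL user-attribute form; their exact effect should be checked against the Synplify documentation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical duplicated register with a mismatch detector.
entity dup_reg is
  port (
    clk      : in  std_logic;
    d        : in  std_logic;
    q        : out std_logic;
    mismatch : out std_logic
  );
end entity;

architecture rtl of dup_reg is
  signal q_a, q_b : std_logic := '0';
  signal diff     : std_logic;

  -- Ask the tool to keep both registers although they are logically equal.
  attribute syn_preserve : boolean;
  attribute syn_preserve of q_a : signal is true;
  attribute syn_preserve of q_b : signal is true;

  -- Ask the tool to keep the comparator net.
  attribute syn_keep : boolean;
  attribute syn_keep of diff : signal is true;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      q_a <= d;
      q_b <= d;
    end if;
  end process;

  diff     <= q_a xor q_b;  -- differs only if one copy was upset
  q        <= q_a;
  mismatch <= diff;
end architecture;
```

Without the attributes, an optimizer would be entitled to merge `q_a` and `q_b` into one register, at which point `diff` becomes a constant '0' and the detector disappears.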
Designers can use the RTL "others" clause to specify a fault-tolerant or safe FSM. The "others" clause describes the behavior of the FSM or sequential logic, should an SEU cause it to enter a state that is nominally unused (that is, unreachable), but that in fact can be entered when the SEU causes a bit flip to occur. For example, the code fragment below specifies that the FSM returns to the IDLE state if it enters an unused state:
    when others =>             -- any unused (nominally unreachable) state
        next_state <= IDLE;    -- return to a known state
By default, synthesis would optimize away the "others" clause. Designers can now instruct the Synplify synthesis tool to preserve the "others" clause when optimizing Safe FSMs or sequential logic.
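For context, a complete state machine built around such a clause might look like the sketch below. The entity, ports and the LOAD and RUN states are hypothetical. The `syn_encoding` attribute with the value "safe" is shown as one way a safe implementation is commonly requested in Synplify; treat the attribute value and its interaction with the "others" branch as assumptions to confirm in the tool documentation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical three-state controller with a recovery branch.
entity safe_fsm is
  port (
    clk, rst : in  std_logic;
    start    : in  std_logic;
    done     : in  std_logic;
    busy     : out std_logic
  );
end entity;

architecture rtl of safe_fsm is
  type state_t is (IDLE, LOAD, RUN);
  signal state, next_state : state_t;

  -- Assumed way of requesting a safe FSM implementation.
  attribute syn_encoding : string;
  attribute syn_encoding of state : signal is "safe";
begin
  -- State register with synchronous reset.
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= IDLE;
      else
        state <= next_state;
      end if;
    end if;
  end process;

  -- Next-state logic.
  process (state, start, done)
  begin
    case state is
      when IDLE =>
        if start = '1' then
          next_state <= LOAD;
        else
          next_state <= IDLE;
        end if;
      when LOAD =>
        next_state <= RUN;
      when RUN =>
        if done = '1' then
          next_state <= IDLE;
        else
          next_state <= RUN;
        end if;
      when others =>            -- unused encodings reached through an SEU
        next_state <= IDLE;
    end case;
  end process;

  busy <= '0' when state = IDLE else '1';
end architecture;
```

Three states need at least two register bits, which leaves one binary code unused. That unused code is exactly the case the "others" branch is meant to cover in hardware, even though the enumerated type has no fourth value in simulation.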
Error correcting code (ECC) memories
Design teams can use error-correcting codes (ECCs) to detect and correct single-bit errors in memories. Designers simply have to indicate in the RTL or constraints file which memory functions are safety critical for the design. The Synplify Premier software infers the ECC memories offered by many FPGA vendors and automatically makes the proper connections.
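To show the principle behind single-bit correction, the package below implements the classic Hamming(7,4) code: four data bits are stored with three check bits, and on read the check bits are recomputed to locate and flip a single bad bit. This is a teaching sketch only. Vendor ECC memory blocks operate on wider words and are inferred by the tool rather than written by hand, and the package and function names here are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package hamming74 is
  subtype data_t is std_logic_vector(3 downto 0);
  subtype code_t is std_logic_vector(7 downto 1);  -- index = bit position
  function encode(d : data_t) return code_t;
  function decode(c : code_t) return data_t;
end package;

package body hamming74 is
  -- Data bits go to positions 3, 5, 6, 7; check bits to positions 1, 2, 4.
  function encode(d : data_t) return code_t is
    variable c : code_t;
  begin
    c(3) := d(0);
    c(5) := d(1);
    c(6) := d(2);
    c(7) := d(3);
    c(1) := d(0) xor d(1) xor d(3);  -- covers positions 3, 5, 7
    c(2) := d(0) xor d(2) xor d(3);  -- covers positions 3, 6, 7
    c(4) := d(1) xor d(2) xor d(3);  -- covers positions 5, 6, 7
    return c;
  end function;

  -- Recompute the checks; a nonzero syndrome is the position of the bad bit.
  function decode(c : code_t) return data_t is
    variable s   : unsigned(2 downto 0);
    variable fix : code_t := c;
    variable pos : natural;
    variable d   : data_t;
  begin
    s(0) := c(1) xor c(3) xor c(5) xor c(7);
    s(1) := c(2) xor c(3) xor c(6) xor c(7);
    s(2) := c(4) xor c(5) xor c(6) xor c(7);
    pos := to_integer(s);
    if pos /= 0 then
      fix(pos) := not fix(pos);      -- correct the single flipped bit
    end if;
    d := fix(7) & fix(6) & fix(5) & fix(3);
    return d;
  end function;
end package body;
```

A write path would store `encode(data)` and a read path would return `decode(stored_word)`. This basic code corrects one flipped bit per word but cannot distinguish a double-bit error from a single one.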
Distributed TMR with voting logic
Design teams have used Triple Modular Redundancy (TMR) for years to help mitigate SEUs in sequential circuits. TMR triplicates part or all of the logic in a circuit and then uses "voting" logic to select the value on which at least two of the three copies agree, in case a signal is changed due to a soft error.
Figure 3 shows how a cone of logic is replicated to create three identical cones, along with voting logic. If one cone fails, the voting logic passes the value agreed on by the other two cones through to the output.
Figure 3: TMR helps mitigate SEUs induced by radiation effects by inserting redundancy during synthesis with triplicated circuitry and voting logic.
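Reduced to a single register, the structure in Figure 3 can be sketched by hand in VHDL as shown below. The entity name `tmr_reg` and the signal names are hypothetical, and the tool's automatic TMR insertion would normally make hand-coding like this unnecessary. The `syn_preserve` attributes are included on the assumption that the three identical registers would otherwise be merged during optimization.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical hand-written TMR register with a majority voter.
entity tmr_reg is
  port (
    clk : in  std_logic;
    d   : in  std_logic;
    q   : out std_logic
  );
end entity;

architecture rtl of tmr_reg is
  signal r0, r1, r2 : std_logic := '0';

  -- Keep all three copies; they are logically identical by design.
  attribute syn_preserve : boolean;
  attribute syn_preserve of r0 : signal is true;
  attribute syn_preserve of r1 : signal is true;
  attribute syn_preserve of r2 : signal is true;
begin
  -- Three copies of the same register.
  process (clk)
  begin
    if rising_edge(clk) then
      r0 <= d;
      r1 <= d;
      r2 <= d;
    end if;
  end process;

  -- Majority vote: the output follows any two copies that agree.
  q <= (r0 and r1) or (r1 and r2) or (r0 and r2);
end architecture;
```

If an SEU flips one of the three registers, the other two still agree and `q` is unaffected. The flipped copy is overwritten on the next clock edge because all three registers reload from `d`.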
For certain applications, especially those that cannot tolerate going into a reset or error handling state, TMR can be a good way to mitigate soft errors. The disadvantage of TMR is its cost in chip resources: the protected logic is implemented three times, voting logic is added on top, and the voter can impose additional latency on the output of the cone of logic.
In general, the design team will want to selectively implement TMR at a local, block or system level. The Synplify Premier software lets designers decide which parts of the design would benefit from redundancy and automatically implements TMR for those areas.
Summary
Radiation-induced soft errors pose an increasing threat to the reliable operation of mil-aero, communications, automotive and industrial designs alike. Design teams can protect their FPGA designs against soft errors by incorporating redundancy and by developing safe sequential logic and fault-tolerant state machines with custom error mitigation logic. Such techniques ensure safe design operation by returning the design to a known safe state of operation, should a soft error occur. This logic can ensure high system availability in the field and provide reliable system operation. Synopsys Synplify Premier provides designers with the ability to automatically create this circuitry in FPGAs that are not radiation hardened, and the flexibility to control where and how these techniques are applied to the design.
About the author
Angela Sutton brings over 20 years of experience in the field of semiconductor and design tools to her role as staff product marketing manager for FPGA Implementation products at Synopsys. Before joining Synopsys, Ms. Sutton worked as senior product marketing manager in charge of FPGA implementation tools at Synplicity, Inc., which was acquired by Synopsys in May 2008. She has a B.Sc. in Applied Physics from Durham University UK, and a Ph.D. in Engineering from Aberdeen University UK.