
Why do variable speed drives fail and how do we test them?


By Julien Happich



The physics behind product failure

The first step is to think about the factors that could make drives fail. We take the physics of failure (PoF) approach, which divides products into two types (nominal and defective) and identifies two reasons why they fail.

The two reasons for failure are overstress and wear-out, which are related to the product’s strength and durability, respectively. Overstress failure occurs when a product is subjected to stress that exceeds its strength. Wear-out is a longer term failure process: each time a product is exposed to stress it suffers some damage, and the cumulative effect builds up and eventually causes failure when it exceeds the product’s durability (see figure 1).

Fig. 1: Things fail when stress exceeds their strength. Defective products fail at nominal stress.
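
The two failure modes can be sketched in a few lines of code. The article does not name a damage model, so this example assumes the Palmgren-Miner linear damage rule for wear-out; the function names and all the numbers are hypothetical and chosen only for illustration.

```python
def fails_by_overstress(stress, strength):
    """Overstress: a single exposure above the product's strength causes failure."""
    return stress > strength


def cumulative_damage(cycles_applied, cycles_to_failure):
    """Wear-out, assuming Palmgren-Miner: damage is the sum of n_i / N_i."""
    return sum(n / N for n, N in zip(cycles_applied, cycles_to_failure))


# Hypothetical stress history: cycles seen at three stress levels,
# and the cycles each level would take to cause failure on its own.
applied = [1000, 200, 10]
capacity = [100000, 5000, 100]

damage = cumulative_damage(applied, capacity)
worn_out = damage >= 1.0  # durability is used up when the damage sum reaches 1

print(f"damage fraction: {damage:.2f}, worn out: {worn_out}")
print("overstress at 120 vs strength 100:", fails_by_overstress(120, 100))
```

Under this assumption a defective product is simply one whose strength or cycles-to-failure values are lower than nominal, which is why it can still survive in an application with low stress.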

Under the PoF approach, products are considered as either nominal or defective. Nominal products will withstand nominal stress. And as long as this stress is not exceeded they will last their entire design lifetime. But defective products will fail at less than nominal stress. You might conclude that defective products will fail quickly when put into service. Yet many do in fact continue to function without any problems. This is because the stress levels in most applications are well below the nominal design limits, so defects don’t turn into failures.

The sorts of stresses that can affect the reliability of drives are mechanical, thermal, electrical, radiation and chemical. Figure 2 shows how these stresses can lead to overstress and wear-out failures.

Fig. 2: Typical failure mechanisms for electronic equipment.

Deciding what to test and how to test it

PoF provides the basis for drives testing programs, which are devised according to the product type and the failure mechanism under investigation (see figure 3). Testing is carried out at all stages. In R&D, the aim is to verify that the design and component selection meet both the specifications and customer expectations. In production, the purpose is to verify the quality of the product and ensure it continues to perform as designed. And when the product eventually fails, failure analysis can be carried out to identify what went wrong, or whether the failure was caused by natural wear-out.

Fig. 3: Testing methods are selected according to the type of sample and failure mechanism.

Type testing carried out as part of R&D focuses on nominal samples, so relatively small sample sizes can be used. Because the samples in the test are nominal products, they should all fail the same way. An important step at the end of the testing process is to analyze each failure to confirm that the sample was indeed a nominal product and that the failure was not caused by a defective component or production error. If a defect is found in such a small sample, it indicates that there could be a severe quality problem in production.

When testing for defective products, it is vital to select the correct sampling rate. In most cases only a small percentage of the products is defective, so it is necessary to test a large number to verify their true proportion. Statistical theory tells us that to show with 99% confidence that at least 99% of products are good, for example, we need to test 459 units without a single failure. Only then is it confirmed that the proportion of defective products really is less than 1%.
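
The figure of 459 units can be reproduced with the zero-failure (success-run) binomial relation, which is one common way to arrive at it; the article does not say which formula was used, so treat this as a sketch. The function name is hypothetical.

```python
import math


def zero_failure_sample_size(reliability, confidence):
    """Smallest n for which reliability**n <= 1 - confidence.

    If the true fraction of good units were only `reliability`, the chance
    of seeing n good units in a row would be reliability**n. Requiring that
    chance to be at most 1 - confidence gives n >= ln(1 - C) / ln(R).
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


n = zero_failure_sample_size(reliability=0.99, confidence=0.99)
print(n)  # 459
```

Relaxing either number reduces the sample size sharply: 90% reliability at 90% confidence needs only 22 units under the same relation.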

 

Highly Accelerated Life Testing (HALT)

The purpose of HALT testing is to probe the product’s weakest links and determine how much overstress it can withstand, i.e. to verify the overstress margins. HALT tests often focus on temperature and vibration, both separately and in combination, with typical test temperatures of -55°C to 150°C and vibration levels up to 50 g. Other stresses commonly used in HALT testing include voltage, current, mechanical shock, over-torquing of terminals and moisture. Figure 4 shows an example stress profile for HALT testing with temperature and vibration.

Fig. 4: Typical stress profile in Highly Accelerated Life Testing.

HALT testing is most commonly performed on components and sub-assemblies. When the product fails, the root causes are analyzed to determine whether a similar failure could occur in a real-life situation. In many cases, it turns out that the same type of failure could occur in a real application if certain abnormal conditions arose, such as an accident during transportation or a malfunctioning cooling system. If necessary, the design is improved and HALT testing is repeated to verify that the improvements have had the desired effect.


Reliability Demonstration Testing (RDT)

RDT testing is carried out to confirm that the product’s expected life meets or exceeds the target, and it is generally carried out as part of R&D.

The factors that must be known or determined to design ALT/RDT tests are the product’s expected reliability at end of life, its mission profile, the required confidence level, and the allowed stress levels identified in HALT tests. The tests expose the product to the typical stresses it will experience over its entire lifetime. The most common stresses for drives are temperature and temperature cycling. To reduce the time needed to complete the testing, stress levels that exceed the specified operating conditions are used.

Depending on the confidence level required and the reliability levels expected, 7-20 samples are typically needed. The product can be launched on the market following successful RDT testing, but the tests will continue in the form of ALT programs. These follow-on tests confirm the validity of the model and may provide opportunities for total life cycle cost reduction.
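
To give a feel for what such sample sizes can demonstrate, the sketch below assumes the simplest possible plan: every sample is tested for the equivalent of one full target lifetime and none fails. The article does not describe the statistical method actually used, so both the plan and the function name are assumptions.

```python
def demonstrated_reliability(sample_count, confidence):
    """Lower bound on reliability shown by a zero-failure test.

    Assumes a binomial (pass/fail) model in which each of the
    `sample_count` units survives one equivalent lifetime.
    """
    return (1.0 - confidence) ** (1.0 / sample_count)


# The two ends of the 7-20 sample range, at an assumed 90% confidence level.
for n in (7, 20):
    print(n, round(demonstrated_reliability(n, 0.90), 3))
```

With these assumptions, 7 samples demonstrate a reliability of roughly 0.72 and 20 samples roughly 0.89. Real plans often test for longer than one lifetime or use a lifetime distribution to demonstrate more with the same number of units.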

At ABB we RDT test complete drives in a special reliability container where they are exposed to drastic stresses. Typically, 10 years of life can be simulated in a few months.

 

Accelerated Life Testing (ALT)

ALT testing is used to determine the product’s expected lifetime. It is very similar to RDT, except that several stress levels are used and the tests are continued until failure. In RDT we do not learn what the actual life is, because the units are not supposed to fail; we only learn that the product should survive a given number of years or longer. ALT tests give us an estimate of the real life. RDT tests also have to assume a failure model and its material constants, whereas ALT testing provides us with the model and the constants.

Depending on the expected confidence and reliability levels, ALT testing typically requires 7-60 samples. Large sample sizes and a significant test time are often needed, especially if the activation energy or other coefficients used by the model are not known.
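
The mention of activation energy suggests an Arrhenius-type temperature model, although the article does not name one. As a hedged illustration, the sketch below computes an Arrhenius acceleration factor; the activation energy of 0.7 eV and both temperatures are made-up values, not ABB data, and the function name is hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(activation_energy_ev, use_temp_c, test_temp_c):
    """Ratio of life at the use temperature to life at the test temperature."""
    use_k = use_temp_c + 273.15    # convert Celsius to kelvin
    test_k = test_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / use_k - 1.0 / test_k)
    return math.exp(exponent)


# Assumed values for illustration: 0.7 eV, 40 °C in use, 85 °C in test.
af = arrhenius_acceleration_factor(0.7, use_temp_c=40.0, test_temp_c=85.0)
test_months = 10 * 12 / af  # test time equivalent to 10 years of use

print(f"acceleration factor: {af:.1f}, test time: {test_months:.1f} months")
```

The result is very sensitive to the activation energy, which is why a test program that has to estimate it from data needs more samples and more stress levels than one that can take it as known.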


Highly Accelerated Stress Screening (HASS)

HASS screening is performed as an integral part of the production process. The idea is to expose products to increased stress levels to cause those with defects to fail during screening rather than in the infant period of the life cycle. Figure 5 is a product ‘bathtub curve’, which shows how failure rates vary over the lifetime. Machines are like humans: the higher the stress the higher the infant mortality, higher sickness rate and shorter life.

Fig. 5: HASS screening shifts the bathtub curve in order to induce infant mortality failure among defective units and therefore reduce failure rates in actual use.

HASS involves a trade-off: while the aim is to reduce failures in actual use, the screening process itself shortens the product’s lifetime slightly because the application of stress contributes to wear-out. Therefore, it is important to carefully consider the balance between benefits and drawbacks when planning HASS. Is it worth using up two months of a ten-year lifetime, for example, if this will eliminate x% of the product’s infant mortality failures? HASS is naturally most beneficial for products with high infant mortality failure rates. Downsides of HASS are that the screening costs are high and production throughput times are increased.

As a practical example of how we use HASS on ABB drives, test cabinets are used to screen main circuit boards and gate driver control boards. The boards are connected to a power supply, which is cycled during screening. They are tested for several days at temperatures exceeding the maximum operational temperature. This reproduces the stress experienced in a few weeks of normal operation.

 

Ongoing Reliability Testing (ORT)

The purpose of ORT is to ensure that no changes have occurred in components or production processes that will have a systematic impact on reliability. The methodology is similar to RDT/ALT testing, but the sample units are randomly selected from actual production. In the case of ABB’s drives, testing can be performed on drive modules, IGBT drive packages, PCBs and even complete drives. Depending on the risk level that has been set, samples are taken from production each week or month. If a failure occurs during testing, a thorough root cause analysis is carried out to check whether the failure was a random event or due to a change in the product or one of its components.


Testing drives together with motors

When OEMs source a new drive they need to know how it will perform with their motors. At ABB we have created a dedicated Drives Customer Laboratory that enables various motor/drive combinations to be tested under loading conditions that simulate the actual application.

This testing is mainly concerned with performance of the motor/drive system rather than reliability. However, it does enable measurement of the actual stresses imposed by the application. This real-world information can then be used to predict the reliability of the drive in this particular case as well as feeding into our overall test programs.

 

Failure analysis

When a failure occurs, the next step is naturally to ask what went wrong and how it might be avoided in future. That is the role of the Drives Failure Analysis Laboratory, which undertakes root cause analysis using equipment and techniques such as 3D X-rays, acoustic scanners, scanning acoustic microscopes (SAM), and component cross-sectioning. This helps determine whether the product was nominal or defective and what caused the stress that led to the failure. This information then informs future test regimes.

 

About the author:

Kari Tikkanen is Principal Reliability Engineering Manager for ABB’s Robotics and Motion Division – abb.com/drives
