Impact of burn-in on power supply reliability

By eeNews Europe

Users of power supply products demand increasingly higher levels of reliability and performance. Although the suppliers of individual components can confidently provide impressive life and reliability data, the compound effect on overall reliability when a large number of individual components are combined in a module such as a power supply can be significant. Perhaps more important for product reliability is the quality and repeatability of the assembly process. Solder joints, connectors and mechanical fixings are all potential origins for product failure. In use, operating temperature and other environmental factors also affect the life and reliability of a power supply.

Burn-in and various other forms of life and stress testing help provide the data to enable power supply manufacturers to continually improve the reliability of their products. When analysed correctly and fed back into the design and assembly process, the accumulated data can be used to optimise the test and burn-in process.

The Burn-In Process
The purpose of the burn-in process for power supplies that have passed initial manufacturing test is to weed out “infant mortalities”, as seen in the first portion of the well-known “bathtub curve” of failure rate versus operating time (Fig. 1). These early-life failures may be due to latent intrinsic faults within bought-in components, marginal workmanship errors or latent faults induced in components by inappropriate handling, such as ESD damage. Note that there are no absolutes in the world of reliability testing, only probabilities and confidence levels for large populations, so there is never a guarantee that all infant mortalities are caught by the burn-in process.

Over many years, the conventional approach to power supply burn-in has been to run products at an elevated temperature, often the maximum-rated operating temperature, where the rate at which latent defects appear is assumed to be accelerated. The supplies are run at full load with power cycling, and the input voltage is set to either its maximum or its minimum to provide maximum voltage stress or maximum current stress, depending on the design topology.

Care in the choice of conditions is necessary because some components in some topologies can see more stress at light loads, such as snubber networks in variable-frequency converters. Some ingenuity can also be applied. For example, if a product is intended to operate normally with forced air, it could be run in still air at light load and still achieve comparable temperature stress levels of the hottest components. However, without the “heat spreading” effect of the forced air, other components might see very little stress under these conditions.

A technique sometimes used by Murata Power Solutions, depending on the product topology, is to burn in products with their outputs cycled between short and open circuit. This applies an appropriate current-stress level while exercising the built-in protection circuitry on short circuit, and imposes a high-voltage stress on many components on open circuit. A major benefit is that the power dissipated in the short- or open-circuit load is theoretically zero, although in practice the short might be applied by a MOSFET dissipating a few watts.

This method alleviates the real problem of energy wasted in burn-in loads. However, some types of component stresses are not applied with this method because the overall power supplied by the unit might be low and therefore self-heating may be low. An elevated ambient temperature will compensate for this in part, perhaps using the waste heat from the burn-in loads.

Some product topologies are not suitable for this burn-in method, such as those that have a poorly defined or strongly re-entrant short-circuit current characteristic. For example, if on a “hard” short circuit the output current reduces to much less than the rated maximum output current or if the supply enters a “hiccup” mode, the level of burn-in stress may be too low to be effective.

The decision on burn-in configuration is made jointly between design and reliability/quality engineers to ensure optimised screening. Data logging and analysis of the units under test is important for determining whether and when a failure has occurred. If all failures occur in the first few minutes of a 48-hr burn-in sequence, there would be good reason to shorten the time and increase throughput while saving energy.
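The kind of analysis described above can be sketched in a few lines. This is a minimal Python illustration, assuming the data logger records the hour at which each unit failed during full-length burn-in runs; the function names, candidate durations and failure log are hypothetical, not Murata's actual tooling. It reports what fraction of past failures a shorter burn-in would still have caught. As the article stresses, covering every past failure is no guarantee for future batches, so the result is an input to the design/reliability decision, not a substitute for it.

```python
from bisect import bisect_right


def fraction_failed_by(failure_hours, cutoff_hours):
    """Fraction of logged burn-in failures that occurred at or before cutoff_hours."""
    if not failure_hours:
        return 1.0  # no failures logged: any cutoff trivially "covers" them
    times = sorted(failure_hours)
    return bisect_right(times, cutoff_hours) / len(times)


def shortest_covering_duration(failure_hours, coverage, candidate_hours):
    """Smallest candidate burn-in duration that would still have caught at least
    `coverage` (0..1) of the failures seen in past full-length runs, or None."""
    for duration in sorted(candidate_hours):
        if fraction_failed_by(failure_hours, duration) >= coverage:
            return duration
    return None


# Hypothetical log: hours into a 48-hr burn-in at which units failed.
logged = [0.1, 0.2, 0.25, 0.5, 1.5, 3.0]
print(fraction_failed_by(logged, 1.0))
print(shortest_covering_duration(logged, 1.0, [1, 2, 4, 8, 24, 48]))
```

With this made-up log, every failure occurred within the first few hours, so a 4-hour burn-in would have caught all of them; that is the situation in which the article suggests shortening the sequence.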

A comprehensive test after burn-in is necessary to ensure that products are fully functional, and it can also reveal any intermittent problems. Understanding and using burn-in data to modify product design and manufacturing processes can improve reliability and yield, so companies like Murata Power Solutions use burn-in data to drive a continuous-improvement quality process.

Experience in burn-in testing has shown that thermal cycling precipitates more infant mortalities than a constant elevated ambient, although the two sets of failures don’t completely overlap. Thermal cycling with a dwell time at each thermal extreme is therefore the preferred process. Increasing the rate of change of burn-in temperature precipitates more failures in fewer cycles, as illustrated in Fig. 2.

Note that as the thermal rate of change increases, different populations of failures can appear, some more and some less affected by this type of stress, while the occurrence of some residual failure types is unaffected. Even though equipment is available that can achieve thermal rates of change of 60°C per minute or higher, some manufacturers don’t exceed 45°C per minute to avoid excessive thermal stress that may, for example, crack multilayer ceramic capacitors (MLCCs).
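A thermal-cycling profile with dwells and a ramp-rate cap can be described and sanity-checked as below. This is a minimal sketch, assuming a simple symmetric cycle (ramp up, dwell hot, ramp down, dwell cold); the -40°C/+85°C extremes, 10-minute dwells and 25°C/min ramp are hypothetical values chosen only for illustration, and the 45°C/min default cap mirrors the limit some manufacturers apply. The ramp here is the chamber rate; the product itself will lag it, and the extremes must stay within the product's ratings.

```python
from dataclasses import dataclass


@dataclass
class ThermalCycle:
    """One chamber thermal cycle: ramp up, dwell hot, ramp down, dwell cold."""
    t_low_c: float          # cold extreme, °C
    t_high_c: float         # hot extreme, °C
    dwell_min: float        # dwell at each extreme, minutes
    ramp_c_per_min: float   # chamber ramp rate, °C per minute

    def validate(self, max_ramp_c_per_min: float = 45.0) -> None:
        """Reject profiles whose ramp exceeds the chosen cap (45 °C/min by default)."""
        if self.t_high_c <= self.t_low_c:
            raise ValueError("hot extreme must be above cold extreme")
        if self.dwell_min <= 0 or self.ramp_c_per_min <= 0:
            raise ValueError("dwell and ramp rate must be positive")
        if self.ramp_c_per_min > max_ramp_c_per_min:
            raise ValueError(
                f"ramp of {self.ramp_c_per_min} °C/min exceeds cap of {max_ramp_c_per_min} °C/min"
            )

    def period_min(self) -> float:
        """Length of one full cycle: two ramps plus two dwells."""
        ramp_time = (self.t_high_c - self.t_low_c) / self.ramp_c_per_min
        return 2 * ramp_time + 2 * self.dwell_min

    def cycles_in(self, hours: float) -> int:
        """Number of complete cycles that fit into a burn-in of `hours`."""
        return int(hours * 60 // self.period_min())


# Hypothetical profile: -40 °C to +85 °C, 10-minute dwells, 25 °C/min ramps.
profile = ThermalCycle(t_low_c=-40, t_high_c=85, dwell_min=10, ramp_c_per_min=25)
profile.validate()
print(profile.period_min(), profile.cycles_in(24))
```

Raising the ramp rate shortens each cycle and fits more cycles into the same burn-in time, which is the trade-off the cap is meant to bound.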

In the absence of thermal cycling chambers, power cycling at an elevated ambient with judiciously selected cycle times approaches the effectiveness of the thermal cycling/dwell process. Care must be taken to ensure that the products are not stressed outside of their ratings in the often atypical environment of burn-in. If overstressed, a good product could have a significant part of its useful life used up and, at worst, hard or latent failures could actually be induced in otherwise good product. If the product includes components with inherent wear-out mechanisms, such as electrolytic capacitors or optocouplers, their remaining lifetime after burn-in should be evaluated to confirm that it is still adequate.

At Murata Power Solutions, burn-in typically starts with a duration of 24 hours, with a decision process to reduce the burn-in time when no failures occur after a set number of hours. Standard IPC-9592 gives plans for reducing burn-in times based on the failure rates observed over a set number of unit-hours.

Burn-in can be eliminated when no failures occur after multiple production builds. However, it could be argued that this removes the insurance against a group of defective components being used or a process anomaly occurring. In volume production of parts that are known to have a significant infant mortality rate, perhaps because of the degree of manual assembly, a regime of variable burn-in can be implemented whereby burn-in is terminated when a pre-calculated period of failure-free operation of a batch has passed.

This period is found from statistical tables, given the expected percentage of infant mortalities, their known failure rate and distribution type, batch size and percentage confidence level required that only a given number of latent failures remain. For example, consider a batch of 10,000 units that historically has had 10 infant mortalities per batch of a type found to have a mean time to failure (MTTF) of 10 hours at the burn-in temperature. In this case, statistical tables show that a failure-free period of 13 hours must pass to give a 90% confidence level that only one latent product failure remains. The period extends to 24 hours to have the same 90% confidence level that no latent infant mortality-type failures remain. (Reference 1)
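The termination rule of a variable burn-in (stop once a pre-calculated failure-free period has passed) can be sketched as follows. This is a minimal illustration, assuming the required failure-free period has already been read from statistical tables such as those in Reference 1; the function does not compute that period. The failure log is hypothetical, the function name is illustrative, and failed units are assumed to be removed while the batch carries on.

```python
def burn_in_end_time(failure_hours, required_gap_h):
    """Time (hours from start) at which a variable burn-in would be stopped.

    Burn-in ends as soon as `required_gap_h` hours have passed with no failure,
    counting from the start or from the most recent failure. A failure landing
    exactly at the end of a gap is treated as occurring after termination.
    """
    last_event = 0.0
    for t in sorted(failure_hours):
        if t - last_event >= required_gap_h:
            break  # the failure-free window closed before this failure
        last_event = t  # restart the failure-free clock
    return last_event + required_gap_h


# Hypothetical batch log (hours at which units failed) with the 13-hour
# failure-free period taken from the example above.
print(burn_in_end_time([1, 3, 7, 30], 13))
```

Here the last failure before a 13-hour gap occurs at 7 hours, so the batch would be released at 20 hours and the failure logged at 30 hours would never be seen in burn-in.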

Some manufacturers have taken the burn-in process further after finding that the types of burn-in described do not eliminate, within a reasonable time, all of the failures occurring in the early life of a power supply. Also, conventional burn-in does not provoke early failures that could be a result of the shock and vibration of shipping and handling.

To combat this, a more aggressive highly accelerated stress screen (HASS) can be used that applies mechanical, thermal and electrical stress typically beyond product ratings but within design margins. Acceleration factors of more than 40 over conventional burn-in have been claimed for this method, giving correspondingly shorter test times.

A problem, however, is that the stress levels are so extreme that there is a risk of inducing hard or latent failures in good product. In answer to this, the highly accelerated life test (HALT) process was designed to identify the real damage limits of a product by stressing it to failure with temperature extremes, thermal cycling, progressively higher levels of vibration, and then a combination of thermal cycling and vibration.

During this testing, the operating and destruction limits of the power supply are identified, and these limits are then used to set the less-severe HASS test levels. HALT is also used extensively during product development to identify potential weaknesses in the design. The test equipment required for HALT must typically ramp temperature between -55°C and 125°C while applying six-axis linear and rotational random vibration. This requires a major capital investment, so HALT is often subcontracted to specialist test houses, although some vendors, such as Murata Power Solutions, already have internal HALT facilities.

The No Burn-In Model
As described earlier, once burn-in failures have been reduced to a certain level by progressive manufacturing process improvements, burn-in can be dropped completely. Standard IPC-9592, for example, allows this after one year or 30,000 unit-hours if the failure rate does not exceed a maximum of between zero and 400 ppm, depending on the product type. This can be considered only if the manufacturing process is entirely predictable and the bought-in material has a very low rate of latent intrinsic defects. In other words, the bought-in components themselves must not exhibit significant infant mortalities and should have only their intrinsic low-level latent defect rate.

Although commodity components approach this quality level and modern manufacturing quality control can minimize process variations, there is still a real risk that a customer may see some early life failures. The cost of this in terms of lost goodwill has to be weighed against the cost of burn-in. Remember that customers will still see the intrinsic failure rate of the product in its service life.

A small extra number of failures attributable to infant mortalities may not be significant. For example, one product from Murata Power Solutions that uses quality components is built using a stable, mature process without burn-in and has an observed field mean time to failure (MTTF) of more than 25 million hours. This figure is derived from 130 failures in total sales of 4.37 million parts shipped regularly over six years. In this case, it is assumed that the parts are powered for 25% of any given period, that only 10% of failures are actually reported, and that all shipped parts are still in the field. This represents a creditable defect rate of 30 parts per million over all units shipped to date and is a justification for a ‘no burn-in’ model. Note that some manufacturers count defect rate (dppm) as failures on delivery or within a short time of delivery. It makes sense to define such a time window because, of course, the cumulative defect rate for any electronics approaches a million parts per million after a long enough period!

Ongoing reliability tests are normally used only when large quantities of units are built on a continuing basis, and they can give an estimate of the intrinsic reliability of a product during its service lifetime, that is, its MTBF. The accuracy of this figure depends on the failure-rate acceleration during the test having a known relationship to the real-life failure rate.

The ‘Arrhenius’ equation can give a value for the acceleration factor, given a constant failure rate after infant mortalities. This equation has its origins in chemistry, so in theory it requires knowledge of the effective “activation energies” of all failure modes. Historically, the ‘rule of thumb’ has been to double the acceleration factor for each 10°C rise above the real-life operating temperature. As an example, 50 units running for six months at 70°C with no failures gives 219,000 operational hours. From statistical tables, this represents a failure rate (λ) of 4110 failures in 10⁹ hours of operation (4110 FITs) at a 60% confidence level, or 10,502 FITs at 90% confidence.
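For a zero-failure test, the tabulated values can be approximated in closed form. Assuming a constant failure rate, the usual chi-square upper bound χ²(CL; 2r + 2)/(2T) with r = 0 failures reduces to −ln(1 − CL)/T, where T is the total unit-hours. The sketch below applies that to the 50-unit, six-month example; the six months are taken as half of a 365-day year, and the function name is illustrative.

```python
import math

HOURS_PER_SIX_MONTHS = 8760 / 2  # 4380 h, half of a 365-day year


def zero_failure_fit(unit_hours: float, confidence: float) -> float:
    """Upper-bound failure rate in FITs (failures per 1e9 h) for a test with zero failures.

    Assumes a constant failure rate. With r = 0 failures the chi-square bound
    chi2(CL; 2r + 2) / (2T) reduces to -ln(1 - CL) / T.
    """
    lam_per_hour = -math.log(1.0 - confidence) / unit_hours
    return lam_per_hour * 1e9


unit_hours = 50 * HOURS_PER_SIX_MONTHS  # 219,000 unit-hours
fit_60 = zero_failure_fit(unit_hours, 0.60)
fit_90 = zero_failure_fit(unit_hours, 0.90)
print(f"{unit_hours:,.0f} unit-hours: {fit_60:,.0f} FITs (60%), {fit_90:,.0f} FITs (90%)")
```

The closed form gives about 10,514 FITs at 90% confidence, within about 0.1% of the tabulated 10,502, and about 4,184 FITs at 60%, a little above the tabulated 4110; the gap presumably reflects the table values used.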

At a lower operating temperature of, say, 40°C, our rule of thumb gives an acceleration factor of eight (2³) relative to 70°C, so the figures reduce to 514 FITs and 1313 FITs. The FIT value is λ × 10⁹ (with λ in failures per hour) and MTBF is 1/λ, so these figures represent an expected MTBF of 1.95 million hours or 760,000 hours at the 60% and 90% confidence levels respectively. It may seem odd that a test with no failures gives a finite failure rate. This is because it is assumed that the first failure is just about to happen. It should be emphasised that the real field failure rate is the most accurate measure of the reliability of a product.
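The de-rating step can be checked numerically. This sketch applies the doubling-per-10°C rule to the 70°C figures quoted above (4110 and 10,502 FITs), converts to MTBF via λ = FIT/10⁹ and MTBF = 1/λ, and, for comparison, backs out the activation energy at which the standard Arrhenius form exp[(Ea/k)(1/T_use − 1/T_test)] would give the same factor of eight. That "equivalent" activation energy is only an illustration of what the rule of thumb implies between 40°C and 70°C, not a property of any particular failure mode; the function names are illustrative.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant, eV/K


def rule_of_thumb_af(t_test_c, t_use_c):
    """Failure-rate acceleration from the 'doubling per 10 °C' rule of thumb."""
    return 2.0 ** ((t_test_c - t_use_c) / 10.0)


def arrhenius_af(ea_ev, t_test_c, t_use_c):
    """Arrhenius acceleration factor for activation energy ea_ev (eV)."""
    t_test_k, t_use_k = t_test_c + 273.15, t_use_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_test_k))


def equivalent_ea(af, t_test_c, t_use_c):
    """Activation energy (eV) at which the Arrhenius model gives the factor `af`."""
    t_test_k, t_use_k = t_test_c + 273.15, t_use_c + 273.15
    return math.log(af) * BOLTZMANN_EV_PER_K / (1.0 / t_use_k - 1.0 / t_test_k)


af = rule_of_thumb_af(70, 40)  # three 10 °C steps: 2**3
for cl, fit_70c in ((0.60, 4110), (0.90, 10502)):
    fit_40c = fit_70c / af
    mtbf_h = 1e9 / fit_40c  # lambda = FIT / 1e9 per hour, MTBF = 1 / lambda
    print(f"{cl:.0%}: {fit_40c:,.0f} FITs at 40 °C, MTBF about {mtbf_h:,.0f} h")
print(f"equivalent activation energy: {equivalent_ea(af, 70, 40):.2f} eV")
```

This reproduces the 514 and 1313 FIT figures and MTBFs of roughly 1.95 million and 760,000 hours; the rule of thumb corresponds to an activation energy of about 0.64 eV over this temperature span.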

A calculated MTBF can be compared with the demonstrated figure obtained through life testing to check for consistency. However, the calculations can be misleading depending on the base failure rates used for components and the method of calculation. A survey by Murata Power Solutions found a variation of a factor of more than 100 between MTBF figures for the same circuit calculated by several different power supply manufacturers.

Different standards, such as MIL-HDBK-217F and Telcordia SR-332, will also give different answers. The MIL standard itself offers two different calculation methods: the ‘parts count’ method, which gives a quick but conservative measure, and the ‘part stress’ method, which requires detailed knowledge of the electrical operating conditions. The latter method is more realistic. As an example of a ‘part stress’ calculation according to MIL-HDBK-217F, a general-purpose diode has a failure rate per million hours, λP, given by:

$$\lambda_P = \lambda_B \, \pi_T \, \pi_S \, \pi_C \, \pi_Q \, \pi_E$$

where λB is a base failure rate for different types of diodes and the π factors are for temperature, electrical stress, internal construction, manufacturing quality and environment of use respectively. For a Schottky power diode operating at a junction temperature of 80°C, with a voltage stress of 75% of its rating, metallurgically bonded construction, plastic commercial packaging and operated in a “ground benign” environment, the part failure rate calculates to be:

$$\lambda_P = 0.003 \times 5 \times 0.58 \times 1 \times 8 \times 1 = 0.0696$$

failures per million hours, or 69.6 FITs.
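The part-stress product and its conversion to FITs are simple enough to script, which helps when repeating the calculation across a bill of materials. In this sketch the factor values are simply those of the Schottky diode example above; the function does not look anything up in MIL-HDBK-217F, so real values must come from the handbook tables for the actual part and operating conditions. Function names are illustrative.

```python
from math import prod


def part_stress_lambda(lambda_b, pi_t, pi_s, pi_c, pi_q, pi_e):
    """MIL-HDBK-217F-style part-stress failure rate for a diode.

    Returns failures per million hours; the factor values themselves must be
    looked up in the handbook for the actual part and conditions.
    """
    return prod((lambda_b, pi_t, pi_s, pi_c, pi_q, pi_e))


def per_million_hours_to_fit(lam_per_1e6_h):
    """1 failure per 1e6 h equals 1000 failures per 1e9 h (FITs)."""
    return lam_per_1e6_h * 1000.0


# Factor values from the Schottky power diode example above.
lam_p = part_stress_lambda(lambda_b=0.003, pi_t=5, pi_s=0.58, pi_c=1, pi_q=8, pi_e=1)
print(f"{lam_p:.4f} failures per 1e6 h = {per_million_hours_to_fit(lam_p):.1f} FITs")
```

Because λP is a plain product, the factor of 8 for plastic commercial packaging dominates the result here; the same diode in a package with a smaller quality factor would scale down proportionally.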

Optimizing Process Control
The important point to note is that quality and reliability cannot be “tested in” or “inspected in.” Burn-in testing is ultimately just another inspection process, but it also serves as a mechanism for process control and feedback. Failures in burn-in, along with field failures, prompt failure analysis and corrective action to ensure that the product design and process have been centred and optimised to provide the best possible product in the field. Studies have shown that higher factory yields give higher product reliability, happier customers and lower warranty-return costs.

Reference 1

Jensen, Finn. Electronic Component Reliability: Fundamentals, Modelling, Evaluation, and Assurance. John Wiley & Sons, 1995.

The tables in this reference are credited to Marcus and Blumenthal (1974) by permission of the American Statistical Association.
