Category Archives: Worst Case Analysis
The cargo fire hypothesized by Canadian pilot Chris Goodfellow to explain the disappearance of Malaysia Airlines Flight 370 (see “Malaysian Flight 370: Canadian pilot’s analysis goes viral”) is a reasonable one.
According to Malaysian officials, the plane was carrying 440 pounds of lithium batteries. Lithium batteries sitting inert (neither being charged nor discharged) were identified as the cause of the fire, and resultant 2010 crash, of a UPS 747 flight at Dubai. Although “improper storage” was determined in that case to be the cause of the fire, I have never read any explanation of how improper storage can ignite a lithium battery. It appears more likely that lithium batteries, under certain conditions not completely understood (e.g., some combination of battery construction and chemistry, heat, vibration, and/or shock), can spontaneously ignite, albeit very rarely.
In addition to pilot Goodfellow’s comments, another interesting point is that Flight 370 reportedly climbed to a very high altitude shortly after communications ceased. It could be that the pilots, upon becoming aware of a fire at that time, tried to climb quickly to quell the fire by starving it of oxygen. This might have been an excellent maneuver for most fires, but lithium batteries, once ignited, generate their own oxygen and will continue to burn at high altitude.
Bottom Line: Until the cause of the disappearance of Flight 370 is positively determined, the possibility of a lithium battery fire is a reasonable hypothesis, and worth investigating.
“Boeing’s fix includes more insulation between each of the eight cells in the batteries. The batteries will also be encased in a new steel box designed to contain any fire and vent possible smoke or hazardous gases out of the planes.
“…both the F.A.A. administrator, Michael P. Huerta, and Transportation Secretary Ray LaHood said they were satisfied that the proposed changes would eliminate concerns that the plane’s two lithium-ion batteries could erupt in smoke or fire.”
-“F.A.A. Endorses Boeing Remedy for 787 Battery” by C. Drew and J. Mouawad, 19 April 2013 New York Times
Conspicuously absent from this pronouncement is a definitive identification of the root cause of the lithium battery fires. Therefore Boeing, the FAA, and the Department of Transportation are all guessing that the stated modifications will fix the problem. I hope they are correct. But if they are, it will be a matter of luck, not engineering diligence. The dissembling of the FAA and the Department of Transportation is clearly evident in their own words: they say that they are “…satisfied that the proposed changes would eliminate concerns that the plane’s two lithium-ion batteries could erupt in smoke or fire.” If they are so satisfied, then why is a steel box necessary to contain a fire? If they are so satisfied, then why did they not provide the evidence that supports their conclusions?
Also, Boeing and these government agencies have touted a few test flights as being of particular significance in proving the safety of the batteries. This is nonsense. The battery fires are low-probability events, occurring only once in many thousands of hours of operation. This implies that there are subtle variables in the battery construction, chemistry, and/or operation which, when combined worst case, will cause the batteries to overheat. This combination may only occur for a small number of manufactured batteries, and fires may occur only when those particular batteries are exposed to a worst case combination of stresses (temperature, charge currents, etc.).
Therefore a handful of test flights, totaling a few dozen hours or so, is not nearly sufficient to empirically identify a low-probability event. Such identification would require hundreds or even thousands of test flights, which is obviously not practical. Therefore the only alternative is an investigation that drills down and positively identifies the true underlying failure mechanism (as recommended here: “Flying the Flaming Skies: Should You Trust the Boeing Dreamliner?”). It is my opinion that this has not been done, because if it had, this knowledge would be trumpeted by Boeing.
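The arithmetic behind this point is easy to sketch. The failure rate and test-flight hours below are invented for illustration (they are not actual 787 figures); the point is only how poorly a short test campaign detects a rare event:

```python
# Rough illustration with assumed numbers: if a battery fire occurs on
# average once per 50,000 flight hours, what is the chance that a test
# campaign of ~100 flight hours sees even one event?
import math

failure_rate = 1 / 50_000   # assumed events per flight hour (illustrative)
test_hours = 100            # assumed total test-flight hours (illustrative)

# Poisson model: P(at least one event) = 1 - exp(-rate * hours)
p_detect = 1 - math.exp(-failure_rate * test_hours)
print(f"P(at least one event in {test_hours} h): {p_detect:.4f}")

# Flight hours needed for a 95% chance of seeing at least one event
hours_for_95 = -math.log(0.05) / failure_rate
print(f"Hours needed for 95% detection chance: {hours_for_95:,.0f}")
```

Under these assumed numbers the short campaign has roughly a 0.2% chance of exposing the failure, while a 95% detection chance would require on the order of 150,000 flight hours — which is why analysis, not testing, must carry the burden here.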
I’m not flying the Boeing Dreamliner until I see the evidence that supports the optimistic conclusions of Boeing, the FAA, and the Department of Transportation.
“Boeing Co. is confident that proposed changes to the 787 Dreamliner will provide a permanent solution to battery problems that grounded its newest jet, a senior executive said Monday.” –Reuters, 11 March 2013
The reported changes include “adding ceramic insulation between the cells of the battery and a stronger stainless steel box with a venting tube to contain a fire and expel fumes from the aircraft.” –Alwyn Scott, Tim Hepher, and Peter Henderson, Reuters, 5 March 2013
Why is Boeing confident? This is a mystery because, based on available published data, it does not appear that Boeing has positively determined the root cause of the battery fires. Furthermore, as with all safety-critical applications, the cause should be determined beyond a reasonable doubt. Meeting this stringent requirement would be certified by a panel of independent experts of unquestioned expertise and integrity, who have no financial interest in the outcome of their review.
Without positive identification of the root cause, Boeing may be indulging in a logical fallacy that I have seen employed before, with very bad results. The fallacy is in trying to fix what is assumed to be the problem (e.g. inadequate thermal insulation between battery cells). But what if the assumption is wrong? If so, the “fix” could be ineffective, or even make things worse. For example, improving cell insulation will trap more heat within the cells, raising the cell temperature. If the true root cause is related to higher cell temperature, the added insulation could make cell failure more likely, not less.
There are many other troubling scenarios that can be hypothesized, and the only way to disprove them is to dig in and find the true root cause, beyond a reasonable doubt (including rigorous validation as discussed here: “Flying the Flaming Skies: Should You Trust the Boeing Dreamliner?”).
P.S. A good review of the genesis of the Boeing battery problem can be found here: “NTSB report shows Boeing’s battery analysis fell short,” Dominic Gates, Seattle Times
Note: On hundreds of projects I have found the great majority of customers to be highly professional and a pleasure to work with. This post addresses the few exceptions that are encountered from time to time. -EW
Several years ago I was hired by an electronics firm to determine the root cause of a circuit problem that was holding up production. I spoke to the young engineer who had created the design, analyzed his circuit, reviewed the test data, and concluded that he had made a design error. (For what it’s worth, most of my troubleshooting investigations have determined that the root cause of circuit problems is insufficient design margin, which is why I always recommend that every circuit be validated with a good WCA.) I provided a solution and that was that. Or so I thought.
I later received a tip from a colleague that the young engineer I had worked with had generated a memo that stated that my conclusions were wrong, and that he had found the “true cause” of the problem. Apparently the engineer felt threatened by the fact that he had designed a circuit with a problem that he could not identify, and decided to lie about the facts behind my back. Based on the tip, I provided a follow-up memo that corrected his inaccuracies. This caused the young engineer some serious embarrassment, but I think he earned it.
I felt bad nonetheless, because the first rule of a consultant is, in my opinion, to be sure that the client’s team perceives you as non-threatening. The consultant is not there to act superior, or to gloat, or to point out the perceived faults of the team. (Hint: such consultants create more damage than they’re worth; fire them.) The consultant’s job is simply to lend a hand.
Furthermore, there is no reason for the consultant to feel superior. Yes, the consultant must have design expertise and problem-solving skills, but more valuable is the fact that the consultant provides an outside and objective viewpoint, unpolluted by the daily hassles (sometimes political) that impede the team. In many cases the team is very close to finding the problem, but they are unable to do so because they are behind schedule, overworked, tired, and distracted by the varied and hectic demands of the typical engineering workplace. This is why it makes good sense to hire a consultant: it’s just not possible for a team to be completely objective about their own efforts, particularly when they’re under a lot of pressure.
Yet, despite the tactful and low-key assistance of a modest consultant, there will still be those cases where the defensiveness of some individuals cannot be disarmed. Untruthful memos, passive-aggressive unhelpfulness, “I thought of it before the consultant did” posturing, and other immature behavior will sometimes be encountered. If you want to be a consultant, then you will need to deal with such unpleasantness forthrightly but tactfully. It’s just part of the job.
A new Design Master article, “Use Worst-Case Analysis Tool To Efficiently Validate Your Designs,” is now available in the latest issue of How2Power.com.
Design Master™, the practical and easy-to-use advanced worst case analysis software used worldwide, provides a fully integrated set of analysis tools, including worst case solutions to design equations, probability estimates of any out-of-spec conditions, sensitivities, and optimized values for design centering.
The Lite version is now available for free at How2Power.com, under Design Notes and Tools / Worst case analysis software. Although the Lite version has some functional restrictions, it is ideal for small projects and academic use. (For the full featured versions, please click here.)
Also, please be sure to watch for How2Power’s July Newsletter, which will include the application note, “How To Use Design Master for WCA – A Simple Example,” as well as other in-depth design articles for power electronics engineers.
Is Your Circuit Simulator Just A Pretty Face? Five Reasons Why Simulations Are Not Sufficient For Design Validation
Jerry Twomey recently pointed out some pitfalls with math-based circuit analysis (“Academic Simplifications Produce Meaningless Equations,” 13 June 2012, ElectronicDesign.com).
I agree with the general sentiments of Mr. Twomey, but would like to point out that there is a simple solution to avoiding the pitfalls he mentions: develop equations from component data sheets, not from academic simplifications. This is straightforward and will be discussed further in a future post.
Also, it should be noted that simulations are not some miracle cure-all elixir. Indeed, simulators are also math-based creatures: SPICE and its cousins simply grind out numerical solutions to the multitude of hidden equations that are buried beneath their pretty graphical interfaces.
So what’s the problem with simulators? A lot. For example,
1. Because simulator math is hidden behind the user interface, simulators don’t promote engineering analysis (thinking). To the contrary, they promote lazy tweak-and-tune tinkering.
2. Because simulator component models are typically very complex, the interactions between important variables are usually obscure, if not downright unfathomable. Obscurity does not promote engineering understanding.
3. Simulator results typically do not provide insight into important sensitivities. For example, can your simulator tell you how sensitive your power supply’s thermal stability is to the Rds(on) of the switching MOSFET, including the effects of thermal feedback?
4. A simulation “run” is not an analysis, but is instead a virtual prototype test. Yes, it’s better to check out crappy designs with a simulator rather than wasting time and money on building and testing crappy hardware. So simulators have their place, particularly when checking out initial design concepts. Eventually, however, hardware testing is required to verify that the simulator models were correct. And you will still need to do a worst case math analysis to determine performance limits, and to confirm that desired performance will be maintained in the presence of tolerances and aging.
- Proper Design Validation = Testing/Simulations + Analysis
5. Simulators don’t really do worst case analysis. Yes, you can use a simulator to implement a bunch of Monte Carlo runs, but valid results require (a) identification of all of the important parameters (such as Rds(on)), (b) assignment of the appropriate distributions to those parameters (such distributions are typically not available), and (c) the generation of enough runs to catch errors out in the tails of the overall resultant distribution (and how many runs should you do? Hmmm…).
- Monte Carlo is not a crystal ball. It only shows you the production performance you will get if all of your assumptions are correct, and if you ran enough runs.
- Determining the required number of runs takes either an exhaustive study of the circuit’s parameters, distributions, and interrelationships (not practical), or knowledge of the limits of performance.
- But if you know the limits of performance, then why do you need a Monte Carlo analysis? You don’t. You can skip it altogether and go directly to a math-based Worst Case Analysis.
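The contrast can be sketched with a trivial voltage divider. The component values, tolerances, and run count below are invented for illustration; the point is that sampled extremes always fall inside the true corner-analysis limits:

```python
# Why Monte Carlo can miss extremes that corner analysis (EVA) catches:
# a voltage divider with two 1%-tolerance resistors (values illustrative).
import itertools
import random

VIN = 10.0
R1_NOM, R2_NOM, TOL = 10_000.0, 10_000.0, 0.01

def vout(r1, r2):
    return VIN * r2 / (r1 + r2)

# Extreme Value Analysis: evaluate every tolerance corner.
corners = [vout(R1_NOM * (1 + s1 * TOL), R2_NOM * (1 + s2 * TOL))
           for s1, s2 in itertools.product((-1, 1), repeat=2)]
eva_lo, eva_hi = min(corners), max(corners)

# Monte Carlo: sample uniform tolerances; with a modest number of runs
# the sampled extremes stay inside the true EVA limits.
random.seed(1)
samples = [vout(R1_NOM * (1 + random.uniform(-TOL, TOL)),
                R2_NOM * (1 + random.uniform(-TOL, TOL)))
           for _ in range(100)]
mc_lo, mc_hi = min(samples), max(samples)

print(f"EVA limits:        {eva_lo:.4f} .. {eva_hi:.4f} V")
print(f"Monte Carlo (100): {mc_lo:.4f} .. {mc_hi:.4f} V")
```

Because the divider output is monotone in each resistor, the corners are the true performance limits; the Monte Carlo spread is always narrower, and only gets narrower still if a parameter or its distribution is omitted.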
For further insights into math-based Worst Case Analysis versus simulations, please see “Design Master™: A Straightforward No-Nonsense Approach to Design Validation.”
We’ve contributed to hundreds of electronics design projects wherein the circuitry was subjected to rigorous WCA+ (WCA+ is our advanced version of Worst Case Analysis; see “Four Costly Myths About WCA“). Our analyses invariably detected various design deficiencies, both stress and functional. Unfortunately, like an annoying relative who can’t get the hint to please not visit again, some common problems that we were finding decades ago are still regularly popping up in today’s new designs. These include:
- Lack of protection from Bozo the Clown: inadequate ESD protection; connectors without reverse-polarity keying; identical connectors for all ports (you don’t expect Bozo to pay any attention to cable labels or connector colors, do you?); no spills/immersion protection (e.g., coffee, slurpees, beer, or even juice from a steak being thawed on top of a warm electronics unit (no kidding)).
- Transient protection devices (TPDs) not present at circuit interfaces. Not just the AC power and load interfaces, but all the internal interfaces that are exposed to ESD or potentially unruly test equipment during testing, particularly for costly subassemblies. We’ve seen a hugely expensive and schedule-critical board blown up by a test instrument failure; a disaster that could have been prevented by a few bucks’ worth of TPDs.
- Failure to account for dissimilar power supply voltages, causing interface overdrive and/or latchup. (Sometimes this only occurs during transient conditions, making the deficiency hard to catch during testing. You will typically learn about it after you’ve shipped a few thousand units and your boss is frantically paging you to get back to work after you’ve had too many beers and the last thing you want is to work through the night and the weekend on warranty repairs while angry customers are screaming at you on the phone…but I digress…)
- Inadequate ratings for AC mains rectifiers and other power components, particularly in switchmode supplies. Hint: Don’t completely rely on SPICE or other simulations to identify realistic worst case performance boundaries for these components. Or do, but then be sure to not provide a warranty with your product.
For some more tips, see page 210 of The Design Analysis Handbook; still very relevant after all of these years. (Note: We’re out of copies of the Revised Edition, but it’s still available from Amazon and Elsevier.)
P.S. We’re considering creating some low-cost mini-modules of our Design Master WCA+ software, configured for common design tasks such as proper TPD selection, op amp gain stage analysis, etc. (If you care to comment, your feedback will be appreciated and will help us make a decision. You can add a comment to this post, or email us at firstname.lastname@example.org.)
There are actually a few different types of WCA, primarily:
Extreme Value Analysis (EVA)
Statistical Analysis (Monte Carlo)
WCA+ is safer than Monte Carlo and more practical than EVA. Monte Carlo can miss small but important extreme values, and EVA can result in costly overdesign. WCA+ identifies extreme values that statistical methods can miss, and then estimates the probability that the extreme value will exceed specification limits, thereby providing the designer with a practical risk-assessment metric. WCA+ also generates normalized sensitivities and optimization, which can be used for design centering. (Ref. http://daci-wca.com/products_005.htm)
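As a miniature illustration of that idea — find the extreme value, then attach a probability to the out-of-spec condition — here is a sketch for a hypothetical amplifier gain spec. The circuit, tolerances, spec limit, and 3-sigma tolerance assumption are all my own invented illustrations, not Design Master’s actual method:

```python
# WCA+ idea in miniature: extreme value plus a risk metric.
# Non-inverting amplifier gain G = 1 + Rf/Rg with 1% resistors
# (illustrative values); spec: G must not exceed 11.15.
import math

RF_NOM, RG_NOM, TOL = 100_000.0, 10_000.0, 0.01
SPEC_MAX = 11.15

def gain(rf, rg):
    return 1 + rf / rg

# Extreme value: Rf high and Rg low maximizes the gain.
g_max = gain(RF_NOM * (1 + TOL), RG_NOM * (1 - TOL))

# Probability estimate: treat each 1% tolerance as a 3-sigma normal
# (an assumption) and propagate through the ratio's unit sensitivities.
g_nom = gain(RF_NOM, RG_NOM)
sigma_g = (g_nom - 1) * math.sqrt(2) * (TOL / 3)
z = (SPEC_MAX - g_nom) / sigma_g
p_exceed = 0.5 * math.erfc(z / math.sqrt(2))

print(f"Nominal gain: {g_nom:.3f}, extreme value: {g_max:.3f}")
print(f"Estimated P(gain > {SPEC_MAX}): {p_exceed:.2e}")
```

With these numbers the extreme value (about 11.20) does exceed the 11.15 limit — EVA flags a violation — but the estimated probability of exceeding it is well under 0.1%, which is the kind of risk-assessment metric that lets a designer decide whether an EVA violation actually warrants a redesign.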
Myth #2: Worst Case Analysis is optional if you do a lot of testing
To maintain happy customers and minimize liability exposure, the effects of environmental and component variances on performance must be thoroughly understood. Testing alone cannot achieve this understanding, because testing — for economic reasons — is usually performed on a very small number of samples. Also, since testing typically has a short time schedule, the effects of long-term aging will not be detected.
Myth #3: Worst Case Analysis is optional if we vary worst case parameters during testing
Initial tolerances typically play a substantial role in determining worst case performance. Such tolerances, however, are not affected by heating/cooling the samples, varying the supply voltages, varying the loads, etc.
For example, a design might have a dozen functional specs and a dozen stress specs (these numbers are usually much, much higher). To expose worst case performance, some tolerances may need to be at their low values for some of the specs, but at their high or intermediate values for other specs. First, it’s not even likely that a tolerance will be at the worst case value for a single spec. Second, it’s impossible for the tolerance to simultaneously be at the different values required to expose worst case performance for all the specs. Therefore it’s not valid to expect a test sample to serve as a worst case performance predictor, regardless of the number of temperature cycles, voltage variations, etc. applied to the sample.
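A toy example makes the conflict concrete. For a simple RC low-pass filter (all values invented for illustration), the same resistor tolerance must sit at opposite extremes to expose worst case for a bandwidth spec versus a noise-attenuation spec, so no single hardware sample can embody both:

```python
# One tolerance, two specs, two opposite worst-case corners.
# RC low-pass filter with a 5% resistor (illustrative values).
import math

R_NOM, C_NOM, TOL = 1_000.0, 1e-7, 0.05

def f3db(r, c):
    return 1 / (2 * math.pi * r * c)

corners = [R_NOM * (1 - TOL), R_NOM * (1 + TOL)]

# Spec 1: signal bandwidth must be high enough -> worst case is the
# LOWEST corner frequency, i.e. R at its HIGH tolerance limit.
worst_bw_r = min(corners, key=lambda r: f3db(r, C_NOM))

# Spec 2: high-frequency noise must be attenuated enough -> worst case
# is the HIGHEST corner frequency, i.e. R at its LOW tolerance limit.
worst_atten_r = max(corners, key=lambda r: f3db(r, C_NOM))

print(f"Worst for bandwidth spec:   R = {worst_bw_r:.0f} ohms")
print(f"Worst for attenuation spec: R = {worst_atten_r:.0f} ohms")
```

One physical resistor obviously cannot be at both 950 and 1050 ohms at once, which is why no amount of environmental stress testing on a given sample substitutes for analyzing each spec at its own worst case corner.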
Myth #4: Worst Case Analysis is best done by statistics experts
No, it is far better to have WCA performed — or at least supervised — by experts in the design being analyzed, using a practical tool like WCA+ that employs minimal statistical mumbo-jumbo. Analyses (particularly cookbook statistical ones), when applied by those without expertise in the design being analyzed, often yield hilariously incorrect results.