* Re: Ariane 5 failure
@ 1996-10-01 0:00 Marin David Condic, 407.796.8997, M/S 731-93
1996-10-02 0:00 ` Alan Brain
0 siblings, 1 reply; 13+ messages in thread
From: Marin David Condic, 407.796.8997, M/S 731-93 @ 1996-10-01 0:00 UTC (permalink / raw)
Ken Garlington <garlingtonke@LMTAS.LMCO.COM> writes:
>Alan Brain wrote:
>> A really good safety-critical
>> program should be remarkably difficult to de-bug, as the only way you
>> know it's got a major problem is by examining the error log, and
>> calculating that its performance is below theoretical expectations.
>> And if it runs too slow, many times in the real-world you can spend 2
>> years of development time and many megabucks kludging the software, or
>> wait 12 months and get the new 400 Mhz chip instead of your current 133.
>
>I really need to change jobs. It sounds so much simpler to build
>software for ground-based PCs, where you don't have to worry about the
>weight, power requirements, heat dissipation, physical size,
>vulnerability to EMI/radiation/salt fog/temperature/etc. of your system.
>
I personally like the part about "performance is below theoretical
expectations". Where I live, I have a 5 millisecond loop which
*must* finish in 5 milliseconds. If it runs in 7 milliseconds, we
will fail to close the loop in sufficient time to keep valves from
"slamming into stops", causing them to break, rendering someone's
billion dollar rocket and billion dollar payload "unserviceable".
In this business, that's what *we* mean by "performance is below
theoretical expectations" and why runtime checks which seem
"trivial" to most folks can mean the difference between having a
working system and having an interesting exercise in computer
science which isn't going to go anywhere.
MDC
Marin David Condic, Senior Computer Engineer ATT: 561.796.8997
M/S 731-96 Technet: 796.8997
Pratt & Whitney, GESP Fax: 561.796.4669
P.O. Box 109600 Internet: CONDICMA@PWFL.COM
West Palm Beach, FL 33410-9600 Internet: CONDIC@FLINET.COM
===============================================================================
"Some people say a front-engine car handles best. Some people say
a rear-engine car handles best. I say a rented car handles best."
-- P. J. O'Rourke
===============================================================================
^ permalink raw reply [flat|nested] 13+ messages in thread
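[Editorial note: Condic's hard-deadline point can be sketched in Python. This is purely illustrative - a real flight-control loop would run under a real-time executive, not a general-purpose OS timer - and the 5 ms figure is his. The function names are invented for the sketch.]

```python
import time

PERIOD_S = 0.005  # the 5 ms hard deadline from the post


def run_loop(iterations, body):
    """Run a periodic control loop, counting deadline overruns.

    If body() takes longer than PERIOD_S, the iteration counts as an
    overrun (the "valves slamming into stops" case); otherwise we sleep
    out the remainder of the period.
    """
    overruns = 0
    for _ in range(iterations):
        start = time.monotonic()
        body()
        elapsed = time.monotonic() - start
        if elapsed > PERIOD_S:
            overruns += 1  # missed the hard deadline
        else:
            time.sleep(PERIOD_S - elapsed)
    return overruns
```

The point of the thread is that every runtime check executed inside `body()` eats into that 5 ms budget, which is why checks that look "trivial" on a workstation are argued over in this domain.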
* Re: Ariane 5 failure
  1996-10-01  0:00 Ariane 5 failure Marin David Condic, 407.796.8997, M/S 731-93
@ 1996-10-02  0:00 ` Alan Brain
  1996-10-02  0:00   ` Ken Garlington
  0 siblings, 1 reply; 13+ messages in thread

From: Alan Brain @ 1996-10-02 0:00 UTC (permalink / raw)

Marin David Condic, 407.796.8997, M/S 731-93 wrote:
>
> Ken Garlington <garlingtonke@LMTAS.LMCO.COM> writes:
> >I really need to change jobs. It sounds so much simpler to build
> >software for ground-based PCs, where you don't have to worry about the
> >weight, power requirements, heat dissipation, physical size,
> >vulnerability to EMI/radiation/salt fog/temperature/etc. of your system.

The particular system I was talking about was for a Submarine. Very
tight constraints indeed: on power (it was a diesel sub), physical size
(it had to fit in a torpedo hatch), heat dissipation (a bit), and
vulnerability to 100% humidity, salt, chlorine etc etc. Been there,
Done that, Got the T-shirt.

I'm a Software Engineer who works mainly in Systems. Or maybe a Systems
Engineer with a hardware bias. Regardless, in the initial Systems
Engineering phase, when one gets all the HWCIs and CSCIs defined, it is
only good professional practice to build in plenty of slack. If the
requirement is to fit in a 21" hatch, you DON'T design something that's
20.99999" wide. If you can, make it 16", 18 at max. It'll probably
grow. Similarly, if you require a minimum of 25 MFlops, make sure
there's a growth path to at least 100. It may well be less expensive
and less risky to build a chip factory to make a faster CPU than to
lose a rocket, or a sub, to a software failure that could have been
prevented. Usually such ridiculously extreme measures are not
necessary. The Hardware guys bitch about the cost-per-CPU going through
the roof. Heck, it could cost $10 million. But if it saves 2 years of
Software effort, that's a net saving of $90 million.

(All numbers are representative, i.e. plucked out of mid-air, and as
you USAians say, Your Mileage May Vary.)

> I personally like the part about "performance is below theoretical
> expectations". Where I live, I have a 5 millisecond loop which
> *must* finish in 5 milliseconds. If it runs in 7 milliseconds, we
> will fail to close the loop in sufficient time to keep valves from
> "slamming into stops", causing them to break, rendering someone's
> billion dollar rocket and billion dollar payload "unserviceable".
> In this business, that's what *we* mean by "performance is below
> theoretical expectations" and why runtime checks which seem
> "trivial" to most folks can mean the difference between having a
> working system and having an interesting exercise in computer
> science which isn't going to go anywhere.

In this case, "theoretical expectations" for a really tight 5 MuSec
loop should be less than 1 MuSec. Yes, I'm dreaming. OK, 3 MuSec,
that's my final offer. For the vast majority of cases, if your
engineering is closer to the edge than that, it'll cost big bucks to
fix the over-runs you always get.

Typical example: I had a big bun-fight with project management about a
hefty data transfer rate required for a broadband sonar. They wanted to
hand-code the lot in assembler, as the requirements were really, really
tight. No time for any of this range-check crap; the data was always
good. I eventually threw enough of a professional tantrum to wear down
even a group of German Herr Professor Doktors, and we did it in Ada 83,
if only as a first pass, to see what the rate really would be. The spec
called for 160 MB/sec. The first attempt ran at 192 MB/sec, and after
some optimisation we got over 250. After the hardware flaws were fixed
(the ones the "unnecessary" range-bound checking detected), this was
above 300. Now that's too close for my druthers, but even 161 I could
live with. Saved maybe 16 months on the project, about 100 people at
$15K a month.

After the transfer, the data really was trustworthy - which saved a lot
of time downstream on the applications in debug time. Note that even
with (minor) hardware flaws, the system still worked. Note also that by
paying big $ for more capable hardware than strictly necessary, you can
save bigger $ on the project. Many projects spend many months and many
$ Million to fix, by hacking, kludging, and sheer Genius, what a few
lousy $100K of extra hardware cost would make unnecessary.

A good software engineer in the Risk-management team, and on the
Systems Engineering early on - one with enough technical nous in
hardware to know what's feasible, enough courage to cost the firm
millions in initial costs, and enough power to make it stick - that's
what's necessary. I've seen it; it works. But it's been tried less than
a dozen times in 15 years in my experience :(

----------------------     <> <>    How doth the little Crocodile
| Alan & Carmel Brain|     xxxxx       Improve his shining tail?
| Canberra Australia |  xxxxxHxHxxxxxx _MMMMMMMMM_MMMMMMMMM
----------------------     o OO*O^^^^O*OO o oo oo oo oo
        By pulling Maerklin Wagons, in 1/220 Scale

^ permalink raw reply	[flat|nested] 13+ messages in thread
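[Editorial note: the mechanism in Brain's sonar anecdote - Ada's range-constrained subtypes rejecting out-of-range data and thereby exposing hardware flaws - can be sketched in Python. All names here are invented for illustration; Ada does this declaratively via subtypes and Constraint_Error.]

```python
class RangeError(ValueError):
    """Stand-in for Ada's Constraint_Error."""


def checked(value, lo, hi):
    """Return value if it lies in [lo, hi], else raise immediately."""
    if not (lo <= value <= hi):
        raise RangeError(f"{value} outside [{lo}, {hi}]")
    return value


def ingest_samples(raw, lo=-32768, hi=32767):
    """Range-check every incoming sample (16-bit signed range assumed).

    One bad word from flawed hardware stops the transfer at the point
    of entry, instead of silently corrupting everything downstream -
    which is why the data "really was trustworthy" afterwards.
    """
    return [checked(s, lo, hi) for s in raw]
```

The design point is that the check runs once per sample at the system boundary, so downstream code can omit defensive checks entirely.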
* Re: Ariane 5 failure
  1996-10-02  0:00 ` Alan Brain
@ 1996-10-02  0:00   ` Ken Garlington
  1996-10-02  0:00     ` Matthew Heaney
  1996-10-03  0:00     ` Ariane 5 failure Alan Brain
  0 siblings, 2 replies; 13+ messages in thread

From: Ken Garlington @ 1996-10-02 0:00 UTC (permalink / raw)
To: aebrain

Alan Brain wrote:
>
> Marin David Condic, 407.796.8997, M/S 731-93 wrote:
> >
> > Ken Garlington <garlingtonke@LMTAS.LMCO.COM> writes:
> > >I really need to change jobs. It sounds so much simpler to build
> > >software for ground-based PCs, where you don't have to worry about the
> > >weight, power requirements, heat dissipation, physical size,
> > >vulnerability to EMI/radiation/salt fog/temperature/etc. of your system.
>
> The particular system I was talking about was for a Submarine. Very
> tight constraints indeed, on power (it was a diesel sub), physical
> size (had to fit in a torpedo hatch), heat dissipation (a bit),
> vulnerability to 100% humidity, salt, chlorine etc etc. Been there,
> Done that, Got the T-shirt.

So what did you do when you needed to build a system that was bigger
than the torpedo hatch? Re-design the submarine? You have physical
limits that you just can't exceed. On a rocket, or an airplane, you
have even stricter limits.

Oh for the luxury of a diesel generator! We have to be able to operate
on basic battery power (and we share that bus with emergency lighting,
etc.)

> I'm a Software Engineer who works mainly in Systems. Or maybe a
> Systems Engineer with a hardware bias. Regardless, in the initial
> Systems Engineering phase, when one gets all the HWCIs and CSCIs
> defined, it is only good professional practice to build in plenty of
> slack. If the requirement is to fit in a 21" hatch, you DON'T design
> something that's 20.99999" wide. If you can, make it 16", 18 at max.
> It'll probably grow.

Exactly. You build a system that has slack. Say, 15% slack. Which is
exactly why the INU design team didn't want to add checks unless they
had to: they were starting to eat into that slack.

> Similarly, if you require a minimum of 25 MFlops, make sure there's a
> growth path to at least 100. It may well be less expensive and less
> risky to build a chip factory to make a faster CPU than to lose a
> rocket, or a sub due to software failure that could have been
> prevented.

What if your brand new CPU requires more power than your diesel
generator can generate? What if your brand new CPU requires a
technology that doesn't let you meet your heat dissipation? Doesn't
sound like you had to make a lot of tradeoffs in your system.
Unfortunately, airborne systems, particularly those that have to
operate in lower-power, zero-cooling situations (amazing how hot the
air gets around Mach 1!), don't have such luxuries.

> Usually such ridiculously extreme measures are not necessary. The
> Hardware guys bitch about the cost-per-CPU going through the roof.
> Heck, it could cost $10 million. But if it saves 2 years of Software
> effort, that's a net saving of $90 million.

What do maintenance costs have to do with this discussion?

> In this case, "theoretical expectations" for a really tight 5 MuSec
> loop should be less than 1 MuSec. Yes, I'm dreaming. OK, 3 MuSec,
> that's my final offer. For the vast majority of cases, if your
> engineering is closer to the edge than that, it'll cost big bucks to
> fix the over-runs you always get.

I've never had a project yet where we didn't routinely cut it that
fine, and we've yet to spend the big bucks. If you're used to
developing systems with those kind of constraints, you know how to make
those decisions. Occasionally, you make the wrong decision, as the
Ariane designers discovered. Welcome to engineering.

> Typical example: I had a big bun-fight with project management about
> a hefty data transfer rate required for a broadband sonar. They
> wanted to hand-code the lot in assembler, as the requirements were
> really, really tight. No time for any of this range-check crap, the
> data was always good. I eventually threw enough of a professional
> tantrum to wear down even a group of German Herr Professor Doktors,
> and we did it in Ada-83. If only as a first pass, to see what the
> rate really would be. The spec called for 160 MB/Sec. First attempt
> was 192 MB/Sec, and after some optimisation, we got over 250. After
> the hardware flaws were fixed (the ones the "unnecessary" range-bound
> checking detected) this was above 300.

And, if you had only got 20MB per second after all that, you would have
done...?

Certainly, if you just throw out range checking without knowing its
cost, you're an idiot. However, no one has shown that the Ariane team
did this. I guarantee you (and am willing to post object code to prove
it) that range checking is not always zero cost, and in the right
circumstances can cause you to bust your budget.

> Note also that by paying big $ for more capable hardware than
> strictly necessary, you can save bigger $ on the project.

Unfortunately, cost is not the only controlling variable.

Interesting that a $100K difference in per-unit cost in your systems is
negligible. No wonder people think military systems are too expensive!

--
LMTAS - "Our Brand Means Quality"
For more info, see http://www.lmtas.com or http://www.lmco.com

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Ariane 5 failure
  1996-10-02  0:00   ` Ken Garlington
@ 1996-10-02  0:00     ` Matthew Heaney
  1996-10-04  0:00       ` Robert S. White
  1996-10-04  0:00       ` System Engineering (was Re: Ariane 5 failure) Ken Garlington
  1996-10-03  0:00     ` Ariane 5 failure Alan Brain
  1 sibling, 2 replies; 13+ messages in thread

From: Matthew Heaney @ 1996-10-02 0:00 UTC (permalink / raw)

In article <3252B46C.5E9D@lmtas.lmco.com>, Ken Garlington
<garlingtonke@lmtas.lmco.com> wrote:

>Interesting that a $100K difference in per-unit cost in your systems is
>negligible. No wonder people think military systems are too expensive!

I think he meant "negligible compared to the programming cost that
would be required to get the software to run on the cheaper hardware."
It's never cost effective to skimp on hardware if it means human
programmers have to write more complex software.

--------------------------------------------------------------------
Matthew Heaney
Software Development Consultant
mheaney@ni.net
(818) 985-1271

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Ariane 5 failure
  1996-10-02  0:00     ` Matthew Heaney
@ 1996-10-04  0:00       ` Robert S. White
  1996-10-05  0:00         ` Robert Dewar
  1996-10-05  0:00         ` Ariane 5 failure Alan Brain
  1996-10-04  0:00       ` System Engineering (was Re: Ariane 5 failure) Ken Garlington
  1 sibling, 2 replies; 13+ messages in thread

From: Robert S. White @ 1996-10-04 0:00 UTC (permalink / raw)

In article <mheaney-ya023180000210962257430001@news.ni.net>,
mheaney@ni.net says...

>It's never cost effective to skimp on hardware if it means human
>programmers have to write more complex software.

Not if the ratio is tilted very heavily towards recurring cost versus
Non-Recurring Engineering (NRE). How about 12 staff-months versus $300
extra hardware cost on 60,000 units?
___________________________________________________________________________
Robert S. White                    -- an embedded systems software engineer
WhiteR@CRPL.Cedar-Rapids.lib.IA.US -- It's long, but I pay for it!

^ permalink raw reply	[flat|nested] 13+ messages in thread
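[Editorial note: White's NRE-versus-recurring-cost tradeoff is simple arithmetic, sketched below. The $15K-per-staff-month rate is an assumption borrowed from Brain's sonar anecdote earlier in the thread; White does not state one.]

```python
def cost_tradeoff(nre_staff_months, cost_per_staff_month,
                  extra_unit_cost, units):
    """Compare one-time software NRE against recurring hardware cost.

    Returns (nre, recurring): the one-off engineering cost of writing
    more complex software, versus the per-unit hardware premium
    multiplied across the production run.
    """
    nre = nre_staff_months * cost_per_staff_month
    recurring = extra_unit_cost * units
    return nre, recurring


# White's example: 12 staff-months vs. $300 extra hardware on 60,000 units
nre, recurring = cost_tradeoff(12, 15_000, 300, 60_000)
```

At these volumes the recurring hardware premium ($18M) dwarfs the software NRE ($180K by the assumed rate), which is White's point: Heaney's "never" fails for high-volume products.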
* Re: Ariane 5 failure
  1996-10-04  0:00       ` Robert S. White
@ 1996-10-05  0:00         ` Robert Dewar
  1996-10-06  0:00           ` Ariane 5 failure - latest S/W tech vs. cold hard facts Robert S. White
  1996-10-05  0:00         ` Ariane 5 failure Alan Brain
  1 sibling, 1 reply; 13+ messages in thread

From: Robert Dewar @ 1996-10-05 0:00 UTC (permalink / raw)

Robert White said:

">It's never cost effective to skimp on hardware if it means human
>programmers have to write more complex software.

Not if the ratio is tilted very heavily towards recurring cost versus
Non-Recurring Engineering (NRE). How about 12 staff-months versus $300
extra hardware cost on 60,000 units?"

Of course this is true at some level, but the critical thing is that a
proper cost comparison here must take into account:

a) full life cycle costs of the software, not just development costs
b) time-to-market delays caused by more complex software
c) decreased quality and reliability caused by more complex software

There are certainly cases where careful consideration of these three
factors still results in a decision to use less hardware and more
complex software, but I think we have all seen cases where such
decisions were made, and in retrospect turned out to be huge mistakes.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Ariane 5 failure - latest S/W tech vs. cold hard facts
  1996-10-05  0:00         ` Robert Dewar
@ 1996-10-06  0:00           ` Robert S. White
  1996-10-10  0:00             ` Ken Garlington
  0 siblings, 1 reply; 13+ messages in thread

From: Robert S. White @ 1996-10-06 0:00 UTC (permalink / raw)

In article <dewar.844517570@schonberg>, dewar@schonberg.cs.nyu.edu
says...

>There are certainly cases where careful consideration of these three
>factors still results in a decision to use less hardware and more
>complex software, but I think we have all seen cases where such
>decisions were made, and in retrospect turned out to be huge mistakes.

In business, when making hard decisions about embedded systems
products, such studies are almost always made.

One factor now causing a lot of study and implementation of "complex
software" is the effort to reduce procedure call overhead and virtual
dispatching for object-oriented high level languages that might be used
for embedded products. The extra throughput and memory required end up
demanding more memory chips and faster (more power-hungry) processors.
This is a major problem when power consumption, size, and cost
requirements are taken into consideration. Engineering has to look at
the entire picture.

We want to use the latest software technology that results in the
cleanest, most maintainable design. Sometimes it takes a while to hone
the tools that use this new technology till they are ready for prime
time in mission-critical software that has to operate in an environment
with a lot of other constraints. Ada 83 in 1984 had a lot of problems
in implementations that were mostly solved by 1991. Ada 95 (with
advantage taken of its new object-oriented features) and Java bytecode
virtual machines also need a significant amount of effort expended till
they are ready for these more constrained embedded products.

Not to say that it won't happen, just that they often don't pass the
rigorous cost/feature analysis tradeoffs at this date for immediate
product implementation. And we NEVER want to make a mistake for flight
critical software :-< or have a critical task not able to run to
completion within its deadlines. We need that processor throughput
reserve to have a safety margin for rate monotonic tasks.

I agree with most everything Ken Garlington and other Lockheed Martin
engineers have posted in this thread on this same subject. Their
statements ring true with my industry experience for the last 21 years.
___________________________________________________________________________
Robert S. White                    -- an embedded systems software engineer
WhiteR@CRPL.Cedar-Rapids.lib.IA.US -- It's long, but I pay for it!

^ permalink raw reply	[flat|nested] 13+ messages in thread
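[Editorial note: the "safety margin for rate monotonic tasks" White mentions has a classical quantitative form - the Liu & Layland schedulability bound for rate-monotonic scheduling - sketched here for reference.]

```python
def rms_utilization_bound(n):
    """Liu & Layland (1973) least upper bound on total CPU utilization
    for which n independent periodic tasks are guaranteed schedulable
    under rate-monotonic fixed priorities: n * (2**(1/n) - 1).

    For one task the bound is 100%; as n grows it falls toward
    ln 2 ~= 69.3%. Utilization kept below this bound is the
    "throughput reserve" that guarantees every task meets its deadline.
    """
    return n * (2 ** (1.0 / n) - 1.0)
```

So a design that budgets, say, 69% worst-case CPU utilization is provably safe for any number of rate-monotonic tasks, which is one concrete reason embedded teams resist features that quietly add per-call overhead.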
* Re: Ariane 5 failure - latest S/W tech vs. cold hard facts 1996-10-06 0:00 ` Ariane 5 failure - latest S/W tech vs. cold hard facts Robert S. White @ 1996-10-10 0:00 ` Ken Garlington 0 siblings, 0 replies; 13+ messages in thread From: Ken Garlington @ 1996-10-10 0:00 UTC (permalink / raw) Robert S. White wrote: > > In article <dewar.844517570@schonberg>, dewar@schonberg.cs.nyu.edu says... > > >There are certainly cases where careful consideration of these three factors > >still results in a decision to use less hardware and more complex software, > >but I think we have all seen cases where such decisions were made, and n > >in retrospect turned out to be huge mistakes. > > In business when making hard decisions about embedded systems products, > such studies are almost always made. In my experience, both statements are true. Such studies are often made, and I have also seen cases where they weren't made, or were done poorly. I have no idea what the percentages are for: * the study was done correctly * the study was not done correctly (or not done at all), but the decision turned out to be right anyway * the study was done incorrectly, the answer was wrong, and no one ever discovered it was wrong (because no one ever looked at the final cost, etc.) * the study was done incorrectly, the answer was wrong, someone found out it was wrong, but didn't broadcast it to the general public (would you?) Overall, I'd say we'll never know. As a colleague of mine said: "How many systems out there have a bug like the Ariane 5, but just never hit that magic condition where the bug caused a failure?" Just think: A little less acceleration on takeoff, and we'd think Arianespace made a wonderful decision by reusing the Ariane 4 -- look at all the money they saved! 
It might have been mentioned in the Reuse News as a major success :) I've got a few minutes, so I'll mention another of my favorite themes at this point (usually stated in the context of preparing Ada waivers): It's really hard to determine the life-cycle cost of software, particularly over a long period (e.g. 20 years). There are cost models; sometimes, we even get the parameters right and the model comes up with the right answer. Nonetheless, it's tough to consider life-cycle costs objectively. That's not an excuse for failing to try, but an acknowledgement that it's easy to get it wrong (particularly for new technology). Software engineering can be _so_ depressing! -- LMTAS - "Our Brand Means Quality" For more info, see http://www.lmtas.com or http://www.lmco.com ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Ariane 5 failure
  1996-10-04  0:00       ` Robert S. White
  1996-10-05  0:00         ` Robert Dewar
@ 1996-10-05  0:00         ` Alan Brain
  1996-10-06  0:00           ` Robert S. White
  1 sibling, 1 reply; 13+ messages in thread

From: Alan Brain @ 1996-10-05 0:00 UTC (permalink / raw)

Robert S. White wrote:
>
> In article <mheaney-ya023180000210962257430001@news.ni.net>,
> mheaney@ni.net says...
>
> >It's never cost effective to skimp on hardware if it means human
> >programmers have to write more complex software.
>
> Not if the ratio is tilted very heavy towards reoccuring cost versus
> Non-Reoccuring Engineering (NRE). How about 12 staff-months versus
> $300 extra hardware cost on 60,000 units?

$300 extra on 60,000 units. That's $18 Million, right? vs

12 Staff-months. Now if your staff is 1, then that's maybe $200,000 for
a single top-notch profi. If your staff is 200, each at 100,000 cost
(ie average wage is about 50K/year), then that's 20 million. But say
you only have the one guy. And say it adds 50% to the risk of failure.
With consequent and liquidated damages of 100 Million. Then that's 50
million, 200 thousand it's really costing.

Feel free to make whatever strawman case you want. The above figures
are based on 2 different projects (actually the liquidated damages one
involved 6 people, rather than 1, and an estimated 70% increased chance
of failure, but I digress).

Summary: In the real world, and with the current state-of-the-art, I
agree with the original statement as an excellent general rule.

----------------------     <> <>    How doth the little Crocodile
| Alan & Carmel Brain|     xxxxx       Improve his shining tail?
| Canberra Australia |  xxxxxHxHxxxxxx _MMMMMMMMM_MMMMMMMMM
----------------------     o OO*O^^^^O*OO o oo oo oo oo
        By pulling Maerklin Wagons, in 1/220 Scale

^ permalink raw reply	[flat|nested] 13+ messages in thread
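[Editorial note: Brain's counter-argument adds a risk term to White's arithmetic. The sketch below implements the informal model exactly as he states it; the specific probabilities and damages are his illustrative figures, not data.]

```python
def risk_weighted_cost(base_cost, added_failure_prob, failure_cost):
    """Expected cost = base cost plus probability-weighted damages.

    This is the informal model in the post: skimping on staff saves
    base cost up front but, if it raises the chance of project failure,
    the expected consequential/liquidated damages must be added in.
    """
    return base_cost + added_failure_prob * failure_cost


# Brain's figures: one engineer for 12 months (~$200K), +50% risk of
# failure against $100M in damages
expected = risk_weighted_cost(200_000, 0.50, 100_000_000)
```

The computed value, $50,200,000, reproduces his "50 million, 200 thousand it's really costing", which is why he sides with Heaney's rule despite the $18M recurring figure.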
* Re: Ariane 5 failure
  1996-10-05  0:00         ` Ariane 5 failure Alan Brain
@ 1996-10-06  0:00           ` Robert S. White
  0 siblings, 0 replies; 13+ messages in thread

From: Robert S. White @ 1996-10-06 0:00 UTC (permalink / raw)

In article <3256ED61.7952@dynamite.com.au>, aebrain@dynamite.com.au
says...
>
>12 Staff-months. Now if your staff is 1, then that's maybe $200,000 for
>a single top-notch profi. If your staff is 200, each at 100,000 cost (ie
>average wage is about 50K/year), then that's 20 million.

Number of staff * amount of time = staff-months (with a dash of reality
for reasonable parallel tasks). The type of strawman that I had in mind
could be 1 person for a year, two persons for six months, up to a limit
of 4 persons for 3 months. And watch out for the mythical man-month!

> But say you
>only have the one guy. And say it adds 50% to the risk of failure. With
>consequent and liquidated damages of 100 Million. Then that's 50
>million, 200 thousand it's really costing.

Projects these days also have a "Risk Management Plan" per SEI CMM
recommendations. That 50% added risk of failure has to be assigned an
estimated cost and factored into the decision.

>Feel free to make whatever strawman case you want. The above figures are
>based on 2 different projects (actually the liquidated damages one
>involved 6 people, rather than 1, and an estimated 70% increased chance
>of failure, but I digress).

I've seen a lot of successes. Failures most often can be attributed to
poor judgement by incompetent personnel. That can be tough to manage
when the managers don't want to hear bad news or risk projections,
especially when they set up a project and move on before it is done.

>
>Summary: In the real world, and with the current state-of-the-art, I
>agree with the original statement as an excellent general rule.

I beg to disagree in the case of higher volume markets. I do agree very
much for lower volumes, or when the type of development task is new to
the engineers and managers. You must have a good understanding of the
problem and the solution domain to do proper cost tradeoffs that
involve significant risk.
___________________________________________________________________________
Robert S. White                    -- an embedded systems software engineer
WhiteR@CRPL.Cedar-Rapids.lib.IA.US -- It's long, but I pay for it!

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: System Engineering (was Re: Ariane 5 failure)
  1996-10-02  0:00     ` Matthew Heaney
  1996-10-04  0:00       ` Robert S. White
@ 1996-10-04  0:00       ` Ken Garlington
  1 sibling, 0 replies; 13+ messages in thread

From: Ken Garlington @ 1996-10-04 0:00 UTC (permalink / raw)

Matthew Heaney wrote:
>
> It's never cost effective to skimp on hardware if it means human
> programmers have to write more complex software.

Never say never!

Take an example of building, say, 1,000 units of some widget containing
1,500 source lines of code. For ease of calculation, assume
$100/worker-hour, 150 w-hours/w-month.

1. Buy a CPU at $50/unit which will do the job, but will cause the
software development team to spend 10 w-months to complete the task,
and will cause the post-deployment cost to be 2x the development cost.
10 w-months to complete the original development is $100 x 150 x 10 =
$150,000. Maintenance is $300,000. Total software cost per unit
(amortized over several years, possibly, for the maintenance):
$450/unit.

2. Buy a CPU at $300/unit which will do the job, and because it's so
modern, the software development team only needs 5 w-months to complete
the task, and the post-deployment cost is only 1x the development cost.
In other words, the software development time is cut in half (the
standard promise for such improvements). So, software cost: $225/unit.

Assuming I did my math right, I'd be buying some cheap hardware right
about now...

--
LMTAS - "Our Brand Means Quality"
For more info, see http://www.lmtas.com or http://www.lmco.com

^ permalink raw reply	[flat|nested] 13+ messages in thread
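[Editorial note: Garlington's amortized-cost model is easy to replay. One caveat on the figures as stated: with the 1x post-deployment multiplier of option 2, the software cost computes to $150/unit; the quoted $225/unit corresponds to keeping the 2x multiplier. The sketch uses the multipliers as stated.]

```python
RATE_PER_HOUR = 100    # $/worker-hour (from the post)
HOURS_PER_MONTH = 150  # worker-hours per worker-month (from the post)
UNITS = 1000           # production run (from the post)


def sw_cost_per_unit(staff_months, maint_multiplier):
    """Amortized software cost per unit: development cost plus
    post-deployment cost expressed as a multiple of development."""
    dev = RATE_PER_HOUR * HOURS_PER_MONTH * staff_months
    return dev * (1 + maint_multiplier) / UNITS


# Option 1: $50 CPU, 10 w-months development, maintenance = 2x dev
option1 = 50 + sw_cost_per_unit(10, 2)
# Option 2: $300 CPU, 5 w-months development, maintenance = 1x dev
option2 = 300 + sw_cost_per_unit(5, 1)
```

Running the model shows the conclusion is sensitive to the maintenance multiplier, which is arguably the real lesson: small changes in life-cycle assumptions flip which option wins.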
* Re: Ariane 5 failure 1996-10-02 0:00 ` Ken Garlington 1996-10-02 0:00 ` Matthew Heaney @ 1996-10-03 0:00 ` Alan Brain 1996-10-04 0:00 ` Ken Garlington 1 sibling, 1 reply; 13+ messages in thread From: Alan Brain @ 1996-10-03 0:00 UTC (permalink / raw) Ken Garlington wrote: > So what did you do when you needed to build a system that was bigger than the > torpedo hatch? Re-design the submarine? Nope, we re-designed the system so it fit anyway. Actually, we designed the thing in the first place so that the risk of it physically growing too big and needing re-design was tolerable (ie contingency money was allocated for doing this, if we couldn't accurately estimate the risk as being small). > Oh for the luxury of a diesel generator! We have to be able to operate on basic > battery power (and we share that bus with emergency lighting, etc.) Well ours had a generator connected to a hamster wheel with a piece of cheese as backup ;-).... but seriously folks, yes we have a diesel. Why? to charge the batteries. Use of the Diesel under many conditions - eg when taking piccies in Vladivostok Harbour - would be unwise. > Exactly. You build a system that has slack. Say, 15% slack. Which is exactly > why the INU design team didn't want to add checks unless they had to. Because > they were starting to eat into that slack. I'd be very, very suspicious of a slack like "15%". This implies you know to within 2 significant figures what the load is going to be. Which in my experience is not the case. "About a Seventh" is more accurate, as it implies more imprecision. And I'd be surprised if any Bungee-Jumper would tolerate that small amount of safety margin using new equipment. Then again, slack is supposed to be used up. It's for the unforeseen. When you come across a problem during development, you shouldn't be afraid of using up that slack, that's what it's there for! 
One is reminded of the apocryphal story of the quartemaster at Pearl Harbour, who refused to hand out ammunition as it could have been needed more later. > What if your brand new CPU requires more power than your diesel generator > can generate? > What if your brand new CPU requires a technology that doesn't let you meet > your heat dissipation? But it doesn't. When you did your initial systems engineering, you made sure there was enough slack - OR had enough contingency money so that you could get custom-built stuff. > Doesn't sound like you had to make a lot of tradeoffs in your system. > Unfortunately, airborne systems, particular those that have to operate in > lower-power, zero-cooling situations (amazing how hot the air gets around > Mach 1!), don't have such luxuries. I see your zero-cooling situations, and I raise you H2, CO2, CO, Cl, H3O conditions etc. The constraints on a sub are different, but the same in scope. Until such time as you do work on a sub, or I do more than just a little work on aerospace, we may have to leave it at that. > > Usually such ridiculously extreme measures are not neccessary. The > > Hardware guys > > bitch about the cost-per-CPU going through the roof. Heck, it could cost > > $10 million. > > But if it saves 2 years of Software effort, that's a net saving of $90 > > million. > > What does maintenance costs have to do with this discussion? Sorry I didn't make myself clear: I was talking development costs, not maintenance. > I've never had a project yet where we didn't routinely cut it that fine, > and we've yet to spend the big bucks. Then I guess either a) You're one heck of a better engineer than me (and I freely admit the distinct possibility) or b) You've been really lucky or c) You must tolerate a lot more failures than the organisations I've worked for. > If you're used to developing systems > with those kind of constraints, you know how to make those decisions. 
> Occasionally, you make the wrong decision, as the Ariane designers discovered. > Welcome to engineering. My work has only killed 2 people (Iraqi pilots - that particular system worked as advertised in the Gulf). There might be as many as 5000 people whose lives depend on my work at any time, more if War breaks out. I guess we have a different view of "acceptable losses" here, and your view may well be more correct. Why? Because such a conservative view as my own may mean I just can't attempt some risky things. Things which your team (sometimes at least) gets working, teherby saving more lives. Yet I don't think so. > And, if you had only got 20MB per second after all that, you would have > done...? 20 MB? First, re-check all calculations. Examine hardware options. Then (probably) set up a "get-well" program using 5-6 different tracks and pick the best. Most probably though, we'd give up: it's not doable within the budget. The difficult case is 150 MB. In this case, assembler coding might just make the difference - I do get your point, BTW. > Certainly, if you just throw out range checking without knowing its cost, > you're an idiot. However, no one has shown that the Ariane team did this. > I guarantee you (and am willing to post object code to prove it) that > range checking is not always zero cost, and in the right circumstances can > cause you to bust your budget. Agree. There's always pathological cases where general rules don't apply. Being fair, I didn't say "zero cost", I said "typically 5% measured". In doing the initial Systems work, I'd usually budget for 10%, as I'm paranoid. > Unfortunately, cost is not the only controlling variable. > > Interesting that a $100K difference in per-unit cost in your systems is > negligible. No wonder people think military systems are too expensive! You get what you pay for, IF you're lucky. My point though is that many of the hacks, kludges etc in software are caused by insufficient foresight in systems design. 
Case in point: RAN Collins class submarine. Now many years late due to software problems. Last time I heard, they were still trying to get that last 10% of performance out of the 68020s on the cards - which were leading-edge when the systems work was done. Putting in 68040s a few years ago would have meant the software would have been complete by now, as the hacks wouldn't have been necessary.

----------------------  <> <>           How doth the little Crocodile
| Alan & Carmel Brain|  xxxxx           Improve his shining tail?
| Canberra Australia | xxxxxHxHxxxxxx _MMMMMMMMM_MMMMMMMMM
----------------------  o OO*O^^^^O*OO o oo     oo     oo     oo
            By pulling Maerklin Wagons, in 1/220 Scale

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Ariane 5 failure
  1996-10-03  0:00 ` Ariane 5 failure Alan Brain
@ 1996-10-04  0:00   ` Ken Garlington
  0 siblings, 0 replies; 13+ messages in thread

From: Ken Garlington @ 1996-10-04  0:00 UTC (permalink / raw)

Alan Brain wrote:
>
> Ken Garlington wrote:
>
> > So what did you do when you needed to build a system that was bigger
> > than the torpedo hatch? Re-design the submarine?
>
> Nope, we re-designed the system so it fit anyway.

Tsk, tsk! You violated your own design constraint of "always provide enough margin for growth." Just think how much money you would have saved if you had built it bigger to begin with!

> Actually, we designed the thing in the first place so that the risk of
> it physically growing too big and needing re-design was tolerable
> (i.e. contingency money was allocated for doing this, if we couldn't
> accurately estimate the risk as being small).

I'm sure the Arianespace folks had the same contingency funding. In fact, they're spending it right now. :)

> > Oh for the luxury of a diesel generator! We have to be able to operate
> > on basic battery power (and we share that bus with emergency lighting,
> > etc.)
>
> Well, ours had a generator connected to a hamster wheel with a piece of
> cheese as backup ;-) ... but seriously folks, yes we have a diesel.
> Why? To charge the batteries.

Batteries, plural? Wow!

> I'd be very, very suspicious of a slack figure like "15%". This implies
> you know to within 2 significant figures what the load is going to be,
> which in my experience is not the case. "About a seventh" is more
> accurate, as it implies more imprecision. And I'd be surprised if any
> bungee-jumper would tolerate that small an amount of safety margin using
> new equipment. Then again, slack is supposed to be used up. It's for the
> unforeseen. When you come across a problem during development, you
> shouldn't be afraid of using up that slack; that's what it's there for!

Actually, no.
For most military programs, slack is for a combination of growth _after_ the initial development and unforeseen variations in the production system (e.g., a processor that's a little slower than spec), and 15% is a common number for such slack. I think you're confusing "slack" with "management reserve," which is usually a number set by the development organization and used up (if needed) during development. The 15% number is usually imposed by a prime on a subcontractor for the reasons described above.

> > What if your brand new CPU requires more power than your diesel
> > generator can generate?
> > What if your brand new CPU requires a technology that doesn't let you
> > meet your heat dissipation?
>
> But it doesn't. When you did your initial systems engineering, you made
> sure there was enough slack - OR had enough contingency money so that
> you could get custom-built stuff.

How much money is required to violate the laws of physics? _That's_ the kind of limitation we're talking about when you get into power, cooling, heat dissipation, etc.

> I see your zero-cooling situations, and I raise you H2, CO2, CO, Cl, H3O
> conditions etc. The constraints on a sub are different, but the same in
> scope. Until such time as you do work on a sub, or I do more than just a
> little work on aerospace, we may have to leave it at that.

But we _already_ have these same restrictions, since we have to operate in Naval environments. We also have _extra_ requirements. Considering that the topic of this thread is an aerospace system, I think it's not enough to "leave it at that."

> > > Usually such ridiculously extreme measures are not necessary. The
> > > Hardware guys bitch about the cost-per-CPU going through the roof.
> > > Heck, it could cost $10 million. But if it saves 2 years of Software
> > > effort, that's a net saving of $90 million.
> >
> > What do maintenance costs have to do with this discussion?
>
> Sorry I didn't make myself clear: I was talking development costs, not
> maintenance.

Then you're not talking about inertial nav systems. On most of the projects I've seen, the total software development time is two years or less. You're not going to save 2 years of software effort on a new system!

> > If you're used to developing systems with those kind of constraints,
> > you know how to make those decisions. Occasionally, you make the wrong
> > decision, as the Ariane designers discovered. Welcome to engineering.
>
> My work has only killed 2 people (Iraqi pilots - that particular system
> worked as advertised in the Gulf). There might be as many as 5000 people
> whose lives depend on my work at any time, more if war breaks out. I
> guess we have a different view of "acceptable losses" here, and your
> view may well be more correct.

You're missing the point. It's not a question of whether it's OK for the system to fail. It's a question of humans having to make decisions that don't include "well, if we throw enough money at it, we'll get everything we want." You cannot optimize software development time and ignore all other factors! In some cases, you have to compromise software development/maintenance efficiencies to meet other requirements. Sometimes you make the wrong decision. Anyone who says they've always made the right call is a lawyer, not an engineer.

> Why? Because such a conservative view as my own may mean I just can't
> attempt some risky things. Things which your team (sometimes at least)
> gets working, thereby saving more lives.

However, if you build a system with the latest and greatest CPU, thereby having the maximum amount of horsepower to permit the software engineers to avoid turning off certain checks, etc., you _have_ attempted a risky thing. The latest hardware technology is the least used.

> Yet I don't think so.
>
> > And, if you had only got 20MB per second after all that, you would
> > have done...?
>
> 20 MB? First, re-check all calculations. Examine hardware options. Then
> (probably) set up a "get-well" program using 5-6 different tracks and
> pick the best. Most probably though, we'd give up: it's not doable
> within the budget.

That's the difference. We would not go to our management and say, "The only solutions we have require us to make compromises in our software approach, therefore it can't be done. Take your multi-billion project and go home." We'd work with the other engineering disciplines to come up with the best compromise. It's the difference, in my mind, between a computer scientist and a software engineer. The software engineer is paid to find a way to make it work -- even if (horrors) he has to write it in assembly, or use Unchecked_Conversion, or whatever.

> The difficult case is 150 MB. In this case, assembler coding might just
> make the difference - I do get your point, BTW.
>
> > Certainly, if you just throw out range checking without knowing its
> > cost, you're an idiot. However, no one has shown that the Ariane team
> > did this. I guarantee you (and am willing to post object code to prove
> > it) that range checking is not always zero cost, and in the right
> > circumstances can cause you to bust your budget.
>
> Agree. There's always pathological cases where general rules don't
> apply. Being fair, I didn't say "zero cost", I said "typically 5%
> measured". In doing the initial Systems work, I'd usually budget for
> 10%, as I'm paranoid.

I've seen checks in just the wrong place cause differences of 30% or more in a high-rate process. It's just not that trivial.

> You get what you pay for, IF you're lucky. My point though is that many
> of the hacks, kludges etc. in software are caused by insufficient
> foresight in systems design.

And I wouldn't argue with that. However, it's a _big_ leap to say ALL hacks are caused by such problems. Also, having gone through the system design process a few times, I've never had "sufficient foresight."
There's always been at least one choice I made then that I would have made differently today. (Why didn't I see the obvious answer in 1985: HTML for my documentation! :) That's why reuse is always so tricky in safety-critical systems. It's very easy to make reasonable decisions then that don't make sense now. That's why I laugh at people who say, "reused code is safer; you don't have to test it once you get it working once!"

> Case in point: RAN Collins class submarine. Now many years late due to
> software problems. Last time I heard, they're still trying to get that
> last 10% of performance out of the 68020s on the cards. Which were
> leading-edge when the systems work was done. Putting in 68040s a few
> years ago would have meant the Software would have been complete by now,
> as the hacks wouldn't have been necessary.

68040s? I didn't think you could get mil-screened 68040s anymore. They're already obsolete. Not easy to make those foresighted decisions, is it? :)

--
LMTAS - "Our Brand Means Quality"
For more info, see http://www.lmtas.com or http://www.lmco.com

^ permalink raw reply	[flat|nested] 13+ messages in thread
end of thread, other threads:[~1996-10-10  0:00 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1996-10-01  0:00 Ariane 5 failure Marin David Condic, 407.796.8997, M/S 731-93
1996-10-02  0:00 ` Alan Brain
1996-10-02  0:00 ` Ken Garlington
1996-10-02  0:00 ` Matthew Heaney
1996-10-04  0:00 ` Robert S. White
1996-10-05  0:00 ` Robert Dewar
1996-10-06  0:00 ` Ariane 5 failure - latest S/W tech vs. cold hard facts Robert S. White
1996-10-10  0:00 ` Ken Garlington
1996-10-05  0:00 ` Ariane 5 failure Alan Brain
1996-10-06  0:00 ` Robert S. White
1996-10-04  0:00 ` System Engineering (was Re: Ariane 5 failure) Ken Garlington
1996-10-03  0:00 ` Ariane 5 failure Alan Brain
1996-10-04  0:00 ` Ken Garlington
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox