* Re: Ariane 5 failure
@ 1996-10-01 0:00 Marin David Condic, 407.796.8997, M/S 731-93
1996-10-02 0:00 ` Alan Brain
0 siblings, 1 reply; 13+ messages in thread
From: Marin David Condic, 407.796.8997, M/S 731-93 @ 1996-10-01 0:00 UTC (permalink / raw)
Ken Garlington <garlingtonke@LMTAS.LMCO.COM> writes:
>Alan Brain wrote:
>> A really good safety-critical
>> program should be remarkably difficult to de-bug, as the only way you
>> know it's got a major problem is by examining the error log, and
>> calculating that its performance is below theoretical expectations.
>> And if it runs too slow, many times in the real-world you can spend 2
>> years of development time and many megabucks kludging the software, or
>> wait 12 months and get the new 400 Mhz chip instead of your current 133.
>
>I really need to change jobs. It sounds so much simpler to build
>software for ground-based PCs, where you don't have to worry about the
>weight, power requirements, heat dissipation, physical size,
>vulnerability to EMI/radiation/salt fog/temperature/etc. of your system.
>
I personally like the part about "performance is below theoretical
expectations". Where I live, I have a 5 millisecond loop which
*must* finish in 5 milliseconds. If it runs in 7 milliseconds, we
will fail to close the loop in sufficient time to keep valves from
"slamming into stops", causing them to break, rendering someone's
billion dollar rocket and billion dollar payload "unserviceable".
In this business, that's what *we* mean by "performance is below
theoretical expectations" and why runtime checks which seem
"trivial" to most folks can mean the difference between having a
working system and having an interesting exercise in computer
science which isn't going to go anywhere.
MDC
Marin David Condic, Senior Computer Engineer ATT: 561.796.8997
M/S 731-96 Technet: 796.8997
Pratt & Whitney, GESP Fax: 561.796.4669
P.O. Box 109600 Internet: CONDICMA@PWFL.COM
West Palm Beach, FL 33410-9600 Internet: CONDIC@FLINET.COM
===============================================================================
"Some people say a front-engine car handles best. Some people say
a rear-engine car handles best. I say a rented car handles best."
-- P. J. O'Rourke
===============================================================================
^ permalink raw reply [flat|nested] 13+ messages in thread
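[Editorial note: Condic's hard-deadline point can be sketched in Python. This is purely illustrative - a real flight-control loop would run under a real-time executive, not a general-purpose OS timer - and the 5 ms figure is his. The function names are invented for the sketch.]

```python
import time

PERIOD_S = 0.005  # the 5 ms hard deadline from the post


def run_loop(iterations, body):
    """Run a periodic control loop, counting deadline overruns.

    If body() takes longer than PERIOD_S, the iteration counts as an
    overrun (the "valves slamming into stops" case); otherwise we sleep
    out the remainder of the period.
    """
    overruns = 0
    for _ in range(iterations):
        start = time.monotonic()
        body()
        elapsed = time.monotonic() - start
        if elapsed > PERIOD_S:
            overruns += 1  # missed the hard deadline
        else:
            time.sleep(PERIOD_S - elapsed)
    return overruns
```

The point of the thread is that every runtime check executed inside `body()` eats into that 5 ms budget, which is why checks that look "trivial" on a workstation are argued over in this domain.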
* Re: Ariane 5 failure
  1996-10-01  0:00 Ariane 5 failure Marin David Condic, 407.796.8997, M/S 731-93
@ 1996-10-02  0:00 ` Alan Brain
  1996-10-02  0:00   ` Ken Garlington
  0 siblings, 1 reply; 13+ messages in thread

From: Alan Brain @ 1996-10-02 0:00 UTC (permalink / raw)

Marin David Condic, 407.796.8997, M/S 731-93 wrote:
>
> Ken Garlington <garlingtonke@LMTAS.LMCO.COM> writes:
> >I really need to change jobs. It sounds so much simpler to build
> >software for ground-based PCs, where you don't have to worry about the
> >weight, power requirements, heat dissipation, physical size,
> >vulnerability to EMI/radiation/salt fog/temperature/etc. of your system.

The particular system I was talking about was for a Submarine. Very
tight constraints indeed: on power (it was a diesel sub), physical size
(it had to fit in a torpedo hatch), heat dissipation (a bit), and
vulnerability to 100% humidity, salt, chlorine etc etc. Been there,
Done that, Got the T-shirt.

I'm a Software Engineer who works mainly in Systems. Or maybe a Systems
Engineer with a hardware bias. Regardless, in the initial Systems
Engineering phase, when one gets all the HWCIs and CSCIs defined, it is
only good professional practice to build in plenty of slack. If the
requirement is to fit in a 21" hatch, you DON'T design something that's
20.99999" wide. If you can, make it 16", 18 at max. It'll probably
grow. Similarly, if you require a minimum of 25 MFlops, make sure
there's a growth path to at least 100. It may well be less expensive
and less risky to build a chip factory to make a faster CPU than to
lose a rocket, or a sub, to a software failure that could have been
prevented. Usually such ridiculously extreme measures are not
necessary. The Hardware guys bitch about the cost-per-CPU going through
the roof. Heck, it could cost $10 million. But if it saves 2 years of
Software effort, that's a net saving of $90 million.

(All numbers are representative, i.e. plucked out of mid-air, and as
you USAians say, Your Mileage May Vary.)

> I personally like the part about "performance is below theoretical
> expectations". Where I live, I have a 5 millisecond loop which
> *must* finish in 5 milliseconds. If it runs in 7 milliseconds, we
> will fail to close the loop in sufficient time to keep valves from
> "slamming into stops", causing them to break, rendering someone's
> billion dollar rocket and billion dollar payload "unserviceable".
> In this business, that's what *we* mean by "performance is below
> theoretical expectations" and why runtime checks which seem
> "trivial" to most folks can mean the difference between having a
> working system and having an interesting exercise in computer
> science which isn't going to go anywhere.

In this case, "theoretical expectations" for a really tight 5 MuSec
loop should be less than 1 MuSec. Yes, I'm dreaming. OK, 3 MuSec,
that's my final offer. For the vast majority of cases, if your
engineering is closer to the edge than that, it'll cost big bucks to
fix the over-runs you always get.

Typical example: I had a big bun-fight with project management about a
hefty data transfer rate required for a broadband sonar. They wanted to
hand-code the lot in assembler, as the requirements were really, really
tight. No time for any of this range-check crap; the data was always
good. I eventually threw enough of a professional tantrum to wear down
even a group of German Herr Professor Doktors, and we did it in Ada 83,
if only as a first pass, to see what the rate really would be. The spec
called for 160 MB/sec. The first attempt ran at 192 MB/sec, and after
some optimisation we got over 250. After the hardware flaws were fixed
(the ones the "unnecessary" range-bound checking detected), this was
above 300. Now that's too close for my druthers, but even 161 I could
live with. Saved maybe 16 months on the project, about 100 people at
$15K a month.

After the transfer, the data really was trustworthy - which saved a lot
of time downstream on the applications in debug time. Note that even
with (minor) hardware flaws, the system still worked. Note also that by
paying big $ for more capable hardware than strictly necessary, you can
save bigger $ on the project. Many projects spend many months and many
$ Million to fix, by hacking, kludging, and sheer Genius, what a few
lousy $100K of extra hardware cost would make unnecessary.

A good software engineer in the Risk-management team, and on the
Systems Engineering early on - one with enough technical nous in
hardware to know what's feasible, enough courage to cost the firm
millions in initial costs, and enough power to make it stick - that's
what's necessary. I've seen it; it works. But it's been tried less than
a dozen times in 15 years in my experience :(

----------------------     <> <>    How doth the little Crocodile
| Alan & Carmel Brain|     xxxxx       Improve his shining tail?
| Canberra Australia |  xxxxxHxHxxxxxx _MMMMMMMMM_MMMMMMMMM
----------------------     o OO*O^^^^O*OO o oo oo oo oo
        By pulling Maerklin Wagons, in 1/220 Scale

^ permalink raw reply	[flat|nested] 13+ messages in thread
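[Editorial note: the mechanism in Brain's sonar anecdote - Ada's range-constrained subtypes rejecting out-of-range data and thereby exposing hardware flaws - can be sketched in Python. All names here are invented for illustration; Ada does this declaratively via subtypes and Constraint_Error.]

```python
class RangeError(ValueError):
    """Stand-in for Ada's Constraint_Error."""


def checked(value, lo, hi):
    """Return value if it lies in [lo, hi], else raise immediately."""
    if not (lo <= value <= hi):
        raise RangeError(f"{value} outside [{lo}, {hi}]")
    return value


def ingest_samples(raw, lo=-32768, hi=32767):
    """Range-check every incoming sample (16-bit signed range assumed).

    One bad word from flawed hardware stops the transfer at the point
    of entry, instead of silently corrupting everything downstream -
    which is why the data "really was trustworthy" afterwards.
    """
    return [checked(s, lo, hi) for s in raw]
```

The design point is that the check runs once per sample at the system boundary, so downstream code can omit defensive checks entirely.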
* Re: Ariane 5 failure
  1996-10-02  0:00 ` Alan Brain
@ 1996-10-02  0:00   ` Ken Garlington
  1996-10-02  0:00     ` Matthew Heaney
  1996-10-03  0:00     ` Ariane 5 failure Alan Brain
  0 siblings, 2 replies; 13+ messages in thread

From: Ken Garlington @ 1996-10-02 0:00 UTC (permalink / raw)
To: aebrain

Alan Brain wrote:
>
> Marin David Condic, 407.796.8997, M/S 731-93 wrote:
> >
> > Ken Garlington <garlingtonke@LMTAS.LMCO.COM> writes:
> > >I really need to change jobs. It sounds so much simpler to build
> > >software for ground-based PCs, where you don't have to worry about the
> > >weight, power requirements, heat dissipation, physical size,
> > >vulnerability to EMI/radiation/salt fog/temperature/etc. of your system.
>
> The particular system I was talking about was for a Submarine. Very
> tight constraints indeed, on power (it was a diesel sub), physical
> size (had to fit in a torpedo hatch), heat dissipation (a bit),
> vulnerability to 100% humidity, salt, chlorine etc etc. Been there,
> Done that, Got the T-shirt.

So what did you do when you needed to build a system that was bigger
than the torpedo hatch? Re-design the submarine? You have physical
limits that you just can't exceed. On a rocket, or an airplane, you
have even stricter limits.

Oh for the luxury of a diesel generator! We have to be able to operate
on basic battery power (and we share that bus with emergency lighting,
etc.)

> I'm a Software Engineer who works mainly in Systems. Or maybe a
> Systems Engineer with a hardware bias. Regardless, in the initial
> Systems Engineering phase, when one gets all the HWCIs and CSCIs
> defined, it is only good professional practice to build in plenty of
> slack. If the requirement is to fit in a 21" hatch, you DON'T design
> something that's 20.99999" wide. If you can, make it 16", 18 at max.
> It'll probably grow.

Exactly. You build a system that has slack. Say, 15% slack. Which is
exactly why the INU design team didn't want to add checks unless they
had to: they were starting to eat into that slack.

> Similarly, if you require a minimum of 25 MFlops, make sure there's a
> growth path to at least 100. It may well be less expensive and less
> risky to build a chip factory to make a faster CPU than to lose a
> rocket, or a sub due to software failure that could have been
> prevented.

What if your brand new CPU requires more power than your diesel
generator can generate? What if your brand new CPU requires a
technology that doesn't let you meet your heat dissipation? Doesn't
sound like you had to make a lot of tradeoffs in your system.
Unfortunately, airborne systems, particularly those that have to
operate in lower-power, zero-cooling situations (amazing how hot the
air gets around Mach 1!), don't have such luxuries.

> Usually such ridiculously extreme measures are not necessary. The
> Hardware guys bitch about the cost-per-CPU going through the roof.
> Heck, it could cost $10 million. But if it saves 2 years of Software
> effort, that's a net saving of $90 million.

What do maintenance costs have to do with this discussion?

> In this case, "theoretical expectations" for a really tight 5 MuSec
> loop should be less than 1 MuSec. Yes, I'm dreaming. OK, 3 MuSec,
> that's my final offer. For the vast majority of cases, if your
> engineering is closer to the edge than that, it'll cost big bucks to
> fix the over-runs you always get.

I've never had a project yet where we didn't routinely cut it that
fine, and we've yet to spend the big bucks. If you're used to
developing systems with those kind of constraints, you know how to make
those decisions. Occasionally, you make the wrong decision, as the
Ariane designers discovered. Welcome to engineering.

> Typical example: I had a big bun-fight with project management about
> a hefty data transfer rate required for a broadband sonar. They
> wanted to hand-code the lot in assembler, as the requirements were
> really, really tight. No time for any of this range-check crap, the
> data was always good. I eventually threw enough of a professional
> tantrum to wear down even a group of German Herr Professor Doktors,
> and we did it in Ada-83. If only as a first pass, to see what the
> rate really would be. The spec called for 160 MB/Sec. First attempt
> was 192 MB/Sec, and after some optimisation, we got over 250. After
> the hardware flaws were fixed (the ones the "unnecessary" range-bound
> checking detected) this was above 300.

And, if you had only got 20MB per second after all that, you would have
done...?

Certainly, if you just throw out range checking without knowing its
cost, you're an idiot. However, no one has shown that the Ariane team
did this. I guarantee you (and am willing to post object code to prove
it) that range checking is not always zero cost, and in the right
circumstances can cause you to bust your budget.

> Note also that by paying big $ for more capable hardware than
> strictly necessary, you can save bigger $ on the project.

Unfortunately, cost is not the only controlling variable.

Interesting that a $100K difference in per-unit cost in your systems is
negligible. No wonder people think military systems are too expensive!

--
LMTAS - "Our Brand Means Quality"
For more info, see http://www.lmtas.com or http://www.lmco.com

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Ariane 5 failure
  1996-10-02  0:00   ` Ken Garlington
@ 1996-10-02  0:00     ` Matthew Heaney
  1996-10-04  0:00       ` Robert S. White
  1996-10-04  0:00       ` System Engineering (was Re: Ariane 5 failure) Ken Garlington
  1996-10-03  0:00     ` Ariane 5 failure Alan Brain
  1 sibling, 2 replies; 13+ messages in thread

From: Matthew Heaney @ 1996-10-02 0:00 UTC (permalink / raw)

In article <3252B46C.5E9D@lmtas.lmco.com>, Ken Garlington
<garlingtonke@lmtas.lmco.com> wrote:

>Interesting that a $100K difference in per-unit cost in your systems is
>negligible. No wonder people think military systems are too expensive!

I think he meant "negligible compared to the programming cost that
would be required to get the software to run on the cheaper hardware."
It's never cost effective to skimp on hardware if it means human
programmers have to write more complex software.

--------------------------------------------------------------------
Matthew Heaney
Software Development Consultant
mheaney@ni.net
(818) 985-1271

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Ariane 5 failure
  1996-10-02  0:00     ` Matthew Heaney
@ 1996-10-04  0:00       ` Robert S. White
  1996-10-05  0:00         ` Robert Dewar
  1996-10-05  0:00         ` Ariane 5 failure Alan Brain
  1996-10-04  0:00       ` System Engineering (was Re: Ariane 5 failure) Ken Garlington
  1 sibling, 2 replies; 13+ messages in thread

From: Robert S. White @ 1996-10-04 0:00 UTC (permalink / raw)

In article <mheaney-ya023180000210962257430001@news.ni.net>,
mheaney@ni.net says...

>It's never cost effective to skimp on hardware if it means human
>programmers have to write more complex software.

Not if the ratio is tilted very heavily towards recurring cost versus
Non-Recurring Engineering (NRE). How about 12 staff-months versus $300
extra hardware cost on 60,000 units?
___________________________________________________________________________
Robert S. White                    -- an embedded systems software engineer
WhiteR@CRPL.Cedar-Rapids.lib.IA.US -- It's long, but I pay for it!

^ permalink raw reply	[flat|nested] 13+ messages in thread
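[Editorial note: White's NRE-versus-recurring-cost tradeoff is simple arithmetic, sketched below. The $15K-per-staff-month rate is an assumption borrowed from Brain's sonar anecdote earlier in the thread; White does not state one.]

```python
def cost_tradeoff(nre_staff_months, cost_per_staff_month,
                  extra_unit_cost, units):
    """Compare one-time software NRE against recurring hardware cost.

    Returns (nre, recurring): the one-off engineering cost of writing
    more complex software, versus the per-unit hardware premium
    multiplied across the production run.
    """
    nre = nre_staff_months * cost_per_staff_month
    recurring = extra_unit_cost * units
    return nre, recurring


# White's example: 12 staff-months vs. $300 extra hardware on 60,000 units
nre, recurring = cost_tradeoff(12, 15_000, 300, 60_000)
```

At these volumes the recurring hardware premium ($18M) dwarfs the software NRE ($180K by the assumed rate), which is White's point: Heaney's "never" fails for high-volume products.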
* Re: Ariane 5 failure
  1996-10-04  0:00       ` Robert S. White
@ 1996-10-05  0:00         ` Robert Dewar
  1996-10-06  0:00           ` Ariane 5 failure - latest S/W tech vs. cold hard facts Robert S. White
  1996-10-05  0:00         ` Ariane 5 failure Alan Brain
  1 sibling, 1 reply; 13+ messages in thread

From: Robert Dewar @ 1996-10-05 0:00 UTC (permalink / raw)

Robert White said:

">It's never cost effective to skimp on hardware if it means human
>programmers have to write more complex software.

Not if the ratio is tilted very heavily towards recurring cost versus
Non-Recurring Engineering (NRE). How about 12 staff-months versus $300
extra hardware cost on 60,000 units?"

Of course this is true at some level, but the critical thing is that a
proper cost comparison here must take into account:

a) full life cycle costs of the software, not just development costs
b) time-to-market delays caused by more complex software
c) decreased quality and reliability caused by more complex software

There are certainly cases where careful consideration of these three
factors still results in a decision to use less hardware and more
complex software, but I think we have all seen cases where such
decisions were made, and in retrospect turned out to be huge mistakes.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Ariane 5 failure - latest S/W tech vs. cold hard facts
  1996-10-05  0:00         ` Robert Dewar
@ 1996-10-06  0:00           ` Robert S. White
  1996-10-10  0:00             ` Ken Garlington
  0 siblings, 1 reply; 13+ messages in thread

From: Robert S. White @ 1996-10-06 0:00 UTC (permalink / raw)

In article <dewar.844517570@schonberg>, dewar@schonberg.cs.nyu.edu
says...

>There are certainly cases where careful consideration of these three
>factors still results in a decision to use less hardware and more
>complex software, but I think we have all seen cases where such
>decisions were made, and in retrospect turned out to be huge mistakes.

In business, when making hard decisions about embedded systems
products, such studies are almost always made.

One factor now causing a lot of study and implementation of "complex
software" is the effort to reduce procedure call overhead and virtual
dispatching for object-oriented high level languages that might be used
for embedded products. The extra throughput and memory required end up
demanding more memory chips and faster (more power-hungry) processors.
This is a major problem when power consumption, size, and cost
requirements are taken into consideration. Engineering has to look at
the entire picture.

We want to use the latest software technology that results in the
cleanest, most maintainable design. Sometimes it takes a while to hone
the tools that use this new technology till they are ready for prime
time in mission-critical software that has to operate in an environment
with a lot of other constraints. Ada 83 in 1984 had a lot of problems
in implementations that were mostly solved by 1991. Ada 95 (with
advantage taken of its new object-oriented features) and Java bytecode
virtual machines also need a significant amount of effort expended till
they are ready for these more constrained embedded products.

Not to say that it won't happen, just that they often don't pass the
rigorous cost/feature analysis tradeoffs at this date for immediate
product implementation. And we NEVER want to make a mistake for flight
critical software :-< or have a critical task not able to run to
completion within its deadlines. We need that processor throughput
reserve to have a safety margin for rate monotonic tasks.

I agree with most everything Ken Garlington and other Lockheed Martin
engineers have posted in this thread on this same subject. Their
statements ring true with my industry experience for the last 21 years.
___________________________________________________________________________
Robert S. White                    -- an embedded systems software engineer
WhiteR@CRPL.Cedar-Rapids.lib.IA.US -- It's long, but I pay for it!

^ permalink raw reply	[flat|nested] 13+ messages in thread
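[Editorial note: the "safety margin for rate monotonic tasks" White mentions has a classical quantitative form - the Liu & Layland schedulability bound for rate-monotonic scheduling - sketched here for reference.]

```python
def rms_utilization_bound(n):
    """Liu & Layland (1973) least upper bound on total CPU utilization
    for which n independent periodic tasks are guaranteed schedulable
    under rate-monotonic fixed priorities: n * (2**(1/n) - 1).

    For one task the bound is 100%; as n grows it falls toward
    ln 2 ~= 69.3%. Utilization kept below this bound is the
    "throughput reserve" that guarantees every task meets its deadline.
    """
    return n * (2 ** (1.0 / n) - 1.0)
```

So a design that budgets, say, 69% worst-case CPU utilization is provably safe for any number of rate-monotonic tasks, which is one concrete reason embedded teams resist features that quietly add per-call overhead.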
* Re: Ariane 5 failure - latest S/W tech vs. cold hard facts 1996-10-06 0:00 ` Ariane 5 failure - latest S/W tech vs. cold hard facts Robert S. White @ 1996-10-10 0:00 ` Ken Garlington 0 siblings, 0 replies; 13+ messages in thread From: Ken Garlington @ 1996-10-10 0:00 UTC (permalink / raw) Robert S. White wrote: > > In article <dewar.844517570@schonberg>, dewar@schonberg.cs.nyu.edu says... > > >There are certainly cases where careful consideration of these three factors > >still results in a decision to use less hardware and more complex software, > >but I think we have all seen cases where such decisions were made, and n > >in retrospect turned out to be huge mistakes. > > In business when making hard decisions about embedded systems products, > such studies are almost always made. In my experience, both statements are true. Such studies are often made, and I have also seen cases where they weren't made, or were done poorly. I have no idea what the percentages are for: * the study was done correctly * the study was not done correctly (or not done at all), but the decision turned out to be right anyway * the study was done incorrectly, the answer was wrong, and no one ever discovered it was wrong (because no one ever looked at the final cost, etc.) * the study was done incorrectly, the answer was wrong, someone found out it was wrong, but didn't broadcast it to the general public (would you?) Overall, I'd say we'll never know. As a colleague of mine said: "How many systems out there have a bug like the Ariane 5, but just never hit that magic condition where the bug caused a failure?" Just think: A little less acceleration on takeoff, and we'd think Arianespace made a wonderful decision by reusing the Ariane 4 -- look at all the money they saved! 
It might have been mentioned in the Reuse News as a major success :) I've got a few minutes, so I'll mention another of my favorite themes at this point (usually stated in the context of preparing Ada waivers): It's really hard to determine the life-cycle cost of software, particularly over a long period (e.g. 20 years). There are cost models; sometimes, we even get the parameters right and the model comes up with the right answer. Nonetheless, it's tough to consider life-cycle costs objectively. That's not an excuse for failing to try, but an acknowledgement that it's easy to get it wrong (particularly for new technology). Software engineering can be _so_ depressing! -- LMTAS - "Our Brand Means Quality" For more info, see http://www.lmtas.com or http://www.lmco.com ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Ariane 5 failure
  1996-10-04  0:00       ` Robert S. White
  1996-10-05  0:00         ` Robert Dewar
@ 1996-10-05  0:00         ` Alan Brain
  1996-10-06  0:00           ` Robert S. White
  1 sibling, 1 reply; 13+ messages in thread

From: Alan Brain @ 1996-10-05 0:00 UTC (permalink / raw)

Robert S. White wrote:
>
> In article <mheaney-ya023180000210962257430001@news.ni.net>,
> mheaney@ni.net says...
>
> >It's never cost effective to skimp on hardware if it means human
> >programmers have to write more complex software.
>
> Not if the ratio is tilted very heavy towards reoccuring cost versus
> Non-Reoccuring Engineering (NRE). How about 12 staff-months versus
> $300 extra hardware cost on 60,000 units?

$300 extra on 60,000 units. That's $18 Million, right? vs

12 Staff-months. Now if your staff is 1, then that's maybe $200,000 for
a single top-notch profi. If your staff is 200, each at 100,000 cost
(ie average wage is about 50K/year), then that's 20 million. But say
you only have the one guy. And say it adds 50% to the risk of failure.
With consequent and liquidated damages of 100 Million. Then that's 50
million, 200 thousand it's really costing.

Feel free to make whatever strawman case you want. The above figures
are based on 2 different projects (actually the liquidated damages one
involved 6 people, rather than 1, and an estimated 70% increased chance
of failure, but I digress).

Summary: In the real world, and with the current state-of-the-art, I
agree with the original statement as an excellent general rule.

----------------------     <> <>    How doth the little Crocodile
| Alan & Carmel Brain|     xxxxx       Improve his shining tail?
| Canberra Australia |  xxxxxHxHxxxxxx _MMMMMMMMM_MMMMMMMMM
----------------------     o OO*O^^^^O*OO o oo oo oo oo
        By pulling Maerklin Wagons, in 1/220 Scale

^ permalink raw reply	[flat|nested] 13+ messages in thread
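[Editorial note: Brain's counter-argument adds a risk term to White's arithmetic. The sketch below implements the informal model exactly as he states it; the specific probabilities and damages are his illustrative figures, not data.]

```python
def risk_weighted_cost(base_cost, added_failure_prob, failure_cost):
    """Expected cost = base cost plus probability-weighted damages.

    This is the informal model in the post: skimping on staff saves
    base cost up front but, if it raises the chance of project failure,
    the expected consequential/liquidated damages must be added in.
    """
    return base_cost + added_failure_prob * failure_cost


# Brain's figures: one engineer for 12 months (~$200K), +50% risk of
# failure against $100M in damages
expected = risk_weighted_cost(200_000, 0.50, 100_000_000)
```

The computed value, $50,200,000, reproduces his "50 million, 200 thousand it's really costing", which is why he sides with Heaney's rule despite the $18M recurring figure.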
* Re: Ariane 5 failure
  1996-10-05  0:00         ` Ariane 5 failure Alan Brain
@ 1996-10-06  0:00           ` Robert S. White
  0 siblings, 0 replies; 13+ messages in thread

From: Robert S. White @ 1996-10-06 0:00 UTC (permalink / raw)

In article <3256ED61.7952@dynamite.com.au>, aebrain@dynamite.com.au
says...
>
>12 Staff-months. Now if your staff is 1, then that's maybe $200,000 for
>a single top-notch profi. If your staff is 200, each at 100,000 cost (ie
>average wage is about 50K/year), then that's 20 million.

Number of staff * amount of time = staff-months (with a dash of reality
for reasonable parallel tasks). The type of strawman that I had in mind
could be 1 person for a year, two persons for six months, up to a limit
of 4 persons for 3 months. And watch out for the mythical man-month!

> But say you
>only have the one guy. And say it adds 50% to the risk of failure. With
>consequent and liquidated damages of 100 Million. Then that's 50
>million, 200 thousand it's really costing.

Projects these days also have a "Risk Management Plan" per SEI CMM
recommendations. That 50% added risk of failure has to be assigned an
estimated cost and factored into the decision.

>Feel free to make whatever strawman case you want. The above figures are
>based on 2 different projects (actually the liquidated damages one
>involved 6 people, rather than 1, and an estimated 70% increased chance
>of failure, but I digress).

I've seen a lot of successes. Failures most often can be attributed to
poor judgement by incompetent personnel. That can be tough to manage
when the managers don't want to hear bad news or risk projections,
especially when they set up a project and move on before it is done.

>
>Summary: In the real world, and with the current state-of-the-art, I
>agree with the original statement as an excellent general rule.

I beg to disagree in the case of higher volume markets. I do agree very
much for lower volumes, or when the type of development task is new to
the engineers and managers. You must have a good understanding of the
problem and the solution domain to do proper cost tradeoffs that
involve significant risk.
___________________________________________________________________________
Robert S. White                    -- an embedded systems software engineer
WhiteR@CRPL.Cedar-Rapids.lib.IA.US -- It's long, but I pay for it!

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: System Engineering (was Re: Ariane 5 failure)
  1996-10-02  0:00     ` Matthew Heaney
  1996-10-04  0:00       ` Robert S. White
@ 1996-10-04  0:00       ` Ken Garlington
  1 sibling, 0 replies; 13+ messages in thread

From: Ken Garlington @ 1996-10-04 0:00 UTC (permalink / raw)

Matthew Heaney wrote:
>
> It's never cost effective to skimp on hardware if it means human
> programmers have to write more complex software.

Never say never!

Take an example of building, say, 1,000 units of some widget containing
1,500 source lines of code. For ease of calculation, assume
$100/worker-hour, 150 w-hours/w-month.

1. Buy a CPU at $50/unit which will do the job, but will cause the
software development team to spend 10 w-months to complete the task,
and will cause the post-deployment cost to be 2x the development cost.
10 w-months to complete the original development is $100 x 150 x 10 =
$150,000. Maintenance is $300,000. Total software cost per unit
(amortized over several years, possibly, for the maintenance):
$450/unit.

2. Buy a CPU at $300/unit which will do the job, and because it's so
modern, the software development team only needs 5 w-months to complete
the task, and the post-deployment cost is only 1x the development cost.
In other words, the software development time is cut in half (the
standard promise for such improvements). So, software cost: $225/unit.

Assuming I did my math right, I'd be buying some cheap hardware right
about now...

--
LMTAS - "Our Brand Means Quality"
For more info, see http://www.lmtas.com or http://www.lmco.com

^ permalink raw reply	[flat|nested] 13+ messages in thread
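[Editorial note: Garlington's amortized-cost model is easy to replay. One caveat on the figures as stated: with the 1x post-deployment multiplier of option 2, the software cost computes to $150/unit; the quoted $225/unit corresponds to keeping the 2x multiplier. The sketch uses the multipliers as stated.]

```python
RATE_PER_HOUR = 100    # $/worker-hour (from the post)
HOURS_PER_MONTH = 150  # worker-hours per worker-month (from the post)
UNITS = 1000           # production run (from the post)


def sw_cost_per_unit(staff_months, maint_multiplier):
    """Amortized software cost per unit: development cost plus
    post-deployment cost expressed as a multiple of development."""
    dev = RATE_PER_HOUR * HOURS_PER_MONTH * staff_months
    return dev * (1 + maint_multiplier) / UNITS


# Option 1: $50 CPU, 10 w-months development, maintenance = 2x dev
option1 = 50 + sw_cost_per_unit(10, 2)
# Option 2: $300 CPU, 5 w-months development, maintenance = 1x dev
option2 = 300 + sw_cost_per_unit(5, 1)
```

Running the model shows the conclusion is sensitive to the maintenance multiplier, which is arguably the real lesson: small changes in life-cycle assumptions flip which option wins.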
* Re: Ariane 5 failure 1996-10-02 0:00 ` Ken Garlington 1996-10-02 0:00 ` Matthew Heaney @ 1996-10-03 0:00 ` Alan Brain 1996-10-04 0:00 ` Ken Garlington 1 sibling, 1 reply; 13+ messages in thread From: Alan Brain @ 1996-10-03 0:00 UTC (permalink / raw) Ken Garlington wrote: > So what did you do when you needed to build a system that was bigger than the > torpedo hatch? Re-design the submarine? Nope, we re-designed the system so it fit anyway. Actually, we designed the thing in the first place so that the risk of it physically growing too big and needing re-design was tolerable (ie contingency money was allocated for doing this, if we couldn't accurately estimate the risk as being small). > Oh for the luxury of a diesel generator! We have to be able to operate on basic > battery power (and we share that bus with emergency lighting, etc.) Well ours had a generator connected to a hamster wheel with a piece of cheese as backup ;-).... but seriously folks, yes we have a diesel. Why? to charge the batteries. Use of the Diesel under many conditions - eg when taking piccies in Vladivostok Harbour - would be unwise. > Exactly. You build a system that has slack. Say, 15% slack. Which is exactly > why the INU design team didn't want to add checks unless they had to. Because > they were starting to eat into that slack. I'd be very, very suspicious of a slack like "15%". This implies you know to within 2 significant figures what the load is going to be. Which in my experience is not the case. "About a Seventh" is more accurate, as it implies more imprecision. And I'd be surprised if any Bungee-Jumper would tolerate that small amount of safety margin using new equipment. Then again, slack is supposed to be used up. It's for the unforeseen. When you come across a problem during development, you shouldn't be afraid of using up that slack, that's what it's there for! 
One is reminded of the apocryphal story of the quartemaster at Pearl Harbour, who refused to hand out ammunition as it could have been needed more later. > What if your brand new CPU requires more power than your diesel generator > can generate? > What if your brand new CPU requires a technology that doesn't let you meet > your heat dissipation? But it doesn't. When you did your initial systems engineering, you made sure there was enough slack - OR had enough contingency money so that you could get custom-built stuff. > Doesn't sound like you had to make a lot of tradeoffs in your system. > Unfortunately, airborne systems, particular those that have to operate in > lower-power, zero-cooling situations (amazing how hot the air gets around > Mach 1!), don't have such luxuries. I see your zero-cooling situations, and I raise you H2, CO2, CO, Cl, H3O conditions etc. The constraints on a sub are different, but the same in scope. Until such time as you do work on a sub, or I do more than just a little work on aerospace, we may have to leave it at that. > > Usually such ridiculously extreme measures are not neccessary. The > > Hardware guys > > bitch about the cost-per-CPU going through the roof. Heck, it could cost > > $10 million. > > But if it saves 2 years of Software effort, that's a net saving of $90 > > million. > > What does maintenance costs have to do with this discussion? Sorry I didn't make myself clear: I was talking development costs, not maintenance. > I've never had a project yet where we didn't routinely cut it that fine, > and we've yet to spend the big bucks. Then I guess either a) You're one heck of a better engineer than me (and I freely admit the distinct possibility) or b) You've been really lucky or c) You must tolerate a lot more failures than the organisations I've worked for. > If you're used to developing systems > with those kind of constraints, you know how to make those decisions. 
> Occasionally, you make the wrong decision, as the Ariane designers discovered. > Welcome to engineering. My work has only killed 2 people (Iraqi pilots - that particular system worked as advertised in the Gulf). There might be as many as 5000 people whose lives depend on my work at any time, more if War breaks out. I guess we have a different view of "acceptable losses" here, and your view may well be more correct. Why? Because such a conservative view as my own may mean I just can't attempt some risky things. Things which your team (sometimes at least) gets working, teherby saving more lives. Yet I don't think so. > And, if you had only got 20MB per second after all that, you would have > done...? 20 MB? First, re-check all calculations. Examine hardware options. Then (probably) set up a "get-well" program using 5-6 different tracks and pick the best. Most probably though, we'd give up: it's not doable within the budget. The difficult case is 150 MB. In this case, assembler coding might just make the difference - I do get your point, BTW. > Certainly, if you just throw out range checking without knowing its cost, > you're an idiot. However, no one has shown that the Ariane team did this. > I guarantee you (and am willing to post object code to prove it) that > range checking is not always zero cost, and in the right circumstances can > cause you to bust your budget. Agree. There's always pathological cases where general rules don't apply. Being fair, I didn't say "zero cost", I said "typically 5% measured". In doing the initial Systems work, I'd usually budget for 10%, as I'm paranoid. > Unfortunately, cost is not the only controlling variable. > > Interesting that a $100K difference in per-unit cost in your systems is > negligible. No wonder people think military systems are too expensive! You get what you pay for, IF you're lucky. My point though is that many of the hacks, kludges etc in software are caused by insufficient foresight in systems design. 
Case in point: RAN Collins class submarine. Now many years late due to software problems. Last time I heard, they were still trying to get that last 10% of performance out of the 68020s on the cards - which were leading-edge when the systems work was done. Putting in 68040s a few years ago would have meant the software would have been complete by now, as the hacks wouldn't have been necessary.

----------------------  <> <>           How doth the little Crocodile
| Alan & Carmel Brain|  xxxxx           Improve his shining tail?
| Canberra Australia | xxxxxHxHxxxxxx _MMMMMMMMM_MMMMMMMMM
----------------------  o OO*O^^^^O*OO o oo     oo     oo     oo
            By pulling Maerklin Wagons, in 1/220 Scale

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: Ariane 5 failure
  1996-10-03  0:00 ` Ariane 5 failure Alan Brain
@ 1996-10-04  0:00   ` Ken Garlington
  0 siblings, 0 replies; 13+ messages in thread

From: Ken Garlington @ 1996-10-04  0:00 UTC (permalink / raw)

Alan Brain wrote:
>
> Ken Garlington wrote:
>
> > So what did you do when you needed to build a system that was bigger
> > than the torpedo hatch? Re-design the submarine?
>
> Nope, we re-designed the system so it fit anyway.

Tsk, tsk! You violated your own design constraint of "always provide enough margin for growth." Just think how much money you would have saved if you had built it bigger to begin with!

> Actually, we designed the thing in the first place so that the risk of
> it physically growing too big and needing re-design was tolerable
> (i.e. contingency money was allocated for doing this, if we couldn't
> accurately estimate the risk as being small).

I'm sure the Arianespace folks had the same contingency funding. In fact, they're spending it right now. :)

> > Oh for the luxury of a diesel generator! We have to be able to operate
> > on basic battery power (and we share that bus with emergency lighting,
> > etc.)
>
> Well, ours had a generator connected to a hamster wheel with a piece of
> cheese as backup ;-) ... but seriously folks, yes we have a diesel.
> Why? To charge the batteries.

Batteries, plural? Wow!

> I'd be very, very suspicious of a slack figure like "15%". This implies
> you know to within 2 significant figures what the load is going to be,
> which in my experience is not the case. "About a seventh" is more
> accurate, as it implies more imprecision. And I'd be surprised if any
> bungee-jumper would tolerate that small an amount of safety margin using
> new equipment. Then again, slack is supposed to be used up. It's for the
> unforeseen. When you come across a problem during development, you
> shouldn't be afraid of using up that slack; that's what it's there for!

Actually, no.
For most military programs, slack is for a combination of growth _after_ the initial development and unforeseen variations in the production system (e.g., a processor that's a little slower than spec), and 15% is a common number for such slack. I think you're confusing "slack" with "management reserve," which is usually a number set by the development organization and used up (if needed) during development. The 15% number is usually imposed by a prime on a subcontractor for the reasons described above.

> > What if your brand new CPU requires more power than your diesel
> > generator can generate?
> > What if your brand new CPU requires a technology that doesn't let you
> > meet your heat dissipation?
>
> But it doesn't. When you did your initial systems engineering, you made
> sure there was enough slack - OR had enough contingency money so that
> you could get custom-built stuff.

How much money is required to violate the laws of physics? _That's_ the kind of limitation we're talking about when you get into power, cooling, heat dissipation, etc.

> I see your zero-cooling situations, and I raise you H2, CO2, CO, Cl, H3O
> conditions etc. The constraints on a sub are different, but the same in
> scope. Until such time as you do work on a sub, or I do more than just a
> little work on aerospace, we may have to leave it at that.

But we _already_ have these same restrictions, since we have to operate in Naval environments. We also have _extra_ requirements. Considering that the topic of this thread is an aerospace system, I think it's not enough to "leave it at that."

> > > Usually such ridiculously extreme measures are not necessary. The
> > > Hardware guys bitch about the cost-per-CPU going through the roof.
> > > Heck, it could cost $10 million. But if it saves 2 years of Software
> > > effort, that's a net saving of $90 million.
> >
> > What do maintenance costs have to do with this discussion?
>
> Sorry I didn't make myself clear: I was talking development costs, not
> maintenance.

Then you're not talking about inertial nav systems. On most of the projects I've seen, the total software development time is two years or less. You're not going to save 2 years of software effort on a new system!

> > If you're used to developing systems with those kind of constraints,
> > you know how to make those decisions. Occasionally, you make the wrong
> > decision, as the Ariane designers discovered. Welcome to engineering.
>
> My work has only killed 2 people (Iraqi pilots - that particular system
> worked as advertised in the Gulf). There might be as many as 5000 people
> whose lives depend on my work at any time, more if war breaks out. I
> guess we have a different view of "acceptable losses" here, and your
> view may well be more correct.

You're missing the point. It's not a question of whether it's OK for the system to fail. It's a question of humans having to make decisions that don't include "well, if we throw enough money at it, we'll get everything we want." You cannot optimize software development time and ignore all other factors! In some cases, you have to compromise software development/maintenance efficiencies to meet other requirements. Sometimes you make the wrong decision. Anyone who says they've always made the right call is a lawyer, not an engineer.

> Why? Because such a conservative view as my own may mean I just can't
> attempt some risky things. Things which your team (sometimes at least)
> gets working, thereby saving more lives.

However, if you build a system with the latest and greatest CPU, thereby having the maximum amount of horsepower to permit the software engineers to avoid turning off certain checks, etc., you _have_ attempted a risky thing. The latest hardware technology is the least used.

> Yet I don't think so.
>
> > And, if you had only got 20MB per second after all that, you would
> > have done...?
>
> 20 MB? First, re-check all calculations. Examine hardware options. Then
> (probably) set up a "get-well" program using 5-6 different tracks and
> pick the best. Most probably though, we'd give up: it's not doable
> within the budget.

That's the difference. We would not go to our management and say, "The only solutions we have require us to make compromises in our software approach, therefore it can't be done. Take your multi-billion project and go home." We'd work with the other engineering disciplines to come up with the best compromise. It's the difference, in my mind, between a computer scientist and a software engineer. The software engineer is paid to find a way to make it work -- even if (horrors) he has to write it in assembly, or use Unchecked_Conversion, or whatever.

> The difficult case is 150 MB. In this case, assembler coding might just
> make the difference - I do get your point, BTW.
>
> > Certainly, if you just throw out range checking without knowing its
> > cost, you're an idiot. However, no one has shown that the Ariane team
> > did this. I guarantee you (and am willing to post object code to prove
> > it) that range checking is not always zero cost, and in the right
> > circumstances can cause you to bust your budget.
>
> Agree. There's always pathological cases where general rules don't
> apply. Being fair, I didn't say "zero cost", I said "typically 5%
> measured". In doing the initial Systems work, I'd usually budget for
> 10%, as I'm paranoid.

I've seen checks in just the wrong place cause differences of 30% or more in a high-rate process. It's just not that trivial.

> You get what you pay for, IF you're lucky. My point though is that many
> of the hacks, kludges etc. in software are caused by insufficient
> foresight in systems design.

And I wouldn't argue with that. However, it's a _big_ leap to say ALL hacks are caused by such problems. Also, having gone through the system design process a few times, I've never had "sufficient foresight."
There's always been at least one choice I made then that I would have made differently today. (Why didn't I see the obvious answer in 1985: HTML for my documentation! :) That's why reuse is always so tricky in safety-critical systems. It's very easy to make reasonable decisions then that don't make sense now. That's why I laugh at people who say, "reused code is safer; you don't have to test it once you get it working once!"

> Case in point: RAN Collins class submarine. Now many years late due to
> software problems. Last time I heard, they're still trying to get that
> last 10% of performance out of the 68020s on the cards. Which were
> leading-edge when the systems work was done. Putting in 68040s a few
> years ago would have meant the Software would have been complete by now,
> as the hacks wouldn't have been necessary.

68040s? I didn't think you could get mil-screened 68040s anymore. They're already obsolete. Not easy to make those foresighted decisions, is it? :)

--
LMTAS - "Our Brand Means Quality"
For more info, see http://www.lmtas.com or http://www.lmco.com

^ permalink raw reply	[flat|nested] 13+ messages in thread
end of thread, other threads:[~1996-10-10  0:00 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1996-10-01  0:00 Ariane 5 failure Marin David Condic, 407.796.8997, M/S 731-93
1996-10-02  0:00 ` Alan Brain
1996-10-02  0:00 ` Ken Garlington
1996-10-02  0:00 ` Matthew Heaney
1996-10-04  0:00 ` Robert S. White
1996-10-05  0:00 ` Robert Dewar
1996-10-06  0:00 ` Ariane 5 failure - latest S/W tech vs. cold hard facts Robert S. White
1996-10-10  0:00 ` Ken Garlington
1996-10-05  0:00 ` Ariane 5 failure Alan Brain
1996-10-06  0:00 ` Robert S. White
1996-10-04  0:00 ` System Engineering (was Re: Ariane 5 failure) Ken Garlington
1996-10-03  0:00 ` Ariane 5 failure Alan Brain
1996-10-04  0:00 ` Ken Garlington
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox