comp.lang.ada
* Ariane 5 - not an exception?
@ 1996-07-25  0:00 Simon Bluck
  1996-07-25  0:00 ` Multiple reasons for failure of Ariane 5 (was: Re: Ariane 5 - not an exception?) Kirk Beitz
                   ` (6 more replies)
  0 siblings, 7 replies; 194+ messages in thread
From: Simon Bluck @ 1996-07-25  0:00 UTC



The Ariane 501 flight failure was due to the raising of an unexpected
Ada exception, which was handled by switching off the computer.  The
report on this:

   http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html

is clear and hard-hitting: it will result in much improved software.
But does it get right to the bottom of the issues, and does the
software community appreciate that there are fundamental software
control problems which can directly give rise to such enormous
failures, in this particular case thankfully without loss of life?

It is most unfortunate, but must be accepted as true, that if the
Ariane software had been written in a less powerful language the
numeric overflow might have gone unnoticed, the computers would have
remained switched on, and the rocket would have continued its upward
flight.

Exceptions and assertions are both used, in Ada and C/C++, to detect
software/hardware anomalies.  When one of these trips, it is
frequently very difficult for the designer to know how best to handle
the problem.  To continue may result in corrupt data; to abort is
drastic but eliminates the possibility that further processing will
compound the problem.

The more checks you have, the more likely it is that one of them will
trip.  If you can't think of good ways of handling these checks, the
end result, for the user, may well be very much worse than if the
check had never been performed in the first place.

Of the two handling options, neither is really acceptable.  However,
there is a third option which ought to be considered: to continue but
mark the processed data as suspect.

I.e. each data item would have a truth value of 1.0 for good data,
0.0 for absolutely rotten data, utilising values in between if you
have some idea how good the data is.  If you have numeric overflow,
you could set the data to the largest value available, and mark it as
suspect.

Any data further derived from suspect data must also be marked as
suspect.
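
The truth-value idea above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not anything
from the Ariane software: the names Datum, saturate and add are
hypothetical, the 16-bit range is assumed, and the 0.5 penalty applied
on overflow is an arbitrary choice standing in for "mark it as
suspect".  Derived data takes the lowest truth value of its inputs.

```python
from dataclasses import dataclass

# Assumed 16-bit signed range; the post does not name a word size.
INT16_MIN, INT16_MAX = -32768, 32767


@dataclass(frozen=True)
class Datum:
    """Hypothetical data item: a value plus a truth value in [0.0, 1.0]."""
    value: int
    truth: float = 1.0  # 1.0 = good data, 0.0 = absolutely rotten data


def saturate(raw: int, truth: float = 1.0, penalty: float = 0.5) -> Datum:
    """Clamp raw into the 16-bit range; lower the truth value if it was clamped."""
    clamped = max(INT16_MIN, min(INT16_MAX, raw))
    if clamped != raw:
        truth *= penalty  # arbitrary illustrative penalty for an overflow
    return Datum(clamped, truth)


def add(a: Datum, b: Datum) -> Datum:
    """Add two items; the result is no more trustworthy than its worst input."""
    return saturate(a.value + b.value, min(a.truth, b.truth))
```

For instance, add(Datum(30000), Datum(10000)) saturates to the largest
value and comes back with a reduced truth value, and anything later
computed from that result stays marked as suspect.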

Taking a probabilistic attitude to data would bring a lot of software
into the real world where failures can happen at all levels.  Using
this approach would make complex mission-critical software like the
failing Ariane software much easier to understand and control.  Data
would be processed along the same path regardless of whether it is
suspect or entirely valid.  Only the end-users of the data would be
affected, and where duplication of systems provides redundancy, the
algorithm would be to switch to the backup on receiving suspect data,
and switch back to the main source if the backup was suspect.  If
both sources are suspect, then take the least suspect source.  This
is simple and you don't lose your vital input data.  The data truth
values would be passed on from system to system along with the data.
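
A minimal Python sketch of that selection rule follows.  The names
Reading and select are hypothetical, and the sketch is a stateless
simplification: it prefers the main source on ties rather than
remembering which source was last in use, which is an assumption
beyond what is described above.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    """Hypothetical reading passed between systems along with its truth value."""
    value: float
    truth: float  # 1.0 = good data, 0.0 = absolutely rotten data


def select(main: Reading, backup: Reading) -> Reading:
    """Pick the reading to use; no source is ever switched off.

    With truth values in [0.0, 1.0] this covers all three cases:
    a good main source is used, a suspect main source gives way to a
    better backup, and if both are suspect the least suspect one wins.
    Ties go to the main source (an assumption of this sketch).
    """
    return main if main.truth >= backup.truth else backup
```

Note that both readings remain available to the caller, so the choice
can be revisited on every cycle as the truth values change.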

You _never_ switch off a computer, but you may have cause to mark all
data emanating from it as suspect.  Leave it up to the users of the
data to decide if they want to use it or not - they may have no
choice.


Along with the data truth attribute, you need a data type attribute.
This is tending to be relatively standard stuff now that objects are
around and need to know what kind of object they are.  But adding a
data type field is still something that designers skimp on if not
supplied by the language, relying instead on implicit coding of type
information in the senders and receivers of data.

Lack of type information accounts for why the Ariane flight control
was able to interpret diagnostic data as attitude data, virtually
guaranteeing catastrophic failure.  At least if attitude data had
been cut short it could have continued in a straight line.
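
As a rough illustration of an explicit data type field, here is a
Python sketch in which the receiver checks the type before using the
data.  All names (Kind, Message, FlightControl) are hypothetical and
have nothing to do with the real Ariane interfaces; holding the last
known attitude when a message is rejected is an assumption chosen to
match the "continue in a straight line" behaviour mentioned above.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """Hypothetical data type field carried with every message."""
    ATTITUDE = "attitude"
    DIAGNOSTIC = "diagnostic"


@dataclass(frozen=True)
class Message:
    kind: Kind
    payload: tuple


class FlightControl:
    """Hypothetical receiver that only accepts messages typed as attitude data."""

    def __init__(self, initial_attitude: tuple):
        self.attitude = initial_attitude
        self.rejected = 0  # count of messages that were not attitude data

    def receive(self, msg: Message) -> tuple:
        if msg.kind is not Kind.ATTITUDE:
            # Not attitude data: do not interpret it, keep the last attitude.
            self.rejected += 1
            return self.attitude
        self.attitude = msg.payload
        return self.attitude
```

The point of the sketch is only that the type travels with the data,
so the receiver does not rely on implicit knowledge of what the sender
meant.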


Well, those are what I think are the important lessons to be learned.
The main reasons cited for Ariane 501's failure are typical human
ones which will be made again on the next big project.  I.e.
inadequate testing, particularly of the complete system in its
(simulated) environment.  Surprise, surprise, this turns out to be
too difficult and too costly to achieve thoroughly.  And small system
mistakes which stress the adequate functioning of the system as a
whole (like thinking that the Ariane 4 alignment process didn't need
changing for Ariane 5).  These will happen time and again, we're only
human.  But with more realistic data processing the system as a whole
would stand a better chance of survival.

SimonB

[All my own opinions, of course.]





* Re: Ariane 5 - not an exception?
@ 1996-08-08  0:00 Marin David Condic, 407.796.8997, M/S 731-93
  1996-08-09  0:00 ` John McCabe
  0 siblings, 1 reply; 194+ messages in thread
From: Marin David Condic, 407.796.8997, M/S 731-93 @ 1996-08-08  0:00 UTC



Francis Lipski <g1006@FS1.MAR.LMCO.COM> writes (with deletions):
>> > "A PL/I programmer
>> > experienced with real time systems, would have CHALLENGED
>> > such a stupid requirement that the computer be shut down by the
>> > error-handler in the event of a fixed-point overflow.  He would
>> > have had it changed.
>
>   Not always possible.  If you are in the minority and are unsuccessful
>to argue others to your point, what do you do?
>
    That's not always the case. Sometimes, the issue is "Either we do
    the project with runtime checks suppressed or we don't do it at all
    because we don't have the CPU margin to make it work." Often what
    you do is turn off most or all of the runtime checks, then
    implement interrupt service routines to saturate math results on
    overflows, etc. and hope that will do the trick for any
    unanticipated errors.
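
    The difference between a checked conversion and a saturating one
    can be sketched in Python as below. This is an illustration only:
    the function names are hypothetical, the 16-bit range is assumed,
    and ordinary functions stand in for what would really be an
    interrupt service routine. Inputs are assumed to be finite.

```python
# Assumed 16-bit signed range for the target of the conversion.
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(x: float) -> int:
    """Convert with a runtime check: raise if the value does not fit."""
    n = int(x)  # truncates toward zero; x is assumed finite
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x!r} does not fit in 16 bits")
    return n


def to_int16_saturating(x: float) -> int:
    """Convert without raising: clamp out-of-range values to the nearest limit."""
    return max(INT16_MIN, min(INT16_MAX, int(x)))
```

    The checked version forces the caller to decide what to do with
    the error; the saturating version carries on with the nearest
    representable value and says nothing.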

    If they were running at 80% utilization without runtime checks,
    including the checks might have left an unacceptable risk. If they
    had run with checks in place and were at 98% utilization and hit a
    "corner case" in the software which drove them over 100%, we'd be
    able to sit here now and criticize them for failing to remove the
    checks to leave a safety margin on utilization.

    There are always tradeoffs in engineering. You have to weigh risks
    and rewards. Risk: public humiliation, billions of $ lost,
    thousands of casualties. Reward: a certificate with your name on
    it in a plastic frame. The Ariane 5 engineers have no doubt
    learned this lesson.

    With respect to the earlier poster's comments about "experienced
    PL/I programmers" I'd have to say that smacks of language bigotry.
    It would be the same sort of thing as saying "experienced German
    speaking engineers wouldn't have made such a stupid mistake. It's
    because the engineers were speaking French that the rocket went
    down."

    MDC

Marin David Condic, Senior Computer Engineer    ATT:        407.796.8997
M/S 731-96                                      Technet:    796.8997
Pratt & Whitney, GESP                           Fax:        407.796.4669
P.O. Box 109600                                 Internet:   CONDICMA@PWFL.COM
West Palm Beach, FL 33410-9600                  Internet:   CONDIC@FLINET.COM
===============================================================================
    "Some people say the rainforests must be saved because the cure for
    cancer might be there. Why aren't these same people worried that
    the scientist who would have found that cure might be aborted?"

        --  John Switzer
===============================================================================




* Re: Ariane 5 - not an exception?
@ 1996-08-13  0:00 Marin David Condic, 407.796.8997, M/S 731-93
  1996-08-15  0:00 ` John McCabe
  0 siblings, 1 reply; 194+ messages in thread
From: Marin David Condic, 407.796.8997, M/S 731-93 @ 1996-08-13  0:00 UTC



John McCabe <john@ASSEN.DEMON.CO.UK> writes:
>The point I am trying to make here is that I believe that the success
>of a mission should never be traded off against such an arbitrary
>requirement as a loading margin.
>
    Well, I have to agree that the important thing is mission success,
    not loading margin. But the general reason you establish some sort
    of "goal" for margin is to ensure mission success. When going to
    Zero Margin (or worse) means dropping the rocket in the drink, not
    leaving yourself some room for "corner cases" which you never
    tested could be construed as imprudent - just as turning off
    checks could be considered imprudent. Had the "unanticipated case"
    never occurred, the software developers would have been "heroes"
    and would have been given a certificate with their name on it in a
    cheap plastic frame. They took a gamble and lost, so now they get
    to be the scapegoats for us to kick around for a while.

    I'll admit that I also dislike setting some absolute number for
    CPU margin and sticking to it blindly. Eroding margin simply
    erodes the level of confidence and you can afford to do that
    sometimes. Especially if you're willing to do the work to
    demonstrate that you really have found the worst-case behavior or
    that the system is sufficiently deterministic that you can run
    with less margin and maintain sufficient confidence. (Define
    "sufficient confidence...")

    Lots of people have tried to make the case that you should never
    turn off the runtime checks that Ada provides because they're
    critical to the safety of the system you are developing. I'd like
    to agree and certainly Ariane 5 is an example of where this might
    have prevented disaster. But sometimes us poor saps who have
    nothing to work with but a Mil-Std-1750a are stuck making
    tradeoffs between safety checks and building a system that will
    work at all.

    Anybody want to make me a rad-hard, space tested, 200mips
    processor that I can buy in small lots at $40 a piece and has a
    full suite of development tools (including Ada95 compiler)
    available for it? (Sober up, Marin! ;-)

    MDC

Marin David Condic, Senior Computer Engineer    ATT:        407.796.8997
M/S 731-96                                      Technet:    796.8997
Pratt & Whitney, GESP                           Fax:        407.796.4669
P.O. Box 109600                                 Internet:   CONDICMA@PWFL.COM
West Palm Beach, FL 33410-9600                  Internet:   CONDIC@FLINET.COM
===============================================================================
    "Being in a minority, even a minority of one, did not make you
    mad. There was truth, and there was untruth, and if you clung to
    the truth even against the whole world, you were not mad. 'Sanity
    is not statistical.'"

        --  G. Orwell, "1984"
===============================================================================




* Re: Ariane 5 - not an exception?
@ 1996-08-13  0:00 Marin David Condic, 407.796.8997, M/S 731-93
  1996-08-15  0:00 ` John McCabe
  0 siblings, 1 reply; 194+ messages in thread
From: Marin David Condic, 407.796.8997, M/S 731-93 @ 1996-08-13  0:00 UTC



++ robin <rav@GOANNA.CS.RMIT.EDU.AU> writes:
>---As I stated, a PL/I programmer experienced in real-time
>programming, would not have made this stupid mistake.
>

    This still smacks of language bigotry. Why is it that only an
    experienced PL/I programmer would not make this "mistake"? I've
    personally seen *lots* of mistakes made by many "experienced"
    programmers in just about every language there is - including
    PL/I.

    Just remember that Ada has built-in runtime checks on conversions
    and the ability to write interrupt service routines as well. (And
    we "experienced Ada programmers" know how to use them, too!) The
    Monday-morning quarterbacks with 20/20 hindsight binoculars can
    easily see that the best thing would have been to leave in the
    checks or write an ISR which saturated the math rather than shut
    the unit down. But you don't need to be fluent in PL/I to see
    that.

    The language the system was programmed in or the language spoken
    by the developers has nothing to do with the error that occurred.
    It occurred because there was a conscious decision on someone's
    part to remove the safety net and to handle all exceptions by
    shutting down the channel. The designers no doubt made this
    decision for engineering reasons that are more complex than are
    outlined in the failure report and certainly had little or nothing
    at all to do with the language of implementation. And sitting in
    the back-seat after the crash telling the driver "If *I* had been
    driving, I'd never have crashed..." is condescending as well as
    being completely unprovable.

    MDC


Marin David Condic, Senior Computer Engineer    ATT:        407.796.8997
M/S 731-96                                      Technet:    796.8997
Pratt & Whitney, GESP                           Fax:        407.796.4669
P.O. Box 109600                                 Internet:   CONDICMA@PWFL.COM
West Palm Beach, FL 33410-9600                  Internet:   CONDIC@FLINET.COM
===============================================================================
    "It may be true that the law cannot make a man love me. But it can
    keep him from lynching me, and I think that's pretty important."

            --  Rev. Martin Luther King, Jr
===============================================================================





end of thread, other threads:[~1996-09-20  0:00 UTC | newest]

Thread overview: 194+ messages
1996-07-25  0:00 Ariane 5 - not an exception? Simon Bluck
1996-07-25  0:00 ` Multiple reasons for failure of Ariane 5 (was: Re: Ariane 5 - not an exception?) Kirk Beitz
1996-07-26  0:00   ` ++           robin
1996-08-05  0:00     ` Darren C Davenport
1996-08-06  0:00       ` U32872
1996-08-07  0:00         ` Robert Dewar
1996-08-08  0:00           ` Pascal Martin @lone
1996-08-09  0:00             ` Robert Dewar
1996-08-10  0:00               ` dwnoon
1996-08-11  0:00                 ` Robert Dewar
1996-08-15  0:00                   ` dwnoon
1996-08-16  0:00                     ` Robert Dewar
1996-08-20  0:00                       ` dwnoon
1996-08-12  0:00                 ` Ken Garlington
1996-08-15  0:00                 ` Richard Riehle
1996-08-22  0:00                   ` ++           robin
1996-08-23  0:00                     ` Ken Garlington
1996-08-31  0:00                     ` Ada versus PL/I " Richard Riehle
1996-09-02  0:00                       ` ++           robin
1996-09-02  0:00                         ` Richard A. O'Keefe
1996-09-03  0:00                           ` ++           robin
1996-09-03  0:00                             ` Robb Nebbe
1996-09-17  0:00                             ` shmuel
1996-09-17  0:00                               ` Jay McFadyen
1996-09-18  0:00                                 ` John McCabe
1996-09-20  0:00                               ` shmuel
1996-09-03  0:00                       ` ++           robin
1996-09-04  0:00                         ` Robert Dewar
1996-09-07  0:00                           ` ++           robin
1996-09-06  0:00                             ` PL/I or PL/1 Larry Hazel
1996-09-03  0:00                       ` Ada versus PL/I (was: Re: Ariane 5 - not an exception?) J. Kanze
1996-09-07  0:00                         ` Robert Dewar
1996-09-09  0:00                           ` ++           robin
1996-09-09  0:00                             ` Robert Dewar
1996-09-09  0:00                               ` Ken Garlington
1996-09-11  0:00                     ` Multiple reasons for failure of Ariane 5 " J.Worringen
1996-09-12  0:00                       ` Ken Garlington
1996-09-14  0:00                       ` David Alex Lamb
1996-09-14  0:00                       ` Use DejaNews to retrieve Ariane discussion David Alex Lamb
1996-09-19  0:00                         ` Earl H. Kinmonth
1996-08-11  0:00               ` Multiple reasons for failure of Ariane 5 (was: Re: Ariane 5 - not an exception?) ++           robin
     [not found]               ` <4uibvh$References: <Dv45EJ.8r@fsa.bris.ac.uk>
1996-08-16  0:00                 ` A. Grant
1996-08-08  0:00         ` bohn
1996-07-26  0:00   ` Robert I. Eachus
1996-08-23  0:00   ` Jon S Anthony
1996-08-26  0:00     ` ++           robin
1996-08-23  0:00   ` Jon S Anthony
1996-08-23  0:00     ` ++           robin
1996-08-23  0:00       ` Richard A. O'Keefe
1996-08-23  0:00         ` Ken Garlington
1996-08-26  0:00         ` ++           robin
1996-08-27  0:00           ` Ken Garlington
1996-08-28  0:00             ` Larry Kilgallen
1996-08-29  0:00               ` Ken Garlington
1996-08-30  0:00             ` ++           robin
1996-08-30  0:00               ` David Weller
1996-09-04  0:00               ` Ken Garlington
1996-09-06  0:00                 ` Sandy McPherson
1996-09-09  0:00                   ` Ken Garlington
1996-08-30  0:00         ` Jon S Anthony
1996-08-26  0:00       ` Ken Garlington
1996-08-26  0:00         ` Dave Jones
1996-08-27  0:00           ` Ken Garlington
1996-08-30  0:00             ` ++           robin
1996-09-04  0:00               ` Ken Garlington
1996-09-06  0:00                 ` ++           robin
1996-09-18  0:00               ` Merlin Dorfman
1996-09-20  0:00                 ` John McCabe
1996-08-30  0:00         ` ++           robin
1996-08-30  0:00           ` John McCabe
1996-09-06  0:00       ` Jon S Anthony
1996-09-06  0:00         ` Robert Dewar
1996-07-26  0:00 ` Ariane 5 - not an exception? JP Thornley
1996-07-29  0:00   ` Nigel Tzeng
1996-07-29  0:00   ` JP Thornley
1996-07-29  0:00   ` Ken Garlington
1996-07-30  0:00   ` Robert I. Eachus
1996-07-31  0:00     ` JP Thornley
1996-08-01  0:00       ` Alan Brain
1996-08-02  0:00         ` JP Thornley
1996-08-01  0:00   ` Ken Garlington
1996-07-26  0:00 ` Theodore E. Dennison
1996-07-29  0:00   ` Ken Garlington
1996-07-26  0:00 ` ++           robin
1996-07-29  0:00   ` Bill Angel
1996-07-29  0:00     ` Paul_Green
1996-07-30  0:00     ` Lloyd Fischer
1996-07-30  0:00     ` Ken Garlington
1996-07-30  0:00     ` Nancy Mead
1996-07-31  0:00       ` Tucker Taft
1996-07-31  0:00       ` Steve O'Neill
1996-08-01  0:00       ` root
1996-08-01  0:00         ` Tucker Taft
1996-07-30  0:00     ` Richard Shetron
1996-07-30  0:00       ` ++           robin
1996-07-30  0:00     ` Bob Kurtz
1996-08-04  0:00     ` Richard Riehle
1996-08-05  0:00       ` Fergus Henderson
1996-08-05  0:00       ` Nigel Tzeng
1996-08-06  0:00         ` John McCabe
1996-08-05  0:00       ` John McCabe
1996-08-13  0:00       ` ++           robin
1996-08-13  0:00         ` Ken Garlington
1996-08-13  0:00           ` Kirk Bradley
1996-08-14  0:00             ` Ken Garlington
1996-08-18  0:00           ` PL/I Versus Ada (Was: Arianne ...) Richard Riehle
1996-08-19  0:00             ` Robert Dewar
1996-08-20  0:00             ` Lon Amick
1996-08-21  0:00             ` Tim Dugan
1996-08-21  0:00             ` Lon D. Gowen, Ph.D.
1996-08-21  0:00             ` Tony Konashenok
1996-08-28  0:00               ` Richard Riehle
1996-08-29  0:00                 ` Lon D. Gowen, Ph.D.
1996-08-30  0:00                   ` Tony Konashenok
1996-08-30  0:00                     ` Adam Beneschan
1996-08-30  0:00                 ` John McCabe
1996-08-23  0:00             ` arbuckj
1996-08-22  0:00           ` Ariane 5 - not an exception? ++           robin
1996-08-22  0:00             ` Ken Garlington
1996-08-13  0:00         ` Darren C Davenport
1996-08-14  0:00         ` John McCabe
1996-08-19  0:00           ` Chris Papademetrious
1996-08-22  0:00           ` ++           robin
1996-08-22  0:00             ` John McCabe
1996-08-23  0:00               ` Ken Garlington
1996-08-24  0:00                 ` John McCabe
1996-08-26  0:00                   ` Byron B. Kauffman
1996-08-27  0:00                     ` John McCabe
1996-08-28  0:00                       ` Byron B. Kauffman
1996-08-28  0:00                         ` Robert Dewar
1996-08-29  0:00                           ` Ted Dennison
1996-08-30  0:00                         ` John McCabe
1996-08-22  0:00             ` Martin Tom Brown
1996-08-23  0:00             ` Bob Gilbert
1996-08-24  0:00               ` Robert I. Eachus
1996-08-25  0:00                 ` John McCabe
1996-08-27  0:00                 ` Tom Speer
1996-08-26  0:00               ` Jon S Anthony
1996-08-20  0:00         ` Richard Riehle
1996-07-30  0:00   ` Ken Garlington
1996-08-02  0:00     ` Craig P. Beyers
1996-07-30  0:00   ` Steve O'Neill
1996-07-31  0:00     ` Martin Tom Brown
1996-07-31  0:00       ` Nigel Tzeng
1996-08-02  0:00       ` Ken Garlington
1996-08-03  0:00         ` Thomas Kendelbacher
1996-08-01  0:00     ` ++           robin
1996-08-01  0:00       ` Ken Garlington
1996-08-05  0:00         ` John McCabe
1996-08-06  0:00           ` Mark van Walraven
1996-08-06  0:00           ` Ken Garlington
1996-08-06  0:00           ` Ken Garlington
1996-08-02  0:00       ` Pascal Martin @lone
1996-08-03  0:00         ` Dr. Richard Botting
1996-08-05  0:00           ` system
1996-08-06  0:00         ` ++           robin
1996-08-08  0:00           ` Darius Blasband
1996-08-10  0:00             ` dwnoon
1996-08-12  0:00               ` Thomas Kendelbacher
1996-08-13  0:00                 ` ++           robin
1996-08-13  0:00             ` Roy Gardiner
1996-08-13  0:00               ` Ken Garlington
1996-08-13  0:00               ` Lance Kibblewhite
1996-08-13  0:00             ` ++           robin
1996-08-15  0:00             ` Richard Riehle
1996-08-05  0:00       ` Steve O'Neill
1996-08-06  0:00         ` Francis Lipski
1996-08-07  0:00           ` Martin Tom Brown
1996-08-09  0:00             ` Ken Garlington
1996-08-06  0:00         ` Frank Manning
1996-08-08  0:00           ` Steve O'Neill
1996-08-09  0:00             ` Pat Rogers
1996-08-09  0:00           ` JP Thornley
1996-08-13  0:00         ` ++           robin
1996-08-13  0:00           ` Steve O'Neill
1996-08-01  0:00   ` Jon S Anthony
1996-08-02  0:00   ` James Kanze US/ESC 60/3/141 #40763
1996-08-06  0:00   ` Robert I. Eachus
1996-08-06  0:00   ` Stefan 'Stetson' Skoglund
1996-07-26  0:00 ` Bob Gilbert
1996-07-29  0:00   ` Martin Tom Brown
1996-07-30  0:00     ` John McCabe
1996-07-31  0:00       ` Greg Bond
1996-08-03  0:00         ` John McCabe
1996-07-27  0:00 ` Bill Angel
1996-07-30  0:00 ` Dr. Richard Botting
1996-07-30  0:00   ` David Weller
1996-07-30  0:00     ` Robert Dewar
  -- strict thread matches above, loose matches on Subject: below --
1996-08-08  0:00 Marin David Condic, 407.796.8997, M/S 731-93
1996-08-09  0:00 ` John McCabe
1996-08-13  0:00 Marin David Condic, 407.796.8997, M/S 731-93
1996-08-15  0:00 ` John McCabe
1996-08-13  0:00 Marin David Condic, 407.796.8997, M/S 731-93
1996-08-15  0:00 ` John McCabe
