From: rav@goanna.cs.rmit.edu.au (++ robin)
Subject: Re: Ariane 5 - not an exception?
Date: 1996/07/26
Message-ID: <4t9vdg$jfb@goanna.cs.rmit.edu.au>
In-Reply-To: Dv45EJ.8r@fsa.bris.ac.uk
simonb@pact.srf.ac.uk (Simon Bluck) writes:
>The Ariane 501 flight failure was due to the raising of an unexpected
>Ada exception,
---An exception, yes, but not unexpected.
The programming mistake made was in assuming that a
floating-point value of some 58 significant bits would
somehow "fit" into a 15-bit integer.
There was no check that the data conversion would not
result in overflow, so the problem went to the error
handler, which shut down the system.
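The missing check can be sketched as follows. This is a minimal illustration in Python, not the flight code (which was Ada); the function name and the 16-bit signed range (15 bits plus sign) are assumptions made for the example. The point is only that the range test happens before the conversion, so the caller decides what an out-of-range value means instead of a global handler deciding for it.

```python
# Hypothetical limits for a 16-bit signed integer (15 value bits plus sign).
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value: float) -> int:
    """Convert a float to a 16-bit signed integer, refusing out-of-range input.

    NaN and infinities fail the range comparison and are rejected too.
    """
    if not (INT16_MIN <= value <= INT16_MAX):
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return int(value)  # truncates toward zero


# The caller handles the failure locally rather than letting it
# propagate to a handler that shuts the whole system down.
try:
    reading = to_int16_checked(40000.0)
except OverflowError:
    reading = None  # the caller chooses a recovery policy here
```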
>which was handled by switching off the computer. The
>report on this:
> http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html
>is clear and hard-hitting: it will result in much improved software.
>But does it get right to the bottom of the issues, and does the
>software community appreciate that there are fundamental software
>control problems which can directly give rise to such enormous
>failures, in this particular case thankfully without loss of life?
>It is most unfortunate, but must be accepted as true, that if the
>Ariane software had been written in a less powerful language the
>numeric overflow might have gone unnoticed, the computers would have
>remained switched on, and the rocket would have continued its upward
>flight.
>Exceptions and assertions are both used, in Ada and C/C++,
---and PL/I
>to detect
>software/hardware anomalies. When one of these trips, it is
>frequently very difficult for the designer to know how best to handle
>the problem.
---Not in the case of a simple fixed-point overflow -- as was the
case with Ariane. It is a fact that real-time programming
has been available in PL/I for some 30 years, and recovery
from errors is standard established practice.
> To continue may result in corrupt data;
---To continue in this case probably would need the value to
be set to the maximum. And it wouldn't be corrupt data.
>to abort is
>drastic but eliminates the possibility that further processing will
>compound the problem.
---What? Here, the lack of further processing resulted in
destruction of the project!
>The more checks you have, the more likely it is that one of them will
>trip. If you can't think of good ways of handling these checks, the
>end result, for the user, may well be very much worse than if the
>check had never been performed in the first place.
>Of the two handling options, neither is really acceptable. However,
>there is a third option which ought to be considered: to continue but
>mark the processed data as suspect.
There are other, better approaches. One is to continue
with the maximum value; another is to avoid the use of
a 16-bit variable, and to use a variable of the same
size and type (here floating-point storage),
thus avoiding the problem altogether.
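The first of these approaches, continuing with the maximum value, amounts to a saturating conversion. A minimal sketch in Python, assuming a 16-bit signed target; the function name is hypothetical, and treating NaN as an error is a choice made for this example, not something the post specifies.

```python
import math

# Hypothetical limits for a 16-bit signed integer.
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_saturating(value: float) -> int:
    """Convert a float to a 16-bit signed integer, clamping instead of failing."""
    if math.isnan(value):
        # There is no sensible "nearest" integer for NaN (assumption: reject it).
        raise ValueError("NaN has no meaningful integer value")
    # Clamp into range first, then truncate toward zero.
    return int(max(INT16_MIN, min(INT16_MAX, value)))
```

An out-of-range input such as `1.0e6` comes back as `32767`, and processing continues. The second approach needs no code at all: if the value stays in floating-point storage, there is no narrowing conversion to overflow.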
>I.e. each data item would have a truth value of 1.0 for good data,
>0.0 for absolutely rotten data, utilising values in between if you
>have some idea how good the data is. If you have numeric overflow,
>you could set the data to the largest value available, and mark it as
>suspect.
>Any data further derived from suspect data must also be marked as
>suspect.
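The truth-value proposal quoted above could look something like the sketch below. Everything here is an assumption made for illustration: the `Tagged` type, the choice of `min` to combine truth values (so a derived value is never more trusted than its least trusted input), the 16-bit limits, and the truth value of 0.5 assigned to a clamped reading.

```python
from dataclasses import dataclass

# Hypothetical limits for a 16-bit signed quantity.
LIMIT_MIN, LIMIT_MAX = -32768.0, 32767.0


@dataclass(frozen=True)
class Tagged:
    value: float
    truth: float  # 1.0 = good data, 0.0 = absolutely rotten data

    def __add__(self, other: "Tagged") -> "Tagged":
        # Data derived from suspect data is itself suspect:
        # the result inherits the lower of the two truth values.
        return Tagged(self.value + other.value, min(self.truth, other.truth))


def clamp_tagged(raw: float, suspect_truth: float = 0.5) -> Tagged:
    """Clamp an out-of-range reading to the nearest limit and mark it suspect."""
    if raw > LIMIT_MAX:
        return Tagged(LIMIT_MAX, suspect_truth)
    if raw < LIMIT_MIN:
        return Tagged(LIMIT_MIN, suspect_truth)
    return Tagged(raw, 1.0)


good = clamp_tagged(100.0)        # in range, fully trusted
clamped = clamp_tagged(1.0e6)     # overflowed, clamped and marked suspect
derived = good + clamped          # suspicion propagates to the sum
```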
>Taking a probabilistic attitude to data would bring a lot of software
>into the real world where failures can happen at all levels. Using
>this approach would make complex mission-critical software like the
>failing Ariane software much easier to understand and control. Data
>would be processed along the same path regardless of whether it is
>suspect or entirely valid. Only the end-users of the data would be
>affected, and where duplication of systems provides redundancy, the
>algorithm would be to switch to the backup on receiving suspect data,
>and switch back to the main source if the backup was suspect.
---In Ariane, both the active processor and the backup failed at
the same time, because it was a *programming* error that was
encountered at the same time in both processors, and both
processors were shut down at the same time by their respective
error handlers.
> If both sources are suspect, then take the least suspect source. This
>is simple and you don't lose your vital input data. The data truth
>values would be passed on from system to system along with the data.
>You _never_ switch off a computer, but you may have cause to mark all
>data emanating from it as suspect. Leave it up to the users of the
>data to decide if they want to use it or not - they may have no
>choice.
---Indeed.
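The source-selection rule described in the quoted text (use the main source while it is good, fall back to the backup, and if both are suspect take the less suspect one) can be sketched as below. The `Reading` type, the `threshold` parameter, and the tie-break in favour of the main source are assumptions for the example. Note that this helps only when the two sources fail independently.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    value: float
    truth: float  # 1.0 = good data, 0.0 = absolutely rotten data


def select_source(main: Reading, backup: Reading, threshold: float = 1.0) -> Reading:
    """Pick which reading to use; a reading below `threshold` counts as suspect."""
    if main.truth >= threshold:
        return main
    if backup.truth >= threshold:
        return backup
    # Both are suspect: take the least suspect, preferring main on a tie.
    return main if main.truth >= backup.truth else backup
```

If both units run the same code on the same input, both readings become suspect at the same instant, and the rule degenerates to choosing between two equally doubtful values. That is still a value to steer by, which is more than a shut-down computer provides.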
>Along with the data truth attribute, you need a data type attribute.
>This is tending to be relatively standard stuff now that objects are
>around and need to know what kind of object they are. But adding a
>data type field is still something that designers skimp on if not
>supplied by the language, relying instead on implicit coding of type
>information in the senders and receivers of data.
>Lack of type information accounts for why the Ariane flight control
>was able to interpret diagnostic data as attitude data, virtually
>guaranteeing catastrophic failure. At least if attitude data had
>been cut short it could have continued in a straight line.
---This is more of a lack of communication between the two
programs. Another design error.
>Well, those are what I think are the important lessons to be learned.
---I think the real lessons are that
1. real-time programming requires special expertise.
2. the choice of language is suspect. A better-established
language such as PL/I -- specifically designed for
real-time programming -- with robust compilers, and
with its base of experienced programming
staff could well have prevented this disaster.