From: rav@goanna.cs.rmit.edu.au (++ robin)
Subject: Re: Ariane 5 - not an exception?
Date: 1996/08/13
Message-ID: <4up6jg$h3e@goanna.cs.rmit.edu.au>
In-Reply-To: 32065615.77C7@sanders.lockheed.com
g1006@fs1.mar.lmco.com (Francis Lipski) writes:
>In article <32065615.77C7@sanders.lockheed.com>, you write:
>> ++ robin wrote:
>> > Steve O'Neill <smoneill@sanders.lockheed.com> writes:
>> > >I disagree completely! The language was not the
>> > >problem; the design decisions in how the language
>> > >was used were.
>> >
>> > ---The choice of language is indeed very relevant.
>> > What I wrote in an earlier posting on this topic is highly
>> > apt:
>> >
>> > "A PL/I programmer
>> > experienced with real-time systems would have CHALLENGED
>> > such a stupid requirement that the computer be shut down by the
>> > error-handler in the event of a fixed-point overflow. He would
>> > have had it changed.
> Not always possible. If you are in the minority and are unable
>to bring others round to your point, what do you do?
---Don't be absurd. The checks WERE included in all but 3
of the type conversions in the vicinity of the conversion
that blew up.
> As a previous message in this thread has stated, what
>should someone do? Say "to hell with the requirements,
>I'm going to code what I think is correct"?
---The requirements were that any kind of interrupt was
going to be handled by the interrupt handler (which would
then shut down the computer).
A *real* real-time PL/I programmer would have included
a test to make certain that the interrupt could not occur.
That was NOT going against the specifications.
But, as I wrote in a previous post, a belt-and-braces
approach should have been taken, viz., to include an
error handler for fixed-point overflow, as an interrupt
was to be taken as SUDDEN DEATH for the project.
This is where a PL/I programmer would have had the
specification changed.
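A minimal sketch of the "belt-and-braces" approach described above. It is written in Python rather than PL/I or Ada purely for illustration. It assumes a conversion to a signed 16-bit integer, which is how the inquiry report describes the failing conversion (64-bit floating point to 16-bit signed integer). The names `to_int16`, `to_int16_checked`, `convert_with_handler`, `FixedPointOverflow` and the `fallback` value are hypothetical.

```python
# Hypothetical names throughout; Python stands in for PL/I or Ada here.
# Python integers never overflow, so the signed 16-bit limits are checked
# explicitly to model a 16-bit fixed-point target.
INT16_MIN = -32768
INT16_MAX = 32767


class FixedPointOverflow(Exception):
    """Models the fixed-point overflow interrupt."""


def to_int16(value: float) -> int:
    """Unprotected conversion: 'raises the interrupt' on overflow."""
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise FixedPointOverflow(value)
    return result


def to_int16_checked(value: float, fallback: int) -> int:
    """Test the range first, so the overflow cannot occur.

    NaN and infinities fail the range test and also take the fallback.
    """
    if INT16_MIN <= value <= INT16_MAX:
        return to_int16(value)
    return fallback


def convert_with_handler(value: float, fallback: int) -> int:
    """Belt and braces: the range test plus a local overflow handler."""
    try:
        return to_int16_checked(value, fallback)
    except FixedPointOverflow:
        # Deal with the overflow here rather than passing it on to a
        # handler that shuts the computer down.
        return fallback
```

For example, `to_int16_checked(40000.0, fallback=0)` returns 0 instead of raising, while `to_int16(40000.0)` raises `FixedPointOverflow`. What the fallback ought to be is a design decision for the application, not something the sketch settles.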
>> > "I'd go further to say that no experienced PL/I programmer
>> > would have shut down the system as a result of a fixed-point
>> > overflow.
>> Substitute Ada (or C or FORTRAN or Assembly) for
>> PL/I here and you see my point.
---Neither C nor Fortran has error handling.
Ada *was* used, and look what happened.
Hence the suggestion that PL/I expertise on the
project would have been an advantage. You see,
real-time programming in PL/I has been part of the scene
since 1966!
>> It's not the language that makes the developer challenge the
>> ridiculous requirement to shut down; it is the developer "experienced with
>> real-time systems". Just because I am programming in PL/I doesn't mean I
>> am magically a better real-time developer. As a real-time designer
>> concerned with the system-wide aspects of completely shutting down any
>> sensor I would question this approach regardless of the language in use.
>> This has nothing to do with the fact that much of my experience is with
>> Ada.
>> The (flawed) reasoning for why certain conversions were not protected was
>> also covered in the report. Invalid assumptions were made.
---Yes; it was assumed that the value would not overflow,
but it did! They had forgotten Murphy's Law:
"If anything can go wrong, it will". And Robert's
Law: "Even if it *can't* go wrong, it will".
>> Certainly you and I would not have shut down the system but what about
>> the vast majority of developers without as much experience or who thought
>> that their job was to implement the requirements that they were given?
---They could have implemented the "requirements"
WITHOUT raising a fixed-point interrupt,
just by checking for overflow!
> The report states that the rationale was based on the "culture within the
>Ariane programme of only addressing random hardware failures. From this point
>of view exception- or error-handling mechanisms are designed for a random
>hardware failure which can quite rationally be handled by a backup system".
> If all conversions and other possible overflow
>conditions are protected,
>and then an overflow occurs, what action should be taken?
---Action should be taken to deal with a fixed-point overflow!
Something was overlooked. It needed to be dealt with. That
it was not is a fundamental error! That's why error-handling
is provided! To provide a margin of safety.
> The system has
>just had a random hardware failure. Continue to operate with known bad
>hardware? In the case of an overflow, set to max value, continue and
>hope for the best?
---Good idea, already suggested in the report. But the
report also suggested that the design needed to
take into account programmer error.
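The "set to max value, continue" option raised above is a saturating conversion. Here is a minimal sketch, again assuming a signed 16-bit target. The function name `saturate_int16` and the returned flag are hypothetical. Returning the flag alongside the clamped value lets the caller record that the value is suspect instead of silently passing it on.

```python
import math
from typing import Tuple

INT16_MIN = -32768
INT16_MAX = 32767


def saturate_int16(value: float) -> Tuple[int, bool]:
    """Convert to a signed 16-bit value, clamping instead of overflowing.

    Returns (result, saturated); saturated is True when the input was
    out of range, so the caller can flag the value as suspect.
    """
    if math.isnan(value):
        # Assumption: NaN has no sensible clamp, so report 0 as saturated.
        return 0, True
    if value >= INT16_MAX + 1:   # would truncate to more than 32767
        return INT16_MAX, True
    if value <= INT16_MIN - 1:   # would truncate to less than -32768
        return INT16_MIN, True
    return int(value), False     # truncates toward zero, now in range
```

With this sketch, `saturate_int16(40000.0)` gives `(32767, True)` and `saturate_int16(1234.9)` gives `(1234, False)`. Whether a clamped value is safe to fly on is exactly the question Francis raises; the flag at least keeps that information available.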
> While clearly the design, in this case, did not protect itself sufficiently,
>and compounded errors by not handling the case of a simultaneous failure of
>both processors, what action should be taken on an overflow, if not to shut
>down? With flight controls or inertial systems, partitioning into tasks and
>then restarting the offending task is not an option. It would take entirely
>too long to restart the task to be able to effectively recover.
> Regarding the spare requirements: the reason for having spare time
>is to ensure that all hard deadlines are met and to allow growth for future
>versions of SW. Allowing room for growth is necessary in development programs;
>however, the requirement is usually not relaxed as more functionality is
>added. That is another story. However, it is necessary to ensure sufficient
>time is available to complete all the processing within the allotted time.
>The execution time of the software is at best a statistical problem; at the
>least, the hardware times can be statistical. If each piece of SW is measured at
>its worst-case time and all of these are added together, the total cannot be
>allowed to meet or exceed the allowable time, given the statistical nature of the HW.
>So how much spare time should be allotted? If 20% is unrealistic, what
>number should be used, 10%, 1%, 0.001%?
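The worst-case budget check described above can be sketched as follows. It sums each task's worst-case execution time and compares the total with the frame, less the required spare. The task times, the 20 ms frame and the function names are hypothetical. As the post notes, an additive worst-case sum ignores the statistical spread of real timings, so this is only the conservative bound.

```python
from typing import List


def spare_percent(wcet_us: List[int], frame_us: int) -> float:
    """Percentage of the frame left after summing worst-case execution times."""
    return 100.0 * (frame_us - sum(wcet_us)) / frame_us


def meets_spare_requirement(wcet_us: List[int], frame_us: int,
                            required_spare_pct: int) -> bool:
    """True if the summed worst cases leave at least the required spare.

    Integer cross-multiplication avoids floating-point rounding at the
    boundary:  sum/frame <= 1 - pct/100  <=>  100*sum <= (100 - pct)*frame
    """
    return 100 * sum(wcet_us) <= (100 - required_spare_pct) * frame_us


# Hypothetical task set: worst-case times in microseconds, 20 ms frame.
tasks = [4000, 3500, 6500, 2000]
print(spare_percent(tasks, 20000))                 # 20.0
print(meets_spare_requirement(tasks, 20000, 20))   # True
print(meets_spare_requirement(tasks, 20000, 25))   # False
```

The same task set passes a 20% spare requirement and fails a 25% one. The arithmetic does not say which figure is right; it only makes the trade-off explicit.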
>> >
>> > "Furthermore, he would have included a check that the value
>> > did not go out of range;"
>> >
>> > ---But all it needed was a check that the value was in range.
>> > Such checks had been included on other similar conversions in
>> > the vicinity!
>> Yes, and there was mention in the report that 'they' thought that this
>> would violate that precious spare requirement.
---That's a red herring.
>> So they set about picking
>> and choosing which conversions to protect.
---This doesn't appear specifically in the report as regards
this conversion and the 2 others in the vicinity. There's
the implication that these conversions were overlooked.
In any case, the test would have introduced a trivial
number of additional instructions.
>> I find it extremely hard to
>> believe that the (small) handful of instructions to do a range check
>> would have been too much!
---Agreed.
>> And, in hindsight, well worth it.
---Agreed again.
>> The issue of the OBC interpreting the 'essentially diagnostic data' as
>> valid sensor data really makes me wonder. In a system with a reasonable
>> interface between the two devices this should *never* happen. I am
>> surprised that this misinterpretation didn't cause a similar overflow in
>> the OBC and resulting shutdown! :(
---Yes.
> I was also amazed by the poor design of the interface that didn't detect
>this problem. Probably, given enough time, some form of error would
>have occurred resulting in the OBC shutting down.
---There were a number of inadequacies revealed in the design.
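The interface point can be illustrated with a receiver that checks what kind of word it has been sent before using it. This is a minimal sketch, assuming a hypothetical message format in which each transfer carries an explicit kind tag. The names `Kind`, `Message` and `accept_sensor_data` are illustrative and do not describe the actual link between the inertial system and the OBC.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Kind(Enum):
    SENSOR_DATA = 1
    DIAGNOSTIC = 2


@dataclass(frozen=True)
class Message:
    kind: Kind
    payload: Tuple[int, ...]


def accept_sensor_data(msg: Message, lo: int, hi: int) -> Optional[Tuple[int, ...]]:
    """Return the payload only if it is tagged as sensor data and in range.

    Returns None otherwise, so the caller must decide what to do with a
    rejected transfer instead of flying on diagnostic words.
    """
    if msg.kind is not Kind.SENSOR_DATA:
        return None  # diagnostic data is never used as flight data
    if not all(lo <= v <= hi for v in msg.payload):
        return None  # tagged as data, but the values are implausible
    return msg.payload
```

A diagnostic transfer is rejected on its tag alone, and a data transfer carrying out-of-range values is rejected by the plausibility check.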
>> I think that we agree in our assessment of the situation and the fact
>> that these problems could have been avoided with a better overall system
>> design and more extensive testing. Essentially the same conclusions that
>> the review board came to. My only disagreement is with your _opinion_
>> that the simple choice of a different language would have saved the day.
---As I stated, a PL/I programmer experienced in real-time
programming would not have made this stupid mistake.
>> And with this point I will continue to disagree.
---You do not appear to have grounds for this opinion.
>> Steve O'Neill | "No,no,no, don't tug on that!
>> Sanders, A Lockheed Martin Company | You never know what it might
>> smoneill@sanders.lockheed.com | be attached to."
>> (603) 885-8774 fax: (603) 885-4071| Buckaroo Banzai