From: Steve O'Neill
Subject: Re: Ariane 5 failure
Date: 1996/10/09
Message-ID: <325BE79B.7610@sanders.lockheed.com>#1/1
References: <52a572$9kk@goanna.cs.rmit.edu.au>
 <843845039.4461.0@assen.demon.co.uk>
 <1996Oct1.093107.47351@ucl.ac.uk>
 <325572AA.4663@delphi.com>
 <53fhsg$45$1@goanna.cs.rmit.edu.au>
Organization: Sanders, A Lockheed-Martin Company
Newsgroups: sci.astro,comp.lang.pl1,comp.lang.ada
Content-Type: text/plain; charset=us-ascii
MIME-Version: 1.0
X-Mailer: Mozilla 2.01 (Win16; I)

robin wrote:
> ---Definitely not. No floating-point overflow occurred. In
> Ariane 5, the overflow occurred on converting a double-precision
> (some 56 bits?) floating-point to a 16-bit integer (15
> significant bits).
>
> That's why it was so important to have a check that the
> conversion couldn't overflow!
>

Agreed. Yes, the basic reason for the destruction of a billion-dollar
vehicle was the want of a couple of lines of code. But it reflects a
systemic problem much more damaging than what language was used.

I would have expected that in a mission/safety-critical application
the proper checks would have been implemented, no matter what. And in
a 'belts-and-suspenders' mode I would also expect an exception handler
to take care of unforeseen possibilities at the lowest possible level
and raise things to a higher level only when absolutely necessary.
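
As a rough illustration of the two layers described above (an explicit
range check on the conversion, plus a handler at the lowest level),
here is a minimal Python sketch. It is not the SRI code, which was
written in Ada. The names OperandError, to_int16_checked and
convert_with_local_handler are hypothetical, and clamping to the
nearest 16-bit limit is only one assumed recovery policy.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Hypothetical stand-in for the Ada exception; not a real API."""


def to_int16_checked(value: float) -> int:
    """Convert a float to a 16-bit signed integer, refusing bad input."""
    # Explicit range check: reject NaN and anything outside 16 bits.
    if math.isnan(value) or not (INT16_MIN <= value <= INT16_MAX):
        raise OperandError(f"{value!r} does not fit in 16 bits")
    return int(value)  # truncates toward zero


def convert_with_local_handler(value: float, error_log: list) -> int:
    """Handle the failure where it occurs instead of propagating it."""
    try:
        return to_int16_checked(value)
    except OperandError as exc:
        error_log.append(str(exc))  # leave a trace for later analysis
        # Assumed policy: clamp to the nearest limit (NaN falls back to 0).
        if math.isnan(value):
            return 0
        return INT16_MAX if value > 0 else INT16_MIN
```

With this arrangement an out-of-range value produces a log entry and a
clamped result instead of an exception that travels up to a global
handler. Whether a clamped value is acceptable for a given variable is
a design decision this sketch does not settle.
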
Had these precautions been taken there would probably be lots of
entries in an error log, but the satellites would now be orbiting.

As outsiders we can only second-guess why this approach was not taken,
but the review board implies that 1) the SRI software developers had
an 80% max utilization requirement and 2) careful consideration
(including faulty assumptions) was used in deciding what to protect
and not protect.

>It was designed to shut down if any interrupt occurred. It wasn't
                                     ^^^^^^^^^ exception, actually
>intended to be shut down for a routine thing as a conversion of
>floating-point to integer.

This was based on the (faulty) system-wide assumption that any
exception was the result of a random hardware failure. This is related
to the other faulty assumption that "software should be considered
correct until it is proven to be at fault". But that's what the
specification said.

> ---No, the backup SRI experienced the programming error (UNCHECKED
> CONVERSION from floating-point to integer) first, and shut itself
> down, then the active SRI computer experienced the same programming
> error, then it shut itself down.

Yes, according to the report the backup died first (by 0.05 seconds).
Probably not as a result of an unchecked_conversion though - the
source and target are of different sizes, which would not be allowed.
Most likely it was just a conversion of a float to a sixteen-bit
integer. This would have raised a Constraint_Error (or Operand_Error
in this environment). This error could have been handled within the
context of this procedure (and the mission continued) but obviously
was not. Instead it appears to have been propagated to a global
exception handler which performed the specified actions admirably.
Unfortunately these included committing suicide and, in doing so,
dooming the mission.

-- 
Steve O'Neill                     | "No,no,no, don't tug on that!
Sanders, A Lockheed Martin Company|  You never know what it might
smoneill@sanders.lockheed.com     |  be attached to."
(603) 885-8774 fax: (603) 885-4071| Buckaroo Banzai