From: rgilbert@unconfigured.xvnews.domain (Bob Gilbert)
Subject: Re: Ariane Crash (Was: Adriane crash)
Date: 1996/07/31
Message-ID: <4tnip9$k0s@zeus.orl.mmc.com>
references: <4tkfe5$did@goanna.cs.rmit.edu.au>
organization: The unconfigured xvnews people
reply-to: rgilbert@unconfigured.xvnews.domain
newsgroups: comp.lang.ada,comp.lang.pl1,rmit.cs.100

In article <4tkfe5$did@goanna.cs.rmit.edu.au>, rav@goanna.cs.rmit.edu.au (++ robin) writes:
> rgilbert@unconfigured.xvnews.domain (Bob Gilbert) writes:
>
> >The error was assuming that the Ariane 4 design would be adequate
> >for the Ariane 5 system.
>
> >> The specific error was that a conversion of a double-precision
> >> floating-point value (~58 significant bits) to 15 significant
> >> bits caused fixed-point overflow.  The conversion was not
> >> checked for overflow.  It should have been.
>
> >It was checked, hence the exception and an exception handler to
> >take corrective action.
>
> ---The SRI computer (& its backup) had an exception
> handler, to be sure, but it did not have an exception
> handler to take corrective action.  The exception handler
> shut the computer down.

Which was the specified corrective action.

> >Unfortunately the corrective action was
> >to assume that the SRI had failed and to shut it down.  The
> >software performed exactly as designed.
>
> ---The software did not perform as designed.  It was
> intended to shut down the computer only in the event of
> a hardware error.

The out-of-bounds data was considered to be indicative of a random
hardware fault, at least for the Ariane 4.  Perhaps this was not a
valid method of determining a hardware fault, but it was the design
decision.

> The software shut down the computer
> because of a programming error.  The software performed
> only as written!
>
> >> This is, after all,
> >> a real-time system.  It's a fundamental check that a programmer
> >> experienced in real-time systems should have carried out.
> >>
> >> Control was then passed to the interrupt handler, which
> >> shut down the system.
>
> >Exactly as designed.
>
> ---Again, not as designed.  It was designed to shut down only
> in the event that the SRI computer failed.  Then the backup
> would be used.

Again, the (wrongly assumed) SRI failure was determined by the
detection of out-of-bounds data.  It was a requirements oversight,
not a programming oversight, and most certainly not influenced by
the programming language used.

To quote the report:

  Although the source of the Operand Error has been identified,
  this in itself did not cause the mission to fail. The specification
  of the exception-handling mechanism also contributed to the failure.
  In the event of any kind of exception, the system specification
  stated that: the failure should be indicated on the databus, the
  failure context should be stored in an EEPROM memory (which was
  recovered and read out for Ariane 501), and finally, the SRI
  processor should be shut down.

The last sentence of the above is what the requirements stated, and
it is exactly what the software did: it performed exactly as designed.

-Bob
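
The distinction argued above, between a checked conversion and what the
handler then does, can be sketched in a few lines. This is only an
illustration in Python, not the flight code: the names OperandError,
to_int16 and Sri are hypothetical, the target is assumed to be a 16-bit
signed integer (15 significant bits plus sign), and the handler simply
mirrors the three steps in the report excerpt quoted above.

```python
from dataclasses import dataclass, field

# Assumed target range: 16-bit signed integer.
INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Hypothetical name for 'value does not fit the target type'."""


def to_int16(value: float) -> int:
    """Checked conversion: raise instead of silently wrapping."""
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OperandError(f"{value!r} does not fit in 16 bits")
    return result


@dataclass
class Sri:
    """Hypothetical stand-in for one inertial reference unit."""
    running: bool = True
    databus: list = field(default_factory=list)
    eeprom: list = field(default_factory=list)

    def convert(self, value: float):
        try:
            return to_int16(value)
        except OperandError as exc:
            # The handler as specified for any exception:
            self.databus.append("FAILURE")  # 1. indicate failure on the databus
            self.eeprom.append(str(exc))    # 2. store the failure context
            self.running = False            # 3. shut the processor down
            return None
```

Calling Sri().convert(40000.0) leaves the unit with running == False. In
this sketch the overflow is detected, and the handler does what the
specification asked for. Whether shutting down is the right response to
out-of-bounds data is a requirements question that the code itself
cannot answer.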