From: rav@goanna.cs.rmit.edu.au (++ robin)
Subject: Re: Ariane Crash (Was: Adriane crash)
Date: 1996/08/01
Message-ID: <4torim$ku8@goanna.cs.rmit.edu.au>#1/1
references: <4tkfe5$did@goanna.cs.rmit.edu.au> <4tnip9$k0s@zeus.orl.mmc.com>
organization: Comp Sci, RMIT, Melbourne, Australia
newsgroups: comp.lang.ada,comp.lang.pl1,rmit.cs.100

rgilbert@unconfigured.xvnews.domain (Bob Gilbert) writes:

>In article <4tkfe5$did@goanna.cs.rmit.edu.au>, rav@goanna.cs.rmit.edu.au (++ robin) writes:
>> rgilbert@unconfigured.xvnews.domain (Bob Gilbert) writes:
>>
>> >The error was assuming that the Ariane 4 design would be adequate
>> >for the Ariane 5 system.
>>
>> >> The specific error was that a conversion of a double-precision
>> >> floating-point value (~58 significant bits) to 15 significant
>> >> bits caused fixed-point overflow. The conversion was not
>> >> checked for overflow. It should have been.
>>
>> >It was checked, hence the exception and an exception handler to
>> >take corrective action.
>>
>> ---The SRI computer (& its backup) had an exception
>> handler, to be sure, but it did not have an exception
>> handler to take corrective action. The exception handler
>> shut the computer down.

>Which was the specified corrective action.

---Calling it "corrective" action is stretching the English
Language a bit. In no way, shape or form was the action
"corrective".

>> > Unfortunately the corrective action was
>> >to assume that the SRI had failed and to shut it down. The
>> >software performed exactly as designed.
>>
>> ---The software did not perform as designed. It was
>> intended to shut down the computer only in the event of
>> a hardware error.

>The out of bounds data was considered to be indicative of a random hardware
>fault, at least for the Ariane 4. Perhaps this was not a valid method
>of determining a hardware fault, but it was the design decision.

---Please read what I wrote. The overflow was not a hardware
fault. It was a programming error that should not have occurred,
bearing in mind the "sudden death" nature of the shutdown
in the event of any kind of interrupt.

>> The software shut down the computer
>> because of a programming error. The software performed
>> only as written!
>>
>> >> This is, after all,
>> >> a real-time system. It's a fundamental check that a programmer
>> >> experienced in real-time systems should have carried out.
>> >>
>> >> Control was then passed to the interrupt handler, which
>> >> shut down the system.
>>
>> >Exactly as designed.
>>
>> ---Again, not as designed. It was designed to shut down only
>> in the event that the SRI computer failed. Then the backup
>> would be used.

>Again, the (wrongly assumed) SRI failure was determined by the detection
>of out of bounds data. It was a requirements oversight, not a programming
>oversight, and most certainly not influenced by the programming language used.

---If you make an assumption about the range of data, and
you are wrong, it is a programming error.

>To quote the report:

> Although the source of the Operand Error has been identified, this in
> itself did not cause the mission to fail. The specification of the
> exception-handling mechanism also contributed to the failure.
> In the event of any kind of exception, the system specification stated
> that: the failure should be indicated on the databus, the failure context
> should be stored in an EEPROM memory (which was recovered and read out
> for Ariane 501), and finally, the SRI processor should be shut down.

>The last sentence of the above is what the requirements stated, and
>exactly what the software did, exactly as designed.

---Again, the interrupt for fixed-point overflow was not expected
to happen. The software DID NOT OPERATE AS DESIGNED. It failed.
You're placing too literal an interpretation on the first sentence.
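
For readers following the argument, the conversion in dispute can be sketched roughly as below. This is an illustration in Python, not the SRI flight code; every name in it (OperandError, to_int16_checked, to_int16_saturating, shutdown_policy) is hypothetical, and it assumes the target is a signed 16-bit integer (15 significant bits plus sign) and that inputs are finite. It contrasts a checked conversion whose failure is treated as a fatal fault with one that clamps the value and carries on.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # signed 16-bit range (assumed target type)


class OperandError(Exception):
    """Hypothetical error: the value does not fit in a signed 16-bit integer."""


def to_int16_checked(value: float) -> int:
    """Convert a finite float to a 16-bit integer, raising on overflow."""
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OperandError(f"{value!r} does not fit in 16 bits")
    return result


def to_int16_saturating(value: float) -> int:
    """Convert a finite float to a 16-bit integer, clamping out-of-range values."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


def shutdown_policy(value: float) -> str:
    """Treat any conversion failure as fatal, as the quoted specification did."""
    try:
        to_int16_checked(value)
        return "running"
    except OperandError:
        return "shut down"
```

Under these assumptions, an out-of-range input such as 40000.0 leads shutdown_policy to report a shutdown, while to_int16_saturating returns the largest representable value and processing continues. Which of the two behaviours counts as "as designed" is exactly the point contested above.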