From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level:
X-Spam-Status: No, score=-1.3 required=5.0 tests=BAYES_00,INVALID_MSGID autolearn=no autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,c615e41a65104004
X-Google-Attributes: gid103376,public
From: William Clodius
Subject: Re: Ariane 5 failure (Was: Size code Ada and C)
Date: 1998/07/02
Message-ID: <359BFC60.446B@lanl.gov>#1/1
X-Deja-AN: 368212956
Content-Transfer-Encoding: 7bit
References: <35921271.E51E36DF@aonix.fr> <6mtiv0$9j3@gcsin3.geccs.gecm.com> <6n7jut$al0$1@nnrp1.dejanews.com> <6navqt$shc$1@goanna.cs.rmit.edu.au> <359A53E2.41C6@lanl.gov> <6ng8ua$1jp$1@goanna.cs.rmit.edu.au>
Content-Type: text/plain; charset=us-ascii
Organization: Los Alamos National Lab
Mime-Version: 1.0
Newsgroups: comp.lang.ada
Date: 1998-07-02T00:00:00+00:00
List-Id:

robin wrote:
>
> No, it was the unchecked conversion. If the conversion
> had undergone a magnitude check, the OS would have never
> shut down the SRI. Any kind of error would cause the
> SRI computer to shut down. Thus, the programmer should
> have undertaken every precaution to ensure that each and
> every possible cause of an interrupt could not occur.
>

Your reasoning might be valid if the programmers were unaware that Ada's
default semantics would cause an exception to be raised and that this
would shut down the computer. The papers indicate that the programmers did
not include an explicit check because they were aware of the semantics
and their consequences, and made the decision that if the "error" occurred
it was cause for shutting down the computer. Quoting the report:

"To determine the vulnerability of unprotected code, an analysis was performed on every operation which could give rise to an exception, including an Operand Error.
In particular, the conversion of floating point values to integers was analysed and operations involving seven variables were at risk of leading to an Operand Error. This led to protection being added to four of the variables, evidence of which appears in the Ada code. However, three of the variables were left unprotected. No reference to justification of this decision was found directly in the source code. Given the large amount of documentation associated with any industrial application, the assumption, although agreed, was essentially obscured, though not deliberately, from any external review. The reason for the three remaining variables, including the one denoting horizontal bias, being unprotected was that further reasoning indicated that they were either physically limited or that there was a large margin of safety, a reasoning which in the case of the variable BH turned out to be faulty. It is important to note that the decision to protect certain variables but not others was taken jointly by project partners at several contractual levels." ... "The specification of the exception-handling mechanism also contributed to the failure. In the event of any kind of exception, the system specification stated that: the failure should be indicated on the databus, the failure context should be stored in an EEPROM memory (which was recovered and read out for Ariane 501), and finally, the SRI processor should be shut down. It was the decision to cease the processor operation which finally proved fatal. Restart is not feasible since attitude is too difficult to re-calculate after a processor shutdown; therefore the Inertial Reference System becomes useless. The reason behind this drastic action lies in the culture within the Ariane programme of only addressing random hardware failures. From this point of view exception - or error - handling mechanisms are designed for a random hardware failure which can quite rationally be handled by a backup system." 
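As a rough illustration of the distinction the report draws, here is a minimal sketch in Python rather than Ada. Everything in it is an assumption made for illustration: the names (OperandError, to_int16_unprotected, to_int16_protected, run_sri) are hypothetical, the 16-bit target width comes from the inquiry report rather than from the passages quoted here, and saturation is only one plausible form of "protection", since the quoted text does not say how the four protected variables were guarded.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Hypothetical stand-in for the Operand Error named in the report."""


def to_int16_unprotected(x: float) -> int:
    # Models a checked conversion with no local handler: an out-of-range
    # value raises instead of silently wrapping. (Python's int() truncates,
    # which is a simplification of a real float-to-integer conversion.)
    n = int(x)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OperandError(f"{x} does not fit in a 16-bit signed integer")
    return n


def to_int16_protected(x: float) -> int:
    # One assumed form of protection: clamp to the representable range.
    return max(INT16_MIN, min(INT16_MAX, int(x)))


def run_sri(value: float, convert) -> str:
    # Models the system-level policy described in the report: any
    # exception is treated as a hardware fault and the processor stops.
    try:
        convert(value)
    except Exception:
        return "SHUTDOWN"
    return "RUNNING"


if __name__ == "__main__":
    for v in (100.0, 1.0e6):
        print(v, run_sri(v, to_int16_unprotected), run_sri(v, to_int16_protected))
```

Note that the shutdown happens in run_sri, not in the conversion itself: the conversion only reports the problem, and the policy of what to do about it is a separate, deliberate choice.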
The assumptions given in the second paragraph were valid for Ariane 4
but not Ariane 5. The resulting shutdown of the computer was not
required by Ada exception handling, but was an explicit decision of the
Ariane team, driven by the culture of that team.

To emphasize, the team made an explicit decision that any unhandled
exception that occurred was evidence of a hardware error and
justification for turning off the computer. They examined the code for
possible overflows that could trigger such exceptions, found this
specific part of the code, and determined that overflows for this
quantity were not physically possible and hence indicative (for the
Ariane 4) of a hardware failure. They apparently made the decision
several times not to handle this specific exception AND WERE AWARE OF
THE CONSEQUENCES OF NOT HANDLING AN EXCEPTION. Even in the absence of
language-defined exception handling, their reasoning (culture) likely
would have caused them to explicitly insert the check and have it set a
flag which would turn off the computer at a lower level.

--

William B. Clodius              Phone: (505)-665-9370
Los Alamos Nat. Lab., NIS-2     FAX: (505)-667-3815
PO Box 1663, MS-C323            Group office: (505)-667-5776
Los Alamos, NM 87545            Email: wclodius@lanl.gov