From: johndoe@zaphod.nosc.mil (Kirk Beitz)
Subject: Multiple reasons for failure of Ariane 5 (was: Re: Ariane 5 - not an exception?)
Date: 1996/07/25
organization: myself?
newsgroups: comp.software-eng,comp.lang.ada

after reading the failure analysis, it seems a little irresponsible to even hint that the rocket failed solely because of a lack of proper exception handling. yes, it does seem clear that this was the final straw, and one of the report's heaviest criticisms falls on the original specification that exceptions should be handled in that manner. but several decisions, made over the course of a long period of development, helped lead to the situation (i won't rehash the whole report from the web in detail; it is fairly clear and self-explanatory):

- the value that caused the exception belonged to a variable that, unlike some similar variables, was not "guarded" (protected) when converted from a computed floating-point input. this was partly in response to a "maximum workload target" for the system.
- the alignment software that raised the unhandled exception was still running after launch, when it was no longer needed. this was particularly true on ariane 5: keeping the alignment running after liftoff was an ariane 4 feature that let the countdown resume after a late hold without a lengthy reset of the launcher, and ariane 5's different preparation sequence did not need it.
  the exception occurred in the window after the software was no longer needed and before that particular portion of the software was shut down.
- the value that triggered the exception was out of range because ariane 5 has higher horizontal velocity values than ariane 4; the exception was on a value related to these horizontal velocity values.
- no testing was performed on the software's behavior under the ariane 5 count-down and flight-time sequence and trajectory. (the ariane 5 trajectory was not part of the software's functional requirements. also, the two machines that shut down as a result of the unhandled exception, the inertial reference systems, were not actually present in pre-flight testing but only simulated; one of the reasons given for the latter was that the software had already been tested through its use on ariane 4.)

i understand some of the points raised in the initial post, and the final full report on the ariane 5 failure does address them: shutting the system down after an unhandled exception was not a sound design decision for this mission-critical software.

it just seems to me that there are other, more important lessons to be learned:
- the value of handling a raised or potentially raised exception in a context that is as local as possible
- the value of understanding the nature of potential exceptions raised from input
- the value of fully testing the software in an environment as close as possible to the real situation
- the value of making the software conform to the real requirements of a mission, particularly in mission-critical software
- the value of creating requirements specifications that properly fit the mission at hand.

every one of these should probably have been applied during design; *any* single one would probably have prevented the failure of ariane 501.

my point: the title of the original post (and much of its flavor) might suggest that the single problem that led to ariane 5 failing was the failure to properly handle an exception.
and that is *not* the lesson that should be taken from this at all.

--kirk beitz
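the first point above (an unguarded float-to-integer conversion) and the first lesson (handle exceptions as locally as possible) can be illustrated with a small python sketch. it is not the flight code and nothing here comes from the report except the 16-bit signed integer target of the conversion; the names `unguarded_to_int16`, `guarded_to_int16` and `OperandError` are hypothetical, and saturating to the nearest limit (with nan mapped to 0) is only one assumed recovery policy. whether a clamped value is actually safe to use is exactly the kind of mission-level requirement question raised above.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Hypothetical stand-in for the error raised when a conversion overflows."""


def unguarded_to_int16(x: float) -> int:
    """Convert with no protection: any out-of-range input escapes as an exception."""
    # int() truncates toward zero, so anything strictly between -32769 and 32768 fits.
    # Comparisons with NaN are always False, so NaN also fails this check.
    if not (INT16_MIN - 1 < x < INT16_MAX + 1):
        raise OperandError(f"{x!r} does not fit in a signed 16-bit integer")
    return int(x)


def guarded_to_int16(x: float) -> int:
    """Handle the failure locally, right where the conversion happens."""
    try:
        return unguarded_to_int16(x)
    except OperandError:
        # Assumed policy (hypothetical): NaN maps to 0, otherwise saturate to the nearest limit.
        if math.isnan(x):
            return 0
        return INT16_MAX if x > 0 else INT16_MIN


if __name__ == "__main__":
    print(guarded_to_int16(1234.9))   # 1234
    print(guarded_to_int16(50000.0))  # 32767
    print(guarded_to_int16(-1e9))     # -32768
```

calling `unguarded_to_int16(50000.0)` raises `OperandError` and leaves the decision to whatever caller happens to be above it, while `guarded_to_int16` decides on the spot, where the nature of the input is best understood.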