From: JP Thornley
Subject: Re: Ariane 5 - not an exception?
Date: 1996/07/26
Message-ID: <285641259wnr@diphi.demon.co.uk>
reply-to: jpt@diphi.demon.co.uk
newsgroups: comp.software-eng,comp.lang.ada

In article: simonb@pact.srf.ac.uk (Simon Bluck) writes:
>
> The Ariane 501 flight failure was due to the raising of an unexpected
> Ada exception, which was handled by switching off the computer. The
> report on this:
>
> http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html
>
> is clear and hard-hitting: it will result in much improved software.
> But does it get right to the bottom of the issues,

Don't know about that until I get to read the full report (the above
reference is to a press release about the report).

...... and does the
> software community appreciate that there are fundamental software
> control problems which can directly give rise to such enormous
> failures, in this particular case thankfully without loss of life?

Yup - that's why we accept coding rates that we haven't seen since all
input and output was in reverse binary (and I'm not sure that we get
even that).

> [snip]
> Exceptions and assertions are both used, in Ada and C/C++, to detect
> software/hardware anomalies.
> When one of these trips, it is
> frequently very difficult for the designer to know how best to handle
> the problem. To continue may result in corrupt data; to abort is
> drastic but eliminates the possibility that further processing will
> compound the problem.
>

That's why the *software* designer must not make these decisions. Any
action in response to an unexpected event (corrupt data, out-of-range
values, etc) affects the *system* behaviour and must be known about at
the system level, so that the consequences can be taken into account in
the system safety case.

> The more checks you have, the more likely it is that one of them will
> trip. If you can't think of good ways of handling these checks, the
> end result, for the user, may well be very much worse than if the
> check had never been performed in the first place.
>

My experience is with systems where all the code is compiled with
checks suppressed. This allows us to strip out the exception handling
code from the run-time (a substantial simplification) and put in
exactly the checks we want exactly where we want them. (But I am aware
of differences in approach by other people).

> Of the two handling options, neither is really acceptable. However,
> there is a third option which ought to be considered: to continue but
> mark the processed data as suspect.
>

Simon then goes on to describe a way of dealing with data validities
that unfortunately breaks the most fundamental rule of safety-critical
code - Keep It Simple.

It's an idea that might work with mission-critical code, but the
thought of implementing it for safety-critical code (remembering that
any one of these systems is probably handling in the range 200-500
pieces of data - each with its associated data validity) is beyond
anything that I know how to tackle.
(and I've just realised that each of these 'truth values' and the data
type information will require their own data validities - this gets
even more complicated than I first thought)

Phil Thornley

-- 
------------------------------------------------------------------------
| JP Thornley    EMail jpt@diphi.demon.co.uk                           |
------------------------------------------------------------------------