* Ariane (yet again...)
From: Mike Silva @ 2000-01-16 0:00 UTC

Before anybody starts throwing anything, my question is very specific -- does anybody know exactly what the Ariane report means when it speaks of "protecting" conversions? The subject came up in alt.folklore.computers and it seems there are at least three possible meanings:

(a) turn off the runtime checks for a given conversion,
(b) put some code before the conversion to explicitly check for in-range, or
(c) have a local exception handler to catch the error.

Anybody know what exactly was / was not done (or even better, have an actual code fragment)? I've just always been curious...

Mike
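The three readings in the question can be made concrete with a small sketch. It is written in Python rather than Ada, and it is not the flight code, which nobody in this thread has seen. It assumes, purely for illustration, a conversion from a floating-point value to a signed 16-bit integer; all function names are hypothetical, and saturating to the range limits is just one plausible recovery policy.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value: float) -> int:
    """Stand-in for a language-level range check: raise if the result does not fit."""
    # Python's int() truncates toward zero; Ada rounds to nearest.
    # The difference does not matter for this illustration.
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return result


def convert_unchecked(value: float) -> int:
    """Reading (a): no check at all; out-of-range values silently wrap around."""
    return (int(value) + 32768) % 65536 - 32768


def convert_prechecked(value: float) -> int:
    """Reading (b): explicit range test before converting (paid on every call)."""
    if value > INT16_MAX:
        return INT16_MAX  # assumed policy: saturate at the upper limit
    if value < INT16_MIN:
        return INT16_MIN  # assumed policy: saturate at the lower limit
    return to_int16_checked(value)


def convert_with_handler(value: float) -> int:
    """Reading (c): keep the check and handle the exception locally."""
    try:
        return to_int16_checked(value)
    except OverflowError:
        # Assumed policy: saturate instead of letting the error propagate.
        return INT16_MAX if value > 0 else INT16_MIN
```

With an in-range input all three agree. With an out-of-range input, (a) returns a wrapped value without complaint, while (b) and (c) both return the saturated limit; they differ only in where the test is made and when its cost is paid.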
* Re: Ariane (yet again...)
From: Andreas Winckler @ 2000-01-17 0:00 UTC

Mike Silva wrote:
>
> Before anybody starts throwing anything, my question is very specific
> -- does anybody know exactly what the Ariane report means when it
> speaks of "protecting" conversions? The subject came up in
> alt.folklore.computers and it seems there are at least three possible
> meanings: (a) turn off the runtime checks for a given conversion, (b)
> put some code before the conversion to explicitly check for in-range,
> or (c) have a local exception handler to catch the error.

c

AW
* Re: Ariane (yet again...)
From: Samuel T. Harris @ 2000-01-19 0:00 UTC

Mike Silva wrote:
>
> Before anybody starts throwing anything, my question is very specific --
> does anybody know exactly what the Ariane report means when it speaks of
> "protecting" conversions? The subject came up in alt.folklore.computers and
> it seems there are at least three possible meanings: (a) turn off the
> runtime checks for a given conversion, (b) put some code before the
> conversion to explicitly check for in-range, or (c) have a local exception
> handler to catch the error. Anybody know what exactly was / was not done
> (or even better, have an actual code fragment)? I've just always been
> curious...
>
> Mike

You should read the report itself. It is an excellent read on the nature of cascading failures and how good technical effort can be fouled up by poor management practices. I'll try to summarize below ...

The report did not specify the nature of the conversion code. However, given the nature of the problem, it might have been a scaled conversion of an integer-type sensor reading to a floating- or fixed-point type for the code to use. This is pretty normal in this problem domain. This probably was not a simple unchecked_conversion. The sizes required for the two mentioned types do not match.

The report specified that the conversion resulted in a value being out of range. It did not specify how the code determined this. Namely, did a normal Ada runtime check on the resulting value see that it was outside the range of the type, or did the code use some explicit range check? We don't know from the report.

The report did specify that an exception was raised.
The report did specify that an exception handler was not provided, based on the exhaustive analysis of the Ariane 4 teams proving that such an exception would never occur. Several exception handlers were not provided, based on similar analysis, to save processing time and memory. Such analysis is common in this field and very reliable given the known constraints of the Ariane 4 trajectory and acceleration profile.

This kind of analysis is vital in supporting the assumption that bad data is the result of a hardware failure. This was the case with the Ariane 5. The bad data was interpreted as a hardware failure, so the component went into diagnostic mode. In fact, the backup component actually failed before the main component. Had an exception handler been present, I'm not sure what it could have done except indicate a hardware failure, which is the supported assumption of the component in the intended environment (the Ariane 4).

In this diagnostic mode, the system sends diagnostic information to the central processor. The central processor misinterpreted this as real attitude and altitude information and commanded the thrusters to maximum deflection to correct the "course" of the rocket. This caused the rocket to turn sideways, introducing catastrophic stresses on the fuselage as the air flow moved from the nose to the side of the rocket. Sensors detected the impending failure of the superstructure and the rocket commanded a self-destruct to ensure lots of little bits of debris fell downrange instead of two or three very large sections.

If there is a real "bug", it is the command processor's misinterpretation of this diagnostic information. It should have known it was not real attitude and altitude information.

The sad part of this is that the code in question is used by the Ariane 4 to enable a quick reset should a launch be aborted. This code is useless on the Ariane 5.
The real problem was that this code was not being used on an Ariane 4, but was being reused on an Ariane 5 without any verification whatsoever. The Ariane 5 has a significantly different acceleration and trajectory profile. These differences simply made all that work proving the Ariane 4 would never raise the exception inapplicable, but similar work was not done to verify this code on the Ariane 5.

The contractor was not given the expected acceleration and trajectory profile of the Ariane 5, nor was the contractor required to test against them. The report also noted that no simulations were run and speculated that a single simulation of the involved components, either individually or in an integrated environment, would have quickly identified the problem.

A big case of overreliance on code reuse and under-employment of basic verification methods. Just because something worked in the past does not mean it will work in the future, especially when the enclosing environment changes.

The design teams considered each and every verification method and decided, for each of them, that it was not worth doing. The problem is they didn't review their decisions to see that they really did nothing at all to verify the Ariane 4 code. While each decision had some merit when considered individually, all together they present an insane position to take.

A management problem after all.

--
Samuel T. Harris, Principal Engineer
Raytheon, Aerospace Engineering Services
"If you can make it, We can fake it!"
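The kind of check that the summary above says was never repeated for the new vehicle can be sketched in a few lines. This is a minimal illustration in Python, assuming the converted quantity must fit a signed 16-bit integer; the function name and both sample lists are hypothetical, made-up numbers chosen only to show the idea, not flight data.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def out_of_range_samples(samples):
    """Return (index, value) pairs that would not fit in a signed 16-bit integer."""
    return [
        (index, value)
        for index, value in enumerate(samples)
        if not INT16_MIN <= int(value) <= INT16_MAX
    ]


# Hypothetical values for illustration only; these are NOT real flight data.
old_profile = [1200.0, 8500.0, 21000.0, 30500.0]
new_profile = [1500.0, 14000.0, 29000.0, 41000.0]

print(out_of_range_samples(old_profile))
print(out_of_range_samples(new_profile))
```

A range argument that holds for one set of inputs says nothing about another set; rerunning the same check against the new profile is what exposes the sample that no longer fits.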
* Re: Ariane (yet again...)
From: Samuel T. Harris @ 2000-01-20 0:00 UTC
To: fdebruin

fdebruin wrote:
>
> In comp.lang.ada you write:
>
> >You should read the report itself. It is an excellent read on
>
> Do you happen to know whether (and where) the report is available
> on the net?
>
> F. de Bruin

A GoTo.com search on +ariane +5 +crash +report yields
the following URLs ...

http://www.siam.org/siamnews/general/ariane.htm
http://java.sun.com/people/jag/Ariane5.html

... reading these will correct any errors I may have
introduced in my prior summary.

BTW this is one of my favorite examples of Ada bashers
getting egg on their face! Many were quick to blame
Ada when this was in fact a management problem which
was negligent in their reuse strategy, namely doing
nothing at all to verify the reused components in
a new environment.

--
Samuel T. Harris, Principal Engineer
Raytheon, Aerospace Engineering Services
"If you can make it, We can fake it!"
* Re: Ariane (yet again...)
From: Mike Silva @ 2000-01-20 0:00 UTC

Samuel T. Harris wrote in message <388760A5.9E3A9F32@Raytheon.com>...
>
>A GoTo.com search on +ariane +5 +crash +report yields
>the following URLs ...
>
>http://www.siam.org/siamnews/general/ariane.htm
>http://java.sun.com/people/jag/Ariane5.html
>
>... reading these will correct any errors I may have
>introduced in my prior summary.
>
>BTW this is one of my favorite examples of Ada bashers
>getting egg on their face! Many were quick to blame
>Ada when this was in fact a management problem which
>was negligent in their reuse strategy, namely doing
>nothing at all to verify the reused components in
>a new environment.

Yes, the whole question came up again when somebody asserted that Ada's runtime checks "caused" the Ariane-5 fireworks. Eventually it worked around to the question of what exactly the report meant when it said "The data conversion instructions (in Ada code) were not protected from causing an Operand Error." Later it is implied that there is a performance cost to "protection", and what I was asking was what was the form of this protection.

At first glance it would seem that *not* having protection (i.e. having the runtime check the results of the conversion) would have more performance cost than having protection, if this meant not having the runtime check the results. Since the report implies the opposite, I was wondering what form the protection took.

The one answer I got was that the "protection" was having a local exception handler deal with the conversion, but then, if no exception occurs there's no cost. An explicitly-coded "precheck" of the variable before conversion would, OTOH, always have a performance cost.

Sure wish I could see the code...

Mike