comp.lang.ada
* Ariane (yet again...)
@ 2000-01-16  0:00 Mike Silva
  2000-01-17  0:00 ` Andreas Winckler
  2000-01-19  0:00 ` Samuel T. Harris
  0 siblings, 2 replies; 5+ messages in thread
From: Mike Silva @ 2000-01-16  0:00 UTC


Before anybody starts throwing anything, my question is very specific --
does anybody know exactly what the Ariane report means when it speaks of
"protecting" conversions?  The subject came up in alt.folklore.computers and
it seems there are at least three possible meanings: (a) turn off the
runtime checks for a given conversion, (b) put some code before the
conversion to explicitly check for in-range, or (c) have a local exception
handler to catch the error.  Anybody know what exactly was / was not done
(or even better, have an actual code fragment)?  I've just always been
curious...
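The three readings in the question can be made concrete with a small Python model. This is not Ada and not the Ariane code, which nobody in this thread has seen. The model assumes, for illustration only, a float being converted to a 16-bit signed integer. `ConstraintError` stands in for Ada's Constraint_Error, and every function name below is hypothetical.

```python
import math

# Hypothetical Python model of "protecting" a float -> 16-bit signed conversion.
INT16_MIN, INT16_MAX = -32768, 32767


class ConstraintError(Exception):
    """Stands in for Ada's Constraint_Error in this sketch."""


def round_half_away(x: float) -> int:
    # Round to nearest, ties away from zero (how Ada converts real to integer).
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def checked_convert(x: float) -> int:
    """Default behaviour: the language-defined range check raises."""
    n = round_half_away(x)
    if not INT16_MIN <= n <= INT16_MAX:
        raise ConstraintError(f"{x} does not fit in 16 bits")
    return n


def suppressed_convert(x: float) -> int:
    """(a) Checks turned off. Modelled as wrapping to 16 bits; real
    hardware might instead trap or produce some other value."""
    n = round_half_away(x) & 0xFFFF
    return n - 0x10000 if n > INT16_MAX else n


def prechecked_convert(x: float) -> int:
    """(b) Explicit range test before converting; this sketch saturates."""
    if x >= INT16_MAX:
        return INT16_MAX
    if x <= INT16_MIN:
        return INT16_MIN
    return checked_convert(x)


def handled_convert(x: float, fallback: int = 0) -> int:
    """(c) A local handler catches the error and substitutes a fallback."""
    try:
        return checked_convert(x)
    except ConstraintError:
        return fallback


for value in (1000.4, 40000.0):
    print(value, suppressed_convert(value), prechecked_convert(value),
          handled_convert(value))
```

For an in-range input all three give the same answer. For 40000.0, (a) silently yields -25536, (b) yields 32767, and (c) yields whatever fallback the handler chooses (0 here). Only (a) removes work. (b) and (c) both add code, which is why "protected" is ambiguous.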

Mike

* Re: Ariane (yet again...)
  2000-01-16  0:00 Ariane (yet again...) Mike Silva
@ 2000-01-17  0:00 ` Andreas Winckler
  2000-01-19  0:00 ` Samuel T. Harris
  1 sibling, 0 replies; 5+ messages in thread
From: Andreas Winckler @ 2000-01-17  0:00 UTC

Mike Silva wrote:
> 
> Before anybody starts throwing anything, my question is very specific
> -- does anybody know exactly what the Ariane report means when it
> speaks of "protecting" conversions?  The subject came up in
> alt.folklore.computers and it seems there are at least three possible
> meanings: (a) turn off the runtime checks for a given conversion, (b)
> put some code before the conversion to explicitly check for in-range,
> or (c) have a local exception handler to catch the error.

c


AW

* Re: Ariane (yet again...)
  2000-01-16  0:00 Ariane (yet again...) Mike Silva
  2000-01-17  0:00 ` Andreas Winckler
@ 2000-01-19  0:00 ` Samuel T. Harris
       [not found]   ` <200001200846.JAA16576@xs4.xs4all.nl>
  1 sibling, 1 reply; 5+ messages in thread
From: Samuel T. Harris @ 2000-01-19  0:00 UTC


Mike Silva wrote:
> 
> Before anybody starts throwing anything, my question is very specific --
> does anybody know exactly what the Ariane report means when it speaks of
> "protecting" conversions?  The subject came up in alt.folklore.computers and
> it seems there are at least three possible meanings: (a) turn off the
> runtime checks for a given conversion, (b) put some code before the
> conversion to explicitly check for in-range, or (c) have a local exception
> handler to catch the error.  Anybody know what exactly was / was not done
> (or even better, have an actual code fragment)?  I've just always been
> curious...
> 
> Mike

You should read the report itself. It is an excellent read on
the nature of cascading failures and on how good technical effort
can be fouled up by poor management practices. I'll try to summarize
below ...


The report did not specify the nature of the conversion code.
However, given the nature of the problem, it might have been a scaled
conversion of an integer-type sensor reading to a floating- or
fixed-point type for the code to use. This is pretty normal
in this problem domain. It probably was not a simple
unchecked_conversion, since the sizes required for the two
types mentioned do not match.
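The difference between a scaled value conversion and a bit-for-bit unchecked_conversion can be sketched with Python's standard `struct` module. The 16-bit raw reading, the scale factor `SCALE`, and both function names are assumptions for illustration. Nothing here comes from the Ariane code.

```python
import struct

SCALE = 0.25  # hypothetical engineering units per sensor count


def scaled_conversion(raw: int) -> float:
    """Value conversion: the number is preserved, times a scale factor."""
    return raw * SCALE


def reinterpret_bits(raw: int, fmt: str) -> float:
    """unchecked_conversion-style: reuse the raw bytes as another type.
    Only possible when both types occupy the same number of bytes."""
    data = struct.pack("<h", raw)  # 16-bit signed sensor reading
    if struct.calcsize(fmt) != len(data):
        raise ValueError(f"size mismatch: {len(data)} bytes vs {struct.calcsize(fmt)}")
    return struct.unpack(fmt, data)[0]


print(scaled_conversion(1000))       # 250.0
print(reinterpret_bits(1000, "<e"))  # same size (16-bit float), meaningless tiny value
try:
    reinterpret_bits(1000, "<d")     # 64-bit float: sizes do not match
except ValueError as err:
    print(err)
```

The scaled conversion keeps the meaning of the reading. Reinterpreting the same 16 bits as a half-precision float gives a tiny subnormal number (about 6e-05) instead of 250.0. Reinterpreting them as a 64-bit float is not even possible without inventing 6 more bytes.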

The report specified that the conversion resulted in a value
being out of range. It did not specify how the code determined this.
Namely, did a normal Ada runtime check on the resulting value
see that it was outside the range of the type or did the code
use some explicit range check? We don't know from the report.
The report did specify that an exception was raised.

The report did specify that an exception handler was not provided,
based on exhaustive analysis by the Ariane 4 team proving
that such an exception would never occur. Several exception handlers
were omitted on the strength of similar analysis, to save processing
time and memory. Such analysis is common in this field
and very reliable given the known constraints of the Ariane 4
trajectory and acceleration profile. This kind of analysis is
vital in supporting the assumption that bad data is the result
of a hardware failure, and that assumption is exactly what was applied
on the Ariane 5: the bad data was interpreted as a hardware failure,
so the component went into diagnostic mode. In fact, the backup
component failed before the main component.

Had an exception handler been present, I'm not sure what
it could have done except indicate a hardware failure, which is the
supported assumption of the component in its intended
environment (the Ariane 4).
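Why a handler that "indicates a hardware failure" does not save the day, and why the backup can fail first, can be sketched as two redundant channels running identical code. The `Channel` class, its shut-down-on-error behaviour, and the input values are hypothetical illustrations, not a description of the real inertial reference system.

```python
from dataclasses import dataclass
from typing import Optional

INT16_MIN, INT16_MAX = -32768, 32767


class ConstraintError(Exception):
    """Stands in for Ada's Constraint_Error in this sketch."""


def convert(x: float) -> int:
    # Range-checked float-to-16-bit conversion (rounding details simplified).
    n = round(x)
    if not INT16_MIN <= n <= INT16_MAX:
        raise ConstraintError(x)
    return n


@dataclass
class Channel:
    name: str
    failed: bool = False

    def step(self, x: float) -> Optional[int]:
        if self.failed:
            return None  # a failed channel stops producing data
        try:
            return convert(x)
        except ConstraintError:
            # Design assumption carried over from the old vehicle:
            # out-of-range data can only mean broken hardware.
            self.failed = True
            return None


# Identical software in both channels, fed the same hypothetical values.
backup, primary = Channel("backup"), Channel("primary")
for x in (100.0, 20000.0, 50000.0):
    for channel in (backup, primary):
        channel.step(x)

print(backup.failed, primary.failed)  # True True
```

Because the fault is in the shared design assumption rather than in either piece of hardware, redundancy buys nothing. Whichever channel sees the bad value first fails first, and the other follows on the same input.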

In this diagnostic mode, the system sends diagnostic information
to the central processor. The central processor misinterpreted
this as real attitude and altitude information and commanded
the thrusters to maximum deflection to correct the "course"
of the rocket. This turned the rocket sideways, introducing
catastrophic stresses on the fuselage as the air flow moved
from the nose to the side of the rocket. Sensors detected
the impending failure of the superstructure and the rocket commanded
a self-destruct to ensure that lots of little bits of debris fell
downrange instead of two or three very large sections.

If there is a real "bug", it is the command processor's
misinterpretation of this diagnostic information. It should
have known it was not real attitude and altitude information.
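One way a consumer could "know it was not real attitude information" is to tag every frame with its kind and refuse to steer on anything that is not flight data. The frame layout, the names, and the chosen reaction (hold the last good value) are all hypothetical. This is a sketch of the idea, not how the flight software was built.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    ATTITUDE = 1
    DIAGNOSTIC = 2


@dataclass(frozen=True)
class Frame:
    kind: Kind
    payload: float


class Controller:
    def __init__(self) -> None:
        self.last_good: Optional[float] = None

    def accept(self, frame: Frame) -> Optional[float]:
        """Steer on attitude frames only; never treat diagnostics as attitude."""
        if frame.kind is Kind.ATTITUDE:
            self.last_good = frame.payload
        # Any other kind is ignored, so the controller keeps the last good value.
        return self.last_good


ctl = Controller()
ctl.accept(Frame(Kind.ATTITUDE, 1.5))
print(ctl.accept(Frame(Kind.DIAGNOSTIC, 9999.0)))  # 1.5
```

Whether holding the last value, switching sources, or aborting is the right reaction is a system-level decision. The point is only that the consumer can tell the two kinds of frame apart.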

The sad part of this is that the code in question is used on the Ariane 4
to enable a quick reset should a launch be aborted. This code is
useless on the Ariane 5.

The real problem was that this code was not being used on an Ariane 4,
but was being reused on an Ariane 5 without any verification whatsoever.
The Ariane 5 has a significantly different acceleration
and trajectory profile. These differences made all the work
proving that the Ariane 4 would never raise the exception inapplicable,
and no similar work was done to verify this code on the Ariane 5.
The contractor was not given the expected acceleration and trajectory
profile of the Ariane 5, nor was the contractor required to test
against it.

The report also noted that no simulations were run, and speculated
that a single simulation of the components involved, either individually
or in an integrated environment, would have quickly identified the
problem.
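The kind of cheap simulation the report had in mind can be sketched as a loop that drives the reused conversion with a simulated profile and reports the first time it would overflow. The profile functions, durations, and numbers below are invented placeholders, not Ariane 4 or Ariane 5 data.

```python
from typing import Callable, Optional

INT16_MIN, INT16_MAX = -32768, 32767


def first_overflow(profile: Callable[[float], float],
                   duration_s: float, dt_s: float) -> Optional[float]:
    """Step through simulated time and return the first t at which the
    converted quantity would not fit in a 16-bit signed integer,
    or None if it fits for the whole run."""
    steps = int(duration_s / dt_s)
    for i in range(steps + 1):
        t = i * dt_s
        if not INT16_MIN <= round(profile(t)) <= INT16_MAX:
            return t
    return None


def old_profile(t: float) -> float:
    return 500.0 * t   # hypothetical: slow growth, stays in range


def new_profile(t: float) -> float:
    return 1500.0 * t  # hypothetical: same code, faster-growing input


print(first_overflow(old_profile, 40.0, 0.5))  # None
print(first_overflow(new_profile, 40.0, 0.5))  # 22.0
```

The same unchanged code passes against one profile and fails against the other. That is the whole argument for re-running even a crude simulation when the environment changes.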

A big case of overreliance on code reuse and under-employment of basic
verification methods. Just because something worked in the past does
not mean it will work in the future, especially when the enclosing
environment changes. The design teams considered each and every
verification method and decided, one by one, that each was not
worth doing. The problem is that they never stepped back to see that,
taken together, those decisions meant they did nothing at all to verify
the reused Ariane 4 code. While each decision had some merit when
considered individually, all together they present an insane position
to take.

A management problem after all.

-- 
Samuel T. Harris, Principal Engineer
Raytheon, Aerospace Engineering Services
"If you can make it, We can fake it!"

* Re: Ariane (yet again...)
  2000-01-20  0:00     ` Samuel T. Harris
@ 2000-01-20  0:00       ` Mike Silva
  0 siblings, 0 replies; 5+ messages in thread
From: Mike Silva @ 2000-01-20  0:00 UTC

Samuel T. Harris wrote in message <388760A5.9E3A9F32@Raytheon.com>...
>
>A GoTo.com search on +ariane +5 +crash +report yields
>the following URLs ...
>
>http://www.siam.org/siamnews/general/ariane.htm
>http://java.sun.com/people/jag/Ariane5.html
>
>... reading these will correct any errors I may have
>introduced in my prior summary.
>
>BTW this is one of my favorite examples of Ada bashers
>getting egg on their face! Many were quick to blame
>Ada when this was in fact a management problem which
>was negligent in their reuse strategy, namely doing
>nothing at all to verify the reused components in
>a new environment.


Yes, the whole question came up again when somebody asserted that Ada's
runtime checks "caused" the Ariane-5 fireworks.  Eventually it worked around
to the question of what exactly the report meant when it said "The data
conversion instructions (in Ada code) were not protected from causing an
Operand Error."  Later the report implies that "protection" has a
performance cost, which is why I was asking what form it took.  At first
glance it would seem that *not* having protection (i.e. letting the
runtime check the result of the conversion) would cost more than having
protection, if "protection" meant turning the runtime check off.  Since the
report implies the opposite, I was wondering what else it could mean.  The
one answer I got was that the "protection" was a local exception handler
for the conversion, but then, if no exception occurs, there's no cost.  An
explicitly coded "precheck" of the variable before conversion would, OTOH,
always have a performance cost.
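The cost argument above can be illustrated by counting range comparisons on the normal, in-range path. This is a toy model with stated assumptions. The runtime check is taken to cost two comparisons. Suppressed checks emit none. Entering a handler region is taken to cost nothing when no exception is raised, which in practice depends on how the compiler implements exceptions. The names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class Counter:
    def __init__(self) -> None:
        self.comparisons = 0

    def in_range(self, n: int) -> bool:
        self.comparisons += 2  # lower bound and upper bound
        return INT16_MIN <= n <= INT16_MAX


def runtime_checked(n: int, c: Counter) -> int:
    if not c.in_range(n):  # language-defined check
        raise OverflowError(n)
    return n


def suppressed(n: int, c: Counter) -> int:  # (a) checks turned off
    return n


def prechecked(n: int, c: Counter) -> int:  # (b) explicit precheck
    if not c.in_range(n):  # test written by the programmer ...
        return INT16_MAX if n > 0 else INT16_MIN
    return runtime_checked(n, c)  # ... plus the runtime check itself


def handled(n: int, c: Counter) -> int:  # (c) local exception handler
    try:
        return runtime_checked(n, c)
    except OverflowError:
        return 0


for strategy in (suppressed, prechecked, handled):
    c = Counter()
    for sample in range(1000):  # all in range: the normal case
        strategy(sample, c)
    print(strategy.__name__, c.comparisons)
# suppressed 0, prechecked 4000, handled 2000
```

Under these assumptions the handler adds nothing on the normal path, while the precheck doubles the work unless the compiler can prove the second check redundant and remove it. Only suppressing the check actually saves anything.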

Sure wish I could see the code...

Mike

* Re: Ariane (yet again...)
       [not found]   ` <200001200846.JAA16576@xs4.xs4all.nl>
@ 2000-01-20  0:00     ` Samuel T. Harris
  2000-01-20  0:00       ` Mike Silva
  0 siblings, 1 reply; 5+ messages in thread
From: Samuel T. Harris @ 2000-01-20  0:00 UTC
  To: fdebruin

fdebruin wrote:
> 
> In comp.lang.ada you write:
> 
> >You should read the report itself. It is an excellent read on
> >
> Do you happen to know whether (and where) the report is available
> on the net?
> 
> F. de Bruin

A GoTo.com search on +ariane +5 +crash +report yields 
the following URLs ...

http://www.siam.org/siamnews/general/ariane.htm
http://java.sun.com/people/jag/Ariane5.html

... reading these will correct any errors I may have
introduced in my prior summary.

BTW this is one of my favorite examples of Ada bashers
getting egg on their face! Many were quick to blame
Ada when this was in fact a management problem which
was negligent in their reuse strategy, namely doing
nothing at all to verify the reused components in
a new environment.

-- 
Samuel T. Harris, Principal Engineer
Raytheon, Aerospace Engineering Services
"If you can make it, We can fake it!"