comp.lang.ada
* Adriane crash
@ 1996-07-23  0:00 Jerry van Dijk
  1996-07-25  0:00 ` Ariane Crash (Was: Adriane crash) John McCabe
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Jerry van Dijk @ 1996-07-23  0:00 UTC



Dutch videotext had a topic this evening that said that ESA found that the 
Adriana-5 lauch failed because the software of its guidance systems was 
accidentally replaced by the Adriane-4 version.

Anyone hear anything more about this ?

(If it's true, it must be the world's most spectacular example of a 
configuration management failure :-)
 
-- 
-----------------------------------------------------------------------
--  Jerry van Dijk       --   e-mail: jerry@jvdsys.nextjk.stuyts.nl  --
--  Banking Consultant   --              Member Team-Ada             -- 
--  Ordina Finance BV    --    Located at Haarlem, The Netherlands   --





* Re: Adriane crash
  1996-07-23  0:00 Adriane crash Jerry van Dijk
  1996-07-25  0:00 ` Ariane Crash (Was: Adriane crash) John McCabe
  1996-07-25  0:00 ` Adriane crash Peter Hermann
@ 1996-07-25  0:00 ` Steve O'Neill
  1996-07-26  0:00 ` David Verrier
  3 siblings, 0 replies; 14+ messages in thread
From: Steve O'Neill @ 1996-07-25  0:00 UTC



Jerry van Dijk wrote:
> 
> Dutch videotext had a topic this evening that said that ESA found that the
> Adriana-5 lauch failed because the software of its guidance systems was
> accidentally replaced by the Adriane-4 version.

Close, but not quite. Based on my read of the report:

Ariane 4 & 5 use the same inertial measurement units and it appears that they did 
not fully analyze the effect of the Ariane 5's flight characteristics against 
these units.  Also, both Arianes 4 and 5 use dual redundant units which are, 
unfortunately, identical in both hardware and software.  The result was that 
higher (but acceptable for Ariane 5) acceleration levels caused a conversion 
operation to overflow, an exception was raised, and both units completely shut 
down leaving the flight control software with no navigation data!  It also 
appeared from the report that the flight control software interpreted bogus data 
as good and as a result commanded the engine nozzles to full deflection resulting 
in the aerodynamic destruction of the vehicle.
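A minimal Python sketch of the failure mode described above, assuming the value is converted from a 64-bit float to a 16-bit signed integer and that the specified reaction to any exception is to shut the unit down. The names `to_int16`, `InertialUnit`, and `OperandError`, and the input values, are hypothetical and not taken from the SRI software.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Hypothetical stand-in for the exception raised by an unprotected conversion."""


def to_int16(x: float) -> int:
    """Convert a float to a 16-bit signed integer, raising OperandError if it does not fit."""
    if not math.isfinite(x):
        raise OperandError(f"{x!r} is not a finite number")
    n = round(x)  # nearest integer; Python rounds exact halves to even
    if not INT16_MIN <= n <= INT16_MAX:
        raise OperandError(f"{x!r} does not fit in a 16-bit signed integer")
    return n


class InertialUnit:
    """One of two identical units: same hardware design, same software."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = True

    def step(self, value: float):
        """Return converted data, or None once the unit has shut itself down."""
        if not self.running:
            return None
        try:
            return to_int16(value)
        except OperandError:
            # Specified reaction to any exception: assume a hardware fault and stop.
            self.running = False
            return None


units = [InertialUnit("active"), InertialUnit("backup")]

# Hypothetical values: one that fits in 16 bits, one that does not.
print([u.step(20_000.0) for u in units])  # [20000, 20000]
print([u.step(40_000.0) for u in units])  # [None, None]
print([u.running for u in units])         # [False, False]
```

Because both instances run the same code on the same input, the second unit gives no protection against this class of fault.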

On some really sad notes: 1) the software that experienced the overflow had no 
real value during that phase of flight and should have been disabled, 2) the 
decision not to protect the conversion from overflow was influenced by a 
requirement for a max of 80% processor utilization, and 3) the units were 
_required_ to shut down as a result of any exception (rather than make the best of 
it and continue in a degraded mode, if possible) on the assumption that it was 
caused by a hardware failure.  Does the phrase 'penny wise, pound foolish' apply 
here?

So, lots of intertwined assumptions, mistakes, etc. led to this failure but 
definitely an avoidable problem.

-- 
Steve O'Neill                      | "No,no,no, don't tug on that!
Sanders, A Lockheed Martin Company |  You never know what it might
smoneill@sanders.lockheed.com      |  be attached to." 
(603) 885-8774  fax: (603) 885-4071|    Buckaroo Banzai





* Ariane Crash (Was: Adriane crash)
  1996-07-23  0:00 Adriane crash Jerry van Dijk
@ 1996-07-25  0:00 ` John McCabe
  1996-07-26  0:00   ` ++           robin
  1996-07-25  0:00 ` Adriane crash Peter Hermann
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: John McCabe @ 1996-07-25  0:00 UTC




jerry@jvdsys.nextjk.stuyts.nl (Jerry van Dijk) wrote:

>Dutch videotext had a topic this evening that said that ESA found that the 
>Adriana-5 lauch failed because the software of its guidance systems was 
>accidentally replaced by the Adriane-4 version.

>Anyone hear anything more about this ?

>(If it's true, it must be the world's most spectacular example of a 
>configuration management failure :-)

That isn't what the report said. Here is a copy of the report obtained
from sci.space.news:

------------------------------------------------------------------------------
>Date:         Tue, 23 Jul 1996 16:59:23 EST
>Reply-To:     ESAPRESS list <ESAPRESS@VMPROFS.ESOC.ESA.DE>

JOINT ESA/CNES PRESS RELEASE
N  33-96  -  Paris, 23 July 1996


Ariane 501 - Presentation of Inquiry Board report

Attached is a summary of the Inquiry Board report on the
failure of the first Ariane 5 flight.

The full report is available on written request from ESA and
CNES Public Relations.

     ESA     Tel.: + 33.1.53.69.72.82
                Fax: + 33.1.53.69.76.90

     CNES   Tel.: + 33.1.44.76.76.87
                Fax: + 33.1.44.76.78.16


ARIANE 501
Presentation of Inquiry Board report

On 4 June 1996 the maiden flight of the Ariane 5 launcher ended
in a failure.  Only about 40 seconds after initiation of the flight
sequence, at an altitude of about 3700 m, the launcher veered off
its flight path, broke up and exploded.

Mr Jean-Marie Luton, ESA Director General, and Mr Alain
Bensoussan, CNES Chairman, immediately set up an independent
Inquiry Board (see ESA-CNES Press Release of 10 June 1996),
which has now submitted its report.

The report begins by presenting the causes of the failure, analysis
of the flight data having indicated:

-   nominal behaviour of the launcher up to H0 + 36 seconds;
-   simultaneous failure of the two inertial reference systems;
-   swivelling into the extreme position of the nozzles of the two
solid boosters and, slightly later, of the Vulcain engine, causing
the launcher to veer abruptly;
-   self-destruction of the launcher correctly triggered by rupture
of the electrical links between the solid boosters and the core
stage.

A chain of events, their inter-relations and causes have been
established, starting with the destruction of the launcher and
tracing back in time towards the primary cause.  These provide
the technical explanations for the failure of the 501 flight, which
lay in the flight control and guidance system.  A detailed account
is given in the report, which concludes:

"  The failure of Ariane 501 was caused by the complete loss of
guidance and attitude information 37 seconds after start of the
main engine ignition sequence (30 seconds after lift-off).  This
loss of information was due to specification and design errors in
the software of the inertial reference system.

  The extensive reviews and tests carried out during the Ariane
5 development programme did not include adequate analysis and
testing of the inertial reference system or of the complete flight
control system, which could have detected the potential failure."

Despite the series of tests and reviews carried out under the
programme, in the course of which thousands of corrections were
made, shortcomings in the system approach concerning the
software resulted in failure to detect the fault.  It is stressed that
the alignment function of the inertial reference system, which served
a purpose only before lift-off (but remained operative afterwards),
was not taken into account in the simulations and that the
equipment and system tests were not sufficiently representative.

Without implicating the system architecture, the report makes a
series of recommendations for ensuring that the launcher's
software operates correctly.  The Ariane 5 programme will be
taking action in line with all these recommendations, as follows:

-   correction of the problem in the SRI (inertial reference
system) that led to the accident;
-   reexamination of all software embedded in equipment;
-   improvement of the representativeness (vis-à-vis the launcher)
of the qualification testing environment;
-   introduction of overlaps and deliberate redundancy between
successive tests:
     .   at equipment level,
     .   at stage level,
     .   at system level;
-   improvement and systematisation of the two-way flow of
information:
     .   up from equipment to system:  nominal and failure-mode
behaviour;
     .   down from system to equipment:  use of equipment items
in flight.

More specifically, the following corrective measures will be
applied:

-   to the inertial reference system:
     .   switch-off or inhibition of the alignment function after
liftoff,
     .   analysis/modification of processing, particularly on
detection of a fault (no processor shutdown),
     .   testing to check the coverage of the SRI flight domain;

-   to the system qualification environment:
     .   general improvement of representativeness through
systematic use of real equipment and components wherever
possible,
     .   simulation of real trajectories on SRI electronics.

-   In addition, the following general measures will be taken:
     .   critical reappraisal of all software (flight program and
embedded software),
     .   review of mechanisms for managing double failures,
     .   improvement of facilities for acquisition and retrieval of
telemetry data,
     .   improvement of overall coordination relating to software.

The ESA Director General and CNES Chairman will be making
a joint presentation of the plan of action put into effect and its
programmatic consequences at a press conference in September.
-------------------------------------------------------------------

Hope this is useful. So basically it _was_ a software fault - the
software didn't ignore signals it was receiving after launch from a
system whose signals are only valid prior to launch.
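The release's corrective measure, "switch-off or inhibition of the alignment function after liftoff", amounts to gating a pre-launch-only computation on the flight phase. A minimal Python sketch, assuming a simple two-phase model; `Phase`, `alignment_step`, and the placeholder arithmetic are hypothetical.

```python
from enum import Enum, auto


class Phase(Enum):
    PRE_LAUNCH = auto()
    FLIGHT = auto()


def alignment_step(phase: Phase, raw_value: float):
    """Run the (hypothetical) alignment computation only before liftoff.

    After liftoff the function is inhibited: it returns None without using
    its input, so flight values never reach the alignment code.
    """
    if phase is not Phase.PRE_LAUNCH:
        return None
    return raw_value * 0.5  # placeholder for the real alignment computation


print(alignment_step(Phase.PRE_LAUNCH, 10.0))  # 5.0
print(alignment_step(Phase.FLIGHT, 1e9))       # None
```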

What I want to know is, who wrote that software, and if there was an
ESA representative responsible for it, who was he!

Not that I want to apportion blame of course, just interested!


Best Regards
John McCabe <john@assen.demon.co.uk>






* Re: Adriane crash
  1996-07-23  0:00 Adriane crash Jerry van Dijk
  1996-07-25  0:00 ` Ariane Crash (Was: Adriane crash) John McCabe
@ 1996-07-25  0:00 ` Peter Hermann
  1996-07-27  0:00   ` Jerry van Dijk
  1996-07-25  0:00 ` Steve O'Neill
  1996-07-26  0:00 ` David Verrier
  3 siblings, 1 reply; 14+ messages in thread
From: Peter Hermann @ 1996-07-25  0:00 UTC



Jerry van Dijk (jerry@jvdsys.nextjk.stuyts.nl) wrote:
: Dutch videotext had a topic this evening that said that ESA found that the 
: Adriana-5 lauch failed because the software of its guidance systems was 
  ^^    ^     nice typo    :-(                (for the Ariane)
--
Peter Hermann  Tel:+49-711-685-3611 Fax:3758 ph@csv.ica.uni-stuttgart.de
Pfaffenwaldring 27, 70569 Stuttgart Uni Computeranwendungen
Team Ada: "C'mon people let the world begin" (Paul McCartney)





* Re: Adriane crash
  1996-07-23  0:00 Adriane crash Jerry van Dijk
                   ` (2 preceding siblings ...)
  1996-07-25  0:00 ` Steve O'Neill
@ 1996-07-26  0:00 ` David Verrier
  3 siblings, 0 replies; 14+ messages in thread
From: David Verrier @ 1996-07-26  0:00 UTC



Jerry van Dijk wrote:

> Adriana-5 lauch failed because the software of its guidance systems was
> accidentally replaced by the Adriane-4 version.

I think you mean Ariane 5 :-)  The report is available at
http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html





* Re: Ariane Crash (Was: Adriane crash)
  1996-07-25  0:00 ` Ariane Crash (Was: Adriane crash) John McCabe
@ 1996-07-26  0:00   ` ++           robin
  1996-07-29  0:00     ` John McCabe
  1996-07-29  0:00     ` Bob Gilbert
  0 siblings, 2 replies; 14+ messages in thread
From: ++           robin @ 1996-07-26  0:00 UTC



	john@assen.demon.co.uk (John McCabe) writes:

	>JOINT ESA/CNES PRESS RELEASE N  33-96  -  Paris, 23 July 1996

	>Ariane 501 - Presentation of Inquiry Board report

	>-------------------------------------------------------------------

	>Hope this is useful. So basically it _was_ a software fault

---Is this a euphemism for a programming error?  because that's
what it was -- a programming error.

   The error was in assuming that a value would not overflow.
The specific error was that a conversion of a double-precision
floating-point value (~58 significant bits) to 15 significant
bits caused fixed-point overflow.  The conversion was not
checked for overflow.  It should have been.  This is, after all,
a real-time system.  It's a fundamental check that a programmer
experienced in real-time systems should have carried out.
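For comparison with the unchecked conversion described above, a protected conversion checks the range itself instead of letting an exception escape. This is a minimal Python sketch, assuming a 16-bit signed target (15 magnitude bits plus a sign bit, matching the "15 significant bits" above). Clamping to the limits and returning an `ok` flag is an illustrative policy chosen here, not the SRI's behaviour; whether saturation is acceptable for a given variable is a requirements question.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_saturating(x: float) -> tuple[int, bool]:
    """Convert to a 16-bit signed integer, clamping out-of-range inputs.

    Returns (value, ok). ok is False when the input was clamped or was NaN,
    so the caller can mark the datum as degraded instead of stopping.
    """
    if math.isnan(x):
        return 0, False
    if math.isinf(x):
        return (INT16_MAX if x > 0 else INT16_MIN), False
    n = round(x)
    if n > INT16_MAX:
        return INT16_MAX, False
    if n < INT16_MIN:
        return INT16_MIN, False
    return n, True


print(to_int16_saturating(1234.4))   # (1234, True)
print(to_int16_saturating(40000.0))  # (32767, False)
```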

   Control was then passed to the interrupt handler, which
shut down the system.

   The question is, basically, why was Ada used for this work?
PL/I has specific facilities for real-time programming,
and especially for simulating exactly this (and other)
exceptions -- as if the exceptions had actually occurred.
The SIGNAL statement is designed for this purpose.  The
programmer would have discovered this problem the FIRST time
he used it!  And he could have included an exception handler
for this and other similar kinds of trivial errors.  These
exception handlers would have returned control to the code.
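Python has no SIGNAL statement, but the testing idea described above (forcing the exception so the handler is exercised before flight) can be sketched with an injectable fault flag. The names `convert`, `process`, and `inject_fault` are hypothetical, and returning the last good value is one illustrative recovery policy, not a recommendation.

```python
class OperandError(Exception):
    """Hypothetical conversion exception."""


def convert(x: float, inject_fault: bool = False) -> int:
    """Round x to an integer; with inject_fault=True, raise as if it had overflowed."""
    if inject_fault:
        raise OperandError("simulated overflow")
    return round(x)


def process(x: float, last_good: int, inject_fault: bool = False) -> tuple[int, str]:
    """Convert x; on failure, return control with the last good value."""
    try:
        return convert(x, inject_fault), "ok"
    except OperandError:
        return last_good, "degraded"


# A normal run never reaches the handler; a forced fault exercises it.
print(process(100.0, last_good=90))                     # (100, 'ok')
print(process(100.0, last_good=90, inject_fault=True))  # (90, 'degraded')
```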

   A PL/I programmer and/or a real-time systems programmer
would have OBJECTED to the stupid requirement of shutting
down the system when a trivial error occurred.

	>What I want to know is, who wrote that software, and if there was an
	>ESA representative responsible for it, who was he!
	>Not that I want to apportion blame of course, just interested!

	>Best Regards John McCabe <john@assen.demon.co.uk>





* Re: Adriane crash
  1996-07-25  0:00 ` Adriane crash Peter Hermann
@ 1996-07-27  0:00   ` Jerry van Dijk
  0 siblings, 0 replies; 14+ messages in thread
From: Jerry van Dijk @ 1996-07-27  0:00 UTC



Peter Hermann (ica2ph@alpha1.csv.ica.uni-stuttgart.de) wrote:

: : Adriana-5 lauch failed because the software of its guidance systems was
:   ^^    ^     nice typo    :-(                (for the Ariane)

Yes, it seems GNAT is a better spelling checker than ispell :-)

-- 
-----------------------------------------------------------------------
--  Jerry van Dijk       --   e-mail: jerry@jvdsys.nextjk.stuyts.nl  --
--  Banking Consultant   --              Member Team-Ada             -- 
--  Ordina Finance BV    --    Located at Haarlem, The Netherlands   --





* Re: Ariane Crash (Was: Adriane crash)
  1996-07-26  0:00   ` ++           robin
  1996-07-29  0:00     ` John McCabe
@ 1996-07-29  0:00     ` Bob Gilbert
  1996-07-30  0:00       ` ++           robin
  1996-08-02  0:00       ` root
  1 sibling, 2 replies; 14+ messages in thread
From: Bob Gilbert @ 1996-07-29  0:00 UTC



In article <4ta1vu$m1u@goanna.cs.rmit.edu.au>, rav@goanna.cs.rmit.edu.au (++           robin) writes:
> 
> ---Is this a euphemism for a programming error?  because that's
> what it was -- a programming error.
> 
>    The error was in assuming that a value would not overflow.

The error was assuming that the Ariane 4 design would be adequate
for the Ariane 5 system.

> The specific error was that a conversion of a double-precision
> floating-point value (~58 significant bits) to 15 significant
> bits caused fixed-point overflow.  The conversion was not
> checked for overflow.  It should have been.

It was checked, hence the exception and an exception handler to
take corrective action.  Unfortunately the corrective action was
to assume that the SRI had failed and to shut it down.  The
software performed exactly as designed.

>  This is, after all,
> a real-time system.  It's a fundamental check that a programmer
> experienced in real-time systems should have carried out.
> 
>    Control was then passed to the interrupt handler, which
> shut down the system.

Exactly as designed.

>    The question is, basically, why was Ada used for this work?

The failure is not a language issue; this is not the question.

-Bob








* Re: Ariane Crash (Was: Adriane crash)
  1996-07-26  0:00   ` ++           robin
@ 1996-07-29  0:00     ` John McCabe
  1996-07-29  0:00     ` Bob Gilbert
  1 sibling, 0 replies; 14+ messages in thread
From: John McCabe @ 1996-07-29  0:00 UTC



rav@goanna.cs.rmit.edu.au (++           robin) wrote:

>	john@assen.demon.co.uk (John McCabe) writes:

>	>JOINT ESA/CNES PRESS RELEASE N  33-96  -  Paris, 23 July 1996

>	>Ariane 501 - Presentation of Inquiry Board report

>	>-------------------------------------------------------------------

>	>Hope this is useful. So basically it _was_ a software fault

>---Is this a euphemism for a programming error?  because that's
>what it was -- a programming error.

Having read the report, I don't consider it to be a programming error;
it was a design and management error. It sounds like whoever designed
the system didn't pay enough attention to the requirements, and
whoever was managing it didn't pay enough attention to its conformance
to the requirements.

I don't think the overflow was due to a programming oversight: the
analyses had been done and a decision not to check that variable had
been made (*see additional note below). And since that variable should
not have been in use at that point, I don't think you can blame whoever
wrote that code.

>   The error was in assuming that a value would not overflow.
>The specific error was that a conversion of a double-precision
>floating-point value (~58 significant bits) to 15 significant
>bits caused fixed-point overflow.  The conversion was not
>checked for overflow.  It should have been.  This is, after all,
>a real-time system.  It's a fundamental check that a programmer
>experienced in real-time systems should have carried out.

>   Control was then passed to the interrupt handler, which
>shut down the system.

>   The question is, basically, why was Ada used for this work?

ESA Ada preference/mandate(?).

<..snip..>

*Note: I hope this makes ESA look a bit closer at why they want to
limit processor loading and how the margin should be reduced through
the design and development phases. My own project has an ESA enforced
limit of 70% which is quite ridiculous given the equipment we're using
(GPS MA31750 10MHz MIL-STD-1750 processor). We cannot meet that, so we
have requested a waiver; I believe that's much better than
compromising the safety of the mission.

ESA's loading margins are really supposed to take account of a
requirement for future modifications to software once it has been
delivered. There's no way this should have been enforced for Ariane 5.
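The loading limits discussed in this thread (the 80% figure in Steve O'Neill's post, the 70% ESA limit on this poster's project) are budgets checked against estimated utilization. A minimal Python sketch, assuming load is estimated as the sum of execution time over period for periodic tasks; the task names and timings are invented for illustration only.

```python
def utilization(tasks: dict[str, tuple[float, float]]) -> float:
    """Estimate processor load as the sum of execution_time / period (same units)."""
    return sum(c / t for c, t in tasks.values())


def within_budget(tasks: dict[str, tuple[float, float]], limit: float) -> bool:
    """True if the estimated load does not exceed the limit (e.g. 0.80 for 80%)."""
    return utilization(tasks) <= limit


# Hypothetical task set: name -> (execution time in ms, period in ms).
tasks = {
    "navigation": (30.0, 100.0),
    "telemetry": (20.0, 100.0),
    "conversions": (25.0, 100.0),
}
# The same set with hypothetical range checks added to every conversion.
with_checks = {**tasks, "range_checks": (6.0, 100.0)}

print(round(utilization(tasks), 2))      # 0.75
print(within_budget(tasks, 0.80))        # True
print(within_budget(with_checks, 0.80))  # False
print(within_budget(tasks, 0.70))        # False
```

A plain sum of execution time over period ignores scheduling effects; it only shows how a fixed limit can push a designer toward leaving checks out.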


From the sound of the report, I think a pretty poor job has been done,
not by the programmers who wrote the code and performed the analysis
of which variables could safely be left unchecked, but by whoever
performed the requirements analysis and by all levels of management and
review above that, who have been completely negligent.

Best Regards
John McCabe <john@assen.demon.co.uk>






* Re: Ariane Crash (Was: Adriane crash)
  1996-07-29  0:00     ` Bob Gilbert
@ 1996-07-30  0:00       ` ++           robin
  1996-07-31  0:00         ` Bob Gilbert
  1996-08-02  0:00       ` root
  1 sibling, 1 reply; 14+ messages in thread
From: ++           robin @ 1996-07-30  0:00 UTC



	rgilbert@unconfigured.xvnews.domain (Bob Gilbert) writes:

	>In article <4ta1vu$m1u@goanna.cs.rmit.edu.au>, rav@goanna.cs.rmit.edu.au (++           robin) writes:
	>> 
	>> ---Is this a euphemism for a programming error?  because that's
	>> what it was -- a programming error.
	>> 
	>>    The error was in assuming that a value would not overflow.

	>The error was assuming that the Ariane 4 design would be adequate
	>for the Ariane 5 system.

	>> The specific error was that a conversion of a double-precision
	>> floating-point value (~58 significant bits) to 15 significant
	>> bits caused fixed-point overflow.  The conversion was not
	>> checked for overflow.  It should have been.

	>It was checked, hence the exception and an exception handler to
	>take corrective action.

---The SRI computer (& its backup) had an exception
handler, to be sure, but it did not have an exception
handler to take corrective action.  The exception handler
shut the computer down.

	> Unfortunately the corrective action was
	>to assume that the SRI had failed and to shut it down.  The
	>software performed exactly as designed.

---The software did not perform as designed.  It was
intended to shut down the computer only in the event of
a hardware error.  The software shut down the computer
because of a programming error.  The software performed
only as written!

	>>  This is, after all,
	>> a real-time system.  It's a fundamental check that a programmer
	>> experienced in real-time systems should have carried out.
	>> 
	>>    Control was then passed to the interrupt handler, which
	>> shut down the system.

	>Exactly as designed.

---Again, not as designed.  It was designed to shut down only
in the event that the SRI computer failed.  Then the backup
would be used.
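The intended redundancy scheme described here is a switchover: use the active unit until it fails, then the backup. A minimal Python sketch under that assumption; `Reading` and `select_navigation` are hypothetical. When both units are down the selector returns None rather than passing along whatever is on the bus, unlike the flight control behaviour described earlier in the thread, where bogus data was interpreted as good.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    value: float
    valid: bool  # False for diagnostic words or output from a failed unit


def select_navigation(active: Reading, backup: Reading) -> Optional[float]:
    """Prefer the active unit, fall back to the backup, else report no data."""
    if active.valid:
        return active.value
    if backup.valid:
        return backup.value
    return None  # caller must treat this as loss of navigation, not as data


print(select_navigation(Reading(1.0, True), Reading(2.0, True)))    # 1.0
print(select_navigation(Reading(9e9, False), Reading(2.0, True)))   # 2.0
print(select_navigation(Reading(9e9, False), Reading(9e9, False)))  # None
```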





* Re: Ariane Crash (Was: Adriane crash)
  1996-07-31  0:00         ` Bob Gilbert
@ 1996-07-31  0:00           ` William Clodius
  1996-08-01  0:00           ` ++           robin
  1 sibling, 0 replies; 14+ messages in thread
From: William Clodius @ 1996-07-31  0:00 UTC




One point I would like to emphasize is that the out-of-bounds error
occurred in a portion of the software that was not useful once launch
commenced. This has several implications:

1. Utilization of this software should have ceased as soon after
launch as possible, freeing computational resources as soon as
possible.

2. The effect of exception handling on processor utilization for this
portion of the software should have been important only during the
prelaunch phase, when I suspect processor utilization would have
been minimal.

3. The proper action to take in the event of an exception in this
portion of the software should be based on what the proper action
should be before launch.  I would not be surprised to discover that
the proper action would be to shut down the processor at that stage.
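Point 3 can be sketched as a handler whose reaction depends on the flight phase, assuming (as suspected above) that shutting down is acceptable before launch and that after launch a fault in the pre-launch-only function should simply disable that function. The phase names and reaction strings are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    PRE_LAUNCH = "pre-launch"
    FLIGHT = "flight"


def on_alignment_fault(phase: Phase) -> str:
    """Pick the reaction to an exception raised inside the pre-launch-only function."""
    if phase is Phase.PRE_LAUNCH:
        # Assumed: on the pad the launch can still be held, so stopping is safe.
        return "shut down processor"
    # In flight the function serves no purpose, so losing it costs nothing.
    return "disable alignment function, keep navigating"


for p in Phase:
    print(p.value, "->", on_alignment_fault(p))
```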


-- 

William B. Clodius		Phone: (505)-665-9370
Los Alamos National Laboratory	Email: wclodius@lanl.gov
Los Alamos, NM 87545





* Re: Ariane Crash (Was: Adriane crash)
  1996-07-30  0:00       ` ++           robin
@ 1996-07-31  0:00         ` Bob Gilbert
  1996-07-31  0:00           ` William Clodius
  1996-08-01  0:00           ` ++           robin
  0 siblings, 2 replies; 14+ messages in thread
From: Bob Gilbert @ 1996-07-31  0:00 UTC



In article <4tkfe5$did@goanna.cs.rmit.edu.au>, rav@goanna.cs.rmit.edu.au (++           robin) writes:
> 	rgilbert@unconfigured.xvnews.domain (Bob Gilbert) writes:
> 
> 	>The error was assuming that the Ariane 4 design would be adequate
> 	>for the Ariane 5 system.
> 
> 	>> The specific error was that a conversion of a double-precision
> 	>> floating-point value (~58 significant bits) to 15 significant
> 	>> bits caused fixed-point overflow.  The conversion was not
> 	>> checked for overflow.  It should have been.
> 
> 	>It was checked, hence the exception and an exception handler to
> 	>take corrective action.
> 
> ---The SRI computer (& its backup) had an exception
> handler, to be sure, but it did not have an exception
> handler to take corrective action.  The exception handler
> shut the computer down.

Which was the specified corrective action.

> 	> Unfortunately the corrective action was
> 	>to assume that the SRI had failed and to shut it down.  The
> 	>software performed exactly as designed.
> 
> ---The software did not perform as designed.  It was
> intended to shut down the computer only in the event of
> a hardware error.

The out-of-bounds data was considered to be indicative of a random hardware
fault, at least for the Ariane 4.  Perhaps this was not a valid method
of determining a hardware fault, but it was the design decision.

>  The software shut down the computer
> because of a programming error.  The software performed
> only as written!
> 
> 	>>  This is, after all,
> 	>> a real-time system.  It's a fundamental check that a programmer
> 	>> experienced in real-time systems should have carried out.
> 	>> 
> 	>>    Control was then passed to the interrupt handler, which
> 	>> shut down the system.
> 
> 	>Exactly as designed.
> 
> ---Again, not as designed.  It was designed to shut down only
> in the event that the SRI computer failed.  Then the backup
> would be used.

Again, the (wrongly assumed) SRI failure was determined by the detection 
of out of bounds data.  It was a requirements oversight, not a programming
oversight, and most certainly not influenced by the programming language used.

To quote the report:

  Although the source of the Operand Error has been identified, this in 
  itself did not cause the mission to fail. The specification of the 
  exception-handling mechanism also contributed to the failure. In the
  event of any kind of exception, the system specification stated that:
  the failure should be indicated on the databus, the failure context 
  should be stored in an EEPROM memory (which was recovered and read out
  for Ariane 501), and finally, the SRI processor should be shut down.

The last sentence of the above is what the requirements stated, and
exactly what the software did, exactly as designed.
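The specification quoted from the report maps almost line for line onto a handler. A minimal Python sketch, assuming the databus and EEPROM can be modelled as plain lists and that "shut down" means the unit stops producing output; `SRI`, `run_cycle`, and `FAILURE_WORD` are hypothetical names, and a `ZeroDivisionError` stands in for the Operand Error.

```python
class SRI:
    FAILURE_WORD = "FAILURE"  # hypothetical failure indication on the databus

    def __init__(self) -> None:
        self.databus = []
        self.eeprom = []
        self.running = True

    def run_cycle(self, compute, *args) -> None:
        """Run one computation and publish its result, applying the spec on any exception."""
        if not self.running:
            return
        try:
            self.databus.append(compute(*args))
        except Exception as exc:  # the spec covers "any kind of exception"
            self.databus.append(self.FAILURE_WORD)                           # 1. indicate failure on the databus
            self.eeprom.append({"error": type(exc).__name__, "args": args})  # 2. store the failure context
            self.running = False                                             # 3. shut the processor down


sri = SRI()
sri.run_cycle(lambda x: x * 2, 21)
sri.run_cycle(lambda x: 1 // x, 0)  # raises ZeroDivisionError
sri.run_cycle(lambda x: x * 2, 21)  # ignored: the unit is down
print(sri.databus)  # [42, 'FAILURE']
print(sri.eeprom)   # [{'error': 'ZeroDivisionError', 'args': (0,)}]
print(sri.running)  # False
```

Nothing in the handler asks whether the exception came from hardware or from a range assumption; it applies the same three steps to both.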


-Bob












* Re: Ariane Crash (Was: Adriane crash)
  1996-07-31  0:00         ` Bob Gilbert
  1996-07-31  0:00           ` William Clodius
@ 1996-08-01  0:00           ` ++           robin
  1 sibling, 0 replies; 14+ messages in thread
From: ++           robin @ 1996-08-01  0:00 UTC



	rgilbert@unconfigured.xvnews.domain (Bob Gilbert) writes:

	>In article <4tkfe5$did@goanna.cs.rmit.edu.au>, rav@goanna.cs.rmit.edu.au (++           robin) writes:
	>> 	rgilbert@unconfigured.xvnews.domain (Bob Gilbert) writes:
	>> 
	>> 	>The error was assuming that the Ariane 4 design would be adequate
	>> 	>for the Ariane 5 system.
	>> 
	>> 	>> The specific error was that a conversion of a double-precision
	>> 	>> floating-point value (~58 significant bits) to 15 significant
	>> 	>> bits caused fixed-point overflow.  The conversion was not
	>> 	>> checked for overflow.  It should have been.
	>> 
	>> 	>It was checked, hence the exception and an exception handler to
	>> 	>take corrective action.
	>> 
	>> ---The SRI computer (& its backup) had an exception
	>> handler, to be sure, but it did not have an exception
	>> handler to take corrective action.  The exception handler
	>> shut the computer down.

	>Which was the specified corrective action.

---Calling it "corrective" action is stretching the English
language a bit.  In no way, shape, or form was the
action "corrective".

	>> 	> Unfortunately the corrective action was
	>> 	>to assume that the SRI had failed and to shut it down.  The
	>> 	>software performed exactly as designed.
	>> 
	>> ---The software did not perform as designed.  It was
	>> intended to shut down the computer only in the event of
	>> a hardware error.

	>The out-of-bounds data was considered to be indicative of a random hardware
	>fault, at least for the Ariane 4.  Perhaps this was not a valid method
	>of determining a hardware fault, but it was the design decision.

---Please read what I wrote.  The overflow was not a hardware
fault.  It was a programming error that should not have occurred,
bearing in mind the "sudden death" nature of the shutdown in the
event of any kind of interrupt.

	>>  The software shut down the computer
	>> because of a programming error.  The software performed
	>> only as written!
	>> 
	>> 	>>  This is, after all,
	>> 	>> a real-time system.  It's a fundamental check that a programmer
	>> 	>> experienced in real-time systems should have carried out.
	>> 	>> 
	>> 	>>    Control was then passed to the interrupt handler, which
	>> 	>> shut down the system.
	>> 
	>> 	>Exactly as designed.
	>> 
	>> ---Again, not as designed.  It was designed to shut down only
	>> in the event that the SRI computer failed.  Then the backup
	>> would be used.

	>Again, the (wrongly assumed) SRI failure was determined by the detection 
	>of out of bounds data.  It was a requirements oversight, not a programming
	>oversight, and most certainly not influenced by the programming language used.

---If you make an assumption about the range of data,
and you are wrong, it is a programming error.

	>To quote the report:

	>  Although the source of the Operand Error has been identified, this in 
	>  itself did not cause the mission to fail. The specification of the 
	>  exception-handling mechanism also contributed to the failure. In the
	>  event of any kind of exception, the system specification stated that:
	>  the failure should be indicated on the databus, the failure context 
	>  should be stored in an EEPROM memory (which was recovered and read out
	>  for Ariane 501), and finally, the SRI processor should be shut down.

	>The last sentence of the above is what the requirements stated, and
	>exactly what the software did, exactly as designed.

---Again, the interrupt for fixed-point overflow was
not expected to happen.  The software DID NOT OPERATE
AS DESIGNED.  It failed.  You're placing too literal an
interpretation on the first sentence.





* Re: Ariane Crash (Was: Adriane crash)
  1996-07-29  0:00     ` Bob Gilbert
  1996-07-30  0:00       ` ++           robin
@ 1996-08-02  0:00       ` root
  1 sibling, 0 replies; 14+ messages in thread
From: root @ 1996-08-02  0:00 UTC



In article <4torim$ku8@goanna.cs.rmit.edu.au>,
	rav@goanna.cs.rmit.edu.au (++           robin) writes:
<snip>
>
>---Please read what I wrote.  The overflow was not a hardware
>fault.  It was a programming error that should not have occurred,
>bearing in mind the "sudden death" nature of the shutdown in the
>event of any kind of interrupt.

 ++robin, please read what the poster wrote ... he was describing a
 situation where, by spec, the event was deemed to indicate a hardware
 fault. We can all see clearly that it was not a hardware fault in this
 case; however, that does not relieve the s/w of its requirement to
 treat the event as indicative of a hardware fault.

 btw: A 'spec' is when a customer tells you what he thinks he wants.
      You may or may not agree with his interpretation of what he wants,
      but if you want the work, you promise to deliver what he SAYS! he
      wants - even if it is wrong - unless you can convince him to fix
      his wrong 'spec'. The embedded systems world uses 'spec' to
      define a 'design'; then the customer gets to piss in the design as well.

<snip>
>---If you make an assumption about the range of data,
>and you are wrong, it is a programming error.
>

 Unless the 'spec'/'design' require you to make that assumption ...

<snip>
>---Again, the interrupt for fixed-point overflow was
>not expected to happen.  The software DID NOT OPERATE
>AS DESIGNED.  It failed.  You're placing too literal an
>interpretation on the first sentence.

 I believe the report clearly indicates that software operated per design.
 The fault lies with adapting existing software to a new mission, without
 doing sufficient system engineering to see where the old design needed
 to be beefed up to meet the new mission!
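The system-engineering step being described (checking whether an old design's assumptions still hold for a new mission, in the spirit of the release's "simulation of real trajectories" measure) can be sketched as replaying a trajectory profile through the recorded range assumptions. Everything here is hypothetical: the variable name, the bounds, and both profiles are invented for illustration and are not Ariane data.

```python
# Hypothetical range assumptions recorded for the original mission.
ASSUMED_RANGES = {"converted_value": (-32768.0, 32767.0)}


def violations(profile: dict[str, list[float]]) -> list[tuple[str, int, float]]:
    """Return (variable, sample index, value) for every sample outside its assumed range."""
    found = []
    for name, samples in profile.items():
        low, high = ASSUMED_RANGES[name]
        for i, v in enumerate(samples):
            if not low <= v <= high:
                found.append((name, i, v))
    return found


old_mission = {"converted_value": [0.0, 8000.0, 16000.0, 20000.0]}
new_mission = {"converted_value": [0.0, 12000.0, 26000.0, 41000.0]}

print(violations(old_mission))  # []
print(violations(new_mission))  # [('converted_value', 3, 41000.0)]
```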

 Re: your favorite language & embedded systems ... is that all a troll,
     or what ?


                                           regards







end of thread

Thread overview: 14+ messages
1996-07-23  0:00 Adriane crash Jerry van Dijk
1996-07-25  0:00 ` Ariane Crash (Was: Adriane crash) John McCabe
1996-07-26  0:00   ` ++           robin
1996-07-29  0:00     ` John McCabe
1996-07-29  0:00     ` Bob Gilbert
1996-07-30  0:00       ` ++           robin
1996-07-31  0:00         ` Bob Gilbert
1996-07-31  0:00           ` William Clodius
1996-08-01  0:00           ` ++           robin
1996-08-02  0:00       ` root
1996-07-25  0:00 ` Adriane crash Peter Hermann
1996-07-27  0:00   ` Jerry van Dijk
1996-07-25  0:00 ` Steve O'Neill
1996-07-26  0:00 ` David Verrier
