comp.lang.ada
* Ariane 5 - not an exception?
@ 1996-07-25  0:00 Simon Bluck
  1996-07-26  0:00 ` JP Thornley
                   ` (5 more replies)
  0 siblings, 6 replies; 111+ messages in thread
From: Simon Bluck @ 1996-07-25  0:00 UTC



The Ariane 501 flight failure was due to the raising of an unexpected
Ada exception, which was handled by switching off the computer.  The
report on this:

   http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html

is clear and hard-hitting: it will result in much improved software.
But does it get right to the bottom of the issues, and does the
software community appreciate that there are fundamental software
control problems which can directly give rise to such enormous
failures, in this particular case thankfully without loss of life?

It is most unfortunate, but must be accepted as true, that if the
Ariane software had been written in a less powerful language the
numeric overflow might have gone unnoticed, the computers would have
remained switched on, and the rocket would have continued its upward
flight.

Exceptions and assertions are both used, in Ada and C/C++, to detect
software/hardware anomalies.  When one of these trips, it is
frequently very difficult for the designer to know how best to handle
the problem.  To continue may result in corrupt data; to abort is
drastic but eliminates the possibility that further processing will
compound the problem.

The more checks you have, the more likely it is that one of them will
trip.  If you can't think of good ways of handling these checks, the
end result, for the user, may well be very much worse than if the
check had never been performed in the first place.

Of the two handling options, neither is really acceptable.  However,
there is a third option which ought to be considered: to continue but
mark the processed data as suspect.

I.e. each data item would have a truth value of 1.0 for good data,
0.0 for absolutely rotten data, utilising values in between if you
have some idea how good the data is.  If you have numeric overflow,
you could set the data to the largest value available, and mark it as
suspect.

Any data further derived from suspect data must also be marked as
suspect.
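
A minimal sketch of this idea in Python (not anything taken from the
Ariane software).  The names Tagged and saturate, and the choice of 0.5
as the score for clamped data, are assumptions made for illustration.
Each value carries a truth score, overflow is handled by clamping and
lowering the score, and derived data inherits the worst score of its
inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A value plus a truth score: 1.0 is good data, 0.0 is rotten."""
    value: float
    truth: float = 1.0

    def __add__(self, other: "Tagged") -> "Tagged":
        # Derived data is never more trustworthy than its worst input.
        return Tagged(self.value + other.value, min(self.truth, other.truth))


def saturate(x: Tagged, limit: float, suspect: float = 0.5) -> Tagged:
    """Clamp x.value to [-limit, limit]; lower the truth score if clamped."""
    if abs(x.value) <= limit:
        return x
    clamped = limit if x.value > 0 else -limit
    return Tagged(clamped, min(x.truth, suspect))


# An out-of-range reading is clamped and marked suspect ...
reading = saturate(Tagged(40000.0), limit=32767.0)
# ... and anything computed from it is marked suspect too.
derived = reading + Tagged(10.0)
```

Taking the minimum is only one possible propagation rule; multiplying
the scores would be another.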

Taking a probabilistic attitude to data would bring a lot of software
into the real world where failures can happen at all levels.  Using
this approach would make complex mission-critical software like the
failing Ariane software much easier to understand and control.  Data
would be processed along the same path regardless of whether it is
suspect or entirely valid.  Only the end-users of the data would be
affected, and where duplication of systems provides redundancy, the
algorithm would be to switch to the backup on receiving suspect data,
and switch back to the main source if the backup was suspect.  If
both sources are suspect, then take the least suspect source.  This
is simple and you don't lose your vital input data.  The data truth
values would be passed on from system to system along with the data.
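
The switching rule described above could be sketched roughly as
follows.  This is an illustration only: Reading, select_source and the
threshold parameter are hypothetical names, and treating anything below
a truth score of 1.0 as "suspect" is an assumption.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    value: float
    truth: float  # 1.0 = good data, 0.0 = rotten data


def select_source(main: Reading, backup: Reading,
                  threshold: float = 1.0) -> Reading:
    """Pick which source to use, following the rule in the text."""
    # Use the main source whenever it is not suspect.
    if main.truth >= threshold:
        return main
    # Main is suspect: switch to the backup if the backup is good.
    if backup.truth >= threshold:
        return backup
    # Both are suspect: take the least suspect one (ties go to main).
    return main if main.truth >= backup.truth else backup
```

Because the choice is recomputed for every reading, the consumer
switches back to the main source by itself as soon as that source
stops being suspect.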

You _never_ switch off a computer, but you may have cause to mark all
data emanating from it as suspect.  Leave it up to the users of the
data to decide if they want to use it or not - they may have no
choice.


Along with the data truth attribute, you need a data type attribute.
This is tending to be relatively standard stuff now that objects are
around and need to know what kind of object they are.  But adding a
data type field is still something that designers skimp on if not
supplied by the language, relying instead on implicit coding of type
information in the senders and receivers of data.

Lack of type information accounts for why the Ariane flight control
was able to interpret diagnostic data as attitude data, virtually
guaranteeing catastrophic failure.  At least if attitude data had
been cut short it could have continued in a straight line.
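
As a rough illustration of the type attribute (a sketch with
hypothetical names Kind, Message and next_attitude, not the real
SRI/OBC interface): if every message says what kind of data it carries,
the receiver can refuse to treat a diagnostic dump as attitude data and
hold its last good value instead.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Kind(Enum):
    ATTITUDE = 1
    DIAGNOSTIC = 2


@dataclass(frozen=True)
class Message:
    kind: Kind                   # explicit type tag sent with the data
    payload: Tuple[float, ...]


def next_attitude(msg: Message,
                  last_good: Tuple[float, ...]) -> Tuple[float, ...]:
    """Return the attitude to steer by after receiving msg."""
    if msg.kind is Kind.ATTITUDE:
        return msg.payload
    # Anything else is not attitude data: keep the last known attitude.
    return last_good
```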


Well, those are what I think are the important lessons to be learned.
The main reasons cited for Ariane 501's failure are typical human
ones which will be made again on the next big project.  I.e.
inadequate testing, particularly of the complete system in its
(simulated) environment.  Surprise, surprise, this turns out to be
too difficult and too costly to achieve thoroughly.  And small system
mistakes which stress the adequate functioning of the system as a
whole (like thinking that the Ariane 4 alignment process didn't need
changing for Ariane 5).  These will happen time and again, we're only
human.  But with more realistic data processing the system as a whole
would stand a better chance of survival.

SimonB

[All my own opinions, of course.]






* Re: Ariane 5 - not an exception?
  1996-07-25  0:00 Simon Bluck
@ 1996-07-26  0:00 ` JP Thornley
  1996-07-29  0:00   ` Ken Garlington
                     ` (4 more replies)
  1996-07-26  0:00 ` Theodore E. Dennison
                   ` (4 subsequent siblings)
  5 siblings, 5 replies; 111+ messages in thread
From: JP Thornley @ 1996-07-26  0:00 UTC



In article: <Dv45EJ.8r@fsa.bris.ac.uk>  simonb@pact.srf.ac.uk (Simon 
Bluck) writes:
> 
> The Ariane 501 flight failure was due to the raising of an unexpected
> Ada exception, which was handled by switching off the computer.  The
> report on this:
> 
>    http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html
> 
> is clear and hard-hitting: it will result in much improved software.
> But does it get right to the bottom of the issues, 

Don't know about that until I get to read the full report (the above 
reference is to a press release about the report).

> ...... and does the
> software community appreciate that there are fundamental software
> control problems which can directly give rise to such enormous
> failures, in this particular case thankfully without loss of life?

Yup - that's why we accept coding rates that we haven't seen since all 
input and output was in reverse binary (and I'm not sure that we get 
even that).


> [snip]
> Exceptions and assertions are both used, in Ada and C/C++, to detect
> software/hardware anomalies.  When one of these trips, it is
> frequently very difficult for the designer to know how best to handle
> the problem.  To continue may result in corrupt data; to abort is
> drastic but eliminates the possibility that further processing will
> compound the problem.
> 

That's why the *software* designer must not make these decisions.  Any 
action in response to an unexpected event (corrupt data, out-of-range 
values, etc) affects the *system* behaviour and must be known about at 
the system level, so that the consequences can be taken into account in 
the system safety case.

> The more checks you have, the more likely it is that one of them will
> trip.  If you can't think of good ways of handling these checks, the
> end result, for the user, may well be very much worse than if the
> check had never been performed in the first place.
> 

My experience is with systems where all the code is compiled with  
checks suppressed.  This allows us to strip out the exception handling 
code from the run-time (a substantial simplification) and put in exactly 
the checks we want exactly where we want them.  (But I am aware of 
differences in approach by other people).

> Of the two handling options, neither is really acceptable.  However,
> there is a third option which ought to be considered: to continue but
> mark the processed data as suspect.
> 

Simon then goes on to describe a way of dealing with data validities 
that unfortunately breaks the most fundamental rule of safety-critical 
code - Keep It Simple.  It's an idea that might work with 
mission-critical code, but the thought of implementing it for 
safety-critical code (remembering that any one of these systems is 
probably handling in the range 200-500 pieces of data - each with its 
associated data validity) is beyond anything that I know how to tackle.

(and I've just realised that each of these 'truth values' and the data 
type information will require their own data validities - this gets 
even more complicated than I first thought)

Phil Thornley

-- 
------------------------------------------------------------------------
| JP Thornley    EMail jpt@diphi.demon.co.uk                           |
------------------------------------------------------------------------






* Re: Ariane 5 - not an exception?
  1996-07-25  0:00 Simon Bluck
                   ` (2 preceding siblings ...)
  1996-07-26  0:00 ` ++           robin
@ 1996-07-26  0:00 ` Bob Gilbert
  1996-07-29  0:00   ` Martin Tom Brown
  1996-07-27  0:00 ` Bill Angel
  1996-07-30  0:00 ` Dr. Richard Botting
  5 siblings, 1 reply; 111+ messages in thread
From: Bob Gilbert @ 1996-07-26  0:00 UTC



In article <Dv45EJ.8r@fsa.bris.ac.uk>, simonb@pact.srf.ac.uk (Simon Bluck) writes:
> The Ariane 501 flight failure was due to the raising of an unexpected
> Ada exception, which was handled by switching off the computer.

The failure was due to the reuse of software, a software design, and a software
requirements specification that were purposely designed to raise the said
exception and perform an SRI shutdown should certain limits be exceeded.
Unfortunately these limits were well within the normal operating envelope of
the Ariane 5, but exceeding them would be considered a failure (most probably
hardware) if encountered on the Ariane 4, from which the
software/design/requirements were borrowed.

>  The report on this:
> 
>    http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html
> 
> is clear and hard-hitting: it will result in much improved software.
> But does it get right to the bottom of the issues, and does the
> software community appreciate that there are fundamental software
> control problems which can directly give rise to such enormous
> failures, in this particular case thankfully without loss of life?
> 
> It is most unfortunate, but must be accepted as true, that if the
> Ariane software had been written in a less powerful language the
> numeric overflow might have gone unnoticed, the computers would have
> remained switched on, and the rocket would have continued its upward
> flight.

A "less powerful" language *might* have continued and the numeric
overflow gone unnoticed, or a similar catastrophic failure could
have occurred with little chance of tracing the source of the failure.
But even if a "less powerful" language were used, it is probable
that the resulting code would have been designed to the same faulty
logic contained in the requirements specification.

The Ariane failure has little or nothing to do with language selection
or even correct coding.  The software performed exactly as designed.
The problem resulted from an oversight in the design limitations and
differences between the Ariane 4 and Ariane 5 systems.  I think the 
lessons to be learned here are about the reuse of software (especially
design and requirements) and testing.

-Bob







* Re: Ariane 5 - not an exception?
  1996-07-25  0:00 Simon Bluck
  1996-07-26  0:00 ` JP Thornley
  1996-07-26  0:00 ` Theodore E. Dennison
@ 1996-07-26  0:00 ` ++           robin
  1996-07-29  0:00   ` Bill Angel
                     ` (6 more replies)
  1996-07-26  0:00 ` Bob Gilbert
                   ` (2 subsequent siblings)
  5 siblings, 7 replies; 111+ messages in thread
From: ++           robin @ 1996-07-26  0:00 UTC



	simonb@pact.srf.ac.uk (Simon Bluck) writes:

	>The Ariane 501 flight failure was due to the raising of an unexpected
	>Ada exception,

---An exception, yes, but not unexpected.

   The programming mistake made was in assuming that a
64-bit floating-point value would somehow "fit" into a
16-bit signed integer.

   There was no check that the data conversion would not
result in overflow, so the problem went to the error
handler, which shut down the system.

 	>which was handled by switching off the computer.  The
	>report on this:

	>   http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html

	>is clear and hard-hitting: it will result in much improved software.
	>But does it get right to the bottom of the issues, and does the
	>software community appreciate that there are fundamental software
	>control problems which can directly give rise to such enormous
	>failures, in this particular case thankfully without loss of life?

	>It is most unfortunate, but must be accepted as true, that if the
	>Ariane software had been written in a less powerful language the
	>numeric overflow might have gone unnoticed, the computers would have
	>remained switched on, and the rocket would have continued its upward
	>flight.

	>Exceptions and assertions are both used, in Ada and C/C++,

---and PL/I

	>to detect
	>software/hardware anomalies.  When one of these trips, it is
	>frequently very difficult for the designer to know how best to handle
	>the problem.

---Not in the case of a simple fixed-point overflow -- as was the
case with Ariane.  It is a fact that real-time programming
has been available in PL/I for some 30 years, and recovery
from errors is standard established practice.

	> To continue may result in corrupt data;

---To continue in this case probably would need the value to
be set to the maximum. And it wouldn't be corrupt data.

	>to abort is
	>drastic but eliminates the possibility that further processing will
	>compound the problem.

---What?  Here, the lack of further processing resulted in
destruction of the project!

	>The more checks you have, the more likely it is that one of them will
	>trip.  If you can't think of good ways of handling these checks, the
	>end result, for the user, may well be very much worse than if the
	>check had never been performed in the first place.

	>Of the two handling options, neither is really acceptable.  However,
	>there is a third option which ought to be considered: to continue but
	>mark the processed data as suspect.

There are other better approaches.  One is to continue
with the maximum value; another is to avoid the use of
a 16-bit variable, and to use a variable of the same
size and type (here floating-point storage),
thus avoiding the problem altogether.
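
The difference between "trap the overflow" and "continue with the
maximum value" can be shown with a small Python sketch.  The function
names are made up for illustration and nothing here is taken from the
flight code; the range is that of a 16-bit signed integer.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def checked_int16(x: float) -> int:
    """Convert x, raising an error if it is outside the 16-bit range."""
    if not (INT16_MIN <= x <= INT16_MAX):
        raise OverflowError(f"{x} does not fit in a 16-bit signed integer")
    return int(x)


def saturating_int16(x: float) -> int:
    """Convert x, clamping it to the nearest 16-bit limit on overflow."""
    return int(max(INT16_MIN, min(INT16_MAX, x)))
```

With the first function the caller must decide what to do with the
error; with the second the computation simply carries on with the
limit value.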

	>I.e. each data item would have a truth value of 1.0 for good data,
	>0.0 for absolutely rotten data, utilising values in between if you
	>have some idea how good the data is.  If you have numeric overflow,
	>you could set the data to the largest value available, and mark it as
	>suspect.

	>Any data further derived from suspect data must also be marked as
	>suspect.

	>Taking a probabilistic attitude to data would bring a lot of software
	>into the real world where failures can happen at all levels.  Using
	>this approach would make complex mission-critical software like the
	>failing Ariane software much easier to understand and control.  Data
	>would be processed along the same path regardless of whether it is
	>suspect or entirely valid.  Only the end-users of the data would be
	>affected, and where duplication of systems provides redundancy, the
	>algorithm would be to switch to the backup on receiving suspect data,
	>and switch back to the main source if the backup was suspect.

---In Ariane, both the active processor and the backup failed at
the same time, because it was a *programming* error that was
encountered at the same time in both processors, and both
processors were shut down at the same time by their respective
error handlers.

	>  If both sources are suspect, then take the least suspect source.  This
	>is simple and you don't lose your vital input data.  The data truth
	>values would be passed on from system to system along with the data.

	>You _never_ switch off a computer, but you may have cause to mark all
	>data emanating from it as suspect.  Leave it up to the users of the
	>data to decide if they want to use it or not - they may have no
	>choice.

---Indeed.

	>Along with the data truth attribute, you need a data type attribute.
	>This is tending to be relatively standard stuff now that objects are
	>around and need to know what kind of object they are.  But adding a
	>data type field is still something that designers skimp on if not
	>supplied by the language, relying instead on implicit coding of type
	>information in the senders and receivers of data.

	>Lack of type information accounts for why the Ariane flight control
	>was able to interpret diagnostic data as attitude data, virtually
	>guaranteeing catastrophic failure.  At least if attitude data had
	>been cut short it could have continued in a straight line.

---This is more of a lack of communication between the two
programs.  Another design error.

	>Well, those are what I think are the important lessons to be learned.

---I think the real lessons are that
1. real-time programming requires special expertise.
2. the choice of language is suspect.  A better-established
   language such as PL/I -- specifically designed for
   real-time programming -- with robust compilers, and
   with its base of experienced programming
   staff could well have prevented this disaster.





* Re: Ariane 5 - not an exception?
  1996-07-25  0:00 Simon Bluck
  1996-07-26  0:00 ` JP Thornley
@ 1996-07-26  0:00 ` Theodore E. Dennison
  1996-07-29  0:00   ` Ken Garlington
  1996-07-26  0:00 ` ++           robin
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 111+ messages in thread
From: Theodore E. Dennison @ 1996-07-26  0:00 UTC



Simon Bluck wrote:
> 
> It is most unfortunate, but must be accepted as true, that if the
> Ariane software had been written in a less powerful language the
> numeric overflow might have gone unnoticed, the computers would have
> remained switched on, and the rocket would have continued its upward
> flight.

If the Ariane software had been written in a less powerful language, 
the overflow might have gone unnoticed, while writing garbage in a 
nearby data/code location. This could easily have caused the exact
same result, with the important difference that the committee could
never have isolated the problem as well as they did.

> You _never_ switch off a computer, but you may have cause to mark all
> data emanating from it as suspect.  Leave it up to the users of the
> data to decide if they want to use it or not - they may have no
> choice.

Silly. If you _never_ switch over to the standby computer, there is
no point to having it, is there?

I could see the logic in rewriting the code on the backup machine
to try to continue with "best effort" data when an error is detected,
but I can't agree with the logic that the primary machine should 
continue to spit out data, even when it knows it has errors (and there
is a backup machine available), which is what the committee seems to
be suggesting.

-- 
T.E.D.          
                |  Work - mailto:dennison@escmail.orl.mmc.com  |
                |  Home - mailto:dennison@iag.net              |
                |  URL  - http://www.iag.net/~dennison         |





* Re: Ariane 5 - not an exception?
  1996-07-25  0:00 Simon Bluck
                   ` (3 preceding siblings ...)
  1996-07-26  0:00 ` Bob Gilbert
@ 1996-07-27  0:00 ` Bill Angel
  1996-07-30  0:00 ` Dr. Richard Botting
  5 siblings, 0 replies; 111+ messages in thread
From: Bill Angel @ 1996-07-27  0:00 UTC



In article <Dv45EJ.8r@fsa.bris.ac.uk>,
Simon Bluck <simonb@pact.srf.ac.uk> wrote:
>
>I.e. each data item would have a truth value of 1.0 for good data,
>0.0 for absolutely rotten data, utilising values in between if you
>have some idea how good the data is...Taking a probabilistic 
>attitude to data would bring a lot of software into the real world where
>failures can happen at all levels...

I believe that your idea is a very valid one.

It is so valid, that someone else has already thought of it!

You should check out the topic of "fuzzy logic" as it is applied to
industrial control. As you are suggesting, "fuzzy logic" allows one to
determine the membership of some measurement in the set comprised of "good
data". "Fuzzy logic" can also formulate a response when the input data
from various measurements yield conflicting directives as to how the
system should behave. If necessary, the response (to conflicting, out of
range, or inconsistent measurements) could be specified as a set of
directives with different precedence levels.  The directives could be
"heuristic" in nature and could have been  specified by human experts
during the analysis/design stage of the project. 
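
As a small illustration of "membership in the set of good data", here
is a trapezoidal membership function in Python.  The function name and
the four breakpoints are assumptions chosen for the example, not values
from any real controller.

```python
def good_membership(x: float, lo: float, lo_ok: float,
                    hi_ok: float, hi: float) -> float:
    """Degree (0.0 to 1.0) to which x belongs to the set of good data.

    Values in [lo_ok, hi_ok] are fully good, values at or beyond lo and
    hi are fully bad, and membership ramps linearly in between.
    """
    if x <= lo or x >= hi:
        return 0.0
    if lo_ok <= x <= hi_ok:
        return 1.0
    if x < lo_ok:
        return (x - lo) / (lo_ok - lo)   # rising edge
    return (hi - x) / (hi - hi_ok)       # falling edge
```

For example, with breakpoints 0, 2, 8 and 10, a measurement of 5 is
fully good, a measurement of 9 is half good, and a measurement of 11
is not good at all.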

 -- Bill Angel










* Re: Ariane 5 - not an exception?
  1996-07-26  0:00 ` Theodore E. Dennison
@ 1996-07-29  0:00   ` Ken Garlington
  0 siblings, 0 replies; 111+ messages in thread
From: Ken Garlington @ 1996-07-29  0:00 UTC



Theodore E. Dennison wrote:
> 
> I could see the logic in rewriting the code on the backup machine
> to try to continue with "best effort" data when an error is detected,
> but I can't agree with the logic that the primary machine should
> continue to spit out data, even when it knows it has errors (and there
> is a backup machine available), which is what the committee seems to
> be suggesting.

Note that this would require the backup computer to have different
software than the primary. That isn't the case now:

"There are two SRIs operating in parallel, with identical hardware and software.
One SRI is active and one is in "hot" stand-by, and if the OBC detects that the
active SRI has failed it immediately switches to the other one, provided that
this unit is functioning properly."

To keep the software identical in both, you would want both to attempt to
produce semi-valid data in the presence of errors. Having identical software
in both might be advantageous from an analysis standpoint.

> --
> T.E.D.
>                 |  Work - mailto:dennison@escmail.orl.mmc.com  |
>                 |  Home - mailto:dennison@iag.net              |
>                 |  URL  - http://www.iag.net/~dennison         |

-- 
LMTAS - "Our Brand Means Quality"





* Re: Ariane 5 - not an exception?
  1996-07-26  0:00 ` JP Thornley
@ 1996-07-29  0:00   ` Ken Garlington
  1996-07-29  0:00   ` JP Thornley
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 111+ messages in thread
From: Ken Garlington @ 1996-07-29  0:00 UTC



JP Thornley wrote:
> 
> In article: <Dv45EJ.8r@fsa.bris.ac.uk>  simonb@pact.srf.ac.uk (Simon
> Bluck) writes:
> >
> > The Ariane 501 flight failure was due to the raising of an unexpected
> > Ada exception, which was handled by switching off the computer.  The
> > report on this:
> >
> >    http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html
> >
> > is clear and hard-hitting: it will result in much improved software.
> > But does it get right to the bottom of the issues,
> 
> Don't know about that until I get to read the full report (the above
> reference is to a press release about the report).

There is a pointer to the full report at the bottom of the press release.

-- 
LMTAS - "Our Brand Means Quality"





* Re: Ariane 5 - not an exception?
  1996-07-29  0:00   ` Bill Angel
@ 1996-07-29  0:00     ` Paul_Green
  1996-07-30  0:00     ` Lloyd Fischer
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 111+ messages in thread
From: Paul_Green @ 1996-07-29  0:00 UTC



In article <4tiu6e$kpm@news2.cais.com>, wtangel@cais3.cais.com (Bill Angel) 
writes:

>
>In article <4t9vdg$jfb@goanna.cs.rmit.edu.au>,
>++           robin <rav@goanna.cs.rmit.edu.au> wrote:
>>In Ariane, both the active processor and the backup failed at
>>the same time, because it was a *programming* error that was
>>encountered at the same time in both processors, and both
>>processors were shut down at the same time by their respective
>>error handlers.
>
>	I am under the impression that for the US manned spaceflight
>program (to get to the moon), an on-board computer that was serving as a
>backup to the primary computer would have been performing its computations
>using completely different software than the primary computer. By
>utilizing this methodology, the same software "glitch" would not halt both
>systems simultaneously.  Perhaps a group of software developers could be
>tasked with producing a version of the on-board software for Ariane in a
>different computer language than that used by the primary processor. The
>two processors, running simultaneously, would serve to check each other's
>results with greater independence than they apparently do now.
>
> -- Bill Angel

Two doesn't do you much good. Who do you believe when they disagree? The 
fault-tolerant designs I'm aware of use at least 3 computers (so-called triple 
modular redundancy). Stratus happens to use 4. The US space shuttle uses 5. 
There is no reason you can't use even more. Ever heard of the Byzantine 
Generals problem?  How does a group of generals make decisions based on the 
true consensus of the group despite the presence in their midst of a traitor?  
If you can solve this problem, you can build a fault-tolerant computer.

Last I knew, the shuttle had 4 computers programmed by one group and 1 computer 
programmed by a separate group. But this is so expensive to do that I think 
they only use this technique for the takeoff/landing phases. Even then, I 
suspect that the 5th computer is really there only in case the first 4 fail 
utterly. But perhaps someone who works on this can tell us for sure.
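
To make the "who do you believe" point concrete, here is a minimal
majority voter in Python.  It is only a sketch: majority_vote is a
made-up name, it compares outputs for exact equality, and it is plain
voting, not a solution to the Byzantine Generals problem.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by more than half the computers, or None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority is required; otherwise there is no winner.
    return value if count * 2 > len(outputs) else None
```

With two computers that disagree, the result is None, which is the
problem described above; with three, a single faulty unit is outvoted.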


Paul Green                  | Mail: Paul_Green@stratus.com
Senior Technical Consultant | Voice: +1 508-460-2557 FAX: +1 508-460-0397
Stratus Computer, Inc.      | Video: PictureTel/AT&T by request.
Marlboro, MA 01752          | Disclaimer: I speak for myself, not Stratus.





* Re: Ariane 5 - not an exception?
  1996-07-26  0:00 ` ++           robin
@ 1996-07-29  0:00   ` Bill Angel
  1996-07-29  0:00     ` Paul_Green
                       ` (6 more replies)
  1996-07-30  0:00   ` Ken Garlington
                     ` (5 subsequent siblings)
  6 siblings, 7 replies; 111+ messages in thread
From: Bill Angel @ 1996-07-29  0:00 UTC




In article <4t9vdg$jfb@goanna.cs.rmit.edu.au>,
++           robin <rav@goanna.cs.rmit.edu.au> wrote:
>In Ariane, both the active processor and the backup failed at
>the same time, because it was a *programming* error that was
>encountered at the same time in both processors, and both
>processors were shut down at the same time by their respective
>error handlers.

	I am under the impression that for the US manned spaceflight
program (to get to the moon), an on-board computer that was serving as a
backup to the primary computer would have been performing its computations
using completely different software than the primary computer. By
utilizing this methodology, the same software "glitch" would not halt both
systems simultaneously.  Perhaps a group of software developers could be
tasked with producing a version of the on-board software for Ariane in a
different computer language than that used by the primary processor. The
two processors, running simultaneously, would serve to check each other's
results with greater independence than they apparently do now.

 -- Bill Angel





* Re: Ariane 5 - not an exception?
  1996-07-26  0:00 ` JP Thornley
  1996-07-29  0:00   ` Ken Garlington
@ 1996-07-29  0:00   ` JP Thornley
  1996-07-29  0:00   ` Nigel Tzeng
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 111+ messages in thread
From: JP Thornley @ 1996-07-29  0:00 UTC



In article: <285641259wnr@diphi.demon.co.uk>  JP Thornley 
<jpt@diphi.demon.co.uk> writes:
> In article: <Dv45EJ.8r@fsa.bris.ac.uk>  simonb@pact.srf.ac.uk (Simon 
> Bluck) writes:
> > 
> > The Ariane 501 flight failure was due to the raising of an unexpected
> > Ada exception, which was handled by switching off the computer.  The
> > report on this:
> > 
> >    http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html
> > 
> > is clear and hard-hitting: it will result in much improved software.
> > But does it get right to the bottom of the issues, 
> 
> Don't know about that until I get to read the full report (the above 
> reference is to a press release about the report).
> 

Ah well, goofed on that one - the printed report has no more info than 
the reference above.  

To me the big lesson is not the various technical issues, but the 
statement that "the view had been taken that software should be 
considered correct until it is shown to be at fault".  This seems quite 
amazing.

The report also describes the software as "mission critical", which in 
my terminology suggests a much lower dependability of software than 
safety-critical.  Even though there were no crew at risk I would have 
expected the enormous financial cost of a failure to push the software 
into the safety-critical area.

Phil Thornley
-- 
------------------------------------------------------------------------
| JP Thornley    EMail jpt@diphi.demon.co.uk                           |
------------------------------------------------------------------------






* Re: Ariane 5 - not an exception?
  1996-07-26  0:00 ` Bob Gilbert
@ 1996-07-29  0:00   ` Martin Tom Brown
  1996-07-30  0:00     ` John McCabe
  0 siblings, 1 reply; 111+ messages in thread
From: Martin Tom Brown @ 1996-07-29  0:00 UTC



In article <4tb8vv$bna@zeus.orl.mmc.com>
           rgilbert@unconfigured.xvnews.domain "Bob Gilbert" writes:

> In article <Dv45EJ.8r@fsa.bris.ac.uk>, simonb@pact.srf.ac.uk (Simon Bluck) writes:
> > The Ariane 501 flight failure was due to the raising of an unexpected
> > Ada exception, which was handled by switching off the computer.
> 
> The failure was due to the reuse of software, a software design, and software
> requirements specification which was purposely designed to raise the said
> exception and perform a SRI shutdown should certain limits be exceeded. 

Yes - one can argue about the reasonableness of killing the active SRI
when the backup has already failed.  And also about the decision made
during s/w development that, of 7 variables at risk of overflow, only
4 were fully protected; the other 3 were left unprotected, having been
judged not at risk (for Ariane 4 flight trajectories), to keep the CPU
load within its 80% bound.

It adds insult to injury that the calculation code which failed was 
only valid when the system was on the *ground* awaiting launch.
It's a very good report and makes interesting reading.

> Unfortunately these limits were well within normal operating budget of the
> Ariane 5, but would be considered a failure (most probably hardware) if 
> encountered on the Ariane 4 from which the software/design/requirements
> were borrowed.
> 
> >  The report on this:
> > 
> >    http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html
> > 

> The Ariane failure has little or nothing to do with language selection
> or even correct coding.  The software performed exactly as designed.

Again you can question the wisdom of shutting down the active SRI
when the backup is already dead - continuing with best guess data
would have been a marginally better option at that stage. 
The defect is traceable back to the requirements specification, and
the assumption that single point hardware failure is more likely 
than systematic software error in a redundant system.

> The problem resulted in an oversight in the design limitations and
> differences between the Ariane 4 and Ariane 5 systems.  I think the 
> lessons to be learned here are about the reuse of software (especially
> design and requirements) and testing.

The report's recommendations seem generally applicable to a large
number of systems engineering projects with a software component.

Regards,
-- 
Martin Brown  <martin@nezumi.demon.co.uk>     __                CIS: 71651,470
Scientific Software Consultancy             /^,,)__/





* Re: Ariane 5 - not an exception?
  1996-07-26  0:00 ` JP Thornley
  1996-07-29  0:00   ` Ken Garlington
  1996-07-29  0:00   ` JP Thornley
@ 1996-07-29  0:00   ` Nigel Tzeng
  1996-07-30  0:00   ` Robert I. Eachus
  1996-08-01  0:00   ` Ken Garlington
  4 siblings, 0 replies; 111+ messages in thread
From: Nigel Tzeng @ 1996-07-29  0:00 UTC (permalink / raw)



In article <285641259wnr@diphi.demon.co.uk>,
JP Thornley  <jpt@diphi.demon.co.uk> wrote:
>In article: <Dv45EJ.8r@fsa.bris.ac.uk>  simonb@pact.srf.ac.uk (Simon 
>Bluck) writes:

[snip] 

>> Of the two handling options, neither is really acceptable.  However,
>> there is a third option which ought to be considered: to continue but
>> mark the processed data as suspect.
>> 
>
>Simon then goes on to describe a way of dealing with data validities 
>that unfortunately breaks the most fundamental rule of safety-critical 
>code - Keep It Simple.  It's an idea that might work with 
>mission-critical code, but the thought of implementing it for 
>safety-critical code (remembering that any one of these systems is 
>probably handling in the range 200-500 pieces of data - each with its 
>associated data validity) is beyond anything that I know how to tackle.
>
>(and I've just realised that each of these 'truth values' and the data 
>type information will require their own data validities - this gets 
>even more complicated than I first thought)

Actually we do this all the time...on the ground.  Generating "truth
values" isn't very different (if I understand it correctly) from doing
limit checking and detection of stale data in the telemetry stream.

Now the primary caveat is that on the ground we have tons of CPU to
throw at the problem.  Not something you can do with a 1750A or even a
386.

Actually if you flag the data when an overflow exception is generated
it won't be too bad at all...hmmm...and it gives all downstream
processes visibility in which data point(s) went bad.  Not as good as
true limit checking but much much faster.

It would be annoying to shoehorn this into legacy code but if you're
doing something from scratch...you need a reasonably transparent way
of associating data points with truth values.  That could be as simple
as creating your own int and float classes.  I won't hazard a guess at
what performance hits you'd take though...and there are simpler ways
(already suggested elsewhere in this thread) to solve this particular
problem and it doesn't address the higher level problem of the
failover mechanism from primary to backup.
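
As a rough illustration of the "own int and float classes" idea, a wrapper
type can carry a validity flag that every downstream calculation inherits.
This is only a sketch, not anything taken from the flight software: the
class name Flagged, the helper to_int16, the 16-bit limits and the simple
good/suspect flag are all assumptions made for the example.

```python
from dataclasses import dataclass

INT16_MIN, INT16_MAX = -32768, 32767   # assumed target range for this sketch


@dataclass(frozen=True)
class Flagged:
    """A value plus a validity flag (True = trusted, False = suspect)."""
    value: float
    valid: bool = True

    def __add__(self, other: "Flagged") -> "Flagged":
        # A result is only as trustworthy as its least trustworthy input.
        return Flagged(self.value + other.value, self.valid and other.valid)

    def __mul__(self, other: "Flagged") -> "Flagged":
        return Flagged(self.value * other.value, self.valid and other.valid)


def to_int16(x: Flagged) -> Flagged:
    """Convert to the 16-bit range; on overflow, clamp and mark as suspect."""
    n = round(x.value)
    if INT16_MIN <= n <= INT16_MAX:
        return Flagged(n, x.valid)
    clamped = max(INT16_MIN, min(INT16_MAX, n))
    return Flagged(clamped, False)
```

Anything computed from a suspect value is itself suspect, so a consumer
only needs to look at the flag on the final result to know whether a bad
data point was involved somewhere upstream.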

>Phil Thornley

Nigel






* Re: Ariane 5 - not an exception?
  1996-07-26  0:00 ` ++           robin
  1996-07-29  0:00   ` Bill Angel
@ 1996-07-30  0:00   ` Ken Garlington
  1996-08-02  0:00     ` Craig P. Beyers
  1996-07-30  0:00   ` Steve O'Neill
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 111+ messages in thread
From: Ken Garlington @ 1996-07-30  0:00 UTC (permalink / raw)



++ robin wrote:
> 
>         simonb@pact.srf.ac.uk (Simon Bluck) writes:
> 
>         >The Ariane 501 flight failure was due to the raising of an unexpected
>         >Ada exception,
> 
> ---An exception, yes, but not unexpected.
> 
>    The programming mistake made was in assuming that a
> floating-point value of some 58 significant bits would
> somehow "fit" into a 15-bit integer.

Actually, you can make values with far more than 58 significant bits
fit into a 15-bit (or 16-bit) integer. You just have to throw
out enough precision!
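
A small sketch of what "throwing out precision" can look like: send the
value as a 16-bit count of coarser units. The scale factor lsb and both
function names are assumptions for illustration only.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def encode_scaled(x: float, lsb: float) -> int:
    """Encode x as a 16-bit count of lsb-sized steps, rounding to nearest."""
    n = round(x / lsb)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError("value exceeds the range available at this scale")
    return n


def decode_scaled(n: int, lsb: float) -> float:
    """Recover the approximate value; the error is about lsb / 2 at most."""
    return n * lsb
```

A larger lsb buys range at the cost of resolution, so the scale has to be
chosen against the largest value the sender can ever produce.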

>    There was no check that the data conversion would not
> result in overflow, so the problem went to the error
> handler, which shut down the system.

This is certainly true.

> ---To continue in this case probably would need the value to
> be set to the maximum. And it wouldn't be corrupt data.

No. But it might still generate an incorrect vector to the flight
controls, which would still have crashed the vehicle. It's just that
the IRS would be spewing data right up to that last "hard landing."
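
To make the two behaviours concrete, here is a minimal sketch of a
range-checked conversion next to a saturating one. The function names are
made up for the example, and a Python OverflowError merely stands in for
the exception a checked conversion would raise.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def checked_int16(x: float) -> int:
    """Range-checked conversion: refuse values outside the 16-bit range."""
    n = round(x)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x!r} does not fit in 16 bits")
    return n


def saturating_int16(x: float) -> int:
    """Saturating conversion: out-of-range values stick at the nearest limit."""
    return max(INT16_MIN, min(INT16_MAX, round(x)))
```

With an input of 50000.0 the saturating version quietly returns 32767:
in range, but 17233 counts short of the real value, and nothing
downstream can tell the difference unless it is flagged separately.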

> There are other better approaches.  One is to continue
> with the maximum value; another is to avoid the use of
> a 16-bit variable, and to use a variable as the same
> size and type (here floating-point storage),
> thus avoiding the problem altogether.

Not always easy to do. Do much MIL-STD-1553 processing between
dissimilar CPUs?

> ---This is more of a lack of communication between the two
> programs.  Another design error.

Actually, the amount of communication between a primary and a
backup system is another tough system problem. We went through
this on the F-16. In general, the backup shouldn't trust state
data from the primary, since this can create a common mode failure.
On the other hand, with _no_ state data, the backup may be unable
to take over from the primary. Add to this the desire to keep the
backup software identical to the primary, to reduce the amount of
unique software to analyze and test, and it's a non-trivial thought
process.

> ---I think the real lessons are that
> 1. real-time programming requires special expertise.

Amen to this. Safety-critical real-time programming even more so.

> 2. the choice of language is suspect.  A better-established
>    language such as PL/I -- specifically designed for
>    real-time programming -- with robust compilers, and
>    with its base of experienced programming
>    staff could well have prevented this disaster.

Every claim you just made for PL/I can amply be made for Ada.

Am I using anything you've built in PL/I?

-- 
LMTAS - "Our Brand Means Quality"





* Re: Ariane 5 - not an exception?
  1996-07-29  0:00   ` Bill Angel
  1996-07-29  0:00     ` Paul_Green
  1996-07-30  0:00     ` Lloyd Fischer
@ 1996-07-30  0:00     ` Ken Garlington
  1996-07-30  0:00     ` Bob Kurtz
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 111+ messages in thread
From: Ken Garlington @ 1996-07-30  0:00 UTC (permalink / raw)



Bill Angel wrote:
> 
>         I am under the impression that for the US manned spaceflight
> program (to get to the moon) ,an on-board computer that was serving as a
> backup to the primary computer would have been performing its computations
> using completely different software than the primary computer. By
> utilizing this methodology, the same software "glitch" would not halt both
> systems simultaneously.  Perhaps a group of software developers could be
> tasked with producing a version of the on-board software for Ariane in a
> different computer language than that used by the primary processor. The
> two processors, running simultaneously, would serve to check each other's
> results with greater independence that they apparently do now.

The problem with N-version programming is the common mode requirements error.
See Dr. Leveson's work, which fits my own experience. In this case, if both
systems had the same problems with throughput margins, and the same basic
information regarding the need to protect calculations was available to both
teams, they very likely would have chosen the same (wrong) design solution.

-- 
LMTAS - "Our Brand Means Quality"





* Re: Ariane 5 - not an exception?
  1996-07-26  0:00 ` JP Thornley
                     ` (2 preceding siblings ...)
  1996-07-29  0:00   ` Nigel Tzeng
@ 1996-07-30  0:00   ` Robert I. Eachus
  1996-07-31  0:00     ` JP Thornley
  1996-08-01  0:00   ` Ken Garlington
  4 siblings, 1 reply; 111+ messages in thread
From: Robert I. Eachus @ 1996-07-30  0:00 UTC (permalink / raw)



In article <483202904wnr@diphi.demon.co.uk> JP Thornley <jpt@diphi.demon.co.uk> writes:

  > The report also describes the software as "mission critical", which in 
  > my terminology suggests a much lower dependability of software than 
  > safety-critical.  Even though there were no crew at risk I would have 
  > expected the enormous financial cost of a failure to push the software 
  > into the safety-critical area.

   First, I think of mission critical as a different category than
safety critical.  In safety critical systems, fail safe is often an
option, whereas in mission critical systems you need to fail operational.
And yes, systems can be safety AND mission critical.  Those are the
expensive ones.

   Having said that, this software should have been classed exactly
that way, given the amount of miscellaneous missile parts that ended up
scattered over the launch site, and the possibility that a guidance
failure could put the missile anywhere in the world.



--

					Robert I. Eachus

with Standard_Disclaimer;
use  Standard_Disclaimer;
function Message (Text: in Clever_Ideas) return Better_Ideas is...





* Re: Ariane 5 - not an exception?
  1996-07-29  0:00   ` Bill Angel
                       ` (2 preceding siblings ...)
  1996-07-30  0:00     ` Ken Garlington
@ 1996-07-30  0:00     ` Bob Kurtz
  1996-07-30  0:00     ` Nancy Mead
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 111+ messages in thread
From: Bob Kurtz @ 1996-07-30  0:00 UTC (permalink / raw)



In article <4tiu6e$kpm@news2.cais.com>, wtangel@cais3.cais.com (Bill
Angel) wrote:

> In article <4t9vdg$jfb@goanna.cs.rmit.edu.au>,
> ++           robin <rav@goanna.cs.rmit.edu.au> wrote:
> >In Ariane, both the active processor and the backup failed at
> >the same time, because it was a *programming* error that was
> >encountered at the same time in both processors, and both
> >processors were shut down at the same time by their respective
> >error handlers.
> 
>         I am under the impression that for the US manned spaceflight
> program (to get to the moon) ,an on-board computer that was serving as a
> backup to the primary computer would have been performing its computations
> using completely different software than the primary computer. By
> utilizing this methodology, the same software "glitch" would not halt both
> systems simultaneously.  Perhaps a group of software developers could be
> tasked with producing a version of the on-board software for Ariane in a
> different computer language than that used by the primary processor. The
> two processors, running simultaneously, would serve to check each other's
> results with greater independence that they apparently do now.
> 
>  -- Bill Angel

A number of people have posted on this topic, about the space shuttle, the
A340, etc.  And I'll admit, there is some value (though expensive) to this
approach.  But...

Very frequently, errors are introduced into large software systems in the
requirements specification and design phases.  These are often problems
that dwarf mere coding errors, and are much more difficult to detect
("Gosh, it all passed unit testing okay...").  The Ariane 501 failure is,
in my opinion, an error of this type.  The root cause of the inertial
navigation system failure was the support for an Ariane 4 alignment
requirement not valid for Ariane 5, along with Ariane 4 trajectory
constraints also not valid for Ariane 5.

This sounds like a serious requirements/design oversight.  It's not clear
that having several developer teams working independently would not result
in two completely different programs that exhibit the same disastrous
behavior.

-- 
Bob Kurtz (kurtz@mustang.nrl.navy.mil)
Hughes STX Corp., US Naval Research Lab, Washington DC





* Re: Ariane 5 - not an exception?
  1996-07-30  0:00   ` David Weller
@ 1996-07-30  0:00     ` Robert Dewar
  0 siblings, 0 replies; 111+ messages in thread
From: Robert Dewar @ 1996-07-30  0:00 UTC (permalink / raw)



David Weller said

"C/C++ assertions are not part of the language, but part of a separate
library (assert.h)."

Is this true? I thought that at least the draft standard for C++
included this library. I don't know about ANSI C?

Libraries that are included in the ANSI C standard or C++ DIS are every
bit as much a part of the language as the if statement -- the same is true
of course for Annex A in the Ada 95 standard.






* Re: Ariane 5 - not an exception?
  1996-07-30  0:00     ` Richard Shetron
@ 1996-07-30  0:00       ` ++           robin
  0 siblings, 0 replies; 111+ messages in thread
From: ++           robin @ 1996-07-30  0:00 UTC (permalink / raw)



	multics@wizvax.wizvax.net (Richard Shetron) writes:

	>In article <4tiu6e$kpm@news2.cais.com>,
	>Bill Angel <wtangel@cais3.cais.com> wrote:

	>>In article <4t9vdg$jfb@goanna.cs.rmit.edu.au>,
	>>++           robin <rav@goanna.cs.rmit.edu.au> wrote:
	>>>In Ariane, both the active processor and the backup failed at
	>>>the same time, because it was a *programming* error that was
	>>>encountered at the same time in both processors, and both
	>>>processors were shut down at the same time by their respective
	>>>error handlers.

	>>	I am under the impression that for the US manned spaceflight
	>>program (to get to the moon) ,an on-board computer that was serving as a
	>>backup to the primary computer would have been performing its computations
	>>using completely different software than the primary computer. By
	>>utilizing this methodology, the same software "glitch" would not halt both
	>>systems simultaneously.  Perhaps a group of software developers could be
	>>tasked with producing a version of the on-board software for Ariane in a
	>>different computer language than that used by the primary processor. The
	>>two processors, running simultaneously, would serve to check each other's
	>>results with greater independence that they apparently do now.

	>I've been told that the shuttle uses 5 computers with software developed
	>by 3 independent programming groups.  A best 2 out of 3 is used to
	>determine which software/hardware is operating properly.

---Ariane's SRI computer (for processing sensor inputs)
had a backup running an identical program.  That's why
they both experienced the same fixed-point overflow in
the same place  at the same time, with the same data.
And that's why both shut down almost simultaneously.
(As you now know, any trivial error resulted in "sudden
death".  No room to maneuver.)

   The main computer (the OBC = On-Board Computer) also
had a backup.

   That's 4 computers.

   It's all in the report.





* Re: Ariane 5 - not an exception?
  1996-07-30  0:00 ` Dr. Richard Botting
@ 1996-07-30  0:00   ` David Weller
  1996-07-30  0:00     ` Robert Dewar
  0 siblings, 1 reply; 111+ messages in thread
From: David Weller @ 1996-07-30  0:00 UTC (permalink / raw)



In article <4tjr76$ktj@news.csus.edu>,
Dr. Richard Botting <dick@silicon.csci.csusb.edu> wrote:
>Simon Bluck at University of Bristol, England asserted incorrectly:
>>>Exceptions and assertions are both used, in Ada and C/C++
>Ada has no assertions.

	Bzzt!  Wrong, but thank you for playing.  GNAT, by far the
	most popular Ada 95 compiler, supports 'pragma Assert'.

>C/C++ assertions are a debugging aid.
>

C/C++ assertions are not part of the language, but part of a separate
library (assert.h).  

That being said, assertions are indeed a useful debugging aid -- in
both Ada 95 and C :-)


-- 
    Visit the Ada 95 Booch Components Homepage: www.ocsystems.com/booch
           This is not your father's Ada -- lglwww.epfl.ch/Ada





* Re: Ariane 5 - not an exception?
  1996-07-29  0:00   ` Bill Angel
  1996-07-29  0:00     ` Paul_Green
@ 1996-07-30  0:00     ` Lloyd Fischer
  1996-07-30  0:00     ` Ken Garlington
                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 111+ messages in thread
From: Lloyd Fischer @ 1996-07-30  0:00 UTC (permalink / raw)



Bill Angel wrote:
> 
> In article <4t9vdg$jfb@goanna.cs.rmit.edu.au>,
> ++           robin <rav@goanna.cs.rmit.edu.au> wrote:
> >In Ariane, both the active processor and the backup failed at
> >the same time, because it was a *programming* error that was
> >encountered at the same time in both processors, and both
> >processors were shut down at the same time by their respective
> >error handlers.
> 
>         I am under the impression that for the US manned spaceflight
> program (to get to the moon) ,an on-board computer that was serving as a
> backup to the primary computer would have been performing its computations
> using completely different software than the primary computer. By
> utilizing this methodology, the same software "glitch" would not halt both
> systems simultaneously.  Perhaps a group of software developers could be
> tasked with producing a version of the on-board software for Ariane in a
> different computer language than that used by the primary processor. The
> two processors, running simultaneously, would serve to check each other's
> results with greater independence that they apparently do now.
> 
>  -- Bill Angel

A better example is the flight control system for the A320 aircraft.
From memory now: there are 4 flight control computers, each controlling
separate hardware. The computers are of two types, with different
hardware and software. The designers of each type were completely
isolated from the designers of the other. The idea is that the computers
battle for control of the plane. If one computer generates completely
erroneous controls the other three can completely overpower it. If one,
two, or three die there is no problem. I can't recall if the computers
have the power to cause the shutdown of an offender and how they handled
the 2 vs. 2 problem.



If anyone has a spec for the A320 system please pipe in. I'm out of
aerospace now and can't just run down to the library.

IMHO the Ariane, 4 and 5, with two computers running the same software
is a systemic error just waiting to happen. 
-- 
Lloyd Fischer  lloyd@dvcorp.com  fischer@crocker.com (home)  
DataViews Corporation                  |
47 Pleasant St, Northampton, MA 01060  | from disclaimers import standard
Voice 413 586-4144 Fax 413 586-3805    |





* Re: Ariane 5 - not an exception?
  1996-07-29  0:00   ` Martin Tom Brown
@ 1996-07-30  0:00     ` John McCabe
  1996-07-31  0:00       ` Greg Bond
  0 siblings, 1 reply; 111+ messages in thread
From: John McCabe @ 1996-07-30  0:00 UTC (permalink / raw)



Martin Tom Brown <Martin@nezumi.demon.co.uk> wrote:

<..snip..>

>Yes - one can argue about the reasonableness of killing the active SRI
>when the backup has already failed. And also about the decision made
>during s/w development that out of 7 variables at risk of overflow only
>4 were fully protected, and 3 unprotected which were felt not at risk
>to keep CPU load inside bounds <80% (for Ariane 4 flight trajectories).

I find it difficult to understand why the design and development team
even considered maintaining the CPU load at <80% for this particular
case. If they requested a waiver on that margin and were refused then
obviously their prime contractor (or whatever) is to blame, but there
is no way that CPU loadings with margins of 20% should have been
enforced at the risk of mission failure.

<..snip..>

>Again you can question the wisdom of shutting down the active IRS
>when the backup is already dead - continuing with best guess data
>would have been a marginally better option at that stage. 
>The defect is traceable back to the requirements specification, and
>the assumption that single point hardware failure is more likely 
>than systematic software error in a redundant system.

True, but it certainly surprises me that there are 2 _identical_
computers running _identical_ software on a piece of equipment that is
so important to the future of the European space industry, and that
has had so much taxpayers' money invested in it.

I would have thought that a triplex system using a voting mechanism
would have been used at least. I know this would have extra cost, but
how much extra compared with what we have seen happen with the system as
provided on flight 501? $300M (or was it pounds) worth of experimental
satellite went up with that rocket; would it have cost that much to
add one extra computer and some more software? I don't think so!
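
For readers unfamiliar with triplex voting, a minimal mid-value-select
voter might look like the sketch below. The function name vote, the
tolerance parameter and the returned agreement flags are assumptions
for the example, not a description of any flown system.

```python
from statistics import median


def vote(a: float, b: float, c: float, tolerance: float):
    """Return (voted value, per-channel agreement flags) for three channels."""
    mid = median([a, b, c])
    # A channel "agrees" if it lies within the tolerance of the voted value.
    agrees = tuple(abs(x - mid) <= tolerance for x in (a, b, c))
    return mid, agrees
```

One channel going wild is outvoted and identified by its flag. Three
channels running identical software on identical data all produce the
same wrong answer, though, and the voter passes it without complaint.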

>> The problem resulted in an oversight in the design limitations and
>> differences between the Ariane 4 and Ariane 5 systems.  I think the 
>> lessons to be learned here are about the reuse of software (especially
>> design and requirements) and testing.

>The report's recommendations seem generally applicable to a large
>number of systems engineering projects with a software component.

I have been working in the European space industry for 9 years now,
and, hopefully with no disrespect to my peers, I am convinced that
there is far too little analysis of requirements performed. Around 3
years ago I took over a job that had been running for ~1.5 years, and
had produced development model software. Shortly after that I was
involved in redesigning the software in Ada (originally in C) for a
different processor. When it actually came down to finding out what
was really required, we spent months in discussions with our customer
because they really had no idea what they wanted, and had provided us
with what can only be described as a very poor requirement
specification. The problem was that this interaction with the customer
had never before taken place despite how far the software development
had gone. Even now there is the odd bit where what we are doing is
incompatible with their requirements (but that is mainly because they
changed one end of a timing requirement on an interface without
bothering to change the other end!).

I am convinced that with more effort in that area we are likely to see
a great improvement in the quality and reliability of software in
spaceborne equipment.

Best Regards
John McCabe <john@assen.demon.co.uk>






* Re: Ariane 5 - not an exception?
  1996-07-26  0:00 ` ++           robin
  1996-07-29  0:00   ` Bill Angel
  1996-07-30  0:00   ` Ken Garlington
@ 1996-07-30  0:00   ` Steve O'Neill
  1996-07-31  0:00     ` Martin Tom Brown
  1996-08-01  0:00     ` ++           robin
  1996-08-01  0:00   ` Jon S Anthony
                     ` (3 subsequent siblings)
  6 siblings, 2 replies; 111+ messages in thread
From: Steve O'Neill @ 1996-07-30  0:00 UTC (permalink / raw)



++ robin wrote:
> ---I think the real lessons are that
> 1. real-time programming requires special expertise.

Agreed wholeheartedly

> 2. the choice of language is suspect.  A better-established
>    language such as PL/I -- specifically designed for
>    real-time programming -- with robust compilers, and
>    with its base of experienced programming
>    staff could well have prevented this disaster.

I disagree completely!  The language was not the problem; the design
decisions in how the language was used were.  Ada is completely capable
in the realm of real-time programming, has robust compilers and tools,
and has quite a few experienced software engineers capable of
implementing just about any requirements thrown their way (been there,
done that).

Had the designers of the system allowed the implementors to use Ada
exception mechanisms fully and properly, they could have localized the
failure to, at worst, the alignment function (which was not necessary
at the time of the failure anyway) without shutting down the entire
device.  Instead, as is common practice in the safety-critical world,
local exception handlers are frequently banned and a global 'shut it
all down' handler is the only stopgap measure.  Unbelievably, the
rationale for disallowing local handlers is that they make it
difficult to verify complete code coverage, since they are only
executed in the case of exceptional conditions (i.e. given the expected
data (Ariane 4 profile) the handlers are not executed and therefore we
can't prove that all of our code has been exercised at least once).  I
find this logic suspect in the extreme!  As somebody once said, "expect
the unexpected".  In addition to trying for fault avoidance through
analysis we should also be planning for fault resiliency in the
presence of reality.
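
The difference between the two handler policies can be sketched in a few
lines. This is a toy model, not the SRI design: the class SensorUnit, its
method names and the use of Python exceptions in place of Ada ones are
all assumptions made for illustration.

```python
from typing import Optional


class SensorUnit:
    """Toy unit with an essential task and a non-essential alignment task."""

    def __init__(self) -> None:
        self.running = True
        self.alignment_ok = True

    def alignment(self, value: float) -> int:
        # Range-checked conversion to 16 bits; raises on overflow.
        n = round(value)
        if not -32768 <= n <= 32767:
            raise OverflowError("alignment value out of 16-bit range")
        return n

    def attitude(self) -> str:
        return "attitude data"

    def cycle_global_handler(self, value: float) -> Optional[str]:
        """Any exception, anywhere, shuts the whole unit down."""
        try:
            self.alignment(value)
            return self.attitude()
        except Exception:
            self.running = False
            return None

    def cycle_local_handler(self, value: float) -> str:
        """The alignment failure is contained; attitude output continues."""
        try:
            self.alignment(value)
        except OverflowError:
            self.alignment_ok = False   # report distress, keep running
        return self.attitude()
```

With the global policy an overflow in the non-essential task stops the
essential output as well. With the local one the unit keeps delivering
attitude data and exposes alignment_ok so a higher authority can decide
what to do with it.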

Your other conclusions are right on target though - you should never
shut a system down (unless its presence is impacting system performance,
as in the case of babbling nodes et al.) but do indicate its distress to
a higher authority who can then take this into account in using the
information provided.

-- 
Steve O'Neill                      | "No,no,no, don't tug on that!
Sanders, A Lockheed Martin Company |  You never know what it might
smoneill@sanders.lockheed.com      |  be attached to." 
(603) 885-8774  fax: (603) 885-4071|    Buckaroo Banzai





* Re: Ariane 5 - not an exception?
  1996-07-29  0:00   ` Bill Angel
                       ` (3 preceding siblings ...)
  1996-07-30  0:00     ` Bob Kurtz
@ 1996-07-30  0:00     ` Nancy Mead
  1996-07-31  0:00       ` Steve O'Neill
                         ` (2 more replies)
  1996-07-30  0:00     ` Richard Shetron
  1996-08-04  0:00     ` Richard Riehle
  6 siblings, 3 replies; 111+ messages in thread
From: Nancy Mead @ 1996-07-30  0:00 UTC (permalink / raw)




In article <4tiu6e$kpm@news2.cais.com>, wtangel@cais3.cais.com (Bill Angel) writes:
|> 
|> In article <4t9vdg$jfb@goanna.cs.rmit.edu.au>,
|> ++           robin <rav@goanna.cs.rmit.edu.au> wrote:
|> >In Ariane, both the active processor and the backup failed at
|> >the same time, because it was a *programming* error that was
|> >encountered at the same time in both processors, and both
|> >processors were shut down at the same time by their respective
|> >error handlers.
|> 
|> 	I am under the impression that for the US manned spaceflight
|> program (to get to the moon) ,an on-board computer that was serving as a
|> backup to the primary computer would have been performing its computations
|> using completely different software than the primary computer. By
|> utilizing this methodology, the same software "glitch" would not halt both
|> systems simultaneously.  Perhaps a group of software developers could be
|> tasked with producing a version of the on-board software for Ariane in a
|> different computer language than that used by the primary processor. The
|> two processors, running simultaneously, would serve to check each other's
|> results with greater independence that they apparently do now.
|> 
|>  -- Bill Angel

The Space Shuttle software has 4 computers running the same software, and
a 5th running different software (same function, different development team).
I'm not sure about the Apollo software, although I think there were some 
calculations that could be done on-board as well as on the primary computer.

You may recall that one of the early shuttle launches was cancelled because
of a timing difference between the 4 computers and the single computer.  This
was indeed an intermittent software error that caused the problem, and the
glitch resulted in cancellation of the launch in that particular case.  Of
course, error recovery was a lot less sophisticated in those days, and
it was probably impossible to isolate the cause of the discrepancy in real
time and proceed with the launch.  

I was not one of the developers, but I was at IBM Federal Systems HQ at the
time, and IBM FSD was one of the development organizations.  I believe
Rockwell (the prime contractor) developed the software that ran on the single
computer, but it might have been another subcontractor.





* Re: Ariane 5 - not an exception?
  1996-07-29  0:00   ` Bill Angel
                       ` (4 preceding siblings ...)
  1996-07-30  0:00     ` Nancy Mead
@ 1996-07-30  0:00     ` Richard Shetron
  1996-07-30  0:00       ` ++           robin
  1996-08-04  0:00     ` Richard Riehle
  6 siblings, 1 reply; 111+ messages in thread
From: Richard Shetron @ 1996-07-30  0:00 UTC (permalink / raw)



In article <4tiu6e$kpm@news2.cais.com>,
Bill Angel <wtangel@cais3.cais.com> wrote:
>
>In article <4t9vdg$jfb@goanna.cs.rmit.edu.au>,
>++           robin <rav@goanna.cs.rmit.edu.au> wrote:
>>In Ariane, both the active processor and the backup failed at
>>the same time, because it was a *programming* error that was
>>encountered at the same time in both processors, and both
>>processors were shut down at the same time by their respective
>>error handlers.
>
>	I am under the impression that for the US manned spaceflight
>program (to get to the moon) ,an on-board computer that was serving as a
>backup to the primary computer would have been performing its computations
>using completely different software than the primary computer. By
>utilizing this methodology, the same software "glitch" would not halt both
>systems simultaneously.  Perhaps a group of software developers could be
>tasked with producing a version of the on-board software for Ariane in a
>different computer language than that used by the primary processor. The
>two processors, running simultaneously, would serve to check each other's
>results with greater independence that they apparently do now.

I've been told that the shuttle uses 5 computers with software developed
by 3 independent programming groups.  A best 2 out of 3 is used to
determine which software/hardware is operating properly.





* Re: Ariane 5 - not an exception?
  1996-07-25  0:00 Simon Bluck
                   ` (4 preceding siblings ...)
  1996-07-27  0:00 ` Bill Angel
@ 1996-07-30  0:00 ` Dr. Richard Botting
  1996-07-30  0:00   ` David Weller
  5 siblings, 1 reply; 111+ messages in thread
From: Dr. Richard Botting @ 1996-07-30  0:00 UTC (permalink / raw)



Simon Bluck at University of Bristol, England asserted incorrectly:
>>Exceptions and assertions are both used, in Ada and C/C++
Ada has no assertions.
C/C++ assertions are a debugging aid.

--
dick botting     http://www.csci.csusb.edu/dick/signature.html
Disclaimer:      CSUSB may or may not agree with this message.
Copyright(1996): Copy freely but say where it came from.
	I have nothing to sell, and I'm giving it away.





* Re: Ariane 5 - not an exception?
  1996-07-30  0:00   ` Steve O'Neill
@ 1996-07-31  0:00     ` Martin Tom Brown
  1996-07-31  0:00       ` Nigel Tzeng
  1996-08-02  0:00       ` Ken Garlington
  1996-08-01  0:00     ` ++           robin
  1 sibling, 2 replies; 111+ messages in thread
From: Martin Tom Brown @ 1996-07-31  0:00 UTC (permalink / raw)



In article <31FE35BC.1A0D@sanders.lockheed.com>
           smoneill@sanders.lockheed.com "Steve O'Neill" writes:

> ++ robin wrote:

> > 2. the choice of language is suspect.  A better-established
> >    language such as PL/I -- specifically designed for
> >    real-time programming -- with robust compilers, and
> >    with its base of experienced programming
> >    staff could well have prevented this disaster.
> 
> I disagree completely!  The language was not the problem; the design decisions
> in how the language was used were.

It goes further back than that - the requirement specifications were
seriously at fault and incomplete. It was *not* a stated requirement 
that the unit would function correctly on the Ariane 5 trajectory.

An implementation of the algorithms in any language which detects
numerical overflow would have shut down the IRS to comply with spec.

> Ada is completely capable the realm of real-time programming,
>  has robust compilers and tools, and has quite a few experienced 
> software engineers capable of implementing ...

I agree that the language is in the clear.

> Had the designers of the system allowed the implementors to use Ada exception
> mechanisms fully and properly they could have localized the failure to, at
> worst, the alignment function (which was not necessary at the time of the
> failure anyway) without shutting down the entire device.

This was what surprised me - coming from an environment (not safety critical)
where continued function even if degraded is preferred to hard shutdown.
It seems unduly perverse to guarantee total system failure once an 
untrapped exception occurs. Is it really safer to blow the thing out of
the sky than inject its payload into an inaccurate orbit?

After all the hardware failsafe *will* destroy it automatically 
if the trajectory deviates sufficiently - as happened when the IRS
started feeding the navigation computer diagnostic bit patterns as data.
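
To make the contrast concrete, here is a minimal sketch in Python, assuming
the operation that overflows is a conversion to a 16-bit signed integer. The
names `to_int16_checked` and `to_int16_degraded` are hypothetical and this is
not the actual IRS code. The first function reports the overflow as an
exception; the second keeps running and hands back a clamped value together
with a flag saying the value is suspect.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16_checked(x):
    """Convert a finite float to a 16-bit signed integer, raising on overflow."""
    n = int(x)  # truncates toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x!r} does not fit in 16 bits")
    return n

def to_int16_degraded(x):
    """Return (value, valid); on overflow, clamp and mark the value invalid."""
    try:
        return to_int16_checked(x), True
    except OverflowError:
        # Degraded mode: saturate instead of stopping, but tell the consumer.
        return (INT16_MAX if x > 0 else INT16_MIN), False
```

Whether the consumer of the flagged value can do anything sensible with it is
of course a system design question, not something the conversion can decide.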

> Instead, as is common practice in the safety-critical world, local exception
> handlers are frequently banned and a global 'shut it all down' handler is the
> only stop gap measure.

This is an interesting insight.

> Unbelievably the rationale for disallowing local handlers is because they make
> it difficult to verify complete code coverage since they are only executed in
> the case of exceptional conditions

I can see there is a point there, OTOH perhaps there is something wrong
with a test philosophy that doesn't attempt to push the envelope of valid
data to find what happens if ...
Designing test data to execute each and every path is part of the game.

Regards,
-- 
Martin Brown  <martin@nezumi.demon.co.uk>     __                CIS: 71651,470
Scientific Software Consultancy             /^,,)__/





* Re: Ariane 5 - not an exception?
  1996-07-30  0:00     ` Nancy Mead
  1996-07-31  0:00       ` Steve O'Neill
@ 1996-07-31  0:00       ` Tucker Taft
  1996-08-01  0:00       ` root
  2 siblings, 0 replies; 111+ messages in thread
From: Tucker Taft @ 1996-07-31  0:00 UTC (permalink / raw)



Nancy Mead (nrm@sei.cmu.edu) wrote:

: ...
: The Space Shuttle software has 4 computers running the same software, and
: a 5th running different software (same function, different development team).

This mode is only used during critical phases of a mission.  In other
phases, the "backup" computer may be used for independent activities.

: I'm not sure about the Apollo software, although I think there were some 
: calculations that could be done on-board as well on the primary computer.

: You may recall that one of the early shuttle launches was cancelled because
: of a timing difference between the 4 computers and the single computer.  

Not really.  It was a timing difference among the 4 computers running
identical software.  The timing difference was detected by the 5th ("backup")
computer, and it ordered the shut-down.

: ... This
: was indeed an intermittent software error that caused the problem, and the
: glitch resulted in cancellation of the launch in that particular case.  Of
: course, error recovery was a lot less sophisticated in those days, and
: it was probably impossible to isolate the cause of the discrepancy in real
: time and proceed with the launch.  

It was not trivial to determine the cause, and it is highly unlikely
it could have been determined in "real time," even using more "sophisticated"
error recovery.  The problem related to the fact that the computers were
started almost exactly on a clock tick, and 2 of the computers got
one time, and the other 2 got a time one tick later.

: I was not one of the developers, but I was at IBM Federal Systems HQ at the
: time, and IBM FSD was one of the development organizations.  I believe
: Rockwell (the prime contractor) developed the software that ran on the single
: computer, but it might have been another subcontractor.

The backup flight software was developed by Intermetrics, as a subcontractor
to Rockwell.

For what it is worth, the Space Shuttle software is developed in the
language Hal/S, which was also developed by Intermetrics ;-).

--
-Tucker Taft   stt@inmet.com   http://www.inmet.com/~stt/
Intermetrics, Inc.  Cambridge, MA  USA





* Re: Ariane 5 - not an exception?
  1996-07-30  0:00     ` Nancy Mead
@ 1996-07-31  0:00       ` Steve O'Neill
  1996-07-31  0:00       ` Tucker Taft
  1996-08-01  0:00       ` root
  2 siblings, 0 replies; 111+ messages in thread
From: Steve O'Neill @ 1996-07-31  0:00 UTC (permalink / raw)



Nancy Mead wrote:
> I was not one of the developers, but I was at IBM Federal Systems HQ at the
> time, and IBM FSD was one of the development organizations.  I believe
> Rockwell (the prime contractor) developed the software that ran on the single
> computer, but it might have been another subcontractor.

I believe that the BFS software was written by Intermetrics.

-- 
Steve O'Neill                      | "No,no,no, don't tug on that!
Sanders, A Lockheed Martin Company |  You never know what it might
smoneill@sanders.lockheed.com      |  be attached to." 
(603) 885-8774  fax: (603) 885-4071|    Buckaroo Banzai





* Re: Ariane 5 - not an exception?
  1996-07-30  0:00     ` John McCabe
@ 1996-07-31  0:00       ` Greg Bond
  1996-08-03  0:00         ` John McCabe
  0 siblings, 1 reply; 111+ messages in thread
From: Greg Bond @ 1996-07-31  0:00 UTC (permalink / raw)



John McCabe wrote:
> snip...
>
> I find it difficult to understand why the design and development team
> even considered maintaining the CPU load at <80% for this particular
> case. If they requested a waiver on that margin and were refused then
> obviously their prime contractor (or whatever) is to blame, but there
> is no way that CPU loadings with margins of 20% should have been
> enforced at the risk of mission failure.
> 
> <..snip..>

Correct me if I'm wrong, but doesn't a lower CPU utilization help ensure
that hard deadlines will be met in exceptional computational
circumstances (thereby helping to prevent mission failure....)?

--
* Greg Bond                         * Dept. of Electrical Eng.  
* email: bond@ee.ubc.ca             * Univ. of British Columbia      
* voice: (604) 822 0899             * 2356 Main Mall                 
* fax:   (604) 822 5949             * Vancouver, BC              
* web: http://www.ee.ubc.ca/~bond   * Canada, V6T 1Z4





* Re: Ariane 5 - not an exception?
  1996-07-31  0:00     ` Martin Tom Brown
@ 1996-07-31  0:00       ` Nigel Tzeng
  1996-08-02  0:00       ` Ken Garlington
  1 sibling, 0 replies; 111+ messages in thread
From: Nigel Tzeng @ 1996-07-31  0:00 UTC (permalink / raw)



In article <838805582snz@nezumi.demon.co.uk>,
Martin Tom Brown  <Martin@nezumi.demon.co.uk> wrote:
>In article <31FE35BC.1A0D@sanders.lockheed.com>
>           smoneill@sanders.lockheed.com "Steve O'Neill" writes:

[snip]

>It goes further back than that - the requirement specifications were
>seriously at fault and incomplete. It was *not* a stated requirement 
>that the unit would function correctly on the Ariane 5 trajectory.

Well, it also states that had adequate simulations been done the fault
would have been detected fairly early in the sims.  The biggest flaw
in their simulation testing was not actually using flight s/w and
hardware during the testing (well, at least using the Engineering Test
Units for simulation).

Many compounding errors were required to create this problem.

[snip]

>This was what surprised me - coming from an environment (not safety critical)
>where continued function even if degraded is preferred to hard shutdown.
>It seems unduly perverse to guarantee total system failure once an 
>untrapped exception occurs. Is it really safer to blow the thing out of
>the sky than inject its payload into an inaccurate orbit?

>After all the hardware failsafe *will* destroy it automatically 
>if the trajectory deviates sufficiently - as happened when the IRS
>started feeding the navigation computer diagnostic bit patterns as data.

Well, that surprised me a little.  Granted that their facilities are in
the middle of nowhere I still would have expected that range safety
would have destroyed the vehicle given the description of the attitude
deviation (20 degree AOA...that must have been interesting) rather
than having the breakup of the vehicle initiate the destructs...then
again it happened only 4 seconds after the nozzles were commanded to
extreme positions.

Anyone know off the cuff what the ER or WR would have done in this
case?  I'm assuming that they can see the relevant telemetry...at
least the LCC I'm working on has requirements that they can but
I'm new to the launch vehicle world.

[snip]

>Regards,
>-- 
>Martin Brown  <martin@nezumi.demon.co.uk>     __                CIS: 71651,470
>Scientific Software Consultancy             /^,,)__/

Nigel





* Re: Ariane 5 - not an exception?
  1996-07-30  0:00   ` Robert I. Eachus
@ 1996-07-31  0:00     ` JP Thornley
  1996-08-01  0:00       ` Alan Brain
  0 siblings, 1 reply; 111+ messages in thread
From: JP Thornley @ 1996-07-31  0:00 UTC (permalink / raw)



In article: <EACHUS.96Jul30184744@spectre.mitre.org>  
eachus@spectre.mitre.org (Robert I. Eachus) writes:
>    First, I think of mission critical as a different category than
> safety critical.  In safety critical systems, fail safe is often an
> option where in mission critical systems you need to fail operational.

Hmmm, hadn't come across that distinction before, but it does seem to 
make sense in some cases.  But making fail operational a defining 
characteristic of mission critical systems seems a bit too strong - how 
many 'glass cockpit' aircraft have no backup suck and blow instruments 
to use when all the screens go blank?

> And yes, systems can be safety AND mission critical.  Those are the
> expensive ones.
> 

Again, in my terminology, the development standards for mission critical 
are wholly subsumed in those for safety-critical code, so classifying 
something as both has no real effect on the software development methods 
used.

>    Having said that, this software should have been classed exactly
> that way, given the amount of miscellaneous missile parts that ended up
> scattered over the launch site, and the possibility that a guidance
> failure could put the missile anywhere in the world.
> 

As I read the report, the recommendation that "software should be 
assumed to be faulty until applying the currently accepted best practice 
methods can demonstrate that it is correct" is saying that if the system 
design is to be based on the assumption of correct software then they 
will have to build that software to safety-critical standards.  I wonder 
if they realise just how expensive that is going to be.

Phil Thornley

-- 
------------------------------------------------------------------------
| JP Thornley    EMail jpt@diphi.demon.co.uk                           |
------------------------------------------------------------------------






* Re: Ariane 5 - not an exception?
  1996-07-26  0:00 ` JP Thornley
                     ` (3 preceding siblings ...)
  1996-07-30  0:00   ` Robert I. Eachus
@ 1996-08-01  0:00   ` Ken Garlington
  4 siblings, 0 replies; 111+ messages in thread
From: Ken Garlington @ 1996-08-01  0:00 UTC (permalink / raw)



Robert I. Eachus wrote:
> 
>    First, I think of mission critical as a different category than
> safety critical.  In safety critical systems, fail safe is often an
> option where in mission critical systems you need to fail operational.
> And yes, systems can be safety AND mission critical.  Those are the
> expensive ones.

Actually, safety-critical systems can either be fail-safe or fail-op, just
like mission critical systems. A nuclear reactor might be able to be fail safe,
but a flight control system might have to be fail op.

You can get a lot of definitions of safety critical if you work at it.
Here's AFISC SSH 1-1's definition:

"Those software operations that, if not performed, performed out-of-sequence,
or performed incorrectly could result in improper control functions (or lack
of control functions required for proper system operation) which could
directly or indirectly cause or allow a hazardous condition to exist."

"Hazardous" usually gets defined as loss of life, serious injury, or major
property loss.

If the absence of the software function doesn't lead to a hazardous condition,
then the system can be fail-safe. If the software function must be present to
avoid a hazardous condition, then it usually has to be fail-op. However, there's
not exactly a hard and fast rule here. It depends on the system.

-- 
LMTAS - "Our Brand Means Quality"





* Re: Ariane 5 - not an exception?
  1996-08-01  0:00     ` ++           robin
@ 1996-08-01  0:00       ` Ken Garlington
  1996-08-05  0:00         ` John McCabe
  1996-08-02  0:00       ` Pascal Martin @lone
  1996-08-05  0:00       ` Steve O'Neill
  2 siblings, 1 reply; 111+ messages in thread
From: Ken Garlington @ 1996-08-01  0:00 UTC (permalink / raw)



++ robin wrote:

[Editing ++robin's statement to remove unnecessary words...]

> "A [...] programmer
> experienced with real time systems, would have CHALLENGED
> such a stupid requirement that the computer be shut down by the
> error-handler in the event of a fixed-point overflow.  He would
> have had it changed.
> 
> "I'd go further to say that no experienced [...] programmer
> would have shut down the system as a result of a fixed-point
> overflow.
> 
> "Furthermore, he would have included a check that the value
> did not go out of range;"

I can certainly agree that the absence of a check should have been
challenged, and that the system should have been analyzed to show
that the check was not needed.

However, the report states that the analysis was done. If you're saying
(1) that they should have done the analysis correctly, who can argue with
that? If you're saying (2) that the programmer should have ignored the analysis,
put the check in anyway, and said "Screw you, guys, I'm a [...] programmer
and I know better than you!", well, you might have a point. However, such
an action certainly requires a very brave (or very foolhardy) [...] programmer.

> 
>         >(which
>         >was not necessary at the time of the failure anyway)
> 
> ---what?  The OBC was using the attitude information to
> direct the nozzles.  It was their [the nozzles] sudden change
> that caused the space vehicle to break up, thereby forcing
> the vehicle to self-destruct automatically [that sudden
> change was the result of the OBC interpreting the error
> readout from the shut-down SRI computer as attitude data.]

I had to read the report a few times to catch the gist of this myself.
Although it's confusing, it sounds like the alignment function was
running, but was not providing correction vectors to the SRI output.
Therefore, the alignment function did not have to be executing at all.

This, to me, is very strange. From my limited experience with IRS
alignments, the alignment function shuts down as soon as the IRS moves (or the
alignment is complete), since you can't usually align a moving IRS
without a fixed reference input (like you get on shipboard systems).
The rationale in the report, that keeping the alignment system running
made it easier to realign the SRI close to liftoff, makes some sense.
But it's still an unusual approach.

Translating additional remarks into format appropriate to the
selected newsgroups:

> "This project might well have been written in [a real-time programming
> language], which
> has excellent real-time facilities, including error
> handling, error simulation and validation facilities.
> The language has robust compilers, and experts with many
> years of [real-time language] programming experience.

Of course, just because there are experts in a particular language,
this doesn't mean that all systems written in that language are
written by experts. I don't know the average experience level for
the Ariane V, so there's no way to know in this case...

> "As to [language] facilities, I refer to the statement
> [called SIGNAL in PL/I and RAISE in Ada],
> with which given conditions (errors such as fixed-point
> overflow) can be signalled as if the condition (error)
> actually occurred.
> 
> "This alone would have showed up the deficiency of the
> overall design (that the system would shut itself down for
> fixed-point overflow)."

Of course, the programmer (and tester) must employ these facilities
for them to be useful. If the programmer fails to use these facilities,
or defeats these facilities
(e.g., suppressing the built-in fixed-point overflow checks in Ada),
there's precious little the language can do.

As you noted earlier, the way in which the system handles the exception
is also important. In this case, clearly the exception was handled
inappropriately given the Ariane V environment.

It is also interesting to note one subtle distinction: The designers knew
that fixed-point overflow would shut the system down. They just didn't
believe the specific calculations could overflow. I suspect that, if a
SIGNAL or RAISE statement were inserted in the code, these same engineers
would say, "But you can't reach that statement in practice!" and so their
effect on the analysis might have been irrelevant.

> 
>         >(i.e. given the expected data (Ariane 4 profile)
>         >the handlers are not executed and therefore we
>         >can't prove that all of our code has been
>         >exercised at least once).
> 
> ---But they can be, and shown to be, in [a real-time language] -- the language
> with the right tools -- with the [SIGNAL, RAISE, etc.] statement.  That
> statement leaves an indisputable footprint!

Again, this depends upon the programmer using the statement for it to be
an "indisputable footprint." Some languages, such as Ada, attempt to overcome
this issue by _also_ inserting implicit exception raises for certain events.
However, if the programmer turns these features off, or fails to handle them,
their effect can be blunted.
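
A minimal sketch of the fault-injection idea being discussed, written in
Python rather than PL/I or Ada, with hypothetical names (`read_bias`,
`always_overflow`): the conversion step is passed in as a parameter, so a test
can substitute one that always raises and thereby force the handler to run,
even when no realistic input would reach it.

```python
def read_bias(convert, raw):
    """Convert a raw reading; fall back to a degraded result on overflow."""
    try:
        return convert(raw), "ok"
    except OverflowError:
        # This branch is only reached in exceptional conditions, which is
        # exactly why it needs an artificial way to be exercised in test.
        return 0, "degraded"

def always_overflow(raw):
    """Test double: behaves as if the overflow condition had occurred."""
    raise OverflowError("injected fault")
```

Calling read_bias(int, 12.0) takes the normal path, while
read_bias(always_overflow, 12.0) drives the handler. As noted above, this only
helps if someone actually writes and runs the second call.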

-- 
LMTAS - "Our Brand Means Quality"





* Re: Ariane 5 - not an exception?
  1996-08-01  0:00       ` root
@ 1996-08-01  0:00         ` Tucker Taft
  0 siblings, 0 replies; 111+ messages in thread
From: Tucker Taft @ 1996-08-01  0:00 UTC (permalink / raw)



Chris Morgan (chris.morgan@baesma.co.uk) wrote:
: In article <DvEw7C.2K8.0.-s@inmet.camb.inmet.com> 
: stt@henning.camb.inmet.com (Tucker Taft) writes:

: [BIG SNIP]

:    For what it is worth, the Space Shuttle software is developed in the
:    language Hal/S, which was also developed by Intermetrics ;-).

: So what's that like then? Please give us your thoughts on comparisons
: between it and Ada95 (a newish language I think you know something of
: ;-)

Hal/S is also a Pascal-based language.  It had prioritized 
multi-threading and synchronization primitives.  It also
had built-in support for matrix arithmetic, since this is used
heavily in navigation computations.  However, like Pascal, it
lacked any notion of "abstract data type," so it was not particularly
extensible via user-defined abstractions.  It was designed about
10 years before (the first) Ada.

: Chris Morgan

: chris.morgan@baesema.co.uk

-Tucker Taft   stt@inmet.com   http://www.inmet.com/~stt/
Intermetrics, Inc.  Cambridge, MA  USA





* Re: Ariane 5 - not an exception?
  1996-07-30  0:00   ` Steve O'Neill
  1996-07-31  0:00     ` Martin Tom Brown
@ 1996-08-01  0:00     ` ++           robin
  1996-08-01  0:00       ` Ken Garlington
                         ` (2 more replies)
  1 sibling, 3 replies; 111+ messages in thread
From: ++           robin @ 1996-08-01  0:00 UTC (permalink / raw)



	Steve O'Neill <smoneill@sanders.lockheed.com> writes:

	>++ robin wrote:
	>> ---I think the real lessons are that
	>> 1. real-time programming requires special expertise.

	>Agreed wholeheartedly

	>> 2. the choice of language is suspect.  A better-established
	>>    language such as PL/I -- specifically designed for
	>>    real-time programming -- with robust compilers, and
	>>    with its base of experienced programming
	>>    staff could well have prevented this disaster.

	>I disagree completely!  The language was not the
	>problem the design decisions in how the language 
	>was used were.

---The choice of language is indeed very relevant.
What I wrote in an earlier posting on this topic is highly
apt:

"A PL/I programmer
experienced with real time systems, would have CHALLENGED
such a stupid requirement that the computer be shut down by the
error-handler in the event of a fixed-point overflow.  He would
have had it changed.

"I'd go further to say that no experienced PL/I programmer
would have shut down the system as a result of a fixed-point
overflow.

"Furthermore, he would have included a check that the value
did not go out of range;"

	>Ada is completely capable the realm [sic]
	>of real-time programming, has robust 
	>compilers and tools, and has quite a few experienced
	>software engineers capable of implementing 
	>just about any requirements thrown their way (been there, done that).  

	>Had the designers of the system allowed the
	>implementors to use Ada exception mechanisms fully 
	>and properly they could have localized the failure
	>to, at worst, the alignment function

---But all it needed was a check that the value was in range.
Such checks had been included on other similar conversions in
the vicinity!

	>(which 
	>was not necessary at the time of the failure anyway)

---what?  The OBC was using the attitude information to
direct the nozzles.  It was their [the nozzles] sudden change
that caused the space vehicle to break up, thereby forcing
the vehicle to self-destruct automatically [that sudden
change was the result of the OBC interpreting the error
readout from the shut-down SRI computer as attitude data.]

	> without shutting down the entire device.  
	>Instead, as is common practice in the safety-
	>critical world, local exception handlers are 
	>frequently banned and a global 'shut it all
	>down' handler is the only stop gap measure.  
	>Unbelievably the rationale for disallowing local
	>handlers is because they make it difficult to 
	>verify complete code coverage since they are
	>only executed in the case of exceptional conditions 

---As I wrote in an earlier post:

"This project might well have been written in PL/I, which
has excellent real-time facilities, including error
handling, error simulation and validation facilities.
The language has robust compilers, and experts with many
years of PL/I programming experience.

"As to PL/I facilities, I refer to the SIGNAL statement,
with which given conditions (errors such as fixed-point
overflow) can be signalled as if the condition (error)
actually occurred.

"This alone would have showed up the deficiency of the
overall design (that the system would shut itself down for
fixed-point overflow)."

	>(i.e. given the expected data (Ariane 4 profile)
	>the handlers are not executed and therefore we 
	>can't prove that all of our code has been
	>exercised at least once).

---But they can be, and shown to be, in PL/I -- the language
with the right tools -- with the SIGNAL statement.  That
statement leaves an indisputable footprint!

	>I find this logic suspect in 
	>the extreme!  As somebody once said "expect the
	>unexpected".  In addition to trying for fault 
	>avoidance through analysis we should also be
	>planning for fault resiliency in the presence of 
	>reality.

---Exactly what I wrote in an earlier posting.

	>Your other conclusions are right on target
	>though - you should never shut a system down 
	>(unless its presence is impacting system performance
	>as in the case of babbling nodes et al.) but 
	>do indicate its distress to a higher authority
	>who then can take this into account in using the 
	>information provided.

	>Steve O'Neill                      | "No,no,no, don't tug on that!
	>Sanders, A Lockheed Martin Company |  You never know what it might
	>smoneill@sanders.lockheed.com      |  be attached to." 





* Re: Ariane 5 - not an exception?
  1996-07-31  0:00     ` JP Thornley
@ 1996-08-01  0:00       ` Alan Brain
  1996-08-02  0:00         ` JP Thornley
  0 siblings, 1 reply; 111+ messages in thread
From: Alan Brain @ 1996-08-01  0:00 UTC (permalink / raw)



JP Thornley <jpt@diphi.demon.co.uk> wrote:

>As I read the report, the recommendation that "software should be 
>assumed to be faulty until applying the currently accepted best practice 
>methods can demonstrate that it is correct" is saying that if the system 
>design is to be based on the assumption of correct software then they 
>will have to build that software to safety-critical standards.  I wonder 
>if they realise just how expensive that is going to be.

Umm. It appears I may have a small but critical difference of opinion here. IMHO 
safety-critical software _in particular_ should be assumed to be faulty, (perhaps) 
_even though_ shown to be correct.

To make an analogy, on one side you have "guaranteed impenetrable" armour plate, 
surrounding a fragile crystal glass. On the other, you have ballistic gelatine. I 
prefer the latter, as it keeps on working sorta, kinda, even though your basic 
assumptions re Immunity to Murphy may be incorrect.

I've seen error-_tolerance_ work very well in practice. The biggest problem is 
finding the bugs that exist, because the darn thing still works! Only careful 
examination of error logs reveals you're running at 5% efficiency, and encountering 
200 Software Detected Errors per second ( Real figures by the way ).
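
A small sketch of that trade-off, in Python with a made-up function name
(`run_tolerant`): the loop keeps going past bad items, but it also counts
them, so the error rate is available to whoever reads the log instead of
being silently absorbed.

```python
def run_tolerant(items, handler):
    """Apply handler to each item, skipping (but counting) items that fail."""
    results = []
    errors = 0
    for item in items:
        try:
            results.append(handler(item))
        except ValueError:
            # Tolerate the bad item, but record that it happened.
            errors += 1
    error_rate = errors / len(items) if items else 0.0
    return results, error_rate
```

For example, run_tolerant(["1", "x", "3", "y"], int) still produces results
for the good items, and the returned rate shows that half the input was
rejected.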
 






* Re: Ariane 5 - not an exception?
  1996-07-26  0:00 ` ++           robin
                     ` (2 preceding siblings ...)
  1996-07-30  0:00   ` Steve O'Neill
@ 1996-08-01  0:00   ` Jon S Anthony
  1996-08-02  0:00   ` James Kanze US/ESC 60/3/141 #40763
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 111+ messages in thread
From: Jon S Anthony @ 1996-08-01  0:00 UTC (permalink / raw)



In article <4totv7$o9f@goanna.cs.rmit.edu.au> rav@goanna.cs.rmit.edu.au (++           robin) writes:

> ---The choice of language is indeed very relevant.
> What I wrote in an earlier posting on this topic is highly
> apt:

Not in this particular case.  It could have been written in anything and
it would not have made a difference (assuming it was "correctly" written,
i.e., conforming to spec).

> 	>Had the designers of the system allowed the
> 	>implementors to use Ada exception mechanisms fully 
> 	>and properly they could have localized the failure
> 	>to, at worst, the alignment function
> 
> ---But all it needed was a check that the value was in range.
> Such checks had been included on other similar conversions in
> the vicinity!

Irrelevant.  The point is that the requirements stated that the program
was to proceed as it did.  As several have pointed out this was not a
"programming" error.


> 	>(which 
> 	>was not necessary at the time of the failure anyway)
> 
> ---what?  The OBC was using the attitude information to
> direct the nozzles.  It was their [the nozzles] sudden change

The point is the particular system in question was only relevant _prior_
to launch.  Since it was clearly after launch that the failure happened
it should have been irrelevant.


> "This project might well have been written in PL/I, which

First, PL/I has nothing "extra" here at all.  Second, if the thing had
been written in PL/I and it had been in conformance with the
requirements, the thing would have failed.  Of course, you could claim
that were it written in PL/I it would not likely be in conformance and
then it might not have failed.  Shrug.

/Jon

-- 
Jon Anthony
Organon Motives, Inc.
1 Williston Road, Suite 4
Belmont, MA 02178

617.484.3383
jsa@organon.com






* Re: Ariane 5 - not an exception?
  1996-07-30  0:00     ` Nancy Mead
  1996-07-31  0:00       ` Steve O'Neill
  1996-07-31  0:00       ` Tucker Taft
@ 1996-08-01  0:00       ` root
  1996-08-01  0:00         ` Tucker Taft
  2 siblings, 1 reply; 111+ messages in thread
From: root @ 1996-08-01  0:00 UTC (permalink / raw)



In article <DvEw7C.2K8.0.-s@inmet.camb.inmet.com> stt@henning.camb.inmet.com (Tucker Taft) writes:

[BIG SNIP]

   For what it is worth, the Space Shuttle software is developed in the
   language Hal/S, which was also developed by Intermetrics ;-).

So what's that like then? Please give us your thoughts on comparisons
between it and Ada95 (a newish language I think you know something of
;-)

Chris Morgan

chris.morgan@baesema.co.uk





* Re: Ariane 5 - not an exception?
  1996-08-01  0:00       ` Alan Brain
@ 1996-08-02  0:00         ` JP Thornley
  0 siblings, 0 replies; 111+ messages in thread
From: JP Thornley @ 1996-08-02  0:00 UTC (permalink / raw)



Alan Brain <aebrain@dynamite.com.au> writes:
> Umm. It appears I may have a small but critical difference of opinion here.
> IMHO safety-critical software _in particular_ should be assumed to be faulty,
> (perhaps) _even though_ shown to be correct.
> 

We definitely have a difference of opinion here - a software component 
of a system is classified as safety-critical if failure of that 
component *alone* creates a significant risk of the system suffering or 
causing a catastrophic accident.  If the system is designed so that the 
risk only becomes significant when both this software component and some 
other component of the system (wholly independent of this software 
component) fail, then the software is not safety-critical.

So, assuming that the software is faulty (which I take to mean 'can be 
expected to suffer a hazardous failure') results in an assumption that 
the catastrophic accident *will occur*.

My favorite example at the moment is the Flight Control System on the 
Boeing 777 - running on three separate and diverse boxes (I think that 
one is a 68K, another is a 486 and I can't remember the third) but all 
programmed from the same Ada source.  No single box is safety-critical, 
as there are two wholly independent back-ups, but the software clearly 
is.  An assumption that this software is faulty must lead to the 
conclusion that the plane should never be certified.

Clearly this isn't the case, and the software must have been created 
using a rigorous process that gives adequate assurance that it will not 
suffer a hazardous failure (and that's what I think the report means 
when it talks about "applying the currently accepted best practice 
methods" in order to "demonstrate that it is correct").

Phil Thornley

-- 
------------------------------------------------------------------------
| JP Thornley    EMail jpt@diphi.demon.co.uk                           |
------------------------------------------------------------------------






* Re: Ariane 5 - not an exception?
  1996-08-01  0:00     ` ++           robin
  1996-08-01  0:00       ` Ken Garlington
@ 1996-08-02  0:00       ` Pascal Martin @lone
  1996-08-03  0:00         ` Dr. Richard Botting
  1996-08-06  0:00         ` ++           robin
  1996-08-05  0:00       ` Steve O'Neill
  2 siblings, 2 replies; 111+ messages in thread
From: Pascal Martin @lone @ 1996-08-02  0:00 UTC (permalink / raw)




In article <4totv7$o9f@goanna.cs.rmit.edu.au>,
   rav@goanna.cs.rmit.edu.au (++           robin) writes:
>
> "Furthermore, he would have included a check that the value
> did not go out of range;"

This comment is totally misplaced: an Ada exception **is** a check.

Someone noted that shutting down the system before it takes off was
probably a sound decision. Shutdown on board a flying system looks
like the worst possible choice (could going **down** in a **flying**
system be a good choice, anyway ? :-).

I have seen systems which ignored suspect values, and sent the latest
trusted result again and again instead, on the basis that the correct
result was probably close to this one. When you have nothing to work
with, this is perhaps the best gamble you can make. After all, rocket
science has a lot in common with gambling, looks like :-)
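
A minimal sketch of that "repeat the last trusted result" idea, in Python
purely for illustration. The class name, the plausibility window and the
limit on consecutive repeats are all assumptions made for the example;
nothing here is taken from the Ariane software or from any real system.

```python
from typing import Optional


class HoldLastGood:
    """Pass trusted readings through; repeat the last trusted one otherwise."""

    def __init__(self, low: float, high: float, max_repeats: int) -> None:
        self.low = low                  # assumed plausibility window (lower bound)
        self.high = high                # assumed plausibility window (upper bound)
        self.max_repeats = max_repeats  # assumed limit on consecutive repeats
        self.last_good: Optional[float] = None
        self.repeats = 0

    def update(self, reading: float) -> Optional[float]:
        # NaN fails both comparisons, so it is treated as suspect too.
        if self.low <= reading <= self.high:
            self.last_good = reading
            self.repeats = 0
            return reading
        self.repeats += 1
        if self.last_good is None or self.repeats > self.max_repeats:
            return None  # nothing trustworthy left to send
        return self.last_good
```

The repeat limit is the part that makes it a bounded gamble: once the
last good value is too stale, the caller is told so (None) rather than
being fed the same number forever.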

> "This project might well have been written in PL/I, which
> has excellent real-time facilities, including error
> handling, error simulation and validation facilities.
> The language has robust compilers, and experts with many
> years of PL/I programming experience.

Which PL/I compilers are available today on Sparc/Solaris, HP700/HPUX,
Windows/Intel and PowerPC/* ? I thought the language was dead since
the demise of IBM mainframes and the death of (the beloved) Multics.

I also thought the PL/I experts have so many years of programming
experience that they are now.. retired.

PL/I had everything, except reserved words. The infamous "if if then then".

>"As to PL/I facilities, I refer to the SIGNAL statement,
>with which given conditions (errors such as fixed-point
>overflow) can be signalled as if the condition (error)
>actually occurred.

Sounds like the author never opened an Ada reference manual..

Pascal.





* Re: Ariane 5 - not an exception?
  1996-07-30  0:00   ` Ken Garlington
@ 1996-08-02  0:00     ` Craig P. Beyers
  0 siblings, 0 replies; 111+ messages in thread
From: Craig P. Beyers @ 1996-08-02  0:00 UTC (permalink / raw)



Ken Garlington wrote:
> 
> Actually, the amount of communication between a primary and a
> backup system is another tough system problem. We went through
> this on the F-16. In general, the backup shouldn't trust state
> data from the primary, since this can create a common mode failure.
> On the other hand, with _no_ state data, the backup may be unable
> to take over from the primary. Add to this the desire to keep the
> backup software identical to the primary, to reduce the amount of
> unique software to analyze and test, and it's a non-trivial thought
> process.

Clearly (from the report at least) the two IRS's were intended to
provide redundant position capability. The back-up IRS is there to
reduce the risk of hardware failure. But the Ariane folks missed the
problem of software failure and left the bird without any backup. It's
interesting to note, too, that the back-up IRS failed first for the same
reason the primary failed, leaving the rocket nowhere to turn (pun
intended). Worse, it appears that no one anticipated both IRS's failing
concurrently, so there's no provision in the s/w to at least center the
nozzles and attempt to get the bird up higher and further
out over the water--and thus safer--before destroying it. To me, the
provision for the Ariane 4 re-start fix (the 50-sec. alignment function)
sounds like an "easy" fix that did not receive the proper study. Worse,
it doesn't apply to the Ariane 5. Pretty expensive s/w error? Of course
not--it's not a s/w error but a set of errors in decisions, since it
appears that the s/w did exactly what it was supposed to do with the
data it received! Sort of "the operation was a success but the patient
died" situation.

CPB
-- 
American Management Systems, Inc.
"Achieving breakthrough performance through 
the intelligent use of information technology"
703-267-7194/703-267-2222 (fax); craig_beyers@mail.amsinc.com





* Re: Ariane 5 - not an exception?
  1996-07-31  0:00     ` Martin Tom Brown
  1996-07-31  0:00       ` Nigel Tzeng
@ 1996-08-02  0:00       ` Ken Garlington
  1996-08-03  0:00         ` Thomas Kendelbacher
  1 sibling, 1 reply; 111+ messages in thread
From: Ken Garlington @ 1996-08-02  0:00 UTC (permalink / raw)



Martin Tom Brown wrote:

> > Instead, as is common practice in the safety-critical world, local
> > exception handlers are frequently banned and a global 'shut it all
> > down' handler is the only stop gap measure.
> 
> This is an interesting insight.

At least in my experience, the statement is true but incomplete. One approach to
safety-critical systems is to write the software so as to _avoid_ exceptions. For
example, based on what I've read on the SRI exception, the precision of the 16-bit 
conversion might have been set such that any 64-bit value would reliably be
convertible to the 16-bit field. Other options include saturation arithmetic or
explicit checks (vs. Ada implicit checks).
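
The last two options can be sketched as follows, in Python only because
it is easy to read (the thread's subject is Ada, and this is not a
reconstruction of the SRI code). The function names are hypothetical,
and the choice of 0 for NaN in the saturating version is an assumption:
some policy has to be picked, and which one is right depends on the
mission requirements.

```python
import math

INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_checked(value: float) -> int:
    """Explicit check: fail with a named error if the value does not fit."""
    if math.isnan(value) or not (INT16_MIN <= value <= INT16_MAX):
        raise OverflowError(f"{value!r} does not fit in a signed 16-bit field")
    return int(value)  # truncates toward zero


def to_int16_saturating(value: float) -> int:
    """Saturation: clamp out-of-range inputs to the nearest representable end."""
    if math.isnan(value):
        return 0  # assumption: a NaN policy must be chosen by the designer
    return int(max(INT16_MIN, min(INT16_MAX, value)))
```

Neither version decides what the caller should do next. The checked
form only moves the decision to a place the designer picked, and the
saturating form silently returns a wrong-but-bounded number, which is
only acceptable if the consumer of that number can tolerate it.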

Of course, just avoiding the exception is not enough. The alternative chosen has to
be capable of meeting the mission requirements.

If the software design is judged to be sufficiently reliable, and _sufficient_
analysis is done to show that input data cannot cause an exception, then the remaining
exception possibilities are things such as hardware failures. In this case, there may
not be an adequate internal response, and shutting down _with an appropriate failure
indication_ may be the best choice, if continued operation might cause adverse system
impacts. For example, if a system fails such that it is saturating a communications
channel with garbage data, you may want the system to shut down so that other 
communications can continue.

> 
> > Unbelievably the rationale for disallowing local handlers is because
> > they make it difficult to verify complete code coverage since they
> > are only executed in the case of exceptional conditions
> 
> I can see there is a point there, OTOH perhaps there is something wrong
> with a test philosophy that doesn't attempt to push the envelope of valid
> data to find what happens if ...
> Designing test data to execute each and every path is part of the game.

Again, this isn't exactly the issue, in my experience. Whether you have exception
handlers or not, you want to test (as much as practical) the full "envelope" of
data inputs -- valid and invalid. However, each time code is added to the system,
you have to test the functionality and structure of that added code. If that code
is not shown to add sufficient value, then you are diverting resources from analyzing
and testing important code to test the less-important code. If you can show that
an exception cannot be raised at a certain point, this is sometimes a more effective
use of resources than adding code to catch the exception. Of course, whether you
do the analysis, or add code, you have to do the engineering correctly. In the
Ariane 5 case, this didn't happen.

> 
> Regards,
> --
> Martin Brown  <martin@nezumi.demon.co.uk>     __                CIS: 71651,470
> Scientific Software Consultancy             /^,,)__/

-- 
LMTAS - "Our Brand Means Quality"





* Re: Ariane 5 - not an exception?
  1996-07-26  0:00 ` ++           robin
                     ` (3 preceding siblings ...)
  1996-08-01  0:00   ` Jon S Anthony
@ 1996-08-02  0:00   ` James Kanze US/ESC 60/3/141 #40763
  1996-08-06  0:00   ` Stefan 'Stetson' Skoglund
  1996-08-06  0:00   ` Robert I. Eachus
  6 siblings, 0 replies; 111+ messages in thread
From: James Kanze US/ESC 60/3/141 #40763 @ 1996-08-02  0:00 UTC (permalink / raw)




In article <DvHuJq.7uJ@thomsoft.com> pmartin@alsys.com (Pascal Martin
@lone) writes:

|> Someone noted that shutting down the system before it takes off was
|> probably a sound decision. Shutdown on board a flying system looks
|> like the worst possible choice (could going **down** in a **flying**
|> system be a good choice, anyway ? :-).

Often it is.  Normally, you should have a back-up system; if you have
some reason to doubt the correctness of your answers, it might be better
to turn things over to the back-up system.  (Note that in the end, the
final back-up system may often be some sort of manual control.)
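
A rough sketch of that hand-over chain, in Python for illustration only.
The function name and the convention that a source returns None when it
doubts its own answer are assumptions made for the example, not a
description of any real flight system.

```python
from typing import Callable, Optional, Sequence


def first_trusted(sources: Sequence[Callable[[], Optional[float]]],
                  fallback: float) -> float:
    """Ask each source in priority order; use the first one that answers."""
    for read in sources:
        value = read()          # a source returns None if it distrusts itself
        if value is not None:
            return value
    return fallback             # last resort, e.g. a fixed safe setting


# Hypothetical usage: the primary has declared itself failed,
# the backup still answers.
primary = lambda: None
backup = lambda: 1.5
print(first_trusted([primary, backup], fallback=0.0))
```

The chain only helps if the sources fail independently. If every source
runs the same software on the same inputs, they can all return None
together and only the fallback is left.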
-- 
James Kanze         Tel.: (+33) 88 14 49 00        email: kanze@gabi-soft.fr
GABI Software, Sarl., 8 rue des Francs-Bourgeois, F-67000 Strasbourg, France
Conseils, études et réalisations en logiciel orienté objet --
                -- A la recherche d'une activité dans une region francophone






* Re: Ariane 5 - not an exception?
  1996-07-31  0:00       ` Greg Bond
@ 1996-08-03  0:00         ` John McCabe
  0 siblings, 0 replies; 111+ messages in thread
From: John McCabe @ 1996-08-03  0:00 UTC (permalink / raw)



Greg Bond <bond@ee.ubc.ca> wrote:

>John McCabe wrote:
>> snip...
>>
>> I find it difficult to understand why the design and development team
>> even considered maintaining the CPU load at <80% for this particular
>> case. If they requested a waiver on that margin and were refused then
>> obviously their prime contractor (or whatever) is to blame, but there
>> is no way that CPU loadings with margins of 20% should have been
>> enforced at the risk of mission failure.
>> 
>> <..snip..>

>Correct me if I'm wrong, but doesn't a lower CPU utilization help ensure
>that hard deadlines will be met in exceptional computational
>circumstances (thereby helping to prevent mission failure....)?

Yes, that is also true, but if adequate analysis and testing of
real-life situations take place, then this requirement should be
waived. On my current program we have performed these analyses to
discover exactly what can happen and when, and this takes into account
things that may happen but aren't supposed to. We can therefore ensure
(within reason) that the timing requirements we meet are realistic
but do not necessarily meet a (in our case) 30% margin.
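
For anyone unfamiliar with how such a margin is usually stated, here is
a back-of-envelope sketch in Python. The task figures are invented for
the example, and a plain utilisation sum is an assumption about how the
load is measured; it is a bookkeeping figure, not a proof that the
deadlines are met.

```python
from typing import List, Tuple

# (worst-case execution time, period), both in the same time unit
Task = Tuple[float, float]


def utilisation(tasks: List[Task]) -> float:
    """Fraction of the CPU consumed if every task takes its worst case."""
    return sum(wcet / period for wcet, period in tasks)


def meets_margin(tasks: List[Task], margin: float) -> bool:
    """True if the load leaves at least `margin` of the CPU spare."""
    return utilisation(tasks) <= 1.0 - margin


# Invented task set: 0.20 + 0.25 + 0.30 = 0.75 of the CPU.
tasks = [(2.0, 10.0), (5.0, 20.0), (12.0, 40.0)]
print(meets_margin(tasks, 0.20))  # within a 20% margin
print(meets_margin(tasks, 0.30))  # not within a 30% margin
```

A task set like this one passes a 20% rule and fails a 30% rule with no
change in behaviour, which is why the analysis of what can actually
happen matters more than the headline figure.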


Best Regards
John McCabe <john@assen.demon.co.uk>






* Re: Ariane 5 - not an exception?
  1996-08-02  0:00       ` Ken Garlington
@ 1996-08-03  0:00         ` Thomas Kendelbacher
  0 siblings, 0 replies; 111+ messages in thread
From: Thomas Kendelbacher @ 1996-08-03  0:00 UTC (permalink / raw)



In article <3201D8EC.45E4@lmtas.lmco.com>, Ken Garlington <garlingtonke@lmtas.lmco.com> writes:
>If the software design is judged to be sufficiently reliable, and _sufficient_
>analysis is done to show that input data cannot cause an exception, then the remaining
>exception possibilities are things such as hardware failures. In this case, there may
>not be an adequate internal response, and shutting down _with an appropriate failure
>indication_ may be the best choice, if continued operation might cause adverse system
>impacts. For example, if a system fails such that it is saturating a communications
>channel with garbage data, you may want the system to shut down so that other 
>communications can continue.

Wouldn't that be "shut up" instead of "shut down", in that case?  :-D

Sorry, couldn't resist.

-- 
Thomas Kendelbacher   |   email : Thomas.Kendelbacher@erno.de
DASA RI / Abt. RIT14  |   voice : +49 421 539 5492 (working hours)
Postfach 28 61 56     |      or : +49 421 57 04 37 (any other time)
D-28361 Bremen        |     fax : +49 421 539 4529 (any time)
Germany







* Re: Ariane 5 - not an exception?
  1996-08-02  0:00       ` Pascal Martin @lone
@ 1996-08-03  0:00         ` Dr. Richard Botting
  1996-08-05  0:00           ` system
  1996-08-06  0:00         ` ++           robin
  1 sibling, 1 reply; 111+ messages in thread
From: Dr. Richard Botting @ 1996-08-03  0:00 UTC (permalink / raw)



Pascal Martin @lone (pmartin@alsys.com) wrote:

: Which PL/I compilers are available today on Sparc/Solaris, HP700/HPUX,
: Windows/Intel and PowerPC/* ? I thought the language was dead since
: the demise of IBM mainframes and the death of (the beloved) Multics.

: I also thought the PL/I experts have so many years of programming
: experience that they are now.. retired.

: PL/I had everything, except reserved words. The infamous "if if then then".

As one who was taught PL/I by IBM trainers, while working,
when it was called NPL, and who never used the language and got paid for it,
I completely agree with Pascal's comments... and will add the following opinion:
	PL/I was not designed for real time but to be a weapon
	against Algol, COBOL, FORTRAN and other tools that let
	people port their software to non-IBM platforms.

--
dick botting     http://www.csci.csusb.edu/dick/signature.html
Disclaimer:      CSUSB may or may not agree with this message.
Copyright(1996): Copy freely but say where it came from.
	I have nothing to sell, and I'm giving it away.





* Re: Ariane 5 - not an exception?
  1996-07-29  0:00   ` Bill Angel
                       ` (5 preceding siblings ...)
  1996-07-30  0:00     ` Richard Shetron
@ 1996-08-04  0:00     ` Richard Riehle
  1996-08-05  0:00       ` Fergus Henderson
                         ` (3 more replies)
  6 siblings, 4 replies; 111+ messages in thread
From: Richard Riehle @ 1996-08-04  0:00 UTC (permalink / raw)



On 29 Jul 1996, Bill Angel wrote:

> 	I am under the impression that for the US manned spaceflight
> program (to get to the moon), an on-board computer that was serving as a
> backup to the primary computer would have been performing its computations
> using completely different software than the primary computer. By
> utilizing this methodology, the same software "glitch" would not halt both
> systems simultaneously.  Perhaps a group of software developers could be
> tasked with producing a version of the on-board software for Ariane in a
> different computer language than that used by the primary processor. The
> two processors, running simultaneously, would serve to check each other's
> results with greater independence than they apparently do now.

  I have been following this thread with interest.  I am no expert on
  Ariane, but lack of expertise has not been an obstacle to others who have
  posted on this topic.

  1) redundant processors

     The idea of using different processor architectures is a good
     one and often employed for other systems such as the Boeing 777.
     However, if I recall correctly, Ariane has a "rad-hard" requirement
     (right or wrong) and uses Mil-Std 1750A processors to satisfy that
     requirement. This would not permit using multiple processors of
     differing architectures.

  2) Pl/I

     a) There is no Pl/I compiler for the 1750A

     b) Ada is far more suitable for safety-sensitive software than Pl/I

     c) This failure was not a language issue. It is a management issue.
        Specifically, it is a failure of engineering management.

     d) Given the incorrect specifications against which the program was
        designed, the same failure would have occurred in Pl/I or any
        other language.


  3) Turning off the Computer

     Not always an incorrect decision in embedded computing. This time
     it clearly was.

  4) Software Reuse

     If one intends to "reuse" software, such as Ariane 4xx software in
     Ariane 5xxx, in a significantly different architecture, there is some
     virtue in extensive testing.

  5) Unchecked Conversion

     Ada practitioners have been preaching for years that this should not
     be done without substantial examination and testing. One more example
     of why unchecked_conversion is usually not a good idea. Sometimes it
     is unavoidable, I know.

  6) Exception Handling

     Anyone remember C.A.R. Hoare's Turing Lecture?

  7) Ada

     This is still the best language for doing this kind of system. But
     stupid management is something no programming language can change.
     Given other engineering constraints on this project, Ada is really
     the only reasonable language to choose.


   Richard Riehle









* Re: Ariane 5 - not an exception?
  1996-08-01  0:00     ` ++           robin
  1996-08-01  0:00       ` Ken Garlington
  1996-08-02  0:00       ` Pascal Martin @lone
@ 1996-08-05  0:00       ` Steve O'Neill
  1996-08-06  0:00         ` Francis Lipski
                           ` (2 more replies)
  2 siblings, 3 replies; 111+ messages in thread
From: Steve O'Neill @ 1996-08-05  0:00 UTC (permalink / raw)



++ robin wrote:
>         Steve O'Neill <smoneill@sanders.lockheed.com> writes:
>         >I disagree completely!  The language was not the
>         >problem; the design decisions in how the language
>         >was used were.
> 
> ---The choice of language is indeed very relevant.
> What I wrote in an earlier posting on this topic is highly
> apt:
> 
> "A PL/I programmer
> experienced with real time systems, would have CHALLENGED
> such a stupid requirement that the computer be shut down by the
> error-handler in the event of a fixed-point overflow.  He would
> have had it changed.
> 
> "I'd go further to say that no experienced PL/I programmer
> would have shut down the system as a result of a fixed-point
> overflow.

Substitute Ada (or C or FORTRAN or Assembly) for PL/I here and you see my 
point.  It's not the language that makes the developer challenge the
ridiculous requirement to shut down; it is the developer "experienced with
real-time systems".  Just because I am programming in PL/I doesn't mean I
am magically a better real-time developer.  As a real-time designer 
concerned with the system-wide aspects of completely shutting down any 
sensor I would question this approach regardless of the language in use. 
This has nothing to do with the fact that much of my experience is with 
Ada.

The (flawed) reasoning for why certain conversions were not protected was
also covered in the report.  Invalid assumptions were made, and we know
what assuming does, don't we (makes an ASS out of U and ME).  This was
compounded by the requirement for 20% spare capacity.  Spare capacity for
what, we don't know.  Especially considering that the very software which
failed didn't need to be, and should not have been, running at the time,
consuming some of that precious spare.

Certainly you and I would not have shut down the system but what about 
the vast majority of developers without as much experience or who thought 
that their job was to implement the requirements that they were given?

> 
> "Furthermore, he would have included a check that the value
> did not go out of range;"
> 
> ---But all it needed was a check that the value was in range.
> Such checks had been included on other similar conversions in
> the vicinity!
> 

Yes, and there was mention in the report that 'they' thought that this 
would violate that precious spare requirement.  So they set about picking 
and choosing which conversions to protect.  I find it extremely hard to 
believe that the (small) handful of instructions to do a range check 
would have been too much!  And, in hindsight, well worth it.

The issue of the OBC interpreting the 'essentially diagnostic data' as 
valid sensor data really makes me wonder.  In a system with a reasonable 
interface between the two devices this should *never* happen.  I am 
surprised that this misinterpretation didn't cause a similar overflow in 
the OBC and resulting shutdown! :(

I think that we agree in our assessment of the situation and the fact 
that these problems could have been avoided with a better overall system 
design and more extensive testing.  Essentially the same conclusions that 
the review board came to.  My only disagreement is with your _opinion_ 
that the simple choice of a different language would have saved the day. 
And with this point I will continue to disagree.

-- 
Steve O'Neill                      | "No,no,no, don't tug on that!
Sanders, A Lockheed Martin Company |  You never know what it might
smoneill@sanders.lockheed.com      |  be attached to." 
(603) 885-8774  fax: (603) 885-4071|    Buckaroo Banzai





* Re: Ariane 5 - not an exception?
  1996-08-01  0:00       ` Ken Garlington
@ 1996-08-05  0:00         ` John McCabe
  1996-08-06  0:00           ` Ken Garlington
                             ` (2 more replies)
  0 siblings, 3 replies; 111+ messages in thread
From: John McCabe @ 1996-08-05  0:00 UTC (permalink / raw)



Ken Garlington <garlingtonke@lmtas.lmco.com> wrote:

<..snip..>

>I had to read the report a few times to catch the gist of this myself.
>Although it's confusing, it sounds like the alignment function was
>running, but was not providing correction vectors to the SRI output.
>Therefore, the alignment function did not have to be executing at all.

>This, to me, is very strange. From my limited experience with IRS
>alignments, the alignment function shuts down as soon as the IRS moves (or the
>alignment is complete), since you can't usually align a moving IRS
>without a fixed reference input (like you get on shipboard systems).
>The rationale in the report, that keeping the alignment system running
>made it easier to realign the SRI close to liftoff, makes some sense.
>But it's still an unusual approach.

If you check through the report again, you'll notice that having the
alignment function running up to 40 seconds after lift-off was not an
applicable requirement for Ariane 5, it was left over from Ariane 4. 

Your comments on the alignment function providing a reference against
a fixed point seem perfectly valid to me also, and I cannot really
understand how they could continue to align after lift-off. It sounds
very silly to me, as I can't believe that even the Ariane 4 software
would continue to take alignment data into account after lift-off.

<..snip..>

Best Regards
John McCabe <john@assen.demon.co.uk>






* Re: Ariane 5 - not an exception?
  1996-08-04  0:00     ` Richard Riehle
  1996-08-05  0:00       ` Fergus Henderson
  1996-08-05  0:00       ` John McCabe
@ 1996-08-05  0:00       ` Nigel Tzeng
  1996-08-06  0:00         ` John McCabe
  1996-08-13  0:00       ` ++           robin
  3 siblings, 1 reply; 111+ messages in thread
From: Nigel Tzeng @ 1996-08-05  0:00 UTC (permalink / raw)



In article <Pine.GSO.3.92.960804145456.23377A-100000@nunic.nu.edu>,
Richard Riehle  <rriehle@nunic.nu.edu> wrote:
>On 29 Jul 1996, Bill Angel wrote:

[snip]

>  I have been following this thread with interest.  I am no expert on
>  Ariane, but lack of expertise has not been an obstacle to others who have
>  posted on this topic.
>
>  1) redundant processors
>
>     The idea of using different processor architectures is a good
>     one and often employed for other systems such as the Boeing 777.
>     However, if I recall correctly, Ariane has a "rad-hard" requirement
>     (right or wrong) and uses Mil-Std 1750A processors to satisfy that
>     requirement. This would not permit using multiple processors of
>     differing architectures.

There are several space qual'd processors, including the 386.  We
flew one on SAMPEX and will also fly them on XTE and TRMM.  I think
the R3000 has also been flight qual'd for space (fairly sure there was
one on Clementine).  IIRC the 68K has been on a getaway special
although when I last looked it had not ever been flight qual'd.

And yeah...I'd say there's good reason to spec out rad tested components
for use on any launch vehicle or spacecraft.

(As an aside, SAMPEX did not use two different CPUs...SAMPEX was a
cheaper, faster, better NASA project that no one ever heard about.  It
was around 40 million as well and unlike Clementine isn't spinning
uselessly about.  I've worked for both NRL and NASA.  NRL Code 5100 is
as competent as NASA Code 700 in producing small inexpensive
satellites but NRL/BMDO's PR dept blows away NASA's PR dept).

[snip]

>  4) Software Reuse
>
>     If one intends to "reuse" software, such as Ariane 4xx software in
>     Ariane 5xxx, in a significantly different architecture, there is some
>     virtue in extensive testing.

Understatement.  Goddard regression tests the flight software between
releases much less between missions (ie TRMM and XTE are based on the
SAMPEX flt hw/sw) using their simulator to feed sensor data to the
actual flight s/w running on the engineering test unit boards and
later on the actual flight h/w itself during I&T.

Screw ups still happen. :)

>   Richard Riehle

Nigel






* Re: Ariane 5 - not an exception?
  1996-08-03  0:00         ` Dr. Richard Botting
@ 1996-08-05  0:00           ` system
  0 siblings, 0 replies; 111+ messages in thread
From: system @ 1996-08-05  0:00 UTC (permalink / raw)



 dick@silicon.csci.csusb.edu (Dr. Richard Botting) writes:
>Pascal Martin @lone (pmartin@alsys.com) wrote:
>
>: Which PL/I compilers are available today on Sparc/Solaris, HP700/HPUX,
>: Windows/Intel and PowerPC/* ? I thought the language was dead since
>: the demise of IBM mainframes and the death of (the beloved) Multics.

Well I know even less about PL/I than I do about Ada (and I think the
origin of this thread (within the Ariane thread) was pretty tacky),
but it appears to be fairly healthy:

Windows 95, Windows NT, OS/2, VMS on both VAX and Alpha, AIX and
Digital Unix plus the traditional mainframes.

Robert
-------------------------------------------------------
**From IBM**

The new IBM PL/I for Windows+ Version 1.2 provides a PL/I application
   development environment on Windows NT and Windows 95 that is designed
   to allow you to create mission-critical, line-of-business applications
   that can run on host systems, workstations, or client/server systems
   with access to DB2(R), VSAM/SAM, and other data systems. IBM PL/I for
   Windows provides the PL/I programmer with an optimizing compiler and a
   set of high-productivity, Windows-based tools for the development of
   applications.
-------------------------------------------------------

   IBM PL/I for OS/2(R) Professional Version 1.2 builds upon the function
   in Version 1.1 and adds new enhancements, such as an improved run-time
   environment and additional data access support.
-------------------------------------------------------

   IBM introduces three new members of the PL/I family that have been
   designed with today's programmer in mind:
     * PL/I for OS/2 Personal Edition
     * PL/I for OS/2 Professional Edition
     * PL/I for OS/2 Toolkit, with Visual PL/I

-------------------------------------------------------
   IBM PL/I Set for AIX provides a PL/I application development
   environment designed to allow you to create mission critical,
   line-of-business applications that can run on host systems,
   workstations, or client/server systems with access to DB2(R),
-------------------------------------------------------
Some of the efforts we are investigating include:
     * Update MVS & VM compiler by porting function of the workstation
       compiler product to the host
     * Provide an object-oriented development environment that lets
       programmers develop PL/I applications using object-oriented
       extensions
     * Support for additional platforms such as Win32, PowerPC, and other
       UNIX environments
     * Support for open standards such as XPG and POSIX
-------------------------------------------------------
**From Digital**
VAX PL/I and DEC PL/I for OpenVMS AXP products have been transferred to
UniPrise Systems, Inc., of Irvine, California. UniPrise will continue to
offer these products under the current product names. Digital will continue
to sell and distribute the products under standard Digital license terms and
conditions. Digital will also continue to be the service provider. Current
Software Product Services users should be unaffected by this transfer;
Digital remains the primary contact for support.

The UniPrise acquisition of Digital's PL/I products will bring a renewed
focus on these compilers. Continued development and maintenance efforts will
provide investment protection for users with PL/I applications on the
OpenVMS VAX platform and allow users to move these applications to the
OpenVMS AXP platform easily.
   
-------------------------------------------------------
**From UniPrise**

   client/server systems. Whether your choice of environments is OpenVMS
   or UNIX, UniPrise has the full function Digital Alpha PL/I compiler
   for you.

Morphis@physics.niu.edu

Real Men change diapers





* Re: Ariane 5 - not an exception?
  1996-08-04  0:00     ` Richard Riehle
  1996-08-05  0:00       ` Fergus Henderson
@ 1996-08-05  0:00       ` John McCabe
  1996-08-05  0:00       ` Nigel Tzeng
  1996-08-13  0:00       ` ++           robin
  3 siblings, 0 replies; 111+ messages in thread
From: John McCabe @ 1996-08-05  0:00 UTC (permalink / raw)



Richard Riehle <rriehle@nunic.nu.edu> wrote:

<..snip..>

>  1) redundant processors

>     The idea of using different processor architectures is a good
>     one and often employed for other systems such as the Boeing 777.
>     However, if I recall correctly, Ariane has a "rad-hard" requirement
>     (right or wrong) and uses Mil-Std 1750A processors to satisfy that
>     requirement. This would not permit using multiple processors of
>     differing architectures.

As far as I know, from a much earlier post, Ariane 5 used 68000 series
processors. I don't see why it should have a rad-hard requirement, but
even then that should not enforce the use of MIL-STD-1750A processors,
as there are a number of alternatives e.g. 8086, RTX2010, and some
versions of the ADSP2100 are apparently rad-tolerant. Even if
MIL-STD-1750 processors were necessary, there is more than one
rad-hard version produced.

However, I don't really think that it is the processor architecture
itself that is an issue, more the system architecture, as it is possible
to design completely different systems round a single processor to do
the same job.

>  2) Pl/I

<..snip..>

>     c) This failure was not a language issue. It is a management issue.
>        Specifically, it is a failure of engineering management.

>     d) Given the incorrect specifications against which the program was
>        designed, the same failure would have occurred in Pl/I or any
>        other language.

Exactly.

>  3) Turning off the Computer

>     Not always an incorrect decision in embedded computing. This time
>     it clearly was.

It sounds to me like not enough analysis was performed to make a
reasonable judgement on whether computer shutdown was sensible in this
case. It obviously was not sensible in this case because there was a
common-mode failure of both SRIs.

>  4) Software Reuse

>     If one intends to "reuse" software, such as Ariane 4xx software in
>     Ariane 5xxx, in a significantly different architecture, there is some
>     virtue in extensive testing.

Exactly. And even more virtue in correctly analysing and specifying
the requirements.

>  5) Unchecked Conversion

>     Ada practitioners have been preaching for years that this should not
>     be done without substantial examination and testing. One more example
>     of why unchecked_conversion is usually not a good idea. Sometimes it
>     is unavoidable, I know.

Mmm. Despite the language used in the report, I don't think it was the
Ada feature Unchecked_Conversion that was involved here, merely that a
conversion from float to integer was used. Unchecked_Conversion
wouldn't be used to convert a 64bit float into a 16bit integer by
anyone unless they were really stupid.

>  6) Exception Handling

>     Anyone remember C.A.R Hoare's Turing Lecture?

Not me. Any more information on it?

>  7) Ada

>     This is still the best language for doing this kind of system. But
>     stupid management is something no programming language can change.
>     Given other engineering constraints on this project, Ada is really
>     the only reasonable language to choose.

Or PL/I :-) Only joking!



Best Regards
John McCabe <john@assen.demon.co.uk>






* Re: Ariane 5 - not an exception?
  1996-08-04  0:00     ` Richard Riehle
@ 1996-08-05  0:00       ` Fergus Henderson
  1996-08-05  0:00       ` John McCabe
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 111+ messages in thread
From: Fergus Henderson @ 1996-08-05  0:00 UTC (permalink / raw)



Richard Riehle <rriehle@nunic.nu.edu> writes:

>  5) Unchecked Conversion
>
>     Ada practitioners have been preaching for years that this should not
>     be done without substantial examination and testing. One more example
>     of why unchecked_conversion is usually not a good idea. Sometimes it
>     is unavoidable, I know.

I agree that in general unchecked_conversions should be avoided,
but Ariane 5 wasn't an example of unchecked_conversion going wrong,
in fact it was just the opposite, it was an example of a checked
conversion going wrong.  If the conversion had been unchecked,
then the rocket may not have crashed (indeed the problem may have
gone unnoticed).
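
To make the checked/unchecked distinction concrete, here is a minimal
Python sketch. It is not the Ariane code (which was Ada) and the function
names are made up for illustration. The "unchecked" variant assumes the
out-of-range value simply wraps into 16 bits, which is only one of the
things real hardware might do when the check is left out.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767


def checked_to_int16(x: float) -> int:
    """Checked conversion: refuse any value that does not fit in 16 bits."""
    n = int(x)  # truncates toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        # Roughly analogous to a constraint error being raised.
        raise OverflowError(f"{x} does not fit in a signed 16-bit integer")
    return n


def unchecked_to_int16(x: float) -> int:
    """Unchecked conversion (assumed behaviour): keep the low 16 bits
    and reinterpret them as a signed value, with no range check."""
    low16 = int(x) & 0xFFFF
    return struct.unpack("<h", struct.pack("<H", low16))[0]


if __name__ == "__main__":
    print(unchecked_to_int16(40000.0))  # wraps silently to -25536
    try:
        checked_to_int16(40000.0)
    except OverflowError as err:
        print("checked conversion raised:", err)
```

The checked version stops with an error the caller must handle; the
unchecked one carries on with a plausible-looking but wrong number.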

--
Fergus Henderson <fjh@cs.mu.oz.au>   |  "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh>   |  of excellence is a lethal habit"
PGP: finger fjh@128.250.37.3         |     -- the last words of T. S. Garp.





* Re: Ariane 5 - not an exception?
  1996-08-05  0:00       ` Steve O'Neill
@ 1996-08-06  0:00         ` Francis Lipski
  1996-08-07  0:00           ` Martin Tom Brown
  1996-08-06  0:00         ` Frank Manning
  1996-08-13  0:00         ` ++           robin
  2 siblings, 1 reply; 111+ messages in thread
From: Francis Lipski @ 1996-08-06  0:00 UTC (permalink / raw)



In article <32065615.77C7@sanders.lockheed.com>, you write:
> ++ robin wrote:
> >         Steve O'Neill <smoneill@sanders.lockheed.com> writes:
> >         >I disagree completely!  The language was not the
> >         >problem the design decisions in how the language
> >         >was used were.
> > 
> > ---The choice of language is indeed very relevant.
> > What I wrote in an earlier posting on this topic is highly
> > apt:
> > 
> > "A PL/I programmer
> > experienced with real time systems, would have CHALLENGED
> > such a stupid requirement that the computer be shut down by the
> > error-handler in the event of a fixed-point overflow.  He would
> > have had it changed.

   Not always possible.  If you are in the minority and cannot persuade
the others, what do you do?

  As a previous message in this thread asked, what should someone do?  Say
"to hell with the requirements, I'm going to code what I think is correct"?
You can argue your position for only so long.  If you haven't convinced
others that your position is correct after a reasonable time, then either
you can't argue effectively, or maybe your position is wrong.  My
recommendation would be to document your position and your attempts to
persuade others.  Then if something like the Ariane 5 happens, you can say,
"See, I told you so."  Not that that's a big consolation after the loss of
a rocket, or if people's lives have been lost.
> > 
> > "I'd go further to say that no experienced PL/I programmer
> > would have shut down the system as a result of a fixed-point
> > overflow.
> 
> Substitute Ada (or C or FORTRAN or Assembly) for PL/I here and you see my 
> point.  It's not the language that makes the developer challenge the 
> ridiculous requirement to shut down it is the developer "experienced with 
> real-time systems".  Just because I am programming in PL/I doesn't mean I 
> am magically a better real-time developer.  As a real-time designer 
> concerned with the system-wide aspects of completely shutting down any 
> sensor I would question this approach regardless of the language in use. 
> This has nothing to do with the fact that much of my experience is with 
> Ada.
> 
> The (flawed) reasoning for why certain conversions were not protected was 
> also covered in the report.  Invalid assumptions were made and we know 
> what assuming does don't we (makes an ASS out of U and ME).  This was 
> compounded by the requirement for 20% spare capacity.  Spare capacity for 
> what we don't know.  Especially considering that the very software which 
> failed didn't need to be and should not have been running at the time 
> consuming some of that precious spare.
> 
> Certainly you and I would not have shut down the system but what about 
> the vast majority of developers without as much experience or who thought 
> that their job was to implement the requirements that they were given?
> 

  The report states that the rationale was based on the "culture within the
Ariane programme of only addressing random hardware failures.  From this
point of view exception - or error - handling mechanisms are designed for a
random hardware failure which can quite rationally be handled by a backup
system".

  If all conversions and other possible overflow conditions are protected,
and then an overflow occurs, what action should be taken?  The system has
just had a random hardware failure.  Continue to operate with known bad 
hardware?  In the case of an overflow, set to max value, continue and
hope for the best?  

  While clearly the design, in this case, did not protect itself sufficiently,
and compounded errors by not handling the case of a simultaneous failure of
both processors, what action should be taken on an overflow if not to shut
down?  With flight controls or inertial systems, partitioning into tasks and
then restarting the offending task is not an option.  It would take entirely
too long to restart the task to be able to effectively recover.

  Regarding the spare requirements.  The reason to have spare time is to
ensure that all hard deadlines are met and to allow growth for future
versions of SW.  Allowing room for growth is necessary in development
programs; however, the requirement is usually never relaxed as more
functionality is added.  That is another story.  However, it is necessary
to ensure sufficient time is available to complete all the processing
within the allotted time.  The execution time of the software is at best a
statistical problem; at the very least, the hardware times can be
statistical.  Even if the SW is always measured at its worst-case time and
all these times are added together, you cannot allow the total to meet or
exceed the allowable time, given the statistical nature of the HW.  So how
much spare time should be allotted?  If 20% is unrealistic, what number
should be used: 10%, 1%, 0.001%?

> > 
> > "Furthermore, he would have included a check that the value
> > did not go out of range;"
> > 
> > ---But all it needed was a check that the value was in range.
> > Such checks had been included on other similar conversions in
> > the vicinity!
> > 
> 
> Yes, and there was mention in the report that 'they' thought that this 
> would violate that precious spare requirement.  So they set about picking 
> and choosing which conversions to protect.  I find it extremely hard to 
> believe that the (small) handful of instructions to do a range check 
> would have been too much!  And, in hindsight, well worth it.
> 
> The issue of the OBC interpreting the 'essentially diagnostic data' as 
> valid sensor data really makes me wonder.  In a system with a reasonable 
> interface between the two devices this should *never* happen.  I am 
> surprised that this misinterpretation didn't cause a similar overflow in 
> the OBC and resulting shutdown! :(

 I was also amazed by the poor design of the interface that didn't detect
this problem.  Probably given enough time, some form of error would
have occurred resulting in the OBC shutting down.

> 
> I think that we agree in our assessment of the situation and the fact 
> that these problems could have been avoided with a better overall system 
> design and more extensive testing.  Essentially the same conclusions that 
> the review board came to.  My only disagreement is with your _opinion_ 
> that the simple choice of a different language would have saved the day. 
>  And with this point I will continue to disagree. 
> 
> -- 
> Steve O'Neill                      | "No,no,no, don't tug on that!
> Sanders, A Lockheed Martin Company |  You never know what it might
> smoneill@sanders.lockheed.com      |  be attached to." 
> (603) 885-8774  fax: (603) 885-4071|    Buckaroo Banzai
----------------------------------------------------------------------
Standard Disclaimer applies.
      Frank Lipski   lipski@fs1.mar.lmco.com          770-494-8322
"The most exciting phrase to hear in science, the one that heralds new
discoveries, is not "Eureka!" ("I found it!") but rather "hmm....that's
funny..."  --   Isaac Asimov
---------------------------------------------------------------------





* Re: Ariane 5 - not an exception?
  1996-08-05  0:00       ` Nigel Tzeng
@ 1996-08-06  0:00         ` John McCabe
  0 siblings, 0 replies; 111+ messages in thread
From: John McCabe @ 1996-08-06  0:00 UTC (permalink / raw)



nigel@access1.digex.net (Nigel Tzeng) wrote:

<..snip..>

>There are several space qual'd processors, including the 386.  We
>flew one on SAMPEX and will also fly them on XTE and TRMM.  I think
>the R3000 has also been flight qual'd for space (fairly sure there was
>one on Clementine).  IIRC the 68K has been on a getaway special
>although when I last looked it had not ever been flight qual'd.

Just a comment, but a US space qualified processor isn't necessarily
ESA space qualified. Apparently true ESA qualification requires more
extensive testing than the US requirement. Seems a bit stupid to me
personally but if ESA just accepted US space qualification testing and
methods then there'd be quite a few of them out of a job.

As a matter of interest, which 386 did you use? Was it the repackaged
SEI one?

<..snip..>

>[snip]

>>  4) Software Reuse
>>
>>     If one intends to "reuse" software, such as Ariane 4xx software in
>>     Ariane 5xxx, in a significantly different architecture, there is some
>>     virtue in extensive testing.

>Understatement.  Goddard regression tests the flight software between
>releases much less between missions (ie TRMM and XTE are based on the
>SAMPEX flt hw/sw) using their simulator to feed sensor data to the
>actual flight s/w running on the engineering test unit boards and
>later on the actual flight h/w itself during I&T.

>Screw ups still happen. :)

Yes, but if the SRIs on Ariane 5 had been tested (let alone regression
tested) using a reasonable simulator then it probably wouldn't have
been destroyed in the first place!

Best Regards
John McCabe <john@assen.demon.co.uk>






* Re: Ariane 5 - not an exception?
  1996-08-05  0:00         ` John McCabe
  1996-08-06  0:00           ` Ken Garlington
  1996-08-06  0:00           ` Mark van Walraven
@ 1996-08-06  0:00           ` Ken Garlington
  2 siblings, 0 replies; 111+ messages in thread
From: Ken Garlington @ 1996-08-06  0:00 UTC (permalink / raw)



FYI - This week's Aviation Week and Space Technology has an editorial on 
the Ariane 5 accident. It pretty much reflects the final report, but 
makes a few points about the management culture of the Ariane 5 team 
(more interested in selling flights than designing in safety) and the 
possible complacency of a team that had several successful designs under 
their belt. I'm not familiar enough with the real situation at 
Arianespace and their subcontractors to say that the editorial is 
correct, but it makes for interesting reading.

-- 
LMTAS - "Our Brand Means Quality"





* Re: Ariane 5 - not an exception?
  1996-08-05  0:00         ` John McCabe
@ 1996-08-06  0:00           ` Ken Garlington
  1996-08-06  0:00           ` Mark van Walraven
  1996-08-06  0:00           ` Ken Garlington
  2 siblings, 0 replies; 111+ messages in thread
From: Ken Garlington @ 1996-08-06  0:00 UTC (permalink / raw)



John McCabe wrote:
> 
> If you check through the report again, you'll notice that having the
> alignment function running up to 40 seconds after lift-off was not an
> applicable requirement for Ariane 5, it was left over from Ariane 4.

Right. Notice, however, that even on the Ariane 4, the alignment function
continued to run, but _not_ apply correction vectors, prior to lift-off.
I always had the impression that when an IRS was in alignment mode, it could
not be in operational mode, and vice versa. It sounds like this IRS was
designed to have a "pseudo" alignment mode, which would continue to calculate
corrections, but only apply them when the IRS was in "true" alignment mode.
As I understood the report, this "pseudo" mode was designed to allow quick
entry and exit into the "true" alignment mode for quick updates prior to
launch.

If the Ariane 4 did not have this design feature of a "pseudo" mode, then
the alignment task would have been disabled completely as soon as the IRS
moved -- whether on the Ariane 4 or Ariane 5 -- and so the alignment
calculations would not have been running in either case. There is a lesson
learned here about adding too many "neat" features to safety-critical software.

-- 
LMTAS - "Our Brand Means Quality"





* Re: Ariane 5 - not an exception?
  1996-08-05  0:00         ` John McCabe
  1996-08-06  0:00           ` Ken Garlington
@ 1996-08-06  0:00           ` Mark van Walraven
  1996-08-06  0:00           ` Ken Garlington
  2 siblings, 0 replies; 111+ messages in thread
From: Mark van Walraven @ 1996-08-06  0:00 UTC (permalink / raw)



In article <839266234.26317.0@assen.demon.co.uk>, john@assen.demon.co.uk (John McCabe) wrote:
>
>If you check through the report again, you'll notice that having the
>alignment function running up to 40 seconds after lift-off was not an
>applicable requirement for Ariane 5, it was left over from Ariane 4. 
>
>Your comments on the alignment function providing a reference against
>a fixed point seem perfectly valid to me also, and I cannot really
>understand how they could continue to align after lift-off. It sounds
>very silly to me, as I can't believe that even the Ariane 4 software
>would continue to take alignment data into account after lift-off.

As I understand it, alignment data is computed for approx. fifty seconds after
Ariane 4 enters "flight-mode",  9 seconds before launch.  If the count-down
is halted after -9 secs, there is enough time to secure the rocket before the
alignment process stops.  Provided the count-down was held before -6 secs, it
should be possible to restart the count-down without the lengthy (45 min?)
delay needed to start alignment from cold.

I expect the alignment data was ignored after launch.  How ironic.

Regards,
Mark.





* Re: Ariane 5 - not an exception?
  1996-07-26  0:00 ` ++           robin
                     ` (4 preceding siblings ...)
  1996-08-02  0:00   ` James Kanze US/ESC 60/3/141 #40763
@ 1996-08-06  0:00   ` Stefan 'Stetson' Skoglund
  1996-08-06  0:00   ` Robert I. Eachus
  6 siblings, 0 replies; 111+ messages in thread
From: Stefan 'Stetson' Skoglund @ 1996-08-06  0:00 UTC (permalink / raw)



Hrmm, the two onboard computers are built by SAAB in Linkoping, Sweden.
They are most definitely NOT off-the-shelf.

I don't think anybody really wanted to port PL/I to that architecture.

-- 
---------------------------------------------------------------------
Stefan 'Stetson' Skoglund          I               |
sp2stes1@ida.his.se                I               |
<http://www.his.se/ida/~sp2stes1/> I         _____/0\_____
                                   I ____________O(.)O___________
H\"ogskolan i Sk\"ovde, Sverige    I      I-+-I    O    I-+-I
                                   I
                                   I      Viggen with two Rb04
---------------------------------------------------------------------





* Re: Ariane 5 - not an exception?
  1996-07-26  0:00 ` ++           robin
                     ` (5 preceding siblings ...)
  1996-08-06  0:00   ` Stefan 'Stetson' Skoglund
@ 1996-08-06  0:00   ` Robert I. Eachus
  6 siblings, 0 replies; 111+ messages in thread
From: Robert I. Eachus @ 1996-08-06  0:00 UTC (permalink / raw)




In article <4totv7$o9f@goanna.cs.rmit.edu.au> rav@goanna.cs.rmit.edu.au (++           robin) writes:

  > "A PL/I programmer experienced with real time systems, would have
  > CHALLENGED such a stupid requirement that the computer be shut
  > down by the error-handler in the event of a fixed-point overflow.
  > He would have had it changed...

  > "This alone would have showed up the deficiency of the
  > overall design (that the system would shut itself down for
  > fixed-point overflow)."

   Substitute Ada for PL/I and you have it exactly right, except...

   Management decided not to authorize the change.  This was appealed
up the line, and the change was not approved at any level.  Only a
footnote, except that the same management approved reuse of the
computer (and software) in the Ariane 5 without a requirements
review.  That review would have shown both that this software should
not be running after launch, and that running it after launch would
result in a crash.

   The software was, and continues to be perfectly safe in the Ariane
4, although I suspect it will be changed for configuration management
reasons.  Say Boeing took the flight control software for the 747-300
and used it unchanged and without a requirements review in the
747-400.  Assume further that the first test flight crashed because of
a computer malfunction, and it turned out that the crash occurred
because the software decided that the (correct) center of
balance value was bogus, and either 1) substituted a "maximum" value
appropriate for the 747-300, 2) used the last "in-range" value, or 3)
shut down and printed diagnostic data.  Does which of those three
occurred matter?  The cause of the crash was reusing the software
without checking that it met the new requirements.  Same here.

--

					Robert I. Eachus

with Standard_Disclaimer;
use  Standard_Disclaimer;
function Message (Text: in Clever_Ideas) return Better_Ideas is...





* Re: Ariane 5 - not an exception?
  1996-08-05  0:00       ` Steve O'Neill
  1996-08-06  0:00         ` Francis Lipski
@ 1996-08-06  0:00         ` Frank Manning
  1996-08-08  0:00           ` Steve O'Neill
  1996-08-09  0:00           ` JP Thornley
  1996-08-13  0:00         ` ++           robin
  2 siblings, 2 replies; 111+ messages in thread
From: Frank Manning @ 1996-08-06  0:00 UTC (permalink / raw)



In article <32065615.77C7@sanders.lockheed.com> Steve O'Neill
<smoneill@sanders.lockheed.com> writes:

>    [...]
> Just because I am programming in PL/I doesn't mean I am magically a
> better real-time developer.  
>    [...]

I agree totally.

> Certainly you and I would not have shut down the system but what about 
> the vast majority of developers without as much experience or who thought 
> that their job was to implement the requirements that they were given?
>    [...]

Over the years I periodically see Usenet debates about what the term
"software engineering" really means. It seems to me the Ariane problem
brings the question into sharp relief.

In a sci.aeronautics discussion about the Ariane problem, someone
brought up the interesting point that there are two trends on a collision
course -- the increasing automation of flight vehicles, and the
plummeting number of new vehicle designs (see article
<rddDvHqG9.JIs@netcom.com> in sci.aero).

In other words, there are few engineers with more than one new vehicle
design under their belts, much less with vehicles having such
unprecedented automation requirements.

So how do you prevent another Ariane 5 problem? You really need people
who understand both aerospace engineering (another somewhat nebulous
term) and software engineering. Merely using ++robin's PL/I magic
bullet won't do it.

Like Steve says, you can't fault developers who implement the
requirements they're given, especially if the developers have no
specialized training in aerospace engineering. The converse is also
true -- what do you do about aero engineers (or mechanical/electrical/
etc.) who have little training in -- forgive me -- software
engineering? How do you bridge the gulf?

-- Frank Manning
-- Chair, AIAA-Tucson Section





* Re: Ariane 5 - not an exception?
  1996-08-02  0:00       ` Pascal Martin @lone
  1996-08-03  0:00         ` Dr. Richard Botting
@ 1996-08-06  0:00         ` ++           robin
  1996-08-08  0:00           ` Darius Blasband
  1 sibling, 1 reply; 111+ messages in thread
From: ++           robin @ 1996-08-06  0:00 UTC (permalink / raw)



	pmartin@alsys.com (Pascal Martin @lone) writes:

	>Which PL/I compilers are available today on Sparc/Solaris, HP700/HPUX,
	>Windows/Intel and PowerPC/* ? I thought the language was dead since
	>the demise of IBM mainframes and the death of (the beloved) Multics.

   PL/I is available on at least the following systems:

     *	IBM PC and compatibles (80x86).

        *  IBM PL/I for OS/2: is available in 3 versions:
		Personal Edition	}
		Professional Edition	} Details at the bottom of this posting
		Toolkit.		}	
		---available from IBM

        *  Liant Open PL/I, for 80x86 & Pentium running UNIX SVR3 and SVR4
		---available from Liant Software Corporation
	   	959 Concord Street
	   	Framingham, MA 01701-4613
	   	Tel. (508) 872-8700  Fax (508) 626-2221
		(their PL/I generally is available on Unix-based systems)

	* Windows NT -- available from Liant

	* Windows 95/NT -- available 29 June 1996 from IBM

     * IBM AS/400
    		--- available from IBM

     *	IBM mainframes
		--- available from IBM

     *	HP 9000 HP-UX
		---available from Liant Software Corporation (address above)

     *	SPARC SunOS 4.x, Solaris 2.x
		---available from Liant

     *	IBM RS/6000 AIX
		---available from Liant Software Corporation;
		---also available from IBM as PL/I Set for AIX.

     *  Data General AViiON with DG-UX
		---available from Liant.

     *  Digital Equipment Corp. on Open VMS and Alpha AXP systems
		---available from Digital Equipment Corporation.
		UniPrise also has compilers for these systems.

     *  Stratus Computer, Inc.
		---available under VOS on all Stratus computers except AX/R-S.


_____________________________________________
P.S. IBM mainframes are still being installed.





* Re: Ariane 5 - not an exception?
  1996-08-06  0:00         ` Francis Lipski
@ 1996-08-07  0:00           ` Martin Tom Brown
  1996-08-09  0:00             ` Ken Garlington
  0 siblings, 1 reply; 111+ messages in thread
From: Martin Tom Brown @ 1996-08-07  0:00 UTC (permalink / raw)



In article <4u7fdm$e6m@morgan.vf.lmco.com>
           g1006@fs1.mar.lmco.com "Francis Lipski" writes:

>   If all conversions and other possible overflow conditions are protected,
> and then an overflow occurs, what action should be taken?  

The most obvious choice is drop back to a simple, but not necessarily
accurate primitive hardware backup system like levers and gyroscopes.
I have much more faith in the ability of mechanical and electronics
engineers to tolerance their designs for adverse conditions. When my
neck is on the line I like to see physical hardware interlocks in place.

> The system has just had a random hardware failure.  
> Continue to operate with known bad hardware?  

This decision is a hard one, but when the choice is between self-destruct 
now or flag the problem and press on and pray, I know which I'd choose.
It also depends to a large extent on the damage which could occur
if failure is delayed vs the cost to abort the mission.

> In the case of an overflow, set to max value, continue and 
> hope for the best?  

Not ideal, but neither was sending random diagnostic test data 
to the trajectory computation masquerading as IRS data packets.
With hindsight (a wonderful commodity, but in short supply)
we now know that ignoring the overflow would have been OK.

Regards,
-- 
Martin Brown  <martin@nezumi.demon.co.uk>     __                CIS: 71651,470
Scientific Software Consultancy             /^,,)__/





* Re: Ariane 5 - not an exception?
@ 1996-08-08  0:00 Marin David Condic, 407.796.8997, M/S 731-93
  1996-08-09  0:00 ` John McCabe
  0 siblings, 1 reply; 111+ messages in thread
From: Marin David Condic, 407.796.8997, M/S 731-93 @ 1996-08-08  0:00 UTC (permalink / raw)



Francis Lipski <g1006@FS1.MAR.LMCO.COM> writes (with deletions):
>> > "A PL/I programmer
>> > experienced with real time systems, would have CHALLENGED
>> > such a stupid requirement that the computer be shut down by the
>> > error-handler in the event of a fixed-point overflow.  He would
>> > have had it changed.
>
>   Not always possible.  If you are in the minority and are unsuccessful
>to argue others to your point, what do you do?
>
    That's not always the case. Sometimes, the issue is "Either we do
    the project with runtime checks suppressed or we don't do it at all
    because we don't have the CPU margin to make it work." Often what
    you do is turn off most or all of the runtime checks, then
    implement interrupt service routines to saturate math results on
    overflows, etc. and hope that will do the trick for any
    unanticipated errors.
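
    The "saturate on overflow" idea can be sketched in a few lines of
    Python. This is only an illustration under assumptions: the 16-bit
    range and the name saturate_to_int16 are made up here, and a real
    system would do this in an interrupt service routine or in hardware,
    not in a function call. The extra flag is there so the caller can
    treat a clamped value as suspect data.

```python
from typing import Tuple

INT16_MIN, INT16_MAX = -32768, 32767


def saturate_to_int16(x: float) -> Tuple[int, bool]:
    """Convert x to a signed 16-bit integer, clamping instead of failing.

    Returns (value, saturated). A NaN input raises ValueError, because
    there is no sensible value to clamp it to.
    """
    if x >= INT16_MAX + 1:   # too large, including +infinity
        return INT16_MAX, True
    if x <= INT16_MIN - 1:   # too small, including -infinity
        return INT16_MIN, True
    return int(x), False     # in range: truncate toward zero


if __name__ == "__main__":
    print(saturate_to_int16(123.9))    # (123, False)
    print(saturate_to_int16(40000.0))  # (32767, True)
```

    Whether a clamped value is safe to fly with is a system question,
    not something the conversion routine can decide.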

    If they were running at 80% utilization without runtime checks,
    including the checks might have left an unacceptable risk. If they
    had run with checks in place and were at 98% utilization and hit a
    "corner case" in the software which drove them over 100%, we'd be
    able to sit here now and criticize them for failing to remove the
    checks to leave a safety margin on utilization.

    There are always tradeoffs in engineering. You have to weigh risks
    and rewards. Risk: public humiliation, billions of $ lost,
    thousands of casualties. Reward: a certificate with your name on
    it in a plastic frame. The Ariane 5 engineers have no doubt
    learned this lesson.

    With respect to the earlier poster's comments about "experienced
    PL/I programmers" I'd have to say that smacks of language bigotry.
    It would be the same sort of thing as saying "experienced German
    speaking engineers wouldn't have made such a stupid mistake. It's
    because the engineers were speaking French that the rocket went
    down."

    MDC

Marin David Condic, Senior Computer Engineer    ATT:        407.796.8997
M/S 731-96                                      Technet:    796.8997
Pratt & Whitney, GESP                           Fax:        407.796.4669
P.O. Box 109600                                 Internet:   CONDICMA@PWFL.COM
West Palm Beach, FL 33410-9600                  Internet:   CONDIC@FLINET.COM
===============================================================================
    "Some people say the rainforests must be saved because the cure for
    cancer might be there. Why aren't these same people worried that
    the scientist who would have found that cure might be aborted?"

        --  John Switzer
===============================================================================





* Re: Ariane 5 - not an exception?
  1996-08-06  0:00         ` ++           robin
@ 1996-08-08  0:00           ` Darius Blasband
  1996-08-10  0:00             ` dwnoon
                               ` (3 more replies)
  0 siblings, 4 replies; 111+ messages in thread
From: Darius Blasband @ 1996-08-08  0:00 UTC (permalink / raw)



++ robin wrote:
> 
>         pmartin@alsys.com (Pascal Martin @lone) writes:
> 
>         >Which PL/I compilers are available today on Sparc/Solaris, HP700/HPUX,
>         >Windows/Intel and PowerPC/* ? I thought the language was dead since
>         >the demise of IBM mainframes and the death of (the beloved) Multics.
> 
>    PL/I is available on at least the following systems:
[ Quite an impressive list of platforms deleted here ]

But basically, how many of these compilers are used for other than
historical purposes ? Migrating and maintaining existing applications ?
How often do we see a project where virtually any tool can be used
and where PL/1 happens to be chosen out of a large number of possible
choices ? The fact that there are compilers available, the fact that billions
of PL/1 lines are in production today will not hide a basic, ugly truth:
PL/1 is one of the worst language designs ever. As far as I know (or rather,
as far as what I believe a good language design should be) only C++
comes close.

IMHO...

Darius





* Re: Ariane 5 - not an exception?
  1996-08-06  0:00         ` Frank Manning
@ 1996-08-08  0:00           ` Steve O'Neill
  1996-08-09  0:00             ` Pat Rogers
  1996-08-09  0:00           ` JP Thornley
  1 sibling, 1 reply; 111+ messages in thread
From: Steve O'Neill @ 1996-08-08  0:00 UTC (permalink / raw)



Frank Manning wrote:
> In other words, there are few engineers with more than one new vehicle
> design under their belts, much less with vehicles having such
> unprecedented automation requirements.

This is a new perspective on the problem, and a serious situation indeed.

> So how do you prevent another Ariane 5 problem? You really need people
> who understand both aerospace engineering (another somewhat nebulous
> term) and software engineering. Merely using ++robin's PL/I magic
> bullet won't do it.
> 
> Like Steve says, you can't fault developers who implement the
> requirements they're given, especially if the developers have no
> specialized training in aerospace engineering. The converse is also
> true -- what do you do about aero engineers (or mechanical/electrical/
> etc.) who have little training in -- forgive me -- software
> engineering? How do you bridge the gulf?

Easy, hire me, I've got a BS in Aero and about 15 years experience in the 
analysis and development of related real-time systems.

Sorry, I couldn't resist. ;)

But the fact remains that cross-training is critical for (especially) the 
system engineers and system designers of such complex systems.  Had 
someone with a 'big-picture' view of the Ariane system really spent the 
time to understand the implications of shutting down a critical sensor 
perhaps this tragic event would have been avoided.

-- 
Steve O'Neill                      | "No,no,no, don't tug on that!
Sanders, A Lockheed Martin Company |  You never know what it might
smoneill@sanders.lockheed.com      |  be attached to." 
(603) 885-8774  fax: (603) 885-4071|    Buckaroo Banzai





* Re: Ariane 5 - not an exception?
  1996-08-08  0:00 Ariane 5 - not an exception? Marin David Condic, 407.796.8997, M/S 731-93
@ 1996-08-09  0:00 ` John McCabe
  0 siblings, 0 replies; 111+ messages in thread
From: John McCabe @ 1996-08-09  0:00 UTC (permalink / raw)



>    If they were running at 80% utilization without runtime checks,
>    including the checks might have left an unacceptable risk. If they
>    had run with checks in place and were at 98% utilization and hit a
>    "corner case" in the software which drove them over 100%, we'd be
>    able to sit here now and criticize them for failing to remove the
>    checks to leave a safety margin on utilization.

True, but even with a maximum limit of 80% utilisation, it's going to
be very difficult to guarantee that you are never going to exceed
100%.

The program I am on now has a 70% loading requirement, but we have
~8us on a 10MHz 1750A processor to do some work. Now, as you will know
from your own experience of 1750A, 8us is not a lot of instructions!
There is no way we can satisfy the loading requirement (which would
allow us ~5.6us) on that particular case, in actual fact we use about
7us nominally. There is nothing more we can do about it though as we
have already coded that section in assembler, but we have analysed the
consequences of going over 100% and taken account of them in our
system design. 
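
The arithmetic in the paragraph above can be checked with a short sketch.
The figures (an 8us window, a 70% loading requirement, about 7us nominal)
are taken from the post; the function and variable names are illustrative
assumptions only.

```python
# Loading-margin arithmetic for one time-critical section.
# Figures come from the post; all names are illustrative assumptions.

def loading(used_us: float, available_us: float) -> float:
    """Fraction of the available time window that is actually used."""
    return used_us / available_us

AVAILABLE_US = 8.0     # time window available for the work
LOADING_LIMIT = 0.70   # 70% loading requirement
NOMINAL_US = 7.0       # approximate nominal execution time

allowed_us = LOADING_LIMIT * AVAILABLE_US            # time the requirement permits
nominal_loading = loading(NOMINAL_US, AVAILABLE_US)  # loading actually observed

print(f"allowed by requirement: {allowed_us:.1f} us")
print(f"nominal loading:        {nominal_loading:.1%}")
print(f"requirement met:        {nominal_loading <= LOADING_LIMIT}")
```

With these numbers the section runs at 7/8 of its window, well above the
70% target, which is the situation the post describes.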

The point I am trying to make here is that I believe that the success
of a mission should never be traded off against such an arbitrary
requirement as a loading margin.

Best Regards
John McCabe <john@assen.demon.co.uk>






* Re: Ariane 5 - not an exception?
  1996-08-06  0:00         ` Frank Manning
  1996-08-08  0:00           ` Steve O'Neill
@ 1996-08-09  0:00           ` JP Thornley
  1 sibling, 0 replies; 111+ messages in thread
From: JP Thornley @ 1996-08-09  0:00 UTC (permalink / raw)



frank@bigdog.engr.arizona.edu (Frank Manning) writes:
> So how do you prevent another Ariane 5 problem? You really need people
> who understand both aerospace engineering (another somewhat nebulous
> term) and software engineering. 
> 

I actually feel that there is a ray of hope here in the move towards 
executable prototypes.  To do these successfully requires a combination 
of systems and software expertise, and will *have* to be produced by 
combined teams - so that the dividing line between the two disciplines 
will become much less distinct.  (And if the systems engineers do carry 
on throwing the spec over the wall then the failures will come a lot 
quicker.)

Phil Thornley
-- 
------------------------------------------------------------------------
| JP Thornley    EMail jpt@diphi.demon.co.uk                           |
------------------------------------------------------------------------






* Re: Ariane 5 - not an exception?
  1996-08-08  0:00           ` Steve O'Neill
@ 1996-08-09  0:00             ` Pat Rogers
  0 siblings, 0 replies; 111+ messages in thread
From: Pat Rogers @ 1996-08-09  0:00 UTC (permalink / raw)



In <320A288E.725B@sanders.lockheed.com>, Steve O'Neill <smoneill@sanders.lockheed.com> writes:
>Frank Manning wrote:
[snip]
>> Like Steve says, you can't fault developers who implement the
>> requirements they're given, especially if the developers have no
>> specialized training in aerospace engineering. The converse is also
>> true -- what do you do about aero engineers (or mechanical/electrical/
>> etc.) who have little training in -- forgive me -- software
>> engineering? How do you bridge the gulf?
>
>Easy, hire me, I've got a BS in Aero and about 15 years experience in the 
>analysis and development of related real-time systems.
>
>Sorry, I couldn't resist. ;)


OK -- I'll hire you, but you'll have to move to Houston.... :)

pat
---------------
Patrick Rogers
progers@acm.org






* Re: Ariane 5 - not an exception?
  1996-08-07  0:00           ` Martin Tom Brown
@ 1996-08-09  0:00             ` Ken Garlington
  0 siblings, 0 replies; 111+ messages in thread
From: Ken Garlington @ 1996-08-09  0:00 UTC (permalink / raw)



Martin Tom Brown wrote:
> 
> The most obvious choice is drop back to a simple, but not necessarily
> accurate primitive hardware backup system like levers and gyroscopes.
> I have much more faith in the ability of mechanical and electronics
> engineers to tolerance their designs for adverse conditions. When my
> neck is on the line I like to see physical hardware interlocks in place.

I think "faith" is certainly the right word for this belief, particularly
given the complexity of modern ASICs. If there's a simple mechanical
backup or interlock available, it's certainly a worthwhile design
consideration. However, feedback systems don't always react well to suddenly
being locked into a fixed position!

> --
> Martin Brown  <martin@nezumi.demon.co.uk>     __                CIS: 71651,470
> Scientific Software Consultancy             /^,,)__/

-- 
LMTAS - "Our Brand Means Quality"





* Re: Ariane 5 - not an exception?
  1996-08-08  0:00           ` Darius Blasband
@ 1996-08-10  0:00             ` dwnoon
  1996-08-12  0:00               ` Thomas Kendelbacher
  1996-08-13  0:00             ` Roy Gardiner
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 111+ messages in thread
From: dwnoon @ 1996-08-10  0:00 UTC (permalink / raw)



In <3209A6E6.17D4@phidani.be>, Darius Blasband <darius@phidani.be> writes:
>PL/1 is one of the worst language designs ever. As far as I know (or rather,
>as far as what I believe a good language design should be) only C++
>comes close.

As far as lucid expression of algorithms, C++ isn't fit to lick PL/I's boots.

I suggest you try posting this message in a SmallTalk or Eiffel newsgroup
and see what their reaction is that C++ is a decent language. Even they
have their standards.

Regards

Dave
<Team PL/I>





* Re: Ariane 5 - not an exception?
  1996-08-10  0:00             ` dwnoon
@ 1996-08-12  0:00               ` Thomas Kendelbacher
  1996-08-13  0:00                 ` ++           robin
  0 siblings, 1 reply; 111+ messages in thread
From: Thomas Kendelbacher @ 1996-08-12  0:00 UTC (permalink / raw)



In article <4uic6d$1p76@news-s01.ny.us.ibm.net>, dwnoon@ibm.net writes:
>In <3209A6E6.17D4@phidani.be>, Darius Blasband <darius@phidani.be> writes:
>>PL/1 is one of the worst language designs ever. As far as I know (or rather,
>>as far as what I believe a good language design should be) only C++
>>comes close.
>
>As far as lucid expression of algorithms, C++ isn't fit to lick PL/I's boots.
>
>I suggest you try posting this message in a SmallTalk or Eiffel newsgroup
>and see what their reaction is that C++ is a decent language. Even they
>have their standards.
>
>Regards
>
>Dave
><Team PL/I>

I'm not a native English speaker, but I'm pretty sure the original statement
meant to express that only C++ comes close *to PL/I* in the category of *worst*
language designs -- but certainly not close to *good language design*!

Maybe Darius as the original poster should clarify this.

-- 
Thomas Kendelbacher   |   email : Thomas.Kendelbacher@erno.de
DASA RI / Abt. RIT14  |   voice : +49 421 539 5492 (working hours)
Postfach 28 61 56     |      or : +49 421 57 04 37 (any other time)
D-28361 Bremen        |     fax : +49 421 539 4529 (any time)
Germany







* Re: Ariane 5 - not an exception?
@ 1996-08-13  0:00 Marin David Condic, 407.796.8997, M/S 731-93
  1996-08-15  0:00 ` John McCabe
  0 siblings, 1 reply; 111+ messages in thread
From: Marin David Condic, 407.796.8997, M/S 731-93 @ 1996-08-13  0:00 UTC (permalink / raw)



John McCabe <john@ASSEN.DEMON.CO.UK> writes:
>The point I am trying to make here is that I believe that the success
>of a mission should never be traded off against such an arbitrary
>requirement as a loading margin.
>
    Well, I have to agree that the important thing is mission success,
    not loading margin. But the general reason you establish some sort
    of "goal" for margin is to ensure mission success. When going to
    Zero Margin (or worse) means dropping the rocket in the drink, not
    leaving yourself some room for "corner cases" which you never
    tested could be construed as imprudent - just as turning off
    checks could be considered imprudent. Had the "unanticipated case"
    never occurred, the software developers would have been "heroes"
    and would have been given a certificate with their name on it in a
    cheap plastic frame. They took a gamble and lost, so now they get
    to be the scapegoats for us to kick around for a while.

    I'll admit that I also dislike setting some absolute number for
    CPU margin and sticking to it blindly. Eroding margin simply
    erodes the level of confidence and you can afford to do that
    sometimes. Especially if you're willing to do the work to
    demonstrate that you really have found the worst-case behavior or
    that the system is sufficiently deterministic that you can run
    with less margin and maintain sufficient confidence. (Define
    "sufficient confidence...")

    Lots of people have tried to make the case that you should never
    turn off the runtime checks that Ada provides because they're
    critical to the safety of the system you are developing. I'd like
    to agree and certainly Ariane 5 is an example of where this might
    have prevented disaster. But sometimes us poor saps who have
    nothing to work with but a Mil-Std-1750a are stuck making
    tradeoffs between safety checks and building a system that will
    work at all.

    Anybody want to make me a rad-hard, space tested, 200mips
    processor that I can buy in small lots at $40 a piece and has a
    full suite of development tools (including Ada95 compiler)
    available for it? (Sober up, Marin! ;-)

    MDC

Marin David Condic, Senior Computer Engineer    ATT:        407.796.8997
M/S 731-96                                      Technet:    796.8997
Pratt & Whitney, GESP                           Fax:        407.796.4669
P.O. Box 109600                                 Internet:   CONDICMA@PWFL.COM
West Palm Beach, FL 33410-9600                  Internet:   CONDIC@FLINET.COM
===============================================================================
    "Being in a minority, even a minority of one, did not make you
    mad. There was truth, and there was untruth, and if you clung to
    the truth even against the whole world, you were not mad. 'Sanity
    is not statistical.'"

        --  G. Orwell, "1984"
===============================================================================





* Re: Ariane 5 - not an exception?
@ 1996-08-13  0:00 Marin David Condic, 407.796.8997, M/S 731-93
  1996-08-15  0:00 ` John McCabe
  0 siblings, 1 reply; 111+ messages in thread
From: Marin David Condic, 407.796.8997, M/S 731-93 @ 1996-08-13  0:00 UTC (permalink / raw)



++ robin <rav@GOANNA.CS.RMIT.EDU.AU> writes:
>---As I stated, a PL/I programmer experienced in real-time
>programming, would not have made this stupid mistake.
>

    This still smacks of language bigotry. Why is it that only an
    experienced PL/I programmer would not make this "mistake"? I've
    personally seen *lots* of mistakes made by many "experienced"
    programmers in just about every language there is - including
    PL/I.

    Just remember that Ada has built-in runtime checks on conversions
    and the ability to write interrupt service routines as well. (And
    we "experienced Ada programmers" know how to use them, too!) The
    Monday morning quarterbacks with 20/20 hindsight binoculars can
    easily see that the best thing would have been to leave in the
    checks or write an ISR which saturated the math rather than shut
    the unit down. But you don't need to be fluent in PL/I to see
    that.
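
The two alternatives named above (leaving the check in, or saturating the
math) can be sketched as follows. This is a minimal illustration in Python,
not the flight code: the function names are hypothetical, and the signed
16-bit target range is chosen for illustration.

```python
import math

# Assumed target range for illustration: a signed 16-bit integer.
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16_checked(x: float) -> int:
    """Checked conversion: reject anything outside the target range.

    Values beyond the limits, even fractionally, raise OverflowError
    instead of being silently converted.
    """
    if math.isnan(x) or not (INT16_MIN <= x <= INT16_MAX):
        raise OverflowError(f"{x!r} does not fit in a signed 16-bit integer")
    return int(x)  # truncates toward zero

def to_int16_saturating(x: float) -> int:
    """Saturating conversion: clamp to the nearest limit and carry on."""
    if math.isnan(x):
        raise ValueError("cannot saturate NaN")
    clamped = max(float(INT16_MIN), min(float(INT16_MAX), x))
    return int(clamped)

print(to_int16_checked(123.7))      # in range: plain truncation
print(to_int16_saturating(1.0e9))   # out of range: pinned to INT16_MAX
print(to_int16_saturating(-1.0e9))  # out of range: pinned to INT16_MIN
```

Whether a saturated value is acceptable downstream is a system-level
question; the sketch only shows that both options cost a handful of
comparisons.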

    The language the system was programmed in or the language spoken
    by the developers has nothing to do with the error that occurred.
    It occurred because there was a conscious decision on someone's
    part to remove the safety net and to handle all exceptions by
    shutting down the channel. The designers no doubt made this
    decision for engineering reasons that are more complex than are
    outlined in the failure report and certainly had little or nothing
    at all to do with the language of implementation. And sitting in
    the back-seat after the crash telling the driver "If *I* had been
    driving, I'd never have crashed..." is condescending as well as
    being completely unprovable.

    MDC


Marin David Condic, Senior Computer Engineer    ATT:        407.796.8997
M/S 731-96                                      Technet:    796.8997
Pratt & Whitney, GESP                           Fax:        407.796.4669
P.O. Box 109600                                 Internet:   CONDICMA@PWFL.COM
West Palm Beach, FL 33410-9600                  Internet:   CONDIC@FLINET.COM
===============================================================================
    "It may be true that the law cannot make a man love me. But it can
    keep him from lynching me, and I think that's pretty important."

            --  Rev. Martin Luther King, Jr
===============================================================================





* Re: Ariane 5 - not an exception?
  1996-08-08  0:00           ` Darius Blasband
  1996-08-10  0:00             ` dwnoon
  1996-08-13  0:00             ` Roy Gardiner
@ 1996-08-13  0:00             ` ++           robin
  1996-08-15  0:00             ` Richard Riehle
  3 siblings, 0 replies; 111+ messages in thread
From: ++           robin @ 1996-08-13  0:00 UTC (permalink / raw)



	Darius Blasband <darius@phidani.be> writes:

	>++ robin wrote:
	>>         pmartin@alsys.com (Pascal Martin @lone) writes:
	>>         >Which PL/I compilers are available today on Sparc/Solaris, HP700/HPUX,
	>>         >Windows/Intel and PowerPC/* ? I thought the language was dead since
	>>         >the demise of IBM mainframes and the death of (the beloved) Multics.
	>>    PL/I is available on at least the following systems:
	>[ Quite an impressive list of platforms deleted here ]

	>But basically, how many of these compilers are used for other than
	>historical purposes ?

---Probably all of them.

   If you have been reading job postings in comp.lang.pl1 and other
(job) newsgroups, you would have seen that new projects are being
developed in PL/I.

	>Migrating and maintaining existing applications ?

---Certainly.

	>How often do we see a project where virtually any tool can be used
	>and where PL/1 happens to be chosen out of a large number of possible
	>choices ?

---The real choices are few.  PL/I is superior to most languages
for general programming.

   You see, PL/I was a significant advance on other languages
when it was introduced in 1966.  Features such as interrupt
handling (great for real-time processing) didn't then exist
in most other HL languages.

   Those features have kept PL/I at the forefront today.

   And the PL/I language has been extended recently with the
new implementations on the OS/2, Windows 95/NT and AIX platforms.

	>Darius





* Re: Ariane 5 - not an exception?
  1996-08-05  0:00       ` Steve O'Neill
  1996-08-06  0:00         ` Francis Lipski
  1996-08-06  0:00         ` Frank Manning
@ 1996-08-13  0:00         ` ++           robin
  1996-08-13  0:00           ` Steve O'Neill
  2 siblings, 1 reply; 111+ messages in thread
From: ++           robin @ 1996-08-13  0:00 UTC (permalink / raw)



	g1006@fs1.mar.lmco.com (Francis Lipski) writes:

	>In article <32065615.77C7@sanders.lockheed.com>, you write:
	>> ++ robin wrote:
	>> >         Steve O'Neill <smoneill@sanders.lockheed.com> writes:
	>> >         >I disagree completely!  The language was not the
	>> >         >problem the design decisions in how the language
	>> >         >was used were.
	>> > 
	>> > ---The choice of language is indeed very relevant.
	>> > What I wrote in an earlier posting on this topic is highly
	>> > apt:
	>> > 
	>> > "A PL/I programmer
	>> > experienced with real time systems, would have CHALLENGED
	>> > such a stupid requirement that the computer be shut down by the
	>> > error-handler in the event of a fixed-point overflow.  He would
	>> > have had it changed.

	>   Not always possible.  If you are in the minority and are unsuccessful
	>to argue others to your point, what do you do?  

---Don't be absurd.  The checks WERE included in all but 3
of the type conversions in the vicinity of the conversion
that blew up.

	>  As a previous message in this thread had stated, what
	>should someone do?  Say to hell with the requirements,
	>I'm going to code what I think is correct.

---The requirements were that any kind of interrupt was
going to be handled by the interrupt handler (which would
then shut down the computer).

   A *real* real-time PL/I programmer would have included
a test to make certain that the interrupt could not occur.
That was NOT going against the specifications.

   But, as I wrote in a previous post, a belt-and-braces
approach should have been taken, viz, to include an
error handler for fixed-point overflow, as an interrupt
was to be taken as SUDDEN DEATH for the project.

   This is where a PL/I programmer would have had the
specification changed.

	>> > "I'd go further to say that no experienced PL/I programmer
	>> > would have shut down the system as a result of a fixed-point
	>> > overflow.

	>> Substitute Ada (or C or FORTRAN or Assembly) for
	>> PL/I here and you see my point.

---Neither C nor Fortran have error-handling.
Ada *was* used, and look what happened.
Hence the suggestion that PL/I expertise on the
project would have been an advantage.  You see,
real-time programming in PL/I has been part of the scene
since 1966!

	>> It's not the language that makes the developer challenge the 
	>> ridiculous requirement to shut down it is the developer "experienced with 
	>> real-time systems".  Just because I am programming in PL/I doesn't mean I 
	>> am magically a better real-time developer.  As a real-time designer 
	>> concerned with the system-wide aspects of completely shutting down any 
	>> sensor I would question this approach regardless of the language in use. 
	>> This has nothing to do with the fact that much of my experience is with 
	>> Ada.

	>> The (flawed) reasoning for why certain conversions were not protected was 
	>> also covered in the report.  Invalid assumptions were made

---Yes; it was assumed that the value would not overflow
but it did!  They have forgotten Murphy's Law:
"If anything can go wrong, it will".  And Robert's
Law: "Even if it *can't* go wrong, it will".

	>> Certainly you and I would not have shut down the system but what about 
	>> the vast majority of developers without as much experience or who thought 
	>> that their job was to implement the requirements that they were given?

---They could have implemented the "requirements"
WITHOUT raising a fixed-point interrupt,
just by checking for overflow!

	>  The report states that the rationale was based on the "culture within the
	>Ariane programme of only addressing random hardware failures.  From this
	>point of view exception - or error - handling mechanisms are designed for a
	>random hardware failure which can quite rationally be handled by a backup
	>system"

	>  If all conversions and other possible overflow
	>conditions are protected,
	>and then an overflow occurs, what action should be taken?

---Action should be taken to deal with a fixed-point overflow!
Something was overlooked.  It needed to be dealt with.  That
it was not is a fundamental error!  That's why error-handling
is provided!  To provide a margin of safety.

	> The system has
	>just had a random hardware failure.  Continue to operate with known bad 
	>hardware?  In the case of an overflow, set to max value, continue and
	>hope for the best?  

---Good idea, already suggested in the report.  But the
report also suggested that the design needed to
take into account programmer error.

	>  While clearly the design, in this case, did not protect itself sufficiently,
	>and compounded errors by not handling the case of a simultaneous failure of
	>both processors, what action should be taken on an overflow if not to shut
	>down.  With flight controls or inertial systems, partitioning into tasks and
	>then restarting the offending task is not an option.  It would take entirely
	>too long to restart the task to be able to effectively recover.

	>  Regarding the spare requirements.  The answer as to why to have spare time
	>is to ensure that all hard deadlines are met and to allow growth for future
	>versions of SW.   Allowing room for growth is necessary in development programs
	>however, the requirement is usually never relaxed as more functionality is
	>added.  That is another story.  However, it is necessary to ensure sufficient
	>time is available to complete all the processing within the allotted time.
	>The execution time of the software is at best a statistical problem, at least
	>the hardware times can be statistical.  If the SW is always measured as a worst
	>case time, and all these are added together, one cannot allow this time
	>to meet or exceed the allowable time, given the statistical nature of the HW.
	>So how much spare time should be allotted?  If 20% is unrealistic, what
	>number should be used, 10%, 1%, 0.001%?  

	>> > 
	>> > "Furthermore, he would have included a check that the value
	>> > did not go out of range;"
	>> > 
	>> > ---But all it needed was a check that the value was in range.
	>> > Such checks had been included on other similar conversions in
	>> > the vicinity!

	>> Yes, and there was mention in the report that 'they' thought that this 
	>> would violate that precious spare requirement.

---That's a red herring.

        > So they set about picking
        >> and choosing which conversions to protect.
   
---This doesn't appear specifically in the report as regards
this conversion and the 2 others in the vicinity.  There's
the implication that these conversions were overlooked.
In any case, the test would have introduced a trivial
number of additional instructions.

        >>  I find it extremely hard to
        >> believe that the (small) handful of instructions to do a range check
        >> would have been too much!

---Agreed.

        >>  And, in hindsight, well worth it.

---Agreed again.

        >> The issue of the OBC interpreting the 'essentially diagnostic data' as
        >> valid sensor data really makes me wonder.  In a system with a reasonable
        >> interface between the two devices this should *never* happen.  I am
        >> surprised that this misinterpretation didn't cause a similar overflow in
        >> the OBC and resulting shutdown! :(

---Yes.

        > I was also amazed by the poor design of the interface that didn't detect
        >this problem.  Probably given enough time, some form of error would
        >have occurred resulting in the OBC shutting down.

---There were a number of inadequacies revealed in the design.

        >> I think that we agree in our assessment of the situation and the fact
        >> that these problems could have been avoided with a better overall system
        >> design and more extensive testing.  Essentially the same conclusions that
        >> the review board came to.  My only disagreement is with your _opinion_
        >> that the simple choice of a different language would have saved the day.

---As I stated, a PL/I programmer experienced in real-time
programming, would not have made this stupid mistake.

        >>  And with this point I will continue to disagree.

---You do not appear to have grounds for this opinion.

        >> Steve O'Neill                      | "No,no,no, don't tug on that!
        >> Sanders, A Lockheed Martin Company |  You never know what it might
        >> smoneill@sanders.lockheed.com      |  be attached to."
        >> (603) 885-8774  fax: (603) 885-4071|    Buckaroo Banzai





* Re: Ariane 5 - not an exception?
  1996-08-04  0:00     ` Richard Riehle
                         ` (2 preceding siblings ...)
  1996-08-05  0:00       ` Nigel Tzeng
@ 1996-08-13  0:00       ` ++           robin
  1996-08-13  0:00         ` Ken Garlington
                           ` (3 more replies)
  3 siblings, 4 replies; 111+ messages in thread
From: ++           robin @ 1996-08-13  0:00 UTC (permalink / raw)



	Richard Riehle <rriehle@nunic.nu.edu> writes:

	>On 29 Jul 1996, Bill Angel wrote:

	>> 	I am under the impression that for the US manned spaceflight
	>> program (to get to the moon) ,an on-board computer that was serving as a
	>> backup to the primary computer would have been performing its computations
	>> using completely different software than the primary computer. By
	>> utilizing this methodology, the same software "glitch" would not halt both
	>> systems simultaneously.  Perhaps a group of software developers could be
	>> tasked with producing a version of the on-board software for Ariane in a
	>> different computer language than that used by the primary processor. The
	>> two processors, running simultaneously, would serve to check each other's
	>> results with greater independence than they apparently do now.

	>  I have been following this thread with interest.  I am no expert on
	>  Ariane, but lack of expertise has not been an obstacle to others who have
	>  posted on this topic.

	>  1) redundant processors

	>     The idea of using different processor architectures is a good
	>     one and often employed for other systems such as the Boeing 777.
	>     However, if I recall correctly, Ariane has a "rad-hard" requirement
	>     (right or wrong) and uses Mil-Std 1750A processors to satisfy that
	>     requirement. This would not permit using multiple processors of
	>     differing architectures.

	>  2) PL/I

	>     a) There is no PL/I compiler for the 1750A

---Not an obstacle.  How was an Ada compiler written for it?

	>     b) Ada is far more suitable for safety-sensitive software than PL/I

---Nonsense.  PL/I has a long (30 years) record in
excellent real-time facilities, and with people with
experience in error-recovery and fail-soft in routine
commercial applications as well as real-time programming.

	>     c) This failure was not a language issue.

---Isn't it?  One of the arguments put forward was that
an Ada condition couldn't be raised and leave a trace,
and that it would be argued that there was no guarantee
whether a piece of code was executed.

   In PL/I, a SIGNAL statement (which can be used for
program checkout) leaves a printed record that it was
executed.  It gives a message that the condition was
raised, and comes with line numbers, etc.  There is
absolutely no doubt that the statement executed!

	> It is a management issue.
	>        Specifically, it is a failure of engineering management.

---There are lots of things for which one can blame 
management, but the lack of a check for overflow has
to come down to the programmer.

	>     d) Given the incorrect specifications against which the program was
	>        designed, the same failure would have occurred in PL/I or any
	>        other language.

---No it wouldn't.  The lack of a test for overflow was
the problem.  But even supposing for a moment that
all conversions were checked, then
an interrupt handler could be included for fixed-point
overflow.  This would have trapped any unchecked
overflow.  A R/T (and even non R/T) PL/I programmer
routinely puts in error control.
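
The "safety net" idea described above, a handler that traps an overflow
which slipped past the explicit checks, can be sketched as follows. This is
an illustration in Python rather than PL/I or Ada; the function names, the
16-bit range, and the choice to keep the last good value are all
assumptions, not anything taken from the Ariane software.

```python
# Assumed target range for illustration: a signed 16-bit integer.
INT16_MIN, INT16_MAX = -32768, 32767

def narrow_to_int16(x: float) -> int:
    """Conversion that raises OverflowError when the value does not fit."""
    n = int(x)  # int() itself raises OverflowError for infinities
    if n < INT16_MIN or n > INT16_MAX:
        raise OverflowError(f"{x!r} does not fit in a signed 16-bit integer")
    return n

def convert_with_handler(x: float, last_good: int) -> tuple[int, bool]:
    """Return (value, ok).

    On overflow the handler keeps the last good value and flags the
    result as suspect instead of shutting anything down.
    """
    try:
        return narrow_to_int16(x), True
    except OverflowError:
        return last_good, False

print(convert_with_handler(100.0, last_good=0))   # value fits, flagged ok
print(convert_with_handler(1.0e6, last_good=42))  # overflow trapped, flagged suspect
```

Whether holding the last good value is safe depends entirely on what
consumes it; the point of the sketch is only that the handler decides, not
a default shutdown.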

	>  3) Turning off the Computer

	>     Not always an incorrect decision in embedded computing. This time
	>     it clearly was.

	>  4) Software Reuse

	>     If one intends to "reuse" software, such as Ariane 4xx software in
	>     Ariane 5xxx, in a significantly different architecture, there is some
	>     virtue in extensive testing.

---In this case, with simulated inputs, and with SIGNAL
statements to check out what happens when an interrupt
occurs.  If this had been done (routine in PL/I), the
effect of an unchecked conversion would have been observed.
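
Testing with simulated inputs, as suggested above, can be sketched as a
small harness that runs a candidate input profile through the range check
and reports where it would overflow. The profiles below are invented
numbers for illustration only; they are not Ariane 4 or Ariane 5 data, and
all names are hypothetical.

```python
# Assumed target range for illustration: a signed 16-bit integer.
INT16_MIN, INT16_MAX = -32768, 32767

def fits_int16(x: float) -> bool:
    """True when the truncated value fits the assumed 16-bit target."""
    return INT16_MIN <= int(x) <= INT16_MAX

def find_overflows(samples):
    """Return (index, value) for every simulated sample that would overflow."""
    return [(i, x) for i, x in enumerate(samples) if not fits_int16(x)]

# Invented profiles: the "new" one grows five times faster than the "old".
old_profile = [1000.0 * t for t in range(30)]
new_profile = [5000.0 * t for t in range(30)]

print("old profile overflows:", find_overflows(old_profile))
print("new profile first overflow:", find_overflows(new_profile)[0])
```

Code that passes on the old profile fails on the new one without a single
line changing, which is why reused software needs to be exercised against
the new environment's inputs.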

	>  5) Unchecked Conversion

	>     Ada practitioners have been preaching for years that this should not
	>     be done without substantial examination and testing. One more example
	>     of why unchecked_conversion is usually not a good idea. Sometimes it
	>     is unavoidable, I know.

	>  7) Ada

	>     This is still the best language for doing this kind of system.

---PL/I would be clearly better, as it meets the requirements
for audit trails in program and system checkout (in addition
to the other facilities that it offers).

	> But
	>     stupid management is something no programming language can change.
	>     Given other engineering constraints on this project, Ada is really
	>     the only reasonable language to choose.

---Scarcely convincing, in view of the failure.

	>   Richard Riehle





* Re: Ariane 5 - not an exception?
  1996-08-08  0:00           ` Darius Blasband
  1996-08-10  0:00             ` dwnoon
@ 1996-08-13  0:00             ` Roy Gardiner
  1996-08-13  0:00               ` Ken Garlington
  1996-08-13  0:00               ` Lance Kibblewhite
  1996-08-13  0:00             ` ++           robin
  1996-08-15  0:00             ` Richard Riehle
  3 siblings, 2 replies; 111+ messages in thread
From: Roy Gardiner @ 1996-08-13  0:00 UTC (permalink / raw)



Darius Blasband <darius@phidani.be> wrote:

>PL/1 is one of the worst language design ever. As far as I know (or rather,
>as far as what I believe a good language design should be) only C++
>comes close.
>
>IMHO...
>
>Darius

Exactly the opposite of my view, and I am sure I am as unlikely to be 
convinced of C++ as Darius is of PLI.

So let me revive a question aired on this group once before:

Who out there became expert in C first, and now prefers PLI? and vice 
versa?

All those persons who still prefer their 1st are ineligible to vote (I'm 
out, PLI first)

Regards, Roy Gardiner






* Re: Ariane 5 - not an exception?
  1996-08-13  0:00       ` ++           robin
  1996-08-13  0:00         ` Ken Garlington
@ 1996-08-13  0:00         ` Darren C Davenport
  1996-08-14  0:00         ` John McCabe
  1996-08-20  0:00         ` Richard Riehle
  3 siblings, 0 replies; 111+ messages in thread
From: Darren C Davenport @ 1996-08-13  0:00 UTC (permalink / raw)



++ robin wrote:
> 
> overflow.  A R/T (and even non R/T) PL/I programmer
> routinely puts in error control.
> 
> occurs.  If this had been done (routine in PL/I), the
> effect of an unchecked conversion would have been observed.

I think that it is quite evident that the world would be a better
place if we all just convert to PL/I because we will be more
intelligent and never make silly mistakes.  As a side effect, world
peace will become a reality and hunger will be eradicated.

Darren





* Re: Ariane 5 - not an exception?
  1996-08-13  0:00         ` ++           robin
@ 1996-08-13  0:00           ` Steve O'Neill
  0 siblings, 0 replies; 111+ messages in thread
From: Steve O'Neill @ 1996-08-13  0:00 UTC (permalink / raw)



++ robin wrote:
I previously said:
>         > So they set about picking
>         >> and choosing which conversions to protect.
> 

to which ++ robin replied:
> ---This doesn't appear specifically in the report as regards
> this conversion and the 2 others in the vicinity.  There's
> the implication that these conversions were overlooked.
> In any case, the test would have introduced a trivial
> number of additional instructions.

Well, here's what the report said:
"It has been stated to the Board that not all of the conversions 
were protected because a maximum workload target of 80% had been set for 
the SRI computer.  To determine the vulnerability of unprotected code, 
an analysis was performed on every operation which could give rise to an 
exception, including an Operand Error.  In particular, the conversion of 
floating point values to integers was analysed and operations involving 
seven variables were at risk of leading to an Operand Error.  This led 
to protection being added to four of the variables, evidence of which 
appears in the Ada code.  However, three of the variables were left 
unprotected.  No reference to justification of this decision was found 
directly in the source code.  Given the large amount of documentation 
associated with any industrial application, the assumption was 
essentially obscured, though not deliberately, from any external review.

The reason for the three remaining variables, including the one denoting 
horizontal bias, being unprotected was that further reasoning indicated 
that they were either physically limited or that there was a large margin 
of safety, a reasoning which in the case of the variable BH turned out to 
be faulty.  It is important to note that the decision to protect certain 
variables but not others was taken jointly by project partners at several 
contractual levels."

So, what this tells us is that they did what they thought was a thorough 
analysis and consciously made the decision to leave 3 variables 
unprotected.  And this was not a unilateral decision.  I'm not 
saying that this whole approach made sense.  What I am saying is that 
this scenario is independent of the language used.  

Yes, in hindsight, it would have been much easier to add a couple of 
lines of code, even if it did cost 0.01% of that margin.
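
A minimal sketch of what "protecting" a conversion can mean, in Python
rather than Ada.  It assumes a 16-bit signed target and uses saturation as
the protective action; the report does not say how the four protected
variables were actually handled, so OperandError and both function names
are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Hypothetical stand-in for the out-of-range conversion fault."""


def to_int16_unprotected(value: float) -> int:
    """Convert by truncation; fail loudly if the result does not fit in 16 bits."""
    result = int(value)  # truncates toward zero; assumes a finite input
    if not INT16_MIN <= result <= INT16_MAX:
        raise OperandError(f"{value!r} does not fit in a 16-bit signed integer")
    return result


def to_int16_protected(value: float) -> int:
    """Convert by truncation, saturating at the 16-bit limits instead of failing."""
    clamped = max(float(INT16_MIN), min(float(INT16_MAX), value))
    return int(clamped)
```

The protected version costs one comparison pair per call, which is the
kind of small overhead being weighed against the 80% workload target.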

and ++robin continues to profess:
> 
> ---As I stated, a PL/I programmer experienced in real-time
> programming, would not have made this stupid mistake.
> 

And with this point I will continue to disagree. I guess that I need to 
learn PL/I since it obviously automatically endows people with god-like 
powers of reasoning that make it impossible for them to build flawed 
systems.  Wow, I wonder if NASA and the FAA know this?! ;)

++robin> ---You do not appear to have grounds for this opinion.

Well, everyone has grounds for their opinions - it's called belief and 
it's mostly based on their experiences.  My experiences obviously differ 
from yours therefore we have different opinions.  Probably neither mine 
nor yours is entirely right or entirely wrong.

And with that I think I'll stop my participation in this thread.

-- 
Steve O'Neill                      | "No,no,no, don't tug on that!
Sanders, A Lockheed Martin Company |  You never know what it might
smoneill@sanders.lockheed.com      |  be attached to." 
(603) 885-8774  fax: (603) 885-4071|    Buckaroo Banzai





* Re: Ariane 5 - not an exception?
  1996-08-13  0:00       ` ++           robin
@ 1996-08-13  0:00         ` Ken Garlington
  1996-08-13  0:00           ` Kirk Bradley
  1996-08-22  0:00           ` ++           robin
  1996-08-13  0:00         ` Darren C Davenport
                           ` (2 subsequent siblings)
  3 siblings, 2 replies; 111+ messages in thread
From: Ken Garlington @ 1996-08-13  0:00 UTC (permalink / raw)



++ robin wrote:
> 
>         >  2) PL/I
> 
>         >     a) There is no PL/I compiler for the 1750A
> 
> ---Not an obstacle.  How was an Ada compiler written for it?

I believe an existing one was purchased. I don't believe an Ada
compiler was written specifically for the Ariane SRI, as apparently
you are suggesting be done for PL/I. How would your argument about
mature PL/I compilers stand up in the face of a requirement to 
develop a brand-new compiler for a new target?

> 
>         >     b) Ada is far more suitable for safety-sensitive software than Pl/I
> 
> ---Nonsense.  PL/I has a long (30 years) record in
> excellent real-time facilities, and with people with
> experience in error-recovery and fail-soft in routine
> commercial applications as well as real-time programming.

It must be the lag in these newsgroups. I've posted more than one
request in comp.lang.pl1 for some examples of PL/I's use in safety-critical
real-time flight software, and nary a taker yet. I look forward to
some examples...

Perhaps those people are too busy building such software to read
the Internet?

>         >     c) This failure was not a language issue.
> 
> ---Isn't it?  One of the arguments put forward was that
> an Ada condition couldn't be raised and leave a trace,
> and that it would be argued that there was no guarantee
> whether a piece of code was executed.

That argument could be put forth, but of course it would be false.
We routinely trace Ada 83 exceptions in our debugging environments.
Ada 95 also adds a standardized capability to annotate exceptions with
user-defined information.
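
A rough illustration of the idea of an exception that carries user-defined
information and leaves a trace, written in Python as a stand-in for Ada.
The names ConversionOverflow, convert and run are hypothetical, and the
variable name and value are made up for the example.

```python
import traceback


class ConversionOverflow(Exception):
    """Hypothetical application-defined exception."""


def convert(name, value):
    """Convert to a 16-bit signed integer, annotating any failure."""
    result = int(value)  # truncates toward zero; assumes a finite input
    if not -32768 <= result <= 32767:
        # Attach user-defined context to the exception being raised.
        raise ConversionOverflow(f"{name}={value!r} exceeds the 16-bit range")
    return result


def run():
    """Return the exception message and the recorded trace as strings."""
    try:
        convert("horizontal_bias", 40000.0)
    except ConversionOverflow as exc:
        return str(exc), traceback.format_exc()
    return "", ""
```

The handler receives both the annotation and the call chain, so the record
can be sent wherever the system can store it, with no printer involved.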

>    In PL/I, a SIGNAL statement (which can be used for
> program checkout) leaves a printed record that it was
> executed.  It gives a message that the condition was
> raised, and comes with line numbers, etc.  There is
> absolutely no doubt that the statement did not execute!

There is no SIGNAL statement in PL/I. In fact, PL/I has no
exception handling capabilities. (I figure, since ++robin can
continue to ignore the capabilities of Ada, why shouldn't we
ignore his claims as to the capabilities of PL/I)?

As a side issue: Could someone post the printer port address for the
Ariane SRI? Most of our IRSs don't have a printer attachment.
Must be something those experienced PL/I programmers demand be
added to the hardware... :)

> 
>         > But
>         >     stupid management is something no programming language can change.
>         >     Given other engineering constraints on this project, Ada is really
>         >     the only reasonable language to choose.
> 
> ---Scarcely convincing, in view of the failure.

Again, I would be more convinced of _your_ arguments if you would post some
of the flight software items you've developed in PL/I. For some reason,
this request keeps getting lost in the newsgroup. Until then, I think I'll
just enjoy the speculation...

-- 
LMTAS - "Our Brand Means Quality"





* Re: Ariane 5 - not an exception?
  1996-08-13  0:00             ` Roy Gardiner
@ 1996-08-13  0:00               ` Ken Garlington
  1996-08-13  0:00               ` Lance Kibblewhite
  1 sibling, 0 replies; 111+ messages in thread
From: Ken Garlington @ 1996-08-13  0:00 UTC (permalink / raw)



Roy Gardiner wrote:
> 
> Who out there became expert in C first, and now prefers PLI? and vice
> versa?

I used PL/I prior to Ada, and I prefer Ada to PL/I for safety-critical
real-time flight applications.

-- 
LMTAS - "Our Brand Means Quality"





* Re: Ariane 5 - not an exception?
  1996-08-13  0:00             ` Roy Gardiner
  1996-08-13  0:00               ` Ken Garlington
@ 1996-08-13  0:00               ` Lance Kibblewhite
  1 sibling, 0 replies; 111+ messages in thread
From: Lance Kibblewhite @ 1996-08-13  0:00 UTC (permalink / raw)



Roy Gardiner <gardinerr@dcslambert.agw.bt.co.uk> wrote:
>Darius Blasband <darius@phidani.be> wrote:
>
>>PL/1 is one of the worst language designs ever. As far as I know (or rather,
>>as far as what I believe a good language design should be) only C++
>>comes close.
>>
>>IMHO...
>>
>>Darius
>
>Exactly the opposite of my view, and I am sure I am as unlikely to be 
>convinced of C++ as Darius is of PLI.
>
>So let me revive a question aired on this group once before:
>
>Who out there became expert in C first, and now prefers PLI? and vice 
>versa?
>
>All those persons who still prefer their 1st are ineligible to vote (I'm 
>out, PLI first)

How about this, in order of learning.

   Language      Preference.

1. FORTRAN       FORTRAN (what else?)
2. COBOL         FORTRAN
3. PASCAL        PASCAL
4. PL/I          PL/I
5. C             PL/I
6. Ada           Ada
7. C++           Ada

Skipping things such as Prolog, Snobol, Lisp, APL, etc, which I
consider interesting and useful, but never of primary interest.






* Re: Ariane 5 - not an exception?
  1996-08-13  0:00         ` Ken Garlington
@ 1996-08-13  0:00           ` Kirk Bradley
  1996-08-14  0:00             ` Ken Garlington
  1996-08-22  0:00           ` ++           robin
  1 sibling, 1 reply; 111+ messages in thread
From: Kirk Bradley @ 1996-08-13  0:00 UTC (permalink / raw)




Why do you say that PL/I has no signalling facility? What do you
call ON units? How do they get triggered?  I'd say it for sure
has signalling.

In article <32106B34.57DB@lmtas.lmco.com>, Ken Garlington
<garlingtonke@lmtas.lmco.com> wrote:

> ++ robin wrote:
> > 
> >         >  2) PL/I
> > 
> >         >     a) There is no PL/I compiler for the 1750A
> > 
> > ---Not an obstacle.  How was an Ada compiler written for it?
> 
> I believe an existing one was purchased. I don't believe an Ada
> compiler was written specifically for the Ariane SRI, as apparently
> you are suggesting be done for PL/I. How would your argument about
> mature PL/I compilers stand up in the face of a requirement to 
> develop a brand-new compiler for a new target?
> 
> > 
> >         >     b) Ada is far more suitable for safety-sensitive software than Pl/I
> > 
> > ---Nonsense.  PL/I has a long (30 years) record in
> > excellent real-time facilities, and with people with
> > experience in error-recovery and fail-soft in routine
> > commercial applications as well as real-time programming.
> 
> It must be the lag in these newsgroups. I've posted more than one
> request in comp.lang.pl1 for some examples of PL/I's use in safety-critical
> real-time flight software, and nary a taker yet. I look forward to
> some examples...
> 
> Perhaps those people are too busy building such software to read
> the Internet?
> 
> >         >     c) This failure was not a language issue.
> > 
> > ---Isn't it?  One of the arguments put forward was that
> > an Ada condition couldn't be raised and leave a trace,
> > and that it would be argued that there was no guarantee
> > whether a piece of code was executed.
> 
> That argument could be put forth, but of course it would be false.
> We routinely trace Ada 83 exceptions in our debugging environments.
> Ada 95 also adds a standardized capability to annotate exceptions with
> user-defined information.
> 
> >    In PL/I, a SIGNAL statement (which can be used for
> > program checkout) leaves a printed record that it was
> > executed.  It gives a message that the condition was
> > raised, and comes with line numbers, etc.  There is
> > absolutely no doubt that the statement did not execute!
> 
> There is no SIGNAL statement in PL/I. In fact, PL/I has no
> exception handling capabilities. (I figure, since ++robin can
> continue to ignore the capabilities of Ada, why shouldn't we
> ignore his claims as to the capabilities of PL/I)?
> 
> As a side issue: Could someone post the printer port address for the
> Ariane SRI? Most of our IRSs don't have a printer attachment.
> Must be something those experienced PL/I programmers demand be
> added to the hardware... :)
> 
> > 
> >         > But
> >         >     stupid management is something no programming language can change.
> >         >     Given other engineering constraints on this project, Ada is really
> >         >     the only reasonable language to choose.
> > 
> > ---Scarcely convincing, in view of the failure.
> 
> Again, I would be more convinced of _your_ arguments if you would post some
> of the flight software items you've developed in PL/I. For some reason,
> this request keeps getting lost in the newsgroup. Until then, I think I'll
> just enjoy the speculation...
> 
> -- 
> LMTAS - "Our Brand Means Quality"





* Re: Ariane 5 - not an exception?
  1996-08-12  0:00               ` Thomas Kendelbacher
@ 1996-08-13  0:00                 ` ++           robin
  0 siblings, 0 replies; 111+ messages in thread
From: ++           robin @ 1996-08-13  0:00 UTC (permalink / raw)



	Thomas.Kendelbacher@erno.de (Thomas Kendelbacher) writes:

	>In article <4uic6d$1p76@news-s01.ny.us.ibm.net>, dwnoon@ibm.net writes:
	>>In <3209A6E6.17D4@phidani.be>, Darius Blasband <darius@phidani.be> writes:
	>>>PL/1 is one of the worst language designs ever. As far as I know (or rather,
	>>>as far as what I believe a good language design should be) only C++
	>>>comes close.
	>>
	>>As far as lucid expression of algorithms, C++ isn't fit to lick PL/I's boots.
	>>
	>>I suggest you try posting this message in a SmallTalk or Eiffel newsgroup
	>>and see what their reaction is that C++ is a decent language. Even they
	>>have their standards.
	>>
	>>Regards
	>>
	>>Dave
	>><Team PL/I>

	>I'm not a native English speaker, but I'm pretty sure the original statement
	>meant to express that only C++ comes close *to PL/I* in the category of *worst*
	>language designs -- but certainly not close to *good language design*!

---Whatever the original poster was saying, Dave was
contradicting that assertion, as many of us know that
statement to be false.

   PL/I stands as an example of good language design.

   And, in fact, it is one of the enduring languages.

   BTW, the language has been extended, and the new
language is now available on OS/2, Windows 95/NT, and AIX.

	>Thomas Kendelbacher   |   email : Thomas.Kendelbacher@erno.de
	>Germany





* Re: Ariane 5 - not an exception?
  1996-08-13  0:00       ` ++           robin
  1996-08-13  0:00         ` Ken Garlington
  1996-08-13  0:00         ` Darren C Davenport
@ 1996-08-14  0:00         ` John McCabe
  1996-08-19  0:00           ` Chris Papademetrious
  1996-08-22  0:00           ` ++           robin
  1996-08-20  0:00         ` Richard Riehle
  3 siblings, 2 replies; 111+ messages in thread
From: John McCabe @ 1996-08-14  0:00 UTC (permalink / raw)



rav@goanna.cs.rmit.edu.au (++           robin) wrote:
<..snip..>

>	>  2) PL/I

>	>     a) There is no PL/I compiler for the 1750A

>---Not an obstacle.  How was an Ada compiler written for it?

Because the US Military decided that their standard microprocessor
(Mil-Std-1750A) should have a compiler for their standard language Ada
(Mil-Std-1815).

>	>     c) This failure was not a language issue.

>---Isn't it?  One of the arguments put forward was that
>an Ada condition couldn't be raised and leave a trace,
>and that it would be argued that there was no guarantee
>whether a piece of code was executed.

Of course it isn't. How many people have to point out that, no matter
what language had been used, the specification and design were still
faulty?

>	> It is a management issue.
>	>        Specifically, it is a failure of engineering management.

>---There are lots of things for which one can blame 
>management, but the lack of a check for overflow has
>to come down to the programmer.

Read the report - the lack of checks was the result of analysis done
on the software and obviously was accepted at a higher level. The
important point being that the analysis was done and the checks
removed, _not_ that they weren't there in the first place.

>	>     d) Given the incorrect specifications against which the program was
>	>        designed, the same failure would have occurred in PL/I or any
>	>        other language.

>---No it wouldn't.  The lack of a test for overflow was
>the problem.  But even supposing for a moment that
>all conversions were checked, then
>an interrupt handler could be included for fixed-point
>overflow.  This would have trapped any unchecked
>overflow.  A R/T (and even non R/T) PL/I programmer
>routinely puts in error control.

Chances are that this is exactly how the Operand Error exception was
raised. Why don't you find out about Ada and how it is implemented in
embedded systems before stating rubbish like this? You may learn a
lot.

>	>  4) Software Reuse

>	>     If one intends to "reuse" software, such as Ariane 4xx software in
>	>     Ariane 5xxx, in a significantly different architecture, there is some
>	>     virtue in extensive testing.

>---In this case, with simulated inputs, and with SIGNAL
>statements to check out what happens when an interrupt
>occurs.  If this had been done (routine in PL/I), the
>effect of an unchecked conversion would have been observed.

It's obvious from the report that the testing was inadequate. If the
inputs had been simulated well, the exception would have been raised.

>	>  7) Ada

>	>     This is still the best language for doing this kind of system.

>---PL/I would be clearly better, as it meets the requirements
>for audit trails in program and system checkout (in addition
>to the other facilities that it offers).

Read the Ada manual. Every feature you have mentioned for PL/I is
available in Ada.



Best Regards
John McCabe <john@assen.demon.co.uk>






* Re: Ariane 5 - not an exception?
  1996-08-13  0:00           ` Kirk Bradley
@ 1996-08-14  0:00             ` Ken Garlington
  0 siblings, 0 replies; 111+ messages in thread
From: Ken Garlington @ 1996-08-14  0:00 UTC (permalink / raw)



Kirk Bradley wrote:
> 
> Why do you say that PL/I has no signalling facility. What do you
> call ON units? how do they get triggered?  I'd say it for sure
> has signalling

To repeat the explanation from the original post:

> > (I figure, since ++robin can
> > continue to ignore the capabilities of Ada, why shouldn't we
> > ignore his claims as to the capabilities of PL/I)?

Once the PL/I world realizes that there are other languages that
also have exception handling, I'll accept your statement. Otherwise,
why shouldn't _both_ newsgroups show ignorance of each other's language?

(In other words, would a ;) have helped here?)

-- 
LMTAS - "Our Brand Means Quality"





* Re: Ariane 5 - not an exception?
  1996-08-13  0:00 Marin David Condic, 407.796.8997, M/S 731-93
@ 1996-08-15  0:00 ` John McCabe
  0 siblings, 0 replies; 111+ messages in thread
From: John McCabe @ 1996-08-15  0:00 UTC (permalink / raw)



"Marin David Condic, 407.796.8997, M/S 731-93" <condicma@PWFL.COM>
wrote:

<..snip..>
>    ... the language spoken
>    by the developers has nothing to do with the error that occurred.

I wouldn't be too sure about that - cultural differences can
contribute to that kind of thing. As part of an Anglo-French company I
have become aware of the cultural differences between the British
"test to destruction" culture and the French "Analyse to distraction"
culture.

Best Regards
John McCabe <john@assen.demon.co.uk>






* Re: Ariane 5 - not an exception?
  1996-08-13  0:00 Marin David Condic, 407.796.8997, M/S 731-93
@ 1996-08-15  0:00 ` John McCabe
  0 siblings, 0 replies; 111+ messages in thread
From: John McCabe @ 1996-08-15  0:00 UTC (permalink / raw)



"Marin David Condic, 407.796.8997, M/S 731-93" <condicma@PWFL.COM>
wrote:

<..snip..>

>    Anybody want to make me a rad-hard, space tested, 200mips
>    processor that I can buy in small lots at $40 a piece and has a
>    full suite of development tools (including Ada95 compiler)
>    available for it? (Sober up, Marin! ;-)

Wow, that would be nice - that would make my job such a lot easier :-)

Best Regards
John McCabe <john@assen.demon.co.uk>






* Re: Ariane 5 - not an exception?
  1996-08-08  0:00           ` Darius Blasband
                               ` (2 preceding siblings ...)
  1996-08-13  0:00             ` ++           robin
@ 1996-08-15  0:00             ` Richard Riehle
  3 siblings, 0 replies; 111+ messages in thread
From: Richard Riehle @ 1996-08-15  0:00 UTC (permalink / raw)



On Thu, 8 Aug 1996, Darius Blasband wrote:

I am including the comp.lang.eiffel newsgroup in this reply since
it is guaranteed to give them a good bellylaugh.

> PL/1 is one of the worst language designs ever. As far as I know (or rather,
> as far as what I believe a good language design should be) only C++
> comes close.

  I would hope Darius is joking. I suspect he is serious, but give
  him the benefit of the doubt.

  Richard Riehle






* Re: Ariane 5 - not an exception?
  1996-08-14  0:00         ` John McCabe
@ 1996-08-19  0:00           ` Chris Papademetrious
  1996-08-22  0:00           ` ++           robin
  1 sibling, 0 replies; 111+ messages in thread
From: Chris Papademetrious @ 1996-08-19  0:00 UTC (permalink / raw)



john@assen.demon.co.uk (John McCabe) wrote:
>Of course it isn't. How many people have to point out that, no matter
>what language had been used, the specification and design were still
>faulty?

 Exactly.  People have to realize that it's possible to write the
routine to blow up a rocket in ANY language.  A language is just a
facility to allow directives to be executed, but if the programmer
wants the thing blown up, it will happen...


-=-=-=-=-=-=-=-=-=-=-=-=-
 Chris Papademetrious
 Data Fusion Laboratory
 Drexel University
 Philadelphia, PA
-=-=-=-=-=-=-=-=-=-=-=-=-






* Re: Ariane 5 - not an exception?
  1996-08-13  0:00       ` ++           robin
                           ` (2 preceding siblings ...)
  1996-08-14  0:00         ` John McCabe
@ 1996-08-20  0:00         ` Richard Riehle
  3 siblings, 0 replies; 111+ messages in thread
From: Richard Riehle @ 1996-08-20  0:00 UTC (permalink / raw)



On 13 Aug 1996, ++           robin wrote:

   From: Richard Riehle

   Robin wrote in response to my posting regarding PL/I for Ariane:

>
> 	>  2) PL/I
>
> 	>     a) There is no PL/I compiler for the 1750A
>
> ---Not an obstacle.  How was an Ada compiler written for it?

  There are many Ada compilers for the 1750A from many different
  compiler publishers.  And there is considerable experience using
  Ada for this architecture.  The unavailability of a PL/I
  compiler is very much an obstacle to using it.

  Moreover, PL/I has plenty of problems of its own. From an
  engineering viewpoint, there are little nuisances such as
  "default identifiers," the ability to reference an unknown name
  outside a nested block, side effects created by "secret"
  variables, the poor facilities for explicit scope resolution,
  the unpredictability of "partial qualification," etc. I could
  go on for several pages, but this should give some idea.

  PL/I, when used carefully, has served successfully in
  a wide range of important applications, but it is not
  without a substantial number of warts and imperfections.
  Once again, I have not used PL/I for a long time, so some
  things about the language may have gotten better.


> 	>     b) Ada is far more suitable for safety-sensitive
>       >      software than Pl/I
>
> ---Nonsense.  PL/I has a long (30 years) record in
> excellent real-time facilities, and with people with
> experience in error-recovery and fail-soft in routine
> commercial applications as well as real-time programming.

  And there is far more successful experience using Ada for this
  processor architecture than there is PL/I. Moreover, Ada is
  explicitly designed for safety-sensitive software.

  Moreover, Ada's track record in safety-critical real time
  systems is excellent and getting better all the time.

> 	>     c) This failure was not a language issue.
>
> ---Isn't it?  One of the arguments put forward was that
> an Ada condition couldn't be raised and leave a trace,
> and that it would be argued that there was no guarantee
> whether a piece of code was executed.

  Vis a vis the Ada language, that is an incorrect statement.

>
>    In PL/I, a SIGNAL statement (which can be used for
> program checkout) leaves a printed record that it was
> executed.  It gives a message that the condition was
> raised, and comes with line numbers, etc.  There is
> absolutely no doubt that the statement did not execute!

  So who gets the message? Where is it stored?  Printed record?
  Now that is interesting. It reflects a mainframe point-of-view
  rather than an embedded systems point-of-view.  We seldom
  include a printer on a space-bound system.  On the other
  hand, we do collect a lot of telemetry data, and this should
  be available.  However, it would have been of little use for
  Ariane 5 since no one would be using it for corrective action
  in time to save the system.

> ---There are lots of things for which one can blame
> management, but the lack of a check for overflow has
> to come down to the programmer.

  Wrong again.  In a data processing system, we give the programmer
  greater latitude. In this kind of application, the programmer is a
  contributor, but not a final authority. This is engineering, not
  programming.  Or it should be.

>> d) Given the incorrect specifications against which the program was
>>    designed, the same failure would have occurred in PL/I or any
>>    other language.

  If a programmer decides, independently of the specifications, the
  systems engineering designers, the V & V team, and his peer review
  group, to include unapproved code with such serious implications as
  error correction, that programmer will never work on this kind of
  project again.

>
> overflow.  A R/T (and even non R/T) PL/I programmer
> routinely puts in error control.

  This is not the exclusive province of PL/I programmers. I am amazed
  at such a narrow view.  Error management is a well-known part of
  programming, and Ada has excellent facilities for doing it, facilities
  every bit as good as PL/I's, perhaps better (I have written PL/I in my
  ancient past).  However, the programmer may alert the development
  team to a potential error, but this software is the work of a team
  of engineers, not the independent creative effort of some single
  programmer.

> ---In this case, with simulated inputs, and with SIGNAL
> statements to check out what happens when an interrupt
> occurs.  If this had been done (routine in PL/I), the
> effect of an unchecked conversion would have been observed.

  Apparently, as I have learned from another post and a face-to-face
  conversation with one of the engineers on the project, this was not
  a function of unchecked conversion, so that is moot.

> 	>  7) Ada
>
> 	>     This is still the best language for doing this kind of system.
>
> ---PL/I would be clearly better, as it meets the requirements
> for audit trails in program and system checkout (in addition
> to the other facilities that it offers).

  Frankly, I am still baffled by this argument. It is increasingly clear
  that your knowledge of Ada is somewhere between sparse and none.
  PL/I was well-known at the time a decision was taken to bypass it as
  a choice for the new DoD language in the late 1970's. Why?

  I can think of lots of reasons, but they would be lost on anyone who
  is not ready to acknowledge their validity.

  In response to my comment regarding the role of management in this
  failure, you reply,

> ---Scarcely convincing, in view of the failure.

  Well, it had better be convincing to someone. If I understand my
  understanding as I think I understand it, the failure was a direct
  result of assuming that software which behaved correctly for
  Ariane 4 would also work correctly for Ariane 5.  This assumption
  was made in spite of the fact that Ariane 5 was designed with a
  different set of launch behaviors than Ariane 4.

  On Ariane 4, the software, at the point where the overflow would
  be detected, was designed to shut down the system while it was still
  on the launch pad. Due to differences in launch behavior, this same
  software shut down the system after lift-off.

  The software behaved exactly as it should for Ariane 4. It was
  an engineering error to use the same software, unchanged, in a system
  with different launch characteristics.  No programming language can
  tell the engineers they are making such a fundamental error. Even
  your beloved PL/I would have failed under these circumstances, unless
  it has taken on far greater run-time intelligence than I recall.
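
The reuse problem can be sketched as a range assumption that has to be
re-checked for each vehicle.  This is only an illustration in Python: the
two bounds below are invented numbers, not flight data, and
conversion_is_safe and review are hypothetical names.

```python
INT16_MAX = 32767

# Hypothetical worst-case magnitudes of the converted value; not real data.
OLD_VEHICLE_MAX = 20_000.0
NEW_VEHICLE_MAX = 60_000.0


def conversion_is_safe(max_abs_input):
    """True if every value up to the stated bound fits in a 16-bit signed integer."""
    return abs(max_abs_input) <= INT16_MAX


def review(bounds):
    """Re-check the same unprotected conversion against each vehicle's bound."""
    return {name: conversion_is_safe(bound) for name, bound in bounds.items()}
```

The code under review is identical in both cases; only the assumed input
range changes, which is why the check belongs to the engineering analysis
rather than to the language.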

  Richard Riehle






* Re: Ariane 5 - not an exception?
  1996-08-22  0:00           ` ++           robin
@ 1996-08-22  0:00             ` Martin Tom Brown
  1996-08-22  0:00             ` John McCabe
  1996-08-23  0:00             ` Bob Gilbert
  2 siblings, 0 replies; 111+ messages in thread
From: Martin Tom Brown @ 1996-08-22  0:00 UTC (permalink / raw)



In article <4vgmit$124@goanna.cs.rmit.edu.au>
           rav@goanna.cs.rmit.edu.au "++           robin" writes:

>    ANY kind of interrupt (even a trivial one) in Ariane 5
> would cause sudden death to the project (the shutdown
> of the processor).  

That was a fundamental *design* decision - read the report.

> It was the programmer's job to ensure
> that such an error (number too large) never occurred under
> any circumstances.
 
Sad though it is - the specification did not require the code
to work when flown along an Ariane 5 trajectory, it was *assumed*
that the code would continue to work as it had for Ariane 4.
No live trajectory checks of the IRS were done :(

You cannot blame the programmers for inadequate specifications.

Regards,
-- 
Martin Brown  <martin@nezumi.demon.co.uk>     __                CIS: 71651,470
Scientific Software Consultancy             /^,,)__/





* Re: Ariane 5 - not an exception?
  1996-08-13  0:00         ` Ken Garlington
  1996-08-13  0:00           ` Kirk Bradley
@ 1996-08-22  0:00           ` ++           robin
  1996-08-22  0:00             ` Ken Garlington
  1 sibling, 1 reply; 111+ messages in thread
From: ++           robin @ 1996-08-22  0:00 UTC (permalink / raw)



	Ken Garlington <garlingtonke@lmtas.lmco.com> writes:

	>++ robin wrote:
	>>         >  2) PL/I
	>>         >     a) There is no PL/I compiler for the 1750A
	>> ---Not an obstacle.  How was an Ada compiler written for it?

	>I believe an existing one was purchased. I don't believe an Ada
	>compiler was written specifically for the Ariane SRI, as apparently
	>you are suggesting be done for PL/I. How would your argument about
	>mature PL/I compilers stand up in the face of a requirement to 
	>develop a brand-new compiler for a new target?

---This is a red herring.  The same could be said of the
Ada compiler when it was written.





* Re: Ariane 5 - not an exception?
  1996-08-14  0:00         ` John McCabe
  1996-08-19  0:00           ` Chris Papademetrious
@ 1996-08-22  0:00           ` ++           robin
  1996-08-22  0:00             ` Martin Tom Brown
                               ` (2 more replies)
  1 sibling, 3 replies; 111+ messages in thread
From: ++           robin @ 1996-08-22  0:00 UTC (permalink / raw)



	john@assen.demon.co.uk (John McCabe) writes:

	>rav@goanna.cs.rmit.edu.au (++           robin) wrote:
	><..snip..>

	>>	>  2) PL/I

	>>	>     a) There is no PL/I compiler for the 1750A

	>>---Not an obstacle.  How was an Ada compiler written for it?

	>Because the US Military decided that their standard microprocessor
	>(Mil-Std-1750A) should have a compiler for their standard language Ada
	>(Mil-Std-1815).

---One had to be written, no?

	>>	>     c) This failure was not a language issue.

	>>---Isn't it?  One of the arguments put forward was that
	>>an Ada condition couldn't be raised and leave a trace,
	>>and that it would be argued that there was no guarantee
	>>whether a piece of code was executed.

	>Of course it isn't. How many people have to point out that no matter
	>what language had been used the specification

---No, the specification wasn't faulty.  The implementation
was.  Because of a programming error, a data conversion from
double precision floating-point to a 16-bit integer
overflowed.  In the absence of a check, an exception occurred,
the immediate action of which was to shut down the processor.
That shutdown resulted in the inevitable and almost immediate
destruction of the project.  Read the report.

	>and design was still faulty.

---There were a number of design problems that need to be
addressed.

	>>	> It is a management issue.
	>>	>        Specifically, it is a failure of engineering management.

	>>---There are lots of things for which one can blame 
	>>management, but the lack of a check for overflow has
	>>to come down to the programmer.

	>Read the report - the lack of checks was the result of analysis done
	>on the software and obviously was accepted at a higher level.

---I had read the report ages ago, before making any
posting on the issue, and was horrified to read of the
cause being a simple overflow.

   If you had read the report, you would have noticed that
the committee could not find any explanation in the code
as to why this & 2 other conversions did not have checks,
while all the other similar conversions in the vicinity
did.  There is the suggestion that the checks were overlooked.

	>The
	>important point being that the analysis was done and the checks
	>removed, _not_ that they weren't there in the first place.

---On the contrary, if you read the report, it states
clearly and unequivocally that the analysis was done
and the checks *added* where they felt that they were 
needed (NOT removed from ones that they felt did not
need it).

	>>	>     d) Given the incorrect specifications against which the program was
	>>	>        designed, the same failure would have occurred in PL/I or any
	>>	>        other language.

	>>---No it wouldn't.  The lack of a test for overflow was
	>>the problem.  But even supposing for a moment that
	>>all conversions were checked, then
	>>an interrupt handler could be included for fixed-point
	>>overflow.  This would have trapped any unchecked
	>>overflow.  A R/T (and even non R/T) PL/I programmer
	>>routinely puts in error control.

	>Chances are that this is exactly how the Operand Error exception was
	>raised.

---This is how -- obviously -- the exception was raised.
It is the *absence* of a specific check that caused this
to happen (the report used the term "unchecked").
You really should read the report.

	>Why don't you find out about Ada and how it is implemented in
	>embedded systems before stating rubbish like this. You may learn a
	>lot.

---BTW, it isn't rubbish.  You just haven't understood.
What I said was that a belt-and-braces method was
needed.  All conversions should have been checked.
In addition, an interrupt handler should have been
included for fixed-point overflow, just in case
a check had been inadvertently omitted from any conversion.

   These matters are routine for a PL/I programmer.

   ANY kind of interrupt (even a trivial one) in Ariane 5
would cause sudden death to the project (the shutdown
of the processor).  It was the programmer's job to ensure
that such an error (number too large) never occurred under
any circumstances.





* Re: Ariane 5 - not an exception?
  1996-08-22  0:00           ` ++           robin
  1996-08-22  0:00             ` Martin Tom Brown
@ 1996-08-22  0:00             ` John McCabe
  1996-08-23  0:00               ` Ken Garlington
  1996-08-23  0:00             ` Bob Gilbert
  2 siblings, 1 reply; 111+ messages in thread
From: John McCabe @ 1996-08-22  0:00 UTC (permalink / raw)



rav@goanna.cs.rmit.edu.au (++           robin) wrote:

>	>>	>     a) There is no PL/I compiler for the 1750A

>	>>---Not an obstacle.  How was an Ada compiler written for it?

>	>Because the US Military decided that their standard microprocessor
>	>(Mil-Std-1750A) should have a compiler for their standard language Ada
>	>(Mil-Std-1815).

>---One had to be written, no?

What do you mean "One had to be written"? There are a number of
commercially available Ada compilers for the MIL-STD-1750A processor
e.g. TLD, Tartan, EDS-Scicon XD-Ada, DDC-I all already produce a
1750/Ada compiler. If you are suggesting that one had to be written
especially for the Ariane project you are wrong. As far as PL/I is
concerned, it is clear that there is absolutely no demand for a
1750/PL/I compiler so in this case, if the Ariane developers had
wanted to use PL/I and 1750 it would have been necessary to develop a
completely new compiler rather than use one that could already be
purchased off the shelf.

>	>>	>     c) This failure was not a language issue.

>	>>---Isn't it?  One of the arguments put forward was that
>	>>an Ada condition couldn't be raised and leave a trace,
>	>>and that it would be argued that there was no guarantee
>	>>whether a piece of code was executed.

>	>Of course it isn't. How many people have to point out that no matter
>	>what language had been used the specification

>---No, the specification wasn't faulty.  The implementation
>was.  Because of a programming error, a data conversion from
>double precision floating-point to a 16-bit integer
>overflowed.  In the absence of a check, an exception occurred,
>the immediate action of which was to shut down the processor.
>That shutdown resulted in the inevitable and almost immediate
>destruction of the project.  Read the report.

Why was the processor shut down? Because that was the SPECIFIED action
of the exception handler. The programmer implemented the
specification. Why was there an overflow? Because the implementation
followed the specification which _failed_ to specify the Ariane 5
requirements correctly (especially in relation to the commonality
between Ariane 4 and Ariane 5).

And BTW I have read the report, a number of times and every time I
read it, it tells me that the programmers were not at fault. Think
about it, how much authority on design decisions has a programmer on a
large project - very little! If they raise a query on the design it is
still someone else's responsibility to make the decision as to what to
do about it.

>	>and design was still faulty.

>---There were a number of design problems that need to be
>addressed.

Yes, but by the designer, not necessarily by the programmer.

>	>Read the report - the lack of checks was the result of analysis done
>	>on the software and obviously was accepted at a higher level.

>---I had read the report ages ago, before making any
>posting on the issue, and was horrified to read of the
>cause being a simple overflow.

>   If you had read the report, you would have noticed that
>the committee could not find any explanation in the code
>as to why this & 2 other conversions did not have checks,
>while all the other similar conversions in the vicinity
>did.  There is the suggestion that the checks were overlooked.

Don't talk rubbish! Read this - an excerpt from the report:

"This led to protection being added to four of the variables, evidence
of which appears in the Ada code. However, three of the variables were
left unprotected. No reference to justification of this decision was
found directly in the source code. Given the large amount of
documentation associated with any industrial application, the
assumption, although agreed, was essentially obscured, though not 
------NOTE  ^^^^^^^^^^^^^^^
deliberately, from any external review."

>	>The
>	>important point being that the analysis was done and the checks
>	>removed, _not_ that they weren't there in the first place.

>---On the contrary, if you read the report, it states
>clearly and unequivocally that the analysis was done
>and the checks *added* where they felt that they were 

Accepted; however, the report goes on to say:

"The reason for the three remaining variables, including the one
denoting horizontal bias, being unprotected was that further reasoning
indicated that they were either physically limited or that there was a
large margin of safety, a reasoning which in the case of the variable
BH turned out to be faulty. It is important to note that the decision
to protect certain variables but not others was taken jointly by
project partners at several contractual levels"
------------ NOTE   ^^^^^^^^^^^^^^^^^^^^^^^^^^

******* IT WAS NOT A PROGRAMMING ERROR ********

>needed (NOT removed from ones that they felt did not
>need it).

>	>>---No it wouldn't.  The lack of a test for overflow was
>	>>the problem.  But even supposing for a moment that
>	>>all conversions were checked, then
>	>>an interrupt handler could be included for fixed-point
>	>>overflow.  This would have trapped any unchecked
>	>>overflow.  A R/T (and even non R/T) PL/I programmer
>	>>routinely puts in error control.

>	>Chances are that this is exactly how the Operand Error exception was
>	>raised.

>---This is how -- obviously -- the exception was raised.
>It is the *absence* of a specific check that caused this
>to happen (the report used the term "unchecked").

The absence of a specific check is what ultimately caused the failure;
however, that was not the root problem - the problem was far more
fundamental than that.

>You really should read the report.

I have read the report - a number of times - you should try it
sometime it is rather interesting.

>	>Why don't you find out about Ada and how it is implemented in
>	>embedded systems before stating rubbish like this. You may learn a
>	>lot.

>---BTW, it isn't rubbish.  You just haven't understood.
>What I said was that a belt-and-braces method was
>needed.  All conversions should have been checked.

The excuse was that not all conversions were checked because there was
not enough processing power to do so. I would, however, have to agree
with you here as I believe that processor loading margin requirements
should NEVER be met at the expense of mission success.
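
For anyone who wants the "protected" versus "unprotected" distinction
spelled out, here is a minimal sketch. It is in Python purely for
illustration (the flight code was Ada), the function names are
hypothetical, and clamping is only one possible form of protection,
assumed here for the sake of the example.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unprotected(value: float) -> int:
    """Convert with no local recovery: out-of-range input raises.

    This stands in for a conversion left without protection, where the
    run-time check fires and the error propagates to whoever handles it.
    """
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in 16 signed bits")
    return result


def to_int16_protected(value: float) -> int:
    """Convert with explicit protection: clamp to the 16-bit range first.

    Clamping is an assumption for illustration; NaN is not handled here.
    """
    clamped = max(float(INT16_MIN), min(float(INT16_MAX), value))
    return int(clamped)
```

The protected version costs two extra comparisons per conversion, which
is the kind of overhead that was being weighed against the processor
loading margin.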

>In addition, an interrupt handler should have been
>included for fixed-point overflow, just in case
>a check had been inadvertently omitted from any conversion.

The "interrupt handler" was added - it was the Operand Error exception
handler.

>   These matters are routine for a PL/I programmer.

>   ANY kind of interrupt (even a trivial one) in Ariane 5
>would cause sudden death to the project (the shutdown
>of the processor).

That is probably completely untrue! The exception that occurred would
be tied to a specific interrupt. If MIL-STD-1750 processors were
used, they allow up to 16 different interrupts to be handled, and this
exception handler would not be invoked for all of them.
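
A rough sketch of that point, in Python for illustration only. The slot
number and handler names are invented, and nothing here reflects the
actual SRI software; it just models a table of 16 interrupt slots in
which a single slot is bound to the handler that shuts the unit down.

```python
from typing import Callable, Dict

NUM_INTERRUPTS = 16      # the interrupt count mentioned above
OPERAND_ERROR_SLOT = 3   # hypothetical slot number, chosen arbitrarily


def shutdown_handler() -> str:
    """Stand-in for the handler that takes the unit off line."""
    return "shutdown"


def default_handler() -> str:
    """Stand-in for any handler that services its interrupt and returns."""
    return "serviced"


def build_vector_table() -> Dict[int, Callable[[], str]]:
    """Bind every slot to the default handler, then override one slot."""
    table: Dict[int, Callable[[], str]] = {
        slot: default_handler for slot in range(NUM_INTERRUPTS)
    }
    table[OPERAND_ERROR_SLOT] = shutdown_handler
    return table


def dispatch(table: Dict[int, Callable[[], str]], slot: int) -> str:
    """Run whichever handler is bound to the given slot."""
    if slot not in table:
        raise ValueError(f"no such interrupt slot: {slot}")
    return table[slot]()
```

Under this model only the operand-error slot leads to a shutdown; the
other fifteen are serviced normally.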

>It was the programmer's job to ensure
>that such an error (number too large) never occurred under
>any circumstances.

No it wasn't - a programmer's job is to implement a specification and
raise any queries regarding his implementation. If those queries end
up in incorrect decisions to proceed in a given direction "by project
partners at several contractual levels", is it the
programmer's job to ignore that? I don't think so.

Best Regards
John McCabe <john@assen.demon.co.uk>






* Re: Ariane 5 - not an exception?
  1996-08-22  0:00           ` ++           robin
@ 1996-08-22  0:00             ` Ken Garlington
  0 siblings, 0 replies; 111+ messages in thread
From: Ken Garlington @ 1996-08-22  0:00 UTC (permalink / raw)



++ robin wrote:
> 
>         Ken Garlington <garlingtonke@lmtas.lmco.com> writes:
> 
>         >++ robin wrote:
>         >>         >  2) PL/I
>         >>         >     a) There is no PL/I compiler for the 1750A
>         >> ---Not an obstacle.  How was an Ada compiler written for it?
> 
>         >I believe an existing one was purchased. I don't believe an Ada
>         >compiler was written specifically for the Ariane SRI, as apparently
>         >you are suggesting be done for PL/I. How would your argument about
>         >mature PL/I compilers stand up in the face of a requirement to
>         >develop a brand-new compiler for a new target?
> 
> ---This is a red herring.  The same could be said of the
> Ada compiler when it was written.

And it was said in the 1980s, when the first Ada compilers were sold for the 1750.
That is why these Ada compilers were not considered mature when they first came out.
That is why people like myself had to do extensive work to use those first compilers
in the early Ada/1750 flight control systems.

Now, let's talk about the present-day, and the Ariane 5. Which is a more mature
compiler, a PL/I compiler for the 1750 which would have had its first use on
the Ariane 5, or an Ada compiler which has probably been used on several real-time
projects previously? Your previous comment was that "PL/I is more mature than Ada,"
but with respect to the 1750, the Ada compilers would be the more mature 
implementations, right?

How's that list of PL/I-based real-time airborne control systems coming along, BTW? :)

When do we get to find out your background in real-time systems?

-- 
LMTAS - "Our Brand Means Quality"





* Re: Ariane 5 - not an exception?
  1996-08-22  0:00             ` John McCabe
@ 1996-08-23  0:00               ` Ken Garlington
  1996-08-24  0:00                 ` John McCabe
  0 siblings, 1 reply; 111+ messages in thread
From: Ken Garlington @ 1996-08-23  0:00 UTC (permalink / raw)



John McCabe wrote:
> 
> And BTW I have read the report, a number of times and every time I
> read it, it tells me that the programmers were not at fault.

After some thought, I've come to the conclusion that ++robin is a troll.
I can't comprehend the alternative.

-- 
LMTAS - "Our Brand Means Quality"





* Re: Ariane 5 - not an exception?
  1996-08-22  0:00           ` ++           robin
  1996-08-22  0:00             ` Martin Tom Brown
  1996-08-22  0:00             ` John McCabe
@ 1996-08-23  0:00             ` Bob Gilbert
  1996-08-24  0:00               ` Robert I. Eachus
  1996-08-26  0:00               ` Jon S Anthony
  2 siblings, 2 replies; 111+ messages in thread
From: Bob Gilbert @ 1996-08-23  0:00 UTC (permalink / raw)



In article <4vgmit$124@goanna.cs.rmit.edu.au>, rav@goanna.cs.rmit.edu.au (++           robin) writes:
> 
> ---No, the specification wasn't faulty.

I disagree, I think the specification was faulty, and is the
primary cause of the problem.

>  The implementation
> was.  Because of a programming error, a data conversion from
> double precision floating-point to a 16-bit integer
> overflowed.  In the absence of a check, an exception occurred,
> the immediate action of which was to shut down the processor.

Exactly as the requirements specified, and the software performed
exactly to the specification, including saving into PROM the
current state.  It is the specification which wrongly assumed that
a data overflow would indicate a probable hardware failure, and if
a hardware failure occurred the action was to shut down the allegedly
failed equipment.

> That shutdown resulted in the inevitable and almost immediate
> destruction of the project.  Read the report.

>    If you had read the report, you would have noticed that
> the committee could not find any explanation in the code
> as to why this & 2 other conversions did not have checks,
> while all the other similar conversions in the vicinity
> did.  There is the suggestion that the checks were overlooked.

Actually I think they suggested that there was weak documentation
in the code noting that an analysis of the conversions was
performed.  It is my impression that they concluded that an
analysis of the conversions was done, and a conscious decision to 
omit them was (wrongly) made, in part to meet their 80% processor
utilization goal (something some have suggested should have been
waived in this circumstance).

>   john@assen.demon.co.uk (John McCabe) writes:
> 	>The important point being that the analysis was done and the 
> 	>checks removed, _not_ that they weren't there in the first place.
> 
> ---On the contrary, if you read the report, it states
> clearly and unequivocally that the analysis was done
> and the checks *added* where they felt that they were 
> needed (NOT removed from ones that they felt did not
> need it).

I think the above poster meant that the checks were removed from the 
requirements specification, maybe?

As you say, an analysis was done and checks added where they felt it
was necessary.  Sounds like the analysis, which determined the
requirements, was in error.  The code was written and performed
per the faulty specification, that's not a programming error, and
certainly not a language issue.

-Bob









* Re: Ariane 5 - not an exception?
  1996-08-23  0:00             ` Bob Gilbert
@ 1996-08-24  0:00               ` Robert I. Eachus
  1996-08-25  0:00                 ` John McCabe
  1996-08-27  0:00                 ` Tom Speer
  1996-08-26  0:00               ` Jon S Anthony
  1 sibling, 2 replies; 111+ messages in thread
From: Robert I. Eachus @ 1996-08-24  0:00 UTC (permalink / raw)



In article <4vk8r4$2r7@zeus.orl.mmc.com> rgilbert@unconfigured.xvnews.domain (Bob Gilbert) writes:

  > ...It is my impression that they concluded that an analysis of the
  > conversions was done, and a conscious decision to omit them was
  > (wrongly) made, in part to meet their 80% processor utilization
  > goal (something some have suggested should have been waived in
  > this circumstance)...

  > As you say, an analysis was done and checks added where they felt it
  > was necessary.  Sounds like the analysis, which determined the
  > requirements, was in error...

    There is one detail you seem to have missed.  The analysis was
correct for the Ariane 4.  The incredible management blunder was that
reanalysis was not done for the Ariane 5, because the plan was to test
the actual hardware (and software) instead.  But later changes in
plans eliminated the full up testing.  So the software was not written
to Ariane 5 specifications--in fact the report specifically states
that the developers never had access to those specifications.  The
software was never analyzed with respect to those specifications. And
the software and hardware were never tested against those different
specifications.
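
    A small sketch of what reanalysis amounts to here, in Python for
illustration only. The envelope figures are made-up placeholders, not
values from the report or from either launcher, and the names are
hypothetical. The same range check that passes for one flight envelope
can fail for another, so an analysis done for one vehicle says nothing
about the next.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def fits_int16(peak_value: float) -> bool:
    """True if the worst-case value is representable in 16 signed bits."""
    return INT16_MIN <= peak_value <= INT16_MAX


def recheck(envelopes: dict) -> dict:
    """Repeat the range analysis for each named flight envelope."""
    return {name: fits_int16(peak) for name, peak in envelopes.items()}


# Placeholder worst-case magnitudes, invented for this example only.
envelopes = {"old_envelope": 20_000.0, "new_envelope": 50_000.0}
```

With these placeholder numbers the old envelope passes and the new one
does not, which is the whole argument in miniature.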

    I would be fired three times over for incompetent engineering
on that scale, and so would you.  But the decisions were political and
managerial.  I can't overemphasize this.  The decisions which caused
the failure were signed off not by engineers but by government
ministers and corporate executives.  And the effect, not the intent of
those decisions ensured that the engineers never knew how badly things
were being botched.  It speaks well for the consortium that they did
an honest evaluation of what went wrong and published it.
Unfortunately it was clearly bowdlerized and mentions no names.  I
suspect that this was a condition imposed upon publishing it.





--

					Robert I. Eachus

with Standard_Disclaimer;
use  Standard_Disclaimer;
function Message (Text: in Clever_Ideas) return Better_Ideas is...





* Re: Ariane 5 - not an exception?
  1996-08-23  0:00               ` Ken Garlington
@ 1996-08-24  0:00                 ` John McCabe
  1996-08-26  0:00                   ` Byron B. Kauffman
  0 siblings, 1 reply; 111+ messages in thread
From: John McCabe @ 1996-08-24  0:00 UTC (permalink / raw)



Ken Garlington <garlingtonke@lmtas.lmco.com> wrote:

>John McCabe wrote:
>> 
>> And BTW I have read the report, a number of times and every time I
>> read it, it tells me that the programmers were not at fault.

>After some thought, I've come to the conclusion that ++robin is a troll.
>I can't comprehend the alternative.

I think I'd have to agree because he certainly seems to be "talking
out his arse" to coin a phrase (if anyone doesn't understand this -
e-mail me!).


Best Regards
John McCabe <john@assen.demon.co.uk>






* Re: Ariane 5 - not an exception?
  1996-08-24  0:00               ` Robert I. Eachus
@ 1996-08-25  0:00                 ` John McCabe
  1996-08-27  0:00                 ` Tom Speer
  1 sibling, 0 replies; 111+ messages in thread
From: John McCabe @ 1996-08-25  0:00 UTC (permalink / raw)



eachus@spectre.mitre.org (Robert I. Eachus) wrote:

>    I would be fired three times over for incompetent engineering
>on that scale, and so would you.  But the decisions were political and
>managerial.  I can't overemphasize this.  The decisions which caused
>the failure were signed off not by engineers but by government
>ministers and corporate executives.

That is one of the most major problems of working in the European
space industry - NOTHING is ever decided on purely technical terms. I
have experience of this even within my own (Anglo-French) company.

>And the effect, not the intent of
>those decisions ensured that the engineers never knew how badly things
>were being botched.  It speaks well for the consortium that they did
>an honest evaluation of what went wrong and published it.

I think this was necessary due to the amount of [European] government
money that went into the project in the first place, and therefore the
consortium had to make sure that _everyone_ was made aware that they
had discovered what the problem was. It would have been reasonably
easy to keep it quiet, but they need to build people's confidence
again, and by publishing the report detailing that they knew exactly
what happened, and they knew exactly what to do about it in future, I
think they are well on their way to restoring that confidence.

>Unfortunately it was clearly bowdlerized and mentions no names.  I
>suspect that this was a condition imposed upon publishing it.

That's unfortunate, I'd love to know exactly who was to blame -
perhaps more information is given in the restricted circulation
technical report? If anyone knows any more, please let us know.

Best Regards
John McCabe <john@assen.demon.co.uk>






* Re: Ariane 5 - not an exception?
  1996-08-24  0:00                 ` John McCabe
@ 1996-08-26  0:00                   ` Byron B. Kauffman
  1996-08-27  0:00                     ` John McCabe
  0 siblings, 1 reply; 111+ messages in thread
From: Byron B. Kauffman @ 1996-08-26  0:00 UTC (permalink / raw)



John McCabe wrote:
> 
> Ken Garlington <garlingtonke@lmtas.lmco.com> wrote:
> 
> >John McCabe wrote:
> >>
> >> And BTW I have read the report, a number of times and every time I
> >> read it, it tells me that the programmers were not at fault.
> 
> >After some thought, I've come to the conclusion that ++robin is a troll.
> >I can't comprehend the alternative.
> 
> I think I'd have to agree because he certainly seems to be "talking
> out his arse" to coin a phrase (if anyone doesn't understand this -
> e-mail me!).
> 
> Best Regards
> John McCabe <john@assen.demon.co.uk>







* Re: Ariane 5 - not an exception?
  1996-08-23  0:00             ` Bob Gilbert
  1996-08-24  0:00               ` Robert I. Eachus
@ 1996-08-26  0:00               ` Jon S Anthony
  1 sibling, 0 replies; 111+ messages in thread
From: Jon S Anthony @ 1996-08-26  0:00 UTC (permalink / raw)



In article <840979842.3697.0@assen.demon.co.uk> john@assen.demon.co.uk (John McCabe) writes:

> That is one of the most major problems of working in the European
> space industry - NOTHING is ever decided on purely technical terms. I
> have experience of this even within my own (Anglo-French) company.

"Correction".  That is one of the most major problems of working in the
software biz.  I've seen it basically everywhere.  This business functions
more like the pop music biz than anything else.

1/2 :-)

/Jon
-- 
Jon Anthony
Organon Motives, Inc.
1 Williston Road, Suite 4
Belmont, MA 02178

617.484.3383
jsa@organon.com






* Re: Ariane 5 - not an exception?
  1996-08-24  0:00               ` Robert I. Eachus
  1996-08-25  0:00                 ` John McCabe
@ 1996-08-27  0:00                 ` Tom Speer
  1 sibling, 0 replies; 111+ messages in thread
From: Tom Speer @ 1996-08-27  0:00 UTC (permalink / raw)



Robert I. Eachus wrote:
> 
>...It speaks well for the consortium that they did
> an honest evaluation of what went wrong and published it.
> Unfortunately it was clearly bowdlerized and mentions no names.  I
> suspect that this was a condition imposed upon publishing it.
> ....

My limited experience with investigating similar events is that it is the 
usual practice in such mishaps that there are actually two 
investigations.  The first one is done as a safety investigation, and 
its purpose is to determine the cause as quickly as possible so that 
the lessons can be learned and disseminated in time to avoid a repeat of 
the problem.  The information given to such a board is privileged 
information, and the specific interviews and deliberations of the board 
in arriving at their conclusions are never divulged.  Only the board's 
findings of fact and their analysis of why the mishap occurred are 
reported.

The second investigation is done for the purpose of finding fault.  If 
this board interviews you, you better know what your rights are, because 
they're out to hang the guilty bastards.

What you've seen are the results of the first board.

TS





* Re: Ariane 5 - not an exception?
  1996-08-26  0:00                   ` Byron B. Kauffman
@ 1996-08-27  0:00                     ` John McCabe
  1996-08-28  0:00                       ` Byron B. Kauffman
  0 siblings, 1 reply; 111+ messages in thread
From: John McCabe @ 1996-08-27  0:00 UTC (permalink / raw)



"Byron B. Kauffman" <KauffmanBB@lmtas.lmco.com> wrote:

>John McCabe wrote:
>> 
>> Ken Garlington <garlingtonke@lmtas.lmco.com> wrote:
>> 
>> >John McCabe wrote:
>> >>
>> >> And BTW I have read the report, a number of times and every time I
>> >> read it, it tells me that the programmers were not at fault.
>> 
>> >After some thought, I've come to the conclusion that ++robin is a troll.
>> >I can't comprehend the alternative.
>> 
>> I think I'd have to agree because he certainly seems to be "talking
>> out his arse" to coin a phrase (if anyone doesn't understand this -
>> e-mail me!).
>> 
>> Best Regards
>> John McCabe <john@assen.demon.co.uk>


Well that was a particularly interesting comment Byron :-)

Best Regards
John McCabe <john@assen.demon.co.uk>






* Re: Ariane 5 - not an exception?
  1996-08-27  0:00                     ` John McCabe
@ 1996-08-28  0:00                       ` Byron B. Kauffman
  1996-08-28  0:00                         ` Robert Dewar
  1996-08-30  0:00                         ` John McCabe
  0 siblings, 2 replies; 111+ messages in thread
From: Byron B. Kauffman @ 1996-08-28  0:00 UTC (permalink / raw)



John McCabe wrote:
> 
> "Byron B. Kauffman" <KauffmanBB@lmtas.lmco.com> wrote:
> 
> >John McCabe wrote:
> >>
> >> Ken Garlington <garlingtonke@lmtas.lmco.com> wrote:
> >>
> >> >John McCabe wrote:
> >> >>
> >> >> And BTW I have read the report, a number of times and every time I
> >> >> read it, it tells me that the programmers were not at fault.
> >>
> >> >After some thought, I've come to the conclusion that ++robin is a troll.
> >> >I can't comprehend the alternative.
> >>
> >> I think I'd have to agree because he certainly seems to be "talking
> >> out his arse" to coin a phrase (if anyone doesn't understand this -
> >> e-mail me!).
> >>
> >> Best Regards
> >> John McCabe <john@assen.demon.co.uk>
> 
> Well that was a particularly interesting comment Byron :-)
> 
> Best Regards
> John McCabe <john@assen.demon.co.uk>

You know, I had a REALLY good one-liner to add, but managed to fat-finger
the keyboard or something and blew it away (I've already caught flak from
Dewar).  Now I can't remember what it was...

-- 
Byron Kauffman





* Re: Ariane 5 - not an exception?
  1996-08-28  0:00                       ` Byron B. Kauffman
@ 1996-08-28  0:00                         ` Robert Dewar
  1996-08-29  0:00                           ` Ted Dennison
  1996-08-30  0:00                         ` John McCabe
  1 sibling, 1 reply; 111+ messages in thread
From: Robert Dewar @ 1996-08-28  0:00 UTC (permalink / raw)



Byron said

"You know, I had a REALLY good one-liner to add, but managed to fat-finger
the keyboard or something and blew it away (I've already caught flak from
Dewar).  Now I can't remember what it was..."

The flak here was simply a note pointing out that Byron had sent an empty message.

The reason I sent it was that I hoped it was in time to rescue what you
typed.
It is quite useful to use an environment which saves a huge window behind
you of everything you have typed in. That way you can rescue material which
is otherwise lost to the ages :-) :-)






* Re: Ariane 5 - not an exception?
  1996-08-28  0:00                         ` Robert Dewar
@ 1996-08-29  0:00                           ` Ted Dennison
  0 siblings, 0 replies; 111+ messages in thread
From: Ted Dennison @ 1996-08-29  0:00 UTC (permalink / raw)



Robert Dewar wrote:
> 
> It is quite useful to use an environment which saves a huge window behind
> you of everything you have typed in. That way you can rescue material which
> is otherwise lost to the ages :-) :-)

Of course for some of us, that might not be such a bad thing. :-)

-- 
T.E.D.          
                |  Work - mailto:dennison@escmail.orl.mmc.com  |
                |  Home - mailto:dennison@iag.net              |
                |  URL  - http://www.iag.net/~dennison         |





* Re: Ariane 5 - not an exception?
  1996-08-28  0:00                       ` Byron B. Kauffman
  1996-08-28  0:00                         ` Robert Dewar
@ 1996-08-30  0:00                         ` John McCabe
  1 sibling, 0 replies; 111+ messages in thread
From: John McCabe @ 1996-08-30  0:00 UTC (permalink / raw)



"Byron B. Kauffman" <KauffmanBB@lmtas.lmco.com> wrote:

>You know, I had a REALLY good one-liner to add, but managed to fat-finger
>the keyboard or something and blew it away (I've already caught flak from
>Dewar).  Now I can't remember what it was...

Byron,

Thanks for your email message, unfortunately I couldn't "reply" to it
as I couldn't find a valid email address in the headers anywhere -
however, the answer is "for sure, go for it!"



Best Regards
John McCabe <john@assen.demon.co.uk>





^ permalink raw reply	[flat|nested] 111+ messages in thread

end of thread, other threads:[~1996-08-30  0:00 UTC | newest]

Thread overview: 111+ messages
1996-08-08  0:00 Ariane 5 - not an exception? Marin David Condic, 407.796.8997, M/S 731-93
1996-08-09  0:00 ` John McCabe
  -- strict thread matches above, loose matches on Subject: below --
1996-08-13  0:00 Marin David Condic, 407.796.8997, M/S 731-93
1996-08-15  0:00 ` John McCabe
1996-08-13  0:00 Marin David Condic, 407.796.8997, M/S 731-93
1996-08-15  0:00 ` John McCabe
1996-07-25  0:00 Simon Bluck
1996-07-26  0:00 ` JP Thornley
1996-07-29  0:00   ` Ken Garlington
1996-07-29  0:00   ` JP Thornley
1996-07-29  0:00   ` Nigel Tzeng
1996-07-30  0:00   ` Robert I. Eachus
1996-07-31  0:00     ` JP Thornley
1996-08-01  0:00       ` Alan Brain
1996-08-02  0:00         ` JP Thornley
1996-08-01  0:00   ` Ken Garlington
1996-07-26  0:00 ` Theodore E. Dennison
1996-07-29  0:00   ` Ken Garlington
1996-07-26  0:00 ` ++           robin
1996-07-29  0:00   ` Bill Angel
1996-07-29  0:00     ` Paul_Green
1996-07-30  0:00     ` Lloyd Fischer
1996-07-30  0:00     ` Ken Garlington
1996-07-30  0:00     ` Bob Kurtz
1996-07-30  0:00     ` Nancy Mead
1996-07-31  0:00       ` Steve O'Neill
1996-07-31  0:00       ` Tucker Taft
1996-08-01  0:00       ` root
1996-08-01  0:00         ` Tucker Taft
1996-07-30  0:00     ` Richard Shetron
1996-07-30  0:00       ` ++           robin
1996-08-04  0:00     ` Richard Riehle
1996-08-05  0:00       ` Fergus Henderson
1996-08-05  0:00       ` John McCabe
1996-08-05  0:00       ` Nigel Tzeng
1996-08-06  0:00         ` John McCabe
1996-08-13  0:00       ` ++           robin
1996-08-13  0:00         ` Ken Garlington
1996-08-13  0:00           ` Kirk Bradley
1996-08-14  0:00             ` Ken Garlington
1996-08-22  0:00           ` ++           robin
1996-08-22  0:00             ` Ken Garlington
1996-08-13  0:00         ` Darren C Davenport
1996-08-14  0:00         ` John McCabe
1996-08-19  0:00           ` Chris Papademetrious
1996-08-22  0:00           ` ++           robin
1996-08-22  0:00             ` Martin Tom Brown
1996-08-22  0:00             ` John McCabe
1996-08-23  0:00               ` Ken Garlington
1996-08-24  0:00                 ` John McCabe
1996-08-26  0:00                   ` Byron B. Kauffman
1996-08-27  0:00                     ` John McCabe
1996-08-28  0:00                       ` Byron B. Kauffman
1996-08-28  0:00                         ` Robert Dewar
1996-08-29  0:00                           ` Ted Dennison
1996-08-30  0:00                         ` John McCabe
1996-08-23  0:00             ` Bob Gilbert
1996-08-24  0:00               ` Robert I. Eachus
1996-08-25  0:00                 ` John McCabe
1996-08-27  0:00                 ` Tom Speer
1996-08-26  0:00               ` Jon S Anthony
1996-08-20  0:00         ` Richard Riehle
1996-07-30  0:00   ` Ken Garlington
1996-08-02  0:00     ` Craig P. Beyers
1996-07-30  0:00   ` Steve O'Neill
1996-07-31  0:00     ` Martin Tom Brown
1996-07-31  0:00       ` Nigel Tzeng
1996-08-02  0:00       ` Ken Garlington
1996-08-03  0:00         ` Thomas Kendelbacher
1996-08-01  0:00     ` ++           robin
1996-08-01  0:00       ` Ken Garlington
1996-08-05  0:00         ` John McCabe
1996-08-06  0:00           ` Ken Garlington
1996-08-06  0:00           ` Mark van Walraven
1996-08-06  0:00           ` Ken Garlington
1996-08-02  0:00       ` Pascal Martin @lone
1996-08-03  0:00         ` Dr. Richard Botting
1996-08-05  0:00           ` system
1996-08-06  0:00         ` ++           robin
1996-08-08  0:00           ` Darius Blasband
1996-08-10  0:00             ` dwnoon
1996-08-12  0:00               ` Thomas Kendelbacher
1996-08-13  0:00                 ` ++           robin
1996-08-13  0:00             ` Roy Gardiner
1996-08-13  0:00               ` Ken Garlington
1996-08-13  0:00               ` Lance Kibblewhite
1996-08-13  0:00             ` ++           robin
1996-08-15  0:00             ` Richard Riehle
1996-08-05  0:00       ` Steve O'Neill
1996-08-06  0:00         ` Francis Lipski
1996-08-07  0:00           ` Martin Tom Brown
1996-08-09  0:00             ` Ken Garlington
1996-08-06  0:00         ` Frank Manning
1996-08-08  0:00           ` Steve O'Neill
1996-08-09  0:00             ` Pat Rogers
1996-08-09  0:00           ` JP Thornley
1996-08-13  0:00         ` ++           robin
1996-08-13  0:00           ` Steve O'Neill
1996-08-01  0:00   ` Jon S Anthony
1996-08-02  0:00   ` James Kanze US/ESC 60/3/141 #40763
1996-08-06  0:00   ` Stefan 'Stetson' Skoglund
1996-08-06  0:00   ` Robert I. Eachus
1996-07-26  0:00 ` Bob Gilbert
1996-07-29  0:00   ` Martin Tom Brown
1996-07-30  0:00     ` John McCabe
1996-07-31  0:00       ` Greg Bond
1996-08-03  0:00         ` John McCabe
1996-07-27  0:00 ` Bill Angel
1996-07-30  0:00 ` Dr. Richard Botting
1996-07-30  0:00   ` David Weller
1996-07-30  0:00     ` Robert Dewar
