From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: fac41,a48e5b99425d742a
X-Google-Attributes: gidfac41,public
X-Google-Thread: f43e6,a48e5b99425d742a
X-Google-Attributes: gidf43e6,public
X-Google-Thread: 103376,a48e5b99425d742a
X-Google-Attributes: gid103376,public
X-Google-Thread: 1108a1,5da92b52f6784b63
X-Google-Attributes: gid1108a1,public
X-Google-Thread: ffc1e,a48e5b99425d742a
X-Google-Attributes: gidffc1e,public
From: Ken Garlington <GarlingtonKE@lmtas.lmco.com>
Subject: Re: Papers on the Ariane-5 crash and Design by Contract
Date: 1997/03/19
Message-ID: <33303627.4EDE@lmtas.lmco.com>
X-Deja-AN: 226772842
References: <332B5495.167EB0E7@eiffel.com> <tz8ohcjv7cc.fsf@aimnet.com>
 <332D113B.4A64@calfp.co.uk> <5gl1f5$a26$2@quasar.dimensional.com>
 <5gll90$2qu$1@news.irisa.fr>
Organization: Lockheed Martin Tactical Aircraft Systems
Newsgroups: 
 comp.lang.eiffel,comp.object,comp.software-eng,comp.programming.threads,comp.lang.ada
Date: 1997-03-19T00:00:00+00:00
List-Id: <comp.lang.ada>


Jean-Marc Jezequel wrote:
> 
> Let's finally sum up what I perceive as the most important claims in this paper:
> - reusing a component without checking its full specification is dangerous, which means that
> simple minded CORBA-like approaches at building components for mission-critical software are doomed.
> - using design by contract is an interesting way to specify the behavior of a component
> - at least in the case of Ariane 501, simple assertions (a la Eiffel and other languages)
> would have been expressive enough to specify the fatal hidden assumption.

Although I agree that Eiffel, Ada, and other languages are expressive
enough to specify the "fatal hidden assumption", it is not clear that
being able to specify the assumption would have led to detection and
correction of the problem.

1. Being ABLE to specify the assumption does not mean that the assumption
   WILL be specified.
   From the final inquiry: "The reason for the three remaining variables,
   including the one denoting horizontal bias, being unprotected was that
   further reasoning indicated that they were either physically limited or
   that there was a large margin of safety." It is not at all obvious to me
   that development teams routinely document limitations that they consider
   impossible to violate! Furthermore, it should be noted that the
   assumptions that they felt *could* be violated were in fact documented
   in the code: "In particular, the conversion of floating point values to
   integers was analysed and operations involving seven variables were at
   risk of leading to an Operand Error. This led to protection being added
   to four of the variables, evidence of which appears in the Ada code."
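
To make the protected/unprotected distinction concrete, here is a minimal
sketch. It is in Python rather than Ada, and the function names and sample
values are my own illustrative assumptions, not anything taken from the SRI
source. The 16-bit signed target range is the one the inquiry report gives
for the failing conversion.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unprotected(x: float) -> int:
    """Convert to a 16-bit integer; raise if the value does not fit.

    The raised OverflowError is only a stand-in for the Operand Error
    described in the inquiry report.
    """
    n = int(x)  # truncates toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x} does not fit in 16 bits")
    return n


def to_int16_protected(x: float) -> int:
    """Convert to a 16-bit integer, clamping out-of-range values."""
    return max(INT16_MIN, min(INT16_MAX, int(x)))
```

The point is that someone has to decide, variable by variable, which of the
two forms to write. A variable believed to be "physically limited" gets the
first form, and nothing in the language forces the belief to be recorded.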

2. SPECIFYING the assumption does not mean that a violation of the
   assumption will be DETECTED.

   a. If the detection is based on manual inspection of the code, human
      error due to incomplete or faulty knowledge of the operating
      environment can always occur. "There is no evidence that any
      trajectory data were used to analyse the behaviour of the unprotected
      variables, and it is even more important to note that it was jointly
      agreed not to include the Ariane 5 trajectory data in the SRI
      requirements and specification." How would the development team know
      there was a problem if they did not have sufficient information?

      Furthermore, even if the information is available, performing the
      appropriate analysis to determine the effect at the low-level
      software interfaces can be difficult: "software is flexible and
      expressive and thus encourages highly demanding requirements, which
      in turn lead to complex implementations which are difficult to
      assess."

      Finally, the human must know the right question to ask: "When taking
      this design decision, it was not analysed or fully understood which
      values this particular variable might assume when the alignment
      software was allowed to operate after lift-off."

   b. If the detection is based on embedded run-time checks, it depends on
      the system being operated (in a real or simulated environment) in
      such a manner that the check will be triggered. "...no test was
      performed to verify that the SRI would behave correctly when being
      subjected to the count-down and flight time sequence and the
      trajectory of Ariane 5." It should be noted that this test was
      entirely possible: "it is possible to do ground testing by injecting
      simulated accelerometric signals in accordance with predicted flight
      parameters, while also using a turntable to simulate launcher angular
      movements."
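
A small sketch of why an embedded check is only as good as the data fed
through it. Again this is Python, not the flight code, and the names and
both sample profiles are hypothetical numbers I made up for illustration.
They are not trajectory data from either launcher.

```python
class RangeViolation(Exception):
    """Raised when the embedded run-time check fires."""


def check_horizontal_bias(bh: float) -> None:
    # Embedded check: the value must fit the 16-bit integer it feeds.
    if not -32768 <= bh <= 32767:
        raise RangeViolation(f"horizontal bias out of range: {bh}")


def run_profile(samples) -> bool:
    """Return True if every sample passes, False if the check fires."""
    try:
        for bh in samples:
            check_horizontal_bias(bh)
    except RangeViolation:
        return False
    return True


# Hypothetical test inputs, for illustration only.
old_profile = [120.0, 950.0, 4100.0]    # stays inside the assumed range
new_profile = [120.0, 9500.0, 41000.0]  # leaves the assumed range

print(run_profile(old_profile))  # check never fires
print(run_profile(new_profile))  # check fires on the last sample
```

If only something like old_profile is ever run, the check sits in the code
and reports nothing, however carefully it was specified.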

3. Even if the problem was DETECTED, there is no guarantee that the
   appropriate ACTION would have been taken:

   a. With respect to manual detection: On the X-31 program, an engineer
      in the control room knew that the air data sensor heating was not
      operational. He assumed that others knew, and that the appropriate
      action was taken. It wasn't, and as I recall, an aircraft was lost.
      In any large-scale development effort, it is very easy to lose
      critical information of this type.

   b. With respect to operational run-time checks, the right decisions
      about how to react to the check must be made: "Although the source of
      the Operand Error has been identified, this in itself did not cause
      the mission to fail. The specification of the exception-handling
      mechanism also contributed to the failure.... It was the decision to
      cease the processor operation which finally proved fatal."
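
The same detected error can lead to very different outcomes depending on
the reaction chosen. A rough Python sketch follows. The policy names are my
own, and the saturating policy is just one conceivable alternative to
halting, not a claim about what the SRI should have done.

```python
class ProcessorHalted(Exception):
    """Stand-in for shutting the unit down: no further output."""


def convert(x: float) -> int:
    """Float to 16-bit integer; raises OverflowError if out of range."""
    n = int(x)
    if not -32768 <= n <= 32767:
        raise OverflowError(x)
    return n


def halt_policy(x: float) -> int:
    # React to the error by stopping the processor.
    raise ProcessorHalted(f"operand error on {x}")


def saturate_policy(x: float) -> int:
    # React by supplying the nearest representable value and carrying on.
    return 32767 if x > 0 else -32768


def guarded_convert(x: float, policy) -> int:
    """Convert x, delegating to the chosen policy if the check fails."""
    try:
        return convert(x)
    except OverflowError:
        return policy(x)
```

Both policies "handle" the exception. Which one is appropriate is a system
design decision that no assertion mechanism makes for you.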

Based on the published inquiry, I am extremely skeptical that a software
design technique by itself would have avoided the Ariane V disaster. If you
have inside information from the Ariane V team that indicates otherwise,
please share it.

The final inquiry report is at:
http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html

> Whether the last point scales up to a full sized mission critical system is still an open question.

Very much so.

> I'm quite confident it is so, but I've only my own experience with telco systems to back it up.
> 
> --
> Jean-Marc Jezequel               Tel : +33 2 99847192
> IRISA/CNRS                       Fax : +33 2 99847171
> Campus de Beaulieu               e-mail : jezequel@irisa.fr
> F-35042 RENNES (FRANCE)          http://www.irisa.fr/pampa/PROF/jmj.html

--
LMTAS - The Fighter Enterprise - "Our Brand Means Quality"
For job listings, other info: http://www.lmtas.com or
http://www.lmco.com