* *LONG* DRAFT - Critique of "The lessons of Ariane"
@ 1997-07-20 0:00 Ken Garlington
From: Ken Garlington @ 1997-07-20 0:00 UTC (permalink / raw)
I'm releasing a draft of this paper for people to review prior to
sending it on to the Eiffel web site. Please send your comments to
kennieg@flash.net. I still need to review the whole Ariane thread from
the last go-round (as well as the latest one) in order to get the
footnotes completed, as well as to see if I missed anything
(particularly in section 6). I hope to send the paper to Jézéquel and
co. before the end of July. Any help would be much appreciated. Thanks.
(Note: the paper appears to run about 13 pages on my printer.)
-----
Critique of “Put it in the contract: The lessons of Ariane”
Ken Garlington (kennieg@flash.net)
July 20, 1997 FIRST DRAFT
1. Introduction
On June 4, 1996, the first flight of the Ariane 5 rocket ended in failure. Approximately 40 seconds after
initiation of the flight sequence, Ariane 5 veered off its flight path, broke up and exploded. The
European Space Agency (ESA) and the French National Center for Space Studies (CNES), in cooperation
with other agencies, established an inquiry board to determine the cause of the accident, to establish whether the
system had been adequately tested before use, and to recommend corrective actions for the Ariane 5
system. This inquiry determined that the actions of the software inside the on-board Inertial Reference
System (IRS) caused the catastrophe [1]. (For the sake of brevity, the results of the inquiry board are
described as “the inquiry” elsewhere in this critique.)
The following year, Jean-Marc Jézéquel of IRISA and Bertrand Meyer of ISE co-wrote a paper [2] titled
“Put it in the contract: The lessons of Ariane.” This paper attempts to make the case for the use of the
Design by Contract (DBC) methodology and Eiffel language as a corrective action for the Ariane 5 IRS
fault. (This paper will subsequently be described as “the Eiffel paper.”)
This critique analyzes the Eiffel paper, principally to challenge two fundamental points:
a. That DBC/Eiffel would have been _sufficient_ to prevent the destruction of the Ariane 5, and
b. That DBC/Eiffel would have been _necessary_ to prevent its destruction.
The conclusion is that neither condition is adequately established in the Eiffel paper.
In addition, some other concerns with the paper are expressed, and finally some “non-issues” that have
been raised in the past in response to these arguments are addressed.
2. The central claim
What, exactly, does the Eiffel paper claim? The central issue is stated in the paper as follows:
“Does this mean that the [Ariane 5] crash would automatically have been avoided had the mission used
a language and method supporting built-in assertions and Design by Contract? Although it is always risky
to draw such after-the-fact conclusions, the answer is probably yes...”
Note three important factors with respect to this statement:
a. It does not say that DBC/Eiffel might have helped. It makes a more aggressive claim: that these
techniques would probably have saved Ariane 5. As a result, it seems fair to hold the Eiffel paper to a
reasonably strong standard of proof regarding this claim. Conversely, this critique does not need to show
that it is impossible for DBC/Eiffel to have saved Ariane 5; only that there are significant doubts about the
outcome.
b. Although some have suggested that the Eiffel paper’s recommendations can be implemented in other
languages [3], it is clear that this paper is based on a specific “language and method”. The Eiffel-centric
aspect of this claim is reinforced elsewhere in the Eiffel paper [4] and in other statements by the authors
[5]. Thus, it appears reasonable to discuss the proposed solution in terms of Eiffel assertions, rather than
the broader topic of “good methodology.”
c. The inquiry established several causes for the IRS software fault. However, nothing in the Eiffel paper
suggests that any action beyond the use of DBC/Eiffel was required to avoid the fault. Thus, it seems
logical to assume that all other aspects of the Ariane 5 development effort (in particular, decisions
regarding the specification and the associated testing approach) would have been the same.
3. Would DBC/Eiffel have been sufficient to avoid the failure?
The Eiffel paper states three ways in which DBC/Eiffel would have significantly altered the results of the
Ariane 5 development effort:
a. The use of Eiffel assertions as part of a manual review of the code in order to detect errors (the
“documentation effect”),
b. The use of such assertions during the testing stage (to be turned off after testing, but before
operational use) to detect errors (the “testing effect”), and
c. The use of such assertions during execution to detect (and presumably compensate for) errors (the
“execution effect”).
Each of these claims is analyzed in detail in the following subsections. This critique asserts that, for each
claim, there is insufficient evidence to conclude that the Ariane 5 incident would have been avoided.
Furthermore, for each type of use, there are associated risks that have the potential for actually making
such an error more probable.
3.1 The documentation effect
The Eiffel paper states:
“...most importantly the assertions are a prime component of the software and its documentation (‘short
form’, produced automatically by tools). In an environment such as that of Ariane where there is so much
emphasis on quality control and thorough validation of everything, they would be the QA team's primary
focus of attention. Any team worth its salt would have checked systematically that every call satisfies the
precondition. That would have immediately revealed that the Ariane 5 calling software did not meet the
expectation of the Ariane 4 routines that it called.”
There is a chain of events that would have to occur in order for this statement to be true:
a. The critical assertion would have to be documented (correctly) in the code.
b. The review team would need the proper expertise to detect the problem.
c. The review team would have to review the code, as opposed to other documentation.
d. The review team would have to select the critical assertion from the total set of assertions, and
e. The review team would need the proper source information regarding Ariane 5 to realize that a
problem existed.
If any of these events failed to materialize, the documentation effect of the assertion would be nullified.
For all of these events, the case can be made that factors specific to the Ariane development effort could
have caused such a failure in the process.
3.1.1 Would the critical assertion have been documented (correctly) in the original code?
First, would the critical assertion have been documented in the code at all? Although the Eiffel paper
(with hindsight) describes this assertion as a “fundamental constraint”, the inquiry notes that the Ariane
development team considered this particular condition improbable [6]. Thus, there is a reasonable
expectation that the development team might not have included such a constraint in their system.
Some Eiffel advocates have claimed that the use of DBC/Eiffel causes even “improbable” assertions to be
included in the code [7]. However, this claim should be viewed with some skepticism. Consider, for
example, attempting to show that an object did not modify any data it was not supposed to modify. In
theory, this would require each object to specify assertions for all “unused” data within the system,
showing that the data had not been affected during the execution of the object. Such a strategy becomes
especially difficult when concurrency is introduced (common to real-time systems such as the Ariane
IRS), where these “unused” data elements may well be modified during the course of the object’s
execution -- by another thread which interrupted it. It is more likely that Eiffel assertions are written in
terms of conditions that the developer reasonably expects might occur. Note also that documenting all
“improbable” events, even if possible, will cause other problems (described later in 3.1.4).
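The difficulty described above can be sketched concretely. The following is a minimal illustration (in Python, with hypothetical names -- not the actual IRS design): to demonstrate non-interference with assertions alone, a routine's postcondition must snapshot and re-check every piece of state it is *not* supposed to touch, and a concurrent writer can invalidate even that.

```python
class IRS:
    """Hypothetical inertial reference state, for illustration only."""
    def __init__(self) -> None:
        self.horizontal_bias = 0.0
        self.vertical_velocity = 0.0   # "unused" by update_bias
        self.alignment_mode = True     # "unused" by update_bias

    def update_bias(self, new_bias: float) -> None:
        # To show non-interference, the postcondition must snapshot and
        # re-check every field the routine is NOT supposed to modify:
        old_vv, old_mode = self.vertical_velocity, self.alignment_mode
        self.horizontal_bias = new_bias
        assert self.vertical_velocity == old_vv
        assert self.alignment_mode == old_mode
        # In a concurrent system, another thread may legitimately change
        # vertical_velocity between the snapshot and the check, making
        # this assertion fail spuriously.
```

The sketch covers one routine and two "unused" fields; a real system has hundreds of each, which is why such assertions tend not to be written in practice.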
In addition to documenting the assertion, the programmer must document it correctly. In other words, the
same individual who likely has introduced a software design fault within the system may be called upon to
write the protection for that fault. In this context, an incorrect assertion seems to be a possibility that
cannot be discounted. (Presumably, the process could be changed to have the assertions written by
someone independent of the developer, but this is not discussed in the Eiffel paper. Even in this case,
common-mode errors are quite possible [8].)
Interestingly, the specific assertion proposed in the Eiffel paper as the solution to the Ariane IRS design
fault is neither correct nor complete. Here is the exact proposal:
“...any software element that has such a fundamental constraint should state it explicitly, as part of a
mechanism present in the programming language, as in the Eiffel construct

   convert (horizontal_bias: INTEGER): INTEGER is
      require
         horizontal_bias <= Maximum_bias
      do
         ...
      ensure
         ...
      end

where the precondition states clearly and precisely what the input must satisfy to be acceptable.”
First, note that the routine converts the horizontal bias (BH) variable from an integer value to another
integer value. However, the IRS error occurred when the computed BH was converted from a floating-
point value to an integer [9]. Thus, the fault would have occurred before this precondition would have
been evaluated (whether by a reviewer or during execution of the code). As a result, this precondition as
worded would not have detected the design fault.
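The ordering problem can be made concrete with a small sketch (in Python for brevity; the names are illustrative, and the overflow of the Ada conversion is simulated by an explicit check, since Python integers do not overflow). A precondition typed on the already-converted integer can only run after the dangerous conversion has succeeded:

```python
INT16_MAX = 32767    # largest value of a signed 16-bit integer
INT16_MIN = -32768

def to_int16(x: float) -> int:
    # Stand-in for the unprotected Ada float-to-integer conversion: the
    # operand error fires here, inside the conversion itself.
    if not (INT16_MIN <= x <= INT16_MAX):
        raise OverflowError("horizontal bias does not fit in 16 bits")
    return int(x)

def convert(horizontal_bias: int) -> int:
    # The paper's precondition, typed on an INTEGER argument -- it can
    # only be evaluated AFTER the conversion above has already succeeded.
    assert horizontal_bias <= INT16_MAX
    return horizontal_bias   # body elided, as in the paper

# An Ariane-5-sized bias fails inside to_int16; convert's precondition
# is never reached:
#     convert(to_int16(65535.0))   # raises OverflowError in to_int16
```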
Assuming that horizontal_bias is changed to a floating-point value, the next question has to do with the
specification of Maximum_bias. A likely candidate for this value is 32,767.0, since this is the maximum
positive value that can be stored in a 16-bit integer. However, if the “convert” routine scales the value
(i.e., uses the integer value 32,767 to represent the largest expected value for horizontal_bias), then the
actual Maximum_bias would need to be computed in terms of 32,767.0 multiplied by an appropriate
scaling value. Additional confusion could occur if Maximum_bias is derived from the system
specification, and is not written in terms of the actual scaling being performed. If the scaling is in error
(does not cover all possible values), then Maximum_bias may be too broad and thus miss the potential
conversion error.
Assuming that both horizontal_bias and Maximum_bias are specified correctly, there is still the question
of completeness. As noted, Maximum_bias would be written in terms of the largest desired positive value.
What about negative values? If the conversion of horizontal_bias yields a 16-bit integer smaller than
-32,768 (on some architectures, -32,767), Ariane 5 would have experienced the same failure as if
horizontal_bias was too big. However, the precondition proposed would permit any negative value to be
treated as valid. Therefore, it is not complete with respect to conversion failures.
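For comparison, a precondition that actually covered the conversion failures discussed above would need to be applied to the floating-point value before conversion, would need both bounds, and would need to account for any scaling. A minimal sketch (Python; the names and the scaling factor are hypothetical):

```python
INT16_MAX = 32767
INT16_MIN = -32768
SCALE = 1.0   # hypothetical scaling factor; if the routine scales, both
              # bounds must be multiplied by the same factor

def convert(horizontal_bias: float) -> int:
    # Precondition checked on the FLOAT, before conversion, covering
    # both the positive and the negative bound:
    assert INT16_MIN * SCALE <= horizontal_bias <= INT16_MAX * SCALE, \
        "horizontal bias outside the representable 16-bit range"
    return int(round(horizontal_bias / SCALE))
```

Even this version says nothing about whether a negative bias is meaningful as an error term; that question belongs to the system specification, not the conversion routine.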
Note that, given the use of horizontal_bias as an error term [10], it should always be greater than or equal
to zero. If the precondition is just written in terms of avoiding values less than -32,768, it would still
permit invalid horizontal biases to be converted. However, it is unclear whether the assertion in the
conversion routine should be limited to protecting against invalid conversions, or should also consider the
meaning of horizontal_bias within the system.
In summary, the Eiffel paper does not provide compelling evidence to support its claim that this specific
assertion would have been included by the Ariane team. It is easy to identify the need for such an assertion
after the fact, but circumstances specific to the Ariane development effort imply that the inclusion of the
proper assertion was far from certain. Furthermore, as can be seen from the paper’s example, there is also
the possibility that the assertion would have been improperly specified (although there is no way to
quantify this possibility).
3.1.2 Would the review team have had the expertise to detect the assertion violation?
Even if the assertion was properly defined, it would not have provided any benefit if the right personnel
did not review the assertion for correctness. The Eiffel paper speaks of a “QA team” performing this
function. However, it is the author’s experience that QA teams, while uniformly diligent in their efforts,
do not always have the experience and knowledge base to resolve difficult technical questions. This is
due, in part, to the requirement in some methodologies that the QA team be composed of individuals who
are not also responsible for development of the system [11]. This independence is useful, but it can cause
the QA team to be more effective at catching process failures than technical faults.
More critically, it is unclear that any team within the subcontractor responsible for the IRS development
had the necessary expertise to identify the fault that lay within their system. This fault was not an
internal contradiction within the IRS design. It was a fault in the operation of the IRS within the larger
Ariane 5 system [12]. The total system was the responsibility of a different company [13]. As a result, the
IRS vendor might not have had the total expertise base available to properly analyze such an assertion.
The inquiry suggests that the relationship between the companies was not as efficient as it could have
been [14], and also suggests that experts from outside the project should have been brought in to provide
adequate analysis capabilities [15].
Overall, despite the Dilbert-esque assertion that all that was needed was a review team “worth its salt,” it
is far from clear that circumstances specific to the Ariane 5 program were conducive to having the right
people on the right team available to review the IRS software design for the Ariane 5. Thus, the evidence
presented in the paper is insufficient to show that a proper team would have been available.
3.1.3 Would the review team have read the code?
The Eiffel paper asserts: “The requirement that the horizontal bias should fit on 16 bits was in fact stated
in an obscure part of a document. But in the code itself it was nowhere to be found!” This statement
implies that the inquiry found that documenting such assertions in the code was preferable to other
documents. Although it is true that the decision was not specified directly in the code, the inquiry’s
statement is ambiguous as to whether it was actually clearly stated in the documentation, either [16].
Furthermore, the inquiry’s statement appears to address documentation of the justification for not providing
a handler for an overflow, rather than a “requirement that the horizontal bias should fit on 16 bits.”
Nonetheless, the inquiry does make it clear in its recommendations that documentation be included in
such reviews, and no mention is made that the source code is the preferred location for such data [17]. Is
there any reason to believe that the code is not the preferred location for this information? It is the
author’s experience that non-programmers, particularly senior systems architects of the type described in
section 3.1.2 above, often prefer to read English documentation over source code. (For some reason, test
engineers do not always share this bias.) Of course, this preference can be mitigated somewhat by the
ability of the source language to be easily read by non-programmers, but nothing in the Eiffel paper
addresses this issue.
Overall, the Eiffel paper assumes that source code is the best place to document such assertions, but does
not provide justification for this in the context of the Ariane 5 review team described in the previous
section. On the other hand, there is at least anecdotal evidence that the code may not be the preferred
place to document such assertions. While there may be justifications for including assertions in source
code in general (better for programmers to review, less chance of documentation “drift” from the source),
these advantages may not have been as useful for the specific issue under discussion.
3.1.4 Would the review team have fully reviewed the critical assertion?
Assuming that the right team is looking at the code, and that the code contains the right assertion, it is
still possible that the team would not have adequately reviewed the critical assertion. Putting aside the
team’s likely bias toward looking for the reaction of the software to hardware failures vs. software design
faults [18], there is also the issue of the inherent difficulty of performing manual reviews of complex
systems, as the inquiry acknowledges [19]. There are at least two types of complexity involved here:
a. The number of assertions to be reviewed. When there are “thousands of problems and potential
failures” [19] to be considered, it is very easy to miss a critical issue. Ironically, the addition of assertions
to cover “improbable” cases, in order to answer the criticism described in 3.1.1, makes this problem much
worse.
b. The complexity of analyzing individual assertions. The specific assertion required in the Ariane case
is a good example of such complexity. The goal of the reviewer is to ensure that horizontal bias will not
exceed the maximum bias. However, the actual range of horizontal bias for the Ariane 5 environment is
not necessarily directly specified in the requirements. It is a calculated value, whose actual range would
have to be determined in part by the flight profile [12] and possibly other factors. Although the inquiry
does not state the complexity of this calculation, even a fairly simple calculation may be error-prone when
done manually. Furthermore, the necessary calculation is not stated in the assertion as it is presented in
the paper; the reviewer would have to trace back through all calculations involving horizontal bias (and
ancestor values) to ensure that the appropriate actual range was created.
Given that non-trivial work may be required for each assertion, and that there are many assertions to be
reviewed, the chance for a reviewer error or oversight cannot be discounted. No empirical evidence is
provided in the Eiffel paper as to the accuracy of reviews for a system of this size and complexity.
Therefore, the case is not made that a manual review would have detected the fault.
3.1.5 Would the review team have had the necessary information to detect the error?
Finally, as noted in section 3.1.4 above, the actual range of horizontal bias is to be calculated based in part
on the Ariane 5 flight profile. However, this profile was deliberately left out of the IRS specification [20].
Therefore, the necessary data to evaluate this assertion was not available to the IRS developer. More to the
point, the deliberate omission of this data reflects the (incorrect) assumption that the profiles were not
significantly different. Would a reviewer, having to analyze a large set of assertions, have spent much
time questioning a mutually-accepted decision of this type? There is nothing in the Eiffel paper which
suggests that (s)he would in fact press the issue. As a result, it is reasonable to assume that this assertion
would not have been a focus area of the review, although again in hindsight it was obviously a
“fundamental constraint.” Therefore, the incorrect assumption would likely not have been challenged, and
the Ariane disaster would still have occurred.
This effect may be mitigated somewhat if the reviewer is from the prime contractor and has direct access
to the Ariane 5 flight profile, as described in 3.1.2. However, keep in mind that the prime contractor
participated in the original decision to leave out the flight profile, and thus may already be biased against this
as a source of error. In fact, it is possible that the same senior engineer or manager that left out the
relevant data is the one performing the review of the IRS code!
3.1.6 Adverse effects of documenting assertions
There is a line of reasoning that says, “Even if DBC/assertions would have been of minimal use, why not
include them anyway just in case they do some good?” Such statements assume there are no adverse
impacts to including assertions in code for documentation purposes. However, there are always adverse
effects to including additional code:
a. As with any other project, there are management pressures to meet cost and schedule drivers.
Additional effort, therefore, is discouraged unless justified.
b. More importantly, all projects have finite time and other resources available. Time spent on writing
and analyzing assertions is time not spent elsewhere. If the work that is not performed as a result would
have been more valuable (in terms of its effect on the safety and integrity of the system) than the time
spent with assertions, then the resulting system may be less safe.
There is a growing consensus in the safety-critical software field that simpler software tends to be safer
software [21]. With this philosophy, additional code such as assertions should only be added where there
is a clear benefit to the overall safety of the particular system being developed.
3.2 The testing effect
The Eiffel paper states:
“Assertions (preconditions and post-conditions in particular) can be automatically turned on during
testing, through a simple compiler option. The error might have been caught then.”
However, it is unlikely that, in the context of the Ariane 5 program, the use of assertions during testing
would have provided much benefit. Even the Eiffel paper itself tends to be less enthusiastic about the
value of testing [22] vs. the manual reviews described earlier. (Note that this is not to say that, with the
appropriate changes in the Ariane process, a reasonable test could not have been conducted; see section
4.2.)
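For readers unfamiliar with the mechanism the paper describes, the on/off behavior has analogues in most languages. As one illustrative (and admittedly anachronistic) example, Python's assert statements are active in a normal run but stripped when the interpreter is invoked with -O, much as Eiffel assertions or Ada checks can be compiled out of the operational build:

```python
def convert(horizontal_bias: float) -> int:
    # Active during testing; compiled away under "python -O".
    assert -32768.0 <= horizontal_bias <= 32767.0
    return int(horizontal_bias)

# __debug__ is True in a normal run and False under -O, so the same
# source text yields two different object programs -- the concern
# examined in section 3.2.2.
```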
3.2.1 Ariane 5 testing limitations
There were two key problems in the Ariane 5 test program:
a. As noted earlier, the Ariane 5 flight profile was not included in the IRS system specification.
Therefore, the IRS contractor did not have the proper data with which to perform a test. Testing with the
Ariane 4 flight data would not have violated the assertion [23], and thus the error would not have been
detected.
b. Integrated system testing at the prime contractor, using the proper test environment and data, would
have been able to detect the error. However, the prime contractor decided against performing these tests,
preferring instead to assume that their experience with this IRS on Ariane 4 was sufficient, along with the
vendor’s qualification tests, to qualify the IRS in the total system.
Thus, without other changes in the Ariane test philosophy (which are not called for in the Eiffel paper),
the assertions would not have been tested in the proper conditions and thus would not have detected the
fault.
3.2.2 Adverse effects of testing with assertions
Assume for a moment that the proper testing environment and data had been available. Putting aside for
the moment the question as to whether assertions would have been necessary to detect the fault (see
section 4.2), are there any disadvantages to using assertions during testing, then disabling them for the
operational system? In the author’s experience, there are some concerns about using this approach for
safety-critical systems:
a. The addition of code at the object level obviously affects the time it takes for an object to complete
execution. Particularly for real-time systems (such as the Ariane IRS), differences in timing between the
system being tested and the operational system may cause timing faults, such as race conditions or
deadlock, to go undetected. Such timing faults are serious failures in real-time systems, and a test which is
hindered from detecting them loses some of its effectiveness.
b. In addition, the differences in object code between the tested and operational systems raise the issue of
errors in the object code for the operational system. Such errors are most likely to occur due to an error in
the compilation environment, although it is possible that other factors, such as human error (e.g.
specifying the wrong version of a file when the code is recompiled) can be involved. For example, the
author has documented cases where Ada compilers generate the correct code when exceptions are not
suppressed, but generate incorrect code (beyond the language’s definition of “erroneous”) when they are
suppressed. This is not entirely unexpected; given the number of user-selectable options present with most
compilation environments, it is difficult if not impossible to perform adequate testing over all
combinations of options. Nothing in the Eiffel paper indicates that Eiffel compilers are any better (or
worse) than other compilers in this area. Although this is a fault of the implementation, not the language
or methodology, it is nonetheless a practical limitation for safety-critical systems, where one object code
error can have devastating results.
One possible approach to resolving this issue is to completely test the system twice; once with assertions
on and another time with assertions suppressed. However, the adverse factors described in section 3.1.6
then come into play: By adding to the test time in this manner, other useful test techniques (which might
have greater value) are not performed. Generally, it is difficult to completely test such systems once, never
mind twice! This effect is worse for safety-critical systems that perform object-code branch coverage
testing, since this testing is completely invalidated when the object code changes [25].
Overall, there are significant limitations to using this technique for safety-critical systems, and in
particular real-time systems such as the Ariane 5 IRS.
3.3 The execution effect
The final option is to not disable the assertions after testing, but to leave them on during execution. The
Eiffel paper describes this option as follows:
“Assertions can remain turned on during execution, triggering an exception if violated. Given the
performance constraints on such a mission, however, this would probably not have been the case.”
There are three basic objections to this approach in safety-critical systems, and in particular real-time
systems:
a. As Meyer notes, throughput is an issue. This was particularly true of the Ariane IRS [26].
b. As described in section 3.1.6, an argument can be made that adding code adds the risk of introducing
the very design errors one is attempting to protect against.
c. Most importantly, nothing in the Eiffel paper discusses the very difficult issue of the proper way to
react if the assertion is violated. Although the Eiffel paper implies that no handlers were within the system
[27], the inquiry makes it clear that an exception handler was properly invoked per the system
specification, but that the system specification was based on an incorrect view of the nature of the fault
and so the handler took inappropriate actions [28]. It is the author’s experience that, particularly for real-
time systems that can neither turn to a human operator nor shut down in the face of faults, writing the
proper actions in such an exception handler can be very difficult. The inquiry makes recommendations
about how to handle such a fault [29], although their solution could have obscured the random faults with
which the original IRS design was overly concerned. It should also be noted that turning to a human
operator is not always a safe option, as in the recent case of the corrupted Internet address database [30].
Thus, the Eiffel paper is correct in dismissing this approach as a valid option for the Ariane 5 case.
4. Was DBC/Eiffel necessary to avoid the failure?
Assuming that the reader concurs that a compelling case for DBC/Eiffel is not made in the context of the
Ariane 5 incident, should the conclusion be drawn that nothing could be done? Not at all! For example,
the inquiry makes it clear that, even with no reviews and with the code exactly as implemented, it was
quite reasonable to perform an integrated system test that would have clearly exposed the problem [24]. It
is the author’s experience, as confirmed by the inquiry, that such integrated system testing should be a
mandatory part of any safety-critical qualification. Although the Eiffel paper is quite correct that testing is
not a panacea [22], in this particular case there is high confidence that the problem would have been
caught:
a. The fault occurred in the normal processing mode, within the normal flight profile, without any
previous unusual events (e.g. other failures) preceding it. Thus, it is very likely that this scenario would
have been tested.
b. The response of the system to the test case (complete shutdown) is sufficiently obvious to be observed
by test personnel.
In addition to the testing effect, the documentation effect could have been addressed without the
introduction of DBC/Eiffel. As described earlier, the inquiry indicates the use of “justification documents”
to describe the necessary review information [17]. Also, the Ada-Based Design Approach for Real-Time
Systems is an example of a competing methodology that also incorporates the specification of assertions
(and more generally assumptions), although comments are used rather than executable code [31]. As to
language, the Eiffel paper itself notes that the IRS implementation language, Ada, would have been
adequate for handling the Ariane case [32].
Thus, it can be concluded that, for this particular case alone, DBC/Eiffel was not necessary. This does not
mean that DBC/Eiffel is unnecessary for detection of any software fault, of course, just the fault described
in the Eiffel paper.
5. Other issues
There are other aspects of the Eiffel paper that, although not related directly to the core argument, detract
from its effectiveness:
a. Lack of peer review. The Eiffel paper would have been much stronger if it could have cited evidence
that the inquiry board, the Ariane developers, or others in the general safety-critical community had read
the paper and agreed with its findings. It should be noted that, since the publication of the paper,
comments the author has read from persons identified as familiar with the IRS domain have not been
positive [33].
b. Lack of empirical data. There does not appear to be any evidence that DBC/Eiffel has been
successfully used in a system like the Ariane IRS, much less that it produced significantly improved safety
or reliability. By itself, this is not particularly damaging: All new ideas have to have a first user. However,
the approach used in the development of Ada, where the language team funded the development of
prototypes in relevant domains to test its effectiveness, should be considered by the Eiffel team. More to
the point, no relevant evidence from any other domain is presented in the Eiffel paper. There should be
some results from controlled studies showing the advantage of DBC/Eiffel over the methodology and tools
used for the Ariane project, even if not related to the Ariane domain. In particular, there must be some
data with respect to the reliability of the Eiffel compilation system, at a minimum. No data of this type is
used to support the Eiffel paper's assertions.
c. Appeals to authority. Quoting DeMarco (particularly when his assessment that "almost no other
defense would have been likely to catch" the problem is contradicted by the inquiry) gives the Eiffel paper
the feel of a "sales" piece rather than a scholarly work. The quoting of one of the Eiffel paper's authors
(Meyer) in support of his own paper seems particularly odd. Again, quotes from a relevant source would
have been more powerful, in the author's opinion.
6. Non-issues
Elements of this criticism have been aired in various Internet forums, and at times objections have been
raised that cannot be dealt with in the context of the previous sections. These objections are addressed
here.
a. "Casuistry." It would not be appropriate to use the criticisms here to say in the general case that
assertions have no value, anywhere ("casuistry"), but this criticism does not attempt to do this. It focuses
on the specific claim in the Eiffel paper in the context of the Ariane IRS fault.
b. Utility of DBC/Eiffel in other domains. Several Eiffel advocates will attest that they like the use of
Eiffel for their domain. Eiffel may be very useful in some domains; however, Ariane is a real-time
embedded safety-critical system with unique properties (as described above). Again, this is a specific,
not a general, criticism of DBC/Eiffel.
c. Ada vs. Eiffel. Some Eiffel advocates have attempted to cast these arguments as a "language war."
They note that many of these criticisms apply equally well to Ada as to Eiffel. They are correct. The author
has, in fact, publicly criticized the use of Ada exceptions on the same grounds as Eiffel assertions, within
the same context (real-time safety-critical systems). However, the statement "Ada has these problems, too"
does not in any way contradict the arguments against the use of Eiffel assertions in the Ariane context. If
nothing else, such statements indicate that the criticisms are valid for Eiffel.
7. Conclusions
In the specific case of the Ariane IRS design fault, there is not clear and compelling evidence that
DBC/Eiffel assertions were likely to have uncovered the fault prior to operational use, either through their
documentation, test, or execution value. Furthermore, alternative means were available to the Ariane team
to isolate the particular fault, even without the use of DBC/Eiffel. Therefore, although there may be a
compelling claim to use DBC/Eiffel in real-time safety-critical systems, the Ariane case (and the Eiffel
paper describing this case) does not support such a claim.
ABOUT THE AUTHOR
Ken Garlington has been developing software for safety-critical real-time flight control systems since
1984, including the F-16, F-16 Ada flight controls ground demonstrator, F-111, A-12, YF-22A, F-22, and
Joint Strike Fighter systems. More than one of these projects has included the integration of an inertial
reference system with the flight controls system, as was done for Ariane.
The opinions in this paper are those of the author, and should not be construed as an official position of
his employer, Lockheed Martin.
FOOTNOTES
[1] The public version of the Ariane 5 inquiry report is located on the World Wide Web at
http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html . Subsequent footnotes will reference
this report as "(Inquiry)".
[2] The version of this paper being critiqued is located on the World Wide Web at
http://www.eiffel.com/doc/manuals/technology/contract/ariane/index.html . Subsequent footnotes will
reference this report as "(Eiffel)".
[3] <Get DejaNews on claims of using DBC without Eiffel.>
[4] (Eiffel) See the two paragraphs that begin "From CORBA to C++...", particularly the reference to
"Eiffel-like assertions."
[5] <Get DejaNews, particularly Meyer statement that "only Eiffel and Sather" can be used effectively
with DBC.>
[6] (Inquiry) "The reason for the three remaining variables, including the one denoting horizontal bias,
being unprotected was that further reasoning indicated that they were either physically limited or that
there was a large margin of safety, a reasoning which in the case of the variable BH turned out to be
faulty. It is important to note that the decision to protect certain variables but not others was taken jointly
by project partners at several contractual levels."
[7] <Get DejaNews claims on documenting "all possible" assertions.>
[8] See, for example, the work of Knight and Leveson on common-mode errors in N-version programs.
[9] (Inquiry) "The internal SRI software exception was caused during execution of a data conversion from
64-bit floating point to 16-bit signed integer value. The floating point number which was converted had a
value greater than what could be represented by a 16-bit signed integer." The report does not provide
further explanation as to the reason for this conversion. However, such a conversion is commonly used to
convert internally calculated values to a format suitable for output on a 16-bit data bus. Given the use of
horizontal bias as a measurement of the error in the horizontal measurement, this would seem to be a
likely value to output to other systems.
[10] (Inquiry) "The Operand Error occurred due to an unexpected high value of an internal alignment
function result called BH, Horizontal Bias, related to the horizontal velocity sensed by the platform. This
value is calculated as an indicator for alignment precision over time."
[11] See, for example, the US Department of Defense software standards DoD-STD-2167 and DoD-STD-
2168, which were used in the same time-frame as the original Ariane software development.
[12] (Inquiry) "Ariane 5 has a high initial acceleration and a trajectory which leads to a build-up of
horizontal velocity which is five times more rapid than for Ariane 4. The higher horizontal velocity of
Ariane 5 generated, within the 40-second time-frame, the excessive value [BH] which caused the inertial
system computers to cease operation." It is important to note from this statement that BH is computed
from the measured horizontal velocities; it is not itself a measured value.
[13] The inquiry refers to the prime contractor as the "Industrial Architecture", and does not refer to any
of the parties by name. However, Aviation Week and Space Technology ran articles contemporaneously
with the incident, and (if memory serves) identified the IRS supplier as Sextant and the prime contractor
as Arianespace, along with a description of their respective roles.
[14] (Inquiry) "A more transparent organisation of the cooperation among the partners in the Ariane 5
programme must be considered. Close engineering cooperation, with clear cut authority and
responsibility, is needed to achieve system coherence, with simple and clear interfaces between partners."
[15] (Inquiry) "the overriding means of preventing failures are the reviews which are an integral part of
the design and qualification process, and which are carried out at all levels and involve all major partners
in the project (as well as external experts).... Solutions to potential problems in the on-board computer
software, paying particular attention to on-board computer switch over, shall be proposed by the project
team and reviewed by a group of external experts."
[16] (Inquiry) "...three of the variables were left unprotected. No reference to justification of this decision
was found directly in the source code. Given the large amount of documentation associated with any
industrial application, the assumption, although agreed, was essentially obscured, though not deliberately,
from any external review."
[17] (Inquiry) "Give the justification documents the same attention as code. Improve the technique for
keeping code and its justifications consistent.... The Board has also noted that the systems specification
[i.e. the Software Requirements Specification document] of the SRI does not indicate operational
restrictions that emerge from the chosen implementation. Such a declaration of limitation, which should
be mandatory for every mission-critical device, would have served to identify any non-compliance with the
trajectory of Ariane 5.... Include external (to the project) participants when reviewing specifications, code
and justification documents."
[18] (Inquiry) "An underlying theme in the development of Ariane 5 is the bias towards the mitigation of
random failure. The supplier of the SRI was only following the specification given to it, which stipulated
that in the event of any detected exception the processor was to be stopped. The exception which occurred
was not due to random failure but a design error. The exception was detected, but inappropriately handled
because the view had been taken that software should be considered correct until it is shown to be at fault.
The Board has reason to believe that this view is also accepted in other areas of Ariane 5 software design."
[19] (Inquiry) "In a programme of this size, literally thousands of problems and potential failures are
successfully handled in the review process and it is obviously not easy to detect software design errors of
the type which were the primary technical cause of the 501 failure."
[20] (Inquiry) "There is no evidence that any trajectory data were used to analyse the behaviour of the
unprotected variables, and it is even more important to note that it was jointly agreed not to include the
Ariane 5 trajectory data in the SRI requirements and specification."
[21] See, for example, Leveson's excellent book "Safeware". The US Air Force software safety handbook
AFISC SSH 1-1 has a similar philosophy. Various integrity models (e.g. Musa) also relate the probability
of residual errors to the size of the software.
[22] (Eiffel) "But if one can test more one cannot test all. Testing, we all know, can show the presence of
errors, not their absence."
[23] (Inquiry) "In Ariane 4 flights using the same type of inertial reference system there has been no such
failure because the trajectory during the first 40 seconds of flight is such that the particular variable
related to horizontal velocity cannot reach, with an adequate operational margin, a value beyond the limit
present in the software."
[24] (Inquiry) See the several paragraphs that begin: "In order to understand the explanations given for
the decision not to have the SRIs in the closed-loop simulation..." It is also interesting to note that "Post-
flight simulations have been carried out on a computer with software of the inertial reference system and
with a simulated environment, including the actual trajectory data from the Ariane 501 flight. These
simulations have faithfully reproduced the chain of events leading to the failure of the inertial reference
systems."
[25] See, for example, the US Federal Aviation Administration's software standard DO-178B, or AFISC
SSH 1-1. Beizer provides justification for the utility of structural testing.
[26] (Inquiry) "It has been stated to the Board that not all the conversions were protected because a
maximum workload target of 80% had been set for the SRI computer."
[27] (Eiffel) "...in the Ariane 5 flight it caused an exception, which was not caught... There was no
explicit exception handler to catch the exception, so it followed the usual fate of uncaught exceptions and
crashed the entire software, hence the on-board computers, hence the mission."
[28] (Inquiry) "The specification of the exception-handling mechanism also contributed to the failure. In
the event of any kind of exception, the system specification stated that: the failure should be indicated on
the databus, the failure context should be stored in an EEPROM memory (which was recovered and read
out for Ariane 501), and finally, the SRI processor should be shut down.... From this point of view
exception - or error - handling mechanisms are designed for a random hardware failure which can quite
rationally be handled by a backup system.... An underlying theme in the development of Ariane 5 is the
bias towards the mitigation of random failure. The supplier of the SRI was only following the specification
given to it, which stipulated that in the event of any detected exception the processor was to be stopped.
The exception which occurred was not due to random failure but a design error. The exception was
detected, but inappropriately handled because the view had been taken that software should be
considered correct until it is shown to be at fault."
[29] (Inquiry) "For example the computers within the SRIs could have continued to provide their best
estimates of the required attitude information." The inquiry does not state the algorithm to use to do this.
[30] This story was covered in several newspapers as well as in the RISKS journal (either July 18 or July
19, 1997). A program which failed to properly convert an Ingres database generated alarms to the
operator, who for some reason failed to see them. The resulting corrupted database caused widespread
Internet disruptions for several hours.
[31] See the Software Productivity Consortium web site at http://www.software.org for more information
on ADARTS.
[32] (Eiffel) "Although one may criticize the Ada exception mechanism, it could have been used here to
catch the exception."
[33] <Get negative comments from DejaNews>
* Re: *LONG* DRAFT - Critique of "The lessons of Ariane"
From: Phlip C Plumlee @ 1997-07-26 0:00 UTC (permalink / raw)
To: kennieg
Ken Garlington wrote:
> I'm releasing a draft of this paper for people to review prior to
> sending it on to the Eiffel web site.
Your paper is quite comprehensive. And like a trial lawyer's rebuttal,
you only reply to issues raised during the argument phase.
Since you seem to be interested in collecting more Ariane V threads
before publishing, please take a look at a very brief essay on my web
page on the subject. I wrote it for student programmers to read,
not as a high-level treatise to be taken seriously. But...
My essay is the only one I have read so far that ignores DBC and testing
issues. It proposes a much simpler software design goal that might have
prevented the crash and made DBC a moot point. Please take the time to
read it, and you'll see what I mean.
-- Phlip
====== http://users.deltanet.com/~tegan/home.html ======
-- It's a small web, after all --
* Re: *LONG* DRAFT - Critique of "The lessons of Ariane"
From: Nasser @ 1997-07-27 0:00 UTC (permalink / raw)
In article <33DAA372.5C1CAADD@deltanet.com>, Phlip says...
>My essay is the only one I have read so far that ignores DBC and testing
>issues. It proposes a much simpler software design goal that might have
>prevented the crash and made DBC a moot point. Please take the time to
>read it, and you'll see what I mean.
>
> -- Phlip
>====== http://users.deltanet.com/~tegan/home.html ======
thanks for the pointer to your web page for your article on Ariane. But
you have some figure of a head on top of the page (you wrote it in Java,
maybe?) that keeps spinning all the time. I got very dizzy reading the page
while trying to avoid looking at the turning head at the same time. Can
you please stop that head turning so much? Or maybe make it spin slower,
to make it easier to read your web page?
thanks!
Nasser
* Re: *LONG* DRAFT - Critique of "The lessons of Ariane"
From: Ken Garlington @ 1997-07-31 0:00 UTC (permalink / raw)
Phlip C Plumlee wrote:
>
> My essay is the only one I have read so far that ignores DBC and testing
> issues. It proposes a much simpler software design goal that might have
> prevented the crash and made DBC a moot point. Please take the time to
> read it, and you'll see what I mean.
>
> -- Phlip
> ====== http://users.deltanet.com/~tegan/home.html ======
Here's why the typecast occurred:
Inside the IRS, many of the calculations need to be done at a fairly
high precision. Lower precision calculations cause the "drift" rate
(the rate at which the IRS measurements start to disagree with the
"real" system state) to be unacceptably large.
However, some of the users (and Flight Controls is typically in this
group) don't need all of that precision. So, the high-precision floating
point value is converted to a fixed-point value for transmission
purposes.
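As a hedged illustration (the names, scaling, and limits here are invented for this sketch; the
actual Ariane code is not public), the hazard in that conversion looks roughly like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: a high-precision internal value is converted to a
 * 16-bit signed word for transmission on the data bus. fits_in_int16() is
 * the kind of range guard the Ariane team applied to some conversions but,
 * after analysis, omitted for the BH variable. */
static bool fits_in_int16(double value) {
    return value >= -32768.0 && value <= 32767.0;
}

static int16_t to_bus_word(double value) {
    /* Unguarded conversion: if the value is outside the int16_t range, the
     * result is meaningless in C (in Ada, the equivalent conversion raises
     * an exception, observed on Ariane 501 as the Operand Error). */
    return (int16_t)value;
}
```

On Ariane 4 trajectories the unguarded range was never reached in the first 40 seconds of
flight; the point of the sketch is that the guard is cheap but must be placed deliberately.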
Why not just send the extra precision as a floating-point value
anyway? There are several potential reasons; here are a few common
ones...
1. Floating-point formats vary between processor types. For example,
the MIL-STD-1750 processor (very popular for spaceborne operations,
because of its ability to withstand adverse effects from solar
radiation) does not use the IEEE format popular with other types of
processors. Converting between the two formats is time-consuming
and error-prone. The only real problem with integer formats is
byte order (endianness).
2. Certain devices (microsequencers, digital-to-analog converters)
don't support ANY type of floating-point format. In these cases,
you have to comply with the interface to these devices. (Your
note about "We are not talking about 4-bit Zilog chips here" is
interesting, because in fact we still have operational systems
using 4-bit and 8-bit Zilogs! Generally, high-speed CPUs are
difficult to use in airborne control systems due to power
utilization, thermal properties, board space, weight requirements,
cost, rad-hardness, and design maturity.)
3. For busses organized around 16-word message groups (such as the
MIL-STD-1553), there can be a problem sending values that don't
fit in a single word. Some hardware devices that deal with these
protocols can permit the software to read one word that was sent
in the current message block, and another word that was sent in
the previous message block. If this were to happen in the middle
of a floating-point value, then the value would be invalid. There
are several ways to avoid this, but keeping values limited to
a single word helps.
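The tearing hazard described above can be sketched with a hypothetical packing of a 32-bit
value into two 16-bit bus words (real 1553 implementations vary in the details):

```c
#include <stdint.h>

/* Hypothetical: a 32-bit quantity sent as two 16-bit bus words. If a
 * reader combines the high word from the current message block with the
 * low word left over from the previous block, the reconstructed value
 * can be wildly wrong even though each word was transmitted correctly. */
static void split32(uint32_t v, uint16_t *hi, uint16_t *lo) {
    *hi = (uint16_t)(v >> 16);
    *lo = (uint16_t)(v & 0xFFFFu);
}

static uint32_t join32(uint16_t hi, uint16_t lo) {
    return ((uint32_t)hi << 16) | (uint32_t)lo;
}
```

A value that ticks from 0x0000FFFF to 0x00010000 between frames reconstructs as 0x0001FFFF if
the two halves come from different frames; keeping each transmitted quantity within a single
word sidesteps the problem entirely.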
Usually, most of the typecasting in these systems is at the I/O
boundary, to match the interface of some other piece of equipment.
Scaling is a common requirement for that interface. Believe me,
no one does scaling for the fun of it! However, there are ways
to avoid the potential for an exception, such as saturation
arithmetic. Unfortunately, without explicit hardware support,
saturation arithmetic can add time. When a large amount of I/O
is being processed, this time can be significant. So, it is not
unusual to do what the Ariane 5 team did: analyze the system to
try to apply protection only where needed. Unfortunately, their
analysis was unable to take into account the future reuse, and
the system was not re-analyzed (or re-tested!) for the new
environment.
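A minimal sketch of the saturation-arithmetic alternative mentioned above (a hypothetical
helper; in a real system the limits would come from each signal's scaling):

```c
#include <stdint.h>

/* Saturating conversion: clamp to the representable range instead of
 * letting the conversion overflow or raise an exception. The two extra
 * compares per conversion are where the time cost comes from when a
 * large amount of I/O is processed without hardware support. */
static int16_t to_bus_word_saturating(double value) {
    if (value > 32767.0)  return INT16_MAX;
    if (value < -32768.0) return INT16_MIN;
    return (int16_t)value;
}
```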
* Re: *LONG* DRAFT - Critique of "The lessons of Ariane"
From: W. Wesley Groleau x4923 @ 1997-08-01 0:00 UTC (permalink / raw)
> saturation arithmetic can add time. When a large amount of I/O
> is being processed, this time can be significant. So, it is not
> unusual to do what the Ariane 5 team did: analyze the system to
> try to apply protection only where needed. Unfortunately, their
> analysis was unable to take into account the future reuse, and
> the system was not re-analyzed (or re-tested!) for the new
> environment.
Lest anyone get confused, read "what the Ariane 4 team did"
Ariane 5 was the "future re-use" that was "not re-analyzed" adequately.
--
----------------------------------------------------------------------
Wes Groleau, Hughes Defense Communications, Fort Wayne, IN USA
Senior Software Engineer - AFATDS Tool-smith Wanna-be
Don't send advertisements to this domain unless asked! All disk space
on fw.hac.com hosts belongs to either Hughes Defense Communications or
the United States government. Using email to store YOUR advertising
on them is trespassing!
----------------------------------------------------------------------
* Re: *LONG* DRAFT - Critique of "The lessons of Ariane"
From: W. Wesley Groleau x4923 @ 1997-08-01 0:00 UTC (permalink / raw)
> > My essay is the only one I have read so far that ignores DBC and
> > testing issues. It proposes a much simpler software design goal that
> > might have prevented the crash and made DBC a moot point. Please
> > take the time to read it, and you'll see what I mean.
> >
> > ====== http://users.deltanet.com/~tegan/home.html ======
This "essay" rants against type casting, giving several examples.
Yes, type casting is dangerous, especially when it's "transparent"
(i.e., not obvious in the source code, as in C's promotions).
However, in all of his examples, the real problem was not the type
casting. The real problem was what made the cast necessary, i.e.,
not declaring things of the appropriate type in the first place!