comp.lang.ada
* Ariane5 FAQ, Professional version, second draft (perhaps final)
@ 2003-08-11 23:55 Alexandre E. Kopilovitch
From: Alexandre E. Kopilovitch @ 2003-08-11 23:55 UTC (permalink / raw)
  To: comp.lang.ada

Here is the second draft of that Professional version of the FAQ. Two new Q-A
pairs are added. Also, numbers are assigned to all Q's and A's in this version.

For now I don't have any other information waiting for inclusion in this
Professional version. So, if there are no substantive objections or
suggestions, I'll consider this Professional version of the FAQ complete.

----------------------------------------------------------------------------

Q-1. Can you explain in a few words what the actual technical cause of the
Ariane 5 launch failure in 1996 was?

A-1. There are several points on which Ariane 5 differs from Ariane 4, one of
which was instrumental to the events: Ariane 4 is a vertical launch vehicle,
whereas Ariane 5 is slightly tilted.
  The Ariane 4 software was developed to tolerate a certain amount of
inclination, but not as much as Ariane 5 required. The chain of events was as
follows:

- The on-board software detected that one of the accelerometers was out of
range; this was interpreted as a hardware error and caused the backup
processor to take over;
- The backup processor also detected that one of the accelerometers was out of
range, which caused the system to advise self-destruction.
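
The chain of events above can be sketched in a few lines. This is only an
illustration of the pattern as this FAQ describes it, not the flight software:
the names Channel and fly, the limit value and its units are all hypothetical.
The sketch assumes that both processors run identical code with a limit sized
for the older vehicle.

```python
from dataclasses import dataclass

# Hypothetical limit: the reused software was sized for the older vehicle.
ARIANE4_LIMIT = 100.0  # illustrative units, not real flight data


@dataclass
class Channel:
    name: str
    limit: float = ARIANE4_LIMIT
    failed: bool = False

    def process(self, reading: float) -> float:
        # An out-of-range value is treated as a hardware fault:
        # the channel marks itself failed and raises.
        if abs(reading) > self.limit:
            self.failed = True
            raise RuntimeError(f"{self.name}: sensor out of range")
        return reading


def fly(reading: float, primary: Channel, backup: Channel) -> str:
    for channel in (primary, backup):
        try:
            channel.process(reading)
            return f"nominal on {channel.name}"
        except RuntimeError:
            continue  # switch over to the next channel
    # Both channels run the same code on the same input, so redundancy
    # covers random hardware faults but not a shared design limit.
    return "self-destruct advised"
```

With a reading inside the limit the primary channel carries on; with a reading
beyond it, the backup fails in exactly the same way as the primary, because
the fault is in the shared assumption and not in the hardware.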

Q-2. At which levels and in which parts of the Ariane 5 development project
were the critical errors (those that caused the launch failure) made?

A-2. The failure had a compound, 3-stage structure; all 3 component errors
were made at the top level of the project, within Arianespace.

The first error-stage was improper reuse of software.

The second and third error-stages both ordered scaled-down verification:

- the second error-stage excluded one subsystem -- the Inertial Reference
System device -- from the rocket's testing procedure, replacing it with a
simulator,

- the third error-stage excluded one part of the device's software from the
simulator development contract, and denied the simulator's developers access
to the device's documentation (giving them the device's software source code
only).

Q-3. Can you describe this development project failure in general terms of
large-scale system engineering?

A-3. The failure was in the process that Arianespace set up, not in the work
of any contractor, and certainly not in the work of any employee of those
contractors. The process that Arianespace set up delegated requirements
to individual subcontracts, which is fine. But there was neither a process for
checking that changes in the subcontracts did not result in failure to test
some requirements, nor a final pre-launch validation that all requirements
had been tested.

The scope of one of the subcontracts was reduced, and as a result
certain tests that were part of the original test plan did not get
performed. However, Arianespace's project management process equated
completion of all subcontracts with completion of all testing.
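
The missing check can be illustrated with a small sketch. Everything in it is
hypothetical: the requirement labels R1..R3, the contract names and the record
layout are invented for the example and do not come from the project. It only
shows why "all subcontracts completed" and "all requirements tested" are
different questions.

```python
def untested_requirements(requirements, subcontracts):
    """Return the requirements that no completed subcontract tested."""
    covered = set()
    for contract in subcontracts:
        if contract["completed"]:
            covered.update(contract["tests"])
    return sorted(set(requirements) - covered)


# Hypothetical data: R3 was dropped when the simulator contract was reduced.
requirements = ["R1", "R2", "R3"]
subcontracts = [
    {"name": "device", "completed": True, "tests": ["R1"]},
    {"name": "simulator", "completed": True, "tests": ["R2"]},
]

# The check the management process relied on: every contract is finished.
all_done = all(c["completed"] for c in subcontracts)

# The check that was missing: every requirement is covered by some test.
gaps = untested_requirements(requirements, subcontracts)

print(all_done, gaps)  # prints: True ['R3']
```

A final pre-launch validation of this kind would flag R3 even though every
subcontract reports completion.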

Q-4. But certainly there were engineers who could see the possible
consequences of that approach. So why weren't they alarmed enough?

A-4. This is a difficult question indeed. One explanation is that the
information paths within the project were interspersed with managers of the
non-engineering kind, and because of that none of the engineers could obtain
enough information to recognize the danger. In particular, none of the
engineers was in a position to compare the requirements for Ariane 4 with the
trajectory data for Ariane 5.

A contributing factor was the particular pattern of communications and
overlapping responsibilities that often shows up in international projects.
Here is an insider's view of it:

"As with many international projects, some of the information is eyes only.
This is sometimes a burden for engineers that write the software, since they
have to rely on good will and reliable deliveries of sub-components.
As you can imagine, Ariane is a fairly complex system which relies on many
"sub-systems"; now imagine that all those subsystems come from a different
supplier. The integration of all of them is a very large and complex project
on its own."

Q-5. Did Arianespace learn the lesson?

A-5. It seems not well enough, so far. Several subsequent Ariane 5 failures
followed essentially the same or a similar error pattern. (The only
significant difference from the first failure is that the subsequent failures
weren't related to software -- probably because all the Ariane 5 software was
reviewed after the first crash.)

For example, consider the findings of the investigation into the second
Ariane 5 failure. Different launch, different subsystem, very different
failure mode. But what both failures had in common was systems reused from
Ariane 4 without checking that they met the new requirements. That failure
didn't get nearly the press that the first one did, but the result was the
same: a launch failure
(http://spaceflightnow.com/ariane/v142/010713followup.html and
http://www.arianespace.com/site/news/03_06_19_release_index.html).

There was also a fourth Ariane 5 failure (out of 14 tries) on flight 157
(http://www.esa.int/export/esaCP/ESA7198708D_index_0.html). This was due to
failure of the cooling of the Vulcain 2 engine, new to the Ariane 5 ECA.
Although this failure had nothing to do with Ariane 4 reuse, what do we find
under contributing factors?  "non-exhaustive definition of the loads to which
the Vulcain 2 engine is subjected during flight" -- another requirements
definition failure. The first three launch failures were all due to the
failure of change management and requirements tracking during the original
Ariane 5 development. But this latest failure involves a design subsequent
to the first two Ariane 5 failures.

Q-6. What was the probable error pattern in reasoning that paved the way to
the failure? What precautions can be taken against it?

A-6. Generally, reasoning is a series of steps, and in every step we have
assumptions and implications. For proper analysis it is very important to keep
them all separate (at each step). But it is quite customary (both in an
individual's internal reasoning and within a discussion) to conjugate, that
is, to fuse together, one or two of the assumptions with an implication. In
our case it may well have been something like this:

"Before takeoff the Ariane 5 and Ariane 4 look identical for the device.
 As the preparation phase for the device is executed before takeoff only,
 it may be safely excluded from the simulation."

while the proper expression would be the following:

"Before takeoff the Ariane 5 and Ariane 4 look identical for the device.
 The preparation phase for the device is executed before takeoff only.
 So, the preparation phase may be safely excluded from the simulation."

This is a subtle difference, but the chances of recognizing the error (which
concerns the difference between Ariane 5 and Ariane 4 in this respect) are
substantially higher in the second variant than in the first -- simply
because in the second presentation the erroneous assumption stands apart.
The reason for this distinction is that our mind is less stressed when it
processes one separate statement at a time, and can therefore bring more
curiosity and doubt to it; facing conjugated statements, it has fewer free
resources for that "extra" work.

So, avoid conjugation in reasoning during analysis -- separate all assumptions
from each other and from implications. That will greatly assist you and your
colleagues in recognizing subtle errors. Similarly, ask for that separation
when you are the listener or reader.
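
One possible way to practice this separation is to write each step down as a
record in which every assumption is its own item. The sketch below is only an
illustration; the class names Assumption and Step and the "rechecked" flag are
hypothetical, not an established method.

```python
from dataclasses import dataclass


@dataclass
class Assumption:
    text: str
    rechecked: bool = False  # has it been re-examined for the new system?


@dataclass
class Step:
    assumptions: list
    implication: str

    def open_questions(self):
        # Each assumption is listed on its own, so none can hide
        # inside the wording of the implication.
        return [a.text for a in self.assumptions if not a.rechecked]


step = Step(
    assumptions=[
        Assumption("Before takeoff the two vehicles look identical for the device."),
        Assumption("The preparation phase is executed before takeoff only."),
    ],
    implication="The preparation phase may be excluded from the simulation.",
)

for question in step.open_questions():
    print("still to be checked:", question)
```

The implication is accepted only when open_questions() returns an empty list,
which forces each assumption to be examined as a separate statement.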

Q-7. Was that failure in any way extraordinary from the general engineering
viewpoint?

A-7. No. The history of engineering, even its modern history, is full of
similar (from the general engineering viewpoint) stories.

For example, a similar series of mistakes occurred in Allied military
aircraft during WWII.  There was a period in 1942 when the 'solution' to
all combat aircraft problems was to modify the engines to provide more
horsepower.  Most of 1943 was spent fixing the problems caused by the
bigger engines.  The net result was better aircraft, but it was very
expensive in lives of pilots, many of them in training. (For a particular
example, see the book "Fork-tailed devil: the P-38" by Martin Caidin, in
the chapter about another airplane, the P-47 Thunderbolt, and the
difference made by replacing the propeller. The engine had been improved
to provide more horsepower without changing the propeller to match.)

Generally, as designs scale, second or third order effects that are
inconsequential in a "prototype" model/environment can suddenly become very
significant when the scale changes to a new model/environment. For some good
examples of scaling failures made by very competent engineers who had lapses
in judgment, see the book "Design Paradigms: Case Histories of Error and
Judgment in Engineering" by Henry Petroski.

Note also that scaling failures may happen when you scale down as well.
For example, look at integrated circuits. As the transistor geometry shrinks,
the device characteristics change, sometimes dramatically. Quantum effects
that were only theoretical problems 10 years ago are now becoming significant.
A circuit that worked in an earlier version of some chip at 1.8 microns may
now fail at 1.3 microns.

Q-8. Where can I find the official report on the investigation of the
Ariane 5 crash?

A-8. At the moment of writing this FAQ the report was available, for example,
at:
 http://www.dcs.ed.ac.uk/home/pxs/Book/ariane5rep.html
But read it to the end, because your overall impression will probably be
different (and wrong) if you stop in the middle, deciding that you have
understood it all well enough.

Q-9. Where has this topic been discussed in depth?

A-9. For example, in the comp.lang.ada newsgroup (several times). Search that
newsgroup for "Ariane 5", and you'll find several threads discussing this
topic (the most recent at the moment of writing this FAQ was a quite long
thread with the subject line "Boeing and Dreamliner"; during the development
of this FAQ another long thread with the subject line "Ariane5 FAQ" was
running).

----------------------------------------------------------------------------



