From: "Alexandre E. Kopilovitch"
Newsgroups: comp.lang.ada
Subject: Ariane5 FAQ, Professional version, second draft (perhaps final)
Date: Tue, 12 Aug 2003 03:55:24 +0400 (MSD)

Here is the second draft of the Professional version of the FAQ. Two new Q-A pairs have been added, and all Q's and A's are now numbered.

For now I have no other information waiting for inclusion in this Professional version. So, if there are no consistent objections or suggestions, I'll consider this Professional version of the FAQ complete.

----------------------------------------------------------------------------

Q-1. Can you explain in a few words what the actual technical cause of the Ariane 5 launch failure in 1996 was?

A-1.
There are several differences between Ariane 5 and Ariane 4, one of which was instrumental to the events: Ariane 4 is a vertical launch vehicle, whereas Ariane 5 is slightly tilted. The Ariane 4 software was developed to tolerate a certain amount of inclination, but not as much as Ariane 5 requires.

The chain of events was as follows:
- The on-board software detected that one of the accelerometers was out of range; this was interpreted as a hardware error and caused the backup processor to take over;
- The backup processor also detected that one of the accelerometers was out of range, which caused the system to advise auto-destruction.

Q-2. At which levels and in which parts of the Ariane 5 development project were the critical errors (those that caused the launch failure) made?

A-2. The failure had a compound, 3-stage structure; all 3 component errors were made at the top level of the project, within Arianespace. The first error-stage was improper reuse of software. The second and third error-stages ordered scaled-down verification:
- the second error-stage excluded one subsystem, the Inertial Reference System device, from the rocket's testing procedure, replacing it with a simulator,
- the third error-stage excluded one part of the device's software from the simulator development contract, and denied the simulator's developers access to the device's documentation (giving them the device's software source code only).

Q-3. Can you describe this development project failure in general terms of large-scale system engineering?

A-3. The failure was in the process that Arianespace set up, not in the work of any contractor, and certainly not in the work of any employee of those contractors. That process delegated requirements to individual subcontracts, which is fine.
But there was neither a process for checking that changes in the subcontracts did not leave some requirements untested, nor a final pre-launch validation that all requirements had been tested. The scope of one of the subcontracts was reduced, and as a result certain tests that were part of the original test plan were never performed. However, Arianespace's project management process equated completion of all subcontracts with completion of all testing.

Q-4. But surely there were engineers who could see the possible consequences of that approach. So why weren't they alarmed enough?

A-4. This is a difficult question indeed. One explanation is that the information paths within the project ran through managers of the non-engineering kind, and because of that none of the engineers could obtain enough information to recognize the danger. In particular, none of the engineers was in a position to compare the requirements for Ariane 4 with the trajectory data for Ariane 5.

A contributing factor was the particular pattern of communication and overlapping responsibilities that often shows up in international projects. Here is an insider's view of it:

"As with many international projects, some of the information is eyes only. This is sometimes a burden for engineers that write the software, since they have to rely on good will and reliable deliveries of sub-components. As you can imagine, Ariane is a fairly complex system which relies on many "sub-systems"; now imagine that all those subsystems come from a different supplier. The integration of all of them is a very large and complex project on its own."

Q-5. Did Arianespace learn the lesson?

A-5. Not enough so far, it seems. Several subsequent Ariane 5 failures followed essentially the same or a similar error pattern.
(The only significant difference from the first failure is that the subsequent failures weren't related to software -- probably because all the Ariane 5 software was reviewed after the first crash.)

For example, consider the main point of the investigation into the second Ariane 5 failure. Different launch, different subsystem, very different failure mode. But what both failures had in common was systems reused from Ariane 4 without checking that they met the new requirements. That failure didn't get nearly the press that the first one did, but the result was the same, a launch failure (http://spaceflightnow.com/ariane/v142/010713followup.html and http://www.arianespace.com/site/news/03_06_19_release_index.html).

There was also a fourth Ariane 5 failure (out of 14 tries) on flight 157 (http://www.esa.int/export/esaCP/ESA7198708D_index_0.html). This was due to a failure of the cooling of the Vulcain 2 engine, new to the Ariane 5 ECA. Although this failure had nothing to do with Ariane 4 reuse, what do we find under contributing factors? "non-exhaustive definition of the loads to which the Vulcain 2 engine is subjected during flight" -- another requirements definition failure.

The first three launch failures were all due to the failure of change management and requirements tracking during the original Ariane 5 development. But this latest failure involves a design subsequent to the first two Ariane 5 failures.

Q-6. What was the probable error pattern in reasoning that paved the way to the failure? What precautions can be taken against it?

A-6. Generally, reasoning is a series of steps, and in every step we have assumptions and implications. It is very important for proper analysis to keep them all separate (at each step). But it is quite customary (both in an individual's internal reasoning and within a discussion) to conjugate one or two assumptions with an implication.
In our case it may well have been something like this:

"Before takeoff the Ariane 5 and Ariane 4 look identical to the device. As the preparation phase for the device is executed before takeoff only, it may be safely excluded from the simulation."

while the proper expression would be the following:

"Before takeoff the Ariane 5 and Ariane 4 look identical to the device. The preparation phase for the device is executed before takeoff only. So, the preparation phase may be safely excluded from the simulation."

This is a subtle difference, but the error (which concerns the difference between the Ariane 5 and Ariane 4 in this respect) is substantially more likely to be recognized in the second variant than in the first -- simply because in the second presentation the erroneous assumption is separated. The cause of this distinction is that our mind is less stressed when it processes one separate statement at a time, and can therefore bring more curiosity and doubt to it; facing conjugated statements, it has fewer free resources for that "extra" work.

So, avoid conjugation in reasoning during analysis -- separate all assumptions from each other and from implications. That will greatly help you and your colleagues recognize subtle errors. Similarly, ask for that separation when you are the listener or reader.

Q-7. Is that failure in any way extraordinary from the general engineering viewpoint?

A-7. No. The history of general engineering, even its modern history, is full of similar (from the general engineering viewpoint) stories.

For example, a similar round of mistakes happened in Allied military aircraft during WWII. There was a period in 1942 when the 'solution' to all combat aircraft problems was to modify the engines to provide more horsepower. Most of 1943 was spent fixing the problems caused by the bigger engines. The net result was better aircraft, but it was very expensive in the lives of pilots, many of them in training.
(For a particular example, see the book "Fork-tailed devil: the P-38" by Martin Caidin, in the chapter about another airplane, the P-47 Thunderbolt, and about the difference made by replacing the propeller. They had improved the engine to provide more horsepower, without changing the propeller to match.)

Generally, as designs scale, second- or third-order effects that are inconsequential in a "prototype" model/environment can suddenly become very significant when the scale changes to a new model/environment. For some good examples of scaling failures made by very competent engineers who had lapses in judgment, see the book "Design Paradigms: Case Histories of Error and Judgment in Engineering" by Henry Petroski.

Note also that scaling failures may happen when you go down in scale as well. For example, look at integrated circuits. As the transistor geometry shrinks, the device characteristics change, sometimes dramatically. Quantum effects that were only theoretical problems 10 years ago are now becoming significant. A circuit that worked in an earlier version of some chip at 1.8 microns is now failing at 1.3 microns.

Q-8. Where can I find the official report of the investigation into the Ariane 5 crash?

A-8. At the time of writing this FAQ, the report was available, for example, at:

http://www.dcs.ed.ac.uk/home/pxs/Book/ariane5rep.html

But read it to the end, because your overall impression will probably be different (and wrong) if you stop in the middle of it, deciding that you have it all clear enough.

Q-9. Where has this topic been discussed in depth?

A-9. For example, in the comp.lang.ada newsgroup (several times). Search that newsgroup for "Ariane 5", and you'll find several threads discussing this topic (the most recent at the time of writing this FAQ was a quite long thread with the subject line "Boeing and Dreamliner"; during the development of this FAQ another long thread with the subject line "Ariane5 FAQ" was running).
----------------------------------------------------------------------------