* Re: Ariane 5 failure [not found] <agrapsDy4oJH.29G@netcom.com> @ 1996-09-25 0:00 ` @@ robin 1996-09-25 0:00 ` Bob Kitzberger ` (2 more replies) 0 siblings, 3 replies; 58+ messages in thread
From: @@ robin @ 1996-09-25 0:00 UTC (permalink / raw)

agraps@netcom.com (Amara Graps) writes:

>I read the following message from my co-workers that I thought was
>interesting. So I'm forwarding it to here.
>(begin quote)
>Ariane 5 failure was attributed to a faulty DOUBLE -> INT conversion
>(as the proximate cause) in some ADA code in the inertial guidance
>system. Diagnostic error messages from the (faulty) inertial guidance
>system software were interpreted by the steering system as valid data.
>English text of the inquiry board's findings is at
> http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html
>(end quote)
>Amara Graps email: agraps@netcom.com
>Computational Physics vita: finger agraps@best.com

There's a little more to it. The unchecked data conversion in the Ada program resulted in the shutdown of the computer. The backup computer had already shut down a whisker of a second before. Consequently, the on-board computer was unable to switch to the backup, and used the error codes from the shut-down computer as flight data.

This is not the first time that such a programming error (integer out of range) has occurred. In 1981, the manned STS-2 was preparing to take off, but because some fuel was accidentally spilt and some tiles were accidentally dislodged, takeoff was delayed by a month. During that time, the astronauts decided to get in some more practice on the simulator. During a simulated descent, the 4 computing systems (the main and the 3 backups) got stuck in a loop, with complete loss of control. The cause? An integer out of range -- the same problem as with Ariane 5. In the STS-2 case, the precise cause was a computed GOTO with a bad index (similar to a CASE statement without an OTHERWISE clause).
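The conversion failure described above can be sketched in Ada. All names and values here are illustrative -- the actual Ariane flight code is not public -- but the mechanics are the same: a 64-bit floating-point value is converted to a 16-bit integer it no longer fits in.

```ada
--  Hypothetical sketch of the Ariane failure mode (illustrative names).
procedure Convert_Demo is
   type Velocity  is digits 15;                   --  a 64-bit float
   type Bias_Type is range -32_768 .. 32_767;     --  a 16-bit integer
   Horizontal_Velocity : Velocity := 40_000.0;    --  exceeds Bias_Type'Last
   Stored              : Bias_Type;
begin
   --  With checks enabled, this conversion raises Constraint_Error;
   --  unhandled, that exception shuts the unit down.
   Stored := Bias_Type (Horizontal_Velocity);
exception
   when Constraint_Error =>
      null;  --  a "simple test" before converting would avoid ever raising this
end Convert_Demo;
```

The point of the sketch is only the mechanism: Ada defines the conversion to range-check the result, so the failure is detected -- the question the rest of this thread argues about is what the program should then do.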
In both cases, the programming error could have been detected with a simple test, but in both cases, no test was included. One would have thought that, having had at least one failure from an integer going out of range, the implementors of the software for Ariane 5 would have been extra careful to ensure that all data conversions were within range -- since any kind of interrupt would result in destruction of the spacecraft. There's a case for a review of the programming language used.

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-25 0:00 ` Ariane 5 failure @@ robin @ 1996-09-25 0:00 ` Bob Kitzberger 1996-09-26 0:00 ` Ronald Kunne 1996-09-25 0:00 ` Michel OLAGNON 1996-09-27 0:00 ` John McCabe 2 siblings, 1 reply; 58+ messages in thread
From: Bob Kitzberger @ 1996-09-25 0:00 UTC (permalink / raw)

@@ robin (rav@goanna.cs.rmit.edu.au) wrote:

: The cause? An integer out of range -- the same problem
: as with Ariane 5, where an integer became out of range.
...
: There's a case for a review of the programming language used.

Why do you persist? Ada _has_ range checks built into the language. They were explicitly disabled in this case. What are you failing to grasp?

--
Bob Kitzberger    Rational Software Corporation    rlk@rational.com
http://www.rational.com http://www.rational.com/pst/products/testmate.html

^ permalink raw reply [flat|nested] 58+ messages in thread
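For readers unfamiliar with Ada: the "explicit disabling" referred to above is done with pragma Suppress. A minimal sketch, with hypothetical names (not the actual Ariane source):

```ada
--  With these pragmas in effect, the compiler may omit the named checks;
--  an out-of-range conversion then produces erroneous execution instead
--  of a well-defined Constraint_Error.
procedure Guidance_Sketch is
   pragma Suppress (Range_Check);
   pragma Suppress (Overflow_Check);
   type Bias_Type is range -32_768 .. 32_767;
   V : Float := 40_000.0;      --  out of range for Bias_Type
   B : Bias_Type;
begin
   B := Bias_Type (V);         --  no check generated: result unpredictable
end Guidance_Sketch;
```

The point is only that Ada's checks are on by default and must be deliberately turned off; whether suppression was applied in exactly this form on Ariane is beyond this sketch.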
* Re: Ariane 5 failure 1996-09-25 0:00 ` Bob Kitzberger @ 1996-09-26 0:00 ` Ronald Kunne 1996-09-26 0:00 ` Matthew Heaney ` (3 more replies) 0 siblings, 4 replies; 58+ messages in thread
From: Ronald Kunne @ 1996-09-26 0:00 UTC (permalink / raw)

In article <52bm1c$gvn@rational.rational.com> rlk@rational.com (Bob Kitzberger) writes:

>Ada _has_ range checks built into the language. They were explicitly
>disabled in this case.

The problem of constructing bug-free real-time software seems to me to be a trade-off between safety and speed of execution (and maybe available memory?). In other words: including tests on array boundaries might make the code safer, but also slower.

Comments?

Greetings, Ronald

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-26 0:00 ` Ronald Kunne @ 1996-09-26 0:00 ` Matthew Heaney 1996-09-27 0:00 ` Wayne Hayes ` (2 more replies) 1996-09-27 0:00 ` Ken Garlington ` (2 subsequent siblings) 3 siblings, 3 replies; 58+ messages in thread
From: Matthew Heaney @ 1996-09-26 0:00 UTC (permalink / raw)

In article <1780E8471.KUNNE@frcpn11.in2p3.fr>, KUNNE@frcpn11.in2p3.fr (Ronald Kunne) wrote:

>In article <52bm1c$gvn@rational.rational.com>
>rlk@rational.com (Bob Kitzberger) writes:
>
>>Ada _has_ range checks built into the language. They were explicitly
>>disabled in this case.
>
>The problem of constructing bug-free real-time software seems to me
>a trade-off between safety and speed of execution (and maybe available
>memory?). In other words: including tests on array boundaries might
>make the code safer, but also slower.
>
>Comments?

Why, yes. If the rocket blows up, at the cost of millions of dollars, then I'm not clear what the value of "faster execution" is. The rocket's gone, so what difference does it make how fast the code executed? If you left the range checks in, your code would be *marginally* slower, but you'd still have your rocket, now wouldn't you?

>Ronald

Matt

--------------------------------------------------------------------
Matthew Heaney
Software Development Consultant
mheaney@ni.net
(818) 985-1271

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-26 0:00 ` Matthew Heaney @ 1996-09-27 0:00 ` Wayne Hayes 1996-09-27 0:00 ` Richard Pattis 1996-09-27 0:00 ` Ronald Kunne 1996-09-28 0:00 ` Ken Garlington 2 siblings, 1 reply; 58+ messages in thread
From: Wayne Hayes @ 1996-09-27 0:00 UTC (permalink / raw)

In article <mheaney-ya023180002609962252500001@news.ni.net>, Matthew Heaney <mheaney@ni.net> wrote:

>Why, yes. If the rocket blows up, at the cost of millions of dollars, then
>I'm not clear what the value of "faster execution" is. The rocket's gone,
>so what difference does it make how fast the code executed? If you left
>the range checks in, your code would be *marginally* slower, but you'd
>still have your rocket, now wouldn't you?

The point is moot. In this case, catching the error wouldn't have helped. The out-of-bounds error happened in a piece of code designed for the Ariane-4, in which it was *physically impossible* for the value to overflow (the Ariane-4 didn't go that fast, and it was a velocity variable). Then the code was used, as-is, in the Ariane-5, without an analysis of how the code would react on the new hardware, which flew faster. Had the analysis been done, they wouldn't have added bounds checking; they would have modified the code to actually *work*, because they would have realized that the code was *guaranteed* to fail on the first flight.

--
"And a woman needs a man...     || Wayne Hayes, wayne@cs.utoronto.ca
like a fish needs a bicycle..." || Astrophysics & Computer Science
-- U2 (apparently quoting Gloria Steinem?) || http://www.cs.utoronto.ca/~wayne

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-27 0:00 ` Wayne Hayes @ 1996-09-27 0:00 ` Richard Pattis 1996-09-29 0:00 ` Alan Brain ` (3 more replies) 0 siblings, 4 replies; 58+ messages in thread
From: Richard Pattis @ 1996-09-27 0:00 UTC (permalink / raw)

As an instructor in CS1/CS2, this discussion interests me. I try to talk about designing robust, reusable code, and actually have students reuse code that I have written as well as some that they (and their peers) have written. The Ariane failure adds a new view to robustness, having to do with future use of code, and mathematical proof vs "engineering" considerations.

Should a software engineer remove safety checks if he/she can prove - based on physical limitations, like a rocket not exceeding a certain speed - that they are unnecessary? Or, knowing that his/her code will be reused (in an unknown context, by someone who is not so skilled, and will probably not think to redo the proof), should such checks not be optimized out? What rule of thumb should be used to decide (e.g., what if the proof assumes the rocket speed will not exceed that of light)? Since software operates in the real world (not the world of mathematics), should mathematical proofs about code always yield to engineering rules of thumb to expect the unexpected? "In the Russian theatre, every 5 years an unloaded gun accidentally discharges and kills someone; every 20 years a broom does." What is the rule of thumb about when mathematics should be believed?

As to saving SPEED by disabling the range checks: did the code not meet its speed requirements with range checks on? Only in this case would I have turned them off. Does "real time" mean fast enough or as fast as possible? To misquote Einstein, "Code should run as fast as necessary, but no faster...." since something is always traded away to increase speed.

If I were to try to create a lecture on this topic, what other similar failures should I know about (beside the legendary Venus probe)? Your comments?
Rich ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-27 0:00 ` Richard Pattis @ 1996-09-29 0:00 ` Alan Brain 1996-09-29 0:00 ` Dann Corbit ` (2 subsequent siblings) 3 siblings, 0 replies; 58+ messages in thread
From: Alan Brain @ 1996-09-29 0:00 UTC (permalink / raw)

Richard Pattis wrote:
>
> As an instructor in CS1/CS2, this discussion interests me. I try to talk about
> designing robust, reusable code....
--->8----
> The Ariane failure adds a new view to robustness, having to do with future
> use of code, and mathematical proof vs "engineering" considerations.
>
> Should a software engineer remove safety checks if he/she can prove - based on
> physical limitations, like a rocket not exceeding a certain speed - that they
> are unnecessary? Or, knowing that his/her code will be reused (in an unknown
> context, by someone who is not so skilled, and will probably not think to
> redo the proof) should such checks not be optimized out? What rule of thumb
> should be used to decide (e.g., what if the proof assumes the rocket speed
> will not exceed that of light)? Since software operates in the real world (not
> the world of mathematics) should mathematical proofs about code always yield
> to engineering rules of thumb to expect the unexpected?
> What is the rule of thumb about when mathematics should be believed?

Firstly, I wish there were more CS teachers like you. These are excellent engineering questions.

Secondly, answers: I tend towards the philosophy of "Leave every check in". In 12+ years of Ada programming, I've never seen pragma Suppress (All_Checks) make the difference between success and failure. At best it gives a 5% improvement. This means that in order to debug the code quickly, it's useful to have such checks, even when not strictly necessary.

For re-use, you then often have the Ariane problem. That is, the supposedly unnecessary checks you included coming around and biting you, as the assumptions you were making in the previous project become invalid. So....
You make sure the assumptions/consequences get put into a separate package. A system-specific package, that will be changed when re-used. Which means that if the subsystem gets re-used a lot, the system-specific stuff will eventually be re-written so as to allow for re-use easily.

Example: a car's cruise control:

MAX_SPEED : constant := 200.0*MPH;

Gets re-used in an airliner - change to 700.0*MPH. Then onto an SST - 2000.0*MPH. Eventually, you make it 2.998E8*MetresPerSec. Then some Bunt invents a Warp Drive, and you're wrong again.

Summary: Label the constraints and assumptions, stick them as comments in the code and design notes, put them in a separate package... and some dill will still stuff up, but that's the best you can do. And in the meantime, you allow the possibility of finding a number of errors early.

---------------------- <> <> How doth the little Crocodile | Alan & Carmel Brain| xxxxx Improve his shining tail? | Canberra Australia | xxxxxHxHxxxxxx _MMMMMMMMM_MMMMMMMMM ---------------------- o OO*O^^^^O*OO o oo oo oo oo By pulling Maerklin Wagons, in 1/220 Scale

^ permalink raw reply [flat|nested] 58+ messages in thread
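The system-specific package suggested above might look like the following sketch. Names and values are hypothetical; the idea is that re-use touches exactly one compilation unit.

```ada
--  All vehicle-specific assumptions live here; re-use means editing
--  (and re-reviewing) only this package.
package Vehicle_Limits is
   MPH       : constant := 0.447_04;      --  one mile per hour, in metres/sec
   Max_Speed : constant := 200.0 * MPH;   --  THE assumption: change per vehicle
   --  Any value outside this subtype raises Constraint_Error at the boundary,
   --  instead of propagating silently into the control laws.
   subtype Speed is Float range 0.0 .. Max_Speed;
end Vehicle_Limits;
```

A client that declares its velocity variables as `Vehicle_Limits.Speed` gets the range check for free, and the constraint is documented in exactly one place.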
* Re: Ariane 5 failure 1996-09-27 0:00 ` Richard Pattis 1996-09-29 0:00 ` Alan Brain @ 1996-09-29 0:00 ` Dann Corbit 1996-09-29 0:00 ` Chris McKnight 1996-10-01 0:00 ` Ariane 5 failure Ken Garlington 3 siblings, 0 replies; 58+ messages in thread
From: Dann Corbit @ 1996-09-29 0:00 UTC (permalink / raw)

I propose a software IC metaphor for high-reliability projects. (And eventually all projects.) Currently, the software industry goes by what I call a "software schematic" metaphor. We put in components that are tested, but we do not necessarily know the performance curves.

If you look at S. Moshier's code in the Cephes Library on Netlib, you will see that he offers statistical evidence that his programs are robust. So you can at least infer, on a probability basis, what the odds are of a component failing. So instead of just dropping in a resistor or a transistor, we read the little gold band, or the spec on the transistor that shows what voltages it can operate under.

For simple components with, say, five bytes of input, we could exhaustively test all possible inputs and outputs. For more complicated procedures with many bytes of input, we could perform probability testing, and test other key values.

Imagine a database like the following:

TABLE: MODULES
  int      ModuleUniqueID
  int      ModuleCategory
  char*60  ModuleName
  char*255 ModuleDescription
  text     ModuleCode
  text     TestRoutineUsed
  bit      CompletelyTested

TABLE: TestResults (many result sets for one module)
  int      TestResultUniqueID
  int      ModuleUniqueID
  char*60  OperatingSystem
  char*60  CompilerUsed
  binary   ResultChart
  text     ResultDescription
  float    ProbabilityOfFailure
  float    RmsErrorObserved
  float    MaxErrorObserved

TABLE: KnownBugs (many known bugs for one module)
  int      KnownBugUniqueID
  int      ModuleUniqueID
  char*60  KnownBugDescription
  text     BugDefinition
  text     PossibleWorkAround

Well, this is just a rough outline, but the value of a database like this would be obvious. This could easily be improved and expanded.
(More domain tables, tables for defs of parameters to the module, etc.) If we had a tool like that, we would be using software IC's, not software schematics. -- "I speak for myself and all of the lawyers of the world" If I say something dumb, then they will have to sue themselves. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-27 0:00 ` Richard Pattis 1996-09-29 0:00 ` Alan Brain 1996-09-29 0:00 ` Dann Corbit @ 1996-09-29 0:00 ` Chris McKnight 1996-09-29 0:00 ` Real-world education (was: Ariane 5 failure) Michael Feldman 1996-10-01 0:00 ` Ariane 5 failure Ken Garlington 3 siblings, 1 reply; 58+ messages in thread
From: Chris McKnight @ 1996-09-29 0:00 UTC (permalink / raw)

In article Hzz@beaver.cs.washington.edu, pattis@cs.washington.edu (Richard Pattis) writes:

>As an instructor in CS1/CS2, this discussion interests me. I try to talk about
>designing robust, reusable code, and actually have students reuse code that
>I have written as well as some that they (and their peers) have written.
>The Ariane failure adds a new view to robustness, having to do with future
>use of code, and mathematical proof vs "engineering" considerations.

An excellent bit of teaching, IMHO. Glad to hear they're putting some more of the real-world issues in the classroom.

>Should a software engineer remove safety checks if he/she can prove - based on
>physical limitations, like a rocket not exceeding a certain speed - that they
>are unnecessary? Or, knowing that his/her code will be reused (in an unknown
>context, by someone who is not so skilled, and will probably not think to
>redo the proof) should such checks not be optimized out? What rule of thumb
>should be used to decide (e.g., what if the proof assumes the rocket speed
>will not exceed that of light)? Since software operates in the real world (not
>the world of mathematics) should mathematical proofs about code always yield
>to engineering rules of thumb to expect the unexpected?

A good question. For the most part, I'd go with engineering rules of thumb (what did you expect, I'm an engineer). As an engineer, you never know what may happen in the real world (in spite of what you may think), so I prefer error detection and predictable recovery.
The key factors to consider include the likelihood and the cost of failures, and the cost of leaving in (or adding, where your language doesn't already provide them) the checks.

Consider the first two factors, likelihood and cost of failures: in a real-time embedded system, both are often high. Of the two, I think people most often get caught by mistaken beliefs about the likelihood of failure. As an example, I've argued more than once with engineers who think that since a device is only "able" to give them a value in a certain range, they needn't check for out-of-range values. I've seen enough failed hardware to know that anything is possible, regardless of what the manufacturer may claim. Consider your speed-of-light example: what if the sensor goes bonkers and tells you that you're going faster? Your "proof" that you can't get that value falls apart then. Your point about reuse is also well made. Who knows what someone else may want to use your code for? As for cost of failure, it's usually obvious: in dollars, in lives, or both.

As for the cost of leaving checks in (or putting them in): IMHO, the cost is almost always insignificant. If the timing is so tight that removing checks makes the difference, it's probably time to redesign anyway. After all, in the real world there's always going to be fixes, new features, etc. that need to be added later, so you'd better plan for it. Also, it's been my experience that removing checks buys somewhere in the single digits of % improvement. If you're really that tight, a good optimizer can yield 10%-15% or more (actual mileage may vary, of course). But again, if that makes the difference, you'd better rethink your design.

So the rule of thumb I use is: unless a device is not physically capable (as opposed to theoretically capable) of giving me out-of-range data, I'm going to range check it. I.e., if there are 3 bits, you'd better check for 8 values regardless of the number of values you think you can get.
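The 3-bit rule of thumb above can be sketched in Ada (illustrative names; the "spec promises 0..5" assumption is hypothetical):

```ada
--  A 3-bit hardware field can physically deliver eight values,
--  so handle all eight, not just the ones the data sheet promises.
procedure Read_Sensor_Demo is
   type Raw_Bits is mod 8;                    --  3 bits: values 0 .. 7
   subtype Claimed is Raw_Bits range 0 .. 5;  --  what the spec says we can get
   Input : Raw_Bits := 7;                     --  failed hardware can still do this
begin
   if Input in Claimed then
      null;  --  normal processing
   else
      null;  --  predictable recovery, instead of a "can't happen" surprise
   end if;
end Read_Sensor_Demo;
```

The membership test costs one comparison; declaring `Claimed` also documents the assumption right where a maintainer will see it.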
That having been said, it's often not up to the engineer to make these decisions. Such things as political considerations, customer demands, and (more often than not) management decisions have been known to succeed in convincing me to turn checks off. As a rule, however, I fight to keep them in, at the very least through development and integration.

> As to saving SPEED by disabling the range checks: did the code not meet its
>speed requirements with range checks on? Only in this case would I have turned
>them off. Does "real time" mean fast enough or as fast as possible? To
>misquote Einstein, "Code should run as fast as necessary, but no faster...."
>since something is always traded away to increase speed.

Precisely! And when what's being traded is safety, it's not worth it.

Cheers,

Chris

=========================================================================
"I was gratified to be able to answer promptly. I said I don't know".
-- Mark Twain
=========================================================================

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Real-world education (was: Ariane 5 failure) 1996-09-29 0:00 ` Chris McKnight @ 1996-09-29 0:00 ` Michael Feldman 0 siblings, 0 replies; 58+ messages in thread
From: Michael Feldman @ 1996-09-29 0:00 UTC (permalink / raw)

In article <1996Sep29.193602.17369@enterprise.rdd.lmsc.lockheed.com>, Chris McKnight <cmcknigh@hercii.lasc.lockheed.com> wrote:

[Rich Pattis' good stuff snipped.]

> An excellent bit of teaching, IMHO. Glad to hear they're putting some
> more of the real-world issues in the classroom.

Rich Pattis is indeed an experienced, even gifted teacher of introductory courses, with a very practical view of what they should be about.

Without diminishing Rich Pattis' teaching experience or skill one bit, I am somewhat perplexed at the unfortunate stereotypical view you seem to have of CS profs. Yours is the second post today to have shown evidence of that stereotypical view; both you and the other poster have industry addresses. This is my 22nd year as a CS prof, I travel a lot in CS education circles, and - while we, like any population, tend to hit a bell curve - I've found that there are a lot more of us out here than you may think with Pattis-like commitment to bring the real world into our teaching.

Sure, there are theorists, as there are in any field, studying and teaching computing just because it's "beautiful", with little reference to real application, and there's a definite place in the teaching world for them. Indeed, exposure to their "purity" of approach is healthy for undergraduates - there is no harm at all in taking on computing - sometimes - as purely an intellectual exercise. But it's a real reach from there to an assumption that most of us are in that theoretical category.

I must say that there's a definite connection between an interest in Ada and an interest in real-world software; certainly most of the Ada teachers I've met are more like Pattis than you must think.
Indeed, it's probably our commitment to that "engineering" view of computing that brings us to like and teach Ada. But it's not just limited to Ada folks.

I had the pleasure of participating in a SIGCSE panel last March entitled "the first year beyond language." Organized by Owen Astrachan of Duke, a C++ fan, this panel consisted of 6 teachers of first-year courses, each using a different language. Pascal, C++, Ada, Scheme, Eiffel, and (as I recall) ML were represented. The challenge Owen made to each of us was to give a 10-minute "vision statement" for first-year courses, without identifying which language we "represented." Owen revealed the languages to the audience only after the presentations were done.

It was _really_ gratifying that - with no prior agreement or discussion among us - five of the six of us presented very similar visions, in the "computing as engineering" category. It doesn't matter which language the 6th used; the important thing was that, considering the diversity of our backgrounds, teaching everywhere from small private colleges to big public universities, we were in _amazing_ agreement.

The message for me in the stereotype presented above is that it's probably out of date and certainly out of touch. I urge my industry friends to get out of _their_ ivory towers, and come visit us. Find out what we're _really_ doing. I think you'll be pleasantly surprised. Especially, check out those of us who are introducing students to _Ada_ as their first, foundation language.

Mike Feldman
------------------------------------------------------------------------
Michael B. Feldman - chair, SIGAda Education Working Group
Professor, Dept. of Electrical Engineering and Computer Science
The George Washington University - Washington, DC 20052 USA
202-994-5919 (voice) - 202-994-0227 (fax)
http://www.seas.gwu.edu/faculty/mfeldman
------------------------------------------------------------------------
Pork is all that money the government gives the other guys.
------------------------------------------------------------------------ WWW: http://lglwww.epfl.ch/Ada/ or http://info.acm.org/sigada/education ------------------------------------------------------------------------ ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-27 0:00 ` Richard Pattis ` (2 preceding siblings ...) 1996-09-29 0:00 ` Chris McKnight @ 1996-10-01 0:00 ` Ken Garlington 3 siblings, 0 replies; 58+ messages in thread
From: Ken Garlington @ 1996-10-01 0:00 UTC (permalink / raw)

Richard Pattis wrote:
> [snip]
> If I were to try to create a lecture on this topic, what other similar
> failures should I know about (beside the legendary Venus probe)?
> Your comments?

"Safeware" by Leveson has some additional good examples of what can go wrong with software. The RISKS forum also has a lot of info on this.

There was a study done several years ago by a Dr. Avizienis (I always screw up that spelling, and I'm always too lazy to go look it up...) trying to show the worth of N-version programming. He had five teams of students write code for part of a flight control system. Each team was given the same set of control law diagrams (which are pretty detailed, as requirements go), and each team used the same sort of meticulous software engineering approach that you would expect for a safety-critical system (no formal methods, however). Each team's software was almost error-free, based on tests done using the same test data as the actual delivered flight controls.

Note I said "almost". Every team made one mistake. Worse, it was the _same_ mistake. The control law diagrams were copies. The copier apparently wasn't a good one, because a comma in one of the gains ended up looking like a decimal point (or maybe it was the other way around -- I forget). Anyway, the gain was accidentally coded as 2.345 vs 2,345, or something like that. That kind of error makes a big difference!

In the face of that kind of error, I've never felt that formal methods had a chance. That's not to say that formal methods can't detect a lot of different kinds of failures, but at some level some engineer has to be able to say: "That doesn't make sense..."
If you want to try to find this study, I believe it was reported at a Digital Avionics Systems Conference many years ago (in San Jose?), probably around 1986.

> > Rich

--
LMTAS - "Our Brand Means Quality"

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-26 0:00 ` Matthew Heaney 1996-09-27 0:00 ` Wayne Hayes @ 1996-09-27 0:00 ` Ronald Kunne 1996-09-27 0:00 ` Lawrence Foard ` (2 more replies) 1996-09-28 0:00 ` Ken Garlington 2 siblings, 3 replies; 58+ messages in thread
From: Ronald Kunne @ 1996-09-27 0:00 UTC (permalink / raw)

In article <mheaney-ya023180002609962252500001@news.ni.net> mheaney@ni.net (Matthew Heaney) writes:

>>The problem of constructing bug-free real-time software seems to me
>>a trade-off between safety and speed of execution (and maybe available
>>memory?). In other words: including tests on array boundaries might
>>make the code safer, but also slower.

>Why, yes. If the rocket blows up, at the cost of millions of dollars, then
>I'm not clear what the value of "faster execution" is. The rocket's gone,
>so what difference does it make how fast the code executed? If you left
>the range checks in, your code would be *marginally* slower, but you'd
>still have your rocket, now wouldn't you?

Despite the sarcasm, I will elaborate.

Suppose an array goes from 0 to 100, and the calculated index is known not to go outside this range. Why would one insist on putting the range test in, which will slow down the code? This might be a problem if the particular piece of code is heavily used, and the code executes too slowly otherwise. "Marginally slower" if it happens only once, but such checks on indices and function arguments (like square roots) are necessary *everywhere* in code, if one is consistent.

Actually, this was the case here: the code was taken from Ariane 4 code where it was physically impossible that the index would go out of range: a test would have been a waste of time. Unfortunately this was no longer the case in the Ariane 5.

Friendly greetings,
Ronald Kunne

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-27 0:00 ` Ronald Kunne @ 1996-09-27 0:00 ` Lawrence Foard 1996-10-04 0:00 ` @@ robin 1996-09-28 0:00 ` Ken Garlington 1996-09-29 0:00 ` Alan Brain 2 siblings, 1 reply; 58+ messages in thread
From: Lawrence Foard @ 1996-09-27 0:00 UTC (permalink / raw)

Ronald Kunne wrote:
>
> Actually, this was the case here: the code was taken from an Ariane 4
> code where it was physically impossible that the index would go out
> of range: a test would have been a waste of time.
> Unfortunately this was no longer the case in the Ariane 5.

Actually it would still present a danger on Ariane 4. If the sensor which apparently was no longer needed during flight became defective, then you could get a value out of range.

--
The virgin birth of Pythagoras via Apollo. The martyrdom of St. Socrates.
The Gospel according to Iamblichus.
-- Have an 18.9cents/minute 6 second billed calling card tomorrow --
http://www.vwis.com/cards.html

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-27 0:00 ` Lawrence Foard @ 1996-10-04 0:00 ` @@ robin 0 siblings, 0 replies; 58+ messages in thread
From: @@ robin @ 1996-10-04 0:00 UTC (permalink / raw)

Lawrence Foard <entropy@vwis.com> writes:

>Ronald Kunne wrote:
>> Actually, this was the case here: the code was taken from an Ariane 4
>> code where it was physically impossible that the index would go out
>> of range: a test would have been a waste of time.

---A test for overflow in a system that aborts if unexpected overflow occurs is never a waste of time. Recall Murphy's Law: "If anything can go wrong, it will." Then there's Robert's Law: "Even if it can't go wrong, it will."

>> Unfortunately this was no longer the case in the Ariane 5.

>Actually it would still present a danger on Ariane 4. If the sensor
>which apparently was no longer needed during flight became defective,
>then you could get a value out of range.

---Good point, Lawrence.

^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-27 0:00 ` Ronald Kunne 1996-09-27 0:00 ` Lawrence Foard @ 1996-09-28 0:00 ` Ken Garlington 1996-09-28 0:00 ` Ken Garlington 1996-09-29 0:00 ` Alan Brain 2 siblings, 1 reply; 58+ messages in thread
From: Ken Garlington @ 1996-09-28 0:00 UTC (permalink / raw)

Ronald Kunne wrote:
>
> In article <mheaney-ya023180002609962252500001@news.ni.net>
> mheaney@ni.net (Matthew Heaney) writes:
>
> >>The problem of constructing bug-free real-time software seems to me
> >>a trade-off between safety and speed of execution (and maybe available
> >>memory?). In other words: including tests on array boundaries might
> >>make the code safer, but also slower.
>
> >Why, yes. If the rocket blows up, at the cost of millions of dollars, then
> >I'm not clear what the value of "faster execution" is. The rocket's gone,
> >so what difference does it make how fast the code executed? If you left
> >the range checks in, your code would be *marginally* slower, but you'd
> >still have your rocket, now wouldn't you?
>
> Despite the sarcasm, I will elaborate.
>
> Suppose an array goes from 0 to 100, and the calculated index is known
> not to go outside this range. Why would one insist on putting the
> range test in, which will slow down the code? This might be a problem
> if the particular piece of code is heavily used, and the code executes
> too slowly otherwise. "Marginally slower" if it happens only once, but
> such checks on indices and function arguments (like square roots) are
> necessary *everywhere* in code, if one is consistent.

I might agree with the conclusion, but probably not with the argument. If the array is statically typed to go from 0 to 100, and everything that indexes it is statically typed for that range or smaller, most modern Ada compilers won't generate _any_ code for the check.

I still believe the more interesting issue has to do with the _consequences_ of the check.
If your environment doesn't lend itself to a reasonable response to the check (quite possible in fail-operate systems inside systems that move really fast), and you have to test the checks to make sure they don't _create_ a problem, then you've got a hard decision on your hands: suppress the check (which might trigger a compiler bug or some other problems), or leave the check in (which might introduce a problem, or divert your attention away from some other problem). -- LMTAS - "Our Brand Means Quality" ^ permalink raw reply [flat|nested] 58+ messages in thread
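The static-typing point made above -- that matching index types let the compiler omit the check entirely -- can be sketched as follows. Names are illustrative, and whether a given compiler actually elides the check is implementation-dependent:

```ada
--  Because the parameter is already constrained to the array's index
--  range, the indexing operation needs no run-time range check.
procedure Lookup_Demo is
   type Index is range 0 .. 100;
   Table : array (Index) of Float := (others => 0.0);

   function Get (I : Index) return Float is
   begin
      return Table (I);  --  I is statically in 0 .. 100: no check required
   end Get;

   X : Float;
begin
   X := Get (50);   --  any checking happens at the call, where the
                    --  caller's value enters the constrained type
end Lookup_Demo;
```

The trade-off moves, rather than disappears: callers holding an unconstrained Integer still pay one check when converting it to Index, but the check is done once, at the boundary.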
* Re: Ariane 5 failure 1996-09-28 0:00 ` Ken Garlington @ 1996-09-28 0:00 ` Ken Garlington 0 siblings, 0 replies; 58+ messages in thread From: Ken Garlington @ 1996-09-28 0:00 UTC (permalink / raw) From the "There's always time to test it the second time around" department... ORBITAL JUNK: The second Ariane 5 to be launched in April at the earliest will put two dummy satellites, worth less than $3 million, into orbit. The first Ariane 5 exploded in June carrying four uninsured satellites worth $500 million. (Financial Times) I wonder if the test labs at Arianespace, etc. are keeping busy... :) ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-27 0:00 ` Ronald Kunne 1996-09-27 0:00 ` Lawrence Foard 1996-09-28 0:00 ` Ken Garlington @ 1996-09-29 0:00 ` Alan Brain 1996-09-29 0:00 ` Robert A Duff 1996-10-01 0:00 ` Ken Garlington 2 siblings, 2 replies; 58+ messages in thread From: Alan Brain @ 1996-09-29 0:00 UTC (permalink / raw) Ronald Kunne wrote: > Suppose an array goes from 0 to 100, and the calculated index is known > not to go outside this range. Why would one insist on putting the > range test in, which will slow down the code? This might be a problem > if the particular piece of code is heavily used, and the code executes > too slowly otherwise. "Marginally slower" if it happens only once, but > such checks on indices and function arguments (like squareroots), are > necessary *everywhere* in code, if one is consequent. Why insist? 1. Suppressing all checks in Ada-83 makes about a 5% difference in execution speed, in typical real-time and avionics systems. (For example, B2 simulator, CSU-90 sonar, COSYS-200 Combat system). If your hardware budget is this tight, you'd better not have lives at risk, or a lot of money, as technical risk is appallingly high. 2. If you know the range is 0-100, and you get 101, what does this show? a) A bug in the code (99.9999....% probable). b) A hardware fault. c) A soft failure, as in a stray cosmic ray zapping a bit. d) A faulty analysis of your "can't happen" situation. As in re-use, or where your array comes from an IO channel with noise on.... Type a) and d) failures should be caught during testing. Most of them. OK, some of them. Range checking here is a necessary debugging aid. But type b) and c) can happen out in the real world too, and if you don't test for an error early, you often can't recover the situation. Lives or $ lost. Brain's law: "Software Bugs and Hardware Faults are no excuse for the Program not to work". So: it costs peanuts, and may save your hide.
---------------------- <> <> How doth the little Crocodile | Alan & Carmel Brain| xxxxx Improve his shining tail? | Canberra Australia | xxxxxHxHxxxxxx _MMMMMMMMM_MMMMMMMMM ---------------------- o OO*O^^^^O*OO o oo oo oo oo By pulling Maerklin Wagons, in 1/220 Scale ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-29 0:00 ` Alan Brain @ 1996-09-29 0:00 ` Robert A Duff 1996-09-30 0:00 ` Wayne L. Beavers 1996-10-01 0:00 ` Ken Garlington 1 sibling, 1 reply; 58+ messages in thread From: Robert A Duff @ 1996-09-29 0:00 UTC (permalink / raw) In article <324F1157.625C@dynamite.com.au>, Alan Brain <aebrain@dynamite.com.au> wrote: >Brain's law: >"Software Bugs and Hardware Faults are no excuse for the Program not to >work". > >So: it costs peanuts, and may save your hide. This reasoning doesn't sound right to me. The hardware part, I mean. The reason checks-on costs only 5% or so is that compilers aggressively optimize out almost all of the checks. When the compiler proves that a check can't fail, it assumes that the hardware is perfect. So, hardware faults and cosmic rays and so forth are just as likely to destroy the RTS, or cause the program to take a wild jump, or destroy the call stack, or whatever -- as opposed to getting a Constraint_Error and recovering gracefully. After all, the compiler doesn't range-check the return address just before doing a return instruction! - Bob ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-29 0:00 ` Robert A Duff @ 1996-09-30 0:00 ` Wayne L. Beavers 1996-10-01 0:00 ` Ken Garlington 1996-10-03 0:00 ` Richard A. O'Keefe 0 siblings, 2 replies; 58+ messages in thread From: Wayne L. Beavers @ 1996-09-30 0:00 UTC (permalink / raw) I have been reading this thread awhile and one topic that I have not seen mentioned is protecting the code area from damage. When I code in PL/I or any other reentrant language I always make sure that the executable code is executing from read-only storage. There is no way to put the data areas in read-only storage (obviously) but I can't think of any reason to put the executable code in writeable storage. I once had to port 8,000 subroutines in PL/I, 24 megabytes of executable code from one system to another. The single most common error I had to correct was incorrect usage of pointer variables. I caught a lot of them whenever they attempted to accidentally store into the code area. At that point it is trivial to correct the bug. This technique certainly doesn't catch all pointer failures, but it will catch at least some of them. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-30 0:00 ` Wayne L. Beavers @ 1996-10-01 0:00 ` Ken Garlington 1996-10-01 0:00 ` Wayne L. Beavers 1996-10-03 0:00 ` Richard A. O'Keefe 1 sibling, 1 reply; 58+ messages in thread From: Ken Garlington @ 1996-10-01 0:00 UTC (permalink / raw) Wayne L. Beavers wrote: > > I have been reading this thread awhile and one topic that I have not seen mentioned is protecting the code > area from damage. When I code in PL/I or any other reentrant language I always make sure that the executable > code is executing from read-only storage. There is no way to put the data areas in read-only storage > (obviously) but I can't think of any reason to put the executable code in writeable storage. That's actually a pretty common rule of thumb for safety-critical systems. Unfortunately, read-only memory isn't exactly read-only. For example, hardware errors can cause a random change in the memory. So, it's not a perfect fix. > > I once had to port 8,000 subroutines in PL/I, 24 megabytes of executable code from one system to another. The > single most common error I had to correct was incorrect usage of pointer variables. I caught a lot of them > whenever they attempted to accidentally store into the code area. At that point it is trivial to correct the > bug. This technique certainly doesn't catch all pointer failures, but it will catch at least some of them. -- LMTAS - "Our Brand Means Quality" ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-10-01 0:00 ` Ken Garlington @ 1996-10-01 0:00 ` Wayne L. Beavers 1996-10-01 0:00 ` Ken Garlington 0 siblings, 1 reply; 58+ messages in thread From: Wayne L. Beavers @ 1996-10-01 0:00 UTC (permalink / raw) Ken Garlington wrote: > That's actually a pretty common rule of thumb for safety-critical systems. > Unfortunately, read-only memory isn't exactly read-only. For example, hardware errors > can cause a random change in the memory. So, it's not a perfect fix. You're right, but the risk and probability of memory failures is pretty low, I would think. I have never seen or heard of a memory failure in any of the systems that I have worked on. I don't know what the current technology is, but I can remember quite awhile ago that at least one vendor was claiming that ALL double bit memory errors were fully detectable and recoverable, ALL triple bit errors were detectable but only some were correctable. But I also don't work on realtime systems; my experience is with commercial systems. Are you referring to on-board systems for aircraft, where weight and vibration are also a factor, or are you referring to ground-based systems that don't have similar constraints? Does anyone know just how good memory ECC is these days? Wayne L. Beavers wayneb@beyond-software.com Beyond Software, Inc. The Mainframe/Internet Company http://www.beyond-software.com/ ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-10-01 0:00 ` Wayne L. Beavers @ 1996-10-01 0:00 ` Ken Garlington 1996-10-02 0:00 ` Sandy McPherson 0 siblings, 1 reply; 58+ messages in thread From: Ken Garlington @ 1996-10-01 0:00 UTC (permalink / raw) Wayne L. Beavers wrote: > > Ken Garlington wrote: > > > That's actually a pretty common rule of thumb for safety-critical systems. > > > Unfortunately, read-only memory isn't exactly read-only. For example, hardware errors > > > can cause a random change in the memory. So, it's not a perfect fix. > > You're right, but the risk and probability of memory failures is pretty low I would think. I have never seen > > or heard of a memory failure in any of the systems that I have worked on. I don't know what the current > > technology is but I can remember quite awhile ago that at least one vendor was claiming that ALL double bit > > memory errors were fully detectable and recoverable, ALL triple bit errors were detectable but only some were > > correctable. But I also don't work on realtime systems, my experience is with commercial systems. > > > > Are you referring to on-board systems for aircraft where weight and vibration are also a factor or are you > > referring to ground-based systems that don't have similar constraints? On-board systems. The failure _rate_ is usually pretty low, but in a harsh environment you can get quite a few failure _sources_, including mechanical failures (stress fractures, solder loss due to excessive heat, etc.), electrical failures (EMI, lightning), and so forth. You don't have to take out the actual chip, of course: just as bad is a failure in the address or data lines connecting the memory to the CPU. Add a memory management unit to the mix, along with various I/O devices mapped into the memory space, and you can get a whole slew of memory-related failure modes. You can also get into some neat system failures. 
For example, some "read-only" memory actually allows writes to the execution space in certain modes, to allow quick reprogramming. If you have a system failure that allows writes at the wrong time, coupled with a failure that does a write where it shouldn't... ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-10-01 0:00 ` Ken Garlington @ 1996-10-02 0:00 ` Sandy McPherson 0 siblings, 0 replies; 58+ messages in thread From: Sandy McPherson @ 1996-10-02 0:00 UTC (permalink / raw) Ken Garlington wrote: > > Wayne L. Beavers wrote: > > > > Ken Garlington wrote: > > > > > That's actually a pretty common rule of thumb for safety-critical systems. > > > > Unfortunately, read-only memory isn't exactly read-only. For example, hardware errors > > > > can cause a random change in the memory. So, it's not a perfect fix. > > > > You're right, but the risk and probability of memory failures is pretty low I would think. I have never seen > > or heard of a memory failure in any of the systems that I have worked on. I don't know what the current > > technology is but I can remember quite awhile ago that at least one vendor was claiming that ALL double bit > > memory errors were fully detectable and recoverable, ALL triple bit errors were detectable but only some were > > correctable. But I also don't work on realtime systems, my experience is with commercial systems. > > > > Are you referring to on-board systems for aircraft where weight and vibration are also a factor or are you > > referring to ground-based systems that don't have similar constraints? > > On-board systems. The failure _rate_ is usually pretty low, but in a harsh environment > you can get quite a few failure _sources_, including mechanical failures (stress > fractures, solder loss due to excessive heat, etc.), electrical failures (EMI, > lightning), and so forth. You don't have to take out the actual chip, of course: just > as bad is a failure in the address or data lines connecting the memory to the CPU. Add > a memory management unit to the mix, along with various I/O devices mapped into the > memory space, and you can get a whole slew of memory-related failure modes. > > You can also get into some neat system failures. 
For example, some "read-only" memory > actually allows writes to the execution space in certain modes, to allow quick > reprogramming. If you have a system failure that allows writes at the wrong time, > coupled with a failure that does a write where it shouldn't... It depends upon what you mean by a memory failure. I can imagine that the chances of your memory being trashed completely is very very low, and in rad-hardened systems the chances of a single-event-upset (SEU) is also low, but has to be guarded against. I have recently been working on a system where the specified hardware has a parity bit for each octet of memory, so SEUs which flip bit values in the memory can be detected. This parity check is built into the system's micro-code. Similarly, the definition of what is and isn't read-only memory is usually a feature of the processor and/or operating system being used. A compiler cannot put code into read-only areas of memory, unless the processor, its micro-code, and/or o/s are playing ball as well. If you are unfortunate enough to be in this situation (are there any such systems left?), then the only thing you can do is DIY, but the compiler can't help you much, other than the for-use-at clause. I once read an interesting definition of two types of bugs in "Transaction Processing" by Gray & Reuter, Heisenbugs and Bohrbugs. Identification of potential Heisenbugs, estimation of probability of occurrence, impact to the system on occurrence, and appropriate recovery procedures are part of the risk analysis. An SEU is a classic Heisenbug, which IMO is out of scope of compiler checks, because they can result in a valid but incorrect value for a variable and are just as likely to occur in the code section as the data section of your application. A complete memory failure is of course beyond the scope of the compiler. 
IMO an Ada compiler's job (when used properly) is to make sure that syntactic Bohrbugs do not enter a system and all semantic Bohrbugs get detected at runtime (as Bohrbugs, by definition, have a fixed location and are certain to occur under given conditions -- the Ariane 5 bug was definitely a Bohrbug). The compiler cannot do anything about Heisenbugs (because they only have a probability of occurrence). To handle Heisenbugs generally you need to have a detection, reporting and handling mechanism: built using the hardware's error detection, generally accepted software practices (e.g. duplicate storage, process-pairs) and an application-dependent exception handling mechanism. Ada provides the means to trap the error condition once it has been reported, but it does not implement exception handlers for you, other than the default "I'm gone..."; additionally, if the underlying system does not provide the means to detect a probable error, you have to implement the means of detecting the problem and reporting this through the Ada exception handling yourself. -- Sandy McPherson MBCS CEng. tel: +31 71 565 4288 (w) ESTEC/WAS P.O. Box 299 NL-2200AG Noordwijk ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-30 0:00 ` Wayne L. Beavers 1996-10-01 0:00 ` Ken Garlington @ 1996-10-03 0:00 ` Richard A. O'Keefe 1 sibling, 0 replies; 58+ messages in thread From: Richard A. O'Keefe @ 1996-10-03 0:00 UTC (permalink / raw) "Wayne L. Beavers" <wayneb@beyond-software.com> writes: >I have been reading this thread awhile and one topic that I have not >seen mentioned is protecting the code area from damage. I imagine that everyone else has taken this for granted. UNIX compilers have been doing it for years, and so I believe have VMS ones. >When I code in PL/I or any other reentrant language I always make sure >that the executable code is executing from read-only storage. (a) This is not something that the programmer should normally have to be concerned with, it just happens. (b) It cannot always be done. Run-time code generation is a practical and important technique. (Making a page read-only after new code has been written to it is a good idea, of course.) >There is no way to put the data areas in read-only storage (obviously) It may be obvious, but in important cases it isn't true. UNIX (and I believe VMS) compilers have for years had the ability to put _selected_ data in read-only storage. And of course it is perfectly feasible in many operating systems (certainly UNIX and VMS) to write data into a page and then ask the operating system to make that page read-only. >but I can't think of any reason to put the executable code in writeable >storage. Run-time binary translation. Some approaches to relocation. How many reasons do you want? >I one had to port 8,000 subroutines in PL/I, 24 megabytes of executable >code from one system to another. In a language where the last revision of the standard was 1976? You have my deepest sympathy. -- Australian citizen since 14 August 1996. *Now* I can vote the xxxs out! Richard A. O'Keefe; http://www.cs.rmit.edu.au/%7Eok; RMIT Comp.Sci. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-29 0:00 ` Alan Brain 1996-09-29 0:00 ` Robert A Duff @ 1996-10-01 0:00 ` Ken Garlington 1 sibling, 0 replies; 58+ messages in thread From: Ken Garlington @ 1996-10-01 0:00 UTC (permalink / raw) Alan Brain wrote: > > 1. Suppressing all checks in Ada-83 makes about a 5% difference in > execution speed, in typical real-time and avionics systems. (For > example, B2 simulator, CSU-90 sonar, COSYS-200 Combat system). If your > hardware budget is this tight, > you'd better not have lives at risk, or a lot of money, as technical > risk is > appallingly high. Actually, I've seen systems where checks make much more than a 5% difference. For example, in a flight control system, checks done in the redundancy management monitor (comparing many redundant inputs in a tight loop) can easily add 10% or more. I have also seen flight-critical systems where 5% is a big deal, and where you can _not_ add a more powerful processor to fix the problem. Flight control software usually exists in a flight control _system_, with system issues of power, cooling, space, etc. to consider. On a missile, these are important issues. You might consider the technical risk "appallingly high," but the fix for that risk can introduce equally dangerous risks in other areas. > 2. If you know the range is 0-100, and you get 101, what does this show? > a) A bug in the code (99.9999....% probable). b) A hardware fault. c) A > soft failure, as in a stray cosmic ray zapping a bit. d) a faulty > analysis of your "can't happen" situation. As in re-use, or where your > array comes from an IO channel with noise on.... You forgot (e) - a failure in the inputs. The range may be calculated, directly or indirectly, from an input to the system. In practice, at least for the systems I'm familiar with, that's usually where the error came from -- either a connector fell off, or some wiring shorted out, or a bird strike took out half of your sensors. 
I definitely would say that, when we have a failure reported in operation, it's not usually because of a bug in the software for our systems! > Type a) and d) failures should be caught during testing. Most of them. > OK, some of them. Range checking here is a neccessary debugging aid. But > type b) and c) can happen too out in the real world, and if you don't > test for an error early, you often can't recover the situation. Lives or > $ lost. > > Brain's law: > "Software Bugs and Hardware Faults are no excuse for the Program not to > work". Too bad that law can't be enforced :) -- LMTAS - "Our Brand Means Quality" ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-26 0:00 ` Matthew Heaney 1996-09-27 0:00 ` Wayne Hayes 1996-09-27 0:00 ` Ronald Kunne @ 1996-09-28 0:00 ` Ken Garlington 2 siblings, 0 replies; 58+ messages in thread From: Ken Garlington @ 1996-09-28 0:00 UTC (permalink / raw) Matthew Heaney wrote: > ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-26 0:00 ` Ronald Kunne 1996-09-26 0:00 ` Matthew Heaney @ 1996-09-27 0:00 ` Ken Garlington 1996-09-27 0:00 ` Alan Brain 1996-09-29 0:00 ` Louis K. Scheffer 3 siblings, 0 replies; 58+ messages in thread From: Ken Garlington @ 1996-09-27 0:00 UTC (permalink / raw) Ronald Kunne wrote: > > In article <52bm1c$gvn@rational.rational.com> > rlk@rational.com (Bob Kitzberger) writes: > > >Ada _has_ range checks built into the language. They were explicitly > >disabled in this case. > > The problem of constructing bug-free real-time software seems to me > a trade-off between safety and speed of execution (and maybe available > memory?). In other words: including tests on array boundaries might > make the code saver, but also slower. Particularly for fail-operate systems that must continue to function in harsh environments, memory and throughput can be tight. This usually happens because the system must continue to operate on emergency power and/or cooling. At least until recently, the processing systems that had lots of memory and CPU power also had larger power and cooling requirements, so they couldn't always be used in this class of systems. (That's changing, somewhat.) So, the tradeoff you describe can occur. The trade-off I find even more interesting is the safety gained from adding extra features vs. the safety _lost_ by adding those features. Every time you add a check, whether it's an explicit check or one automatically generated by the compiler, you have to have some way to gain confidence that the check will not only work, but won't create some side-effect that causes a different problem. The effort expended to get confidence for that additional feature is effort that can't be spent gaining assurance of other features in the system, assuming finite resources. There is no magic formula I've ever seen to make that trade-off - ultimately, it's human judgement. 
-- LMTAS - "Our Brand Means Quality" ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-26 0:00 ` Ronald Kunne 1996-09-26 0:00 ` Matthew Heaney 1996-09-27 0:00 ` Ken Garlington @ 1996-09-27 0:00 ` Alan Brain 1996-09-28 0:00 ` Ken Garlington 1996-09-29 0:00 ` Louis K. Scheffer 3 siblings, 1 reply; 58+ messages in thread From: Alan Brain @ 1996-09-27 0:00 UTC (permalink / raw) Ronald Kunne wrote: > The problem of constructing bug-free real-time software seems to me > a trade-off between safety and speed of execution (and maybe available > memory?). In other words: including tests on array boundaries might > make the code saver, but also slower. > > Comments? Bug-free software is not a reasonable criterion for success in a safety-critical system, IMHO. A good program should meet the requirements for safety etc despite bugs. Also despite hardware failures, soft failures, and so on. A really good safety-critical program should be remarkably difficult to de-bug, as the only way you know it's got a major problem is by examining the error log, and calculating that its performance is below theoretical expectations. And if it runs too slow, many times in the real world you can spend 2 years of development time and many megabucks kludging the software, or wait 12 months and get the new 400 MHz chip instead of your current 133. ---------------------- <> <> How doth the little Crocodile | Alan & Carmel Brain| xxxxx Improve his shining tail? | Canberra Australia | xxxxxHxHxxxxxx _MMMMMMMMM_MMMMMMMMM ---------------------- o OO*O^^^^O*OO o oo oo oo oo By pulling Maerklin Wagons, in 1/220 Scale ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-27 0:00 ` Alan Brain @ 1996-09-28 0:00 ` Ken Garlington 0 siblings, 0 replies; 58+ messages in thread From: Ken Garlington @ 1996-09-28 0:00 UTC (permalink / raw) Alan Brain wrote: > > Ronald Kunne wrote: > > > The problem of constructing bug-free real-time software seems to me > > a trade-off between safety and speed of execution (and maybe available > > memory?). In other words: including tests on array boundaries might > > make the code saver, but also slower. > > > > Comments? > > Bug-free software is not a reasonable criterion for success in a > safety-critical system, IMHO. A good program should meet the > requirements for safety etc despite bugs. An OK statement for a fail-safe system. How do you propose to implement this theory for a fail-operate system, particularly if there are system constraints on weight, etc. that preclude hardware backups? > Also despite hardware > failures, soft failures, and so on. A system which will always meet its requirements despite any combination of failures is in the same regime as the perpetual motion system. If you build one, you'll probably make a lot of money, so go to it! > A really good safety-critical > program should be remarkably difficult to de-bug, as the only way you > know it's got a major problem is by examining the error log, and > calculating that it's performance is below theoretical expectations. > And if it runs too slow, many times in the real-world you can spend 2 > years of development time and many megabucks kludging the software, or > wait 12 months and get the new 400 Mhz chip instead of your current 133. I really need to change jobs. It sounds so much simpler to build software for ground-based PCs, where you don't have to worry about the weight, power requirements, heat dissipation, physical size, vulnerability to EMI/radiation/salt fog/temperature/etc. of your system. -- LMTAS - "Our Brand Means Quality" ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-26 0:00 ` Ronald Kunne ` (2 preceding siblings ...) 1996-09-27 0:00 ` Alan Brain @ 1996-09-29 0:00 ` Louis K. Scheffer 3 siblings, 0 replies; 58+ messages in thread From: Louis K. Scheffer @ 1996-09-29 0:00 UTC (permalink / raw) KUNNE@frcpn11.in2p3.fr (Ronald Kunne) writes: >The problem of constructing bug-free real-time software seems to me >a trade-off between safety and speed of execution (and maybe available >memory?). In other words: including tests on array boundaries might >make the code saver, but also slower. > >Comments? True in this case, but not in the way you might expect. The software group decided that they wanted the guidance computers to be no more than 80 percent busy. Range checking ALL the variables took too much time, so they analyzed the situation and only checked those that might overflow. In the Ariane 4, this particular variable could not overflow unless the trajectory was wildly off, so they left out the range checking. I think you could make a good case for range checking in the Ariane software making it less safe, rather than more safe. The only reason they check for overflow is to find hardware errors - since the software is designed to not overflow, any overflow must be because of a hardware problem, so if any processor detects an overflow it shuts down. So on the one hand, each additional range check increases the odds of catching a hardware error before it does damage, but increases the odds that a processor shuts down while it could still be delivering useful data. (Say the overflow occurs while computing unimportant results, as on the Ariane 5.) Given the relative odds of hardware and software errors, it's not at all obvious to me that range checking helps at all in this case! The real problem is that they did not re-examine this software for the Ariane 5. If they had either simulated it or examined it closely, they would probably have found this problem. 
-Lou Scheffer ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-25 0:00 ` Ariane 5 failure @@ robin 1996-09-25 0:00 ` Bob Kitzberger @ 1996-09-25 0:00 ` Michel OLAGNON 1996-09-25 0:00 ` Byron Kauffman 1996-09-25 0:00 ` Chris Morgan 1996-09-27 0:00 ` John McCabe 2 siblings, 2 replies; 58+ messages in thread From: Michel OLAGNON @ 1996-09-25 0:00 UTC (permalink / raw) In article <52a572$9kk@goanna.cs.rmit.edu.au>, rav@goanna.cs.rmit.edu.au (@@ robin) writes: >[reports of Ariane and STS-2 bugs deleted] > > >In both cases, the programing error could have been detected >with a simple test, but in both cases, no test was included. > >One would have thought that having had one failure (at least) >for integer out-of-range, that the implementors of the software >for Ariane 5 would have been extra careful in ensuring that >all data conversions were within range -- since any kind >of interrupt would result in destruction of the spacecraft. > Maybe the main reason for the lack of testing and care was that the conversion exception could only occur after lift-off, and that that particular piece of program was of no use after lift-off. It was only kept running for 50 s in order to speed up countdown restart in case of an interruption between H0-9 and H0-5. Conclusion: Never compute values that are of no use when you can avoid it! >There's a case for a review of the programming language used. Michel -- | Michel OLAGNON email : Michel.Olagnon@ifremer.fr| | IFREMER: Institut Francais de Recherches pour l'Exploitation de la Mer| ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-25 0:00 ` Michel OLAGNON @ 1996-09-25 0:00 ` Byron Kauffman 1996-09-25 0:00 ` A. Grant 1996-09-25 0:00 ` Chris Morgan 1 sibling, 1 reply; 58+ messages in thread From: Byron Kauffman @ 1996-09-25 0:00 UTC (permalink / raw) Michel OLAGNON wrote: > > May be the main reason for the lack of testing and care was > that the conversion exception could only occur after lift off, > and that that particular piece of program was of no use after > lift off. It was only kept running for 50 s in order to > speed up countdown restart in case of an interruption between > H0-9 and H0-5. > > Conclusion: Never compute values that are of no use when you can > avoid it ! > > >There's a case for a review of the programming language used. > > Michel > -- > | Michel OLAGNON email : Michel.Olagnon@ifremer.fr| > | IFREMER: Institut Francais de Recherches pour l'Exploitation de la Mer| Of course, Michel, you've got a great point, but let me give you some advice, assuming you haven't read this thread for the last few months (seems like years). Robin's whole point is that he firmly believes that the problem would not have occurred if PL/I had been used instead of Ada. Several EXTREMELY competent and experienced engineers who actually have written flight-control software have patiently, and in some cases (though I can't blame them) impatiently attempted to explain the situation - that this was a bad design/management decision combined with a fatal oversight in testing - to this poor student, but alas, to no avail. My advice, Michel - blow it off and don't let ++robin (or is it @@robin?) get to you, because "++robin" is actually an alias for John Cleese. He's gathering material for a sequel to "The Argument Sketch"... :-) ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-25 0:00 ` Byron Kauffman @ 1996-09-25 0:00 ` A. Grant 1996-09-25 0:00 ` Ken Garlington ` (2 more replies) 0 siblings, 3 replies; 58+ messages in thread From: A. Grant @ 1996-09-25 0:00 UTC (permalink / raw) In article <32492E5C.562@lmtas.lmco.com> Byron Kauffman <KauffmanBB@lmtas.lmco.com> writes: >Several EXTREMELY competent and experienced engineers who actually have >written flight-control software have patiently, and in some cases >(though I can't blame them) impatiently attempted to explain the >situation - that this was a bad design/management decision combined with >a fatal oversight in testing - to this poor student, but alas, to no >avail. Robin is not a student. He is a senior lecturer at the Royal Melbourne Institute of Technology, a highly reputable institution. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-25 0:00 ` A. Grant @ 1996-09-25 0:00 ` Ken Garlington 1996-09-26 0:00 ` Byron Kauffman 1996-09-26 0:00 ` Sandy McPherson 2 siblings, 0 replies; 58+ messages in thread From: Ken Garlington @ 1996-09-25 0:00 UTC (permalink / raw) A. Grant wrote: > Robin is not a student. He is a senior lecturer at the Royal > Melbourne Institute of Technology, a highly reputable institution. When it comes to building embedded safety-critical systems, trust me: He's a student! -- LMTAS - "Our Brand Means Quality" ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-25 0:00 ` A. Grant 1996-09-25 0:00 ` Ken Garlington @ 1996-09-26 0:00 ` Byron Kauffman 1996-09-27 0:00 ` A. Grant 1996-09-26 0:00 ` Sandy McPherson 2 siblings, 1 reply; 58+ messages in thread From: Byron Kauffman @ 1996-09-26 0:00 UTC (permalink / raw) A. Grant wrote: > > In article <32492E5C.562@lmtas.lmco.com> Byron Kauffman <KauffmanBB@lmtas.lmco.com> writes: > >Several EXTREMELY competent and experienced engineers who actually have > >written flight-control software have patiently, and in some cases > >(though I can't blame them) impatiently attempted to explain the > >situation - that this was a bad design/management decision combined with > >a fatal oversight in testing - to this poor student, but alas, to no > >avail. > > Robin is not a student. He is a senior lecturer at the Royal > Melbourne Institute of Technology, a highly reputable institution. A. - Thank you for confirming my long-held theory that those who inhabit the ivory towers of engineering/CS academia should spend 2 of every 5 years working at a real job out in the real world. My intent is not to slam professors who are in touch with reality, of course (e.g., Feldman, Dewar, et al), but the idealistic theoretical side often is a far cry from the practical, just-get-it-done world we have to deal with once we're out of school. I just KNOW there's a good Dilbert strip here somewhere... ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-26 0:00 ` Byron Kauffman @ 1996-09-27 0:00 ` A. Grant 0 siblings, 0 replies; 58+ messages in thread From: A. Grant @ 1996-09-27 0:00 UTC (permalink / raw) In article <324A7C1C.6718@lmtas.lmco.com> Byron Kauffman <KauffmanBB@lmtas.lmco.com> writes: >A. Grant wrote: >> Robin is not a student. He is a senior lecturer at the Royal >> Melbourne Institute of Technology, a highly reputable institution. >Thank you for confirming my long-held theory that those who inhabit the >ivory towers of engineering/CS academia should spend 2 of every 5 years >working at a real job out in the real world. My intent is not to slam >professors who are in touch with reality, of course (e.g., Feldman, >Dewar, et al), but the idealistic theoretical side often is a far cry >from the practical, just-get-it-done world we have to deal with once >we're out of school. You're being a bit hard on theoretical computer scientists here. Just because it's called computer science doesn't mean it has to be able to instantly make money on real computers. And the Ariane 5 failure was due to pragmatism (reusing old stuff to save money) not idealism (applying theoretical proofs of correctness). But in any case RMIT is noted for its involvement with industry. (I used to work for a start-up company out of RMIT premises.) If PL/I is being pushed by RMIT it's probably because the DP managers in Collins St. want it. Australia doesn't have much call for aerospace systems. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-25 0:00 ` A. Grant 1996-09-25 0:00 ` Ken Garlington 1996-09-26 0:00 ` Byron Kauffman @ 1996-09-26 0:00 ` Sandy McPherson 2 siblings, 0 replies; 58+ messages in thread From: Sandy McPherson @ 1996-09-26 0:00 UTC (permalink / raw) A. Grant wrote: > > Robin is not a student. He is a senior lecturer at the Royal > Melbourne Institute of Technology, a highly reputable institution. Why doesn't he wise up and act like one then? I don't know the man, and I suspect he has been winding everybody up just for a laugh. But, if this is not the case, the thought of such a closed mind teaching students is quite horrific. "Use PL/I mate, you'll be tucker", -- Sandy McPherson MBCS CEng. tel: +31 71 565 4288 (w) ESTEC/WAS P.O. Box 299 NL-2200AG Noordwijk ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-25 0:00 ` Michel OLAGNON 1996-09-25 0:00 ` Byron Kauffman @ 1996-09-25 0:00 ` Chris Morgan 1 sibling, 0 replies; 58+ messages in thread From: Chris Morgan @ 1996-09-25 0:00 UTC (permalink / raw) In article <ag129.804.0011F709@ucs.cam.ac.uk> ag129@ucs.cam.ac.uk (A. Grant) writes: Robin is not a student. He is a senior lecturer at the Royal Melbourne Institute of Technology, a highly reputable institution. I'm tempted to say "not so reputable to readers of this newsgroup" after the ridiculous statements made by Robin w.r.t. Ariane 5 but Richard A. O'Keefe's regular excellent postings more than balance them out. Chris -- -- Chris Morgan |email cm@mihalis.demon.co.uk (home) http://www.mihalis.demon.co.uk/ | or chris.morgan@baesema.co.uk (work) ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-25 0:00 ` Ariane 5 failure @@ robin 1996-09-25 0:00 ` Bob Kitzberger 1996-09-25 0:00 ` Michel OLAGNON @ 1996-09-27 0:00 ` John McCabe 1996-10-01 0:00 ` Michael Dworetsky 1996-10-04 0:00 ` @@ robin 2 siblings, 2 replies; 58+ messages in thread From: John McCabe @ 1996-09-27 0:00 UTC (permalink / raw) rav@goanna.cs.rmit.edu.au (@@ robin) wrote: <..snip..> Just a point for your information. From clari.tw.space: "An inquiry board investigating the explosion concluded in July that the failure was caused by software design errors in a guidance system." Note software DESIGN errors - not programming errors. Best Regards John McCabe <john@assen.demon.co.uk> ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-27 0:00 ` John McCabe 1996-10-01 0:00 ` Michael Dworetsky @ 1996-10-04 0:00 ` Steve Bell 1996-10-04 0:00 ` @@ robin 1 sibling, 1 reply; 58+ messages in thread From: Michael Dworetsky @ 1996-10-01 0:00 UTC (permalink / raw) In article <843845039.4461.0@assen.demon.co.uk> john@assen.demon.co.uk (John McCabe) writes: >rav@goanna.cs.rmit.edu.au (@@ robin) wrote: > ><..snip..> > >Just a point for your information. From clari.tw.space: > > "An inquiry board investigating the explosion concluded in >July that the failure was caused by software design errors in a >guidance system." > >Note software DESIGN errors - not programming errors. > Indeed, the problems were in the specifications given to the programmers, not in the coding activity itself. They wrote exactly what they were asked to write, as far as I could see from reading the report summary. The problem was caused by using software developed for Ariane 4's flight characteristics, which were different from those of Ariane 5. When the launch vehicle exceeded the boundary parameters of the Ariane-4 software, it sent an error message and, as specified by the remit given to programmers, a critical guidance system shut down in mid-flight. Ka-boom. -- Mike Dworetsky, Department of Physics | Haiku: Nine men ogle gnats & Astronomy, University College London | all lit Gower Street, London WC1E 6BT UK | till last angel gone. email: mmd@star.ucl.ac.uk | Men in Ukiah. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-10-01 0:00 ` Michael Dworetsky @ 1996-10-04 0:00 ` Steve Bell 1996-10-07 0:00 ` Ken Garlington 1996-10-09 0:00 ` @@ robin 0 siblings, 2 replies; 58+ messages in thread From: Steve Bell @ 1996-10-04 0:00 UTC (permalink / raw) Michael Dworetsky wrote: > > >Just a point for your information. From clari.tw.space: > > > > "An inquiry board investigating the explosion concluded in > >July that the failure was caused by software design errors in a > >guidance system." > > > >Note software DESIGN errors - not programming errors. > > > > Indeed, the problems were in the specifications given to the programmers, > not in the coding activity itself. They wrote exactly what they were > asked to write, as far as I could see from reading the report summary. > > The problem was caused by using software developed for Ariane 4's flight > characteristics, which were different from those of Ariane 5. When the > launch vehicle exceeded the boundary parameters of the Ariane-4 software, > it sent an error message and, as specified by the remit given to > programmers, a critical guidance system shut down in mid-flight. Ka-boom. > I work for an aerospace company, and we received a fairly detailed accounting of what went wrong with the Ariane 5. Launch vehicles, while they are sitting on the launch pad, run a guidance program that updates their position and velocity in reference to a coordinate frame whose origin is at the center of the earth (usually called an Earth-Centered-Inertial (ECI) frame). This program is usually started up from 1 to 3-4 hours before launch and is allowed to run all the way until liftoff, so that the rocket will know where it's at and how fast it's going at liftoff. Although called "ground software" (because it runs while the rocket is on the ground), it resides inside the rocket's guidance computer(s), and for the Titan family of launch vehicles, the code is exited at t=0 (liftoff).
This code is designed with the knowledge that the rocket is rotating on the surface of the earth, and the algorithms expect only very mild accelerations (as compared to when the rocket hauls ass off the pad at liftoff). Well, the French do things a little differently (but probably now they don't). The Ariane 4 and the first Ariane 5 allow(ed) this program to keep running for 40 secs past liftoff. They do (did) this in case there are any unanticipated holds in the countdown right close to liftoff. In this way, this position and velocity updating code would *not* have to be reset if they could get off the ground within just a few seconds of nominal. Well, it appears that the Ariane 5 really hauls ass off the pad, because at about 30 secs, it was pulling some accelerations that caused floating point overflows in the still functioning ground software. The actual flight software (which was also running, naturally) was computing the positions and velocities that were being used to actually fly the rocket, and it was doing just fine - no overflow errors there because it was designed to expect high accelerations. There are two flight computers on the Ariane 5 - a primary and a backup - and each was designed to shut down if an error such as a floating point overflow occurred, thinking that the other one would take over. Both computers were running the ground software, and both experienced the floating point errors. Actually, the primary went belly-up first, and then the backup within a fraction of a second later. With no functioning guidance computer on board, well, ka-boom as you say. Apparently the Ariane 4 gets off the ground with smaller accelerations than the 5, and this never happened with a 4. You might take note that this would never happen with a Titan because we don't execute this ground software after liftoff.
Even if we did, we would have caught the floating point overflows way before launch because we run all code in what's called "Real-Time Simulations" where actual flight hardware and software are subjected to any and all known physical conditions. This was another finding of the investigation board - apparently the French don't do enough of this type of testing because it's real expensive. Oh well, they probably do now! -- Clear skies, Steve Bell sb635@delphi.com http://people.delphi.com/sb635 - Astrophoto page ^ permalink raw reply [flat|nested] 58+ messages in thread
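Bell's argument for real-time simulation can be made concrete with a small sketch: drive the alignment code's conversion step with a full trajectory profile and see whether it ever traps. This is an illustration only; the profiles and numbers below are invented, with Python standing in for the flight code, and only the 16-bit bounds are real.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def convert_bh(bh: float) -> int:
    """Conversion step of the alignment code (sketch): traps when the
    value exceeds 16-bit signed range, as the real Operand Error did."""
    i = int(bh)
    if not INT16_MIN <= i <= INT16_MAX:
        raise OverflowError(f"BH={bh} exceeds 16-bit signed range")
    return i

def fly(profile):
    """Run the conversion over a per-second trajectory profile, the way
    a closed-loop simulation campaign exercises flight code; return the
    first failing second, or None if the whole profile passes."""
    for t, bh in enumerate(profile):
        try:
            convert_bh(bh)
        except OverflowError:
            return t
    return None

# Invented profiles: a gentle Ariane 4-like ascent stays in range,
# a steeper Ariane 5-like ascent leaves it about half a minute in.
ariane4_like = [500.0 * t for t in range(40)]
ariane5_like = [1500.0 * t for t in range(40)]

assert fly(ariane4_like) is None   # 40 s of alignment code, no trap
assert fly(ariane5_like) == 22     # trips the 16-bit limit at t = 22 s
```

Sweeping the real vehicle's envelope through the reused code in simulation is exactly the kind of test the inquiry board found missing.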
* Re: Ariane 5 failure 1996-10-04 0:00 ` Steve Bell @ 1996-10-07 0:00 ` Ken Garlington 1996-10-09 0:00 ` @@ robin 1 sibling, 0 replies; 58+ messages in thread From: Ken Garlington @ 1996-10-07 0:00 UTC (permalink / raw) Steve Bell wrote: > Well, the French do things a little differently (but probably now they don't). The > Ariane 4 and the first Ariane 5 allow(ed) this program to keep running for 40 secs > past liftoff. They do (did) this in case there are any unanticipated holds in the > countdown right close to liftoff. In this way, this position and velocity updating > code would *not* have to be reset if they could get off the ground within just a few > seconds of nominal. But why 40 seconds? Why not 1 second (or one millisecond, for that matter)? > You might take note that this would never happen with a > Titan because we don't execute this ground software after liftoff. Even if we did, we > would have caught the floating point overflows way before launch because we run all > code in what's called "Real-Time Simulations" where actual flight harware and software > are subjected to any and all known physical conditions. This was another finding of > the investigation board - apparently the French don't do enough of this type of > testing because it's real expensive. Going way back into my history, I believe this is also true for Atlas. > -- > Clear skies, > Steve Bell > sb635@delphi.com > http://people.delphi.com/sb635 - Astrophoto page -- LMTAS - "Our Brand Means Quality" For more info, see http://www.lmtas.com or http://www.lmco.com ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-10-04 0:00 ` Steve Bell 1996-10-07 0:00 ` Ken Garlington @ 1996-10-09 0:00 ` @@ robin 1996-10-09 0:00 ` Steve O'Neill 1 sibling, 1 reply; 58+ messages in thread From: @@ robin @ 1996-10-09 0:00 UTC (permalink / raw) Steve Bell <sb635@delphi.com> writes: >Michael Dworetsky wrote: >> >> >Just a point for your information. From clari.tw.space: >> > >> > "An inquiry board investigating the explosion concluded in >> >July that the failure was caused by software design errors in a >> >guidance system." >> > >> >Note software DESIGN errors - not programming errors. >> > >> >> Indeed, the problems were in the specifications given to the programmers, >> not in the coding activity itself. They wrote exactly what they were >> asked to write, as far as I could see from reading the report summary. >> >> The problem was caused by using software developed for Ariane 4's flight >> characteristics, which were different from those of Ariane 5. When the >> launch vehicle exceeded the boundary parameters of the Ariane-4 software, >> it send an error message and, as specified by the remit given to >> programmers, a critical guidance system shut down in mid-flight. Ka-boom. >> >I work for an aerospace company, and we recieved a fairly detailed accounting of what >went wrong with the Ariane 5. Launch vehicles, while they are sitting on the launch >pad, run a guidance program that updates their position and velocity in reference to >an coordinate frame whose origin is at the center of the earth (usually called an >Earth-Centered-Inertial (ECI) frame). This program is usually started up from 1 to 3-4 >hours before launch and is allowed to run all the way until liftoff, so that the >rocket will know where it's at and how fast it's going at liftoff. 
Although called >"ground software," (because it runs while the rocket is on the ground), it resides >inside the rocket's guidance computer(s), and for the Titan family of launch vehicles, >the code is exited at t=0 (liftoff). This code is designed with knowing that the >rocket is rotating on the surface of the earth, and the algorithms expect only very >mild accelerations (as compared to when the rocket hauls ass off the pad at liftoff). >Well, the French do things a little differently (but probably now they don't). The >Ariane 4 and the first Ariane 5 allow(ed) this program to keep running for 40 secs >past liftoff. They do (did) this in case there are any unanticipated holds in the >countdown right close to liftoff. In this way, this position and velocity updating >code would *not* have to be reset if they could get off the ground within just a few >seconds of nominal. Well, it appears that the Ariane 5 really hauls ass off the pad, >because at about 30 secs, it was pulling some accelerations that caused floating point >overflows ---Definitely not. No floating-point overflow occurred. In Ariane 5, the overflow occurred on converting a double-precision (some 56 bits?) floating-point to a 16-bit integer (15 significant bits). That's why it was so important to have a check that the conversion couldn't overflow! in the still functioning ground software. The actual flight software (which >was also running, naturally) was computing the positions and velocities that were >being used to actually fly the rocket, and it was doing just fine - no overflow errors >there because it was designed to expect high accelerations. There are two flight >computers on the Ariane 5 - a primary and a backup - and each was designed to shut >down if an error such as a floating point overflow occurred, ---Again, not at all. It was designed to shut down if any interrupt occurred. It wasn't intended to be shut down for such a routine thing as a conversion of floating-point to integer.
thinking that the other >one would take over. Both computers were running the ground software, and both >experienced the floating point errors. ---No, the backup SRI experienced the programming error (UNCHECKED CONVERSION from floating-point to integer) first, and shut itself down, then the active SRI computer experienced the same programming error, then it shut itself down. Actually, the primary went belly-up first, and >then the backup within a fraction of a second later. With no functioning guidance >computer on board, well, ka-boom as you say. ^ permalink raw reply [flat|nested] 58+ messages in thread
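robin's correction is worth pinning down: the 64-bit floating-point value itself never overflowed; the overflow happened when that value was converted to a 16-bit signed integer. A minimal Python sketch of the missing range check (a simulation of the conversion semantics, not the actual SRI code):

```python
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16(x: float) -> int:
    """Range-checked float-to-int16 conversion: the protection that the
    horizontal-bias (BH) conversion lacked."""
    i = int(x)  # the 64-bit float is nowhere near its own overflow limit
    if not INT16_MIN <= i <= INT16_MAX:
        raise OverflowError(f"{x} does not fit a 16-bit signed integer")
    return i

assert to_int16(20000.0) == 20000   # in range: converts fine

# Out of range: the unprotected Ada conversion raised the hardware
# Operand Error here; unhandled, it stopped both SRI computers.
try:
    to_int16(65536.0)
except OverflowError as e:
    print("trapped:", e)
```

The check costs a comparison and a branch; the thread's dispute is over whether omitting it was a programming error or a design decision, not over what the check would look like.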
* Re: Ariane 5 failure 1996-10-09 0:00 ` @@ robin @ 1996-10-09 0:00 ` Steve O'Neill 1996-10-12 0:00 ` Alan Brain 0 siblings, 1 reply; 58+ messages in thread From: Steve O'Neill @ 1996-10-09 0:00 UTC (permalink / raw) @@ robin wrote: > ---Definitely not. No floating-point overflow occurred. In > Ariane 5, the overflow occurred on converting a double-precision > (some 56 bits?) floating-point to a 16-bit integer (15 > significant bits). > > That's why it was so important to have a check that the > conversion couldn't overflow! > Agreed. Yes, the basic reason for the destruction of a billion-dollar vehicle was for want of a couple of lines of code. But it reflects a systemic problem much more damaging than what language was used. I would have expected that in a mission/safety critical application the proper checks would have been implemented, no matter what. And in a 'belts-and-suspenders' mode I would also expect an exception handler to take care of unforeseen possibilities at the lowest possible level and raise things to a higher level only when absolutely necessary. Had these precautions been taken there would probably be lots of entries in an error log but the satellites would now be orbiting. As outsiders we can only second guess as to why this approach was not taken but the review board implies that 1) the SRI software developers had an 80% max utilization requirement and 2) careful consideration (including faulty assumptions) was used in deciding what to protect and not protect. >It was designed to shut down if any interrupt occurred. It wasn't ^^^^^^^^^ exception, actually >intended to be shut down for a routine thing as a conversion of >floating-point to integer. This was based on the (faulty) system-wide assumption that any exception was the result of a random hardware failure. This is related to the other faulty assumption that "software should be considered correct until it is proven to be at fault". But that's what the specification said.
> ---No, the backup SRI experienced the programming error (UNCHECKED > CONVERSION from floating-point to integer) first, and shut itself > down, then the active SRI computer experienced the same programming > error, then it shut itself down. Yes, according to the report the backup died first (by 0.05 seconds). Probably not as a result of an unchecked_conversion though - the source and target are of different sizes, which would not be allowed. Most likely just a conversion of a float to a sixteen-bit integer. This would have raised a Constraint_Error (or Operand_Error in this environment). This error could have been handled within the context of this procedure (and the mission continued) but obviously was not. Instead it appears to have been propagated to a global exception handler which performed the specified actions admirably. Unfortunately these included committing suicide and, in doing so, dooming the mission. -- Steve O'Neill | "No,no,no, don't tug on that! Sanders, A Lockheed Martin Company | You never know what it might smoneill@sanders.lockheed.com | be attached to." (603) 885-8774 fax: (603) 885-4071| Buckaroo Banzai ^ permalink raw reply [flat|nested] 58+ messages in thread
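The local handling O'Neill describes, catching the exception at the lowest level and logging it rather than letting it propagate to a fatal global handler, can be sketched as follows. The saturating fallback is one plausible recovery policy picked for illustration, not something the Ariane specification called for:

```python
import logging

INT16_MIN, INT16_MAX = -32768, 32767
log = logging.getLogger("sri")

def to_int16_saturating(x: float) -> int:
    """Convert to 16-bit signed range, clamping out-of-range values and
    logging the anomaly instead of letting the exception escape."""
    i = int(x)
    if i > INT16_MAX:
        log.error("value %s above 16-bit range; clamping", x)
        return INT16_MAX
    if i < INT16_MIN:
        log.error("value %s below 16-bit range; clamping", x)
        return INT16_MIN
    return i

assert to_int16_saturating(1e6) == 32767     # entry in the error log,
assert to_int16_saturating(-1e6) == -32768   # but the mission continues
assert to_int16_saturating(123.4) == 123
```

With a policy like this the worst case is a degraded value plus a log entry; whether clamping is actually safe for a given variable is exactly the kind of judgement the thread says was made at "various contractual levels".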
* Re: Ariane 5 failure 1996-10-09 0:00 ` Steve O'Neill @ 1996-10-12 0:00 ` Alan Brain 0 siblings, 0 replies; 58+ messages in thread From: Alan Brain @ 1996-10-12 0:00 UTC (permalink / raw) Steve O'Neill wrote: > I would have expected that in a mission/safety critical application > the proper checks would have been implemented, no matter what. And in a > 'belts-and-suspenders' mode I would also expect an exception handler to > take care of unforeseen possibilities at the lowest possible level and > raise things to a higher level only when absolutely necessary. Had these > precautions been taken there would probably be lots of entries in an > error log but the satellites would now be orbiting. Concur completely. This should be Standard Operating Procedure, a matter of habit. Frankly, it's just good engineering practice. But it is honoured more in the breach than the observance, it seems, because.... > As outsiders we can only second guess as to why this approach was not > taken but the review board implies that 1) the SRI software developers > had an 80% max utilization requirement and 2) careful consideration > (including faulty assumptions) was used in deciding what to protect and > not protect. ... as some very reputable people, working for very reputable firms, have tried to pound into my thick skull, they are used to working with 15%, no more, tolerances. And with diamond-grade Hard Real Time slices, where any over-run, no matter how slight, means disaster. In this case, Formal Proof and strict attention to the number of CPU cycles in all possible paths seems the only way to go. But this leaves you so open to error in all but the simplest, most trivial tasks (just the race analysis would be nightmarish) that these slices had better be a very small part of the task, or the task itself must be very simple indeed. Either way, not having much bearing on the vast majority of problems I've encountered.
If the tasks are not simple....then can I please ask the firms concerned to tell me which aircraft their software is on, so I can take appropriate action? ---------------------- <> <> How doth the little Crocodile | Alan & Carmel Brain| xxxxx Improve his shining tail? | Canberra Australia | xxxxxHxHxxxxxx _MMMMMMMMM_MMMMMMMMM ---------------------- o OO*O^^^^O*OO o oo oo oo oo By pulling Maerklin Wagons, in 1/220 Scale ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-09-27 0:00 ` John McCabe 1996-10-01 0:00 ` Michael Dworetsky @ 1996-10-04 0:00 ` @@ robin 1996-10-04 0:00 ` Michel OLAGNON ` (2 more replies) 1 sibling, 3 replies; 58+ messages in thread From: @@ robin @ 1996-10-04 0:00 UTC (permalink / raw) john@assen.demon.co.uk (John McCabe) writes: >Just a point for your information. From clari.tw.space: > "An inquiry board investigating the explosion concluded in >July that the failure was caused by software design errors in a >guidance system." >Note software DESIGN errors - not programming errors. >Best Regards >John McCabe <john@assen.demon.co.uk> ---If you read the Report, you'll see that that's not the case. This is what the report says: "* The internal SRI software exception was caused during execution of a data conversion from 64-bit floating point to 16-bit signed integer value. The floating point number which was converted had a value greater than what could be represented by a 16-bit signed integer. This resulted in an Operand Error. The data conversion instructions (in Ada code) were not protected from causing an Operand Error, although other conversions of comparable variables in the same place in the code were protected. "In the failure scenario, the primary technical causes are the Operand Error when converting the horizontal bias variable BH, and the lack of protection of this conversion which caused the SRI computer to stop." ---As you can see, it's clearly a programming error. It's a failure to check for overflow on converting a double precision value to a 16-bit integer. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-10-04 0:00 ` @@ robin @ 1996-10-04 0:00 ` Michel OLAGNON 1996-10-09 0:00 ` @@ robin 1996-10-04 0:00 ` Joseph C Williams 1996-10-17 0:00 ` Ralf Tilch 2 siblings, 1 reply; 58+ messages in thread From: Michel OLAGNON @ 1996-10-04 0:00 UTC (permalink / raw) In article <532k32$r4r@goanna.cs.rmit.edu.au>, rav@goanna.cs.rmit.edu.au (@@ robin) writes: > john@assen.demon.co.uk (John McCabe) writes: > > >Just a point for your information. From clari.tw.space: > > > "An inquiry board investigating the explosion concluded in > >July that the failure was caused by software design errors in a > >guidance system." > > >Note software DESIGN errors - not programming errors. > > >Best Regards > >John McCabe <john@assen.demon.co.uk> > >---If you read the Report, you'll see that that's not the case. >This is what the report says: > > "* The internal SRI software exception was caused during execution of a > data conversion from 64-bit floating point to 16-bit signed integer > value. The floating point number which was converted had a value > greater than what could be represented by a 16-bit signed integer. > This resulted in an Operand Error. The data conversion instructions > (in Ada code) were not protected from causing an Operand Error, > although other conversions of comparable variables in the same place > in the code were protected. > > "In the failure scenario, the primary technical causes are the Operand Error > when converting the horizontal bias variable BH, and the lack of protection > of this conversion which caused the SRI computer to stop." > >---As you can see, it's clearly a programming error. It's a failure >to check for overflow on converting a double precision value to >a 16-bit integer. But if you read a bit further on, it is stated that The reason why three conversions, including the horizontal bias variable one, were not protected, is that it was decided that they were physically bounded or had a wide safety margin (...) 
The decision was a joint one of the project partners at various contractual levels. Deciding at various contractual levels is not what one usually means by ``programming''. It looks closer to ``design'', IMHO. But, of course, anyone can give any word any meaning. And it might be probable that the action taken in case of a protected conversion, and exception, would also have been to stop the SRI computer, because such a high horizontal bias would have meant that it was broken.... Michel -- | Michel OLAGNON email : Michel.Olagnon@ifremer.fr| | IFREMER: Institut Francais de Recherches pour l'Exploitation de la Mer| ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-10-04 0:00 ` Michel OLAGNON @ 1996-10-09 0:00 ` @@ robin 0 siblings, 0 replies; 58+ messages in thread From: @@ robin @ 1996-10-09 0:00 UTC (permalink / raw) molagnon@ifremer.fr (Michel OLAGNON) writes: >In article <532k32$r4r@goanna.cs.rmit.edu.au>, rav@goanna.cs.rmit.edu.au (@@ robin) writes: >> john@assen.demon.co.uk (John McCabe) writes: >> >> >Just a point for your information. From clari.tw.space: >> >> > "An inquiry board investigating the explosion concluded in >> >July that the failure was caused by software design errors in a >> >guidance system." >> >> >Note software DESIGN errors - not programming errors. >> >> >Best Regards >> >John McCabe <john@assen.demon.co.uk> >> >>---If you read the Report, you'll see that that's not the case. >>This is what the report says: >> >> "* The internal SRI software exception was caused during execution of a >> data conversion from 64-bit floating point to 16-bit signed integer >> value. The floating point number which was converted had a value >> greater than what could be represented by a 16-bit signed integer. >> This resulted in an Operand Error. The data conversion instructions >> (in Ada code) were not protected from causing an Operand Error, >> although other conversions of comparable variables in the same place >> in the code were protected. >> >> "In the failure scenario, the primary technical causes are the Operand Error >> when converting the horizontal bias variable BH, and the lack of protection >> of this conversion which caused the SRI computer to stop." >> >>---As you can see, it's clearly a programming error. It's a failure >>to check for overflow on converting a double precision value to >>a 16-bit integer. >But if you read a bit further on, it is stated that > The reason why three conversions, including the horizontal bias variable one, > were not protected, is that it was decided that they were physically bounded > or had a wide safety margin (...) 
The decision was a joint one of the project > partners at various contractual levels. >Deciding at various contractual levels is not what one usually means by >``programming''. It looks closer to ``design'', IMHO. But, of course, anyone >can give any word any meaning. >And it might be probable that the action taken in case of protected conversion, >and exception, would also have been to stop the SRI computer because such a high >horizontal bias would have meant that it was broken.... >| Michel OLAGNON email : Michel.Olagnon@ifremer.fr| But if you read further on .... "However, three of the variables were left unprotected. No reference to justification of this decision was found directly in the source code. Given the large amount of documentation associated with any industrial application, the assumption, although agreed, was essentially obscured, though not deliberately, from any external review." .... you'll see that there was no documentation in the code to explain why these particular 3 (dangerous) conversions were left unprotected. There is the implication that one or more of them might have been overlooked . . . Don't place too much reliance on the conclusion of the report, when the detail is right there in the body of the report. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-10-04 0:00 ` @@ robin 1996-10-04 0:00 ` Michel OLAGNON @ 1996-10-04 0:00 ` Joseph C Williams 1996-10-06 0:00 ` Wayne Hayes 1996-10-17 0:00 ` Ralf Tilch 2 siblings, 1 reply; 58+ messages in thread From: Joseph C Williams @ 1996-10-04 0:00 UTC (permalink / raw) Why didn't they run the code against an Ariane 5 simulator to reverify the Ariane 4 software that was reused? A good real-time engineering simulation would have caught the problem. ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-10-04 0:00 ` Joseph C Williams @ 1996-10-06 0:00 ` Wayne Hayes 0 siblings, 0 replies; 58+ messages in thread From: Wayne Hayes @ 1996-10-06 0:00 UTC (permalink / raw) In article <32551A66.41C6@gsde.hso.link.com>, Joseph C Williams <u6p35@gsde.hso.link.com> wrote: >Why didn't they run the code against an Ariane 5 simulator to >reverify the Ariane 4 software what was reused? Money. (The more cynical among us may say this translates to "stupidity".) -- "Unix is simple and coherent, but it takes || Wayne Hayes, wayne@cs.utoronto.ca a genius (or at any rate, a programmer) to || Astrophysics & Computer Science appreciate its simplicity." -Dennis Ritchie|| http://www.cs.utoronto.ca/~wayne ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-10-04 0:00 ` @@ robin 1996-10-04 0:00 ` Michel OLAGNON 1996-10-04 0:00 ` Joseph C Williams @ 1996-10-17 0:00 ` Ralf Tilch 1996-10-17 0:00 ` Ravi Sundaram 2 siblings, 1 reply; 58+ messages in thread From: Ralf Tilch @ 1996-10-17 0:00 UTC (permalink / raw) -- Hello, I followed the discussion of the ARIANE 5 failure. I didn't read all the mails, and I am quite astonished how far, and in how much detail, it can be discussed. For instance: which programming language would have been best, and so on. It's good to know what happened. More important, I think, is this: you build something new and very complex. You invest some billions to develop it. You build it (an ARIANE 5, carrying several satellites), priced at several hundred million, and you don't check it as thoroughly as possible, with a 'very complete check', especially of the software. The reason the software wasn't checked: it was too 'expensive'?!?! They forgot Murphy's law, which always 'works'. I think you can't design a new car without testing it completely. Suppose we test 95% of the construction, and six months after the new car goes on sale a wheel falls off at 160 km/h. OK, there was a small problem in the construction software, some wrong values due to some over- or underflows or whatever. The result: the company will probably have to pay quite a lot, and probably have to close! -------------------------------------------------------- -DON'T TRUST YOURSELF, TRUST MURPHY'S LAW !!!! "If anything can go wrong, it will." -------------------------------------------------------- With this, have fun and continue the discussion about conversion from 64-bit to 16-bit values, etc. RT ________________|_______________________________________|_ | E-mail : R.Tilch@gmd.de | | Tel. : (+49) (0)2241/14-23.69 | ________________|_______________________________________|_ | | ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-10-17 0:00 ` Ralf Tilch @ 1996-10-17 0:00 ` Ravi Sundaram 1996-10-22 0:00 ` shmuel 0 siblings, 1 reply; 58+ messages in thread From: Ravi Sundaram @ 1996-10-17 0:00 UTC (permalink / raw) Ralf Tilch wrote: > The reason that the software wasn't checked: > It was too 'expensive'?!?! Yeah, isn't hindsight a wonderful thing? They, whoever were in charge of these decisions, knew too that testing is important. But it is impossible to test every subcomponent under every possible condition. There is simply not enough money or time available to do that. Take the space shuttle, for example. The total computing power available on board is probably about as much as is used in a Nintendo Game Boy. The design was frozen in the 1970s. Upgrading the computers and software would be so expensive to test and prove that they approach it with much trepidation. Richard Feynman was examining the practices of NASA and found that the workers who assembled some large bulkheads had to count bolts from two reference points. He thought providing four reference points would simplify the job. NASA rejected the proposal because it would involve too many changes to the documentation, procedures and testing. ("Surely You're Joking, Mr. Feynman!" - or was it the sequel?) So praise them for conducting a no-nonsense investigation and owning up to the mistakes. Learn to live with failed space shots. They will become as reliable as air travel once we have launched about 10 million rockets. -- Ravi Sundaram. 10/17/96 PS: I am out of here. Going on vacation. Won't read followups for a month. (Opinions are mine, not Ansoft's.) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-10-17 0:00 ` Ravi Sundaram @ 1996-10-22 0:00 ` shmuel 1996-10-22 0:00 ` Jim Carr 0 siblings, 1 reply; 58+ messages in thread From: shmuel @ 1996-10-22 0:00 UTC (permalink / raw) In <3266741B.4DAA@ansoft.com>, Ravi Sundaram <ravi@ansoft.com> writes: >Ralf Tilch wrote: >> The reason that the software wasn't checked: >> It was too 'expensive'?!?! > > Yeah, isn't hindsight a wonderful thing? > They, whoever were in charge of these decisions, > knew too that testing is important. But it is impossible > to test every subcomponent under every possible > condition. There is simply not enough money or time > available to do that. Why do you assume that it was hindsight? They violated fundamental software engineering principles, and anyone who has been in this business for long should have expected chickens coming home to roost, even if they couldn't predict what would go wrong first. > Richard Feynman was examining the practices of NASA and > found that the workers who assembled some large bulkheads > had to count bolts from two reference points. He thought > providing four reference points would simplify the job. > NASA rejected the proposal because it would involve > too many changes to the documentation, procedures and > testing. ("Surely You're Joking, Mr. Feynman!" - or was it the sequel?) > > So praise them for conducting a no-nonsense investigation > and owning up to the mistakes. Learn to live with > failed space shots. They will become as reliable as > air travel once we have launched about 10 million rockets. I hope that you're talking about Ariane and not NASA Challenger; Feynman's account of the behavior of most of the Rogers Commission, in "What Do You Care ...", sounds more like a failed coverup than like "owning up to their mistakes", and Feynman had to threaten to air a dissenting opinion on television before they agreed to publish it in their report. Shmuel (Seymour J.) Metz Atid/2 ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-10-22 0:00 ` shmuel @ 1996-10-22 0:00 ` Jim Carr 1996-10-24 0:00 ` hayim 0 siblings, 1 reply; 58+ messages in thread From: Jim Carr @ 1996-10-22 0:00 UTC (permalink / raw) shmuel.metz@os2bbs.com writes: > >I hope that you're talking about Ariane and not NASA Challenger; Feynman's >account of the behavior of most of the Rogers Commission, in "What Do >You Care ..." sounds more like a failed coverup than like "owning up to >their mistakes", ... The coverup was not entirely unsuccessful. Feynman did manage to break through and get his dissenting remarks on NASA reliability estimates into the report (as well as into Physics Today), but the coverup did succeed in keeping most people ignorant of the fact that the astronauts did not die until impact with the ocean, despite a Miami Herald story pointing that out to its mostly regional audience. Did you ever see a picture of the crew compartment? -- James A. Carr <jac@scri.fsu.edu> | Raw data, like raw sewage, needs http://www.scri.fsu.edu/~jac | some processing before it can be Supercomputer Computations Res. Inst. | spread around. The opposite is Florida State, Tallahassee FL 32306 | true of theories. -- JAC ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-10-22 0:00 ` Jim Carr @ 1996-10-24 0:00 ` hayim 1996-10-25 0:00 ` Michel OLAGNON 1996-10-25 0:00 ` Ken Garlington 0 siblings, 2 replies; 58+ messages in thread From: hayim @ 1996-10-24 0:00 UTC (permalink / raw) Unfortunately, I missed the original article describing the Ariane failure. If someone could please, either point me in the right direction as to where I can get a copy, or could even send it to me, I would greatly appreciate it. Thanks very much, Hayim Hendeles E-mail: hayim@platsol.com ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-10-24 0:00 ` hayim @ 1996-10-25 0:00 ` Michel OLAGNON 1996-10-25 0:00 ` Ken Garlington 1 sibling, 0 replies; 58+ messages in thread From: Michel OLAGNON @ 1996-10-25 0:00 UTC (permalink / raw) In article <54oht1$ln1@orchard.la.platsol.com>, <hayim> writes: >Unfortunately, I missed the original article describing the Ariane failure. >If someone could please, either point me in the right direction as to where >I can get a copy, or could even send it to me, I would greatly appreciate it. > It may be useful to repeat the address of the full report, since many comments seem to be based only on a presentation summary: http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html Michel -- | Michel OLAGNON email : Michel.Olagnon@ifremer.fr| | IFREMER: Institut Francais de Recherches pour l'Exploitation de la Mer| ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Ariane 5 failure 1996-10-24 0:00 ` hayim 1996-10-25 0:00 ` Michel OLAGNON @ 1996-10-25 0:00 ` Ken Garlington 1 sibling, 0 replies; 58+ messages in thread From: Ken Garlington @ 1996-10-25 0:00 UTC (permalink / raw) hayim wrote: > > Unfortunately, I missed the original article describing the Ariane failure. > If someone could please, either point me in the right direction as to where > I can get a copy, or could even send it to me, I would greatly appreciate it. > > Thanks very much, > > Hayim Hendeles > > E-mail: hayim@platsol.com See: http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html -- LMTAS - "Our Brand Means Quality" For more info, see http://www.lmtas.com or http://www.lmco.com ^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: Real-world education (was: Ariane 5 failure) @ 1996-10-02 0:00 Simon Johnston 0 siblings, 0 replies; 58+ messages in thread From: Simon Johnston @ 1996-10-02 0:00 UTC (permalink / raw) Michael Feldman wrote: > In article <1996Sep29.193602.17369@enterprise.rdd.lmsc.lockheed.com>, > Chris McKnight <cmcknigh@hercii.lasc.lockheed.com> wrote: > > [Rich Pattis' good stuff snipped.] > > > > An excellent bit of teaching, IMHO. Glad to hear they're putting some > > more of the real world issues in the class room. > > Rich Pattis is indeed an experienced, even gifted teacher of > introductory courses, with a very practical view of what they > should be about. > > Without diminishing Rich Pattis' teaching experience or skill one bit, > I am somewhat perplexed at the unfortunate stereotypical view you > seem to have of CS profs. Yours is the second post today to have > shown evidence of that stereotypical view; both you and the other > poster have industry addresses. I think some of it must come from experience. I have met some really good, industry-focused profs who teach with a real "useful" view (my first serious language was COBOL!). I have also met the "computer science" guys, without whom we would never move forward. I have also met some in between who really have neither the engineering focus nor the science. > This is my 22nd year as a CS prof, I travel a lot in CS education > circles, and - while we, like any population, tend to hit a bell > curve - I've found that there are a lot more of us out here than > you may think with Pattis-like commitment to bring the real world > into our teaching. Mike, I know from your books and postings here the level of engineering you bring to your teaching; we are discussing (I believe) the balance in teaching computing as an engineering discipline or as an ad-hoc individual "art". 
> Sure, there are theorists, as there are in any field, studying > and teaching computing just because it's "beautiful", with little > reference to real application, and there's a definite place in the > teaching world for them. Indeed, exposure to their "purity" of > approach is healthy for undergraduates - there is no harm at all > in taking on computing - sometimes - as purely an intellectual > exercise. > > But it's a real reach from there to an assumption that most of us > are in that theoretical category. I don't think many of the people I work with have made this leap. > I must say that there's a definite connection between an interest > in Ada and an interest in real-world software; certainly most of > the Ada teachers I've met are more like Pattis than you must think. > Indeed, it's probably our commitment to that "engineering" view > of computing that brings us to like and teach Ada. Certainly (or, as in my case, with COBOL) it leads you into an application-oriented way of thinking which makes you think about requirements, testing etc. [snip] Let me give you a little anecdote of my own. I recently went for a job interview with a very large, well-known software firm. Firstly they wanted me to write the code to traverse a binary tree, for which they described the (C) data structures. Then I was asked to write code to insert a node in a linked list (I had to ask what the requirements were for cases such as the list being empty or the node already existing). Finally I was asked to write the code to find all the anagrams in a given string. There were no business-type questions, no true analytical questions, the things which as an engineer I have to do each day. The problems set for me have a single, simple answer, and they are not code I write each day. I am sure you can recite offhand the way to traverse a binary tree, but I have to stop and think, because I wrote it ONCE, AGES AGO, and wrote it as a GENERIC which I can REUSE. 
I know an understanding of these algorithms is required so that I can decide which of my generics to use, but that is why I invest in good books! By the way, I happen to know someone who works for this firm who told me that graduate programmers seem to do well in their interview process; he once interviewed an engineer with 20 years' industry experience and a PhD who got up and left halfway through the interview in disgust. with StandardDisclaimer; use StandardDisclaimer; package Sig is --,-------------------------------------------------------------------------. --|Simon K. Johnston - Development Engineer (C++/Ada95) |ICL Retail Systems | --|-----------------------------------------------------|3/4 Willoughby Road| --|Internet : skj@acm.org |Bracknell | --|Telephone: +44 (0)1344 476320 Fax: +44 (0)1344 476302|Berkshire | --|Internal : 7261 6320 OP Mail: S.K.Johnston@BRA0801 |RG12 8TJ | --|WWW URL : http://www.acm.org/~skj/ |United Kingdom | --`-------------------------------------------------------------------------' end Sig; ^ permalink raw reply [flat|nested] 58+ messages in thread
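The "wrote it once as a generic" point above can be sketched even in plain C, by parameterising the traversal over a visitor callback instead of an Ada generic formal. The node layout here is hypothetical; the interviewers' actual C data structures are not given in the post:

```c
#include <stddef.h>

/* Write-once in-order traversal: the reusable part is `inorder`,
 * which knows nothing about what the visit does. */
struct node {
    int          key;
    struct node *left, *right;
};

typedef void (*visit_fn)(const struct node *n, void *ctx);

void inorder(const struct node *root, visit_fn visit, void *ctx)
{
    if (root == NULL)
        return;
    inorder(root->left, visit, ctx);
    visit(root, ctx);
    inorder(root->right, visit, ctx);
}

/* Example visitor: append keys to a caller-supplied buffer. */
struct sink { int *buf; int n; };

static void collect(const struct node *nd, void *ctx)
{
    struct sink *s = ctx;
    s->buf[s->n++] = nd->key;
}
```

Visiting a binary search tree in order yields its keys sorted, which is the one fact worth reciting offhand; everything else is the generic doing its job.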
end of thread, other threads:[~1996-10-25 0:00 UTC | newest] Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <agrapsDy4oJH.29G@netcom.com> 1996-09-25 0:00 ` Ariane 5 failure @@ robin 1996-09-25 0:00 ` Bob Kitzberger 1996-09-26 0:00 ` Ronald Kunne 1996-09-26 0:00 ` Matthew Heaney 1996-09-27 0:00 ` Wayne Hayes 1996-09-27 0:00 ` Richard Pattis 1996-09-29 0:00 ` Alan Brain 1996-09-29 0:00 ` Dann Corbit 1996-09-29 0:00 ` Chris McKnight 1996-09-29 0:00 ` Real-world education (was: Ariane 5 failure) Michael Feldman 1996-10-01 0:00 ` Ariane 5 failure Ken Garlington 1996-09-27 0:00 ` Ronald Kunne 1996-09-27 0:00 ` Lawrence Foard 1996-10-04 0:00 ` @@ robin 1996-09-28 0:00 ` Ken Garlington 1996-09-28 0:00 ` Ken Garlington 1996-09-29 0:00 ` Alan Brain 1996-09-29 0:00 ` Robert A Duff 1996-09-30 0:00 ` Wayne L. Beavers 1996-10-01 0:00 ` Ken Garlington 1996-10-01 0:00 ` Wayne L. Beavers 1996-10-01 0:00 ` Ken Garlington 1996-10-02 0:00 ` Sandy McPherson 1996-10-03 0:00 ` Richard A. O'Keefe 1996-10-01 0:00 ` Ken Garlington 1996-09-28 0:00 ` Ken Garlington 1996-09-27 0:00 ` Ken Garlington 1996-09-27 0:00 ` Alan Brain 1996-09-28 0:00 ` Ken Garlington 1996-09-29 0:00 ` Louis K. Scheffer 1996-09-25 0:00 ` Michel OLAGNON 1996-09-25 0:00 ` Byron Kauffman 1996-09-25 0:00 ` A. Grant 1996-09-25 0:00 ` Ken Garlington 1996-09-26 0:00 ` Byron Kauffman 1996-09-27 0:00 ` A. 
Grant 1996-09-26 0:00 ` Sandy McPherson 1996-09-25 0:00 ` Chris Morgan 1996-09-27 0:00 ` John McCabe 1996-10-01 0:00 ` Michael Dworetsky 1996-10-04 0:00 ` Steve Bell 1996-10-07 0:00 ` Ken Garlington 1996-10-09 0:00 ` @@ robin 1996-10-09 0:00 ` Steve O'Neill 1996-10-12 0:00 ` Alan Brain 1996-10-04 0:00 ` @@ robin 1996-10-04 0:00 ` Michel OLAGNON 1996-10-09 0:00 ` @@ robin 1996-10-04 0:00 ` Joseph C Williams 1996-10-06 0:00 ` Wayne Hayes 1996-10-17 0:00 ` Ralf Tilch 1996-10-17 0:00 ` Ravi Sundaram 1996-10-22 0:00 ` shmuel 1996-10-22 0:00 ` Jim Carr 1996-10-24 0:00 ` hayim 1996-10-25 0:00 ` Michel OLAGNON 1996-10-25 0:00 ` Ken Garlington 1996-10-02 0:00 Real-world education (was: Ariane 5 failure) Simon Johnston
This is a public inbox.