comp.lang.ada
From: Sandy McPherson <sandy@wgs.estec.esa.nl>
Subject: Re: Ariane 5 failure
Date: 1996/10/02
Message-ID: <325255C4.2CE9@wgs.estec.esa.nl>
In-Reply-To: 32515277.417E@lmtas.lmco.com


Ken Garlington wrote:
> 
> Wayne L. Beavers wrote:
> >
> > Ken Garlington wrote:
> >
> > > That's actually a pretty common rule of thumb for safety-critical systems.
> > > Unfortunately, read-only memory isn't exactly read-only. For example, hardware errors
> > > can cause a random change in the memory. So, it's not a perfect fix.
> >
> >   You're right, but the risk and probability of memory failures are pretty low, I would think.  I have never seen
> > or heard of a memory failure in any of the systems that I have worked on.  I don't know what the current
> > technology is, but I can remember quite a while ago that at least one vendor was claiming that ALL double bit
> > memory errors were fully detectable and recoverable, ALL triple bit errors were detectable but only some were
> > correctable.  But I also don't work on realtime systems; my experience is with commercial systems.
> >
> >   Are you referring to on-board systems for aircraft, where weight and vibration are also a factor, or are you
> > referring to ground-based systems that don't have similar constraints?
> 
> On-board systems. The failure _rate_ is usually pretty low, but in a harsh environment
> you can get quite a few failure _sources_, including mechanical failures (stress
> fractures, solder loss due to excessive heat, etc.), electrical failures (EMI,
> lightning), and so forth. You don't have to take out the actual chip, of course: just
> as bad is a failure in the address or data lines connecting the memory to the CPU. Add
> a memory management unit to the mix, along with various I/O devices mapped into the
> memory space, and you can get a whole slew of memory-related failure modes.
> 
> You can also get into some neat system failures. For example, some "read-only" memory
> actually allows writes to the execution space in certain modes, to allow quick
> reprogramming. If you have a system failure that allows writes at the wrong time,
> coupled with a failure that does a write where it shouldn't...

It depends on what you mean by a memory failure. I can imagine that the
chance of your memory being trashed completely is very low, and in
rad-hardened systems the chance of a single-event upset (SEU) is also
low, but it still has to be guarded against. I have recently been
working on a system where the specified hardware has a parity bit for
each octet of memory, so SEUs which flip bit values in the memory can be
detected. This parity check is built into the system's microcode.
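
To make the parity idea concrete, here is a minimal sketch in Python
(purely for illustration; on the real system the check lives in
microcode, and the names parity_bit, store, is_consistent and flip_bit
are made up for this example). It assumes even parity, one bit per
octet:

```python
def parity_bit(octet):
    """Even-parity bit for an 8-bit value: 1 if the number of set bits is odd."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    return bin(octet).count("1") & 1


def store(octet):
    """Model a memory cell as (data octet, parity bit written alongside it)."""
    return (octet, parity_bit(octet))


def is_consistent(cell):
    """Recompute the parity on read and compare it with the stored bit."""
    octet, parity = cell
    return parity_bit(octet) == parity


def flip_bit(cell, bit):
    """Simulate an upset that inverts one data bit without touching the parity bit."""
    octet, parity = cell
    return (octet ^ (1 << bit), parity)


cell = store(0b10110010)
print(is_consistent(cell))                            # untouched cell
print(is_consistent(flip_bit(cell, 3)))               # one flipped bit
print(is_consistent(flip_bit(flip_bit(cell, 3), 5)))  # two flipped bits
```

Note the limits: a single parity bit catches any odd number of flipped
bits in the octet, but two flips cancel out and go unnoticed, and
nothing is ever corrected. Detection only tells you that something is
wrong; what to do about it is a separate question.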

Similarly, what is and isn't read-only memory is usually determined by
the processor and/or the operating system being used. A compiler cannot
put code into read-only areas of memory unless the processor, its
microcode and/or the OS are playing ball as well. If you are unfortunate
enough to be on a system where they are not (are there any such systems
left?), then the only thing you can do is DIY, and the compiler can't
help you much, other than through the "for ... use at" address clause.

I once read an interesting definition of two types of bugs in
"Transaction Processing" by Gray & Reuter: Heisenbugs and Bohrbugs.

Identifying potential Heisenbugs, estimating their probability of
occurrence and their impact on the system, and defining appropriate
recovery procedures are all part of the risk analysis. An SEU is a
classic Heisenbug, which IMO is out of scope of compiler checks, because
it can leave a variable holding a valid but incorrect value, and it is
just as likely to hit the code section as the data section of your
application. A complete memory failure is of course beyond the scope of
the compiler.

IMO an Ada compiler's job (when used properly) is to make sure that
syntactic Bohrbugs do not enter a system and that all semantic Bohrbugs
get detected at runtime. Bohrbugs, by definition, have a fixed location
and are certain to occur under given conditions; the Ariane 5 bug was
definitely a Bohrbug. The compiler cannot do anything about Heisenbugs,
because they only have a probability of occurrence. To handle Heisenbugs
you generally need a detection, reporting and handling mechanism, built
from the hardware's error detection, generally accepted software
practices (e.g. duplicate storage, process pairs) and an
application-dependent exception handling mechanism. Ada provides the
means to trap the error condition once it has been reported, but it does
not implement exception handlers for you, other than the default "I'm
gone...". Additionally, if the underlying system does not provide the
means to detect a probable error, you have to implement the means of
detecting the problem yourself and report it through Ada's exception
handling.
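
As a rough sketch of what "detect, report, handle" can look like when
you have to build it yourself, here is duplicate storage in Python
(again only for illustration; an Ada version would raise a user-defined
exception instead). The names CorruptionDetected, DuplicatedInt and
read_or_default are hypothetical, and keeping the second copy as a
bitwise complement is just one common choice, not the only one:

```python
class CorruptionDetected(Exception):
    """Reported when the two stored copies of a value no longer agree."""


class DuplicatedInt:
    """Holds a 32-bit value twice: once as-is and once as its bitwise complement."""

    MASK = 0xFFFFFFFF

    def __init__(self, value):
        self.write(value)

    def write(self, value):
        if not 0 <= value <= self.MASK:
            raise ValueError("value must fit in 32 bits")
        self._primary = value
        self._shadow = value ^ self.MASK  # complement copy

    def read(self):
        # Detection: the two copies must still be exact complements.
        if self._primary ^ self._shadow != self.MASK:
            # Reporting: turn the detected fault into an exception.
            raise CorruptionDetected("stored copies disagree")
        return self._primary


def read_or_default(cell, default):
    """Handling: an application-specific policy, here 'fall back to a safe value'."""
    try:
        return cell.read()
    except CorruptionDetected:
        cell.write(default)  # repair the cell so later reads succeed
        return default


cell = DuplicatedInt(1234)
cell._primary ^= 1 << 7  # simulate an upset in one of the two copies
print(read_or_default(cell, 0))
```

With only two copies you can tell that one of them is wrong but not
which one, so the handler cannot recover the original value; it can only
apply whatever policy the application considers safe. That policy is
exactly the part no compiler can write for you.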


-- 
Sandy McPherson	MBCS CEng.	tel: 	+31 71 565 4288 (w)
ESTEC/WAS
P.O. Box 299
NL-2200AG Noordwijk



