From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 107079,eca28648989efca9
X-Google-Attributes: gid107079,public
X-Google-Thread: f74ae,eca28648989efca9
X-Google-Attributes: gidf74ae,public
X-Google-Thread: 103376,885dab3998d28a4
X-Google-Attributes: gid103376,public
X-Google-Thread: 101deb,885dab3998d28a4
X-Google-Attributes: gid101deb,public
From: Sandy McPherson <sandy@wgs.estec.esa.nl>
Subject: Re: Ariane 5 failure
Date: 1996/10/02
Message-ID: <325255C4.2CE9@wgs.estec.esa.nl>
X-Deja-AN: 186658316
distribution: inet
references: <agrapsDy4oJH.29G@netcom.com>
 <mheaney-ya023180002609962252500001@news.ni.net>
 <1780FB1E3.KUNNE@frcpn11.in2p3.fr> <324F1157.625C@dynamite.com.au>
 <DyHwL2.I4D@world.std.com> <52p49m$kug@beyond-software.com>
 <3251322B.1076@lmtas.lmco.com> <52s00v$oj1@beyond-software.com>
 <32515277.417E@lmtas.lmco.com>
content-type: text/plain; charset=us-ascii
organization: European Space Agency
mime-version: 1.0
newsgroups: sci.astro,sci.math.num-analysis,comp.lang.pl1,comp.lang.ada
x-mailer: Mozilla 3.0 (X11; I; SunOS 5.4 sun4)
Date: 1996-10-02T00:00:00+00:00
List-Id: <comp.lang.ada>


Ken Garlington wrote:
> 
> Wayne L. Beavers wrote:
> >
> > Ken Garlington wrote:
> >
> > > That's actually a pretty common rule of thumb for safety-critical systems.
> > > Unfortunately, read-only memory isn't exactly read-only. For example, hardware errors
> > > can cause a random change in the memory. So, it's not a perfect fix.
> >
> >   Your right, but the risk and probability of memory failures is pretty low I would think.  I have never seen
> > or heard of a memory failure in any of the systems that I have worked on.  I don't know what the current
> > technology is but I can remember quite awhile ago that at least one vendor was claiming that ALL double bit
> > memory errors were fully detectable and recoverable, ALL triple bit errors were detectable but only some were
> > correctable.  But I also don't work on realtime systems, my experience is with commercial systems.
> >
> >   Are you refering to on-board systems for aircraft where weight and vibration are also a factor or are you
> > refering to ground base systems that don't have similar constraints?
> 
> On-board systems. The failure _rate_ is usually pretty low, but in a harsh environment
> you can get quite a few failure _sources_, including mechanical failures (stress
> fractures, solder loss due to excessive heat, etc.), electrical failures (EMI,
> lightening), and so forth. You don't have to take out the actual chip, of course: just
> as bad is a failure in the address or data lines connecting the memory to the CPU. Add
> a memory management unit to the mix, along with various I/O devices mapped into the
> memory space, and you can get a whole slew of memory-related failure modes.
> 
> You can also get into some neat system failures. For example, some "read-only" memory
> actually allows writes to the execution space in certain modes, to allow quick
> reprogramming. If you have a system failure that allows writes at the wrong time,
> coupled with a failure that does a write where it shouldn't...

It depends upon what you mean by a memory failure. I can imagine that
the chance of your memory being trashed completely is very low, and in
rad-hardened systems the chance of a single-event upset (SEU) is also
low, but it still has to be guarded against. I have recently been
working on a system where the specified hardware has a parity bit for
each octet of memory, so an SEU which flips a bit value in memory can
be detected. This parity check is built into the system's microcode.
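
As a minimal sketch of per-octet parity (in Python for illustration,
not the microcode of the system described above; even parity and all
function names are assumptions), a single flipped bit is caught, while
two flips in the same octet cancel out and go unnoticed:

```python
from typing import Tuple

Word = Tuple[int, int]  # (octet, stored parity bit): a hypothetical model


def parity_bit(octet: int) -> int:
    """Return the even-parity bit for one octet (0..255)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in 0..255")
    return bin(octet).count("1") % 2


def store(octet: int) -> Word:
    """Model a memory word: the octet plus its parity bit at write time."""
    return octet, parity_bit(octet)


def parity_ok(word: Word) -> bool:
    """True if the stored parity still matches the octet's contents."""
    octet, stored = word
    return parity_bit(octet) == stored


def flip(word: Word, bit: int) -> Word:
    """Simulate an upset that flips one data bit (0..7) of the octet."""
    octet, stored = word
    return octet ^ (1 << bit), stored


word = store(0b10110010)
single = flip(word, 3)           # one bit upset
double = flip(flip(word, 3), 5)  # two bits upset in the same octet

print(parity_ok(word), parity_ok(single), parity_ok(double))
# prints: True False True
```

Parity only detects the upset; it cannot say which bit changed, so
recovery has to come from somewhere else.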

Similarly, the definition of what is and isn't read-only memory is
usually a feature of the processor and/or operating system being used.
A compiler cannot put code into read-only areas of memory unless the
processor, its microcode and/or the o/s are playing ball as well. If
you are unfortunate enough to be in this situation (are there any such
systems left?), then the only thing you can do is DIY, and the compiler
can't help you much, other than with the for-use-at address clause.

I once read an interesting definition of two types of bugs in
"Transaction Processing" by Gray & Reuter: Heisenbugs and Bohrbugs.

Identification of potential Heisenbugs, estimation of their probability
of occurrence, their impact on the system and appropriate recovery
procedures are part of the risk analysis. An SEU is a classic Heisenbug,
which IMO is out of scope of compiler checks, because it can result in
a valid but incorrect value for a variable and is just as likely to
occur in the code section as in the data section of your application. A
complete memory failure is of course beyond the scope of the compiler.
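
To illustrate the "valid but incorrect value" point, here is a small
Python sketch. The 0..1023 range, the value 500 and the function names
are all made up for the example; the range test merely stands in for
the kind of constraint check a compiler can generate:

```python
# Hypothetical declared range for a variable, e.g. a 10-bit sensor reading.
VALUE_MIN, VALUE_MAX = 0, 1023


def range_check(value: int) -> bool:
    """Stand-in for a compiler-generated constraint check."""
    return VALUE_MIN <= value <= VALUE_MAX


def upset(value: int, bit: int) -> int:
    """Simulate a single-event upset flipping one bit of the stored value."""
    return value ^ (1 << bit)


correct = 500
low_bit_flip = upset(correct, 2)    # 500 -> 496, still inside 0..1023
high_bit_flip = upset(correct, 12)  # 500 -> 4596, outside the range

print(low_bit_flip, range_check(low_bit_flip))
print(high_bit_flip, range_check(high_bit_flip))
# prints: 496 True
#         4596 False
```

The flip in a low-order bit passes the check even though the value is
wrong, so a range check alone cannot be relied on to spot an SEU.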

IMO an Ada compiler's job (when used properly) is to make sure that
syntactic Bohrbugs do not enter a system and that all semantic Bohrbugs
get detected at runtime (Bohrbugs, by definition, have a fixed location
and are certain to occur under given conditions; the Ariane 5 bug was
definitely a Bohrbug). The compiler cannot do anything about Heisenbugs
(because they only have a probability of occurrence). To handle
Heisenbugs generally you need a detection, reporting and handling
mechanism, built using the hardware's error detection, generally
accepted software practices (e.g. duplicate storage, process-pairs) and
an application-dependent exception handling mechanism. Ada provides the
means to trap the error condition once it has been reported, but it does
not implement exception handlers for you, other than the default "I'm
gone..."; additionally, if the underlying system does not provide the
means to detect a probable error, you have to implement the means of
detecting the problem and reporting it through the Ada exception
handling yourself.
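
A rough sketch of the detect/report/handle split using duplicate
storage, written in Python rather than Ada purely for illustration.
The class and function names, the 32-bit width, the inverted shadow
copy and the "fall back to a safe default" policy are all assumptions,
not a description of any particular flight system:

```python
class StorageCorrupted(Exception):
    """Reported when the two stored copies no longer agree."""


class DuplicatedCell:
    """Hold a 32-bit value twice: once as-is and once bitwise-inverted."""

    MASK = 0xFFFFFFFF

    def __init__(self, value: int) -> None:
        self.write(value)

    def write(self, value: int) -> None:
        self.primary = value & self.MASK
        self.shadow = ~value & self.MASK

    def read(self) -> int:
        # Detection: the two copies must be exact complements.
        if self.primary ^ self.shadow != self.MASK:
            # Reporting: turn the mismatch into an exception.
            raise StorageCorrupted("primary and shadow copies disagree")
        return self.primary


def read_or_default(cell: DuplicatedCell, safe_default: int) -> int:
    """Handling: application-specific recovery, here a fallback value."""
    try:
        return cell.read()
    except StorageCorrupted:
        cell.write(safe_default)
        return safe_default


cell = DuplicatedCell(1234)
before = read_or_default(cell, 0)
cell.primary ^= 1 << 7  # simulate an upset in one of the two copies
after = read_or_default(cell, 0)

print(before, after)
# prints: 1234 0
```

Two copies are enough to detect the mismatch but not to tell which
copy is the good one; what to do next is exactly the
application-dependent part.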


-- 
Sandy McPherson	MBCS CEng.	tel: 	+31 71 565 4288 (w)
ESTEC/WAS
P.O. Box 299
NL-2200AG Noordwijk