From: Sandy McPherson
Subject: Re: Ariane 5 failure
Date: 1996/10/02
Message-ID: <325255C4.2CE9@wgs.estec.esa.nl>
references: <1780FB1E3.KUNNE@frcpn11.in2p3.fr> <324F1157.625C@dynamite.com.au>
 <52p49m$kug@beyond-software.com> <3251322B.1076@lmtas.lmco.com>
 <52s00v$oj1@beyond-software.com> <32515277.417E@lmtas.lmco.com>
organization: European Space Agency
newsgroups: sci.astro,sci.math.num-analysis,comp.lang.pl1,comp.lang.ada

Ken Garlington wrote:
>
> Wayne L. Beavers wrote:
> >
> > Ken Garlington wrote:
> >
> > > That's actually a pretty common rule of thumb for safety-critical systems.
> > > Unfortunately, read-only memory isn't exactly read-only. For example, hardware errors
> > > can cause a random change in the memory. So, it's not a perfect fix.
> >
> > Your right, but the risk and probability of memory failures is pretty low I would think. I have never seen
> > or heard of a memory failure in any of the systems that I have worked on. I don't know what the current
> > technology is but I can remember quite awhile ago that at least one vendor was claiming that ALL double bit
> > memory errors were fully detectable and recoverable, ALL triple bit errors were detectable but only some were
> > correctable.
> > But I also don't work on realtime systems, my experience is with commercial systems.
> >
> > Are you refering to on-board systems for aircraft where weight and vibration are also a factor or are you
> > refering to ground base systems that don't have similar constraints?
>
> On-board systems. The failure _rate_ is usually pretty low, but in a harsh environment
> you can get quite a few failure _sources_, including mechanical failures (stress
> fractures, solder loss due to excessive heat, etc.), electrical failures (EMI,
> lightening), and so forth. You don't have to take out the actual chip, of course: just
> as bad is a failure in the address or data lines connecting the memory to the CPU. Add
> a memory management unit to the mix, along with various I/O devices mapped into the
> memory space, and you can get a whole slew of memory-related failure modes.
>
> You can also get into some neat system failures. For example, some "read-only" memory
> actually allows writes to the execution space in certain modes, to allow quick
> reprogramming. If you have a system failure that allows writes at the wrong time,
> coupled with a failure that does a write where it shouldn't...

It depends on what you mean by a memory failure. I can imagine that the
chance of your memory being trashed completely is very, very low, and in
rad-hardened systems the chance of a single-event upset (SEU) is also
low, but it still has to be guarded against. I have recently been working
on a system where the specified hardware has a parity bit for each octet
of memory, so an SEU that flips a single bit in an octet can be detected.
This parity check is built into the system's microcode.

Similarly, what is and isn't read-only memory is usually determined by
the processor and/or the operating system being used. A compiler cannot
put code into read-only areas of memory unless the processor, its
microcode and/or the OS are playing ball as well.
If you are unfortunate enough to be in this situation (are there any
such systems left?), then the only thing you can do is DIY, and the
compiler can't help you much beyond the for-use-at address clause.

I once read an interesting definition of two types of bugs in
"Transaction Processing" by Gray & Reuter: Heisenbugs and Bohrbugs.
Identifying potential Heisenbugs, estimating their probability of
occurrence, assessing the impact on the system when they occur, and
defining appropriate recovery procedures are all part of the risk
analysis. An SEU is a classic Heisenbug, which IMO is out of scope for
compiler checks, because it can result in a valid but incorrect value
for a variable and is just as likely to occur in the code section as in
the data section of your application. A complete memory failure is of
course beyond the scope of the compiler.

IMO an Ada compiler's job (when used properly) is to make sure that
syntactic Bohrbugs do not enter a system and that all semantic Bohrbugs
get detected at runtime (Bohrbugs, by definition, have a fixed location
and are certain to occur under given conditions; the Ariane 5 bug was
definitely a Bohrbug). The compiler cannot do anything about Heisenbugs,
because they only have a probability of occurrence.

To handle Heisenbugs you generally need a detection, reporting and
handling mechanism, built using the hardware's error detection,
generally accepted software practices (e.g. duplicate storage,
process pairs) and an application-dependent exception handling
mechanism. Ada provides the means to trap the error condition once it
has been reported, but it does not implement exception handlers for you,
other than the default "I'm gone...". Additionally, if the underlying
system does not provide the means to detect a probable error, you have
to implement the means of detecting the problem and reporting it through
Ada exception handling yourself.

-- 
Sandy McPherson MBCS CEng.    tel: +31 71 565 4288 (w)
ESTEC/WAS
P.O. Box 299
NL-2200AG Noordwijk
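
As a rough illustration of the detection, reporting and handling
mechanism described above, here is a minimal sketch in Python rather
than Ada. It assumes parity is maintained in software (on the hardware
mentioned in the post it lives in the microcode) and that each octet is
kept in duplicate. The names GuardedCell, parity_bit and
MemoryUpsetError are hypothetical and are not taken from any real
system.

```python
class MemoryUpsetError(Exception):
    """Reported when a stored octet fails its integrity checks (hypothetical name)."""


def parity_bit(octet: int) -> int:
    """Return the even-parity bit for one octet (0..255)."""
    return bin(octet & 0xFF).count("1") % 2


class GuardedCell:
    """One octet held in duplicate, each copy with its own parity bit."""

    def __init__(self, value: int) -> None:
        self.write(value)

    def write(self, value: int) -> None:
        octet = value & 0xFF
        # Duplicate storage: two independent [octet, parity] copies.
        self.copies = [[octet, parity_bit(octet)], [octet, parity_bit(octet)]]

    def read(self) -> int:
        # Detection: keep only the copies whose stored parity still matches.
        good = [o for o, p in self.copies if parity_bit(o) == p]
        if not good:
            # Reporting: nothing trustworthy is left, so raise to the application.
            raise MemoryUpsetError("both copies failed the parity check")
        if len(good) == 2 and good[0] != good[1]:
            # Parity misses an even number of flipped bits; comparing the
            # two copies catches that case, but cannot say which one is right.
            raise MemoryUpsetError("copies pass parity but disagree")
        # Handling: repair both copies from one that passed the check.
        self.write(good[0])
        return good[0]


if __name__ == "__main__":
    cell = GuardedCell(0x5A)
    cell.copies[0][0] ^= 0x08          # simulate an SEU flipping one bit in copy 0
    print(hex(cell.read()))            # value is recovered from copy 1

    cell.copies[0][0] ^= 0x01          # now upset both copies
    cell.copies[1][0] ^= 0x02
    try:
        cell.read()
    except MemoryUpsetError as err:    # application-dependent handler goes here
        print("upset reported:", err)
```

The point of the sketch is the division of labour: the checks only
detect and report, while what to do in the handler (retry, switch to a
backup, shut down safely) remains an application decision.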