From: JP Thornley
Subject: Re: Need help with PowerPC/Ada and realtime tasking
Date: 1996/05/25
Message-ID: <355912560wnr@diphi.demon.co.uk>
References: <1026696wnr@diphi.demon.co.uk>
Reply-To: jpt@diphi.demon.co.uk
Organization: None
Newsgroups: comp.lang.ada

Richard Riehle writes, in a follow-up on safety-critical software using
interrupts and tasking:-

> The main requirement of safety-critical code is that it be "safe."

My view is that code can never be judged as safe or unsafe - only correct
or incorrect. However my usage of the words "safe" and "safety-critical"
carries a lot of additional baggage, and it is possible that we are
differing over the meaning of these words rather than over anything
fundamental. So here are my meanings (this could get quite lengthy and
it's rather off-topic for comp.lang.ada, so bail out now if not really
interested).

Software *on its own* is incapable of causing harm. For harm to occur,
the software must be part of a larger system that translates its outputs
into actions in the real world - e.g. moving actuators or displaying
information. So safety is an attribute of a system.

In assessing the safety of a system, the process starts with hazard
identification. A hazard is an event that has a reasonable chance of
resulting in a serious outcome (e.g. death or serious injury to a person,
major financial loss or widespread environmental damage). For example, a
traffic-light controlled road junction is a system; a hazard (possibly
the only one) could be 'collision between vehicles using the junction'.
[Note - other people use 'hazard' with a different meaning; here I'm
giving the meaning I use, and I'm *not* arguing that it's the only
correct one.]

Hazard analysis then identifies the mechanisms that could give rise to
the hazard. For example:-

1. 'vehicle crosses junction when lights are on red' or
2. 'lights indicate green in conflicting directions'

The first of these could be further analysed as:-

1a. 'driver ignores red light'
1b. 'weather conditions make light difficult to see'
1c. 'failure of the vehicle's braking mechanism'

etc. This process continues until specific failures of individual
components of the system have been identified. [Time for more caveats -
system safety isn't really my area, and this is only one of a number of
different ways of doing hazard analysis - it's still very much a
developing technology (see "Safeware" by Nancy Leveson).]

Based upon this analysis, each component of the system can be given a
required integrity rating. In many cases, failure of a single component
does not lead to the hazard unless there is an independent failure of
one or more other components - so the required integrity level of each
such component can be reduced. A _safety-critical_ rating is given to
any component whose failure can lead to the hazard without the need for
any independent failure occurring.

Clearly any safety-critical component must have a very low failure rate,
as the overall failure rate for the system cannot be less than the sum
of the failure rates of the safety-critical components. Following this
process, and the prediction of failure rates for the components, the
system can be judged as _safe_ or unsafe on a calculated probability of
the hazard occurring. This is often expressed as the rate of the hazard
occurring over a defined period of operation - typical figures might be
10^-6 to 10^-9 per hour, depending on the perceived severity of the
hazard, rates of exposure, etc.
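As a rough sketch of how this might be mechanised - a toy illustration
of my own, with invented component names and failure rates, not any real
analysis tool - the decomposition and the 'no independent failure
needed' test could look like this:

    # Toy fault tree for the junction hazard (illustrative only).
    from dataclasses import dataclass

    @dataclass
    class Event:            # a basic component failure
        name: str
        rate: float         # failures per hour (made-up figures)

    @dataclass
    class Gate:
        kind: str           # "OR": any input fails; "AND": all must fail
        inputs: list

    # Hazard: 'collision between vehicles using the junction'
    hazard = Gate("OR", [
        Gate("AND", [Event("driver ignores red light", 1e-4),
                     Event("junction interlock fails", 1e-5)]),
        Event("lights green in conflicting directions", 1e-9),
    ])

    def safety_critical(node, behind_and=False):
        """Basic events reachable through OR gates only - components
        whose failure alone can lead to the hazard."""
        if isinstance(node, Event):
            return [] if behind_and else [node]
        behind = behind_and or node.kind == "AND"
        return [e for c in node.inputs
                  for e in safety_critical(c, behind)]

    def hazard_prob(node, hours=1.0):
        """Rare-event approximation of the hazard probability."""
        if isinstance(node, Event):
            return node.rate * hours
        ps = [hazard_prob(c, hours) for c in node.inputs]
        if node.kind == "AND":              # all inputs must fail
            result = 1.0
            for p in ps:
                result *= p
            return result
        return min(1.0, sum(ps))            # OR: union (upper) bound

    for e in safety_critical(hazard):
        print("safety-critical:", e.name)
    print("hazard rate ~", hazard_prob(hazard), "per hour")

Here the 'conflicting greens' failure comes out as safety-critical
because it sits behind OR gates only, while 'driver ignores red light'
does not, because the interlock must fail independently as well.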
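To put some invented numbers on that budget: if the target for the
hazard is 10^-6 per hour and the analysis finds three safety-critical
components, then - since the system rate is at least the sum of their
rates - each component must achieve roughly 3 x 10^-7 failures per hour,
i.e. about one failure per 340 years of continuous operation. At the
10^-9 end of the scale the per-component budget shrinks to a few parts
in 10^10 per hour.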
So why do I say that software cannot be considered safe? There are no
meaningful failure modes for a software component, since a software
failure can rarely be contained to only part of that component - it
either works without failure or fails completely. The effects of a
software failure are therefore assumed to be whatever is worst possible
in the situation currently under analysis. Given that we cannot measure
software to the rates quoted above, any software component rated as
safety-critical has to be given a failure rate of zero in the system
safety assessment. (This places quite severe requirements on the
software development team and their process ;-).

So safety is measured by (usually) small but definitely non-zero
numbers; software is either correct or not, with no numeric scale.

Sorry to take so long to get there, but I thought it worthwhile trying
to get my meanings as clear as possible.

Phil Thornley

-- 
------------------------------------------------------------------------
| JP Thornley                     EMail jpt@diphi.demon.co.uk          |
------------------------------------------------------------------------