From: duncan@yc.estec.esa.nlx
Subject: Re: Software landmines (loops)
Date: 1998/09/11
Message-ID: #1/1
References: <6t4dmi$rhp@flatland.dimensional.com> <6taqch$42b@flatland.dimensional.com>
Organization: ESA/ESTEC/YC, Noordwijk, The Netherlands
Newsgroups: comp.lang.eiffel,comp.object,comp.software-eng,comp.lang.ada

In article <6taqch$42b@flatland.dimensional.com>, Jim Cochrane wrote:
>
>I understand that things are not black and white in the "real world" and
>that compromises sometimes need to be made.  (For one thing, the fact that
>C was used here, which has no exception handling mechanism, is a factor in
>what path to take.)  However, I think if I were in this situation, I would
>ask a few questions, such as:
>
>What are the error handling requirements in the case where a pointer is null?
>If one of these errors does occur, is it a coding error, a bug?
>If so, where would the source of this bug likely be?
>If one of these errors occurred, will the program still be able to run
>correctly?  In other words, can the program recover from the error, or
>should it report the error and terminate in order not to cause a problem,
>such as corrupted data?
>Who (what module(s) or routine(s)) is responsible for building the tree so
>that the required nodes are not null?
>If the answer is that no-one was responsible, can the design be changed so
>that this responsibility can be assigned to a particular module or modules?
>Who (what module or routine) is responsible for setting things right (if
>that is possible) if an error does occur?
>Can the fact that a certain depth of the tree is required be considered a
>contract that must be established by some (direct or indirect) client
>routine, even if it is in another process?
>Can the design be changed so that, even if assertions cannot reasonably be
>used, the structure is less awkward?  (At the very least, it seems, the
>original "do_something" function could be structured to check for the error
>and report it first, rather than nesting the check of each node, as the
>original did:
>
>    if (null_descendant(root, required_depth, &null_depth))
>    {
>        /* report that the error occurred at depth null_depth and deal with it */
>    }
>    else
>    {
>        really_do_something(top->child...->child);
>        ...
>    })
>
>I suppose my main point is that rather than simply following an edict, it
>is important to ask questions like these to find out if there might be a
>better way of doing things.  If this was done, and the decision to proceed
>as you described was made for a good reason, then that is basically all you
>can ask.

I agree with everything you say.  The system I've been describing was one I
worked on 10 years ago, but it taught everyone concerned many valuable
lessons.  Unfortunately, we inherited all of this code (in C) from a
previous programming group in a different country, so all of the decisions
had already been made and we had to go along with the existing style.  With
the usual tight schedule for such things, it was simply not possible to
revisit the original analysis and design decisions.

As I've stressed before, the system consisted of 7 processes communicating
via shared memory, and I believe that the obsessive checking of all
pointers stemmed from paranoia about which process could update what and
when.
As it turned out, the project was canned after our group had been working
on it for a year, mainly because it was just too slow.  Just for the hell
of it, someone instrumented some of the code before all 4000+ source files
were wiped from disk and we moved on to other things.  He discovered that
the paranoia was not well founded: only two of the seven processes actually
needed to access the data simultaneously, and even then they took copies to
work on and derived results into their own areas of shared memory.  No
other process could run until these two completed, so there was no chance
of incomplete data being read prematurely.

A lot of the code was simply over-engineered to take into account something
that was not likely to happen.  All access to the shared memory was
controlled by semaphores.  All pointer access was checked.  No wonder it
ran so slowly.  Maybe earlier designs were more likely to suffer from
problems - it ran on custom-built hardware until we got our hands on it -
or perhaps it was considered that even a crash resulting from the unlikely
was still not acceptable.

The whole purpose of this string of articles is to illustrate that it is
not always possible to follow all of the 'good practices' that you have
learned, and that there will always be compromises.

Cheers
Duncan

This is my article, not my employer's, with my opinions and my disclaimer!
-- 
Duncan Gibson, ESTEC/YCV, Postbus 299, 2200AG Noordwijk, The Netherlands
Tel: +31 71 5654013   Fax: +31 71 5656142   Email: duncan@yc.estec.esa.nlx
To avoid junk email my quoted address is incorrect. Use nl instead of nlx.