From mboxrd@z Thu Jan  1 00:00:00 1970
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,6394e5e171f847d1
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2001-09-06 06:42:55 PST
Path: 
 archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu!newsfeeds.belnet.be!news.belnet.be!newsfeed.online.be!195.129.110.18.MISMATCH!bnewspeer00.bru.ops.eu.uu.net!emea.uu.net!newsfeed.siemens.de!news.siemens.de!news.mch.sbs.de!not-for-mail
From: Alfred Hilscher <Alfred.Hilscher@icn.siemens.de>
Newsgroups: comp.lang.ada
Subject: Re: Ada OS Kernel features
Date: Thu, 06 Sep 2001 15:42:13 +0200
Organization: Siemens AG
Message-ID: <3B977D35.B3B7581B@icn.siemens.de>
References: <9n4euv$t9m$1@slb6.atl.mindspring.net>
 <3B964C7A.BC04374E@icn.siemens.de> <9n5o9n$37a$1@slb7.atl.mindspring.net>
NNTP-Posting-Host: 139.21.122.158
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 4.5 [en] (WinNT; I)
X-Accept-Language: en
Xref: archiver1.google.com comp.lang.ada:12793
List-Id: <comp.lang.ada>


Brian Catlin wrote:
> 
> > You should be able to "overload" a driver. What do I mean? Let's assume
> > you have a simple graphics driver at bootup, then you load a "better"
> > (more complex, higher resolution, 3D accelerator ...) one. If this one
> > crashes, then it should simply be unloaded and the system should
> > continue working with the (simple) default driver, instead of showing a
> > "blue screen" ;-)
> 
> My first reaction to this was "Not Possible".  However, that isn't entirely
> true; it is just *VERY VERY* difficult.  A driver runs in kernel mode, and has
> access to system data structures.  If a driver corrupts a system data structure,
> how do you detect this, repair it, and continue?  In such instances, it is much
> better to bugcheck (blue screen) the system than try to continue.  Consider, if
> the system is slightly corrupted and continues to operate, there is the very
> real possibility that your data will be corrupted without your knowledge.  This
> was Win98's philosophy, and it was a disaster.  VMS and NT (and others) stop the
> system dead in its tracks to prevent hidden corruption.

OK, I agree that there may be drivers where this would be hard. But let's
take a driver for a graphics card (the kind I've had the most problems
with). After the driver crashes, you reinitialize the card and continue
working (e.g. at low resolution). How would the switchover work? As far
as I know, drivers are accessed via a dispatch table (OS/2, Windows). So
when another driver is loaded, "stack" the previous dispatch table. When
the new driver crashes, reload the code of the previous driver and
"unstack" its dispatch table. Some information may get lost (e.g. the
screen goes blank), but it should be possible either to repeat the last
action (if the driver calls are "logged") or to wait for an "autorepair"
(e.g. a repaint message sent to all windows when a driver change occurs).
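To make the "stack the dispatch table" idea concrete, here is a minimal user-space sketch in C. It is not how OS/2 or Windows actually structure their driver tables; the names (dispatch_table, driver_push, driver_fallback, gfx_draw) and the simulated drivers are hypothetical, and a driver "crash" is modeled simply as an operation returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver interface: one function pointer per operation. */
typedef struct dispatch_table {
    const char *name;
    bool (*init)(void);          /* (re)initialize the hardware */
    bool (*draw)(int x, int y);  /* returns false if the driver faulted */
} dispatch_table;

#define MAX_STACK 4

/* Stack of loaded drivers; the top entry is the active one. */
static const dispatch_table *stack[MAX_STACK];
static size_t depth = 0;

/* Load a "better" driver on top of the current one. */
bool driver_push(const dispatch_table *t) {
    if (depth == MAX_STACK || !t->init()) return false;
    stack[depth++] = t;
    return true;
}

const dispatch_table *driver_active(void) {
    return depth ? stack[depth - 1] : NULL;
}

/* Called when the active driver faults: "unstack" it and reinitialize
   the previous one. The bottom (boot) driver is never popped. */
bool driver_fallback(void) {
    while (depth > 1) {
        depth--;
        if (stack[depth - 1]->init()) return true;
    }
    return false;
}

/* Route a call through the active driver; on a fault, fall back and
   repeat the last action once on the previous driver. */
bool gfx_draw(int x, int y) {
    const dispatch_table *t = driver_active();
    if (t == NULL) return false;
    if (t->draw(x, y)) return true;
    if (!driver_fallback()) return false;
    return driver_active()->draw(x, y);
}

/* Simulated drivers: a simple boot driver and a "3D" driver that
   crashes for large coordinates. */
static bool vga_init(void) { return true; }
static bool vga_draw(int x, int y) { (void)x; (void)y; return true; }
static bool accel_init(void) { return true; }
static bool accel_draw(int x, int y) { (void)y; return x < 1000; }

static const dispatch_table vga = {"vga", vga_init, vga_draw};
static const dispatch_table accel = {"accel3d", accel_init, accel_draw};
```

Push vga at boot, then accel; a faulting call to gfx_draw leaves vga active and still succeeds. A real kernel would additionally need to detect the fault (e.g. via an exception handler around the driver call) and reload the old driver's code, which this sketch does not attempt.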

For a NIC, reinitialize its interrupts and DMA channels and continue with
the old driver. Maybe a few packets get lost, but that can always happen
in a network, so it is not a new situation. For a SCSI device driver,
reinitialize the hardware and either repeat the last action or accept the
loss of data. For a hard disk, the filesystem should be able to repair
the damage; for a scanner, the user should repeat the scan (if the PC
reboots, he must do that too), and so on. I think failing one
transaction is much better than a complete system crash.
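The per-device recovery choices above can be written down as a small policy table. This is a hypothetical C sketch (dev_class, recovery_for, handle_fault and the hw_ok/hw_fail stubs are invented for illustration); the point is that every outcome is local to one request or one device, and none of them halts the system.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device classes and the recovery action chosen for each. */
typedef enum { DEV_GRAPHICS, DEV_NIC, DEV_SCSI_DISK, DEV_SCANNER } dev_class;

typedef enum {
    RECOVER_REPAINT,      /* windows redraw themselves after reinit */
    RECOVER_DROP,         /* lost packets are normal in a network */
    RECOVER_RETRY,        /* repeat the last request after reinit */
    RECOVER_FAIL_REQUEST  /* report failure; the user repeats the scan */
} recovery;

typedef enum { REQ_DONE, REQ_LOST, DEVICE_OFFLINE } outcome;

recovery recovery_for(dev_class c) {
    switch (c) {
    case DEV_GRAPHICS:  return RECOVER_REPAINT;
    case DEV_NIC:       return RECOVER_DROP;
    case DEV_SCSI_DISK: return RECOVER_RETRY;
    case DEV_SCANNER:   return RECOVER_FAIL_REQUEST;
    }
    return RECOVER_FAIL_REQUEST;
}

/* Recover from a driver fault without stopping the system: reinitialize
   the hardware, then apply the device's policy. At worst only this
   device goes offline. */
outcome handle_fault(dev_class c, bool (*reinit)(void), bool (*repeat)(void)) {
    if (!reinit()) return DEVICE_OFFLINE;
    switch (recovery_for(c)) {
    case RECOVER_RETRY:
        return repeat() ? REQ_DONE : REQ_LOST;
    case RECOVER_REPAINT:       /* content is restored by repaint messages */
    case RECOVER_DROP:          /* higher protocol layers retransmit */
    case RECOVER_FAIL_REQUEST:  /* the caller sees one failed request */
    default:
        return REQ_LOST;
    }
}

/* Simulated hardware responses for trying the policy out. */
static bool hw_ok(void) { return true; }
static bool hw_fail(void) { return false; }
```

The hard-disk case relies on the filesystem's own repair, which is outside what a table like this can express.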

I don't think that every driver _must_ have write access to
system-internal data. A graphics driver, for example, does not need to
write to the process table (please correct me if I'm wrong). If there
are drivers that need to do so, they should not access these data
structures directly, but via access procedures (which could perform
checks). And even if a driver corrupts system-internal data, hopefully
that data belongs to only one process and not to the inner kernel. In
that case I think it would be more acceptable to kill one process than
to kill the whole system.
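A checked access procedure might look like the following C sketch. The process table layout, the driver_rights record and drv_set_io_pending are all hypothetical; the idea shown is only that a driver without the right cannot write at all, and that a driver with the right is limited to a valid entry and a sane value.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 8

/* Hypothetical kernel-private process table; drivers never see it directly. */
typedef struct {
    bool in_use;
    int  io_pending;  /* the only field drivers may change */
} proc_entry;

static proc_entry proc_table[MAX_PROCS];

/* Hypothetical per-driver rights, granted when the driver is loaded. */
typedef struct {
    const char *name;
    bool may_touch_procs;  /* a graphics driver would get false */
} driver_rights;

typedef enum { ACC_OK, ACC_DENIED, ACC_BAD_PID, ACC_BAD_VALUE } access_result;

/* Access procedure: the only path by which a driver updates a process
   entry. Every argument is checked before anything is written. */
access_result drv_set_io_pending(const driver_rights *d, int pid, int count) {
    if (!d->may_touch_procs) return ACC_DENIED;
    if (pid < 0 || pid >= MAX_PROCS || !proc_table[pid].in_use)
        return ACC_BAD_PID;
    if (count < 0) return ACC_BAD_VALUE;
    proc_table[pid].io_pending = count;
    return ACC_OK;
}
```

Note that this only guards the intended path. A kernel-mode driver can still scribble on memory directly, so real isolation would also need hardware memory protection (e.g. mapping kernel structures read-only for driver code), which a sketch like this cannot show.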

So if there are a few drivers where this cannot be done, then do it for
the rest. 50% failure tolerance is still better than 0%.