From mboxrd@z Thu Jan  1 00:00:00 1970
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,6394e5e171f847d1
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2001-09-07 21:04:29 PST
Path: 
 archiver1.google.com!newsfeed.google.com!newsfeed.stanford.edu!news.tele.dk!small.news.tele.dk!205.231.236.10!newspeer.monmouth.com!news.monmouth.com!shell.monmouth.com!not-for-mail
From: ka@sorry.no.email (Kenneth Almquist)
Newsgroups: comp.lang.ada
Subject: Re: Ada OS Kernel features
Date: 7 Sep 2001 23:55:41 -0400
Organization: A poorly-installed InterNetNews site
Message-ID: <9nc4rt$ske$1@shell.monmouth.com>
References: <9n4euv$t9m$1@slb6.atl.mindspring.net>
 <3B964C7A.BC04374E@icn.siemens.de> <9n5o9n$37a$1@slb7.atl.mindspring.net>
NNTP-Posting-Host: shell.monmouth.com
Xref: archiver1.google.com comp.lang.ada:12923
Date: 2001-09-07T23:55:41-04:00
List-Id: <comp.lang.ada>

> My first reaction to this was "Not Possible".  However, that isn't entirely
> true; it is just *VERY VERY* difficult.  A driver runs in kernel mode, and
> has access to system data structures.  If a driver corrupts a system data
> structure, how do you detect this, repair it, and continue?  In such
> instances, it is much better to bugcheck (blue screen) the system than
> try to continue.  Consider, if the system is slightly corrupted and
> continues to operate, there is the very real possibility that your data
> will be corrupted without your knowledge.  This was Win98's philosophy,
> and it was a disaster.  VMS and NT (and others) stop the system dead in
> its tracks to prevent hidden corruption.

There are several related risks here.  One is system data structures
being overwritten.  The Intel x86 architecture maps segment addresses
to linear addresses and then uses the page table to map linear addresses
to physical addresses, so it is possible to give device drivers their
own address spaces without invalidating the page table cache (the TLB)
every time a device driver runs.  However, if the device drivers are
written in Ada then there is little need for hardware memory protection.

Another risk is resource leaks if a device driver allocates a resource
(e.g. allocates memory) and then crashes.  This can be dealt with by
providing debugging wrappers for kernel routines which allocate
resources, which keep track of which device driver holds the resource.
Then, when a device driver crashes the resources held by that driver
can be reclaimed.
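
To make the bookkeeping concrete, here is a minimal sketch of such a
tracking wrapper.  It is written in Python purely as executable
pseudocode; the names ResourceTracker, allocate, free and reclaim are
invented for illustration and do not correspond to any real kernel
interface.

```python
from collections import defaultdict


class ResourceTracker:
    """Records which driver holds each resource so it can be reclaimed.

    Hypothetical stand-in for the wrappers around kernel allocation routines.
    """

    def __init__(self):
        self._held = defaultdict(set)   # driver name -> ids of live resources
        self._next_id = 0

    def allocate(self, driver):
        """Wrapped allocation: hand out a resource id and record its owner."""
        self._next_id += 1
        self._held[driver].add(self._next_id)
        return self._next_id

    def free(self, driver, resource):
        """Normal release path used by a driver that is still healthy."""
        self._held[driver].discard(resource)

    def reclaim(self, driver):
        """Called when a driver crashes: forget and return all it still holds."""
        return sorted(self._held.pop(driver, set()))


tracker = ResourceTracker()
a = tracker.allocate("net0")
b = tracker.allocate("net0")
c = tracker.allocate("disk0")
tracker.free("net0", a)
reclaimed = tracker.reclaim("net0")   # only b is still held by net0
```

A real kernel would presumably key the table on a driver handle rather
than a string, but the idea is the same: every allocation goes through
one choke point that knows who asked.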

I assume that the open routine for a device driver will return a tagged
object which is used to perform device operations.  Tracking down all
the references to these objects may not be practical.  One approach is
to write a wrapper around the device driver.  When you call the open
routine for the wrapper, it calls the driver's open routine, and then
allocates a wrapper object which points to the object returned by the
driver's open routine.  If the driver crashes, the wrapper switches to
the backup driver.  This is done by iterating through all the wrapper
objects, freeing the objects they point to, and making them point to
objects obtained by calling the open routine for the backup driver.
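
A rough sketch of this wrapper scheme follows, again in Python as
executable pseudocode rather than Ada.  All of the names (DriverWrapper,
WrappedDevice, DriverCrash, fail_over) are made up for the example, and
retrying the failed call once on the backup driver is just one possible
policy.

```python
class DriverCrash(Exception):
    """Stands in for an unhandled exception raised inside driver code."""


class WrappedDevice:
    """Wrapper object returned to callers; points at the real driver object."""

    def __init__(self, wrapper, name, target):
        self.wrapper, self.name, self.target = wrapper, name, target

    def read(self):
        try:
            return self.target.read()
        except DriverCrash:
            self.wrapper.fail_over()
            return self.target.read()   # retry once on the backup driver


class DriverWrapper:
    def __init__(self, primary_open, backup_open):
        self._open = primary_open
        self._backup_open = backup_open
        self._handles = []              # every wrapper object handed out
        self.failed_over = False

    def open(self, name):
        handle = WrappedDevice(self, name, self._open(name))
        self._handles.append(handle)
        return handle

    def fail_over(self):
        """Repoint every wrapper object at an object from the backup driver."""
        if self.failed_over:
            return
        self.failed_over = True
        self._open = self._backup_open
        for handle in self._handles:
            # The object from the crashed driver is dropped here.
            handle.target = self._backup_open(handle.name)


class NewDevice:                        # hypothetical driver under test
    def read(self):
        raise DriverCrash("bug in the new driver")


class OldDevice:                        # hypothetical known-good backup
    def read(self):
        return "data from backup"


wrapper = DriverWrapper(lambda name: NewDevice(), lambda name: OldDevice())
dev1 = wrapper.open("tty0")
dev2 = wrapper.open("tty1")
result = dev1.read()                    # crashes, fails over, then retries
```

Callers only ever hold WrappedDevice objects, so nobody has to hunt down
references to the objects created by the crashed driver; the wrapper's
own list is the only place that needs to be walked.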

When I say the driver "crashes," that means that one task or interrupt
handler executing driver code raised an unhandled exception.  There
could be other tasks executing driver code at the same time.  As long
as these tasks do not block, they can be allowed to continue, but if
they block then it is necessary to throw an exception in the task.
This requires an extension to the Ada run time.  In GNAT, aborting
a task throws a special exception that cannot be caught, so that the
basic logic required to raise an exception in another task is there.
The exception should be caught by the wrapper, which will then retry
the operation using the backup version of the driver.

These ideas add up to a bit of work, but they should allow a new version
of a device driver to be tested on a running system with only a small
risk of disrupting system activity if the new version doesn't work.
Whether this is worth doing or not is an open question.  I wrote the
initial implementation of modules for Linux, and didn't do any of this
stuff.  (Traps in loaded modules cause the system to crash, just like
traps from the core kernel code.)  But when Ada code throws an
exception, you can be reasonably confident that it hasn't corrupted
data managed by some unrelated piece of code, so there is less risk
in keeping the system running than there is when C code goes awry.
I would say, though, that dynamic loading of code into a running
kernel is the big win.  If mistakes which are not caught by the Ada
type system cause the system to crash, that is still a lot better than
having to reboot every time you want to test a changed line of code.
				Kenneth Almquist