comp.lang.ada
 help / color / mirror / Atom feed
From: "Nick Roberts" <nick.roberts@acm.org>
Subject: Re: Deadlock resolution
Date: Wed, 28 Jul 2004 14:53:02 +0100
Date: 2004-07-28T14:53:02+01:00	[thread overview]
Message-ID: <opsbuouovdp4pfvb@bram-2> (raw)
In-Reply-To: m3d62gzd8t.fsf@0x50a5b1bb.boanxx18.adsl-dhcp.tele.dk

On 28 Jul 2004 11:34:42 +0200, Mark Lorenzen <mark.lorenzen@ofir.dk> wrote:

>> > Eliminate deadlocks in the first place is the best
>> > solution. Everything else is just a hack trying to fix a
>> > bad design.
>>
>> That is quite true, but deadlock handling is a bit like
>> exception handling, in that it can form the braces of a belt-
>> and-braces approach. Is deadlock elimination realistic for
>> all kinds of software?
>
> Good question. Is it actually possible to detect a deadlock in
> all situations?

I'm sure the answer is 'yes' in theory, but 'no' in practice.

>> >> I would appreciate brief descriptions of how deadlock
>> >> detection and/or resolution is performed in real Ada
>> >> programs (where it is performed in Ada).
>> >
>> > In some systems, a watchdog is monitoring the CPU to detect
>> > if it halts and then resets the system. But again, avoiding
>> > deadlocks is better than resetting the system.
>>
>> Is it necessary for the watchdog to reset the whole system,
>> or just to kill one of the tasks/threads participating in the
>> cycle of deadlock?
>
> I was thinking of small embedded systems and there it is often
> feasible to just reset the complete board. But in large systems,
> it can be necessary to handle such incidents more gracefully.
> ...

My current design for the AdaOS kernel is as follows. The kernel
counts the time that each thread (task) is blocked waiting for
another thread (in the same workstation). If the count reaches a
certain threshold (possibly about 10 minutes) for a thread, the
kernel performs a traceback on the chain of threads it is waiting
on; this chain breaks if any thread in it is waiting on I/O or a
timer. If the chain doesn't break before the original thread is
reached again, deadlock is detected.

The current 'resolution' strategy is to select the youngest (most
recently created) thread in the chain, and kill it (it gets a
special Deadlock exception).

There are at least two weaknesses with this approach: (a) it takes
at least 10 minutes (or whatever) to detect a deadlock; (b) any
chain of deadlock that goes outside the workstation at all will
not be detected by this mechanism.

I think (a) is unlikely to be a serious problem, in practice.

However, since AdaOS will be a fully distributed OS, (b) is very
likely to happen, in practice. I believe that the kernel canot be
expected to manage deadlocks in these cases, because it would be
impractical to implement a mechanism that could be guaranteed
immune to false intervention (to act only in cases of a genuine
deadlock).

I therefore believe that super-kernel software which might be in
communication with software executing on another workstation
(and in AdaOS, that means most software) must be programmed to
either: (1) eliminate (within reason) the possibility of
deadlock; (2) detect and resolve potential deadlocks. I think
(1) will be impractical in the majority of AdaOS programs.

When considering the likely AdaOS (actually CORBA) scenario of
a menagerie of programs interacting in potentially deadlocking
ways, I think considerations of deadlock management gain in
importance. So, I'm pondering about the subject as it impacts
the design of the AdaOS kernel (which is called 'Bachar').

I'd appreciate any comments on this design.

-- 
Ta,
Nick Roberts



  reply	other threads:[~2004-07-28 13:53 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-07-26  3:25 Deadlock resolution Nick Roberts
2004-07-26  7:46 ` Mark Lorenzen
2004-07-27 15:31   ` Nick Roberts
2004-07-28  9:34     ` Mark Lorenzen
2004-07-28 13:53       ` Nick Roberts [this message]
2004-07-28 14:21         ` Dmitry A. Kazakov
2004-08-02 11:00     ` Ole-Hjalmar Kristensen
2004-07-26  7:48 ` Jano
2004-07-27 15:33   ` Nick Roberts
2004-07-27 16:52     ` Jano
2004-07-28 14:14       ` Nick Roberts
2004-07-29  1:04         ` Randy Brukardt
2004-07-26 14:05 ` Marc A. Criley
2004-07-27 15:50   ` Nick Roberts
2004-07-27 17:31     ` Marc A. Criley
2004-07-27 21:29       ` Robert I. Eachus
2004-07-28 14:29         ` Nick Roberts
2004-07-27 17:53 ` Martin Dowie
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox