Deadlock resolution

comp.lang.ada
 help / color / mirror / Atom feed

* Deadlock resolution
@ 2004-07-26  3:25 Nick Roberts
  2004-07-26  7:46 ` Mark Lorenzen
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Nick Roberts @ 2004-07-26  3:25 UTC (permalink / raw)


I would appreciate answers from people who have experience
of real life multi-tasking software written in Ada.

Does this software tend to have deadlock detection and/or
resolution mechanisms explicitly programmed in Ada, or is
it expected that deadlocks will be managed elsewhere (for
example, in the operating system or run time system or
RTMS or whatever)?

Alternatively, have you ever come across software written
in Ada using techniques to eliminate deadlock (for example,
a lock ordering strategy)?

Which approach do you think is best, in reality (handle in
Ada, handle elsewhere, eliminate altogether)?

I would appreciate brief descriptions of how deadlock
detection and/or resolution is performed in real Ada
programs (where it is performed in Ada).

-- 
Thanks,
Nick Roberts



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Deadlock resolution
  2004-07-26  3:25 Deadlock resolution Nick Roberts
@ 2004-07-26  7:46 ` Mark Lorenzen
  2004-07-27 15:31   ` Nick Roberts
  2004-07-26  7:48 ` Jano
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Mark Lorenzen @ 2004-07-26  7:46 UTC (permalink / raw)

"Nick Roberts" <nick.roberts@acm.org> writes:

> I would appreciate answers from people who have experience
> of real life multi-tasking software written in Ada.
> 
> Does this software tend to have deadlock detection and/or
> resolution mechanisms explicitly programmed in Ada, or is
> it expected that deadlocks will be managed elsewhere (for
> example, in the operating system or run time system or
> RTMS or whatever)?

I would say that it should be managed elsewhere preferably by
hardware.

> Alternatively, have you ever come across software written
> in Ada using techniques to eliminate deadlock (for example,
> a lock ordering strategy)?

Use one of the normal analysis methods to analyse if your tasks always
meet their deadlines f.x. the Rate Monotonic Analysis method (see
"Meeting Deadlines in Hard Real-Time Systems" ISBN: 0-8186-7406-7)
together with the Ravenscar Ada profile.

> Which approach do you think is best, in reality (handle in
> Ada, handle elsewhere, eliminate altogether)?

Eliminate deadlocks in the first place is the best
solution. Everything else is just a hack trying to fix a bad design.

> I would appreciate brief descriptions of how deadlock
> detection and/or resolution is performed in real Ada
> programs (where it is performed in Ada).

In some systems, a watchdog is monitoring the CPU to detect if it
halts and then resets the system. But again, avoiding deadlocks is
better than resetting the system.

> Thanks,
> Nick Roberts

Regards,
- Mark Lorenzen

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Deadlock resolution
  2004-07-26  3:25 Deadlock resolution Nick Roberts
  2004-07-26  7:46 ` Mark Lorenzen
@ 2004-07-26  7:48 ` Jano
  2004-07-27 15:33   ` Nick Roberts
  2004-07-26 14:05 ` Marc A. Criley
  2004-07-27 17:53 ` Martin Dowie
  3 siblings, 1 reply; 18+ messages in thread
From: Jano @ 2004-07-26  7:48 UTC (permalink / raw)

Nick Roberts wrote:

> I would appreciate brief descriptions of how deadlock
> detection and/or resolution is performed in real Ada
> programs (where it is performed in Ada).

In my case, I always try to play the safe way, and I tend to use the 
basic techniques I was taught when learning RT programming. Tasks share 
data via monitors, and I don't use synchronous rendez-vous except where 
this is a true need...

In short, I prefer to not let deadlock to be a possibility.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Deadlock resolution
  2004-07-26  3:25 Deadlock resolution Nick Roberts
  2004-07-26  7:46 ` Mark Lorenzen
  2004-07-26  7:48 ` Jano
@ 2004-07-26 14:05 ` Marc A. Criley
  2004-07-27 15:50   ` Nick Roberts
  2004-07-27 17:53 ` Martin Dowie
  3 siblings, 1 reply; 18+ messages in thread
From: Marc A. Criley @ 2004-07-26 14:05 UTC (permalink / raw)

"Nick Roberts" <nick.roberts@acm.org> wrote:

> I would appreciate answers from people who have experience
> of real life multi-tasking software written in Ada.

That would be me :-)

> Does this software tend to have deadlock detection and/or
> resolution mechanisms explicitly programmed in Ada, or is
> it expected that deadlocks will be managed elsewhere (for
> example, in the operating system or run time system or
> RTMS or whatever)?

To be quite frank about it, I simply designed and implemented the code to
preclude deadlocks. During development it sometimes happened, and the Ada
RTS either detected it or everything just locked up, but the fix was
to...well...go fix the bug the caused the deadlock.

If you prevent the problem from happening, then you don't have to worry
about the consequences of it happening.

This was a command and control system, and some of the design aspects that
help avoid deadlock potential include using monitors to control access to
critical data items and employing a coarse-grained partitioning of
concurrent functionality.

By coarse granularity I mean that the tasks did signficant amounts of work
on their own and the number of rendezvous were limited, which reduces the
opportunity for deadlock and makes finding it if/when it does occur a lot
easier. This is in contrast to the original implementation of the system
that had the system's functionality broken down into finely-grained bits
that were almost constantly interacting with each other. May sound good in
theory (and may be for some kinds of systems), but the reality of that
implementation made it awful to trace control flow through the system and
debug time-related issues such as deadlock and starvation.

Marc A. Criley
McKae Technologies
www.mckae.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Deadlock resolution
  2004-07-26  7:46 ` Mark Lorenzen
@ 2004-07-27 15:31   ` Nick Roberts
  2004-07-28  9:34     ` Mark Lorenzen
  2004-08-02 11:00     ` Ole-Hjalmar Kristensen
  0 siblings, 2 replies; 18+ messages in thread
From: Nick Roberts @ 2004-07-27 15:31 UTC (permalink / raw)

On 26 Jul 2004 09:46:48 +0200, Mark Lorenzen <mark.lorenzen@ofir.dk> wrote:

>> Does this software tend to have deadlock detection and/or
>> resolution mechanisms explicitly programmed in Ada, or is
>> it expected that deadlocks will be managed elsewhere (for
>> example, in the operating system or run time system or
>> RTMS or whatever)?
>
> I would say that it should be managed elsewhere preferably
> by hardware.

I have never come across hardware that does deadlock
detection or resolution. Is there any information about this
on the Internet?

>> Alternatively, have you ever come across software written
>> in Ada using techniques to eliminate deadlock (for example,
>> a lock ordering strategy)?
>
> Use one of the normal analysis methods to analyse if your
> tasks always meet their deadlines f.x. the Rate Monotonic
> Analysis method (see "Meeting Deadlines in Hard Real-Time
> Systems" ISBN: 0-8186-7406-7) together with the Ravenscar
> Ada profile.

Although I did not mention it (sorry), I am not really
thinking about hard real time systems. The kind of systems I
am most interested in are soft real time (non-embedded
network server, technical workstation, office desktop).

Although many of the same basic principles apply, it is often
the case that the economics are slightly different. But
perhaps this doesn't matter?

>> Which approach do you think is best, in reality (handle in
>> Ada, handle elsewhere, eliminate altogether)?
>
> Eliminate deadlocks in the first place is the best
> solution. Everything else is just a hack trying to fix a
> bad design.

That is quite true, but deadlock handling is a bit like
exception handling, in that it can form the braces of a belt-
and-braces approach. Is deadlock elimination realistic for
all kinds of software?

>> I would appreciate brief descriptions of how deadlock
>> detection and/or resolution is performed in real Ada
>> programs (where it is performed in Ada).
>
> In some systems, a watchdog is monitoring the CPU to detect
> if it halts and then resets the system. But again, avoiding
> deadlocks is better than resetting the system.

Is it necessary for the watchdog to reset the whole system,
or just to kill one of the tasks/threads participating in the
cycle of deadlock?

-- 
Thanks again,
Nick Roberts

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Deadlock resolution
  2004-07-26  7:48 ` Jano
@ 2004-07-27 15:33   ` Nick Roberts
  2004-07-27 16:52     ` Jano
  0 siblings, 1 reply; 18+ messages in thread
From: Nick Roberts @ 2004-07-27 15:33 UTC (permalink / raw)


On Mon, 26 Jul 2004 09:48:43 +0200, Jano <notelacreas@porfavor.no> wrote:

> Nick Roberts wrote:
>
>> I would appreciate brief descriptions of how deadlock
>> detection and/or resolution is performed in real Ada
>> programs (where it is performed in Ada).
>
> In my case, I always try to play the safe way, and I tend to use the  
> basic techniques I was taught when learning RT programming. Tasks share  
> data via monitors, and I don't use synchronous rendez-vous except where  
> this is a true need...
>
> In short, I prefer to not let deadlock to be a possibility.

Okay, but in those cases where you do need to use rendezvouses (where
deadlock might be a possibility)?

-- 
Many thanks,
Nick Roberts



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Deadlock resolution
  2004-07-26 14:05 ` Marc A. Criley
@ 2004-07-27 15:50   ` Nick Roberts
  2004-07-27 17:31     ` Marc A. Criley
  0 siblings, 1 reply; 18+ messages in thread
From: Nick Roberts @ 2004-07-27 15:50 UTC (permalink / raw)

On Mon, 26 Jul 2004 09:05:23 -0500, Marc A. Criley <mcNOSPAM@mckae.com>  
wrote:

> To be quite frank about it, I simply designed and implemented the code to
> preclude deadlocks. During development it sometimes happened, and the Ada
> RTS either detected it or everything just locked up, but the fix was
> to...well...go fix the bug the caused the deadlock.

Right, but can I presume you were pleased on those occasions when the RTS
did detect the deadlock?

> If you prevent the problem from happening, then you don't have to worry
> about the consequences of it happening.

I suppose it is a question of sureness. Would I be wrong in suggesting
that for some kinds of software it won't be possible to be 100% sure
that deadlock cannot occur?

> This was a command and control system, and some of the design aspects 
> that help avoid deadlock potential include using monitors to control
> access to critical data items and employing a coarse-grained
> partitioning of concurrent functionality.
> ...
> This is in contrast to the original implementation of the systemthat had  
> the system's functionality broken down into finely-grained bits
> that were almost constantly interacting with each other. May sound good  
> in theory (and may be for some kinds of systems), but the reality of
> that implementation made it awful to trace control flow through the
> system and debug time-related issues such as deadlock and starvation.

That's an interesting observation (thanks).

To my mind, parallelism is unimportant except for software which is
interacting with the outside world (either human beings or peripheral
devices, or an external network). Of course, software which is
interacting with software that is interacting with the outside world
can get caught up in this, and so much of an overall system can be.

I do get a bit annoyed when I observe application software (in a
windowing system) behaving in a clearly non parallelistic manner. I
might click on a button on one program telling it to download a file,
say, and then click on another button telling it to fetch my e-mail
only to find that it doesn't respond. These two things could and (to
my mind) should be do-able in parallel.

To concoct another example, if an air traffic control system detects
two conflicts at the same time, it should not (I suspect ;-) have to
wait for one to be resolved before popping the alarm for the other.

I'm all for eliminating unnecessary parallelism. It is certainly
inefficient and it usually makes a piece of software more complex
than it needs to be.

But I think parallelism is often necessary or desirable. Nothing
comes for free, and the price to pay for parallelism, as you rightly
point out, is the increased danger of deadlocks. For safety critical
(or otherwise critical) software, one might decide that the danger
is too great, and opt for the safety of a non parallel (or less
parallel) design. But with less critical software (e.g. a GUI for
an office desktop computer), perhaps the risk is worth it?

-- 
Many thanks,
Nick Roberts

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Deadlock resolution
  2004-07-27 15:33   ` Nick Roberts
@ 2004-07-27 16:52     ` Jano
  2004-07-28 14:14       ` Nick Roberts
  0 siblings, 1 reply; 18+ messages in thread
From: Jano @ 2004-07-27 16:52 UTC (permalink / raw)

Nick Roberts wrote:
> On Mon, 26 Jul 2004 09:48:43 +0200, Jano <notelacreas@porfavor.no> wrote:
> 
>> Nick Roberts wrote:
>>
>>> I would appreciate brief descriptions of how deadlock
>>> detection and/or resolution is performed in real Ada
>>> programs (where it is performed in Ada).
>>
>>
>> In my case, I always try to play the safe way, and I tend to use the  
>> basic techniques I was taught when learning RT programming. Tasks 
>> share  data via monitors, and I don't use synchronous rendez-vous 
>> except where  this is a true need...
>>
>> In short, I prefer to not let deadlock to be a possibility.
> 
> 
> Okay, but in those cases where you do need to use rendezvouses (where
> deadlock might be a possibility)?

In short, I've never used a mitigation technique other than, when 
confronted with a deadlock, detecting the path to it and removing the cause.

In practice, I only remember to have suffered rendez-vous deadlock once, 
and it was confined to a single package still in early implementation 
stage, so it was easy to spot and remove.

I've been less lucky with circular deadlocks between two protected 
objects, but it was a temerity on my part because potentially blocking 
calls from protected objects shouldn't be performed. In fact it was an 
oversight on my part, solved once I saw what I was doing (!).

If I were extra paranoid about certain tasks and were to detect 
deadlocks, as a first though I suppose I would use the ATC construct, 
something like that:

select
    delay Timeout_Period;
    Report ("TIMEOUT IN RENDEZVOUS BLAHBLAH");
then abort
    Other_Task.Rendez_Vous;
    -- may be you can here
    abort Other_Task;
    -- or something else
end;

Are queued entry calls abortable? I seem to remember something about that.

I find the client (tasks)/server (monitors) architecture safe and 
convenient, if you can stick to it...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Deadlock resolution
  2004-07-27 15:50   ` Nick Roberts
@ 2004-07-27 17:31     ` Marc A. Criley
  2004-07-27 21:29       ` Robert I. Eachus
  0 siblings, 1 reply; 18+ messages in thread
From: Marc A. Criley @ 2004-07-27 17:31 UTC (permalink / raw)


"Nick Roberts" <nick.roberts@acm.org> wrote :
>
> Right, but can I presume you were pleased on those occasions when the RTS
> did detect the deadlock?

Yeah, but the RTS (Alsys Ada) detected deadlocks only a handful of times
throughout the whole development effort, because it would only catch fairly
obvious occurrences, like when the sequence of statements in the "select
block" invoked one of the same task's other entries.

Far more common was a simple freeze-up; since this was an event driven
distributed system, some event-handling chains could span multiple processes
and multiple tasks within a process. And occasionally, due to a bug,
ultimately the chain would circle back and an invocation of another entry in
what was the originating task would be made. To that task it just looked
like some other task trying to rendezvous with it and the entry call was
therefore blocked--which of course backed up and froze the whole chain.

> I suppose it is a question of sureness. Would I be wrong in suggesting
> that for some kinds of software it won't be possible to be 100% sure
> that deadlock cannot occur?

I suppose that for the general case of all possible usable concurrent
architectures that the answer would be "no", however, by suitably
constraining the architecture and design it is certainly possible to
eliminate the possibility of deadlock. Coarse-grained concurrency is one of
the factors in doing just that.

> To my mind, parallelism is unimportant except for software which is
> interacting with the outside world (either human beings or peripheral
> devices, or an external network). Of course, software which is
> interacting with software that is interacting with the outside world
> can get caught up in this, and so much of an overall system can be.

Go down enough levels of indirection and interfacing and pretty much all
software interacts with the outside world, so questions of
parallelism/concurrency can (and probably ought to) be considered for the
develoment of most any software system.

> I'm all for eliminating unnecessary parallelism. It is certainly
> inefficient and it usually makes a piece of software more complex
> than it needs to be.

Concurrency for the sake of using concurrency is certainly aggravating, and
I've walked into projects where there was no other excuse for its presence.
But the _improper_ use of any particular architecture, design, or language
feature is always aggravating and going to be inefficient and add
unnecessary complexity.

Concurrency is no more inherently problematic than inheritance or
polymorphism. They're all sexy techniques that can be fun and effective to
use and get you into a world of hurt when misapplied.

> But I think parallelism is often necessary or desirable. Nothing
> comes for free, and the price to pay for parallelism, as you rightly
> point out, is the increased danger of deadlocks.

Actually I don't believe that was something I pointed out. Concurrency is
subject to the risk of deadlock in the same sense that numerical analysis is
subject to the risk of division-by-zero. I would argue that that
concurrency, when correctly and appropriately employed, can be done with
zero risk of deadlock.

> For safety critical
> (or otherwise critical) software, one might decide that the danger
> is too great, and opt for the safety of a non parallel (or less
> parallel) design.

I'm an unabashed fan of concurrency, especially with Ada tasking; the thing
I keep harping on is "appropriate use". Coarsely-grained chunking of
functionality into concurrent entities, and well-defined interfaces
(entries) lead to effective implementations. In other words, the software
engineering 101 concepts of "highly cohesive and loosely coupled".

Marc A. Criley
McKae Technologies
www.mckae.com





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Deadlock resolution
  2004-07-26  3:25 Deadlock resolution Nick Roberts
                   ` (2 preceding siblings ...)
  2004-07-26 14:05 ` Marc A. Criley
@ 2004-07-27 17:53 ` Martin Dowie
  3 siblings, 0 replies; 18+ messages in thread
From: Martin Dowie @ 2004-07-27 17:53 UTC (permalink / raw)


"Nick Roberts" <nick.roberts@acm.org> wrote in message
news:opsbp6g9xzp4pfvb@bram-2...
> I would appreciate answers from people who have experience
> of real life multi-tasking software written in Ada.
[snip]
> I would appreciate brief descriptions of how deadlock
> detection and/or resolution is performed in real Ada
> programs (where it is performed in Ada).

You could use a tool like "Quasar" to try and detect them, see
http://quasar.cnam.fr/

Also, you could use a tradition FPGA watchdog which the s/w occasionaly has
to 'kick' to prove it is alive. If the 'kick' fails to happen a counter
eventually hits zero and the board/box/system is reset.

Cheers

-- Martin





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Deadlock resolution
  2004-07-27 17:31     ` Marc A. Criley
@ 2004-07-27 21:29       ` Robert I. Eachus
  2004-07-28 14:29         ` Nick Roberts
  0 siblings, 1 reply; 18+ messages in thread
From: Robert I. Eachus @ 2004-07-27 21:29 UTC (permalink / raw)

Marc A. Criley wrote:

>>I suppose it is a question of sureness. Would I be wrong in suggesting
>>that for some kinds of software it won't be possible to be 100% sure
>>that deadlock cannot occur?
> 
> 
> I suppose that for the general case of all possible usable concurrent
> architectures that the answer would be "no", however, by suitably
> constraining the architecture and design it is certainly possible to
> eliminate the possibility of deadlock. Coarse-grained concurrency is one of
> the factors in doing just that.

We had a discussion here a few weeks back about how the Ada compilation 
order rules help manage concurrency so that deadlocks won't happen.  You 
can design in potential deadlocks if you want, or if for some reason you 
must, but Ada makes it pretty obvious what you are doing to yourself. 
In particular if you find yourself trying to interlock calls to two 
protected objects.

Oh, and as was pointed out in that discussion you do have to be careful 
when using protected objects.  However, that is not too hard in general. 
There is one 'mental' gotcha.  The rule in Ada is that potentially 
blocking calls in protected objects are bounded error. RM 9.5.1(8). 
Calls to some predefined IO programs (basically those that read from or 
  write to a file) are defined as potentially blocking, but most Ada 
compilers won't warn you about such calls inside a protected object. 
(Note that using Ada.Text_IO to read from or write to a String is okay.)

But 99% of the time, when I am designing an Ada package that controls or 
depends on concurrency, deadlock is just not a concern.  The protected 
objects or tasks in the package won't combine with other packages in any 
way that allows these constructs to be involved in a deadlock.  This 
makes avoiding deadlock a local issue instead of an emergent property of 
the whole system.

The other 1% of the time, the potential deadlock is not a software 
issue, and you just have to deal with the issues that causes.  I ran 
into this issue a couple of times in radar systems.  Let me see if I can 
distill an example down to understandable size.  In one particular 
system there were several "heads" (radar transmitters and receivers) 
controlled from a central control site.  For safety reasons the radar 
sites had to have hardware interlocks so that the radar transmitter 
could not be turned on while someone was in the radome.  This created 
potential deadlocks that could only be dealt with by "brute force."

True, these were extremely unlikely scenarios, but they had to be 
considered during the design.  For example, what if the radar was 
operating, and some component inside the radome fails preventing the 
remote operators from turning the radar off?  Solution, put in a power 
kill switch external to the radome, where maintenance personnel can get 
to it when the radar is on.  But that switch has to have other 
interlocks.  Another set of issues had to do with the power failing 
during testing of the computer memory.  (We had to insure that the 
memory tests never left the memory in a state that corresponded to the 
radar radiating, since the system could try to resume from the current 
state when the power came back on.)

-- 

                                           Robert I. Eachus

"The flames kindled on the Fourth of July, 1776, have spread over too 
much of the globe to be extinguished by the feeble engines of despotism; 
on the contrary, they will consume these engines and all who work them." 
-- Thomas Jefferson, 1821

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Deadlock resolution
  2004-07-27 15:31   ` Nick Roberts
@ 2004-07-28  9:34     ` Mark Lorenzen
  2004-07-28 13:53       ` Nick Roberts
  2004-08-02 11:00     ` Ole-Hjalmar Kristensen
  1 sibling, 1 reply; 18+ messages in thread
From: Mark Lorenzen @ 2004-07-28  9:34 UTC (permalink / raw)


"Nick Roberts" <nick.roberts@acm.org> writes:

> On 26 Jul 2004 09:46:48 +0200, Mark Lorenzen <mark.lorenzen@ofir.dk> wrote:
> 
> >> Does this software tend to have deadlock detection and/or
> >> resolution mechanisms explicitly programmed in Ada, or is
> >> it expected that deadlocks will be managed elsewhere (for
> >> example, in the operating system or run time system or
> >> RTMS or whatever)?
> >
> > I would say that it should be managed elsewhere preferably
> > by hardware.
> 
> I have never come across hardware that does deadlock
> detection or resolution. Is there any information about this
> on the Internet?

Martin Dowie mentioned something about an FPGA watchdog in another
response. You could also measure the electrical power consumption of
the CPU and if it is too low for too long, the CPU has probably
halted. But observe that these techniques does not guarantee that a
deadlock has occurred, just that "something" didn't happen on time.

> 
> >> Alternatively, have you ever come across software written
> >> in Ada using techniques to eliminate deadlock (for example,
> >> a lock ordering strategy)?
> >
> > Use one of the normal analysis methods to analyse if your
> > tasks always meet their deadlines f.x. the Rate Monotonic
> > Analysis method (see "Meeting Deadlines in Hard Real-Time
> > Systems" ISBN: 0-8186-7406-7) together with the Ravenscar
> > Ada profile.
> 
> Although I did not mention it (sorry), I am not really
> thinking about hard real time systems. The kind of systems I
> am most interested in are soft real time (non-embedded
> network server, technical workstation, office desktop).
> 
> Although many of the same basic principles apply, it is often
> the case that the economics are slightly different. But
> perhaps this doesn't matter?

You are right that economics differ - in soft real-time systems we are
often more interested in f.x. throughput than response time.

> 
> >> Which approach do you think is best, in reality (handle in
> >> Ada, handle elsewhere, eliminate altogether)?
> >
> > Eliminate deadlocks in the first place is the best
> > solution. Everything else is just a hack trying to fix a
> > bad design.
> 
> That is quite true, but deadlock handling is a bit like
> exception handling, in that it can form the braces of a belt-
> and-braces approach. Is deadlock elimination realistic for
> all kinds of software?

Good question. Is it actually possible to detect a deadlock in all
situations?

> 
> >> I would appreciate brief descriptions of how deadlock
> >> detection and/or resolution is performed in real Ada
> >> programs (where it is performed in Ada).
> >
> > In some systems, a watchdog is monitoring the CPU to detect
> > if it halts and then resets the system. But again, avoiding
> > deadlocks is better than resetting the system.
> 
> Is it necessary for the watchdog to reset the whole system,
> or just to kill one of the tasks/threads participating in the
> cycle of deadlock?

I was thinking of small embedded systems and there it is often
feasible to just reset the complete board. But in large systems, it
can be necessary to handle such incidents more gracefully. The
Ericsson AXE10 telephony switch can f.x. perform two kinds of restart
- a small and a large. A large restart is the fastest as it is simply
a "normal" restart of the system. A small restart on the other hand,
is much more complex as this involves tracing of all the ongoing calls
and only release the calls that have not yet reached a stable state
(such as two-way speach connection). The trick is to make sure that
all "tasks" (really co-routines in the AXE10) are able to receive a
special trace message at all times, evaluate it and send it to the
next entity in the call chain. It is thus the responsibility of the
co-routines to act on a small restart - they must reset themselves if
they are not in a stable stage.

But this does not solve the deadlock problem. It just ensure that in
such a case the RTS simply notes that a co-routine haven't given up
the CPU within a specific amount of time and then initiates a restart
(small I think) and if the problem persists the RTS will initiate a
large restart.

Maybe it is too overkill for your application though.

> 
> -- 
> Thanks again,
> Nick Roberts

Regards,
- Mark Lorenzen



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Deadlock resolution
  2004-07-28  9:34     ` Mark Lorenzen
@ 2004-07-28 13:53       ` Nick Roberts
  2004-07-28 14:21         ` Dmitry A. Kazakov
  0 siblings, 1 reply; 18+ messages in thread
From: Nick Roberts @ 2004-07-28 13:53 UTC (permalink / raw)

On 28 Jul 2004 11:34:42 +0200, Mark Lorenzen <mark.lorenzen@ofir.dk> wrote:

>> > Eliminate deadlocks in the first place is the best
>> > solution. Everything else is just a hack trying to fix a
>> > bad design.
>>
>> That is quite true, but deadlock handling is a bit like
>> exception handling, in that it can form the braces of a belt-
>> and-braces approach. Is deadlock elimination realistic for
>> all kinds of software?
>
> Good question. Is it actually possible to detect a deadlock in
> all situations?

I'm sure the answer is 'yes' in theory, but 'no' in practice.

>> >> I would appreciate brief descriptions of how deadlock
>> >> detection and/or resolution is performed in real Ada
>> >> programs (where it is performed in Ada).
>> >
>> > In some systems, a watchdog is monitoring the CPU to detect
>> > if it halts and then resets the system. But again, avoiding
>> > deadlocks is better than resetting the system.
>>
>> Is it necessary for the watchdog to reset the whole system,
>> or just to kill one of the tasks/threads participating in the
>> cycle of deadlock?
>
> I was thinking of small embedded systems and there it is often
> feasible to just reset the complete board. But in large systems,
> it can be necessary to handle such incidents more gracefully.
> ...

My current design for the AdaOS kernel is as follows. The kernel
counts the time that each thread (task) is blocked waiting for
another thread (in the same workstation). If the count reaches a
certain threshold (possibly about 10 minutes) for a thread, the
kernel performs a traceback on the chain of threads it is waiting
on; this chain breaks if any thread in it is waiting on I/O or a
timer. If the chain doesn't break before the original thread is
reached again, deadlock is detected.

The current 'resolution' strategy is to select the youngest (most
recently created) thread in the chain, and kill it (it gets a
special Deadlock exception).

There are at least two weaknesses with this approach: (a) it takes
at least 10 minutes (or whatever) to detect a deadlock; (b) any
chain of deadlock that goes outside the workstation at all will
not be detected by this mechanism.

I think (a) is unlikely to be a serious problem, in practice.

However, since AdaOS will be a fully distributed OS, (b) is very
likely to happen, in practice. I believe that the kernel canot be
expected to manage deadlocks in these cases, because it would be
impractical to implement a mechanism that could be guaranteed
immune to false intervention (to act only in cases of a genuine
deadlock).

I therefore believe that super-kernel software which might be in
communication with software executing on another workstation
(and in AdaOS, that means most software) must be programmed to
either: (1) eliminate (within reason) the possibility of
deadlock; (2) detect and resolve potential deadlocks. I think
(1) will be impractical in the majority of AdaOS programs.

When considering the likely AdaOS (actually CORBA) scenario of
a menagerie of programs interacting in potentially deadlocking
ways, I think considerations of deadlock management gain in
importance. So, I'm pondering about the subject as it impacts
the design of the AdaOS kernel (which is called 'Bachar').

I'd appreciate any comments on this design.

-- 
Ta,
Nick Roberts

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Deadlock resolution
  2004-07-27 16:52     ` Jano
@ 2004-07-28 14:14       ` Nick Roberts
  2004-07-29  1:04         ` Randy Brukardt
  0 siblings, 1 reply; 18+ messages in thread
From: Nick Roberts @ 2004-07-28 14:14 UTC (permalink / raw)

On Tue, 27 Jul 2004 18:52:11 +0200, Jano <notelacreas@porfavor.no> wrote:

> ...
> If I were extra paranoid about certain tasks and were to detect  
> deadlocks, as a first though I suppose I would use the ATC
> construct, ...

I think there is generally great value in using an ATC to provide
a timeout to a call to a service which is external (to the program
or module), since this helps to ensure that your program or module
is not stymied by something outside its control, as well as
protecting against 'big ring' deadlock.

I find that the gory details can be neatly hidden inside a
procedure, to which you pass -- in addition to other relevant
parameters -- the timeout value, which can have a convenient
default.

> Are queued entry calls abortable? I seem to remember something
> about that.

There are various rules and caveats. An entry call can be
cancelled if it is still waiting in the queue, but the body of an
accept statement is an 'abort deferred region', and cannot be
aborted as such. An ATC is not permitted within any abort deferred
region, which includes Initialize, Finalize, and Adjust. And so
on.

> I find the client (tasks)/server (monitors) architecture safe
> and convenient, if you can stick to it...

Indeed.

-- 
Thanks for your reply,
Nick Roberts

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Deadlock resolution
  2004-07-28 13:53       ` Nick Roberts
@ 2004-07-28 14:21         ` Dmitry A. Kazakov
  0 siblings, 0 replies; 18+ messages in thread
From: Dmitry A. Kazakov @ 2004-07-28 14:21 UTC (permalink / raw)


On Wed, 28 Jul 2004 14:53:02 +0100, Nick Roberts wrote:

> My current design for the AdaOS kernel is as follows. The kernel
> counts the time that each thread (task) is blocked waiting for
> another thread (in the same workstation). If the count reaches a
> certain threshold (possibly about 10 minutes) for a thread, the
> kernel performs a traceback on the chain of threads it is waiting
> on; this chain breaks if any thread in it is waiting on I/O or a
> timer. If the chain doesn't break before the original thread is
> reached again, deadlock is detected.

Don't you have any protected objects a task might wait for? (except for
timer and I/O events, you have mentioned) So the only synchronization
mechanism is rendezvous with tasks?

> The current 'resolution' strategy is to select the youngest (most
> recently created) thread in the chain, and kill it (it gets a
> special Deadlock exception).

How does it differ from timed entry call from the perspective of the task
being killed? Except for waiting for a timer, I don't see much difference.
If so, then one can just claim that all calls are time bounded (say by 10
minutes).

> There are at least two weaknesses with this approach: (a) it takes
> at least 10 minutes (or whatever) to detect a deadlock; (b) any
> chain of deadlock that goes outside the workstation at all will
> not be detected by this mechanism.
> 
> I think (a) is unlikely to be a serious problem, in practice.
> 
> However, since AdaOS will be a fully distributed OS, (b) is very
> likely to happen, in practice. I believe that the kernel canot be
> expected to manage deadlocks in these cases, because it would be
> impractical to implement a mechanism that could be guaranteed
> immune to false intervention (to act only in cases of a genuine
> deadlock).

Theoretically you could navigate the chain of tasks across the whole
network... (:-))

> I therefore believe that super-kernel software which might be in
> communication with software executing on another workstation
> (and in AdaOS, that means most software) must be programmed to
> either: (1) eliminate (within reason) the possibility of
> deadlock; (2) detect and resolve potential deadlocks. I think
> (1) will be impractical in the majority of AdaOS programs.
> 
> When considering the likely AdaOS (actually CORBA) scenario of
> a menagerie of programs interacting in potentially deadlocking
> ways, I think considerations of deadlock management gain in
> importance. So, I'm pondering about the subject as it impacts
> the design of the AdaOS kernel (which is called 'Bachar').

It would be interesting to speculate whether some kind of profile (like
Ravenscar) plus a richer set of protected object operations could eliminate
deadlocks being enough powerful for an universal OS. For example, famous
philosophers wouldn't deadlock if two protected objects (forks) could share
one protected action. Just a guess.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Deadlock resolution
  2004-07-27 21:29       ` Robert I. Eachus
@ 2004-07-28 14:29         ` Nick Roberts
  0 siblings, 0 replies; 18+ messages in thread
From: Nick Roberts @ 2004-07-28 14:29 UTC (permalink / raw)

On Tue, 27 Jul 2004 17:29:16 -0400, Robert I. Eachus  
<rieachus@comcast.net> wrote:

> ...
> For safety reasons the radar sites had to have hardware interlocks so
> that the radar transmitter could not be turned on while someone was
> in the radome.

Hehe. For the the non-physicists, most radars radiate the same energy
that microwave ovens do. I once walked in front of a great big dish
on an RAF (Royal Air Force) base, and the sergeant accompanying me
chuckled and mentioned that if someone switched it on, we would be
fried. I didn't believe him at the time, but had it confirmed by an
engineer later (the dish wasn't actually connected up, however).

Of course, this is what the current fuss about mobile phones is based
on.

Thanks to you (Bob) and Marc for your interesting replies!

-- 
Nick Roberts

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Deadlock resolution
  2004-07-28 14:14       ` Nick Roberts
@ 2004-07-29  1:04         ` Randy Brukardt
  0 siblings, 0 replies; 18+ messages in thread
From: Randy Brukardt @ 2004-07-29  1:04 UTC (permalink / raw)


"Nick Roberts" <nick.roberts@acm.org> wrote in message
news:opsbuptvw4p4pfvb@bram-2...
> On Tue, 27 Jul 2004 18:52:11 +0200, Jano <notelacreas@porfavor.no> wrote:
>
> > ...
> > If I were extra paranoid about certain tasks and were to detect
> > deadlocks, as a first though I suppose I would use the ATC
> > construct, ...
>
> I think there is generally great value in using an ATC to provide
> a timeout to a call to a service which is external (to the program
> or module), since this helps to ensure that your program or module
> is not stymied by something outside its control, as well as
> protecting against 'big ring' deadlock.

In the implementation of Claw, we use timed entry calls in places where
Windows might leave us deadlocked. While the problem is usually a mistake by
the user of Claw, we figured people would figure out a lot quicker if they
got Message_Error instead of their program stopping unexpectedly!

                          Randy.






^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Deadlock resolution
  2004-07-27 15:31   ` Nick Roberts
  2004-07-28  9:34     ` Mark Lorenzen
@ 2004-08-02 11:00     ` Ole-Hjalmar Kristensen
  1 sibling, 0 replies; 18+ messages in thread
From: Ole-Hjalmar Kristensen @ 2004-08-02 11:00 UTC (permalink / raw)


>>>>> "NR" == Nick Roberts <nick.roberts@acm.org> writes:

    NR> On 26 Jul 2004 09:46:48 +0200, Mark Lorenzen <mark.lorenzen@ofir.dk> wrote:
    >>> Does this software tend to have deadlock detection and/or
    >>> resolution mechanisms explicitly programmed in Ada, or is
    >>> it expected that deadlocks will be managed elsewhere (for
    >>> example, in the operating system or run time system or
    >>> RTMS or whatever)?
    >> 
    >> I would say that it should be managed elsewhere preferably
    >> by hardware.

    NR> I have never come across hardware that does deadlock
    NR> detection or resolution. Is there any information about this
    NR> on the Internet?

All DB systems do deadlock detection, as they have no control over
the sequence of locks set by user transactions, so arbitrary
transactions may deadlock. See for example "Transaction processing" by
Gray and Reuter, section 7.11.  For centralized systems you can either
construct a wait-for graph or use a timeout. For distributed systems,
the most common method is to use a timeout.  But by all means, if you
can possibly do it, always reserve resources in the same order and
release them at the end, thus avoiding the problem altogether.

    >>> Alternatively, have you ever come across software written
    >>> in Ada using techniques to eliminate deadlock (for example,
    >>> a lock ordering strategy)?
    >> 
    >> Use one of the normal analysis methods to analyse if your
    >> tasks always meet their deadlines f.x. the Rate Monotonic
    >> Analysis method (see "Meeting Deadlines in Hard Real-Time
    >> Systems" ISBN: 0-8186-7406-7) together with the Ravenscar
    >> Ada profile.

    NR> Although I did not mention it (sorry), I am not really
    NR> thinking about hard real time systems. The kind of systems I
    NR> am most interested in are soft real time (non-embedded
    NR> network server, technical workstation, office desktop).

    NR> Although many of the same basic principles apply, it is often
    NR> the case that the economics are slightly different. But
    NR> perhaps this doesn't matter?

    >>> Which approach do you think is best, in reality (handle in
    >>> Ada, handle elsewhere, eliminate altogether)?
    >> 
    >> Eliminate deadlocks in the first place is the best
    >> solution. Everything else is just a hack trying to fix a
    >> bad design.

    NR> That is quite true, but deadlock handling is a bit like
    NR> exception handling, in that it can form the braces of a belt-
    NR> and-braces approach. Is deadlock elimination realistic for
    NR> all kinds of software?

    >>> I would appreciate brief descriptions of how deadlock
    >>> detection and/or resolution is performed in real Ada
    >>> programs (where it is performed in Ada).
    >> 
    >> In some systems, a watchdog is monitoring the CPU to detect
    >> if it halts and then resets the system. But again, avoiding
    >> deadlocks is better than resetting the system.

    NR> Is it necessary for the watchdog to reset the whole system,
    NR> or just to kill one of the tasks/threads participating in the
    NR> cycle of deadlock?

    NR> -- 
    NR> Thanks again,
    NR> Nick Roberts

-- 
   C++: The power, elegance and simplicity of a hand grenade.



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2004-08-02 11:00 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-07-26  3:25 Deadlock resolution Nick Roberts
2004-07-26  7:46 ` Mark Lorenzen
2004-07-27 15:31   ` Nick Roberts
2004-07-28  9:34     ` Mark Lorenzen
2004-07-28 13:53       ` Nick Roberts
2004-07-28 14:21         ` Dmitry A. Kazakov
2004-08-02 11:00     ` Ole-Hjalmar Kristensen
2004-07-26  7:48 ` Jano
2004-07-27 15:33   ` Nick Roberts
2004-07-27 16:52     ` Jano
2004-07-28 14:14       ` Nick Roberts
2004-07-29  1:04         ` Randy Brukardt
2004-07-26 14:05 ` Marc A. Criley
2004-07-27 15:50   ` Nick Roberts
2004-07-27 17:31     ` Marc A. Criley
2004-07-27 21:29       ` Robert I. Eachus
2004-07-28 14:29         ` Nick Roberts
2004-07-27 17:53 ` Martin Dowie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox