comp.lang.ada
* Reliability and deadlock in Annex E/distributed code
@ 2006-09-10 20:58 Dr. Adrian Wrigley
From: Dr. Adrian Wrigley @ 2006-09-10 20:58 UTC


Hi guys!

I've been having difficulty getting my Annex E/glade code to run reliably.

Under gnat 3.15p for x86 Linux, things were tolerably OK, with failures
of the code about weekly (running one instance continuously).
Sometimes the program simply wouldn't allow new partitions to run, as if
there was some boot server failure.  Sometimes the server would suddenly
start consuming all the CPU cycles it could get.

I think there may have been one or two bugs in the 3.15p version of glade,
particularly under certain error conditions of partitions being killed
while in use.

It has proved impossible to get the program to be as reliable as I want,
so I have tried running under GNAT GPL 2006 from https://libre2.adacore.com/

I built and installed glade, and have been testing my code.  It doesn't
work at all.  I have also tried the gnat and glade from Martin Krischik's
builds for FC5, and got the same problem:

There are three partitions: A, B and C.
The program starts up normally.
A procedure (in a normal unit) in partition C calls a function (in a
normal package) in partition B (using dynamic dispatch on a remote access
to class-wide type). The function in partition B calls a function (in an
RCI package) in partition A. The function in partition A never executes,
and the program stops executing.

If I enable glade debugging (S_PARINT=true S_RPC=true), I can see that
partition A gets the RPC message instructing it to do the call, but
then it doesn't actually call the necessary function (a parameterless
function returning an integer, though any function call fails).

If I call the function in A directly from B, it works fine.  It only seems
to be when A is called from B while executing a call from C that the problem occurs.

It's as if there is some deadlock or shortage of tasks to allocate or something.
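
For illustration only, the "shortage of tasks" guess can be shown in generic terms with a small Python sketch. It is not a model of GLADE's internals: the names `nested_call` and `pool_size` are hypothetical, and the assumption is simply that incoming calls are served by a bounded pool of handlers. If the only handler is busy waiting for a nested call that needs a handler from the same pool, the nested call is accepted but never runs.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def nested_call(pool_size, timeout=1.0):
    """Run an outer call that makes a nested call through the same pool.

    Returns "ok" if the nested call ran, "stalled" if it never got a handler.
    Hypothetical illustration; not how any real PCS is implemented.
    """
    pool = ThreadPoolExecutor(max_workers=pool_size)

    def inner():
        # Stands in for the parameterless function returning an integer.
        return 42

    def outer():
        # The outer handler blocks here, still holding its pool slot.
        future = pool.submit(inner)
        try:
            return future.result(timeout=timeout)
        except TimeoutError:
            future.cancel()  # still queued, so it can be cancelled
            return None

    try:
        result = pool.submit(outer).result()
    finally:
        pool.shutdown(wait=True)
    return "ok" if result == 42 else "stalled"


if __name__ == "__main__":
    print(nested_call(pool_size=1, timeout=0.5))  # one handler: nested call starves
    print(nested_call(pool_size=2))               # spare handler: nested call runs
```

With one handler the nested call is queued but never executed, which resembles a message being received without the function being called; with a spare handler it completes. Whether anything like this applies here is only a guess.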

Any ideas?

Using gdb, I find that each time a call in A is made, but doesn't execute, I
get a new task:

(gdb) info tasks
...
* 12   81d16a8    1  46 Waiting on entry call  rpc_handler
(gdb) where
#0  0x42028d69 in sigsuspend () from /lib/i686/libc.so.6
#1  0x4005b108 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0x4005804b in pthread_cond_wait () from /lib/i686/libpthread.so.0
#3  0x080c5172 in system.tasking.entry_calls.wait_until_abortable ()
#4  0x080c29c4 in system.tasking.protected_objects.operations.protected_entry_call ()
#5  0x080b0afa in system.rpc.server.rpc_handler (<_task>=0x81d1698) at s-tpobop.ads:200
#6  0x080bed4f in system.tasking.stages.task_wrapper ()

It looks like the call cannot proceed until "wait_until_abortable" returns.

Am I doing something wrong by making one remote call inside another?
Maybe the new glade detects an error unnoticed by 3.15p?  Perhaps this
is the cause of previous 'hangs'?

On the topic of Annex E support:

I've tried building PolyORB from https://libre2.adacore.com/ but it seems
to be missing the src/dsa directory needed to support Annex E.  If I get the
version from CVS, the build fails with "raised RTSFIND.RE_NOT_AVAILABLE : rtsfind.adb:497".
What's the best way to build the DSA personality?

If I use the DSA personality from PolyORB, will it be any different from the
GARLIC PCS?  Might it be more or less robust, or faster?

Thanks in advance for any input!
--
Dr. Adrian Wrigley, Cambridge, UK.




Thread overview: 18+ messages
2006-09-10 20:58 Reliability and deadlock in Annex E/distributed code Dr. Adrian Wrigley
2006-09-11 18:52 ` Jerome Hugues
2006-09-12 20:40   ` Dr. Adrian Wrigley
2006-09-13  7:16     ` Dmitry A. Kazakov
2006-09-12 20:31 ` Dr. Adrian Wrigley
2006-09-12 23:24   ` tmoran
2006-09-13 11:00     ` Dr. Adrian Wrigley
2006-09-13 11:21   ` Dr. Adrian Wrigley
2006-09-21 21:18   ` Dr. Adrian Wrigley
2006-09-22 13:52   ` Dr. Adrian Wrigley
2006-09-22 23:11     ` Ludovic Brenta
2006-09-23 16:03       ` Reliability and deadlock in Annex E/distributed code (progress at last!) Dr. Adrian Wrigley
2006-09-23 19:17         ` Björn Persson
2006-09-23 20:53           ` Dr. Adrian Wrigley
2006-09-23 22:21             ` Björn Persson
2006-09-23 23:31               ` tmoran
2006-09-24  0:19                 ` Dr. Adrian Wrigley
2006-09-25 11:41             ` Alex R. Mosteo
