From: "Dr. Adrian Wrigley"
Subject: Reliability and deadlock in Annex E/distributed code
Newsgroups: comp.lang.ada
Date: Sun, 10 Sep 2006 20:58:33 GMT

Hi guys!

I've been having difficulty getting my Annex E/glade code to run reliably.

Under gnat 3.15p for x86 Linux, things were tolerably OK, with the code failing about once a week (running one instance continuously). Sometimes the program simply wouldn't allow new partitions to run, as if there was some boot server failure. Sometimes the server would suddenly start consuming all the CPU cycles it could get. I think there may have been one or two bugs in the 3.15p version of glade, particularly under certain error conditions of partitions being killed while in use.

It has proved impossible to get the program to be as reliable as I want, so I have tried running under GNAT GPL 2006 from https://libre2.adacore.com/

I built and installed glade, and have been testing my code.
It doesn't work at all. I have also tried the gnat and glade from Martin Krischik's builds for FC5, and got the same problem:

There are three partitions: A, B and C.

The program starts up normally.
A procedure (in a normal unit) in partition C calls a function (in a normal package) in partition B (using dynamic dispatch on a remote access to class-wide type).
The function in partition B calls a function (in an RCI package) in partition A.
The function in partition A never executes, and the program stops executing.

If I enable glade debugging (S_PARINT=true S_RPC=true), I can see that partition A gets the RPC message instructing it to do the call, but then it doesn't actually call the necessary function (parameterless return of an integer, but any function call fails).

If I call the function in A directly from B, it works fine. It only seems to be when A is called from B while executing a call from C that the problem occurs. It's as if there is some deadlock or shortage of tasks to allocate or something.

Any ideas?

Using gdb, I find that each time a call in A is made, but doesn't execute, I get a new task:

  (gdb) info tasks
  ...
  * 12 81d16a8 1 46 Waiting on entry call rpc_handler
  (gdb) where
  #0 0x42028d69 in sigsuspend () from /lib/i686/libc.so.6
  #1 0x4005b108 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
  #2 0x4005804b in pthread_cond_wait () from /lib/i686/libpthread.so.0
  #3 0x080c5172 in system.tasking.entry_calls.wait_until_abortable ()
  #4 0x080c29c4 in system.tasking.protected_objects.operations.protected_entry_call ()
  #5 0x080b0afa in system.rpc.server.rpc_handler (<_task>=0x81d1698) at s-tpobop.ads:200
  #6 0x080bed4f in system.tasking.stages.task_wrapper ()

It looks like the call cannot proceed until "wait_until_abortable" returns.

Am I doing something wrong by making one remote call inside another? Maybe the new glade detects an error unnoticed by 3.15p? Perhaps this is the cause of previous 'hangs'?
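The three-partition call chain described above can be sketched roughly as follows. This is only an illustration under stated assumptions: every unit and subprogram name (A_Server, Workers, Registry, B_Worker, C_Main) is hypothetical, and placing the Registry unit on partition A is a guess. The bodies of the two RCI units, the code in B that registers its object, and the gnatdist configuration are omitted, so the sketch shows the shape of the calls and is not a complete reproducer.

```ada
--  Hypothetical names throughout; split into files with gnatchop.

--  Partition A: RCI unit whose function never executes in the failing case.
package A_Server is
   pragma Remote_Call_Interface;
   function Get_Value return Integer;  --  parameterless, returns an integer
end A_Server;

--  Shared by all partitions: root type used for remote dispatching.
package Workers is
   pragma Pure;
   type Worker is abstract tagged limited private;
   function Compute (W : access Worker) return Integer is abstract;
private
   type Worker is abstract tagged limited null record;
end Workers;

--  RCI unit handing out remote access-to-class-wide values
--  (assumed here to live on partition A).
with Workers;
package Registry is
   pragma Remote_Call_Interface;
   type Worker_Ref is access all Workers.Worker'Class;
   procedure Register (W : Worker_Ref);
   function Lookup return Worker_Ref;
end Registry;

--  Partition B: normal package; Compute makes the nested call into A.
with Workers;
package B_Worker is
   type Local_Worker is new Workers.Worker with null record;
   function Compute (W : access Local_Worker) return Integer;
end B_Worker;

with A_Server;
package body B_Worker is
   function Compute (W : access Local_Worker) return Integer is
   begin
      return A_Server.Get_Value;  --  B -> A, made while serving C -> B
   end Compute;
end B_Worker;

--  Partition C: dispatching call through the remote access value.
with Registry, Workers;
procedure C_Main is
   Ref    : constant Registry.Worker_Ref := Registry.Lookup;
   Result : Integer;
begin
   Result := Workers.Compute (Ref);  --  C -> B; B then calls A
end C_Main;
```

In this arrangement the call that hangs would be A_Server.Get_Value when it is reached through C_Main, while the same call made directly from partition B completes.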
On the topic of Annex E support:

I've tried building PolyORB from https://libre2.adacore.com/ but it seems to be missing the src/dsa directory needed to support Annex E. If I get the version from cvs, I get the error "raised RTSFIND.RE_NOT_AVAILABLE : rtsfind.adb:497".

What's the best way to build the DSA personality?

If I use the DSA personality from PolyORB, will it be any different from the GARLIC PCS? Might it be more or less robust? Faster?

Thanks in advance for any input!
-- 
Dr. Adrian Wrigley, Cambridge, UK.