From: "Dr. Adrian Wrigley"
Subject: Re: Reliability and deadlock in Annex E/distributed code
Newsgroups: comp.lang.ada
Date: Wed, 13 Sep 2006 11:21:11 GMT

On Tue, 12 Sep 2006 20:31:55 +0000, Dr. Adrian Wrigley wrote:

> On Sun, 10 Sep 2006 20:58:33 +0000, Dr. Adrian Wrigley wrote:
>
>> I've been having difficulty getting my Annex E/glade code to run reliably.
>>
>> Under gnat 3.15p for x86 Linux, things were tolerably OK, with failures
>> of the code about weekly (running one instance continuously).
>> Sometimes the program simply wouldn't allow new partitions to run, as if
>> there was some boot server failure. Sometimes the server would suddenly
>> start consuming all the CPU cycles it could get.
> ...
>
> OK. I have produced a fairly short example.
> There are three partitions, A, B, C.
> C calls B which calls A.
> Compiler is GNAT GPL 2006 + GLADE 2006 on x86 Linux
>
> The partition C (executable in ./cpart) runs OK on *alternate*
> invocations. Every other time, it hangs indefinitely.
> This seems strange.

(talking to myself again...)

If I change function Next in b.adb so that it doesn't call A
(returning a constant instead), there are absolutely no problems.
Only when B.Next calls A does the deadlock happen.

There must be something in BPart that isn't completing properly
when running B.Next and calling into A.

Each time the call into B hangs, it uses up a task. It will
create anonymous tasks to replace them until the whole program
grinds to a halt :(

I've tried using gdb on BPart to see what's going on. From what
I can tell, the key code is in s-rpcser.adb, function RPC_Handler.
On alternate occasions, it executes the remote subprogram or just stops.

If I could see why this happens, I might be able to fix it...
--
Adrian
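
For readers without the original example to hand, the call chain described above (C calls B, B calls A) might look roughly like the sketch below. This is only an illustrative reconstruction, not the posted code: the only names taken from the post are the units A and B and the function B.Next. The function A.Value, the Integer return type, the constant returned and the main procedure name C_Main are all assumptions. The partition layout (which unit goes in which partition) would be given separately in the GLADE configuration and is not shown.

```ada
--  Hypothetical reconstruction of the three-unit call chain.
--  Each unit would live in its own file, as marked.

--  a.ads
package A is
   pragma Remote_Call_Interface;
   --  Assumed name and profile; the post does not say what B calls in A.
   function Value return Integer;
end A;

--  a.adb
package body A is
   function Value return Integer is
   begin
      return 42;  --  arbitrary placeholder value
   end Value;
end A;

--  b.ads
package B is
   pragma Remote_Call_Interface;
   function Next return Integer;
end B;

--  b.adb
with A;
package body B is
   function Next return Integer is
   begin
      --  Nested remote call from B into A. Per the post, returning
      --  a constant here instead makes the hang go away.
      return A.Value;
   end Next;
end B;

--  c_main.adb (assumed name for the main procedure of partition C)
with Ada.Text_IO;
with B;
procedure C_Main is
begin
   --  Remote call from C into B, which in turn calls into A.
   Ada.Text_IO.Put_Line (Integer'Image (B.Next));
end C_Main;
```

The point of the sketch is that B.Next is itself servicing a remote call while it issues another one, so the task handling the request in BPart has to wait on A before it can reply to C.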