From: "Dr. Adrian Wrigley"
Subject: Re: Reliability and deadlock in Annex E/distributed code
Newsgroups: comp.lang.ada
Date: Tue, 12 Sep 2006 20:31:55 GMT
Organization: NTL

On Sun, 10 Sep 2006 20:58:33 +0000, Dr. Adrian Wrigley wrote:

> I've been having difficulty getting my Annex E/glade code to run reliably.
>
> Under gnat 3.15p for x86 Linux, things were tolerably OK, with failures
> of the code about weekly (running one instance continuously).
> Sometimes the program simply wouldn't allow new partitions to run, as if
> there was some boot server failure. Sometimes the server would suddenly
> start consuming all the CPU cycles it could get.
...

OK. I have produced a fairly short example.

There are three partitions, A, B, C. C calls B which calls A.

Compiler is GNAT GPL 2006 + GLADE 2006 on x86 Linux

The partition C (executable in ./cpart) runs OK on *alternate*
invocations. Every other time, it hangs indefinitely.
This seems strange.

Dialogue following shows source files, compilation and two invocations
of partition C, one hanging. Host machine is archimedes, in bash.

archimedes$ ls -l *.ad[sb] dist.cfg
-rw-rw-rw- 1 amtw amtw 188 Sep 12 16:04 a.adb
-rw-rw-rw- 1 amtw amtw  88 Sep 12 16:14 a.ads
-rw-rw-rw- 1 amtw amtw 106 Sep 12 16:11 amain.adb
-rw-rw-r-- 1 amtw amtw 466 Sep 12 16:17 b.adb
-rw-rw-r-- 1 amtw amtw 116 Sep 12 16:08 b.ads
-rw-rw-rw- 1 amtw amtw 252 Sep 12 16:10 cmain.adb
-rw-rw-rw- 1 amtw amtw 540 Sep 12 16:16 dist.cfg
archimedes$ head -n 100 *.ad[sb] dist.cfg
==> a.adb <==
package body A is

   X : Integer := 0;

   function Next return Integer is
   begin
      X := X + 1; -- Return next integer in sequence, unprotected
      return X;
   end Next;

end A;

==> a.ads <==
package A is

   pragma Remote_Call_Interface;

   function Next return Integer;

end A;

==> amain.adb <==
with A;

procedure Amain is
begin
   delay 1000.0; -- Wait around for a while, then complete
end Amain;

==> b.adb <==
with Text_IO;
with A;

package body B is

   -- Return A.Next simply by passing call through
   function Next return Integer is
   begin
      Text_IO.Put_Line ("B: B Next called");
      return A.Next;
   end Next;

   task Main;

   task body Main is
   begin
      Text_IO.Put_Line ("B: B making direct call to RCI function in A:");
      -- Direct call to function in A works fine
      Text_IO.Put_Line ("B: A Next gives" & Integer'Image (A.Next));
   end Main;

end B;

==> b.ads <==
package B is

   pragma Remote_Call_Interface;

   function Next return Integer; -- Pass through of A's Next

end B;

==> cmain.adb <==
with Text_IO;
with B;

-- Each time this program is run, should produce the next integer in sequence
procedure CMain is
begin
   Text_IO.Put_Line ("C: Running B.Next:");
   Text_IO.Put_Line ("C: B Next gives" & Integer'Image (B.Next));
end CMain;

==> dist.cfg <==
configuration Dist is

   -- Boot server specification:
   pragma Starter (None);
   pragma Boot_Location ("tcp", "localhost:6788"); -- Choose spare port

   APart : Partition := (A);
   procedure AMain is in APart;
   for APart'Task_Pool use (4, 4, 10);

   BPart : Partition := (B);
   for BPart'Task_Pool use (4, 4, 10);

   procedure CMain;
   CPart : Partition := (CMain);
   for CPart'Task_Pool use (4, 4, 10);
   for CPart'Main use CMain;
   for CPart'Termination use Local_Termination;
   for CPart'Reconnection use Block_Until_Restart;

end Dist;
archimedes$
archimedes$ gcc -v # Test the compiler version
Reading specs from /data2/gnat-gpl/bin/../lib/gcc/i686-pc-linux-gnu/3.4.6/specs
Configured with: /cardhu.b/gnatmail/release-gpl/build-cardhu/src/configure --prefix=/usr/gnat --enable-languages=c,ada --disable-nls --disable-libada --target=i686-pc-linux-gnu --host=i686-pc-linux-gnu --disable-checking --enable-threads=posix
Thread model: posix
gcc version 3.4.6 for GNAT GPL 2006 (20060522)
archimedes$ gnatdist -g dist.cfg # Build the partitions
gnatdist: checking configuration consistency
 ------------------------------
 ---- Configuration report ----
 ------------------------------
Configuration :
   Name        : dist
   Main        : amain
   Starter     : none
   Protocols   : tcp://localhost:6788

Partition apart
   Main        : amain
   Task Pool   : 4 4 10
   Units       :
      - a (rci)
      - amain (normal)

Partition bpart
   Task Pool   : 4 4 10
   Units       :
      - b (rci)

Partition cpart
   Main        : cmain
   Task Pool   : 4 4 10
   Termination : local
   Units       :
      - cmain (normal)

 -------------------------------
gnatdist: a caller stubs is up to date
gnatdist: a receiver stubs is up to date
gnatdist: building b caller stubs from b.ads
gnatdist: building b receiver stubs from b.adb
gnatdist: building partition bpart
gnatdist: building partition cpart
archimedes$
archimedes$ ./apart & # Start partition A
[1] 20904
archimedes$ ./bpart & # Start partition B
[2] 20911
archimedes$ B: B making direct call to RCI function in A:
B: A Next gives 1

archimedes$ ./cpart # Test partition C
C: Running B.Next:
B: B Next called
C: B Next gives 2
# Works!
archimedes$ ./cpart # Test partition C again
C: Running B.Next:
# Hangs :(
--
Adrian