* Does Ada tasking profit from multi-core cpus? @ 2007-01-29 11:57 Gerd 2007-01-29 12:04 ` Georg Bauhaus 2007-03-04 17:54 ` jpluto 0 siblings, 2 replies; 61+ messages in thread From: Gerd @ 2007-01-29 11:57 UTC (permalink / raw) Does anyone have experience with Ada tasking (especially GNAT) on multi-core systems? Do programs with several working tasks show a performance boost on dual-core or quad-core CPUs? ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-01-29 11:57 Does Ada tasking profit from multi-core cpus? Gerd @ 2007-01-29 12:04 ` Georg Bauhaus 2007-01-30 13:55 ` Gerd 2007-03-04 17:54 ` jpluto 1 sibling, 1 reply; 61+ messages in thread From: Georg Bauhaus @ 2007-01-29 12:04 UTC (permalink / raw) On Mon, 2007-01-29 at 03:57 -0800, Gerd wrote: > Has someone experience with Ada tasking (especially GNAT) on multi- > core systems? > > Show programs with several working tasks a performance boost on dual- > core or quad-core cpus? You might want to look into the archives of c.l.ada, as this has been discussed a few times, recently. The short answer is, Yes. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-01-29 12:04 ` Georg Bauhaus @ 2007-01-30 13:55 ` Gerd 2007-02-09 10:18 ` karl 0 siblings, 1 reply; 61+ messages in thread From: Gerd @ 2007-01-30 13:55 UTC (permalink / raw) Georg Bauhaus wrote: > The short answer is, Yes. Thanks, that is all I was interested in. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-01-30 13:55 ` Gerd @ 2007-02-09 10:18 ` karl 2007-02-09 11:03 ` Stefan Lucks 0 siblings, 1 reply; 61+ messages in thread From: karl @ 2007-02-09 10:18 UTC (permalink / raw) During November and December, I had an opportunity to evaluate the Sun Fire T1000, which has 8 cores with 4 strands per core, making it appear to be a 32 CPU machine. My application used GNAT with lots of long-lived threads. You can read my report at http://www.grebyn.com/t1000 - it was impressive enough that I actually WON the evaluation system from Sun in their Open Performance Contest (see http://www.sun.com/tryandbuy/prm/perf/winners.jsp for other winners - primarily web servers, databases and other transactional systems). -- Karl -- ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-02-09 10:18 ` karl @ 2007-02-09 11:03 ` Stefan Lucks 2007-02-09 11:35 ` Ludovic Brenta 0 siblings, 1 reply; 61+ messages in thread From: Stefan Lucks @ 2007-02-09 11:03 UTC (permalink / raw) > [...] My application used GNAT with lots of long-lived threads. You can > read my report at http://www.grebyn.com/t1000 - it was impressive enough > that I actually WON the evaluation system from Sun in their Open > Performance Contest [...] Congratulations! Is there any chance that you will publish the sources? -- Stefan Lucks Th. Informatik, Univ. Mannheim, 68131 Mannheim, Germany e-mail: lucks@th.informatik.uni-mannheim.de home: http://th.informatik.uni-mannheim.de/people/lucks/ ------ I love the taste of Cryptanalysis in the morning! ------ ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-02-09 11:03 ` Stefan Lucks @ 2007-02-09 11:35 ` Ludovic Brenta 0 siblings, 0 replies; 61+ messages in thread From: Ludovic Brenta @ 2007-02-09 11:35 UTC (permalink / raw) Stefan Lucks writes: >> [...] My application used GNAT with lots of long-lived threads. You can >> read my report at http://www.grebyn.com/t1000 - it was impressive enough >> that I actually WON the evaluation system from Sun in their Open >> Performance Contest [...] > > Congratulations! > > Is there any chance that you will publish the sources? Yes, I'd like to know how my dual-core, 2GHz amd64 laptop with 2 Gb of RAM stacks up against a T1000 :) -- Ludovic Brenta. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Does Ada tasking profit from multi-core cpus? 2007-01-29 11:57 Does Ada tasking profit from multi-core cpus? Gerd 2007-01-29 12:04 ` Georg Bauhaus @ 2007-03-04 17:54 ` jpluto 2007-03-05 10:08 ` Ludovic Brenta 2007-03-05 18:46 ` Does Ada tasking profit from multi-core cpus? Jeffrey R. Carter 1 sibling, 2 replies; 61+ messages in thread From: jpluto @ 2007-03-04 17:54 UTC (permalink / raw) To: comp.lang.ada Does anyone have experience with Ada tasking (especially GNAT) on multi-core systems? Do programs with several working tasks show a performance boost on dual-core or quad-core CPUs? ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-04 17:54 ` jpluto @ 2007-03-05 10:08 ` Ludovic Brenta 2007-03-05 13:12 ` Dmitry A. Kazakov 2007-03-05 18:46 ` Does Ada tasking profit from multi-core cpus? Jeffrey R. Carter 1 sibling, 1 reply; 61+ messages in thread From: Ludovic Brenta @ 2007-03-05 10:08 UTC (permalink / raw) "jpluto" wrote: > Has someone experience with Ada tasking (especially GNAT) on multi-core > systems? > > Show programs with several working tasks a performance boost on dual-core or > quad-core cpus? On my dual-core Turion 64 with Debian GNU/Linux and GCC 4.1.2, all is well. Ada programs using tasking use both cores. I think it would work on most other platforms too, but YMMV. -- Ludovic Brenta. ^ permalink raw reply [flat|nested] 61+ messages in thread
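[Editor's note: the claim above is easy to check with a minimal sketch along the following lines. All names here are illustrative, not from the thread: two compute-bound tasks run in parallel; on a runtime that maps each Ada task to its own OS thread, a dual-core machine should finish in roughly the wall-clock time of a single worker, which is the observable "performance boost".]

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Busy_Tasks is

   task type Worker;

   task body Worker is
      Sum : Long_Integer := 0;
   begin
      --  A purely CPU-bound loop; no blocking, no I/O until the end.
      for I in 1 .. 500_000_000 loop
         Sum := Sum + Long_Integer (I mod 7);
      end loop;
      Put_Line ("Worker done, sum =" & Long_Integer'Image (Sum));
   end Worker;

   Workers : array (1 .. 2) of Worker;  --  one task per expected core

begin
   null;  --  the main procedure waits here for both workers to terminate
end Busy_Tasks;
```

Timing the program externally (e.g. with `time`) and comparing against a one-worker run is the simplest way to see whether both cores are actually used.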
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-05 10:08 ` Ludovic Brenta @ 2007-03-05 13:12 ` Dmitry A. Kazakov 2007-03-06 5:33 ` tmoran 2007-03-07 3:58 ` Does Ada tasking profit from multi-core cpus? Steve 0 siblings, 2 replies; 61+ messages in thread From: Dmitry A. Kazakov @ 2007-03-05 13:12 UTC (permalink / raw) On 5 Mar 2007 02:08:22 -0800, Ludovic Brenta wrote: > "jpluto" wrote: >> Has someone experience with Ada tasking (especially GNAT) on multi-core >> systems? >> >> Show programs with several working tasks a performance boost on dual-core or >> quad-core cpus? > > On my dual-core Turion 64 with Debian GNU/Linux and GCC 4.1.2, all is > well. Ada programs using tasking use both cores. I think it would work > on most other platforms too, but YMMV. Apart from using both cores, does anybody know how protected objects function on multi-cores? Especially: 1. Whether protected object's functions are indeed executed concurrently when come from the tasks running on different cores? 2. What are the times required to take/release the protected object's spin lock compared to ones on single core? 3. Can a task switch cores? If yes, what is the overhead of switching? -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-05 13:12 ` Dmitry A. Kazakov @ 2007-03-06 5:33 ` tmoran 2007-03-06 8:44 ` Dmitry A. Kazakov ` (3 more replies) 2007-03-07 3:58 ` Does Ada tasking profit from multi-core cpus? Steve 1 sibling, 4 replies; 61+ messages in thread From: tmoran @ 2007-03-06 5:33 UTC (permalink / raw)

> 1. Whether protected object's functions are indeed executed concurrently
> when come from the tasks running on different cores?

A quick test with a single protected object containing a single, long-duration function appears to have just one call of the function active at a time, even if the function is called from two different tasks.

    global_flag : integer := 0;

    protected body pt is
       function f(id : integer) return natural is
          change_count : natural := 0;
       begin
          global_flag := id;
          for i in 1 .. 10_000_000 loop
             if global_flag /= id then
                change_count := change_count;
                global_flag := id;
             end if;
          end loop;
          return change_count;
       end f;
    end pt;

One task calls pt.f(id=>1) and the other calls pt.f(id=>2). They both get a result of zero back from their function call. This was with Gnat 3.15p Windows 2000 on a dual core Pentium. If I change it from a single protected object to two instances of a protected type, then the function calls are overlapped and return non-zero results.

> 3. Can a task switch cores? If yes, what is the overhead of switching?

By "switch cores" do you mean that the particular hardware stack pointers swap which stack they are pointing to? I think this is an OS question and, for Windows, I don't know how one asks "which core am I currently running on" - or indeed if that question makes any sense. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-06 5:33 ` tmoran @ 2007-03-06 8:44 ` Dmitry A. Kazakov 2007-03-07 7:52 ` tmoran 2007-03-07 9:31 ` tmoran 2007-03-06 9:40 ` Colin Paul Gloster ` (2 subsequent siblings) 3 siblings, 2 replies; 61+ messages in thread From: Dmitry A. Kazakov @ 2007-03-06 8:44 UTC (permalink / raw) On Mon, 05 Mar 2007 23:33:31 -0600, tmoran@acm.org wrote: >> 1. Whether protected object's functions are indeed executed concurrently >> when come from the tasks running on different cores? > > A quick test with a single protected object containing a single, > long-duration, function appears to have just one call of the function > active at a time, even if the function is called from two different tasks. > > global_flag : integer := 0; > > protected body pt is > function f(id : integer) return natural is > change_count : natural := 0; > begin > global_flag := id; > for i in 1 .. 10_000_000 loop > if global_flag /= id then > change_count := change_count; > global_flag := id; > end if; > end loop; > return change_count; > end f; > end pt; > > One task calls pt.f(id=>1) and the other calls pt.f(id=>2). They both get > a result of zero back from their function call. This was with Gnat 3.15p > Windows 2000 on a dual core Pentium. If I change it from a single > protected object to two instances of a protected type, then the function > calls are overlapped and return non-zero results. Not very promising, sigh. Probably GNAT uses a critical section for all protected actions, so the result. >> 3. Can a task switch cores? If yes, what is the overhead of switching? > By "switch cores" do you mean that the particular hardware stack > pointers swap which stack they are pointing to? I think this is an OS > question That depends on the mapping of Ada's schedulable units to the OS ones. But it is Ada question too. 
Because the processor affinity of a task is not specified, the scheduler should be able to switch it from processor to processor to achieve optimal performance. Otherwise multi-core would make no sense for Ada. AFAIK, Windows indeed switches threads between processors, so GNAT 3.15p should follow as well (provided tasks are mapped onto threads and no thread affinity mask is set). It would be interesting to know the penalty of such switching. > and, for Windows, I don't know how one asks "which core am > I currently running on" - or indeed if that questions makes any sense. I suppose it should be NtGetCurrentProcessorNumber under Windows. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-06 8:44 ` Dmitry A. Kazakov @ 2007-03-07 7:52 ` tmoran 2007-03-07 9:31 ` tmoran 1 sibling, 0 replies; 61+ messages in thread From: tmoran @ 2007-03-07 7:52 UTC (permalink / raw) >>> 3. Can a task switch cores? If yes, what is the overhead of switching? >> for Windows, I don't know how one asks "which core am >> I currently running on" >I suppose it should be NtGetCurrentProcessorNumber under Windows. That apparently is only available on Windows Server 2003 and Vista. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-06 8:44 ` Dmitry A. Kazakov 2007-03-07 7:52 ` tmoran @ 2007-03-07 9:31 ` tmoran 1 sibling, 0 replies; 61+ messages in thread From: tmoran @ 2007-03-07 9:31 UTC (permalink / raw) >>> 3. Can a task switch cores? If yes, what is the overhead of switching? > >AFAIK, Windows indeed switches thread's processors, >so GNAT 3.15p should follow it as well (provided, tasks are mapped on >threads and no thread affinity mask is set). Interesting would be to know >the penalty of such switching. 5 microseconds? I ran a loop storing away successive values of Ada.Calendar.Clock while another task was sitting burning cpu cycles. The delta-t's between successive clock readings were almost all under 1 microsecond, but there was a cluster around 5-6 mics and another at 250-300 mics, the latter occurring pretty regularly every 2.9 milliseconds. So I would hazard a guess that the 5-6 mic hiccups were due to a simple core switch. It might be 250-300 mics, with the round-robin time slice being 2.9 ms, but spending 10% of a CPU on running the time slicer seems excessive, so I'm guessing that's doing something substantive - polling hardware or running the myriad background tasks in a Windows system. OTTH, it's too late to think rationally anyway. ^ permalink raw reply [flat|nested] 61+ messages in thread
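[Editor's note: the measurement described above can be sketched roughly as follows. This is not Tom Moran's actual code; all names and the reporting threshold are illustrative. The idea is to sample Ada.Calendar.Clock in a tight loop while a second task burns CPU, then print the unusually large gaps between successive readings, whose clusters hint at core switches and time-slice preemptions.]

```ada
with Ada.Calendar; use Ada.Calendar;
with Ada.Text_IO;  use Ada.Text_IO;

procedure Clock_Deltas is

   task Burner;                 --  keeps the other core (or this one) busy

   task body Burner is
      X : Long_Integer := 0;
   begin
      loop
         X := X + 1;
      end loop;
   end Burner;

   Samples : array (1 .. 100_000) of Time;

begin
   --  Record successive clock values as fast as possible.
   for I in Samples'Range loop
      Samples (I) := Clock;
   end loop;

   --  Report only the "hiccups": deltas well above the normal sub-microsecond gap.
   for I in Samples'First + 1 .. Samples'Last loop
      declare
         Dt : constant Duration := Samples (I) - Samples (I - 1);
      begin
         if Dt > 0.000_002 then
            Put_Line (Duration'Image (Dt));
         end if;
      end;
   end loop;

   abort Burner;   --  Burner never terminates on its own
end Clock_Deltas;
```

A histogram of the printed deltas would then show clusters like the 5-6 microsecond and 250-300 microsecond groups mentioned above; attributing a given cluster to core switching versus the time slicer remains guesswork without OS-level instrumentation.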
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-06 5:33 ` tmoran 2007-03-06 8:44 ` Dmitry A. Kazakov @ 2007-03-06 9:40 ` Colin Paul Gloster 2007-03-06 12:47 ` Jeffrey Creem 2007-03-06 14:44 ` Georg Bauhaus 2007-03-06 16:53 ` Dr. Adrian Wrigley 2007-03-06 18:51 ` Jeffrey R. Carter 3 siblings, 2 replies; 61+ messages in thread From: Colin Paul Gloster @ 2007-03-06 9:40 UTC (permalink / raw) Tom Moran posted on Mon, 05 Mar 2007 23:33:31 -0600: "> 1. Whether protected object's functions are indeed executed concurrently > when come from the tasks running on different cores? A quick test with a single protected object containing a single, long-duration, function appears to have just one call of the function active at a time, even if the function is called from two different tasks. global_flag : integer := 0; protected body pt is function f(id : integer) return natural is change_count : natural := 0; begin global_flag := id; for i in 1 .. 10_000_000 loop if global_flag /= id then change_count := change_count; global_flag := id; end if; end loop; return change_count; end f; end pt; One task calls pt.f(id=>1) and the other calls pt.f(id=>2). They both get a result of zero back from their function call. This was with Gnat 3.15p Windows 2000 on a dual core Pentium. If I change it from a single protected object to two instances of a protected type, then the function calls are overlapped and return non-zero results. [..]" I am grateful to Thomas Moran for such a brilliant demonstration of how poor the so-called GNU Ada Translator can be. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-06 9:40 ` Colin Paul Gloster @ 2007-03-06 12:47 ` Jeffrey Creem 2007-03-06 14:44 ` Georg Bauhaus 1 sibling, 0 replies; 61+ messages in thread From: Jeffrey Creem @ 2007-03-06 12:47 UTC (permalink / raw) Colin Paul Gloster wrote: > Tom Moran posted on Mon, 05 Mar 2007 23:33:31 -0600: > > "> 1. Whether protected object's functions are indeed executed concurrently >> when come from the tasks running on different cores? > > A quick test with a single protected object containing a single, > long-duration, function appears to have just one call of the function > active at a time, even if the function is called from two different tasks. > > global_flag : integer := 0; > > protected body pt is > function f(id : integer) return natural is > change_count : natural := 0; > begin > global_flag := id; > for i in 1 .. 10_000_000 loop > if global_flag /= id then > change_count := change_count; > global_flag := id; > end if; > end loop; > return change_count; > end f; > end pt; > > One task calls pt.f(id=>1) and the other calls pt.f(id=>2). They both get > a result of zero back from their function call. This was with Gnat 3.15p > Windows 2000 on a dual core Pentium. If I change it from a single > protected object to two instances of a protected type, then the function > calls are overlapped and return non-zero results. > > [..]" > > > I am grateful to Thomas Moran for such a brilliant demonstration of > how poor the so-called GNU Ada Translator can be. I don't think this proves that the two tasks are not overlapping. It probably proves that if you specifically write broken code it is indeed sometimes broken. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-06 9:40 ` Colin Paul Gloster 2007-03-06 12:47 ` Jeffrey Creem @ 2007-03-06 14:44 ` Georg Bauhaus 1 sibling, 0 replies; 61+ messages in thread From: Georg Bauhaus @ 2007-03-06 14:44 UTC (permalink / raw) On Tue, 2007-03-06 at 09:40 +0000, Colin Paul Gloster wrote: > I am grateful to Thomas Moran for such a brilliant demonstration of > how poor the so-called GNU Ada Translator can be. I'm sure they welcome rich suggestions. IIRC, Stephen Leake wanted to ask why GNAT is asking NT for a semaphore for protected functions. Any news, Stephen? -- Georg ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-06 5:33 ` tmoran 2007-03-06 8:44 ` Dmitry A. Kazakov 2007-03-06 9:40 ` Colin Paul Gloster @ 2007-03-06 16:53 ` Dr. Adrian Wrigley 2007-03-06 18:58 ` tmoran 2007-03-06 18:51 ` Jeffrey R. Carter 3 siblings, 1 reply; 61+ messages in thread From: Dr. Adrian Wrigley @ 2007-03-06 16:53 UTC (permalink / raw) On Mon, 05 Mar 2007 23:33:31 -0600, tmoran wrote: >> 1. Whether protected object's functions are indeed executed concurrently >> when come from the tasks running on different cores? > > A quick test with a single protected object containing a single, > long-duration, function appears to have just one call of the function > active at a time, even if the function is called from two different tasks. > > global_flag : integer := 0; > > protected body pt is > function f(id : integer) return natural is > change_count : natural := 0; > begin > global_flag := id; > for i in 1 .. 10_000_000 loop > if global_flag /= id then What is the next line meant to do? > change_count := change_count; > global_flag := id; > end if; > end loop; > return change_count; > end f; > end pt; Change_count is set to zero, and never changes. Function always returns zero. Surprised? Is there meant to be an increment in the code? did I miss something? -- Adrian ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-06 16:53 ` Dr. Adrian Wrigley @ 2007-03-06 18:58 ` tmoran 2007-03-07 10:11 ` Colin Paul Gloster 0 siblings, 1 reply; 61+ messages in thread From: tmoran @ 2007-03-06 18:58 UTC (permalink / raw) > What is the next line meant to do? > > change_count := change_count; > Change_count is set to zero, and never changes. Function always > returns zero. Surprised? Is there meant to be an increment in the code? Sorry. That's a typo in trying to pretty-up the code for posting. The actual code that ran was: if pt_flag /= id then result := result+1; pt_flag := id; end if; which, as you see, does indeed do an increment. Which is why > > If I change it from a single > > protected object to two instances of a protected type, then the function > > calls are overlapped and return non-zero results. As my original posting said, "... appears to have just one call of the function active at a time". I compiled with no optimization so the compiler would generate what I said, not what it thought I meant. The function takes almost a half second to run on my machine, so I would have expected some task switching to occur during that time. This doesn't of course "prove" that's how Gnat 3.15p always works. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-06 18:58 ` tmoran @ 2007-03-07 10:11 ` Colin Paul Gloster 2007-03-07 18:47 ` tmoran 0 siblings, 1 reply; 61+ messages in thread From: Colin Paul Gloster @ 2007-03-07 10:11 UTC (permalink / raw) Tom Moran posted on Mon, 05 Mar 2007 23:33:31 -0600: "[..] global_flag : integer := 0; protected body pt is function f(id : integer) return natural is change_count : natural := 0; begin global_flag := id; for i in 1 .. 10_000_000 loop if global_flag /= id then change_count := change_count; global_flag := id; end if; end loop; return change_count; end f; end pt; One task calls pt.f(id=>1) and the other calls pt.f(id=>2). They both get a result of zero back from their function call. This was with Gnat 3.15p Windows 2000 on a dual core Pentium. If I change it from a single protected object to two instances of a protected type, then the function calls are overlapped and return non-zero results. [..]" Tom Moran posted on Tue, 06 Mar 2007 12:58:34 -0600: "[..] Sorry. That's a typo in trying to pretty-up the code for posting. The actual code that ran was: if pt_flag /= id then result := result+1; pt_flag := id; end if; which, as you see, does indeed do an increment. Which is why > > If I change it from a single > > protected object to two instances of a protected type, then the function > > calls are overlapped and return non-zero results. [..]" Was the return statement return change_count; or some other item whose value was never changed according to the semantics of Ada as in your original post in which case the behavior is still not valid Ada, or did the return statement return something like result or pt_flag? Regards, Colin Paul Gloster ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-07 10:11 ` Colin Paul Gloster @ 2007-03-07 18:47 ` tmoran 0 siblings, 0 replies; 61+ messages in thread From: tmoran @ 2007-03-07 18:47 UTC (permalink / raw) > > Sorry. That's a typo in trying to pretty-up the code for posting. > > The actual code that ran was: > > if pt_flag /= id then > > result := result+1; > > pt_flag := id; > > end if; > Was the return statement > return change_count; > or some other item whose value was never changed according to the The code that I ran was correct. It never had any variable named change_count. In the pretty-ed version I posted I left out the "+1" in one place while changing the variable name "result" to "change_count". That's also why > > If I change it from a single > > protected object to two instances of a protected type, then the function > > calls are overlapped and return non-zero results. which of course would not have been the case if the running code hadn't included the "+1". I'd be curious to know if newer versions of Gnat run protected functions concurrently. My version of Gnat 3.15p-nt is dated late 2002, at which time there were not many multi-cpu Windows systems around to take advantage of such a feature. ^ permalink raw reply [flat|nested] 61+ messages in thread
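[Editor's note: putting the correction above together with the original post, the test as actually run would look roughly like this. This is a reconstruction, using the names pt_flag and result confirmed in the thread; the package wrapper and declarations around them are assumptions.]

```ada
--  Hypothetical reconstruction of Tom Moran's test (not his verbatim source).

pt_flag : Integer := 0;   --  unprotected shared flag, deliberately racy

protected pt is
   function f (id : Integer) return Natural;
end pt;

protected body pt is
   function f (id : Integer) return Natural is
      result : Natural := 0;
   begin
      pt_flag := id;
      for i in 1 .. 10_000_000 loop
         if pt_flag /= id then      --  another task's call ran in between
            result := result + 1;
            pt_flag := id;
         end if;
      end loop;
      return result;
   end f;
end pt;
```

Two tasks call pt.f(id=>1) and pt.f(id=>2) concurrently. A non-zero result means the two function calls overlapped (each observed the other rewriting pt_flag); a zero result from both callers, as observed with GNAT 3.15p, suggests the implementation serializes protected function calls with an exclusive lock.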
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-06 5:33 ` tmoran ` (2 preceding siblings ...) 2007-03-06 16:53 ` Dr. Adrian Wrigley @ 2007-03-06 18:51 ` Jeffrey R. Carter 2007-03-16 14:29 ` Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) Jacob Sparre Andersen 3 siblings, 1 reply; 61+ messages in thread From: Jeffrey R. Carter @ 2007-03-06 18:51 UTC (permalink / raw) tmoran@acm.org wrote: > > A quick test with a single protected object containing a single, > long-duration, function appears to have just one call of the function > active at a time, even if the function is called from two different tasks. The last time I looked at the GNAT sources for protected objects, each had a mutex associated with it that was obtained before any action, even functions. I guess that hasn't changed. Other compilers may be different. -- Jeff Carter "When danger reared its ugly head, he bravely turned his tail and fled." Monty Python and the Holy Grail 60 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) 2007-03-06 18:51 ` Jeffrey R. Carter @ 2007-03-16 14:29 ` Jacob Sparre Andersen 2007-03-17 5:26 ` Jeffrey R. Carter 2007-03-17 10:25 ` Dmitry A. Kazakov 0 siblings, 2 replies; 61+ messages in thread From: Jacob Sparre Andersen @ 2007-03-16 14:29 UTC (permalink / raw)

Jeffrey R. Carter wrote:

> The last time I looked at the GNAT sources for protected objects,
> each had a mutex associated with it that was obtained before any
> action, even functions. I guess that hasn't changed.

Isn't that the most efficient implementation on a POSIX system?

a) We use threads (and not processes) for tasks, since it is more efficient and a better conceptual match.
b) We know that it is considered good style to make protected function bodies small.
c) Since we use threads (a), we should use mutexes (and not semaphores) to implement inter-task exclusion.
d) Since the protected function bodies can be assumed to be small (b), there will be a relatively large overhead in keeping track of the number of queued calls to protected functions and procedures.
e) Since the protected function bodies can be assumed to be small (b), the probability of colliding calls to protected functions is relatively small.
f) It is thus likely that using a single mutex to provide exclusion on a protected object is more efficient than adding counters.

The balance in (f) depends on your average collision rate for protected function calls. More concurrent threads (more CPU cores) increase this value. So do larger protected function bodies. Now that multi-CPU-core systems are more common, it may be worthwhile to make a proper examination of the numbers which may change the balance against (f). This could also be a challenge for code profilers and optimisers.
It may be that some protected types need a detailed queue, while others need a single mutex - depending on the code and the number of available CPU cores. Greetings, Jacob -- Scripts for automating parts of the daily operations of your Linux system: http://edb.jacob-sparre.dk/tcsh-samling/ ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) 2007-03-16 14:29 ` Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) Jacob Sparre Andersen @ 2007-03-17 5:26 ` Jeffrey R. Carter 2007-03-17 17:22 ` Robert A Duff 2007-03-17 10:25 ` Dmitry A. Kazakov 1 sibling, 1 reply; 61+ messages in thread From: Jeffrey R. Carter @ 2007-03-17 5:26 UTC (permalink / raw) Jacob Sparre Andersen wrote: > > Isn't that the most efficient implementation on a POSIX system? I don't know. I mentioned it simply because it clearly prevents parallel function calls on multiprocessor systems, and because it seems to prevent the use of the ceiling locking policy on monoprocessors. -- Jeff Carter "I like it when the support group complains that they have insufficient data on mean time to repair bugs in Ada software." Robert I. Eachus 91 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) 2007-03-17 5:26 ` Jeffrey R. Carter @ 2007-03-17 17:22 ` Robert A Duff 2007-03-17 17:52 ` Jeffrey R. Carter 2007-03-17 23:06 ` Randy Brukardt 0 siblings, 2 replies; 61+ messages in thread From: Robert A Duff @ 2007-03-17 17:22 UTC (permalink / raw) "Jeffrey R. Carter" <jrcarter@acm.org> writes: > Jacob Sparre Andersen wrote: >> Isn't that the most efficient implementation on a POSIX system? > > I don't know. I mentioned it simply because it clearly prevents parallel > function calls on multiprocessor systems, and because it seems to > prevent the use of the ceiling locking policy on monoprocessors. I understand the point about multiprocessor systems, but why do you say ceiling locking won't work if function calls lock out other function calls? - Bob ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) 2007-03-17 17:22 ` Robert A Duff @ 2007-03-17 17:52 ` Jeffrey R. Carter 2007-03-17 23:06 ` Randy Brukardt 1 sibling, 0 replies; 61+ messages in thread From: Jeffrey R. Carter @ 2007-03-17 17:52 UTC (permalink / raw) Robert A Duff wrote: > > I understand the point about multiprocessor systems, but why do you > say ceiling locking won't work if function calls lock out > other function calls? As I understand ceiling locking (which probably isn't as well as you), its whole point is the absence of any actual lock. The necessary mutual exclusion is achieved through tasks' actual priorities and their positions in the ready-to-run queues when they're preempted. I guess the behavior will be the same with or without the lock. Ceiling locking might be a little faster since it eliminates the overhead of the explicit lock. -- Jeff Carter "My name is Jim, but most people call me ... Jim." Blazing Saddles 39 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) 2007-03-17 17:22 ` Robert A Duff 2007-03-17 17:52 ` Jeffrey R. Carter @ 2007-03-17 23:06 ` Randy Brukardt 2007-03-18 17:57 ` Robert A Duff 1 sibling, 1 reply; 61+ messages in thread From: Randy Brukardt @ 2007-03-17 23:06 UTC (permalink / raw) "Robert A Duff" <bobduff@shell01.TheWorld.com> wrote in message news:wcczm6bss2i.fsf@shell01.TheWorld.com... > "Jeffrey R. Carter" <jrcarter@acm.org> writes: ... > > I don't know. I mentioned it simply because it clearly prevents parallel > > function calls on multiprocessor systems, and because it seems to > > prevent the use of the ceiling locking policy on monoprocessors. > > I understand the point about multiprocessor systems, but why do you > say ceiling locking won't work if function calls lock out > other function calls? I don't understand Jeffrey's point either, but isn't it true that ceiling locking is essentially irrelevant on multiprocessor systems? That is, the point is to get rid of the lock, but you can't do that on a multi-processor (one processor can be running a lower priority task without anything being wrong, and that task had better be blocked from accessing the protected object). So ceiling locking has no advantage on a multi processor, it just restricts what you can do. If multiple cores continue to grow in popularity, it seems that the whole ceiling locking thing will become essentially irrelevant - just another case of premature optimization. (Can you tell I don't like ceiling locking much??) Randy. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) 2007-03-17 23:06 ` Randy Brukardt @ 2007-03-18 17:57 ` Robert A Duff 2007-03-19 21:49 ` Randy Brukardt 0 siblings, 1 reply; 61+ messages in thread From: Robert A Duff @ 2007-03-18 17:57 UTC (permalink / raw) "Randy Brukardt" <randy@rrsoftware.com> writes: > I don't understand Jeffrey's point either, but isn't it true that ceiling > locking is essentially irrelevant on multiprocessor systems? That is, the > point is to get rid of the lock, but you can't do that on a multi-processor > (one processor can be running a lower priority task without anything being > wrong, and that task had better be blocked from accessing the protected > object). So ceiling locking has no advantage on a multi processor, it just > restricts what you can do. If multiple cores continue to grow in popularity, > it seems that the whole ceiling locking thing will become essentially > irrelevant - just another case of premature optimization. > > (Can you tell I don't like ceiling locking much??) On a multiprocessor, you can use ceilings plus spin locks to protect protected objects. The point is to avoid any queued waiting to enter a PO (all the queuing is done with entry queues in this model). That can be made quite efficient, and the ceilings still prevent certain forms of priority inversion. So, no, I don't think ceilings are entirely irrelevant on multiprocessors. Ceilings are largely irrelevant if you're building Ada tasking on top of some other system (posix or windows threads, for example) that wants to do things differently. I mean, if you're using the one-Ada-task-per-thread model. Another issue is that Ada's priority model has nothing to say about processor affinity. - Bob ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) 2007-03-18 17:57 ` Robert A Duff @ 2007-03-19 21:49 ` Randy Brukardt 2007-03-20 0:55 ` Jeffrey R. Carter 0 siblings, 1 reply; 61+ messages in thread From: Randy Brukardt @ 2007-03-19 21:49 UTC (permalink / raw) "Robert A Duff" <bobduff@shell01.TheWorld.com> wrote in message news:wccmz2azb7j.fsf@shell01.TheWorld.com... > "Randy Brukardt" <randy@rrsoftware.com> writes: > > > I don't understand Jeffrey's point either, but isn't it true that ceiling > > locking is essentially irrelevant on multiprocessor systems? That is, the > > point is to get rid of the lock, but you can't do that on a multi-processor > > (one processor can be running a lower priority task without anything being > > wrong, and that task had better be blocked from accessing the protected > > object). So ceiling locking has no advantage on a multi processor, it just > > restricts what you can do. If multiple cores continue to grow in popularity, > > it seems that the whole ceiling locking thing will become essentially > > irrelevant - just another case of premature optimization. > > > > (Can you tell I don't like ceiling locking much??) > > On a multiprocessor, you can use ceilings plus spin locks to protect > protected objects. The point is to avoid any queued waiting to enter a > PO (all the queuing is done with entry queues in this model). That can > be made quite efficient, and the ceilings still prevent certain forms of > priority inversion. So, no, I don't think ceilings are entirely > irrelevant on multiprocessors. I believe the latter point (ceilings prevent some forms of priority inversion, because they boost the priority of everything in the PO), but I don't see the former. I don't see any reason that you would have to use a queued (rather than a spin) lock with or without ceiling locking. 
Until you have the spin lock, your priority doesn't matter (if you get pre-empted, so what?). And afterwards, it's just a special case of the normal potential priority inversion of a PO: if it isn't an issue for the entire object (assuming you can start a protected action), it surely won't matter how you start that action. The problem with ceiling locking is that it depends on boosting the priority of tasks. That means it's a big problem for longer-running operations (such as I/O, which aren't allowed in protected operations for this reason). And it's a big problem for reusable libraries, which can't know ahead of time what the ceiling ought to be. (Make it too high, and critical tasks could be starved by lower-priority ones operating in the library; make it too low, and tasks aren't even allowed to access the library.) Consider trying to set the ceiling for a container library implemented with protected objects. (At least we now can do this on the fly; in Ada 95, it was impossible.) > Ceilings are largely irrelevant if you're building Ada tasking on top of > some other system (posix or windows threads, for example) that wants to > do things differently. I mean, if you're using the > one-Ada-task-per-thread model. Is there any other kind? I haven't heard of many bare-machine Ada projects in recent years; almost everything is on top of some sort of RTOS or other OS. (That's too bad, really; for a lot of projects, Ada provides nearly everything you need in an RTOS.) > Another issue is that Ada's priority model has nothing to say about > processor affinity. True enough, but I don't think most OSes have much to say on this topic, either. That makes it pretty hard to say anything about it (unless you use a one-thread-for-all-tasks model -- but that's a pessimising implementation!) Randy. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) 2007-03-19 21:49 ` Randy Brukardt @ 2007-03-20 0:55 ` Jeffrey R. Carter 2007-03-20 1:36 ` Randy Brukardt 0 siblings, 1 reply; 61+ messages in thread From: Jeffrey R. Carter @ 2007-03-20 0:55 UTC (permalink / raw) Randy Brukardt wrote: > > The problem with ceiling locking is that it depends on boosting the priority > of tasks. That means its a big problem for longer-running operations (such > as I/O, which aren't allowed in protected operations for this reason). And > it's a big problem for reusable libraries, which can't know ahead of time > what the ceiling ought to be. (Make it too high, and critical tasks could be > starved by lower-priority ones operating in the library, make it too low and > tasks aren't even allowed to access the library.) Consider trying to set the > ceiling for a container library implemented with protected objects. (At > least we now can do this on the fly; in Ada 95, it was impossible.) The PragmARCs use protected type Handle (Ceiling_Priority : System.Any_Priority := System.Default_Priority) is pragma Priority (Ceiling_Priority); They're Ada 95, and have been compiled with at least 2 different compilers. The problem is, as you pointed out, the library can't know what an appropriate priority is. Only the client can know that, and this allows the client to specify it. Is there something wrong with this approach? -- Jeff Carter "Son of a window-dresser." Monty Python & the Holy Grail 12 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) 2007-03-20 0:55 ` Jeffrey R. Carter @ 2007-03-20 1:36 ` Randy Brukardt 2007-03-20 16:32 ` Jeffrey R. Carter 2007-03-26 23:24 ` Robert A Duff 0 siblings, 2 replies; 61+ messages in thread From: Randy Brukardt @ 2007-03-20 1:36 UTC (permalink / raw) "Jeffrey R. Carter" <jrcarter@acm.org> wrote in message news:5iGLh.26236$PF.18838@attbi_s21... > Randy Brukardt wrote: > > > > The problem with ceiling locking is that it depends on boosting the priority > > of tasks. That means its a big problem for longer-running operations (such > > as I/O, which aren't allowed in protected operations for this reason). And > > it's a big problem for reusable libraries, which can't know ahead of time > > what the ceiling ought to be. (Make it too high, and critical tasks could be > > starved by lower-priority ones operating in the library, make it too low and > > tasks aren't even allowed to access the library.) Consider trying to set the > > ceiling for a container library implemented with protected objects. (At > > least we now can do this on the fly; in Ada 95, it was impossible.) > > The PragmARCs use > > protected type Handle > (Ceiling_Priority : System.Any_Priority := System.Default_Priority) > is > pragma Priority (Ceiling_Priority); > > They're Ada 95, and have been compiled with at least 2 different > compilers. The problem is, as you pointed out, the library can't know > what an appropriate priority is. Only the client can know that, and this > allows the client to specify it. > > Is there something wrong with this approach? Nothing serious, but it's less than ideal: (1) It complicates the interface; it makes the client worry about something that they probably don't care about. 
(2) It doesn't work for a protected interface (no discriminants); (3) It doesn't work as well if the protected object is wrapped in a tagged type, because tagged types can't have defaults on their discriminants - meaning that it always has to be specified. This also applies to any protected object that defines an interface (it can have discriminants, but no defaults, as it is considered tagged). I suspect that most libraries will fall into category (2) or (3) [the latter because of the need for clean-up, or simply that not all operations need to be protected]. In any case, it illustrates the difficulty of defining general-purpose task-safe libraries. There are a lot of gotchas that don't apply to single-tasking code. Randy. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) 2007-03-20 1:36 ` Randy Brukardt @ 2007-03-20 16:32 ` Jeffrey R. Carter 2007-03-20 17:51 ` Randy Brukardt 2007-03-26 23:24 ` Robert A Duff 1 sibling, 1 reply; 61+ messages in thread From: Jeffrey R. Carter @ 2007-03-20 16:32 UTC (permalink / raw) Randy Brukardt wrote: > > Nothing serious, but it's less than ideal: My concern was that you said it was impossible in Ada 95. It may not be ideal, but it doesn't seem to be impossible. -- Jeff Carter "Crucifixion's a doddle." Monty Python's Life of Brian 82 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) 2007-03-20 16:32 ` Jeffrey R. Carter @ 2007-03-20 17:51 ` Randy Brukardt 2007-03-21 0:10 ` Jeffrey R. Carter 2007-03-26 23:38 ` Robert A Duff 0 siblings, 2 replies; 61+ messages in thread From: Randy Brukardt @ 2007-03-20 17:51 UTC (permalink / raw) "Jeffrey R. Carter" <jrcarter@acm.org> wrote in message news:B0ULh.28294$PF.23764@attbi_s21... > Randy Brukardt wrote: > > > > Nothing serious, but it's less than ideal: > > My concern was that you said it was impossible in Ada 95. It may not be > ideal, but it doesn't seem to be impossible. I was thinking about a solution that doesn't clutter the client's view of the library with (usually) irrelevant details (such as whether the library is implemented with protected objects). After all, information hiding is good! If you're willing to ignore that (and you are), then it certainly is possible. But I was thinking about a library that uses as the ceiling the highest priority it is called with: that can't be implemented in Ada 95. (Such a library would not make anything having to do with priorities visible.) If you really care about priorities, then your solution is probably better (it allows more analyzability). Which just demonstrates that you can't just make something "task-safe". You have to answer the question of "task-safe for what?". And that tends to lead to families of libraries rather than an all-in-one solution (like Ada.Containers) - or impacts reusability. Randy. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) 2007-03-20 17:51 ` Randy Brukardt @ 2007-03-21 0:10 ` Jeffrey R. Carter 2007-03-26 23:38 ` Robert A Duff 1 sibling, 0 replies; 61+ messages in thread From: Jeffrey R. Carter @ 2007-03-21 0:10 UTC (permalink / raw) Randy Brukardt wrote: > > I was thinking about a solution that doesn't clutter the client's view of > the library with (usually) irrelevant details (such as whether the library > is implemented with protected objects). After all, information hiding is > good! If you're willing to ignore that (and you are), then it certainly is > possible. But I was thinking about a library that uses as the ceiling > the highest priority it is called with: that can't be implemented > in Ada 95. (Such a library would not make anything having to do with > priorities visible.) OK. We're talking about 2 different things. > If you really care about priorities, then your solution is probably better > (it allows more analyzability). Which just demonstrates that you can't just > make something "task-safe". You have to answer the question of "task-safe > for what?". And that tends to lead to families of libraries rather than an > all-in-one solution (like Ada.Containers) - or impacts reusability. Right. Concurrency adds an additional dimension. On the other hand, many multi-tasking applications I've worked on have used the PragmARCs' bounded, blocking queues, or something very like them. -- Jeff Carter "Crucifixion's a doddle." Monty Python's Life of Brian 82 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) 2007-03-20 17:51 ` Randy Brukardt 2007-03-21 0:10 ` Jeffrey R. Carter @ 2007-03-26 23:38 ` Robert A Duff 1 sibling, 0 replies; 61+ messages in thread From: Robert A Duff @ 2007-03-26 23:38 UTC (permalink / raw) "Randy Brukardt" <randy@rrsoftware.com> writes: > "Jeffrey R. Carter" <jrcarter@acm.org> wrote in message > news:B0ULh.28294$PF.23764@attbi_s21... >> Randy Brukardt wrote: >> > >> > Nothing serious, but it's less than ideal: >> >> My concern was that you said it was impossible in Ada 95. It may not be >> ideal, but it doesn't seem to be impossible. > > I was thinking about a solution that doesn't clutter the client's view of > the library with (usually) irrelevant details (such as whether the library > is implemented with protected objects). After all, information hiding is > good! Info hiding is good, but one can't always have it, sadly. If we have human beings specifying numeric priorities of things, then we have the property that priorities only have meaning relative to other priorities. And that property implies that priorities are a global issue -- they can't be neatly encapsulated. That's true whether we're talking about priority of tasks, or PO's (ceilings), or I/O events, or anything else. So I wouldn't blame this on ceilings specifically -- I'd blame it on the general model (numeric priorities, set by programmers). In a non-real-time context, it's easy to make it all automatic. I haven't typed "nice" on Unix very often lately, and yet my editor responds well, even when I have a compute-bound process also running. In a real-time context, we have various partial solutions (earliest deadline first, and so forth). - Bob ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) 2007-03-20 1:36 ` Randy Brukardt 2007-03-20 16:32 ` Jeffrey R. Carter @ 2007-03-26 23:24 ` Robert A Duff 1 sibling, 0 replies; 61+ messages in thread From: Robert A Duff @ 2007-03-26 23:24 UTC (permalink / raw) "Randy Brukardt" <randy@rrsoftware.com> writes: > Nothing serious, but it's less than ideal: > (1) It complicates the interface; it makes the client worry about something > that they probably don't care about. > (2) It doesn't work for a protected interface (no discriminants); > (3) It doesn't work as well if the protected object is wrapped in a tagged > type, because tagged types can't have defaults on their discriminants - > meaning that it always has to be specified. This also applies to any > protected object that defines an interface (it can have discriminants, but > no defaults, as it is considered tagged). These problems with discriminants go away in Ada 2005, because we have limited constructor functions. A very cool feature, IMHO. - Bob ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) 2007-03-16 14:29 ` Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) Jacob Sparre Andersen 2007-03-17 5:26 ` Jeffrey R. Carter @ 2007-03-17 10:25 ` Dmitry A. Kazakov 2007-03-18 17:15 ` Arguments for single-mutex-exclusion on protected types Jacob Sparre Andersen 1 sibling, 1 reply; 61+ messages in thread From: Dmitry A. Kazakov @ 2007-03-17 10:25 UTC (permalink / raw) On Fri, 16 Mar 2007 15:29:15 +0100, Jacob Sparre Andersen wrote: > Jeffrey R. Carter wrote: > >> The last time I looked at the GNAT sources for protected objects, >> each had a mutex associated with it that was obtained before any >> action, even functions. I guess that hasn't changed. > > Isn't that the most efficient implementation on a POSIX system? Even if it were, why Ada RTL should care? > a) We use threads (and not processes) for tasks, since it is more > efficient and a better conceptual match. > > b) We know that it is considered good style to make protected > function bodies small. > > c) Since we use threads (a) we should use mutexes (and not > semaphores) to implement inter-task exclusion. ? (Mutex = semaphore with the count=1) > d) Since the protected function bodies can be assumed to be small > (b), there will be a relatively large overhead in keeping track of > the number of queued calls to protected functions and procedures. But queue is not the only possible implementation of a waitable object. Protected functions and procedures do not necessarily need queues, they could be implemented using a hierarchical (read-write) mutex. [*] For example, when the processor has an atomic increment instruction, a read-write mutex could be implemented very efficiently. The idea is that you *first* change the read count and then check if there is a potential writer without any locking. 
If everything is clear, you just continue 2-3 instructions later, otherwise you take a system mutex and manage all counts safely. > e) Since the protected function bodies can be assumed to be small > (b), the probability of colliding calls to protected functions is > relatively small. That surely depends on the program logic. [...] --------- * Leaving aside the problem of writer starvation in presence of hyper-active readers. However that is solvable too. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Arguments for single-mutex-exclusion on protected types 2007-03-17 10:25 ` Dmitry A. Kazakov @ 2007-03-18 17:15 ` Jacob Sparre Andersen 2007-03-18 18:50 ` Dmitry A. Kazakov 2007-03-20 12:38 ` Florian Weimer 0 siblings, 2 replies; 61+ messages in thread From: Jacob Sparre Andersen @ 2007-03-18 17:15 UTC (permalink / raw) Dmitry A. Kazakov wrote: > On Fri, 16 Mar 2007 15:29:15 +0100, Jacob Sparre Andersen wrote: >> Jeffrey R. Carter wrote: >>> The last time I looked at the GNAT sources for protected objects, >>> each had a mutex associated with it that was obtained before any >>> action, even functions. I guess that hasn't changed. >> >> Isn't that the most efficient implementation on a POSIX system? > > Even if it were, why Ada RTL should care? Because my Ada run-time is running on top of a POSIX system. >> c) Since we use threads (a) we should use mutexes (and not >> semaphores) to implement inter-task exclusion. > > ? (Mutex = semaphore with the count=1) I intended these terms to be understood in a POSIX context. It is my understanding that semaphores (semget(2), etc.) are inappropriate for inter-thread exclusion, whereas mutexes ("pthread_mutex_t" objects) are appropriate for this purpose. >> d) Since the protected function bodies can be assumed to be small >> (b), there will be a relatively large overhead in keeping track of >> the number of queued calls to protected functions and procedures. > > But queue is not the only possible implementation of a waitable object. > Protected functions and procedures do not necessarily need queues, they > could be implemented using a hierarchical (read-write) mutex. [*] For > example, when the processor has an atomic increment instruction, a > read-write mutex could be implemented very efficiently. The idea is that > you *first* change the read count and then check if there is a potential > writer without any locking. 
If everything is clear, you just continue 2-3 > instructions later, otherwise you take a system mutex and manage all counts > safely. Wouldn't we then expect our POSIX system to implement its mutexes using this feature? With SMP systems and multi-core CPU's, I doubt that atomic increment instructions are popular among the hardware designers (but I may be wrong - this is not my strongest area of expertise). >> e) Since the protected function bodies can be assumed to be small >> (b), the probability of colliding calls to protected functions is >> relatively small. > > That surely depends on the program logic. For "small" = "smaller than the code using the result", it will be true up to at least two CPU cores (unless you force the callers to synchronise - and then protected objects seem somewhat misplaced - any other exceptions). Please remember that the compiler should generate code which is most efficient in most cases. We shouldn't expect it to optimise for a few special cases at the cost of the more common cases. Greetings, Jacob -- "Only Hogwarts students really need spellcheckers" -- An anonymous RISKS reader ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Arguments for single-mutex-exclusion on protected types 2007-03-18 17:15 ` Arguments for single-mutex-exclusion on protected types Jacob Sparre Andersen @ 2007-03-18 18:50 ` Dmitry A. Kazakov 2007-03-20 12:38 ` Florian Weimer 1 sibling, 0 replies; 61+ messages in thread From: Dmitry A. Kazakov @ 2007-03-18 18:50 UTC (permalink / raw) On Sun, 18 Mar 2007 18:15:13 +0100, Jacob Sparre Andersen wrote: > Dmitry A. Kazakov wrote: >> On Fri, 16 Mar 2007 15:29:15 +0100, Jacob Sparre Andersen wrote: >>> Jeffrey R. Carter wrote: > >>>> The last time I looked at the GNAT sources for protected objects, >>>> each had a mutex associated with it that was obtained before any >>>> action, even functions. I guess that hasn't changed. >>> >>> Isn't that the most efficient implementation on a POSIX system? >> >> Even if it were, why Ada RTL should care? > > Because my Ada run-time is running on top of a POSIX system. But Ada RTL could take advantage of knowing what goes on behind POSIX, which itself is most likely built on top of something else. >>> d) Since the protected function bodies can be assumed to be small >>> (b), there will be a relatively large overhead in keeping track of >>> the number of queued calls to protected functions and procedures. >> >> But queue is not the only possible implementation of a waitable object. >> Protected functions and procedures do not necessarily need queues, they >> could be implemented using a hierarchical (read-write) mutex. [*] For >> example, when the processor has an atomic increment instruction, a >> read-write mutex could be implemented very efficiently. The idea is that >> you *first* change the read count and then check if there is a potential >> writer without any locking. If everything is clear, you just continue 2-3 >> instructions later, otherwise you take a system mutex and manage all counts >> safely. > > Wouldn't we then expect our POSIX system to implement its mutexes > using this feature? I don't know. 
Was POSIX designed with Ada in mind? In particular, does it have read-write mutexes? > With SMP systems and multi-core CPU's, I doubt > that atomic increment instructions are popular among the hardware > designers (but I may be wrong - this is not my strongest area of > expertise). Massively multi-core systems could take paths quite different from what we are accustomed to now. Forgotten and gone architectures like transputers could return. [ Consider a "transaction"-like design of protected objects based on memory replication (hardware-supported). The protected object state could then be held in the processor's local memory. A protected function would compare the change number of its copy and do everything without interlocking if it hasn't been incremented. ] >>> e) Since the protected function bodies can be assumed to be small >>> (b), the probability of colliding calls to protected functions is >>> relatively small. >> >> That surely depends on the program logic. > > For "small" = "smaller than the code using the result", it will be > true up to at least two CPU cores (unless you force the callers to > synchronise - and then protected objects seem somewhat misplaced - any > other exceptions). There are two relevant parameters: 1. The number of processors, which shifts the balance between protected and unprotected code towards protected, if protected implies mutual exclusion. 2. The number of distinct locks associated with protected objects. This is especially an issue if you want to build it on top of expensive and limited operating system resources. > Please remember that the compiler should generate code which is most > efficient in most cases. We shouldn't expect it to optimise for a few > special cases at the cost of the more common cases. And which relation between protected function/procedure/entry is most common? I doubt that anybody has any reliable statistics on that. -- Regards, Dmitry A. 
Kazakov http://www.dmitry-kazakov.de ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Arguments for single-mutex-exclusion on protected types 2007-03-18 17:15 ` Arguments for single-mutex-exclusion on protected types Jacob Sparre Andersen 2007-03-18 18:50 ` Dmitry A. Kazakov @ 2007-03-20 12:38 ` Florian Weimer 1 sibling, 0 replies; 61+ messages in thread From: Florian Weimer @ 2007-03-20 12:38 UTC (permalink / raw) * Jacob Sparre Andersen: >>> Isn't that the most efficient implementation on a POSIX system? >> >> Even if it were, why Ada RTL should care? > > Because my Ada run-time is running on top of a POSIX system. A lot of software provides its own locks, for some reason or other. And nowadays, POSIX has got read-write locks anyway. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-05 13:12 ` Dmitry A. Kazakov 2007-03-06 5:33 ` tmoran @ 2007-03-07 3:58 ` Steve 2007-03-07 8:39 ` Dmitry A. Kazakov 1 sibling, 1 reply; 61+ messages in thread From: Steve @ 2007-03-07 3:58 UTC (permalink / raw) "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message news:rgcukvs1j6ck$.1vjj69zzy1h56$.dlg@40tude.net... > On 5 Mar 2007 02:08:22 -0800, Ludovic Brenta wrote: > >> "jpluto" wrote: >>> Has someone experience with Ada tasking (especially GNAT) on multi-core >>> systems? >>> >>> Show programs with several working tasks a performance boost on >>> dual-core or >>> quad-core cpus? >> >> On my dual-core Turion 64 with Debian GNU/Linux and GCC 4.1.2, all is >> well. Ada programs using tasking use both cores. I think it would work >> on most other platforms too, but YMMV. > > Apart from using both cores, does anybody know how protected objects > function on multi-cores? Especially: > > 1. Whether protected object's functions are indeed executed concurrently > when come from the tasks running on different cores? > > 2. What are the times required to take/release the protected object's spin > lock compared to ones on single core? > > 3. Can a task switch cores? If yes, what is the overhead of switching? On Windows, which uses symmetric multiprocessing, I believe two cores work the same as two CPU's. With two CPU's the two highest priority threads that are in the ready state run concurrently, so yes, a task can switch cores. Sorry I don't know about the overhead of switching. I have run tests on a system with 2 CPU's and found that a single task that does a lot of switching winds up using 50% of the CPU time on both CPU's (from the task viewer). Regards, Steve (The Duck) > > -- > Regards, > Dmitry A. 
Kazakov > http://www.dmitry-kazakov.de ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-07 3:58 ` Does Ada tasking profit from multi-core cpus? Steve @ 2007-03-07 8:39 ` Dmitry A. Kazakov 2007-03-08 5:21 ` Randy Brukardt 0 siblings, 1 reply; 61+ messages in thread From: Dmitry A. Kazakov @ 2007-03-07 8:39 UTC (permalink / raw) On Tue, 6 Mar 2007 19:58:31 -0800, Steve wrote: > "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message > news:rgcukvs1j6ck$.1vjj69zzy1h56$.dlg@40tude.net... >> On 5 Mar 2007 02:08:22 -0800, Ludovic Brenta wrote: >> >>> "jpluto" wrote: >>>> Has someone experience with Ada tasking (especially GNAT) on multi-core >>>> systems? >>>> >>>> Show programs with several working tasks a performance boost on >>>> dual-core or >>>> quad-core cpus? >>> >>> On my dual-core Turion 64 with Debian GNU/Linux and GCC 4.1.2, all is >>> well. Ada programs using tasking use both cores. I think it would work >>> on most other platforms too, but YMMV. >> >> Apart from using both cores, does anybody know how protected objects >> function on multi-cores? Especially: >> >> 1. Whether protected object's functions are indeed executed concurrently >> when come from the tasks running on different cores? >> >> 2. What are the times required to take/release the protected object's spin >> lock compared to ones on single core? >> >> 3. Can a task switch cores? If yes, what is the overhead of switching? > > On Windows, which uses symmetric multiprocessing, I believe two cores work > the same as two CPU's. With two CPU's the two highest priority threads that > are in the ready state run concurrently, so yes a task can switch cores. > Sorry I don't know about the overhead of switching. > I have run tests on a system with 2 CPU's and found that a single task that > does a lot of switching winds up using 50% of the CPU time on both CPU's > (from the task viewer). Just a side note, the Windows API GetThreadTimes (which the viewer apparently uses) is corrupted. 
It counts complete time quants rather than the performance counter ticks. So, potentially you could observe 1% under factual 99% CPU load. The bug should appear for threads performing much synchronization, because they leave the processor before the current quant expiration. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-07 8:39 ` Dmitry A. Kazakov @ 2007-03-08 5:21 ` Randy Brukardt 2007-03-08 10:15 ` Dmitry A. Kazakov 0 siblings, 1 reply; 61+ messages in thread From: Randy Brukardt @ 2007-03-08 5:21 UTC (permalink / raw) "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message news:1lq9zxgrnvfjx$.17ip3w3ei4xdb.dlg@40tude.net... ... > Just a side note, the Windows API GetThreadTimes (which the viewer > apparently uses) is corrupted. It counts complete time quants rather than > the performance counter ticks. So, potentially you could observe 1% under > factual 99% CPU load. The bug should appear for threads performing much > synchronization, because they leave the processor before the current quant > expiration. I wouldn't call it "corrupted"; it's just not very accurate (given that it can only register time with a granularity of 0.01 sec). I don't think there is any other way to find out CPU use, though, as the performance counter provides wall time and thus isn't very useful to find out how much a thread is running. (I've tried to figure out how to implement Ada.Execution_Time on Windows...) Randy. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Does Ada tasking profit from multi-core cpus? 2007-03-08 5:21 ` Randy Brukardt @ 2007-03-08 10:15 ` Dmitry A. Kazakov 2007-03-08 21:18 ` accuracy (was: Does Ada tasking profit from multi-core cpus?) Björn Persson 0 siblings, 1 reply; 61+ messages in thread From: Dmitry A. Kazakov @ 2007-03-08 10:15 UTC (permalink / raw) On Wed, 7 Mar 2007 23:21:04 -0600, Randy Brukardt wrote: > "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message > news:1lq9zxgrnvfjx$.17ip3w3ei4xdb.dlg@40tude.net... > ... >> Just a side note, the Windows API GetThreadTimes (which the viewer >> apparently uses) is corrupted. It counts complete time quants rather than >> the performance counter ticks. So, potentially you could observe 1% under >> factual 99% CPU load. The bug should appear for threads performing much >> synchronization, because they leave the processor before the current quant >> expiration. > > I wouldn't call it "corrupted"; it's just not very accurate (given that it > can only register time with a granularity of 0.01 sec). If it were just inaccurate then the obtained values would be like ThreadTime + Error where Error has zero mean. That is just not the case. ThreadTime has a systematic error => in my view corrupt. It simply does not measure what its name suggests. > I don't think there > is any other way to find out CPU use, though, as the performance counter > provides wall time and thus isn't very useful to find out how much a thread > is running. (I've tried to figure out how to implement Ada.Execution_Time on > Windows...) Yes, at the user level there seems to be no way to do it. The performance counter should be queried at the scheduling points, and the increment should be accumulated for the thread possessing the processor. Only the OS kernel could do that. Ada.Execution_Time looks like quite a problem for Windows... -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de ^ permalink raw reply [flat|nested] 61+ messages in thread
* accuracy (was: Does Ada tasking profit from multi-core cpus?) 2007-03-08 10:15 ` Dmitry A. Kazakov @ 2007-03-08 21:18 ` Björn Persson 2007-03-09 8:33 ` accuracy Dmitry A. Kazakov 0 siblings, 1 reply; 61+ messages in thread From: Björn Persson @ 2007-03-08 21:18 UTC (permalink / raw) Dmitry A. Kazakov wrote: > If it were just inaccurate then the obtained values would be like > ThreadTime + Error where Error has zero mean. No, that's "imprecise". Shots distributed evenly over a shooting target are bad precision. A tight group of shots at one side of the target is good precision but bad accuracy. -- Björn Persson PGP key A88682FD omb jor ers @sv ge. r o.b n.p son eri nu ^ permalink raw reply [flat|nested] 61+ messages in thread
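Björn's shooting-target distinction can be put in numbers. A minimal sketch with invented measurement data (the values and `true_value` are assumptions for illustration only):

```python
# Björn's target analogy with invented data. True value being measured:
true_value = 100.0

# "Accurate but imprecise": scattered widely, but centred on the truth.
accurate_imprecise = [90.0, 110.0, 95.0, 105.0, 85.0, 115.0]

# "Precise but inaccurate": tightly grouped, but systematically offset.
precise_inaccurate = [120.1, 119.9, 120.0, 120.2, 119.8, 120.0]

def mean(xs):
    return sum(xs) / len(xs)

def spread(xs):
    # Worst-case deviation from the sample's own centre (precision).
    m = mean(xs)
    return max(abs(x - m) for x in xs)

def bias(xs):
    # Systematic offset from the true value (accuracy).
    return mean(xs) - true_value

print(bias(accurate_imprecise), spread(accurate_imprecise))  # 0.0 15.0
print(bias(precise_inaccurate), spread(precise_inaccurate))  # ~20.0 ~0.2
```

Averaging helps only the first data set: its bias is already zero. No amount of averaging removes the 20-unit offset of the second.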
* Re: accuracy 2007-03-08 21:18 ` accuracy (was: Does Ada tasking profit from multi-core cpus?) Björn Persson @ 2007-03-09 8:33 ` Dmitry A. Kazakov 2007-03-10 1:39 ` accuracy Randy Brukardt 0 siblings, 1 reply; 61+ messages in thread From: Dmitry A. Kazakov @ 2007-03-09 8:33 UTC (permalink / raw) On Thu, 08 Mar 2007 21:18:11 GMT, Björn Persson wrote: > Dmitry A. Kazakov wrote: > >> If it were just inaccurate then the obtained values would be like >> ThreadTime + Error where Error has zero mean. > > No, that's "imprecise". No. The sets "accurate" and "precise" do not contain each other. Which means that a measurement can be precise and accurate, precise but inaccurate, imprecise but accurate, or imprecise and inaccurate. As for GetThreadTimes, its absolute precision is 1ms. Its suggested absolute accuracy should be one time quant (whose duration depends on the system settings). The latter does not hold, because the error is in fact not bounded. > Shots distributed evenly over a shooting target is > bad precision. A tight group of shots at one side of the target is good > precision but bad accuracy. That's right. This is why GetThreadTimes is not just inaccurate, it is precisely wrong. BTW, precisely wrong /= imprecise. (:-)) -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: accuracy 2007-03-09 8:33 ` accuracy Dmitry A. Kazakov @ 2007-03-10 1:39 ` Randy Brukardt 2007-03-10 9:11 ` accuracy Dmitry A. Kazakov 2007-03-10 14:53 ` accuracy Stephen Leake 0 siblings, 2 replies; 61+ messages in thread From: Randy Brukardt @ 2007-03-10 1:39 UTC (permalink / raw) "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message news:p87mtsns4of0.hhld0y03415s.dlg@40tude.net... ... > As for GetThreadTimes, its absolute precision is 1ms. No, it is 10ms. The interface offers more precision than is actually provided. > Its suggested > absolute accuracy should be one time quant (which duration depends on the > system settings). The later does not hold, because the error is in fact not > bounded. I believe that the function was intended for profiling and performance monitoring, and it surely is no different from any other technique I've ever seen used for that. All such techniques give you a statistical approximation to the real behavior. You just have to run them long enough to make the results statistically significant. It's theoretically possible for a thread to run in sync so that it never gets a tick, but I've never seen (or heard of) an instance of that happening in a real program being profiled. On a real DOS or Windows system, there is too much asynchronous activity going on for any "lock-step" to continue for long. In any case, it is statistical analysis that has to be applied here; it's clear that the error can be reduced by lengthening the runtime (presuming that you are willing to assume, as I am, that behavior is essentially random if looked at over a long enough time period). My main objection to this data is the gigantic tick size, which means that to get anything meaningful, you have to run programs for a very long time (at least a thousand times longer than the tick, and generally a thousand times the "real value" of a counter before it is sufficiently significant). OTOH, I don't want to use Ada.Execution_Time to control a program's behavior. 
(I think that's a bit dubious, given that a hardware change would invalidate the assumptions, and typically the important thing is the response time, which depends on the wall time, not the CPU time. But a self-contained embedded system has more control than a program running on Windows, so it might make sense somewhere.) Randy. ^ permalink raw reply [flat|nested] 61+ messages in thread
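Randy's point about statistical significance can be sketched under the assumption he states, namely that tick charging behaves like an independent random trial. This is an illustrative model, not the actual Windows accounting; the fractions and tick counts are invented:

```python
import random

random.seed(42)  # deterministic for illustration

def sampled_cpu_fraction(true_fraction, ticks):
    # Tick-based accounting: at every timer tick the whole tick is charged
    # to whichever thread happens to be running.  Assuming the thread's
    # position at tick time is effectively random (Randy's assumption),
    # it gets charged with probability equal to its true CPU fraction.
    charged = sum(1 for _ in range(ticks) if random.random() < true_fraction)
    return charged / ticks

true_fraction = 0.30
short_run = sampled_cpu_fraction(true_fraction, 100)      # ~1 s at 100 Hz
long_run = sampled_cpu_fraction(true_fraction, 100_000)   # ~1000 s

# The long run's estimate converges on the true 30%; a short run cannot
# be relied upon to come close.
print(abs(short_run - true_fraction), abs(long_run - true_fraction))
```

The standard error shrinks as the square root of the number of ticks, which is exactly why a 10 ms tick forces runs at least a thousand times longer than the tick.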
* Re: accuracy 2007-03-10 1:39 ` accuracy Randy Brukardt @ 2007-03-10 9:11 ` Dmitry A. Kazakov 2007-03-11 3:03 ` accuracy Randy Brukardt 2007-03-10 14:53 ` accuracy Stephen Leake 1 sibling, 1 reply; 61+ messages in thread From: Dmitry A. Kazakov @ 2007-03-10 9:11 UTC (permalink / raw) On Fri, 9 Mar 2007 19:39:30 -0600, Randy Brukardt wrote: > "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message > news:p87mtsns4of0.hhld0y03415s.dlg@40tude.net... > ... >> As for GetThreadTimes, its absolute precision is 1ms. > > No, it is 10ms. The interface offers more precision than is actually > provided. You can force it to 1ms using timeBeginPeriod (1); This is what any tasking Ada program should not forget to do when it starts under Windows. I hope that GNAT RTL does this... (I am too lazy to check it, but probably XP already has 1ms as the default) >> Its suggested >> absolute accuracy should be one time quant (which duration depends on the >> system settings). The later does not hold, because the error is in fact not >> bounded. > > I believe that the function was intended for profiling and performance > monitoring, and it surely is not different than any other technique I've > ever seen used for that. All such techniques give you a statistical > approximation to the real behavior. You just have to run them long enough to > make the results statistically significant. (under the condition that the error mean is 0, which unfortunately is not the case) > It's theoretically possible for a thread to run in sync so that it never > gets a tick, but I've never seen (or heard of) an instance of that happening > in a real program being profiled. On a real DOS or Windows system, there is > too much asynchronous going on for any "lock-step" to continue for long. Which theoretical case hit me. We performed a QoS study of our distributed middleware and wished to measure the time its services require for publishing and subscribing, separately from delivery times. 
To our amazement, the times of some services were solid 0, no matter how long and how many cycles we ran the test! I started to investigate and discovered that mess. > In any case, it is statistical analysis that has to be applied here; it's > clear that the error can be reduced by lengthening the runtime (presuming > that you are willing to assume, as I am, that behavior is essentially random > if looked at over a long enough time period). (plus some assumption about the error mean. Otherwise the averaged result can be anything.) > OTOH, I don't want to use Ada.Execution_Times to control a program's > behavior. (I think that's a bit dubious, given that a hardware change would > invalidate the assumptions, and typically the important thing is the > response time: which depends on the wall-time, not the CPU time. But a > self-contained embedded system has more control than a program running on > Windows, so it might make sense somewhere.) I believe there are logical/philosophical reasons why a program shall not change its behavior depending on its ... behaviour. (:-)) -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de ^ permalink raw reply [flat|nested] 61+ messages in thread
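The solid-zero effect Dmitry describes can be reproduced with a toy model of quantum-based accounting. This only illustrates the mechanism, not the actual Windows implementation; the quantum length and workload are invented:

```python
# Toy model of quantum-based thread accounting: CPU time is charged only in
# whole quanta, so a thread that always leaves the processor before its
# quantum expires is charged nothing.  Times are in milliseconds.
QUANTUM_MS = 10.0

def account(work_ms_per_cycle, cycles, quantum=QUANTUM_MS):
    charged = 0.0
    actual = 0.0
    for _ in range(cycles):
        actual += work_ms_per_cycle
        # Only *complete* quanta within the burst are ever charged:
        charged += (work_ms_per_cycle // quantum) * quantum
    return actual, charged

# A service doing 0.3 ms of work per call, then sleeping, over 100,000 calls:
actual, charged = account(work_ms_per_cycle=0.3, cycles=100_000)
print(actual, charged)  # ~30000 ms of real work, 0.0 ms ever charged
```

Since the error is systematic, running the test longer only accumulates more unmeasured work; the reported total stays exactly zero.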
* Re: accuracy 2007-03-10 9:11 ` accuracy Dmitry A. Kazakov @ 2007-03-11 3:03 ` Randy Brukardt 2007-03-11 5:21 ` accuracy tmoran 2007-03-11 8:52 ` accuracy Dmitry A. Kazakov 0 siblings, 2 replies; 61+ messages in thread From: Randy Brukardt @ 2007-03-11 3:03 UTC (permalink / raw) "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message news:r5rrsmngabou$.nc73hmyyugax.dlg@40tude.net... > On Fri, 9 Mar 2007 19:39:30 -0600, Randy Brukardt wrote: > > > "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message > > news:p87mtsns4of0.hhld0y03415s.dlg@40tude.net... > > ... > >> As for GetThreadTimes, its absolute precision is 1ms. > > > > No, it is 10ms. The interface offers more precision than is actually > > provided. > > You can force it to 1ms using > > timeBeginPeriod (1); That's documented as applying to "Multimedia timers", whatever those are. I wouldn't want to assume it would work on thread times and the like which have nothing to do with multimedia. Besides, why wouldn't the maximum accuracy always be used if it is possible? What possible value is there in using a less accurate time (given that you still have to do the math on every switch no matter what accuracy is involved)?? > This what any tasking Ada program should not forget do, when it starts > under Windows. I hope that GNAT RTL does this... Why? Ada.Real_Time is built on top of the performance counters, as are all of your tasking programs. > (I am too lazy to check it, but probably XP already has 1ms as the default) I don't think so, I tried my profiling code there, too, and didn't get any more accuracy. ... > > It's theoretically possible for a thread to run in sync so that it never > > gets a tick, but I've never seen (or heard of) an instance of that happening > > in a real program being profiled. On a real DOS or Windows system, there is > > too much asynchronous going on for any "lock-step" to continue for long. > > Which theoretical case hit me. 
We performed a QoS studio of our distributed > middleware and wished to measure the time its services require for > publishing and subscribing, separately from delivery times. To our > amazement times of some services were solid 0, no matter how long and how > many cycles we run the test! I started to investigate and discovered that > mess. Humm, I find that nearly impossible to believe. I'd expect some other cause (*any* other cause) before I believed that. (Outside of device drivers, anyway, which would be a lousy place to use this sort of timing.) I guess I'd have to see a detailed example of that for myself before I believed it. Randy. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: accuracy 2007-03-11 3:03 ` accuracy Randy Brukardt @ 2007-03-11 5:21 ` tmoran 2007-03-11 8:52 ` accuracy Dmitry A. Kazakov 1 sibling, 0 replies; 61+ messages in thread From: tmoran @ 2007-03-11 5:21 UTC (permalink / raw) > > too much asynchronous going on for any "lock-step" to continue for long. If random stuff is independent that will generate noise that a large sample size can minimize, but there may be "caravan" effects or sample-frequency aliasing or some such things making the timing samples non-independent. ^ permalink raw reply [flat|nested] 61+ messages in thread
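The sample-frequency aliasing tmoran mentions can be sketched directly: a workload phase-locked to the sampling tick is charged 0% or 100% regardless of sample size, so the bias never averages out. The periods below are invented for illustration:

```python
# A workload phase-locked to the sampling tick: the sampler always, or
# never, catches it running.  All periods are in milliseconds.
TICK_MS = 10.0  # sampling period

def sampled_load(busy_at, period, busy_len, ticks):
    # The thread is busy during [busy_at, busy_at + busy_len) within each
    # `period`-ms cycle; count the fraction of ticks that catch it running.
    hits = 0
    for k in range(ticks):
        phase = (k * TICK_MS) % period
        if busy_at <= phase < busy_at + busy_len:
            hits += 1
    return hits / ticks

# True load is 30% in both cases (3 ms busy out of each 10 ms cycle)...
in_phase = sampled_load(busy_at=0.0, period=10.0, busy_len=3.0, ticks=10_000)
out_of_phase = sampled_load(busy_at=5.0, period=10.0, busy_len=3.0, ticks=10_000)
print(in_phase, out_of_phase)  # 1.0 0.0 -- aliasing, not noise
```

Unlike independent noise, taking more samples here changes nothing: the samples are perfectly correlated with the workload's phase.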
* Re: accuracy 2007-03-11 3:03 ` accuracy Randy Brukardt 2007-03-11 5:21 ` accuracy tmoran @ 2007-03-11 8:52 ` Dmitry A. Kazakov 2007-03-11 13:57 ` accuracy Pascal Obry 2007-03-12 20:20 ` accuracy Randy Brukardt 1 sibling, 2 replies; 61+ messages in thread From: Dmitry A. Kazakov @ 2007-03-11 8:52 UTC (permalink / raw) On Sat, 10 Mar 2007 21:03:41 -0600, Randy Brukardt wrote: > "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message > news:r5rrsmngabou$.nc73hmyyugax.dlg@40tude.net... >> On Fri, 9 Mar 2007 19:39:30 -0600, Randy Brukardt wrote: >> >>> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message >>> news:p87mtsns4of0.hhld0y03415s.dlg@40tude.net... >>> ... >>>> As for GetThreadTimes, its absolute precision is 1ms. >>> >>> No, it is 10ms. The interface offers more precision than is actually >>> provided. >> >> You can force it to 1ms using >> >> timeBeginPeriod (1); > > That's documented as applying to "Multimedia timers", whatever those are. I > wouldn't want to assume it would work on thread times and the like which > have nothing to do with multimedia. Besides, why wouldn't the maximum > accuracy alway be used if it is possible? What possible value is there in > using a less accurate time (given that you still have to do the math on > every switch no matter what the accuracy is involved)?? The side effect of timeBeginPeriod(1) is in changing the granularity of timing calls, which in turn has an impact on the overall thread scheduling. For example, Sleep(1) would indeed wait for 1ms, not 10ms. That would be difficult to achieve if threads weren't rescheduled faster. Since timeBeginPeriod achieves this by changing the time resolution of the system scheduler, the accuracy of the time slices should change as well. >> This what any tasking Ada program should not forget do, when it starts >> under Windows. I hope that GNAT RTL does this... > > Why? Ada.Real_Time is built on top of the performance counters, so is all of > your tasking programs. 
No, the reason is to get a finer scheduler resolution. 10ms was chosen in the days when PCs were considerably slower. Now one can and should reschedule at a 1ms tick, or even faster. BTW, Ada.Calendar should use the performance counters as well, because system time calls have catastrophic accuracy. In C++ programs I translate performance counters into system time using some statistical algorithm. It would be better to do this at the driver level. I don't know why MS still keeps it this way. >>> It's theoretically possible for a thread to run in sync so that it never >>> gets a tick, but I've never seen (or heard of) an instance of that happening >>> in a real program being profiled. On a real DOS or Windows system, there is >>> too much asynchronous going on for any "lock-step" to continue for long. >> >> Which theoretical case hit me. We performed a QoS studio of our distributed >> middleware and wished to measure the time its services require for >> publishing and subscribing, separately from delivery times. To our >> amazement times of some services were solid 0, no matter how long and how >> many cycles we run the test! I started to investigate and discovered that >> mess. > > Humm, I find that nearly impossible to believe. I'd expect some other cause > (*any* other cause) before I believed that. (Outside of device drivers, > anyway, which would be a lousy place to use this sort of timing.) I guess > I'd have to see a detailed example of that for myself before I believed it. There is a plausible explanation of the effect. When a middleware variable gets changed, the middleware stores it in its memory, updates some internal structures and returns to the caller. The physical publishing I/O activity happens in the context of another thread and even another process. This is why the thread time of the publisher was always 0: it simply took less than 1ms and the caller in the test application entered sleep immediately after publishing the variable. 
Even delivery was shorter, about 250µs total latency. GetThreadTimes is absolutely unsuitable to measure anything like that. When I faced the problem, I found that some guys in a similar study (something about Java) had it as well. They wrote an OS extension. (Some people have much time to spare (:-)) They interrupted Windows each µs, inspected which thread had the processor, and let it continue. This way they could get at true thread times. Quite complicated for Ada.Execution_Time, isn't it? (:-)) -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: accuracy 2007-03-11 8:52 ` accuracy Dmitry A. Kazakov @ 2007-03-11 13:57 ` Pascal Obry 2007-03-11 14:16 ` accuracy Dmitry A. Kazakov 2007-03-12 20:20 ` accuracy Randy Brukardt 1 sibling, 1 reply; 61+ messages in thread From: Pascal Obry @ 2007-03-11 13:57 UTC (permalink / raw) To: mailbox Dmitry A. Kazakov a écrit : > BTW, Ada.Calendar should use the performance counters as well, because And it does. Pascal. -- --|------------------------------------------------------ --| Pascal Obry Team-Ada Member --| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE --|------------------------------------------------------ --| http://www.obry.net --| "The best way to travel is by means of imagination" --| --| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: accuracy 2007-03-11 13:57 ` accuracy Pascal Obry @ 2007-03-11 14:16 ` Dmitry A. Kazakov 2007-03-11 14:37 ` accuracy Pascal Obry 0 siblings, 1 reply; 61+ messages in thread From: Dmitry A. Kazakov @ 2007-03-11 14:16 UTC (permalink / raw) On Sun, 11 Mar 2007 14:57:56 +0100, Pascal Obry wrote: > Dmitry A. Kazakov a écrit : > >> BTW, Ada.Calendar should use the performance counters as well, because > > And it does. Good to know. Do you know how it synchronizes counter ticks with GetSystemTime? The method I am using is a thread that periodically adjusts the offset. BTW performance counters: http://support.microsoft.com/default.aspx?scid=KB;EN-US;Q274323& -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: accuracy 2007-03-11 14:16 ` accuracy Dmitry A. Kazakov @ 2007-03-11 14:37 ` Pascal Obry 2007-03-11 15:50 ` accuracy Dmitry A. Kazakov 0 siblings, 1 reply; 61+ messages in thread From: Pascal Obry @ 2007-03-11 14:37 UTC (permalink / raw) To: mailbox Dmitry A. Kazakov a écrit : > On Sun, 11 Mar 2007 14:57:56 +0100, Pascal Obry wrote: > >> Dmitry A. Kazakov a écrit : >> >>> BTW, Ada.Calendar should use the performance counters as well, because >> And it does. > > Good to know it. Do you know how does it synchronize counter ticks with > GetSystemTime? The method I am using is a thread that periodically adjusts > the offset. By checking a base time against GetSystemTimeAsFileTime to avoid using another thread. This way you have the accuracy of the performance counter and can adjust the current time if needed (DST, manual changes...). > BTW performance counters: > > http://support.microsoft.com/default.aspx?scid=KB;EN-US;Q274323& A known issue indeed. But this affects only some hardware. Pascal. -- --|------------------------------------------------------ --| Pascal Obry Team-Ada Member --| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE --|------------------------------------------------------ --| http://www.obry.net --| "The best way to travel is by means of imagination" --| --| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: accuracy 2007-03-11 14:37 ` accuracy Pascal Obry @ 2007-03-11 15:50 ` Dmitry A. Kazakov 2007-03-11 17:38 ` accuracy Pascal Obry 0 siblings, 1 reply; 61+ messages in thread From: Dmitry A. Kazakov @ 2007-03-11 15:50 UTC (permalink / raw) On Sun, 11 Mar 2007 15:37:03 +0100, Pascal Obry wrote: > By checking a base time against GetSystemTimeAsFileTime to avoid using > another thread. This way you have the accuracy of the performance > counter and can adjust the current time if needed (DST, manual changes...). I.e. it does this only once. The potential problems with this are that: 1. The accuracy of GetSystemTimeAsFileTime is very low. 2. All system calls have non-zero latencies. 3. There is a chance that the thread will be preempted between querying the counter and calling the system time query, which would additionally increase the experienced latency up to milliseconds. 4. It is unclear if the time source of the system time is derived from the performance counter. If not, they will diverge. A thread is used to accumulate measurements of performance counters and system time readings to get the best possible estimation out of it. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de ^ permalink raw reply [flat|nested] 61+ messages in thread
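The periodic-adjustment idea Dmitry describes, estimating the rate and offset between a fine counter and a coarse system clock from accumulated reading pairs, can be sketched with a least-squares fit. The data are synthetic (an assumed 1 tick/µs counter, a 100 s offset, system-time readings quantised to 10 ms):

```python
# Fitting rate and offset between a fine counter and a coarse system clock
# from accumulated (counter, system_time) pairs.
def fit_line(xs, ys):
    # Ordinary least squares: y ~ a*x + b.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Synthetic counter readings (ticks) and quantised system-time readings (s):
ticks = [0, 1_000_000, 2_000_000, 3_000_000, 4_000_000]
sys_time = [round((t * 1e-6 + 100.0) / 0.01) * 0.01 for t in ticks]

rate, offset = fit_line(ticks, sys_time)

def counter_to_time(t):
    # Converts a raw counter reading into estimated system time.
    return rate * t + offset

print(rate, offset, counter_to_time(2_500_000))  # ~1e-06 ~100.0 ~102.5
```

With many accumulated pairs, the fit averages out the coarse clock's quantisation, so between adjustments only the cheap counter needs to be read.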
* Re: accuracy 2007-03-11 15:50 ` accuracy Dmitry A. Kazakov @ 2007-03-11 17:38 ` Pascal Obry 2007-03-11 18:48 ` accuracy Dmitry A. Kazakov 0 siblings, 1 reply; 61+ messages in thread From: Pascal Obry @ 2007-03-11 17:38 UTC (permalink / raw) To: mailbox Dmitry, > I.e. it does this only once. The potential problems with this is that: No, it is not done once but each time the clock is requested, for adjustment if needed. > 1. The accuracy of GetSystemTimeAsFileTime is very low. Yes, but it is not used for the final returned clock value. > 2. All system calls have non-zero latencies. > > 3. There is a chance that the thread will be preempted between querying the > counter and calling to a system time query, which would additionally > increase the experienced latency up to milliseconds. During initialization of the runtime, the performance counter and the system time are read repeatedly until this is done within a minimal amount of time. This gives the base reference for the performance counter and the OS time. > 4. It is unclear if the time source of the system time is derived from the > performance counter. If not, they will divergent. No, it is not. If this is still unclear have a look at the corresponding implementation in the GNAT sources. Pascal. -- --|------------------------------------------------------ --| Pascal Obry Team-Ada Member --| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE --|------------------------------------------------------ --| http://www.obry.net --| "The best way to travel is by means of imagination" --| --| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595 ^ permalink raw reply [flat|nested] 61+ messages in thread
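The calibration Pascal describes, pairing the two clocks within a minimal window, can be sketched as follows. Python's `time.perf_counter()` and `time.time()` stand in for QueryPerformanceCounter and GetSystemTimeAsFileTime (an assumption for illustration; the actual GNAT implementation may differ):

```python
import time

def calibrate(attempts=100):
    # Pair a coarse wall-clock read with a fine counter read, keeping the
    # attempt where the two fine reads bracketing it span the smallest
    # window, so the pairing latency is bounded.
    best = None
    for _ in range(attempts):
        c0 = time.perf_counter()   # fine counter, before
        wall = time.time()         # coarse system time
        c1 = time.perf_counter()   # fine counter, after
        window = c1 - c0
        if best is None or window < best[0]:
            best = (window, (c0 + c1) / 2, wall)
    window, counter_at, wall_at = best
    return counter_at, wall_at, window

counter_at, wall_at, window = calibrate()
offset = wall_at - counter_at  # thereafter: wall ~ counter + offset
print(window)  # bounded pairing latency, typically well under 1 ms
```

Retrying until the bracket is minimal addresses the preemption concern (issue 3 in Dmitry's list): an attempt interrupted mid-pairing shows a large window and is simply discarded.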
* Re: accuracy 2007-03-11 17:38 ` accuracy Pascal Obry @ 2007-03-11 18:48 ` Dmitry A. Kazakov 0 siblings, 0 replies; 61+ messages in thread From: Dmitry A. Kazakov @ 2007-03-11 18:48 UTC (permalink / raw) On Sun, 11 Mar 2007 18:38:58 +0100, Pascal Obry wrote: >> I.e. it does this only once. The potential problems with this is that: > > No it is not does once but each time a clock is requested for adjustment > if needed. You mean each time Clock is called? That would be a very expensive implementation. A periodic task would be less demanding. But ideally it should be something sitting close to the PCI bus. >> 1. The accuracy of GetSystemTimeAsFileTime is very low. > > Yes but it is not used for final clock returned value. How so? Ada.Calendar has Split and Time_Of. Any error in GetSystemTimeAsFileTime will show itself there, if not compensated using some digital filtering technique. >> 2. All system calls have non-zero latencies. >> >> 3. There is a chance that the thread will be preempted between querying the >> counter and calling to a system time query, which would additionally >> increase the experienced latency up to milliseconds. > > During initialization of the runtime the performance counter and the > system time are read until this is done during a minimal amount of time. > This gives the base reference for the performance counter and the os time. You have lost me here. My interpretation of what you wrote above is that the factor and offset between the readings of the performance counter and GetSystemTimeAsFileTime are readjusted each time Clock is called. Anyway there are so many issues when reading from any clock sources which could prevent you from getting good measurements at once, like bus load, interrupts, preemption, etc. Also when the source of GetSystemTimeAsFileTime is ticks, it could have a systematic digitization error (1-2ms), which cannot be compensated quickly. 
> If this is still unclear have a look at the corresponding implementation > in the GNAT sources. That is what I am trying to avoid... (:-)) -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: accuracy 2007-03-11 8:52 ` accuracy Dmitry A. Kazakov 2007-03-11 13:57 ` accuracy Pascal Obry @ 2007-03-12 20:20 ` Randy Brukardt 2007-03-13 9:33 ` accuracy Dmitry A. Kazakov 1 sibling, 1 reply; 61+ messages in thread From: Randy Brukardt @ 2007-03-12 20:20 UTC (permalink / raw) "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message news:1lo7kf2cw2mog$.94hkrwmeyhqy.dlg@40tude.net... > On Sat, 10 Mar 2007 21:03:41 -0600, Randy Brukardt wrote: > > > "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message > > news:r5rrsmngabou$.nc73hmyyugax.dlg@40tude.net... > >> On Fri, 9 Mar 2007 19:39:30 -0600, Randy Brukardt wrote: > >> > >>> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message > >>> news:p87mtsns4of0.hhld0y03415s.dlg@40tude.net... > >>> ... > >>>> As for GetThreadTimes, its absolute precision is 1ms. > >>> > >>> No, it is 10ms. The interface offers more precision than is actually > >>> provided. > >> > >> You can force it to 1ms using > >> > >> timeBeginPeriod (1); > > > > That's documented as applying to "Multimedia timers", whatever those are. I > > wouldn't want to assume it would work on thread times and the like which > > have nothing to do with multimedia. Besides, why wouldn't the maximum > > accuracy alway be used if it is possible? What possible value is there in > > using a less accurate time (given that you still have to do the math on > > every switch no matter what the accuracy is involved)?? > > The side effect of timeBeginPeriod(1) is in changing the granularity of > timing calls, which in turn has an impact on the overall threads > scheduling. For example, Sleep(1) would indeed wait for 1ms, not 10ms. That > would be difficult to have if threads wouldn't be rescheduled faster. So > timeBeginPeriod achieves this by changing the time resolution of the system > scheduler, the accuracy of the time slices should change as well. 
I can believe that's true, but there is no indication of that in the official Microsoft documentation. It only talks about multimedia; it never mentions "sleep", for instance, so I would not want to depend on it changing the behavior of those functions. (We only call sleep for the full 0.01 periods we want to sleep; the fractional part, if any, is handled by a busy-wait loop. Nasty, but the alternative is very slow running applications.) In any event, that has nothing to do with the accuracy of GetThreadTimes. There is no reason at all to have less accuracy there, because the math is the same either way: they have to work to discard accuracy. So I don't see why changing something in the multimedia library would matter. > >> This what any tasking Ada program should not forget do, when it starts > >> under Windows. I hope that GNAT RTL does this... > > > > Why? Ada.Real_Time is built on top of the performance counters, so is all of > > your tasking programs. > > No, the reason is to get a finer scheduler resolution. 10ms was chosen in > the times when PCs were sufficiently slower. Now one can and should > reschedule at 1ms tact, or even faster. Again, I don't see any indication in the Microsoft documentation that this function has any effect on scheduling. And if it does, what the heck is it doing defined in and documented as part of the multimedia extensions? (OK, you can't answer that.) > BTW, Ada.Calendar should use the performance counters as well, because > system time calls have catastrophic accuracy. In C++ programs I translate > performance counters into system time using some statistical algorithm. > Better it would be to do on the driver level. I don't know why MS still > keeps it this way. Janus/Ada certainly does this. It uses the performance counter, and only checks the real clock periodically to check for massive changes. 
Moreover, it only re-bases if the time has changed more than 5 minutes in either direction from the one determined by the performance counter (else we believe the performance counter). Tom Moran also designed code to fix the "clock leap" problem of the performance counter (he had a computer with that problem), and that is also part of our Calendar package. ... > > Humm, I find that nearly impossible to believe. I'd expect some other cause > > (*any* other cause) before I believed that. (Outside of device drivers, > > anyway, which would be a lousy place to use this sort of timing.) I guess > > I'd have to see a detailed example of that for myself before I believed it. > > There is a plausible explanation of the effect. When a middleware variable > gets changed the middleware stores it in its memory updates some internal > structures and returns to the caller. The physical publishing I/O activity > happens on the context of another thread and even other process. This is > why the thread time of the publisher was always 0, it simply took less than > 1ms and the caller in the test application entered sleep immediately after > publishing the variable. Even delivery was shorter, about 250um total > latency. GetThreadTimes is absolutely unsuitable to measure anything like > that. I see, that says that the test harness was causing the effect. That doesn't surprise me at all: if you *try* to put yourself into lock step with the system, you'll surely have trouble. Wake up, do something short, then sleep surely would have that effect. A more realistic test would probably give real data. Anyway, it's something to watch out for. (It wouldn't happen in the current version of Janus/Ada, because Janus/Ada doesn't use threads -- so the entire program is one thread, and most programs rarely sleep. But that will probably change over time.) > When I faced the problem, I found that some guys in a similar study > (something about Java) had it as well. They wrote an OS extension. 
(Some > people have much time to spare (:-)) They interrupted Windows each nus, > inspected which thread had the processor and let it continue. This way they > could get at true thread times. Quite complicated for Ada.Execution_Times, > isn't it? (:-)) That's why we have 1.1.3(6). This certainly seems to qualify as "impractical"! Randy. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: accuracy 2007-03-12 20:20 ` accuracy Randy Brukardt @ 2007-03-13 9:33 ` Dmitry A. Kazakov 0 siblings, 0 replies; 61+ messages in thread From: Dmitry A. Kazakov @ 2007-03-13 9:33 UTC (permalink / raw) On Mon, 12 Mar 2007 15:20:46 -0500, Randy Brukardt wrote: > "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message > news:1lo7kf2cw2mog$.94hkrwmeyhqy.dlg@40tude.net... >> On Sat, 10 Mar 2007 21:03:41 -0600, Randy Brukardt wrote: > Janus/Ada certainly does this. It uses the performance counter, and only > checks the real clock periodically to check for massive changes. Moreover, > it only re-bases if the time has changed more then 5 minutes in either > direction from the one determined by the performance counter (else we > believe the performance counter). > > Tom Moran also designed code to fix the "clock leap" problem of the > performance counter (he had a computer with that problem), and the is also > part of our Calendar package. I read somewhere that it is also possible to access the processor ticks, which have a resolution of nanoseconds and are very lightweight to read. The problem is that they fluctuate with the processor frequency. It would be interesting to try to tie them to the slower, but more reliable performance counters in Ada.Real_Time.Clock, and the latter to the system time in Ada.Calendar.Clock: ns ticks <-----> perf. counter <-----> sys time (Ada.Real_Time.Clock) (Ada.Calendar.Clock) So basically Ada.Real_Time.Clock would read ns ticks and adjust them according to the accumulated statistics from performance counters. That should be extremely fast compared to a QueryPerformanceCounter call. Ada.Calendar.Clock would do the same but adjust further to the system time. The drawback is that the statistics would be accumulated in an extra task. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: accuracy 2007-03-10 1:39 ` accuracy Randy Brukardt 2007-03-10 9:11 ` accuracy Dmitry A. Kazakov @ 2007-03-10 14:53 ` Stephen Leake 2007-03-10 18:36 ` accuracy Cesar Rabak 1 sibling, 1 reply; 61+ messages in thread From: Stephen Leake @ 2007-03-10 14:53 UTC (permalink / raw) "Randy Brukardt" <randy@rrsoftware.com> writes: > "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message > news:p87mtsns4of0.hhld0y03415s.dlg@40tude.net... > ... >> As for GetThreadTimes, its absolute precision is 1ms. > > No, it is 10ms. The interface offers more precision than is actually > provided. Technically, "precision" is the number of bits in a value. "accuracy" is how many of those bits are meaningful. -- -- Stephe ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: accuracy
  2007-03-10 14:53 ` accuracy Stephen Leake
@ 2007-03-10 18:36 ` Cesar Rabak
  0 siblings, 0 replies; 61+ messages in thread
From: Cesar Rabak @ 2007-03-10 18:36 UTC (permalink / raw)

Stephen Leake wrote:
> "Randy Brukardt" <randy@rrsoftware.com> writes:
>
>> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message
>> news:p87mtsns4of0.hhld0y03415s.dlg@40tude.net...
>> ...
>>> As for GetThreadTimes, its absolute precision is 1ms.
>> No, it is 10ms. The interface offers more precision than is actually
>> provided.
>
> Technically, "precision" is the number of bits in a value. "accuracy"
> is how many of those bits are meaningful.
>
Isn't the number of bits in the value the "resolution", precision being a
way of describing the dispersion of the values, and accuracy the distance
to the actual quantity¹?

--
Cesar Rabak

[1] The last two definitions were already given in this thread in other
words.
* Re: Does Ada tasking profit from multi-core cpus?
  2007-03-04 17:54 ` jpluto
  2007-03-05 10:08 ` Ludovic Brenta
@ 2007-03-05 18:46 ` Jeffrey R. Carter
  1 sibling, 0 replies; 61+ messages in thread
From: Jeffrey R. Carter @ 2007-03-05 18:46 UTC (permalink / raw)

jpluto wrote:
>
> Has someone experience with Ada tasking (especially GNAT) on multi-core
> systems?
>
> Show programs with several working tasks a performance boost on
> dual-core or quad-core cpus?

This has been discussed more than once. A search at groups.google.com
should find those discussions for you.

GNAT certainly distributes tasks across multiple cores.

--
Jeff Carter
"If you think you got a nasty taunting this time,
you ain't heard nothing yet!"
Monty Python and the Holy Grail
23
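For readers wanting a feel for the one-task-per-core decomposition being discussed, here is a rough sketch in Python (threads stand in for Ada tasks; all names are invented for illustration). Note that CPython's GIL prevents real parallel speedup for pure-Python work, so this shows only the work partitioning that Ada tasks on a multi-core machine would exploit, not the speedup itself.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def partial_sum(bounds):
    # One worker's share of the job, analogous to the body of one task.
    lo, hi = bounds
    return sum(range(lo, hi))

def parallel_sum(n, workers=None):
    """Split [0, n) into one chunk per worker and sum the chunks
    concurrently -- the same decomposition one would use with one
    Ada task per core."""
    workers = workers or os.cpu_count() or 2
    step = -(-n // workers)  # ceiling division: chunk size per worker
    chunks = [(i, min(i + step, n)) for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))
```

In GNAT on a multi-core system, each Ada task maps to an operating-system thread, and the kernel scheduler is what actually spreads those threads across the cores.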
end of thread, other threads:[~2007-03-26 23:38 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-01-29 11:57 Does Ada tasking profit from multi-core cpus? Gerd
2007-01-29 12:04 ` Georg Bauhaus
2007-01-30 13:55 ` Gerd
2007-02-09 10:18 ` karl
2007-02-09 11:03 ` Stefan Lucks
2007-02-09 11:35 ` Ludovic Brenta
2007-03-04 17:54 ` jpluto
2007-03-05 10:08 ` Ludovic Brenta
2007-03-05 13:12 ` Dmitry A. Kazakov
2007-03-06 5:33 ` tmoran
2007-03-06 8:44 ` Dmitry A. Kazakov
2007-03-07 7:52 ` tmoran
2007-03-07 9:31 ` tmoran
2007-03-06 9:40 ` Colin Paul Gloster
2007-03-06 12:47 ` Jeffrey Creem
2007-03-06 14:44 ` Georg Bauhaus
2007-03-06 16:53 ` Dr. Adrian Wrigley
2007-03-06 18:58 ` tmoran
2007-03-07 10:11 ` Colin Paul Gloster
2007-03-07 18:47 ` tmoran
2007-03-06 18:51 ` Jeffrey R. Carter
2007-03-16 14:29 ` Arguments for single-mutex-exclusion on protected types (Was: Does Ada tasking profit from multi-core cpus?) Jacob Sparre Andersen
2007-03-17 5:26 ` Jeffrey R. Carter
2007-03-17 17:22 ` Robert A Duff
2007-03-17 17:52 ` Jeffrey R. Carter
2007-03-17 23:06 ` Randy Brukardt
2007-03-18 17:57 ` Robert A Duff
2007-03-19 21:49 ` Randy Brukardt
2007-03-20 0:55 ` Jeffrey R. Carter
2007-03-20 1:36 ` Randy Brukardt
2007-03-20 16:32 ` Jeffrey R. Carter
2007-03-20 17:51 ` Randy Brukardt
2007-03-21 0:10 ` Jeffrey R. Carter
2007-03-26 23:38 ` Robert A Duff
2007-03-26 23:24 ` Robert A Duff
2007-03-17 10:25 ` Dmitry A. Kazakov
2007-03-18 17:15 ` Arguments for single-mutex-exclusion on protected types Jacob Sparre Andersen
2007-03-18 18:50 ` Dmitry A. Kazakov
2007-03-20 12:38 ` Florian Weimer
2007-03-07 3:58 ` Does Ada tasking profit from multi-core cpus? Steve
2007-03-07 8:39 ` Dmitry A. Kazakov
2007-03-08 5:21 ` Randy Brukardt
2007-03-08 10:15 ` Dmitry A. Kazakov
2007-03-08 21:18 ` accuracy (was: Does Ada tasking profit from multi-core cpus?) Björn Persson
2007-03-09 8:33 ` accuracy Dmitry A. Kazakov
2007-03-10 1:39 ` accuracy Randy Brukardt
2007-03-10 9:11 ` accuracy Dmitry A. Kazakov
2007-03-11 3:03 ` accuracy Randy Brukardt
2007-03-11 5:21 ` accuracy tmoran
2007-03-11 8:52 ` accuracy Dmitry A. Kazakov
2007-03-11 13:57 ` accuracy Pascal Obry
2007-03-11 14:16 ` accuracy Dmitry A. Kazakov
2007-03-11 14:37 ` accuracy Pascal Obry
2007-03-11 15:50 ` accuracy Dmitry A. Kazakov
2007-03-11 17:38 ` accuracy Pascal Obry
2007-03-11 18:48 ` accuracy Dmitry A. Kazakov
2007-03-12 20:20 ` accuracy Randy Brukardt
2007-03-13 9:33 ` accuracy Dmitry A. Kazakov
2007-03-10 14:53 ` accuracy Stephen Leake
2007-03-10 18:36 ` accuracy Cesar Rabak
2007-03-05 18:46 ` Does Ada tasking profit from multi-core cpus? Jeffrey R. Carter