Tasking performance between Ada83 and Ada95

comp.lang.ada

* Tasking performance between Ada83 and Ada95
@ 1997-06-06  0:00 Mike Rose
  1997-06-07  0:00 ` jim hopper
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Mike Rose @ 1997-06-06  0:00 UTC (permalink / raw)



I am checking the performance between Ada83 and Ada95 using the Tasking
Benchmarks written by Thomas Burger from the PAL.

The Compilers I'm using are for Ada95 -  GNAT v3.07 and for
Ada83 - Alsys Adaworld v5.5.4.  The operating system is HP UX v10.10.

Each test was run with the creation of 500 tasks.

In comparing the results between the two compilers, I found that the tasking
performance is much slower with GNAT than with Alsys, every test was at least
10 times slower and some were much more.

Our software depends heavily on tasking.  Is there any way to improve the
tasking performance with GNAT ? 
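
As a rough illustration of what a task-creation measurement looks like, here is a minimal sketch in Python. It is not the PAL benchmark and it is not Ada: `create_and_join` is a hypothetical helper, and Python's `threading.Thread` objects map to operating system threads, so the figure it prints reflects the cost of OS-level thread creation on the host and says nothing about either Ada compiler.

```python
import threading
import time


def create_and_join(count):
    """Start `count` do-nothing OS threads, wait for all of them,
    and return the elapsed wall-clock time in seconds."""
    threads = [threading.Thread(target=lambda: None) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # 500 matches the task count quoted in the post above.
    elapsed = create_and_join(500)
    print(f"500 threads created and joined in {elapsed * 1e3:.2f} ms")
```

Running the same measurement against a user-level thread package and against system threads is the kind of like-for-like comparison a benchmark of this sort needs.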

-- 

 
 -------------------------------------------------------------------------------
 Mike Rose
 NSWC/DD  
 Phone: 540-653-4753
 Email: mrose@nswc.navy.mil

 Disclaimer: The preceding message was brought to you via myself and
             in no way reflects the ideas or wishes of the U.S. Navy or
             the DOD.
 -------------------------------------------------------------------------------





^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Tasking performance between Ada83 and Ada95
  1997-06-06  0:00 Tasking performance between Ada83 and Ada95 Mike Rose
  1997-06-07  0:00 ` jim hopper
  1997-06-07  0:00 ` Robert A Duff
@ 1997-06-07  0:00 ` Robert Dewar
  1997-06-08  0:00   ` Edmond Walsh
  2 siblings, 1 reply; 20+ messages in thread
From: Robert Dewar @ 1997-06-07  0:00 UTC (permalink / raw)



Mike Rose says

<<In comparing the results between the two compilers, I found that the tasking
performance is much slower with GNAT than with Alsys, every test was at least
10 times slower and some were much more.
 
Our software depends heavily on tasking.  Is there any way to improve the
tasking performance with GNAT ? 
 
>>

You have to be careful to know exactly what you are comparing. In particular,
there is no question that Ada 95 does impose some additional semantic
constraints and features (e.g. requeue and ATC) that result in distributed
implementation costs. Often for example, the proper comparison is between
a task in Ada 83 and a protected type in Ada 95.
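
The task-versus-protected-type distinction can be sketched outside Ada. The following Python analogy is an illustration under stated assumptions, not Ada semantics: `CounterTask` mimics an Ada 83 style server task that callers rendezvous with, and `CounterProtected` mimics an Ada 95 protected object as passive data behind a lock. Both class names are hypothetical.

```python
import queue
import threading


class CounterTask:
    """Ada 83 style: a server thread owns the data and callers
    exchange messages with it (a stand-in for a rendezvous)."""

    def __init__(self):
        self._requests = queue.Queue()
        self._value = 0
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        while True:
            reply = self._requests.get()
            if reply is None:          # sentinel: shut the server down
                return
            self._value += 1
            reply.put(self._value)

    def increment(self):
        reply = queue.Queue(maxsize=1)
        self._requests.put(reply)
        return reply.get()             # caller waits for the server thread

    def stop(self):
        self._requests.put(None)
        self._thread.join()


class CounterProtected:
    """Ada 95 style: passive data guarded by a lock, no server thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:
            self._value += 1
            return self._value


if __name__ == "__main__":
    task = CounterTask()
    print([task.increment() for _ in range(3)])
    task.stop()
    protected = CounterProtected()
    print([protected.increment() for _ in range(3)])
```

Every `CounterTask.increment` hands control to the server thread and back, while `CounterProtected.increment` runs in the caller's own thread, so timing one style against the other measures different things.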

You also have to decide what features you are testing carefully. In looking
at, for example, the timings on SGI between VADS and GNAT on the PIWG tasking
benchmarks, we certainly do not see a factor of 10 difference in performance,
and in some comparative benchmarks on timing performance for tasking, we see
GNAT running faster than an Ada 83 compiler doing similar things.

The other point is that you have to be very careful that you are in fact
looking at comparable situations. For example, comparing a GNAT compiler
where tasks are mapped to operating system threads with an Ada 83
compiler where the tasking maps to a single process and is handled in
user mode is of course a completely meaningless comparison.

One answer to your question if you are making this kind of apples/oranges
comparison is to use a similar kernel (e.g. FSU threads on GNAT). We find
on many targets that the use of FSU threads is MUCH more efficient than the
use of operating systems tasks.

A more focussed reply is possible if you tell us exactly what is being
compared (what machines, what compilers, what thread packages).

Robert Dewar
Ada Core Technologies






* Re: Tasking performance between Ada83 and Ada95
  1997-06-06  0:00 Tasking performance between Ada83 and Ada95 Mike Rose
  1997-06-07  0:00 ` jim hopper
@ 1997-06-07  0:00 ` Robert A Duff
  1997-06-08  0:00   ` Robert Dewar
  1997-06-07  0:00 ` Robert Dewar
  2 siblings, 1 reply; 20+ messages in thread
From: Robert A Duff @ 1997-06-07  0:00 UTC (permalink / raw)

In article <1997Jun6.115223.7384@relay.nswc.navy.mil>,
Mike Rose <user@machine.domain> wrote:
>In comparing the results between the two compilers, I found that the tasking
>performance is much slower with GNAT than with Alsys, every test was at least
>10 times slower and some were much more.
>
>Our software depends heavily on tasking.  Is there any way to improve the
>tasking performance with GNAT ? 

Probably the problem is that GNAT uses the tasking (threads) of the
underlying OS, and what you're measuring is the poor performance of
those underlying systems.  Often, the issue is that you're doing system
calls to lock things and so forth -- and system calls take a l-o-o-o-n-g
time.  I don't think it's all that hard to make it fast -- change the
RTS to do all its own task dispatching and whatnot, and never never do a
system call in time-critical operations.  But then you lose something:
you have to jump through hoops to do asynchronous I/O, for example.
Also, the O/S doesn't know about your tasks, so it won't know how to
schedule your tasks in relation to tasks (threads) in other unrelated
programs.  If a page fault happens, and you're not using OS threads,
your whole program will wait for disk I/O.  A real-time program might
not care about these things, however.

- Bob
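
A minimal sketch of the user-level dispatching described above, written in Python rather than Ada. Generators stand in for tasks, `run_round_robin` is a hypothetical dispatcher, and `time.sleep` stands in for a blocking system call. Switching between tasks needs no system call, but while task A is blocked, task B cannot run either.

```python
import time
from collections import deque


def worker(name, steps, log, block_at=None):
    """A user-level task: each `yield` hands control back to the dispatcher."""
    for step in range(steps):
        if step == block_at:
            time.sleep(0.05)       # stands in for a blocking system call
        log.append((name, step))
        yield


def run_round_robin(tasks):
    """Dispatch tasks entirely in user mode, one step at a time."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)             # run the task up to its next yield
            ready.append(task)     # still runnable: back of the queue
        except StopIteration:
            pass                   # task has finished


if __name__ == "__main__":
    log = []
    start = time.perf_counter()
    run_round_robin([worker("A", 3, log, block_at=1), worker("B", 3, log)])
    print(log)
    print(f"elapsed: {time.perf_counter() - start:.3f} s")
```

Task B never blocks, yet its second step is logged only after A's sleep returns, which is the whole-program stall the post describes for page faults and disk I/O.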


* Re: Tasking performance between Ada83 and Ada95
  1997-06-06  0:00 Tasking performance between Ada83 and Ada95 Mike Rose
@ 1997-06-07  0:00 ` jim hopper
  1997-06-07  0:00 ` Robert A Duff
  1997-06-07  0:00 ` Robert Dewar
  2 siblings, 0 replies; 20+ messages in thread
From: jim hopper @ 1997-06-07  0:00 UTC (permalink / raw)



In article <1997Jun6.115223.7384@relay.nswc.navy.mil>
mrose@nswc.navy.mil (Mike Rose) writes:

> I am checking the performance between Ada83 and Ada95 using the Tasking
> Benchmarks written by Thomas Burger from the PAL.
> 
> The Compilers I'm using are for Ada95 -  GNAT v3.07 and for
> Ada83 - Alsys Adaworld v5.5.4.  The operating system is HP UX v10.10.
> 
> Each test was run with the creation of 500 tasks.
> 
> In comparing the results between the two compilers, I found that the tasking
> performance is much slower with GNAT than with Alsys, every test was at least
> 10 times slower and some were much more.
> 
> Our software depends heavily on tasking.  Is there any way to improve the
> tasking performance with GNAT ? 





* Re: Tasking performance between Ada83 and Ada95
  1997-06-07  0:00 ` Robert Dewar
@ 1997-06-08  0:00   ` Edmond Walsh
  1997-06-09  0:00     ` Robert Dewar
  0 siblings, 1 reply; 20+ messages in thread
From: Edmond Walsh @ 1997-06-08  0:00 UTC (permalink / raw)



In article <dewar.865693453@merv>, Robert Dewar <dewar@merv.cs.nyu.edu>
writes
>Mike Rose says
>
><<In comparing the results between the two compilers, I found that the tasking
>performance is much slower with GNAT than with Alsys, every test was at least
>10 times slower and some were much more.
> 
>Our software depends heavily on tasking.  Is there any way to improve the
>tasking performance with GNAT ? 
> 
>>>
>
>You have to be careful to know exactly what you are comparing. In particular,
>there is no question that Ada 95 does impose some additional semantic
>constraints and features (e.g. requeue and ATC) that result in distributed
>implementation costs. Often for example, the proper comparison is between
>a task in Ada 83 and a protected type in Ada 95.
>
>You also have to decide what features you are testing carefully. In looking
>at, for example, the timings on SGI between VADS and GNAT on the PIWG tasking
>benchmarks, we certainly do not see a factor of 10 difference in performance,
>and in some comparative benchmarks on timing performance for tasking, we see
>GNAT running faster than an Ada 83 compiler doing similar things.
>
>The other point is that you have to be very careful that you are in fact
>looking at comparable situations. For example, comparing a GNAT compiler
>where tasks are mapped to operating system threads with an Ada 83
>compiler where the tasking maps to a single process and is handled in
>user mode is of course a completely meaningless comparison.
>
>One answer to your question if you are making this kind of apples/oranges
>comparison is to use a similar kernel (e.g. FSU threads on GNAT). We find
>on many targets that the use of FSU threads is MUCH more efficient than the
>use of operating systems tasks.
>
>A more focussed reply is possible if you tell us exactly what is being
>compared (what machines, what compilers, what thread packages).
>
>Robert Dewar
>Ada Core Technologies
>
We had a similar problem when moving some Ada 83 code running on a
Harris NightHawk to Ada 95 (Gnat) on an SG.  It took a lot of effort to
get the code running reasonably on the SG.  The underlying problem was
the mapping of the Ada Tasks to Unix threads.  The Harris (now
Concurrent) system was very good, reflecting the Real Time background of
Harris. (I was not involved in the porting, I was just an interested
observer.)
-- 
Edmond Walsh





* Re: Tasking performance between Ada83 and Ada95
  1997-06-07  0:00 ` Robert A Duff
@ 1997-06-08  0:00   ` Robert Dewar
  1997-06-10  0:00     ` Jon S Anthony
  1997-06-10  0:00     ` PascMartin
  0 siblings, 2 replies; 20+ messages in thread
From: Robert Dewar @ 1997-06-08  0:00 UTC (permalink / raw)



Bob Duff said

<<Probably the problem is that GNAT uses the tasking (threads) of the
underlying OS, and what you're measuring is the poor performance of
those underlying systems.  Often, the issue is that you're doing system
calls to lock things and so forth -- and system calls take a l-o-o-o-n-g
time.  I don't think it's all that hard to make it fast -- change the
RTS to do all its own task dispatching and whatnot, and never never do a
system call in time-critical operations.  But then you lose something:
you have to jump through hoops to do asynchronous I/O, for example.
Also, the O/S doesn't know about your tasks, so it won't know how to
schedule your tasks in relation to tasks (threads) in other unrelated
programs.  If a page fault happens, and you're not using OS threads,
your whole program will wait for disk I/O.  A real-time program might
not care about these things, however.>>


That's my guess too. I very much doubt that the Alsys World compiler on HPUX
used system level threads, and I also guess that the
version of GNAT on HPUX is probably using DCE threads.
So the comparison is completely meaningless.

What we are doing with several of the new distributions of GNAT is to provide
a mechanism for selecting system level threads or our own thread package.
The system level threads give full system concurrency and allow true
multi-processing on an MP, but tend to be a lot slower, and also often
less accurate with respect to Annex D requirements. Our own threads package
(typically FSU threads), does not give full system concurrency, but is often
much more efficient, and also more accurate with respect to Annex D
requirements. This gives the best of both worlds, leaving the choice
up to the user.

Robert Dewar
Ada Core Technologies







* Re: Tasking performance between Ada83 and Ada95
  1997-06-08  0:00   ` Edmond Walsh
@ 1997-06-09  0:00     ` Robert Dewar
  1997-06-15  0:00       ` Edmond Walsh
  0 siblings, 1 reply; 20+ messages in thread
From: Robert Dewar @ 1997-06-09  0:00 UTC (permalink / raw)



Edmond Walsh said

<<We had a similar problem when moving some Ada 83 code running on a
Harris NightHawk to Ada 95 (Gnat) on an SG.  It took a lot of effort to
get the code running reasonably on the SG.  The underlying problem was
the mapping of the Ada Tasks to Unix threads.  The Harris (now
Concurrent) system was very good, reflecting the Real Time background of
Harris. (I was not involved in the porting, I was just an interested
observer.)>>

Of course, the mapping of Ada tasks to Unix threads is certainly a *good
thing* if you need to take advantage of the flexibility of this mapping.
For example, if you are using one of SGI's high end MP's, then you 
definitely want this mapping. But there certainly is an efficiency
penalty to be paid. 

Actually from your post it is not quite clear what exactly you are referring
to in "lot of effort" and "underlying problem" here. Were there problems
other than efficiency? Sometimes, especially in Ada 83 programs, where the
dispatching semantics were not defined, programs make assumptions about the
dispatching that are non-portable. This is avoided in Ada 95 if you are using
a compiler that implements full Annex D semantics (true of the SGI compiler
for example), but that does not necessarily help the porting of legacy code.

Robert Dewar
Ada Core Technologies






* Re: Tasking performance between Ada83 and Ada95
  1997-06-08  0:00   ` Robert Dewar
@ 1997-06-10  0:00     ` Jon S Anthony
  1997-06-10  0:00     ` PascMartin
  1 sibling, 0 replies; 20+ messages in thread
From: Jon S Anthony @ 1997-06-10  0:00 UTC (permalink / raw)



In article <dewar.865813228@merv> dewar@merv.cs.nyu.edu (Robert Dewar) writes:

> What we are doing with several of the new distributions of GNAT is to provide
> a mechanism for selecting system level threads or our own thread package.
> The system level threads give full system concurrency and allow true
> multi-processing on an MP, but tend to be a lot slower, and also often
> less accurate with respect to Annex D requirements. Our own threads package
> (typically FSU threads), does not give full system concurrency, but is often
> much more efficient, and also more accurate with respect to Annex D
> requirements. This gives the best of both worlds, leaving the choice
> up to the user.

That's really excellent!  Thanks,

/Jon
-- 
Jon Anthony
Organon Motives, Inc.
Belmont, MA 02178
617.484.3383
jsa@organon.com






* Re: Tasking performance between Ada83 and Ada95
  1997-06-10  0:00     ` PascMartin
@ 1997-06-10  0:00       ` Robert Dewar
  0 siblings, 0 replies; 20+ messages in thread
From: Robert Dewar @ 1997-06-10  0:00 UTC (permalink / raw)



Pascal says

<<So far, considering respective features of these two runtimes, I see no
difference. Both are switching tasks at the same user level. Both will have
the same problems regarding blocking IOs, both will never take benefit of
multiprocessing, etc..
 >>

No, that's not the case, there is considerable extra overhead in the DCE
case. 

I think the closest comparison would be with FSU threads optimized for
no ATC, and no, this version is not yet publicly released.

But actually we find most people prefer the version of tasking that links
directly to system threads, even if it is less efficient.






* Re: Tasking performance between Ada83 and Ada95
  1997-06-08  0:00   ` Robert Dewar
  1997-06-10  0:00     ` Jon S Anthony
@ 1997-06-10  0:00     ` PascMartin
  1997-06-10  0:00       ` Robert Dewar
  1 sibling, 1 reply; 20+ messages in thread
From: PascMartin @ 1997-06-10  0:00 UTC (permalink / raw)



Robert Dewar wrote:

"That's my guess too. I very much doubt that the Alsys World compiler on
  HPUX used system level threads, and I also guess that the
  version of GNAT on HPUX is probably using DCE threads.
  So the comparison is completely meaningless."

I don't agree. Let me elaborate. First, a few comments:

 - to my knowledge there is _no_ thread support in the HP-UX kernel (9 months
   ago..).
 - the DCE thread library is a user-level thread simulation (same comment).
 - the AdaWorld product uses a "proprietary" user-level thread simulation,
   built into the Ada runtime, and (obsessively) optimized for it.

So far, considering respective features of these two runtimes, I see no
difference. Both are switching tasks at the same user level. Both will have
the same problems regarding blocking IOs, both will never take benefit of
multiprocessing, etc..

Back to the point, can you compare a bicycle and a Ford Mustang? No,
of course.. Robert, you should ride a bicycle more often; for me, I can tell
the difference when I use one or the other :-) (BTW, bicycling enhances
breathing).
What I want to demonstrate is: if the benchmark reflects the user's need, then
the benchmark is good. It does not matter what products are compared, as
all Ada compilers implement the same thing: the Ada language. Don't they?

If there is a better runtime for GNAT on HP-UX, this is a good time to disclose
it. If there is none, the comparison seems valid to me.

It is true Ada95 ATC is a major constraint for runtime developers. As
robustness is critical, especially in the first releases of a product, I expect
most Ada95 runtimes to take a hit compared to Ada83 ones (1). Besides that,
ATC is so complicated that most (sane) people will probably avoid using it.
My conclusion is: some new Ada95 features could have been left on
the side; they are appealing to a minority, and appalling to the others.

To conclude, if some program has to be developed for the real world, and
Ada83 is good enough, why not select AdaWorld? When Ada95 (and
GNAT) has been well optimized, or when HP has released the
1 GHz HPPA workstation (whichever comes first), it will be time to switch.

BTW, I must disclose that I have been working at Alsys for more years than
I want to admit..

(1) except for some implementation details. The ObjectAda runtime includes
a secondary stack design that is refreshingly efficient. Tucker Taft's
brainchild..

Pascal.






* Re: Tasking performance between Ada83 and Ada95
  1997-06-09  0:00     ` Robert Dewar
@ 1997-06-15  0:00       ` Edmond Walsh
  1997-06-15  0:00         ` Robert Dewar
  0 siblings, 1 reply; 20+ messages in thread
From: Edmond Walsh @ 1997-06-15  0:00 UTC (permalink / raw)



In article <dewar.865870961@merv>, Robert Dewar <dewar@merv.cs.nyu.edu>
writes
>Edmond Walsh said
>
><<We had a similar problem when moving some Ada 83 code running on a
>Harris NightHawk to Ada 95 (Gnat) on an SG.  It took a lot of effort to
>get the code running reasonably on the SG.  The underlying problem was
>the mapping of the Ada Tasks to Unix threads.  The Harris (now
>Concurrent) system was very good, reflecting the Real Time background of
>Harris. (I was not involved in the porting, I was just an interested
>observer.)>>
>
>Of course, the mapping of Ada tasks to Unix threads is certainly a *good
>thing* if you need to take advantage of the flexibility of this mapping.
>For example, if you are using one of SGI's high end MP's, then you 
>definitely want this mapping. But there certainly is an efficiency
>penalty to be paid. 
>
>Actually from your post it is not quite clear what exactly you are referring
>to in "lot of effort" and "underlying problem" here. Were there problems
>other than efficiency? Sometimes, especially in Ada 83 programs, where the
>dispatching semantics were not defined, programs make assumptions about the
>dispatching that are non-portable. This is avoided in Ada 95 if you are using
>a compiler that implements full Annex D semantics (true of the SGI compiler
>for example), but that does not necessarily help the porting of legacy code.
>
>Robert Dewar
>Ada Core Technologies
>
Efficiency was a significant problem.  The program ran correctly in Ada
95 on the SG Indy with each task being executed as a separate unix
process.  However this ran rather slowly because of the large overheads
in context switching unix processes.  In the original Nighthawk '83
version the two main components (consisting of many tasks) each ran as a
separate unix process and the individual tasks in the components were
controlled by the Ada run time executive.  The blocking of one task due
to inter (unix) process communications did not block the entire
component.  When this scheme was ported to the SG Indy it was found that
the blocking of a task due to inter (unix) process communications did
block the entire component.  Working around this problem to achieve
reasonable run time and correct operation was what caused the trouble.
-- 
Edmond Walsh





* Re: Tasking performance between Ada83 and Ada95
  1997-06-15  0:00       ` Edmond Walsh
@ 1997-06-15  0:00         ` Robert Dewar
  1997-06-15  0:00           ` Tom Moran
                             ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Robert Dewar @ 1997-06-15  0:00 UTC (permalink / raw)



Edmond Walsh said

<<Efficiency was a significant problem.  The program ran correctly in Ada
95 on the SG Indy with each task being executed as a separate unix
process.  However this ran rather slowly because of the large overheads
in context switching unix processes.  In the original Nighthawk '83
version the two main components (consisting of many tasks) each ran as a
separate unix process and the individual tasks in the components were
controlled by the Ada run time executive.  The blocking of one task due
to inter (unix) process communications did not block the entire
component.  When this scheme was ported to the SG Indy it was found that
the blocking of a task due to inter (unix) process communications did
block the entire component.  Working around this problem to achieve
reasonable run time and correct operation was what caused the trouble.
>>


You mean separate unix thread, rather than separate unix process I think.
At least there is certainly no need to use separate processes for each
task. Nevertheless that can indeed introduce extra overhead.

In the Ada 83 world on monoprocessors, the use of a special Ada exec for
task switching on top of an OS often made sense, and this is exactly what
we get when we port the FSU threads. The advantage of the FSU threads is
high efficiency and exact semantic accuracy.

However, these days, more and more work is done on multi-processors, and
then of course you have no choice if you want to distribute tasks across
processors other than to use the system level threads. Furthermore, the
efficiency hit from operating these threads on separate processors may
indeed be significant.

No obvious solution here ... that's why the best approach seems to be to
provide a choice of threads libraries on machines where it makes sense.
In version 3.10 of GNAT, we are providing that choice for Solaris and
for Linux. We may do it for additional ports as we go along. So far we
did not port FSU threads to SGI (one of the motives for doing so, accuracy,
does not apply, since the SGI threads implementation is exactly correct
for Ada, one of the advantages of having the vendor have an interest and
stake in Ada!) But the efficiency issue might still apply.

Note that on the SGI implementation of GNAT, there are controls over how
tasks are distributed among processes, and it is worthwhile tuning these
right, especially when using a multi-processor machine.






* Re: Tasking performance between Ada83 and Ada95
  1997-06-15  0:00         ` Robert Dewar
@ 1997-06-15  0:00           ` Tom Moran
  1997-06-16  0:00           ` Robert A Duff
  1997-06-22  0:00           ` Geert Bosch
  2 siblings, 0 replies; 20+ messages in thread
From: Tom Moran @ 1997-06-15  0:00 UTC (permalink / raw)



> Note that on the SGI implementation of GNAT, there are controls over how
> tasks are distributed among processes, and it is worthwhile tuning these
> right
  Some years ago at a local (SF Bay Area) SIGADA meeting the speaker
discussed a system where you could map N Ada tasks to M threads. It was
probably on a Sun, but I could misremember.





* Re: Tasking performance between Ada83 and Ada95
  1997-06-15  0:00         ` Robert Dewar
  1997-06-15  0:00           ` Tom Moran
@ 1997-06-16  0:00           ` Robert A Duff
  1997-06-17  0:00             ` Robert Dewar
  1997-06-22  0:00           ` Geert Bosch
  2 siblings, 1 reply; 20+ messages in thread
From: Robert A Duff @ 1997-06-16  0:00 UTC (permalink / raw)

In article <dewar.866376446@merv>, Robert Dewar <dewar@merv.cs.nyu.edu> wrote:
>However, these days, more and more work is done on multi-processors, and
>then of course you have no choice if you want to distribute tasks across
>processors other than to use the system level threads. Furthermore, the
>efficiency hit from operating these threads on separate processors may
>indeed be significant.

Are you saying it's impossible to write an Ada run-time system that does
parallelism without using operating system threads (on a Unix-like OS)?
That's not true -- I've done it.  I wrote an Ada 83 RTS that ran on a
parallel shared-memory machine where the OS had no threads support.  So
we created one Unix process per CPU (you need a way to know how many
CPUs there are).  Almost all memory in all the processes was mapped to
the same place.  And then the task scheduler decided which task to run
in which process according to the usual priority rules.  Of course,
non-blocking I/O required writing Text_IO (etc) in terms of system calls
that don't block, but interrupt when the I/O is done.

- Bob


* Re: Tasking performance between Ada83 and Ada95
  1997-06-16  0:00           ` Robert A Duff
@ 1997-06-17  0:00             ` Robert Dewar
  0 siblings, 0 replies; 20+ messages in thread
From: Robert Dewar @ 1997-06-17  0:00 UTC (permalink / raw)



<<Are you saying it's impossible to write an Ada run-time system that does
parallelism without using operating system threads (on a Unix-like OS)?
That's not true -- I've done it.  I wrote an Ada 83 RTS that ran on a
parallel shared-memory machine where the OS had no threads support.  So
we created one Unix process per CPU (you need a way to know how many
CPUs there are).  Almost all memory in all the processes was mapped to
the same place.  And then the task scheduler decided which task to run
in which process according to the usual priority rules.  Of course,
non-blocking I/O required writing Text_IO (etc) in terms of system calls
that don't block, but interrupt when the I/O is done.>>

No one said that this is impossible. It is of course possible, but it does
not seem to be a particularly useful approach in practice on modern day
operating systems and hardware. Or put it another way, we have not found
even one customer interested in this kind of simulation. All our customers
using multi-processors definitely want Ada to use the underlying operating
systems threads, since this gives them all kinds of capabilities, like
assigning threads to processors, dealing with thread hierarchies, special
kinds of inter-thread synchronization, debugging tools that know about the
threads, performance analyzers that know about the threads, etc.







* Re: Tasking performance between Ada83 and Ada95
  1997-06-15  0:00         ` Robert Dewar
  1997-06-15  0:00           ` Tom Moran
  1997-06-16  0:00           ` Robert A Duff
@ 1997-06-22  0:00           ` Geert Bosch
  1997-06-23  0:00             ` Robert Dewar
  1997-06-23  0:00             ` Larry Kilgallen
  2 siblings, 2 replies; 20+ messages in thread
From: Geert Bosch @ 1997-06-22  0:00 UTC (permalink / raw)



Robert Dewar <dewar@merv.cs.nyu.edu> wrote:

   However, these days, more and more work is done on multi-processors,
   and then of course you have no choice if you want to distribute
   tasks across processors other than to use the system level
   threads. Furthermore, the efficiency hit from operating these
   threads on separate processors may indeed be significant.

IMO the best solution would be to start X system level threads and
implement a user-level threads package on top of it. Of course
there will be a little extra need for locking, but on platforms
suitable for multi-processing there exist CPU-instructions that
make the implementation of fast locks possible.

The interesting thing of course is that you can vary the number of
system threads to customize the task model to the application. When
N is the number of processors, interesting values of X are:
  * 1, to limit the program to 1 processor with no system context-switching
  * N, to achieve full multi-processing on a multi-processor
  * M, where M > N, to simulate a multi-processor with M processors on
    a system with N processors (N = 1 for a uni-processor).

This scheme could combine the advantages of user-level threads
(fast context switches, fast priority changes and correct Ada
semantics) with those of system-level threads (non-blocking
system-calls and multi-processing).

Regards,
   Geert
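
The scheme can be sketched as follows, in Python and under stated assumptions: generators stand in for user-level threads, `threading.Thread` objects stand in for the X system-level threads, and `run_on_system_threads` is a hypothetical name. In standard CPython builds the global interpreter lock prevents real parallel speed-up, so this shows only the structure of the multiplexing, not its performance.

```python
import os
import queue
import threading


def user_task(name, steps, log):
    """A user-level task; each `yield` is a voluntary switch point."""
    for step in range(steps):
        log.append((name, step))
        yield


def run_on_system_threads(tasks, x):
    """Multiplex user-level tasks onto `x` system-level threads."""
    ready = queue.Queue()              # shared, lock-protected ready queue
    for task in tasks:
        ready.put(task)

    def carrier():
        while True:
            task = ready.get()
            if task is None:           # sentinel: no more work
                return
            try:
                next(task)             # run the task up to its next yield
                ready.put(task)        # still runnable: requeue it
            except StopIteration:
                pass                   # task has finished
            finally:
                ready.task_done()

    carriers = [threading.Thread(target=carrier) for _ in range(x)]
    for c in carriers:
        c.start()
    ready.join()                       # returns once every task has finished
    for _ in carriers:
        ready.put(None)
    for c in carriers:
        c.join()


if __name__ == "__main__":
    n = os.cpu_count() or 1
    for x in (1, n, 2 * n):            # the X = 1, X = N and X = M > N cases
        log = []
        run_on_system_threads([user_task(name, 3, log) for name in "ABC"], x)
        print(f"X={x}: {len(log)} steps run")
```

The only extra locking is on the shared ready queue, which corresponds to the "little extra need for locking" mentioned in the post.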





* Re: Tasking performance between Ada83 and Ada95
  1997-06-22  0:00           ` Geert Bosch
  1997-06-23  0:00             ` Robert Dewar
@ 1997-06-23  0:00             ` Larry Kilgallen
  1997-06-25  0:00               ` Fergus Henderson
  1 sibling, 1 reply; 20+ messages in thread
From: Larry Kilgallen @ 1997-06-23  0:00 UTC (permalink / raw)

In article <5oir0v$mgu$1@gonzo.sun3.iaf.nl>, Geert Bosch <geert@gonzo.sun3.iaf.nl> writes:

> IMO the best solution would be to start X system level threads and
> implement a user-level threads package on top of it. Of course
> there will be a little extra need for locking, but on platforms
> suitable for multi-processing there exist CPU-instructions that
> make the implementation of fast locks possible.

That is the method used by Alpha VMS.  The kernel thread primitives
in fact are not documented for public consumption.  The documented
interface is the DECthreads library (which has a couple different
APIs matching varying styles and standards).  DECthreads creates the
user-mode lightweight threads which then get scheduled onto some
number of kernel threads (typically numbering on the order of the
number of CPUs).

Larry Kilgallen


* Re: Tasking performance between Ada83 and Ada95
  1997-06-22  0:00           ` Geert Bosch
@ 1997-06-23  0:00             ` Robert Dewar
  1997-06-23  0:00             ` Larry Kilgallen
  1 sibling, 0 replies; 20+ messages in thread
From: Robert Dewar @ 1997-06-23  0:00 UTC (permalink / raw)



Geert Bosch said

<<This scheme could combine the advantages of user-level threads
(fast context switches, fast priority changes and correct Ada
semantics) with those of system-level threads (non-blocking
system-calls and multi-processing).
>>


Quite a reasonable scheme, and in fact already implemented in some versions
of GNAT. For example, on the SGI, you have these two levels of support of
threads at the system level, and you can distribute tasks among execution
vehicles (the new SGI terminology for such gizmos) as you wish.

Presumably, though I have not looked in detail, the threads and fibres of
NT give a similar capability.

It is better if the two kinds of threads are implemented at a common level
and not entirely independently, since that places the abstractions at the
right level, and makes sure that such issues as priority are handled
consistently.






* Re: Tasking performance between Ada83 and Ada95
  1997-06-23  0:00             ` Larry Kilgallen
@ 1997-06-25  0:00               ` Fergus Henderson
  1997-06-25  0:00                 ` Larry Kilgallen
  0 siblings, 1 reply; 20+ messages in thread
From: Fergus Henderson @ 1997-06-25  0:00 UTC (permalink / raw)

kilgallen@eisner.decus.org (Larry Kilgallen) writes:

>Geert Bosch <geert@gonzo.sun3.iaf.nl> writes:
>
>> IMO the best solution would be to start X system level threads and
>> implement a user-level threads package on top of it.
>
>That is the method used by Alpha VMS.  The kernel thread primitives
>in fact are not documented for public consumption.  The documented
>interface is the DECthreads library (which has a couple different
>APIs matching varying styles and standards).  DECthreads creates the
>user-mode lightweight threads which then get scheduled onto some
>number of kernel threads (typically numbering on the order of the
>number of CPUs).

What happens with blocking I/O?  Does DECthreads implement blocking
I/O operations using asynchronous kernel I/O operations, so that it 
will avoid blocking a whole kernel thread when all that really needs
to be blocked is a user thread?

If not, then choosing the right number of kernel threads may be a bit
tricky -- you want one for each CPU, to get multiprocessing, but
you also need one for every user-level thread that might get blocked
on I/O, so that your background compute threads don't get paused
when your foreground threads are doing blocking I/O.
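
The sizing arithmetic can be made concrete with a small sketch. It
assumes the case described above, where a blocking I/O call ties up a
whole kernel thread. The function names are hypothetical and the
numbers in the usage note are only an example.

```python
def runnable_compute_slots(n_cpus, kernel_threads, blocked_in_io):
    """Kernel threads still free for compute work, capped by the CPU count."""
    free = max(kernel_threads - blocked_in_io, 0)
    return min(free, n_cpus)


def kernel_threads_needed(n_cpus, max_blocking_user_threads):
    """One kernel thread per CPU, plus one per user thread that may block."""
    return n_cpus + max_blocking_user_threads
```

With 4 CPUs and only 4 kernel threads, 2 user threads blocked in I/O
leave runnable_compute_slots(4, 4, 2) == 2 slots for compute work.
Sizing the pool with kernel_threads_needed(4, 2) == 6 keeps all 4 CPUs
busy in the same situation.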

--
Fergus Henderson <fjh@cs.mu.oz.au>   |  "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh>   |  of excellence is a lethal habit"
PGP: finger fjh@128.250.37.3         |     -- the last words of T. S. Garp.


* Re: Tasking performance between Ada83 and Ada95
  1997-06-25  0:00               ` Fergus Henderson
@ 1997-06-25  0:00                 ` Larry Kilgallen
  0 siblings, 0 replies; 20+ messages in thread
From: Larry Kilgallen @ 1997-06-25  0:00 UTC (permalink / raw)

In article <5oq70f$4f@mulga.cs.mu.OZ.AU>, fjh@mundook.cs.mu.OZ.AU (Fergus Henderson) writes:
> kilgallen@eisner.decus.org (Larry Kilgallen) writes:
> 
>>Geert Bosch <geert@gonzo.sun3.iaf.nl> writes:
>>
>>> IMO the best solution would be to start X system level threads and
>>> implement a user-level threads package on top of it.
>>
>>That is the method used by Alpha VMS.  The kernel thread primitives
>>in fact are not documented for public consumption.  The documented
>>interface is the DECthreads library (which has a couple different
>>APIs matching varying styles and standards).  DECthreads creates the
>>user-mode lightweight threads which then get scheduled onto some
>>number of kernel threads (typically numbering on the order of the
>>number of CPUs).
> 
> What happens with blocking I/O?  Does DECthreads implement blocking
> I/O operations using asynchronous kernel I/O operations, so that it 
> will avoid blocking a whole kernel thread when all that really needs
> to be blocked is a user thread?

Well DECthreads does not actually implement the I/O operations itself,
or else they might be tying the DECthreads I/O support to a particular
language, and they might make the wrong choice :-)

DECthreads has its hooks into the System Service Dispatcher of the
base OS so that DECthreads will get an "upcall" invocation when IO
is about to block.  DECthreads then has the opportunity to switch
which lightweight thread is active on a given kernel thread before
the kernel thread "really" stalls.  If DECthreads is smart enough to
choose a lightweight thread which is not blocked :-), there is no stall.
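
A toy simulation may help picture the upcall idea. This is only a
sketch in Python, not the DECthreads interface: generators play the
lightweight threads, a step named "io" stands for a system service that
is about to block, and the scheduler treats that yield as the upcall.
The names lightweight_thread, run_on_one_kernel_thread and io_latency
are all hypothetical.

```python
from collections import deque


def lightweight_thread(name, steps):
    """A toy user-level thread that hands control back after every step."""
    for step in steps:
        yield (name, step)


def run_on_one_kernel_thread(threads, io_latency=2):
    """Simulate one kernel thread multiplexing several lightweight threads.

    When a thread issues "io", it is parked until io_latency ticks after
    the request, and another runnable thread is chosen instead. Returns a
    trace of (tick, name, step); a "stall" entry marks a tick on which no
    lightweight thread was runnable.
    """
    ready = deque(threads)
    parked = []  # (wake_tick, thread) pairs waiting for I/O completion
    trace = []
    tick = 0
    while ready or parked:
        # Threads whose I/O has completed become runnable again.
        still_parked = []
        for wake, t in parked:
            if wake <= tick:
                ready.append(t)
            else:
                still_parked.append((wake, t))
        parked = still_parked

        if not ready:
            trace.append((tick, None, "stall"))  # the kernel thread really waits
            tick += 1
            continue

        t = ready.popleft()
        try:
            name, step = next(t)
        except StopIteration:
            continue  # thread finished, no time consumed
        trace.append((tick, name, step))
        if step == "io":
            parked.append((tick + io_latency, t))  # "upcall": switch away
        else:
            ready.append(t)
        tick += 1
    return trace
```

Run with one I/O-bound thread and one compute thread, the trace has no
"stall" entries because the compute thread fills the I/O wait. Run with
the I/O-bound thread alone, the single kernel thread stalls until the
I/O completes.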

That "upcall" mechanism can even work when there is only a single
kernel thread.  In the future it might be enabled also on VAX,
where kernel threads are not available (moral equivalent of
having a single kernel thread), although that does not affect
DEC Ada directly, since on VAX DEC Ada does not use DECthreads.
On VAX, DEC Ada Pragma TIME_SLICE does a less timely job of
preventing stalled threads from blocking the whole process
by using a timer AST (timeliness under programmer control
for a price (overhead)).

Both the AST mechanism and the upcall mechanism only work when the
wait is in user mode, since that is the only case the state of the
process is well-known.  Most System Service waits are in user mode
these days, although those who used VMS V1 may remember programs
from which one could not escape via CTRL/Y or CTRL/C, often due
to System Services which waited in inner modes.

> If not, then choosing the right number of kernel threads may be a bit
> tricky -- you want one for each CPU, to get multiprocessing, but
> you also need one for every user-level thread that might get blocked
> on I/O, so that your background compute threads don't get paused
> when your foreground threads are doing blocking I/O.

But such an implementation would still get a marketing "checkoff"
for supporting "kernel threads" :-)

Larry Kilgallen


Thread overview: 20+ messages
1997-06-06  0:00 Tasking performance between Ada83 and Ada95 Mike Rose
1997-06-07  0:00 ` jim hopper
1997-06-07  0:00 ` Robert A Duff
1997-06-08  0:00   ` Robert Dewar
1997-06-10  0:00     ` Jon S Anthony
1997-06-10  0:00     ` PascMartin
1997-06-10  0:00       ` Robert Dewar
1997-06-07  0:00 ` Robert Dewar
1997-06-08  0:00   ` Edmond Walsh
1997-06-09  0:00     ` Robert Dewar
1997-06-15  0:00       ` Edmond Walsh
1997-06-15  0:00         ` Robert Dewar
1997-06-15  0:00           ` Tom Moran
1997-06-16  0:00           ` Robert A Duff
1997-06-17  0:00             ` Robert Dewar
1997-06-22  0:00           ` Geert Bosch
1997-06-23  0:00             ` Robert Dewar
1997-06-23  0:00             ` Larry Kilgallen
1997-06-25  0:00               ` Fergus Henderson
1997-06-25  0:00                 ` Larry Kilgallen
