comp.lang.ada
* Re: Real tasking problems with Ada.
  2017-07-25 23:19  5% Real tasking problems with Ada Robert Eachus
  2017-07-26 19:42  7% ` sbelmont700
@ 2017-08-01  4:41  0% ` Randy Brukardt
  1 sibling, 0 replies; 5+ results
From: Randy Brukardt @ 2017-08-01  4:41 UTC (permalink / raw)


"Robert Eachus" <rieachus@comcast.net> wrote in message 
news:9e51f87c-3b54-4d09-b9ca-e3c6a6e8940a@googlegroups.com...
> First, it should be possible to assign tasks in arrays to CPUs.

Use a discriminant of type CPU and (in Ada 2020) an 
iterated_component_association. (This was suggested way back in Ada 9x, left 
out in the infamous "scope reduction", and then forgotten about until 2013. 
See AI12-0061-1 or the draft RM 
http://www.ada-auth.org/standards/2xaarm/html/AA-4-3-3.html).
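
For illustration, a minimal (untested) sketch of that combination; the names
Pin_Workers, Worker, Make, and Pool are made up, and an Ada 2022 compiler with
build-in-place function returns is assumed:

   with System.Multiprocessors; use System.Multiprocessors;

   procedure Pin_Workers is

      --  The CPU aspect names the discriminant, so each object of the
      --  type starts life on the CPU given by its discriminant (ARM D.16).
      --  The default only makes the subtype definite for the array below;
      --  every element is initialized explicitly.
      task type Worker (Home : CPU := CPU'First) with CPU => Home;

      task body Worker is
      begin
         null;  --  the per-CPU chunk of work would go here
      end Worker;

      --  Build-in-place helper: returns a Worker pinned to Home.
      function Make (Home : CPU) return Worker is
      begin
         return W : Worker (Home);
      end Make;

      --  Ada 2022 iterated_component_association (AI12-0061-1):
      --  component I is created directly on CPU I, with no entry calls.
      Pool : array (CPU range 1 .. Number_Of_CPUs) of Worker :=
        (for I in 1 .. Number_Of_CPUs => Make (I));

   begin
      null;  --  the Pool tasks activate here; the procedure waits for them
   end Pin_Workers;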

...
>1) Add a function Current_CPU or whatever (to System.Multiprocessors) that
> returns the identity of the CPU this task is running on.  Obviously in a 
> rendezvous
> with a protected object, the function would return the ID of the caller.

Why do you say this? Ada doesn't require the task that calls a protected 
object to execute it (the execution can be handled by the task that 
services the barriers - I don't know if any implementation actually does 
this, but the language rules are written to allow it).

>  Probably do the same thing in a rendezvous between two tasks for 
> consistency.
>  Note that Get_ID function in System.Multiprocessors.Dispatching_Domains
> does that but it requires adding three (unnecessary) packages (DD,
> Ada.Real_Time, and Ada.Task_Identification) to your context without really
> using anything there.

Say what? You're using Get_Id, so clearly you're using something there. 
Get_Id (like the rest of dispatching domains) is likely to be expensive, so 
you don't want it dragged into all programs. (And CPU is effectively part of 
all programs.)

> 2) Allow a task to change its CPU assignment after it has started execution.  It
> is no big deal if a task starts on a different CPU than the one it will 
> spend
> the rest of its life on.  At a minimum Set_CPU(Current_CPU) or just
> Set_CPU should cause the task to be anchored to its current CPU core.
>  Note that again you can do this with Dispatching_Domains.

So the capability already exists, but you don't like having to with an extra 
package to use it? Have you lost your freaking mind? You want us to add 
operations that ALREADY EXIST to another package, with all of the 
compatibility problems that doing so would cause (especially for people that 
had withed and used Dispatching_Domains)? When there are lots of problems 
that can't be solved portably at all?

>Next, a huge problem.  I just had some code churn out garbage while I was 
>finding the
> "right" settings to get each chunk of work to have its own portion of an 
> array.  Don't tell
> me how to do this safely, if you do you are missing the point.

No, you're missing the point. Ada is about writing portable code. Nothing at 
the level of "cache lines" is EVER going to be portable in any way. Either 
one writes "safe" code and hopes that the compiler and runtime can take the 
characteristics of the target into account (perhaps parallel loop constructs 
will help with that).

Or one writes bleeding-edge code that is not portable and not 
safe. And you're on your own in such a case; no programming language could 
possibly help you.

>A function Cache_Line_Size in System or System.Multiprocessors seems right.

No, it doesn't. It assumes a particular memory organization, and one thing 
that's pretty clear is that whatever memory organization is common now will 
not be common in a bunch of years. Besides, so many systems have multiple 
layers of caches that a single result won't be enough. And there is no way 
for a general implementation to find this out (neither CPUs nor kernels 
describe such information).

>Is adding these features to Ada worth the effort?

No way. They're much too low-level, and they aren't actually enough to allow 
parallelization. You want a language which allows fine-grained parallelism 
from the start (like ParaSail); trying to retrofit that onto Ada (which is 
mainly sequential, having only coarse parallelism) will just make a mess. 
You might get a few problems solved (those using actual arrays, as opposed 
to containers or user-defined types -- which one hopes are far more common 
in today's programs), but there is nothing general, nor anything that fits 
into Ada's building-block approach, at the level you're discussing.

                             Randy.




* Re: Real tasking problems with Ada.
  2017-07-26 19:42  7% ` sbelmont700
@ 2017-07-27  2:00  0%   ` Robert Eachus
  0 siblings, 0 replies; 5+ results
From: Robert Eachus @ 2017-07-27  2:00 UTC (permalink / raw)


On Wednesday, July 26, 2017 at 3:42:57 PM UTC-4, sbelm...@gmail.com wrote:
> On Tuesday, July 25, 2017 at 7:19:59 PM UTC-4, Robert Eachus wrote:
> >    1) Add a function Current_CPU or whatever (to System.Multiprocessors) that returns the identity of the CPU this task is running on.  
> > 
> >    2) Allow a task to change its CPU assignment after it has started execution. 
> > 
> 
> Are these not exactly what System.Multiprocessors.Dispatching_Domains.Get_CPU and Set_CPU do?

Short answer: not exactly.  Yes, if I had posted the code I'm working on--probably sometime next week--you would have seen me using just that.  But the operations from Dispatching_Domains are pretty heavyweight--even if you ignore bringing in the extra packages.  What I would like are very lightweight operations.  Bringing in Ada.Real_Time and Ada.Task_Identification for default parameters which are never used would be bad enough; the real problem is that the program ends up checking the values passed.  So a call to Set_CPU(ID) is really a call to Set_CPU(ID, Ada.Task_Identification.Current_Task), which can't be done before a task has an ID assigned.
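
For reference, the relevant declarations (paraphrased here from ARM D.16.1,
with names qualified for readability) look roughly like this; the defaulted
Task_Id parameters are what drag in Ada.Task_Identification, and
Delay_Until_And_Set_CPU is what drags in Ada.Real_Time:

   --  Paraphrased excerpt of System.Multiprocessors.Dispatching_Domains
   --  (ARM D.16.1).

   function Get_CPU
     (T : Ada.Task_Identification.Task_Id :=
            Ada.Task_Identification.Current_Task)
      return System.Multiprocessors.CPU_Range;

   procedure Set_CPU
     (CPU : in System.Multiprocessors.CPU_Range;
      T   : in Ada.Task_Identification.Task_Id :=
              Ada.Task_Identification.Current_Task);

   procedure Delay_Until_And_Set_CPU
     (Delay_Until_Time : in Ada.Real_Time.Time;
      CPU              : in System.Multiprocessors.CPU_Range);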

If, on a particular implementation, they are exactly the same, then all I am asking for is some declarative sugar: four lines of source added to System.Multiprocessors.  But what I really want is the ability to start a task on the processor core it will stay on.  Ah, you say, I can set the aspect CPU.  Well, not really.

I can't determine how many CPU cores are available until run time.  That means that if I want to create tasks on a per-CPU-core basis, I can (and must) create them at run time.  But there is no way for me to set the CPU aspect when I have an array of (identical) tasks.  I can set the aspect for tasks named Tom, Dick, and Harry as in the examples, but if I declare Worker_Tasks : array (1 .. Number_Of_CPUs), I can't set the CPU other than through an entry, which serializes the tasks when there are lots of them.  Remember, some of those tasks, on some hardware, will need to run on a chip in a socket over there somewhere.
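
A minimal sketch (untested) of that entry-based workaround; Serialized_Pinning,
Worker, and Start are illustrative names:

   with System.Multiprocessors;                      use System.Multiprocessors;
   with System.Multiprocessors.Dispatching_Domains;

   procedure Serialized_Pinning is

      task type Worker is
         entry Start (On : CPU);   --  the rendezvous exists only to pass the CPU
      end Worker;

      task body Worker is
         My_CPU : CPU;
      begin
         accept Start (On : CPU) do
            My_CPU := On;
         end Start;
         Dispatching_Domains.Set_CPU (My_CPU);  --  now anchored to My_CPU
         --  the real work would follow here
      end Worker;

      Pool : array (CPU range 1 .. Number_Of_CPUs) of Worker;

   begin
      --  One rendezvous per task: the CPU assignments happen one at a time.
      for I in Pool'Range loop
         Pool (I).Start (I);
      end loop;
   end Serialized_Pinning;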

It's just making things harder for non-real-time programmers for no reason.  And when you see the results from the work I am doing, you will be either astonished or appalled.



* Re: Real tasking problems with Ada.
  2017-07-25 23:19  5% Real tasking problems with Ada Robert Eachus
@ 2017-07-26 19:42  7% ` sbelmont700
  2017-07-27  2:00  0%   ` Robert Eachus
  2017-08-01  4:41  0% ` Randy Brukardt
  1 sibling, 1 reply; 5+ results
From: sbelmont700 @ 2017-07-26 19:42 UTC (permalink / raw)


On Tuesday, July 25, 2017 at 7:19:59 PM UTC-4, Robert Eachus wrote:
>    1) Add a function Current_CPU or whatever (to System.Multiprocessors) that returns the identity of the CPU this task is running on.  
> 
> >    2) Allow a task to change its CPU assignment after it has started execution. 
> 

Are these not exactly what System.Multiprocessors.Dispatching_Domains.Get_CPU and Set_CPU do?



* Real tasking problems with Ada.
@ 2017-07-25 23:19  5% Robert Eachus
  2017-07-26 19:42  7% ` sbelmont700
  2017-08-01  4:41  0% ` Randy Brukardt
  0 siblings, 2 replies; 5+ results
From: Robert Eachus @ 2017-07-25 23:19 UTC (permalink / raw)


This may come across as a rant.  I've tried to keep it down, but a little bit of ranting is probably appropriate.  Some of these problems could have been fixed earlier, but there are some which are only coming of age with new CPU and GPU designs, notably the Zen family of CPUs from AMD.

Let me first explain where I am coming from (at least in this thread).  I want to write code that takes full advantage of the features available to run code as fast as possible.  In particular, I'd like to get the time to run some embarrassingly parallel routines down to less than twice the total CPU time of a single CPU.  (In other words, a problem that takes T seconds of wall-clock time on a single CPU should run in less than 2T/N seconds of wall-clock time on N processors.)  Oh, and it shouldn't generate garbage.  I'm working on a test program, and it did generate garbage once when I didn't guess some of the system parameters right.

So what needs to be fixed?  First, it should be possible to assign tasks in arrays to CPUs.  With a half-dozen CPU cores the current facilities are irksome.  Oh, and when doing the assignment it would be nice to ask for the facilities you need, rather than having code for each manufacturer's processor family.  Just to start with, AMD has some processors which share floating-point units, so on those machines you want to run code on alternate CPU cores--if the tasks make heavy use of floating point.

Intel makes some processors with Hyperthreading, and some without, even within the same processor family.  Hyperthreading does let you get some extra performance out if you know what you are doing, but much of the time you will want the Hyperthreading alternate to do background and OS processing while allocating your heavy hitting work threads to the main threads.

Now look at AMD's Zen family.  Almost all available today have two threads per core, like Intel's Hyperthreading, but these are much more like twins.  With random loads, each thread will do about the same amount of work.  However, if you know what you are doing, you can write code which usefully hogs all of a core's resources.  Back to running on alternate cores...

I know that task array types were considered in Ada 9X.  I don't know what happened to them.  But even without them, two huge improvements would be:

   1) Add a function Current_CPU or whatever (to System.Multiprocessors) that returns the identity of the CPU this task is running on.  Obviously in a rendezvous with a protected object, the function would return the ID of the caller.  Probably do the same thing in a rendezvous between two tasks for consistency.  Note that the Get_ID function in System.Multiprocessors.Dispatching_Domains does that, but it requires adding three (unnecessary) packages (DD, Ada.Real_Time, and Ada.Task_Identification) to your context without really using anything there. 

   2) Allow a task to change its CPU assignment after it has started execution.  It is no big deal if a task starts on a different CPU than the one it will spend the rest of its life on.  At a minimum, Set_CPU(Current_CPU) or just Set_CPU should cause the task to be anchored to its current CPU core.  Note that again you can do this with Dispatching_Domains.  (A sketch of what 1) and 2) might look like follows the stretch goal below.)

   Stretch goal:  Make it possible to assign tasks to a specific pair of threads. In theory Dispatching_Domains does this, but the environment task messes things up a bit.  You need to leave the partner of the environment task's CPU core in the default dispatching domain.  The problem is that there is no guarantee that the environment task is running on CPU 1 (or CPU 0, the way the hardware numbers them).
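
A hypothetical sketch of what 1) and 2) might amount to, written as a child
package only because the language-defined System.Multiprocessors itself cannot
be edited; none of these declarations exist in the standard:

   --  Hypothetical; nothing below is in the Ada standard.
   package System.Multiprocessors.Lightweight is

      function Current_CPU return CPU;
      --  The CPU the calling task is executing on right now.

      procedure Set_CPU (C : CPU := Current_CPU);
      --  Anchor the calling task to C; the default anchors it to the
      --  CPU it happens to be running on.  No Task_Id parameter, so no
      --  dependence on Ada.Task_Identification.

   end System.Multiprocessors.Lightweight;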

Next, a huge problem.  I just had some code churn out garbage while I was finding the "right" settings to get each chunk of work to have its own portion of an array.  Don't tell me how to do this safely; if you do, you are missing the point.  If each cache line is only written to by one task, that should be safe.  But to do that I need to determine the size of the cache lines, and how to force the compiler to allocate the data in the array beginning on a cache-line boundary.  The second part is not hard, except that the compiler may not support alignment clauses that large.  The first?  A function Cache_Line_Size in System or System.Multiprocessors seems right.  Whether it is in bits or storage units is no big deal.  Why a function and not a constant?  The future looks like a mix of CPUs and GPUs all running parts of the same program.
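
A sketch of the layout trick being described, assuming a 64-byte cache line;
the hard-wired constant is exactly the guess that a Cache_Line_Size function
would replace, and all the names are made up:

   with System.Multiprocessors; use System.Multiprocessors;

   procedure Padded_Results is

      Assumed_Cache_Line : constant := 64;  --  bytes; a guess, not queried

      type Chunk is record
         Partial_Sum : Long_Float := 0.0;
      end record
        with Alignment => Assumed_Cache_Line;
      --  Aligning the component type to a whole line forces each array
      --  component below onto its own cache-line boundary, so no two
      --  workers ever write the same line (no false sharing).  As noted
      --  above, a compiler may reject an alignment this large.

      Results : array (CPU range 1 .. Number_Of_CPUs) of Chunk;

   begin
      null;  --  worker I would update only Results (I)
   end Padded_Results;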

Finally, caches and NUMA galore.  I mentioned AMD's Zen above.  Right now there are three Zen families with very different system architectures.  In fact, the difference between single- and dual-socket Epyc makes it four, and the Ryzen 3 and Ryzen APUs, when released?  At least one more.  What's the big deal?  Take Threadripper to begin with.  Your choice of 12 or 16 cores, each supporting two threads.  But the cache hierarchy is complex.  Each CPU core has two threads and its own L1 and L2 caches.  Then 3 or 4 cores, depending on the model, share the same 8 MB L3 cache.  The four blocks of CPU cores and caches are actually split between two different chips.  That doesn't affect the cache timings much, but half the memory is attached to one chip and half to the other.  The memory loads and stores, if to the other chip, compete with L3 and cache-probe traffic.  Let's condense that to this: 2 (threads) / (3 or 4) cores / 2 (NUMA pairs) / 2 (chips) / 1 (socket).  A Ryzen 7 chip is 2/4/2/1/1, Ryzen 5 is 2/(3 or 2)/2/1/1, and Ryzen 3 is 1/2/2/1/1.  Epyc comes in at 2/(3 or 4)/2/4/2, among other flavors.

Writing a package that recognizes these models and sorts things out for executable programs is going to be a non-trivial exercise--at least if I try to keep it current.  But how should that information be conveyed to the software that is going to try to saturate the system?  There is no point in creating tasks which won't have a CPU core of their own (or half a core, or however you count Hyperthreading).  Given the size of some of these systems, even without the HPC environment, it may be better for a program to split the work between chips or boxes.

Is adding these features to Ada worth the effort?  Sure.  Let me give you a very realistic example.  Running on processor cores which share an L3 cache may be worthwhile.  Actually, with Zen the difference is that a program that stays on one L3 cache will save a lot of time on L2 probes.  (The line you need is in L2 on another CPU core.  Moving it to your core will take less time and, more importantly, have lower latency than moving it from another CPU cluster.)  So we go to write our code.  On Ryzen 7 we want to run on cores 1, 3, 5, and 7, or 9, 11, 13, and 15, or 2, 4, 6, 8, or...  Actually I could choose 1, 4, 6, and 7--any set of one core from each pair, staying within the module (of eight threads).  Move to a low-end Ryzen 3, and I get almost the same performance by choosing all the available cores: 1, 2, 3, and 4.  What about Ryzen 5 1600 and 1600X?  Is it going to be better to run on 3 cores and one L3 cache, or 4 cores spread across two caches?  Or maybe choose all six cores on one L3 cache?  Argh!

Is this problem real?  I just took a program from 7.028 seconds on six cores to 2.229 seconds on (the correct) three cores.  I'll post the program, or put it online somewhere, once I've confined the memory corruption to very small examples--so you can see which machines do it--and done a bit more cleanup and optimization.


* Re: assigning priority to task
  @ 2014-09-24  5:32  6%     ` Jeffrey Carter
  0 siblings, 0 replies; 5+ results
From: Jeffrey Carter @ 2014-09-24  5:32 UTC (permalink / raw)


On 09/23/2014 08:39 PM, Stribor40 wrote:
> Also, my box has 2 cores. Is there a way in Ada to see which core is running which task?

On many modern processors, each core can execute 2 tasks at the same time
(effectively). So it may be possible for all 4 tasks in your example to run at
the same time.

An Ada 2012 compiler that implements Annex D has the package
System.Multiprocessors.Dispatching_Domains (ARM D.16.1), which lets you determine
the CPU that a task is running on, as well as the aspect CPU to force a task to run
on a specific CPU. Compilers for earlier versions of Ada may have
vendor-specific ways to do similar things.
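
For example (the task name and CPU number are arbitrary; an Annex D
implementation is assumed):

   with Ada.Text_IO;
   with System.Multiprocessors.Dispatching_Domains;

   procedure Where_Am_I is

      --  Aspect CPU: this task is required to run on CPU 2.
      task Pinned with CPU => 2;

      task body Pinned is
         use System.Multiprocessors;
         use System.Multiprocessors.Dispatching_Domains;
      begin
         --  Get_CPU reports the CPU this task is assigned to.
         Ada.Text_IO.Put_Line ("Running on CPU" & CPU_Range'Image (Get_CPU));
      end Pinned;

   begin
      null;  --  wait here for Pinned to finish
   end Where_Am_I;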

-- 
Jeff Carter
"All citizens will be required to change their underwear
every half hour. Underwear will be worn on the outside,
so we can check."
Bananas
29


