comp.lang.ada
* Real tasking problems with Ada.
@ 2017-07-25 23:19 Robert Eachus
  2017-07-26 19:42 ` sbelmont700
  2017-08-01  4:41 ` Randy Brukardt
  0 siblings, 2 replies; 23+ messages in thread
From: Robert Eachus @ 2017-07-25 23:19 UTC (permalink / raw)


This may come across as a rant.  I've tried to keep it down, but a little bit of ranting is probably appropriate.  Some of these problems could have been fixed earlier, but there are some which are only coming of age with new CPU and GPU designs, notably the Zen family of CPUs from AMD.

Let me first explain where I am coming from (at least in this thread).  I want to write code that takes full advantage of the features available to run code as fast as possible.  In particular, I'd like the time to run some embarrassingly parallel routines to be less than twice the total CPU time of the single-CPU case.  (In other words, a problem that takes T seconds (wall clock) on a single CPU should run in less than 2T/N wall-clock time on N processors.)  Oh, and it shouldn't generate garbage.  I'm working on a test program, and it did generate garbage once when I didn't guess some of the system parameters right.

So what needs to be fixed?  First, it should be possible to assign tasks in arrays to CPUs.  With a half dozen CPU cores the current facilities are irksome.  Oh, and when doing the assignment it would be nice to ask for the facilities you need, rather than writing separate code for each manufacturer's processor family.  Just to start with, AMD has some processors which share floating-point units, so you want to run code on alternate CPU cores on those machines--if the tasks make heavy use of floating point.

Intel makes some processors with Hyperthreading, and some without, even within the same processor family.  Hyperthreading does let you get some extra performance out if you know what you are doing, but much of the time you will want the Hyperthreading alternates to do background and OS processing while allocating your heavy-hitting work threads to the main threads.

Now look at AMD's Zen family.  Almost all available today have two threads per core, like Intel's Hyperthreading, but these threads are much more like twins.  With random loads, each thread will do about the same amount of work.  However, if you know what you are doing, you can write code which usefully hogs all of a core's resources when it can get them.  Back to running on alternate cores...

I know that task array types were considered in Ada 9X.  I don't know what happened to them.  But even without them, two huge improvements would be:

   1) Add a function Current_CPU or whatever (to System.Multiprocessors) that returns the identity of the CPU this task is running on.  Obviously, in a rendezvous with a protected object the function would return the ID of the caller.  Probably do the same thing in a rendezvous between two tasks, for consistency.  Note that the Get_ID function in System.Multiprocessors.Dispatching_Domains does this, but it requires adding three (unnecessary) packages (Dispatching_Domains, Ada.Real_Time, and Ada.Task_Identification) to your context without really using anything there.

   2) Allow a task to  its CPU assignment after it has started execution.  It is no big deal if a task starts on a different CPU than the one it will spend the rest of its life on.  At a minimum Set_CPU(Current_CPU) or just Set_CPU should cause the task to be anchored to its current CPU core.  Note that again you can do this with Dispatching_Domains.

   Stretch goal:  Make it possible to assign tasks to a specific pair of threads. In theory Dispatching_Domains does this, but the environment task messes things up a bit.  You need to leave the partner of the environment task's CPU core in the default dispatching domain.  The problem is that there is no guarantee that the environment task is running on CPU 1 (or CPU 0, the way the hardware numbers them).
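For contrast, pinning a task to whatever core it happens to be running on via the heavyweight route looks roughly like this; Get_CPU and Set_CPU are the real Dispatching_Domains subprograms (RM D.16.1), while the procedure itself is only an illustrative sketch:

```ada
with System.Multiprocessors.Dispatching_Domains;

procedure Pin_Self is
   package DD renames System.Multiprocessors.Dispatching_Domains;
   --  Get_CPU's Task_Id parameter defaults to Current_Task, which is
   --  why Ada.Task_Identification comes along for the ride.
   Mine : constant System.Multiprocessors.CPU_Range := DD.Get_CPU;
begin
   DD.Set_CPU (Mine);  --  anchor this task to its current core
end Pin_Self;
```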

Next, a huge problem.  I just had some code churn out garbage while I was finding the "right" settings to give each chunk of work its own portion of an array.  Don't tell me how to do this safely; if you do, you are missing the point.  If each cache line is only written to by one task, that should be safe.  But to do that I need to determine the size of the cache lines, and how to force the compiler to allocate the data in the array beginning on a cache-line boundary.  The second part is not hard, except that the compiler may not support alignment clauses that large.  The first?  A function Cache_Line_Size in System or System.Multiprocessors seems right.  Whether it is in bits or Storage_Units is no big deal.  Why a function and not a constant?  The future looks like a mix of CPUs and GPUs all running parts of the same program.
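A minimal sketch of the one-cache-line-per-task layout, assuming 64-byte lines and a compiler that accepts an Alignment that large (which, as noted above, is not guaranteed); Worker_Count and the type names are illustrative:

```ada
Cache_Line : constant := 64;  --  assumed; exactly the number a
                              --  Cache_Line_Size function would supply

type Padded_Sum is record
   Value : Long_Float := 0.0;
   Pad   : String (1 .. Cache_Line - 8);  --  fill out the line
end record
  with Alignment => Cache_Line;
--  Each element occupies its own cache line, so two tasks never
--  write to the same line (no false sharing between workers).

Partials : array (1 .. Worker_Count) of Padded_Sum
  with Alignment => Cache_Line;
```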

Finally, caches and NUMA galore.  I mentioned AMD's Zen above.  Right now there are three Zen families with very different system architectures.  In fact, the difference between single- and dual-socket Epyc makes it four, and the Ryzen 3 and Ryzen APUs when released?  At least one more.  What's the big deal?  Take Threadripper to begin with.  Your choice of 12 or 16 cores, each supporting two threads.  But the cache hierarchy is complex.  Each CPU core has two threads and its own L1 and L2 caches.  Then 3 or 4 cores, depending on the model, share the same 8 MB L3 cache.  The four blocks of CPU cores and caches are actually split between two different chips.  That doesn't affect the cache timings much, but half the memory is attached to one chip and half to the other.  The memory loads and stores, if to the other chip, compete with L3 and cache-probe traffic.  Let's condense that to this notation: 2 (threads)/(3 or 4) (cores)/2 (NUMA pairs)/2 (chips)/1 (socket).  A Ryzen 7 chip is 2/4/2/1/1, Ryzen 5 is 2/(3 or 2)/2/1/1, and Ryzen 3 is 1/2/2/1/1.  Epyc comes in at 2/(3 or 4)/2/4/2, among other flavors.

Writing a package to recognize these models and sort out the topology for executable programs is going to be a non-trivial exercise--at least if I try to keep it current.  But how to convey the information to the software which is going to try to saturate the system?  There is no point in creating tasks which won't have a CPU core of their own (or half a core, or however you count Hyperthreading).  Given the size of some of these systems, even outside the HPC environment, it may be better for a program to split the work between chips or boxes.

Is adding these features to Ada worth the effort?  Sure.  Let me give you a very realistic example.  Running on processor cores which share an L3 cache may be worthwhile.  Actually, with Zen, the difference is that a program that stays on one L3 cache will save a lot of time on L2 probes.  (The line you need is in L2 on another CPU core.  Moving it to your core takes less time, and more importantly less latency, than moving it from another CPU cluster.)  So we go to write our code.  On Ryzen 7 we want to run on cores 1, 3, 5, and 7, or 9, 11, 13, and 15, or 2, 4, 6, 8, or...  Actually I could choose 1, 4, 6, and 7: any set of one core from each pair, staying within the module (of eight threads).  Move to a low-end Ryzen 3, and I get almost the same performance by choosing all the available cores: 1, 2, 3, and 4.  What about Ryzen 5 1600 and 1600X?  Is it going to be better to run on 3 cores and one L3 cache, or 4 cores spread across two caches?  Or maybe choose all six cores on one L3 cache?  Argh!

Is this problem real?  I just took a program from 7.028 seconds on six cores to 2.229 seconds on (the correct) three cores.  I'll post the program, or put it on-line somewhere, once I've confined the memory corruption to very small examples--so you can see which machines do it--and done a bit more cleanup and optimization.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Real tasking problems with Ada.
  2017-07-25 23:19 Real tasking problems with Ada Robert Eachus
@ 2017-07-26 19:42 ` sbelmont700
  2017-07-27  2:00   ` Robert Eachus
  2017-08-01  4:41 ` Randy Brukardt
  1 sibling, 1 reply; 23+ messages in thread
From: sbelmont700 @ 2017-07-26 19:42 UTC (permalink / raw)


On Tuesday, July 25, 2017 at 7:19:59 PM UTC-4, Robert Eachus wrote:
>    1) Add a function Current_CPU or whatever (to System.Multiprocessors) that returns the identity of the CPU this task is running on.  
> 
>    2) Allow a task to  its CPU assignment after it has started execution. 
> 

Are these not exactly what System.Multiprocessors.Dispatching_Domains.Get_CPU and Set_CPU do?



* Re: Real tasking problems with Ada.
  2017-07-26 19:42 ` sbelmont700
@ 2017-07-27  2:00   ` Robert Eachus
  2017-08-01  4:45     ` Randy Brukardt
  0 siblings, 1 reply; 23+ messages in thread
From: Robert Eachus @ 2017-07-27  2:00 UTC (permalink / raw)


On Wednesday, July 26, 2017 at 3:42:57 PM UTC-4, sbelm...@gmail.com wrote:
> On Tuesday, July 25, 2017 at 7:19:59 PM UTC-4, Robert Eachus wrote:
> >    1) Add a function Current_CPU or whatever (to System.Multiprocessors) that returns the identity of the CPU this task is running on.  
> > 
> >    2) Allow a task to  its CPU assignment after it has started execution. 
> > 
> 
> Are these not exactly what System.Multiprocessors.Dispatching_Domains.Get_CPU and Set_CPU do?

Short answer: not exactly.  Yes, if I had posted the code I'm working on--probably sometime next week--you would have seen me using just that.  But the operations from Dispatching_Domains are pretty heavyweight--even if you ignore bringing in the extra packages.  What I would like are very lightweight operations.  Bringing in Ada.Real_Time and Ada.Task_Identification for default parameters which are never used would be bad enough; the real problem is that the program ends up checking the values passed.  So a call to Set_CPU (ID) is really a call to Set_CPU (ID, Ada.Task_Identification.Current_Task), which can't be done before a task has an ID assigned.

If on a particular implementation they are exactly the same, then all I am asking for is some declarative sugar which requires four lines of source added to System.Multiprocessors.  But what I really want is the ability to start a task on the processor core it will stay on.  Ah, you say, I can set the aspect CPU.  Well, not really.

I can't determine how many CPU cores are available until run-time.  That means that if I want to create tasks on a per-CPU-core basis, I can (and must) create them at run-time.  But there is no way for me to set the CPU aspect when I have an array of (identical) tasks.  I can set the aspect for tasks named Tom, Dick, and Harry as in the examples, but if I declare Worker_Tasks : array (1 .. Number_Of_CPUs) of Worker, I can't set the CPU other than through an entry, which serializes the tasks when there are lots of them.  Remember, some of those tasks, on some hardware, will need to run on a chip in a socket over there somewhere.

It's just making things harder for non-real-time programmers for no reason.  And when you see the results from the work I am doing, you will be either astonished or appalled.



* Re: Real tasking problems with Ada.
  2017-07-25 23:19 Real tasking problems with Ada Robert Eachus
  2017-07-26 19:42 ` sbelmont700
@ 2017-08-01  4:41 ` Randy Brukardt
  2017-08-02  3:44   ` Robert Eachus
  1 sibling, 1 reply; 23+ messages in thread
From: Randy Brukardt @ 2017-08-01  4:41 UTC (permalink / raw)


"Robert Eachus" <rieachus@comcast.net> wrote in message 
news:9e51f87c-3b54-4d09-b9ca-e3c6a6e8940a@googlegroups.com...
> First, it should be possible to assign tasks in arrays to CPUs.

Use a discriminant of type CPU and (in Ada 2020) an 
iterated_component_association. (This was suggested way back in Ada 9x, left 
out in the infamous "scope reduction", and then forgotten about until 2013. 
See AI12-0061-1 or the draft RM 
http://www.ada-auth.org/standards/2xaarm/html/AA-4-3-3.html).

...
>1) Add a function Current_CPU or whatever (to System.Multiprocessors) that
> returns the identity of the CPU this task is running on.  Obviously in a 
> rendezvous
> with a protected object, the function would return the ID of the caller.

Why do you say this? Ada doesn't require the task that calls a protected 
object to execute it (the execution can be handled by the task that 
services the barriers - I don't know if any implementation actually does 
this, but the language rules are written to allow it).

>  Probably do the same thing in a rendezvous between two tasks for 
> consistency.
>  Note that Get_ID function in System.Multiprocessors.Dispatching_Domains
> does that but it requires adding three (unnecessary) packages (DD,
> Ada.Real_Time, and Ada.Task_Identification) to your context without really
> using anything there.

Say what? You're using Get_Id, so clearly you're using something there. 
Get_Id (like the rest of Dispatching_Domains) is likely to be expensive, so 
you don't want it dragged into all programs. (And CPU is effectively part of 
all programs.)

> 2) Allow a task to  its CPU assignment after it has started execution.  It
> is no big deal if a task starts on a different CPU than the one it will 
> spend
> the rest of its life on.  At a minimum Set_CPU(Current_CPU) or just
> Set_CPU should cause the task to be anchored to its current CPU core.
>  Note that again you can do this with Dispatching_Domains.

So the capability already exists, but you don't like having to with an extra 
package to use it? Have you lost your freaking mind? You want us to add 
operations that ALREADY EXIST to another package, with all of the 
compatibility problems that doing so would cause (especially for people that 
had withed and used Dispatching_Domains)? When there are lots of problems 
that can't be solved portably at all?

>Next, a huge problem.  I just had some code churn out garbage while I was 
>finding the
> "right" settings to get each chunk of work to have its own portion of an 
> array.  Don't tell
> me how to do this safely, if you do you are missing the point.

No, you're missing the point. Ada is about writing portable code. Nothing at 
the level of "cache lines" is EVER going to be portable in any way. Either 
one writes "safe" code and hopes that the compiler and runtime can take into 
account the characteristics of the target. (Perhaps parallel loop constructs 
will help with that.)

Or otherwise, one writes bleeding edge code that is not portable and not 
safe. And you're on your own in such a case; no programming language could 
possibly help you.

>A function Cache_Line_Size in System or System.Multiprocessors seems right.

No,  it doesn't. It assumes a particular memory organization, and one thing 
that's pretty clear is that whatever memory organization is common now will 
not be common in a bunch of years. Besides, so many systems have multiple 
layers of caches, that a single result won't be enough. And there is no way 
for a general implementation to find this out (neither CPUs nor kernels 
describe such information).

>Is adding these features to Ada worth the effort?

No way. They're much too low level, and they actually aren't enough to allow 
parallelization. You want a language which allows fine-grained parallelism 
from the start (like Parasail); trying to retrofit that on Ada (which is 
mainly sequential, only having coarse parallelism) just will make a mess. 
You might get a few problems solved (those using actual arrays, as opposed 
to containers or user-defined types -- which one hopes are far more common 
in today's programs), but there is nothing general, nor anything that fits 
into Ada's building block approach, at the level that you're discussing.

                             Randy.




* Re: Real tasking problems with Ada.
  2017-07-27  2:00   ` Robert Eachus
@ 2017-08-01  4:45     ` Randy Brukardt
  2017-08-02  2:23       ` Robert Eachus
  0 siblings, 1 reply; 23+ messages in thread
From: Randy Brukardt @ 2017-08-01  4:45 UTC (permalink / raw)


"Robert Eachus" <rieachus@comcast.net> wrote in message 
news:ad30cdd8-c444-481f-9353-c16d91542e06@googlegroups.com...
...
>If on a particular implementation, they are exactly the same, then all I
> am asking for is some declarative sugar which requires four lines of
> source added to Ada.Multitasking.  But what I really want is the ability
> to start a task on the processor core it will stay on.  Ah, you say, I can
> set the aspect CPU.  Well, not really.

Yes, really. Use a discriminant of type CPU, and use that in the aspect. 
That's an age-old technique, and indeed is the major reason that tasks have 
discriminants. You then can allocate the tasks (which would be my 
suggestion), or you could create the entire set in an aggregate (assuming 
you have Ada 2020).
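The technique can be sketched as follows; Pool and Worker are illustrative names, and the null body stands in for real work:

```ada
with System.Multiprocessors; use System.Multiprocessors;

package Pool is
   --  The CPU aspect is set from the discriminant, so each task is
   --  bound to its core at creation time; no entry call needed.
   task type Worker (Core : CPU) with CPU => Core;

   type Worker_Access is access Worker;
   Workers : array (CPU range 1 .. Number_Of_CPUs) of Worker_Access;
end Pool;

package body Pool is
   task body Worker is
   begin
      null;  --  real work, already pinned to Core, goes here
   end Worker;
begin
   for I in Workers'Range loop
      Workers (I) := new Worker (Core => I);  --  no serialization
   end loop;
end Pool;
```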

                                   Randy.






* Re: Real tasking problems with Ada.
  2017-08-01  4:45     ` Randy Brukardt
@ 2017-08-02  2:23       ` Robert Eachus
  2017-08-03  3:43         ` Randy Brukardt
  0 siblings, 1 reply; 23+ messages in thread
From: Robert Eachus @ 2017-08-02  2:23 UTC (permalink / raw)


On Tuesday, August 1, 2017 at 12:45:43 AM UTC-4, Randy Brukardt wrote:

> Yes, really. Use a discriminant of type CPU, and use that in the aspect. 
> That's an age-old technique, and indeed is the major reason that tasks have 
> discriminants. You then can allocate the tasks (which would be my 
> suggestion), or you could create the entire set in an aggregate (assuming 
> you have Ada 2020).

Sorry, you missed what all the shouting was about. ;-)  On the processor I am using (an AMD FX-6300 Vishera), running on all CPU cores causes contention for the floating-point units.  So for efficiency I have to run on one core from each pair of CPU cores.  Currently my program uses 2, 4, and 6.  Creating an array indexed by CPU doesn't work.  If we had Algol-style indexing--but I am certainly not going to advocate that.  This is not a problem unique to one family of CPUs.  I'm upgrading to an AMD Ryzen 7, which will have 8 cores and 16 threads.  It is going, IMNSHO, to require the same thing.  Same for Intel processors with Hyperthreading enabled.

As for cache line sizes affecting code: yes, the garbage case was a bug in my code--or in GNAT, or in expectations.  (GNAT 2017 has Standard'Maximum_Alignment equal to 16--at least the version I am using does--so I was trying to trick it into 64-byte (cache line) alignment by using computed Address clauses.  On AMD processors cache lines are 64 bytes, but usually two lines (128 bytes) are read if no other thread is waiting for a cache line.  Intel does it the other way around: 256-byte cache lines, and the CPU will only fetch 128 if there are other requests queued.)
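The computed-Address trick mentioned above can be sketched like this: over-allocate, then round the start address up to the next line boundary.  The 64-byte line size and all names are assumptions, and this is exactly the kind of unchecked overlay that can go wrong:

```ada
with System;                  use System;
with System.Storage_Elements; use System.Storage_Elements;

procedure Aligned_Overlay is
   Line : constant := 64;  --  assumed cache line size, in storage units
   type Vector is array (1 .. 1_000) of Long_Float;

   --  Over-allocate by one line so a suitably aligned start exists.
   Raw : Storage_Array (1 .. Vector'Size / Storage_Unit + Line);

   Base : constant Integer_Address := To_Integer (Raw'Address);
   --  Round up to the next multiple of Line.
   Start : constant Address :=
     To_Address (((Base + Line - 1) / Line) * Line);

   Data : Vector with Address => Start, Import;
   --  Import suppresses default initialization of the overlay.
begin
   Data (1) := 0.0;
end Aligned_Overlay;
```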

Yes, I knew what I was doing was messy and dangerous--or at least required careful checking.  My point was that if Maximum_Alignment were large enough, I wouldn't be going through the pain.  Was it worth it?  That is what this is all about.  I have a program which spreads a matrix multiplication over multiple processors--and compares the result with the single-processor case.  Right now, unfortunately, every time I get the tasking version faster, the non-tasking version improves as well.  (I'm currently at about 700 million multiplications per second--1.4 GigaFLOPS, ignoring the integer indexing.)  Now if I could get up to 2 GigaFLOPS on the tasking version I'd be happy.  Of course, once I move to the Ryzen 7 I expect much better numbers, and better still for video cards.



* Re: Real tasking problems with Ada.
  2017-08-01  4:41 ` Randy Brukardt
@ 2017-08-02  3:44   ` Robert Eachus
  2017-08-02 14:39     ` Lucretia
                       ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Robert Eachus @ 2017-08-02  3:44 UTC (permalink / raw)


On Tuesday, August 1, 2017 at 12:42:01 AM UTC-4, Randy Brukardt wrote:
 
> Use a discriminant of type CPU and (in Ada 2020) an 
> iterated_component_association. (This was suggested way back in Ada 9x, left 
> out in the infamous "scope reduction", and then forgotten about until 2013. 
> See AI12-0061-1 or the draft RM 
> http://www.ada-auth.org/standards/2xaarm/html/AA-4-3-3.html).

It is nice that this is in there, just in time to be outmoded.  See my previous post: on most desktop and server processors, assigning to alternate processor IDs is necessary.

> Why do you say this? Ada doesn't require the task that calls a protected 
> object to execute it (the execution can be handdled by the task that 
> services the barriers - I don't know if any implementation actually does 
> this, but the language rules are written to allow it).

Um, I say it because any other result is useless.  The use case is for the called task or protected object to be able to get the CPU number and do something with it.  The simple case right now would be a protected object whose only purpose is to make sure task assignments are compatible with the architecture.  Again with Zen, Epyc has multiprocessors with 64 cores and 128 threads that will show as CPUs.  There are favored pairings of threads so that they share caches, or have shorter paths to parts of L3.  Intel says they will support up to eight sockets, with 28 CPU cores and 56 threads per socket.  I don't believe a system with 448 CPU threads will be realistic; even 224 will probably stress scheduling.  (Note that many of these "big" systems use VMware to create lots of virtual machines with four or eight threads.)

> Say what? You're using Get_Id, so clearly you're using something there. 
> Get_Id (like the rest of dispatching domains is likely to be expensive, so 
> you don't want it dragged into all programs. (And CPU is effectively part of 
> all programs.)

Sigh! Get_CPU as defined is heavyweight only because of its defaulted parameter:

   function Get_CPU
      (T   : Ada.Task_Identification.Task_Id :=
                 Ada.Task_Identification.Current_Task)
           return CPU_Range;

but a parameterless Get_CPU could compile to a single load instruction.  I'm not asking for the current function to be eliminated, there are situations where it is needed.  But it doesn't need all that baggage for the normal use.
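In other words, the request boils down to roughly these declarations (hypothetical; nothing like them exists in today's System.Multiprocessors):

```ada
--  Hypothetical lightweight additions to System.Multiprocessors:

function Current_CPU return CPU_Range
  with Inline;
--  The core the calling task is running on; with a static CPU
--  assignment this could compile to a single load.

procedure Set_CPU (CPU : CPU_Range)
  with Inline;
--  Anchor the calling task to the given core; no Task_Id
--  parameter, hence no Ada.Task_Identification in the closure.
```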

> 
> > 2) Allow a task to  its CPU assignment after it has started execution.  It
> > is no big deal if a task starts on a different CPU than the one it will 
> > spend
> > the rest of its life on.  At a minimum Set_CPU(Current_CPU) or just
> > Set_CPU should cause the task to be anchored to its current CPU core.
> >  Note that again you can do this with Dispatching_Domains.

Left out two words above.  Should read "Allow a task to statically set its CPU assignment..."
> 
> So the capability already exists, but you don't like having to with an extra 
> package to use it? Have you lost your freaking mind? You want us to add 
> operations that ALREADY EXIST to another package, with all of the 
> compatibility problems that doing so would cause (especially for people that 
> had withed and used Dispatching_Domains)? When there are lots of problems 
> that can't be solved portably at all?

No, I don't want compilers putting in extra code when it is not necessary.  If a task has a (static) CPU assignment then again, Get_CPU is essentially free.  Is it cheaper than fishing some index out of main memory because it got flushed there? Probably.  Yes, I can make a function My_CPU which does this.  I'm just making too many of those workarounds right now. 
> 
> No, you're missing the point. Ada is about writing portable code. Nothing at 
> the level of "cache lines" is EVER going to be portable in any way. Either 
> one writes "safe" code and hopefully the compiler and runtime can take into 
> account the characteristics of the target. (Perhaps parallel loop constructs 
> will help with that.)
> 

> Or otherwise, one writes bleeding edge code that is not portable and not 
> safe. And you're on your own in such a case; no programming language could 
> possibly help you.

I am trying to write portable code.  Portable enough to run on all modern AMD64 and EM64T (Intel) CPUs.  A table of processor IDs with the associated values for the numbers I need in the program is not going to happen.  Most of the parameters I need can be found in Ada now.  Cache line size is one that is not there.
> 
> >A function Cache_Line_Size in System or System.Multiprocessors seems right.
> 
> No,  it doesn't. It assumes a particular memory organization, and one thing 
> that's pretty clear is that whatever memory organization is common now will 
> not be common in a bunch of years. Besides, so many systems have multiple 
> layers of caches, that a single result won't be enough. And there is no way 
> for a general implementation to find this out (neither CPUs nor kernels 
> describe such information).

Um. No.  Systems may have multiple levels of caches of different sizes and different numbers of "ways" per cache.  But the actual cache line size is almost locked in, and is the same for all caches in a system.  Most systems with DDR3 and DDR4 use 64 byte cache lines because it matches the memory burst length.  But other values are possible.  Right now HBM2 is pushing GPUs (not CPUs) to 256 byte cache lines.  Will we eventually have Ada compilers generating code for heterogeneous systems?  Possible.  What I am working on is building the blocks that can be used with DirectCompute, OpenCL 2.0, and perhaps other GPU software interfaces. 
> 
> >Is adding these features to Ada worth the effort?
> 
> No way. They're much too low level, and they actually aren't enough to allow 
> parallelization. You want a language which allows fine-grained parallelism 
> from the start (like Parasail); trying to retrofit that on Ada (which is 
> mainly sequential, only having coarse parallelism) just will make a mess. 
> You might get a few problems solved (those using actual arrays, as opposed 
> to containers or user-defined types -- which one hopes are far more common 
> in today's programs), but there is nothing general, nor anything that fits 
> into Ada's building block approach, at the level that you're discussing.
> 
For now we can agree to disagree.  The difference is the size of the arrays we have to deal with.  When arrays get to tens of millions of entries, and operations on them can take tens of billions of operations, I don't think I am talking about fine-grained parallelism.  The main characteristics of the operations I want to get working--matrix multiplication and inversion, linear programming, FFT, FLIR and FLRadar, Navier-Stokes--all have the form of a set of huge data arrays, constant once loaded, that can be parallelized across large numbers of CPU cores or GPUs.



* Re: Real tasking problems with Ada.
  2017-08-02  3:44   ` Robert Eachus
@ 2017-08-02 14:39     ` Lucretia
  2017-08-03  0:57       ` Robert Eachus
                         ` (2 more replies)
  2017-08-03  4:16     ` Randy Brukardt
  2017-08-03  8:03     ` Simon Wright
  2 siblings, 3 replies; 23+ messages in thread
From: Lucretia @ 2017-08-02 14:39 UTC (permalink / raw)


On Wednesday, 2 August 2017 04:44:03 UTC+1, Robert Eachus  wrote:

> > >A function Cache_Line_Size in System or System.Multiprocessors seems right.
> > 
> > No,  it doesn't. It assumes a particular memory organization, and one thing 
> > that's pretty clear is that whatever memory organization is common now will 
> > not be common in a bunch of years. Besides, so many systems have multiple 
> > layers of caches, that a single result won't be enough. And there is no way 
> > for a general implementation to find this out (neither CPUs nor kernels 
> > describe such information).
> 
> Um. No.  Systems may have multiple levels of caches of different sizes and different numbers of "ways" per cache.  But the actual cache line size is almost locked in, and is the same for all caches in a system.  Most systems with DDR3 and DDR4 use 64 byte cache lines because it matches the memory burst length.  But other values are possible.  Right now HBM2 is pushing GPUs (not CPUs) to 256 byte cache lines.  Will we eventually have Ada compilers generating code for heterogeneous systems?  Possible.  What I am working on is building the blocks that can be used with DirectCompute, OpenCL 2.0, and perhaps other GPU software interfaces. 

You and I see Ada going in the same direction: Ada needs to, and should, be targeting massively multi-core/threaded systems. With 202x, I want to see this push such that parallel blocks can be compiled down to SPIR-V, for example, or being able to compile a subprogram as a compute kernel, all within one language.

i.e.

parallel
   ...
and
   ...
end with
   Compute; -- Offload to compute if available.

FYI, Khronos group is pretty much converging on SPIR-V as the intermediate language of choice for compute, OpenCL 2.0, Vulkan and now, OpenGL 4.6, all have it.

Just because Ada was designed for the DoD doesn't mean that that is its only use, i.e. smaller embedded systems. We all know we can use the language for its impressive portability, but even a portable language has non-portable areas; that's what the System packages are for, no?

Seems you may have hit the old greybeards' wall of not wanting to modernise, maybe?

> > >Is adding these features to Ada worth the effort?

Yes.

> > No way. They're much too low level, and they actually aren't enough to allow 
> > parallelization. You want a language which allows fine-grained parallelism 
> > from the start (like Parasail); trying to retrofit that on Ada (which is 
> > mainly sequential, only having coarse parallelism) just will make a mess. 
> > You might get a few problems solved (those using actual arrays, as opposed 
> > to containers or user-defined types -- which one hopes are far more common 
> > in today's programs), but there is nothing general, nor anything that fits 
> > into Ada's building block approach, at the level that you're discussing.
> > 
> For now we can agree to disagree.  The difference is the size of the arrays we have to deal with.  When arrays get to tens of millions of entries, and operations on them can take tens of billions operations, I don't think I am talking about fine grained parallelism.  The main characteristics of the operations I want to get working: matrix multiplication and inversion, linear programming, FFT, FLIR and FLRadar, Navier-Stokes, all have the form of a set of huge data arrays constant once loaded, and can be parallelized across large numbers of CPU cores, or GPUs.

Have we hit the limits of the language (and any future Adas)? Is this what it takes to create a fresh, modern take on Ada? Now, where did I put that David Botton guy? :)

P.S: This is how forks happen.
P.P.S: This is how people leave a language too.

Luke.





* Re: Real tasking problems with Ada.
  2017-08-02 14:39     ` Lucretia
@ 2017-08-03  0:57       ` Robert Eachus
  2017-08-03  5:43         ` Randy Brukardt
  2017-08-03  1:33       ` Wesley Pan
  2017-08-03  4:30       ` Randy Brukardt
  2 siblings, 1 reply; 23+ messages in thread
From: Robert Eachus @ 2017-08-03  0:57 UTC (permalink / raw)


On Wednesday, August 2, 2017 at 10:39:02 AM UTC-4, Lucretia wrote:

> P.S: This is how forks happen.
> P.P.S: This is how people leave a language too.

We are not at that point, I hope.  I just got bitten by the every-other-CPU and Maximum_Alignment issues within a few minutes.  Both had solutions within the existing language, but both seemed like they were designed to eat non-experts alive.  For example, I understand why System.Multiprocessors.Dispatching_Domains is a separate subclause from System.Multiprocessors.  But hiding these packages is where I flipped out:

NOTES
38  17  There are also some language-defined child packages of System defined elsewhere. 

Let's have a treasure hunt!  No thanks.  Why not list the packages and where they are declared?  (BTW, that gem comes from 13.7 The Package System...)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Real tasking problems with Ada.
  2017-08-02 14:39     ` Lucretia
  2017-08-03  0:57       ` Robert Eachus
@ 2017-08-03  1:33       ` Wesley Pan
  2017-08-03  4:30       ` Randy Brukardt
  2 siblings, 0 replies; 23+ messages in thread
From: Wesley Pan @ 2017-08-03  1:33 UTC (permalink / raw)


On Wednesday, August 2, 2017 at 7:39:02 AM UTC-7, Lucretia wrote:
 
> You and I see Ada going in the same direction, Ada needs to and should be targeting massively multi-core/threaded systems. With 202x, I want to see this push such that parallel blocks can be compiled down to SPIR-V, for example. Or being able to compile a subprogram as a compute kernel, all within one language.
> 
> i.e.
> 
> parallel
>    ...
> and
>    ...
> end with
>    Compute; -- Offload to compute if available.

Example where massive multi-core/multi-threaded systems will be the way of the future: http://www.eetimes.com/document.asp?_mc=RSS%5FEET%5FEDT&doc_id=1331891&page_number=1

Sadly, languages like C and C++ will probably be used for such platforms, since Ada is frowned upon by many. A safety-critical use case that probably won't use a language focused on safety-critical work, like Ada... =(

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Real tasking problems with Ada.
  2017-08-02  2:23       ` Robert Eachus
@ 2017-08-03  3:43         ` Randy Brukardt
  2017-08-03 20:03           ` Robert Eachus
  0 siblings, 1 reply; 23+ messages in thread
From: Randy Brukardt @ 2017-08-03  3:43 UTC (permalink / raw)


"Robert Eachus" <rieachus@comcast.net> wrote in message 
news:914ae4df-cc52-4e6e-b342-584bcac98e88@googlegroups.com...
On Tuesday, August 1, 2017 at 12:45:43 AM UTC-4, Randy Brukardt wrote:

>> Yes, really. Use a discriminant of type CPU, and use that in the aspect.
>> That's an age-old technique, and indeed is the major reason that tasks 
>> have
>> discriminants. You then can allocate the tasks (which would be my
>> suggestion), or you could create the entire set in an aggregate (assuming
>> you have Ada 2020).

>Sorry, you missed what all the shouting was about. ;-)  On the processor I 
>am using
>(an AMD FX-6300 Vishera) running on all CPU cores causes contention for the
>floating-point units.  So for efficiency I have to run on one core from 
>each pair of
CPU cores.  Currently my program uses 2, 4, and 6.  Creating an array 
>indexed by
>CPU doesn't work.

Who said anything about indexing an array by CPUs? If you're able to 
allocate tasks, then create them with whatever CPUs you want and index the 
array by whatever you want, there's no reason that they have to be tied 
together.

And if you can't, you're probably in a Ravenscar or similar environment, and 
those do not allow any sort of dynamic assignment of CPUs. So you're not 
allowed to do what you want in any case.

...
>Yes, I knew what I was doing was messy and dangerous--or at least required
>careful checking.  My point was that if Maximum_Alignment was large enough,
>I wouldn't be going through the pain.

The maximum alignment an implementation supports comes directly from the 
linker in use. Since most Ada implementations use system linkers that are 
completely out of their control, it's not really possible to support 
anything larger. (One can do it dynamically with a significant waste of 
memory, but that is not the sort of solution that is wanted on an embedded 
system.)

                                        Randy.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Real tasking problems with Ada.
  2017-08-02  3:44   ` Robert Eachus
  2017-08-02 14:39     ` Lucretia
@ 2017-08-03  4:16     ` Randy Brukardt
  2017-08-03  5:05       ` Niklas Holsti
  2017-08-03  8:03     ` Simon Wright
  2 siblings, 1 reply; 23+ messages in thread
From: Randy Brukardt @ 2017-08-03  4:16 UTC (permalink / raw)


"Robert Eachus" <rieachus@comcast.net> wrote in message 
news:f9e87130-3f08-43c4-8cc6-9e164a6da954@googlegroups.com...
On Tuesday, August 1, 2017 at 12:42:01 AM UTC-4, Randy Brukardt wrote:

>> Use a discriminant of type CPU and (in Ada 2020) an
>> iterated_component_association. (This was suggested way back in Ada 9x, 
>> left
>> out in the infamous "scope reduction", and then forgotten about until 
>> 2013.
>> See AI12-0061-1 or the draft RM
>> http://www.ada-auth.org/standards/2xaarm/html/AA-4-3-3.html).
>
>It is nice that this is in there, just in time to be outmoded.  See my 
>previous post,
>on most desktop and server processors assigning to alternate processor IDs 
>is
> necessary.

Why would that be "outmoded"? What's wrong with:
    (for I in 1..Number_of_CPUs/2 => new My_Task (The_CPU => I*2));
if you need alternate CPUs? Or, even better, a function call to map the 
linear array indexes to CPUs? This is all dynamic code, after all, write 
anything you need.
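Randy's pattern (allocate the tasks, and map array indexes to CPUs with whatever function you like) can be sketched as below; Worker, its null body, and the assumption of an even core count are illustrative only:

```ada
with System.Multiprocessors; use System.Multiprocessors;

procedure Alternate_CPUs is
   --  A worker pinned to one core via its discriminant (RM D.16).
   task type Worker (Core : CPU) with CPU => Core;

   task body Worker is
   begin
      null;  --  the heavy floating-point work would go here
   end Worker;

   type Worker_Ptr is access Worker;

   --  Index by a plain linear range; the CPU numbers need not match it.
   Workers : array (1 .. Integer (Number_Of_CPUs) / 2) of Worker_Ptr;
begin
   --  Put one worker on every second core (2, 4, 6, ...) so that no
   --  two workers share a floating-point unit on the FX-style parts
   --  described upthread.
   for I in Workers'Range loop
      Workers (I) := new Worker (Core => CPU (I * 2));
   end loop;
end Alternate_CPUs;
```

Any other index-to-CPU mapping is just a different expression in the allocator.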

>> Why do you say this? Ada doesn't require the task that calls a protected
>> object to execute it (the execution can be handdled by the task that
>> services the barriers - I don't know if any implementation actually does
>> this, but the language rules are written to allow it).

>Um, I say it because any other result is useless?  The use case is for the
>called task or protected object to be able to get the CPU number and
>do something with it.

But that's a nonsense use-case. You're not allowed to ask the Current_Task 
in an entry body or interrupt handler (see C.7.1(17/3) -- the operation is 
allowed to raise Program_Error or return an implementation-defined Task_Id). 
So it's nonsense to ask what CPU you're running on when you're not even 
allowed to ask what task is running.

> The simple case right now would be a protected object whose only purpose
> is to make sure task assignments are compatible with the architecture.

But that's backwards; you use a function to make those assignments when the 
tasks are created. (Again, if you're in a Ravenscar environment, you can't 
do that, but the code has to be tied to a particular CPU layout when it 
compiled, so there's nothing to check.)

...
>> Say what? You're using Get_Id, so clearly you're using something there.
>> Get_Id (like the rest of dispatching domains is likely to be expensive, 
>> so
>> you don't want it dragged into all programs. (And CPU is effectively part 
>> of
>> all programs.)
>
>Sigh! Get_Id as defined is heavy only because of the default initial value:
>
>   function Get_CPU
>     (T   : Ada.Task_Identification.Task_Id :=
>                 Ada.Task_Identification.Current_Task)
>          return CPU_Range;
>
>but a parameterless Get_CPU could compile to a single load instruction.

No chance. You have to make a kernel call to get any thread information 
(like affinities), and that's likely to be more expensive than any task id 
stuff. That "single load instruction" is protected on every architecture I've 
dealt with, and the kernels don't let programs mess with those instructions 
(and rightly so).

You seem to be thinking of bare machine implementations of Ada, but none of 
those that have existed in the last decade or more have any tasking support. 
Customers aren't interested in such things so far as I can tell. (And I 
agree this is a sad state of affairs, but complaining about it isn't going 
to change anything. I've been told in ARG meetings not to worry about bare 
machine implementations for this reason.)

In any case, you can avoid any heaviness from the default parameter by 
actually providing the task id you need to know about.

...
>> So the capability already exists, but you don't like having to with an 
>> extra
>> package to use it? Have you lost your freaking mind? You want us to add
>> operations that ALREADY EXIST to another package, with all of the
>> compatibility problems that doing so would cause (especially for people 
>> that
>> had withed and used Dispatching_Domains)? When there are lots of problems
>> that can't be solved portably at all?
>
>No, I don't want compilers putting in extra code when it is not necessary. 
>If a
> task has a (static) CPU assignment then again, Get_CPU is essentially 
> free.

No way; you have to get that information from the kernel. So long as Set_CPU 
exists you have to assume that it could have been used. It's only free if 
you've put it into a well-known place (say the discriminant of the task 
type) and read it from there, because you know you didn't use Set_CPU. (Also 
note, at least in the cases of the systems I've worked on, that we don't 
store anything that can be changed by kernel operations in the Ada runtime 
system, so that if someone uses kernel operations to change the values, we 
actually report the real answer and not the one that the compiler knows 
about.)

>Is it cheaper than fishing some index out of main memory because it got
>flushed there? Probably.  Yes, I can make a function My_CPU which
>does this.  I'm just making too many of those workarounds right now.

I don't think this is a workaround. It's WAY cheaper to get this information 
yourself than it is to get it from the kernel.

>> >A function Cache_Line_Size in System or System.Multiprocessors seems 
>> >right.
>>
>> No,  it doesn't. It assumes a particular memory organization, and one 
>> thing
>> that's pretty clear is that whatever memory organization is common now 
>> will
>> not be common in a bunch of years. Besides, so many systems have multiple
>> layers of caches, that a single result won't be enough. And there is no 
>> way
>> for a general implementation to find this out (neither CPUs nor kernels
>> describe such information).

>Um. No.  Systems may have multiple levels of caches of different sizes and
>different numbers of "ways" per cache.  But the actual cache line size is
>almost locked in, and is the same for all caches in a system.  Most systems
>with DDR3 and DDR4 use 64 byte cache lines because it matches the
>memory burst length.  But other values are possible.  Right now HBM2
>is pushing GPUs (not CPUs) to 256 byte cache lines.  Will we eventually
>have Ada compilers generating code for heterogeneous systems?  Possible.
>What I am working on is building the blocks that can be used with
>DirectCompute, OpenCL 2.0, and perhaps other GPU software interfaces.

But you didn't answer the actual question. How in the world would an Ada 
compiler figure out the cache line size? I have no clue as to what it is on 
the computer that I'm writing them on, and it's not something that is 
provided by Windows or Linux. So how could an Ada runtime figure it out and 
report it? Janus/Ada programs run on any Win32 system from Windows 95 
on, it's wildly impractical to have some sort of hardware table.

As far as heterogeneous systems go, you have to assume that (especially in a 
case like this). It would be awful to prevent that just for a little-used 
function. So there can't be an unconditional "cache-line" setting, it has to 
be based on a particular piece of memory. (It's similar in that way to file 
systems; you can have both case-sensitive and case-preserving file systems 
on the same computer and even in the same program.) That's one complication 
of several with discussing "cache-lines", not to mention that the entire 
concept will most likely be gone in a decade.

                            Randy.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Real tasking problems with Ada.
  2017-08-02 14:39     ` Lucretia
  2017-08-03  0:57       ` Robert Eachus
  2017-08-03  1:33       ` Wesley Pan
@ 2017-08-03  4:30       ` Randy Brukardt
  2 siblings, 0 replies; 23+ messages in thread
From: Randy Brukardt @ 2017-08-03  4:30 UTC (permalink / raw)


"Lucretia" <laguest9000@googlemail.com> wrote in message 
news:290e79ee-6626-468b-932b-94dfe724ec45@googlegroups.com...
On Wednesday, 2 August 2017 04:44:03 UTC+1, Robert Eachus  wrote:

...
>You and I see Ada going in the same direction, Ada needs to and should be
>targeting massively multi-core/threaded systems. With 202x, I want to see
>this push such that parallel blocks can be compiled down to SPIR-V, for
>example. Or being able to compile a subprogram as a compute kernel, all
>within one language.

I want this too. But I don't want low-level constructs that tie Ada to 
today's (and yesterday's architectures). So talking about "cache-lines" and 
explicit CPU assignment is simply the wrong thing to do for Ada. That's 
stuff the compiler should be doing for you, in order to get the best way to 
execute your code. That way, when you get a new computer and compiler, your 
code will automatically adjust to do the right thing on that (presumably 
very different architecture).

What happens with Robert E's code if suddenly most machines are using 
transactional memory? It's tied to traditional cache lines, but such things 
probably wouldn't exist. Etc.

There's always going to be a few people in need of absolutely bleeding edge 
performance, but (a) you're almost never going to get that from a compiler 
[any compiler, for any language], and (b) the vast majority of problems 
don't need absolutely bleeding edge performance anyway (or even any 
particular level of performance). Moreover, getting bleeding edge 
performance requires a lot of knowledge about tasking/multi-processors that 
is way beyond the usual programmer.

I'm putting my money (and effort) on the relatively high-level constructs 
proposed for Ada 2020 (which should have some lower-level tuning features). 
I want *everyone* to be able to take advantage of the speed-ups offered by 
parallel programs, not just the 0.1% that understand modern memory models. 

                                    Randy.

P.S. The really low-level features that don't map very well to more modern 
hardware are precisely what is wrong with C. At least C is so common that 
processors have to find ways to support that stuff; that would not be true 
for Ada or for any other language for that matter.






^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Real tasking problems with Ada.
  2017-08-03  4:16     ` Randy Brukardt
@ 2017-08-03  5:05       ` Niklas Holsti
  2017-08-04 23:11         ` Randy Brukardt
  0 siblings, 1 reply; 23+ messages in thread
From: Niklas Holsti @ 2017-08-03  5:05 UTC (permalink / raw)


On 17-08-03 07:16 , Randy Brukardt wrote:

> You seem to be thinking of bare machine implementations of Ada, but none of
> those that have existed in the last decade or more have any tasking support.

GNAT supports bare-machine applications, with various levels of tasking 
support, from none through Ravenscar to full tasking, depending on the 
version of the Run-Time System one chooses.

Or does your "bare machine" mean "no RTS"? I have always understood it 
to mean "no *language-independent* kernel or O/S".

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Real tasking problems with Ada.
  2017-08-03  0:57       ` Robert Eachus
@ 2017-08-03  5:43         ` Randy Brukardt
  0 siblings, 0 replies; 23+ messages in thread
From: Randy Brukardt @ 2017-08-03  5:43 UTC (permalink / raw)


"Robert Eachus" <rieachus@comcast.net> wrote in message 
news:8a2bf578-fc1e-42a6-a11f-8b302d41e656@googlegroups.com...

>Let's have a treasure hunt!  No thanks.  Why not list the packages and
>where they are declared?  (BTW, that gem comes from 13.7 The Package 
>System...)

That's done at the start of Annex A (for the entire set of language-defined 
packages), and given how hard it is to keep that up-to-date, having any more 
such places is asking for a disaster.

If someone formally made this comment, I'd recommend just deleting the idiot 
note (I don't see any value that it has, of course there are packages 
declared in various places). As it is, I'm not doing anything as there are 
dozens of worse things in the Standard (and every change has a cost).

                                                          Randy.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Real tasking problems with Ada.
  2017-08-02  3:44   ` Robert Eachus
  2017-08-02 14:39     ` Lucretia
  2017-08-03  4:16     ` Randy Brukardt
@ 2017-08-03  8:03     ` Simon Wright
  2017-08-04 23:16       ` Randy Brukardt
  2 siblings, 1 reply; 23+ messages in thread
From: Simon Wright @ 2017-08-03  8:03 UTC (permalink / raw)


Robert Eachus <rieachus@comcast.net> writes:

> Sigh! Get_Id as defined is heavy only because of the default initial value:
>
>    function Get_CPU
>       (T   : Ada.Task_Identification.Task_Id :=
>                  Ada.Task_Identification.Current_Task)
>            return CPU_Range;

Would it have been reasonable to have used Null_Task_Id to mean 'the
current task'? (this is a trick used in FreeRTOS at least).


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Real tasking problems with Ada.
  2017-08-03  3:43         ` Randy Brukardt
@ 2017-08-03 20:03           ` Robert Eachus
  2017-08-03 23:10             ` Luke A. Guest
  2017-08-04 23:22             ` Randy Brukardt
  0 siblings, 2 replies; 23+ messages in thread
From: Robert Eachus @ 2017-08-03 20:03 UTC (permalink / raw)


Randy Brukardt wrote:
 
> The maximum alignment an implementation supports comes directly from the 
> linker in use. Since most Ada implementations use system linkers that are 
> completely out of their control, it's not really possible to support 
> anything larger. (One can do it dynamically with a significant waste of 
> memory, but that is not the sort of solution that is wanted on an embedded 
> system.)

Now I'm really confused.  I'll have to do some experimenting.  If I have two locations protected by read-modify-write (RMW) instructions and written by tasks on different CPUs, but in the same cache line, caching automagically provides safety.  The read part moves the line into the local CPU's L1 data cache.  But if the two tasks are on the same (physical) CPU core but different logical CPUs, due to Hyperthreading, multithreading, or the like, the hardware has to protect the line against a second read or write from its own core.  This probably means that when the hardware analyzes the RMW instruction it insists not only that all prior writes have been written, but also that no new instructions from the other Hyperthreading thread on the same core be executed until the cycle is finished.  So the hardware ends up protecting against writes to the same cache line by protecting against writes to all cache lines.

By the way, yes I am working on trying to generate fast code for supercomputers.  I'd like to do it in Ada rather than assembler.  They use the same CPUs as desktop computers, or more often, servers, so I can do my experimenting on my desktop.  The frustrating thing is that right now, the Ada code for a single thread/task is faster than the assembler.  Very much reversed for the multitasking case.

In my matrix multiplication code (A*B=C) I have individual tasks computing slices of C such that they are never 'worried' about such issues, so RMWs are not needed--as long as the slices are multiples of the cache line length.  C(X, C'Last(2)) is followed immediately by C(X + 1, C'First(2)), so the only potential problem is if C'Length(1)*C'Length(2) is not a multiple of the cache line length.  I deal with that by doing one special multiply in the main task, not in the worker tasks.

I guess I can deal with this by using 256 and adding a note that it needs to be fixed if cache line lengths are longer.
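The slicing scheme described above can be reduced to a sketch like this; the problem size, the four-worker split, and the names are assumptions, and the real code presumably also blocks for cache reuse:

```ada
procedure Sliced_Multiply is
   N       : constant := 256;   --  assumed problem size
   Workers : constant := 4;     --  assumed worker count; divides N

   type Real   is digits 15;
   type Matrix is array (1 .. N, 1 .. N) of Real;

   A, B : constant Matrix := (others => (others => 1.0));
   C    : Matrix;

   task type Worker_Task (First : Positive);

   task body Worker_Task is
      Sum : Real;
   begin
      --  This task writes only rows First .. First + N/Workers - 1 of
      --  C.  A row is N * 8 = 2048 bytes, a whole number of 64-byte
      --  cache lines, so (assuming C itself starts on a line boundary,
      --  the alignment issue from upthread) no two workers ever write
      --  the same line of C and no RMW instructions are needed.
      for I in First .. First + N / Workers - 1 loop
         for J in 1 .. N loop
            Sum := 0.0;
            for K in 1 .. N loop
               Sum := Sum + A (I, K) * B (K, J);
            end loop;
            C (I, J) := Sum;
         end loop;
      end loop;
   end Worker_Task;

   type Worker_Ptr is access Worker_Task;
   Pool : array (0 .. Workers - 1) of Worker_Ptr;
begin
   for W in Pool'Range loop
      Pool (W) := new Worker_Task (First => W * (N / Workers) + 1);
   end loop;
   --  The procedure waits here for all workers to terminate.
end Sliced_Multiply;
```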

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Real tasking problems with Ada.
  2017-08-03 20:03           ` Robert Eachus
@ 2017-08-03 23:10             ` Luke A. Guest
  2017-08-04 23:22             ` Randy Brukardt
  1 sibling, 0 replies; 23+ messages in thread
From: Luke A. Guest @ 2017-08-03 23:10 UTC (permalink / raw)


Robert Eachus <rieachus@comcast.net> wrote:

> By the way, yes I am working on trying to generate fast code for
> supercomputers.  I'd like to do it in Ada rather than assembler.  They
> use the same CPUs as desktop computers, or more often, servers, so I can
> do my experimenting on my desktop.  The frustrating thing is that right
> now, the Ada code for a single thread/task is faster than the assembler. 
> Very much reversed for the multitasking case.
> 

Do you have this test code up on GitHub or the like?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Real tasking problems with Ada.
  2017-08-03  5:05       ` Niklas Holsti
@ 2017-08-04 23:11         ` Randy Brukardt
  2017-08-05  7:01           ` Niklas Holsti
  0 siblings, 1 reply; 23+ messages in thread
From: Randy Brukardt @ 2017-08-04 23:11 UTC (permalink / raw)


"Niklas Holsti" <niklas.holsti@tidorum.invalid> wrote in message 
news:eufp9gFi2doU1@mid.individual.net...
> On 17-08-03 07:16 , Randy Brukardt wrote:
>
>> You seem to be thinking of bare machine implementations of Ada, but none 
>> of
>> those that have existed in the last decade or more have any tasking 
>> support.
>
> GNAT supports bare-machine applications, with various levels of tasking 
> support, from none through Ravenscar to full tasking, depending on the 
> version of the Run-Time System one chooses.
>
> Or does your "bare machine" mean "no RTS"? I have always understood it to 
> mean "no *language-independent* kernel or O/S".

Last I looked, all of the vendors had gotten rid of their bare machine 
implementations. I know GNAT supports such environments without tasking, but 
I hadn't heard anything about any such support including tasking -- in part 
because such tasking support has to be customized to each processor (and to 
some extent board), and not much of it can be shared with the runtime for a 
OS like Windows or Linux.

If they really have a complete bare machine package, that's great, but there 
still is very little commercial interest in such things (I suspect because 
most commercial systems need some sort of networking support and probably a 
bit more device-independence).

                                      Randy.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Real tasking problems with Ada.
  2017-08-03  8:03     ` Simon Wright
@ 2017-08-04 23:16       ` Randy Brukardt
  0 siblings, 0 replies; 23+ messages in thread
From: Randy Brukardt @ 2017-08-04 23:16 UTC (permalink / raw)


"Simon Wright" <simon@pushface.org> wrote in message 
news:ly4ltpqdrm.fsf@pushface.org...
> Robert Eachus <rieachus@comcast.net> writes:
>
>> Sigh! Get_Id as defined is heavy only because of the default initial 
>> value:
>>
>>    function Get_CPU
>>       (T   : Ada.Task_Identification.Task_Id :=
>>                  Ada.Task_Identification.Current_Task)
>>            return CPU_Range;
>
> Would it have been reasonable to have used Null_Task_Id to mean 'the
> current task'? (this is a trick used in FreeRTOS at least).

There are rules about when Current_Task is not well-defined (or can raise 
Program_Error, as it is a bounded error to call it in some places). By using 
that here, we don't have to repeat them in every place that might care about 
the current task. If you used Null_Task_Id as a placeholder, you would have 
to say that Get_CPU is not well-defined (or even when it can raise 
Program_Error).

You'd also lose the null value for representing "no task", which seems to be 
a common requirement for any kind of type.

Thus, that's not the worst idea, but it would be a pain to define and 
possibly would cause issues for anything that needs to represent "no task" 
(such as the default initialization of a data structure).

                                     Randy.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Real tasking problems with Ada.
  2017-08-03 20:03           ` Robert Eachus
  2017-08-03 23:10             ` Luke A. Guest
@ 2017-08-04 23:22             ` Randy Brukardt
  2017-08-22  5:10               ` Robert Eachus
  1 sibling, 1 reply; 23+ messages in thread
From: Randy Brukardt @ 2017-08-04 23:22 UTC (permalink / raw)


"Robert Eachus" writes:

>Now I'm really confused. ...

You've reached the limits of my knowledge of memory hardware. In theory, the 
Independent_Components aspect should ensure that different components can be 
read/written by different tasks independently. But I don't know how/whether 
cache lines get involved (my understanding was that putting the components 
into different words was sufficient). But you could be right (having tried 
this much more than I have). In any case, the compiler is supposed to do the 
right thing when Independent_Components is used.

                        Randy.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Real tasking problems with Ada.
  2017-08-04 23:11         ` Randy Brukardt
@ 2017-08-05  7:01           ` Niklas Holsti
  0 siblings, 0 replies; 23+ messages in thread
From: Niklas Holsti @ 2017-08-05  7:01 UTC (permalink / raw)


On 17-08-05 02:11 , Randy Brukardt wrote:
> "Niklas Holsti" <niklas.holsti@tidorum.invalid> wrote in message
> news:eufp9gFi2doU1@mid.individual.net...
>> On 17-08-03 07:16 , Randy Brukardt wrote:
>>
>>> You seem to be thinking of bare machine implementations of Ada, but none
>>> of
>>> those that have existed in the last decade or more have any tasking
>>> support.
>>
>> GNAT supports bare-machine applications, with various levels of tasking
>> support, from none through Ravenscar to full tasking, depending on the
>> version of the Run-Time System one chooses.
    ...
> Last I looked, all of the vendors had gotten rid of their bare machine
> implementations. I know GNAT supports such environments without tasking, but
> I hadn't heard anything about any such support including tasking -- in part
> because such tasking support has to be customized to each processor (and to
> some extent board), and not much of it can be shared with the runtime for a
> OS like Windows or Linux.
>
> If they really have a complete bare machine package, that's great,

Sorry, I mis-remembered -- AdaCore have an "extended Ravenscar" 
run-time, and a "configurable" run-time, but apparently not configurable 
up to full tasking.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Real tasking problems with Ada.
  2017-08-04 23:22             ` Randy Brukardt
@ 2017-08-22  5:10               ` Robert Eachus
  0 siblings, 0 replies; 23+ messages in thread
From: Robert Eachus @ 2017-08-22  5:10 UTC (permalink / raw)


On Friday, August 4, 2017 at 7:22:54 PM UTC-4, Randy Brukardt wrote:
> "Robert Eachus" writes:
> 
> >Now I'm really confused. ...
> 
> You've reached the limits of my knowledge of memory hardware. In theory, the 
> Independent_Components aspect should ensure that different components can be 
> read/written by different tasks independently. But I don't know how/whether 
> cache lines get involved (my understanding was that putting the components 
> into different words was sufficient). But you could be right (having tried 
> this much more than I have). In any case, the compiler is supposed to do the 
> right thing when Independent_Components is used.
> 
The problem is that Independent_Components will ensure that the compiler can generate relatively simple code to access components even if they are accessed by different tasks.  However, simple code doesn't take caches into account.  In other words, if occasional collisions occur where two tasks access the same cache line at about the same time, you will get the expected (right) results, at a slight cost in CPU time.  But if you have two tasks accessing the same cache line on a regular basis, your code will be dog slow.  I could write up a test program if you want: store data in records and have the processing of any record require at least two tasks, then a version where the data is stored in arrays accessed by index.
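The two layouts being contrasted can be sketched in Ada; the 64-byte line length, the padding component, and all names here are assumptions for illustration:

```ada
procedure False_Sharing is
   type Counter is mod 2 ** 64;
   Iterations : constant := 1_000_000;

   --  Layout 1: four independent 8-byte counters, adjacent in memory.
   --  They are correct under concurrent use (C.6), but all four land
   --  in one or two 64-byte cache lines, so four tasks bumping them
   --  ping-pong the line between cores.
   Hot : array (1 .. 4) of Counter := (others => 0)
     with Independent_Components;

   --  Layout 2: each counter padded out to an assumed 64-byte line
   --  (8 bytes of counter + 56 bytes of filler), so each task's slot
   --  sits in its own line, given a line-aligned array.
   type Padded is record
      Value : Counter := 0;
      Pad   : String (1 .. 56);  --  filler to one assumed cache line
   end record;

   Cool : array (1 .. 4) of Padded
     with Independent_Components;

   task type Bump (Slot : Integer);

   task body Bump is
   begin
      for I in 1 .. Iterations loop
         Cool (Slot).Value := Cool (Slot).Value + 1;  --  fast layout
         --  Hot (Slot) := Hot (Slot) + 1;            --  slow layout
      end loop;
   end Bump;

   B1 : Bump (1);
   B2 : Bump (2);
   B3 : Bump (3);
   B4 : Bump (4);
begin
   null;  --  wait for the four tasks to finish
end False_Sharing;
```

Timing the two commented-in/out lines against each other is the experiment: each slot is touched by exactly one task in both layouts, so any difference is pure cache-line contention.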

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2017-08-22  5:10 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-25 23:19 Real tasking problems with Ada Robert Eachus
2017-07-26 19:42 ` sbelmont700
2017-07-27  2:00   ` Robert Eachus
2017-08-01  4:45     ` Randy Brukardt
2017-08-02  2:23       ` Robert Eachus
2017-08-03  3:43         ` Randy Brukardt
2017-08-03 20:03           ` Robert Eachus
2017-08-03 23:10             ` Luke A. Guest
2017-08-04 23:22             ` Randy Brukardt
2017-08-22  5:10               ` Robert Eachus
2017-08-01  4:41 ` Randy Brukardt
2017-08-02  3:44   ` Robert Eachus
2017-08-02 14:39     ` Lucretia
2017-08-03  0:57       ` Robert Eachus
2017-08-03  5:43         ` Randy Brukardt
2017-08-03  1:33       ` Wesley Pan
2017-08-03  4:30       ` Randy Brukardt
2017-08-03  4:16     ` Randy Brukardt
2017-08-03  5:05       ` Niklas Holsti
2017-08-04 23:11         ` Randy Brukardt
2017-08-05  7:01           ` Niklas Holsti
2017-08-03  8:03     ` Simon Wright
2017-08-04 23:16       ` Randy Brukardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox