Newsgroups: comp.lang.ada
Subject: Re: Real tasking problems with Ada.
From: Robert Eachus
Date: Tue, 1 Aug 2017 20:44:01 -0700 (PDT)

On Tuesday, August 1, 2017 at 12:42:01 AM UTC-4, Randy Brukardt wrote:

> Use a discriminant of type CPU and (in Ada 2020) an
> iterated_component_association. (This was suggested way back in Ada 9x, left
> out in the infamous "scope reduction", and then forgotten about until 2013.
> See AI12-0061-1 or the draft RM
> http://www.ada-auth.org/standards/2xaarm/html/AA-4-3-3.html).
It is nice that this is in there, just in time to be outmoded. See my previous post: on most desktop and server processors, assigning tasks to alternate processor IDs is necessary.

> Why do you say this? Ada doesn't require the task that calls a protected
> object to execute it (the execution can be handled by the task that
> services the barriers - I don't know if any implementation actually does
> this, but the language rules are written to allow it).

Um, I say it because any other result is useless? The use case is for the called task or protected object to be able to get the CPU number and do something with it. The simple case right now would be a protected object whose only purpose is to make sure task assignments are compatible with the architecture. Again with Zen: Epyc has multiprocessor systems with 64 cores and 128 threads that will show as CPUs. There are favored pairings of threads so that they share caches, or have shorter paths to parts of L3. Intel says they will support up to eight sockets, with 28 CPU cores and 56 threads per socket. I don't believe a system with 448 CPU threads will be realistic; even 224 will probably stress scheduling. (Note that many of these "big" systems use VMware to create lots of virtual machines with four or eight threads.)

> Say what? You're using Get_Id, so clearly you're using something there.
> Get_Id (like the rest of dispatching domains) is likely to be expensive, so
> you don't want it dragged into all programs. (And CPU is effectively part of
> all programs.)

Sigh! Get_CPU as defined is heavy only because of the default parameter value:

   function Get_CPU
     (T : Ada.Task_Identification.Task_Id :=
        Ada.Task_Identification.Current_Task) return CPU_Range;

but a parameterless Get_CPU could compile to a single load instruction. I'm not asking for the current function to be eliminated; there are situations where it is needed. But it doesn't need all that baggage for the normal use.
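For what it's worth, here is a sketch of the thin wrapper I mean, written against the existing Ada 2012 interface. The names Fast_CPU_Info and My_CPU are mine, purely illustrative, not proposed standard names:

```ada
with System.Multiprocessors.Dispatching_Domains;

package Fast_CPU_Info is
   --  Parameterless query for the current task only.  With no
   --  Task_Id default to evaluate, a compiler is free to inline
   --  this down to reading a per-CPU register or a TLS slot.
   function My_CPU return System.Multiprocessors.CPU_Range
     with Inline;
end Fast_CPU_Info;

package body Fast_CPU_Info is
   function My_CPU return System.Multiprocessors.CPU_Range is
   begin
      --  Today this still pays for the general mechanism underneath;
      --  the point is that the *interface* need not force that cost.
      return System.Multiprocessors.Dispatching_Domains.Get_CPU;
   end My_CPU;
end Fast_CPU_Info;
```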
>
> > 2) Allow a task to its CPU assignment after it has started execution. It
> > is no big deal if a task starts on a different CPU than the one it will
> > spend the rest of its life on. At a minimum Set_CPU(Current_CPU) or just
> > Set_CPU should cause the task to be anchored to its current CPU core.
>
> Note that again you can do this with Dispatching_Domains.

Left out two words above. Should read "Allow a task to statically set its CPU assignment..."

> So the capability already exists, but you don't like having to with an extra
> package to use it? Have you lost your freaking mind? You want us to add
> operations that ALREADY EXIST to another package, with all of the
> compatibility problems that doing so would cause (especially for people that
> had withed and used Dispatching_Domains)? When there are lots of problems
> that can't be solved portably at all?

No, I don't want compilers putting in extra code when it is not necessary. If a task has a (static) CPU assignment then, again, Get_CPU is essentially free. Is it cheaper than fishing some index out of main memory because it got flushed there? Probably. Yes, I can make a function My_CPU which does this. I'm just making too many of those workarounds right now.

> No, you're missing the point. Ada is about writing portable code. Nothing at
> the level of "cache lines" is EVER going to be portable in any way. Either
> one writes "safe" code and hopefully the compiler and runtime can take into
> account the characteristics of the target. (Perhaps parallel loop constructs
> will help with that.)
>
> Or otherwise, one writes bleeding edge code that is not portable and not
> safe. And you're on your own in such a case; no programming language could
> possibly help you.

I am trying to write portable code. Portable enough to run on all modern AMD64 and EM64T (Intel) CPUs.
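As an aside, the anchoring idiom itself can be written today; my complaint is the cost and the extra package, not the capability. A minimal sketch using the existing Dispatching_Domains operations:

```ada
with System.Multiprocessors.Dispatching_Domains;
use  System.Multiprocessors.Dispatching_Domains;

procedure Anchor_Demo is
   task Worker;

   task body Worker is
   begin
      --  Pin this task to whichever CPU the scheduler started it on.
      --  From here on its CPU assignment is a fixed fact, so a query
      --  like Get_CPU could in principle be free.
      Set_CPU (Get_CPU);
      --  ... long-running work on the now-fixed core ...
      null;
   end Worker;
begin
   null;  --  Anchor_Demo waits for Worker to terminate before returning
end Anchor_Demo;
```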
A table of processor IDs with the associated values for the numbers I need in the program is not going to happen. Most of the parameters I need can be found in Ada now. Cache line size is one that is not there.

> > A function Cache_Line_Size in System or System.Multiprocessors seems right.
>
> No, it doesn't. It assumes a particular memory organization, and one thing
> that's pretty clear is that whatever memory organization is common now will
> not be common in a bunch of years. Besides, so many systems have multiple
> layers of caches, that a single result won't be enough. And there is no way
> for a general implementation to find this out (neither CPUs nor kernels
> describe such information).

Um. No. Systems may have multiple levels of caches of different sizes and different numbers of "ways" per cache. But the actual cache line size is almost locked in, and is the same for all caches in a system. Most systems with DDR3 and DDR4 use 64-byte cache lines because that matches the memory burst length. But other values are possible; right now HBM2 is pushing GPUs (not CPUs) to 256-byte cache lines. Will we eventually have Ada compilers generating code for heterogeneous systems? Possibly. What I am working on is building the blocks that can be used with DirectCompute, OpenCL 2.0, and perhaps other GPU software interfaces.

> > Is adding these features to Ada worth the effort?
>
> No way. They're much too low level, and they actually aren't enough to allow
> parallelization.
> You want a language which allows fine-grained parallelism
> from the start (like Parasail); trying to retrofit that on Ada (which is
> mainly sequential, only having coarse parallelism) just will make a mess.
> You might get a few problems solved (those using actual arrays, as opposed
> to containers or user-defined types -- which one hopes are far more common
> in today's programs), but there is nothing general, nor anything that fits
> into Ada's building block approach, at the level that you're discussing.

For now we can agree to disagree. The difference is the size of the arrays we have to deal with. When arrays get to tens of millions of entries, and operations on them can take tens of billions of operations, I don't think I am talking about fine-grained parallelism. The operations I want to get working, matrix multiplication and inversion, linear programming, FFT, FLIR and FLRadar, Navier-Stokes, all share the same shape: a set of huge data arrays, constant once loaded, that can be parallelized across large numbers of CPU cores or GPUs.
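To make the disagreement concrete, this is the coarse-grained pattern I mean, in Ada 2012 as it stands: one task per core, pinned with the CPU aspect, each summing its own slice of a large array. The hand-written 64-byte padding of the per-core results is an assumption about the target, and is exactly the detail a Cache_Line_Size constant would make portable:

```ada
with System.Multiprocessors; use System.Multiprocessors;

procedure Coarse_Parallel_Sum is
   N    : constant := 1_000_000;
   type Vector is array (1 .. N) of Long_Float;
   type Vector_Ptr is access Vector;
   Data : constant Vector_Ptr := new Vector'(others => 1.0);

   Cache_Line : constant := 64;  --  assumed; matches DDR3/DDR4 burst

   --  Pad each per-core accumulator to its own cache line so the
   --  workers never false-share a line.
   type Padded_Sum is record
      Value : Long_Float := 0.0;
   end record with Alignment => Cache_Line;

   Partial : array (CPU) of Padded_Sum;

   task type Worker (Id : CPU) with CPU => Id;  --  pinned at creation
   type Worker_Ptr is access Worker;

   task body Worker is
      Chunk : constant Positive := N / Positive (Number_Of_CPUs);
      First : constant Positive := Natural (Id - 1) * Chunk + 1;
      Last  : constant Positive :=
        (if Id = Number_Of_CPUs then N else First + Chunk - 1);
   begin
      for I in First .. Last loop
         Partial (Id).Value := Partial (Id).Value + Data (I);
      end loop;
   end Worker;

   Pool : array (CPU) of Worker_Ptr;
begin
   for I in Pool'Range loop
      Pool (I) := new Worker (I);  --  discriminant selects the core
   end loop;
   --  The procedure awaits all allocated tasks before returning; the
   --  grand total is then the sum of the Partial array.
end Coarse_Parallel_Sum;
```

Nothing here is fine-grained: the slices are millions of elements each, and the only synchronization is task termination.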