From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=unavailable autolearn_force=no version=3.4.4
X-Received: by 10.237.49.46 with SMTP id 43mr15524571qtg.31.1501640600094;
        Tue, 01 Aug 2017 19:23:20 -0700 (PDT)
X-Received: by 10.36.19.81 with SMTP id 78mr171570itz.2.1501640600054; Tue, 01
 Aug 2017 19:23:20 -0700 (PDT)
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!news.eternal-september.org!feeder.eternal-september.org!news.glorb.com!s6no1455673qtc.1!news-out.google.com!196ni1714itl.0!nntp.google.com!u14no8180ita.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Tue, 1 Aug 2017 19:23:19 -0700 (PDT)
In-Reply-To: <olp11l$ub6$1@franka.jacob-sparre.dk>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com;
 posting-host=2601:191:8303:2100:ad21:fae8:74f1:4499;
 posting-account=fdRd8woAAADTIlxCu9FgvDrUK4wPzvy3
NNTP-Posting-Host: 2601:191:8303:2100:ad21:fae8:74f1:4499
References: <9e51f87c-3b54-4d09-b9ca-e3c6a6e8940a@googlegroups.com>
 <49d02dda-8f1b-4005-a164-7af34e1993cc@googlegroups.com>
 <ad30cdd8-c444-481f-9353-c16d91542e06@googlegroups.com>
 <olp11l$ub6$1@franka.jacob-sparre.dk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <914ae4df-cc52-4e6e-b342-584bcac98e88@googlegroups.com>
Subject: Re: Real tasking problems with Ada.
From: Robert Eachus <rieachus@comcast.net>
Injection-Date: Wed, 02 Aug 2017 02:23:20 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Xref: news.eternal-september.org comp.lang.ada:47552
Date: 2017-08-01T19:23:19-07:00
List-Id: <comp.lang.ada>

On Tuesday, August 1, 2017 at 12:45:43 AM UTC-4, Randy Brukardt wrote:

> Yes, really. Use a discriminant of type CPU, and use that in the aspect.
> That's an age-old technique, and indeed is the major reason that tasks have
> discriminants. You then can allocate the tasks (which would be my
> suggestion), or you could create the entire set in an aggregate (assuming
> you have Ada 2020).

Sorry, you missed what all the shouting was about. ;-)  On the processor I am using (an AMD FX-6300 Vishera), running on all CPU cores causes contention for the floating-point units.  So for efficiency I have to run on one core from each pair of CPU cores.  Currently my program uses CPUs 2, 4, and 6.  Creating an array indexed by CPU doesn't work.  If we had Algol-style indexing--but I am certainly not going to advocate that.  This is not a problem unique to one family of CPUs.  I'm upgrading to an AMD Ryzen 7, which will have 8 cores and 16 threads.  It is going, IMNSHO, to require the same thing.  Same for Intel processors with Hyperthreading enabled.
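
For illustration, here is a minimal sketch of the discriminant technique applied to a sparse set of cores. It is not the program discussed in this thread: the names Pinned_Workers, Core_Of, and Worker are hypothetical, and the CPU numbers are simply the 2, 4, and 6 mentioned above. The table is indexed by worker number, not by CPU, so the list of cores can have gaps.

```ada
with Ada.Text_IO;
with System.Multiprocessors; use System.Multiprocessors;

procedure Pinned_Workers is

   --  Hypothetical table: one core from each pair, indexed by worker
   --  number.  Assumes the machine has at least 6 CPUs.
   Core_Of : constant array (1 .. 3) of CPU := (2, 4, 6);

   --  The discriminant feeds the CPU aspect, so each task object can be
   --  bound to a different core when it is created.
   task type Worker (Core : CPU) with CPU => Core;

   task body Worker is
   begin
      Ada.Text_IO.Put_Line ("Worker bound to CPU" & CPU'Image (Core));
   end Worker;

   type Worker_Access is access Worker;

   Workers : array (Core_Of'Range) of Worker_Access;

begin
   --  Allocate one task per table entry; the procedure does not return
   --  until all of the allocated tasks have terminated.
   for I in Core_Of'Range loop
      Workers (I) := new Worker (Core => Core_Of (I));
   end loop;
end Pinned_Workers;
```

Lines printed by different tasks may interleave, since nothing here serializes access to standard output.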

As for cache line sizes affecting code, yes, the garbage case was a bug in my code--or in GNAT, or in expectations.  (GNAT 2017 has Standard'Maximum_Alignment equal to 16.  At least the version I am using does.  I was trying to trick it into 64-byte (cache line) alignment by using computed Address clauses.  On AMD processors cache lines are 64 bytes, but usually two lines (128 bytes) are read if no other thread is waiting for a cache line.  Intel does it the other way around: 256-byte cache lines, and the CPU will only fetch 128 if there are other requests queued.)
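
As a rough sketch of what a computed Address clause for cache-line alignment can look like (an assumption about the general approach, not the code from the program described here), one can over-allocate raw storage and overlay the object at the first 64-byte boundary inside it. The names Aligned_Buffer, Block, Raw, Misalign, Skip, and Data are hypothetical, and the 64-byte line size is taken from the text above.

```ada
with Ada.Text_IO;
with System;
with System.Storage_Elements; use System.Storage_Elements;

procedure Aligned_Buffer is

   Line_Size : constant := 64;  --  assumed cache line size in bytes

   type Block is array (1 .. 16) of Long_Float;

   --  Over-allocate by one cache line so that an aligned start address
   --  always fits inside the raw storage.
   Raw : Storage_Array
     (1 .. Block'Size / System.Storage_Unit + Line_Size);

   --  How far Raw'Address lies past the previous 64-byte boundary.
   Misalign : constant Storage_Offset := Raw'Address mod Line_Size;

   --  Distance from Raw'Address up to the next 64-byte boundary
   --  (zero if Raw already starts on one).
   Skip : constant Storage_Offset := (Line_Size - Misalign) mod Line_Size;

   --  Overlay the block on the aligned part of Raw.  Import suppresses
   --  default initialization of the overlaid object.
   Data : Block
     with Import, Convention => Ada, Address => Raw'Address + Skip;

begin
   Data := (others => 0.0);
   Ada.Text_IO.Put_Line
     ("Offset within cache line:"
      & Storage_Offset'Image (Data'Address mod Line_Size));
end Aligned_Buffer;
```

This kind of overlay bypasses the compiler's own alignment bookkeeping, which is why it needs the careful checking mentioned in the next paragraph.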

Yes, I knew what I was doing was messy and dangerous--or at least required careful checking.  My point was that if Maximum_Alignment were large enough, I wouldn't be going through the pain.  Was it worth it?  That is what this is all about.  I have a program which spreads a matrix multiplication over multiple processors--and compares the result with the single-processor case.  Right now, unfortunately, every time I get the tasking version faster, the non-tasking version improves as well.  (I'm currently at about 700 million multiplications per second, 1.4 GigaFLOPS ignoring the integer indexing.)  Now if I could get up to 2 GigaFLOPS on the tasking version I'd be happy.  Of course, once I move to the Ryzen 7 I expect much better numbers, and better still for video cards.
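
To show the general shape of such a comparison, the following is a minimal sketch, assuming a plain row-band split between two tasks. It is not the program described above and makes no attempt at performance. The names Banded_Multiply, Multiply_Rows, and Band, the 4x4 size, and the matrix contents are all hypothetical.

```ada
with Ada.Text_IO;

procedure Banded_Multiply is

   N : constant := 4;  --  tiny size, for illustration only

   type Matrix is array (1 .. N, 1 .. N) of Long_Float;

   A : constant Matrix := (others => (others => 1.0));
   B : constant Matrix := (others => (others => 2.0));

   Serial   : aliased Matrix := (others => (others => 0.0));
   Parallel : aliased Matrix := (others => (others => 0.0));

   --  Computes rows First .. Last of A * B into C.all.  C is an access
   --  parameter so that concurrent callers write into one shared matrix
   --  and never into a private copy.
   procedure Multiply_Rows
     (First, Last : Positive; C : not null access Matrix)
   is
      Sum : Long_Float;
   begin
      for I in First .. Last loop
         for J in 1 .. N loop
            Sum := 0.0;
            for K in 1 .. N loop
               Sum := Sum + A (I, K) * B (K, J);
            end loop;
            C (I, J) := Sum;
         end loop;
      end loop;
   end Multiply_Rows;

begin
   --  Single-processor reference result.
   Multiply_Rows (1, N, Serial'Access);

   declare
      --  Each task computes a disjoint band of rows of Parallel.
      task type Band (First, Last : Positive);

      task body Band is
      begin
         Multiply_Rows (First, Last, Parallel'Access);
      end Band;

      Upper : Band (1, N / 2);
      Lower : Band (N / 2 + 1, N);
   begin
      null;  --  the block does not exit until both tasks terminate
   end;

   if Serial = Parallel then
      Ada.Text_IO.Put_Line ("results match");
   else
      Ada.Text_IO.Put_Line ("results differ");
   end if;
end Banded_Multiply;
```

Each inner-loop step performs one multiplication and one addition, which is how 700 million multiplications per second correspond to 1.4 GigaFLOPS.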