From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=unavailable autolearn_force=no version=3.4.4
X-Received: by 10.129.49.78 with SMTP id x75mr1885959ywx.212.1501790614874;
        Thu, 03 Aug 2017 13:03:34 -0700 (PDT)
X-Received: by 10.36.104.205 with SMTP id v196mr21539itb.3.1501790614832; Thu,
 03 Aug 2017 13:03:34 -0700 (PDT)
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!news.eternal-september.org!feeder.eternal-september.org!news.glorb.com!w51no345290qtc.0!news-out.google.com!196ni1157itl.0!nntp.google.com!u14no262764ita.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Thu, 3 Aug 2017 13:03:34 -0700 (PDT)
In-Reply-To: <olu65h$pcv$1@franka.jacob-sparre.dk>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com;
 posting-host=2601:191:8303:2100:4056:51e4:3100:616;
 posting-account=fdRd8woAAADTIlxCu9FgvDrUK4wPzvy3
NNTP-Posting-Host: 2601:191:8303:2100:4056:51e4:3100:616
References: <9e51f87c-3b54-4d09-b9ca-e3c6a6e8940a@googlegroups.com>
 <49d02dda-8f1b-4005-a164-7af34e1993cc@googlegroups.com>
 <ad30cdd8-c444-481f-9353-c16d91542e06@googlegroups.com>
 <olp11l$ub6$1@franka.jacob-sparre.dk>
 <914ae4df-cc52-4e6e-b342-584bcac98e88@googlegroups.com>
 <olu65h$pcv$1@franka.jacob-sparre.dk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e8c5f9fe-ec72-47f8-8b3c-57eacab8cc08@googlegroups.com>
Subject: Re: Real tasking problems with Ada.
From: Robert Eachus <rieachus@comcast.net>
Injection-Date: Thu, 03 Aug 2017 20:03:34 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Xref: news.eternal-september.org comp.lang.ada:47580
Date: 2017-08-03T13:03:34-07:00
List-Id: <comp.lang.ada>

On W> "Robert Eachus" <rieachus@comcast.net> wrote in message

> The maximum alignment an implementation supports comes directly from the
> linker in use. Since most Ada implementations use system linkers that are
> completely out of their control, it's not really possible to support
> anything larger. (One can do it dynamically with a significant waste of
> memory, but that is not the sort of solution that is wanted on an embedded
> system.)

Now I'm really confused. I'll have to do some experimenting. If I have two locations in the same cache line, each protected by use of read-modify-write (RMW) instructions and written by tasks on different CPUs, caching automagically provides safety: the read part moves the line to the local CPU's L1 data cache. But if the tasks are on the same physical CPU but different logical CPUs due to Hyperthreading, multithreading, or the like, the hardware logic has to protect against another read or write by itself. This probably means that the hardware, when analyzing the RMW instruction, insists not only that all prior writes have been written, but also that no new instructions from the other Hyperthreading thread on the same CPU be executed until the cycle is finished. So the hardware logic prevents writes to the same cache line by protecting against writes to all cache lines.

By the way, yes, I am working on generating fast code for supercomputers. I'd like to do it in Ada rather than assembler. They use the same CPUs as desktop computers or, more often, servers, so I can do my experimenting on my desktop. The frustrating thing is that right now the Ada code for a single thread/task is faster than the assembler. The situation is very much reversed for the multitasking case.

In matrix multiplication code (A*B=C) I have individual tasks computing slices of C such that they are never 'worried' about such issues, so RMWs are not needed, as long as the slices are multiples of the cache line length. C(X, C'Last(2)) is followed immediately by C(X+1, C'First(2)), so the only potential problem is if C'Length(1)*C'Length(2) is not a multiple of the cache line length. I deal with that by doing one special multiply in the main task, not the worker tasks.
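
A rough sketch of that slicing arithmetic, not my actual code. Every number in it is an assumption made up for illustration: a 64-byte cache line, 4-byte elements, a 100 x 37 result matrix and 4 worker tasks. It also assumes C is stored row by row and starts on a cache line boundary. Indices are flat and zero-based; flat index I corresponds to row I / Cols + 1 and column I mod Cols + 1.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Slice_Plan is
   Line_Bytes : constant := 64;   --  assumed cache line length in bytes
   Elem_Bytes : constant := 4;    --  assumed size of one element of C
   Line_Elems : constant := Line_Bytes / Elem_Bytes;

   Rows    : constant := 100;     --  hypothetical C'Length (1)
   Cols    : constant := 37;      --  hypothetical C'Length (2)
   Workers : constant := 4;       --  hypothetical number of worker tasks

   Total      : constant := Rows * Cols;            --  elements in C
   Full_Lines : constant := Total / Line_Elems;     --  whole cache lines
   Leftover   : constant := Total mod Line_Elems;   --  tail elements

   First : Natural := 0;   --  flat index of the next unassigned element
   Lines : Natural;        --  whole cache lines given to this worker
begin
   for W in 1 .. Workers loop
      --  Spread the whole cache lines as evenly as possible.
      Lines := Full_Lines / Workers;
      if W <= Full_Lines mod Workers then
         Lines := Lines + 1;
      end if;

      if Lines > 0 then
         Put_Line ("worker" & Integer'Image (W) & ":"
                   & Natural'Image (First) & " .."
                   & Natural'Image (First + Lines * Line_Elems - 1));
      end if;
      First := First + Lines * Line_Elems;
   end loop;

   --  Whatever does not fill a whole cache line stays in the main task.
   if Leftover > 0 then
      Put_Line ("main task:" & Natural'Image (First) & " .."
                & Natural'Image (Total - 1));
   end if;
end Slice_Plan;
```

Each worker's range is a whole number of cache lines, so no two workers ever write to the same line; only the short tail, if there is one, is left for the main task.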

I guess I can deal with this by using 256 and adding a note that it needs to be fixed if cache line lengths are longer.
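
Something along these lines is what I have in mind, as a sketch only. The package name Cache_Config and the helper Round_Up are hypothetical names invented for this example, and it deliberately avoids alignment clauses, since whether an alignment of 256 is accepted depends on the implementation and linker.

```ada
package Cache_Config is

   --  NOTE: 256 is an assumed upper bound on the cache line length in
   --  bytes, not a measured value.  Fix this constant if a target ever
   --  has longer cache lines.
   Cache_Line_Bytes : constant := 256;

   --  Hypothetical helper: round a byte count up to a whole number of
   --  cache lines, e.g. for padding a buffer or sizing a slice.
   function Round_Up (Bytes : Natural) return Natural is
     (((Bytes + Cache_Line_Bytes - 1) / Cache_Line_Bytes)
      * Cache_Line_Bytes);

end Cache_Config;
```

Keeping the number in one named constant means the note and the value live in the same place, and everything sized with Round_Up follows automatically when the constant changes.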