Re: Real tasking problems with Ada.

comp.lang.ada

From: Robert Eachus <rieachus@comcast.net>
Subject: Re: Real tasking problems with Ada.
Date: Thu, 3 Aug 2017 13:03:34 -0700 (PDT)
Message-ID: <e8c5f9fe-ec72-47f8-8b3c-57eacab8cc08@googlegroups.com>
In-Reply-To: <olu65h$pcv$1@franka.jacob-sparre.dk>

Randy Brukardt wrote:

> The maximum alignment an implementation supports comes directly from the 
> linker in use. Since most Ada implementations use system linkers that are 
> completely out of their control, it's not really possible to support 
> anything larger. (One can do it dynamically with a significant waste of 
> memory, but that is not the sort of solution that is wanted on an embedded 
> system.)

Now I'm really confused.  I'll have to do some experimenting.  If I have two locations that are in the same cache line, are protected by read-modify-write (RMW) instructions, and are written by tasks on different CPUs, caching automagically provides safety:  the read part of the RMW moves the line into the local CPU's L1 data cache.  But if the tasks are on the same physical CPU but on different logical CPUs (due to Hyperthreading, multithreading, or the like), both threads share that L1 cache, so the hardware logic itself has to protect against a read or write from the other thread.  This probably means that the hardware, when analyzing the RMW instruction, insists not only that all prior writes have been written, but also that no new instructions from the other Hyperthreading thread on the same CPU be executed until the cycle is finished.  So the hardware logic prevents writes to the same cache line by protecting against writes to all cache lines.

By the way, yes, I am working on generating fast code for supercomputers.  I'd like to do it in Ada rather than assembler.  They use the same CPUs as desktop computers or, more often, servers, so I can do my experimenting on my desktop.  The frustrating thing is that right now, the Ada code for a single thread/task is faster than the assembler.  The situation is very much reversed for the multitasking case.

In matrix multiplication code (A*B=C) I have individual tasks computing slices of C, so that they never need to 'worry' about such issues and RMWs are not needed--as long as the slices are multiples of the cache line length.  C(X,C'Last(2)) is followed immediately in memory by C(X+1,C'First(2)), so the only potential problem is if C'Length(1)*C'Length(2) is not a multiple of the cache line length.  I deal with that by doing one special multiply in the main task, not the worker tasks.
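
The slicing arithmetic can be sketched roughly as follows.  This is only an illustration, not the actual code: the 64-byte line, the 8-byte element, the matrix dimensions and the worker count are all made-up values, and it assumes that C is stored row-major and that its first element sits on a cache line boundary.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Slice_Demo is
   --  Assumptions for illustration only: 64-byte cache lines and
   --  8-byte elements (e.g. Long_Float), so 8 elements per line.
   Line_Bytes : constant := 64;
   Elem_Bytes : constant := 8;
   Line_Elems : constant := Line_Bytes / Elem_Bytes;

   Rows    : constant := 10;  --  stands in for C'Length(1), hypothetical
   Cols    : constant := 15;  --  stands in for C'Length(2), hypothetical
   Workers : constant := 4;   --  number of worker tasks, hypothetical

   Total : constant := Rows * Cols;           --  elements in C
   Lines : constant := Total / Line_Elems;    --  whole cache lines in C
   Tail  : constant := Total mod Line_Elems;  --  left for the main task

   First : Natural := 0;  --  flat, zero-based offset of the next slice
begin
   for W in 1 .. Workers loop
      declare
         --  Spread the whole lines as evenly as possible: the first
         --  (Lines mod Workers) workers get one extra line each.
         My_Lines : constant Natural :=
           Lines / Workers + (if W <= Lines mod Workers then 1 else 0);
         Count    : constant Natural := My_Lines * Line_Elems;
      begin
         Put_Line ("worker" & Integer'Image (W) & ": offsets"
                   & Natural'Image (First) & " .."
                   & Integer'Image (First + Count - 1));
         First := First + Count;
      end;
   end loop;

   --  Whatever does not fill a whole cache line stays with the main task.
   Put_Line ("main task:" & Natural'Image (Tail) & " trailing element(s)");
end Slice_Demo;
```

With these made-up sizes there are 150 elements and 18 whole cache lines, so the four workers get 40, 40, 32 and 32 elements and the main task keeps the last 6.  Every worker slice starts and ends on a line boundary, so no two workers ever write to the same cache line.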

I guess I can deal with this by using an alignment of 256 and adding a note that it needs to be changed if cache line lengths get longer.


Thread overview: 23+ messages
2017-07-25 23:19 Real tasking problems with Ada Robert Eachus
2017-07-26 19:42 ` sbelmont700
2017-07-27  2:00   ` Robert Eachus
2017-08-01  4:45     ` Randy Brukardt
2017-08-02  2:23       ` Robert Eachus
2017-08-03  3:43         ` Randy Brukardt
2017-08-03 20:03           ` Robert Eachus [this message]
2017-08-03 23:10             ` Luke A. Guest
2017-08-04 23:22             ` Randy Brukardt
2017-08-22  5:10               ` Robert Eachus
2017-08-01  4:41 ` Randy Brukardt
2017-08-02  3:44   ` Robert Eachus
2017-08-02 14:39     ` Lucretia
2017-08-03  0:57       ` Robert Eachus
2017-08-03  5:43         ` Randy Brukardt
2017-08-03  1:33       ` Wesley Pan
2017-08-03  4:30       ` Randy Brukardt
2017-08-03  4:16     ` Randy Brukardt
2017-08-03  5:05       ` Niklas Holsti
2017-08-04 23:11         ` Randy Brukardt
2017-08-05  7:01           ` Niklas Holsti
2017-08-03  8:03     ` Simon Wright
2017-08-04 23:16       ` Randy Brukardt
