comp.lang.ada
 help / color / mirror / Atom feed
From: Robert Eachus <rieachus@comcast.net>
Subject: Re: Portable memory barrier?
Date: Mon, 8 May 2017 21:02:44 -0700 (PDT)
Date: 2017-05-08T21:02:44-07:00	[thread overview]
Message-ID: <c005f57d-d731-4615-8228-bb7f9792fb27@googlegroups.com> (raw)
In-Reply-To: <8a968aae-79e4-421b-ba4b-e0a9a33ce0db@googlegroups.com>

On Monday, May 8, 2017 at 8:30:02 PM UTC-4, Jere wrote:

First, oops!  I missed that the program computes Write_Index + 1 twice. The CPU won't, but th code sequence should be:

r1 := Write_Index; 
... 
r1 := r1 + 1;
r2 := r1 x Element'Length; 
r3 := r2 + FIFO'First; 
r4 := Element; 
(r3) := r4; 
(Write_Index'Address) := r1; 

(Parentheses write to the address.)
 
> What prevents the last 3 lines from being reordered like this:
> r1 := r1 + 1;
> (Write_Index'Address) := r1;
> (r3) := r4;

The rule against reordering of writes.  I think your question still stands, but that is the answer, moves to registers are reads, moves to memory are writes.  Reads can be moved before writes to other locations, but order within writes has to be maintained.

> I may have misunderstood your post, but isn't it free to make this adjustment 
> to the instructions?  There is no dependence on r3 or r4 for the 
> Write_Index'Address part or the r1 increment.  I ask because this very issue 
> has (allegedly) been seen implementing lock free designs in other languages 
> like C/C++.
> 
> If it is, then the risk is that after
> (Write_Index'Address) := r1;
> 
> a task context switch or interrupt occurs and now the index is incremented
> before the data was added (and now pointing to junk).

No. Or at least it should not happen.  A single register set exists when the interrupt is allow to occur.  This may have the (correct) value of Write_Index 
in memory (or not) or in r1 (or not).  If r1 has the old or new value, not a problem.  If the write to FIFO occurred, the write to Write_Index must have happened first (on an x86 or AMD64 CPU at least).  In other words, the write pipe will get emptied before the cache gets flushed.  Note that the cache flush does not occur until after the interrupt.*  On other (usually RISC) architectures, the interrupt service routine may have to execute a "fence" instruction to force the CPU to a consistent state. (x86 does have three fence instructions: I found an interesting writeup  that shows that they are very occasionally needed: https://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/) Fences are more often needed on RISC CPUs, since the ordering rules are more relaxed.  That may be where the problem you noted occurred.  Especially in embedded controllers, there can be interrupt service routines not supplied by the OS.  Especially if the other half of this stream is implemented in the interrupt, the fence instruction is necessary.

In general do not use fences in Ada code, use protected operations.  At a minimum the lock needed for a cleverly written protected object can be replaced by a read-modify-write (RMW) instruction, and the CPU should execute it in under a nanosecond.  Code cannot be moved past RMW operations, and they are much faster than fences.

Why isn't a fence needed here?  When it comes to shared variables, one thread only writes.  But the example is flawed, in that we do have an access that needs to be protected.  To make the buffer a circular buffer you need to add a test and adjustment.  For simplicity? change:

 FIFO (Write_Index + 1) := Element; 
 Write_Index := Write_Index + 1;

>  to:

Temp := Write_Index;
if Temp = Max 
then Temp := 1;
else Temp := Temp + 1;
end if;
FIFO (Temp) := Element; 
Write_Index := Temp;

Making Temp explicit is a belt and suspenders operation.  The CPU will generate the same set of micro-ops, or at least the same set as once we make the buffer circular.  But now it should be clear that the two writes cannot be reordered. Note that the writes to Temp can be replaced by writes to R1 and are reads for the x86 rules.




...and the same for Read_Index. 

Simpler?  Huh?  What are we doing here?
The same thing we did in the pseudocode, but 
> I'm probably gonna have to reread it a few times.

I could probably teach a graduate course on this, and not include anything not touched on here.  And, yes I did teach a graduate course in compiler construction several times back before this revolution in how CPUs are put together.  If I was teaching that course now, I would either limit it to the front end and a bit of optimization, or stretch it into two semesters.  I won't say that this stuff makes quantum mechanics look easy, but they are at the same level of complexity.  (In fact you may have seen that Elon Musk among others believe that there is a good chance we are living in a simulation. The similarities are eerie, and the as ifs pop up in the same places.)

*  Especially if you are running on an AMD CPU in a virtual processor.  As long as the CPU stays in the hypervisor state, the cache doesn't need to be flushed.  That's a detail and a huge bite of extra complexity here.  It allows the hypervisor to service a page fault without flushing the cache.  Of course, if there is a context switch, even to the OS on the virtual machine, the cache flush is needed. Usually if the page is not present in memory the cache flush (and a context switch) happens, then the hypervisor turns control over to the "real" OS.  (Disks even SSDs are slow.) Otherwise the hypervisor patches up the page tables and lets the (faulting) instruction be retried never touching the OS.


  reply	other threads:[~2017-05-09  4:02 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-06  2:23 Portable memory barrier? Jere
2017-05-06  8:47 ` Jeffrey R. Carter
2017-05-06 14:17   ` Jere
2017-05-06 19:08     ` Dennis Lee Bieber
2017-05-06 19:26     ` Jeffrey R. Carter
2017-05-06 19:41     ` Jeffrey R. Carter
2017-05-06 20:42       ` Niklas Holsti
2017-05-09 19:49     ` Randy Brukardt
2017-05-09 22:07       ` Jere
2017-05-11  1:14         ` Randy Brukardt
2017-05-10 18:28       ` Shark8
2017-05-11  1:17         ` Randy Brukardt
2017-05-11 16:23           ` Jeffrey R. Carter
2017-05-07 20:18 ` Robert Eachus
2017-05-08  7:45   ` Dmitry A. Kazakov
2017-05-08 15:56     ` Robert Eachus
2017-05-08 16:22       ` Dmitry A. Kazakov
2017-05-08 18:39         ` Robert Eachus
2017-05-08 19:18         ` Robert Eachus
2017-05-08 21:09           ` Dmitry A. Kazakov
2017-05-08 23:24             ` Robert Eachus
2017-05-09  0:30               ` Jere
2017-05-09  4:02                 ` Robert Eachus [this message]
2017-05-09  4:32                 ` Robert Eachus
2017-05-09  4:44                   ` Robert Eachus
2017-05-09 22:26                   ` Jere
2017-05-09 20:01                 ` Randy Brukardt
2017-05-09 19:57               ` Randy Brukardt
2017-05-10  0:51                 ` Jere
2017-05-10  5:25                   ` J-P. Rosen
2017-05-10 22:56                     ` Jere
2017-05-11  7:36                       ` Dmitry A. Kazakov
2017-05-13 20:25                         ` Jere
2017-05-10  7:13                   ` Dmitry A. Kazakov
2017-05-10 16:45                     ` Robert Eachus
2017-05-10 17:28                       ` Simon Wright
2017-05-10 23:21                     ` Jere
2017-05-11  0:47                       ` Randy Brukardt
2017-05-13 20:11                         ` Jere
2017-05-15 22:33                           ` Randy Brukardt
2017-05-10 23:30                     ` Jere
2017-05-11  0:38                     ` Randy Brukardt
2017-05-10 16:38                   ` Jeffrey R. Carter
2017-05-10 23:40                     ` Jere
2017-05-10 16:19                 ` Robert Eachus
2017-05-11  1:02                   ` Randy Brukardt
2017-05-11  1:51                     ` Robert Eachus
2017-05-15 22:45                       ` Randy Brukardt
2017-05-08 20:29         ` Niklas Holsti
2017-05-08 21:09           ` Dmitry A. Kazakov
2017-05-09  4:34             ` Niklas Holsti
2017-05-09  6:16               ` Niklas Holsti
2017-05-09  8:34                 ` Dmitry A. Kazakov
2017-05-09 20:18                 ` Randy Brukardt
2017-05-09 20:10           ` Randy Brukardt
2017-05-09  0:05         ` Jere
2017-05-09  8:26           ` Dmitry A. Kazakov
2017-05-09 19:53         ` Randy Brukardt
2017-05-09 20:27           ` Dmitry A. Kazakov
2017-05-11  0:35             ` Randy Brukardt
2017-05-11  8:24               ` Dmitry A. Kazakov
2017-05-15 22:53                 ` Randy Brukardt
2017-05-18 17:44                   ` Dmitry A. Kazakov
2017-05-18 21:01                     ` Randy Brukardt
2017-05-19  7:54                       ` Dmitry A. Kazakov
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox