From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=unavailable autolearn_force=no version=3.4.4
X-Received: by 10.157.38.142 with SMTP id l14mr29187616otb.124.1494258969006;
        Mon, 08 May 2017 08:56:09 -0700 (PDT)
X-Received: by 10.157.17.23 with SMTP id g23mr1141882ote.4.1494258968970; Mon,
 08 May 2017 08:56:08 -0700 (PDT)
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!news.eternal-september.org!feeder.eternal-september.org!v102.xanadu-bbs.net!xanadu-bbs.net!news.glorb.com!c26no1479894itd.0!news-out.google.com!v18ni2059ita.0!nntp.google.com!c26no1479890itd.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Mon, 8 May 2017 08:56:08 -0700 (PDT)
In-Reply-To: <oep7mj$1g3f$1@gioia.aioe.org>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com;
 posting-host=2601:191:8303:2100:5985:2c17:9409:aa9c;
 posting-account=fdRd8woAAADTIlxCu9FgvDrUK4wPzvy3
NNTP-Posting-Host: 2601:191:8303:2100:5985:2c17:9409:aa9c
References: <e8d229c0-317e-4f5a-8d54-cf56c3aede7a@googlegroups.com>
 <0fc56bf7-1cfa-4776-9c47-a573db315c5f@googlegroups.com>
 <oep7mj$1g3f$1@gioia.aioe.org>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <e87ac3c3-8390-4d1d-979d-0bcc7e4d3dde@googlegroups.com>
Subject: Re: Portable memory barrier?
From: Robert Eachus <rieachus@comcast.net>
Injection-Date: Mon, 08 May 2017 15:56:08 +0000
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Xref: news.eternal-september.org comp.lang.ada:46699
Date: 2017-05-08T08:56:08-07:00
List-Id: <comp.lang.ada>

One sentence summary: The CPU will play lots of tricks behind your back, but in such a way that you can pretend your code works as you intended.  (Er, after you get the bugs out.)

On Monday, May 8, 2017 at 3:45:26 AM UTC-4, Dmitry A. Kazakov wrote:

> I don't believe either of the sequences can get reordered by the CPU,
> but it is only belief and common sense.

Silly rabbit, Trix are for kids.  Oops, silly programmer, common sense has nothing to do with modern CPUs.

If I didn't make it clear, modern CPUs analyze which pending instructions can be done NOW, and they select some subset of those operations. This is called OoO (out-of-order) scheduling.  In addition, the CPU has a register file with lots of extra registers. It uses register renaming to ensure that instructions "see" the right arguments, even if there are four or five different copies of the index in the register file.  This allows the CPU to put only one copy in the write pipe, and run the code faster than a naive counting of writes to memory suggests.  Finally, superscalar CPUs convert the instructions into micro-ops (the name depends on the CPU vendor), and then several of these micro-ops, belonging to multiple instructions, get executed all at once.
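
To make the register renaming idea above concrete, here is a toy model in C. It is only a sketch under loud assumptions: the names (RenameFile, rf_init, rf_read_tag, rf_write) and the sizes are made up for illustration, and real hardware also recycles physical registers, which this model does not attempt. The point it shows is that every write to an architectural register gets a fresh physical register, so an in-flight instruction that captured an older copy still "sees" the right argument.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of register renaming. Every name and size here is
   hypothetical; real CPUs also free and reuse physical registers. */
enum { NUM_ARCH = 4, NUM_PHYS = 16 };

typedef struct {
    int    phys[NUM_PHYS];   /* physical register file (the values)         */
    size_t map[NUM_ARCH];    /* architectural register -> current physical  */
    size_t next_free;        /* index of the next unused physical register  */
} RenameFile;

/* Start with each architectural register mapped to its own physical one. */
void rf_init(RenameFile *rf) {
    for (size_t i = 0; i < NUM_PHYS; i++) rf->phys[i] = 0;
    for (size_t r = 0; r < NUM_ARCH; r++) rf->map[r] = r;
    rf->next_free = NUM_ARCH;
}

/* A reader resolves the architectural name to a physical tag once and
   keeps that tag, so later writes to the same name cannot disturb it. */
size_t rf_read_tag(const RenameFile *rf, size_t arch) {
    assert(arch < NUM_ARCH);
    return rf->map[arch];
}

/* A write never overwrites in place: it allocates a fresh physical
   register and repoints the architectural name at it. */
size_t rf_write(RenameFile *rf, size_t arch, int value) {
    assert(arch < NUM_ARCH);
    assert(rf->next_free < NUM_PHYS);   /* toy model: no recycling */
    size_t tag = rf->next_free++;
    rf->phys[tag] = value;
    rf->map[arch] = tag;
    return tag;
}
```

If you write 10 to register 0, grab the tag with rf_read_tag, and then write 11 and 12 to register 0, the old tag still holds 10 while the current mapping holds 12. That is three copies of the "index" alive at once, which is the situation described above.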

The CPU tries hard to eliminate reads and writes to main memory, which are slow.  By putting the memory controller on the CPU, and having it look into the cache no matter the source of the request, even volatile memory can be virtualized.  (There are a few pages in memory which can't be virtualized, because they contain things like the real-time clock.  But that is a detail known to the CPU.)

All of this is done with state not associated with any real clock, but with the instructions as they execute.  If you think of a program counter as a pointer to the next instruction to be executed, and so as defining a state, then modern CPUs have multiple such states in play at the same time--and out of order.

In all this, the CPU arranges things so that all of the outputs occur in the expected order. If you halt the CPU, then flush the cache, THEN there will be one point in the programmer's view of the code that corresponds to the current state--in other words, debuggers work.  (Assuming you look at the code as generated, or turn off optimization in the compiler.)

As far as this issue is concerned, the CPU implements the ISA, so you get the effects you expect from a program.  All of this is on an "as if" basis, and in fact the CPU is seldom in a reproducible state.

Is any of this important to a high-level language programmer?  Not really, unless they are working in hard real time. (Which I did.)  If you program in assembler, or more likely work on a compiler back-end, knowing which instructions fit together is important.  Even more important is coding so that an instruction that references a new cache line in memory is followed by as many instructions as possible that don't use that value.  Sometimes you run into cases where you want to take advantage of write combining or some other trick involving the write pipe--but today, smart compiler optimization is about accesses to main memory and not much else.