comp.lang.ada
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: GNAT 4.8 atomic access to 64-bit objects
Date: Sat, 16 Nov 2013 13:02:11 +0100
Message-ID: <1k2fx07tfbs4o$.1xxik0ffw6ud6.dlg@40tude.net>
In-Reply-To: 52874401$0$9514$9b4e6d93@newsspool1.arcor-online.net

On Sat, 16 Nov 2013 11:08:00 +0100, Georg Bauhaus wrote:

> On 15.11.13 22:33, Dmitry A. Kazakov wrote:
> 
>> Try this:
>>
>> with Interfaces;
>> with Ada.Unchecked_Conversion;
>> with Ada.Text_IO;
>>
>> procedure Test is
>>     type T is mod 2**64;
>>     type Atomic_T is new Interfaces.IEEE_Float_64;
>>    ...
>> end Test;
>>
>> The code generated looks horrific.
> 
> Maybe according to
> http://stackoverflow.com/questions/15843159/are-32-bit-software-builds-typically-64-bit-optimized
> simply wanting  movq  is not "mode compatible"; however,
> if there are MMX registers in the CPU you are targeting,
> the following may be a valid way to get  movq  nevertheless,
> albeit using a 64-bit signed integer.
> The program was translated on 32-bit GNU/Linux, using GNAT GPL 2012.
> It uses compiler intrinsics in ways adapted from GNAT.SSE.
> 
> with Ada.Text_IO;
> with GNAT.SSE;
> 
> procedure Atoms is
>     use GNAT.SSE;
> 
>     type m64 is array (0 .. 0) of Integer64;
>     for m64'Alignment use 8;
>     pragma Machine_Attribute (m64, "vector_type");
>     pragma Machine_Attribute (m64, "may_alias");
> 
>     function ia32_psllq (Left : m64; Right : m64) return m64;
>     pragma Import (Intrinsic, ia32_psllq, "__builtin_ia32_psllq");
> 
>     X : Integer64;
>     F : m64;
>     for X'Address use F'Address;
> begin
>     X := 123;
>     F := ia32_psllq (F, m64'(0 => 1));
>     Ada.Text_IO.Put_Line (Integer64'Image (X));  --  246
> end Atoms;

With the -mmmx switch, it does indeed use movq to load the register.

In the test example I wrote, an atomic load becomes:

   movq
   psllq
   movq  to another location (through Unchecked_Conversion)

Atomic store is the reverse.

Surprisingly (at least to me), this is about ten times faster than using
the floating-point trick. That is, one

   Load + Increment + Store

takes 16 ns using psllq versus 168 ns using IEEE 64, on an i7-2700K at 3.5 GHz.
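
For anyone wanting to repeat this kind of measurement, a timing loop around
Load + Increment + Store could be sketched roughly as below. This is not the
test program used for the figures above. The names Time_Loop, Count and Shared
are hypothetical, and the loop body is a plain volatile increment, which is an
assumption made only to keep the sketch portable. pragma Volatile does not make
the access atomic. To compare the two approaches, the marked line would be
replaced by the movq/psllq or IEEE 64 load and store under test.

```ada
with Ada.Real_Time; use Ada.Real_Time;
with Ada.Text_IO;

procedure Time_Loop is
   type T is mod 2**64;

   --  Hypothetical iteration count; adjust to get a stable reading.
   Count : constant := 10_000_000;

   --  Volatile keeps every load and store inside the loop.
   --  Assumption: this is only a baseline, it is NOT an atomic access.
   Shared : T := 0;
   pragma Volatile (Shared);

   Start   : Time;
   Elapsed : Duration;
begin
   Start := Clock;
   for I in 1 .. Count loop
      Shared := Shared + 1;  --  Load + Increment + Store under test
   end loop;
   Elapsed := To_Duration (Clock - Start);

   --  Average cost of one iteration, in nanoseconds.
   Ada.Text_IO.Put_Line
     ("ns per iteration:"
      & Long_Float'Image
          (Long_Float (Elapsed) * 1.0E9 / Long_Float (Count)));
end Time_Loop;
```

The result includes the loop overhead, so it is only meaningful when the same
loop is used for both variants and the numbers are compared against each other.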

It would be nice to get rid of the psllq, which is a waste.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
