From: Bojan Bozovic <bozovic.bojan@gmail.com>
Subject: Re: GNAT can't vectorize Real_Matrix multiplication from Ada.Numerics.Real_Arrays. What a surprise!
Date: Sat, 17 Feb 2018 07:49:15 -0800 (PST)
Message-ID: <9727c621-19d5-469e-90a9-07fa2b6d545a@googlegroups.com>
In-Reply-To: <529428bd-147a-41fa-84a7-575080588349@googlegroups.com>
On Saturday, February 17, 2018 at 4:17:41 PM UTC+1, Bojan Bozovic wrote:
> On Saturday, February 17, 2018 at 1:55:49 PM UTC+1, Bojan Bozovic wrote:
> > -- code
> >
> > with Ada.Calendar;
> > with Ada.Text_IO;
> > with Ada.Numerics.Real_Arrays;
> > use Ada.Calendar;
> > use Ada.Text_IO;
> > use Ada.Numerics.Real_Arrays;
> >
> > procedure Matrix_Mul is
> >
> >    package F_IO is new Ada.Text_IO.Fixed_IO (Day_Duration);
> >    use F_IO;
> >
> >    Start_Time, End_Time : Time;
> >
> >    procedure Put (X : Real_Matrix) is
> >    begin
> >       for I in X'Range (1) loop
> >          for J in X'Range (2) loop
> >             Put (Float'Image (Float (X (I, J))));
> >          end loop;
> >          New_Line;
> >       end loop;
> >    end Put;
> >
> >    Matrix_A, Matrix_B, Result : Real_Matrix (1 .. 4, 1 .. 4);
> >    Elapsed_Time : Duration;
> >    Sum : Float;
> > begin
> >    Matrix_A :=
> >      ((1.0, 1.0, 1.0, 1.0),
> >       (2.0, 2.0, 2.0, 2.0),
> >       (3.0, 3.0, 3.0, 3.0),
> >       (4.0, 4.0, 4.0, 4.0));
> >    Matrix_B :=
> >      ((16.0, 15.0, 14.0, 13.0),
> >       (12.0, 11.0, 10.0, 9.0),
> >       (8.0, 7.0, 6.0, 5.0),
> >       (4.0, 3.0, 2.0, 1.0));
> >
> >    Start_Time := Clock;
> >    for Iteration in 1 .. 10_000_000 loop
> >       Result := Matrix_A * Matrix_B;
> >    end loop;
> >    End_Time := Clock;
> >    Elapsed_Time := End_Time - Start_Time;
> >    Put (Result);
> >    New_Line;
> >    Put ("Elapsed Time is ");
> >    Put (Elapsed_Time);
> >    New_Line;
> >    Start_Time := Clock;
> >    for Iteration in 1 .. 10_000_000 loop
> >       for I in Matrix_A'Range (1) loop
> >          for J in Matrix_A'Range (2) loop
> >             Sum := 0.0;
> >             for K in Matrix_A'Range (2) loop
> >                Sum := Sum + Matrix_A (I, K) * Matrix_B (K, J);
> >             end loop;
> >             Result (I, J) := Sum;
> >          end loop;
> >       end loop;
> >    end loop;
> >    End_Time := Clock;
> >    Elapsed_Time := End_Time - Start_Time;
> >    Put (Result);
> >    New_Line;
> >    Put ("Elapsed time is ");
> >    Put (Elapsed_Time);
> >    New_Line;
> > end Matrix_Mul;
> > -- end code
> >
> > Results: FSF GNAT 7.2.0 x64
> > C:\Users\Bojan\Documents>gnatmake -O3 -fopt-info -march=skylake matrix_mul.adb -largs -s
> > gcc -c -O3 -fopt-info -march=skylake matrix_mul.adb
> > matrix_mul.adb:56:41: note: loop turned into non-loop; it never loops.
> > matrix_mul.adb:56:41: note: loop with 4 iterations completely unrolled
> > matrix_mul.adb:54:38: note: loop turned into non-loop; it never loops.
> > matrix_mul.adb:54:38: note: loop with 4 iterations completely unrolled
> > matrix_mul.adb:53:35: note: loop turned into non-loop; it never loops.
> > matrix_mul.adb:53:35: note: loop with 4 iterations completely unrolled
> > matrix_mul.adb:40:15: note: basic block vectorized
> > matrix_mul.adb:42:26: note: basic block vectorized
> > matrix_mul.adb:18:31: note: basic block vectorized
> > matrix_mul.adb:49:9: note: basic block vectorized
> > C:/MSYS64/MINGW64/lib/gcc/x86_64-w64-mingw32/7.2.0/adainclude/a-tifiio.adb:706:10: note: basic block vectorized
> > matrix_mul.adb:64:17: note: basic block vectorized
> > matrix_mul.adb:18:31: note: basic block vectorized
> > matrix_mul.adb:68:9: note: basic block vectorized
> > C:/MSYS64/MINGW64/lib/gcc/x86_64-w64-mingw32/7.2.0/adainclude/a-tifiio.adb:706:10: note: basic block vectorized
> > gnatbind -x matrix_mul.ali
> > gnatlink matrix_mul.ali -O3 -fopt-info -march=skylake -s
> >
> > C:\Users\Bojan\Documents>matrix_mul
> > 4.00000E+01 3.60000E+01 3.20000E+01 2.80000E+01
> > 8.00000E+01 7.20000E+01 6.40000E+01 5.60000E+01
> > 1.20000E+02 1.08000E+02 9.60000E+01 8.40000E+01
> > 1.60000E+02 1.44000E+02 1.28000E+02 1.12000E+02
> >
> > Elapsed Time is 1.338206667
> > 4.00000E+01 3.60000E+01 3.20000E+01 2.80000E+01
> > 8.00000E+01 7.20000E+01 6.40000E+01 5.60000E+01
> > 1.20000E+02 1.08000E+02 9.60000E+01 8.40000E+01
> > 1.60000E+02 1.44000E+02 1.28000E+02 1.12000E+02
> >
> > Elapsed time is 0.000000445
> >
> > C:\Users\Bojan\Documents>set path=C:\GNAT\2017\BIN;%path%
> >
> > Results: GNAT GPL 2017 from AdaCore.
> >
> > C:\Users\Bojan\Documents>gnatmake -O3 -fopt-info -mavx2 matrix_mul.adb -largs -s
> > gcc -c -O3 -fopt-info -mavx2 matrix_mul.adb
> > matrix_mul.adb:56:41: note: loop turned into non-loop; it never loops.
> > matrix_mul.adb:56:41: note: loop with 4 iterations completely unrolled
> > matrix_mul.adb:54:38: note: loop turned into non-loop; it never loops.
> > matrix_mul.adb:54:38: note: loop with 4 iterations completely unrolled
> > matrix_mul.adb:53:35: note: loop turned into non-loop; it never loops.
> > matrix_mul.adb:53:35: note: loop with 4 iterations completely unrolled
> > matrix_mul.adb:40:15: note: basic block vectorized
> > matrix_mul.adb:68:9: note: basic block vectorized
> > gnatbind -x matrix_mul.ali
> > gnatlink matrix_mul.ali -O3 -fopt-info -mavx2 -s
> >
> > C:\Users\Bojan\Documents>matrix_mul
> > 4.00000E+01 3.60000E+01 3.20000E+01 2.80000E+01
> > 8.00000E+01 7.20000E+01 6.40000E+01 5.60000E+01
> > 1.20000E+02 1.08000E+02 9.60000E+01 8.40000E+01
> > 1.60000E+02 1.44000E+02 1.28000E+02 1.12000E+02
> >
> > Elapsed Time is 2.145337334
> > 4.00000E+01 3.60000E+01 3.20000E+01 2.80000E+01
> > 8.00000E+01 7.20000E+01 6.40000E+01 5.60000E+01
> > 1.20000E+02 1.08000E+02 9.60000E+01 8.40000E+01
> > 1.60000E+02 1.44000E+02 1.28000E+02 1.12000E+02
> >
> > Elapsed time is 0.000000444
> >
> > C:\Users\Bojan\Documents>gcc --version
> > gcc (GCC) 6.3.1 20170510 (for GNAT GPL 2017 20170515)
> > Copyright (C) 2016 Free Software Foundation, Inc.
> > This is free software; see the source for copying conditions.
> > See your AdaCore support agreement for details of warranty and support.
> > If you do not have a current support agreement, then there is absolutely
> > no warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
> > PURPOSE.
> >
> > Should I submit this as a bug to AdaCore? My computer has an Intel Core i3-6100U (Skylake, AVX2). Please try to reproduce.
>
> The compiler is smart enough not to iterate 10 million times over constant values, but there is still an optimization problem.
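Dividing the elapsed times printed above by the 10,000,000 iterations shows why the second timing cannot reflect real work. A 4x4 product needs 4^3 = 64 multiplications and 16 x 3 = 48 additions, yet the hand-written loop reports a per-iteration time far below a single clock cycle of any current CPU. This is only a rough check based on the figures in the thread:

```latex
\begin{aligned}
t_{\text{Real\_Arrays, FSF 7.2.0}} &= \frac{1.338206667\,\mathrm{s}}{10^{7}} \approx 1.34\times10^{-7}\,\mathrm{s} \approx 134\,\mathrm{ns} \\
t_{\text{Real\_Arrays, GPL 2017}}  &= \frac{2.145337334\,\mathrm{s}}{10^{7}} \approx 2.15\times10^{-7}\,\mathrm{s} \approx 215\,\mathrm{ns} \\
t_{\text{hand loop}}               &= \frac{0.000000445\,\mathrm{s}}{10^{7}} = 4.45\times10^{-14}\,\mathrm{s}
\end{aligned}
```

The GPL 2017 hand-loop figure (0.000000444 s) gives essentially the same result. So the hand-written product was presumably computed once, or at compile time, rather than 10 million times, and the two timings are not directly comparable.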
I don't know how to use pragma Volatile to force the iterations. Even without the 10 million iterations, with a single multiplication the results (from AdaCore GNAT GPL 2017) are:
C:\Users\Bojan\Documents>matrix_mul
4.00000E+01 3.60000E+01 3.20000E+01 2.80000E+01
8.00000E+01 7.20000E+01 6.40000E+01 5.60000E+01
1.20000E+02 1.08000E+02 9.60000E+01 8.40000E+01
1.60000E+02 1.44000E+02 1.28000E+02 1.12000E+02
Elapsed Time is 0.000009333
4.00000E+01 3.60000E+01 3.20000E+01 2.80000E+01
8.00000E+01 7.20000E+01 6.40000E+01 5.60000E+01
1.20000E+02 1.08000E+02 9.60000E+01 8.40000E+01
1.60000E+02 1.44000E+02 1.28000E+02 1.12000E+02
Elapsed time is 0.000000889
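On the pragma Volatile question, here is a minimal sketch, not tested against either compiler above, of one way to keep GCC from collapsing the timing loops. It assumes this approach: the input matrix (hypothetically named Input_A) is declared volatile and copied into an ordinary matrix Work at the start of every iteration, so the compiler cannot treat the inputs as loop-invariant. Every element of every product is then added to a Long_Float checksum that is printed, so no iteration's result is dead. This matters because Ada.Numerics.Real_Arrays is a pure unit, and the RM permits omitting calls on subprograms of a declared-pure library unit when their results are not needed. All names in the sketch (Matrix_Mul_Volatile, Input_A, Work, Checksum, Accumulate, Report) are illustrative. The copy adds a small, equal cost to both variants; reads of the volatile object itself are not vectorized, but the multiplications on Work remain eligible.

```ada
with Ada.Calendar;             use Ada.Calendar;
with Ada.Text_IO;              use Ada.Text_IO;
with Ada.Numerics.Real_Arrays; use Ada.Numerics.Real_Arrays;

procedure Matrix_Mul_Volatile is
   subtype Matrix_4 is Real_Matrix (1 .. 4, 1 .. 4);

   --  Volatile: each read of Input_A must really go to memory, so the
   --  compiler may not assume it holds the same values on every iteration.
   Input_A : Matrix_4 :=
     ((1.0, 1.0, 1.0, 1.0),
      (2.0, 2.0, 2.0, 2.0),
      (3.0, 3.0, 3.0, 3.0),
      (4.0, 4.0, 4.0, 4.0));
   pragma Volatile (Input_A);

   Matrix_B : constant Matrix_4 :=
     ((16.0, 15.0, 14.0, 13.0),
      (12.0, 11.0, 10.0, 9.0),
      (8.0, 7.0, 6.0, 5.0),
      (4.0, 3.0, 2.0, 1.0));

   Iterations   : constant := 10_000_000;
   Work, Result : Matrix_4;     --  ordinary (non-volatile) working copies
   Checksum     : Long_Float;   --  consumes every result element
   Start_Time   : Time;
   Sum          : Float;

   --  Add all 16 elements of Result to Checksum, so that no product is dead.
   procedure Accumulate is
   begin
      for I in Result'Range (1) loop
         for J in Result'Range (2) loop
            Checksum := Checksum + Long_Float (Result (I, J));
         end loop;
      end loop;
   end Accumulate;

   --  Print the checksum and the time elapsed since Start_Time.
   procedure Report (Label : String) is
   begin
      Put_Line (Label & ": checksum" & Long_Float'Image (Checksum)
                & ", elapsed" & Duration'Image (Clock - Start_Time));
   end Report;

begin
   --  1. The "*" operator from Ada.Numerics.Real_Arrays.
   Checksum   := 0.0;
   Start_Time := Clock;
   for Iteration in 1 .. Iterations loop
      Work   := Input_A;        --  fresh read of the volatile input
      Result := Work * Matrix_B;
      Accumulate;
   end loop;
   Report ("Real_Arrays ""*""");

   --  2. Hand-written triple loop on the same data.
   Checksum   := 0.0;
   Start_Time := Clock;
   for Iteration in 1 .. Iterations loop
      Work := Input_A;          --  fresh read of the volatile input
      for I in Work'Range (1) loop
         for J in Matrix_B'Range (2) loop
            Sum := 0.0;
            for K in Work'Range (2) loop
               Sum := Sum + Work (I, K) * Matrix_B (K, J);
            end loop;
            Result (I, J) := Sum;
         end loop;
      end loop;
      Accumulate;
   end loop;
   Report ("Hand-written loop");

   --  Same values as printed in the thread; checked only when built with -gnata.
   pragma Assert (Result (1, 1) = 40.0 and Result (4, 4) = 112.0);
end Matrix_Mul_Volatile;
```

It can be built the same way as the original, e.g. `gnatmake -O3 -march=skylake matrix_mul_volatile.adb`. Both lines should report the same checksum, since every product sums to 1360 and that is added 10,000,000 times; the elapsed times are then at least measuring 10 million real multiplications in each case.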
I'm submitting this as a bug to AdaCore.
Thread overview: 13+ messages
2018-02-17 12:55 GNAT can't vectorize Real_Matrix multiplication from Ada.Numerics.Real_Arrays. What a surprise! Bojan Bozovic
2018-02-17 15:17 ` Bojan Bozovic
2018-02-17 15:49 ` Bojan Bozovic [this message]
2018-02-18 1:51 ` Bojan Bozovic
2018-02-18 10:35 ` Jeffrey R. Carter
2018-02-18 12:05 ` Bojan Bozovic
2018-02-18 13:31 ` Jeffrey R. Carter
2018-02-18 19:38 ` Bojan Bozovic
2018-02-18 21:48 ` Nasser M. Abbasi
2018-02-18 22:50 ` Bojan Bozovic
2018-02-19 21:08 ` Robert Eachus
2018-02-20 2:31 ` Bojan Bozovic
2018-02-26 6:58 ` Robert Eachus