comp.lang.ada
From: Bojan Bozovic <bozovic.bojan@gmail.com>
Subject: Re: GNAT can't vectorize Real_Matrix multiplication from Ada.Numerics.Real_Arrays. What a surprise!
Date: Mon, 19 Feb 2018 18:31:24 -0800 (PST)
Message-ID: <06efbe02-cdae-4fac-a17d-6d0c1be7848c@googlegroups.com>
In-Reply-To: <a4ccd7ab-59a7-486b-afd9-41737dbdb706@googlegroups.com>

On Monday, February 19, 2018 at 10:08:41 PM UTC+1, Robert Eachus wrote:
> On Sunday, February 18, 2018 at 4:48:42 PM UTC-5, Nasser M. Abbasi wrote:
> > On 2/18/2018 1:38 PM, Bojan Bozovic wrote:
> > 
> > If you are doing A*B by hand, then you are doing something
> > wrong. Almost all languages end up calling the BLAS
> > Fortran libraries for these operations. Your hand-written code and
> > the Ada code can't be faster.
> > 
> > http://www.netlib.org/blas/
> > 
> > Intel Math Kernel Library has all these.
> > 
> > https://en.wikipedia.org/wiki/Math_Kernel_Library
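
For anyone who hasn't done it: calling BLAS from Ada is a one-page job. Below is a minimal sketch against the C interface, cblas_dgemm, assuming some CBLAS implementation (reference BLAS, OpenBLAS, MKL, ...) is installed and linked. The 101/111 values are CblasRowMajor/CblasNoTrans from cblas.h; everything else (the Dgemm_Demo name, the 3x3 test data) is made up for the example.

with Interfaces.C; use Interfaces.C;
with Ada.Text_IO;

procedure Dgemm_Demo is
   N : constant := 3;

   type Matrix is array (1 .. N, 1 .. N) of double with Convention => C;

   Row_Major : constant int := 101;  --  CblasRowMajor in cblas.h
   No_Trans  : constant int := 111;  --  CblasNoTrans  in cblas.h

   --  Thin binding to cblas_dgemm: C := Alpha * A * B + Beta * C.
   --  With Convention C, GNAT passes the arrays as pointers to
   --  their first element, which is what CBLAS expects.
   procedure Cblas_Dgemm
     (Order, Trans_A, Trans_B : int;
      M, Cols, K              : int;
      Alpha                   : double;
      A                       : Matrix; Ld_A : int;
      B                       : Matrix; Ld_B : int;
      Beta                    : double;
      C                       : in out Matrix; Ld_C : int)
     with Import, Convention => C, External_Name => "cblas_dgemm";

   A : constant Matrix := (others => (others => 1.0));
   B : constant Matrix := (others => (others => 2.0));
   C : Matrix          := (others => (others => 0.0));
begin
   Cblas_Dgemm (Row_Major, No_Trans, No_Trans,
                N, N, N, 1.0, A, N, B, N, 0.0, C, N);
   Ada.Text_IO.Put_Line (double'Image (C (1, 1)));  --  6.0 for these inputs
end Dgemm_Demo;

Link with whatever provides CBLAS on your system (e.g. -largs -lcblas with gnatmake); that choice, not the binding, is where the which-library question comes in.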
> 
> For multiplying two small matrices, BLAS is overkill and will be slower.  If you have, say, 1000x1000 matrices, then you should be using BLAS.  But which BLAS?  Intel and AMD both have math libraries optimized for their CPUs.  However, I tend to use ATLAS.  ATLAS will build a BLAS targeted at your specific hardware.  This is not just about instruction set additions like SSE2.  It will tailor the implementation to your number of cores and supported threads, cache sizes, and memory speeds.  I've also used the Goto BLAS, but ATLAS, even though not perfect, builds all of BLAS3 using matrix multiplication and BLAS2, so that all operations slower than O(n^2) have their speed determined by matrix multiplication.  (It then tries multiple matrix multiplication codes with different parameters to find the fastest.)
> 
> Usually hardware vendor libraries catch up to and surpass ATLAS, but by then the hardware is obsolete. :-(   The other problem right now is that BLAS libraries are pretty dumb when it comes to multiprocessor systems.  I'm working on fixing that. ;-)
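
To illustrate what that hardware tailoring is about: below is a plain blocked (tiled) multiply over the Real_Matrix type from the subject line. It is not ATLAS code, just a sketch; Block is exactly the kind of machine-dependent parameter ATLAS times many candidates for at build time, and 64 here is an assumed value, not a recommendation.

with Ada.Numerics.Real_Arrays; use Ada.Numerics.Real_Arrays;
with Ada.Text_IO;

procedure Blocked_Demo is
   N     : constant := 256;
   Block : constant := 64;  --  assumed; ATLAS would tune this per machine

   A, B : constant Real_Matrix (1 .. N, 1 .. N) :=
     (others => (others => 1.0));
   C    : Real_Matrix (1 .. N, 1 .. N) := (others => (others => 0.0));
begin
   --  Walk the matrices tile by tile so the three Block x Block
   --  working sets stay cache-resident while the inner loops run.
   for I0 in 0 .. N / Block - 1 loop
      for K0 in 0 .. N / Block - 1 loop
         for J0 in 0 .. N / Block - 1 loop
            for I in I0 * Block + 1 .. I0 * Block + Block loop
               for K in K0 * Block + 1 .. K0 * Block + Block loop
                  for J in J0 * Block + 1 .. J0 * Block + Block loop
                     C (I, J) := C (I, J) + A (I, K) * B (K, J);
                  end loop;
               end loop;
            end loop;
         end loop;
      end loop;
   end loop;
   Ada.Text_IO.Put_Line (Float'Image (C (1, 1)));  --  256.0 for these inputs
end Blocked_Demo;

Timing this against the naive I-J-K loop, or against the predefined "*" on Real_Matrix, on your own machine shows why no single Block value can be baked into a portable library.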

I have looked at ATLAS; however, it can't spawn more threads than the number specified at compile time, so there's plenty of room to optimize there by spawning as many threads as the hardware supports at run time. Ada would do much better here than C, because you could write portable code without resorting to the ugly hacks of C, and use parallelism no matter what the underlying processor architecture is. That's my $0.02, worthless or not (and if you want to use assembler to "optimize" further, as people do in C, that can be done from any language; I suspect the Intel MKL and other vendor libraries do exactly that).
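
A minimal sketch of the run-time sizing I mean: the worker pool below is sized from System.Multiprocessors.Number_Of_CPUs (Ada 2012) when the program starts, not when the library was compiled. Demo code only, not a tuned kernel; each task owns a disjoint band of rows of C, so no locking is needed.

with System.Multiprocessors;
with Ada.Numerics.Real_Arrays; use Ada.Numerics.Real_Arrays;
with Ada.Text_IO;

procedure Parallel_Demo is
   N     : constant := 256;
   Cores : constant Positive :=
     Positive (System.Multiprocessors.Number_Of_CPUs);  --  found at run time

   A, B : constant Real_Matrix (1 .. N, 1 .. N) :=
     (others => (others => 1.0));
   C    : Real_Matrix (1 .. N, 1 .. N);

   --  Each worker fills rows First .. Last of C; the bands are
   --  disjoint, so the tasks never touch the same component.
   task type Worker (First, Last : Natural);

   task body Worker is
      Sum : Float;
   begin
      for I in First .. Last loop
         for J in 1 .. N loop
            Sum := 0.0;
            for K in 1 .. N loop
               Sum := Sum + A (I, K) * B (K, J);
            end loop;
            C (I, J) := Sum;
         end loop;
      end loop;
   end Worker;
begin
   declare
      type Worker_Access is access Worker;
      Workers : array (1 .. Cores) of Worker_Access;
   begin
      --  One task per CPU, each given a balanced band of rows.
      for W in Workers'Range loop
         Workers (W) := new Worker (First => (W - 1) * N / Cores + 1,
                                    Last  => W * N / Cores);
      end loop;
   end;  --  leaving the block waits for every allocated Worker
   Ada.Text_IO.Put_Line (Float'Image (C (1, 1)));  --  256.0 for these inputs
end Parallel_Demo;

The same code runs unchanged on a 2-core laptop or a 64-thread server, which is the portability C makes you work for.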


Thread overview: 13+ messages
2018-02-17 12:55 GNAT can't vectorize Real_Matrix multiplication from Ada.Numerics.Real_Arrays. What a surprise! Bojan Bozovic
2018-02-17 15:17 ` Bojan Bozovic
2018-02-17 15:49   ` Bojan Bozovic
2018-02-18  1:51 ` Bojan Bozovic
2018-02-18 10:35   ` Jeffrey R. Carter
2018-02-18 12:05     ` Bojan Bozovic
2018-02-18 13:31       ` Jeffrey R. Carter
2018-02-18 19:38         ` Bojan Bozovic
2018-02-18 21:48           ` Nasser M. Abbasi
2018-02-18 22:50             ` Bojan Bozovic
2018-02-19 21:08             ` Robert Eachus
2018-02-20  2:31               ` Bojan Bozovic [this message]
2018-02-26  6:58                 ` Robert Eachus