From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=0.6 required=5.0 tests=BAYES_00,FREEMAIL_FROM,
	FROM_WORDY autolearn=no autolearn_force=no version=3.4.4
X-Google-Thread: 103376,7767a311e01e1cd
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,ASCII-7-bit
Path: 
 g2news2.google.com!news2.google.com!news.germany.com!feeder.news-service.com!tudelft.nl!txtfeed1.tudelft.nl!feeder4.cambrium.nl!feed.tweaknews.nl!not-for-mail
From: "Wiljan Derks" <Wiljan.Derks@zonnet.nl>
Newsgroups: comp.lang.ada
References: <1161341264.471057.252750@h48g2000cwc.googlegroups.com>
 <9Qb_g.111857$aJ.65708@attbi_s21> <434o04-7g7.ln1@newserver.thecreems.com>
 <4539ce34$1_2@news.bluewin.ch> <nrup04-5hj.ln1@newserver.thecreems.com>
 <453A532F.2070709@obry.net> <9kfq04-sgm.ln1@newserver.thecreems.com>
 <sj3r04-rlv.ln1@newserver.thecreems.com>
 <5vgs04-64f.ln1@newserver.thecreems.com>
 <453bc74e$0$19614$426a74cc@news.free.fr>
 <4jit04-0gq.ln1@newserver.thecreems.com>
 <jcot04-l2u.ln1@newserver.thecreems.com> <Ld__g.206207$FQ1.45478@attbi_s71>
 <c6nu04-m31.ln1@newserver.thecreems.com>
Subject: Re: GNAT compiler switches and optimization 
Date: Mon, 23 Oct 2006 21:52:59 +0200
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2900.2869
X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
X-RFC2646: Format=Flowed; Response
Message-ID: <453d1d36$0$25551$bf4948fe@news.tele2.nl>
Organization: Tele2
X-Trace: 
 DXC=<mNFeD[fekQK6N>Lad`?AY6`Y6aWje^YZ:@liE]:Mf1UBbQ>\>HV]ZZ:6?lG4KJ>N\
Xref: g2news2.google.com comp.lang.ada:7173
Date: 2006-10-23T21:52:59+02:00
List-Id: <comp.lang.ada>

> Jeffrey R. Carter wrote:
> I looked at the gcc FORTRAN matmul. Unless there some additional trickery 
> going on behind the scenes, it is not anything magical. It looks like a 
> matmul implemented in C with manual array subscripting logic (i.e. uses a 
> single dimensional array overlay)..
When going for a faster algorithm this might be usefull information.
Notice that this matmul uses an O(n^3) algorithm and it is possible to do 
this faster.
I am not sure, but maybe O(n^2 log(n)) is reachable, which is much faster.
It is easy to see that from the loops that the basic multiplications are 
done n times for the same pairs of numbers.
In a similar way, the same additions are done multiple times.
That means that there is real potential to improve things.
For all small matrixes (let say 2 upto 4) one can write out all logic to 
multiply them and then optimize the code for those
specific multiplications. This will give us fast multiplications for small 
matrixes.
When having a larger matrix, is can be chopped into 4 smaller ones and these 
pieces can then be multiplied and added together.
In total this can lead to a faster algorithm, although much more complex 
then this multipliction.