From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=0.6 required=5.0 tests=BAYES_00,FREEMAIL_FROM, FROM_WORDY autolearn=no autolearn_force=no version=3.4.4 X-Google-Thread: 103376,7767a311e01e1cd X-Google-Attributes: gid103376,public X-Google-Language: ENGLISH,ASCII-7-bit Path: g2news2.google.com!news2.google.com!news.germany.com!feeder.news-service.com!tudelft.nl!txtfeed1.tudelft.nl!feeder4.cambrium.nl!feed.tweaknews.nl!not-for-mail From: "Wiljan Derks" Newsgroups: comp.lang.ada References: <1161341264.471057.252750@h48g2000cwc.googlegroups.com> <9Qb_g.111857$aJ.65708@attbi_s21> <434o04-7g7.ln1@newserver.thecreems.com> <4539ce34$1_2@news.bluewin.ch> <453A532F.2070709@obry.net> <9kfq04-sgm.ln1@newserver.thecreems.com> <5vgs04-64f.ln1@newserver.thecreems.com> <453bc74e$0$19614$426a74cc@news.free.fr> <4jit04-0gq.ln1@newserver.thecreems.com> Subject: Re: GNAT compiler switches and optimization Date: Mon, 23 Oct 2006 21:52:59 +0200 X-Priority: 3 X-MSMail-Priority: Normal X-Newsreader: Microsoft Outlook Express 6.00.2900.2869 X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2900.2869 X-RFC2646: Format=Flowed; Response Message-ID: <453d1d36$0$25551$bf4948fe@news.tele2.nl> Organization: Tele2 X-Trace: DXC=Lad`?AY6`Y6aWje^YZ:@liE]:Mf1UBbQ>\>HV]ZZ:6?lG4KJ>N\ Xref: g2news2.google.com comp.lang.ada:7173 Date: 2006-10-23T21:52:59+02:00 List-Id: > Jeffrey R. Carter wrote: > I looked at the gcc FORTRAN matmul. Unless there some additional trickery > going on behind the scenes, it is not anything magical. It looks like a > matmul implemented in C with manual array subscripting logic (i.e. uses a > single dimensional array overlay).. When going for a faster algorithm this might be usefull information. Notice that this matmul uses an O(n^3) algorithm and it is possible to do this faster. I am not sure, but maybe O(n^2 log(n)) is reachable, which is much faster. It is easy to see that from the loops that the basic multiplications are done n times for the same pairs of numbers. In a similar way, the same additions are done multiple times. That means that there is real potential to improve things. For all small matrixes (let say 2 upto 4) one can write out all logic to multiply them and then optimize the code for those specific multiplications. This will give us fast multiplications for small matrixes. When having a larger matrix, is can be chopped into 4 smaller ones and these pieces can then be multiplied and added together. In total this can lead to a faster algorithm, although much more complex then this multipliction.