From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-0.6 required=5.0 tests=BAYES_00,DATE_IN_PAST_24_48, FREEMAIL_FROM autolearn=no autolearn_force=no version=3.4.4 X-Google-Thread: a07f3367d7,56525db28240414a X-Google-Attributes: gida07f3367d7,public,usenet X-Google-NewGroupId: yes X-Google-Language: ENGLISH,ASCII-7-bit Received: by 10.43.12.198 with SMTP id pj6mr7405423icb.8.1343097922355; Mon, 23 Jul 2012 19:45:22 -0700 (PDT) Path: p10ni44082583pbh.1!nntp.google.com!border1.nntp.dca.giganews.com!border1.nntp.ams.giganews.com!border4.nntp.ams.giganews.com!border2.nntp.ams.giganews.com!nntp.giganews.com!news.nobody.at!news.glorb.com!npeer01.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail From: Keean Schupke Newsgroups: comp.lang.ada Subject: Re: Efficient Sequential Access to Arrays Date: Sun, 22 Jul 2012 11:21:06 -0700 (PDT) Organization: http://groups.google.com Message-ID: <01174147-b078-40bb-a52b-8861bee52816@googlegroups.com> References: <01983f1c-f842-4b1f-a180-bcef531dad4c@googlegroups.com> <87ipdf4vh6.fsf@mid.deneb.enyo.de> NNTP-Posting-Host: 82.44.19.199 Mime-Version: 1.0 X-Trace: posting.google.com 1342981267 22248 127.0.0.1 (22 Jul 2012 18:21:07 GMT) X-Complaints-To: groups-abuse@google.com NNTP-Posting-Date: Sun, 22 Jul 2012 18:21:07 +0000 (UTC) In-Reply-To: Complaints-To: groups-abuse@google.com Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=82.44.19.199; posting-account=T5Z2vAoAAAB8ExE3yV3f56dVATtEMNcM User-Agent: G2/1.0 X-Received-Bytes: 5003 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Date: 2012-07-22T11:21:06-07:00 List-Id: On Sunday, 22 July 2012 18:34:26 UTC+1, Niklas Holsti wrote: > On 12-07-22 19:21 , Florian Weimer wrote: > > * Keean Schupke: > > > >> So assuming we need this level of performance, what would be the > >> best (and most idiomatic Ada) way to package this type of usage > >> pattern as an abstract datatype? > > > > You should look at compiler dumps to figure out why the strength > > reduction optimization does not kick in. > > >=20 > As I understand it, the program is accessing A(I,J), where A is a 2D=20 > array, and also accessing A(I+1,J), A(I,J+1), and the other neighbours=20 > of A(I,J), eight in all. >=20 > I,J are indices that are *not* traversed in sequential order, but in=20 > some scattered order. >=20 > Computing the address of A(I,J) involves multiplications of J and I with= =20 > the size of an array element and the size of an array row, respectively.= =20 > The problem seems to be that the results of these multiplications are=20 > not reused in computing the address of A(I+1,J) etc. >=20 > If the index constraints (or at least the row length) of A are static=20 > (compile-time) constants, and the address of A(I,J) has been computed,=20 > and c1 and c2 are some other compile-time constants (-1, 0, +1 in this=20 > program), then the address of A(I+c1,J+c2) can be computed by adding a=20 > static offset to the address of A(I,J). This is what happens when A(I,J)= =20 > and the neighbours are accessed through a C pointer, or when Keean uses= =20 > address arithmetic explicitly in the Ada program. The question is why=20 > the Ada compiler is not doing this optimization. >=20 > As I understand it, "strength reduction" is a loop optimization= where an=20 > expression of the form C*i, with "i" the loop index, is replace= d by a=20 > new variable that is incremented by C on each loop iteration. But in=20 > Keean's program the indices I,J are not loop indices, nor linearly=20 > related to loop indices. >=20 > The optimization that would be required here is to replace (X+C)*F by=20 > X*F+C*F, when C and F are static constants and X*F has already been=20 > computed and is available. This would replace the multiplication by an=20 > addition of a static constant. I've never seen this special kind of= =20 > optimization proposed or described. >=20 > (We are of course (I believe) assuming that index range checks are=20 > turned off.) >=20 > --=20 > Niklas Holsti > Tidorum Ltd > niklas holsti tidorum fi > . @ . That's some of it, the other part is that sometimes the 'scattered order' i= s from following a linked list between elements. So A1(I, J) is computed fr= om a PRNG, and we access A1(I, J) and may access all or none of A1(I, J-1),= A1(I-1, J), A1(I+1, J), A1(I, J+1) and all or none of A1(I-1, J-1), A1(I+1= , J-1), A1(I-1, J+1), A1(I+1, J+1). We may also access the 'canonical' elem= ent of the subset A* which in the fast code is a link and the neighbours ad= dresses calculated from the link address, and all the elements in the subse= t (A2, A3, .. AN) and each of their horizontal and vertical neighbours. So with the relative addressing code, stepping through the subset is derefe= rence link, add constant[+/- row_size * element_size] and constant[+/- elem= ent_size] to access the neighbours, then dereference next link etc. The siz= e of the array are generic parameters, so I presume the constants get pre-c= alculated at elaboration time. With the indexed version, it is storing the index of the canonical element = of the subset, and the index of the next element in the subset instead of t= he address. Does that help determine if it is going to be possible to get better perfor= mance from the array indexed version? Cheers, Keean.