From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.6 required=5.0 tests=BAYES_00,DATE_IN_PAST_24_48,
	FREEMAIL_FROM autolearn=no autolearn_force=no version=3.4.4
X-Google-Thread: a07f3367d7,56525db28240414a
X-Google-Attributes: gida07f3367d7,public,usenet
X-Google-NewGroupId: yes
X-Google-Language: ENGLISH,ASCII-7-bit
Received: by 10.43.12.198 with SMTP id pj6mr7405423icb.8.1343097922355;
        Mon, 23 Jul 2012 19:45:22 -0700 (PDT)
Path: 
 p10ni44082583pbh.1!nntp.google.com!border1.nntp.dca.giganews.com!border1.nntp.ams.giganews.com!border4.nntp.ams.giganews.com!border2.nntp.ams.giganews.com!nntp.giganews.com!news.nobody.at!news.glorb.com!npeer01.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
From: Keean Schupke <keean.schupke@googlemail.com>
Newsgroups: comp.lang.ada
Subject: Re: Efficient Sequential Access to Arrays
Date: Sun, 22 Jul 2012 11:21:06 -0700 (PDT)
Organization: http://groups.google.com
Message-ID: <01174147-b078-40bb-a52b-8861bee52816@googlegroups.com>
References: <01983f1c-f842-4b1f-a180-bcef531dad4c@googlegroups.com>
 <87ipdf4vh6.fsf@mid.deneb.enyo.de> <a72rt6Fq5sU1@mid.individual.net>
NNTP-Posting-Host: 82.44.19.199
Mime-Version: 1.0
X-Trace: posting.google.com 1342981267 22248 127.0.0.1 (22 Jul 2012 18:21:07
 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Sun, 22 Jul 2012 18:21:07 +0000 (UTC)
In-Reply-To: <a72rt6Fq5sU1@mid.individual.net>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=82.44.19.199;
 posting-account=T5Z2vAoAAAB8ExE3yV3f56dVATtEMNcM
User-Agent: G2/1.0
X-Received-Bytes: 5003
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Date: 2012-07-22T11:21:06-07:00
List-Id: <comp.lang.ada>

On Sunday, 22 July 2012 18:34:26 UTC+1, Niklas Holsti  wrote:
> On 12-07-22 19:21 , Florian Weimer wrote:
> &gt; * Keean Schupke:
> &gt;
> &gt;&gt; So assuming we need this level of performance, what would be the
> &gt;&gt; best (and most idiomatic Ada) way to package this type of usage
> &gt;&gt; pattern as an abstract datatype?
> &gt;
> &gt; You should look at compiler dumps to figure out why the strength
> &gt; reduction optimization does not kick in.
> &gt;
>=20
> As I understand it, the program is accessing A(I,J), where A is a 2D=20
> array, and also accessing A(I+1,J), A(I,J+1), and the other neighbours=20
> of A(I,J), eight in all.
>=20
> I,J are indices that are *not* traversed in sequential order, but in=20
> some scattered order.
>=20
> Computing the address of A(I,J) involves multiplications of J and I with=
=20
> the size of an array element and the size of an array row, respectively.=
=20
> The problem seems to be that the results of these multiplications are=20
> not reused in computing the address of A(I+1,J) etc.
>=20
> If the index constraints (or at least the row length) of A are static=20
> (compile-time) constants, and the address of A(I,J) has been computed,=20
> and c1 and c2 are some other compile-time constants (-1, 0, +1 in this=20
> program), then the address of A(I+c1,J+c2) can be computed by adding a=20
> static offset to the address of A(I,J). This is what happens when A(I,J)=
=20
> and the neighbours are accessed through a C pointer, or when Keean uses=
=20
> address arithmetic explicitly in the Ada program. The question is why=20
> the Ada compiler is not doing this optimization.
>=20
> As I understand it, &quot;strength reduction&quot; is a loop optimization=
 where an=20
> expression of the form C*i, with &quot;i&quot; the loop index, is replace=
d by a=20
> new variable that is incremented by C on each loop iteration. But in=20
> Keean&#39;s program the indices I,J are not loop indices, nor linearly=20
> related to loop indices.
>=20
> The optimization that would be required here is to replace (X+C)*F by=20
> X*F+C*F, when C and F are static constants and X*F has already been=20
> computed and is available. This would replace the multiplication by an=20
> addition of a static constant. I&#39;ve never seen this special kind of=
=20
> optimization proposed or described.
>=20
> (We are of course (I believe) assuming that index range checks are=20
> turned off.)
>=20
> --=20
> Niklas Holsti
> Tidorum Ltd
> niklas holsti tidorum fi
>        .      @       .

That's some of it, the other part is that sometimes the 'scattered order' i=
s from following a linked list between elements. So A1(I, J) is computed fr=
om a PRNG, and we access A1(I, J) and may access all or none of A1(I, J-1),=
 A1(I-1, J), A1(I+1, J), A1(I, J+1) and all or none of A1(I-1, J-1), A1(I+1=
, J-1), A1(I-1, J+1), A1(I+1, J+1). We may also access the 'canonical' elem=
ent of the subset A* which in the fast code is a link and the neighbours ad=
dresses calculated from the link address, and all the elements in the subse=
t (A2, A3, .. AN) and each of their horizontal and vertical neighbours.

So with the relative addressing code, stepping through the subset is derefe=
rence link, add constant[+/- row_size * element_size] and constant[+/- elem=
ent_size] to access the neighbours, then dereference next link etc. The siz=
e of the array are generic parameters, so I presume the constants get pre-c=
alculated at elaboration time.

With the indexed version, it is storing the index of the canonical element =
of the subset, and the index of the next element in the subset instead of t=
he address.

Does that help determine if it is going to be possible to get better perfor=
mance from the array indexed version?


Cheers,
Keean.