From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=ham autolearn_force=no version=3.4.4
X-Google-Thread: a07f3367d7,56525db28240414a
X-Google-Attributes: gida07f3367d7,public,usenet
X-Google-NewGroupId: yes
X-Google-Language: ENGLISH,ASCII-7-bit
Received: by 10.66.77.39 with SMTP id p7mr821302paw.0.1343037430507;
        Mon, 23 Jul 2012 02:57:10 -0700 (PDT)
Path: 
 b9ni36561452pbl.0!nntp.google.com!npeer01.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!border3.nntp.dca.giganews.com!border1.nntp.dca.giganews.com!nntp.giganews.com!ctu-peer!ctu-gate!news.nctu.edu.tw!usenet.stanford.edu!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
From: Keean Schupke <keean.schupke@googlemail.com>
Newsgroups: comp.lang.ada
Subject: Re: Efficient Sequential Access to Arrays
Date: Mon, 16 Jul 2012 09:39:45 -0700 (PDT)
Organization: http://groups.google.com
Message-ID: <4c86d54d-dfeb-44f1-9638-3c416ea51d74@googlegroups.com>
References: <01983f1c-f842-4b1f-a180-bcef531dad4c@googlegroups.com>
 <kpltof5a8xvm.arygetd4esmc.dlg@40tude.net>
 <44f2bfd0-75ed-4b9c-982e-61d5308f9c01@googlegroups.com>
 <1s8rnmtyqnrr$.1o78f1qaxeulh$.dlg@40tude.net>
 <e43d83de-d992-475b-bcfc-fafa13c4ea40@googlegroups.com>
 <1lgvwexsx8ovd.xdbvohz8eowp.dlg@40tude.net>
NNTP-Posting-Host: 82.44.19.199
Mime-Version: 1.0
X-Trace: posting.google.com 1342456789 17823 127.0.0.1 (16 Jul 2012 16:39:49
 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Mon, 16 Jul 2012 16:39:49 +0000 (UTC)
Cc: mailbox@dmitry-kazakov.de
In-Reply-To: <1lgvwexsx8ovd.xdbvohz8eowp.dlg@40tude.net>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=82.44.19.199;
 posting-account=T5Z2vAoAAAB8ExE3yV3f56dVATtEMNcM
User-Agent: G2/1.0
X-Received-Bytes: 8356
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Date: 2012-07-16T09:39:45-07:00
List-Id: <comp.lang.ada>

On Monday, 16 July 2012 15:57:04 UTC+1, Dmitry A. Kazakov  wrote:
> On Mon, 16 Jul 2012 03:16:09 -0700 (PDT), Keean Schupke wrote:
>=20
> &gt; On Monday, 16 July 2012 08:47:01 UTC+1, Dmitry A. Kazakov  wrote:
> &gt;&gt; On Sun, 15 Jul 2012 13:03:12 -0700 (PDT), Keean Schupke wrote:
> &gt;&gt;=20
> &gt;&gt; Absolute figures tell little. How big is the relative difference=
?
> &gt;=20
> &gt; About 25% (30k with indexing, 40k with relative addressing)
>=20
> Which itself looks like a problem. It takes no less than 25% of overall
> performance for mere shuffling data in a task unrelated to stuff like
> recoding etc. This looks wrong to me. I would look for a better
> algorithm/structure.

I find this comment puzzling and does not match my experience. What sort of=
 algorithms do you work with. What algorithm for disjoint-set do you sugges=
t I use? Have a read of this:

http://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf

Can you come up with a faster algorithm? Until you have read it I don't thi=
nk you can comment on the quality of algorithm I am using.

>=20
> &gt;&gt; Loop optimization is a tricky issue. I suspect that tiny variati=
ons of code
> &gt;&gt; may turn it on or off. You should read GCC documentation on that=
 subject to
> &gt;&gt; see what is possible on which machines.
> &gt;=20
> &gt; It appears GCC is unable to optimise the indexing away when the inde=
x is
> &gt; dynamic (relies on IO or PRNG for example).
>=20
> You should read its documentation in order to be able to say. Consequent
> indexing could be optimized as addition to the register-stored base.


It may be able to optimize the access in loops, but this is for more comple=
x.

The node in the array contains pointers to other nodes. When we follow the =
pointers we want to be able to access the neighbours consider:

type Node_Type is record=20
    Next : access all Node_Type;
    Other : access all Node_Type;
    Data : Data_Type;
end record;

type A is array (Index_Type) of aliased Node_Type;

X : A;

So we can access the next record like this:

D :=3D X(Y).Next.Data;

but how do we access the neighbouring nodes? We cannot do:

D :=3D (X(Y).Next + 1).Data;


However if we put Indexes in the Node:

type Node_Type is record=20
    Next : Index_Type;
    Other : Index_Type;
    Data : Data_Type;
end record;


Then the access is slow:

D :=3D X(X(Y).Next).Data;

I don't think there is anything the compiler can do to optimise this kind o=
f access.

However we can now access the neighbour:

D :=3D X(X(Y).Next + 1).Data;


Basically what we are talking about is an hybrid array / list datatype. To =
me this really requires memory address manipulation (and I really don't car=
e about running on Lisp machines, I want good performance on real hardware =
like Intel/AMD/ARM cpus). But we can hide all that inside a nice safe packa=
ge which does some checking when in debug mode.


>=20
> &gt;&gt; How much operations do you have per array element? If you do bas=
ically only
> &gt;&gt; indexing then there must be something wrong with the algorithm. =
Otherwise,
> &gt;&gt; say one multiplication per 100 instructions, is nothing, especia=
lly when
> &gt;&gt; memory access (a very slow thing) is involved.
> &gt;=20
> &gt; The algorithm is highly optimised, and has been developed over sever=
al
> &gt; years. The choice of data structures is pretty good,
>=20
> It does not look good when indexing takes more than 25%.

Looks can be deceiving.

>=20
> &gt; Highly optimised code can still have good structure, using inlining =
and
> &gt; generics code can be elegant and fast.
>=20
> Code using generics/templates is never elegant nor fast, but that is an
> off-topic.
>=20
> &gt; There is nothing wrong in design terms with a forward-iterator, in f=
act it
> &gt; is better design from a generic point of view to expose the minimal =
set of
> &gt; methods to efficiently implement the required algorithm.
>=20
> This is only necessary when the class of the container type is wider than
> just arrays, e.g. includes other structures like lists.

This ignores the fact that the interface for an ADT should be independent o=
f its implementation. Think of ArrayList and LinkedList in Java. The code s=
hould use the List<> interface if it only needs to access elements sequenti=
ally (it is the algorithm that determines the requirements of the interface=
 not the data-type). So we code our algorithm that only needs sequential ac=
cess using the List interface, and we are then free to choose either ArrayL=
ist or LinkedList depending on which performs best for our application.

>=20
> If you have performance problems with arrays, which are built-in and
> subject of tailored optimization, you cannot expect situation improved by
> puzzling the compiler with more abstract types. That could only disable
> optimization.

Maybe you are thinking of something more complex than I am. An ADT is just =
a private type with associated functions in a package. It is the fact that =
the type itself is private and that all operations on it must be done throu=
gh the functions provided in the same package that defines an ADT. The priv=
ate type does not have to be a record, it could be an access type. Because =
all the functions get inlined there is no cost to creating an ADT like this=
.

Also if the Iterator can be in the same package as the container.

>=20
> &gt; To see elegant and efficient code constructed in this way see Stepan=
ov&#39;s
> &gt; &quot;Elements of Programming&quot;.
>=20
> I am not a STL fan.=20

Element's is different from the STL. It includes native 'concepts' that def=
ine axiomatic constraints on software modules. Its a hard book to really un=
derstand, but well worth a read I think. It changed my mind about imperativ=
e languages. I have spent a lot of time programming Haskell, and got deeply=
 into System-F, type theory etc. The key understanding for me was when I re=
alised 'concepts' in the generic sense are the same thing as type-classes f=
rom Haskell, and implementations could be structured the same way as type-c=
lasses enable the structuring of functional programs. However Haskell is sl=
ow, and where it really lacks is in direct memory manipulation.

The conclusion I came to is that a language that allows the exact control o=
f data memory-layout (like C/C++) but allows the use of generic concepts (B=
oost Concept checking library for C++ for now, Ada Signature packages) woul=
d allow code that is both fast (due to control of memory layout) and elegan=
t (by providing type-classes. For an introduction to Haskell type classes s=
ee:

http://book.realworldhaskell.org/read/using-typeclasses.html

=20
Cheers,
Keean.