From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=ham autolearn_force=no version=3.4.4
X-Google-Thread: 103376,103803355c3db607
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,ASCII-7-bit
Received: by 10.68.241.162 with SMTP id wj2mr1968558pbc.2.1340974299436;
        Fri, 29 Jun 2012 05:51:39 -0700 (PDT)
Path: 
 l9ni33310pbj.0!nntp.google.com!news2.google.com!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
From: Keean Schupke <keean.schupke@googlemail.com>
Newsgroups: comp.lang.ada
Subject: Re: GNAT (GCC) Profile Guided Compilation
Date: Fri, 29 Jun 2012 05:51:38 -0700 (PDT)
Organization: http://groups.google.com
Message-ID: <8ab283e5-6aaa-47f6-acee-fefdb60afb80@googlegroups.com>
References: <dac2857a-6f74-4ecb-a5d2-f6b73fbd0ecc@googlegroups.com>
 <kzdsund37tsa.pwih72b10p2v.dlg@40tude.net>
 <62d099a8-d754-4c13-b8c8-d8eea2d6a764@googlegroups.com>
 <87ebfdda-559f-4661-a36d-7933f2c89d7e@googlegroups.com>
 <Pine.LNX.4.64.1206291328140.9802@medsec1.medien.uni-weimar.de>
NNTP-Posting-Host: 82.44.19.199
Mime-Version: 1.0
X-Trace: posting.google.com 1340974299 22125 127.0.0.1 (29 Jun 2012 12:51:39
 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Fri, 29 Jun 2012 12:51:39 +0000 (UTC)
Cc: stefan-lucks@see-the.signature
In-Reply-To: <Pine.LNX.4.64.1206291328140.9802@medsec1.medien.uni-weimar.de>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=82.44.19.199;
 posting-account=T5Z2vAoAAAB8ExE3yV3f56dVATtEMNcM
User-Agent: G2/1.0
Content-Type: text/plain; charset=ISO-8859-1
Date: 2012-06-29T05:51:38-07:00
List-Id: <comp.lang.ada>

On Friday, 29 June 2012 13:26:23 UTC+1, (unknown)  wrote:
> On Fri, 29 Jun 2012, Keean Schupke wrote:
> 
> > On Friday, 29 June 2012 11:01:30 UTC+1, Keean Schupke  wrote:
> 
> > > Maintaining the address of the current cell is faster than using 
> > > indexed addressing (mainly because of the extra parameter that needs 
> > > to be passed around increasing the pressure on register allocation).
> 
> I guess you did some benchmarking, regarding that issue?
> 
> > > I would be interested to understand how an algorithm can be 
> > > implemented in a more Ada way?
> 
> Faster or slower, you are extensively using access types (Element_Access 
> and Set_Access). This is something Ada programmers usually try to avoid, 
> except when there is an urgent need for some dynamic memory management 
> (e.g., implementing one's own Container library). 
> 
> >     generic
> >         type Element_Type is limited private;
> >         type Element_Access is not null access all Element_Type;
> >         type Set_Index is range <>;
> >     package Eager_Disjointsets is
> >         type Set_Type is limited private;
> >         type Set_Access is not null access all Set_Type;
> 
> What, exactly, do you use Element_Access for? And why, exactly, do you 
> bother to define Set_Access?

I use Element_Access because the element is going to be a record that needs to be updated in place. To copy the record is too slow. If you look at the get method we want to do:

Get(Set, Index).Something := xxx

The record stored in each Node of the set datatype will be limited, so you cannot return it from a function, you can only return an access to it.

Why Set_Access, because in order to avoid an access-level problem you cannot define the element-access at a different block level from the set-access, so in order to safely return an element-access we must pass a set-access in.

> 
> >         function Find(
> >             Set : in Set_Access;
> >             Index : in Set_Index
> >         ) return Set_Index;
> >         pragma Inline_Always(Find);
>  
> Why don't you just write the following? (Same for your other operations.)
> 
> 	function find
> 	  (Set   : in Set_Type;
> 	   Index : in Set_Index) return Set_Index;

Because of the above requirements of the Get function, it is better to make all the methods take a Set_Access for consistency. The we only need to pass a Set_Access around, never a set, forcing pass-by-reference and not copying.

> 
> [...]
> 
> >     private
> >         type Node_Type is limited record
> >             Canonical : Set_Index;
> >             Successor : Set_Index;
> >             Size : Natural;
> >             Value : aliased Element_Type;
> >         end record;
> >         type Set_Type is array (Set_Index) of Node_Type;
> >     end Eager_Disjointsets;
> 
> Why, exactly, do you make Value aliased? Is this related to your "faster 
> than indexed addressing" point above?

This is because we need to return an access to the value from 'Get' so that it can be updated in place. Remember 'Element_Type' will be limited.

> 
> This may be hard for the compiler to properly optimize: An aliased 
> component Element_Type in a record ... and then there is an array of such 
> records, and all is inside a generic package. 

Actually the compiler does pretty well. 40k per second it matches the C++ version. It is only when trying to use profile-guided optimisation that Ada is not improving.

> 
> If you really need Value to be aliased (do you?), then you can try out 
> rewriting Set_Type:
> 
>   type Set_Index_Array is array(Set_Index) of Set_Index;
>   type Size_Array      is array(Set_Index) of Natural;
>   type Value_Array     is array(Set_Index) of aliased Element_Type;
> 
>   type Set_Type is record
>     Canonical, Successor: Set_Index_Array;
>     Size:                 Size_Array;
>     Value:                Value_Array;
>   end record;

I dont think this will be faster as we now have to compute the array index offset for accessing each property. Happy to benchmark for you...

> 
> I have no idea if the above modification helps to speed up the 
> computation, or not. It may backfire, I don't know. 
> 
> >     package body Eager_Disjointsets is
> >         function Find(
> >             Set : in Set_Access;
> >             Index : in Set_Index
> >         ) return Set_Index is begin
> >             return Set(Index).Canonical;
> >         end Find;
> 
> Now, here you would have to write "return Set.Canonical(Index);"
> 
> >     -- (C)2011 Keean Schupke, all rights reserved.
> >     generic
> >         Size : Int;
> >         type Element_Type is limited private;
> >         type Element_Access is not null access all Element_Type;
> >     package Eager_Disjointsets is
> >         type Set_Type is limited private;
> >         type Set_Access is not null access all Set_Type;
> >         type Node_Type is limited private;
> >         type Cursor is access all Node_Type;
> 
> Is Int another name for Integer?

Sorry, its a type name for the local best performing integer type, which is Integer_64 in my case. 

> 
> I understand that for performance reason you introduced Cursor as access 
> all Node_Type and later Set_Array as aliased array of Node_Type.
> 
> But again, what do you need Element_Access and Set_Access for?

See above.

> 
> >         package Node_Address is new 
> >         System.Address_To_Access_Conversions(Node_Type);
> 
> Probably, the compiler finds that out by itself -- but you may want to 
> make it explicit that Node_Size doesn't change:
> 
> >         Node_Size : constant Storage_Offset := Node_Type'Object_Size
> >             / System.Storage_Elements.Storage_Element'Size;

Yes, thats a good spot - I missed that.

> 
> Also, if Node_Size isn't a power of 2, it may be difficult for the 
> compiler to decide on the alignment of objects of type Node_Type -- with 
> possible effects on caching. On one hand, it may be good to pack the 
> objects closely together, to make as many fit into a single cache line as 
> possible. On the other hand, it is bad if a part of a certain node is in 
> one cache line, another part in another cache line.

Packing is slower, the whole data set is small enough to fit in the cache (<5k) so alignment wins over packing.

> 
> >         function Find(
> >             Node : in Cursor
> >         ) return Cursor is begin
> >             return Cursor(Node.Canonical);
> >         end Find;
> 
> Not related to performance, but why don't you just write 
> "return Node.Canonical;" as Node.Canonical is of type Cursor, anyway?

Again, good spot, Canonical was not always a Cursor, so I missed it.

> 
> 
> -- 
> ---- Stefan.Lucks (at) uni-weimar.de, University of Weimar, Germany  ----
>     <http://www.uni-weimar.de/cms/medien/mediensicherheit/home.html>
> ------  I  love  the  taste  of  Cryptanalysis  in  the  morning!  ------


Cheers,
Keean.