comp.lang.ada
* Efficient Sequential Access to Arrays
@ 2012-07-15 18:40 Keean Schupke
  2012-07-15 19:48 ` Dmitry A. Kazakov
                   ` (5 more replies)
  0 siblings, 6 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-15 18:40 UTC (permalink / raw)


The Monte-Carlo simulator I am working on is now performing better in Ada than in C++ using profile-guided compilation (57k simulations per second for Ada vs 56k simulations per second for C++).

However, in order to achieve this performance I needed to rework the arrays, as the use of indexes was too slow. An example: let's say we are accessing element 5 of array A, "A(5)"; this requires a multiplication (base_address + index * record_size). To access the neighbour A(6) also requires a multiplication, so accessing the array sequentially requires a multiplication per step.

Contrast this with C++ where we can return a pointer to the array element, and we just need to add the record_size to step to the next element.

So assuming we need this level of performance, what would be the best (and most idiomatic Ada) way to package this type of usage pattern as an abstract datatype?
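
For concreteness, here is a minimal sketch (untested; all names invented) of the stepping pattern I mean, done in Ada with the address arithmetic from System.Storage_Elements: one multiplication to position on the first element, then a single addition per step.

with Ada.Text_IO;
with System;                                use System;
with System.Storage_Elements;               use System.Storage_Elements;
with System.Address_To_Access_Conversions;

procedure Step_Demo is
   type Rec is record
      Z : Integer := 1;
   end record;
   type Recs is array (1 .. 100) of aliased Rec;

   package Conv is new System.Address_To_Access_Conversions (Rec);

   X      : Recs;
   Stride : constant Storage_Offset := X (2)'Address - X (1)'Address;
   P      : Address := X (5)'Address;   --  the one multiplication, done once
   Y      : Integer := 0;
begin
   for J in 5 .. 10 loop
      Y := Y + Conv.To_Pointer (P).Z;   --  use the current element
      P := P + Stride;                  --  step to the next: one addition
   end loop;
   Ada.Text_IO.Put_Line (Integer'Image (Y));  --  prints 6 (six elements, Z = 1)
end Step_Demo;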


Cheers,
Keean.




* Re: Efficient Sequential Access to Arrays
  2012-07-15 18:40 Efficient Sequential Access to Arrays Keean Schupke
@ 2012-07-15 19:48 ` Dmitry A. Kazakov
  2012-07-15 20:03   ` Keean Schupke
  2012-07-15 21:35   ` J-P. Rosen
  2012-07-15 19:52 ` Shark8
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 88+ messages in thread
From: Dmitry A. Kazakov @ 2012-07-15 19:48 UTC (permalink / raw)


On Sun, 15 Jul 2012 11:40:08 -0700 (PDT), Keean Schupke wrote:

> However, in order to achieve this performance I needed to rework the arrays,
> as the use of indexes was too slow.

You have to measure it in order to know. Write two variants and compare
their performance.

> An example: let's say we are
> accessing element 5 of array A, "A(5)"; this requires a multiplication
> (base_address + index * record_size).

It does not, because 5 is constant.

> To access the neighbour A(6)
> also requires a multiplication, so accessing the array sequentially requires
> a multiplication per step. 

That depends on loop optimizations. I would expect GCC to optimize access
to array elements per the loop's index.

> So assuming we need this level of performance, what would be the best (and
> most idiomatic Ada)

Indexing is unlikely to have a significant (>5%) influence on overall
performance. Usually it is other things.

> way to package this type of usage pattern as an
> abstract datatype?

An array of aliased elements, to ensure the elements are independently addressable.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Efficient Sequential Access to Arrays
  2012-07-15 18:40 Efficient Sequential Access to Arrays Keean Schupke
  2012-07-15 19:48 ` Dmitry A. Kazakov
@ 2012-07-15 19:52 ` Shark8
  2012-07-15 20:16   ` Keean Schupke
  2012-07-17 18:16 ` Florian Weimer
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 88+ messages in thread
From: Shark8 @ 2012-07-15 19:52 UTC (permalink / raw)


On Sunday, July 15, 2012 12:40:08 PM UTC-6, Keean Schupke wrote:
> The Monte-Carlo simulator I am working on is now performing better in Ada than in C++ using profile-guided compilation (57k simulations per second for Ada vs 56k simulations per second for C++).
> 
> However, in order to achieve this performance I needed to rework the arrays, as the use of indexes was too slow. An example: let's say we are accessing element 5 of array A, "A(5)"; this requires a multiplication (base_address + index * record_size). To access the neighbour A(6) also requires a multiplication, so accessing the array sequentially requires a multiplication per step. 
> 
> Contrast this with C++ where we can return a pointer to the array element, and we just need to add the record_size to step to the next element.
> 
> So assuming we need this level of performance, what would be the best (and most idiomatic Ada) way to package this type of usage pattern as an abstract datatype?
> 
> 
> Cheers,
> Keean.

Hm; the problem is that such optimizations are highly dependent on assumptions that might not hold in the future. Consider, for example, compiling targeting the JVM: there the arrays are not guaranteed to have contiguous components, and therefore such a method would invite incorrect results/behavior in the program.

That said, if I needed to resort to such micro-optimizations I'd throw them all into a generic-package [if possible] and comment the heck out of the subprogram bodies. {Or if you're using a common pattern you might be able to explain the pattern in the package-body and then do the implementations -- You might be able to get away with a generic-iterator inside the package body.}





* Re: Efficient Sequential Access to Arrays
  2012-07-15 19:48 ` Dmitry A. Kazakov
@ 2012-07-15 20:03   ` Keean Schupke
  2012-07-15 20:56     ` Keean Schupke
                       ` (3 more replies)
  2012-07-15 21:35   ` J-P. Rosen
  1 sibling, 4 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-15 20:03 UTC (permalink / raw)
  Cc: mailbox

On Sunday, 15 July 2012 20:48:35 UTC+1, Dmitry A. Kazakov  wrote:
> On Sun, 15 Jul 2012 11:40:08 -0700 (PDT), Keean Schupke wrote:
> 
> > However, in order to achieve this performance I needed to rework the arrays,
> > as the use of indexes was too slow.
> 
> You have to measure it in order to know. Write two variants and compare
> their performance.

Have done, the difference is about 10k simulations per second.

> 
> > An example: let's say we are
> > accessing element 5 of array A, "A(5)"; this requires a multiplication
> > (base_address + index * record_size).
> 
> It does not, because 5 is constant.

Obviously the index is not actually a constant but the result of selecting a random element in the list. The multiplication is necessary for finding this first element, but the operations on relative elements (+/-1, +/-2 etc) can use addition with a pointer, not so with an index.

> 
> > To access the neighbour A(6)
> > also requires a multiplication, so accessing the array sequentially requires
> > a multiplication per step. 
> 
> That depends on loop optimizations. I would expect GCC to optimize access
> to array elements per the loop's index.

It does not seem to, I'll see if I can post some assembly.

> 
> > So assuming we need this level of performance, what would be the best (and
> > most idiomatic Ada)
> 
> Indexing is unlikely to have a significant (>5%) influence on overall
> performance. Usually it is other things.

It seems to have a significant influence in the benchmarks I have run and has a particular influence when sequentially accessing elements.

> 
> > way to package this type of usage pattern as an
> > abstract datatype?
> 
> An array of aliased elements, to ensure the elements are independently addressable.

Yes, the type itself will need to have aliased elements, but I was assuming this would be hidden in a package as an ADT, and would expose an indexed-iterator that has '+' and '-' operations on the iterator (not just a forward or bidirectional iterator).
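
As a sketch of the packaging (untested, names invented; the body would hide whether the cursor is an index or an address, probably needing 'Unchecked_Access or address arithmetic):

generic
   type Element_Type is private;
   type Index_Type is range <>;
package Indexed_Iterators is
   type Container is limited private;
   type Cursor is private;

   function To_Cursor (C : not null access Container; I : Index_Type)
     return Cursor;                                       --  one multiplication
   function "+" (J : Cursor; N : Integer) return Cursor;  --  additions only
   function "-" (J : Cursor; N : Integer) return Cursor;
   function Get (J : Cursor) return Element_Type;
   procedure Set (J : Cursor; E : Element_Type);
private
   type Element_Array is array (Index_Type) of aliased Element_Type;
   type Element_Access is access all Element_Type;
   type Container is limited record
      Data : Element_Array;
   end record;
   type Cursor is record
      Ptr : Element_Access;   --  the current element
   end record;
end Indexed_Iterators;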


Cheers,
Keean.




* Re: Efficient Sequential Access to Arrays
  2012-07-15 19:52 ` Shark8
@ 2012-07-15 20:16   ` Keean Schupke
  2012-07-15 21:33     ` Georg Bauhaus
  0 siblings, 1 reply; 88+ messages in thread
From: Keean Schupke @ 2012-07-15 20:16 UTC (permalink / raw)


On Sunday, 15 July 2012 20:52:42 UTC+1, Shark8  wrote:
> On Sunday, July 15, 2012 12:40:08 PM UTC-6, Keean Schupke wrote:
> > The Monte-Carlo simulator I am working on is now performing better in Ada than in C++ using profile-guided compilation (57k simulations per second for Ada vs 56k simulations per second for C++).
> > 
> > However, in order to achieve this performance I needed to rework the arrays, as the use of indexes was too slow. An example: let's say we are accessing element 5 of array A, "A(5)"; this requires a multiplication (base_address + index * record_size). To access the neighbour A(6) also requires a multiplication, so accessing the array sequentially requires a multiplication per step. 
> > 
> > Contrast this with C++ where we can return a pointer to the array element, and we just need to add the record_size to step to the next element.
> > 
> > So assuming we need this level of performance, what would be the best (and most idiomatic Ada) way to package this type of usage pattern as an abstract datatype?
> > 
> > 
> > Cheers,
> > Keean.
> 
> Hm; the problem is that such optimizations are highly dependent on assumptions that might not hold in the future. Consider, for example, compiling targeting the JVM: there the arrays are not guaranteed to have contiguous components, and therefore such a method would invite incorrect results/behavior in the program.
> 
> That said, if I needed to resort to such micro-optimizations I'd throw them all into a generic-package [if possible] and comment the heck out of the subprogram bodies. {Or if you're using a common pattern you might be able to explain the pattern in the package-body and then do the implementations -- You might be able to get away with a generic-iterator inside the package body.}


There are type-theoretic ways to deal with this in functional programming that do not rely on the underlying machine architecture at all. For example see:

The Derivative of a Regular Type is its Type of One-Hole Contexts [Conor McBride] http://strictlypositive.org/diff.pdf

Conor discusses trees, but the same is true for arrays: the type system can maintain the context, allowing a single pointer into the array to represent the 'current' element, the 'left context' and the 'right context'.

I am not sure such a rigorous analysis is necessary though. Consider the following: the language semantics could be changed to allow this by, for example, having a primitive iterator type on arrays that supported operations like:

declare
    J : Array_Iterator := A (5);
begin
    J := J + 1;
    J.Property := Value;
end;

This would not expose the underlying machine architecture and would allow the kind of optimisations needed. J could have a range derived from the array bounds to bounds check the access.

It seems obvious enough that I am wondering if there is a way to do this that I missed?


Cheers,
Keean.




* Re: Efficient Sequential Access to Arrays
  2012-07-15 20:03   ` Keean Schupke
@ 2012-07-15 20:56     ` Keean Schupke
  2012-07-16 11:34       ` Jacob Sparre Andersen
  2012-07-15 21:05     ` tmoran
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 88+ messages in thread
From: Keean Schupke @ 2012-07-15 20:56 UTC (permalink / raw)
  Cc: mailbox

On Sunday, 15 July 2012 21:03:12 UTC+1, Keean Schupke  wrote:
> On Sunday, 15 July 2012 20:48:35 UTC+1, Dmitry A. Kazakov  wrote:
> > On Sun, 15 Jul 2012 11:40:08 -0700 (PDT), Keean Schupke wrote:
> > 
> > > However, in order to achieve this performance I needed to rework the arrays,
> > > as the use of indexes was too slow.
> > 
> > You have to measure it in order to know. Write two variants and compare
> > their performance.
> 
> Have done, the difference is about 10k simulations per second.
> 
> > 
> > > An example: let's say we are
> > > accessing element 5 of array A, "A(5)"; this requires a multiplication
> > > (base_address + index * record_size).
> > 
> > It does not, because 5 is constant.
> 
> Obviously the index is not actually a constant but the result of selecting a random element in the list. The multiplication is necessary for finding this first element, but the operations on relative elements (+/-1, +/-2 etc) can use addition with a pointer, not so with an index.
> 
> > 
> > > To access the neighbour A(6)
> > > also requires a multiplication, so accessing the array sequentially requires
> > > a multiplication per step. 
> > 
> > That depends on loop optimizations. I would expect GCC to optimize access
> > to array elements per the loop's index.
> 
> It does not seem to, I'll see if I can post some assembly.
> 
> > 
> > > So assuming we need this level of performance, what would be the best (and
> > > most idiomatic Ada)
> > 
> > Indexing is unlikely to have a significant (>5%) influence on overall
> > performance. Usually it is other things.
> 
> It seems to have a significant influence in the benchmarks I have run and has a particular influence when sequentially accessing elements.
> 
> > 
> > > way to package this type of usage pattern as an
> > > abstract datatype?
> > 
> > > An array of aliased elements, to ensure the elements are independently addressable.
> 
> Yes, the type itself will need to have aliased elements, but I was assuming this would be hidden in a package as an ADT, and would expose an indexed-iterator that has '+' and '-' operations on the iterator (not just a forward or bidirectional iterator).
> 
> 
> Cheers,
> Keean.


So, looking at some test code: with simple loop bounds (say, the range of the array or other constants) the whole loop gets optimised and unrolled. However, this is not the case with more complex bounds (like the random numbers needed in the Monte-Carlo simulation). So with the following Ada code:


for J in S .. E loop
    Y := Y + X(J).Z;
end loop;


Where S and E are random numbers in the index range of the array and where S < E, and the array is an array of records (in this case 28 bytes long) containing the integer Z.


imul   $0x1c,%r9,%r10
add    0x48(%rsp,%r10,1),%edi


So you can see the multiplication is executed for every iteration.


Cheers,
Keean.




* Re: Efficient Sequential Access to Arrays
  2012-07-15 20:03   ` Keean Schupke
  2012-07-15 20:56     ` Keean Schupke
@ 2012-07-15 21:05     ` tmoran
  2012-07-15 21:08       ` Keean Schupke
  2012-07-15 21:44     ` Georg Bauhaus
  2012-07-16  7:47     ` Dmitry A. Kazakov
  3 siblings, 1 reply; 88+ messages in thread
From: tmoran @ 2012-07-15 21:05 UTC (permalink / raw)


> as the use of indexes was too slow. An example: let's say we are
> accessing element 5 of array A, "A(5)"; this requires a multiplication
> (base_address + index * record_size). To access the neighbour A(6) also
  Can you reasonably pad the element size to a power of two so the
multiplication can be replaced by a shift?  It would be interesting to see
the result, in various hardware contexts.
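
In Ada the padding can be requested explicitly. A sketch (untested; the 28- and 32-byte figures are assumed from the earlier posts):

   package Padded is
      type Rec is record
         Z       : Integer;
         Payload : String (1 .. 24);   --  stand-in for the real fields
      end record;
      for Rec'Size use 32 * 8;         --  pad 28 bytes up to 32
      type Rec_Array is array (1 .. 1_000) of aliased Rec;
      for Rec_Array'Component_Size use 32 * 8;  --  indexing becomes a shift
   end Padded;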




* Re: Efficient Sequential Access to Arrays
  2012-07-15 21:05     ` tmoran
@ 2012-07-15 21:08       ` Keean Schupke
  2012-07-16  1:19         ` tmoran
  0 siblings, 1 reply; 88+ messages in thread
From: Keean Schupke @ 2012-07-15 21:08 UTC (permalink / raw)


On Sunday, 15 July 2012 22:05:05 UTC+1, (unknown)  wrote:
> > as the use of indexes was too slow. An example: let's say we are
> > accessing element 5 of array A, "A(5)"; this requires a multiplication
> > (base_address + index * record_size). To access the neighbour A(6) also
>   Can you reasonably pad the element size to a power of two so the
> multiplication can be replaced by a shift?  It would be interesting to see
> the result, in various hardware contexts.

Actually, in my actual use case the record is 32 bytes, and it does use a shift left by 5 instead of a multiply, but this is still 10k simulations per second slower than the pointer arithmetic version.


Cheers,
Keean.




* Re: Efficient Sequential Access to Arrays
  2012-07-15 20:16   ` Keean Schupke
@ 2012-07-15 21:33     ` Georg Bauhaus
  0 siblings, 0 replies; 88+ messages in thread
From: Georg Bauhaus @ 2012-07-15 21:33 UTC (permalink / raw)


On 15.07.12 22:16, Keean Schupke wrote:

> I am not sure such a rigorous analysis is necessary though. Consider the following: the language semantics could be changed to allow this by, for example, having a primitive iterator type on arrays that supported operations like:
>
> declare
>      J : Array_Iterator := A (5);
> begin
>      J := J + 1;
>      J.Property := Value;
> end;
>
> This would not expose the underlying machine architecture and would allow the kind of optimisations needed. J could have a range derived from the array bounds to bounds check the access.
>
> It seems obvious enough that I am wondering if there is a way to do this that I missed?

IIUC, this would be along the lines of Ada 2012 iteration
syntax (the "of" loop)?

   for J of Some_array_5 loop
      J.Property := Value;
   end loop;

It is possible to define types that allow the above for containers;
I am not sure whether this extends to arrays.
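
For reference, a sketch (untested, names invented) of the Ada 2012 aspects that make "of" loops work on a user-defined container type; plain arrays already support "of" directly. Writing through the loop parameter would additionally need Variable_Indexing, and the body (not shown) would implement First, Next, Has_Element and Element.

with Ada.Iterator_Interfaces;
package Of_Demo is
   type Element_Type is new Integer;

   type Container is tagged private
     with Default_Iterator  => Iterate,
          Iterator_Element  => Element_Type,
          Constant_Indexing => Element;

   type Cursor is private;
   function Has_Element (C : Cursor) return Boolean;

   package Iterators is new Ada.Iterator_Interfaces (Cursor, Has_Element);

   function Iterate (X : Container) return Iterators.Forward_Iterator'Class;
   function Element (X : Container; C : Cursor) return Element_Type;
private
   type Data_Array is array (1 .. 10) of Element_Type;
   type Container is tagged record
      Data : Data_Array := (others => 0);
   end record;
   type Cursor is record
      Index : Natural := 0;   --  0 means "no element"
   end record;
end Of_Demo;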




* Re: Efficient Sequential Access to Arrays
  2012-07-15 19:48 ` Dmitry A. Kazakov
  2012-07-15 20:03   ` Keean Schupke
@ 2012-07-15 21:35   ` J-P. Rosen
  1 sibling, 0 replies; 88+ messages in thread
From: J-P. Rosen @ 2012-07-15 21:35 UTC (permalink / raw)


On 15/07/2012 21:48, Dmitry A. Kazakov wrote:
>> To access the neighbour A(6)
>> also requires a multiplication, so accessing the array sequentially requires
>> a multiplication per step. 
> That depends on loop optimizations. I would expect GCC to optimize access
> to array elements per the loop's index.
> 
It might depend on the kind of loop, i.e. you may have better
optimizations with a for loop than with a while loop where you do the
increment yourself.
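
Concretely, the two forms in question (a sketch; the declarations are invented stand-ins for those in the earlier posts):

with Ada.Text_IO;
procedure Loop_Forms is
   type Rec is record
      Z : Integer := 1;
   end record;
   X : array (1 .. 100) of Rec;
   S : constant Integer := 10;
   E : constant Integer := 90;
   Y : Integer := 0;
   J : Integer;
begin
   --  The for loop hands the compiler the whole iteration scheme:
   for K in S .. E loop
      Y := Y + X (K).Z;
   end loop;

   --  The while loop with a manual increment is harder to analyse:
   J := S;
   while J <= E loop
      Y := Y + X (J).Z;
      J := J + 1;
   end loop;

   Ada.Text_IO.Put_Line (Integer'Image (Y));
end Loop_Forms;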

-- 
J-P. Rosen
Adalog
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00
http://www.adalog.fr






* Re: Efficient Sequential Access to Arrays
  2012-07-15 20:03   ` Keean Schupke
  2012-07-15 20:56     ` Keean Schupke
  2012-07-15 21:05     ` tmoran
@ 2012-07-15 21:44     ` Georg Bauhaus
  2012-07-16  7:47     ` Dmitry A. Kazakov
  3 siblings, 0 replies; 88+ messages in thread
From: Georg Bauhaus @ 2012-07-15 21:44 UTC (permalink / raw)


On 15.07.12 22:03, Keean Schupke wrote:
> On Sunday, 15 July 2012 20:48:35 UTC+1, Dmitry A. Kazakov  wrote:
>> On Sun, 15 Jul 2012 11:40:08 -0700 (PDT), Keean Schupke wrote:
>>
>> > However, in order to achieve this performance I needed to rework the arrays,
>> > as the use of indexes was too slow.
>>
>> You have to measure it in order to know. Write two variants and compare
>> their performance.
>
> Have done, the difference is about 10k simulations per second.
>
>>
>> > An example: let's say we are
>> > accessing element 5 of array A, "A(5)"; this requires a multiplication
>> > (base_address + index * record_size).
>>
>> It does not, because 5 is constant.
>
> Obviously the index is not actually a constant but the result of selecting a random element in the list. The multiplication is necessary for finding this first element, but the operations on relative elements (+/-1, +/-2 etc) can use addition with a pointer, not so with an index.

There is address arithmetic in System.*. For the simple case of
constant distances, something like the following might offer some
hints (or it might once redone with more thought). Procedure Grab
computes the offsets of components of a record type object in an
array.

The point is that one may express offsets by referring
to type attributes that follow from representation clauses
and then convert between addresses and pointers.

with Ada.Numerics.Float_Random;    use  Ada.Numerics.Float_Random;
with System.Address_To_Access_Conversions, System.Storage_Elements;
procedure Iter_Arrays (Size : Natural) is

    Storage_Elements : constant := 1;

    type Index is mod 2**32;
    type Data is digits 6;
    for Data'Size use 32;

    G : Generator;

    type Things is record
       One: Data := Data (Random (G));
       Two: Data := Data (Random (G));
    end record;

    for Things use record
       One at 0 * Storage_Elements range 0 .. Data'Size - 1;
       Two at 4 * Storage_Elements range 0 .. Data'Size - 1;
    end record;
    pragma Assert (Things'Size = 64);

    type List is array (Index range <>) of Things;
    for List'Alignment use 4 * Storage_Elements;
    for List'Component_Size use 64;


--  using array indexing

    procedure Bumble (A : in out List; Step : Index) is
    begin
       for K in A'First + Step .. A'Last - Step loop
          declare
             Diff_Backward : constant Data := A (K).One - A (K - Step).Two;
             Diff_Forward : constant Data := A (K).Two - A (K + Step).One;
          begin
             if (abs Diff_Backward) < (abs Diff_Forward) then
                A (K).One := A (K - Step).One;
             else
                A (K).Two := A (K + Step).Two;
             end if;
          end;
       end loop;
    end Bumble;


--  using relative addresses

    package Indirection is new System.Address_To_Access_Conversions
      (Object => Data);
    subtype Data_Pointer is Indirection.Object_Pointer;


    procedure Grab  (A : in out List; Step : Index) is
       -- [A] = [one][two][one][two]...[one][two]
       use System.Storage_Elements, Indirection;
       Offset_Backward : constant Storage_Offset :=
         4 - Storage_Offset (Step) * 8;  --  A (K - Step).Two, seen from A (K).One
       Offset_Forward : constant Storage_Offset :=
         Storage_Offset (Step) * 8 - 4;  --  A (K + Step).One, seen from A (K).Two
    begin
       for K in A'First + Step .. A'Last - Step loop
          declare
             Back : constant Data_Pointer :=
               To_Pointer (A (K).One'Address + Offset_Backward);
             Ahead : constant Data_Pointer :=
               To_Pointer (A (K).Two'Address + Offset_Forward);
             Diff_Backward : constant Data := A (K).One - Back.all;
             Diff_Forward : constant Data := A (K).Two - Ahead.all;
          begin
             if (abs Diff_Backward) < (abs Diff_Forward) then
                A (K).One := A (K - Step).One;
             else
                A (K).Two := A (K + Step).Two;
             end if;
          end;
       end loop;
    end Grab;

    Test_Data : List (0 .. Index(Size));

begin
    Bumble (Test_Data, 2);
    Grab (Test_Data, 3);
end Iter_Arrays;




* Re: Efficient Sequential Access to Arrays
  2012-07-15 21:08       ` Keean Schupke
@ 2012-07-16  1:19         ` tmoran
  0 siblings, 0 replies; 88+ messages in thread
From: tmoran @ 2012-07-16  1:19 UTC (permalink / raw)


> shift left by 5 instead of a multiply, but this is still 10k simulations
> per second slower than the pointer arithmetic version.
So each simulation requires on the order of 10**5 array accesses?




* Re: Efficient Sequential Access to Arrays
  2012-07-15 20:03   ` Keean Schupke
                       ` (2 preceding siblings ...)
  2012-07-15 21:44     ` Georg Bauhaus
@ 2012-07-16  7:47     ` Dmitry A. Kazakov
  2012-07-16 10:16       ` Keean Schupke
  3 siblings, 1 reply; 88+ messages in thread
From: Dmitry A. Kazakov @ 2012-07-16  7:47 UTC (permalink / raw)


On Sun, 15 Jul 2012 13:03:12 -0700 (PDT), Keean Schupke wrote:

> On Sunday, 15 July 2012 20:48:35 UTC+1, Dmitry A. Kazakov  wrote:
>> On Sun, 15 Jul 2012 11:40:08 -0700 (PDT), Keean Schupke wrote:
>> 
>> > However, in order to achieve this performance I needed to rework the arrays,
>> > as the use of indexes was too slow.
>> 
>> You have to measure it in order to know. Write two variants and compare
>> their performance.
> 
> Have done, the difference is about 10k simulations per second.

Absolute figures tell little. How big is the relative difference?

It is not worth the effort if the gain or loss is under 10%. Suppose your
simulation takes 10 long years; 10% would mean 1 extra year of computation.
Wait 6 months, buy a new computer, and you will still be ahead of schedule.

>> That depends on loop optimizations. I would expect GCC to optimize access
>> to array elements per the loop's index.
> 
> It does not seem to, I'll see if I can post some assembly.

Loop optimization is a tricky issue. I suspect that tiny variations of code
may turn it on or off. You should read GCC documentation on that subject to
see what is possible on which machines.

>> > So assuming we need this level of performance, what would be the best (and
>> > most idiomatic Ada)
>> 
>> Indexing is unlikely to have a significant (>5%) influence on overall
>> performance. Usually it is other things.
> 
> It seems to have a significant influence in the benchmarks I have run and
> has a particular influence when sequentially accessing elements.

How many operations do you have per array element? If you do basically only
indexing then there must be something wrong with the algorithm. Otherwise,
say, one multiplication per 100 instructions is nothing, especially when
memory access (a very slow thing) is involved.

>> > way to package this type of usage pattern as an
>> > abstract datatype?
>> 
>> An array of aliased elements, to ensure the elements are independently addressable.
> 
> Yes, the type itself will need to have aliased elements, but I was
> assuming this would be hidden in a package as an ADT, and would expose an
> indexed-iterator that has '+' and '-' operations on the iterator (not just
> a forward or bidirectional iterator).

Low-level optimization is not compatible with proper design. You have to
choose. If you make a controlled iterator type wrapping a pointer rather
than an integer index, that would most likely be even slower.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Efficient Sequential Access to Arrays
  2012-07-16  7:47     ` Dmitry A. Kazakov
@ 2012-07-16 10:16       ` Keean Schupke
  2012-07-16 14:57         ` Dmitry A. Kazakov
  0 siblings, 1 reply; 88+ messages in thread
From: Keean Schupke @ 2012-07-16 10:16 UTC (permalink / raw)
  Cc: mailbox

On Monday, 16 July 2012 08:47:01 UTC+1, Dmitry A. Kazakov  wrote:
> On Sun, 15 Jul 2012 13:03:12 -0700 (PDT), Keean Schupke wrote:
> 
> > On Sunday, 15 July 2012 20:48:35 UTC+1, Dmitry A. Kazakov  wrote:
> >> On Sun, 15 Jul 2012 11:40:08 -0700 (PDT), Keean Schupke wrote:
> >> 
> >> > However, in order to achieve this performance I needed to rework the arrays,
> >> > as the use of indexes was too slow.
> >> 
> >> You have to measure it in order to know. Write two variants and compare
> >> their performance.
> > 
> > Have done, the difference is about 10k simulations per second.
> 
> Absolute figures tell little. How big is the relative difference?

About 25% (30k with indexing, 40k with relative addressing)

> 
> It is not worth the effort if the gain or loss is under 10%. Suppose your
> simulation takes 10 long years; 10% would mean 1 extra year of computation.
> Wait 6 months, buy a new computer, and you will still be ahead of schedule.

The software needs to react faster than the competition at any given point in time. That is all I can say really.

> 
> >> That depends on loop optimizations. I would expect GCC to optimize access
> >> to array elements per the loop's index.
> > 
> > It does not seem to, I'll see if I can post some assembly.
> 
> Loop optimization is a tricky issue. I suspect that tiny variations of code
> may turn it on or off. You should read GCC documentation on that subject to
> see what is possible on which machines.

It appears GCC is unable to optimise the indexing away when the index is dynamic (relies on IO or PRNG for example).

> 
> >> > So assuming we need this level of performance, what would be the best (and
> >> > most idiomatic Ada)
> >> 
> >> Indexing is unlikely to have a significant (>5%) influence on overall
> >> performance. Usually it is other things.
> > 
> > It seems to have a significant influence in the benchmarks I have run and
> > has a particular influence when sequentially accessing elements.
> 
> How many operations do you have per array element? If you do basically only
> indexing then there must be something wrong with the algorithm. Otherwise,
> say, one multiplication per 100 instructions is nothing, especially when
> memory access (a very slow thing) is involved.

The algorithm is highly optimised, and has been developed over several years. The choice of data structures is pretty good, and the low operation count indicates that the data structures are a good match for the topology of the problem. Most highly efficient algorithms do a lot of work with a few operations. I would recommend Tarjan for a view of the kind of algorithms we are using:

http://books.google.co.uk/books/about/Data_Structures_and_Network_Algorithms.html?id=m1rAB3uWwBwC&redir_esc=y

> 
> >> > way to package this type of usage pattern as an
> >> > abstract datatype?
> >> 
> >> An array of aliased elements, to ensure the elements are independently addressable.
> > 
> > Yes, the type itself will need to have aliased elements, but I was
> > assuming this would be hidden in a package as an ADT, and would expose an
> > indexed-iterator that has '+' and '-' operations on the iterator (not just
> > a forward or bidirectional iterator).
> 
> Low-level optimization is not compatible with proper design. You have to
> choose. If you make a controlled iterator type wrapping a pointer rather
> than an integer index, that would most likely be even slower.

Highly optimised code can still have good structure; using inlining and generics, code can be elegant and fast.

There is nothing wrong in design terms with a forward iterator; in fact, it is better design from a generic point of view to expose the minimal set of methods needed to efficiently implement the required algorithm. To see elegant and efficient code constructed in this way, see Stepanov's "Elements of Programming".

http://www.elementsofprogramming.com/book.html


Cheers,
Keean.




* Re: Efficient Sequential Access to Arrays
  2012-07-15 20:56     ` Keean Schupke
@ 2012-07-16 11:34       ` Jacob Sparre Andersen
  0 siblings, 0 replies; 88+ messages in thread
From: Jacob Sparre Andersen @ 2012-07-16 11:34 UTC (permalink / raw)


Keean Schupke wrote:

> for J in S .. E loop
>     Y := Y + X(J).Z;
> end loop;
>
> Where S and E are random numbers in the index range of the array and
> where S < E, and the array is an array of records (in this case 28
> bytes long) containing the integer Z.
>
> imul   $0x1c,%r9,%r10
> add    0x48(%rsp,%r10,1),%edi
>
> So you can see the multiplication is executed for every iteration.

Your example is probably too simple for your actual application, but (in
Ada 2012) you could also write it (untested):

   for J of X (S .. E) loop
      Y := Y + J.Z;
   end loop;

Greetings,

Jacob
-- 
"Never interrupt your enemy when he is making a mistake."




* Re: Efficient Sequential Access to Arrays
  2012-07-16 10:16       ` Keean Schupke
@ 2012-07-16 14:57         ` Dmitry A. Kazakov
  2012-07-16 16:39           ` Keean Schupke
  0 siblings, 1 reply; 88+ messages in thread
From: Dmitry A. Kazakov @ 2012-07-16 14:57 UTC (permalink / raw)


On Mon, 16 Jul 2012 03:16:09 -0700 (PDT), Keean Schupke wrote:

> On Monday, 16 July 2012 08:47:01 UTC+1, Dmitry A. Kazakov  wrote:
>> On Sun, 15 Jul 2012 13:03:12 -0700 (PDT), Keean Schupke wrote:
>> 
>> Absolute figures tell little. How big is the relative difference?
> 
> About 25% (30k with indexing, 40k with relative addressing)

Which itself looks like a problem. It takes no less than 25% of overall
performance for mere shuffling data in a task unrelated to stuff like
recoding etc. This looks wrong to me. I would look for a better
algorithm/structure.

>> Loop optimization is a tricky issue. I suspect that tiny variations of code
>> may turn it on or off. You should read GCC documentation on that subject to
>> see what is possible on which machines.
> 
> It appears GCC is unable to optimise the indexing away when the index is
> dynamic (relies on IO or PRNG for example).

You should read its documentation in order to be able to say. Consecutive
indexing could be optimized as addition to a register-stored base.

>> How many operations do you have per array element? If you do basically only
>> indexing then there must be something wrong with the algorithm. Otherwise,
>> say, one multiplication per 100 instructions is nothing, especially when
>> memory access (a very slow thing) is involved.
> 
> The algorithm is highly optimised, and has been developed over several
> years. The choice of data structures is pretty good,

It does not look good when indexing takes more than 25%.

> Highly optimised code can still have good structure; using inlining and
> generics, code can be elegant and fast.

Code using generics/templates is never elegant or fast, but that is
off-topic.

> There is nothing wrong in design terms with a forward iterator; in fact, it
> is better design from a generic point of view to expose the minimal set of
> methods needed to efficiently implement the required algorithm.

This is only necessary when the class of the container type is wider than
just arrays, e.g. includes other structures like lists.

If you have performance problems with arrays, which are built-in and
subject to tailored optimization, you cannot expect the situation to improve
by puzzling the compiler with more abstract types. That could only disable
optimization.

> To see elegant and efficient code constructed in this way, see Stepanov's
> "Elements of Programming".

I am not an STL fan.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Efficient Sequential Access to Arrays
  2012-07-16 14:57         ` Dmitry A. Kazakov
@ 2012-07-16 16:39           ` Keean Schupke
  2012-07-16 17:02             ` Georg Bauhaus
                               ` (3 more replies)
  0 siblings, 4 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-16 16:39 UTC (permalink / raw)
  Cc: mailbox

On Monday, 16 July 2012 15:57:04 UTC+1, Dmitry A. Kazakov  wrote:
> On Mon, 16 Jul 2012 03:16:09 -0700 (PDT), Keean Schupke wrote:
> 
> > On Monday, 16 July 2012 08:47:01 UTC+1, Dmitry A. Kazakov  wrote:
> >> On Sun, 15 Jul 2012 13:03:12 -0700 (PDT), Keean Schupke wrote:
> >> 
> >> Absolute figures tell little. How big is the relative difference?
> > 
> > About 25% (30k with indexing, 40k with relative addressing)
> 
> Which itself looks like a problem. It takes no less than 25% of overall
> performance for mere shuffling data in a task unrelated to stuff like
> recoding etc. This looks wrong to me. I would look for a better
> algorithm/structure.

I find this comment puzzling; it does not match my experience. What sort of algorithms do you work with? What algorithm for disjoint-set do you suggest I use? Have a read of this:

http://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf

Can you come up with a faster algorithm? Until you have read it I don't think you can comment on the quality of the algorithm I am using.

> 
> >> Loop optimization is a tricky issue. I suspect that tiny variations of code
> >> may turn it on or off. You should read GCC documentation on that subject to
> >> see what is possible on which machines.
> > 
> > It appears GCC is unable to optimise the indexing away when the index is
> > dynamic (relies on IO or PRNG for example).
> 
> You should read its documentation in order to be able to say. Consecutive
> indexing could be optimized as addition to a register-stored base.


It may be able to optimize the access in loops, but this is far more complex.

The node in the array contains pointers to other nodes. When we follow the pointers we want to be able to access the neighbours. Consider:

type Node_Type is record 
    Next : access all Node_Type;
    Other : access all Node_Type;
    Data : Data_Type;
end record;

type A is array (Index_Type) of aliased Node_Type;

X : A;

So we can access the next record like this:

D := X(Y).Next.Data;

but how do we access the neighbouring nodes? We cannot do:

D := (X(Y).Next + 1).Data;


However if we put Indexes in the Node:

type Node_Type is record 
    Next : Index_Type;
    Other : Index_Type;
    Data : Data_Type;
end record;


Then the access is slow:

D := X(X(Y).Next).Data;

I don't think there is anything the compiler can do to optimise this kind of access.

However we can now access the neighbour:

D := X(X(Y).Next + 1).Data;


Basically what we are talking about is a hybrid array/list datatype. To me this really requires memory address manipulation (and I really don't care about running on Lisp machines; I want good performance on real hardware like Intel/AMD/ARM CPUs). But we can hide all that inside a nice safe package which does some checking when in debug mode.
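
As a sketch of what I mean (untested; Neighbour and the other names are invented), the address manipulation can be confined to one helper with a debug-mode check:

with System;                                use System;
with System.Storage_Elements;               use System.Storage_Elements;
with System.Address_To_Access_Conversions;

procedure Hybrid_Demo is
   type Data_Type is new Integer;

   type Node_Type;
   type Node_Access is access all Node_Type;
   type Node_Type is record
      Next  : Node_Access;
      Other : Node_Access;
      Data  : Data_Type := 0;
   end record;

   X : array (1 .. 100) of aliased Node_Type;

   package Conv is new System.Address_To_Access_Conversions (Node_Type);

   Stride : constant Storage_Offset := X (2)'Address - X (1)'Address;

   --  (X(Y).Next + N) in the hypothetical notation above:
   function Neighbour (P : not null Node_Access; N : Integer)
     return Node_Access
   is
      A : constant Address := P.all'Address + Stride * Storage_Offset (N);
   begin
      pragma Assert   --  bounds check, compiled out when assertions are off
        (A >= X (X'First)'Address and then A <= X (X'Last)'Address);
      return Node_Access (Conv.To_Pointer (A));
   end Neighbour;

   D : Data_Type;
begin
   X (5).Next := X (7)'Access;
   D := Neighbour (X (5).Next, 1).Data;   --  the element right after X (7)
   pragma Assert (D = 0);                 --  X (8).Data still has its default
end Hybrid_Demo;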


> 
> >> How many operations do you have per array element? If you do basically only
> >> indexing then there must be something wrong with the algorithm. Otherwise,
> >> say, one multiplication per 100 instructions is nothing, especially when
> >> memory access (a very slow thing) is involved.
> > 
> > The algorithm is highly optimised, and has been developed over several
> > years. The choice of data structures is pretty good,
> 
> It does not look good when indexing takes more than 25%.

Looks can be deceiving.

> 
> > Highly optimised code can still have good structure; using inlining and
> > generics, code can be elegant and fast.
> 
> Code using generics/templates is never elegant or fast, but that is
> off-topic.
> 
> > There is nothing wrong in design terms with a forward iterator; in fact, it
> > is better design from a generic point of view to expose the minimal set of
> > methods needed to efficiently implement the required algorithm.
> 
> This is only necessary when the class of the container type is wider than
> just arrays, e.g. includes other structures like lists.

This ignores the fact that the interface for an ADT should be independent of its implementation. Think of ArrayList and LinkedList in Java. The code should use the List<> interface if it only needs to access elements sequentially (it is the algorithm that determines the requirements of the interface, not the data type). So we code our algorithm that only needs sequential access against the List interface, and we are then free to choose either ArrayList or LinkedList depending on which performs best for our application.

> 
> If you have performance problems with arrays, which are built-in and
> subject to tailored optimization, you cannot expect the situation to improve
> by puzzling the compiler with more abstract types. That could only disable
> optimization.

Maybe you are thinking of something more complex than I am. An ADT is just a private type with associated functions in a package. It is the fact that the type itself is private and that all operations on it must be done through the functions provided in the same package that defines an ADT. The private type does not have to be a record, it could be an access type. Because all the functions get inlined there is no cost to creating an ADT like this.

The iterator can also live in the same package as the container.

> 
> > To see elegant and efficient code constructed in this way, see Stepanov's
> > "Elements of Programming".
> 
> I am not an STL fan.

Elements is different from the STL. It includes native 'concepts' that define axiomatic constraints on software modules. It's a hard book to really understand, but well worth a read I think. It changed my mind about imperative languages. I have spent a lot of time programming Haskell, and got deeply into System-F, type theory, etc. The key understanding for me was when I realised that 'concepts' in the generic sense are the same thing as type classes from Haskell, and implementations could be structured the same way that type classes enable the structuring of functional programs. However, Haskell is slow, and where it really lacks is in direct memory manipulation.

The conclusion I came to is that a language that allows exact control of data memory layout (like C/C++) but also allows the use of generic concepts (the Boost Concept Checking Library for C++ for now, Ada signature packages) would allow code that is both fast (due to control of memory layout) and elegant (by providing type classes). For an introduction to Haskell type classes see:

http://book.realworldhaskell.org/read/using-typeclasses.html
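
To illustrate (a sketch, untested, all names invented): a signature package in Ada plays the role of a type class, and an algorithm can demand it as a generic formal package:

generic
   type T is private;
   with function "=" (L, R : T) return Boolean is <>;
package Equality_Concept is
end Equality_Concept;

generic
   with package Eq is new Equality_Concept (<>);
   type Index is range <>;
   type Table is array (Index range <>) of Eq.T;
function Count_Equal (A : Table; Value : Eq.T) return Natural;

function Count_Equal (A : Table; Value : Eq.T) return Natural is
   N : Natural := 0;
begin
   for J in A'Range loop
      if Eq."=" (A (J), Value) then   --  the operation the concept provides
         N := N + 1;
      end if;
   end loop;
   return N;
end Count_Equal;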

 
Cheers,
Keean.






* Re: Efficient Sequential Access to Arrays
  2012-07-16 16:39           ` Keean Schupke
@ 2012-07-16 17:02             ` Georg Bauhaus
  2012-07-16 18:48             ` Dmitry A. Kazakov
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 88+ messages in thread
From: Georg Bauhaus @ 2012-07-16 17:02 UTC (permalink / raw)


On 16.07.12 18:39, Keean Schupke wrote:
> type Node_Type is record 
>     Next : access all Node_Type;
>     Other : access all Node_Type;
>     Data : Data_Type;
> end record;

Just in case, it should in general be advantageous to define
a pointer type and add a null exclusion where possible.
Anonymous access types may incur a number of undesirable
effects at all stages of the software life cycle.
I'm sure you knew this.
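
That is, something along these lines (a sketch; Data_Type as in the quoted code):

   type Node_Type;
   type Node_Access is access all Node_Type;           --  named pointer type
   subtype Valid_Node_Access is not null Node_Access;  --  null excluded
   type Node_Type is record
      Next  : Node_Access;   --  may legitimately be null while building
      Other : Node_Access;
      Data  : Data_Type;
   end record;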




* Re: Efficient Sequential Access to Arrays
  2012-07-16 16:39           ` Keean Schupke
  2012-07-16 17:02             ` Georg Bauhaus
@ 2012-07-16 18:48             ` Dmitry A. Kazakov
  2012-07-16 19:12               ` Keean Schupke
  2012-07-16 19:43             ` John B. Matthews
  2012-07-16 22:23             ` tmoran
  3 siblings, 1 reply; 88+ messages in thread
From: Dmitry A. Kazakov @ 2012-07-16 18:48 UTC (permalink / raw)


On Mon, 16 Jul 2012 09:39:45 -0700 (PDT), Keean Schupke wrote:

> On Monday, 16 July 2012 15:57:04 UTC+1, Dmitry A. Kazakov  wrote:

>> Which itself looks like a problem. It takes no less than 25% of overall
>> performance for mere shuffling data in a task unrelated to stuff like
>> recoding etc. This looks wrong to me. I would look for a better
>> algorithm/structure.
> 
> I find this comment puzzling; it does not match my experience.

It is just an observation that more than 25% of CPU load is spent heating
the atmosphere rather than doing something useful.

> What algorithm for disjoint-set do you suggest I use?

I would try to avoid using such things if the problem is large.

> Until you have read it I don't
> think you can comment on the quality of the algorithm I am using.

Quality is an objective measure which does not depend on whether anybody
reads anything.
 
>> It appears GCC is unable to optimise the indexing away when the index is
>> dynamic (relies on IO or PRNG for example).
>> 
>> You should read its documentation in order to be able to say. Consecutive
>> indexing could be optimized as addition to a register-stored base.
> 
> It may be able to optimize the access in loops, but this is far more complex.

Either GCC is unable to optimize something or it is not.

> The node in the array contains pointers to other nodes. When we follow the
> pointers we want to be able to access the neighbours. Consider:
>
> type Node_Type is record 
>    Next : access all Node_Type;
>    Other : access all Node_Type;
>    Data : Data_Type;
> end record;
>
> type A is array (Index_Type) of aliased Node_Type;
>
> X : A;
>
> So we can access the next record like this:
>
> D := X(Y).Next.Data;
>
> but how do we access the neighbouring nodes? We cannot do:
>
> D := (X(Y).Next + 1).Data;

   X(Y+1).Next.Data;
   X(Y).Next.Next.Data;

No idea what you are trying to do. But indexing *must* be faster than
indirection. You should make your data structures straight.

>> > Highly optimised code can still have good structure; using inlining and
>> > generics, code can be elegant and fast.
>> 
>> Code using generics/templates is never elegant or fast, but that is
>> off-topic.
>> 
>> > There is nothing wrong in design terms with a forward iterator; in fact, it
>> > is better design from a generic point of view to expose the minimal set of
>> > methods needed to efficiently implement the required algorithm.
>> 
>> This is only necessary when the class of the container type is wider than
>> just arrays, e.g. includes other structures like lists.
> 
> This ignores the fact that the interface for an ADT should be independent
> of its implementation.

That is impossible for built-in types like arrays, which (unfortunately)
lack inheritable interfaces in Ada.

>> If you have performance problems with arrays, which are built-in and
>> subject to tailored optimization, you cannot expect the situation to improve
>> by puzzling the compiler with more abstract types. That could only disable
>> optimization.
> 
> Maybe you are thinking of something more complex than I am. An ADT is just
> a private type with associated functions in a package.

It is difficult for the compiler to optimize wrappers of array operations
like indexing. You said you failed to make GCC optimize raw array
access. Why should wrapping that into functions do any better?

> Element's is different from the STL. It includes native 'concepts' that
> define axiomatic constraints on software modules.

I have no idea what "axiomatic constraints on software modules" could mean.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: Efficient Sequential Access to Arrays
  2012-07-16 18:48             ` Dmitry A. Kazakov
@ 2012-07-16 19:12               ` Keean Schupke
  2012-07-17  6:31                 ` Dmitry A. Kazakov
  0 siblings, 1 reply; 88+ messages in thread
From: Keean Schupke @ 2012-07-16 19:12 UTC (permalink / raw)
  Cc: mailbox

On Monday, 16 July 2012 19:48:22 UTC+1, Dmitry A. Kazakov  wrote:
> On Mon, 16 Jul 2012 09:39:45 -0700 (PDT), Keean Schupke wrote:
> 
> > On Monday, 16 July 2012 15:57:04 UTC+1, Dmitry A. Kazakov  wrote:
> 
> >> Which itself looks like a problem. It takes no less than 25% of overall
> >> performance for mere shuffling data in a task unrelated to stuff like
> >> recoding etc. This looks wrong to me. I would look for a better
> >> algorithm/structure.
> > 
> > I find this comment puzzling; it does not match my experience.
> 
> It is just an observation that more than 25% of CPU load is spent heating
> the atmosphere rather than doing something useful.
> 
> > What algorithm for disjoint-set do you suggest I use?
> 
> I would try to avoid using such things if the problem is large.

Perhaps you would like to outline your methodology for algorithm development? How would you go about designing an efficient algorithm to solve these kinds of problems:

• Network connectivity.
• Percolation.
• Image processing. (okay, this one's a bit vague, but this list is copied from the paper)
• Least common ancestor.
• Equivalence of finite state automata.
• Hindley-Milner polymorphic type inference.
• Kruskal's minimum spanning tree algorithm. 
• Games (Go, Hex)
• Compiling equivalence statements in Fortran.

Can you suggest a more efficient algorithm for any of these than WQUPC?

> 
> > Until you have read it I don't
> > think you can comment on the quality of the algorithm I am using.
> 
> Quality is an objective measure which does not depend on whether anybody
> reads anything.

You can only comment on the quality of the algorithm if you understand the problem. What does a good quality algorithm to solve the percolation problem look like? (You can find a definition of the percolation problem in the WQUPC paper I posted a link to). If you can come up with a better algorithm it would be worthy of publication.

>  
> >> It appears GCC is unable to optimise the indexing away when the index is
> >> dynamic (relies on IO or PRNG for example).
> >> 
> >> You should read its documentation in order to be able to say. Consecutive
> >> indexing could be optimized as addition to a register-stored base.
> > 
> > It may be able to optimize the access in loops, but this is far more complex.
> 
> Either GCC is unable to optimize something or it is not.
> 
> > The node in the array contains pointers to other nodes. When we follow the
> > pointers we want to be able to access the neighbours. Consider:
> >
> > type Node_Type is record 
> >    Next : access all Node_Type;
> >    Other : access all Node_Type;
> >    Data : Data_Type;
> > end record;
> >
> > type A is array (Index_Type) of aliased Node_Type;
> >
> > X : A;
> >
> > So we can access the next record like this:
> >
> > D := X(Y).Next.Data;
> >
> > but how do we access the neighbouring nodes? We cannot do:
> >
> > D := (X(Y).Next + 1).Data;
> 
>    X(Y+1).Next.Data;
>    X(Y).Next.Next.Data;
> 
> No idea what you are trying to do. But indexing *must* be faster than
> indirection. You should make your data structures straight.

Straight data structures require copying data. If we were to use an array instead of a linked list, then every state change would require copying stuff. Also, each node would (potentially) need enough space to store every other node, which would overflow the cache and seriously slow down the code. Other people have tried this approach and their code is slower.

> 
> >> > Highly optimised code can still have good structure; using inlining and
> >> > generics, code can be elegant and fast.
> >> 
> >> Code using generics/templates is never elegant or fast, but that is
> >> off-topic.
> >> 
> >> > There is nothing wrong in design terms with a forward iterator; in fact, it
> >> > is better design from a generic point of view to expose the minimal set of
> >> > methods needed to efficiently implement the required algorithm.
> >> 
> >> This is only necessary when the class of the container type is wider than
> >> just arrays, e.g. includes other structures like lists.
> > 
> > This ignores the fact that the interface for an ADT should be independent
> > of its implementation.
> 
> That is impossible for built-in types like arrays, which (unfortunately)
> lack inheritable interfaces in Ada.

I agree with you here - it's a shame. I wonder if there is some kind of design principle in Ada that makes this undesirable, or if this is just an oversight.

> 
> >> If you have performance problems with arrays, which are built-in and
> >> subject to tailored optimization, you cannot expect the situation to improve
> >> by puzzling the compiler with more abstract types. That could only disable
> >> optimization.
> > 
> > Maybe you are thinking of something more complex than I am. An ADT is just
> > a private type with associated functions in a package.
> 
> It is difficult for the compiler to optimize wrappers of array operations
> like indexing. You said you failed to make GCC optimize raw array
> access. Why should wrapping that into functions do any better?

It doesn't do any worse... the functions are mostly single line and get inlined before the optimisation pass. It has the benefit of better code structure, improved modularity and better readability.

On the whole many positives, and no negatives (because it's just as fast in my tests).

> 
> > Elements is different from the STL. It includes native 'concepts' that
> > define axiomatic constraints on software modules.
> 
> I have no idea what "axiomatic constraints on software modules" could mean.

See:

http://akrzemi1.wordpress.com/2012/01/11/concept-axioms-what-for/



Cheers,
Keean.




* Re: Efficient Sequential Access to Arrays
  2012-07-16 16:39           ` Keean Schupke
  2012-07-16 17:02             ` Georg Bauhaus
  2012-07-16 18:48             ` Dmitry A. Kazakov
@ 2012-07-16 19:43             ` John B. Matthews
  2012-07-16 20:44               ` Keean Schupke
  2012-07-16 22:23             ` tmoran
  3 siblings, 1 reply; 88+ messages in thread
From: John B. Matthews @ 2012-07-16 19:43 UTC (permalink / raw)


In article <4c86d54d-dfeb-44f1-9638-3c416ea51d74@googlegroups.com>,
 Keean Schupke <keean.schupke@googlemail.com> wrote:

> > On Mon, 16 Jul 2012 03:16:09 -0700 (PDT), Keean Schupke wrote:
> > 
> > > On Monday, 16 July 2012 08:47:01 UTC+1, Dmitry A. Kazakov  wrote:
> > > > On Sun, 15 Jul 2012 13:03:12 -0700 (PDT), Keean Schupke wrote:
> > > >  
> > > > Absolute figures tell little. How big is the relative difference?
> > > 
> > > About 25% (30k with indexing, 40k with relative addressing)
> > 
> > Which itself looks like a problem. It takes no less than 25% of 
> > overall performance for mere shuffling data in a task unrelated to 
> > stuff like recoding etc. This looks wrong to me. I would look for a 
> > better algorithm/structure.
> 
> I find this comment puzzling; it does not match my experience. What 
> sort of algorithms do you work with? What algorithm for disjoint-set 
> do you suggest I use? Have a read of this:
> 
> http://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf

Apropos to this, may I impose on you to elaborate on your previous 
decision not to use WQUPC? In particular, is your application lookup 
dominated due in some part to the choice of algorithm itself?

Sorry if I've overlooked an intervening clarification.

<https://groups.google.com/d/msg/comp.lang.ada/EDgDNVw9tgc/bUYu34D_8LEJ>

> Can you come up with a faster algorithm? Until you have read it I 
> don't think you can comment on the quality of the algorithm I am using.

-- 
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>




* Re: Efficient Sequential Access to Arrays
  2012-07-16 19:43             ` John B. Matthews
@ 2012-07-16 20:44               ` Keean Schupke
  0 siblings, 0 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-16 20:44 UTC (permalink / raw)


On Monday, 16 July 2012 20:43:54 UTC+1, John B. Matthews  wrote:
>  Keean Schupke wrote:
> 
> > > On Mon, 16 Jul 2012 03:16:09 -0700 (PDT), Keean Schupke wrote:
> > > 
> > > > On Monday, 16 July 2012 08:47:01 UTC+1, Dmitry A. Kazakov  wrote:
> > > > > On Sun, 15 Jul 2012 13:03:12 -0700 (PDT), Keean Schupke wrote:
> > > > >  
> > > > > Absolute figures tell little. How big is the relative difference?
> > > > 
> > > > About 25% (30k with indexing, 40k with relative addressing)
> > > 
> > > Which itself looks like a problem. It takes no less than 25% of 
> > > overall performance for mere shuffling data in a task unrelated to 
> > > stuff like recoding etc. This looks wrong to me. I would look for a 
> > > better algorithm/structure.
> > 
> > I find this comment puzzling; it does not match my experience. What 
> > sort of algorithms do you work with? What algorithm for disjoint-set 
> > do you suggest I use? Have a read of this:
> > 
> > http://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf
> 
> Apropos to this, may I impose on you to elaborate on your previous 
> decision not to use WQUPC? In particular, is your application lookup 
> dominated due in some part to the choice of algorithm itself?
> 
> Sorry if I've overlooked an intervening clarification.
> 
> <https://groups.google.com/d/msg/comp.lang.ada/EDgDNVw9tgc/bUYu34D_8LEJ>
> 
> > Can you come up with a faster algorithm? Until you have read it I 
> > don't think you can comment on the quality of the algorithm I am using.
> 
> -- 
> John B. Matthews
> trashgod at gmail dot com
> <http://sites.google.com/site/drjohnbmatthews>

That is correct, it is lookup dominated. By joining the smaller set to the larger set and updating the canonical pointer for each node in the smaller set at join time, we get better performance for this particular problem.

The second part of the question is harder: is it lookup dominated due to the choice of algorithm? I don't think so. Certainly the choice of Eager-Union vs WQUPC has no influence on the number of lookups.

So the question then becomes what the wider algorithm is and what we are using the disjoint-set for, but this is getting off topic. The original point was that there exists a class of algorithms for which array indexing is a non-optimal approach to accessing the container, and a class of algorithms that require both indexed and linked access within the same data structure, a linked hash table for example.

So let's try to focus here: given that such a class of algorithms exists, and for some problems will be optimal, how do we best implement them in Ada?


Cheers,
Keean.




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-16 16:39           ` Keean Schupke
                               ` (2 preceding siblings ...)
  2012-07-16 19:43             ` John B. Matthews
@ 2012-07-16 22:23             ` tmoran
  2012-07-17  6:40               ` Keean Schupke
  3 siblings, 1 reply; 88+ messages in thread
From: tmoran @ 2012-07-16 22:23 UTC (permalink / raw)


I'm confused.  The Subject line here is "Efficient Sequential Access to Arrays"
and you gave a sample piece of code
>  for J in S .. E loop
>      Y := Y + X(J).Z;
>  end loop;
and mentioned stepping through X with a pointer rather than an index.

  But now you are talking about algorithms mentioned in
>  http://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf
  which have to do with nodes connected in a graph.  I'm not seeing there
anyplace where the relation between a(i) and a(i+1) is different from the
relation between a(i) and a(i+anything) - each node is independent and
may be connected to any other node, not just to those with "nearby" memory
addresses.

   -----------
As to
>  for J in S .. E loop
>      Y := Y + X(J).Z;
>  end loop;
  if S and E are random and typically fairly far apart, and the array
X remains constant, how about grouping Xs, eg
  X_Grouped := (others=>0);
  for i in X'range loop
    X_Grouped(i/Group_Size) := X_Grouped(i/Group_Size) + X(i);
  end loop;

  then instead of summing N = (E-S+1) X's, you can sum X_Grouped from
S/Group_Size .. E/Group_Size, which takes 1/Group_Size as many array
access and adds, and then handle the partial groups at the ends, which
will take 2*(Group_Size/2) operations on average.  So instead of
N sums on average, you will have N/Group_Size + Group_Size.  (Handling
the partial groups at the ends more carefully also cuts that second
term in half.)
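
A sketch of the corresponding range query, under the same assumptions
(X indexed from 0, Group_Size a positive constant, X_Grouped holding one
partial sum per group):

   function Range_Sum (S, E : Natural) return Integer is
      Y : Integer := 0;
      I : Natural := S;
   begin
      while I <= E and then I mod Group_Size /= 0 loop
         Y := Y + X (I);                      -- leading partial group
         I := I + 1;
      end loop;
      while I + Group_Size - 1 <= E loop
         Y := Y + X_Grouped (I / Group_Size); -- whole groups, one add each
         I := I + Group_Size;
      end loop;
      while I <= E loop
         Y := Y + X (I);                      -- trailing partial group
         I := I + 1;
      end loop;
      return Y;
   end Range_Sum;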



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-16 19:12               ` Keean Schupke
@ 2012-07-17  6:31                 ` Dmitry A. Kazakov
  2012-07-17  6:50                   ` Georg Bauhaus
  2012-07-17  7:24                   ` Keean Schupke
  0 siblings, 2 replies; 88+ messages in thread
From: Dmitry A. Kazakov @ 2012-07-17  6:31 UTC (permalink / raw)


On Mon, 16 Jul 2012 12:12:48 -0700 (PDT), Keean Schupke wrote:

> On Monday, 16 July 2012 19:48:22 UTC+1, Dmitry A. Kazakov  wrote:
>> On Mon, 16 Jul 2012 09:39:45 -0700 (PDT), Keean Schupke wrote:
>> 
>> > On Monday, 16 July 2012 15:57:04 UTC+1, Dmitry A. Kazakov  wrote:
>> 
>> >> Which itself looks like a problem. It takes no less than 25% of overall
>> >> performance for mere shuffling data in a task unrelated to stuff like
>> >> recoding etc. This looks wrong to me. I would look for a better
>> >> algorithm/structure.
>> > 
>> > I find this comment puzzling and does not match my experience.
>> 
>> It is just an observation that more than 25% of CPU load is spent to heat
>> the atmosphere rather than doing something useful.
>> 
>> > What algorithm for disjoint-set do you suggest I use?
>> 
>> I would try to avoid using such things if the problem is large.
> 
> Perhaps you would like to outline your methodology for algorithm
> development?

I don't develop algorithms, I apply them.

25% overhead on indexing is too much.

> How would you go about designing an efficient algorithm to
> solve these kind of problems:
> 
> • Network connectivity.

1. Open amazon site
2. Search "network adapter" 
3. Choose 5 different with minimal price
4. Among them select one with maximal positive customer reports
5. Buy it

(Network connectivity is not an algorithmic problem)

>> No idea what you are trying to do. But indexing *must* be faster than
>> indirection. You should make your data structures straight.
> 
> Straight data structures require copying data. If we were to use an array
> instead of a linked list then every state change would require copying
> stuff.

In Ada you rename array elements. This lets the compiler choose between
copying and handling them in place. BTW, copying might turn out to be more
effective depending on the architecture.
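
For instance, a minimal sketch (A, I and Element_Type assumed from context):

   declare
      E : Element_Type renames A (I);  -- no copy implied; the compiler
   begin                               -- decides how A (I) is accessed
      E.Z := E.Z + 1;                  -- updates A (I) in place
   end;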

>> >> If you have performance problems with arrays, which are built-in and
>> >> subject of tailored optimization, you cannot expect situation improved by
>> >> puzzling the compiler with more abstract types. That could only disable
>> >> optimization.
>> > 
>> > Maybe you are thinking of something more complex than I am. An ADT is just
>> > a private type with associated functions in a package.
>> 
>> It is difficult for the compiler to optimize wrappers of array operations
>> like indexing. You said you failed to make GCC to optimize raw array
>> access. Why wrapping that into functions should do any better?
> 
> It doesn't do any worse... the functions are mostly single line and get
> inlined before the optimisation pass.

Did you verify that?

> On the whole many positives, and no negatives (because it's just as fast in my tests).

But you said you failed to optimize multiplication away.

>> > Element's is different from the STL. It includes native 'concepts' that
>> > define axiomatic constraints on software modules.
>> 
>> I have no idea what "axiomatic constraints on software modules" could mean.
> 
> See:
> 
> http://akrzemi1.wordpress.com/2012/01/11/concept-axioms-what-for/

Ah, you meant invariants, propositions regarding program semantics
maintained true during code transformations, e.g. for optimization.

In more complex cases the user could help the compiler by giving advice. But
for low-level optimizations like those of indexing, this stuff is buried in
the compiler guts.

However, yes, Ada should have interfaces of integer types, interfaces of
index types, interfaces of array types etc, annotated by
pre-/post-conditions and invariants. Unfortunately, presently interfaces
are tied with by-reference types. It is unclear when Ada will provide
interfaces for all types. Otherwise, regarding program correctness stuff
there is SPARK and some in Ada 2012.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-16 22:23             ` tmoran
@ 2012-07-17  6:40               ` Keean Schupke
  2012-07-17  9:01                 ` Dmitry A. Kazakov
  0 siblings, 1 reply; 88+ messages in thread
From: Keean Schupke @ 2012-07-17  6:40 UTC (permalink / raw)


On Monday, 16 July 2012 23:23:34 UTC+1, (unknown)  wrote:
> I'm confused.  The Subject line here is "Efficient Sequential Access to Arrays"
> and you gave a sample piece of code
> >  for J in S .. E loop
> >      Y := Y + X(J).Z;
> >  end loop;
> and mentioned stepping through X with a pointer rather than an index.
> 
>   But now you are talking about algorithms mentioned in
> >  http://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf
>   which have to do with nodes connected in a graph.  I'm not seeing there
> anyplace where the relation between a(i) and a(i+1) is different from the
> relation between a(i) and a(i+anything) - each node is independent and
> may be connected to any other node, not just to those with "nearby" memory
> addresses.

Look, for example, at the connected-points problem, the percolation problem, or the hex game. There are problems where the connectedness of subsets is only one part of the problem. In this case connectedness is one part, and spatial proximity is another.

> 
>    -----------
> As to
> >  for J in S .. E loop
> >      Y := Y + X(J).Z;
> >  end loop;
>   if S and E are random and typically fairly far apart, and the array
> X remains constant, how about grouping Xs, eg
>   X_Grouped := (others=>0);
>   for i in X'range loop
>     X_Grouped(i/Group_Size) := X_Grouped(i/Group_Size) + X(i);
>   end loop;
> 
>   then instead of summing N = (E-S+1) X's, you can sum X_Grouped from
> S/Group_Size .. E/Group_Size, which takes 1/Group_Size as many array
> access and adds, and then handle the partial groups at the ends, which
> will take 2*(Group_Size/2) operations on average.  So instead of
> N sums on average, you will have N/Group_Size + Group_Size.  (Handling
> the partial groups at the ends more carefully also cuts that second
> term in half.)

This example was supposed to be a simple way to show the compiler does not optimise the multiply out of each array access, and it does not represent the actual problem.

The real problem is that each element needs to be addressed spatially (say as a 2D grid) so that all 4 direct, and 4 diagonal neighbours can be quickly found (using indexed-addressing), another is that we need to be able to tell if any two elements are in the same group (using disjoint-set union-find), and another is that we need to be able to iterate through all the members of a group given any element in that group.

If there is a faster data structure that meets all those requirements, then I am listening.
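
For concreteness, here is a sketch of the combined structure (hypothetical
names): a 2D array whose elements also carry the group links.

   type Row_Index is range 1 .. 19;
   type Col_Index is range 1 .. 19;

   type Cell;
   type Cell_Access is access all Cell;
   type Cell is record
      Next      : Cell_Access;  -- circular list through the cell's group
      Canonical : Cell_Access;  -- eager-union representative of the group
      Z         : Integer;      -- payload
   end record;

   Board : array (Row_Index, Col_Index) of aliased Cell;

   --  The neighbours of Board (I, J) are the up-to-8 cells
   --  Board (I + DI, J + DJ) for DI, DJ in -1 .. 1, excluding (0, 0)
   --  and positions off the edge of the board.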

    
Cheers,
Keean.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-17  6:31                 ` Dmitry A. Kazakov
@ 2012-07-17  6:50                   ` Georg Bauhaus
  2012-07-17  8:42                     ` Dmitry A. Kazakov
  2012-07-17  7:24                   ` Keean Schupke
  1 sibling, 1 reply; 88+ messages in thread
From: Georg Bauhaus @ 2012-07-17  6:50 UTC (permalink / raw)


On 17.07.12 08:31, Dmitry A. Kazakov wrote:
>> > • Network connectivity.
> 1. Open amazon site
> 2. Search "netwok adapter"
> 3. Choose 5 different with minimal price
> 4. Among them select one with maximal positive customer reports
> 5. Buy it
>
> (Network connectivity is not an algorithmic problem)
>

Uh, I think it's connected graphs (V, E) here, not computers
and ethernet cables. Or social relations. Just the abstraction.

A search engine result for "network connectivity algorithm":

Abstract:
"This paper surveys the recent progress on the graph algorithms
for solving network connectivity problems such as the extreme
set problem, the cactus representation problem, the edge-connectivity
augmentation problem and the source location problem. In particular,
we show that efficient algorithms for these problems can be designed
based on maximum adjacency orderings."

Keywords: Graph theory, network flow, algorithm, MA ordering,
minimum cut, edge-connectivity, graph augmentation, source
location problem

http://www.orsj.or.jp/~archive/pdf/e_mag/Vol.47_4_199.pdf



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-17  6:31                 ` Dmitry A. Kazakov
  2012-07-17  6:50                   ` Georg Bauhaus
@ 2012-07-17  7:24                   ` Keean Schupke
  1 sibling, 0 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-17  7:24 UTC (permalink / raw)
  Cc: mailbox

On Tuesday, 17 July 2012 07:31:11 UTC+1, Dmitry A. Kazakov  wrote:
> On Mon, 16 Jul 2012 12:12:48 -0700 (PDT), Keean Schupke wrote:
> 
> > On Monday, 16 July 2012 19:48:22 UTC+1, Dmitry A. Kazakov  wrote:
> >> On Mon, 16 Jul 2012 09:39:45 -0700 (PDT), Keean Schupke wrote:
> >> 
> >> > On Monday, 16 July 2012 15:57:04 UTC+1, Dmitry A. Kazakov  wrote:
> >> 
> >> >> Which itself looks like a problem. It takes no less than 25% of overall
> >> >> performance for mere shuffling data in a task unrelated to stuff like
> >> >> recoding etc. This looks wrong to me. I would look for a better
> >> >> algorithm/structure.
> >> > 
> >> > I find this comment puzzling and does not match my experience.
> >> 
> >> It is just an observation that more than 25% of CPU load is spent to heat
> >> the atmosphere rather than doing something useful.
> >> 
> >> > What algorithm for disjoint-set do you suggest I use?
> >> 
> >> I would try to avoid using such things if the problem is large.
> > 
> > Perhaps you would like to outline your methodology for algorithm
> > development?
> 
> I don't develop algorithms, I apply them.
> 
> 25% overhead on indexing is too much.
> 
> > How would you go about designing an efficient algorithm to
> > solve these kind of problems:
> > 
> > • Network connectivity.
> 
> 1. Open amazon site
> 2. Search "network adapter" 
> 3. Choose 5 different with minimal price
> 4. Among them select one with maximal positive customer reports
> 5. Buy it
> 
> (Network connectivity is not an algorithmic problem)
> 
> >> No idea what you are trying to do. But indexing *must* be faster than
> >> indirection. You should make your data structures straight.
> > 
> > Straight data structures require copying data. If we were to use an array
> > instead of a linked list then every state change would require copying
> > stuff.
> 
> In Ada you rename array elements. This lets the compiler choose between
> copying and handling them in place. BTW, copying might turn out to be more
> effective depending on the architecture.
> 
> >> >> If you have performance problems with arrays, which are built-in and
> >> >> subject of tailored optimization, you cannot expect situation improved by
> >> >> puzzling the compiler with more abstract types. That could only disable
> >> >> optimization.
> >> > 
> >> > Maybe you are thinking of something more complex than I am. An ADT is just
> >> > a private type with associated functions in a package.
> >> 
> >> It is difficult for the compiler to optimize wrappers of array operations
> >> like indexing. You said you failed to make GCC to optimize raw array
> >> access. Why wrapping that into functions should do any better?
> > 
> > It doesn't do any worse... the functions are mostly single line and get
> > inlined before the optimisation pass.
> 
> Did you verify that?
> 
> > On the whole many positives, and no negatives (because it's just as fast in my tests).
> 
> But you said you failed to optimize multiplication away.


I should probably expose the array from the package directly and operate on it, instead of using an ADT interface, and test the performance. It's worth a try anyway. However, to me this seems worse than the ADT with address arithmetic. 

At least with an ADT with internal address arithmetic, you can test and validate the component in isolation through its interface, and package it as a library for the 'user' code to safely use.

Exposing implementation details of a software component (like directly exposing an array type) breaks encapsulation, and prevents separate testing and validation.

When developing large pieces of software, I find good structure and abstraction vital for managing and understanding the code. I can live with address arithmetic as long as it is contained within a component with a good interface.

It is important when optimising complex problems to have good structure, so that for example you can benchmark the whole program with different disjoint-set algorithms (eager-union vs WQUPC for example).

It should be possible to replace an Array based implementation with a hash table.

Each algorithm is based on a set of fundamental concepts; an iterator is perhaps the most commonly used example. You have forward iterators, bidirectional iterators, indexed iterators, random iterators. If your algorithm only requires a forward iterator, you only use the interface for a forward iterator; you don't require an indexed iterator. You can then supply a data structure that meets or exceeds those requirements. An algorithm that uses an indexed iterator could use a linked list or an array as the data structure, for example. So we break software into containers that provide the interfaces, and algorithms that use them.

I would rather use address arithmetic to achieve good performance and maintain the good code structure, than break the structure to avoid address arithmetic and have good performance. There is a third option, avoid address arithmetic, keep the good structure, but not have good performance, but this is not an option for this application.


Cheers,
Keean.


> 
> >> > Element's is different from the STL. It includes native 'concepts' that
> >> > define axiomatic constraints on software modules.
> >> 
> >> I have no idea what "axiomatic constraints on software modules" could mean.
> > 
> > See:
> > 
> > http://akrzemi1.wordpress.com/2012/01/11/concept-axioms-what-for/
> 
> Ah, you meant invariants, propositions regarding program semantics
> maintained true during code transformations, e.g. for optimization.
> 
> In more complex cases the user could help the compiler by giving advice. But
> for low-level optimizations like those of indexing, this stuff is buried in
> the compiler guts.
> 
> However, yes, Ada should have interfaces of integer types, interfaces of
> index types, interfaces of array types etc, annotated by
> pre-/post-conditions and invariants. Unfortunately, presently interfaces
> are tied with by-reference types. It is unclear when Ada will provide
> interfaces for all types. Otherwise, regarding program correctness stuff
> there is SPARK and some in Ada 2012.
> 
> -- 
> Regards,
> Dmitry A. Kazakov
> http://www.dmitry-kazakov.de


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-17  6:50                   ` Georg Bauhaus
@ 2012-07-17  8:42                     ` Dmitry A. Kazakov
  0 siblings, 0 replies; 88+ messages in thread
From: Dmitry A. Kazakov @ 2012-07-17  8:42 UTC (permalink / raw)


On Tue, 17 Jul 2012 08:50:00 +0200, Georg Bauhaus wrote:

> On 17.07.12 08:31, Dmitry A. Kazakov wrote:
>>> > • Network connectivity.
>> 1. Open amazon site
>> 2. Search "network adapter"
>> 3. Choose 5 different with minimal price
>> 4. Among them select one with maximal positive customer reports
>> 5. Buy it
>>
>> (Network connectivity is not an algorithmic problem)
> 
> Uh, I think it's connected graphs (V, E) here, not computers
> and ethernet cables.

For that matter the topology of ethernet cables can be represented by a
graph. As well as wires in a CAT cable or transistors of an Ethernet
controller etc.

> Or social relations. Just the abstraction.

The abstraction here is graph.
 
> A search engine result for "network connectivity algorithm":
> 
> Abstract:
> "This paper surveys the recent progress on the graph algorithms
> for solving network connectivity problems such as the extreme
> set problem, the cactus representation problem, the edge-connectivity
> augmentation problem and the source location problem. In particular,
> we show that efficient algorithms for these problems can be designed
> based on maximum adjacency orderings."

Yep, *graph* algorithms. To solve graph problems which may appear in an
application area of networking, e.g. packet routing etc.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-17  6:40               ` Keean Schupke
@ 2012-07-17  9:01                 ` Dmitry A. Kazakov
  2012-07-18  8:10                   ` Keean Schupke
  0 siblings, 1 reply; 88+ messages in thread
From: Dmitry A. Kazakov @ 2012-07-17  9:01 UTC (permalink / raw)


On Mon, 16 Jul 2012 23:40:21 -0700 (PDT), Keean Schupke wrote:

> The real problem is that each element needs to be addressed spatially (say
> as a 2D grid) so that all 4 direct, and 4 diagonal neighbours can be
> quickly found (using indexed-addressing), another is that we need to be
> able to tell if any two elements are in the same group (using disjoint-set
> union-find), and another is that we need to be able to iterate through all
> the members of a group given any element in that group.
> 
> If there is a faster data structure that meets all those requirements, then I am listening.

Depends on details.

k-d trees might be worth looking at.

Less known are "pyramid" structures (used in image processing, filtering,
compression).

For image segmenting, e.g. by area growing (8th-neighbourhood), one
sometimes uses recursive element permutations like (1,1) (1,2) (2,1) (2,2)
(1,3) (1,4) (2,3) (2,4). That allows one to perform segmenting in strictly
sequential order, visiting each pixel once. I remember it was popular in
PDP-11 days when integer multiplication was indeed expensive.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-15 18:40 Efficient Sequential Access to Arrays Keean Schupke
  2012-07-15 19:48 ` Dmitry A. Kazakov
  2012-07-15 19:52 ` Shark8
@ 2012-07-17 18:16 ` Florian Weimer
  2012-07-22 10:01 ` robin.vowels
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 88+ messages in thread
From: Florian Weimer @ 2012-07-17 18:16 UTC (permalink / raw)


* Keean Schupke:

> So assuming we need this level of performance, what would be the
> best (and most idiomatic Ada) way to package this type of usage
> pattern as an abstract datatype?

You should look at compiler dumps to figure out why the strength
reduction optimization does not kick in.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-17  9:01                 ` Dmitry A. Kazakov
@ 2012-07-18  8:10                   ` Keean Schupke
  2012-07-18 18:11                     ` tmoran
  0 siblings, 1 reply; 88+ messages in thread
From: Keean Schupke @ 2012-07-18  8:10 UTC (permalink / raw)
  Cc: mailbox

On Tuesday, 17 July 2012 10:01:59 UTC+1, Dmitry A. Kazakov  wrote:
> On Mon, 16 Jul 2012 23:40:21 -0700 (PDT), Keean Schupke wrote:
> 
> > The real problem is that each element needs to be addressed spatially (say
> > as a 2D grid) so that all 4 direct, and 4 diagonal neighbours can be
> > quickly found (using indexed-addressing), another is that we need to be
> > able to tell if any two elements are in the same group (using disjoint-set
> > union-find), and another is that we need to be able to iterate through all
> > the members of a group given any element in that group.
> > 
> > If there is a faster data structure that meets all those requirements, then I am listening.
> 
> Depends on details.
> 
> k-d trees might be worth looking.
> 
> Less known are "pyramid" structures (used in image processing, filtering,
> compression).
> 
> For image segmenting, e.g. by area growing (8th-neighbourhood), one
> sometimes uses recursive element permutations like (1,1) (1,2) (2,1) (2,2)
> (1,3) (1,4) (2,3) (2,4). That allows one to perform segmenting in strictly
> sequential order, visiting each pixel once. I remember it was popular in
> PDP-11 days when integer multiplication was indeed expensive.
> 
> -- 
> Regards,
> Dmitry A. Kazakov
> http://www.dmitry-kazakov.de


A tree structure is going to be slower than an array for the spatial lookup part. Indexing is faster than links (I think you said so yourself). So why suggest a linked structure to replace the array?

One reason we use links to represent the groups is that by using a circular list we can join two groups together very quickly using an exchange. If X is any element from the first circular list, and Y any element from the second circular list, we can join the two into a single circular list by doing:

T := X.Next; X.Next := Y.Next; Y.Next := T; -- or Swap(X.Next, Y.Next)

And yes, joining groups is the most frequent operation on the groups, so copying one group into the other using arrays is slower.

So 2D spatial indexing (array structure) is optimal for spatial lookup. Circular lists between elements are optimal for joining groups. The link to the canonical element from the eager-union algorithm is optimal for a lookup-dominated disjoint set.

So you can see the data structure is already optimal for the operations frequently required. It is an array of elements that contain a next element pointer to implement a circular linked list and a canonical element pointer to identify the sub-set.

So we have the algorithm, and the data structure, which is optimal from an analytical perspective. Some observations:

- The data structure and algorithm is optimal from a machine operation perspective (if we consider the relative speeds of the machine code operations)

- If the language cannot encode this structure, it cannot get the fastest performance as the language cannot be faster than the machine code operations.

- I may have misdirected people by looking at the loop behaviour of index optimisation. In my tests it is just as fast as pointer arithmetic. The real problem is that we need to combine the circular linked list into the array and allow addressing of neighbours from the resulting pointer dereference.

All this is fine, the problem in Ada is this:

- When we lookup an element spatially by index, and then iterate through the subset members (following the next links) we lose the ability to index spatially from the next element.

- If we instead store the index of the next element instead of the link, it is slower because we have to use a relative addressing mode instead of an absolute one.

- The compiler cannot use any optimisation of the array indexing because the index comes from main memory. It is not stepping through in sequence so it cannot use strength reduction. There is no point in reading compiler documentation; this is beyond any current optimisation technology, and no amount of shuffling the code about will change it.

- Doing a division to convert the address of the next node back into an index to address the neighbours does not make any sense as division is slower than multiplication.

- The conclusion is that the optimal way to do this is to access the neighbours by using the 'access' stored in the next pointer, and to calculate the memory location of the neighbours by address arithmetic.

- We need to package this into an ADT to hide all the address stuff so it can be unit tested and validated in isolation from the rest of the application.
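
A sketch of those last two points, hidden inside a hypothetical package
body (names invented; the spec declaring Node is assumed, and board-edge
checks are omitted):

   with System;                   use System;
   with System.Storage_Elements;  use System.Storage_Elements;
   with System.Address_To_Access_Conversions;

   package body Grid_ADT is

      Cols : constant := 19;  -- hypothetical row length of the 2D grid

      package Conv is new System.Address_To_Access_Conversions (Node);

      Node_Bytes : constant Storage_Offset :=
        Node'Size / Storage_Unit;  -- element size, assumed static

      function Neighbour
        (N : not null access Node; DI, DJ : Integer) return access Node
      is
         Where : constant Address :=
           Conv.To_Address (Conv.Object_Pointer (N))
             + Storage_Offset (DI * Cols + DJ) * Node_Bytes;
      begin
         --  With DI and DJ constant at the call site and the function
         --  inlined, this folds to one constant offset from N's address.
         return Conv.To_Pointer (Where);
      end Neighbour;

   end Grid_ADT;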


Does that make sense?


Cheers,
Keean.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-18  8:10                   ` Keean Schupke
@ 2012-07-18 18:11                     ` tmoran
  2012-07-19 10:41                       ` Keean Schupke
  0 siblings, 1 reply; 88+ messages in thread
From: tmoran @ 2012-07-18 18:11 UTC (permalink / raw)


> - The data structure and algorithm is optimal from a machine operation
> perspective (if we consider the relative speeds of the machine code operations)
   Taking into account all the fancy pipelining and multiple arithmetic units
of the machine you are targeting?  How fast does this run in hand optimized
asm code?

>  If we instead store the index of the next element instead of the link, it
>  is slower because we have to use a relative addressing mode instead of an
> absolute one.
  How about storing not only links to other nodes, but have each node also
store a copy of its own index.

> to address the neighbours does not make any sense as division is slower
> than multiplication.
  That depends.  If the divisor is a power of two you can shift.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-18 18:11                     ` tmoran
@ 2012-07-19 10:41                       ` Keean Schupke
  2012-07-19 11:18                         ` Georg Bauhaus
  0 siblings, 1 reply; 88+ messages in thread
From: Keean Schupke @ 2012-07-19 10:41 UTC (permalink / raw)


On Wednesday, 18 July 2012 19:11:14 UTC+1, (unknown)  wrote:
> > - The data structure and algorithm is optimal from a machine operation
> > perspective (if we consider the relative speeds of the machine code operations)
>    Taking into account all the fancy pipelining and multiple arithmetic units
> of the machine you are targeting?  How fast does this run in hand optimized
> asm code?

In terms of algorithm analysis, yes. An O(1) algorithm is going to be faster than an O(N) algorithm no matter what fancy pipelining is involved. Array index lookup is O(1), tree lookup is O(log(N)). For the groups, the complexity of joining groups is O(1) for a linked list and O(N) for an array. For the eager disjoint set, the complexity of finding whether two nodes are in the same set is O(1).

So what we have is a data structure that allows all the dominant operations to be executed in O(1) time.

It is not going to be faster to move to an O(log(N)) tree structure for the spatial lookup no matter what pipelining the CPU has.

> 
> >  If we instead store the index of the next element instead of the link, it
> >  is slower because we have to use a relative addressing mode instead of an
> > absolute one.
>   How about storing not only links to other nodes, but have each node also
> store a copy of its own index.

This is a good idea. It will be slower because we have to read an extra value from memory, but we might be able to live with that.

What is more of a problem is that the node structure is 32 bytes long, so two nodes fit in a 64-byte cache line, and the index operation is shift left by 5. Adding the extra Index word will probably slow things down significantly, unless we can save some space elsewhere in the structure.

I may be able to replace one Integer with a Short_Int, and use a Short_Int for the index. I think this could be worth trying.
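
A sketch of how that experiment could be pinned down with representation
clauses (layout values assumed, not measured):

   type Node is record
      Next      : Short_Integer;  -- index of the next group member
      Canonical : Short_Integer;  -- index of the canonical element
      Z         : Integer;        -- payload
   end record;
   for Node'Size use 16 * 8;      -- pad to 16 bytes: four nodes per
                                  -- 64-byte line, and indexing becomes
                                  -- a shift by 4 instead of a multiply

   type Node_Array is array (1 .. 361) of Node;
   for Node_Array'Component_Size use 16 * 8;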

> 
> > to address the neighbours does not make any sense as division is slower
> > than multiplication.
>   That depends.  If the divisor is a power of two you can shift.

No, it is very unlikely to be a power of two. Besides which, turning an access into an address and then turning this into an index seems worse than using offset addressing from the access address to find the neighbours.


Cheers,
Keean.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-19 10:41                       ` Keean Schupke
@ 2012-07-19 11:18                         ` Georg Bauhaus
  2012-07-19 19:58                           ` Keean Schupke
  0 siblings, 1 reply; 88+ messages in thread
From: Georg Bauhaus @ 2012-07-19 11:18 UTC (permalink / raw)


On 19.07.12 12:41, Keean Schupke wrote:
> What is more of a problem is that the node structure is 32 bytes long, so two nodes fit in a 64-byte cache line, and the index operation is shift left by 5. Adding the extra Index word will probably slow things down significantly, unless we can save some space elsewhere in the structure.
> 
> I may be able to replace one Integer with a Short_Int, and use a Short_Int for the index. I think this could be worth trying.

I'll assume it is a 32-bit program already? (As GNAT uses 32 bits
for addresses then, not 64.)




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-19 11:18                         ` Georg Bauhaus
@ 2012-07-19 19:58                           ` Keean Schupke
  2012-07-21  8:24                             ` Keean Schupke
  0 siblings, 1 reply; 88+ messages in thread
From: Keean Schupke @ 2012-07-19 19:58 UTC (permalink / raw)


On Thursday, 19 July 2012 12:18:11 UTC+1, Georg Bauhaus  wrote:
> On 19.07.12 12:41, Keean Schupke wrote:
> > What is more of a problem is that the node structure is 32 bytes long, so two nodes fit in a 64-byte cache line, and the index operation is shift left by 5. Adding the extra Index word will probably slow things down significantly, unless we can save some space elsewhere in the structure.
> > 
> > I may be able to replace one Integer with a Short_Int, and use a Short_Int for the index. I think this could be worth trying.
> 
> I'll assume it is a 32-bit program already? (As GNAT uses 32 bits
> for addresses then, not 64.)

It's a 64-bit platform, so the pointers are 64 bits but the integers 32 bits. What I was thinking is that with the indexed version I may be able to shrink the whole data structure from 32 bytes to 16 bytes, and that would fit 4 elements in a cache line instead of two. Maybe this would make up for the slower indexed addressing. I will try and post my results.

Cheers,
Keean.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-19 19:58                           ` Keean Schupke
@ 2012-07-21  8:24                             ` Keean Schupke
  2012-07-21 11:52                               ` Georg Bauhaus
  2012-07-22  5:13                               ` Shark8
  0 siblings, 2 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-21  8:24 UTC (permalink / raw)


I have re-implemented the code to use array indexing, but with the improvements I made in the profile-guided thread. The indexed code does not use 'access' variables at all, but uses 'renames' and 'in out' parameters on functions (from Ada 2012), except for the one function discussed below.

Current performance is:

Relative Addressing code 57,000 simulations per second
Array Indexing code 42,000 simulations per second

So the relative addressing code is 36% faster. I have some ideas to improve the array indexed version; the element size is no longer a power of two, and that may be having some effect. I can also pack the data structure tighter, as I think I can fit it in 16 bytes now that I only have to store two 16-bit indexes instead of two 64-bit addresses. 

In rewriting this code I came across the following issue, to which I have not yet found a totally satisfactory solution. I have an ADT representing the disjoint set, but the element stored in it is limited and I cannot return it from a function. I have several options:

1) throw away structured programming and make everything happen in one package directly on the array, hurting readability, understandability, maintainability and losing the ability to separately test and validate the code.

2) make the array part of the public interface for the package, breaking encapsulation and the ability to separately test and validate the code.

3) return an access; however, to avoid an accessibility level violation I have to pass an access to the ADT into this function instead of the ADT itself.

Ignoring performance, which is the best way to structure this? Is there another way I have missed?
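
For reference, option 3 might look like the following minimal sketch
(hypothetical names): the set is passed by access, so returning an access
to an element does not violate accessibility rules.

   package Disjoint_Sets is
      type Element is limited record
         Next, Canonical : Natural := 0;
      end record;
      type Element_Access is access all Element;
      type Set (Size : Positive) is limited private;
      function Element_At (S : access Set; I : Positive)
        return Element_Access;
   private
      type Element_Array is array (Positive range <>) of aliased Element;
      type Set (Size : Positive) is limited record
         Data : Element_Array (1 .. Size);
      end record;
   end Disjoint_Sets;

   package body Disjoint_Sets is
      function Element_At (S : access Set; I : Positive)
        return Element_Access is
      begin
         return S.Data (I)'Access;  -- the set came in by access, so the
      end Element_At;               -- element access is at the same level
   end Disjoint_Sets;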


Cheers,
Keean.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-21  8:24                             ` Keean Schupke
@ 2012-07-21 11:52                               ` Georg Bauhaus
  2012-07-21 16:14                                 ` Keean Schupke
  2012-07-22  5:13                               ` Shark8
  1 sibling, 1 reply; 88+ messages in thread
From: Georg Bauhaus @ 2012-07-21 11:52 UTC (permalink / raw)


On 21.07.12 10:24, Keean Schupke wrote:
> Ignoring performance, which is the best way to structure this? Is there another way I have missed?

If returning had meant a copy anyway, a copying procedure
seems another option.

Also, are there cases where some callback style interface would do?
That is, pass an "algorithm" to the containing structure for
computation much like the Process parameter of Query_Element,
or like when GUI classes expect programmers to write a concrete
implementation of some abstract type that specifies the protocol.
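
A sketch of that callback shape, assuming a Set type wrapping an array of
elements as in the poster's ADT (names hypothetical, inside the ADT's
package body):

   procedure Query_Element
     (S       : in Set;
      I       : in Positive;
      Process : not null access procedure (E : in Element)) is
   begin
      Process (S.Data (I));  -- inspected in place, never copied out
   end Query_Element;

   --  A caller passes any conforming procedure:
   --     Query_Element (My_Set, 5, Print_Element'Access);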






^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-21 11:52                               ` Georg Bauhaus
@ 2012-07-21 16:14                                 ` Keean Schupke
  2012-07-21 17:09                                   ` Keean Schupke
       [not found]                                   ` <i78m081tccmp078spmsei1l5vnj3k0kbkj@invalid.netcom.com>
  0 siblings, 2 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-21 16:14 UTC (permalink / raw)


On Saturday, 21 July 2012 12:52:48 UTC+1, Georg Bauhaus  wrote:
> On 21.07.12 10:24, Keean Schupke wrote:
> > Ignoring performance, which is the best way to structure this? Is there another way I have missed?
> 
> If returning had meant a copy anyway, a copying procedure
> seems another option.

Okay, but not suitable for large values if we care about performance.

> 
> Also, are there cases where some callback style interface would do?
> That is, pass an "algorithm" to the containing structure for
> computation much like the Process parameter of Query_Element,
> or like when GUI classes expect programmers to write a concrete
> implementation of some abstract type that specifies the protocol.

Here are my results:

24 byte nodes (all integers) : 42k sims per second
16 byte nodes (all short_ints) : 40k sims per second
32 byte nodes (all integers + padding) : 42k sims per second.

So 32-bit integers seem faster than 16-bit integers, and as the whole data set fits in the cache (it's less than 10 kbytes) the size of the nodes does not really affect performance.

I'll try something like Query_Element next and see how that performs.


Cheers,
Keean.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-21 16:14                                 ` Keean Schupke
@ 2012-07-21 17:09                                   ` Keean Schupke
       [not found]                                   ` <i78m081tccmp078spmsei1l5vnj3k0kbkj@invalid.netcom.com>
  1 sibling, 0 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-21 17:09 UTC (permalink / raw)


On Saturday, 21 July 2012 17:14:11 UTC+1, Keean Schupke  wrote:
> On Saturday, 21 July 2012 12:52:48 UTC+1, Georg Bauhaus  wrote:
> > On 21.07.12 10:24, Keean Schupke wrote:
> > > Ignoring performance, which is the best way to structure this? Is there another way I have missed?
> > 
> > If returning had meant a copy anyway, a copying procedure
> > seems another option.
> 
> Okay, but not suitable for large values if we care about performance.
> 
> > 
> > Also, are there cases where some callback style interface would do?
> > That is, pass an "algorithm" to the containing structure for
> > computation much like the Process parameter of Query_Element,
> > or like when GUI classes expect programmers to write a concrete
> > implementation of some abstract type that specifies the protocol.
> 
> Here are my results:
> 
> 24 byte nodes (all integers) : 42k sims per second
> 16 byte nodes (all short_ints) : 40k sims per second
> 32 byte nodes (all integers + padding) : 42k sims per second.
> 
> So 32-bit integers seem faster than 16-bit integers, and as the whole data set fits in the cache (it's less than 10 kbytes) the size of the nodes does not really affect performance.
> 
> I'll try something like Query_Element next and see how that performs.
> 
> 
> Cheers,
> Keean.

With Query_Element, why is passing an anonymous access to a function considered better than returning an access type? I guess the best way to do this is with generics to avoid the function access?

Cheers,
Keean.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-21  8:24                             ` Keean Schupke
  2012-07-21 11:52                               ` Georg Bauhaus
@ 2012-07-22  5:13                               ` Shark8
  1 sibling, 0 replies; 88+ messages in thread
From: Shark8 @ 2012-07-22  5:13 UTC (permalink / raw)


On Saturday, July 21, 2012 2:24:19 AM UTC-6, Keean Schupke wrote:
> I have an ADT representing the disjoint set, but the element stored in it is limited and I cannot return it from a function.

So, what's stopping you from having something like this:
Procedure Adjustment( Object : In Out Disjoint_Set; State: In Boolean_Array_of_elements );
set as a Primitive operation?

Actually, can't primitive operations (as functions) return the limited type, you know for initialization and such? The rationale says so. ( http://www.adaic.org/resources/add_content/standards/05rat/html/Rat-4-5.html )

Ada 2005 introduces a new approach to returning the results from functions which can be used to solve this and other problems.
We will first consider the case of a type that is limited such as 
type T is limited
   record
      A: Integer;
      B: Boolean;
      C: Float;
   end record;
We can declare a function that returns a value of type T provided that the return does not involve any copying. For example we could have 
function Init(X: Integer; Y: Boolean; Z: Float) return T is
begin
   return (X, Y, Z);
end Init;
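
Applied to the disjoint-set case, an extended return (also Ada 2005)
builds the limited result in place; a sketch with hypothetical names:

   function Create (Size : Positive) return Disjoint_Set is
   begin
      return S : Disjoint_Set (Size) do
         Reset (S);  -- hypothetical initialization of the limited object
      end return;
   end Create;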



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-15 18:40 Efficient Sequential Access to Arrays Keean Schupke
                   ` (2 preceding siblings ...)
  2012-07-17 18:16 ` Florian Weimer
@ 2012-07-22 10:01 ` robin.vowels
  2012-07-30  6:31   ` Jacob Sparre Andersen
  2012-07-22 10:09 ` robin.vowels
  2012-07-22 16:21 ` Florian Weimer
  5 siblings, 1 reply; 88+ messages in thread
From: robin.vowels @ 2012-07-22 10:01 UTC (permalink / raw)


On Monday, 16 July 2012 04:40:08 UTC+10, Keean Schupke  wrote:
> The Monte-Carlo simulator I am working on is now performing better in Ada than in C++ using profile-guided compilation (57k simulations per second for Ada 'vs' 56k simulations per second for C++).

One part in 57 is not a significant difference.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-15 18:40 Efficient Sequential Access to Arrays Keean Schupke
                   ` (3 preceding siblings ...)
  2012-07-22 10:01 ` robin.vowels
@ 2012-07-22 10:09 ` robin.vowels
  2012-07-22 16:02   ` Keean Schupke
  2012-07-22 16:21 ` Florian Weimer
  5 siblings, 1 reply; 88+ messages in thread
From: robin.vowels @ 2012-07-22 10:09 UTC (permalink / raw)


On Monday, 16 July 2012 04:40:08 UTC+10, Keean Schupke  wrote:

> However in order to achieve this performance I needed to rework the arrays as the use of Indexes was too slow. An example of this is lets say we are accessing element 5 from array A "A(5)" this requires a multiplication to access (base_address + index * record_size). To access the neighbour A(6) also requires a multiplication. Accessing the array sequentially requires a multiplication per step.

Have you tried turning on high optimisation?

If you are accessing elements sequentially,
the optimiser will probably not do multiplications,
but instead access successive elements by means of additions.

Other factors may be relevant:
1. For example, if the size of a
   record is small-ish, the compiler may use shifts and/or
   adds/subtracts instead of multiplications.

2. The hardware multiplier may be one of the fast kind,
   in which case the time is negligible.




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
       [not found]                                   ` <i78m081tccmp078spmsei1l5vnj3k0kbkj@invalid.netcom.com>
@ 2012-07-22 15:50                                     ` Keean Schupke
  0 siblings, 0 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-22 15:50 UTC (permalink / raw)


On Saturday, 21 July 2012 22:38:49 UTC+1, Dennis Lee Bieber  wrote:
> On Sat, 21 Jul 2012 09:14:11 -0700 (PDT), Keean Schupke declaimed the following in comp.lang.ada:
> 
> 
> > Here are my results:
> > 
> > 24 byte nodes (all integers) : 42k sims per second
> > 16 byte nodes (all short_ints) : 40k sims per second
> > 32 byte nodes (all integers + padding) : 42k sims per second.
> > 
> > So 32-bit integers seem faster than 16-bit integers, and as the whole data set fits in the cache (it's less than 10 kbytes) the size of the nodes does not really affect performance.
> >
> 	Presuming the computer has a 32-bit bus, those results are viable...
> The 16-bit data may require loading 32-bit words followed by shifting
> and/or masking the undesired part.
> -- 
> 	Wulfraed                 Dennis Lee Bieber         AF6VN

The CPU is 64 bits wide, the cache lines are 64 bytes long. The CPU is connected to the cache by a bus much wider than 64 bits (512 bits I think), and the cache is several megabytes, so the working set easily fits in the cache. The main memory controller has multiple channels operating at 32-bit width but clocked independently of the processor, so it is better to think in terms of memory bandwidth and the page-miss cost of the DRAM.


Cheers,
Keean.




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-22 10:09 ` robin.vowels
@ 2012-07-22 16:02   ` Keean Schupke
  0 siblings, 0 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-22 16:02 UTC (permalink / raw)


On Sunday, 22 July 2012 11:09:44 UTC+1, (unknown)  wrote:
> On Monday, 16 July 2012 04:40:08 UTC+10, Keean Schupke  wrote:
> 
> > However in order to achieve this performance I needed to rework the arrays as the use of Indexes was too slow. An example of this is lets say we are accessing element 5 from array A "A(5)" this requires a multiplication to access (base_address + index * record_size). To access the neighbour A(6) also requires a multiplication. Accessing the array sequentially requires a multiplication per step.
> 
> Have you tried turning on high optimisation?


Yes this is compiled with

test: test.adb
	rm -f *.gcda
	gnatmake -march=native -O3 -fprofile-generate -ftest-coverage -gnatp -gnatn test -bargs -shared -cargs -fdata-sections -ffunction-sections -largs -Wl,--gc-sections
	./test
	touch test.adb
	gnatmake -march=native -O3 -fprofile-use -gnatp -gnatn test -bargs -shared -cargs -fdata-sections -ffunction-sections -largs -Wl,--gc-sections


> 
> If you are accessing elements sequentially,
> the optimiser will probably not do multiplications,
> but instead access successive elements by means of additions.


It is difficult to say if lack of this optimisation is the cause of the slowness.

What does seem to be the case is that storing the index of the next element of a linked list is slower than storing the address.


> 
> Other factors may be relevant:
> 1. For example, if the size of a
>    record is small-ish, the compiler may use shifts and/or
>    adds/subtracts instead of multiplications.
> 
> 2. The hardware multiplier may be one of the fast kind,
>    in which case the time is negligible.


The target (achieved by the relative addressing code) is 57,000 simulations per second. Currently the fastest version using array indexing is 42,000 simulations per second. 36% is too big a difference to ignore, and this is a real-world problem, not a benchmark. So I am looking for ideas to improve the array indexing version. 

I am not sure what is the cause of the speed difference, it may not be the actual array index lookup time, but a consequence of using array lookup where the other version uses direct addressing (pointer dereference with no offset).


Cheers,
Keean.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-15 18:40 Efficient Sequential Access to Arrays Keean Schupke
                   ` (4 preceding siblings ...)
  2012-07-22 10:09 ` robin.vowels
@ 2012-07-22 16:21 ` Florian Weimer
  2012-07-22 16:46   ` Keean Schupke
  2012-07-22 17:34   ` Niklas Holsti
  5 siblings, 2 replies; 88+ messages in thread
From: Florian Weimer @ 2012-07-22 16:21 UTC (permalink / raw)


* Keean Schupke:

> So assuming we need this level of performance, what would be the
> best (and most idiomatic Ada) way to package this type of usage
> pattern as an abstract datatype?

You should look at compiler dumps to figure out why the strength
reduction optimization does not kick in.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-22 16:21 ` Florian Weimer
@ 2012-07-22 16:46   ` Keean Schupke
  2012-07-22 18:41     ` Florian Weimer
  2012-07-22 17:34   ` Niklas Holsti
  1 sibling, 1 reply; 88+ messages in thread
From: Keean Schupke @ 2012-07-22 16:46 UTC (permalink / raw)


On Sunday, 22 July 2012 17:21:57 UTC+1, Florian Weimer  wrote:
> * Keean Schupke:
> 
> > So assuming we need this level of performance, what would be the
> > best (and most idiomatic Ada) way to package this type of usage
> > pattern as an abstract datatype?
> 
> You should look at compiler dumps to figure out why the strength
> reduction optimization does not kick in.

What is the best way to do that? The program uses extensive inlining and optimisation, so individual functions are hard to find in the output. By the time it's compiled it's about a single 32k function.


Cheers,
Keean.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-22 16:21 ` Florian Weimer
  2012-07-22 16:46   ` Keean Schupke
@ 2012-07-22 17:34   ` Niklas Holsti
  2012-07-22 18:21     ` Keean Schupke
  2012-07-24 16:00     ` robin.vowels
  1 sibling, 2 replies; 88+ messages in thread
From: Niklas Holsti @ 2012-07-22 17:34 UTC (permalink / raw)


On 12-07-22 19:21 , Florian Weimer wrote:
> * Keean Schupke:
>
>> So assuming we need this level of performance, what would be the
>> best (and most idiomatic Ada) way to package this type of usage
>> pattern as an abstract datatype?
>
> You should look at compiler dumps to figure out why the strength
> reduction optimization does not kick in.
>

As I understand it, the program is accessing A(I,J), where A is a 2D 
array, and also accessing A(I+1,J), A(I,J+1), and the other neighbours 
of A(I,J), eight in all.

I,J are indices that are *not* traversed in sequential order, but in 
some scattered order.

Computing the address of A(I,J) involves multiplications of J and I with 
the size of an array element and the size of an array row, respectively. 
The problem seems to be that the results of these multiplications are 
not reused in computing the address of A(I+1,J) etc.

If the index constraints (or at least the row length) of A are static 
(compile-time) constants, and the address of A(I,J) has been computed, 
and c1 and c2 are some other compile-time constants (-1, 0, +1 in this 
program), then the address of A(I+c1,J+c2) can be computed by adding a 
static offset to the address of A(I,J). This is what happens when A(I,J) 
and the neighbours are accessed through a C pointer, or when Keean uses 
address arithmetic explicitly in the Ada program. The question is why 
the Ada compiler is not doing this optimization.

As I understand it, "strength reduction" is a loop optimization where an 
expression of the form C*i, with "i" the loop index, is replaced by a 
new variable that is incremented by C on each loop iteration. But in 
Keean's program the indices I,J are not loop indices, nor linearly 
related to loop indices.

The optimization that would be required here is to replace (X+C)*F by 
X*F+C*F, when C and F are static constants and X*F has already been 
computed and is available. This would replace the multiplication by an 
addition of a static constant. I've never seen this special kind of 
optimization proposed or described.
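
To make that concrete with illustrative sizes (mine, not from the program): suppose an element is 16 bytes and a row holds 1024 elements, so a row is 16384 bytes. Then

   addr (A (I, J))     = base + I*16384 + J*16
   addr (A (I+1, J-1)) = addr (A (I, J)) + 16384 - 16

Once addr (A (I, J)) is available, each of the eight neighbours is a single addition of a compile-time constant; the two multiplications need not be redone.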

(We are of course (I believe) assuming that index range checks are 
turned off.)

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-22 17:34   ` Niklas Holsti
@ 2012-07-22 18:21     ` Keean Schupke
  2012-07-30 14:18       ` Jacob Sparre Andersen
  2012-07-24 16:00     ` robin.vowels
  1 sibling, 1 reply; 88+ messages in thread
From: Keean Schupke @ 2012-07-22 18:21 UTC (permalink / raw)


On Sunday, 22 July 2012 18:34:26 UTC+1, Niklas Holsti  wrote:
> On 12-07-22 19:21 , Florian Weimer wrote:
> > * Keean Schupke:
> >
> >> So assuming we need this level of performance, what would be the
> >> best (and most idiomatic Ada) way to package this type of usage
> >> pattern as an abstract datatype?
> >
> > You should look at compiler dumps to figure out why the strength
> > reduction optimization does not kick in.
> >
> 
> As I understand it, the program is accessing A(I,J), where A is a 2D 
> array, and also accessing A(I+1,J), A(I,J+1), and the other neighbours 
> of A(I,J), eight in all.
> 
> I,J are indices that are *not* traversed in sequential order, but in 
> some scattered order.
> 
> Computing the address of A(I,J) involves multiplications of J and I with 
> the size of an array element and the size of an array row, respectively. 
> The problem seems to be that the results of these multiplications are 
> not reused in computing the address of A(I+1,J) etc.
> 
> If the index constraints (or at least the row length) of A are static 
> (compile-time) constants, and the address of A(I,J) has been computed, 
> and c1 and c2 are some other compile-time constants (-1, 0, +1 in this 
> program), then the address of A(I+c1,J+c2) can be computed by adding a 
> static offset to the address of A(I,J). This is what happens when A(I,J) 
> and the neighbours are accessed through a C pointer, or when Keean uses 
> address arithmetic explicitly in the Ada program. The question is why 
> the Ada compiler is not doing this optimization.
> 
> As I understand it, "strength reduction" is a loop optimization where an 
> expression of the form C*i, with "i" the loop index, is replaced by a 
> new variable that is incremented by C on each loop iteration. But in 
> Keean's program the indices I,J are not loop indices, nor linearly 
> related to loop indices.
> 
> The optimization that would be required here is to replace (X+C)*F by 
> X*F+C*F, when C and F are static constants and X*F has already been 
> computed and is available. This would replace the multiplication by an 
> addition of a static constant. I've never seen this special kind of 
> optimization proposed or described.
> 
> (We are of course (I believe) assuming that index range checks are 
> turned off.)
> 
> -- 
> Niklas Holsti
> Tidorum Ltd
> niklas holsti tidorum fi
>        .      @       .

That's some of it. The other part is that sometimes the 'scattered order' comes from following a linked list between elements. So A1(I, J) is computed from a PRNG, and we access A1(I, J); we may then access all or none of A1(I, J-1), A1(I-1, J), A1(I+1, J), A1(I, J+1), and all or none of A1(I-1, J-1), A1(I+1, J-1), A1(I-1, J+1), A1(I+1, J+1). We may also access the 'canonical' element of the subset A* (which in the fast code is a link, with the neighbours' addresses calculated from the link address), and all the elements in the subset (A2, A3, .. AN) and each of their horizontal and vertical neighbours.

So with the relative addressing code, stepping through the subset is: dereference the link, add constants [+/- row_size * element_size] and [+/- element_size] to access the neighbours, then dereference the next link, etc. The sizes of the array are generic parameters, so I presume the constants get pre-calculated at elaboration time.
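
In rough Ada terms, one step of the fast version looks something like this (a sketch; Row_Bytes and Elem_Bytes stand for the elaboration-time constants mentioned above):

with System;                  use System;
with System.Storage_Elements; use System.Storage_Elements;

procedure Step (Link : Address; Row_Bytes, Elem_Bytes : Storage_Offset) is
   Here  : constant Address := Link;              -- canonical element
   North : constant Address := Here - Row_Bytes;  -- A1 (I-1, J)
   South : constant Address := Here + Row_Bytes;  -- A1 (I+1, J)
   West  : constant Address := Here - Elem_Bytes; -- A1 (I, J-1)
   East  : constant Address := Here + Elem_Bytes; -- A1 (I, J+1)
begin
   null; -- each neighbour costs one addition of a pre-computed constant
end Step;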

With the indexed version, it is storing the index of the canonical element of the subset, and the index of the next element in the subset instead of the address.

Does that help determine if it is going to be possible to get better performance from the array indexed version?


Cheers,
Keean.




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-22 16:46   ` Keean Schupke
@ 2012-07-22 18:41     ` Florian Weimer
  2012-07-24  8:21       ` Keean Schupke
  0 siblings, 1 reply; 88+ messages in thread
From: Florian Weimer @ 2012-07-22 18:41 UTC (permalink / raw)


* Keean Schupke:

> What is the best way to do that? The program uses extensive inlining
> and optimisation, so individual functions are hard to find in the
> output. By the time it is compiled, it is about a single 32k function.

-gnatG shows you Ada-like code which is passed to the compiler
middle-end, and -fdump-tree-all writes files showing the results of
individual passes.
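
For instance, something like this (the file name is mine; the flags are as documented for GNAT/GCC):

   gcc -c -O2 -gnatG sim.adb > sim.expanded
   gcc -c -O2 -fdump-tree-all sim.adb

The first command captures the expanded Ada-like intermediate source; the second leaves numbered per-pass dump files (sim.adb.*.optimized and friends) next to the object file.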



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-22 18:41     ` Florian Weimer
@ 2012-07-24  8:21       ` Keean Schupke
  0 siblings, 0 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-24  8:21 UTC (permalink / raw)


On Sunday, 22 July 2012 19:41:29 UTC+1, Florian Weimer  wrote:
> * Keean Schupke:
> 
> > What is the best way to do that? The program uses extensive inlining
> > and optimisation, so individual functions are hard to find in the
> > output. By the time it is compiled, it is about a single 32k function.
> 
> -gnatG shows you Ada-like code which is passed to the compiler
> middle-end, and -fdump-tree-all writes files showing the results of
> individual passes.

At the moment my conclusion is that, with the current state of the compiler, you cannot match the performance of the relative addressing version of this algorithm with indexed addressing.

Maybe compiler technology will improve (but that's not useful right now), and maybe there's a better algorithm (but I don't know what it is).

So for the moment it looks like I am going to stick with the relative addressing as the solution that best addresses the project goals.

It seems that for simple cases the belief that the compiler can optimise array indexing to be just as fast as pointer arithmetic is true, but it does not appear to be the case for all algorithms. Certainly I seem to have one of the cases where it is not true. As all the most common operations are supported by the data structures I am using in O(1) time, I don't hold out much hope of finding a faster algorithm more suited to indexed addressing.


Cheers,
Keean.




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-22 17:34   ` Niklas Holsti
  2012-07-22 18:21     ` Keean Schupke
@ 2012-07-24 16:00     ` robin.vowels
  2012-07-25  7:09       ` Niklas Holsti
  1 sibling, 1 reply; 88+ messages in thread
From: robin.vowels @ 2012-07-24 16:00 UTC (permalink / raw)


On Monday, 23 July 2012 03:34:26 UTC+10, Niklas Holsti  wrote:
> On 12-07-22 19:21 , Florian Weimer wrote:
> > * Keean Schupke:
> >
> >> So assuming we need this level of performance, what would be the
> >> best (and most idiomatic Ada) way to package this type of usage
> >> pattern as an abstract datatype?
> >
> > You should look at compiler dumps to figure out why the strength
> > reduction optimization does not kick in.
> >
> 
> As I understand it, the program is accessing A(I,J), where A is a 2D 
> array, and also accessing A(I+1,J), A(I,J+1), and the other neighbours 
> of A(I,J), eight in all.
> 
> I,J are indices that are *not* traversed in sequential order, but in 
> some scattered order.
> 
> Computing the address of A(I,J) involves multiplications of J and I with 
> the size of an array element and the size of an array row, respectively. 
> The problem seems to be that the results of these multiplications are 
> not reused in computing the address of A(I+1,J) etc.
> 
> If the index constraints (or at least the row length) of A are static 
> (compile-time) constants, and the address of A(I,J) has been computed, 
> and c1 and c2 are some other compile-time constants (-1, 0, +1 in this 
> program), then the address of A(I+c1,J+c2) can be computed by adding a 
> static offset to the address of A(I,J). This is what happens when A(I,J) 
> and the neighbours are accessed through a C pointer, or when Keean uses 
> address arithmetic explicitly in the Ada program. The question is why 
> the Ada compiler is not doing this optimization.
> 
> As I understand it, "strength reduction" is a loop optimization where an 
> expression of the form C*i, with "i" the loop index, is replaced by a 
> new variable that is incremented by C on each loop iteration. But in 
> Keean's program the indices I,J are not loop indices, nor linearly 
> related to loop indices.
> 
> The optimization that would be required here is to replace (X+C)*F by 
> X*F+C*F, when C and F are static constants and X*F has already been 
> computed and is available. This would replace the multiplication by an 
> addition of a static constant. I've never seen this special kind of 
> optimization proposed or described.

I think that it's a standard optimisation.

For the PL/I optimising compiler for the CDC Cyber (c. 1980),
multiplication was eliminated for matrix elements
by using a table of offsets into each row.
Thus, double subscripting reduced to simple addition.
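
That scheme, as a rough Ada sketch (sizes and names are mine):

procedure Offset_Table_Demo is
   Rows : constant := 100;
   Cols : constant := 100;
   type Element is new Integer;
   type Flat_Array is array (0 .. Rows * Cols - 1) of Element;
   type Base_Table is array (0 .. Rows - 1) of Natural;

   function Make_Bases return Base_Table is
      T : Base_Table;
   begin
      for I in T'Range loop
         T (I) := I * Cols; -- multiplications done once, up front
      end loop;
      return T;
   end Make_Bases;

   Row_Base : constant Base_Table := Make_Bases;
   A : Flat_Array := (others => 0);
begin
   A (Row_Base (5) + 7) := 42; -- A(5,7): one table lookup and one addition
end Offset_Table_Demo;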

> (We are of course (I believe) assuming that index range checks are 
> turned off.)

Index range checks should be left on.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-24 16:00     ` robin.vowels
@ 2012-07-25  7:09       ` Niklas Holsti
  2012-07-25  9:03         ` Keean Schupke
  2012-07-26 14:47         ` Robin Vowels
  0 siblings, 2 replies; 88+ messages in thread
From: Niklas Holsti @ 2012-07-25  7:09 UTC (permalink / raw)


On 12-07-24 19:00 , robin.vowels@gmail.com wrote:
> On Monday, 23 July 2012 03:34:26 UTC+10, Niklas Holsti  wrote:
>> On 12-07-22 19:21 , Florian Weimer wrote:
>> > * Keean Schupke:
>> >
>> >> So assuming we need this level of performance, what would be the
>> >> best (and most idiomatic Ada) way to package this type of usage
>> >> pattern as an abstract datatype?
>> >
>> > You should look at compiler dumps to figure out why the strength
>> > reduction optimization does not kick in.
>> >
>>
>> As I understand it, the program is accessing A(I,J), where A is a 2D
>> array, and also accessing A(I+1,J), A(I,J+1), and the other neighbours
>> of A(I,J), eight in all.
>>
>> I,J are indices that are *not* traversed in sequential order, but in
>> some scattered order.
>>
>> Computing the address of A(I,J) involves multiplications of J and I with
>> the size of an array element and the size of an array row, respectively.
>> The problem seems to be that the results of these multiplications are
>> not reused in computing the address of A(I+1,J) etc.
>>
>> If the index constraints (or at least the row length) of A are static
>> (compile-time) constants, and the address of A(I,J) has been computed,
>> and c1 and c2 are some other compile-time constants (-1, 0, +1 in this
>> program), then the address of A(I+c1,J+c2) can be computed by adding a
>> static offset to the address of A(I,J). This is what happens when A(I,J)
>> and the neighbours are accessed through a C pointer, or when Keean uses
>> address arithmetic explicitly in the Ada program. The question is why
>> the Ada compiler is not doing this optimization.
>>
>> As I understand it, "strength reduction" is a loop optimization where an
>> expression of the form C*i, with "i" the loop index, is replaced by a
>> new variable that is incremented by C on each loop iteration. But in
>> Keean's program the indices I,J are not loop indices, nor linearly
>> related to loop indices.
>>
>> The optimization that would be required here is to replace (X+C)*F by
>> X*F+C*F, when C and F are static constants and X*F has already been
>> computed and is available. This would replace the multiplication by an
>> addition of a static constant. I've never seen this special kind of
>> optimization proposed or described.
>
> I think that it's a standard optimisation.

I'm willing to be convinced of that. But it seems that GNAT/GCC is not 
performing this optimization in Keean's program.

> For the PL/I optimising compiler for the CDC Cyber (c. 1980),
> multiplication was eliminated for matrix elements
> by using a table of offsets into each row.
> Thus, double subscripting reduced to simple addition.

That sounds like a specific choice of code generation for arrays, rather 
than a general optimization.

Keean could simulate this code by defining the array as a 
one-dimensional array of one-dimensional arrays (rows), which would 
replace the row-length multiplication by an additional indexing.
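
That is, something like this sketch (bounds and names are mine):

procedure Rows_Of_Rows is
   type Element is new Integer;
   subtype Col_Index is Integer range 1 .. 100;
   subtype Row_Index is Integer range 1 .. 100;
   type Row    is array (Col_Index) of Element;
   type Matrix is array (Row_Index) of Row; -- 1-D array of 1-D arrays
   A : Matrix := (others => (others => 0));
   V : Element;
begin
   V := A (5)(7); -- row lookup, then element lookup: no row-size multiply
end Rows_Of_Rows;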

>> (We are of course (I believe) assuming that index range checks are
>> turned off.)
>
> Index range checks should be left on.

Sure, in general. But here the OP (Keean) is trying to beat the 
performance of C++ that uses pointers without index checks. Enabling 
index checks in the Ada code would probably (in this case, where the 
algorithm has scattered indexing of the array) slow it by an amount 
that swamps the effect of the multiplication optimization.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-25  7:09       ` Niklas Holsti
@ 2012-07-25  9:03         ` Keean Schupke
  2012-07-26  0:15           ` Randy Brukardt
  2012-07-26  7:11           ` Markus Schöpflin
  2012-07-26 14:47         ` Robin Vowels
  1 sibling, 2 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-25  9:03 UTC (permalink / raw)


On Wednesday, 25 July 2012 08:09:09 UTC+1, Niklas Holsti  wrote:
> On 12-07-24 19:00 , robin vowels wrote:
> > On Monday, 23 July 2012 03:34:26 UTC+10, Niklas Holsti  wrote:
> >> On 12-07-22 19:21 , Florian Weimer wrote:
> >> > * Keean Schupke:
> >> >
> >> >> So assuming we need this level of performance, what would be the
> >> >> best (and most idiomatic Ada) way to package this type of usage
> >> >> pattern as an abstract datatype?
> >> >
> >> > You should look at compiler dumps to figure out why the strength
> >> > reduction optimization does not kick in.
> >> >
> >>
> >> As I understand it, the program is accessing A(I,J), where A is a 2D
> >> array, and also accessing A(I+1,J), A(I,J+1), and the other neighbours
> >> of A(I,J), eight in all.
> >>
> >> I,J are indices that are *not* traversed in sequential order, but in
> >> some scattered order.
> >>
> >> Computing the address of A(I,J) involves multiplications of J and I with
> >> the size of an array element and the size of an array row, respectively.
> >> The problem seems to be that the results of these multiplications are
> >> not reused in computing the address of A(I+1,J) etc.
> >>
> >> If the index constraints (or at least the row length) of A are static
> >> (compile-time) constants, and the address of A(I,J) has been computed,
> >> and c1 and c2 are some other compile-time constants (-1, 0, +1 in this
> >> program), then the address of A(I+c1,J+c2) can be computed by adding a
> >> static offset to the address of A(I,J). This is what happens when A(I,J)
> >> and the neighbours are accessed through a C pointer, or when Keean uses
> >> address arithmetic explicitly in the Ada program. The question is why
> >> the Ada compiler is not doing this optimization.
> >>
> >> As I understand it, "strength reduction" is a loop optimization where an
> >> expression of the form C*i, with "i" the loop index, is replaced by a
> >> new variable that is incremented by C on each loop iteration. But in
> >> Keean's program the indices I,J are not loop indices, nor linearly
> >> related to loop indices.
> >>
> >> The optimization that would be required here is to replace (X+C)*F by
> >> X*F+C*F, when C and F are static constants and X*F has already been
> >> computed and is available. This would replace the multiplication by an
> >> addition of a static constant. I've never seen this special kind of
> >> optimization proposed or described.
> >
> > I think that it's a standard optimisation.
> 
> I'm willing to be convinced of that. But it seems that GNAT/GCC is not 
> performing this optimization in Keean's program.
> 
> > For the PL/I optimising compiler for the CDC Cyber (c. 1980),
> > multiplication was eliminated for matrix elements
> > by using a table of offsets into each row.
> > Thus, double subscripting reduced to simple addition.
> 
> That sounds like a specific choice of code generation for arrays, rather 
> than a general optimization.
> 
> Keean could simulate this code by defining the array as a 
> one-dimensional array of one-dimensional arrays (rows), which would 
> replace the row-length multiplication by an additional indexing.
> 
> >> (We are of course (I believe) assuming that index range checks are
> >> turned off.)
> >
> > Index range checks should be left on.
> 
> Sure, in general. But here the OP (Keean) is trying to beat the 
> performance of C++ that uses pointers without index checks. Enabling 
> index checks in the Ada code would probably (in this case, where the 
> algorithm has scattered indexing of the array) slow it by an amount 
> that swamps the effect of the multiplication optimization.
> 
> -- 
> Niklas Holsti
> Tidorum Ltd
> niklas holsti tidorum fi
>        .      @       .

On the specific point of array index checking: if you can statically prove that the algorithm using the array will never use an out-of-range index, there is no need for a run-time index check.

If we generate a random number between one and N, and then look up the Nth element, we do not need to range check, as the number is by definition in range. The type system can carry such proofs (for example using range types in Ada), so you might have:


subtype Idx is Integer range 1 .. 10;
type Arr is array (Idx) of Integer;
function Random return Idx;
A : Arr;
X : Integer := A (Random); -- the random number is already in range, so no
                           -- range check is needed on the array lookup


I don't know how good the compiler is at doing this.

There are times when the programmer can guarantee that the output of a function will be within range due to extensive unit testing and validation.


I wonder, is there a way to get the compiler (GNAT) to issue a warning when it has to insert a range check?


Cheers,
Keean.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-25  9:03         ` Keean Schupke
@ 2012-07-26  0:15           ` Randy Brukardt
  2012-07-26  8:38             ` Keean Schupke
  2012-07-26  7:11           ` Markus Schöpflin
  1 sibling, 1 reply; 88+ messages in thread
From: Randy Brukardt @ 2012-07-26  0:15 UTC (permalink / raw)


"Keean Schupke" <keean.schupke@googlemail.com> wrote in message 
news:f2f61da1-3139-4dfc-b851-8398f27d558d@googlegroups.com...
> On Wednesday, 25 July 2012 08:09:09 UTC+1, Niklas Holsti  wrote:
...
>> Sure, in general. But here the OP (Keean) is trying to beat the
>> performance of C++ that uses pointers without index checks. Enabling
>> index checks in the Ada code would probably (in this case, where the
>> algorithm has scattered indexing of the array) slow it by an amount
>> that swamps the effect of the multiplication optimization.
>
> On the specific point of array index checking, if you can statically prove 
> the algorithm
> using the array will never use an out of range index, there is no need to 
> run-time index check.
>
> If we generate a random number between one and N, and then lookup the Nth 
> element, we do not
> need to range check as the number is by definition in range. If the type 
> system can carry such proofs
> (for example using range types in Ada) so you have:
>
> subtype Idx is Integer range 1 .. 10;
> type Arr is array (Idx) of Integer;
> function Random returns Idx;
> A : Arr;
> A(Random); -- as the random number is already in the range there needs to 
> be no
>                         range checks on the array lookup.

This isn't necessarily true. The following is *very* Ada language-lawyer, so 
skip if you don't want to know the details. (BTW, this is all Ada 95 
thinking, refined somewhat since.)

Ada scalar objects can be in a state known as "invalid" (see 13.9.1 in your 
Ada Reference Manual). For integer types, this corresponds to a value out of 
range. Ada compilers are always allowed to return invalid results from 
scalar operations, with a few exceptions. So in general, an Ada compiler 
does not actually need to make range checks on scalar objects. The only time 
that a check on a scalar object *has* to be performed is when using the 
value could lead to erroneous execution (such as in indexing out of range).

So, formally, writing a scalar subtype does not ensure that a compiler can 
eliminate checks. Indeed, it doesn't necessarily mean anything at all, as 
scalar objects can be uninitialized and the compiler has to assume that they 
could be out of range unless it can prove that they are initialized on all 
paths.

Of course, we compiler implementers *want* to eliminate checks based on 
subtypes. There are various strategies for doing so. The "usual" one is that 
the compiler defines some subset of scalar objects as "known to be valid". 
Those are objects that the compiler can easily prove are initialized. (For 
Janus/Ada, we use a fairly conservative definition: stand-alone objects that 
are explicitly initialized, literals, loop parameters, and subprogram 
parameters. I don't remember if function results are included or not.). 
Checks have to be performed on assignments and other operations on "known to 
be valid" objects (array indexing is also treated this way) unless subtypes 
are the same or statically compatible AND all constituents of the source 
expression are "known to be valid".

Our optimizer can remove a few checks that have been generated because of 
later flow analysis and the like, but what can be done then is quite 
limited. Generally, the compiler has to not generate the checks in the first 
place.

I'm not sure precisely how GNAT handles this, but elimination of checks is 
*much* harder than it appears on the surface, mainly because Ada allows 
uninitialized objects (and still expects uses of them to be detected).

> I don't know how good the compiler is at doing this?

If the implementation model is "wrong", it might not do it at all.

> There are times when the programmer can guarantee that the output of a 
> function will be
> within range due to extensive unit testing and validation.

But that doesn't guarantee that the compiler can remove the checks.

> I wonder, is there a way to get the compiler (GNAT) to issue a warning 
> when it has to insert a range check?

Can't say if there is a GNAT option, but it seems like a good idea. I added 
such an "information" (it's not a warning in the sense that normally it 
doesn't need to be corrected) to the (lengthy) to-do list for Janus/Ada.

                                        Randy.





^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-25  9:03         ` Keean Schupke
  2012-07-26  0:15           ` Randy Brukardt
@ 2012-07-26  7:11           ` Markus Schöpflin
  1 sibling, 0 replies; 88+ messages in thread
From: Markus Schöpflin @ 2012-07-26  7:11 UTC (permalink / raw)


On 25.07.2012 11:03, Keean Schupke wrote:

[...]

> I wonder, is there a way to get the compiler (GNAT) to issue a warning when it has to insert a range check?

Not that I know of, but when these sort of things really start to matter for 
us we usually check the expanded code generated by GNAT (-gnatG).

There you can see what code exactly will be compiled, including all the checks 
generated.

Markus



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-26  0:15           ` Randy Brukardt
@ 2012-07-26  8:38             ` Keean Schupke
  2012-07-26 10:10               ` Brian Drummond
                                 ` (3 more replies)
  0 siblings, 4 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-26  8:38 UTC (permalink / raw)


On Thursday, 26 July 2012 01:15:25 UTC+1, Randy Brukardt  wrote:
> "Keean Schupke" wrote in message 
> > On Wednesday, 25 July 2012 08:09:09 UTC+1, Niklas Holsti  wrote:
> ...
> >> Sure, in general. But here the OP (Keean) is trying to beat the
> >> performance of C++ that uses pointers without index checks. Enabling
> >> index checks in the Ada code would probably (in this case, where the
> >> algorithm has scattered indexing of the array) slow it by an amount
> >> that swamps the effect of the multiplication optimization.
> >
> > On the specific point of array index checking, if you can statically prove 
> > the algorithm
> > using the array will never use an out of range index, there is no need to 
> > run-time index check.
> >
> > If we generate a random number between one and N, and then lookup the Nth 
> > element, we do not
> > need to range check as the number is by definition in range. If the type 
> > system can carry such proofs
> > (for example using range types in Ada) so you have:
> >
> > subtype Idx is Integer range 1 .. 10;
> > type Arr is array (Idx) of Integer;
> > function Random return Idx;
> > A : Arr;
> > A(Random); -- as the random number is already in the range there needs to 
> > be no
> >                         range checks on the array lookup.
> 
> This isn't necessarily true. The following is *very* Ada language-lawyer, so 
> skip if you don't want to know the details. (BTW, this is all Ada 95 
> thinking, refined somewhat since.)
> 
> Ada scalar objects can be in a state known as "invalid" (see 13.9.1 in your 
> Ada Reference Manual). For integer types, this corresponds to a value out of 
> range. Ada compilers are always allowed to return invalid results from 
> scalar operations, with a few exceptions. So in general, an Ada compiler 
> does not actually need to make range checks on scalar objects. The only time 
> that a check on a scalar object *has* to be performed is when using the 
> value could lead to erroneous execution (such as in indexing out of range).

This is an equivalent concept to nullable, so two questions:

What is the literal for 'invalid', IE:

X : Index := #INVALID -- we should be able to initialise to invalid.

if X = #INVALID -- I want to be able to test for invalid.


> 
> So, formally, writing a scalar subtype does not ensure that a compiler can 
> eliminate checks. Indeed, it doesn't necessarily mean anything at all, as 
> scalar objects can be uninitialized and the compiler has to assume that they 
> could be out of range unless it can prove that they are initialized on all 
> paths.

Like "not null", we therefore need:

type X_Type is new Integer not invalid range 0 .. 10;
X : X_Type; -- error
X : X_Type := 0; -- OK

> 
> Of course, we compiler implementers *want* to eliminate checks based on 
> subtypes. There are various strategies for doing so. The "usual" one is that 
> the compiler defines some subset of scalar objects as "known to be valid". 
> Those are objects that the compiler can easily prove are initialized. (For 
> Janus/Ada, we use a fairly conservative definition: stand-alone objects that 
> are explicitly initialized, literals, loop parameters, and subprogram 
> parameters. I don't remember if function results are included or not.). 
> Checks have to be performed on assignments and other operations on "known to 
> be valid" objects (array indexing is also treated this way) unless subtypes 
> are the same or statically compatible AND all constituents of the source 
> expression are "known to be valid".

The above helps the compiler eliminate range checks; the other part is type promotion, for example:

X : not invalid Integer := 99;
Y : array (10 .. 1000) of Integer;

if X <= 1000 then
   -- type of X is now : not invalid Integer range Integer'First .. 1000;
   if X >= 10 then
       -- type of X is now : not invalid Integer range 10 .. 1000;
       Y(X) -- No need for any range checks.
   end if;
end if;

> 
> Our optimizer can remove a few checks that have been generated because of 
> later flow analysis and the like, but what can be done then is quite 
> limited. Generally, the compiler has to not generate the checks in the first 
> place.

See above for language changes to facilitate better range check removal.

> 
> I'm not sure precisely how GNAT handles this, but elimination of checks is 
> *much* harder than it appears on the surface, mainly because Ada allows 
> uninitialized objects (and still expects uses of them to be detected).

This is something I have looked at in a couple of different contexts, and using dependent types (a simple example of which is above) there are known techniques for removing all implicit range checks. Personally I don't like implicit stuff, so I would like the compiler to never insert implicit checks, but instead issue a compile error whenever it would have to insert an implicit range check. As above, you may have to insert explicit range checks to satisfy the compiler, or you may use an unchecked conversion to tell the compiler: trust me, I know what I am doing, and I have verified through unit testing and formal verification methods that the code will not generate out-of-bounds values.

> 
> > I don't know how good the compiler is at doing this?
> 
> If the implementation model is "wrong", it might not do it at all.
> 
> > There are times when the programmer can guarantee that the output of a 
> > function will be
> > within range due to extensive unit testing and validation.
> 
> But that doesn't guarantee that the compiler can remove the checks.

The compiler should remove the range checks if the programmer is telling it that the value is definitely within range. With a 'not invalid' type modifier, this can be done with some kind of unchecked conversion.

> 
> > I wonder, is there a way to get the compiler (GNAT) to issue a warning 
> > when it has to insert a range check?
> 
> Can't say if there is a GNAT option, but it seems like a good idea. I added 
> such an "information" (it's not a warning in the sense that normally it 
> doesn't need to be corrected) to the (lengthy) to-do list for Janus/Ada.
> 
>                                         Randy.

Cheers,
Keean.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-26  8:38             ` Keean Schupke
@ 2012-07-26 10:10               ` Brian Drummond
  2012-07-26 12:32                 ` Keean Schupke
  2012-07-26 13:08               ` Georg Bauhaus
                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 88+ messages in thread
From: Brian Drummond @ 2012-07-26 10:10 UTC (permalink / raw)


On Thu, 26 Jul 2012 01:38:41 -0700, Keean Schupke wrote:

> This is an equivalent concept to nullable, so two questions:
> 
> What is the literal for 'invalid', IE:
> 
> X : Index := #INVALID -- we should be able to initialise to invalid.
> 
> if X = #INVALID -- I want to be able to test for invalid.

There's an attribute for that.

if X'Valid ...

- Brian



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-26 10:10               ` Brian Drummond
@ 2012-07-26 12:32                 ` Keean Schupke
  2012-07-26 13:46                   ` Dmitry A. Kazakov
  2012-07-27  8:21                   ` Keean Schupke
  0 siblings, 2 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-26 12:32 UTC (permalink / raw)


On Thursday, 26 July 2012 11:10:39 UTC+1, Brian Drummond  wrote:
> On Thu, 26 Jul 2012 01:38:41 -0700, Keean Schupke wrote:
> 
> > This is an equivalent concept to nullable, so two questions:
> > 
> > What is the literal for 'invalid', IE:
> > 
> > X : Index := #INVALID -- we should be able to initialise to invalid.
> > 
> > if X = #INVALID -- I want to be able to test for invalid.
> 
> There's an attribute for that.
> 
> if X'Valid ...
> 
> - Brian


So if I do

if X'Valid then
    Y (X) := 0; -- if X is the same type as the index of Y then no need to range check.
end if;

If I explicitly check 'Valid, the compiler should be able to omit some implicit range checks, as 'Valid is an explicit range check.


Cheers,
Keean.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-26  8:38             ` Keean Schupke
  2012-07-26 10:10               ` Brian Drummond
@ 2012-07-26 13:08               ` Georg Bauhaus
  2012-07-26 19:37                 ` stefan-lucks
  2012-07-26 19:21               ` stefan-lucks
  2012-07-31  3:12               ` Randy Brukardt
  3 siblings, 1 reply; 88+ messages in thread
From: Georg Bauhaus @ 2012-07-26 13:08 UTC (permalink / raw)


On 26.07.12 10:38, Keean Schupke wrote:
> What is the literal for 'invalid', IE:
> 
> X : Index := #INVALID -- we should be able to initialise to invalid.

A good approximation should be

  X : Index := To_Index (Index'Last + 1);

where To_Index is an instance of Unchecked_Conversion.




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-26 12:32                 ` Keean Schupke
@ 2012-07-26 13:46                   ` Dmitry A. Kazakov
  2012-07-26 17:40                     ` Shark8
       [not found]                     ` <6ov218tnkbqu3vpkuo4t77rd7de0a3aesf@invalid.netcom.com>
  2012-07-27  8:21                   ` Keean Schupke
  1 sibling, 2 replies; 88+ messages in thread
From: Dmitry A. Kazakov @ 2012-07-26 13:46 UTC (permalink / raw)


On Thu, 26 Jul 2012 05:32:32 -0700 (PDT), Keean Schupke wrote:

> So if I do
> 
> if X'Valid then
>     Y(X) -- if X is the same type as the index of Y then no need to range check.
> end if
> 
> If I explicitly check 'Valid, the compiler should be able to omit some
> implicit range checks, as 'Valid is an explicit range check.

No. You do it this way:

declare
   subtype Y_Index is Integer range Y'Range;
   Safe_X : Y_Index := X;
begin
   ... Y (Safe_X) ... -- The compiler has here information to
                           -- eliminate check, though not required to
end;

P.S. The attribute 'Valid is used when there is a danger that the compiler
hasn't checked the argument for whatever reason. Usually it is after
Unchecked_Conversion.

Some would disagree, but IMO in a well-written Ada program you should never
ever need 'Valid. Don't do Unchecked_Conversion with doubtful outcome.
Don't use stream attributes for constrained types. Don't use for X'Address
use Z'Address for the purpose of conversion etc.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-25  7:09       ` Niklas Holsti
  2012-07-25  9:03         ` Keean Schupke
@ 2012-07-26 14:47         ` Robin Vowels
  2012-07-26 15:18           ` Martin
  1 sibling, 1 reply; 88+ messages in thread
From: Robin Vowels @ 2012-07-26 14:47 UTC (permalink / raw)


On Jul 25, 5:09 pm, Niklas Holsti <niklas.hol...@tidorum.invalid>
wrote:
> On 12-07-24 19:00 , rrr...@gmail.com wrote:

> > For the PL/I optimising compiler for the CDC Cyber (c. 1980),
> > multiplication was eliminated for matrix elements
> > by using a table of offsets into each row.
> > Thus, double subscripting reduced to simple addition.
>
> That sounds like a specific choice of code generation for arrays, rather
> than a general optimization.

It was a general optimisation for matrices,
with significant benefits for intensive use of subscripting.

Optimisation comes from a number of optimisation strategies,
not just one.

> Keean could simulate this code by defining the array as a
> one-dimensional array of one-dimensional arrays (rows), which would
> replace the row-length multiplication by an additional indexing.
>
> >> (We are of course (I believe) assuming that index range checks are
> >> turned off.)
>
> > Index range checks should be left on.
>
> Sure, in general. But here the OP (Keean) is trying to beat the
> performance of C++ that uses pointers without index checks. Enabling
> index checks in the Ada code would probably (in this case, where the
> algorithm has scatterered indexing of the array) slow it by an amount
> that swamps the effect of the multiplication optimization.

There's really no point in executing code without such checks.
You may never know that there is a bug in the code.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-26 14:47         ` Robin Vowels
@ 2012-07-26 15:18           ` Martin
  0 siblings, 0 replies; 88+ messages in thread
From: Martin @ 2012-07-26 15:18 UTC (permalink / raw)


On Thursday, July 26, 2012 3:47:14 PM UTC+1, Robin Vowels wrote:
[snip]
> There's really no point in executing code without such checks.
> You may never know that there is a bug in the code.

But you could use other tools to prove that removal of such checks is safe - e.g. Polyspace.

-- Martin



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-26 13:46                   ` Dmitry A. Kazakov
@ 2012-07-26 17:40                     ` Shark8
  2012-07-26 18:56                       ` Dmitry A. Kazakov
       [not found]                     ` <6ov218tnkbqu3vpkuo4t77rd7de0a3aesf@invalid.netcom.com>
  1 sibling, 1 reply; 88+ messages in thread
From: Shark8 @ 2012-07-26 17:40 UTC (permalink / raw)
  Cc: mailbox

On Thursday, July 26, 2012 7:46:04 AM UTC-6, Dmitry A. Kazakov wrote:
>
> Some would disagree, but IMO in a well-written Ada program you should never
> ever need 'Valid. Don't do Unchecked_Conversion with doubtful outcome.
> Don't use stream attributes for constrained types.

It might not be avoidable, though; the presence of 'Valid does, however, let you ensure that your [sub]typing assumptions are met. This might be especially true if you're importing data from some third party where the input format doesn't specifically conform to your expected inputs. (This happens with disturbing frequency in handling electronic medical records; usually because whoever started the record was using a spreadsheet.)

> Don't use for X'Address
> use Z'Address for the purpose of conversion etc.

What do you mean here? Overlaying?



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
       [not found]                     ` <6ov218tnkbqu3vpkuo4t77rd7de0a3aesf@invalid.netcom.com>
@ 2012-07-26 18:49                       ` Dmitry A. Kazakov
       [not found]                         ` <86d31898ne39maimbl24knds7rf38qg7vc@invalid.netcom.com>
  0 siblings, 1 reply; 88+ messages in thread
From: Dmitry A. Kazakov @ 2012-07-26 18:49 UTC (permalink / raw)


On Thu, 26 Jul 2012 13:35:46 -0400, Dennis Lee Bieber wrote:

> 	My prior job had a, uhm, valid use for 'Valid... the code had to
> receive data structures generated externally (and passed through half a
> dozen converters from hardware signal pins to software emulation), along
> with receiving "commands" from test engineers that came from some
> database (converting text string into enumeration values).
> 
> 	A mismatch between the software build and the database could result
> in an (currently) invalid representation/enumeration value being passed
> in.

I know this case well, because getting data out of / into hardware is my daily
job. I do not consider it a legitimate use for 'Valid. To put it more clearly, I don't
see re-interpretation of some bit pattern as good design, which is what
actually happens before 'Valid. It is always worth the 10ns or so of CPU time to
do a proper conversion or value construction instead. It is not only
cleaner, it is also safer, because 'Valid is a very illusory protection
against mangled input.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-26 17:40                     ` Shark8
@ 2012-07-26 18:56                       ` Dmitry A. Kazakov
  2012-07-27  5:25                         ` Shark8
  0 siblings, 1 reply; 88+ messages in thread
From: Dmitry A. Kazakov @ 2012-07-26 18:56 UTC (permalink / raw)


On Thu, 26 Jul 2012 10:40:32 -0700 (PDT), Shark8 wrote:

> On Thursday, July 26, 2012 7:46:04 AM UTC-6, Dmitry A. Kazakov wrote:
>>
>> Some would disagree, but IMO in a well-written Ada program you should never
>> ever need 'Valid. Don't do Unchecked_Conversion with doubtful outcome.
>> Don't use stream attributes for constrained types.
> 
> It might not be avoidable though; the presence of 'Valid however does let
> you ensure that your [sub]typing assumptions are met.

These assumptions must be upheld per value construction.

> This might be especially true if you're importing data from some
> third-party where the input-format doesn't specifically conform to your
> expected-inputs. (This happens with disturbing frequency in handling
> electronic medical records; usually because whoever started the record was
> using a spreadsheet.)

Since this stuff is my primary occupation, a word of caution: don't do it
this way. Always implement the protocol layers as defined, right from the
representation data types (e.g. octet), up to Ada native ones. Never ever
use representation clauses to cut corners.

>> Don't use for X'Address
>> use Z'Address for the purpose of conversion etc.
> 
> What do you mean here? Overlaying?

Yes.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-26  8:38             ` Keean Schupke
  2012-07-26 10:10               ` Brian Drummond
  2012-07-26 13:08               ` Georg Bauhaus
@ 2012-07-26 19:21               ` stefan-lucks
  2012-07-31  3:12               ` Randy Brukardt
  3 siblings, 0 replies; 88+ messages in thread
From: stefan-lucks @ 2012-07-26 19:21 UTC (permalink / raw)


On Thu, 26 Jul 2012, Keean Schupke wrote:

> What is the literal for 'invalid', IE:
> 
> X : Index := #INVALID -- we should be able to initialise to invalid.

Of course, you cannot *assign* an invalid value (out of the range of the 
(sub-)type of X) to X, at least not without Unchecked_Conversion or any 
other dirty tricks.

But you can use pragma Normalize_Scalars to ensure X is set to invalid:
<http://en.wikibooks.org/wiki/Ada_Programming/Pragmas/Normalize_Scalars>. 

There is also pragma Initialize_Scalars, but I believe it is 
implementation-defined (a GNAT pragma), not standard Ada. 

> if X = #INVALID -- I want to be able to test for invalid.

As already answered by Brian Drummond, this is "not X'Valid". 
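
A minimal sketch of that combination (names are mine; the pragma is a 
configuration pragma, so it goes at the very start of the compilation):

pragma Normalize_Scalars;

procedure Detect_Invalid is
   subtype Idx is Integer range 1 .. 10;
   X : Idx; -- deliberately uninitialized
begin
   if not X'Valid then
      null; -- Normalize_Scalars gave X an out-of-range bit pattern
   end if;
end Detect_Invalid;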



-- 
---- Stefan.Lucks (at) uni-weimar.de, University of Weimar, Germany  ----
    <http://www.uni-weimar.de/cms/medien/mediensicherheit/home.html>
------  I  love  the  taste  of  Cryptanalysis  in  the  morning!  ------




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-26 13:08               ` Georg Bauhaus
@ 2012-07-26 19:37                 ` stefan-lucks
  0 siblings, 0 replies; 88+ messages in thread
From: stefan-lucks @ 2012-07-26 19:37 UTC (permalink / raw)


On Thu, 26 Jul 2012, Georg Bauhaus wrote:

> On 26.07.12 10:38, Keean Schupke wrote:
> > What is the literal for 'invalid', IE:
> > 
> > X : Index := #INVALID -- we should be able to initialise to invalid.
> 
> A good approximation should be
> 
>   X : Index := To_Index (Index'Last + 1);
> 
> where To_Index is an instance of Unchecked_Conversion.

If To_Index is defined as

  function To_Index is new Unchecked_Conversion(Integer, Index);

this will fail if Index'Last = Integer'Last. 

Better use pragma Normalize_Scalars -- that cannot surprise you with a 
Constraint_Error at initialization time.  

-- 
---- Stefan.Lucks (at) uni-weimar.de, University of Weimar, Germany  ----
    <http://www.uni-weimar.de/cms/medien/mediensicherheit/home.html>
------  I  love  the  taste  of  Cryptanalysis  in  the  morning!  ------




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-26 18:56                       ` Dmitry A. Kazakov
@ 2012-07-27  5:25                         ` Shark8
  2012-07-27  6:28                           ` Dmitry A. Kazakov
  0 siblings, 1 reply; 88+ messages in thread
From: Shark8 @ 2012-07-27  5:25 UTC (permalink / raw)
  Cc: mailbox

On Thursday, July 26, 2012 12:56:49 PM UTC-6, Dmitry A. Kazakov wrote:
> On Thu, 26 Jul 2012 10:40:32 -0700 (PDT), Shark8 wrote:
> 
> > On Thursday, July 26, 2012 7:46:04 AM UTC-6, Dmitry A. Kazakov wrote:
> >>
> >> Some would disagree, but IMO in a well-written Ada program you should never
> >> ever need 'Valid. Don't do Unchecked_Conversion with doubtful outcome.
> >> Don't use stream attributes for constrained types.
> > 
> > It might not be avoidable though; the presence of 'Valid however does let
> > you ensure that your [sub]typing assumptions are met.
> 
> These assumptions must be upheld per value construction.

Er, yes? Your value-construction procedure, which might be a stream-implementation, could use 'Input to ensure the data is valid, no? Then isn't the 'Read the value-construction?

{I'd actually place it into subprograms if we were talking about some sort of string-processing/basic-error-checking; in a little LISP-interpreter I was working on a while back I had it ensure balanced parens in reading a file/text via stream -- unbalanced means program-error, so throw an exception, balanced means I can continue processing and return the underlying structure for execution/operation.}

> > This might be especially true if you're importing data from some
> > third-party where the input-format doesn't specifically conform to your
> > expected-inputs. (This happens with disturbing frequency in handling
> > electronic medical records; usually because whoever started the record was
> > using a spreadsheet.)
> 
> Since this stuff is my primary occupation, a word of caution: don't do it
> this way. Always implement the protocol layers as defined, right from the
> representation data types (e.g. octet), up to Ada native ones. Never ever
> use representation clauses to cut corners.

Who's relying on representation clauses here? That's an assumption on your part that I don't think anyone here is making. Consider some value read in from a DB which is *always* supposed to be Positive; throwing a 'Valid check when loading the value isn't unreasonable: it keeps you from loading zero.

Many DBs don't support such range subtyping, so why is it "bad form" to ensure that such values, when read (i.e. value construction from the database), conform to the proper subtypes? (Though this might be done automatically on assignment from a function if it falls outside the 'Range... I'll have to read up on when subtype constraints are applied.)



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-27  5:25                         ` Shark8
@ 2012-07-27  6:28                           ` Dmitry A. Kazakov
  2012-07-27  7:01                             ` Vasiliy Molostov
  2012-07-28  5:32                             ` Shark8
  0 siblings, 2 replies; 88+ messages in thread
From: Dmitry A. Kazakov @ 2012-07-27  6:28 UTC (permalink / raw)


On Thu, 26 Jul 2012 22:25:58 -0700 (PDT), Shark8 wrote:

> On Thursday, July 26, 2012 12:56:49 PM UTC-6, Dmitry A. Kazakov wrote:
>> On Thu, 26 Jul 2012 10:40:32 -0700 (PDT), Shark8 wrote:
>> 
>>> On Thursday, July 26, 2012 7:46:04 AM UTC-6, Dmitry A. Kazakov wrote:
>>>>
>>>> Some would disagree, but IMO in a well-written Ada program you should never
>>>> ever need 'Valid. Don't do Unchecked_Conversion with doubtful outcome.
>>>> Don't use stream attributes for constrained types.
>>> 
>>> It might not be avoidable though; the presence of 'Valid however does let
>>> you ensure that your [sub]typing assumptions are met.
>> 
>> These assumptions must be upheld per value construction.
> 
> Er, yes? Your value-construction procedure, which might be a
> stream-implementation, could use 'Input to ensure the data is valid, no?
> Then isn't the 'Read the value-construction?

Neither. If you use stream you should do it properly, i.e. as the transport
layer. The transport is defined in terms of bytes, characters or other
units. So the only read here might be Character'Read.

> Many DBs don't support such range subtyping, so why is it "bad form" to
> ensure that such values, when read (ie value construction from the
> database) conform to the proper subtypes. (Though this might be done
> automatically on assignment from function if it falls outside the
> 'Range... I'll have to read up on when subtype constraints are applied.)

When the DB client does not support type mapping, you should always use the DB
native type, e.g. SQLINTEGER. The Ada type must be obtained from that using a
type conversion.

Which should be obvious if we remember that Ada discourages implicit type
conversions. Why should a DB-to-Ada conversion be implicit?
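
As a sketch of that discipline (Fetch_Column and the concrete types are mine,
standing in for whatever the binding provides):

with Interfaces;

procedure Load_Id is
   type SQLINTEGER is new Interfaces.Integer_32; -- the DB native type
   subtype Positive_Id is Integer range 1 .. Integer'Last;

   function Fetch_Column (Col : Positive) return SQLINTEGER is
   begin
      return 42; -- hypothetical stand-in for the real DB call
   end Fetch_Column;

   Raw : constant SQLINTEGER  := Fetch_Column (1);
   Id  : constant Positive_Id := Positive_Id (Integer (Raw));
   -- explicit, range-checked conversion from the DB type to the Ada type
begin
   null;
end Load_Id;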

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
       [not found]                         ` <86d31898ne39maimbl24knds7rf38qg7vc@invalid.netcom.com>
@ 2012-07-27  6:45                           ` Dmitry A. Kazakov
  0 siblings, 0 replies; 88+ messages in thread
From: Dmitry A. Kazakov @ 2012-07-27  6:45 UTC (permalink / raw)


On Thu, 26 Jul 2012 17:36:11 -0400, Dennis Lee Bieber wrote:

> On Thu, 26 Jul 2012 20:49:34 +0200, "Dmitry A. Kazakov"
> <mailbox@dmitry-kazakov.de> declaimed the following in comp.lang.ada:
> 
>> I know this case well because getting data out/into hardware is my daily
>> job. I do not consider it legitimate for 'Valid. To put it clearer, I don't
>> see re-interpretation of some bit pattern a good design. Which is what
>> actually happens before 'Valid. It is always worth 10ns or so CPU time of
>> doing proper conversion or value construction per arithmetic operations
>> instead. It is not only cleaner, it is also safer, because 'Valid is a very
>> illusory protection against mangled input.
> 
> 	Given that the some input sources were from a non-Ada system (it was
> part of a satellite ground control test suite) which took operator text
> strings for command directives to the simulations. Those directives were
> converted to a set of integer values (opcode, argument1...n) via a
> database, and then passed though a packetized I/O system to the end
> program(s) (which weren't stand-alone, but dynamically loaded libraries,
> where the "opcode" defined which library function was to be called to
> handle the arguments)...
> 
> 	The choice then comes down to:
> 
> 		if argument'Valid then
> 			-- where "argument" is a defined (sub)type (often
> 			-- an enumeration)
> 			process directive
> 		else
> 			log an error
> 		end if
> 
> or
> 		case argument is
> 			-- where "argument" is an generic integer
> 			when some-hand-maintained-value => enumvalue := x
> 			when other-hand-maintained-value => enumvalue := y
> 			when others => log an error
> 		end case

You have here arguments of two different types, so the example does not
tell the whole truth. But anyway, I always use the second one, properly
encapsulated, with a comment near the case statement referring to the
document and the page where the encoding is defined.

The mapping "some-hand-maintained-value" to "some-enumeration-value" is
mandated by the protocol. It is outside Ada. You cannot leave that to bit
patterns.

In some cases you could use:

   Enum_Value := T'Val (Argument);

commenting that the encoding is zero-based.

> 	Side note: the directive database and the handler signatures (Ada
> spec and body -- including the enumerated type values) were both
> generated from an XML file describing the interface to the simulation
> model (the library) -- but on the test systems one had to ensure that
> the database was updated whenever the libraries were... Sometimes that
> didn't happen correctly, allowing for values to become invalid.

My first move in our projects is to root out the XML plague. Pretty much all
measurement hardware uses XML these days for what they call
"configuration" and I call malfunction. Once that is eliminated, the rest works
more or less smoothly.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-27  6:28                           ` Dmitry A. Kazakov
@ 2012-07-27  7:01                             ` Vasiliy Molostov
  2012-07-27  8:48                               ` Dmitry A. Kazakov
  2012-07-28  5:32                             ` Shark8
  1 sibling, 1 reply; 88+ messages in thread
From: Vasiliy Molostov @ 2012-07-27  7:01 UTC (permalink / raw)


Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote in his message of Fri,  
27 Jul 2012 10:28:45 +0400:

> On Thu, 26 Jul 2012 22:25:58 -0700 (PDT), Shark8 wrote:
>
>> On Thursday, July 26, 2012 12:56:49 PM UTC-6, Dmitry A. Kazakov wrote:
>>> On Thu, 26 Jul 2012 10:40:32 -0700 (PDT), Shark8 wrote:
>>>
>>>> On Thursday, July 26, 2012 7:46:04 AM UTC-6, Dmitry A. Kazakov wrote:
>>>>>
>>>>> Some would disagree, but IMO in a well-written Ada program you  
>>>>> should never
>>>>> ever need 'Valid. Don't do Unchecked_Conversion with doubtful  
>>>>> outcome.
>>>>> Don't use stream attributes for constrained types.
>>>>
>>>> It might not be avoidable though; the presence of 'Valid however does  
>>>> let
>>>> you ensure that your [sub]typing assumptions are met.
>>>
>>> These assumptions must be upheld per value construction.
>>
>> Er, yes? Your value-construction procedure, which might be a
>> stream-implementation, could use 'Input to ensure the data is valid, no?
>> Then isn't the 'Read the value-construction?
>
> Neither. If you use a stream you should do it properly, i.e. as the
> transport layer. The transport is defined in terms of bytes, characters
> or other units. So the only read here might be Character'Read.

No. We might use Object'Read with any object. You are mixing abstraction
layers here; even you yourself refer to "or other units".


>> Many DBs don't support such range subtyping, so why is it "bad form" to
>> ensure that such values, when read (i.e. value construction from the
>> database), conform to the proper subtypes? (Though this might be done
>> automatically on assignment from a function if it falls outside the
>> 'Range... I'll have to read up on when subtype constraints are applied.)
>
> When the DB client does not support type mapping, you should always use
> the DB native type, e.g. SQLINTEGER. The Ada type must be obtained from
> that using a type conversion.
>
> Which should be obvious if we remember that Ada discourages implicit type
> conversions. Why should a DB-to-Ada one be implicit?

You suppose SQLINTEGER is a property of any DB, which is wrong.


-- 
Written in the Opera browser's mail client: http://www.opera.com/mail/



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-26 12:32                 ` Keean Schupke
  2012-07-26 13:46                   ` Dmitry A. Kazakov
@ 2012-07-27  8:21                   ` Keean Schupke
  2012-07-27  8:50                     ` Dmitry A. Kazakov
  2013-02-25 21:18                     ` Eryndlia Mavourneen
  1 sibling, 2 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-27  8:21 UTC (permalink / raw)


On Thursday, 26 July 2012 13:32:32 UTC+1, Keean Schupke  wrote:
> On Thursday, 26 July 2012 11:10:39 UTC+1, Brian Drummond  wrote:
> > On Thu, 26 Jul 2012 01:38:41 -0700, Keean Schupke wrote:
> > 
> > > This is an equivalent concept to nullable, so two questions:
> > > 
> > > What is the literal for 'invalid', IE:
> > > 
> > > X : Index := #INVALID -- we should be able to initialise to invalid.
> > > 
> > > if X = #INVALID -- I want to be able to test for invalid.
> > 
> > There's an attribute for that.
> > 
> > if X'Valid ...
> > 
> > - Brian
> 
> 
> So if I do
> 
> if X'Valid then
>     Y(X) -- if X is the same type as the index of Y then no need to range check.
> end if
> 
> If I explicitly check 'Valid, the compiler should be able to omit some implicit range checks, as 'Valid is an explicit range check.
> 
> 
> Cheers,
> Keean.


I am wondering if there is a way to use address arithmetic that would be just as safe as array indexing? What about something like:


type Value_Type is limited private;
type Index_Type is range <>;
type Array_Type is array (Index_Type) of Value_Type;

A : Array_Type;
 
type Iterator = Address_Integer range A(A'First)'Address .. A(A'Last)'Address step Value_Type'Size;


This would be safe as it could only contain the address of a valid record in the array, and the syntax could be neat with automatic dereferencing.


Cheers,
Keean.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-27  7:01                             ` Vasiliy Molostov
@ 2012-07-27  8:48                               ` Dmitry A. Kazakov
  2012-07-27 12:01                                 ` Georg Bauhaus
  2012-07-27 16:49                                 ` Vasiliy Molostov
  0 siblings, 2 replies; 88+ messages in thread
From: Dmitry A. Kazakov @ 2012-07-27  8:48 UTC (permalink / raw)


On Fri, 27 Jul 2012 11:01:17 +0400, Vasiliy Molostov wrote:

> Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote in his message of
> Fri, 27 Jul 2012 10:28:45 +0400:
> 
> We might use Object'Read with any object.

Don't use 'Read/'Input unless overridden, except for some very rare cases
when the stream is used inside a single program. The only 'Read allowed is
on the units for which the underlying hardware is guaranteed to handle the
representation. Usually that is the octet. Anything on top must be
overridden.
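
As a sketch of that rule (the Octet type, Register_Value, and the
two-octet big-endian wire format here are assumptions for illustration,
not anything prescribed by the thread):

   with Ada.Streams;

   package Transport_Sketch is

      type Octet is mod 2**8;                   --  the transport unit
      type Register_Value is range 0 .. 4_095;  --  constrained Ada type

      procedure Read_Register
        (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
         Item   : out Register_Value);

      for Register_Value'Read use Read_Register;

   end Transport_Sketch;

   package body Transport_Sketch is

      procedure Read_Register
        (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
         Item   : out Register_Value)
      is
         High, Low : Octet;
      begin
         Octet'Read (Stream, High);  --  transport layer: octets only
         Octet'Read (Stream, Low);
         declare
            Raw : constant Natural :=
              Natural (High) * 256 + Natural (Low);
         begin
            --  Validate explicitly before constructing the value.
            if Raw > Natural (Register_Value'Last) then
               raise Constraint_Error with "octets out of range";
            end if;
            Item := Register_Value (Raw);
         end;
      end Read_Register;

   end Transport_Sketch;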

>> When the DB client does not support type mapping, you should always use
>> the DB native type, e.g. SQLINTEGER. The Ada type must be obtained from
>> that using a type conversion.
>>
>> Which should be obvious if we remember that Ada discourages implicit type
>> conversions. Why should a DB-to-Ada one be implicit?
> 
> You suppose SQLINTEGER is a property of any DB, which is wrong.

I didn't suppose anything; SQLINTEGER was given as an example of a DB data
type, as provided by the DB client. Conversions to an Ada type from the
problem domain must be explicit.
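
A minimal sketch of that explicit step (the binding type and the
problem-domain type are invented; a real ODBC binding would declare its
own SQLINTEGER, and Fetch_Count stands in for the actual DB client call):

   with Interfaces.C;

   procedure Fetch_Demo is
      subtype SQLINTEGER is Interfaces.C.int;  --  assumed binding type
      type Sensor_Count is range 0 .. 10_000;  --  problem-domain type

      function Fetch_Count return SQLINTEGER is
      begin
         return 42;  --  stand-in for the actual DB client call
      end Fetch_Count;

      Count : Sensor_Count;
   begin
      --  Explicit conversion; raises Constraint_Error if the DB value
      --  is out of range for the Ada type.
      Count := Sensor_Count (Fetch_Count);
   end Fetch_Demo;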

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-27  8:21                   ` Keean Schupke
@ 2012-07-27  8:50                     ` Dmitry A. Kazakov
  2012-07-27  9:17                       ` Keean Schupke
  2013-02-25 21:18                     ` Eryndlia Mavourneen
  1 sibling, 1 reply; 88+ messages in thread
From: Dmitry A. Kazakov @ 2012-07-27  8:50 UTC (permalink / raw)


On Fri, 27 Jul 2012 01:21:18 -0700 (PDT), Keean Schupke wrote:

> I am wondering if there is a way to use address arithmetic that would be
> just as safe as array indexing? What about something like:
> 
> 
> type Value_Type is limited private;
> type Index_Type is range <>;
> type Array_Type is array (Index_Type) of Value_Type;
> 
> A : Array_Type;
>  
> type Iterator = Address_Integer range A(A'First)'Address .. A(A'Last)'Address step Value_Type'Size;
> 
> This would be safe as it could only contain the address of a valid record
> in the array, and the syntax could be neat with automatic dereferencing.

Value_Type = Boolean

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-27  8:50                     ` Dmitry A. Kazakov
@ 2012-07-27  9:17                       ` Keean Schupke
  0 siblings, 0 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-27  9:17 UTC (permalink / raw)
  Cc: mailbox

On Friday, 27 July 2012 09:50:41 UTC+1, Dmitry A. Kazakov  wrote:
> On Fri, 27 Jul 2012 01:21:18 -0700 (PDT), Keean Schupke wrote:
> 
> > I am wondering if there is a way to use address arithmetic that would be
> > just as safe as array indexing? What about something like:
> > 
> > 
> > type Value_Type is limited private;
> > type Index_Type is range <>;
> > type Array_Type is array (Index_Type) of Value_Type;

type Array_Type is array (Index_Type) of aliased Value_Type;

> > 
> > A : Array_Type;
> >  
> > type Iterator = Address_Integer range A(A'First)'Address .. A(A'Last)'Address step Value_Type'Size;
> > 
> > This would be safe as it could only contain the address of a valid record
> > in the array, and the syntax could be neat with automatic dereferencing.
> 
> Value_Type = Boolean

Error: array value must be aliased to use iterator.


Cheers,
Keean.

> 
> -- 
> Regards,
> Dmitry A. Kazakov
> http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-27  8:48                               ` Dmitry A. Kazakov
@ 2012-07-27 12:01                                 ` Georg Bauhaus
  2012-07-27 16:49                                 ` Vasiliy Molostov
  1 sibling, 0 replies; 88+ messages in thread
From: Georg Bauhaus @ 2012-07-27 12:01 UTC (permalink / raw)


On 27.07.12 10:48, Dmitry A. Kazakov wrote:
>>> When the DB client does not support type mapping, you should always use
>>> the DB native type, e.g. SQLINTEGER. The Ada type must be obtained from
>>> that using a type conversion.
>>>
>>> Which should be obvious if we remember that Ada discourages implicit
>>> type conversions. Why should a DB-to-Ada one be implicit?
>>
>> You suppose SQLINTEGER is a property of any DB, which is wrong.
>
> I didn't suppose anything; SQLINTEGER was given as an example of a DB
> data type, as provided by the DB client. Conversions to an Ada type from
> the problem domain must be explicit.

For illustration, I am just now debugging this, which is likely delivered
by a database driver (after a system update to some enterprise GNU/Linux):

x-dates
2011-12-30^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@:2012-07-26^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-27  8:48                               ` Dmitry A. Kazakov
  2012-07-27 12:01                                 ` Georg Bauhaus
@ 2012-07-27 16:49                                 ` Vasiliy Molostov
  2012-07-27 19:38                                   ` Dmitry A. Kazakov
  1 sibling, 1 reply; 88+ messages in thread
From: Vasiliy Molostov @ 2012-07-27 16:49 UTC (permalink / raw)


Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> писал(а) в своём письме Fri,  
27 Jul 2012 12:48:01 +0400:

> On Fri, 27 Jul 2012 11:01:17 +0400, Vasiliy Molostov wrote:
> Don't use 'Read/'Input unless overridden, except for some very rare cases
> when the stream is used inside a single program. The only 'Read allowed is
> on the units for which the underlying hardware is guaranteed to handle the
> representation. Usually that is the octet. Anything on top must be
> overridden.

I wonder: have you noticed that the units for read/write can be more than
just characters (as you said), and that you yourself refer to octets
(which are not characters) and other units?

Perhaps you are talking to yourself, like a voodoo doctor. What does
"don't use" mean at all? I want to and I will; do you plan to zombie me
with octets or some other ASN.1 tool? WTF? Are you all right?

-- 
Written in the Opera browser's mail client: http://www.opera.com/mail/



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-27 16:49                                 ` Vasiliy Molostov
@ 2012-07-27 19:38                                   ` Dmitry A. Kazakov
  0 siblings, 0 replies; 88+ messages in thread
From: Dmitry A. Kazakov @ 2012-07-27 19:38 UTC (permalink / raw)


On Fri, 27 Jul 2012 20:49:35 +0400, Vasiliy Molostov wrote:

> Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote in his message of
> Fri, 27 Jul 2012 12:48:01 +0400:
> 
>> On Fri, 27 Jul 2012 11:01:17 +0400, Vasiliy Molostov wrote:
>> Don't use 'Read/'Input unless overridden, except for some very rare cases
>> when the stream is used inside a single program. The only 'Read allowed
>> is on the units for which the underlying hardware is guaranteed to
>> handle the representation. Usually that is the octet. Anything on top
>> must be overridden.
> 
> I wonder: have you noticed that the units for read/write can be more than
> just characters (as you said), and that you yourself refer to octets
> (which are not characters) and other units?

In earlier times I saw some piece of serial hardware that was truly
bit-oriented. I don't remember what it was; it is long dead and buried.
Otherwise, only characters and octets. Larger units are never used, not
in automotive, smart metering, or safety, where I work.

A frame grabber could possibly represent a somewhat different case, but I
doubt anybody would use a stream interface on it.

> What does "don't use" mean at all?

Advice, which you are free to ignore if you know better.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-27  6:28                           ` Dmitry A. Kazakov
  2012-07-27  7:01                             ` Vasiliy Molostov
@ 2012-07-28  5:32                             ` Shark8
  2012-07-28  7:37                               ` Dmitry A. Kazakov
  1 sibling, 1 reply; 88+ messages in thread
From: Shark8 @ 2012-07-28  5:32 UTC (permalink / raw)
  Cc: mailbox

>>> These assumptions must be upheld per value construction.
>>
>> Er, yes? Your value-construction procedure, which might be a
>> stream-implementation, could use 'Input to ensure the data is valid, no?
>> Then isn't the 'Read the value-construction?
>
> Neither. If you use a stream you should do it properly, i.e. as the
> transport layer. The transport is defined in terms of bytes, characters
> or other units. So the only read here might be Character'Read.

Um, I think you completely misread what I was saying. You can use 'Read and 'Write to handle your import/export to the requisite format; the 'Write shouldn't be an issue (WRT validity; at this point your object should be valid and your writing it out should be straightforward), but the 'Read certainly needs to check for validity. Using Character'Read is perfectly reasonable when talking about parsing from text into the internal types, as I was doing in my LISP interpreter; indeed I was using Character'Read.

'Valid might be a perfectly reasonable way to raise an exception; as I said before, many DBs do not have ranges on their values. If your data (say, for a simulation) is exported to a DB, then any import needs to check that the constraints on the fields [present in the program, not present in the DB] have not been broken. (A common cause would be some SQL update statement a user ran.) Why would using 'Valid be the wrong way to go if those constraints were strictly subranges?



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-28  5:32                             ` Shark8
@ 2012-07-28  7:37                               ` Dmitry A. Kazakov
  2012-07-28 18:32                                 ` Shark8
  0 siblings, 1 reply; 88+ messages in thread
From: Dmitry A. Kazakov @ 2012-07-28  7:37 UTC (permalink / raw)


On Fri, 27 Jul 2012 22:32:15 -0700 (PDT), Shark8 wrote:

>>>> These assumptions must be upheld per value construction.
>>>
>>> Er, yes? Your value-construction procedure, which might be a
>>> stream-implementation, could use 'Input to ensure the data is valid, no?
>>> Then isn't the 'Read the value-construction?
>>
>> Neither. If you use a stream you should do it properly, i.e. as the
>> transport layer. The transport is defined in terms of bytes, characters
>> or other units. So the only read here might be Character'Read.
> 
> Um, I think you completely misread what I was saying. You can use 'Read
> and 'Write to handle your import/export to the requisite format; the
> 'Write shouldn't be an issue (WRT validity; at this point your object
> should be valid and your writing it out should be straightforward),

No, 'Write is just like 'Read. It must be validated against the prescribed
format of the stream; nothing in the Ada standard mandates that. E.g. the
protocol tells you that the number is to be in the "Motorola
representation", 2 octets, range so and so; that describes some non-Ada
type to which your application type needs to be converted, explicitly.

Usually it is the bounds that make the most work, because sending an
out-of-range value might in some cases severely damage the device. Though
I remember a case where the communication protocol used IEEE 754 and the
problem was with non-numbers, like NaN. It is also not so uncommon for
some integer values to have a reserved meaning. E.g. 16#FFFF# might mean
"start calibration procedure", rather than +10V.

>'Valid might be a perfectly reasonable way to raise an exception; as I
> said before many DBs do not have ranges on their values; if your data (say
> for a simulation) is exported to a DB, then any import needs to check that
> the constraints on the fields [present in the program, not present in the
> DB] have not been broken. (A common cause would be because of some SQL
> update-statement some user ran.) Why would using 'Valid be the wrong way
> to go if those constraints were strictly subranges?

As I said, what you describe is a wrong pattern for implementing
communication with an external party. The proper pattern, which I think I
have already described, does not require 'Valid.

BTW, when you talk about constraints, let me suggest another problem you
have. You seem to use the same numeric type for many things. In Ada it is
considered good practice to declare many disparate types rather than
subtypes of Integer. This is the "weak" vs. "strong" typing POV.
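
For instance (Row_Index and Column_Index are invented names), a sketch of
the distinct-types style:

   type Row_Index    is range 1 .. 100;
   type Column_Index is range 1 .. 80;

   R : Row_Index    := 1;
   C : Column_Index := 1;

   --  R := C;              -- illegal: different types, caught at compile time
   --  R := Row_Index (C);  -- legal, but the conversion is explicit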

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-28  7:37                               ` Dmitry A. Kazakov
@ 2012-07-28 18:32                                 ` Shark8
  0 siblings, 0 replies; 88+ messages in thread
From: Shark8 @ 2012-07-28 18:32 UTC (permalink / raw)
  Cc: mailbox

On Saturday, July 28, 2012 1:37:30 AM UTC-6, Dmitry A. Kazakov wrote:
> 
> BTW, when you talk about constraints, that let me suggest another problem
> you have. You seem to use same numeric type for many things. In Ada it is
> considered good practice to declare many disparate types rather than
> subtypes of Integer. This is "weak" vs. "strong" typing POV.

This is true, and I have no problem with having multiple types. *However*, it should be noted that subtyping may be precisely what is needed; perhaps a datatype in the simulation goes through intermediate calculations where the sign is negative, but the result is always Positive or Natural.

Vectors in math/physics are a good example: there is something *very* wrong if you try loading vectors with a negative magnitude, even though having the magnitude in 2's complement might make some calculations easier.
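
A hedged sketch of that, assuming a simple three-component vector (all
names here are invented):

   with Ada.Numerics.Elementary_Functions;

   package Vector_Sketch is

      subtype Magnitude is Float range 0.0 .. Float'Last;

      type Vector_3 is record
         X, Y, Z : Float;  --  components may be negative
      end record;

      function Length (V : Vector_3) return Magnitude;

   end Vector_Sketch;

   package body Vector_Sketch is

      function Length (V : Vector_3) return Magnitude is
         use Ada.Numerics.Elementary_Functions;
      begin
         --  Intermediates may involve negative values; the result is
         --  checked against Magnitude's range on return.
         return Sqrt (V.X ** 2 + V.Y ** 2 + V.Z ** 2);
      end Length;

   end Vector_Sketch;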



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-22 10:01 ` robin.vowels
@ 2012-07-30  6:31   ` Jacob Sparre Andersen
  2012-07-30  7:16     ` Keean Schupke
  2012-07-30  9:20     ` Georg Bauhaus
  0 siblings, 2 replies; 88+ messages in thread
From: Jacob Sparre Andersen @ 2012-07-30  6:31 UTC (permalink / raw)


robin.vowels@gmail.com writes:

> On Monday, 16 July 2012 04:40:08 UTC+10, Keean Schupke  wrote:

>> The Monte-Carlo simulator I am working on is now performing better in
>> Ada than in C++ using profile-guided compilation (57k simulations per
>> second for Ada vs 56k simulations per second for C++).
>
> One part in 59 is not a significant difference.

That depends on the variance.  But reporting performance numbers without
the corresponding variance/standard deviation is very bad style.

Greetings,

Jacob
-- 
"How may I be honest with you today?"



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-30  6:31   ` Jacob Sparre Andersen
@ 2012-07-30  7:16     ` Keean Schupke
  2012-07-30  9:20     ` Georg Bauhaus
  1 sibling, 0 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-30  7:16 UTC (permalink / raw)


On Monday, 30 July 2012 07:31:58 UTC+1, Jacob Sparre Andersen  wrote:
> robin.vowels writes:
> 
> > On Monday, 16 July 2012 04:40:08 UTC+10, Keean Schupke  wrote:
> 
> >> The Monte-Carlo simulator I am working on is now performing better in
> >> Ada than in C++ using profile-guided compilation (57k simulations per
> >> second for Ada vs 56k simulations per second for C++).
> 
> > One part in 59 is not a significant difference.
> 
> That depends on the variance.  But reporting performance numbers without
> the corresponding variance/standard deviation is very bad style.
> 
> Greetings,
> 
> Jacob
> -- 
> "How may I be honest with you today?"


C++: avg=56.415 var=0.001
Ada: avg=57.712 var=0.001
(in thousands of simulations per second)

Cheers,
Keean.
 




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-30  6:31   ` Jacob Sparre Andersen
  2012-07-30  7:16     ` Keean Schupke
@ 2012-07-30  9:20     ` Georg Bauhaus
  2012-07-30 14:04       ` Keean Schupke
  1 sibling, 1 reply; 88+ messages in thread
From: Georg Bauhaus @ 2012-07-30  9:20 UTC (permalink / raw)


On 30.07.12 08:31, Jacob Sparre Andersen wrote:
> robin.vowels@gmail.com writes:
> 
>> On Monday, 16 July 2012 04:40:08 UTC+10, Keean Schupke  wrote:
> 
>>> The Monte-Carlo simulator I am working on is now performing better in
>>> Ada than in C++ using profile-guided compilation (57k simulations per
>>> second for Ada vs 56k simulations per second for C++).
>>
>> One part in 59 is not a significant difference.
> 
> That depends on the variance.  But reporting performance numbers without
> the corresponding variance/standard deviation is very bad style.

When I see fluctuations, and the system is otherwise idle, I'll typically
assume that there is something random going on in the program or in its
run-time support system, or that some contemporary CPU lets its cores run
at different speeds without my knowing a way to influence that (by, e.g.,
asking for a constant number of instructions per second that would not
create too much heat, which would otherwise force fewer instructions per
second). Because otherwise, how could the sequence of instructions create
fluctuations if it is not self-modifying?

In this case, the inputs seem of fixed size, tailored to the caches'
sizes in particular, so I'll assume that the results show that it is not
universally true that equivalent programs written in Ada or C-17 cannot
be as fast as the other.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-30  9:20     ` Georg Bauhaus
@ 2012-07-30 14:04       ` Keean Schupke
  0 siblings, 0 replies; 88+ messages in thread
From: Keean Schupke @ 2012-07-30 14:04 UTC (permalink / raw)


On Monday, 30 July 2012 10:20:48 UTC+1, Georg Bauhaus  wrote:
> On 30.07.12 08:31, Jacob Sparre Andersen wrote:
> > robin.vowels writes:
> 
> >> On Monday, 16 July 2012 04:40:08 UTC+10, Keean Schupke  wrote:
> 
> >>> The Monte-Carlo simulator I am working on is now performing better in
> >>> Ada than in C++ using profile-guided compilation (57k simulations per
> >>> second for Ada vs 56k simulations per second for C++).
> 
> >> One part in 59 is not a significant difference.
> 
> > That depends on the variance.  But reporting performance numbers without
> > the corresponding variance/standard deviation is very bad style.
> 
> When I see fluctuations, and the system is otherwise idle, I'll typically
> assume that there is something random going on in the program or in its
> run-time support system, or that some contemporary CPU lets its cores run
> at different speeds without my knowing a way to influence that (by, e.g.,
> asking for a constant number of instructions per second that would not
> create too much heat, which would otherwise force fewer instructions per
> second). Because otherwise, how could the sequence of instructions create
> fluctuations if it is not self-modifying?
> 
> In this case, the inputs seem of fixed size, tailored to the caches'
> sizes in particular, so I'll assume that the results show that it is not
> universally true that equivalent programs written in Ada or C-17 cannot
> be as fast as the other.


I agree with this. If you want more accuracy, you can write a kernel module to run the time-critical code with interrupts disabled.


Cheers,
Keean.



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-22 18:21     ` Keean Schupke
@ 2012-07-30 14:18       ` Jacob Sparre Andersen
  0 siblings, 0 replies; 88+ messages in thread
From: Jacob Sparre Andersen @ 2012-07-30 14:18 UTC (permalink / raw)


Keean Schupke wrote:

> So with the relative addressing code, stepping through the subset is
> dereference link, add constant[+/- row_size * element_size] and
> constant[+/- element_size] to access the neighbours, then dereference
> next link etc. The sizes of the array are generic parameters, so I
> presume the constants get pre-calculated at elaboration time.

If the generic is instantiated at compile-time this is likely to be
true, but I doubt it is the case for run-time instantiations.

Greetings,

Jacob
-- 
"For there are only two reasons why war is made against a
 republic: The one, to become lord over her: the other, the
 fear of being occupied by her."       -- Nicolo Machiavelli



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-26  8:38             ` Keean Schupke
                                 ` (2 preceding siblings ...)
  2012-07-26 19:21               ` stefan-lucks
@ 2012-07-31  3:12               ` Randy Brukardt
  3 siblings, 0 replies; 88+ messages in thread
From: Randy Brukardt @ 2012-07-31  3:12 UTC (permalink / raw)


"Keean Schupke" <keean.schupke@googlemail.com> wrote in message 
news:b18e5dd8-cd07-49df-8011-22e911dc1da0@googlegroups.com...
On Thursday, 26 July 2012 01:15:25 UTC+1, Randy Brukardt  wrote:
...
>> Ada scalar objects can be in a state known as "invalid" (see 13.9.1 in
>> your Ada Reference Manual). For integer types, this corresponds to a
>> value out of range. Ada compilers are always allowed to return invalid
>> results from scalar operations, with a few exceptions. So in general, an
>> Ada compiler does not actually need to make range checks on scalar
>> objects. The only time that a check on a scalar object *has* to be
>> performed is when using the value could lead to erroneous execution
>> (such as in indexing out of range).
>
>This is an equivalent concept to nullable, so two questions:
>
>What is the literal for 'invalid', IE:
>
>X : Index := #INVALID -- we should be able to initialise to invalid.

A subtype doesn't necessarily have any invalid values, so this is impossible 
in general. For instance, you can't write this for type Integer.

Pragma Normalize_Scalars does what you want if that's possible, but it 
applies to all objects in a program. (There's no way to use it on a
case-by-case basis.)

>if X = #INVALID -- I want to be able to test for invalid.

if not X'Valid then

will do the trick.

                                         Randy. 





^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: Efficient Sequential Access to Arrays
  2012-07-27  8:21                   ` Keean Schupke
  2012-07-27  8:50                     ` Dmitry A. Kazakov
@ 2013-02-25 21:18                     ` Eryndlia Mavourneen
  1 sibling, 0 replies; 88+ messages in thread
From: Eryndlia Mavourneen @ 2013-02-25 21:18 UTC (permalink / raw)


On Friday, July 27, 2012 3:21:18 AM UTC-5, Keean Schupke wrote:
> 
> I am wondering if there is a way to use address arithmetic that would be just as safe as array indexing? ...


Perhaps something along the lines of:

with System, System.Storage_Elements;
use  System, System.Storage_Elements;

generic
   type Value_Type is limited private;
   type Index_Type is range <>;
package Keean_Pkg is
   type Array_Type is array (Index_Type) of aliased Value_Type;
   procedure Process_All (Data : Array_Type);
end Keean_Pkg;

package body Keean_Pkg is

   procedure Process_All (Data : Array_Type) is
      Element_Size : constant Storage_Offset :=
        Data'Component_Size / Storage_Unit;  -- bits to storage elements
      Current_A    : Address := Data (Data'First)'Address;
      -- address of 1st element to be processed
   begin
      Processing_Loop : for I in Data'Range loop
         Declare_A : declare
            A : Value_Type;
            for A'Address use Current_A;  -- overlay A on the current element
            pragma Import (Ada, A);       -- suppress default initialization
         begin
            -- Perform processing on the current element via A
            ...

            Current_A := Current_A + Element_Size;
         end Declare_A;
      end loop Processing_Loop;
   end Process_All;

end Keean_Pkg;
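
A hypothetical instantiation (the element and index types are purely
illustrative):

   with Keean_Pkg;

   procedure Use_Keean_Pkg is
      type Cell_Index is range 1 .. 1_000;
      package Cells is new Keean_Pkg (Value_Type => Long_Float,
                                      Index_Type => Cell_Index);
      Data : Cells.Array_Type := (others => 0.0);
   begin
      Cells.Process_All (Data);
   end Use_Keean_Pkg;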



^ permalink raw reply	[flat|nested] 88+ messages in thread

end of thread, other threads:[~2013-02-25 21:18 UTC | newest]

Thread overview: 88+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-15 18:40 Efficient Sequential Access to Arrays Keean Schupke
2012-07-15 19:48 ` Dmitry A. Kazakov
2012-07-15 20:03   ` Keean Schupke
2012-07-15 20:56     ` Keean Schupke
2012-07-16 11:34       ` Jacob Sparre Andersen
2012-07-15 21:05     ` tmoran
2012-07-15 21:08       ` Keean Schupke
2012-07-16  1:19         ` tmoran
2012-07-15 21:44     ` Georg Bauhaus
2012-07-16  7:47     ` Dmitry A. Kazakov
2012-07-16 10:16       ` Keean Schupke
2012-07-16 14:57         ` Dmitry A. Kazakov
2012-07-16 16:39           ` Keean Schupke
2012-07-16 17:02             ` Georg Bauhaus
2012-07-16 18:48             ` Dmitry A. Kazakov
2012-07-16 19:12               ` Keean Schupke
2012-07-17  6:31                 ` Dmitry A. Kazakov
2012-07-17  6:50                   ` Georg Bauhaus
2012-07-17  8:42                     ` Dmitry A. Kazakov
2012-07-17  7:24                   ` Keean Schupke
2012-07-16 19:43             ` John B. Matthews
2012-07-16 20:44               ` Keean Schupke
2012-07-16 22:23             ` tmoran
2012-07-17  6:40               ` Keean Schupke
2012-07-17  9:01                 ` Dmitry A. Kazakov
2012-07-18  8:10                   ` Keean Schupke
2012-07-18 18:11                     ` tmoran
2012-07-19 10:41                       ` Keean Schupke
2012-07-19 11:18                         ` Georg Bauhaus
2012-07-19 19:58                           ` Keean Schupke
2012-07-21  8:24                             ` Keean Schupke
2012-07-21 11:52                               ` Georg Bauhaus
2012-07-21 16:14                                 ` Keean Schupke
2012-07-21 17:09                                   ` Keean Schupke
     [not found]                                   ` <i78m081tccmp078spmsei1l5vnj3k0kbkj@invalid.netcom.com>
2012-07-22 15:50                                     ` Keean Schupke
2012-07-22  5:13                               ` Shark8
2012-07-15 21:35   ` J-P. Rosen
2012-07-15 19:52 ` Shark8
2012-07-15 20:16   ` Keean Schupke
2012-07-15 21:33     ` Georg Bauhaus
2012-07-17 18:16 ` Florian Weimer
2012-07-22 10:01 ` robin.vowels
2012-07-30  6:31   ` Jacob Sparre Andersen
2012-07-30  7:16     ` Keean Schupke
2012-07-30  9:20     ` Georg Bauhaus
2012-07-30 14:04       ` Keean Schupke
2012-07-22 10:09 ` robin.vowels
2012-07-22 16:02   ` Keean Schupke
2012-07-22 16:21 ` Florian Weimer
2012-07-22 16:46   ` Keean Schupke
2012-07-22 18:41     ` Florian Weimer
2012-07-24  8:21       ` Keean Schupke
2012-07-22 17:34   ` Niklas Holsti
2012-07-22 18:21     ` Keean Schupke
2012-07-30 14:18       ` Jacob Sparre Andersen
2012-07-24 16:00     ` robin.vowels
2012-07-25  7:09       ` Niklas Holsti
2012-07-25  9:03         ` Keean Schupke
2012-07-26  0:15           ` Randy Brukardt
2012-07-26  8:38             ` Keean Schupke
2012-07-26 10:10               ` Brian Drummond
2012-07-26 12:32                 ` Keean Schupke
2012-07-26 13:46                   ` Dmitry A. Kazakov
2012-07-26 17:40                     ` Shark8
2012-07-26 18:56                       ` Dmitry A. Kazakov
2012-07-27  5:25                         ` Shark8
2012-07-27  6:28                           ` Dmitry A. Kazakov
2012-07-27  7:01                             ` Vasiliy Molostov
2012-07-27  8:48                               ` Dmitry A. Kazakov
2012-07-27 12:01                                 ` Georg Bauhaus
2012-07-27 16:49                                 ` Vasiliy Molostov
2012-07-27 19:38                                   ` Dmitry A. Kazakov
2012-07-28  5:32                             ` Shark8
2012-07-28  7:37                               ` Dmitry A. Kazakov
2012-07-28 18:32                                 ` Shark8
     [not found]                     ` <6ov218tnkbqu3vpkuo4t77rd7de0a3aesf@invalid.netcom.com>
2012-07-26 18:49                       ` Dmitry A. Kazakov
     [not found]                         ` <86d31898ne39maimbl24knds7rf38qg7vc@invalid.netcom.com>
2012-07-27  6:45                           ` Dmitry A. Kazakov
2012-07-27  8:21                   ` Keean Schupke
2012-07-27  8:50                     ` Dmitry A. Kazakov
2012-07-27  9:17                       ` Keean Schupke
2013-02-25 21:18                     ` Eryndlia Mavourneen
2012-07-26 13:08               ` Georg Bauhaus
2012-07-26 19:37                 ` stefan-lucks
2012-07-26 19:21               ` stefan-lucks
2012-07-31  3:12               ` Randy Brukardt
2012-07-26  7:11           ` Markus Schöpflin
2012-07-26 14:47         ` Robin Vowels
2012-07-26 15:18           ` Martin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox