From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00,
	REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: "G.B." <bauhaus@notmyhomepage.invalid>
Newsgroups: comp.lang.ada
Subject: Re: Community Input for the Maintenance and Revision of the Ada
 Programming Language
Date: Sat, 9 Sep 2017 16:48:07 +0200
Organization: A noiseless patient Spider
Message-ID: <op0uv8$1ah$1@dont-email.me>
References: <oludae$seh$1@franka.jacob-sparre.dk>
 <4dc188de-802b-41ad-9cdd-b8246eb9a1c7@googlegroups.com>
 <47cc6474-8b75-4644-92d0-bd1f694c20e7@googlegroups.com>
 <338b355a-dee4-4c73-b00e-09d9a8430fb1@googlegroups.com>
 <ooflve$ukd$1@franka.jacob-sparre.dk>
 <e9e2556d-9b23-4e7f-9058-776dd506f73c@googlegroups.com>
Reply-To: nonlegitur@notmyhomepage.de
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 9 Sep 2017 14:48:08 -0000 (UTC)
Injection-Info: reader.eternal-september.org;
 posting-host="862771ff8e606bcd2901649a23c0bad4";
	logging-data="1361"; mail-complaints-to="abuse@eternal-september.org";
	posting-account="U2FsdGVkX1+nhsbhDD0wTwCH4IvpJsRtS8ktuI3ogdU="
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:45.0)
 Gecko/20100101 Thunderbird/45.8.0
In-Reply-To: <e9e2556d-9b23-4e7f-9058-776dd506f73c@googlegroups.com>
Cancel-Lock: sha1:sRNQfJS9IwWGmnVSeatg5f75e5o=
Xref: news.eternal-september.org comp.lang.ada:47995
Date: 2017-09-09T16:48:07+02:00
List-Id: <comp.lang.ada>

On 06.09.17 17:02, Johan Söderlind Åström wrote:
> I do not know how Ada accesses a array element internally. But a zero indexed array must be cheaper than a custom indexed array, see:
>
> Accessing nth element from zero indexed array:
> (base_address + index*element_size).
>
> Accessing nth element from a custom indexed array:
> (base_address + (index - first)*element_size)
>
> I do not think that accessing a element from a array is a complicated thing and starting from a zero indexed array is even simpler, then why would a compiler not guarantee such a simple thing to work?

It seems to work well the way it is?
Is it a certainty that your assumptions describe
how our processors are working?

So, I checked, starting from a naive Average function, and
another Average_0 function. Both perform the same algorithm
with the exception of the parameter types: these are,
respectively, an array type with Index range <> and
a zero-based fixed size array type. (Index starts at 0.)

This is the loop body of both functions:

       for V : Likeness of Difference loop
          Result := Result + V / Likeness'Base (Difference'Length);
       end loop;

where Likeness is like Uniformly_Distributed from
Ada.Numerics.Float_Random, and Difference is an array.

The difference in the resulting loop bodies, on Intel,
for Average and Average_0 is this, when passing options
-O2 -gnatp -fno-inline to GNAT:

L17:
	addq	$1, %rax
	movss	(%rcx,%rax,4), %xmm1
	cmpq	%rdx, %rax
	divss	%xmm2, %xmm1
	addss	%xmm1, %xmm0
	jne	L17

L45:
	movss	(%rdi), %xmm1
	addq	$4, %rdi
	cmpq	%rax, %rdi
	divss	%xmm2, %xmm1
	addss	%xmm1, %xmm0
	jne	L45

When I have the functions called 100_000 times, the calls
result in execution time of some 50 seconds for either function.

There are 6 instructions for either 0-based or free bounds.

Maybe the loop isn't sufficiently dominated by addressing
from one or two registers, given DIVSS?

But first, using a modular type as the index type in place
of a range type starting at 0, I get

L24:
	movl	%eax, %edx
	addq	$1, %rax
	subq	%rsi, %rdx
	cmpq	%rax, %rcx
	movss	(%rdi,%rdx,4), %xmm1
	divss	%xmm2, %xmm1
	addss	%xmm1, %xmm0
	jne	L24

L59:
	movss	(%rdi,%rax,4), %xmm1
	addq	$1, %rax
	cmpq	$100000, %rax
	divss	%xmm2, %xmm1
	addss	%xmm1, %xmm0
	jne	L59

Still, no difference. So, using Sum now, here is the loop body
of Sum and Sum_0:

       for V : Likeness of Difference loop
          Result := Result + Float (V);
       end loop;


L63:
	movl	%eax, %edx
	addq	$1, %rax
	subq	%rsi, %rdx
	cmpq	%rax, %rcx
	addss	(%rdi,%rdx,4), %xmm0
	jne	L63

L67:
	addss	(%rdi,%rax,4), %xmm0
	addq	$1, %rax
	cmpq	$100000, %rax
	jne	L67
	ret

This result is independent of the choice of index type.
I had expected the first block to be slower than
the second due to SUB, but was wrong, seemingly.
Durations are around 10 seconds now.

So, if I haven't made some other stupid mistake, 0-based
arrays appear to be a non-issue, at least on Intel CPUs.