From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00, REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4 Path: eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail From: "G.B." Newsgroups: comp.lang.ada Subject: Re: Community Input for the Maintenance and Revision of the Ada Programming Language Date: Sat, 9 Sep 2017 16:48:07 +0200 Organization: A noiseless patient Spider Message-ID: References: <4dc188de-802b-41ad-9cdd-b8246eb9a1c7@googlegroups.com> <47cc6474-8b75-4644-92d0-bd1f694c20e7@googlegroups.com> <338b355a-dee4-4c73-b00e-09d9a8430fb1@googlegroups.com> Reply-To: nonlegitur@notmyhomepage.de Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Injection-Date: Sat, 9 Sep 2017 14:48:08 -0000 (UTC) Injection-Info: reader.eternal-september.org; posting-host="862771ff8e606bcd2901649a23c0bad4"; logging-data="1361"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+nhsbhDD0wTwCH4IvpJsRtS8ktuI3ogdU=" User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 In-Reply-To: Cancel-Lock: sha1:sRNQfJS9IwWGmnVSeatg5f75e5o= Xref: news.eternal-september.org comp.lang.ada:47995 Date: 2017-09-09T16:48:07+02:00 List-Id: On 06.09.17 17:02, Johan Söderlind Åström wrote: > I do not know how Ada accesses a array element internally. But a zero indexed array must be cheaper than a custom indexed array, see: > > Accessing nth element from zero indexed array: > (base_address + index*element_size). > > Accessing nth element from a custom indexed array: > (base_address + (index - first)*element_size) > > I do not think that accessing a element from a array is a complicated thing and starting from a zero indexed array is even simpler, then why would a compiler not guarantee such a simple thing to work? It seems to work well the way it is? Is it a certainty that your assumptions describe how our processors are working? So, I checked, starting from a naive Average function, and another Average_0 function. Both perform the same algorithm with the exception of the parameter types: these are, respectively, an array type with Index range <> and a zero-based fixed size array type. (Index starts at 0.) This is the loop body of both functions: for V : Likeness of Difference loop Result := Result + V / Likeness'Base (Difference'Length); end loop; where Likeness is like Uniformly_Distributed from Ada.Numerics.Float_Random, and Difference is an array. The difference in the resulting loop bodies, on Intel, for Average and Average_0 is this, when passing options -O2 -gnatp -fno-inline to GNAT: L17: addq $1, %rax movss (%rcx,%rax,4), %xmm1 cmpq %rdx, %rax divss %xmm2, %xmm1 addss %xmm1, %xmm0 jne L17 L45: movss (%rdi), %xmm1 addq $4, %rdi cmpq %rax, %rdi divss %xmm2, %xmm1 addss %xmm1, %xmm0 jne L45 When I have the functions called 100_000 times, the calls result in execution time of some 50 seconds for either function. There are 6 instructions for either 0-based or free bounds. Maybe the loop isn't sufficiently dominated by addressing from one or two registers, given DIVSS? But first, using a modular type as the index type in place of a range type starting at 0, I get L24: movl %eax, %edx addq $1, %rax subq %rsi, %rdx cmpq %rax, %rcx movss (%rdi,%rdx,4), %xmm1 divss %xmm2, %xmm1 addss %xmm1, %xmm0 jne L24 L59: movss (%rdi,%rax,4), %xmm1 addq $1, %rax cmpq $100000, %rax divss %xmm2, %xmm1 addss %xmm1, %xmm0 jne L59 Still, no difference. So, using Sum now, here is the loop body of Sum and Sum_0: for V : Likeness of Difference loop Result := Result + Float (V); end loop; L63: movl %eax, %edx addq $1, %rax subq %rsi, %rdx cmpq %rax, %rcx addss (%rdi,%rdx,4), %xmm0 jne L63 L67: addss (%rdi,%rax,4), %xmm0 addq $1, %rax cmpq $100000, %rax jne L67 ret This result is independent of the choice of index type. I had expected the first block to be slower than the second due to SUB, but was wrong, seemingly. Durations are around 10 seconds now. So, if I haven't made some other stupid mistake, 0-based arrays appear to be a non-issue, at least on Intel CPUs.