* Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? @ 2010-02-15 10:58 Colin Paul Gloster 2010-02-15 13:02 ` John B. Matthews ` (4 more replies) 0 siblings, 5 replies; 21+ messages in thread From: Colin Paul Gloster @ 2010-02-15 10:58 UTC (permalink / raw) Hello, I have been improving a program by suppressing C++ in it. After speeding it up a lot by making changes, I have found one considerable part which calls a library routine, which is unfortunately very slow as provided as standard with a number of Ada compilers but which is very fast with implementations of other languages. For some Ada compilers perhaps I shall simply need to not rely on an implementation of this routine by an Ada vendor. Perhaps this post will motivate Ada vendors to speed up their implementations or for people to report timings for efficient implementations which are used by default by other Ada compilers. At present the routine (which is still being called by a C++ part of the code) takes up maybe approximately one per cent of the running time. (I have profiled it but I do not have the timing profiles available where I am typing this.) This might not sound as if it is much and it is certainly much less than much of the C++ overhead which I have removed, but it is trivial to speed it up and speeding up this one per cent or so would reduce running time by hours. To contrast different implementations, I have produced the C++ program #include<iostream> #include<cmath> int main() { double answer = 0.0; for(int I=1; I<=1000000; ++I) { #include"body_in_CPlusPlus.cc" } std::cout << answer; } where body_in_CPlusPlus.cc contained answer += std::log10(0.10000000000000000000 ); /*Lines produced by for((i=1; i<=500; ++i)) do echo -n 'answer += std::log10(0' ; echo "$i/10" | bc -l ; echo ');'; done */ answer += std::log10(050.00000000000000000000 ); and the Ada program with Ada.Numerics.Generic_Elementary_Functions; with Interfaces.C; with Ada.Text_IO; procedure Logarithmic_Work_In_Ada is answer : Interfaces.C.double := 0.0; package double_library is new Ada.Numerics.Generic_Elementary_Functions(Interfaces.C.double); package double_output_library is new Ada.Text_IO.Float_IO(Interfaces.C.double); begin for I in 1 .. 1_000_000 loop --I would not have this loop but one compiler crashed if --the source file was sized a few megabytes. answer := Interfaces.C."+"(answer, double_library.log(0.10000000000000000000 , 10.0 )); --Lines produced by --for((i=1; i<=500; ++i)) do echo -n 'answer := Interfaces.C."+"(answer, double_library.log(0' ; echo "$i/10" | bc -l ; echo ', 10.0 ));'; done answer := Interfaces.C."+"(answer, double_library.log(050.00000000000000000000 , 10.0 )); end loop; double_output_library.Put(answer); end; The aforementioned code which I have been speeding up did not consist of merely 500 million calls to std::log10() as consecutive calls, but instead it consisted of hundreds of times as many calls to std::log10() interspersed across maybe 200 other functions. Of the two programs shown, the fastest C++ implementation on one test platform took less than one millisecond and the fastest Ada implementation took one minute and 31 seconds and 874 milliseconds on the same platform. Both g++ and gnatmake were from the same installation of GCC 4.1.2 20080704 (Red Hat 4.1.2-44). g++ -O3 -ffast-math logarithmic_work_in_CPlusPlus.cc -o logarithmic_work_in_CPlusPlus_with_-ffast-math time ./logarithmic_work_in_CPlusPlus_with_-ffast-math 6.34086e+08 real 0m0.005s user 0m0.000s sys 0m0.000s g++ -O3 logarithmic_work_in_CPlusPlus.cc -o logarithmic_work_in_CPlusPlus time ./logarithmic_work_in_CPlusPlus 6.34086e+08 real 0m46.443s user 0m46.435s sys 0m0.000s gnatmake -O3 -ffast-math logarithmic_work_in_Ada.adb -o logarithmic_work_in_Ada_compiled_by_GNAT_with_-ffast-math (which did gcc -c -O3 -ffast-math logarithmic_work_in_Ada.adb logarithmic_work_in_Ada.adb:4:11: warning: file name does not match unit name, should be "logarithmic_work_in_ada.adb" gnatbind -x logarithmic_work_in_Ada.ali gnatlink logarithmic_work_in_Ada.ali -ffast-math -o logarithmic_work_in_Ada_compiled_by_GNAT_with_-ffast-mat whereas trying -cargs resulted in no compilation) time ./logarithmic_work_in_Ada_compiled_by_GNAT_with_-ffast-math 6.34086408536266E+08 real 1m31.879s user 1m31.874s sys 0m0.000s gnatmake -O3 logarithmic_work_in_Ada.adb -o logarithmic_work_in_Ada_compiled_by_GNAT time ./logarithmic_work_in_Ada_compiled_by_GNAT 6.34086408536266E+08 real 1m33.338s user 1m33.338s sys 0m0.000s proprietary_compiler -O -m logarithmic_work_in_Ada.adb -o logarithmic_work_in_Ada_compiled_by_a_proprietary_compiler time ./logarithmic_work_in_Ada_compiled_by_a_proprietary_compiler 6.34086408605593E+08 real 1m41.966s user 1m41.954s sys 0m0.000s ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-15 10:58 Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? Colin Paul Gloster @ 2010-02-15 13:02 ` John B. Matthews 2010-02-15 14:17 ` Colin Paul Gloster 2010-02-15 14:54 ` jonathan ` (3 subsequent siblings) 4 siblings, 1 reply; 21+ messages in thread From: John B. Matthews @ 2010-02-15 13:02 UTC (permalink / raw) In article <alpine.LNX.2.00.1002151055530.17315@Bluewhite64.example.net>, Colin Paul Gloster <Colin_Paul_Gloster@ACM.org> wrote: > gnatmake -O3 logarithmic_work_in_Ada.adb -o > logarithmic_work_in_Ada_compiled_by_GNAT > > time ./logarithmic_work_in_Ada_compiled_by_GNAT > 6.34086408536266E+08 > > real 1m33.338s > user 1m33.338s > sys 0m0.000s I get a different answer: 698970 = 1000000 * (log10(50) - 1). $ make clean logada ; time ./logada rm -f *.o *.ali b~* core logada gnatmake logada -cargs -O3 -gnatwa -bargs -shared -largs -dead_strip gcc -c -O3 -gnatwa logada.adb gnatbind -shared -x logada.ali gnatlink logada.ali -shared-libgcc -dead_strip 6.98970004334243E+05 real 0m0.138s user 0m0.136s sys 0m0.002s -- John B. Matthews trashgod at gmail dot com <http://sites.google.com/site/drjohnbmatthews> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-15 13:02 ` John B. Matthews @ 2010-02-15 14:17 ` Colin Paul Gloster 2010-02-15 17:19 ` John B. Matthews 0 siblings, 1 reply; 21+ messages in thread From: Colin Paul Gloster @ 2010-02-15 14:17 UTC (permalink / raw) On Mon, 15 Feb 2010, John B. Matthews wrote: |--------------------------------------------------------------------| |"In article | |<alpine.LNX.2.00.1002151055530.17315@Bluewhite64.example.net>, | | Colin Paul Gloster <Colin_Paul_Gloster@ACM.org> wrote: | | | |> gnatmake -O3 logarithmic_work_in_Ada.adb -o | |> logarithmic_work_in_Ada_compiled_by_GNAT | |> | |> time ./logarithmic_work_in_Ada_compiled_by_GNAT | |> 6.34086408536266E+08 | |> | |> real 1m33.338s | |> user 1m33.338s | |> sys 0m0.000s | | | |I get a different answer: 698970 = 1000000 * (log10(50) - 1). | | | |$ make clean logada ; time ./logada | |rm -f *.o *.ali b~* core logada | |gnatmake logada -cargs -O3 -gnatwa -bargs -shared -largs -dead_strip| |gcc -c -O3 -gnatwa logada.adb | |gnatbind -shared -x logada.ali | |gnatlink logada.ali -shared-libgcc -dead_strip | | 6.98970004334243E+05 | | | |real 0m0.138s | |user 0m0.136s | |sys 0m0.002s | | | |-- | |John B. Matthews | |trashgod at gmail dot com | |<http://sites.google.com/site/drjohnbmatthews> " | |--------------------------------------------------------------------| Hi, Thanks for running it. The Ada program for timing had 500 statements in the body of the loop. I reproduced only the first and last verbatim: I showed a bash (Bourne Again SHell) line for producing all 500 statements in a comment in-between the the first statement and the last statement. You ran a program with 498 of the statements missing. Anyway, the answer produced by the program is not so much of concern as the relative speeds of different implementations. Did g++ produce a faster result for you than GNAT? It did for me for many versions of GCC today on a different platform than I used in the beginning of this thread... g++ -O3 -ffast-math logarithmic_work_in_CPlusPlus.cc -o logarithmic_work_in_CPlusPlus_compiled_by_64bit_GCC4.4.3_with_-ffast-math time ./logarithmic_work_in_CPlusPlus_compiled_by_64bit_GCC4.4.3_with_-ffast-math 6.34086e+08 real 0m0.012s user 0m0.008s sys 0m0.004s g++ -O3 -ffast-math logarithmic_work_in_CPlusPlus.cc -o logarithmic_work_in_CPlusPlus_compiled_by_64bit_GCC4.3.2_with_-ffast-math time ./logarithmic_work_in_CPlusPlus_compiled_by_64bit_GCC4.3.2_with_-ffast-math 6.34086e+08 real 0m0.012s user 0m0.012s sys 0m0.000s g++ -O3 logarithmic_work_in_CPlusPlus.cc -o logarithmic_work_in_CPlusPlus_compiled_by_64bit_GCC4.3.2 time ./logarithmic_work_in_CPlusPlus_compiled_by_64bit_GCC4.3.2 6.34086e+08 real 0m0.567s user 0m0.560s sys 0m0.000s g++ -O3 logarithmic_work_in_CPlusPlus.cc -o logarithmic_work_in_CPlusPlus_compiled_by_64bit_GCC4.4.3 time ./logarithmic_work_in_CPlusPlus_compiled_by_64bit_GCC4.4.3 6.34086e+08 real 0m0.566s user 0m0.564s sys 0m0.004s g++ -O3 logarithmic_work_in_CPlusPlus.cc -o logarithmic_work_in_CPlusPlus_compiled_by_64bit_GCC4.3.3 time ./logarithmic_work_in_CPlusPlus_compiled_by_64bit_GCC4.3.3 6.34086e+08 real 0m0.583s user 0m0.572s sys 0m0.004s gnatmake -O3 -ffast-math Logarithmic_Work_In_Ada.adb -o Logarithmic_Work_In_Ada_compiled_by_64bit_GCC4.4.3_with_-ffast-math time ./Logarithmic_Work_In_Ada_compiled_by_64bit_GCC4.4.3_with_-ffast-math 6.34086408536266E+08 real 0m31.618s user 0m31.618s sys 0m0.000s gnatmake -O3 Logarithmic_Work_In_Ada.adb -o Logarithmic_Work_In_Ada_compiled_by_64bit_GCC4.4.3 time ./Logarithmic_Work_In_Ada_compiled_by_64bit_GCC4.4.3 6.34086408606382E+08 real 0m32.750s user 0m32.746s sys 0m0.004s gnatmake -O3 Logarithmic_Work_In_Ada.adb -o Logarithmic_Work_In_Ada_compiled_by_64bit_GCC4.3.2 time ./Logarithmic_Work_In_Ada_compiled_by_64bit_GCC4.3.2 6.34086408606382E+08 real 0m33.537s user 0m33.506s sys 0m0.004s gnatmake -O3 Logarithmic_Work_In_Ada.adb -o Logarithmic_Work_In_Ada_compiled_by_64bit_GCC4.3.3 time ./Logarithmic_Work_In_Ada_compiled_by_64bit_GCC4.3.3 6.34086408606382E+08 real 0m33.717s user 0m33.718s sys 0m0.000s g++ -O3 logarithmic_work_in_CPlusPlus.cc -o logarithmic_work_in_CPlusPlus_compiled_by_64bit_GCC4.2.4 time ./logarithmic_work_in_CPlusPlus_compiled_by_64bit_GCC4.2.4 6.34086e+08 real 0m34.000s user 0m33.982s sys 0m0.000s gnatmake -O3 Logarithmic_Work_In_Ada.adb -o Logarithmic_Work_In_Ada_compiled_by_64bit_GCC4.2.4 time ./Logarithmic_Work_In_Ada_compiled_by_64bit_GCC4.2.4 6.34086408606382E+08 real 0m34.037s user 0m34.034s sys 0m0.004s gnatmake -O3 -ffast-math Logarithmic_Work_In_Ada.adb -o Logarithmic_Work_In_Ada_compiled_by_64bit_GCC4.3.2_with_-ffast-math time ./Logarithmic_Work_In_Ada_compiled_by_64bit_GCC4.3.2_with_-ffast-math 6.34086408536266E+08 real 0m35.093s user 0m35.090s sys 0m0.004s ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-15 14:17 ` Colin Paul Gloster @ 2010-02-15 17:19 ` John B. Matthews 0 siblings, 0 replies; 21+ messages in thread From: John B. Matthews @ 2010-02-15 17:19 UTC (permalink / raw) In article <alpine.LNX.2.00.1002151339400.17315@Bluewhite64.example.net>, Colin Paul Gloster <Colin_Paul_Gloster@ACM.org> wrote: > Thanks for running it. The Ada program for timing had 500 statements > in the body of the loop. I reproduced only the first and last > verbatim: I showed a bash (Bourne Again SHell) line for producing > all 500 statements in a comment in-between the the first statement > and the last statement. You ran a program with 498 of the statements > missing. Ah, source level compression! > Anyway, the answer produced by the program is not so much of concern > as the relative speeds of different implementations. Did g++ > produce a faster result for you than GNAT? It did for me for many > versions of GCC today on a different platform than I used in the > beginning of this thread... Retesting both 500+ line programs produces results similar to yours: $ make test Darwin: gcc 4.3.4 rm -f *.o *.ali b~* core rm -f *.s temp.txt logada logcpp gnatmake logada -cargs -O3 -gnatnp -bargs -shared -largs -ffast-math -dead_strip gcc -c -O3 -gnatnp logada.adb gnatbind -shared -x logada.ali gnatlink logada.ali -shared-libgcc -ffast-math -dead_strip g++ -O3 logcpp.cc -o logcpp time ./logcpp 6.35785e+08 0.57 real 0.56 user 0.00 sys time ./logada 6.35785378608776E+08 28.15 real 28.10 user 0.02 sys -- John B. Matthews trashgod at gmail dot com <http://sites.google.com/site/drjohnbmatthews> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-15 10:58 Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? Colin Paul Gloster 2010-02-15 13:02 ` John B. Matthews @ 2010-02-15 14:54 ` jonathan 2010-02-15 15:04 ` jonathan 2010-02-15 18:26 ` (see below) ` (2 subsequent siblings) 4 siblings, 1 reply; 21+ messages in thread From: jonathan @ 2010-02-15 14:54 UTC (permalink / raw) On Feb 15, 10:58 am, Colin Paul Gloster <Colin_Paul_Glos...@ACM.org> wrote: > Hello, > > I have been improving a program by suppressing C++ in it. After > speeding it up a lot by making changes, I have found one considerable > part which calls a library routine, which is unfortunately very slow > as provided as standard with a number of Ada compilers but which is > very fast with implementations of other languages.... Here is my own bench ... easier to play with: with ada.numerics.generic_elementary_functions; with text_io; use text_io; procedure log_bench is type Real is digits 15; package Math is new ada.numerics.generic_elementary_functions (Real); use Math; x, y : Real := 0.1; -- might (or might not!) be faster to use: -- log_base_10 (x) = log_base_10_of_e * log_base_e (x) Log_base_10_of_e : constant := 0.434_294_481_903_251_827_651_129; begin for i in 1 .. 1_000_000 loop x := x + Log_base_10_of_e * Log (y); y := y + 0.0000000000001; end loop; put (Real'Image(x)); end log_bench; gnatmake gnatnp -O2 -march=native log_bench.adb ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-15 14:54 ` jonathan @ 2010-02-15 15:04 ` jonathan 2010-02-15 19:50 ` sjw 0 siblings, 1 reply; 21+ messages in thread From: jonathan @ 2010-02-15 15:04 UTC (permalink / raw) On Feb 15, 2:54 pm, jonathan <johns...@googlemail.com> wrote: > On Feb 15, 10:58 am, Colin Paul Gloster <Colin_Paul_Glos...@ACM.org> > wrote: > > > Hello, > > > I have been improving a program by suppressing C++ in it. After > > speeding it up a lot by making changes, I have found one considerable > > part which calls a library routine, which is unfortunately very slow > > as provided as standard with a number of Ada compilers but which is > > very fast with implementations of other languages.... > > Here is my own bench ... easier to play with: > > with ada.numerics.generic_elementary_functions; > with text_io; use text_io; > > procedure log_bench is > > type Real is digits 15; > package Math is new ada.numerics.generic_elementary_functions > (Real); > use Math; > x, y : Real := 0.1; > > -- might (or might not!) be faster to use: > -- log_base_10 (x) = log_base_10_of_e * log_base_e (x) > > Log_base_10_of_e : constant := 0.434_294_481_903_251_827_651_129; > > begin > > for i in 1 .. 1_000_000 loop > x := x + Log_base_10_of_e * Log (y); > y := y + 0.0000000000001; > end loop; > put (Real'Image(x)); > > > > gnatmake gnatnp -O2 -march=native log_bench.adb Opps, accidently hit the send button! Continuing discussion: gnatmake gnatnp -O2 -march=native log_bench.adb -9.99999682845800E+05 real 0m0.024s user 0m0.024s sys 0m0.000s NOW, comment out the y := y + ... so that the inner loop is constant, and use the same compilation switches: gnatmake gnatnp -O2 -march=native log_bench.adb -9.99999900000000E+05 real 0m0.002s user 0m0.000s sys 0m0.000s You see now what's happening. With the gnatn switch the compiler is smart enough to call the Log just once, rather than 10**6 times. If you remove the -gnatn or -gnatN switches, then it runs in 0m0.024s again. Jonathan ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-15 15:04 ` jonathan @ 2010-02-15 19:50 ` sjw 2010-02-16 16:50 ` Colin Paul Gloster 0 siblings, 1 reply; 21+ messages in thread From: sjw @ 2010-02-15 19:50 UTC (permalink / raw) On Feb 15, 3:04 pm, jonathan <johns...@googlemail.com> wrote: > On Feb 15, 2:54 pm, jonathan <johns...@googlemail.com> wrote: > > On Feb 15, 10:58 am, Colin Paul Gloster <Colin_Paul_Glos...@ACM.org> > > wrote: > > > > Hello, > > > > I have been improving a program by suppressing C++ in it. After > > > speeding it up a lot by making changes, I have found one considerable > > > part which calls a library routine, which is unfortunately very slow > > > as provided as standard with a number of Ada compilers but which is > > > very fast with implementations of other languages.... > > > Here is my own bench ... easier to play with: > > > with ada.numerics.generic_elementary_functions; > > with text_io; use text_io; > > > procedure log_bench is > > > type Real is digits 15; > > package Math is new ada.numerics.generic_elementary_functions > > (Real); > > use Math; > > x, y : Real := 0.1; > > > -- might (or might not!) be faster to use: > > -- log_base_10 (x) = log_base_10_of_e * log_base_e (x) > > > Log_base_10_of_e : constant := 0.434_294_481_903_251_827_651_129; > > > begin > > > for i in 1 .. 1_000_000 loop > > x := x + Log_base_10_of_e * Log (y); > > y := y + 0.0000000000001; > > end loop; > > put (Real'Image(x)); > > > gnatmake gnatnp -O2 -march=native log_bench.adb > > Opps, accidently hit the send button! Continuing discussion: > > gnatmake gnatnp -O2 -march=native log_bench.adb > > -9.99999682845800E+05 > real 0m0.024s > user 0m0.024s > sys 0m0.000s > > NOW, comment out the y := y + ... so that the inner loop > is constant, and use the same compilation switches: > > gnatmake gnatnp -O2 -march=native log_bench.adb > > -9.99999900000000E+05 > real 0m0.002s > user 0m0.000s > sys 0m0.000s > > You see now what's happening. With the gnatn switch the > compiler is smart enough to call the Log just once, rather > than 10**6 times. > > If you remove the -gnatn or -gnatN switches, then it runs in > 0m0.024s again. The trouble is that that benchmark does something other than Colin's! This might be a more accurate translation: with Ada.Numerics.Generic_Elementary_Functions; with Text_IO; use Text_IO; procedure Log_Bench_0 is type Real is digits 15; package Math is new Ada.Numerics.Generic_Elementary_Functions (Real); use Math; Answer : Real := 0.0; Log_Base_10_Of_E : constant := 0.434_294_481_903_251_827_651_129; begin for I in 1 .. 1_000_000 loop declare X : Real := 0.1; begin for J in 1 .. 500 loop Answer := Answer + Log_Base_10_Of_E * Log (X); X := X + 0.1; end loop; end; end loop; Put (Real'Image(Answer)); end Log_Bench_0; I've tried inlining GNAT's implementation (GCC 4.5.0, x86_64-aqpple- darwin10.2.0) and even just calling up the C log10 routine using an inline. None was very impressive compared to the g++ result. Colin's time: 37s Jonathan's time (-O3 -ffast-math -gnatp): 16s Jonathan;s time (-O3 -ffast-math -gnatp -gnatN -funroll-loops): 14s Jonathan's time (same opts, but using C log10()): 11s so we still have 3 orders of magnitude to go to get to the g++ result: 0.02s This is my final version, with the inlined GNAT implementation too: with Ada.Numerics.Generic_Elementary_Functions; with System.Machine_Code; use System.Machine_Code; with Text_IO; use Text_IO; procedure Log_Bench is type Real is digits 15; package Math is new Ada.Numerics.Generic_Elementary_Functions (Real); use Math; Answer : Real := 0.0; Log_Base_10_Of_E : constant := 0.434_294_481_903_251_827_651_129; function LogM (X : Real) return Real; pragma Inline_Always (LogM); function LogM (X : Real) return Real is Result : Real; NL : constant String := ASCII.LF & ASCII.HT; begin Asm (Template => "fldln2 " & NL & "fxch " & NL & "fyl2x " & NL, Outputs => Real'Asm_Output ("=t", Result), Inputs => Real'Asm_Input ("0", X)); return Result; end LogM; function LogL (X : Real) return Real; pragma Import (C, LogL, "log10"); begin for I in 1 .. 1_000_000 loop declare X : Real := 0.1; begin for J in 1 .. 500 loop -- Answer := Answer + Log_Base_10_Of_E * LogM (X); Answer := Answer + LogL (X); X := X + 0.1; end loop; end; end loop; Put (Real'Image(Answer)); end Log_Bench; This is a fragment of the assembler output from the g++ code: movsd LC0(%rip), %xmm0 call _log10 movsd %xmm0, -3976(%rbp) movsd LC1(%rip), %xmm0 call _log10 movsd %xmm0, -3968(%rbp) movsd LC2(%rip), %xmm0 call _log10 and this is a fragment of the assembler from the last Ada: movapd %xmm1, %xmm0 movsd %xmm1, -144(%rbp) addl $10, %ebx call _log10 movsd -144(%rbp), %xmm9 addsd -120(%rbp), %xmm0 addsd LC1(%rip), %xmm9 movsd %xmm0, -120(%rbp) movapd %xmm9, %xmm0 movsd %xmm9, -144(%rbp) call _log10 movsd -144(%rbp), %xmm8 addsd -120(%rbp), %xmm0 addsd LC1(%rip), %xmm8 movsd %xmm0, -120(%rbp) movapd %xmm8, %xmm0 movsd %xmm8, -144(%rbp) call _log10 at which point I have to leave it to the experts; why do the 7 Ada instructions take so much time compared to the 2 g++ instructions??? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-15 19:50 ` sjw @ 2010-02-16 16:50 ` Colin Paul Gloster 0 siblings, 0 replies; 21+ messages in thread From: Colin Paul Gloster @ 2010-02-16 16:50 UTC (permalink / raw) [-- Attachment #1: Type: TEXT/PLAIN, Size: 12325 bytes --] On Mon, 15 Feb 2010, S. J. W. posted: |---------------------------------------------------------------------------| |"[..] | |> | |> You see now what's happening. With the gnatn switch the | |> compiler is smart enough to call the Log just once, rather | |> than 10**6 times. | |> | |> If you remove the -gnatn or -gnatN switches, then it runs in | |> 0m0.024s again. | | | |The trouble is that that benchmark does something other than Colin's!" | |---------------------------------------------------------------------------| That is not the problem. The code which I posted at the beginning of this thread was not a means in itself, but was intended for timing performances of implementations of logarithm functions in the base of ten in a manner representative of real code which I use. The real code is not dedicated to calculating something approximately equal to 6.3E+08. I could have written 500 * 1_000_000 calls or 3.14 * 1000 calls or a single call. A single call might have been overwhelmed by overhead unrelated to the logarithm function. In the case of the C++ version when using a particular compilation switch, I failed in the task because the hardcoded arguments I provided resulted in a trivial and dramatic optimization which would not happen in the real code. While it is unfortunate for Ada code in general that Ada compilers fail to mimic this optimization of G++'s, that particular optimization would not benefit the usage of logarithms in the real code I mentioned. Dr. Jonathan Parker is free to pursue this problem in a subthread or with vendors. |---------------------------------------------------------------------------| |"This might be a more accurate translation: | | | |with Ada.Numerics.Generic_Elementary_Functions; | |with Text_IO; use Text_IO; | |procedure Log_Bench_0 is | | type Real is digits 15; | | package Math is new Ada.Numerics.Generic_Elementary_Functions | |(Real); | | use Math; | | Answer : Real := 0.0; | | Log_Base_10_Of_E : constant := 0.434_294_481_903_251_827_651_129; | |begin | | for I in 1 .. 1_000_000 loop | | declare | | X : Real := 0.1; | | begin | | for J in 1 .. 500 loop | | Answer := Answer + Log_Base_10_Of_E * Log (X); | | X := X + 0.1; | | end loop; | | end; | | end loop; | | Put (Real'Image(Answer)); | |end Log_Bench_0; | | | |I've tried inlining GNAT's implementation (GCC 4.5.0, x86_64-aqpple- | |darwin10.2.0) and even just calling up the C log10 routine using an | |inline. None was very impressive compared to the g++ result. | | | |Colin's time: 37s | |Jonathan's time (-O3 -ffast-math -gnatp): 16s | |Jonathan;s time (-O3 -ffast-math -gnatp -gnatN -funroll-loops): 14s | |Jonathan's time (same opts, but using C log10()): 11s" | |---------------------------------------------------------------------------| That ordering does not necessarily hold... GCC4.2.4... gnatmake -O3 -ffast-math -gnatp Log_Bench_0.adb -o Log_Bench_0_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp time ./Log_Bench_0_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp 6.34086408606382E+08 real 0m14.328s user 0m14.329s sys 0m0.000s gnatmake -O3 -ffast-math -gnatp -gnatN -funroll-loops Log_Bench_0.adb -o Log_Bench_0_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops time ./Log_Bench_0_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops 6.34086408606382E+08 real 0m14.346s user 0m14.341s sys 0m0.004s GCC4.4.3 (slower than GCC4.2.4 for this program)... gnatmake -O3 -ffast-math Log_Bench_0.adb -o Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math time ./Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math 6.34086408606382E+08 real 0m14.713s user 0m14.689s sys 0m0.000s gnatmake -O3 -ffast-math -gnatp Log_Bench_0.adb -o Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp time ./Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp 6.34086408606382E+08 real 0m14.691s user 0m14.693s sys 0m0.000s gnatmake -O3 -ffast-math -gnatp -gnatN -funroll-loops Log_Bench_0.adb -o Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops time ./Log_Bench_0_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops 6.34086408606382E+08 real 0m14.690s user 0m14.689s sys 0m0.000s |---------------------------------------------------------------------------| |"so we still have 3 orders of magnitude to go to get to the g++ result: | |0.02s | | | |This is my final version, with the inlined GNAT implementation too: | | | |with Ada.Numerics.Generic_Elementary_Functions; | |with System.Machine_Code; use System.Machine_Code; | |with Text_IO; use Text_IO; | |procedure Log_Bench is | | type Real is digits 15; | | package Math is new Ada.Numerics.Generic_Elementary_Functions | |(Real); | | use Math; | | Answer : Real := 0.0; | | Log_Base_10_Of_E : constant := 0.434_294_481_903_251_827_651_129; | | function LogM (X : Real) return Real; | | pragma Inline_Always (LogM); | | function LogM (X : Real) return Real is | | Result : Real; | | NL : constant String := ASCII.LF & ASCII.HT; | | begin | | Asm (Template => | | "fldln2 " & NL | | & "fxch " & NL | | & "fyl2x " & NL, | | Outputs => Real'Asm_Output ("=t", Result), | | Inputs => Real'Asm_Input ("0", X)); | | return Result; | | end LogM; | | function LogL (X : Real) return Real; | | pragma Import (C, LogL, "log10"); | |begin | | for I in 1 .. 1_000_000 loop | | declare | | X : Real := 0.1; | | begin | | for J in 1 .. 500 loop | |-- Answer := Answer + Log_Base_10_Of_E * LogM (X); | | Answer := Answer + LogL (X); | | X := X + 0.1; | | end loop; | | end; | | end loop; | | Put (Real'Image(Answer)); | |end Log_Bench; | | | |[..]" | |---------------------------------------------------------------------------| Not all of those switches would yield fair proxies for timings of logarithms in the real code which inspired this thread, but anyway... 64bit GCC4.2.4... gnatmake -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math -largs /lib/libm.so.6 time ./Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math 6.34086408606382E+08 real 0m34.497s user 0m34.494s sys 0m0.004s gnatmake -gnatp -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp -largs /lib/libm.so.6 time ./Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp 6.34086408606382E+08 real 0m34.503s user 0m34.506s sys 0m0.000s gnatmake -gnatN -funroll-loops -gnatp -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops -largs /lib/libm.so.6 time ./Log_Bench_compiled_by_GCC4.2.4_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops 6.34086408606382E+08 real 0m34.547s user 0m34.546s sys 0m0.004s 64bit GCC4.4.3... gnatmake -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math -largs /lib/libm.so.6 time ./Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math 6.34086408606382E+08 real 0m34.257s user 0m34.258s sys 0m0.000s gnatmake -gnatp -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp -largs /lib/libm.so.6 time ./Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp 6.34086408606382E+08 real 0m34.474s user 0m34.478s sys 0m0.000s gnatmake -gnatN -funroll-loops -gnatp -O3 -ffast-math Log_Bench.adb -o Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops -largs /lib/libm.so.6 time ./Log_Bench_compiled_by_GCC4.4.3_with_-ffast-math_and_-gnatp_and_-gnatN_and_-funroll-loops 6.34086408606382E+08 real 0m34.188s user 0m34.182s sys 0m0.004s ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-15 10:58 Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? Colin Paul Gloster 2010-02-15 13:02 ` John B. Matthews 2010-02-15 14:54 ` jonathan @ 2010-02-15 18:26 ` (see below) 2010-02-15 18:51 ` jonathan ` (2 more replies) 2010-02-15 23:04 ` Jeffrey R. Carter 2010-02-15 23:20 ` Randy Brukardt 4 siblings, 3 replies; 21+ messages in thread From: (see below) @ 2010-02-15 18:26 UTC (permalink / raw) On 15/02/2010 10:58, in article alpine.LNX.2.00.1002151055530.17315@Bluewhite64.example.net, "Colin Paul Gloster" <Colin_Paul_Gloster@ACM.org> wrote: > Of the two programs shown, the fastest C++ implementation on one test > platform took less than one millisecond and the fastest Ada > implementation took one minute and 31 seconds and 874 milliseconds on > the same platform. Both g++ and gnatmake were from the same > installation of GCC 4.1.2 20080704 (Red Hat 4.1.2-44). Is that 1 millisecond for 1e6 calls? This implies 1ns per call in C++. I find it incredible that a log function could be so fast. I think the loop body must be evaluated at compile-time in C++. On my system your Ada code gives: 6.34086408536266E+08 real 0m33.918s user 0m33.864s sys 0m0.025s And your original C++ code gives: 6.34086e+08 real 0m0.110s user 0m0.003s sys 0m0.003s But if I replace the C++ loop body by: for(int j=1; j<=500; ++j) answer += std::log10(j*0.100000000000000000000); It now gives: 6.34086e+08 real 0m18.112s user 0m18.082s sys 0m0.015s This less than twice as fast as the more generalized Ada code. The simpler inner loop: for(int j=1; j<=500; ++j) answer += j; gives: 1.2525e+11 real 0m0.677s user 0m0.614s sys 0m0.003s So the difference cannot be due to loop overhead. -- Bill Findlay <surname><forename> chez blueyonder.co.uk ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-15 18:26 ` (see below) @ 2010-02-15 18:51 ` jonathan 2010-02-15 20:00 ` sjw 2010-02-16 17:33 ` Colin Paul Gloster 2 siblings, 0 replies; 21+ messages in thread From: jonathan @ 2010-02-15 18:51 UTC (permalink / raw) On Feb 15, 6:26 pm, "(see below)" <yaldni...@blueyonder.co.uk> wrote: > On 15/02/2010 10:58, in article > alpine.LNX.2.00.1002151055530.17...@Bluewhite64.example.net, "Colin Paul > > Gloster" <Colin_Paul_Glos...@ACM.org> wrote: > > Of the two programs shown, the fastest C++ implementation on one test > > platform took less than one millisecond and the fastest Ada > > implementation took one minute and 31 seconds and 874 milliseconds on > > the same platform. Both g++ and gnatmake were from the same > > installation of GCC 4.1.2 20080704 (Red Hat 4.1.2-44). > > Is that 1 millisecond for 1e6 calls? This implies 1ns per call in C++. > I find it incredible that a log function could be so fast. > I think the loop body must be evaluated at compile-time in C++. > > On my system your Ada code gives: > > 6.34086408536266E+08 > > real 0m33.918s > user 0m33.864s > sys 0m0.025s > >... if I replace the C++ loop body by: > > for(int j=1; j<=500; ++j) > answer += std::log10(j*0.100000000000000000000); > It now gives: > > 6.34086e+08 > real 0m18.112s > user 0m18.082s > sys 0m0.015s > > Bill Findlay > <surname><forename> chez blueyonder.co.uk On my computer, Log_base_10_of_e * Log (y) is twice as fast as Log (y, Base => 10.0). Jonathan ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-15 18:26 ` (see below) 2010-02-15 18:51 ` jonathan @ 2010-02-15 20:00 ` sjw 2010-02-15 21:17 ` jonathan 2010-02-16 17:33 ` Colin Paul Gloster 2 siblings, 1 reply; 21+ messages in thread From: sjw @ 2010-02-15 20:00 UTC (permalink / raw) On Feb 15, 6:26 pm, "(see below)" <yaldni...@blueyonder.co.uk> wrote: > On 15/02/2010 10:58, in article > alpine.LNX.2.00.1002151055530.17...@Bluewhite64.example.net, "Colin Paul > > Gloster" <Colin_Paul_Glos...@ACM.org> wrote: > > Of the two programs shown, the fastest C++ implementation on one test > > platform took less than one millisecond and the fastest Ada > > implementation took one minute and 31 seconds and 874 milliseconds on > > the same platform. Both g++ and gnatmake were from the same > > installation of GCC 4.1.2 20080704 (Red Hat 4.1.2-44). > > Is that 1 millisecond for 1e6 calls? This implies 1ns per call in C++. > I find it incredible that a log function could be so fast. > I think the loop body must be evaluated at compile-time in C++. I think it's a mixture of your point & Jonathan's; the loop body, with the 500 calls to log10(), is executed once, then the compiler smartly realizes that each iteration produces the same increment. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-15 20:00 ` sjw @ 2010-02-15 21:17 ` jonathan 2010-02-16 0:09 ` jonathan 0 siblings, 1 reply; 21+ messages in thread From: jonathan @ 2010-02-15 21:17 UTC (permalink / raw) On Feb 15, 8:00 pm, sjw <simon.j.wri...@mac.com> wrote: > On Feb 15, 6:26 pm, "(see below)" <yaldni...@blueyonder.co.uk> wrote: > > Is that 1 millisecond for 1e6 calls? This implies 1ns per call in C++. > > I find it incredible that a log function could be so fast. > > I think the loop body must be evaluated at compile-time in C++. > > I think it's a mixture of your point & Jonathan's; the loop body, with > the 500 calls to log10(), is executed once, then the compiler smartly > realizes that each iteration produces the same increment. Right ... there was never any mystery here ... that's why I thought a translation of the original bench wasn't necessary. The remaining puzzle is the smallish difference between the gnat Log and the gcc Log. I will (also) leave this to the experts. When I looked at this in the past, it has always been because gnat did the calculation to 18 significant figures (even if you use type Real is digits 15), gcc to 15. Jonathan ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-15 21:17 ` jonathan @ 2010-02-16 0:09 ` jonathan 0 siblings, 0 replies; 21+ messages in thread From: jonathan @ 2010-02-16 0:09 UTC (permalink / raw) I said: >The remaining puzzle > is the smallish difference between the gnat Log and the gcc Log. I > will (also) leave this to the experts. When I looked at this in > the past it has always been because gnat did the calculation to 18 > significant figures (even if you use type > Real is digits 15), gcc to 15. Well, I can't resist .. its easy to demonstrate (on Intel cpu's). Let's do a benchmark of 10_000_000 base e Log's with 15 digits and then 18 digits. Must compile without -gnatn or -gnatN so that it doesn't optimize away the constant in the inner loop. I just used -O3. (Couldn't find anything faster.) with ada.numerics.generic_elementary_functions; with text_io; use text_io; procedure log_bench_2 is type Real is digits 15; --type Real is digits 18; package Math is new ada.numerics.generic_elementary_functions(Real); use Math; x, y : Real := 0.5; begin for i in 1 .. 10_000_000 loop x := Log (y); end loop; Put (Real'Image (x)); end log_bench_2; Using 15 digits I get: -6.93147180559945E-01 real 0m0.209s user 0m0.208s sys 0m0.000s Using 18 digits I get: -6.93147180559945309E-01 real 0m0.208s user 0m0.208s sys 0m0.000s So 18 digits are calculated just as fast as 15. Jonathan ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-15 18:26 ` (see below) 2010-02-15 18:51 ` jonathan 2010-02-15 20:00 ` sjw @ 2010-02-16 17:33 ` Colin Paul Gloster 2010-02-24 10:07 ` Colin Paul Gloster 2 siblings, 1 reply; 21+ messages in thread From: Colin Paul Gloster @ 2010-02-16 17:33 UTC (permalink / raw) On Mon, 15 Feb 2010, William Findlay posted: |------------------------------------------------------------------------| |"On 15/02/2010 10:58, in article | |alpine.LNX.2.00.1002151055530.17315@Bluewhite64.example.net, "Colin Paul| |Gloster" <Colin_Paul_Gloster@ACM.org> wrote: | | | |> Of the two programs shown, the fastest C++ implementation on one test | |> platform took less than one millisecond and the fastest Ada | |> implementation took one minute and 31 seconds and 874 milliseconds on | |> the same platform. Both g++ and gnatmake were from the same | |> installation of GCC 4.1.2 20080704 (Red Hat 4.1.2-44). | | | |Is that 1 millisecond for 1e6 calls?" | |------------------------------------------------------------------------| No, that was less than one millisecond for 500 * 10**6 C++ calls. |------------------------------------------------------------------------| |" This implies 1ns per call in C++. | |I find it incredible that a log function could be so fast. | |I think the loop body must be evaluated at compile-time in C++." | |------------------------------------------------------------------------| The C++ compiler did manage to eliminate almost everything. |------------------------------------------------------------------------| |"On my system your Ada code gives: | | | |6.34086408536266E+08 | | | |real 0m33.918s | |user 0m33.864s | |sys 0m0.025s | | | |And your original C++ code gives: | | | |6.34086e+08 | |real 0m0.110s | |user 0m0.003s | |sys 0m0.003s | | | |But if I replace the C++ loop body by: | | | | for(int j=1; j<=500; ++j) | | answer += std::log10(j*0.100000000000000000000); | |It now gives: | | | |6.34086e+08 | |real 0m18.112s | |user 0m18.082s | |sys 0m0.015s | | | |This less than twice as fast as the more generalized Ada code. | | | |[..]" | |------------------------------------------------------------------------| Thank you for exposing this flaw in the C++ code. with Ada.Numerics.Generic_Elementary_Functions; with Interfaces.C; with Ada.Text_IO; procedure Logarithmic_Work_In_Ada_with_a_Findlay_loop is answer : Interfaces.C.double := 0.0; package double_library is new Ada.Numerics.Generic_Elementary_Functions(Interfaces.C.double); package double_output_library is new Ada.Text_IO.Float_IO(Interfaces.C.double); begin for I in 1 .. 1_000_000 loop for J in 1 .. 500 loop answer := Interfaces.C."+"( answer, double_library.log( Interfaces.C."*"( Interfaces.C.double(J), 0.100000000000000000000 ) , 10.0 ) ); end loop; end loop; double_output_library.Put(answer); end; gnatmake -O3 -ffast-math Logarithmic_Work_In_Ada_with_a_Findlay_loop.adb -o Logarithmic_Work_In_Ada_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math time ./Logarithmic_Work_In_Ada_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math 6.34086408606382E+08 real 0m31.091s user 0m31.090s sys 0m0.004s time ./Logarithmic_Work_In_Ada_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math 6.34086408606382E+08 real 0m31.094s user 0m31.094s sys 0m0.004s gnatmake -O3 Logarithmic_Work_In_Ada_with_a_Findlay_loop.adb -o Logarithmic_Work_In_Ada_with_a_Findlay_loop_compiled_by_GCC4.4.3 6.34086408606382E+08 real 0m31.388s user 0m31.378s sys 0m0.008s g++ -O3 -ffast-math logarithmic_work_in_CPlusPlus_with_a_Findlay_loop.cc -o logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math time ./logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math 6.34086e+08 real 0m38.388s user 0m38.390s sys 0m0.000s time ./logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math 6.34086e+08 real 0m38.547s user 0m38.546s sys 0m0.000s g++ -O3 logarithmic_work_in_CPlusPlus_with_a_Findlay_loop.cc -o logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3 time ./logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3 6.34086e+08 real 0m38.428s user 0m38.426s sys 0m0.004s with Ada.Numerics.Generic_Elementary_Functions; with Interfaces.C; with Ada.Text_IO; procedure Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism is Log_Base_10_Of_The_Base_Of_The_Natural_Logarithm : constant Interfaces.C.Double := 0.434_294_481_903_251_827_651_129; answer : Interfaces.C.double := 0.0; package double_library is new Ada.Numerics.Generic_Elementary_Functions(Interfaces.C.double); package double_output_library is new Ada.Text_IO.Float_IO(Interfaces.C.double); begin for I in 1 .. 1_000_000 loop for J in 1 .. 500 loop answer := Interfaces.C."+" ( answer, Interfaces.C."*" ( double_library.Log ( Interfaces.C."*" ( Interfaces.C.double(J), 0.100000000000000000000 ) ) , Log_Base_10_Of_The_Base_Of_The_Natural_Logarithm ) ); end loop; end loop; double_output_library.Put(answer); end; gnatmake -O3 -ffast-math Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3_with_-ffast-math.adb -o Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3_with_-ffast-math time ./Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3_with_-ffast-math 6.34086408606382E+08 real 0m14.434s user 0m14.433s sys 0m0.004s -bash bluewhite64 /home/Colin_Paul/logarithms $ -bash bluewhite64 /home/Colin_Paul/logarithms $ gnatmake -O3 Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism.adb -o Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3 time ./Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3 6.34086408606382E+08 real 0m14.450s user 0m14.453s sys 0m0.000s ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-16 17:33 ` Colin Paul Gloster @ 2010-02-24 10:07 ` Colin Paul Gloster 0 siblings, 0 replies; 21+ messages in thread From: Colin Paul Gloster @ 2010-02-24 10:07 UTC (permalink / raw) On Tue, 16 Feb 2010, Colin Paul Gloster alleged: |------------------------------------------------------------------------------------------------------------------------| |"[..] | | | |with Ada.Numerics.Generic_Elementary_Functions; | |with Interfaces.C; | |with Ada.Text_IO; | |procedure Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism is | | Log_Base_10_Of_The_Base_Of_The_Natural_Logarithm : constant | |Interfaces.C.Double := 0.434_294_481_903_251_827_651_129; | | | | answer : Interfaces.C.double := 0.0; | | package double_library is new | |Ada.Numerics.Generic_Elementary_Functions(Interfaces.C.double); | | package double_output_library is new | |Ada.Text_IO.Float_IO(Interfaces.C.double); | |begin | | | | for I in 1 .. 1_000_000 loop | | for J in 1 .. 500 loop | | answer := Interfaces.C."+" | | ( | | answer, Interfaces.C."*" | | ( | | double_library.Log | | ( | | Interfaces.C."*" | | ( | | Interfaces.C.double(J),| | 0.100000000000000000000| | ) | | ) | | , | | Log_Base_10_Of_The_Base_Of_The_Natural_Logarithm | | ) | | ); | | end loop; | | end loop; | | | | double_output_library.Put(answer); | |end; | | | |[..]" | |------------------------------------------------------------------------------------------------------------------------| Actually this is not a GNATism. I have noticed that "*"(Left=>variable, Right=>Log_Base_10_Of_The_Base_Of_The_Natural_Logarithm) is faster than log(X=>variable, Base=>10.0) on a number of other compilers. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-15 10:58 Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? Colin Paul Gloster ` (2 preceding siblings ...) 2010-02-15 18:26 ` (see below) @ 2010-02-15 23:04 ` Jeffrey R. Carter 2010-02-16 14:54 ` Colin Paul Gloster 2010-02-15 23:20 ` Randy Brukardt 4 siblings, 1 reply; 21+ messages in thread From: Jeffrey R. Carter @ 2010-02-15 23:04 UTC (permalink / raw) Colin Paul Gloster wrote: > with Ada.Numerics.Generic_Elementary_Functions; > with Interfaces.C; > with Ada.Text_IO; > procedure Logarithmic_Work_In_Ada is > answer : Interfaces.C.double := 0.0; > package double_library is new > Ada.Numerics.Generic_Elementary_Functions(Interfaces.C.double); > package double_output_library is new > Ada.Text_IO.Float_IO(Interfaces.C.double); > begin > > for I in 1 .. 1_000_000 loop --I would not have this loop but one > compiler crashed if > --the source file was sized a few > megabytes. > > answer := Interfaces.C."+"(answer, > double_library.log(0.10000000000000000000 > , 10.0 )); > --Lines produced by > --for((i=1; i<=500; ++i)) do echo -n 'answer := Interfaces.C."+"(answer, > double_library.log(0' ; echo "$i/10" | bc -l ; echo ', 10.0 ));'; done > answer := Interfaces.C."+"(answer, > double_library.log(050.00000000000000000000 > , 10.0 )); > > > end loop; > > double_output_library.Put(answer); > end; IIUC, your Ada program is equivalent to with Ada.Numerics.Generic_Elementary_Functions; with Ada.Text_IO; with Interfaces.C; procedure Optimization_Test is subtype Double is Interfaces.C.Double; use type Double; package Math is new Ada.Numerics.Generic_Elementary_Functions (Double); Answer : Double := 0.0; begin -- Optimization_Test for I in 1 .. 1_000_000 loop Answer := Answer + Math.Log (0.1, 10.0); for J in 1 .. 500 loop Answer := Answer + Math.Log (Double (J) / 10.0, 10.0); end loop; Answer := Answer + Math.Log (50.0, 10.0); end loop; Ada.Text_IO.Put (Answer'Img); end Optimization_Test; On my machine I get something like jrcarter@jrcarter-acer-laptop:~/Code$ gnatmake -O2 -gnatnp optimization_test.adb gcc-4.3 -c -O2 -gnatnp optimization_test.adb gnatbind -x optimization_test.ali gnatlink optimization_test.ali jrcarter@jrcarter-acer-laptop:~/Code$ time ./optimization_test 6.34785378608078E+08 real 1m33.817s user 1m33.810s sys 0m0.000s Note that suppressing runtime checks (-gnatp) is needed to be sort of equivalent to C++. A good compiler could recognize that this adds a constant amount to Answer each time through the outer loop, and the outer loop is run a static constant number of times, and might optimize it to something like with Ada.Numerics.Generic_Elementary_Functions; with Ada.Text_IO; with Interfaces.C; procedure Optimization_Test is subtype Double is Interfaces.C.Double; use type Double; package Math is new Ada.Numerics.Generic_Elementary_Functions (Double); function Inner_Loop return Double is Result : Double := 0.0; begin -- Inner_Loop for J in 1 .. 500 loop Result := Result + Math.Log (Double (J) / 10.0, 10.0); end loop; return Result; end Inner_Loop; pragma Inline (Inner_Loop); Answer : constant Double := 1.0E6 * (Math.Log (0.1, 10.0) + Inner_Loop + Math.Log (50.0, 10.0) ); begin -- Optimization_Test Ada.Text_IO.Put (Answer'Img); end Optimization_Test; This, on my machine and with the same options, gives something like jrcarter@jrcarter-acer-laptop:~/Code$ time ./optimization_test 6.34785378539470E+08 real 0m0.009s user 0m0.000s sys 0m0.010s I suspect this optimization is what the C++ compiler is doing, and the Ada compilers are not. Whether a similar optimization may be applied to the calculations in your real program of interest is another question. -- Jeff Carter "Friends don't let friends program in C++." Ludovic Brenta 114 ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-15 23:04 ` Jeffrey R. Carter @ 2010-02-16 14:54 ` Colin Paul Gloster 2010-02-16 15:24 ` Colin Paul Gloster 2010-02-16 19:01 ` Jeffrey R. Carter 0 siblings, 2 replies; 21+ messages in thread From: Colin Paul Gloster @ 2010-02-16 14:54 UTC (permalink / raw) On Mon, 15 Feb 2010, Jeffrey R. Carter sent: |--------------------------------------------------------------------------------| |"[..] | | | |IIUC, your Ada program is equivalent to | | | |with Ada.Numerics.Generic_Elementary_Functions; | |with Ada.Text_IO; | | | |with Interfaces.C; | | | |procedure Optimization_Test is | | subtype Double is Interfaces.C.Double; | | use type Double; | | | | package Math is new Ada.Numerics.Generic_Elementary_Functions (Double); | | | | Answer : Double := 0.0; | |begin -- Optimization_Test | | for I in 1 .. 1_000_000 loop | | Answer := Answer + Math.Log (0.1, 10.0); | | | | for J in 1 .. 500 loop | | Answer := Answer + Math.Log (Double (J) / 10.0, 10.0);" | |--------------------------------------------------------------------------------| Aside from not encouraging division instructions when not necessary, as I mentioned on HTTP://Programmer.97Things.OReilly.com/wiki/index.php/Execution_Speed_versus_Maintenance_Effort , I wanted to not have too high overhead obscuring the focus of measuring speeds of log implementations. |--------------------------------------------------------------------------------| |" end loop; | | | | Answer := Answer + Math.Log (50.0, 10.0); | | end loop; | | | | Ada.Text_IO.Put (Answer'Img); | |end Optimization_Test; | | | |On my machine I get something like | | | |jrcarter@jrcarter-acer-laptop:~/Code$ gnatmake -O2 -gnatnp optimization_test.adb| |gcc-4.3 -c -O2 -gnatnp optimization_test.adb | |gnatbind -x optimization_test.ali | |gnatlink optimization_test.ali | |jrcarter@jrcarter-acer-laptop:~/Code$ time ./optimization_test | | 6.34785378608078E+08 | | | |real 1m33.817s | |user 1m33.810s | |sys 0m0.000s" | |--------------------------------------------------------------------------------| time ./Optimization_Test_with_many_assignments_to_Answer_compiled_by_GCC4.4.3_with_-ffast-math 6.34785378608078E+08 real 0m34.319s user 0m34.318s sys 0m0.004s here. |---------------------------------------------------------------------------------| |"Note that suppressing runtime checks (-gnatp) is needed to be sort of equivalent| |to C++." | |---------------------------------------------------------------------------------| Thanks for the tip, but I do not program in Ada to really program in C++ with Ada syntax. |--------------------------------------------------------------------------------| |"A good compiler could recognize that this adds a constant amount to Answer each| |time through the outer loop, and the outer loop is run a static constant number | |of times, and might optimize it to something like | | | |with Ada.Numerics.Generic_Elementary_Functions; | |with Ada.Text_IO; | | | |with Interfaces.C; | | | |procedure Optimization_Test is | | subtype Double is Interfaces.C.Double; | | use type Double; | | | | package Math is new Ada.Numerics.Generic_Elementary_Functions (Double); | | | | function Inner_Loop return Double is | | Result : Double := 0.0; | | begin -- Inner_Loop | | for J in 1 .. 500 loop | | Result := Result + Math.Log (Double (J) / 10.0, 10.0); | | end loop; | | | | return Result; | | end Inner_Loop; | | pragma Inline (Inner_Loop); | | | | Answer : constant Double := | | 1.0E6 * (Math.Log (0.1, 10.0) + Inner_Loop + Math.Log (50.0, 10.0) ); | |begin -- Optimization_Test | | Ada.Text_IO.Put (Answer'Img); | |end Optimization_Test; | | | |This, on my machine and with the same options, gives something like | | | |jrcarter@jrcarter-acer-laptop:~/Code$ time ./optimization_test | |6.34785378539470E+08 | | | |real 0m0.009s | |user 0m0.000s | |sys 0m0.010s" | |--------------------------------------------------------------------------------| time ./Optimization_Test_with_a_single_assignment_to_Answer_compiled_by_GCC4.4.3_with_-ffast-math 6.34785378539470E+08 real 0m0.002s user 0m0.000s sys 0m0.008s here. |--------------------------------------------------------------------------------| |"I suspect this optimization is what the C++ compiler is doing, and the Ada | |compilers are not." | |--------------------------------------------------------------------------------| Actually, a C++ compiler optimized out calls to std::log10(). |--------------------------------------------------------------------------------| |"Whether a similar optimization may be applied to the calculations in your real | |program of interest is another question." | |--------------------------------------------------------------------------------| Such an optimization can not help the log10 calls like that in the real code because they are spread out in conditional statements between hundreds of other statements instead of being an analytically simple sequence of millions of calls to log, as per the following... G4double G4LogLogInterpolation::Calculate(G4double x, G4int bin, const G4DataVector& points, const G4DataVector& data) const { G4int nBins = data.size() - 1; G4double value = 0.; if (x < points[0]) { value = 0.; } else if (bin < nBins) { /*..*/ value = (std::log10(d1)*std::log10(e2/x) + std::log10(d2)*std::log10(x/e1)) /*..*/ /*..*/ } and G4double G4SemiLogInterpolation::Calculate(G4double x, G4int bin, const G4DataVector& points, const G4DataVector& data) const { G4int nBins = data.size() - 1; /*C++ programmers have a way of replicating statements across functions.*/ G4double value = 0.; if (x < points[0]) { value = 0.; } else if (bin < nBins) { /*..*/ value = (d1*std::log10(e2/x) + d2*std::log10(x/e1)) /*..*/ /*..*/ } and inline size_t G4PhysicsLogVector::FindBinLocation(G4double theEnergy) const { // For G4PhysicsLogVector, FindBinLocation is implemented using // a simple arithmetic calculation. // // Because this is a virtual function, it is accessed through a // pointer to the G4PhyiscsVector object for most usages. In this // case, 'inline' will not be invoked. However, there is a possibility // that the user access to the G4PhysicsLogVector object directly and // not through pointers or references. In this case, the 'inline' will // be invoked. (See R.B.Murray, "C++ Strategies and Tactics", Chap.6.6) return size_t( std::log10(theEnergy)//*Ugh, this division is unnecessary.*/dBin - baseBin ); } and G4double G4ProtonInelasticCrossSection:: GetCrossSection(G4double kineticEnergy, G4double atomicNumber, G4double nOfProtons) { /*..*/ if(nOfNeutrons>1.5) fac2=std::log((nOfNeutrons)); /*..*/ } and G4double G4LinLogLogInterpolation::Calculate(G4double x, G4int bin, const G4DataVector& points, const G4DataVector& data) const { G4int nBins = data.size() - 1; G4double value = 0.; if (x < points[0]) { value = 0.; } else if (bin < nBins) { /*..*/ /*C++ programmers have a way of replicating statements across functions...*/ if(d1 > 0.0 && d2 > 0.0) { /*... and not remembering to include checks for mathematical well-definededness in each code clone nor indeed for each call to std::log10() in each statement guarded by two checks for mathematical well-definededness.*/ value = (std::log10(d1)*std::log10(e2/x) + std::log10(d2)*std::log10(x/e1)) /*..*/ /*..*/ } ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-16 14:54 ` Colin Paul Gloster @ 2010-02-16 15:24 ` Colin Paul Gloster 2010-02-16 19:01 ` Jeffrey R. Carter 1 sibling, 0 replies; 21+ messages in thread From: Colin Paul Gloster @ 2010-02-16 15:24 UTC (permalink / raw) On Tue, 16 Feb 2010, Colin Paul Gloster alleged: |-------------------------------------------------------------------------------------------------| |"[..] | | | |time ./Optimization_Test_with_many_assignments_to_Answer_compiled_by_GCC4.4.3_with_-ffast-math | | 6.34785378608078E+08 | | | |real 0m34.319s | |user 0m34.318s | |sys 0m0.004s | |here. | | | |[..] | | | |time ./Optimization_Test_with_a_single_assignment_to_Answer_compiled_by_GCC4.4.3_with_-ffast-math| | 6.34785378539470E+08 | | | |real 0m0.002s | |user 0m0.000s | |sys 0m0.008s | |here. | | | |[..]" | |-------------------------------------------------------------------------------------------------| I apologize, those two runs were actually with GCC4.2.4. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-16 14:54 ` Colin Paul Gloster 2010-02-16 15:24 ` Colin Paul Gloster @ 2010-02-16 19:01 ` Jeffrey R. Carter 2010-02-17 10:25 ` Colin Paul Gloster 1 sibling, 1 reply; 21+ messages in thread From: Jeffrey R. Carter @ 2010-02-16 19:01 UTC (permalink / raw) Colin Paul Gloster wrote: > > |---------------------------------------------------------------------------------| > |"Note that suppressing runtime checks (-gnatp) is needed to be sort of equivalent| > |to C++." | > |---------------------------------------------------------------------------------| > > Thanks for the tip, but I do not program in Ada to really program in > C++ with Ada syntax. I would hope not. But when comparing execution times between Ada and a language like C++, it's important not to try to compare apples to lugnuts. -- Jeff Carter "I don't know why I ever come in here. The flies get the best of everything." Never Give a Sucker an Even Break 102 ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-16 19:01 ` Jeffrey R. Carter @ 2010-02-17 10:25 ` Colin Paul Gloster 0 siblings, 0 replies; 21+ messages in thread From: Colin Paul Gloster @ 2010-02-17 10:25 UTC (permalink / raw) On Tue, 16 Feb 2010, Jeffrey R. Carter sent: |------------------------------------------------------------------| |"[..] | | | |[..] when comparing execution times between Ada and a language | |like C++, it's important not to try to compare apples to lugnuts."| |------------------------------------------------------------------| Fair enough, but when I say Ada is better than C++ I am not comparing an apple with an apple. Anyway, as I mentioned in news:alpine.LNX.2.00.1002161654110.21651@Bluewhite64.example.net in response to Bill Findlay, G++ has produced much slower code than GNAT (the GNATism is in standard Ada, merely the obvious way to do it in standard Ada is different)... gnatmake -O3 -ffast-math Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3_with_-ffast-math.adb -o Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3_with_-ffast-math time ./Logarithmic_Work_In_Ada_with_a_Findlay_loop_with_a_Parker_GNATism_compiled_by_GCC4.4.3_with_-ffast-math 6.34086408606382E+08 real 0m14.434s user 0m14.433s sys 0m0.004s g++ -O3 -ffast-math logarithmic_work_in_CPlusPlus_with_a_Findlay_loop.cc -o logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math time ./logarithmic_work_in_CPlusPlus_with_a_Findlay_loop_compiled_by_GCC4.4.3_with_-ffast-math 6.34086e+08 real 0m38.388s user 0m38.390s sys 0m0.000s ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? 2010-02-15 10:58 Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? Colin Paul Gloster ` (3 preceding siblings ...) 2010-02-15 23:04 ` Jeffrey R. Carter @ 2010-02-15 23:20 ` Randy Brukardt 4 siblings, 0 replies; 21+ messages in thread From: Randy Brukardt @ 2010-02-15 23:20 UTC (permalink / raw) "Colin Paul Gloster" <Colin_Paul_Gloster@ACM.org> wrote in message news:alpine.LNX.2.00.1002151055530.17315@Bluewhite64.example.net... > I have been improving a program by suppressing C++ in it. After > speeding it up a lot by making changes, I have found one considerable > part which calls a library routine, which is unfortunately very slow > as provided as standard with a number of Ada compilers but which is > very fast with implementations of other languages. For some Ada > compilers perhaps I shall simply need to not rely on an implementation > of this routine by an Ada vendor. Perhaps this post will motivate Ada > vendors to speed up their implementations or for people to report > timings for efficient implementations which are used by default by > other Ada compilers. I doubt it's possible in general. Ada has strong accuracy requirements on the implementation of Generic_Elementary_Functions, while other languages have little to say. The result is that it isn't possible for Ada to be as fast as possible because those techniques don't have the required accuracy. I haven't worked on this recently, but back in the day, the old 8087 hardware instructions didn't have the required accuracy for Sine and Tangent, so a software solution had to be used (which of course was a lot slower). (Yes, compilers can and do provide faster implementations of these functions that don't meet the accuracy requirements, but those solutions aren't portable in general.) This is a major difference between Ada and most other programming languages: Ada defines error bounds for all numeric operations, so that it's possible to perform useful numeric analysis on an Ada program without knowing any at all about the target hardware. There is of course a cost to that capability. But it is an important capability which rarely is taken advantage of : as noted in a previous thread, without numerical analysis, you have no idea whether the results you are getting has any significance or not. And "not" is far more likely that people like to admit. Randy. ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2010-02-24 10:07 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-02-15 10:58 Is there an Ada compiler whose Ada.Numerics.Generic_Elementary_Functions.Log(Base=>10, X=>variable) is efficient? Colin Paul Gloster 2010-02-15 13:02 ` John B. Matthews 2010-02-15 14:17 ` Colin Paul Gloster 2010-02-15 17:19 ` John B. Matthews 2010-02-15 14:54 ` jonathan 2010-02-15 15:04 ` jonathan 2010-02-15 19:50 ` sjw 2010-02-16 16:50 ` Colin Paul Gloster 2010-02-15 18:26 ` (see below) 2010-02-15 18:51 ` jonathan 2010-02-15 20:00 ` sjw 2010-02-15 21:17 ` jonathan 2010-02-16 0:09 ` jonathan 2010-02-16 17:33 ` Colin Paul Gloster 2010-02-24 10:07 ` Colin Paul Gloster 2010-02-15 23:04 ` Jeffrey R. Carter 2010-02-16 14:54 ` Colin Paul Gloster 2010-02-16 15:24 ` Colin Paul Gloster 2010-02-16 19:01 ` Jeffrey R. Carter 2010-02-17 10:25 ` Colin Paul Gloster 2010-02-15 23:20 ` Randy Brukardt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox