Re: Problems converting Float to Integer efficiently

comp.lang.ada

From: "Robert I. Eachus" <rieachus@comcast.net>
Subject: Re: Problems converting Float to Integer efficiently
Date: Fri, 10 Oct 2003 17:15:47 GMT
Message-ID: <3F86E923.7050409@comcast.net>
In-Reply-To: Oplhb.9211$RU4.88029@newsfep4-glfd.server.ntli.net

Dr. Adrian Wrigley wrote:
> Jeff C, wrote:
> 
>> If you can use a more up to date gcc (a 3.X series) and have a pentium 4
>> (maybe ok on III)
>> then you can try this little thing I made
> 
> ...
>  > Asm ("cvttss2si %1, %0",inputs  => float'asm_input("m", y),
>  >                          Outputs => integer'asm_output("=r",temp));
> 
> This works great!
> 
> I have had no problems on my Athlon (1100MHz clock speed) and GNAT 3.15p
> 
> My experiments show the following:
> -- X is integer, Y is float
> X := X + Truncate(Y);               -- 0.029us    1x
> X := X + Integer (Y);               -- 0.360us   12x
> X := X + Integer (Float'Floor (Y)); -- 1.196us   41x
> 
> This is timed in a  tight loop.  Optimisation "-O3", Checks suppressed, 
> inlined where possible.
> I checked carefully that the "right" code was being generated, and the 
> test was fair.
> These lines are not exactly equivalent in function - I wanted "floor", 
> but "Truncate" is OK
> 
> As you can see, the "obvious" Ada code for converting to integer is at 
> least 12x slower
> than Jeff's version. This has the potential to be quite significant in 
> my code which
> implements interpolation of five dimensional arrays for function 
> approximation.
> 
> I also had a look at the "stereopsis" and "gems" web site.  It seems the 
> deficiency of compiler in
> getting integers from floats has caused other people difficulties too.  
> I'm slightly worried that
> the "Truncate" function will fail if the FPU/SSE mode is changed.  I'll 
> try to cross that hurdle
> if I encounter it.
> 
> Quoting the stereopsis site : "I've seen quite well-written code get 8 
> TIMES FASTER after
> fixing conversion problems. Fix yours! And here's how.."
> 
> This whole situation illustrates one of the things which frustrates me 
> about coding in Ada/GNAT: performance and reliability of Ada code 
> is often very good, but you need to be eternally
> vigilant that you are getting out the code that you think you should be 
> getting.  The speed of the compiled code is sometimes critically 
> dependent on apparently innocuous choices.
> Things like enumerated types, use of generics under certain
> circumstances etc. can cause apparently good source code to crawl.
> Is this a necessary price for ditching low-level HLLs like 'C'?
> Perhaps someone should create a web site highlighting the 
> problem areas and explaining the solutions? Volunteers anybody?
> 
> Thanks guys for the help on this!
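
As the quoted message notes, the three timed lines are not equivalent in
function. A minimal sketch of the difference in portable Ada follows; the
procedure name Conversion_Demo and the sample values are illustrative
assumptions and do not come from the thread. A plain Integer (Y) conversion
rounds to nearest, with ties away from zero, while Float'Truncation rounds
toward zero and Float'Floor rounds toward minus infinity.

```ada
with Ada.Text_IO; use Ada.Text_IO;

--  Hypothetical demo: prints how each conversion treats the same inputs.
procedure Conversion_Demo is
   Samples : constant array (1 .. 4) of Float := (2.7, 2.5, -2.5, -2.7);
   Y       : Float;
begin
   for I in Samples'Range loop
      Y := Samples (I);
      Put_Line
        (Float'Image (Y)
         & "  Integer:"    & Integer'Image (Integer (Y))
         & "  Truncation:" & Integer'Image (Integer (Float'Truncation (Y)))
         & "  Floor:"      & Integer'Image (Integer (Float'Floor (Y))));
   end loop;
end Conversion_Demo;
```

For -2.5, for instance, the three columns come out as -3, -2 and -3, so a
truncating instruction can only stand in for "floor" when the inputs are
known to be non-negative.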

I think that is just the nature of designing, building, and using 
high-level languages (including in this case C) and compilers.  If the 
hardware architecture is well designed for what you want to do, fine. 
If it isn't, you run into issues like this.  The compiler has to deal 
with choosing the correct implementation for every different case, 
and since the program space is infinite, you can't exhaustively test the 
compiler.  All you can do--and it is well worth doing--is ensure that 
efficient code is generated in most cases, and correct code in all cases.

The discussion of square root functions going on in another thread is a 
case in point.  The Newton-Raphson approach has all kinds of nice 
mathematical properties, but it uses divide, and on many modern 
processors divide is painfully slow.  There are ways to take advantage 
of the convergence properties of the algorithm.  For example, use 
single-precision divide for all but the last two iterations.  However, 
on most hardware a square root routine that uses add, subtract, compare, 
and shift is much faster.  Of course the best solution is to use the 
hardware square root that is part of the IEEE math library--if it is 
present.
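
To make the add/subtract/compare/shift idea concrete, here is a minimal
sketch of the classic bit-by-bit square root. It is shown for unsigned
32-bit integers only to keep it short; the routines discussed in the other
thread may well differ, and the names Isqrt_Demo and Isqrt are made up for
this illustration. Note that it contains no multiply or divide.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Isqrt_Demo is

   --  Floor of the square root of N, built one result bit at a time
   --  using only add, subtract, compare and shift.
   function Isqrt (N : Unsigned_32) return Unsigned_32 is
      Remaining : Unsigned_32 := N;
      Root      : Unsigned_32 := 0;
      Bit       : Unsigned_32 := 2 ** 30;  --  largest power of four in range
   begin
      --  Start from the largest power of four not exceeding N.
      while Bit > Remaining loop
         Bit := Shift_Right (Bit, 2);
      end loop;

      while Bit /= 0 loop
         if Remaining >= Root + Bit then
            --  This result bit is set: subtract the trial value.
            Remaining := Remaining - (Root + Bit);
            Root      := Shift_Right (Root, 1) + Bit;
         else
            Root := Shift_Right (Root, 1);
         end if;
         Bit := Shift_Right (Bit, 2);
      end loop;

      return Root;
   end Isqrt;

begin
   for N in Unsigned_32 range 0 .. 20 loop
      Ada.Text_IO.Put_Line
        (Unsigned_32'Image (N) & " ->" & Unsigned_32'Image (Isqrt (N)));
   end loop;
end Isqrt_Demo;
```

The loop runs at most sixteen times for a 32-bit argument, one pass per
result bit, which is why this style can beat a divide-based iteration on
hardware where divide is slow.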

And that gets back to the original problem.  Using compiler flags to 
tell the compiler which particular CPU chip you are targeting is very 
worthwhile.  For example, you should see almost a factor of two speedup 
in floating-point code on a Pentium 4 if you tell the compiler to use 
SSE2.  On the new Athlon64 from AMD, you can use SSE2 code in 32-bit 
mode if you want, but it will be no faster--and no slower--than x87 
code.

Incidentally, the 64-bit long mode on the Athlon64 allows SSE2 
floating-point code only.  Since long mode programs on the Athlon64 can call 
32-bit mode dlls, and these can use the x87 IEEE floating-point 
arithmetic, you can see that this has created a whole new bunch of fun 
for compiler optimizers--and it has nothing to do with HLLs as such.

-- 
                                          Robert I. Eachus

"Quality is the Buddha. Quality is scientific reality. Quality is the 
goal of Art. It remains to work these concepts into a practical, 
down-to-earth context, and for this there is nothing more practical or 
down-to-earth than what I have been talking about all along...the repair 
of an old motorcycle."  -- from Zen and the Art of Motorcycle 
Maintenance by Robert Pirsig
