From: "Robert I. Eachus" <rieachus@comcast.net>
Subject: Re: Problems converting Float to Integer efficiently
Date: Fri, 10 Oct 2003 17:15:47 GMT
Message-ID: <3F86E923.7050409@comcast.net>
In-Reply-To: Oplhb.9211$RU4.88029@newsfep4-glfd.server.ntli.net
Dr. Adrian Wrigley wrote:
> Jeff C, wrote:
>
>> If you can use a more up to date gcc (a 3.X series) and have a pentium 4
>> (maybe ok on III)
>> then you can try this little thing I made
>
> ...
> > Asm ("cvttss2si %1, %0",inputs => float'asm_input("m", y),
> > Outputs => integer'asm_output("=r",temp));
>
> This works great!
>
> I have had no problems on my Athlon (1100MHz clock speed) and GNAT 3.15p
>
> My experiments show the following:
> -- X is integer, Y is float
> X := X + Truncate(Y); -- 0.029us 1x
> X := X + Integer (Y); -- 0.360us 12x
> X := X + Integer (Float'Floor (Y)); -- 1.196us 41x
>
> This is timed in a tight loop. Optimisation "-O3", Checks suppressed,
> inlined where possible.
> I checked carefully that the "right" code was being generated, and the
> test was fair.
> These lines are not exactly equivalent in function - I wanted "floor",
> but "Truncate" is OK
>
> As you can see, the "obvious" Ada code for converting to integer is at
> least 12x slower
> than Jeff's version. This has the potential to be quite significant in
> my code which
> implements interpolation of five dimensional arrays for function
> approximation.
>
> I also had a look at the "stereopsis" and "gems" web site. It seems the
> deficiency of compiler in
> getting integers from floats has caused other people difficulties too.
> I'm slightly worried that
> the "Truncate" function will fail if the FPU/SSE mode is changed. I'll
> try to cross that hurdle
> if I encounter it.
>
> Quoting the stereopsis site : "I've seen quite well-written code get 8
> TIMES FASTER after
> fixing conversion problems. Fix yours! And here's how.."
>
> This whole situation illustrates one of the things which frustrates me
> about coding in Ada/GNAT: performance and reliability of Ada code
> is often very good, but you need to be eternally
> vigilant that you are getting out the code that you think you should be
> getting. The speed of the compiled code is sometimes critically
> dependent on apparently innocuous choices.
> Things like enumerated types, use of generics under certain
> circumstances etc. can cause apparently good source code to crawl.
> Is this a necessary price for ditching low-level HLLs like 'C'?
> Perhaps someone should create a web site highlighting the
> problem areas and explaining the solutions? Volunteers anybody?
>
> Thanks guys for the help on this!
I think that is just the nature of designing, building, and using
high-level languages (including in this case C) and compilers. If the
hardware architecture is well designed for what you want to do, fine.
If it isn't, you run into issues like this. The compiler has to choose
the correct implementation for every different case, and since the
program space is infinite, you can't exhaustively test the compiler.
All you can do--and it is well worth doing--is ensure that efficient
code is generated in most cases, and correct code in all cases.

The discussion of square root functions going on in another thread is a
case in point. The Newton-Raphson approach has all kinds of nice
mathematical properties, but it uses divide, and on many modern
processors divide is painfully slow. There are ways to take advantage
of the convergence properties of the algorithm: for example, use
single-precision divide for all but the last two iterations. However,
on most hardware a square root routine that uses only add, subtract,
compare, and shift is much faster. Of course the best solution is to
use the hardware square root that is part of the IEEE math library--if
it is present.

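To make the add/subtract/compare/shift idea concrete, here is a minimal sketch in C of the classic digit-by-digit integer square root. The name isqrt32 and the choice of a 32-bit unsigned argument are my own assumptions for illustration, not code from the other thread, and a floating-point routine would need extra handling for the exponent.

```c
#include <stdint.h>

/* Hypothetical helper: floor(sqrt(n)) for a 32-bit unsigned value,
 * computed with only add, subtract, compare, and shift (no divide). */
uint32_t isqrt32(uint32_t n)
{
    uint32_t root = 0;
    uint32_t bit = 1u << 30;   /* largest power of four that fits in 32 bits */

    /* Start at the largest power of four that does not exceed n. */
    while (bit > n)
        bit >>= 2;

    /* Decide one result bit per iteration, from the top down. */
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;            /* this result bit is 1 */
            root = (root >> 1) + bit;
        } else {
            root >>= 1;                 /* this result bit is 0 */
        }
        bit >>= 2;
    }
    return root;
}
```

The loop runs at most 16 times for a 32-bit input, and each pass costs a compare, possibly a subtract, and two shifts, which is why it can beat a divide-based iteration on hardware with a slow divider.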
And that gets back to the original problem. Using compiler flags to
tell the compiler which particular CPU chip you are targeting is very
worthwhile. For example, you should see almost a factor of two speedup
in floating-point code on a Pentium 4 if you tell the compiler to use
SSE2. On the new Athlon64 from AMD, by contrast, you can use SSE2 code
in 32-bit mode if you want, but it will be no faster--and no slower--than
x87 code.

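As a rough illustration of what the target flags buy you, here is a sketch in C of the truncating conversion discussed earlier in the thread. The function name truncate_to_int is hypothetical. When the compiler is told that SSE is available (with GCC, options such as -msse2 and -mfpmath=sse), the intrinsic below maps to the same cvttss2si instruction as Jeff's inline asm; otherwise the code falls back to a plain cast, which also truncates toward zero but may be compiled to slower x87 code. I have not measured this version, so treat the timings quoted above as applying only to the original Ada.

```c
#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_set_ss, _mm_cvttss_si32 */
#endif

/* Hypothetical helper: convert a float to int, truncating toward zero.
 * Values outside the range of int are not handled here. */
int truncate_to_int(float y)
{
#if defined(__SSE__)
    /* One cvttss2si instruction, independent of the current rounding mode. */
    return _mm_cvttss_si32(_mm_set_ss(y));
#else
    /* Portable fallback: a C cast also truncates toward zero. */
    return (int)y;
#endif
}
```

Note that this truncates rather than floors, so negative inputs round toward zero, just as with the Truncate function in the quoted measurements.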
Incidentally, the 64-bit long mode on the Athlon64 allows SSE2 floating
point code only. Since long mode programs on the Athlon64 can call
32-bit mode dlls, and these can use the x87 IEEE floating-point
arithmetic, you can see that this has created a whole new bunch of fun
for compiler optimizers--and it has nothing to do with HLLs as such.
--
Robert I. Eachus
"Quality is the Buddha. Quality is scientific reality. Quality is the
goal of Art. It remains to work these concepts into a practical,
down-to-earth context, and for this there is nothing more practical or
down-to-earth than what I have been talking about all along...the repair
of an old motorcycle." -- from Zen and the Art of Motorcycle
Maintenance by Robert Pirsig