From: "Robert I. Eachus"
Newsgroups: comp.lang.ada
Subject: Re: Problems converting Float to Integer efficiently
Date: Fri, 10 Oct 2003 17:15:47 GMT
Message-ID: <3F86E923.7050409@comcast.net>

Dr. Adrian Wrigley wrote:

> Jeff C, wrote:
>
>> If you can use a more up to date gcc (a 3.X series) and have a pentium 4
>> (maybe ok on III)
>> then you can try this little thing I made
>
> ...
>
> Asm ("cvttss2si %1, %0",inputs => float'asm_input("m", y),
> Outputs => integer'asm_output("=r",temp));
>
> This works great!
>
> I have had no problems on my Athlon (1100MHz clock speed) and GNAT 3.15p
>
> My experiments show the following:
> -- X is integer, Y is float
> X := X + Truncate(Y);               -- 0.029us   1x
> X := X + Integer (Y);               -- 0.360us  12x
> X := X + Integer (Float'Floor (Y)); -- 1.196us  41x
>
> This is timed in a tight loop. Optimisation "-O3", Checks suppressed,
> inlined where possible.
> I checked carefully that the "right" code was being generated, and the
> test was fair.
> These lines are not exactly equivalent in function - I wanted "floor",
> but "Truncate" is OK
>
> As you can see, the "obvious" Ada code for converting to integer is at
> least 12x slower than Jeff's version. This has the potential to be quite
> significant in my code which implements interpolation of five dimensional
> arrays for function approximation.
>
> I also had a look at the "stereopsis" and "gems" web site. It seems the
> deficiency of compilers in getting integers from floats has caused other
> people difficulties too. I'm slightly worried that the "Truncate" function
> will fail if the FPU/SSE mode is changed. I'll try to cross that hurdle
> if I encounter it.
>
> Quoting the stereopsis site : "I've seen quite well-written code get 8
> TIMES FASTER after fixing conversion problems. Fix yours! And here's how.."
>
> This whole situation illustrates one of the things which frustrates me
> about coding in Ada/GNAT: performance and reliability of Ada code
> is often very good, but you need to be eternally vigilant that you are
> getting out the code that you think you should be getting. The speed of
> the compiled code is sometimes critically dependent on apparently
> innocuous choices.
> Things like enumerated types, use of generics under certain
> circumstances etc. can cause apparently good source code to crawl.
> Is this a necessary price for ditching low-level HLLs
> like 'C'? Perhaps someone should create a web site highlighting the
> problem areas and explaining the solutions? Volunteers anybody?
>
> Thanks guys for the help on this!

I think that is just the nature of designing, building, and using high-level languages (including in this case C) and compilers. If the hardware architecture is well designed for what you want to do, fine. If it isn't, you run into issues like this.
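For reference, the three conversions timed in the quoted message do not compute the same thing, as the quoted poster notes. The following is a minimal sketch in standard Ada of how they differ. It assumes that the quoted Truncate function (whose body is not shown in the thread) behaves like Float'Truncation followed by a conversion; the procedure name and sample values are purely illustrative.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Conversion_Demo is
   --  Illustrative samples chosen so that the three conversions disagree.
   type Sample_Array is array (Positive range <>) of Float;
   Samples : constant Sample_Array := (2.7, -2.7, 2.5, -2.5);
begin
   for I in Samples'Range loop
      declare
         Y : constant Float := Samples (I);

         --  A plain conversion rounds to the nearest integer,
         --  with ties rounded away from zero: 3, -3, 3, -3.
         Rounded : constant Integer := Integer (Y);

         --  Truncation discards the fraction (rounds toward zero):
         --  2, -2, 2, -2. Assumed to match the quoted Truncate.
         Truncated : constant Integer := Integer (Float'Truncation (Y));

         --  Floor rounds toward minus infinity: 2, -3, 2, -3.
         Floored : constant Integer := Integer (Float'Floor (Y));
      begin
         Put_Line
           (Float'Image (Y)
            & "  rounded ="   & Integer'Image (Rounded)
            & "  truncated =" & Integer'Image (Truncated)
            & "  floored ="   & Integer'Image (Floored));
      end;
   end loop;
end Conversion_Demo;
```

Truncation and floor agree for non-negative inputs and differ only for negative non-integers, which is presumably why truncation is an acceptable substitute in the quoted interpolation code if its inputs are never negative.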
The compiler has to deal with choosing the correct implementation for every different case, and since the program space is infinite, you can't exhaustively test the compiler. All you can do--and it is well worth doing--is ensure that efficient code is generated in most cases, and correct code in all cases.

The discussion of square root functions going on in another thread is a case in point. The Newton-Raphson approach has all kinds of nice mathematical properties, but it uses divide, and on many modern processors divide is painfully slow. There are ways to take advantage of the convergence properties of the algorithm. For example, use single-precision divide for all but the last two iterations. However, on most hardware a square root routine that uses add, subtract, compare, and shift is much faster. Of course the best solution is to use the hardware square root that is part of the IEEE math library--if it is present.

And that gets back to the original problem. Using compiler flags to tell the compiler which particular CPU chip you are targeting is very worthwhile. For example, you should see almost a factor of two speedup in floating-point code on a Pentium 4 if you tell the compiler to use SSE2. But on the new Athlon64 from AMD, you can use SSE2 code if you want in 32-bit mode, but it will be no faster--and no slower--than x87 code.

Incidentally, the 64-bit long mode on the Athlon64 allows SSE2 floating-point code only. Since long mode programs on the Athlon64 can call 32-bit mode DLLs, and these can use the x87 IEEE floating-point arithmetic, you can see that this has created a whole new bunch of fun for compiler optimizers--and it has nothing to do with HLLs as such.

-- 
Robert I. Eachus

"Quality is the Buddha. Quality is scientific reality. Quality is the goal of Art.
It remains to work these concepts into a practical, down-to-earth context, and for this there is nothing more practical or down-to-earth than what I have been talking about all along...the repair of an old motorcycle." -- from Zen and the Art of Motorcycle Maintenance by Robert Pirsig
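
The post mentions a square root routine built only from add, subtract, compare, and shift, without saying which one. As an illustration only, here is a minimal sketch of one well-known routine of that kind, the binary digit-by-digit integer square root. Choosing this algorithm, restricting it to 32-bit unsigned integers, and the names Isqrt, Remaining, Root, and Bit are all assumptions, not something taken from the thread.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure Isqrt_Demo is

   --  Returns the floor of the square root of N using only
   --  add, subtract, compare, and shift (no divide or multiply).
   function Isqrt (N : Unsigned_32) return Unsigned_32 is
      Remaining : Unsigned_32 := N;
      Root      : Unsigned_32 := 0;
      Bit       : Unsigned_32 := 2 ** 30;  --  largest power of four in 32 bits
   begin
      --  Move down to the largest power of four not exceeding N.
      while Bit > Remaining loop
         Bit := Shift_Right (Bit, 2);
      end loop;

      --  Decide one result bit per iteration, from the top down.
      while Bit /= 0 loop
         if Remaining >= Root + Bit then
            Remaining := Remaining - (Root + Bit);
            Root      := Shift_Right (Root, 1) + Bit;
         else
            Root := Shift_Right (Root, 1);
         end if;
         Bit := Shift_Right (Bit, 2);
      end loop;

      return Root;
   end Isqrt;

begin
   Put_Line (Unsigned_32'Image (Isqrt (0)));    --  expect 0
   Put_Line (Unsigned_32'Image (Isqrt (10)));   --  expect 3
   Put_Line (Unsigned_32'Image (Isqrt (144)));  --  expect 12
end Isqrt_Demo;
```

The loop runs at most 16 times for a 32-bit argument and each iteration is a handful of cheap integer operations, which is the property the post contrasts with divide-based Newton-Raphson iterations. Whether it actually wins on a given processor is something to measure rather than assume.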