From: "Robert I. Eachus"
Subject: Re: Calculating SQRT in ADA
Date: 1999/04/02
Message-ID: <3705555E.5572A782@mitre.org>
References: <7dbv6t$4u5$1@nnrp1.dejanews.com> <19990324201959.00800.00000708@ngol04.aol.com> <7dei9a$dvo$1@nnrp1.dejanews.com> <7dhjhi$27a$1@nnrp1.dejanews.com> <36FFF83A.BE789C93@mitre.org> <7dq5b2$2dk$1@nnrp1.dejanews.com>
Organization: The MITRE Corporation
Newsgroups: comp.lang.ada

robert_dewar@my-dejanews.com wrote:

> I am not sure what this refers to, what "hardware" IEEE
> instructions are you referring to. Certainly IEEE does not
> include elementary functions except for sqrt, and this is
> of course NOT hardware on most machines.

News to me. A lot of the current processor architectures emulate
some of the trigonometric and transcendental functions with
microcode or with hardware traps to library routines, but there
are still members of these families available that implement the
instructions in hardware. For example, in the 68000 family, the
earlier processors had all of these instructions in hardware, in
the 68881 and 68882 coprocessor chips. The 68040 implemented many
floating-point instructions in hardware and emulated others; in
the 68060, almost all of the instructions other than the basic
floating-point operations are done in emulation libraries.

> Sure, a sqrt in hardware can be as fast as a divide, since
> a very similar algorithm can be used. But I challenge your
> initial statement here.
> Please cough up code on a specific
> machine to justify the statement that you can do a sqrt in
> floating-point divide time.

Well, the first manual I grabbed off the shelf surprised me
slightly: on the 68881, the floating-point square root took two
cycles more than a divide, out of about 130. (The exact number of
cycles depends on register modes.) Of course, loading the second
operand for the divide takes longer than two clocks: four to 40,
depending on the source and memory speed.

FSQRT has been part of the SPARC architecture since version 7, and
it is implemented in hardware on almost all chipsets. I don't have
timing tables handy, but I have tested several SPARC processors
where FSQRT is faster than FDIV. As above, the speed advantage
comes mostly from having only one operand. Loading FP registers,
especially from memory, is costly. (Of course, YMMV, but I was
more concerned with cases where I was calculating for a large set
of points, so the high-speed caches didn't much affect the data
loading.)

For the integer case, a DIV.L takes about 90 clocks on a 68020,
while the corresponding square root algorithm takes 16 iterations
through a loop:

    L0:     MOVE.L  (operand),D5    ; D5 = remainder, starts as the operand
            TRAPMI                  ; Error if operand is negative
            MOVEQ   #15,D4          ; DBF counter: 16 passes
            MOVEQ   #0,D6           ; D6 = root, built one bit per pass
            MOVEQ   #1,D7           ; D7 = trial bit, rotated before use
    L1:     ROR.L   #2,D7           ; First rotation results in $40000000
            MOVE.L  D6,D3           ; D3 = root + trial bit (scratch)
            ADD.L   D7,D3
            LSR.L   #1,D6           ; Root moves right one place per pass
            CMP.L   D3,D5           ; Remainder < root + trial bit?
            BCS.S   L2              ; Yes: this root bit is 0
            SUB.L   D3,D5           ; No: remainder -= root + trial bit
            ADD.L   D7,D6           ;     and set this root bit
    L2:     DBF     D4,L1           ; On exit D6 = floor(sqrt(operand))

I'm not sure I have this correct from memory, but it is close. The
version I used unwound the loop, used a 64-bit operand, and did a
BFFFO to skip leading zeros. I needed to do SQRT(X*X+Y*Y) fast,
again for lots of points.
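As a hedged illustration, here is the same bit-by-bit method in C. It is a sketch, not the code the post describes: it assumes a 64-bit unsigned operand (as in the unwound version mentioned above), replaces BFFFO with a portable loop that skips leading zero bit pairs, and adds a small helper for SQRT(X*X+Y*Y) on integer coordinates. The function names isqrt64 and idist are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: floor(sqrt(n)) by the bit-by-bit method.
   Each pass uses only shifts, adds, compares and subtracts. */
uint32_t isqrt64(uint64_t n)
{
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;   /* highest power of 4 in 64 bits */

    /* Skip leading zero bit pairs (a portable stand-in for BFFFO). */
    while (bit > n)
        bit >>= 2;

    while (bit != 0) {
        uint64_t trial = root + bit;    /* root + trial bit, never overflows */
        root >>= 1;                     /* root moves right one place */
        if (n >= trial) {               /* this root bit is 1 */
            n -= trial;
            root += bit;
        }
        bit >>= 2;
    }
    return (uint32_t)root;              /* result always fits in 32 bits */
}

/* Hypothetical helper: floor(sqrt(x*x + y*y)) for 32-bit coordinates.
   Each square is at most 2^62, so the sum fits in a uint64_t. */
uint32_t idist(int32_t x, int32_t y)
{
    int64_t xx = (int64_t)x * x;
    int64_t yy = (int64_t)y * y;
    return isqrt64((uint64_t)xx + (uint64_t)yy);
}
```

For example, isqrt64(17) returns 4 and idist(3, 4) returns 5. A full 64-bit operand costs at most 32 passes, and fewer once the leading zero pairs are skipped.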