From: z007400b@bcfreenet.seflin.lib.fl.us (Ralph Silverman)
Subject: Re: (topic change on) Teaching sorts
Date: 1996/08/27
Message-ID: <4vv0ab$r3i@nntp.seflin.lib.fl.us>
References: <01bb8f1b$ce59c820$32ee6fce@timhome2> <4vfk6b$i6h@krusty.irvine.com> <321C7A2F.49A6@nashville.net>
Organization: SEFLIN Free-Net - Broward
Newsgroups: comp.lang.c,comp.lang.c++,comp.unix.programmer,comp.lang.ada,comp.os.msdos.programmer

Marcus H. Mendenhall (mendenmh@nashville.net) wrote:
: Christian Bau wrote:
: -> On a real computer (PowerMac, no virtual memory, no background processes,
: -> nothing that would interfere with execution time), the _number of
: -> instructions per second_ did reproducibly vary by a factor of up to _seven_
: -> when going from n to n+1 (for example, case n = 128 took seven times
: -> longer than cases n = 127 and n = 129). So for this computer, and this

: Isn't caching fun? I have observed many bizarre effects on the
: PowerMacs when one is doing work which involves thrashing memory (FFTs,
: matrix multiplies, etc.).
: In effect, one can usually assume that the total number of cpu cycles
: actually used for floating point arithmetic in these cases is 0.
: Counting real memory hits due to cache reloads gives a much more
: accurate measure of time.
: In the case of testing your matrix multiply, you could use a trick I used
: to investigate timing for FFTs: I took out all pointer increments from
: the loop, so that the algorithm proceeded as usual, but carried out all
: its operations on the same few bytes of memory. It yields nonsense for
: the result, but gives an idea of how many cpu cycles are spent on
: everything except fetching. The speed increase is sometimes quite
: shocking (more than a factor of 10).
: In your case, with the problem at 128 elements, I suspect this was
: because of the way the PowerPC chips (some of them at least) choose
: which cache line to fill with new data, and the 1024 byte offset between
: successive data points probably meant that each fetch required a
: complete cache line reload.
: Marcus Mendenhall

--
*****************begin r.s. response*******************

yes...the beauty of old, simple computers for this!!!!!
think about an old 286 running old dos (c. 3.31),
such as my compaq 2551!

a) much less in the way of these caching problems, you can be sure!!!!!
b) so cheap it seems like a joke!!!!!
c) also...shareware (or freeware) development systems
   you can get and try (or use) for nothing...

        chasm
        a86
        fmodula2
        desmet pcc
        micro c (dave dunfield)
        small c
        (hi-tech) pacific ppd
        c--
        (interpreted languages too!)
        ubasic
        xlisp
        pc-lisp
        icon (may not have tried this myself)
        snobol (may not have tried this myself)
        abc (used this a small amount)
        etc.

*****************end r.s. response*********************
Ralph Silverman
z007400b@bcfreenet.seflin.lib.fl.us