From: Colin Paul Gloster
Newsgroups: comp.lang.ada
Subject: Re: multicore-multithreading benchmarks
Date: 22 Dec 2006 17:12:24 GMT
Organization: CILEA

Dear Mr. Moran,

Tom Moran posted in news:g8OdnXUIoJrz2hbYnZ2dnUVZ_v63nZ2d@comcast.com :

"Karl Nyberg was kind enough to run the N-CPU quicksort test on his Sun "Try and Buy" evaluation T1000, with 8 cores, 4 threads per core and got the results below. For the large-N cases there is a significant speedup when N becomes a larger power of two. That's logical since the number of partitions at any given time is a power of two (approximately, since the split may not have been exactly even).

[..]

CPUs  N= 1000      N= 10000     N= 100000    N= 1000000
[..]
7     0.001457500  0.006942000  0.094361000  0.744371500
8     0.001456750  0.007131000  0.083258000  0.693531250
[..]
7     0.001418250  0.006858750  0.077237750  0.598086250
8     0.001422000  0.007047500  0.074811750  0.481415250
[..]
7     0.001445250  0.007033000  0.096396000  0.929958500
8     0.001444000  0.007120500  0.091150250  0.870250250"

We can note that with N= 10000, 8 "CPUs" are often slightly slower than 7 so-called "CPUs" in Nyberg's, Alexander's and my measurements (below), though this is not nearly as dramatic as the erratic timings of finding the first 32 prime numbers in Wai-Mee Ching and Alex Katz, "An Experimental APL Compiler for a Distributed Memory Parallel Machine", Supercomputing conference 1994.

Except for my last run, TRYSORTN was the only process reported by top to be using more than 2 "%CPU". For most of the duration of most runs, top reported TRYSORTN as using approximately 41%, sixty-something %, seventy-something % or 99.9 "%CPU", depending on the run.

I compiled TRYSORTN with gnatmake -O3 and ran it on a machine with four x86_64 cores (AMD Opterons, 1.8GHz), which in many cases was outperformed by the "dual AMD Opteron" which Alexander used ( news:1166492674.379186.85310@t46g2000cwa.googlegroups.com and news:1166497708.533742.140650@48g2000cwx.googlegroups.com ). It may be worthwhile if Alexander provides more details (e.g. clock speed), but these would probably still not account for the slowness of the four AMD Opterons.

If we divide the Sun Niagara T1000's numbers in news:g8OdnXUIoJrz2hbYnZ2dnUVZ_v63nZ2d@comcast.com by six ( news:icednfoKjsdI6hbYnZ2dnUVZ_r2onZ2d@comcast.com ), many of the resulting times are still slower than the "dual AMD Opteron"'s, so the approximately $14445 for the Sun Niagara T1000 might not be value for money.
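As a rough check of the two comparisons above, here is a minimal Python sketch using only the 7- and 8-"CPU" rows quoted from the T1000. The names T1000_BLOCKS, slower_with_8 and scaled are hypothetical, and the divisor of six is simply taken from the text; its justification is in the cited post and is not reproduced here.

```python
# Quoted T1000 timings in seconds: the 7- and 8-"CPU" rows of the three
# quoted blocks. Tuple positions: N=1000, N=10000, N=100000, N=1000000.
T1000_BLOCKS = [
    {7: (0.001457500, 0.006942000, 0.094361000, 0.744371500),
     8: (0.001456750, 0.007131000, 0.083258000, 0.693531250)},
    {7: (0.001418250, 0.006858750, 0.077237750, 0.598086250),
     8: (0.001422000, 0.007047500, 0.074811750, 0.481415250)},
    {7: (0.001445250, 0.007033000, 0.096396000, 0.929958500),
     8: (0.001444000, 0.007120500, 0.091150250, 0.870250250)},
]

COL_N10000 = 1
COL_N1000000 = 3
DIVISOR = 6  # assumption: taken from the text, not derived here


def slower_with_8(block, column):
    """True if the 8-"CPU" row took longer than the 7-"CPU" row."""
    return block[8][column] > block[7][column]


def scaled(block, cpus, column, divisor=DIVISOR):
    """Quoted time divided by the normalisation factor from the text."""
    return block[cpus][column] / divisor


flags_n10000 = [slower_with_8(b, COL_N10000) for b in T1000_BLOCKS]
flags_n1000000 = [slower_with_8(b, COL_N1000000) for b in T1000_BLOCKS]
print("8 slower than 7 at N=10000:  ", flags_n10000)
print("8 slower than 7 at N=1000000:", flags_n1000000)

for i, block in enumerate(T1000_BLOCKS, start=1):
    print(f"block {i}: 8-CPU N=1000000 time / {DIVISOR} = "
          f"{scaled(block, 8, COL_N1000000):.9f} s")
```

The scaled values can then be set beside the corresponding entries of whichever other machine's table is being compared; the "dual AMD Opteron" figures themselves are in the cited posts and are not included in this sketch.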
Regards,
Colin Paul Gloster

time ./TRYSORTN
CPUs  N= 1000      N= 10000     N= 100000    N= 1000000
1     0.000085250  0.001124500  0.013804250  0.170973750
2     0.000085250  0.000998500  0.009506500  0.092315750
3     0.000085250  0.000915500  0.008373250  0.081367000
4     0.000085000  0.001170250  0.009327500  0.074474500
5     0.000083750  0.001054250  0.008961750  0.076013750
6     0.000085250  0.001531250  0.009179000  0.075535000
7     0.000083750  0.001367250  0.009538000  0.065402000
8     0.000085250  0.001385250  0.007406250  0.064408750

real 0m4.674s
user 0m8.261s
sys 0m0.096s

time ./TRYSORTN
CPUs  N= 1000      N= 10000     N= 100000    N= 1000000
1     0.000081500  0.001108000  0.013810000  0.172433500
2     0.000082750  0.000849500  0.008702750  0.092600250
3     0.000082750  0.001033000  0.010324250  0.074759750
4     0.000082750  0.001243000  0.008898500  0.076791750
5     0.000082750  0.001232500  0.008550500  0.077942000
6     0.000084750  0.001320750  0.008310250  0.077038000
7     0.000082750  0.001440500  0.008635500  0.063810750
8     0.000083000  0.001594750  0.008516750  0.065588750

real 0m5.062s
user 0m8.337s
sys 0m0.100s

time ./TRYSORTN
CPUs  N= 1000      N= 10000     N= 100000    N= 1000000
1     0.000083250  0.001178750  0.013625500  0.172203750
2     0.000085000  0.001180750  0.009293750  0.092631750
3     0.000084750  0.001093250  0.009663750  0.076857000
4     0.000089750  0.000881500  0.008177500  0.057204750
5     0.000087000  0.000946500  0.008502000  0.056925750
6     0.000086500  0.001260000  0.007659000  0.057980000
7     0.000083500  0.001646750  0.008052000  0.056552750
8     0.000084750  0.001221000  0.007820750  0.063489000

real 0m4.242s
user 0m8.229s
sys 0m0.056s

time ./TRYSORTN
CPUs  N= 1000      N= 10000     N= 100000    N= 1000000
1     0.000089250  0.001124750  0.014132000  0.175325000
2     0.000090750  0.001117750  0.010619000  0.093166000
3     0.000090750  0.000858750  0.009654000  0.079514500
4     0.000091250  0.001423750  0.008418500  0.075662750
5     0.000090500  0.001086250  0.009355750  0.081312750
6     0.000090750  0.001260500  0.009833500  0.077358250
7     0.000091000  0.001032750  0.010370500  0.067873000
8     0.000090500  0.001441750  0.008750250  0.065373250

real 0m4.640s
user 0m8.357s
sys 0m0.124s

time ./TRYSORTN
CPUs  N= 1000      N= 10000     N= 100000    N= 1000000
1     0.000082750  0.001063250  0.014282000  0.170518750
2     0.000083750  0.000688250  0.011545500  0.093950500
3     0.000084000  0.000972500  0.010276250  0.085763250
4     0.000082500  0.001058500  0.009569500  0.092093500
5     0.000083000  0.001263500  0.012566750  0.076557250
6     0.000084000  0.001459500  0.009440250  0.074283000
7     0.000082250  0.001378750  0.008399000  0.079315750
8     0.000085750  0.001449000  0.010754750  0.071387000

real 0m4.843s
user 0m8.209s
sys 0m0.128s

time ./TRYSORTN
CPUs  N= 1000      N= 10000     N= 100000    N= 1000000
1     0.000090000  0.001139750  0.014450000  0.165776750
2     0.000086250  0.001113000  0.009400250  0.087525750
3     0.000091750  0.001160000  0.008839000  0.082454750
4     0.000088000  0.001009500  0.009091000  0.054664750
5     0.000086250  0.001241500  0.010316250  0.054852750
6     0.000086250  0.001354000  0.009450250  0.053322500
7     0.000086500  0.001272250  0.008177000  0.059826500
8     0.000086250  0.001562500  0.009065750  0.057336250

real 0m4.287s
user 0m8.069s
sys 0m0.064s

gnatmake -v
GNATMAKE 4.0.2 20051125 (Red Hat 4.0.2-8) Copyright 1995-2004 Free Software Foundation, Inc.

gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,java,f95,ada --enable-java-awt=gtk --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.0.2 20051125 (Red Hat 4.0.2-8)

uname --all
Linux urano.iet.unipi.it 2.6.16-1.2115_FC4smp #1 SMP Mon Jun 5 15:01:20 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
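
The runs above can be summarised with a little arithmetic. The following Python sketch uses only the N= 1000000 column and the time output of the first run; the names speedups and average_busy_cores are hypothetical, and reading (user + sys) / real as the average number of busy cores is an approximation that assumes the whole run is attributable to TRYSORTN's own threads.

```python
# First run above, N=1000000 column: seconds per sort for each "CPUs" setting.
RUN1_N1000000 = {
    1: 0.170973750,
    2: 0.092315750,
    3: 0.081367000,
    4: 0.074474500,
    5: 0.076013750,
    6: 0.075535000,
    7: 0.065402000,
    8: 0.064408750,
}

# Output of `time` for the same run, in seconds.
REAL_S, USER_S, SYS_S = 4.674, 8.261, 0.096


def speedups(times):
    """Speedup of each setting relative to the 1-"CPU" time."""
    base = times[1]
    return {cpus: base / t for cpus, t in sorted(times.items())}


def average_busy_cores(real_s, user_s, sys_s):
    """CPU time consumed per second of wall-clock time (rough estimate)."""
    return (user_s + sys_s) / real_s


for cpus, s in speedups(RUN1_N1000000).items():
    print(f"{cpus} CPUs: speedup {s:.2f}")

busy = average_busy_cores(REAL_S, USER_S, SYS_S)
print(f"average busy cores over the whole run: {busy:.2f}")
```

On these figures the speedup at 8 "CPUs" stays well below 4 on the four-core machine, and the whole-run average of busy cores is below 2, which is at least consistent with the "%CPU" figures reported by top being below 100 for much of most runs.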