From: jonathan
Newsgroups: comp.lang.ada
Subject: Re: Tasking for Mandelbrot program
Date: Mon, 28 Sep 2009 12:52:36 -0700 (PDT)
Message-ID: <59856ad8-2434-4370-a1df-875b46b3b7bc@o41g2000yqb.googlegroups.com>
References: <4abebaf4$0$31342$9b4e6d93@newsspool4.arcor-online.net> <4abfd8df$0$31337$9b4e6d93@newsspool4.arcor-online.net>

On Sep 27, 10:27 pm, Georg Bauhaus wrote:
> - writing the image bytes with Stream_IO removes 6% running time
>   when compared to GNAT.IO.Put.
>   This adds standard Ada but also adds about 10 lines of code for
>   the Put procedure and a Stdout variable. Is it worth it?

Speeding up IO would give you a clearly measurable improvement in the
multi-core benchmark. The present program parallelizes the computation
well, and the remaining problem is a small but irritating IO overhead
that can't be parallelized.

Here are a few timings on 8 cores. Perfect parallelization would give
a speed-up factor of 8.

With Output enabled:

  No_Of_Workers (tasks) =  8, speed-up factor = 4.45
  No_Of_Workers (tasks) = 16, speed-up factor = 6.30
  No_Of_Workers (tasks) = 24, speed-up factor = 6.66
  No_Of_Workers (tasks) = 32, speed-up factor = 6.86

With Output disabled, it is nearer the optimal speed-up factor of 8:

  No_Of_Workers (tasks) = 32, speed-up factor = 7.66

The actual benchmark uses 4 cores, so I suspect that the present
standard setting of No_Of_Workers = 16 is good.

For those who are interested in this problem as much as I am, a few
more words of explanation ...

The difficulty with mandelbrot is that if you parallelize it by
breaking it up into work-segments (break up the outer loop into
segments of equal length), then some work-segments finish quickly and
others slowly, so we have a load balancing problem. The solution Georg
came up with breaks the problem into a number of independent tasks
four times greater than the number of cores. The operating system
successfully distributes the tasks over the cores in such a way that
the cores do comparable amounts of work. (Hope my description is
accurate.)

Jonathan
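
The load-balancing argument can be illustrated with a toy model. This is only a sketch, written in Python rather than Ada, and it is not the benchmark program. The names row_cost and simulated_speedup are made up for the illustration. The model assumes that the run time of a row is proportional to its escape-time iteration count, that each equal-length segment runs to completion on whichever core becomes free first (an idealized stand-in for the operating system's scheduler), and that there is no IO at all. The numbers it prints are therefore not comparable to the timings quoted above.

```python
"""Toy model of load balancing for a row-parallel Mandelbrot computation."""
import heapq


def row_cost(y, size=120, max_iter=50):
    """Total escape-time iterations for image row y (a proxy for its run time)."""
    ci = 2.0 * y / size - 1.0          # imaginary part, in [-1, 1)
    total = 0
    for x in range(size):
        cr = 3.0 * x / size - 2.0      # real part, in [-2, 1)
        zr = zi = 0.0
        n = 0
        while n < max_iter and zr * zr + zi * zi <= 4.0:
            zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
            n += 1
        total += n
    return total


def simulated_speedup(costs, n_segments, n_cores):
    """Speed-up of an idealized schedule over running every row on one core.

    Rows are split into n_segments contiguous segments of (nearly) equal
    length. Segments are handed out in order, each to the core that becomes
    free first.
    """
    n = len(costs)
    bounds = [n * i // n_segments for i in range(n_segments + 1)]
    segments = [sum(costs[a:b]) for a, b in zip(bounds, bounds[1:])]
    cores = [0] * n_cores              # min-heap of per-core finish times
    for seg in segments:
        earliest = heapq.heappop(cores)
        heapq.heappush(cores, earliest + seg)
    return sum(costs) / max(cores)


if __name__ == "__main__":
    costs = [row_cost(y) for y in range(120)]
    for workers in (8, 16, 24, 32):
        print(workers, round(simulated_speedup(costs, workers, 8), 2))
```

With one segment per core, the slowest segment sets the total run time. With more segments than cores, a core that finishes a cheap segment can pick up another one, which is the effect the extra worker tasks are meant to achieve.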