From: Robert Eachus
Newsgroups: comp.lang.ada
Subject: Re: Poor performance after upgrate to xubuntu 17.10
Date: Sun, 22 Oct 2017 15:04:17 -0700 (PDT)
Message-ID: <2908c15e-0359-460e-824d-f832da32d196@googlegroups.com>
In-Reply-To: <498648ad-7a60-4847-b272-772383d197d0@googlegroups.com>

On Saturday, October 21, 2017 at 6:41:59 AM UTC-4, Charly wrote:
> My Hardware
> AMD FX(tm)-8350 Eight-Core Processor

Oh boy! Welcome to the wonderful world of modern tasking.
Intel chips with Hyperthreading and the new AMD Ryzens are different, but the issues come out the same: sometimes not all cores can be treated equally.

The 8350 has four modules with two cores each. Each core has its own L1 instruction and data cache. It shares a 2 MByte L2 cache with its partner in the module, and there is an 8 MByte L3 cache. I assume your program is small enough that the compute tasks' instructions and data fit into the L1 caches.

If you are using any floating point instructions or registers, that opens up more potential problems. I have some compute cores that work best on Bulldozer family AMD chips and Intel chips with Hyperthreading by using every other CPU number: 0, 2, 4, 6 in your case. But I don't think this code runs into that.

So far, so good. But it looks like you are getting tripped up by one or more data cache lines being shared between compute engines. (Instruction cache lines? Sharing is fine.) It could be an actual value shared among tasks, or several different values that get allocated in close proximity. I hope, and count on, task stacks not being adjacent, so this usually happens for (shared) variables in the parent of the tasks, or variables in the spec of generic library packages.

If this happens, the cache management will result in just what you are seeing. Owning that cache line will act like a ring token passed from task to task. Parallel and Ta_Types are the two packages I'd be suspicious of. The detail here that may be biting you is that the variables in these packages are on the main stack, not duplicated, if necessary, in each task stack.

Eventually you get to the point of paranoia where you make anything that goes on the main stack a multiple of 64 or 128 bytes and ensure that the compiler follows your intent. You also have the worker tasks copy as constants any main program variables that are, in their view, constants.

Finally, just good task programming.
If you expect to have each task on its own CPU core or thread, use affinities to tie them to specific cores. Why? Modern processors do not flush all caches when an interrupt is serviced. After an interrupt that doesn't flush them, you want the same task back on that CPU or thread. (In fact, some CPUs go further, and have ownership tags on the cache lines. So some data in cache can belong to the OS, and the rest of it to your task.)

Note that when setting affinities, CPU 0 becomes affinity 1, CPU 1 becomes affinity 2, CPU 2 becomes affinity 4, etc. For each thread there is a bit vector of the CPUs it can run on. On Windows, the argument is a hex number that converts to a bit vector. On a Hyperthreaded or Zen CPU, affinity 3 means run on either thread on CPU 0. In your case 3 would mean run on either of the processors in module 0, and so on. Setting affinity to 0 is not a good idea.

By the way, is the duplicate value 'X' in the declaration of Ta_Types.Chip_Name intentional? Certainly worth a comment if it is.
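
To make the affinity numbering above concrete, here is a minimal sketch of the mask arithmetic only. It is in Python purely for illustration (the thread itself concerns Ada), and it does not set any affinity. The function names affinity_mask, cpus_in_mask and module_mask are hypothetical, and module_mask assumes that logical CPUs 2*m and 2*m+1 belong to module m, which matches the description of the FX-8350 above but should be checked on the actual machine.

```python
def affinity_mask(cpus):
    """Return a bit-vector mask with one bit set per logical CPU number.

    CPU 0 maps to bit 0 (mask value 1), CPU 1 to bit 1 (mask value 2), etc.
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers start at 0")
        mask |= 1 << cpu
    return mask


def cpus_in_mask(mask):
    """Return the sorted logical CPU numbers whose bits are set in mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def module_mask(module, cores_per_module=2):
    """Mask allowing either core of one module.

    Assumption (not verified here): module m owns logical CPUs
    m * cores_per_module .. m * cores_per_module + cores_per_module - 1.
    """
    first = module * cores_per_module
    return affinity_mask(range(first, first + cores_per_module))
```

Under those assumptions, module_mask(0) is 3 (either core of module 0), module_mask(1) is 12 (hex C), and affinity_mask([0, 2, 4, 6]) is 85 (hex 55), the "every other CPU number" set. An empty list gives 0, which is the mask value the post warns against using.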