comp.lang.ada
From: Robert Eachus <rieachus@comcast.net>
Subject: Re: Poor performance after upgrate to xubuntu 17.10
Date: Sun, 22 Oct 2017 15:04:17 -0700 (PDT)
Message-ID: <2908c15e-0359-460e-824d-f832da32d196@googlegroups.com> (raw)
In-Reply-To: <498648ad-7a60-4847-b272-772383d197d0@googlegroups.com>

On Saturday, October 21, 2017 at 6:41:59 AM UTC-4, Charly wrote: 
> My Hardware
> AMD FX(tm)-8350 Eight-Core Processor

Oh boy! Welcome to the wonderful world of modern tasking.  Intel chips with Hyperthreading and the new AMD Ryzens are different, but the issues come out the same: sometimes not all cores can be treated equally.

The 8350 has four modules with two cores each. Each core has its own L1 instruction and data caches. It shares a 2 MB L2 cache with its partner in the module, and all four modules share an 8 MB L3 cache. I assume your program is small enough that the compute tasks' instructions and data fit into the L1 caches.

If you are using any floating-point instructions or registers, that opens up more potential problems. I have some compute kernels that work best on Bulldozer-family AMD chips and on Intel chips with Hyper-Threading when I use every other CPU number: 0, 2, 4, 6 in your case. But I don't think this code runs into that.

So far, so good. But it looks like you are getting tripped up by one or more data cache lines being shared between compute tasks. (Instruction cache lines? Sharing those is fine.) It could be an actual value shared among tasks, or several different values that happen to be allocated in close proximity. I hope, and count on, task stacks not being adjacent, so this usually happens with (shared) variables in the parent of the tasks, or variables in the spec of a generic library package.

If this happens, the cache management will produce just what you are seeing: ownership of that cache line acts like a ring token passed from task to task. Parallel and Ta_Types are the two packages I'd be suspicious of. The detail that may be biting you is that the variables in these packages live in the main program's storage; they are not duplicated in each task's stack.
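A minimal sketch of the pattern described above, with hypothetical names (Shared_Counters, Count_A, Count_B are mine, not from your program): two small variables declared together in a library package spec will very likely land in the same 64-byte cache line, so two tasks that each "own" one of them still fight over the line.

```ada
--  Hypothetical example of accidental cache-line sharing ("false
--  sharing").  Neither variable is logically shared, but they are
--  8 bytes apart, so they almost certainly occupy one cache line.
package Shared_Counters is
   Count_A : Long_Integer := 0;  --  written only by task Worker_A
   Count_B : Long_Integer := 0;  --  written only by task Worker_B
   --  One cache line, two writers on two cores: every store by one
   --  worker invalidates the line in the other core's cache, and the
   --  line ping-pongs between modules exactly like a ring token.
end Shared_Counters;
```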

Eventually you get to the point of paranoia where you pad anything that goes in the main program's storage to a multiple of 64 or 128 bytes and ensure that the compiler follows your intent. You also have the worker tasks copy, as local constants, any main-program variables that are, from their point of view, constants.
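One way to sketch that padding, using Ada 2012 aspects (the names are again hypothetical): wrap each per-task value in a record that fills a whole cache line and is aligned on a cache-line boundary, so no two of them can share a line.

```ada
--  Hypothetical fix: give each per-task counter its own cache line.
package Padded_Counters is
   Cache_Line_Bytes : constant := 64;

   type Padded_Count is record
      Value : Long_Integer := 0;
      --  Explicit padding out to a full cache line (8 + 56 = 64 bytes).
      Pad   : String (1 .. Cache_Line_Bytes - 8) := (others => ' ');
   end record
     with Alignment => Cache_Line_Bytes;

   Count_A : Padded_Count;  --  written only by Worker_A
   Count_B : Padded_Count;  --  written only by Worker_B
end Padded_Counters;
```

Check the generated layout (e.g. with GNAT's -gnatR) to be sure the compiler actually honored the alignment and didn't reorder or pack things behind your back.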

Finally, just good task programming. If you expect to have each task on its own CPU core or thread, use affinities to tie them to specific cores. Why? Modern processors do not necessarily flush all caches when an interrupt is serviced, so when the interrupt handler returns you want the same task back on that CPU or thread. (In fact, some CPUs go further and put ownership tags on cache lines, so some data in the cache can belong to the OS and the rest to your task.)

Note that when setting affinities, CPU 0 becomes affinity 1, and so on. Each thread has a bit vector of the threads it may run on; on Windows, the argument is a hex number that converts to that bit vector. On a Hyper-Threaded or Zen CPU, affinity 3 means run on either thread of CPU 0; in your case, 3 would mean run on either of the two cores in module 0, and so on. Setting the affinity to 0 is not a good idea.
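In Ada 2012 you can do this portably with the CPU aspect and System.Multiprocessors, rather than OS-specific masks. A sketch (Worker, Core, and the particular CPU numbers are illustrative): CPU values here are 1-based, so CPU 1 in Ada is what the OS calls CPU 0.

```ada
with System.Multiprocessors;  --  Ada 2012, RM D.16

procedure Pin_Workers is
   use System.Multiprocessors;

   --  Each Worker is pinned to the CPU given by its discriminant.
   task type Worker (Core : CPU) with CPU => Core;

   task body Worker is
   begin
      null;  --  compute kernel goes here
   end Worker;

   --  On an FX-8350, CPUs 1 and 3 (Ada numbering) are OS CPUs 0 and 2:
   --  one core from each of two different modules, so the workers do
   --  not share an L2 cache or an FPU with each other.
   W1 : Worker (Core => 1);
   W2 : Worker (Core => 3);
begin
   null;  --  wait for the workers to terminate
end Pin_Workers;
```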
 
By the way, is the duplicate value 'X' in the declaration of Ta_Types.Chip_Name intentional? Certainly worth a comment if it is.
