Conversation
thread. Each plugin thread adopts the worker's affinity value during `thread_new` and `thread_continue` procedure calls. The cleanup routine in affinity.c still needs to be improved, and a command-line switch to selectively enable pinning still needs to be added.
…ity call to thread_resume
Codecov Report
@@ Coverage Diff @@
## dev #1063 +/- ##
==========================================
+ Coverage 54.96% 55.05% +0.09%
==========================================
Files 127 129 +2
Lines 19235 19276 +41
Branches 4590 4601 +11
==========================================
+ Hits 10572 10613 +41
+ Misses 5896 5885 -11
- Partials 2767 2778 +11
I've not done any experiments with Phantom, but on the current release version of Shadow, I've run some experiments showing that pinning to the A sides of CPUs (effectively disabling hyperthreads) is noticeably faster than using all logical CPUs. Specifically, on some large experiments, setting the number of workers to the number of physical cores and pinning to the A sides was significantly faster than pinning to both A and B sides, whether the number of workers was set to the number of physical or logical cores (though the details will likely vary with the specific workload, and that was pinning the entire process, not each thread, so migrations were still happening).

Pinning can be accomplished with numactl, but since A and B sides are frequently (but not always) interleaved numerically, a user who doesn't know to do that and runs Shadow with fewer workers than logical CPUs will likely end up with the workers bunched up on the same physical cores.

It might also be worth trying to keep threads on the same socket when multiple are available (and, strictly speaking, also bind to the memory of those NUMA nodes, though I think Linux is smart enough to handle that step on its own?). I don't know if there's a standard way to get all this information via glibc/syscalls, but it is available elsewhere on the system.
robgjansen
left a comment
I think we tested the case that @jtracey found works better for classic shadow when we were testing phantom (i.e., effectively disabling hyperthreading); could you summarize our conclusions there?
    assert(_ncpus > 0 && pid >= 0);

    // We can short-circuit if there's no work to do.
    if (!_affinity_enabled || new_cpu_num == AFFINITY_UNINIT || new_cpu_num == old_cpu_num) {
If new_cpu_num == AFFINITY_UNINIT then don't we want to "unpin" the process to allow the scheduler to run it anywhere?
I'm using AFFINITY_UNINIT to denote the case when AFFINITY is unknown or has never been set (as opposed to the case when affinity is set to all CPUs). I think it'd be better to introduce a new variable to unpin/"unset" affinity if we want that behavior.
src/main/host/affinity.c (Outdated)

    }

    if (!set_affinity_suceeded) {
        warning("Could not set CPU affinity for PID %d to %d", (int)pid, new_cpu_num);
If we use the cpu-pin option and we can't pin, should that be an error or a warning? IIRC, Jim thought error makes more sense, rather than continuing to run the simulation without pinning and potentially tricking the user that pinning is active when it isn't. I'm thinking that logging a warning makes sense but then refusing to run the experiment. That's more work to code because you'll have to propagate the error back, but does it give us the behavior we want?
I promoted the message to critical so that the user has a higher likelihood of seeing it.
Crashing with an error makes me nervous because there are potentially thousands of affinity changes that could occur over the course of an experiment (during work stealing, for example). If a single one of those affinity changes doesn't succeed, I'm not sure the experiment should crash since it could otherwise complete successfully with negligible performance impact.
Regardless, if we want to make this a crash instead of a warning, I think it should be done at the call site and not inside this module.
Crashing with an error makes me nervous because there are potentially thousands of affinity changes that could occur over the course of an experiment
Haven't looked at the code in this PR yet but when I made that comment I believe the proposed pinning was all done during startup. If something goes wrong during startup I think it makes sense to crash. I agree we want to avoid crashing in the middle of a simulation if we can.
    #include <sched.h>
    #include <sys/types.h>

    enum { AFFINITY_UNINIT = -1 };
[optional nit] A brief note describing this would be nice.
OK, I added a comment.
Yeah, so most of the performance improvement due to pinning, I believe, comes from reducing the cost of context switching (this only applied in the process-oriented mode).

I agree with Justin that there's more benefit that could be gained by intelligently pinning workers to particular CPUs, for example taking NUMA nodes into account. Some experiments we've run so far show that pinning to hyperthread cores pessimizes performance, but on the machines I'm running experiments on, the HT cores are all high-numbered, so they can be avoided by pinning workers only to the low-numbered CPUs.

Thanks for the feedback @jtracey !
    }

    cpu_set_t* cpu_set = NULL;
    bool set_affinity_suceeded = false;
typo, 'suceeded' (I noticed the typo while reading the discussion :)
We decided to close this PR and open two new ones in order to make sure we have the pinning feature in both the master (classic Shadow) and dev (Phantom) branches. The first PR will add the feature to master, then we'll merge master to dev, and the second PR will make the changes needed to do the pinning in dev.
Ok, removing myself as a reviewer on this one, then :)
This PR adds an optional flag that enables CPU pinning for Shadow workers and plugins. Workers are pinned uniformly among the logical CPUs available on the machine. Plugin threads are pinned to the same logical CPU as their worker. This behavior is enabled with the --cpu-pin=1 flag.

Pinning can yield a significant performance improvement: on syscall-dense workloads, I am measuring a 20-50% improvement in performance (using --cpu-pin=1 and --preload-spin-max=0).