Support pinning when running parallel simulations on same hardware

Currently, Shadow's CPU-pinning logic assumes it has the entire machine to itself. In particular, if two instances of Shadow run concurrently with pinning enabled, they'll both pin to the same set of CPUs. It'd be good if we could avoid this, particularly on shared hardware where other users might be running simulations at the same time.

Some options:

* Out-of-band global coordination: In Shadow, call `sched_getaffinity` before doing any pinning, and pin only to the CPUs in that initial affinity set. A user could then use a tool like `taskset(1)` to assign each Shadow simulation a disjoint set of CPUs to work with. This option is pretty easy to implement in Shadow, but puts the burden of global coordination on the user. EDIT: Now implemented in #1575
* Shadow global coordination: Each instance of Shadow could have a "pid" file that records which CPUs it's using. When Shadow starts up it would check for existing pid files, validate that those pids are still running, read them to find which CPUs are in use, choose its own set of CPUs disjoint from those in the current files, and write its own pid file. Some care would be needed to avoid race conditions (maybe just a global lock file). This removes the burden of global coordination from the user, but adds substantial complexity to Shadow, including global mutable state.
* Flexible pinning: Let the Linux scheduler choose the initial CPU to run each worker thread on, and then pin the worker and its managed thread to that CPU. We might also want to periodically unpin to give the Linux scheduler a chance to choose a different CPU. This strategy would let the scheduler handle picking idle CPUs. The potential downsides are more CPU migrations (if/when we unpin to allow the scheduler to reassign), and giving up control over the initial assignment (which currently tries to maximize cache affinity and avoid using multiple logical CPUs on the same core).
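
To make the first option concrete, here is a minimal sketch of affinity-respecting pinning. It is written in Python for brevity and is not Shadow's actual implementation: the function names `allowed_cpus`, `assign_workers`, and `pin_current_thread` are hypothetical, and the round-robin assignment is a stand-in for Shadow's real topology-aware logic. It relies only on `os.sched_getaffinity` and `os.sched_setaffinity`, which are available on Linux.

```python
import os


def allowed_cpus(pid: int = 0) -> list[int]:
    """Return the CPUs the given process may run on (0 = the caller), sorted.

    This is the set a user would have restricted beforehand, e.g. with taskset(1).
    """
    return sorted(os.sched_getaffinity(pid))


def assign_workers(n_workers: int, cpus: list[int]) -> dict[int, int]:
    """Map each worker index to one CPU, using only the CPUs in `cpus`.

    Assumption: plain round-robin. It ignores cache topology and
    hyperthread siblings, which a real assignment would consider.
    """
    if not cpus:
        raise ValueError("no CPUs available for pinning")
    return {worker: cpus[worker % len(cpus)] for worker in range(n_workers)}


def pin_current_thread(cpu: int) -> None:
    """Pin the caller to a single CPU.

    On Linux, passing 0 to sched_setaffinity applies to the calling thread.
    """
    os.sched_setaffinity(0, {cpu})
```

With something like this in place, two simulations launched as `taskset -c 0-3 ...` and `taskset -c 4-7 ...` would each see only their own four CPUs from `allowed_cpus()`, so their worker assignments could not overlap. If there are more workers than allowed CPUs, the sketch simply reuses CPUs.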

Support pinning when running parallel simulations on same hardware #1565
