Fix Unbound background thread pool affinity #607

Merged
glommer merged 6 commits into DataDog:master from vlovich:fix-thread-pool-affinity
Oct 6, 2023
Conversation

vlovich (Contributor) commented Oct 4, 2023

What does this PR do?

Creating a local executor with a non-Unbound placement but whose background thread pool uses an Unbound placement now correctly leaves those background threads free-floating across all CPUs, instead of pinning them to the CPU affinity assigned to the executor.

This is accomplished by creating the background thread pool before binding the executor's CPU affinity.
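The ordering matters because a spawned thread snapshots the parent's CPU affinity at creation time. A minimal std-only model of the fix (the `AFFINITY` mask and the `spawn_pool_thread`/`bind_executor_to_cpu` names are illustrative, not glommio's API; the real code manipulates OS affinity via `sched_setaffinity`):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Stand-in for the process-wide CPU mask; usize::MAX models "Unbound".
static AFFINITY: AtomicUsize = AtomicUsize::new(usize::MAX);

// Model of thread creation: the child keeps the mask in effect at spawn
// time, just as clone(2) copies the parent's affinity mask.
fn spawn_pool_thread() -> thread::JoinHandle<usize> {
    let inherited = AFFINITY.load(Ordering::SeqCst);
    thread::spawn(move || inherited)
}

fn bind_executor_to_cpu(cpu: usize) {
    AFFINITY.store(1 << cpu, Ordering::SeqCst);
}

fn main() {
    // The fix: create the blocking pool *before* binding the executor...
    let pool = spawn_pool_thread();
    // ...so that this later pinning does not leak into the pool threads.
    bind_executor_to_cpu(0);
    assert_eq!(pool.join().unwrap(), usize::MAX); // pool stays unbound
}
```

Spawning the pool first means the pool threads inherit the still-unbound mask; only the executor thread is pinned afterwards.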

Motivation

It seems like buggy behavior for an executor's affinity to break the Unbound placement of its background threads. It violated the principle of least surprise: my pinned executor was dispatching blocking work to 16 threads, yet all of that work was still being scheduled on a single CPU.

Additional Notes

If a non-Unbound placement for background threads is provided, this PR has no effect.

Checklist

[X] I have added unit tests to the code I am submitting
[X] My unit tests cover both failure and success scenarios
[ ] If applicable, I have discussed my architecture

```rust
    mut config: LocalExecutorConfig,
) -> Result<LocalExecutor> {
    let blocking_thread =
        BlockingThreadPool::new(config.thread_pool_placement, notifier.clone())?;
```
vlovich (Contributor, Author) commented on this diff:
Technically this may now have problems with the NUMA logic below? I'm not 100% sure about the impact.

The other way to do this might be to retrieve the CPU affinity prior to binding and pass it into BlockingThreadPool, so that each thread resets its CPU affinity when it starts?

glommer (Collaborator) replied:

I think this should be fine. The executor is still allocated in the node where it wants to run, and the BlockingThreadPool is way less performance sensitive.

@vlovich vlovich force-pushed the fix-thread-pool-affinity branch from fa1647c to 9dd747a Compare October 4, 2023 02:48
vlovich (Contributor, Author) commented Oct 4, 2023

I don't think the build failure in the tests is related to any of my changes, is it? I'm unable to reproduce it locally, but I have a newer Rust version (1.72 vs. 1.58). Maybe the minimum Rust version needs to be bumped?

When I try to run 1.58 locally I get

```
cargo +1.58 check --lib
    Updating crates.io index
error: failed to select a version for the requirement `memchr = "^2.3.3"`
candidate versions found which didn't match: 2.5.0, 2.4.1, 2.4.0, ...
location searched: crates.io index
required by package `futures-lite v1.13.0`
    ... which satisfies dependency `futures-lite = "^1.11.1"` (locked to 1.13.0) of package `examples v0.0.0 (glommio/examples)`
```

When I exclude examples from the workspace, the next failure is:

```
cargo +1.58 check --lib
    Updating crates.io index
error: failed to select a version for the requirement `once_cell = "^1.8"`
candidate versions found which didn't match: 1.17.2, 1.17.1, 1.17.0, ...
location searched: crates.io index
required by package `ahash v0.7.6`
    ... which satisfies dependency `ahash = "^0.7"` (locked to 0.7.6) of package `glommio v0.8.0 (glommio/glommio)`
```

```rust
#[test]
fn blocking_function_placement_independent_of_executor_placement() {
    let num_schedulable_cpus = LocalExecutorBuilder::new(Placement::Fixed(0))
        .blocking_thread_pool_placement(PoolPlacement::Unbound(4))
```
glommer (Collaborator) commented on this diff:

I am afraid this test will break when executed on machines with fewer than 4 CPUs.

Should we skip it in that scenario? It's still a valuable test.

vlovich (Contributor, Author) replied:

This doesn't actually need 4 CPUs. It can be reduced to 2, and I'll adjust it to skip the test if only 1 online CPU is found.
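The skip guard described here can be sketched with std's CPU-count query (the `enough_cpus` helper is hypothetical, not a glommio API):

```rust
use std::thread;

// Hypothetical guard: returns false when too few CPUs are online for the
// placement assertion to be meaningful.
fn enough_cpus(min: usize) -> bool {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
        >= min
}

fn main() {
    if !enough_cpus(2) {
        // Bail out early instead of failing on single-CPU machines.
        eprintln!("skipping placement test: fewer than 2 online CPUs");
        return;
    }
    println!("running placement test");
}
```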

glommer (Collaborator) replied:

that'd be better

glommer (Collaborator) commented Oct 4, 2023

@vlovich I'm okay bumping. 1.58 is very old, and if I try, I get a different set of errors altogether, meaning this won't be easy to track down.

glommer (Collaborator) commented Oct 5, 2023

I opened https://github.com/DataDog/glommio/actions/runs/6412940182/job/17411069816?pr=608, but there's now a flurry of clippy stuff.

I'll take a look tomorrow.

backtrace picked up a new addr2line in a patch release that depends on
1.65.
```
warning: some crates are on edition 2021 which defaults to `resolver = "2"`, but virtual workspaces default to `resolver = "1"`
note: to keep the current resolver, specify `workspace.resolver = "1"` in the workspace root's manifest
note: to use the edition 2021 resolver, specify `workspace.resolver = "2"` in the workspace root's manifest
```
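One way to address that resolver warning (a sketch, not part of this PR) is to pin the resolver explicitly in the workspace root's Cargo.toml:

```toml
# Workspace root Cargo.toml: opt in to the edition-2021 resolver.
# The members list is illustrative of this repo's layout.
[workspace]
members = ["glommio", "examples"]
resolver = "2"
```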
@vlovich vlovich force-pushed the fix-thread-pool-affinity branch 2 times, most recently from c2ac572 to 6e6fd78 Compare October 5, 2023 04:30
vlovich (Contributor, Author) commented Oct 5, 2023

I bumped it to 1.65 here, since the root cause (at least for me locally) was backtrace acquiring a new dependency on addr2line, which requires 1.65. Maybe other packages had updated dependencies too, but that was the obvious one. I'm not sure about the policy for picking the minimum Rust version (picking the latest stable today also seems like a valid strategy).

@vlovich vlovich force-pushed the fix-thread-pool-affinity branch from 6e6fd78 to 35e6c4c Compare October 5, 2023 04:52
vlovich (Contributor, Author) commented Oct 5, 2023

@glommer one thing that nightly clippy warns me about that is the single warning not resolved by this PR:

```
warning: usage of an `Arc` that is not `Send` or `Sync`
   --> glommio/src/channels/channel_mesh.rs:311:20
    |
311 |             peers: Arc::new(RwLock::new(Vec::new())),
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: the trait `Sync` is not implemented for `RwLock<Vec<Peer>>`
    = note: required for `Arc<RwLock<Vec<Peer>>>` to implement `Send` and `Sync`
    = help: consider using an `Rc` instead or wrapping the inner type with a `Mutex`
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#arc_with_non_send_sync
    = note: `#[warn(clippy::arc_with_non_send_sync)]` on by default
```

The warning seems accurate: this is glommio's RwLock, which is single-threaded only, so wrapping it in an Arc seems superfluous, since it can't be shared between threads anyway. Changing it to an Rc compiles fine, but I didn't dig into whether that conversion carries any risks.

EDIT: Actually, it fails to compile without the change, so I'll just assume for now that it should be Rc and that there's no subtle bomb I'm creating, as long as the tests pass.
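A minimal illustration of why the lint fires, using std's RefCell as a stand-in for glommio's single-threaded RwLock: an Arc around a `!Sync` type is itself neither `Send` nor `Sync`, so it could never leave the thread anyway, and a plain Rc provides the same shared ownership without the atomic refcount.

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    // RefCell stands in for glommio's single-threaded RwLock here.
    // Arc<RefCell<_>> would compile, but since RefCell is !Sync the Arc
    // could never be sent to another thread -- hence clippy's
    // arc_with_non_send_sync lint, and hence Rc is the better fit.
    let peers: Rc<RefCell<Vec<u32>>> = Rc::new(RefCell::new(Vec::new()));
    let handle = Rc::clone(&peers); // cheap, non-atomic refcount bump
    handle.borrow_mut().push(42);
    assert_eq!(*peers.borrow(), vec![42]);
}
```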

@vlovich vlovich force-pushed the fix-thread-pool-affinity branch from 35e6c4c to e545ade Compare October 5, 2023 05:26
Any threads we spawn inherit our affinity. This means that background
thread pools for any executor that doesn't have an "Unbound" affinity
will only ever run on the executor's CPU, rather than distributing work
across many CPUs. This seems undesirable, given that the background
thread pool placement can be set independently of the executor it's
associated with. Additionally, the background thread pool placement
defaults to the executor's placement, so this has no impact on
background threads that didn't have a custom placement set (and if a
custom placement was set, the wrong thing was likely happening).
@vlovich vlovich force-pushed the fix-thread-pool-affinity branch from e545ade to 7da7f76 Compare October 5, 2023 05:28
glommer (Collaborator) commented Oct 6, 2023

We should be fine with an Rc here.

@glommer glommer merged commit e3bdf40 into DataDog:master Oct 6, 2023
@vlovich vlovich deleted the fix-thread-pool-affinity branch October 6, 2023 20:08