Fix Unbound background thread pool affinity#607
Conversation
| mut config: LocalExecutorConfig, | ||
| ) -> Result<LocalExecutor> { | ||
| let blocking_thread = | ||
| BlockingThreadPool::new(config.thread_pool_placement, notifier.clone())?; |
There was a problem hiding this comment.
So technically this now maybe has problems with the NUMA logic below? Not 100% sure about the impact.
The other way to do this maybe is to retrieve the CPU affinity prior to binding & passing that into BlockingThreadPool so that it resets the CPU affinity when the thread starts?
There was a problem hiding this comment.
I think this should be fine. The executor is still allocated in the node where it wants to run, and the BlockingThreadPool is way less performance sensitive.
fa1647c to
9dd747a
Compare
|
I don't think the build failure in the tests has to do with any of my changes does it? I'm unable to repro locally but I have a newer Rust version (1.72 vs 1.58). Maybe the minimum Rust version needs to be bumped? When I try to run 1.58 locally I get When I exclude |
glommio/src/executor/mod.rs
Outdated
| #[test] | ||
| fn blocking_function_placement_independent_of_executor_placement() { | ||
| let num_schedulable_cpus = LocalExecutorBuilder::new(Placement::Fixed(0)) | ||
| .blocking_thread_pool_placement(PoolPlacement::Unbound(4)) |
There was a problem hiding this comment.
I am afraid this test will break when executed in machines with less than 4 CPUs.
Should we skip it in that scenario? It's still a valuable test
There was a problem hiding this comment.
This doesn't actually need 4 CPUs. Can be reduced to 2 and I'll adjust to skip the test if there's only 1 online CPU found.
|
@vlovich I'm okay bumping. 1.58 is very old, and if I try, I get a different set of errors altogether, meaning this won't be easy to track down. |
|
I opened https://github.com/DataDog/glommio/actions/runs/6412940182/job/17411069816?pr=608, but there's now a flurry of clippy stuff. I'll take a look tomorrow. |
backtrace picked up a new addr2line in a patch release that depends on 1.65.
``` warning: some crates are on edition 2021 which defaults to `resolver = "2"`, but virtual workspaces default to `resolver = "1"` note: to keep the current resolver, specify `workspace.resolver = "1"` in the workspace root's manifest note: to use the edition 2021 resolver, specify `workspace.resolver = "2"` in the workspace root's manifest ```
c2ac572 to
6e6fd78
Compare
|
I bumped it to 1.65 here as it seems like the root cause (at least for me locally) was backtrace acquiring a new dep on addr2line which required 1.65. Maybe other packages had updated dependencies too, but that was the obvious one. Not sure about the policy about how to pick the minimum Rust version (picking the latest stable today also seems like a valid strategy). |
6e6fd78 to
35e6c4c
Compare
|
@glommer one thing that nightly clippy warns me about that is the single warning not resolved by this PR: The warning seems accurate since this is glommio's RwLock which is single-thread only, so wrapping it in Arc seems superfluous since it can't be shared between threads anyway. Certainly changing it to Rc seems to compile everything fine, but I didn't dig into it more deeply than that if there are risks with that conversion. EDIT: Actually, it fails to compile without it so just going to assume for now it should be |
Resolves clippy errors that are now caught: bitflags/bitflags#373
35e6c4c to
e545ade
Compare
Any threads we spawn inherit our affinity. This means that background thread pools for any executors that don't have an affinity of "Unbound" will only ever run on the executor's CPU rather than distributing work across many CPUs. This seems undesirable given that the background thread pool placement can be set independent of the executor it's associated with. Additionally, the background thread pool placement by default uses the pool placement so this has no impact on background threads that didn't have a custom placement set (& if the placement was set, it was likely the wrong thing happening).
e545ade to
7da7f76
Compare
|
We should be fine with an RC here. |
What does this PR do?
Creating a local executor with a non-Unbound placement but with a background thread of
Unboundplacement now correctly has those background threads free-floating across any CPU instead of pinned to the CPU affinity assigned to the executor.This is accomplished by creating the background thread pool before we bind the executor's CPU affinity.
Motivation
Seems like buggy behavior for the executor having an affinity breaking
Unboundplacement for background threads. Violated principle of least surprise when my pinned executor was doing blocking dispatch to 16 threads but all that work was still getting scheduled on just 1 thread.Additional Notes
If a non-unbound placement for background threads is provided, this PR has no effect.
Checklist
[X] I have added unit tests to the code I am submitting
[X] My unit tests cover both failure and success scenarios
[] If applicable, I have discussed my architecture