[PerfFix] Avoid separate thread for MP executor shm spin #28012
njhill merged 6 commits into vllm-project:main
Conversation
Code Review
This pull request refactors the MultiprocExecutor to avoid using a separate thread for handling asynchronous results, aiming to fix a performance regression. The approach is to keep a queue of futures on the main thread and process them when a result is requested or a blocking call is made. While the overall direction is sound and the removal of the ThreadPoolExecutor simplifies the logic, I've identified a few critical issues in the implementation concerning execution order and API consistency, particularly in the MultiprocExecutor and RayExecutor.
The remaining failing tests appear to be unrelated and are also failing elsewhere.
mgoin left a comment
This looks reasonable to me although I'm not the right person to know this area. Approving for now to unblock release issues.
Can you add a benchmark result to show how this resolves the perf regression?
Thanks @mgoin, I've added the benchmark results above.
…m-project#28012)" This reverts commit c9f66da. Signed-off-by: NickLucche <nlucches@redhat.com>
)" (#28289) Signed-off-by: NickLucche <nlucches@redhat.com>
#26866 included a change to MultiprocExecutor to always busy-wait on the shm message queue from a separate thread, even when async scheduling and pipeline parallel aren't used.
This appears to have drastic performance consequences, possibly due to CPU contention / context switches of the spin with concurrent work happening in the scheduler process (e.g. serialization/deserialization).
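The contention pattern described above can be illustrated with a minimal, hypothetical sketch (not vLLM's actual code): a dedicated thread busy-polls a result queue, burning CPU even while it has nothing to do, which competes with concurrent work on the scheduler's threads.

```python
import queue
import threading
import time

def spin_reader(q: queue.Queue, results: list, stop: threading.Event) -> None:
    # Hypothetical sketch of the pre-PR pattern: a dedicated thread
    # busy-waits on the result queue, contending for CPU with the
    # scheduler process's own work (serialization, etc.).
    while not stop.is_set():
        try:
            results.append(q.get_nowait())  # non-blocking poll = spin
        except queue.Empty:
            continue  # burns CPU instead of sleeping or blocking

q: queue.Queue = queue.Queue()
results: list = []
stop = threading.Event()
t = threading.Thread(target=spin_reader, args=(q, results, stop), daemon=True)
t.start()
q.put("worker-output")  # a worker posts its result
time.sleep(0.1)         # give the spin thread time to pick it up
stop.set()
t.join()
print(results)
```

The spin loop receives results promptly, but at the cost of a core that is never yielded; the PR's approach avoids paying that cost when no one is waiting.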
This PR reworks the Future mechanism to busy-wait on the queue from the main thread while it is blocking on Future.get_result(). Apart from addressing the non-async performance regression, this may hopefully further improve async scheduling and pipeline parallel performance.
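The main-thread approach can be sketched as a future that drains the shared queue only while the caller is actually blocked waiting on it. This is an illustrative simplification (the class name and single-queue setup are assumptions, not vLLM's implementation):

```python
import queue
from concurrent.futures import Future

class DeferredFuture(Future):
    """Hypothetical sketch: instead of being completed by a background
    spin thread, this future drains the shared result queue on the
    *calling* thread when .result() is invoked, mirroring the PR's idea
    of only spinning on the shm queue while blocked for a result."""

    def __init__(self, q: queue.Queue):
        super().__init__()
        self._q = q

    def result(self, timeout=None):
        if not self.done():
            # The wait happens here, on the main thread, and only while
            # the caller actually needs the result.
            self.set_result(self._q.get(timeout=timeout))
        return super().result(timeout)

shm_queue: queue.Queue = queue.Queue()
fut = DeferredFuture(shm_queue)
shm_queue.put(42)        # worker posts its output
answer = fut.result()    # drained on the calling thread; no extra thread
print(answer)
```

With no background thread, there is nothing spinning between calls, so idle executors no longer contend for CPU with the scheduler.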
Note: The Executor collective_rpc method with non_block=True has been changed to return a future that resolves to a list rather than a list of futures.
Benchmark
Pre-#26866
Post-#26866
After this PR:
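The collective_rpc return-shape change noted above can be sketched as follows. The function names and pre-resolved futures are illustrative stand-ins, not the real Executor API:

```python
from concurrent.futures import Future

# Old shape (illustrative): one future per worker rank.
def collective_rpc_old(worker_outputs: list) -> list[Future]:
    futs = []
    for out in worker_outputs:
        f: Future = Future()
        f.set_result(out)  # pre-resolved here for demonstration only
        futs.append(f)
    return futs

# New shape (illustrative): a single future resolving to the full list.
def collective_rpc_new(worker_outputs: list) -> Future:
    f: Future = Future()
    f.set_result(list(worker_outputs))
    return f

outs = ["rank0-result", "rank1-result"]
old_results = [f.result() for f in collective_rpc_old(outs)]  # N waits
new_results = collective_rpc_new(outs).result()               # one wait
print(old_results, new_results)
```

Callers that previously iterated over a list of futures now block once on a single future and receive the whole list, which fits naturally with draining the queue from the main thread.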