[PerfFix] Avoid separate thread for MP executor shm spin#28012

Merged
njhill merged 6 commits into vllm-project:main from njhill:nonblock-execmodel
Nov 4, 2025
Conversation

Member

@njhill njhill commented Nov 4, 2025

#26866 included a change to MultiprocExecutor to always busy-wait on the shm message queue from a separate thread, even when async scheduling and pipeline parallel aren't used.

This appears to have drastic performance consequences, possibly due to CPU contention / context switches of the spin with concurrent work happening in the scheduler process (e.g. serialization/deserialization).

This PR reworks the Future mechanism to busy-wait on the queue from the main thread while it is blocking on Future.get_result().
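Mechanically, that amounts to something like the following minimal sketch (`ShmFuture` and `ToyShmQueue` are illustrative names, not vLLM's actual classes — the real code spins on a shared-memory message queue): `get_result()` dequeues completions on the calling thread until the future it was called on resolves.

```python
import collections


class ToyShmQueue:
    """Stand-in for the shm message queue; messages are preloaded here,
    whereas the real queue is filled by worker processes."""

    def __init__(self):
        self._pending = collections.deque()

    def enqueue(self, item):
        self._pending.append(item)

    def dequeue_busy_wait(self):
        # The real implementation busy-waits on shared memory.
        while not self._pending:
            pass
        return self._pending.popleft()


class ShmFuture:
    """Future whose get_result() drains the queue on the calling thread,
    removing the need for a dedicated spin thread."""

    def __init__(self, queue):
        self._queue = queue
        self._done = False
        self._result = None

    def set_result(self, result):
        self._result = result
        self._done = True

    def get_result(self):
        # Each dequeued message completes the next outstanding future;
        # futures resolved along the way are already done by the time
        # their own get_result() is called.
        while not self._done:
            fut, result = self._queue.dequeue_busy_wait()
            fut.set_result(result)
        return self._result
```

Draining on the calling thread means the spin only happens while the scheduler is actually blocked waiting on a result, avoiding the CPU contention described above.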

Besides addressing the non-async performance regression, this may also further improve async scheduling and pipeline parallel performance.

Note: The Executor collective_rpc method with non_block=True has been changed to return a future that resolves to a list rather than a list of futures.
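In terms of shape, the change described above can be mimicked with standard `concurrent.futures` primitives (`aggregate` is a hypothetical helper for illustration, not vLLM's API): instead of handing callers a list of per-worker futures, expose one future that resolves to the ordered list of results.

```python
from concurrent.futures import Future


def aggregate(futures):
    """Combine per-worker futures into a single future that resolves to
    the list of all results, in worker order. Illustrative only."""
    combined = Future()
    results = [None] * len(futures)
    remaining = [len(futures)]

    for i, f in enumerate(futures):
        def _on_done(fut, i=i):
            results[i] = fut.result()
            remaining[0] -= 1
            if remaining[0] == 0:
                combined.set_result(results)
        f.add_done_callback(_on_done)

    return combined

# Old shape: results = [f.result() for f in per_worker_futures]
# New shape: results = aggregate(per_worker_futures).result()
```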

Benchmark

vllm bench latency --model meta-llama/Llama-3.1-8B-Instruct --batch-size 256 --input-len 2048 --output-len 2048 -tp 4 --num-iters-warmup 1 --num-iters 3

Latencies in seconds, rounded to 0.01 s:

| Percentile | Pre-#26866 | Post-#26866 | After this PR |
|---|---|---|---|
| Avg | 41.69 | 47.78 | 40.97 |
| 10% | 41.67 | 47.59 | 40.95 |
| 25% | 41.67 | 47.68 | 40.96 |
| 50% | 41.68 | 47.84 | 40.97 |
| 75% | 41.70 | 47.90 | 40.98 |
| 90% | 41.72 | 47.94 | 40.98 |
| 99% | 41.73 | 47.96 | 40.99 |

@njhill njhill added this to the v0.11.1 milestone Nov 4, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the MultiprocExecutor to avoid using a separate thread for handling asynchronous results, aiming to fix a performance regression. The approach is to use a queue of futures on the main thread, processing them when a result is requested or a blocking call is made. While the overall direction is sound and simplifies the logic by removing the ThreadPoolExecutor, I've identified a few critical issues in the implementation concerning execution order and API consistency, particularly in the MultiprocExecutor and RayExecutor.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 4, 2025
@njhill njhill mentioned this pull request Nov 4, 2025
Signed-off-by: Nick Hill <nhill@redhat.com>
@njhill njhill force-pushed the nonblock-execmodel branch from ad98efe to 3c65afe on November 4, 2025 00:26
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
@njhill njhill force-pushed the nonblock-execmodel branch from 00d65b3 to 84cc1fa on November 4, 2025 01:24
Signed-off-by: Nick Hill <nhill@redhat.com>
Member Author

njhill commented Nov 4, 2025

The remaining failing tests appear to be unrelated and are also failing elsewhere.

Member

@mgoin mgoin left a comment


This looks reasonable to me although I'm not the right person to know this area. Approving for now to unblock release issues.
Can you add a benchmark result to show how this resolves the perf regression?

@njhill njhill merged commit c9f66da into vllm-project:main Nov 4, 2025
51 checks passed
@njhill njhill deleted the nonblock-execmodel branch November 4, 2025 16:34
Member Author

njhill commented Nov 4, 2025

Thanks @mgoin, I've added the benchmark results above.

NickLucche added a commit to NickLucche/vllm that referenced this pull request Nov 7, 2025
…m-project#28012)"

This reverts commit c9f66da.

Signed-off-by: NickLucche <nlucches@redhat.com>
DarkLight1337 pushed a commit that referenced this pull request Nov 7, 2025
)" (#28289)

Signed-off-by: NickLucche <nlucches@redhat.com>
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

kv-connector ready ONLY add when PR is ready to merge/full CI is needed v1
