[PerfFix] Avoid separate thread for MP executor shm spin #28012
njhill merged 6 commits into vllm-project:main
Conversation
Code Review
This pull request refactors the MultiprocExecutor to avoid using a separate thread for handling asynchronous results, aiming to fix a performance regression. The approach is to keep a queue of futures on the main thread and process them when a result is requested or a blocking call is made. While the overall direction is sound and the removal of the ThreadPoolExecutor simplifies the logic, I've identified a few critical issues in the implementation concerning execution order and API consistency, particularly in the MultiprocExecutor and RayExecutor.
The remaining failing tests appear to be unrelated and are also failing elsewhere.
mgoin left a comment
This looks reasonable to me although I'm not the right person to know this area. Approving for now to unblock release issues.
Can you add a benchmark result to show how this resolves the perf regression?
Thanks @mgoin, I've added the benchmark results above.
…m-project#28012)" This reverts commit c9f66da. Signed-off-by: NickLucche <nlucches@redhat.com>
)" (#28289) Signed-off-by: NickLucche <nlucches@redhat.com>
#26866 included a change to MultiprocExecutor to always busy-wait on the shm message queue from a separate thread, even when async scheduling and pipeline parallel aren't used.
This appears to have drastic performance consequences, possibly due to CPU contention / context switches of the spin with concurrent work happening in the scheduler process (e.g. serialization/deserialization).
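The contention pattern described above can be illustrated with a minimal, hypothetical sketch (not vLLM's actual code): a dedicated thread busy-polls a result queue, burning CPU even while it has nothing to do, which competes with concurrent work on the scheduler's threads.

```python
import queue
import threading
import time

def spin_reader(q: queue.Queue, results: list, stop: threading.Event) -> None:
    # Hypothetical sketch of the pre-PR pattern: a dedicated thread
    # busy-waits on the result queue, contending for CPU with the
    # scheduler process's own work (serialization, etc.).
    while not stop.is_set():
        try:
            results.append(q.get_nowait())  # non-blocking poll = spin
        except queue.Empty:
            continue  # burns CPU instead of sleeping or blocking

q: queue.Queue = queue.Queue()
results: list = []
stop = threading.Event()
t = threading.Thread(target=spin_reader, args=(q, results, stop), daemon=True)
t.start()
q.put("worker-output")  # a worker posts its result
time.sleep(0.1)         # give the spin thread time to pick it up
stop.set()
t.join()
print(results)
```

The spin loop receives results promptly, but at the cost of a core that is never yielded; the PR's approach avoids paying that cost when no one is waiting.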
This PR reworks the Future mechanism to busy-wait on the queue from the main thread while it is blocking on Future.get_result(). Apart from addressing the non-async performance regression, this may hopefully further improve async scheduling and pipeline parallel performance.
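The main-thread approach can be sketched as a future that drains the shared queue only while the caller is actually blocked waiting on it. This is an illustrative simplification (the class name and single-queue setup are assumptions, not vLLM's implementation):

```python
import queue
from concurrent.futures import Future

class DeferredFuture(Future):
    """Hypothetical sketch: instead of being completed by a background
    spin thread, this future drains the shared result queue on the
    *calling* thread when .result() is invoked, mirroring the PR's idea
    of only spinning on the shm queue while blocked for a result."""

    def __init__(self, q: queue.Queue):
        super().__init__()
        self._q = q

    def result(self, timeout=None):
        if not self.done():
            # The wait happens here, on the main thread, and only while
            # the caller actually needs the result.
            self.set_result(self._q.get(timeout=timeout))
        return super().result(timeout)

shm_queue: queue.Queue = queue.Queue()
fut = DeferredFuture(shm_queue)
shm_queue.put(42)        # worker posts its output
answer = fut.result()    # drained on the calling thread; no extra thread
print(answer)
```

With no background thread, there is nothing spinning between calls, so idle executors no longer contend for CPU with the scheduler.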
Note: The Executor collective_rpc method with non_block=True has been changed to return a future that resolves to a list rather than a list of futures.
Benchmark
Pre-#26866
Post-#26866
After this PR:
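The collective_rpc return-shape change noted above can be sketched as follows. The function names and pre-resolved futures are illustrative stand-ins, not the real Executor API:

```python
from concurrent.futures import Future

# Old shape (illustrative): one future per worker rank.
def collective_rpc_old(worker_outputs: list) -> list[Future]:
    futs = []
    for out in worker_outputs:
        f: Future = Future()
        f.set_result(out)  # pre-resolved here for demonstration only
        futs.append(f)
    return futs

# New shape (illustrative): a single future resolving to the full list.
def collective_rpc_new(worker_outputs: list) -> Future:
    f: Future = Future()
    f.set_result(list(worker_outputs))
    return f

outs = ["rank0-result", "rank1-result"]
old_results = [f.result() for f in collective_rpc_old(outs)]  # N waits
new_results = collective_rpc_new(outs).result()               # one wait
print(old_results, new_results)
```

Callers that previously iterated over a list of futures now block once on a single future and receive the whole list, which fits naturally with draining the queue from the main thread.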