[llm] upgrade vllm to 0.12.0#58026
Conversation
This pull request has been automatically marked as stale because it has not had recent activity. You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed.
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
…emp-vllm-0.12.0 Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
# Conflicts:
#   python/deplocks/llm/rayllm_py311_cpu.lock
#   python/deplocks/llm/rayllm_py311_cu128.lock
#   python/deplocks/llm/rayllm_test_py311_cpu.lock
#   python/deplocks/llm/rayllm_test_py311_cu128.lock
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
…rgument 'tokens_only'

Addresses ray-project#58973
- vLLM release 0.11.1 introduces a `tokens_only` argument to both FrontendArgs and EngineArgs. VLLMEngine.start() gathers arguments from both of them, which raises errors when collisions occur.
- Allow different argument sets to define the same argument by name, give precedence to the engine args in case of collisions, then merge the dicts.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
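The collision-tolerant merge described above can be sketched as follows. This is a minimal illustration of the precedence rule, not the actual VLLMEngine code; the helper name and the sample argument dicts are assumptions:

```python
def merge_arg_dicts(frontend_args: dict, engine_args: dict) -> dict:
    """Merge two argument dicts that may define the same key by name.

    With dict unpacking, later entries override earlier ones, so
    engine_args takes precedence on collisions (e.g. 'tokens_only').
    """
    return {**frontend_args, **engine_args}


# Hypothetical FrontendArgs-style and EngineArgs-style dicts that both
# define 'tokens_only'; the engine's value wins in the merged result.
merged = merge_arg_dicts(
    {"host": "0.0.0.0", "tokens_only": False},
    {"model": "Qwen/Qwen2.5-0.5B", "tokens_only": True},
)
```

The design choice here is to resolve the collision silently by a fixed precedence order rather than raising, since both argument sets come from the same vLLM release and agree on the argument's meaning.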
- fixing ci error with `//ci/raydepsets:raydepsets -- build --all-configs` Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
release tests all passing except for `llm_batch_vllm_multi_node` (jailed) - tracked here #58062 (comment)
PR to fix (by Rui) - #58866
nrghosh left a comment
python/ray/llm/tests/batch/gpu/processor/test_vllm_engine_proc.py::test_embedding_model is failing due to
RuntimeError: flashinfer-cubin version (0.5.3) does not match flashinfer version (0.5.2)
there is a mismatch between flashinfer-python (0.5.2) and flashinfer-cubin (0.5.3). The lock file pins flashinfer-python==0.5.2, but then vLLM 0.11.2 pulls in flashinfer-cubin==0.5.3 as a transitive dependency.
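The failing runtime check above boils down to an equality test between the two wheel versions. A minimal sketch of that kind of check, with the version strings passed in directly (the function name is an assumption, not flashinfer's actual code):

```python
def check_flashinfer_versions(py_ver: str, cubin_ver: str) -> None:
    """Raise if flashinfer-python and flashinfer-cubin are out of sync.

    Mirrors the error seen in CI: the two wheels must be pinned to the
    same version, or the import-time check fails.
    """
    if py_ver != cubin_ver:
        raise RuntimeError(
            f"flashinfer-cubin version ({cubin_ver}) does not match "
            f"flashinfer version ({py_ver})"
        )
```

The practical fix on the lock-file side is to pin both packages to the same version explicitly, so a transitive bump of one cannot drift away from the other.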
…n required by vllm 0.11.2 Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
test failure for /doc:source/data/doc_code/working-with-llms/embedding_example
msgspec.ValidationError: Expected `bool`, got `None` - at `$[4][10]`
issue is in vllm - fixed here (merged): vllm-project/vllm#29364 - so pushing to 0.12.0
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
vLLM renamed guided_decoding to structured_outputs and changed the embedding API:
- SamplingParams: GuidedDecodingParams -> StructuredOutputsParams, guided_decoding -> structured_outputs (vllm-project/vllm#22772, vllm-project/vllm#29326)
- Embedding: use encode(pooling_params=...) instead of generate(sampling_params=...) for pooling tasks (vllm-project/vllm#16188, vllm-project/vllm#25524)
- EngineArgs: guided_decoding_backend -> structured_outputs_config

The user-facing "guided_decoding" key in the sampling_params dict is preserved for backwards compatibility.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
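The backwards-compatibility shim for the renamed key can be sketched as a small dict translation. This is an illustrative helper under the renames listed above, not the actual Ray code; the function name is an assumption:

```python
def translate_sampling_params(params: dict) -> dict:
    """Map the legacy 'guided_decoding' key to the new vLLM name.

    Users may still pass 'guided_decoding' in their sampling_params
    dict; internally it is forwarded under 'structured_outputs'.
    The input dict is copied, not mutated.
    """
    params = dict(params)
    if "guided_decoding" in params and "structured_outputs" not in params:
        params["structured_outputs"] = params.pop("guided_decoding")
    return params
```

Keeping the translation at the dict boundary means user-facing configs keep working unchanged while the engine only ever sees the new key.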
after this merges, will undo transformers pin
ctx: #58980 (review)
eicherseiji left a comment
Ideally we can drop the transformers pin in the next vLLM version bump
cc @eicherseiji the multi-gpu tests (dp_pd_example and dp_basic_example.py) are OOMing on warmup (at first glance) - is reducing sequence length ok, or did something change with the compute configs? They both run on this branch on a workspace with 4x L4.
Yeah @nrghosh we can reduce
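Reducing the sequence length for the OOMing examples would look roughly like the config fragment below. The exact Ray config keys and values here are assumptions for illustration; `max_model_len` and `gpu_memory_utilization` are standard vLLM engine arguments:

```python
# Hypothetical engine config tweak to lower warmup memory on 4x L4:
# cap the max sequence length below the model default so the KV-cache
# and warmup buffers fit.
llm_config = {
    "engine_kwargs": {
        "max_model_len": 8192,        # reduced from the model's default
        "gpu_memory_utilization": 0.85,
    },
}
```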
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
nrghosh left a comment
looks good @kouroshHakha
@aslonnie @richardliaw can you guys approve?
Related prs that we should review when upgrading fully:
- ray-project#58820
- Note from Rui: when we bump to a new vllm version, we should go with 0.11.2 instead of 0.11.1, which fixes a Ray multi-node PP regression that was introduced when adding torch-based PP: https://github.com/vllm-project/vllm/releases/tag/v0.11.2

Issues:
- closes ray-project#58937
- closes ray-project#58973
- closes ray-project#58702

---------

Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Co-authored-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
Co-authored-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>