
[llm] upgrade vllm to 0.12.0 #58026

Merged

richardliaw merged 21 commits into ray-project:master from kouroshHakha:kh/temp-vllm-0.12.0 on Dec 8, 2025

Conversation

@kouroshHakha
Contributor

@kouroshHakha kouroshHakha commented Oct 23, 2025

Related PRs that we should review when upgrading fully:

Issues:

Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com>
@github-actions

github-actions bot commented Nov 6, 2025

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Nov 6, 2025
@github-actions github-actions bot added unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it. and removed stale The issue is stale. It will be closed within 7 days unless there are further conversation labels Nov 20, 2025
@eicherseiji eicherseiji self-assigned this Nov 20, 2025
@eicherseiji eicherseiji added the go add ONLY when ready to merge, run all tests label Nov 20, 2025
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
…emp-vllm-0.12.0

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
nrghosh and others added 5 commits December 1, 2025 16:06
# Conflicts:
#	python/deplocks/llm/rayllm_py311_cpu.lock
#	python/deplocks/llm/rayllm_py311_cu128.lock
#	python/deplocks/llm/rayllm_test_py311_cpu.lock
#	python/deplocks/llm/rayllm_test_py311_cu128.lock
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
…rgument 'tokens_only'

Addresses ray-project#58973

- vLLM release 0.11.1 introduces a tokens_only argument to both
  FrontendArgs and EngineArgs. VLLMEngine.start() gathers arguments from
  both of them, which raises an error when the names collide
- Allow different argument sets to define the same arguments by name:
  merge the dicts, giving precedence to the engine args in case of
  collisions
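The collision handling described above amounts to a plain dict merge where the later set wins. A minimal sketch (the function name and sample keys are hypothetical, not the actual VLLMEngine.start() code):

```python
def merge_arg_sets(frontend_args: dict, engine_args: dict) -> dict:
    """Merge argument sets that may define the same keys by name.

    In a dict literal the later unpacking wins on collisions, so
    engine args take precedence over frontend args (e.g. when both
    define 'tokens_only').
    """
    return {**frontend_args, **engine_args}


frontend = {"tokens_only": False, "host": "0.0.0.0"}
engine = {"tokens_only": True, "model": "Qwen/Qwen2.5-0.5B"}
merged = merge_arg_sets(frontend, engine)
# The engine's 'tokens_only' value wins; non-colliding keys are kept.
```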

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@nrghosh nrghosh changed the title from "[wip] ongoing pr for upgrading to next vllm release based on the state of nightly" to "[wip] upgrade vllm to 0.11.2" Dec 3, 2025
- Fix CI error with `//ci/raydepsets:raydepsets -- build
--all-configs`

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Contributor

@nrghosh nrghosh left a comment

release tests all passing except for [jailed]llm_batch_vllm_multi_node (None) (0) - tracked here #58062 (comment)

[screenshot: release test results]

PR to fix (by Rui) - #58866

Contributor

@nrghosh nrghosh left a comment

python/ray/llm/tests/batch/gpu/processor/test_vllm_engine_proc.py::test_embedding_model is failing due to

RuntimeError: flashinfer-cubin version (0.5.3) does not match flashinfer version (0.5.2)

There is a mismatch between flashinfer-python (0.5.2) and flashinfer-cubin (0.5.3): the lock file pins flashinfer-python==0.5.2, but vLLM 0.11.2 pulls in flashinfer-cubin==0.5.3 as a transitive dependency.
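The failure mode above (a lock file pinning one package while a transitive dependency pulls in a newer companion build) can be caught early with a version-consistency check. A hypothetical sketch, not flashinfer's actual startup check:

```python
def companion_mismatch(installed: dict, pkg: str, companion: str):
    """Return a mismatch message if two companion packages disagree.

    `installed` maps package name -> version string, e.g. parsed
    from a lock file. Mirrors the flashinfer-python /
    flashinfer-cubin RuntimeError seen in the failing test.
    """
    v1, v2 = installed.get(pkg), installed.get(companion)
    if v1 and v2 and v1 != v2:
        return f"{companion} version ({v2}) does not match {pkg} version ({v1})"
    return None


locked = {"flashinfer-python": "0.5.2", "flashinfer-cubin": "0.5.3"}
msg = companion_mismatch(locked, "flashinfer-python", "flashinfer-cubin")
# msg reproduces the shape of the RuntimeError from the test failure
```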

…n required by vllm 0.11.2

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Contributor

@nrghosh nrghosh left a comment

test failure for /doc:source/data/doc_code/working-with-llms/embedding_example

msgspec.ValidationError: Expected `bool`, got `None` - at `$[4][10]`

The issue is in vLLM and is fixed upstream (merged): vllm-project/vllm#29364 - so pushing to 0.12.0

@nrghosh nrghosh changed the title from "[wip] upgrade vllm to 0.11.2" to "[wip] upgrade vllm to 0.12.0" Dec 4, 2025
nrghosh and others added 2 commits December 3, 2025 20:42
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
elliot-barn and others added 5 commits December 5, 2025 01:47
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
vLLM renamed guided_decoding to structured_outputs and changed
the embedding API:

- SamplingParams: GuidedDecodingParams -> StructuredOutputsParams,
  guided_decoding -> structured_outputs (vllm-project/vllm#22772,
  vllm-project/vllm#29326)

- Embedding: use encode(pooling_params=...) instead of
  generate(sampling_params=...) for pooling tasks
  (vllm-project/vllm#16188, vllm-project/vllm#25524)

- EngineArgs: guided_decoding_backend -> structured_outputs_config

The user-facing "guided_decoding" key in the sampling_params dict is
preserved for backwards compatibility.
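The backwards-compat layer can be sketched as a key rename pass over the user-supplied dict. The old/new key names follow the rename described above, but translate_sampling_params is a hypothetical helper, not the actual Ray code:

```python
# Legacy user-facing key -> new vLLM 0.12.0 name (per the
# guided_decoding -> structured_outputs rename).
_RENAMED_KEYS = {"guided_decoding": "structured_outputs"}


def translate_sampling_params(params: dict) -> dict:
    """Accept the legacy 'guided_decoding' key but emit the new name,
    so user code written against older vLLM versions keeps working."""
    return {_RENAMED_KEYS.get(key, key): value for key, value in params.items()}


legacy = {"temperature": 0.0, "guided_decoding": {"json": {"type": "object"}}}
translated = translate_sampling_params(legacy)
# 'guided_decoding' is mapped to 'structured_outputs'; other keys pass through.
```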

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@nrghosh nrghosh changed the title from "[wip] upgrade vllm to 0.12.0" to "[llm] upgrade vllm to 0.12.0" Dec 7, 2025
Contributor

@nrghosh nrghosh left a comment

after this merges, will undo transformers pin

ctx: #58980 (review)

eicherseiji left a comment
Ideally we can drop the transformers pin in the next vLLM version bump

cc @eicherseiji the multi-gpu tests (dp_pd_example and dp_basic_example.py) are OOMing on warmup (at first glance) - is reducing sequence length ok, or did something change with the compute configs? They both run on this branch on a workspace with 4x L4 GPUs.

@eicherseiji
Contributor

Yeah @nrghosh we can reduce max_model_len and max_num_seqs
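Both knobs are real vLLM engine arguments, but the numbers below are illustrative, not the values the examples actually shipped with. Lowering max_model_len shrinks the per-sequence KV-cache reservation and lowering max_num_seqs caps concurrent sequences, which together reduce warmup memory on small GPUs like the 4x L4 setup:

```python
# Hypothetical engine kwargs for the multi-GPU examples; the exact
# values are assumptions for illustration, chosen to fit L4-class
# GPUs rather than taken from the merged config.
engine_kwargs = {
    "max_model_len": 8192,  # cap context length instead of the model's full window
    "max_num_seqs": 64,     # cap concurrent sequences during warmup/serving
}
```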

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Contributor

@nrghosh nrghosh left a comment

looks good @kouroshHakha


@kouroshHakha kouroshHakha marked this pull request as ready for review December 8, 2025 20:18
@kouroshHakha
Contributor Author

@aslonnie @richardliaw can you guys approve?

Contributor

@richardliaw richardliaw left a comment

stamp

@richardliaw richardliaw merged commit b06b8a2 into ray-project:master Dec 8, 2025
6 checks passed
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
Related PRs that we should review when upgrading fully:
- ray-project#58820
- Note from Rui: When we bump new vllm version, we should go with 0.11.2
instead of 0.11.1, which fixes a Ray multi-node PP regression that was
introduced when adding torch-based PP
https://github.com/vllm-project/vllm/releases/tag/v0.11.2

Issues:
- closes ray-project#58937
- closes ray-project#58973
- closes ray-project#58702

---------

Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Co-authored-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
Co-authored-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>

Labels

- go: add ONLY when ready to merge, run all tests
- unstale: A PR that has been marked unstale. It will not get marked stale again if this label is on it.

Projects

None yet

5 participants