[llm][deps] Upgrade vLLM to 0.18.0 (#61952)
Conversation
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Code Review
This pull request upgrades vLLM to version 0.18.0. The changes include updating the vLLM version in the Dockerfile, requirements files, and regenerating the dependency lock files. A notable improvement is the removal of a workaround for a vLLM pickling issue, which is presumably fixed in this new version. The review identifies one out-of-scope change in python/requirements_compiled.txt that should be moved to a separate PR to keep this one focused.
aslonnie left a comment
does this fix the LD_LIBRARY_PATH thing?
No, the branch for this vLLM version was cut on Thursday. The fix (whether in vLLM or how we build Ray LLM images) is still in progress.
kouroshHakha left a comment
Thanks for the upgrade and the error handling fix — the approach of mirroring api_router.py's pattern at the Ray Serve boundary makes sense.
Two main concerns:

- Embeddings endpoint is broken on 0.18.0: vLLM renamed the attribute from `openai_serving_embedding` to `serving_embedding` (line 332, unchanged in this PR) and replaced `create_embedding()` with a callable class. This PR still uses the old names. See #61959 for the fix; those changes need to be folded into this PR. Separately: the fact that this breakage wasn't caught means we're missing release test coverage for the embeddings endpoint. Can we add an embedding model to the serve integration tests?
- A bare `except Exception` will mask server-side errors: OOMs, `AttributeError`, etc. will be swallowed and returned as 400s to the user. This should be narrowed to `VLLMValidationError`. See inline comments.
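To illustrate the masking concern, here is a minimal sketch (the class and function names are stand-ins, not vLLM's actual code) of why the handler should be narrowed: only validation errors should map to HTTP 400, while genuine server-side failures must keep propagating so they surface as 500s.

```python
class VLLMValidationError(ValueError):
    """Stand-in for vLLM's validation error, which subclasses ValueError."""


def handle(call):
    try:
        return 200, call()
    except ValueError as e:  # narrow: validation errors only -> 400
        return 400, str(e)
    # AttributeError, CUDA OOM, etc. are NOT caught here; they propagate
    # and surface as 500s instead of being mislabeled as client errors.


def bad_request():
    raise VLLMValidationError("temperature must be between 0 and 2")


def server_bug():
    raise AttributeError("'NoneType' object has no attribute 'encode'")


print(handle(bad_request))  # (400, 'temperature must be between 0 and 2')
try:
    handle(server_bug)      # propagates: would become a 500 upstream
except AttributeError as e:
    print("propagated:", e)
```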
Note
This review was co-written with AI assistance (Claude Code).
Cursor Bugbot has reviewed your changes and found 1 potential issue.
release/llm_tests/serve/test_llm_serve_multi_node_integration.py (two resolved review threads, now outdated)
```python
        request,
        raw_request=raw_request,
    )
except ValueError as e:
```
Catch `ValueError` because `VLLMValidationError` subclasses it; this covers vLLM's schema/grammar validation errors.
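A two-line demonstration of the point: an `except ValueError` handler also catches subclasses, so it covers a `VLLMValidationError`-style type. The class here is an illustrative stand-in.

```python
class VLLMValidationError(ValueError):
    """Illustrative stand-in mirroring the subclass relationship."""


caught = None
try:
    raise VLLMValidationError("guided-decoding grammar is invalid")
except ValueError as e:  # the base-class handler catches the subclass
    caught = e

print(type(caught).__name__)  # VLLMValidationError
```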
kouroshHakha left a comment
Re-review: previous feedback has been addressed well.
- `except Exception` narrowed to `except ValueError`, which correctly covers `VLLMValidationError` without masking server errors.
- The `_make_error_response` safety net is clean: it re-raises the original exception if `create_error_response` fails.
- Embeddings API adapted (attribute rename, callable invocation, Starlette Response parsing).
- New integration tests for embedding and score endpoints fill the coverage gap.
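A hedged sketch of the safety-net behavior described above: delegate to a `create_error_response`-style mapper, but if the mapper itself fails, re-raise the original exception so it is never masked. The function names are illustrative, not Ray's exact implementation.

```python
def make_error_response(create_error_response, exc):
    """Map an exception to an error payload, never hiding the original."""
    try:
        return create_error_response(str(exc))
    except Exception:
        raise exc  # mapper failure must not mask the real error


def working_mapper(message):
    return {"object": "error", "message": message, "code": 400}


def broken_mapper(message):
    raise RuntimeError("error mapper itself crashed")


print(make_error_response(working_mapper, ValueError("bad prompt")))
try:
    make_error_response(broken_mapper, ValueError("bad prompt"))
except ValueError as e:
    print("original re-raised:", e)
```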
One minor question inline about whether the isinstance(embedding_response, VLLMErrorResponse) branch is still reachable in 0.18. Not blocking.
LGTM pending CI.
Note
This review was co-written with AI assistance (Claude Code).
```python
            yield self._make_error_response(self._oai_serving_embedding, e)
            return

        if isinstance(embedding_response, VLLMErrorResponse):
```
Nit: Is this isinstance(embedding_response, VLLMErrorResponse) branch still reachable in vLLM 0.18? If the callable now raises exceptions for errors (caught by the except ValueError above) rather than returning VLLMErrorResponse, this may be dead code. Fine to leave as a safety net, but worth confirming.

Description
vLLM 0.18.0 moved validation error handling from `create_completion`/`create_chat_completion` into `api_router.py`. Since Ray Serve LLM calls these methods directly (bypassing `api_router.py`), validation errors such as an invalid temperature or a too-long prompt now propagate as unhandled exceptions, returning 500 instead of 400.

We wrap these calls with try/except in `vllm_engine.py`, mirroring `api_router.py`'s pattern and delegating to vLLM's `create_error_response` for correct status-code mapping. Related files: https://github.com/ray-project/ray/pull/61952/changes#diff-ef22ec52e4a4e7156a9c391f529e5b1fc9f0e06fceb9397b2364094c113fc858.
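A minimal sketch, under assumed names, of that wrapping pattern: call the serving method directly, catch the validation errors that `api_router.py` would otherwise translate, and delegate status-code mapping to a `create_error_response`-style helper. None of this is vLLM's or Ray's real code; every identifier is an illustrative stand-in.

```python
import asyncio


class VLLMValidationError(ValueError):
    """Stand-in for vLLM's validation error (subclasses ValueError)."""


def create_error_response(exc):
    # Stand-in for vLLM's status-code mapping helper.
    return {"object": "error", "message": str(exc), "code": 400}


async def create_chat_completion(request):
    # Stand-in serving method: in 0.18.0 it raises on bad input
    # instead of returning an error body itself.
    if not 0.0 <= request.get("temperature", 1.0) <= 2.0:
        raise VLLMValidationError("temperature must be between 0 and 2")
    return {"object": "chat.completion", "choices": []}


async def wrapped_chat_completion(request):
    try:
        return await create_chat_completion(request)
    except ValueError as e:  # VLLMValidationError subclasses ValueError
        return create_error_response(e)


print(asyncio.run(wrapped_chat_completion({"temperature": 5.0})))
print(asyncio.run(wrapped_chat_completion({"temperature": 0.7})))
```

With this wrapper, an invalid request comes back as a 400-style error payload while well-formed requests pass through untouched, matching the `api_router.py` behavior the description cites.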
An alternative was adding `VLLMValidationError` handling in Ray's middleware, but that would duplicate vLLM's mapping logic and couple a vLLM-agnostic layer to vLLM-specific exceptions.

Related issues
Additional information