
[llm][deps] Upgrade vLLM to 0.18.0#61952

Merged
kouroshHakha merged 11 commits into master from vllm-0.18.0 on Mar 25, 2026

Conversation

Contributor

@jeffreywang-anyscale jeffreywang-anyscale commented Mar 22, 2026

Description

vLLM 0.18.0 moved validation error handling from create_completion / create_chat_completion into api_router.py. Since Ray Serve LLM calls these methods directly (bypassing api_router.py), validation errors like invalid temperature or too-long prompts now propagate as unhandled exceptions, returning 500 instead of 400.

We wrap these calls with try/except in vllm_engine.py, mirroring api_router.py's pattern and delegating to vLLM's create_error_response for correct status code mapping. Related files: https://github.com/ray-project/ray/pull/61952/changes#diff-ef22ec52e4a4e7156a9c391f529e5b1fc9f0e06fceb9397b2364094c113fc858.

An alternative was adding VLLMValidationError handling in Ray's middleware, but that would duplicate vLLM's mapping logic and couple a vLLM-agnostic layer to vLLM-specific exceptions.
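The wrapping pattern described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the stub classes and the `chat` helper are stand-ins, and only `create_error_response` / `create_chat_completion` are names taken from the discussion.

```python
import asyncio
from http import HTTPStatus


class ErrorResponse:
    """Stand-in for vLLM's error response type (illustration only)."""

    def __init__(self, message: str, code: int):
        self.message = message
        self.code = code


class FakeServingChat:
    """Stub mimicking the vLLM serving handler's behavior in 0.18.0,
    where validation problems raise instead of returning an error object."""

    def create_error_response(self, message: str) -> ErrorResponse:
        # vLLM's real helper maps validation failures to a 4xx response.
        return ErrorResponse(message, HTTPStatus.BAD_REQUEST.value)

    async def create_chat_completion(self, request):
        if request.get("temperature", 0) > 2:
            raise ValueError("temperature must be <= 2")
        return {"choices": ["ok"]}


async def chat(serving_chat, request):
    # Mirror api_router.py's pattern: since Ray Serve LLM calls the
    # serving method directly, catch the validation error here and
    # delegate status-code mapping back to vLLM's own helper.
    try:
        return await serving_chat.create_chat_completion(request)
    except ValueError as e:
        return serving_chat.create_error_response(str(e))
```

Without the `try/except`, the `ValueError` would propagate out of the Serve replica and surface as a 500; with it, the client sees the same 400 that vLLM's own router would have produced.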


Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang-anyscale jeffreywang-anyscale added the go add ONLY when ready to merge, run all tests label Mar 22, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request upgrades vLLM to version 0.18.0. The changes include updating the vLLM version in the Dockerfile, requirements files, and regenerating the dependency lock files. A notable improvement is the removal of a workaround for a vLLM pickling issue, which is presumably fixed in this new version. The review identifies one out-of-scope change in python/requirements_compiled.txt that should be moved to a separate PR to keep this one focused.

@jeffreywang-anyscale jeffreywang-anyscale marked this pull request as ready for review March 22, 2026 23:44
Collaborator

@aslonnie aslonnie left a comment


does this fix the LD_LIBRARY_PATH thing?

@ray-gardener ray-gardener bot added serve Ray Serve Related Issue llm labels Mar 23, 2026
@jeffreywang-anyscale
Contributor Author

> does this fix the LD_LIBRARY_PATH thing?

No, the branch for this vLLM version was cut on Thursday. The fix (whether in vLLM or how we build Ray LLM images) is still in progress.

Contributor

@kouroshHakha kouroshHakha left a comment


Thanks for the upgrade and the error handling fix — the approach of mirroring api_router.py's pattern at the Ray Serve boundary makes sense.

Two main concerns:

  1. Embeddings endpoint is broken on 0.18.0 — vLLM renamed the attribute from openai_serving_embedding to serving_embedding (line 332, unchanged in this PR) and replaced create_embedding() with a callable class, but this PR still uses the old names. See #61959 for the fix; those changes need to be folded into this PR. Separately, the fact that this breakage wasn't caught means we're missing release test coverage for the embeddings endpoint. Can we add an embedding model to the serve integration tests?

  2. Bare except Exception will mask server-side errors — OOMs, AttributeError, etc. will be swallowed and returned as 400s to the user. This should be narrowed to VLLMValidationError. See inline comments.

Note

This review was co-written with AI assistance (Claude Code).


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


```
        request,
        raw_request=raw_request,
    )
except ValueError as e:
```
Contributor Author


Catch ValueError because

  • VLLMValidationError subclasses it
  • covers vLLM’s schema/grammar validation errors
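The subclass relationship that makes this work can be shown in a few lines. The `VLLMValidationError` definition below is a stand-in mirroring the hierarchy described in the thread, not vLLM's actual class:

```python
# Stand-in: per the discussion above, vLLM's VLLMValidationError
# derives from ValueError, so one `except ValueError` clause catches
# both it and the plain ValueErrors raised by schema/grammar validation.
class VLLMValidationError(ValueError):
    pass


def classify(exc: Exception) -> str:
    """Illustrates which errors the narrowed handler would absorb."""
    try:
        raise exc
    except ValueError:
        return "client error (400)"
    except Exception:
        return "server error (500)"
```

Server-side faults such as `AttributeError` or `MemoryError` fall through to the generic branch, which is exactly why the handler was narrowed from `except Exception`.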

Contributor

@kouroshHakha kouroshHakha left a comment


Re-review: previous feedback has been addressed well.

  • except Exception narrowed to except ValueError — correctly covers VLLMValidationError without masking server errors.
  • _make_error_response safety net is clean — re-raises original if create_error_response fails.
  • Embeddings API adapted (attribute rename, callable invocation, Starlette Response parsing).
  • New integration tests for embedding and score endpoints fill the coverage gap.

One minor question inline about whether the isinstance(embedding_response, VLLMErrorResponse) branch is still reachable in 0.18. Not blocking.

LGTM pending CI.

Note

This review was co-written with AI assistance (Claude Code).

```
    yield self._make_error_response(self._oai_serving_embedding, e)
    return

if isinstance(embedding_response, VLLMErrorResponse):
```
Contributor


Nit: Is this isinstance(embedding_response, VLLMErrorResponse) branch still reachable in vLLM 0.18? If the callable now raises exceptions for errors (caught by the except ValueError above) rather than returning VLLMErrorResponse, this may be dead code. Fine to leave as a safety net, but worth confirming.

@kouroshHakha kouroshHakha enabled auto-merge (squash) March 25, 2026 15:45
@kouroshHakha kouroshHakha merged commit 08e1288 into master Mar 25, 2026
7 checks passed
@kouroshHakha kouroshHakha deleted the vllm-0.18.0 branch March 25, 2026 15:45

Labels

go add ONLY when ready to merge, run all tests llm serve Ray Serve Related Issue


Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

3 participants