[llm][deps] Upgrade vLLM to 0.18.0 (#61952)
Conversation
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Code Review
This pull request upgrades vLLM to version 0.18.0. The changes include updating the vLLM version in the Dockerfile, requirements files, and regenerating the dependency lock files. A notable improvement is the removal of a workaround for a vLLM pickling issue, which is presumably fixed in this new version. The review identifies one out-of-scope change in python/requirements_compiled.txt that should be moved to a separate PR to keep this one focused.
aslonnie left a comment
does this fix the LD_LIBRARY_PATH thing?
No, the branch for this vLLM version was cut on Thursday. The fix (whether in vLLM or how we build Ray LLM images) is still in progress.
kouroshHakha left a comment
Thanks for the upgrade and the error handling fix — the approach of mirroring api_router.py's pattern at the Ray Serve boundary makes sense.
Two main concerns:

- Embeddings endpoint is broken on 0.18.0: vLLM renamed the attribute from `openai_serving_embedding` to `serving_embedding` (line 332, unchanged in this PR) and replaced `create_embedding()` with a callable class. This PR still uses the old names. See #61959 for the fix; those changes need to be folded into this PR. Separately: the fact that this breakage wasn't caught means we're missing release test coverage for the embeddings endpoint. Can we add an embedding model to the serve integration tests?
- A bare `except Exception` will mask server-side errors: OOMs, `AttributeError`, etc. will be swallowed and returned as 400s to the user. This should be narrowed to `VLLMValidationError`. See inline comments.
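To illustrate the masking concern, here is a minimal sketch (the class and function names are stand-ins, not vLLM's actual code) of why the handler should be narrowed: only validation errors should map to HTTP 400, while genuine server-side failures must keep propagating so they surface as 500s.

```python
class VLLMValidationError(ValueError):
    """Stand-in for vLLM's validation error, which subclasses ValueError."""


def handle(call):
    try:
        return 200, call()
    except ValueError as e:  # narrow: validation errors only -> 400
        return 400, str(e)
    # AttributeError, CUDA OOM, etc. are NOT caught here; they propagate
    # and surface as 500s instead of being mislabeled as client errors.


def bad_request():
    raise VLLMValidationError("temperature must be between 0 and 2")


def server_bug():
    raise AttributeError("'NoneType' object has no attribute 'encode'")


print(handle(bad_request))  # (400, 'temperature must be between 0 and 2')
try:
    handle(server_bug)      # propagates: would become a 500 upstream
except AttributeError as e:
    print("propagated:", e)
```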
Note
This review was co-written with AI assistance (Claude Code).
Cursor Bugbot has reviewed your changes and found 1 potential issue.
release/llm_tests/serve/test_llm_serve_multi_node_integration.py (two resolved review threads, now outdated)
```python
        request,
        raw_request=raw_request,
    )
except ValueError as e:
```
Catch `ValueError` because `VLLMValidationError` subclasses it; this covers vLLM's schema/grammar validation errors.
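A two-line demonstration of the point: an `except ValueError` handler also catches subclasses, so it covers a `VLLMValidationError`-style type. The class here is an illustrative stand-in.

```python
class VLLMValidationError(ValueError):
    """Illustrative stand-in mirroring the subclass relationship."""


caught = None
try:
    raise VLLMValidationError("guided-decoding grammar is invalid")
except ValueError as e:  # the base-class handler catches the subclass
    caught = e

print(type(caught).__name__)  # VLLMValidationError
```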
kouroshHakha left a comment
Re-review: previous feedback has been addressed well.
- `except Exception` narrowed to `except ValueError`, which correctly covers `VLLMValidationError` without masking server errors.
- The `_make_error_response` safety net is clean: it re-raises the original exception if `create_error_response` fails.
- Embeddings API adapted (attribute rename, callable invocation, Starlette Response parsing).
- New integration tests for embedding and score endpoints fill the coverage gap.
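A hedged sketch of the safety-net behavior described above: delegate to a `create_error_response`-style mapper, but if the mapper itself fails, re-raise the original exception so it is never masked. The function names are illustrative, not Ray's exact implementation.

```python
def make_error_response(create_error_response, exc):
    """Map an exception to an error payload, never hiding the original."""
    try:
        return create_error_response(str(exc))
    except Exception:
        raise exc  # mapper failure must not mask the real error


def working_mapper(message):
    return {"object": "error", "message": message, "code": 400}


def broken_mapper(message):
    raise RuntimeError("error mapper itself crashed")


print(make_error_response(working_mapper, ValueError("bad prompt")))
try:
    make_error_response(broken_mapper, ValueError("bad prompt"))
except ValueError as e:
    print("original re-raised:", e)
```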
One minor question inline about whether the isinstance(embedding_response, VLLMErrorResponse) branch is still reachable in 0.18. Not blocking.
LGTM pending CI.
Note
This review was co-written with AI assistance (Claude Code).
```python
            yield self._make_error_response(self._oai_serving_embedding, e)
            return

        if isinstance(embedding_response, VLLMErrorResponse):
```
Nit: Is this isinstance(embedding_response, VLLMErrorResponse) branch still reachable in vLLM 0.18? If the callable now raises exceptions for errors (caught by the except ValueError above) rather than returning VLLMErrorResponse, this may be dead code. Fine to leave as a safety net, but worth confirming.

Description
vLLM 0.18.0 moved validation error handling from `create_completion`/`create_chat_completion` into `api_router.py`. Since Ray Serve LLM calls these methods directly (bypassing `api_router.py`), validation errors such as an invalid temperature or a too-long prompt now propagate as unhandled exceptions, returning 500 instead of 400.

We wrap these calls with try/except in `vllm_engine.py`, mirroring `api_router.py`'s pattern and delegating to vLLM's `create_error_response` for correct status-code mapping. Related files: https://github.com/ray-project/ray/pull/61952/changes#diff-ef22ec52e4a4e7156a9c391f529e5b1fc9f0e06fceb9397b2364094c113fc858.
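A minimal sketch, under assumed names, of that wrapping pattern: call the serving method directly, catch the validation errors that `api_router.py` would otherwise translate, and delegate status-code mapping to a `create_error_response`-style helper. None of this is vLLM's or Ray's real code; every identifier is an illustrative stand-in.

```python
import asyncio


class VLLMValidationError(ValueError):
    """Stand-in for vLLM's validation error (subclasses ValueError)."""


def create_error_response(exc):
    # Stand-in for vLLM's status-code mapping helper.
    return {"object": "error", "message": str(exc), "code": 400}


async def create_chat_completion(request):
    # Stand-in serving method: in 0.18.0 it raises on bad input
    # instead of returning an error body itself.
    if not 0.0 <= request.get("temperature", 1.0) <= 2.0:
        raise VLLMValidationError("temperature must be between 0 and 2")
    return {"object": "chat.completion", "choices": []}


async def wrapped_chat_completion(request):
    try:
        return await create_chat_completion(request)
    except ValueError as e:  # VLLMValidationError subclasses ValueError
        return create_error_response(e)


print(asyncio.run(wrapped_chat_completion({"temperature": 5.0})))
print(asyncio.run(wrapped_chat_completion({"temperature": 0.7})))
```

With this wrapper, an invalid request comes back as a 400-style error payload while well-formed requests pass through untouched, matching the `api_router.py` behavior the description cites.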
An alternative was adding `VLLMValidationError` handling in Ray's middleware, but that would duplicate vLLM's mapping logic and couple a vLLM-agnostic layer to vLLM-specific exceptions.

Related issues
Additional information