
[Serve][LLM] Fix inconsistent v0/v1 config passed to vLLM#52185

Merged
richardliaw merged 2 commits into ray-project:master from ruisearch42:v0v1
Apr 9, 2025

Conversation

@ruisearch42
Contributor

Why are these changes needed?

We observed the following error when deploying Ray LLM V0:


2025-04-09, 10:50:12.511 | worker | ip-10-0-97-109:

```
    return run_method(self, method, args, kwargs)
  File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/utils.py", line 2255, in run_method
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 604, in init_device
    self.worker.init_device()  # type: ignore
  File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 120, in init_device
    self.model_runner: GPUModelRunner = GPUModelRunner(
  File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 110, in __init__
    self.attn_backend = get_attn_backend(
  File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/attention/selector.py", line 95, in get_attn_backend
    return _cached_get_attn_backend(
  File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/attention/selector.py", line 148, in _cached_get_attn_backend
    attention_cls = current_platform.get_attn_backend_cls(
  File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/platforms/cuda.py", line 270, in get_attn_backend_cls
    and kv_cache_dtype.startswith("fp8"))
AttributeError: 'torch.dtype' object has no attribute 'startswith'
```

This happens because vLLM handles `kv_cache_dtype` inconsistently: in V0 it is a string, while in V1 it is a `torch.dtype`. When the V0 and V1 configs get mixed, this issue manifests.
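The failing check can be sketched as follows. This is an illustrative reduction, not vLLM's actual code; the stand-in class below mimics `torch.dtype`, which likewise has no `startswith` method:

```python
# V0 configs carry kv_cache_dtype as a string (e.g. "fp8_e4m3" or "auto");
# V1 configs may carry an already-resolved torch.dtype.
def uses_fp8_kv_cache(kv_cache_dtype) -> bool:
    # V0-style string check, as in the traceback's cuda.py frame
    return kv_cache_dtype.startswith("fp8")


class DtypeStandIn:
    """Stand-in for torch.dtype: an object without a startswith() method."""


print(uses_fp8_kv_cache("fp8_e4m3"))  # V0-style string config works: True

try:
    uses_fp8_kv_cache(DtypeStandIn())  # V1-style dtype object fails
except AttributeError as err:
    print(type(err).__name__)  # AttributeError, as in the traceback above
```

Passing a V1-resolved dtype through the V0 string check reproduces exactly the `AttributeError` in the log.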

Another recent change in vLLM is that it automatically tries V1 and uses V1 configs when possible. Therefore, when the VLLM_USE_V1 environment variable is not set, Ray LLM defaults to V0 while vLLM may pick V1. This PR fixes the inconsistency by always explicitly passing the environment variable.
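The shape of the fix can be sketched like this. The `VLLM_USE_V1` variable name comes from the description above; the helper function is hypothetical, not Ray Serve LLM's actual API:

```python
import os


def engine_env_vars(use_v1: bool) -> dict:
    """Build the environment for the vLLM engine process.

    VLLM_USE_V1 is always set explicitly ("1" or "0") so vLLM cannot
    auto-select an engine version that disagrees with Ray LLM's config.
    (Illustrative helper; the real change lives in Ray's serve.llm code.)
    """
    env = dict(os.environ)
    env["VLLM_USE_V1"] = "1" if use_v1 else "0"
    return env


print(engine_env_vars(use_v1=False)["VLLM_USE_V1"])  # "0"
print(engine_env_vars(use_v1=True)["VLLM_USE_V1"])   # "1"
```

With the variable always set, both sides resolve to the same engine version, so a V1 `torch.dtype` can no longer reach a V0 string check.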

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
@ruisearch42 ruisearch42 requested a review from a team as a code owner April 9, 2025 19:38
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
@richardliaw richardliaw added the "go" label (add ONLY when ready to merge, run all tests) Apr 9, 2025
@richardliaw richardliaw merged commit 788e968 into ray-project:master Apr 9, 2025
6 checks passed
han-steve pushed a commit to han-steve/ray that referenced this pull request Apr 11, 2025
…t#52185)

---------

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Signed-off-by: Steve Han <stevehan2001@gmail.com>

Labels

community-backlog, go (add ONLY when ready to merge, run all tests)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants