Closed
Labels: bug (Something that is supposed to be working; but isn't), community-backlog, llm, serve (Ray Serve Related Issue), stability, triage (Needs triage, e.g. priority, bug/not-bug, and owning component)
Description
What happened + What you expected to happen
vLLM recently added a `tokens_only` argument to both the frontend (code) and the engine (code), which causes a problem when creating the `argparse.Namespace` object here.
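The failure mode can be illustrated with a minimal, hypothetical sketch (this is not Ray Serve's actual code; `EngineArgs` and `namespace_from_kwargs` are stand-ins for illustration): when a `Namespace` is built from a fixed set of known kwargs, and the downstream engine-args class later grows a new field like `tokens_only`, reading that field off the stale namespace raises an `AttributeError`.

```python
import argparse
from dataclasses import dataclass

# Stand-in for vLLM's engine-args dataclass, which recently
# gained a new field (tokens_only) upstream.
@dataclass
class EngineArgs:
    model: str
    tokens_only: bool = False  # newly added argument

def namespace_from_kwargs(kwargs: dict) -> argparse.Namespace:
    # Serve-side construction that predates the new argument:
    # the namespace only contains the keys the caller passed in.
    return argparse.Namespace(**kwargs)

ns = namespace_from_kwargs({"model": "Qwen/Qwen3-VL-235B-A22B-Instruct"})

try:
    # Reading the new field off a namespace built before the
    # argument existed fails at attribute access time.
    EngineArgs(model=ns.model, tokens_only=ns.tokens_only)
except AttributeError as exc:
    print(f"fails with: {exc}")
```

A fix on either side (Ray Serve populating defaults for unknown engine arguments, or guarding the attribute access) would avoid the crash.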
Versions / Dependencies
ray-serve nightly, python3.12
Reproduction script
```python
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config={
        "model_id": "Qwen/Qwen3-VL-235B-A22B-Instruct",
        "model_source": "Qwen/Qwen3-VL-235B-A22B-Instruct",
    },
    deployment_config={
        "autoscaling_config": {
            "min_replicas": 1,
            "max_replicas": 2,
        },
    },
    engine_kwargs={
        "tensor_parallel_size": 4,
        "max_model_len": 32768,
    },
    runtime_env={"env_vars": {"VLLM_USE_V1": "1"}},
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app, blocking=True)
```

Issue Severity
None