[serve] TypeError: argparse.Namespace() got multiple values for keyword argument 'tokens_only' #58973

@kevin-bates

Description

What happened + What you expected to happen

This issue is masked by the issue fixed in #58820, so it is currently reproducible only with nightly builds. Once the fix for #58820 is released, I suspect it will be reported more widely.

As with #58820, the trigger is vLLM release 0.11.1, which adds a tokens_only argument to both FrontendArgs and EngineArgs (in this pull request). The following code in VLLMEngine.start() gathers arguments from both FrontendArgs and EngineArgs into a single namespace:

args = argparse.Namespace(
    **vllm_frontend_args.__dict__,
    **vllm_engine_args.__dict__,
)

which then raises a TypeError, because tokens_only is supplied as a keyword argument twice:

(ServeController pid=2517)   File "/opt/micromamba/envs/runtime/lib/python3.12/site-packages/ray/llm/_internal/serve/core/server/llm_server.py", line 147, in __init__
(ServeController pid=2517)     await self.start()
(ServeController pid=2517)   File "/opt/micromamba/envs/runtime/lib/python3.12/site-packages/ray/llm/_internal/serve/core/server/llm_server.py", line 193, in start
(ServeController pid=2517)     await asyncio.wait_for(self._start_engine(), timeout=ENGINE_START_TIMEOUT_S)
(ServeController pid=2517)   File "/opt/micromamba/envs/runtime/lib/python3.12/asyncio/tasks.py", line 520, in wait_for
(ServeController pid=2517)     return await fut
(ServeController pid=2517)            ^^^^^^^^^
(ServeController pid=2517)   File "/opt/micromamba/envs/runtime/lib/python3.12/site-packages/ray/llm/_internal/serve/core/server/llm_server.py", line 239, in _start_engine
(ServeController pid=2517)     await self.engine.start()
(ServeController pid=2517)   File "/opt/micromamba/envs/runtime/lib/python3.12/site-packages/ray/llm/_internal/serve/engines/vllm/vllm_engine.py", line 197, in start
(ServeController pid=2517)     args = argparse.Namespace(
(ServeController pid=2517)            ^^^^^^^^^^^^^^^^^^^
(ServeController pid=2517) TypeError: argparse.Namespace() got multiple values for keyword argument 'tokens_only'

Because it seems reasonable for separate argument sets to define an argument with the same name, I think this should be addressed in ray[serve] rather than in vLLM.
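One way the merge could tolerate shared names is sketched below. This is only an illustration under stated assumptions, not the actual Ray fix: merge_arg_dicts is a hypothetical helper, and the policy it applies (accept a duplicate key when both sides agree, fail with a clear message when they differ) is my own choice.

```python
import argparse


def merge_arg_dicts(frontend: dict, engine: dict) -> argparse.Namespace:
    """Hypothetical helper: merge two argument dicts into one Namespace.

    A key present in both dicts is accepted when the values agree and
    rejected with an explicit error when they differ.
    """
    merged = dict(frontend)
    for key, value in engine.items():
        if key in merged and merged[key] != value:
            raise ValueError(
                f"Conflicting values for {key!r}: "
                f"frontend={merged[key]!r}, engine={value!r}"
            )
        merged[key] = value
    # A single dict can never contain duplicate keys, so this cannot raise
    # the "multiple values for keyword argument" TypeError.
    return argparse.Namespace(**merged)
```

Building one dict first avoids the duplicate-keyword error entirely. Whether the engine value or the frontend value should win on a real conflict is a decision for the maintainers; raising is just the most conservative option.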

The workaround is to pin vllm < 0.11.1 (as with #58820).

Versions / Dependencies

ray[llm] nightly build (released versions will also be affected once the fix for #58820 ships)
vllm >= 0.11.1
python == 3.12

Reproduction script

I'm sorry, I don't know how to put this together as this is far below our call stack.
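That said, the underlying collision can be sketched without Ray or vLLM. The snippet below is a minimal stand-in, assuming only that both argument objects expose a tokens_only attribute through __dict__. FakeFrontendArgs, FakeEngineArgs, and their other fields are hypothetical placeholders, not the real vLLM classes.

```python
import argparse
from dataclasses import dataclass


# Hypothetical stand-ins for vLLM's FrontendArgs and EngineArgs. The only
# property that matters here is that both define a field named tokens_only.
@dataclass
class FakeFrontendArgs:
    host: str = "0.0.0.0"
    tokens_only: bool = False


@dataclass
class FakeEngineArgs:
    model: str = "dummy-model"
    tokens_only: bool = False


def build_namespace(frontend, engine) -> argparse.Namespace:
    # Same pattern as the snippet quoted from VLLMEngine.start():
    # both __dict__ mappings are unpacked into one call.
    return argparse.Namespace(**frontend.__dict__, **engine.__dict__)


def reproduce():
    """Return the TypeError message, or None if no error is raised."""
    try:
        build_namespace(FakeFrontendArgs(), FakeEngineArgs())
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce())
```

Python rejects the call before Namespace.__init__ runs, because the same keyword arrives from two ** unpackings. The error therefore does not depend on the values being different.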

Issue Severity

Medium: It is a significant difficulty but I can work around it.

Labels

bug (Something that is supposed to be working; but isn't), community-backlog, llm, serve (Ray Serve Related Issue), stability
