[Serve] Duplicated arguments for vLLM frontend and engine #58937

@soodoshll

Description

What happened + What you expected to happen

vLLM recently added an argument `tokens_only` to both the frontend (code) and the engine (code). Because the same argument is now registered twice, a problem occurs when Ray Serve creates the `argparse.Namespace` object here.
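A minimal sketch of the failure mode (argument names are from this issue, but the parser setup is hypothetical, not the actual vLLM/Ray Serve code): when the frontend and engine argument sets both register the same flag on one `argparse` parser, `argparse` raises an `ArgumentError` instead of producing a `Namespace`.

```python
import argparse

parser = argparse.ArgumentParser()

# Registered once by the frontend argument group...
parser.add_argument("--tokens-only", action="store_true")

# ...and again by the engine argument group, which argparse rejects.
try:
    parser.add_argument("--tokens-only", action="store_true")
except argparse.ArgumentError as e:
    print(f"conflict: {e}")
```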

Versions / Dependencies

Ray Serve nightly, Python 3.12

Reproduction script

from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config={
        "model_id": "Qwen/Qwen3-VL-235B-A22B-Instruct",
        "model_source": "Qwen/Qwen3-VL-235B-A22B-Instruct",
    },
    deployment_config={
        "autoscaling_config": {
            "min_replicas": 1,
            "max_replicas": 2,
        },
    },
    engine_kwargs={
        "tensor_parallel_size": 4,
        "max_model_len": 32768
    },
    runtime_env={"env_vars": {"VLLM_USE_V1": "1"}},
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app, blocking=True)

Issue Severity

None


Labels

bug, community-backlog, llm, serve, stability, triage
