[Bugfix] Honor tool_choice=None / "none" in Chat Completions streaming #44102

Closed
FutureSkyFly wants to merge 1 commit into vllm-project:main from FutureSkyFly:fix/issue-42747-tool-choice-none-streaming-broader

FutureSkyFly commented May 31, 2026

Contributor

Summary

Fixes #42747.

Streaming Chat Completions with tool_choice="none" — or omitted on a no-tools request, where request.tool_choice ends up as None — could still produce delta.tool_calls and finish with finish_reason="tool_calls" whenever the server was launched with a --tool-call-parser and the model output happened to match that parser's tool-call format. Non-streaming Chat Completions already handles both cases correctly.

Root cause

DelegatingParser.parse_delta in vllm/parser/abstract_parser.py invokes _extract_tool_calls_streaming unconditionally once the stream enters the tool-call phase, without inspecting request.tool_choice. The non-streaming path at vllm/entrypoints/openai/chat_completion/serving.py already short-circuits both cases:

elif not request.tool_choice or request.tool_choice == "none":
    message = ChatMessage(role=role, reasoning=reasoning, content=content)

The streaming path was missing the equivalent guard.

Fix

In DelegatingParser.parse_delta, when not request.tool_choice or request.tool_choice == "none", skip _extract_tool_calls_streaming and surface any remaining (post-reasoning) text as plain content. Because the tool parser is never invoked, state.function_name_returned stays untouched and the downstream tools_streamed[i] flag stays False, so finish_reason naturally falls back to "stop". Reasoning extraction is untouched.
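
The guard described above can be sketched in isolation. This is a minimal illustration, not the actual vLLM code: StubRequest, StubDelta, StubToolParser and the free function parse_delta are hypothetical stand-ins that model only the tool_choice check, and the real DelegatingParser.parse_delta takes different arguments and also handles reasoning extraction.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class StubRequest:
    # Hypothetical stand-in for the chat completion request; only the
    # field relevant to the guard is modelled.
    tool_choice: Optional[Union[str, dict]] = None


@dataclass
class StubDelta:
    # Hypothetical stand-in for DeltaMessage.
    content: Optional[str] = None
    tool_calls: Optional[list] = None


def tool_parsing_disabled(request: StubRequest) -> bool:
    # Same predicate as the non-streaming path quoted in the PR:
    # a falsy tool_choice (e.g. None) or the literal "none" disables parsing.
    return not request.tool_choice or request.tool_choice == "none"


class StubToolParser:
    """Counts invocations and pretends every delta is a tool call."""

    def __init__(self) -> None:
        self.calls = 0

    def extract_tool_calls_streaming(self, delta_text: str) -> StubDelta:
        self.calls += 1
        return StubDelta(tool_calls=[{"arguments": delta_text}])


def parse_delta(
    request: StubRequest, delta_text: str, tool_parser: StubToolParser
) -> StubDelta:
    # Guard first: when tools are disabled, the parser is never touched
    # and the text is surfaced as plain content.
    if tool_parsing_disabled(request):
        return StubDelta(content=delta_text)
    return tool_parser.extract_tool_calls_streaming(delta_text)
```

Because the stub parser is never called in the guarded branch, none of its state changes, which is the property the PR relies on for finish_reason to fall back to "stop".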

Difference from #42752

#42752 was the original attempt at this fix. When this PR was opened, #42752 was open but conflicting with main, had last been updated on 2026-05-23, and guarded only on request.tool_choice == "none". The review feedback on that PR (gemini-code-assist, 2026-05-15) explicitly asked to broaden the check to also cover request.tool_choice is None, the no-tools request case raised by @QwertyJack in the comment thread under #42747.

This PR implements that broader guard so streaming matches the non-streaming not request.tool_choice or request.tool_choice == "none" semantics exactly. Acknowledging @hoobnn for the original direction; happy to close this if #42752 is updated with the same broader guard.

This logic has been independently validated downstream in vllm-project/vllm-ascend#9776 against vLLM v0.20.2.

Duplicate-PR check (per AGENTS.md)

gh issue view 42747 --repo vllm-project/vllm --comments
gh pr list --repo vllm-project/vllm --state open --search "42747 in:body"

This PR is materially different from #42752 in the guard scope.

Test plan

Added in tests/entrypoints/openai/test_tool_choice_content_none.py:

  • test_parse_delta_with_tool_choice_none_skips_tool_parser — explicit tool_choice="none": parser is not invoked, raw delta text surfaces as DeltaMessage.content.
  • test_parse_delta_with_omitted_tool_choice_skips_tool_parser — omitted tool_choice on a no-tools request (request.tool_choice is None): parser is not invoked, raw delta text surfaces as content. This is the additional case beyond #42752.
  • test_parse_delta_without_tool_choice_none_still_runs_tool_parser — sanity: tool_choice="auto" still hits the tool parser (no regression).
  • test_parse_delta_tool_choice_none_multiple_chunks_remain_content — multi-chunk streaming stays in content mode across deltas.

A local pytest run was not possible in the contributing environment (macOS aarch64, no torch available). The new tests rely only on vllm.entrypoints.openai.chat_completion.protocol, vllm.parser.abstract_parser and a stub parser; they will run on CI on this PR.
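
For readers who want to see the shape of such a test without a vLLM checkout, the following is a rough, self-contained approximation. It does not import vLLM: parse_delta_sketch and run_stream are hypothetical helpers, SimpleNamespace objects stand in for the request and DeltaMessage, and a unittest.mock.Mock plays the role of the stub parser. The tests in the PR itself exercise the real DelegatingParser instead.

```python
from types import SimpleNamespace
from unittest.mock import Mock


def parse_delta_sketch(request, delta_text, extract_tool_calls_streaming):
    # Hypothetical stand-in for DelegatingParser.parse_delta with the guard.
    if not request.tool_choice or request.tool_choice == "none":
        return SimpleNamespace(content=delta_text, tool_calls=None)
    return extract_tool_calls_streaming(delta_text)


def run_stream(tool_choice, chunks):
    """Feed chunks through the sketch; return (deltas, spy)."""
    # The spy pretends every delta parses as a tool call, so any call to it
    # would be visible in the output.
    spy = Mock(
        return_value=SimpleNamespace(content=None, tool_calls=[{"index": 0}])
    )
    request = SimpleNamespace(tool_choice=tool_choice)
    deltas = [parse_delta_sketch(request, chunk, spy) for chunk in chunks]
    return deltas, spy


def test_tool_choice_none_multiple_chunks_remain_content():
    chunks = ["<tool_call>", '{"name": "get_weather"}', "</tool_call>"]
    for choice in (None, "none"):
        deltas, spy = run_stream(choice, chunks)
        spy.assert_not_called()
        assert [d.content for d in deltas] == chunks
        assert all(d.tool_calls is None for d in deltas)


def test_tool_choice_auto_still_runs_tool_parser():
    deltas, spy = run_stream("auto", ["<tool_call>"])
    spy.assert_called_once_with("<tool_call>")
    assert deltas[0].tool_calls == [{"index": 0}]
```

Both functions follow the pytest naming convention, so saving the block to a file and running pytest on it should collect them.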

AI assistance was used (Claude) to draft the patch and the test stub, mirroring the already-reviewed approach in #42752.


Closing note (2026-05-31): @hoobnn has folded the broader guard into #42752 and rebased it onto current main with the same request.tool_choice is None regression coverage. Closing this PR as duplicate of #42752 per its now-equivalent scope. Thanks @hoobnn for picking it up.

Fixes vllm-project#42747 alongside the existing vllm-project#42752 attempt.

Streaming Chat Completions with `tool_choice="none"` -- or omitted on a
no-tools request, where `request.tool_choice` ends up as `None` -- could
still produce `delta.tool_calls` and finish with `finish_reason="tool_calls"`
whenever the server was launched with a `--tool-call-parser` and the model
output happened to match that parser's tool-call format.

`DelegatingParser.parse_delta` in `vllm/parser/abstract_parser.py` invokes
`_extract_tool_calls_streaming` unconditionally once the stream enters the
tool-call phase, without inspecting `request.tool_choice`. The non-streaming
path at `vllm/entrypoints/openai/chat_completion/serving.py` already
short-circuits both cases:

    elif not request.tool_choice or request.tool_choice == "none":
        message = ChatMessage(role=role, reasoning=reasoning, content=content)

The streaming path was missing the equivalent guard.

Fix
---
In `DelegatingParser.parse_delta`, when `not request.tool_choice or
request.tool_choice == "none"`, skip `_extract_tool_calls_streaming` and
surface any remaining (post-reasoning) text as plain `content`. Because the
tool parser is never invoked, `state.function_name_returned` stays untouched
and the downstream `tools_streamed[i]` flag stays `False`, so `finish_reason`
naturally falls back to `"stop"`. Reasoning extraction is untouched.

Difference from vllm-project#42752
----------------------
vllm-project#42752 (open, currently CONFLICTING with main) only guards on
`request.tool_choice == "none"`. The pending review feedback on that PR
(gemini-code-assist, 2026-05-15) explicitly asks to broaden the check to
also cover the `request.tool_choice is None` case (no-tools request without
an explicit tool_choice). This PR implements that broader guard so the
streaming behavior matches the non-streaming `not request.tool_choice or
request.tool_choice == "none"` semantics exactly.

This logic has been independently validated downstream in
vllm-project/vllm-ascend#9776 against vllm v0.20.2.

Tests
-----
Added in tests/entrypoints/openai/test_tool_choice_content_none.py:

- test_parse_delta_with_tool_choice_none_skips_tool_parser
  -- explicit tool_choice="none": parser is not invoked, raw delta text
  surfaces as content.

- test_parse_delta_with_omitted_tool_choice_skips_tool_parser
  -- omitted tool_choice on a no-tools request (request.tool_choice is
  None): parser is not invoked, raw delta text surfaces as content. This
  is the additional case beyond vllm-project#42752.

- test_parse_delta_without_tool_choice_none_still_runs_tool_parser
  -- sanity: tool_choice="auto" still hits the tool parser (no regression).

- test_parse_delta_tool_choice_none_multiple_chunks_remain_content
  -- multi-chunk streaming stays in content mode across deltas.

Note: a local pytest run was not possible in the contributing
environment (macos-aarch64, no torch available). The change mirrors the
already-reviewed approach of vllm-project#42752, and CI on this PR will exercise the
new tests.

Signed-off-by: liuchenbing <chenliumail@163.com>

hoobnn commented May 31, 2026

Contributor

Thanks @FutureSkyFly for picking this up and for independently validating the broader guard (and the cross-check in vllm-project/vllm-ascend#9776) 🙏

I've folded the broader guard into the original PR #42752: the streaming check now reads not request.tool_choice or request.tool_choice == "none", so it covers both tool_choice="none" and the explicit-null (request.tool_choice is None) case, matching the non-streaming path exactly. The PR is also rebased onto current main (reconciled with the boundary-delta reasoning change from #42691), DCO is signed off, and a dedicated regression test for the request.tool_choice is None path is added.

Since #42752 now covers the same fix, I think this one can be closed as duplicate — but very happy to defer if you'd prefer to drive it. Either way, appreciate the push to broaden the guard.

@FutureSkyFly

Contributor Author

Thanks @hoobnn for picking it up and folding the broader guard into #42752 — appreciate the quick turnaround. Closing this as duplicate; #42752 now covers the same scope (not request.tool_choice or request.tool_choice == "none") plus the dedicated request.tool_choice is None regression test, on top of being rebased onto current main with DCO signed off. Cheering #42752 on. 🙏

@FutureSkyFly

Contributor Author

Closing as duplicate of #42752 (now updated with the broader guard).

Labels

bug, tool-calling

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug]: Chat Completions streaming invokes tool parser despite tool_choice="none"

3 participants