[Bugfix] Honor tool_choice=None / "none" in Chat Completions streaming by FutureSkyFly · Pull Request #44102 · vllm-project/vllm

FutureSkyFly · 2026-05-31T08:46:14Z

Summary

Streaming Chat Completions with tool_choice="none" — or omitted on a no-tools request, where request.tool_choice ends up as None — could still produce delta.tool_calls and finish with finish_reason="tool_calls" whenever the server was launched with a --tool-call-parser and the model output happened to match that parser's tool-call format. Non-streaming Chat Completions already handles both cases correctly.

Root cause

DelegatingParser.parse_delta in vllm/parser/abstract_parser.py invokes _extract_tool_calls_streaming unconditionally once the stream enters the tool-call phase, without inspecting request.tool_choice. The non-streaming path at vllm/entrypoints/openai/chat_completion/serving.py already short-circuits both cases:

elif not request.tool_choice or request.tool_choice == "none":
    message = ChatMessage(role=role, reasoning=reasoning, content=content)

The streaming path was missing the equivalent guard.
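
To make the two cases concrete, here is a minimal sketch of the condition on its own. The helper name tool_parsing_disabled is hypothetical (it is not part of vLLM), and the dict used for a named tool choice is only an illustrative stand-in for whatever object the server actually receives.

```python
from typing import Any, Optional


def tool_parsing_disabled(tool_choice: Optional[Any]) -> bool:
    """Hypothetical helper mirroring the non-streaming condition quoted above."""
    # `not tool_choice` covers None (omitted on a no-tools request);
    # the equality check covers the explicit string "none".
    return not tool_choice or tool_choice == "none"


# Illustrative values only; the dict stands in for a named tool choice.
examples = [
    None,
    "none",
    "auto",
    "required",
    {"type": "function", "function": {"name": "get_weather"}},
]

for choice in examples:
    print(repr(choice), "->", tool_parsing_disabled(choice))
```

Only the first two values should disable tool parsing. A guard written as tool_choice == "none" alone would miss the None case, which is the gap this PR addresses.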

Fix

In DelegatingParser.parse_delta, when not request.tool_choice or request.tool_choice == "none", skip _extract_tool_calls_streaming and surface any remaining (post-reasoning) text as plain content. Because the tool parser is never invoked, state.function_name_returned stays untouched and the downstream tools_streamed[i] flag stays False, so finish_reason naturally falls back to "stop". Reasoning extraction is untouched.
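
The effect of the guard on finish_reason can be sketched with stand-in classes. Everything below (StubRequest, StubDelta, StubToolParser, parse_delta, stream) is hypothetical and deliberately simplified; it is not the vLLM implementation, only a model of the control flow described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StubRequest:
    # Stand-in for the chat request; only tool_choice matters here.
    tool_choice: Optional[str] = None


@dataclass
class StubDelta:
    content: Optional[str] = None
    tool_calls: List[dict] = field(default_factory=list)


class StubToolParser:
    """Treats every chunk as a tool call, modelling output that matches the parser format."""

    def __init__(self) -> None:
        self.calls = 0

    def extract_streaming(self, text: str) -> StubDelta:
        self.calls += 1
        return StubDelta(tool_calls=[{"name": "f", "arguments": text}])


def parse_delta(request: StubRequest, parser: StubToolParser, delta_text: str) -> StubDelta:
    # The guard: skip the tool parser and surface the text as plain content.
    if not request.tool_choice or request.tool_choice == "none":
        return StubDelta(content=delta_text)
    return parser.extract_streaming(delta_text)


def stream(request: StubRequest, chunks: List[str]) -> Tuple[List[StubDelta], str, int]:
    parser = StubToolParser()
    deltas = [parse_delta(request, parser, chunk) for chunk in chunks]
    # Simplified stand-in for the downstream tools_streamed flag.
    tools_streamed = any(d.tool_calls for d in deltas)
    finish_reason = "tool_calls" if tools_streamed else "stop"
    return deltas, finish_reason, parser.calls


chunks = ['{"name":', ' "f"}']
for choice in (None, "none", "auto"):
    _, reason, calls = stream(StubRequest(tool_choice=choice), chunks)
    print(repr(choice), reason, calls)
```

Because the guarded branch never touches the parser, no tool-call state is created, so the finish reason falls back to "stop" without any extra bookkeeping.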

Difference from #42752

#42752 was the original attempt at this fix. When this PR was opened, #42752 was OPEN, CONFLICTING with main, last updated 2026-05-23, and guarded only on request.tool_choice == "none". The review feedback on that PR (gemini-code-assist, 2026-05-15) explicitly asked to broaden the check to also cover request.tool_choice is None, the no-tools request case raised by @QwertyJack in the comment thread under #42747.

This PR implements that broader guard so streaming matches the non-streaming not request.tool_choice or request.tool_choice == "none" semantics exactly. Acknowledging @hoobnn for the original direction; happy to close this if #42752 is updated with the same broader guard.

This logic has been independently validated downstream in vllm-project/vllm-ascend#9776 against vLLM v0.20.2.

Duplicate-PR check (per AGENTS.md)

gh issue view 42747 --repo vllm-project/vllm --comments
gh pr list --repo vllm-project/vllm --state open --search "42747 in:body"

#42752, "[Bugfix] Honor tool_choice="none" in Chat Completions streaming" (@hoobnn, OPEN, stale 2 weeks at PR-open time, CONFLICTING, pre-commit FAILURE): same approach, narrower guard, awaiting author response to broaden.
#42868, "entrypoints/openai: skip tool parser in streaming when tool_choice="none"" (OPEN): unrelated approach; patches chat_completion/serving.py, not abstract_parser.py, and touches a different surface area.

This PR is materially different from #42752 in the guard scope.

Test plan

Added in tests/entrypoints/openai/test_tool_choice_content_none.py:

test_parse_delta_with_tool_choice_none_skips_tool_parser: explicit tool_choice="none"; the parser is not invoked and the raw delta text surfaces as DeltaMessage.content.
test_parse_delta_with_omitted_tool_choice_skips_tool_parser: omitted tool_choice on a no-tools request (request.tool_choice is None); the parser is not invoked and the raw delta text surfaces as content. This is the additional case beyond #42752.
test_parse_delta_without_tool_choice_none_still_runs_tool_parser: sanity check that tool_choice="auto" still hits the tool parser (no regression).
test_parse_delta_tool_choice_none_multiple_chunks_remain_content: multi-chunk streaming stays in content mode across deltas.
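
The shape of these checks can be sketched with a call-recording stub. This is not the test file from the PR: parse_delta_stand_in is a hypothetical stand-in for the guarded streaming path, and unittest.mock.Mock plays the role of the stub parser.

```python
from types import SimpleNamespace
from unittest.mock import Mock


def parse_delta_stand_in(request, tool_parser, delta_text):
    # Hypothetical stand-in for the guarded streaming path.
    if not request.tool_choice or request.tool_choice == "none":
        return SimpleNamespace(content=delta_text, tool_calls=[])
    return tool_parser(delta_text)


def check_skips_tool_parser(tool_choice):
    tool_parser = Mock(name="tool_parser")
    request = SimpleNamespace(tool_choice=tool_choice)
    # Several chunks that look like a tool call, to cover the multi-chunk case.
    chunks = ["<tool_call>", '{"name": "f"}', "</tool_call>"]
    deltas = [parse_delta_stand_in(request, tool_parser, c) for c in chunks]
    tool_parser.assert_not_called()
    assert [d.content for d in deltas] == chunks


def check_auto_still_runs_tool_parser():
    tool_parser = Mock(name="tool_parser")
    request = SimpleNamespace(tool_choice="auto")
    parse_delta_stand_in(request, tool_parser, "<tool_call>")
    tool_parser.assert_called_once_with("<tool_call>")


check_skips_tool_parser("none")  # explicit "none"
check_skips_tool_parser(None)    # omitted tool_choice on a no-tools request
check_auto_still_runs_tool_parser()
```

Asserting on the stub's call record, rather than only on the returned delta, is what distinguishes "parser skipped" from "parser ran but found nothing".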

A local pytest run was not possible in the contributing environment (macOS aarch64, no torch available). The new tests rely only on vllm.entrypoints.openai.chat_completion.protocol, vllm.parser.abstract_parser and a stub parser; they will run on CI on this PR.

AI assistance was used (Claude) to draft the patch and the test stub, mirroring the already-reviewed approach in #42752.

Closing note (2026-05-31): @hoobnn has folded the broader guard into #42752 and rebased it onto current main with the same request.tool_choice is None regression coverage. Closing this PR as duplicate of #42752 per its now-equivalent scope. Thanks @hoobnn for picking it up.

Fixes vllm-project#42747 alongside the existing vllm-project#42752 attempt.

Streaming Chat Completions with `tool_choice="none"` -- or omitted on a no-tools request, where `request.tool_choice` ends up as `None` -- could still produce `delta.tool_calls` and finish with `finish_reason="tool_calls"` whenever the server was launched with a `--tool-call-parser` and the model output happened to match that parser's tool-call format.

`DelegatingParser.parse_delta` in `vllm/parser/abstract_parser.py` invokes `_extract_tool_calls_streaming` unconditionally once the stream enters the tool-call phase, without inspecting `request.tool_choice`. The non-streaming path at `vllm/entrypoints/openai/chat_completion/serving.py` already short-circuits both cases:

    elif not request.tool_choice or request.tool_choice == "none":
        message = ChatMessage(role=role, reasoning=reasoning, content=content)

The streaming path was missing the equivalent guard.

Fix
---

In `DelegatingParser.parse_delta`, when `not request.tool_choice or request.tool_choice == "none"`, skip `_extract_tool_calls_streaming` and surface any remaining (post-reasoning) text as plain `content`. Because the tool parser is never invoked, `state.function_name_returned` stays untouched and the downstream `tools_streamed[i]` flag stays `False`, so `finish_reason` naturally falls back to `"stop"`. Reasoning extraction is untouched.

Difference from vllm-project#42752
----------------------

vllm-project#42752 (open, currently CONFLICTING with main) only guards on `request.tool_choice == "none"`. The pending review feedback on that PR (gemini-code-assist, 2026-05-15) explicitly asks to broaden the check to also cover the `request.tool_choice is None` case (no-tools request without an explicit tool_choice). This PR implements that broader guard so the streaming behavior matches the non-streaming `not request.tool_choice or request.tool_choice == "none"` semantics exactly.

This logic has been independently validated downstream in vllm-project/vllm-ascend#9776 against vllm v0.20.2.

Tests
-----

Added in tests/entrypoints/openai/test_tool_choice_content_none.py:

- test_parse_delta_with_tool_choice_none_skips_tool_parser -- explicit tool_choice="none": parser is not invoked, raw delta text surfaces as content.
- test_parse_delta_with_omitted_tool_choice_skips_tool_parser -- omitted tool_choice on a no-tools request (request.tool_choice is None): parser is not invoked, raw delta text surfaces as content. This is the additional case beyond vllm-project#42752.
- test_parse_delta_without_tool_choice_none_still_runs_tool_parser -- sanity: tool_choice="auto" still hits the tool parser (no regression).
- test_parse_delta_tool_choice_none_multiple_chunks_remain_content -- multi-chunk streaming stays in content mode across deltas.

Note: a local pytest run was not possible in the contributing environment (macos-aarch64, no torch available). The change mirrors the already-reviewed approach of vllm-project#42752, and CI on this PR will exercise the new tests.

Signed-off-by: liuchenbing <chenliumail@163.com>

hoobnn · 2026-05-31T09:05:52Z

Thanks @FutureSkyFly for picking this up and for independently validating the broader guard (and the cross-check in vllm-project/vllm-ascend#9776) 🙏

I've folded the broader guard into the original PR #42752: the streaming check now reads not request.tool_choice or request.tool_choice == "none", so it covers both tool_choice="none" and the explicit-null (request.tool_choice is None) case, matching the non-streaming path exactly. The PR is also rebased onto current main (reconciled with the boundary-delta reasoning change from #42691), DCO is signed off, and a dedicated regression test for the request.tool_choice is None path is added.

Since #42752 now covers the same fix, I think this one can be closed as duplicate — but very happy to defer if you'd prefer to drive it. Either way, appreciate the push to broaden the guard.

FutureSkyFly · 2026-05-31T10:07:16Z

Thanks @hoobnn for picking it up and folding the broader guard into #42752 — appreciate the quick turnaround. Closing this as duplicate; #42752 now covers the same scope (not request.tool_choice or request.tool_choice == "none") plus the dedicated request.tool_choice is None regression test, on top of being rebased onto current main with DCO signed off. Cheering #42752 on. 🙏

FutureSkyFly · 2026-05-31T10:07:24Z

Closing as duplicate of #42752 (now updated with the broader guard).

FutureSkyFly requested review from AndreasKaratzas, DarkLight1337, NickLucche, aarnphm, bbrowning, chaunceyjiang, robertgshaw2-redhat and sfeng33 as code owners May 31, 2026 08:46

mergify Bot added the tool-calling and bug labels May 31, 2026

github-project-automation Bot added this to Tool Calling May 31, 2026

hoobnn mentioned this pull request May 31, 2026

[Bugfix] Honor tool_choice="none" in Chat Completions streaming #42752 (Merged)

FutureSkyFly closed this May 31, 2026

github-project-automation Bot moved this to Done in Tool Calling May 31, 2026

FutureSkyFly wants to merge 1 commit into vllm-project:main from FutureSkyFly:fix/issue-42747-tool-choice-none-streaming-broader
