Skip to content

fix: clear response_format when tool_choice is auto to allow tool calls#39969

Open
he-yufeng wants to merge 1 commit into
vllm-project:mainfrom
he-yufeng:fix/auto-tool-choice-response-format
Open

fix: clear response_format when tool_choice is auto to allow tool calls#39969
he-yufeng wants to merge 1 commit into
vllm-project:mainfrom
he-yufeng:fix/auto-tool-choice-response-format

Conversation

@he-yufeng

Copy link
Copy Markdown
Contributor

Summary

When a request includes both tools and response_format (e.g. json_object) with tool_choice: "auto" (the default), constrained JSON decoding from response_format forces the model to produce JSON content and prevents it from generating tool call tokens. The model returns tool_calls: [] and answers directly as JSON.

This was already fixed for tool_choice: "required" in #32006, but the tool_choice: "auto" case was missed.

Fix

In adjust_request(), when get_json_schema_from_tools returns None (the "auto" case) but tools are present, clear response_format so the model can freely choose between tool calls and text output. The tool parser handles extraction regardless of output format.

Fixes #39929

Changes

1 file changed — vllm/tool_parsers/abstract_tool_parser.py

# Before: response_format only cleared for "required"/"forced function"
if json_schema_from_tool is not None:
    request.response_format = None

# After: also cleared for "auto" when tools are present
if json_schema_from_tool is not None:
    request.response_format = None
elif isinstance(request, ChatCompletionRequest):
    # tool_choice: "auto" -- clear response_format so constrained
    # decoding doesn't prevent the model from generating tool calls.
    request.response_format = None

Test plan

  • Code review: the elif only triggers when json_schema_from_tool is None (i.e. tool_choice="auto") and request.tools is non-empty (checked at line 80)
  • With tool_choice="auto" + response_format=json_object + tools → model can now return tool_calls
  • Without tools, adjust_request returns early (line 80-81), so response_format is preserved
  • tool_choice="required" path unchanged (still goes through the if branch)

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the adjust_request method in abstract_tool_parser.py to clear the response_format for ChatCompletionRequest instances, ensuring that constrained decoding does not interfere with tool call generation. A review comment identifies a potential regression where this change would incorrectly clear the response format even when tool_choice is set to 'none', and suggests refining the condition to specifically target 'auto' tool selection.

description="Response format for tool calling",
strict=True,
)
elif isinstance(request, ChatCompletionRequest):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation clears response_format for any ChatCompletionRequest where json_schema_from_tool is None. However, json_schema_from_tool is None for both tool_choice="auto" and tool_choice="none" (as defined in get_json_schema_from_tools in vllm/tool_parsers/utils.py).

If a user provides tools but explicitly sets tool_choice="none" while also requesting a specific response_format (e.g., json_object), this change will incorrectly clear their response_format. This results in a regression where the model's output is no longer constrained to JSON even though tool calling is disabled. The condition should explicitly check for tool_choice == "auto".

Suggested change
elif isinstance(request, ChatCompletionRequest):
elif isinstance(request, ChatCompletionRequest) and request.tool_choice == "auto":

description="Response format for tool calling",
strict=True,
)
elif isinstance(request, ChatCompletionRequest):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - can be more explicit here

elif isinstance(request, ChatCompletionRequest) and request.tool_choice in ("auto", None):

@sfeng33

sfeng33 commented Apr 16, 2026

Copy link
Copy Markdown
Collaborator

Can you please fix DCO?

description="Response format for tool calling",
strict=True,
)
elif isinstance(request, ChatCompletionRequest):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an E2E test is needed.

@he-yufeng he-yufeng force-pushed the fix/auto-tool-choice-response-format branch from 6340873 to 8cef40c Compare April 22, 2026 12:43
@he-yufeng he-yufeng requested a review from bbrowning as a code owner April 22, 2026 12:43
@he-yufeng

Copy link
Copy Markdown
Contributor Author

Thanks for the reviews! Addressed:

  • @sfeng33 — tightened the condition to tool_choice in ("auto", None) so tool_choice="none" (caller explicitly opted out of tools) keeps response_format intact. DCO fixed via sign-off.
  • @chaunceyjiang — added unit tests covering all four cases in tests/tool_parsers/test_openai_tool_parser.py:
    • tool_choice="auto" + response_format → cleared
    • tool_choice unset + tools → cleared (auto is the default)
    • tool_choice="none" + response_format → preserved
    • no tools + response_format → preserved

Since adjust_request lives in the base ToolParser class, the OpenAIToolParser exercises it directly. Let me know if you'd like the same check added as a protocol-level test in tests/tool_use/ — happy to move it.

@he-yufeng

Copy link
Copy Markdown
Contributor Author

Hi, checking in on this after the latest update. I addressed the DCO issue and the review feedback by preserving response_format for tool_choice="none", then added the focused parser tests for the covered cases. GitHub currently shows DCO, pre-commit, RTD, and the summary check passing. Happy to adjust further if you prefer a protocol-level test instead.

@sfeng33

sfeng33 commented May 14, 2026

Copy link
Copy Markdown
Collaborator

Thanks for the work! The change LGTM, deferring to @chaunceyjiang for feedback on the test coverage.

alexeldeib added a commit to alexeldeib/vllm that referenced this pull request May 31, 2026
ToolParser.adjust_request's strict structural-tag path (added in vllm-project#40894, gated by
VLLM_ENFORCE_STRICT_TOOL_CALLING) installs structural_tag on a pre-existing
StructuredOutputsParams via in-place attribute assignment and returns without
nulling response_format. The in-place set bypasses
StructuredOutputsParams.__post_init__, so the params keep a prior
mutually-exclusive constraint (json/regex/choice/grammar/json_object, or one
lowered from response_format) next to the new structural_tag. On the next
re-validation this trips the one-constraint invariant, so a strict-mode request
that also carries a structured-output constraint or a response_format fails with:

    ValueError: You can only use one kind of structured outputs constraint
    but multiple are specified

This affects any parser that installs a structural tag -- currently DeepSeek-V4
and Qwen3-Coder via get_structural_tag. The env var is off by default, and a
request with no pre-existing constraint is unaffected.

Fix: rebuild structured_outputs with only the structural tag (preserving the
whitespace / additional-properties knobs) and null response_format, mirroring
Step 2 of the same method. This "tool constraint wins, response_format dropped"
resolution already exists in Step 2, the DeepSeek-V3.2 override (vllm-project#41178), and for
required/auto in vllm-project#32006 / vllm-project#39969; the in-place-vs-rebuild trade-off was discussed
on vllm-project#40894 and vllm-project#43155 (whose Kimi path already rebuilds).

Repro / regression test (CPU, no model required):

    pytest tests/tool_use/test_strict_tool_calling_adjust_request.py

The added tests enable strict mode, give a parser a structural tag, and send
tools together with a response_format or a structured_outputs.json constraint
(tool_choice auto and required). On the pre-fix code adjust_request leaves two
constraints, and to_sampling_params raises the ValueError above; with this change
structured_outputs holds only the structural tag, response_format is None, and
the user's whitespace knobs are preserved. The conflict tests fail without this
patch and pass with it; the no-pre-existing-constraint case passes either way.

Equivalently over HTTP: with strict mode on, a tool_choice="auto" request that
also sets response_format returns HTTP 400 (the error above) before this change
and a normal tool call after; a required-tool request is unaffected because that
path already rebuilds.

Signed-off-by: Ace Eldeib <aeldeib@coreweave.com>
alexeldeib added a commit to alexeldeib/vllm that referenced this pull request May 31, 2026
ToolParser.adjust_request's strict structural-tag path (added in vllm-project#40894, gated by
VLLM_ENFORCE_STRICT_TOOL_CALLING) installs structural_tag on a pre-existing
StructuredOutputsParams via in-place attribute assignment and returns without
nulling response_format. The in-place set bypasses
StructuredOutputsParams.__post_init__, so the params keep a prior
mutually-exclusive constraint (json/regex/choice/grammar/json_object, or one
lowered from response_format) next to the new structural_tag. On the next
re-validation this trips the one-constraint invariant, so a strict-mode request
that also carries a structured-output constraint or a response_format fails with:

    ValueError: You can only use one kind of structured outputs constraint
    but multiple are specified

This affects any parser that installs a structural tag -- currently DeepSeek-V4
and Qwen3-Coder via get_structural_tag. The env var is off by default, and a
request with no pre-existing constraint is unaffected.

Fix: rebuild structured_outputs with only the structural tag (preserving the
whitespace / additional-properties knobs) and null response_format, mirroring
Step 2 of the same method. This "tool constraint wins, response_format dropped"
resolution already exists in Step 2 and the DeepSeek-V3.2 override (vllm-project#41178), and
is the intent of the open auto-path fix vllm-project#39969; the in-place-vs-rebuild trade-off
was discussed on vllm-project#40894 and vllm-project#43155 (whose Kimi path already rebuilds).

Repro / regression test (CPU, no model required):

    pytest tests/tool_use/test_strict_tool_calling_adjust_request.py

The added tests enable strict mode, give a parser a structural tag, and send
tools together with a response_format or a structured_outputs.json constraint
(tool_choice auto and required). On the pre-fix code adjust_request leaves two
constraints, and to_sampling_params raises the ValueError above; with this change
structured_outputs holds only the structural tag, response_format is None, and
the user's whitespace knobs are preserved. The conflict tests fail without this
patch and pass with it; the no-pre-existing-constraint case passes either way.

Equivalently over HTTP: with strict mode on, a tool_choice="auto" request that
also sets response_format returns HTTP 400 (the error above) before this change
and a normal tool call after; a required-tool request is unaffected because that
path already rebuilds.

Signed-off-by: Ace Eldeib <aeldeib@coreweave.com>
alexeldeib added a commit to alexeldeib/vllm that referenced this pull request Jun 10, 2026
ToolParser.adjust_request's strict structural-tag path (added in vllm-project#40894, gated by
VLLM_ENFORCE_STRICT_TOOL_CALLING) installs structural_tag on a pre-existing
StructuredOutputsParams via in-place attribute assignment and returns without
nulling response_format. The in-place set bypasses
StructuredOutputsParams.__post_init__, so the params keep a prior
mutually-exclusive constraint (json/regex/choice/grammar/json_object, or one
lowered from response_format) next to the new structural_tag. On the next
re-validation this trips the one-constraint invariant, so a strict-mode request
that also carries a structured-output constraint or a response_format fails with:

    ValueError: You can only use one kind of structured outputs constraint
    but multiple are specified

This affects any parser that installs a structural tag -- currently DeepSeek-V4
and Qwen3-Coder via get_structural_tag. The env var is off by default, and a
request with no pre-existing constraint is unaffected.

Fix: rebuild structured_outputs with only the structural tag (preserving the
whitespace / additional-properties knobs) and null response_format, mirroring
Step 2 of the same method. This "tool constraint wins, response_format dropped"
resolution already exists in Step 2 and the DeepSeek-V3.2 override (vllm-project#41178), and
is the intent of the open auto-path fix vllm-project#39969; the in-place-vs-rebuild trade-off
was discussed on vllm-project#40894 and vllm-project#43155 (whose Kimi path already rebuilds).

Repro / regression test (CPU, no model required):

    pytest tests/tool_use/test_strict_tool_calling_adjust_request.py

The added tests enable strict mode, give a parser a structural tag, and send
tools together with a response_format or a structured_outputs.json constraint
(tool_choice auto and required). On the pre-fix code adjust_request leaves two
constraints, and to_sampling_params raises the ValueError above; with this change
structured_outputs holds only the structural tag, response_format is None, and
the user's whitespace knobs are preserved. The conflict tests fail without this
patch and pass with it; the no-pre-existing-constraint case passes either way.

Equivalently over HTTP: with strict mode on, a tool_choice="auto" request that
also sets response_format returns HTTP 400 (the error above) before this change
and a normal tool call after; a required-tool request is unaffected because that
path already rebuilds.

Signed-off-by: Ace Eldeib <aeldeib@coreweave.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
@mergify

mergify Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @universeplayer.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Jun 11, 2026
When both tools and response_format are set with tool_choice=auto,
constrained JSON decoding prevents the model from generating tool
call tokens. Already fixed for required in vllm-project#32006 but auto was missed.
tool_choice=none is deliberately left untouched because the caller
explicitly opted out of tool calls.

Fixes vllm-project#39929

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
@he-yufeng he-yufeng force-pushed the fix/auto-tool-choice-response-format branch from 8cef40c to db95113 Compare June 11, 2026 23:07
@he-yufeng

Copy link
Copy Markdown
Contributor Author

Rebased this on the latest main and resolved the test-path conflict by keeping the current upstream layout:

  • kept the ToolParser.adjust_request change against the current abstract_tool_parser.py
  • moved the regression coverage from the removed tests/tool_parsers/test_openai_tool_parser.py path into a small unit test under tests/tool_use/test_tool_parser_adjust_request.py

Local checks run:

  • python -m py_compile vllm\tool_parsers\abstract_tool_parser.py tests\tool_use\test_tool_parser_adjust_request.py
  • ruff check vllm\tool_parsers\abstract_tool_parser.py tests\tool_use\test_tool_parser_adjust_request.py
  • ruff format --check vllm\tool_parsers\abstract_tool_parser.py tests\tool_use\test_tool_parser_adjust_request.py
  • SKIP=update-dockerfile-graph pre-commit run --files vllm\tool_parsers\abstract_tool_parser.py tests\tool_use\test_tool_parser_adjust_request.py
  • git diff --check upstream/main..HEAD

python -m pytest tests\tool_use\test_tool_parser_adjust_request.py -q is blocked in my local Windows base environment before collecting this test by an existing transformers -> kernels import failure: ValueError: Either a revision or a version must be specified. The non-importing checks above pass, and CI should run the test in the project environment.

@mergify mergify Bot removed the needs-rebase label Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[Bug]: response_format suppresses tool calls when tool_choice: "auto" — constrained decoding prevents tool generation

3 participants