Skip to content

[Tool] adjust_request to reasoning parser, and Gemma4 fixes#39027

Merged
aarnphm merged 4 commits into
vllm-project:mainfrom
bbrowning:gemma4-multi-turn-fixes
Apr 8, 2026
Merged

[Tool] adjust_request to reasoning parser, and Gemma4 fixes#39027
aarnphm merged 4 commits into
vllm-project:mainfrom
bbrowning:gemma4-multi-turn-fixes

Conversation

@bbrowning

@bbrowning bbrowning commented Apr 5, 2026

Copy link
Copy Markdown
Collaborator

Purpose

Fix multiple issues preventing Gemma4 models from working correctly
with multi-turn tool calling and reasoning in vLLM:

  • Add new Gemma4 chat template that properly encodes tool results using the model's native format, handles multi-turn conversations with interleaved tool calls and reasoning, and strips thinking content from prior assistant turns
  • Add adjust_request() to ReasoningParser base class (mirroring ToolParser) so reasoning parsers can modify request parameters before generation, used by Gemma4 to set skip_special_tokens=False
  • Fix reasoning parser to extract non-streaming thinking content and handle the "thought\n" prefix correctly in streaming
  • Fix pre-existing mypy error in ReasoningParserManager.register_module
  • Add unit tests for reasoning parser and chat template rendering
  • Fix empty "user" turns created when handling tool outputs by our Messages API to Chat Completions translation
  • is_reasoning_end clean ups for the Gemma 4 reasoning parser
    • don't assuming reasoning has ended when we scan prompts backwards across user turn boundaries or after tool responses
    • explicitly mark reasoning as ended when we start generating tool calls

The net result of these fixes shows larger Gemma4 models are very competitive at multi-turn tool calling for their size. I won't share any specific numbers here, but all of these fixes were guided by both direct inspection of prompting and multi-turn behavior and some simple quantitative eval with the BFCL multi_turn suite.

You'll need to both enable thinking and select the correct chat template when testing Gemma 4 models with these fixes:

vllm serve google/gemma-4-31B-it \
  --tool-call-parser gemma4 \
  --enable-auto-tool-choice \
  --reasoning-parser gemma4 \
  --default-chat-template-kwargs '{"enable_thinking": true}' \
  --chat-template examples/tool_chat_template_gemma4.jinja

Test Plan

BFCL multi_turn suite to uncover bugs and validate fixes

(expand for BFCL clone, setup, adding models)
git clone https://github.com/ShishirPatil/gorilla

cd gorilla/berkeley-function-call-leaderboard/

uv venv --python 3.12 --seed

source .venv/bin/activate

uv pip install -e .

cat <<EOF >> bfcl_eval/constants/model_config.py
    "google/gemma-4-E2B-it": ModelConfig(
        model_name="google/gemma-4-E2B-it",
        display_name="google/gemma-4-E2B-it (FC) (vLLM)",
        url="https://huggingface.co/google/gemma-4-E2B-it",
        org="Google",
        license="apache-2.0",
        model_handler=OpenAICompletionsHandler,
        input_price=None,
        output_price=None,
        is_fc_model=True,
        underscore_to_dot=True,
    ),
    "google/gemma-4-26B-A4B-it": ModelConfig(
        model_name="google/gemma-4-26B-A4B-it",
        display_name="google/gemma-4-26B-A4B-it (FC) (vLLM)",
        url="https://huggingface.co/google/gemma-4-26B-A4B-it",
        org="Google",
        license="apache-2.0",
        model_handler=OpenAICompletionsHandler,
        input_price=None,
        output_price=None,
        is_fc_model=True,
        underscore_to_dot=True,
    ),
    "google/gemma-4-31B-it": ModelConfig(
        model_name="google/gemma-4-31B-it",
        display_name="google/gemma-4-31B-it (FC) (vLLM)",
        url="https://huggingface.co/google/gemma-4-31B-it",
        org="Google",
        license="apache-2.0",
        model_handler=OpenAICompletionsHandler,
        input_price=None,
        output_price=None,
        is_fc_model=True,
        underscore_to_dot=True,
    ),
}
EOF

Run BFCL multi_turn eval suite:

OPENAI_BASE_URL="http://localhost:8000/v1" \
OPENAI_API_KEY="fake" \
bfcl generate \
  --model google/gemma-4-31B-it \
  --num-threads 4 \
  --allow-overwrite \
  --test-category multi_turn

OPENAI_API_KEY="fake" \
bfcl evaluate

Unit Tests

# Note: this test has a pre-existing dependency on transformers 5.x
# `pip install --upgrade transformers`
pytest tests/reasoning/test_gemma4_reasoning_parser.py

pytest tests/renderers/test_gemma4_chat_template.py

# Run all reasoning parser tests, since we added `adjust_request`
# skip the ones that CI skips because they already fail
# and skip step3p5 because it requires trusting remote code 
pytest tests/reasoning \
  --ignore=tests/reasoning/test_seedoss_reasoning_parser.py \
  --ignore=tests/reasoning/test_glm4_moe_reasoning_parser.py \
  --ignore=tests/reasoning/test_step3p5_reasoning_parser.py

Claude Code pointed at a Gemma 4 model running locally

CLAUDE_CODE_USE_VERTEX=0 \
ANTHROPIC_BASE_URL="http://localhost:8000" \
ANTHROPIC_DEFAULT_OPUS_MODEL="google/gemma-4-31B-it" \
ANTHROPIC_DEFAULT_SONNET_MODEL="google/gemma-4-31B-it" \
ANTHROPIC_DEFAULT_HAIKU_MODEL="google/gemma-4-31B-it" \
ANTHROPIC_AUTH_TOKEN="dummy" \
claude \
  --model sonnet

Test Result

Unit Tests

pytest tests/reasoning/test_gemma4_reasoning_parser.py

29 passed, 2 warnings in 3.24s

pytest tests/renderers/test_gemma4_chat_template.py

14 passed, 2 warnings in 0.98s

tests/reasoning

318 passed, 5 warnings in 40.41s

BFCL Results

I have BFCL results and they are far better after this change than before. I'm not sure it's my place to share those publicly here, but the results for the larger Gemma4 models (MoE and Dense) are very good for models of their size.

Claude Code usability

I was able to execute multiple complex refactoring and new code generation sessions in existing codebases with both Gemma-4-31B and Gemma-4-26B-A4B. After the latest fixes here, I'm not seeing any unparsed tool calls nor any leaked reasoning content into the session.

@mergify

mergify Bot commented Apr 5, 2026

Copy link
Copy Markdown
Contributor

Documentation preview: https://vllm--39027.org.readthedocs.build/en/39027/

@mergify mergify Bot added documentation Improvements or additions to documentation frontend tool-calling labels Apr 5, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new Jinja chat template for Gemma 4, along with infrastructure to support custom Jinja filters and normalized tool responses. However, several critical issues were identified in the implementation. Specifically, the hardcoding of skip_special_tokens = False in the OpenAI serving layer is a global change that overrides user intent for all models. Additionally, there is debug logging to a hardcoded local file (gemma_turns.log) which is unsuitable for production. The use of global monkey-patching on jinja2.sandbox.ImmutableSandboxedEnvironment is also discouraged as it creates dangerous side effects across the entire application; a more localized injection of filters is preferred. Finally, it is recommended to log exceptions during Jinja filter patching rather than swallowing them silently.

Comment thread vllm/entrypoints/openai/chat_completion/serving.py Outdated
Comment thread vllm/entrypoints/openai/chat_completion/serving.py Outdated
Comment thread vllm/entrypoints/openai/chat_completion/serving.py Outdated
Comment thread vllm/env_override.py Outdated
Comment thread vllm/transformers_utils/chat_template_json_filters.py Outdated
@ywang96

ywang96 commented Apr 6, 2026

Copy link
Copy Markdown
Member

FYI #38858 (comment)

@bbrowning

Copy link
Copy Markdown
Collaborator Author

FYI #38858 (comment)

Thanks, I commented there. We'll need to handle parsing reasoning content properly within vLLM vs asking the user to set skip_special_tokens=False on the request, as the user knob is enable_thinking=True and then it's the server's job to parse reasoning out of that when the server was configured to use the gemma4 reasoning parser.

@bbrowning bbrowning force-pushed the gemma4-multi-turn-fixes branch from 039bf4e to 21518bb Compare April 6, 2026 13:20
@bbrowning

Copy link
Copy Markdown
Collaborator Author

@lucianommartins The draft gemma4 chat template added here at examples/tool_chat_template_gemma4.jinja changes a number of things that we might want to get into the model's default chat template overall, so that all inference servers can benefit. I know some of these are already on your radar, but things like handling the reasoning content in multi-turn scenarios I don't know if you've considered yet.

Also, optimally we would coerce tool outputs that are JSON strings into actual JSON objects. I need to validate how much of a % difference this makes in overall multi-turn eval results, as my best results so far were based on turning JSON strings into JSON objects before the tool output reaches the chat template so that Gemini sees the tool response as an actual JSON object with string token delimiters and such instead of as a single encoded string. The previous draft of this did some hacks to get this happening in the chat template via external python helper functions, but I'd like to drop those hacks if I can verify the model results are substantially identical even if we just give it a JSON string as one string blob for tool call outputs.

@sfeng33 @chaunceyjiang I'd like your eyes on this for the adjustments to wire in an adjust_request for reasoning parsers in case you have another preferred way to do this. So many reasoning parser unit tests fail today because we don't CI them that it's a bit hard to get a good signal here, but I'll do more testing on my end as well.

I'll be pulling this out of draft soon once I clean up a few more loose ends.

@lucianommartins

Copy link
Copy Markdown
Contributor

thanks @bbrowning - I will update my transformers PR with the nice additions from your jinja.

@bbrowning

Copy link
Copy Markdown
Collaborator Author

@lucianommartins opened #39081 to focus just on handling the stripping of special tokens during reasoning. Once that gets in a good place and merges, I'll remove the equivalent-ish code from this PR as here I don't handle it for the offline inference case.

@drrros

drrros commented Apr 6, 2026

Copy link
Copy Markdown

with patch applied it worked better, about 10 tool calls went fine, but at some point it happens again:

                                                                                                                                                                                                                                                                                                                                                                                                                                            
FAILED (failures=7, errors=31)                                                                                                                                                                                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                                                                                                                                                                                            

  Read 3 files, listed 1 directory (ctrl+o to expand)

● <|channel>thought
  I will remove the invalid @patch decorators from tests/rest_api_backend/test_cc_write_protection.py and then run the tests to see the effect.<channel|>

● Update(tests/rest_api_backend/test_cc_write_protection.py)
  ⎿  Error editing file

● Update(tests/rest_api_backend/test_cc_write_protection.py)
  ⎿  Error editing file

● Update(tests/rest_api_backend/test_cc_write_protection.py)
  ⎿  Error editing file

● ---

● Update(tests/rest_api_backend/test_cc_write_protection.py)
  ⎿  Error editing file

● Update(tests/rest_api_backend/test_cc_write_protection.py)
  ⎿  Error editing file

● Update(tests/rest_api_backend/test_cc_write_protection.py)      
  ⎿  Error editing file
  ⎿  Interrupted · What should Claude do instead?                                                                                                                                                                                                                                                                                                                                                                                           

It failed to apply file changes after tokens leakage for some reason, had to stop it after some time.

@bbrowning

Copy link
Copy Markdown
Collaborator Author

@drrros That sounds like we still have some edge cases to sort out in streaming reasoning parsing or perhaps our Messages API implementation. Let's debug that more specifically in #39043 since that focuses on making sure this works great for Claude Code specific tool calls.

@bbrowning

Copy link
Copy Markdown
Collaborator Author

Adding a few more commits to my staging area here. With these additional fixes plus #38909 and #39114 Gemma4 multi-turn reasoning and tool calling performance is getting to a really good place.

@bbrowning

Copy link
Copy Markdown
Collaborator Author

Note this builds on top of changes from 39114, and the pre-commit failures are because of that.

@mergify

mergify Bot commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bbrowning.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Apr 8, 2026
@bbrowning bbrowning force-pushed the gemma4-multi-turn-fixes branch from 928a3ad to 8bdf630 Compare April 8, 2026 10:50
@mergify mergify Bot removed the needs-rebase label Apr 8, 2026
@bbrowning

Copy link
Copy Markdown
Collaborator Author

Just waiting on #39114 to land to get the pre-commit fixed and pull this out of draft.

This is a slightly simpler approach to wiring in skip_special_tokens for the Gemma 4 reasoning parser than #39081. I don't have a strong preference, but this has to happen for the reasoning parser to work out of the box. If 39081 merges first I'll rebase this to remove my version of that. Otherwise, if this merges 39081 can either be closed or adjusted to tweak the logic I added here.

We'll also want to update the Gemma 4 Usage Guide in the recipes repo pointing to our new chat template. There is huggingface/transformers#45257 to get this updated by default in the model's official chat template, but it's not clear if/when that will merge. I've confirmed that all the logic is the same between what that transformers PR renders and what we have here.

bbrowning and others added 2 commits April 8, 2026 13:18
Fix multiple issues preventing Gemma4 models from working correctly
with multi-turn tool calling and reasoning in vLLM:

- Add new Gemma4 chat template that properly encodes tool results
  using the model's native format, handles multi-turn conversations
  with interleaved tool calls and reasoning, and strips thinking
  content from prior assistant turns
- Add adjust_request() to ReasoningParser base class (mirroring
  ToolParser) so reasoning parsers can modify request parameters
  before generation, used by Gemma4 to set skip_special_tokens=False
- Fix reasoning parser to extract non-streaming thinking content
  and handle the "thought\n" prefix correctly in streaming
- Fix pre-existing mypy error in ReasoningParserManager.register_module
- Add unit tests for reasoning parser and chat template rendering

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
When translating from Anthropic Messages API to Chat Completions, we
were inserting entirely empty "user" turns for tool call outputs, as
those come in via the user in the Messages API but get turned into a
tool role in the Chat Completions API. These empty user role turs make
their way into some chat templates, and for example led to issues in
long multi-turn scenarios when testing with Gemma 4 models.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
douhashi added a commit to douhashi/runpod-vllm-gemma that referenced this pull request Apr 9, 2026
Current vllm/vllm-openai:gemma4 image does not support this flag.
Thinking disable will be possible after image update with
--default-chat-template-kwargs from vllm-project/vllm#39027.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026
…roject#39027)

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
khluu pushed a commit that referenced this pull request Apr 10, 2026
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit 8477fe4)
khluu pushed a commit that referenced this pull request Apr 16, 2026
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit 8477fe4)
(cherry picked from commit 0d784f4)
greg1232 pushed a commit to supermassive-intelligence/vllm-fork that referenced this pull request Apr 22, 2026
…roject#39027)

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@guidryheal-create

guidryheal-create commented May 10, 2026

Copy link
Copy Markdown

i fin-tune gemma using unsloth, then saved it with all it need. i m pretty sure everything fine about it but,
i got the issue with camel and other tool as well.

when i query it from api the tool call look kinda correct so :

docker run --runtime nvidia --gpus all -v $(pwd):/template   -v ~/.cache/huggingface:/root/.cache/huggingface   --env "HF_TOKEN=$HF_TOKEN"   -p 8666:8000   --ipc=host   vllm/vllm-openai:latest   --model __mymodel__/gemma-4   --gpu-memory-utilization 0.85   --max-model-len 131072 --enable-auto-tool-choice --quantization fp8 --kv-cache-dtype fp8 --safetensors-load-strategy lazy --reasoning-parser gemma4 --tool-call-parser gemma4 --chat-template /template/chat_template.jinja 

with chat template from source (it work way less when i use VLLM exemple) :

{%- macro format_parameters(properties, required, filter_keys=false) -%}
    {%- set standard_keys = ['description', 'type', 'properties', 'required', 'nullable'] -%}
    {%- set ns = namespace(found_first=false) -%}
    {%- for key, value in properties | dictsort -%}
        {%- set add_comma = false -%}
        {%- if not filter_keys or key not in standard_keys -%}
            {%- if ns.found_first %},{% endif -%}
            {%- set ns.found_first = true -%}
            {{ key }}:{
            {%- if value['description'] -%}
                description:<|"|>{{ value['description'] }}<|"|>
                {%- set add_comma = true -%}
            {%- endif -%}
            {%- if value['type'] | upper == 'STRING' -%}
                {%- if value['enum'] -%}
                    {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
                    enum:{{ format_argument(value['enum']) }}
                {%- endif -%}
            {%- elif value['type'] | upper == 'ARRAY' -%}
                {%- if value['items'] is mapping and value['items'] -%}
                    {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
                    items:{
                    {%- set ns_items = namespace(found_first=false) -%}
                    {%- for item_key, item_value in value['items'] | dictsort -%}
                        {%- if item_value is not none -%}
                            {%- if ns_items.found_first %},{% endif -%}
                            {%- set ns_items.found_first = true -%}
                            {%- if item_key == 'properties' -%}
                                properties:{
                                {%- if item_value is mapping -%}
                                    {{- format_parameters(item_value, value['items']['required'] | default([])) -}}
                                {%- endif -%}
                                }
                            {%- elif item_key == 'required' -%}
                                required:[
                                {%- for req_item in item_value -%}
                                    <|"|>{{- req_item -}}<|"|>
                                    {%- if not loop.last %},{% endif -%}
                                {%- endfor -%}
                                ]
                            {%- elif item_key == 'type' -%}
                                {%- if item_value is string -%}
                                    type:{{ format_argument(item_value | upper) }}
                                {%- else -%}
                                    type:{{ format_argument(item_value | map('upper') | list) }}
                                {%- endif -%}
                            {%- else -%}
                                {{ item_key }}:{{ format_argument(item_value) }}
                            {%- endif -%}
                        {%- endif -%}
                    {%- endfor -%}
                    }
                {%- endif -%}
            {%- endif -%}
            {%- if value['nullable'] %}
                {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
                nullable:true
            {%- endif -%}
            {%- if value['type'] | upper == 'OBJECT' -%}
                {%- if value['properties'] is defined and value['properties'] is mapping -%}
                    {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
                    properties:{
                    {{- format_parameters(value['properties'], value['required'] | default([])) -}}
                    }
                {%- elif value is mapping -%}
                    {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
                    properties:{
                    {{- format_parameters(value, value['required'] | default([]), filter_keys=true) -}}
                    }
                {%- endif -%}
                {%- if value['required'] -%}
                    {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
                    required:[
                    {%- for item in value['required'] | default([]) -%}
                        <|"|>{{- item -}}<|"|>
                        {%- if not loop.last %},{% endif -%}
                    {%- endfor -%}
                    ]
                {%- endif -%}
            {%- endif -%}
            {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
            type:<|"|>{{ value['type'] | upper }}<|"|>}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}
{%- macro format_function_declaration(tool_data) -%}
    declaration:{{- tool_data['function']['name'] -}}{description:<|"|>{{- tool_data['function']['description'] -}}<|"|>
    {%- set params = tool_data['function']['parameters'] -%}
    {%- if params -%}
        ,parameters:{
        {%- if params['properties'] -%}
            properties:{ {{- format_parameters(params['properties'], params['required']) -}} },
        {%- endif -%}
        {%- if params['required'] -%}
            required:[
            {%- for item in params['required'] -%}
                <|"|>{{- item -}}<|"|>
                {{- ',' if not loop.last -}}
            {%- endfor -%}
            ],
        {%- endif -%}
        {%- if params['type'] -%}
            type:<|"|>{{- params['type'] | upper -}}<|"|>}
        {%- endif -%}
    {%- endif -%}
    {%- if 'response' in tool_data['function'] -%}
        {%- set response_declaration = tool_data['function']['response'] -%}
        ,response:{
        {%- if response_declaration['description'] -%}
            description:<|"|>{{- response_declaration['description'] -}}<|"|>,
        {%- endif -%}
        {%- if response_declaration['type'] | upper == 'OBJECT' -%}
            type:<|"|>{{- response_declaration['type'] | upper -}}<|"|>}
        {%- endif -%}
    {%- endif -%}
    }
{%- endmacro -%}
{%- macro format_argument(argument, escape_keys=True) -%}
    {%- if argument is string -%}
        {{- '<|"|>' + argument + '<|"|>' -}}
    {%- elif argument is boolean -%}
        {{- 'true' if argument else 'false' -}}
    {%- elif argument is mapping -%}
        {{- '{' -}}
        {%- set ns = namespace(found_first=false) -%}
        {%- for key, value in argument | dictsort -%}
            {%- if ns.found_first %},{% endif -%}
            {%- set ns.found_first = true -%}
            {%- if escape_keys -%}
                {{- '<|"|>' + key + '<|"|>' -}}
            {%- else -%}
                {{- key -}}
            {%- endif -%}
            :{{- format_argument(value, escape_keys=escape_keys) -}}
        {%- endfor -%}
        {{- '}' -}}
    {%- elif argument is sequence -%}
        {{- '[' -}}
        {%- for item in argument -%}
            {{- format_argument(item, escape_keys=escape_keys) -}}
            {%- if not loop.last %},{% endif -%}
        {%- endfor -%}
        {{- ']' -}}
    {%- else -%}
        {{- argument -}}
    {%- endif -%}
{%- endmacro -%}
{%- macro strip_thinking(text) -%}
    {%- set ns = namespace(result='') -%}
    {%- for part in text.split('<channel|>') -%}
        {%- if '<|channel>' in part -%}
            {%- set ns.result = ns.result + part.split('<|channel>')[0] -%}
        {%- else -%}
            {%- set ns.result = ns.result + part -%}
        {%- endif -%}
    {%- endfor -%}
    {{- ns.result | trim -}}
{%- endmacro -%}

{%- macro format_tool_response_block(tool_name, response) -%}
    {{- '<|tool_response>' -}}
    {%- if response is mapping -%}
        {{- 'response:' + tool_name + '{' -}}
        {%- for key, value in response | dictsort -%}
            {{- key -}}:{{- format_argument(value, escape_keys=False) -}}
            {%- if not loop.last %},{% endif -%}
        {%- endfor -%}
        {{- '}' -}}
    {%- else -%}
        {{- 'response:' + tool_name + '{value:' + format_argument(response, escape_keys=False) + '}' -}}
    {%- endif -%}
    {{- '<tool_response|>' -}}
{%- endmacro -%}

{%- set ns = namespace(prev_message_type=None) -%}
{%- set loop_messages = messages -%}
{{- bos_token -}}
{#- Handle System/Tool Definitions Block -#}
{%- if (enable_thinking is defined and enable_thinking) or tools or messages[0]['role'] in ['system', 'developer'] -%}
    {{- '<|turn>system\n' -}}
    {#- Inject Thinking token at the very top of the FIRST system turn -#}
    {%- if enable_thinking is defined and enable_thinking -%}
        {{- '<|think|>\n' -}}
        {%- set ns.prev_message_type = 'think' -%}
    {%- endif -%}
    {%- if messages[0]['role'] in ['system', 'developer'] -%}
        {%- if messages[0]['content'] is string -%}
            {{- messages[0]['content'] | trim -}}
        {%- elif messages[0]['content'] is sequence -%}
            {%- for item in messages[0]['content'] -%}
                {{- item['text'] | trim + ' '-}}
            {%- endfor -%}
        {%- endif -%}
        {%- set loop_messages = messages[1:] -%}
    {%- endif -%}
    {%- if tools -%}
        {%- for tool in tools %}
            {{- '<|tool>' -}}
            {{- format_function_declaration(tool) | trim -}}
            {{- '<tool|>' -}}
        {%- endfor %}
        {%- set ns.prev_message_type = 'tool' -%}
    {%- endif -%}
    {{- '<turn|>\n' -}}
{%- endif %}

{#- Pre-scan: find last user message index for reasoning guard -#}
{%- set ns_turn = namespace(last_user_idx=-1) -%}
{%- for i in range(loop_messages | length) -%}
    {%- if loop_messages[i]['role'] == 'user' -%}
        {%- set ns_turn.last_user_idx = i -%}
    {%- endif -%}
{%- endfor -%}

{#- Loop through messages -#}
{%- for message in loop_messages -%}
    {%- if message['role'] != 'tool' -%}
    {%- set ns.prev_message_type = None -%}
    {%- set role = 'model' if message['role'] == 'assistant' else message['role'] -%}
    {#- Detect continuation: suppress duplicate <|turn>model when previous non-tool message was also assistant -#}
    {%- set prev_nt = namespace(role=None, found=false) -%}
    {%- if loop.index0 > 0 -%}
        {%- for j in range(loop.index0 - 1, -1, -1) -%}
            {%- if not prev_nt.found -%}
                {%- if loop_messages[j]['role'] != 'tool' -%}
                    {%- set prev_nt.role = loop_messages[j]['role'] -%}
                    {%- set prev_nt.found = true -%}
                {%- endif -%}
            {%- endif -%}
        {%- endfor -%}
    {%- endif -%}
    {%- set continue_same_model_turn = (role == 'model' and prev_nt.role == 'assistant') -%}
    {%- if not continue_same_model_turn -%}
        {{- '<|turn>' + role + '\n' }}
    {%- endif -%}

    {#- Render reasoning/reasoning_content as thinking channel -#}
    {%- set thinking_text = message.get('reasoning') or message.get('reasoning_content') -%}
    {%- if thinking_text and loop.index0 > ns_turn.last_user_idx and message.get('tool_calls') -%}
        {{- '<|channel>thought\n' + thinking_text + '\n<channel|>' -}}
    {%- endif -%}

            {%- if message['tool_calls'] -%}
                {%- for tool_call in message['tool_calls'] -%}
                    {%- set function = tool_call['function'] -%}
                    {{- '<|tool_call>call:' + function['name'] + '{' -}}
                    {%- if function['arguments'] is mapping -%}
                        {%- set ns_args = namespace(found_first=false) -%}
                        {%- for key, value in function['arguments'] | dictsort -%}
                            {%- if ns_args.found_first %},{% endif -%}
                            {%- set ns_args.found_first = true -%}
                            {{- key -}}:{{- format_argument(value, escape_keys=False) -}}
                        {%- endfor -%}
                    {%- elif function['arguments'] is string -%}
                        {{- function['arguments'] -}}
                    {%- endif -%}
                    {{- '}<tool_call|>' -}}
                {%- endfor -%}
                {%- set ns.prev_message_type = 'tool_call' -%}
            {%- endif -%}

            {%- set ns_tr_out = namespace(flag=false) -%}
            {%- if message.get('tool_responses') -%}
                {#- Legacy: tool_responses embedded on the assistant message (Google/Gemma native) -#}
                {%- for tool_response in message['tool_responses'] -%}
                    {{- format_tool_response_block(tool_response['name'] | default('unknown'), tool_response['response']) -}}
                    {%- set ns_tr_out.flag = true -%}
                    {%- set ns.prev_message_type = 'tool_response' -%}
                {%- endfor -%}
            {%- elif message.get('tool_calls') -%}
                {#- OpenAI Chat Completions: forward-scan consecutive role:tool messages -#}
                {%- set ns_tool_scan = namespace(stopped=false) -%}
                {%- for k in range(loop.index0 + 1, loop_messages | length) -%}
                    {%- if ns_tool_scan.stopped -%}
                    {%- elif loop_messages[k]['role'] != 'tool' -%}
                        {%- set ns_tool_scan.stopped = true -%}
                    {%- else -%}
                        {%- set follow = loop_messages[k] -%}
                        {#- Resolve tool_call_id to function name -#}
                        {%- set ns_tname = namespace(name=follow.get('name') | default('unknown')) -%}
                        {%- for tc in message['tool_calls'] -%}
                            {%- if tc.get('id') == follow.get('tool_call_id') -%}
                                {%- set ns_tname.name = tc['function']['name'] -%}
                            {%- endif -%}
                        {%- endfor -%}
                        {#- Handle content as string or content-parts array -#}
                        {%- set tool_body = follow.get('content') -%}
                        {%- if tool_body is string -%}
                            {{- format_tool_response_block(ns_tname.name, tool_body) -}}
                        {%- elif tool_body is sequence and tool_body is not string -%}
                            {%- set ns_txt = namespace(s='') -%}
                            {%- for part in tool_body -%}
                                {%- if part.get('type') == 'text' -%}
                                    {%- set ns_txt.s = ns_txt.s + (part.get('text') | default('')) -%}
                                {%- endif -%}
                            {%- endfor -%}
                            {{- format_tool_response_block(ns_tname.name, ns_txt.s) -}}
                        {%- else -%}
                            {{- format_tool_response_block(ns_tname.name, tool_body) -}}
                        {%- endif -%}
                        {%- set ns_tr_out.flag = true -%}
                        {%- set ns.prev_message_type = 'tool_response' -%}
                    {%- endif -%}
                {%- endfor -%}
            {%- endif -%}

            {%- set captured_content -%}
            {%- if message['content'] is string -%}
                {%- if role == 'model' -%}
                    {{- strip_thinking(message['content']) -}}
                {%- else -%}
                    {{- message['content'] | trim -}}
                {%- endif -%}
            {%- elif message['content'] is sequence -%}
                {%- for item in message['content'] -%}
                    {%- if item['type'] == 'text' -%}
                        {%- if role == 'model' -%}
                            {{- strip_thinking(item['text']) -}}
                        {%- else -%}
                            {{- item['text'] | trim -}}
                        {%- endif -%}
                    {%- elif item['type'] == 'image' -%}
                        {{- '<|image|>' -}}
                        {%- set ns.prev_message_type = 'image' -%}
                    {%- elif item['type'] == 'audio' -%}
                        {{- '<|audio|>' -}}
                        {%- set ns.prev_message_type = 'audio' -%}
                    {%- elif item['type'] == 'video' -%}
                        {{- '<|video|>' -}}
                        {%- set ns.prev_message_type = 'video' -%}
                    {%- endif -%}
                {%- endfor -%}
            {%- endif -%}
            {%- endset -%}

            {{- captured_content -}}
            {%- set has_content = captured_content | trim | length > 0 -%}

        {%- if ns.prev_message_type == 'tool_call' and not ns_tr_out.flag -%}
            {{- '<|tool_response>' -}}
        {%- elif not (ns_tr_out.flag and not has_content) -%}
            {{- '<turn|>\n' -}}
        {%- endif -%}
    {%- endif -%}
{%- endfor -%}

{%- if add_generation_prompt -%}
    {%- if ns.prev_message_type != 'tool_response' and ns.prev_message_type != 'tool_call' -%}
        {{- '<|turn>model\n' -}}
    {%- endif -%}
{%- endif -%}

test query :

curl -X 'POST' \
  'uri/v1/responses' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "__mymodel__/gemma-4",
  "input": "What is the weather in Paris today?",
  "temperature": 0,

  "tool_choice": "auto",

  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get current weather",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string"
          }
        },
        "required": ["location"]
      }
    }
  ]
}'

response :

{
  "id": "resp_8c7a5c3404323551",
  "created_at": 1778390214,
  "incomplete_details": null,
  "instructions": null,
  "metadata": null,
  "model": "__mymodel__/gemma-4",
  "object": "response",
  "output": [
    {
      "id": "rs_952f7a3fee06d8bd",
      "summary": [],
      "type": "reasoning",
      "content": [
        {
          "text": "1. **Analyze the user's request:** The user is asking for the weather in \"Paris\" today.\n2. **Identify available tools:** The available tool is `get_weather(location: STRING)`.\n3. **Determine the necessary arguments:** The `get_weather` tool requires a `location` argument.\n4. **Extract the argument from the request:** The location specified in the request is \"Paris\".\n5. **Construct the tool call:** Call `get_weather` with `location=\"Paris\"`.\n6. **Format the output:** Generate the tool call in the required JSON format.",
          "type": "reasoning_text"
        }
      ],
      "encrypted_content": null,
      "status": null
    },
    {
      "arguments": "{\"location\": \"Paris\"}",
      "call_id": "chatcmpl-tool-b6585741e3bf0611",
      "name": "get_weather",
      "type": "function_call",
      "id": "fc_a0e2cd868c181a8c",
      "namespace": null,
      "status": "completed"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 0,
  "tool_choice": "auto",
  "tools": [
    {
      "name": "get_weather",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string"
          }
        },
        "required": [
          "location"
        ]
      },
      "strict": null,
      "type": "function",
      "defer_loading": null,
      "description": "Get current weather"
    }
  ],
  "top_p": 1,
  "background": false,
  "max_output_tokens": 131010,
  "max_tool_calls": null,
  "previous_response_id": null,
  "prompt": null,
  "reasoning": null,
  "service_tier": "auto",
  "status": "completed",
  "text": null,
  "top_logprobs": null,
  "truncation": "disabled",
  "usage": {
    "input_tokens": 62,
    "input_tokens_details": {
      "cached_tokens": 32,
      "input_tokens_per_turn": [],
      "cached_tokens_per_turn": []
    },
    "output_tokens": 170,
    "output_tokens_details": {
      "reasoning_tokens": 131,
      "tool_output_tokens": 0,
      "output_tokens_per_turn": [],
      "tool_output_tokens_per_turn": []
    },
    "total_tokens": 232
  },
  "user": null,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "kv_transfer_params": null,
  "input_messages": null,
  "output_messages": null
}

@guidryheal-create

guidryheal-create commented May 10, 2026

Copy link
Copy Markdown

it look clearly functional yet camel or other tool calling framework struggle with it. i wonder if it s VLLM or my tool calling framework that even at lastest is parsing tool call in a wrong way for open ai compatible or vllm. (ps i know it s close just asking even so i look at it myself so if no one reply consider fixed)

mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026
…roject#39027)

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…roject#39027)

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…roject#39027)

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit d3ca10a)
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…roject#39027)

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit d3ca10a)
(cherry picked from commit cf9b6f4)
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…roject#39027)

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…roject#39027)

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit da54c3e)
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…roject#39027)

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit da54c3e)
(cherry picked from commit 13e17ca)
alexbi29 pushed a commit to alexbi29/vllm that referenced this pull request May 18, 2026
…roject#39027)

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
alexbi29 added a commit to alexbi29/vllm that referenced this pull request May 18, 2026
jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026
…roject#39027)

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
…roject#39027)

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation frontend ready ONLY add when PR is ready to merge/full CI is needed tool-calling

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

7 participants