[Tool] `adjust_request` to reasoning parser, and Gemma4 fixes by bbrowning · Pull Request #39027 · vllm-project/vllm

bbrowning · 2026-04-05T13:14:14Z

Purpose

Fix multiple issues preventing Gemma4 models from working correctly
with multi-turn tool calling and reasoning in vLLM:

Add new Gemma4 chat template that properly encodes tool results using the model's native format, handles multi-turn conversations with interleaved tool calls and reasoning, and strips thinking content from prior assistant turns
Add adjust_request() to ReasoningParser base class (mirroring ToolParser) so reasoning parsers can modify request parameters before generation, used by Gemma4 to set skip_special_tokens=False
Fix reasoning parser to extract non-streaming thinking content and handle the "thought\n" prefix correctly in streaming
Fix pre-existing mypy error in ReasoningParserManager.register_module
Add unit tests for reasoning parser and chat template rendering
Fix empty "user" turns created when handling tool outputs by our Messages API to Chat Completions translation
is_reasoning_end clean ups for the Gemma 4 reasoning parser
- don't assuming reasoning has ended when we scan prompts backwards across user turn boundaries or after tool responses
- explicitly mark reasoning as ended when we start generating tool calls

The net result of these fixes shows larger Gemma4 models are very competitive at multi-turn tool calling for their size. I won't share any specific numbers here, but all of these fixes were guided by both direct inspection of prompting and multi-turn behavior and some simple quantitative eval with the BFCL multi_turn suite.

You'll need to both enable thinking and select the correct chat template when testing Gemma 4 models with these fixes:

vllm serve google/gemma-4-31B-it \
  --tool-call-parser gemma4 \
  --enable-auto-tool-choice \
  --reasoning-parser gemma4 \
  --default-chat-template-kwargs '{"enable_thinking": true}' \
  --chat-template examples/tool_chat_template_gemma4.jinja

Test Plan

BFCL multi_turn suite to uncover bugs and validate fixes

(expand for BFCL clone, setup, adding models)

git clone https://github.com/ShishirPatil/gorilla

cd gorilla/berkeley-function-call-leaderboard/

uv venv --python 3.12 --seed

source .venv/bin/activate

uv pip install -e .

cat <<EOF >> bfcl_eval/constants/model_config.py
    "google/gemma-4-E2B-it": ModelConfig(
        model_name="google/gemma-4-E2B-it",
        display_name="google/gemma-4-E2B-it (FC) (vLLM)",
        url="https://huggingface.co/google/gemma-4-E2B-it",
        org="Google",
        license="apache-2.0",
        model_handler=OpenAICompletionsHandler,
        input_price=None,
        output_price=None,
        is_fc_model=True,
        underscore_to_dot=True,
    ),
    "google/gemma-4-26B-A4B-it": ModelConfig(
        model_name="google/gemma-4-26B-A4B-it",
        display_name="google/gemma-4-26B-A4B-it (FC) (vLLM)",
        url="https://huggingface.co/google/gemma-4-26B-A4B-it",
        org="Google",
        license="apache-2.0",
        model_handler=OpenAICompletionsHandler,
        input_price=None,
        output_price=None,
        is_fc_model=True,
        underscore_to_dot=True,
    ),
    "google/gemma-4-31B-it": ModelConfig(
        model_name="google/gemma-4-31B-it",
        display_name="google/gemma-4-31B-it (FC) (vLLM)",
        url="https://huggingface.co/google/gemma-4-31B-it",
        org="Google",
        license="apache-2.0",
        model_handler=OpenAICompletionsHandler,
        input_price=None,
        output_price=None,
        is_fc_model=True,
        underscore_to_dot=True,
    ),
}
EOF

Run BFCL multi_turn eval suite:

OPENAI_BASE_URL="http://localhost:8000/v1" \
OPENAI_API_KEY="fake" \
bfcl generate \
  --model google/gemma-4-31B-it \
  --num-threads 4 \
  --allow-overwrite \
  --test-category multi_turn

OPENAI_API_KEY="fake" \
bfcl evaluate

Unit Tests

# Note: this test has a pre-existing dependency on transformers 5.x
# `pip install --upgrade transformers`
pytest tests/reasoning/test_gemma4_reasoning_parser.py

pytest tests/renderers/test_gemma4_chat_template.py

# Run all reasoning parser tests, since we added `adjust_request`
# skip the ones that CI skips because they already fail
# and skip step3p5 because it requires trusting remote code 
pytest tests/reasoning \
  --ignore=tests/reasoning/test_seedoss_reasoning_parser.py \
  --ignore=tests/reasoning/test_glm4_moe_reasoning_parser.py \
  --ignore=tests/reasoning/test_step3p5_reasoning_parser.py

Claude Code pointed at a Gemma 4 model running locally

CLAUDE_CODE_USE_VERTEX=0 \
ANTHROPIC_BASE_URL="http://localhost:8000" \
ANTHROPIC_DEFAULT_OPUS_MODEL="google/gemma-4-31B-it" \
ANTHROPIC_DEFAULT_SONNET_MODEL="google/gemma-4-31B-it" \
ANTHROPIC_DEFAULT_HAIKU_MODEL="google/gemma-4-31B-it" \
ANTHROPIC_AUTH_TOKEN="dummy" \
claude \
  --model sonnet

Test Result

Unit Tests

`pytest tests/reasoning/test_gemma4_reasoning_parser.py`

29 passed, 2 warnings in 3.24s

`pytest tests/renderers/test_gemma4_chat_template.py`

14 passed, 2 warnings in 0.98s

`tests/reasoning`

318 passed, 5 warnings in 40.41s

BFCL Results

I have BFCL results and they are far better after this change than before. I'm not sure it's my place to share those publicly here, but the results for the larger Gemma4 models (MoE and Dense) are very good for models of their size.

Claude Code usability

I was able to execute multiple complex refactoring and new code generation sessions in existing codebases with both Gemma-4-31B and Gemma-4-26B-A4B. After the latest fixes here, I'm not seeing any unparsed tool calls nor any leaked reasoning content into the session.

mergify · 2026-04-05T13:14:53Z

Documentation preview: https://vllm--39027.org.readthedocs.build/en/39027/

gemini-code-assist

Code Review

This pull request introduces a new Jinja chat template for Gemma 4, along with infrastructure to support custom Jinja filters and normalized tool responses. However, several critical issues were identified in the implementation. Specifically, the hardcoding of skip_special_tokens = False in the OpenAI serving layer is a global change that overrides user intent for all models. Additionally, there is debug logging to a hardcoded local file (gemma_turns.log) which is unsuitable for production. The use of global monkey-patching on jinja2.sandbox.ImmutableSandboxedEnvironment is also discouraged as it creates dangerous side effects across the entire application; a more localized injection of filters is preferred. Finally, it is recommended to log exceptions during Jinja filter patching rather than swallowing them silently.

ywang96 · 2026-04-06T01:48:26Z

FYI #38858 (comment)

bbrowning · 2026-04-06T11:53:46Z

FYI #38858 (comment)

Thanks, I commented there. We'll need to handle parsing reasoning content properly within vLLM vs asking the user to set skip_special_tokens=False on the request, as the user knob is enable_thinking=True and then it's the server's job to parse reasoning out of that when the server was configured to use the gemma4 reasoning parser.

bbrowning · 2026-04-06T13:43:48Z

@lucianommartins The draft gemma4 chat template added here at examples/tool_chat_template_gemma4.jinja changes a number of things that we might want to get into the model's default chat template overall, so that all inference servers can benefit. I know some of these are already on your radar, but things like handling the reasoning content in multi-turn scenarios I don't know if you've considered yet.

Also, optimally we would coerce tool outputs that are JSON strings into actual JSON objects. I need to validate how much of a % difference this makes in overall multi-turn eval results, as my best results so far were based on turning JSON strings into JSON objects before the tool output reaches the chat template so that Gemini sees the tool response as an actual JSON object with string token delimiters and such instead of as a single encoded string. The previous draft of this did some hacks to get this happening in the chat template via external python helper functions, but I'd like to drop those hacks if I can verify the model results are substantially identical even if we just give it a JSON string as one string blob for tool call outputs.

@sfeng33 @chaunceyjiang I'd like your eyes on this for the adjustments to wire in an adjust_request for reasoning parsers in case you have another preferred way to do this. So many reasoning parser unit tests fail today because we don't CI them that it's a bit hard to get a good signal here, but I'll do more testing on my end as well.

I'll be pulling this out of draft soon once I clean up a few more loose ends.

lucianommartins · 2026-04-06T14:39:26Z

thanks @bbrowning - I will update my transformers PR with the nice additions from your jinja.

bbrowning · 2026-04-06T14:51:42Z

@lucianommartins opened #39081 to focus just on handling the stripping of special tokens during reasoning. Once that gets in a good place and merges, I'll remove the equivalent-ish code from this PR as here I don't handle it for the offline inference case.

drrros · 2026-04-06T20:07:38Z

with patch applied it worked better, about 10 tool calls went fine, but at some point it happens again:

                                                                                                                                                                                                                                                                                                                                                                                                                                            
FAILED (failures=7, errors=31)                                                                                                                                                                                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                                                                                                                                                                                            

  Read 3 files, listed 1 directory (ctrl+o to expand)

● <|channel>thought
  I will remove the invalid @patch decorators from tests/rest_api_backend/test_cc_write_protection.py and then run the tests to see the effect.<channel|>

● Update(tests/rest_api_backend/test_cc_write_protection.py)
  ⎿  Error editing file

● Update(tests/rest_api_backend/test_cc_write_protection.py)
  ⎿  Error editing file

● Update(tests/rest_api_backend/test_cc_write_protection.py)
  ⎿  Error editing file

● ---

● Update(tests/rest_api_backend/test_cc_write_protection.py)
  ⎿  Error editing file

● Update(tests/rest_api_backend/test_cc_write_protection.py)
  ⎿  Error editing file

● Update(tests/rest_api_backend/test_cc_write_protection.py)      
  ⎿  Error editing file
  ⎿  Interrupted · What should Claude do instead?

It failed to apply file changes after tokens leakage for some reason, had to stop it after some time.

bbrowning · 2026-04-06T20:40:19Z

@drrros That sounds like we still have some edge cases to sort out in streaming reasoning parsing or perhaps our Messages API implementation. Let's debug that more specifically in #39043 since that focuses on making sure this works great for Claude Code specific tool calls.

bbrowning · 2026-04-08T02:22:34Z

Adding a few more commits to my staging area here. With these additional fixes plus #38909 and #39114 Gemma4 multi-turn reasoning and tool calling performance is getting to a really good place.

bbrowning · 2026-04-08T02:27:13Z

Note this builds on top of changes from 39114, and the pre-commit failures are because of that.

mergify · 2026-04-08T03:19:08Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bbrowning.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

bbrowning · 2026-04-08T10:59:58Z

Just waiting on #39114 to land to get the pre-commit fixed and pull this out of draft.

This is a slightly simpler approach to wiring in skip_special_tokens for the Gemma 4 reasoning parser than #39081. I don't have a strong preference, but this has to happen for the reasoning parser to work out of the box. If 39081 merges first I'll rebase this to remove my version of that. Otherwise, if this merges 39081 can either be closed or adjusted to tweak the logic I added here.

We'll also want to update the Gemma 4 Usage Guide in the recipes repo pointing to our new chat template. There is huggingface/transformers#45257 to get this updated by default in the model's official chat template, but it's not clear if/when that will merge. I've confirmed that all the logic is the same between what that transformers PR renders and what we have here.

Fix multiple issues preventing Gemma4 models from working correctly with multi-turn tool calling and reasoning in vLLM: - Add new Gemma4 chat template that properly encodes tool results using the model's native format, handles multi-turn conversations with interleaved tool calls and reasoning, and strips thinking content from prior assistant turns - Add adjust_request() to ReasoningParser base class (mirroring ToolParser) so reasoning parsers can modify request parameters before generation, used by Gemma4 to set skip_special_tokens=False - Fix reasoning parser to extract non-streaming thinking content and handle the "thought\n" prefix correctly in streaming - Fix pre-existing mypy error in ReasoningParserManager.register_module - Add unit tests for reasoning parser and chat template rendering Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>

When translating from Anthropic Messages API to Chat Completions, we were inserting entirely empty "user" turns for tool call outputs, as those come in via the user in the Messages API but get turned into a tool role in the Chat Completions API. These empty user role turs make their way into some chat templates, and for example led to issues in long multi-turn scenarios when testing with Gemma 4 models. Signed-off-by: Ben Browning <bbrownin@redhat.com>

Current vllm/vllm-openai:gemma4 image does not support this flag. Thinking disable will be possible after image update with --default-chat-template-kwargs from vllm-project/vllm#39027. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…roject#39027) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>

Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com> (cherry picked from commit 8477fe4)

Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com> (cherry picked from commit 8477fe4) (cherry picked from commit 0d784f4)

…roject#39027) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>

guidryheal-create · 2026-05-10T05:11:32Z

i fin-tune gemma using unsloth, then saved it with all it need. i m pretty sure everything fine about it but,
i got the issue with camel and other tool as well.

when i query it from api the tool call look kinda correct so :

docker run --runtime nvidia --gpus all -v $(pwd):/template   -v ~/.cache/huggingface:/root/.cache/huggingface   --env "HF_TOKEN=$HF_TOKEN"   -p 8666:8000   --ipc=host   vllm/vllm-openai:latest   --model __mymodel__/gemma-4   --gpu-memory-utilization 0.85   --max-model-len 131072 --enable-auto-tool-choice --quantization fp8 --kv-cache-dtype fp8 --safetensors-load-strategy lazy --reasoning-parser gemma4 --tool-call-parser gemma4 --chat-template /template/chat_template.jinja

with chat template from source (it work way less when i use VLLM exemple) :

{%- macro format_parameters(properties, required, filter_keys=false) -%}
    {%- set standard_keys = ['description', 'type', 'properties', 'required', 'nullable'] -%}
    {%- set ns = namespace(found_first=false) -%}
    {%- for key, value in properties | dictsort -%}
        {%- set add_comma = false -%}
        {%- if not filter_keys or key not in standard_keys -%}
            {%- if ns.found_first %},{% endif -%}
            {%- set ns.found_first = true -%}
            {{ key }}:{
            {%- if value['description'] -%}
                description:<|"|>{{ value['description'] }}<|"|>
                {%- set add_comma = true -%}
            {%- endif -%}
            {%- if value['type'] | upper == 'STRING' -%}
                {%- if value['enum'] -%}
                    {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
                    enum:{{ format_argument(value['enum']) }}
                {%- endif -%}
            {%- elif value['type'] | upper == 'ARRAY' -%}
                {%- if value['items'] is mapping and value['items'] -%}
                    {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
                    items:{
                    {%- set ns_items = namespace(found_first=false) -%}
                    {%- for item_key, item_value in value['items'] | dictsort -%}
                        {%- if item_value is not none -%}
                            {%- if ns_items.found_first %},{% endif -%}
                            {%- set ns_items.found_first = true -%}
                            {%- if item_key == 'properties' -%}
                                properties:{
                                {%- if item_value is mapping -%}
                                    {{- format_parameters(item_value, value['items']['required'] | default([])) -}}
                                {%- endif -%}
                                }
                            {%- elif item_key == 'required' -%}
                                required:[
                                {%- for req_item in item_value -%}
                                    <|"|>{{- req_item -}}<|"|>
                                    {%- if not loop.last %},{% endif -%}
                                {%- endfor -%}
                                ]
                            {%- elif item_key == 'type' -%}
                                {%- if item_value is string -%}
                                    type:{{ format_argument(item_value | upper) }}
                                {%- else -%}
                                    type:{{ format_argument(item_value | map('upper') | list) }}
                                {%- endif -%}
                            {%- else -%}
                                {{ item_key }}:{{ format_argument(item_value) }}
                            {%- endif -%}
                        {%- endif -%}
                    {%- endfor -%}
                    }
                {%- endif -%}
            {%- endif -%}
            {%- if value['nullable'] %}
                {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
                nullable:true
            {%- endif -%}
            {%- if value['type'] | upper == 'OBJECT' -%}
                {%- if value['properties'] is defined and value['properties'] is mapping -%}
                    {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
                    properties:{
                    {{- format_parameters(value['properties'], value['required'] | default([])) -}}
                    }
                {%- elif value is mapping -%}
                    {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
                    properties:{
                    {{- format_parameters(value, value['required'] | default([]), filter_keys=true) -}}
                    }
                {%- endif -%}
                {%- if value['required'] -%}
                    {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
                    required:[
                    {%- for item in value['required'] | default([]) -%}
                        <|"|>{{- item -}}<|"|>
                        {%- if not loop.last %},{% endif -%}
                    {%- endfor -%}
                    ]
                {%- endif -%}
            {%- endif -%}
            {%- if add_comma %},{%- else -%} {%- set add_comma = true -%} {% endif -%}
            type:<|"|>{{ value['type'] | upper }}<|"|>}
        {%- endif -%}
    {%- endfor -%}
{%- endmacro -%}
{%- macro format_function_declaration(tool_data) -%}
    declaration:{{- tool_data['function']['name'] -}}{description:<|"|>{{- tool_data['function']['description'] -}}<|"|>
    {%- set params = tool_data['function']['parameters'] -%}
    {%- if params -%}
        ,parameters:{
        {%- if params['properties'] -%}
            properties:{ {{- format_parameters(params['properties'], params['required']) -}} },
        {%- endif -%}
        {%- if params['required'] -%}
            required:[
            {%- for item in params['required'] -%}
                <|"|>{{- item -}}<|"|>
                {{- ',' if not loop.last -}}
            {%- endfor -%}
            ],
        {%- endif -%}
        {%- if params['type'] -%}
            type:<|"|>{{- params['type'] | upper -}}<|"|>}
        {%- endif -%}
    {%- endif -%}
    {%- if 'response' in tool_data['function'] -%}
        {%- set response_declaration = tool_data['function']['response'] -%}
        ,response:{
        {%- if response_declaration['description'] -%}
            description:<|"|>{{- response_declaration['description'] -}}<|"|>,
        {%- endif -%}
        {%- if response_declaration['type'] | upper == 'OBJECT' -%}
            type:<|"|>{{- response_declaration['type'] | upper -}}<|"|>}
        {%- endif -%}
    {%- endif -%}
    }
{%- endmacro -%}
{%- macro format_argument(argument, escape_keys=True) -%}
    {%- if argument is string -%}
        {{- '<|"|>' + argument + '<|"|>' -}}
    {%- elif argument is boolean -%}
        {{- 'true' if argument else 'false' -}}
    {%- elif argument is mapping -%}
        {{- '{' -}}
        {%- set ns = namespace(found_first=false) -%}
        {%- for key, value in argument | dictsort -%}
            {%- if ns.found_first %},{% endif -%}
            {%- set ns.found_first = true -%}
            {%- if escape_keys -%}
                {{- '<|"|>' + key + '<|"|>' -}}
            {%- else -%}
                {{- key -}}
            {%- endif -%}
            :{{- format_argument(value, escape_keys=escape_keys) -}}
        {%- endfor -%}
        {{- '}' -}}
    {%- elif argument is sequence -%}
        {{- '[' -}}
        {%- for item in argument -%}
            {{- format_argument(item, escape_keys=escape_keys) -}}
            {%- if not loop.last %},{% endif -%}
        {%- endfor -%}
        {{- ']' -}}
    {%- else -%}
        {{- argument -}}
    {%- endif -%}
{%- endmacro -%}
{%- macro strip_thinking(text) -%}
    {%- set ns = namespace(result='') -%}
    {%- for part in text.split('<channel|>') -%}
        {%- if '<|channel>' in part -%}
            {%- set ns.result = ns.result + part.split('<|channel>')[0] -%}
        {%- else -%}
            {%- set ns.result = ns.result + part -%}
        {%- endif -%}
    {%- endfor -%}
    {{- ns.result | trim -}}
{%- endmacro -%}

{%- macro format_tool_response_block(tool_name, response) -%}
    {{- '<|tool_response>' -}}
    {%- if response is mapping -%}
        {{- 'response:' + tool_name + '{' -}}
        {%- for key, value in response | dictsort -%}
            {{- key -}}:{{- format_argument(value, escape_keys=False) -}}
            {%- if not loop.last %},{% endif -%}
        {%- endfor -%}
        {{- '}' -}}
    {%- else -%}
        {{- 'response:' + tool_name + '{value:' + format_argument(response, escape_keys=False) + '}' -}}
    {%- endif -%}
    {{- '<tool_response|>' -}}
{%- endmacro -%}

{%- set ns = namespace(prev_message_type=None) -%}
{%- set loop_messages = messages -%}
{{- bos_token -}}
{#- Handle System/Tool Definitions Block -#}
{%- if (enable_thinking is defined and enable_thinking) or tools or messages[0]['role'] in ['system', 'developer'] -%}
    {{- '<|turn>system\n' -}}
    {#- Inject Thinking token at the very top of the FIRST system turn -#}
    {%- if enable_thinking is defined and enable_thinking -%}
        {{- '<|think|>\n' -}}
        {%- set ns.prev_message_type = 'think' -%}
    {%- endif -%}
    {%- if messages[0]['role'] in ['system', 'developer'] -%}
        {%- if messages[0]['content'] is string -%}
            {{- messages[0]['content'] | trim -}}
        {%- elif messages[0]['content'] is sequence -%}
            {%- for item in messages[0]['content'] -%}
                {{- item['text'] | trim + ' '-}}
            {%- endfor -%}
        {%- endif -%}
        {%- set loop_messages = messages[1:] -%}
    {%- endif -%}
    {%- if tools -%}
        {%- for tool in tools %}
            {{- '<|tool>' -}}
            {{- format_function_declaration(tool) | trim -}}
            {{- '<tool|>' -}}
        {%- endfor %}
        {%- set ns.prev_message_type = 'tool' -%}
    {%- endif -%}
    {{- '<turn|>\n' -}}
{%- endif %}

{#- Pre-scan: find last user message index for reasoning guard -#}
{%- set ns_turn = namespace(last_user_idx=-1) -%}
{%- for i in range(loop_messages | length) -%}
    {%- if loop_messages[i]['role'] == 'user' -%}
        {%- set ns_turn.last_user_idx = i -%}
    {%- endif -%}
{%- endfor -%}

{#- Loop through messages -#}
{%- for message in loop_messages -%}
    {%- if message['role'] != 'tool' -%}
    {%- set ns.prev_message_type = None -%}
    {%- set role = 'model' if message['role'] == 'assistant' else message['role'] -%}
    {#- Detect continuation: suppress duplicate <|turn>model when previous non-tool message was also assistant -#}
    {%- set prev_nt = namespace(role=None, found=false) -%}
    {%- if loop.index0 > 0 -%}
        {%- for j in range(loop.index0 - 1, -1, -1) -%}
            {%- if not prev_nt.found -%}
                {%- if loop_messages[j]['role'] != 'tool' -%}
                    {%- set prev_nt.role = loop_messages[j]['role'] -%}
                    {%- set prev_nt.found = true -%}
                {%- endif -%}
            {%- endif -%}
        {%- endfor -%}
    {%- endif -%}
    {%- set continue_same_model_turn = (role == 'model' and prev_nt.role == 'assistant') -%}
    {%- if not continue_same_model_turn -%}
        {{- '<|turn>' + role + '\n' }}
    {%- endif -%}

    {#- Render reasoning/reasoning_content as thinking channel -#}
    {%- set thinking_text = message.get('reasoning') or message.get('reasoning_content') -%}
    {%- if thinking_text and loop.index0 > ns_turn.last_user_idx and message.get('tool_calls') -%}
        {{- '<|channel>thought\n' + thinking_text + '\n<channel|>' -}}
    {%- endif -%}

            {%- if message['tool_calls'] -%}
                {%- for tool_call in message['tool_calls'] -%}
                    {%- set function = tool_call['function'] -%}
                    {{- '<|tool_call>call:' + function['name'] + '{' -}}
                    {%- if function['arguments'] is mapping -%}
                        {%- set ns_args = namespace(found_first=false) -%}
                        {%- for key, value in function['arguments'] | dictsort -%}
                            {%- if ns_args.found_first %},{% endif -%}
                            {%- set ns_args.found_first = true -%}
                            {{- key -}}:{{- format_argument(value, escape_keys=False) -}}
                        {%- endfor -%}
                    {%- elif function['arguments'] is string -%}
                        {{- function['arguments'] -}}
                    {%- endif -%}
                    {{- '}<tool_call|>' -}}
                {%- endfor -%}
                {%- set ns.prev_message_type = 'tool_call' -%}
            {%- endif -%}

            {%- set ns_tr_out = namespace(flag=false) -%}
            {%- if message.get('tool_responses') -%}
                {#- Legacy: tool_responses embedded on the assistant message (Google/Gemma native) -#}
                {%- for tool_response in message['tool_responses'] -%}
                    {{- format_tool_response_block(tool_response['name'] | default('unknown'), tool_response['response']) -}}
                    {%- set ns_tr_out.flag = true -%}
                    {%- set ns.prev_message_type = 'tool_response' -%}
                {%- endfor -%}
            {%- elif message.get('tool_calls') -%}
                {#- OpenAI Chat Completions: forward-scan consecutive role:tool messages -#}
                {%- set ns_tool_scan = namespace(stopped=false) -%}
                {%- for k in range(loop.index0 + 1, loop_messages | length) -%}
                    {%- if ns_tool_scan.stopped -%}
                    {%- elif loop_messages[k]['role'] != 'tool' -%}
                        {%- set ns_tool_scan.stopped = true -%}
                    {%- else -%}
                        {%- set follow = loop_messages[k] -%}
                        {#- Resolve tool_call_id to function name -#}
                        {%- set ns_tname = namespace(name=follow.get('name') | default('unknown')) -%}
                        {%- for tc in message['tool_calls'] -%}
                            {%- if tc.get('id') == follow.get('tool_call_id') -%}
                                {%- set ns_tname.name = tc['function']['name'] -%}
                            {%- endif -%}
                        {%- endfor -%}
                        {#- Handle content as string or content-parts array -#}
                        {%- set tool_body = follow.get('content') -%}
                        {%- if tool_body is string -%}
                            {{- format_tool_response_block(ns_tname.name, tool_body) -}}
                        {%- elif tool_body is sequence and tool_body is not string -%}
                            {%- set ns_txt = namespace(s='') -%}
                            {%- for part in tool_body -%}
                                {%- if part.get('type') == 'text' -%}
                                    {%- set ns_txt.s = ns_txt.s + (part.get('text') | default('')) -%}
                                {%- endif -%}
                            {%- endfor -%}
                            {{- format_tool_response_block(ns_tname.name, ns_txt.s) -}}
                        {%- else -%}
                            {{- format_tool_response_block(ns_tname.name, tool_body) -}}
                        {%- endif -%}
                        {%- set ns_tr_out.flag = true -%}
                        {%- set ns.prev_message_type = 'tool_response' -%}
                    {%- endif -%}
                {%- endfor -%}
            {%- endif -%}

            {%- set captured_content -%}
            {%- if message['content'] is string -%}
                {%- if role == 'model' -%}
                    {{- strip_thinking(message['content']) -}}
                {%- else -%}
                    {{- message['content'] | trim -}}
                {%- endif -%}
            {%- elif message['content'] is sequence -%}
                {%- for item in message['content'] -%}
                    {%- if item['type'] == 'text' -%}
                        {%- if role == 'model' -%}
                            {{- strip_thinking(item['text']) -}}
                        {%- else -%}
                            {{- item['text'] | trim -}}
                        {%- endif -%}
                    {%- elif item['type'] == 'image' -%}
                        {{- '<|image|>' -}}
                        {%- set ns.prev_message_type = 'image' -%}
                    {%- elif item['type'] == 'audio' -%}
                        {{- '<|audio|>' -}}
                        {%- set ns.prev_message_type = 'audio' -%}
                    {%- elif item['type'] == 'video' -%}
                        {{- '<|video|>' -}}
                        {%- set ns.prev_message_type = 'video' -%}
                    {%- endif -%}
                {%- endfor -%}
            {%- endif -%}
            {%- endset -%}

            {{- captured_content -}}
            {%- set has_content = captured_content | trim | length > 0 -%}

        {%- if ns.prev_message_type == 'tool_call' and not ns_tr_out.flag -%}
            {{- '<|tool_response>' -}}
        {%- elif not (ns_tr_out.flag and not has_content) -%}
            {{- '<turn|>\n' -}}
        {%- endif -%}
    {%- endif -%}
{%- endfor -%}

{%- if add_generation_prompt -%}
    {%- if ns.prev_message_type != 'tool_response' and ns.prev_message_type != 'tool_call' -%}
        {{- '<|turn>model\n' -}}
    {%- endif -%}
{%- endif -%}

test query :

curl -X 'POST' \
  'uri/v1/responses' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "__mymodel__/gemma-4",
  "input": "What is the weather in Paris today?",
  "temperature": 0,

  "tool_choice": "auto",

  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get current weather",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string"
          }
        },
        "required": ["location"]
      }
    }
  ]
}'

response :

{
  "id": "resp_8c7a5c3404323551",
  "created_at": 1778390214,
  "incomplete_details": null,
  "instructions": null,
  "metadata": null,
  "model": "__mymodel__/gemma-4",
  "object": "response",
  "output": [
    {
      "id": "rs_952f7a3fee06d8bd",
      "summary": [],
      "type": "reasoning",
      "content": [
        {
          "text": "1. **Analyze the user's request:** The user is asking for the weather in \"Paris\" today.\n2. **Identify available tools:** The available tool is `get_weather(location: STRING)`.\n3. **Determine the necessary arguments:** The `get_weather` tool requires a `location` argument.\n4. **Extract the argument from the request:** The location specified in the request is \"Paris\".\n5. **Construct the tool call:** Call `get_weather` with `location=\"Paris\"`.\n6. **Format the output:** Generate the tool call in the required JSON format.",
          "type": "reasoning_text"
        }
      ],
      "encrypted_content": null,
      "status": null
    },
    {
      "arguments": "{\"location\": \"Paris\"}",
      "call_id": "chatcmpl-tool-b6585741e3bf0611",
      "name": "get_weather",
      "type": "function_call",
      "id": "fc_a0e2cd868c181a8c",
      "namespace": null,
      "status": "completed"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 0,
  "tool_choice": "auto",
  "tools": [
    {
      "name": "get_weather",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string"
          }
        },
        "required": [
          "location"
        ]
      },
      "strict": null,
      "type": "function",
      "defer_loading": null,
      "description": "Get current weather"
    }
  ],
  "top_p": 1,
  "background": false,
  "max_output_tokens": 131010,
  "max_tool_calls": null,
  "previous_response_id": null,
  "prompt": null,
  "reasoning": null,
  "service_tier": "auto",
  "status": "completed",
  "text": null,
  "top_logprobs": null,
  "truncation": "disabled",
  "usage": {
    "input_tokens": 62,
    "input_tokens_details": {
      "cached_tokens": 32,
      "input_tokens_per_turn": [],
      "cached_tokens_per_turn": []
    },
    "output_tokens": 170,
    "output_tokens_details": {
      "reasoning_tokens": 131,
      "tool_output_tokens": 0,
      "output_tokens_per_turn": [],
      "tool_output_tokens_per_turn": []
    },
    "total_tokens": 232
  },
  "user": null,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "kv_transfer_params": null,
  "input_messages": null,
  "output_messages": null
}

guidryheal-create · 2026-05-10T05:20:08Z

it look clearly functional yet camel or other tool calling framework struggle with it. i wonder if it s VLLM or my tool calling framework that even at lastest is parsing tool call in a wrong way for open ai compatible or vllm. (ps i know it s close just asking even so i look at it myself so if no one reply consider fixed)

…roject#39027) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>

…roject#39027) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com> (cherry picked from commit d3ca10a)

…roject#39027) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com> (cherry picked from commit d3ca10a) (cherry picked from commit cf9b6f4)

…roject#39027) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>

…roject#39027) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com> (cherry picked from commit da54c3e)

…roject#39027) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com> (cherry picked from commit da54c3e) (cherry picked from commit 13e17ca)

…roject#39027) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>

…vllm-project#39027)" This reverts commit 60a84ac.

…roject#39027) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>

…roject#39027) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

mergify Bot added documentation Improvements or additions to documentation frontend tool-calling labels Apr 5, 2026

github-project-automation Bot added this to Tool Calling Apr 5, 2026

gemini-code-assist Bot reviewed Apr 5, 2026

View reviewed changes

This was referenced Apr 6, 2026

[Bug]: gemma4_utils._parse_tool_arguments truncates string values containing internal quotes #39069

Open

[Bugfix] Fix gemma4_utils._parse_tool_arguments truncating strings with internal quotes #39070

Open

bbrowning force-pushed the gemma4-multi-turn-fixes branch from 039bf4e to 21518bb Compare April 6, 2026 13:20

This was referenced Apr 6, 2026

[Bug]: Vllm + Gemma 4 + claude code: tool calling problems #39043

Open

[Bug]: Gemma4 on vLLM + PI coding agent: Validation failed for tool "edit": - path: must have required property 'path' #39072

Open

aldehir mentioned this pull request Apr 7, 2026

common : add gemma 4 specialized parser ggml-org/llama.cpp#21418

Merged

tysonmcnulty mentioned this pull request Apr 7, 2026

[Bugfix] Fix Gemma4 streaming tool parser stale state between requests #39214

Closed

mergify Bot added the needs-rebase label Apr 8, 2026

bbrowning force-pushed the gemma4-multi-turn-fixes branch from 928a3ad to 8bdf630 Compare April 8, 2026 10:50

mergify Bot removed the needs-rebase label Apr 8, 2026

bbrowning mentioned this pull request Apr 8, 2026

[Feature] Extend Gemma4 tool parser to support XML-style <tool_call> format #39172

Open

bbrowning and others added 2 commits April 8, 2026 13:18

douhashi mentioned this pull request Apr 9, 2026

vLLM イメージ更新後に Gemma 4 thinking 有効化と新 chat template を適用する douhashi/runpod-vllm-gemma#1

Open

5 tasks

faradawn mentioned this pull request Apr 10, 2026

Improve out-of-the-box recipe for Gemma4 vllm-project/recipes#329

Open

3 tasks

FredericOdermatt mentioned this pull request Apr 11, 2026

[Fix] Sync gemma4 chat template from hf #39570

Merged

the-david-oy mentioned this pull request Apr 29, 2026

[CONTRIBUTION]: Add Gemma 4 tool-call and reasoning parsers ai-dynamo/dynamo#8851

Closed

Kimahriman mentioned this pull request May 8, 2026

[Bugfix] Fix Gemma4 reasoning for batch chat completions #42105

Open

alexbi29 added a commit to alexbi29/vllm that referenced this pull request May 18, 2026

Revert "[Tool] adjust_request to reasoning parser, and Gemma4 fixes (…

a851278

…vllm-project#39027)" This reverts commit 60a84ac.

GrootLiu mentioned this pull request May 20, 2026

[Feature] [Bugfix] Tool Call Parser Support & Gemma4 Bug Fix baidu/vLLM-Kunlun#360

Merged

3 tasks

Uh oh!

Conversation

bbrowning commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

BFCL multi_turn suite to uncover bugs and validate fixes

Unit Tests

Claude Code pointed at a Gemma 4 model running locally

Test Result

Unit Tests

pytest tests/reasoning/test_gemma4_reasoning_parser.py

pytest tests/renderers/test_gemma4_chat_template.py

tests/reasoning

BFCL Results

Claude Code usability

Uh oh!

mergify Bot commented Apr 5, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ywang96 commented Apr 6, 2026

Uh oh!

bbrowning commented Apr 6, 2026

Uh oh!

bbrowning commented Apr 6, 2026

Uh oh!

lucianommartins commented Apr 6, 2026

Uh oh!

bbrowning commented Apr 6, 2026

Uh oh!

drrros commented Apr 6, 2026

Uh oh!

bbrowning commented Apr 6, 2026

Uh oh!

bbrowning commented Apr 8, 2026

Uh oh!

bbrowning commented Apr 8, 2026

Uh oh!

mergify Bot commented Apr 8, 2026

Uh oh!

bbrowning commented Apr 8, 2026

Uh oh!

guidryheal-create commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

guidryheal-create commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

bbrowning commented Apr 5, 2026 •

edited

Loading

`pytest tests/reasoning/test_gemma4_reasoning_parser.py`

`pytest tests/renderers/test_gemma4_chat_template.py`

`tests/reasoning`

guidryheal-create commented May 10, 2026 •

edited

Loading

guidryheal-create commented May 10, 2026 •

edited

Loading