[Bug]: Responses API does not surface reasoning output with `--reasoning-parser gemma4` (works with deepseek_r1) #43395

### Your current environment

<details>
<summary>The output of <code>python collect_env.py</code></summary>

```text
OS                           : Ubuntu 22.04.5 LTS (x86_64)
PyTorch version              : 2.11.0+cu130
CUDA used to build PyTorch   : 13.0
Python version               : 3.12.13
Is CUDA available            : True
CUDA runtime version         : 13.0.88
GPU 0-3: NVIDIA RTX PRO 6000 Blackwell Server Edition
vLLM Version                 : 0.21.0
transformers                 : 5.8.1
```

</details>

### 🐛 Describe the bug

With `--reasoning-parser gemma4` enabled on vLLM v0.21.0, the Chat Completions API correctly surfaces model reasoning in a `reasoning` field alongside tool calls. The Responses API (`/v1/responses`), however, does not surface reasoning in any form: `reasoning_tokens` is always 0 and no `ResponseReasoningItem` appears in `output`, even when the request passes `reasoning: {"effort": "high"}`.

This appears to be specific to the `gemma4` reasoning parser: Qwen3 with `--reasoning-parser deepseek_r1` works correctly on the same image.

**Server start command:**

```bash
docker run -d \
  --name vllm-gemma4 \
  --gpus all \
  --ipc=host \
  -p 8000:8000 \
  -e VLLM_ENABLE_RESPONSES_API_STORE=1 \
  vllm/vllm-openai:v0.21.0 \
  --model google/gemma-4-26B-A4B-it \
  --tensor-parallel-size 4 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.9 \
  --max-num-seqs 4 \
  --max-num-batched-tokens 14336 \
  --tool-call-parser functiongemma \
  --enable-auto-tool-choice \
  --reasoning-parser gemma4
```

**Reproduction script:**

```python
import requests

BASE = "http://localhost:8000/v1"
MODEL = "google/gemma-4-26B-A4B-it"

tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

# Test 1: Chat Completions — reasoning WORKS
print("=== Chat Completions (works) ===")
r1 = requests.post(f"{BASE}/chat/completions", json={
    "model": MODEL,
    "messages": [{"role": "user", "content": "What is the weather in NYC?"}],
    "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}}}],
    "tool_choice": "required",
    "max_tokens": 1024,
    "chat_template_kwargs": {"enable_thinking": True},
})
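r1.raise_for_status()  # fail loudly on HTTP errors instead of printing empty fields
d1 = r1.json()
msg = d1["choices"][0]["message"]
# `reasoning` may be present but null, so fall back to "" before slicing.
print(f"  reasoning: {(msg.get('reasoning') or '')[:200]}")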
print(f"  tool_calls: {msg.get('tool_calls')}")
print(f"  finish_reason: {d1['choices'][0]['finish_reason']}")

# Test 2: Responses API — reasoning NOT surfaced
print("\n=== Responses API (broken) ===")
r2 = requests.post(f"{BASE}/responses", json={
    "model": MODEL,
    "input": [{"role": "user", "content": "What is the weather in NYC?"}],
    "tools": tools,
    "tool_choice": "required",
    "reasoning": {"effort": "high"},
})
r2.raise_for_status()  # an error body would otherwise look like "no reasoning"
d2 = r2.json()
usage = d2.get("usage", {})
output_details = usage.get("output_tokens_details", {})
print(f"  reasoning_tokens: {output_details.get('reasoning_tokens', 0)}")
print(f"  output types: {[item.get('type') for item in d2.get('output', [])]}")
print(f"  top-level reasoning field: {d2.get('reasoning')}")

# Test 3: Responses API text-only — still no reasoning
print("\n=== Responses API text-only (also broken) ===")
r3 = requests.post(f"{BASE}/responses", json={
    "model": MODEL,
    "input": [{"role": "user", "content": "What is 2+2? Think step by step."}],
    "reasoning": {"effort": "high"},
    "max_output_tokens": 500,
})
r3.raise_for_status()  # an error body would otherwise look like "no reasoning"
d3 = r3.json()
usage3 = d3.get("usage", {})
print(f"  reasoning_tokens: {usage3.get('output_tokens_details', {}).get('reasoning_tokens', 0)}")
print(f"  output types: {[item.get('type') for item in d3.get('output', [])]}")
```

**Expected behavior:**

The Responses API should surface reasoning output when `--reasoning-parser gemma4` is active and `reasoning: {"effort": "high"}` is passed. Expected:
- A `"reasoning"` output item (`ResponseReasoningItem`) containing the model's thinking
- `reasoning_tokens > 0` in usage details

The Chat Completions result shows that the model produces reasoning and that the `gemma4` parser can extract it; the Responses API path does not carry that reasoning through to output items.
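The two expectations above can be checked mechanically. The following is a minimal sketch, not part of vLLM: `summarize_response` and `reasoning_surfaced` are hypothetical helper names, and the sketch assumes that a reasoning output item has `type == "reasoning"` and carries its text in `content[*].text`, as in the OpenAI `ResponseReasoningItem` schema.

```python
from typing import Any


def summarize_response(resp: dict[str, Any]) -> dict[str, Any]:
    """Summarize the reasoning-related parts of a /v1/responses JSON body."""
    output = resp.get("output") or []
    usage = resp.get("usage") or {}
    details = usage.get("output_tokens_details") or {}

    # Assumption: reasoning output items are tagged with type "reasoning".
    reasoning_items = [item for item in output if item.get("type") == "reasoning"]
    # Assumption: the reasoning text lives in content[*].text.
    reasoning_text = "".join(
        part.get("text", "")
        for item in reasoning_items
        for part in (item.get("content") or [])
    )
    return {
        "output_types": [item.get("type") for item in output],
        "has_reasoning_item": bool(reasoning_items),
        "reasoning_tokens": details.get("reasoning_tokens") or 0,
        "reasoning_text": reasoning_text,
    }


def reasoning_surfaced(resp: dict[str, Any]) -> bool:
    """True only if both expectations hold: a reasoning item and a nonzero count."""
    summary = summarize_response(resp)
    return summary["has_reasoning_item"] and summary["reasoning_tokens"] > 0
```

In the reproduction script, `reasoning_surfaced(d2)` and `reasoning_surfaced(d3)` would be expected to return `True` once the bug is fixed.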

**Actual behavior:**

```
=== Chat Completions (works) ===
  reasoning: The user is asking about the weather in NYC. I should look at the available tools... The `get_weather` tool seems appropriate for this task.
  tool_calls: [{'id': 'chatcmpl-tool-...', 'type': 'function', 'function': {'name': 'get_weather', 'arguments': '{"city": "NYC"}'}}]
  finish_reason: tool_calls

=== Responses API (broken) ===
  reasoning_tokens: 0
  output types: ['function_call']
  top-level reasoning field: None

=== Responses API text-only (also broken) ===
  reasoning_tokens: 0
  output types: ['message']
```

### Reproduction Evidence

Confirmed on `vllm/vllm-openai:latest` (v0.21.0) with `google/gemma-4-26B-A4B-it`, TP=4, `--reasoning-parser gemma4`, `--tool-call-parser functiongemma`:

```
=== Chat Completions with reasoning ===
  reasoning present: True
  reasoning preview: The user wants to know the product of 15 × 37...
  finish_reason: stop

=== Chat Completions with tools + reasoning ===
  reasoning present: True
  reasoning preview: The user is asking about the weather in NYC. I should look for a tool...
  tool_calls: [{"function": {"name": "get_weather", "arguments": "{\"city\": \"NYC\"}"}}]
  finish_reason: tool_calls

=== Responses API with reasoning={'effort': 'high'} ===
  reasoning_tokens: 0
  output types: ['message']
  has ResponseReasoningItem: False

=== Responses API with tools + reasoning ===
  reasoning_tokens: 0
  output types: ['function_call']
  has ResponseReasoningItem: False
```

**Note:** This bug is parser-specific. With Qwen3-1.7B and `--reasoning-parser deepseek_r1` on the same image, the Responses API correctly returns a `ResponseReasoningItem` with `reasoning_tokens: 1023`. The problem therefore lies in how the `gemma4` reasoning parser integrates with the Responses API.

### Analysis

The root cause appears to be in the `gemma4` reasoning parser's interaction with the Responses API serving path:

1. **Chat Completions path works**: The `gemma4` reasoning parser correctly extracts `<think>...</think>` blocks and surfaces them in the `reasoning` field of the chat completion response.

2. **Responses API non-harmony path (`_make_response_output_items`)**: This path delegates to `parser.extract_response_outputs()` at `serving.py:1044`. The gemma4 parser's implementation of this method appears not to construct `ResponseReasoningItem` objects from the parsed reasoning content; it may be stripping the reasoning and returning only the final content and tool calls.

3. **Contrast with working parsers**: The `deepseek_r1` parser correctly produces `ResponseReasoningItem` in its `extract_response_outputs()` implementation, which is why Qwen3 works fine.

4. **`reasoning_tokens` counting also fails**: The fallback at `serving.py:866`, which tries to count reasoning tokens from the accumulated token IDs, does not appear to trigger when the gemma4 parser is in use.
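To make points 2 and 4 concrete, here is a rough sketch of what the Responses path would be expected to produce. It is not vLLM's implementation: `split_reasoning`, `build_output_items`, and `count_reasoning_tokens` are hypothetical names, the `<think>`/`</think>` delimiters are taken from the description in point 1, the item dictionaries only approximate the Responses output schema, and the marker token IDs passed to the counter are placeholders.

```python
from typing import Any, Optional


def split_reasoning(
    text: str, start: str = "<think>", end: str = "</think>"
) -> tuple[Optional[str], str]:
    """Split raw model text into (reasoning, final_content)."""
    head, found_start, rest = text.partition(start)
    if not found_start:
        return None, text  # no reasoning block at all
    reasoning, found_end, tail = rest.partition(end)
    if not found_end:
        # Unterminated block: treat everything after `start` as reasoning.
        return reasoning.strip() or None, head.strip()
    return reasoning.strip() or None, (head + tail).strip()


def build_output_items(text: str) -> list[dict[str, Any]]:
    """Build a reasoning item (if any) followed by a message item (if any)."""
    reasoning, content = split_reasoning(text)
    items: list[dict[str, Any]] = []
    if reasoning:
        items.append(
            {"type": "reasoning",
             "content": [{"type": "reasoning_text", "text": reasoning}]}
        )
    if content:
        items.append(
            {"type": "message", "role": "assistant",
             "content": [{"type": "output_text", "text": content}]}
        )
    return items


def count_reasoning_tokens(token_ids: list[int], start_id: int, end_id: int) -> int:
    """Count token IDs that fall between the start and end marker IDs."""
    count = 0
    inside = False
    for tid in token_ids:
        if tid == start_id:
            inside = True
        elif tid == end_id:
            inside = False
        elif inside:
            count += 1
    return count
```

The observed behavior matches what this sketch would return if the reasoning branch were skipped: only a `message` (or `function_call`) item and a count of 0.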

### Related Issues / PRs

- PR #41393 — "Adding reasoning for responses API V1" (open, adds `chat_template_kwargs` + `thinking_token_budget` passthrough — partial fix for a different layer)
- Issue #42962 — "Documentation falsely claims chat_template_kwargs support for the Responses API"
- Issue #33915 — "Support `include_reasoning` request parameter for non-harmony models"

This issue is distinct: it's about the `gemma4` reasoning parser not producing `ResponseReasoningItem` in its `extract_response_outputs()` path, while other parsers (like `deepseek_r1`) work correctly.

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
