Eval bug: Qwen3.5 and 3.6 tool call emitted in reasoning_content instead of delta.tool_calls (GitHub Copilot client) #22684

### Name and Version

Current head in repo 

d8794eecd (HEAD -> master, origin/master, origin/HEAD) examples: refactor diffusion generation (#22590)

./build/bin/llama-server --version
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 32109 MiB):
  Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes, VRAM: 32109 MiB
version: 9022 (d8794eecd)
built with GNU 13.3.0 for Linux x86_64

### Operating systems

Linux

### GGML backends

CUDA

### Hardware

GPU : GeForce RTX 5090
CPU : AMD EPYC 9J14 

### Models

Qwen3.5-35B-A3B-Q6_K.gguf  (https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF)

### Problem description & steps to reproduce

When Qwen models are used with GitHub Copilot, Copilot sometimes fails to process the response.

<img width="735" height="436" alt="Image" src="https://github.com/user-attachments/assets/3910c017-c794-45fa-a6f2-aae1e5175da7" />

Recent PRs fixed quite a few issues, but this one still occurs. A few sample requests from a network capture are attached.

The common pattern seems to be:

The model generates a Qwen-style `<tool_call>...</tool_call>` block, but the block is streamed inside `delta.reasoning_content` instead of being converted to `delta.tool_calls`.

- stream ends with `data: [DONE]`
- `delta.content` is empty
- `delta.tool_calls` is absent
- `delta.reasoning_content` contains a complete `<tool_call>...</tool_call>`
- final `finish_reason` is `"stop"`

Expected response:

- structured `delta.tool_calls`
- final `finish_reason` should be `"tool_calls"`
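The observed and expected patterns above can be checked mechanically against a captured stream. The following is a minimal sketch, assuming the capture is a list of OpenAI-style `data: ...` SSE lines containing complete JSON chunks; the helper names `summarize_stream` and `is_leaked_tool_call` are hypothetical and not part of llama.cpp or Copilot.

```python
import json


def summarize_stream(sse_lines):
    """Accumulate the delta fields of an OpenAI-style streaming response."""
    summary = {
        "content": "",
        "reasoning_content": "",
        "has_tool_calls": False,
        "finish_reason": None,
        "done": False,
    }
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # ignore blank lines and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            summary["done"] = True
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            summary["content"] += delta.get("content") or ""
            summary["reasoning_content"] += delta.get("reasoning_content") or ""
            if delta.get("tool_calls"):
                summary["has_tool_calls"] = True
            if choice.get("finish_reason") is not None:
                summary["finish_reason"] = choice["finish_reason"]
    return summary


def is_leaked_tool_call(summary):
    """True when the stream matches the failure pattern described in this issue."""
    reasoning = summary["reasoning_content"]
    return (
        not summary["content"]
        and not summary["has_tool_calls"]
        and "<tool_call>" in reasoning
        and "</tool_call>" in reasoning
        and summary["finish_reason"] == "stop"
    )
```

Running `is_leaked_tool_call(summarize_stream(lines))` over each captured response should flag only the responses where the tool call stayed in `delta.reasoning_content`.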

[row_1142_read_file_requirements.request.sanitized.json](https://github.com/user-attachments/files/27366987/row_1142_read_file_requirements.request.sanitized.json)
[row_1230_run_terminal_backend.request.sanitized.json](https://github.com/user-attachments/files/27366985/row_1230_run_terminal_backend.request.sanitized.json)
[row_1256_read_file_logger_after_compaction.request.sanitized.json](https://github.com/user-attachments/files/27366986/row_1256_read_file_logger_after_compaction.request.sanitized.json)

### First Bad Commit

Does not seem to be a regression 

### Relevant log output

No errors in the logs. Sample output that causes Copilot to fail:

data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":"<tool_call>"}}],...}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":"<function=read_file>"}}],...}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":"<parameter=filePath>"}}],...}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":"/tmp/example-workspace/example-project/backend/requirements.txt"}}],...}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"reasoning_content":"</tool_call>"}}],...}
data: {"choices":[{"finish_reason":"stop","index":0,"delta":{}}],...}
data: [DONE]
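For illustration, here is a rough sketch of the structured call a client would expect to receive for the sample above. It is not the llama.cpp parser: the tag grammar is inferred only from the chunks shown in this issue, closing `</parameter>` and `</function>` tags are treated as optional, and the function name `parse_qwen_tool_call` is hypothetical.

```python
import json
import re


def parse_qwen_tool_call(text):
    """Convert one Qwen-style <tool_call> block into an OpenAI-style tool call dict.

    Returns None when no complete block or no function name is found.
    """
    block = re.search(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL)
    if block is None:
        return None
    body = block.group(1)

    function = re.search(r"<function=([^>]+)>", body)
    if function is None:
        return None

    # Each parameter value runs until its closing tag, the next parameter,
    # the end of the function, or the end of the block, whichever comes first.
    param_pattern = re.compile(
        r"<parameter=([^>]+)>(.*?)(?=</parameter>|<parameter=|</function>|\Z)",
        re.DOTALL,
    )
    arguments = {name: value.strip() for name, value in param_pattern.findall(body)}

    return {
        "type": "function",
        "function": {
            "name": function.group(1),
            # OpenAI-style tool calls carry arguments as a JSON string.
            "arguments": json.dumps(arguments),
        },
    }
```

Applied to the concatenated `reasoning_content` of the sample stream, this yields a `read_file` call whose `arguments` string decodes to a single `filePath` entry, which is the shape expected in `delta.tool_calls`.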

