Skip to content

Eval bug: Qwen3.5 9B often prints tool calls in XML and stops when thinking is enabled - tool calls inside thinking block #20837

@kik4444

Description

@kik4444

Name and Version

ggml_cuda_init: found 1 ROCm devices (Total VRAM: 16368 MiB):
Device 0: AMD Radeon RX 6900 XT, gfx1030 (0x1030), VMM: no, Wave Size: 32, VRAM: 16368 MiB
version: 8461 (cea560f)
built with GNU 15.2.0 for Linux x86_64

Operating systems

Linux

GGML backends

HIP

Hardware

CPU: AMD Ryzen 9 5950X
GPU: AMD Radeon RX 6900 XT
OS: NixOS 26.05 (Yarara) x86_64

I think the GGML backend is HIP because that's what the nixpkgs package llama-cpp-rocm enables - https://github.com/NixOS/nixpkgs/blob/a1e8ce6b50ffa87ad0d39881c47eb214982330dc/pkgs/by-name/ll/llama-cpp/package.nix

Models

unsloth Qwen3.5-9B-UD-Q4_K_XL
https://huggingface.co/unsloth/Qwen3.5-9B-GGUF

Problem description & steps to reproduce

I'm running llama.cpp server with the command:

llama-server --port 8001 --models-preset ~/llama.cpp/models.ini --host 0.0.0.0

My models.ini contains:

version = 1

kv-unified = true
cache-type-k = q8_0
cache-type-v = q8_0

[Qwen3.5-9B-UD-Q4-Coding]
model = /home/user/llama.cpp/unsloth/qwen3.5-9b-ud-q4_k_xl/unsloth_Qwen3.5-9B-GGUF_Qwen3.5-9B-UD-Q4_K_XL.gguf
mmproj = /home/user/llama.cpp/unsloth/qwen3.5-9b-ud-q4_k_xl/unsloth_Qwen3.5-9B-GGUF_mmproj-BF16.gguf
alias = Qwen3.5-9B-UD-Q4-Coding

ctx-size = 131072
fit = on
flash-attn = true
chat-template-kwargs = {"enable_thinking":true}

temp = 0.6
top-p = 0.95
top-k = 20
min-p = 0.00
presence-penalty = 0.0
repeat-penalty = 1.0

Then I use the Zed editor to connect to my model from another computer. Here's the relevant bits of my chat with it using Zed's feature to export your agent chat to markdown:

Chat markdown
## User

What tools do you have?

## Assistant

<thinking>
The user is asking about what tools I have available. I should provide a clear, helpful summary of the tools I can use, formatted nicely in markdown. I don't need to use any tools for this - I can just explain them from the information I have.

</thinking>

I have access to several tools that can help you work with code and projects. Here's what I can do:

## File Operations

- **`read_file`** - Read the content of files in your project. If files are large, I'll get an outline with line numbers so we can dive into specific sections.
- **`list_directory`** - List the contents of a directory to explore the project structure.
- **`find_path`** - Fast file path matching using glob patterns (e.g., `**/*.rs` to find all Rust files). Great for locating files by name.
- **`grep`** - Search file contents using regular expressions. Prefer this over path search when looking for symbols or specific code patterns.

## Project Diagnostics

- **`diagnostics`** - Get errors and warnings for the project or a specific file. Useful for checking the health of your code after making changes.

## Web & Network

- **`fetch`** - Fetch URLs and return the content as Markdown. Handy for loading documentation or external resources.
- **`open`** - Open files or URLs with your default application (browser, text editor, etc.).

## System Tools

- **`now`** - Get the current datetime in RFC 3339 format (UTC or local time).

## Advanced

- **`spawn_agent`** - Delegate tasks to agents when you need parallel work or complex tasks that benefit from a dedicated session.

---

All of these tools are designed to work with your project in the root directory `/path/to/project`. Just let me know what you're working on and I can help you navigate and manipulate the code!

## User

Read through [@main.nu](file:///path/to/project/main.nu) and give me a summary of what it does

## Assistant

<thinking>

I need to read the actual content of main.nu file to provide a summary - the file outline only shows 7 symbols which is not enough detail.

</thinking>

I can see the file outline shows this is a Nushell script with some notes about <omitted>. However, I only see a file outline with symbol names. Let me read the full content to give you a comprehensive summary.



**Tool Call: Read file `main.nu`**
Status: Completed

```main.nu
SUCCESS: File outline retrieved. This file is too large to read all at once, so the outline below shows the file's structure with line numbers.

IMPORTANT: Do NOT retry this call without line numbers - you will get the same outline.
Instead, use the line numbers below to read specific sections by calling this tool again with start_line and end_line parameters.

# File outline for /path/to/project/main.nu

<omitted>

Showing symbols 1-7 (total symbols: 7)


NEXT STEPS: To read a specific symbol's implementation, call read_file with the same path plus start_line and end_line from the outline above.
For example, to read a function shown as [L100-150], use start_line: 100 and end_line: 150.
```


## Assistant

<thinking>

The file is too large and I only see a structure outline. I need to read specific sections. Let me start from the beginning to understand the overall structure.

<tool_call>
<function=read_file>
<parameter=path>
main.nu
</parameter>
<parameter=start_line>
1
</parameter>
<parameter=end_line>
100
</parameter>
</function>
</tool_call>
</thinking>

After the agent prints the tool call in XML it simply stops running as if it's finished.

Sometimes it succeeds making one tool call, but then it starts printing XML and I give up on it. I have no idea if it may or may not make another successful call if you try enough times after it starts printing XML.

However, if I disable thinking with chat-template-kwargs = {"enable_thinking":false}, then it makes all its tool calls without problems (at least as far as I've tested), e.g. it can read the whole 1000-line file in several consecutive tool calls.

I've also tried running llama-server with the flag --no-cache-prompt as mentioned in #20614 but that had no effect.

I think the same or similar issue is also reported here
ollama/ollama#14745
ollama/ollama#14493

First Bad Commit

I've had this problem since llama.cpp 8255. I haven't tried earlier versions.

Relevant log output

Here's the log from llama-server from running the model until the XML tool call. It shows that it's served several requests because the model actually tried to read the file multiple times but it got the path wrong. When it finally got it right, that's when it printed the XML.

log.txt

Metadata

Metadata

Assignees

Labels

bugSomething isn't workingchat parserIssues related to the chat parser and chat templates

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions