Name and Version
llama-server --version
version: 9211 (5bf468a2f)
built with GNU 13.3.0 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
NVIDIA GeForce RTX 3090 (4x)
Models
unsloth/Qwen3.6-27B-GGUF
Problem description & steps to reproduce
When using reasoning_format: "none" in streaming mode, the <think> opening tag is stripped from the content deltas in builds ≥b9211. The thinking content flows through content deltas without the opening tag, making it impossible for clients to detect where reasoning content begins.
Expected behavior (build b9191 and earlier):
First content delta with reasoning_format: "none":
{"content": "<think>
Here's a thinking process:
Actual behavior (build b9211 and later):
First content delta:
{"content": "Here's a thinking process:
The <think> tag is stripped, but the reasoning content is NOT sent as separate reasoning_content deltas either; it all arrives in content deltas without the tag.
Non-streaming mode works correctly: the full response includes the <think> tag as expected.
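To illustrate why the missing tag matters, here is a hypothetical client-side splitter (the `ThinkSplitter` class is illustrative only, not part of llama.cpp or any client library). It routes `reasoning_format: "none"` content deltas into reasoning and answer text by watching for `<think>` and `</think>`, and holds back partial tags that are split across deltas. It assumes reasoning is always delimited by both tags; if the opening tag never arrives, the splitter cannot tell that reasoning has started and everything lands in the answer.

```python
OPEN, CLOSE = "<think>", "</think>"


class ThinkSplitter:
    """Hypothetical splitter: route streamed content into reasoning vs. answer."""

    def __init__(self):
        self.buf = ""          # text not yet classified (may hold a partial tag)
        self.in_think = False  # True while inside <think> ... </think>
        self.reasoning = []
        self.answer = []

    def _emit(self, text):
        if text:
            (self.reasoning if self.in_think else self.answer).append(text)

    def _held_back(self, tag):
        # Length of the longest buffer suffix that is a proper prefix of `tag`,
        # e.g. "</th" at the end of a delta may be the start of "</think>".
        for n in range(min(len(tag) - 1, len(self.buf)), 0, -1):
            if self.buf.endswith(tag[:n]):
                return n
        return 0

    def feed(self, delta):
        self.buf += delta
        while True:
            # Look for the tag that would change the current state.
            tag = CLOSE if self.in_think else OPEN
            idx = self.buf.find(tag)
            if idx == -1:
                break
            self._emit(self.buf[:idx])
            self.buf = self.buf[idx + len(tag):]
            self.in_think = not self.in_think
        # Emit everything except a possible partial tag at the end.
        keep = self._held_back(tag)
        cut = len(self.buf) - keep
        self._emit(self.buf[:cut])
        self.buf = self.buf[cut:]

    def finish(self):
        """Flush the buffer and return (reasoning_text, answer_text)."""
        self._emit(self.buf)
        self.buf = ""
        return "".join(self.reasoning), "".join(self.answer)
```

Fed the b9191-style deltas, the reasoning is separated cleanly; fed deltas that start with "Here's a thinking process:" and no opening tag, the whole stream ends up classified as answer text.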
Steps to reproduce
- Start llama-server with a Qwen3.6 model
- Send a streaming request with reasoning_format: "none":
curl -s -X POST http://localhost:8082/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "unsloth/Qwen3.6-27B-GGUF",
"messages": [{"role": "user", "content": "who are you"}],
"stream": true,
"reasoning_format": "none",
"max_tokens": 30
}'
- Observe that the first content delta does NOT include the <think> tag
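The same check can be scripted with a minimal Python sketch using only the standard library. It sends the request body from the curl example and reports whether the first non-empty content delta starts with `<think>`. It assumes the endpoint URL is passed on the command line and that the stream uses OpenAI-style `data: {...}` / `data: [DONE]` SSE framing; the helper names `iter_deltas` and `first_content` are hypothetical, not part of llama.cpp.

```python
import json
import sys
import urllib.request

# Request body mirroring the curl example (model name and options from the report).
BODY = {
    "model": "unsloth/Qwen3.6-27B-GGUF",
    "messages": [{"role": "user", "content": "who are you"}],
    "stream": True,
    "reasoning_format": "none",
    "max_tokens": 30,
}


def iter_deltas(lines):
    """Yield each choice's `delta` dict from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            yield choice.get("delta", {})


def first_content(lines):
    """Return the first non-empty `content` delta, or None if there is none."""
    for delta in iter_deltas(lines):
        content = delta.get("content")
        if content:  # skips role-only deltas where content is null or ""
            return content
    return None


def main(url):
    req = urllib.request.Request(
        url,
        data=json.dumps(BODY).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # Iterating the HTTP response yields raw byte lines.
        first = first_content(line.decode("utf-8") for line in resp)
    print("first content delta:", repr(first))
    print("starts with <think>:", bool(first and first.startswith("<think>")))


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} http://localhost:8082/v1/chat/completions")
    else:
        main(sys.argv[1])
```

Run it with the server URL as the only argument (the filename is arbitrary); per the behavior described above, an unaffected build should print `True` on the last line and an affected build `False`.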
First Bad Commit
Build b9227 (autoparser PR) introduced this behavior. Build b9191 works correctly.
Relevant log output
N/A - this is observable in the SSE delta output