Eval bug: Answer in think tags. Qwen 3.6 27B

### Name and Version

drros@epyc-ws:~/llama.cpp$ ./build/bin/llama-cli --version
ggml_cuda_init: found 3 CUDA devices (Total VRAM: 71963 MiB):
  Device 0: NVIDIA RTX PRO 4000 Blackwell, compute capability 12.0, VMM: yes, VRAM: 23987 MiB
  Device 1: NVIDIA RTX PRO 4000 Blackwell, compute capability 12.0, VMM: yes, VRAM: 23987 MiB
  Device 2: NVIDIA RTX PRO 4000 Blackwell, compute capability 12.0, VMM: yes, VRAM: 23987 MiB
version: 8940 (78433f606)
built with GNU 13.3.0 for Linux x86_64

### Operating systems

Linux

### GGML backends

CUDA

### Hardware

AMD EPYC 9274F / 384 GB DDR5 4800 MT/s / 3x RTX PRO 4000 Blackwell

### Models

Qwen 3.6 27B - [Unsloth's Q8-XL](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF/blob/main/Qwen3.6-27B-UD-Q8_K_XL.gguf) quant with BF16 mmproj.

### Problem description & steps to reproduce

Sometimes the model emits its answer inside `<think>` tags, or duplicates the tags (I'm not sure which). It looks like this in the web interface:

<img width="812" height="611" alt="Image" src="https://github.com/user-attachments/assets/3cde2326-8fae-44eb-8add-85cc236638f5" />

Meanwhile, the logs show no parsing warnings or any other non-standard messages.
I'm not sure whether this is a problem with the model itself or a parsing / chat template bug. It does seem to be the cause of strange stops in agentic workloads: sometimes the model just stops after a while, and if asked to "continue" it runs fine again.
I tend to think this is a model issue, as it also happened a couple of times in vLLM (although I haven't tried recent nightlies for a couple of days).
I'm also not sure how to reproduce this in a controlled environment, as it has happened to me only 2-3 times in the web interface, in chats that had nothing in common. The screenshot above, for example, is from my younger daughter's math practice, but it also happened in some Python dev chats.
Model load parameters and logs are in the logs section.
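
Since the failure is rare, one way to catch it outside the web UI might be to log raw responses and flag suspicious ones automatically. Below is a minimal sketch, not a confirmed diagnostic. It assumes the server's OpenAI-compatible response puts the thinking text in a `reasoning_content` field and the final answer in `content`; those field names, the symptom labels, and the `classify_message` helper are all assumptions made for illustration.

```python
import re

# Matches either an opening or a closing think tag.
THINK_TAG = re.compile(r"</?think>")


def classify_message(message: dict) -> list[str]:
    """Return a list of symptom labels found in one assistant message.

    `message` is assumed to look like the `choices[0].message` object of an
    OpenAI-compatible chat completion, with an optional `reasoning_content`
    field (assumption: this is where the parsed thinking ends up).
    """
    content = message.get("content") or ""
    reasoning = message.get("reasoning_content") or ""
    symptoms = []

    # A think tag leaking into the visible answer suggests the parser did
    # not strip (or did not recognise) the reasoning block.
    if THINK_TAG.search(content):
        symptoms.append("think_tag_in_content")

    # A literal think tag inside the parsed reasoning suggests doubled tags.
    if THINK_TAG.search(reasoning):
        symptoms.append("think_tag_in_reasoning")

    # Reasoning present but no answer and no tool call: the answer may have
    # been written inside the think block, which would also look like an
    # unexplained stop in an agentic loop.
    if reasoning.strip() and not content.strip() and not message.get("tool_calls"):
        symptoms.append("empty_content_with_reasoning")

    return symptoms
```

Running this over saved responses from both llama.cpp and vLLM could help tell a parsing problem (tags leaking into `content`) from a model problem (the answer generated before the closing tag), although it cannot prove either on its own.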

### First Bad Commit

_No response_

### Relevant log output

<details>
<summary>Logs</summary>


```console
drros@epyc-ws:~/llama.cpp$ LLAMA_SET_ROWS=1 ./build/bin/llama-server --models-preset ../my-models.ini  --models-max 1 -np 6 --port 30000 --host 192.168.0.60 --webui-mcp-proxy --no-mmproj-offload -kvu --mlock --spec-type ngram-mod --spec-ngram-size-n 48  --draft-min 4 --draft-max 64

srv          load: spawning server instance with name=qwen3.6-27b-ud-q8-thinking-coding-vision on port 54915
srv          load: spawning server instance with args:
srv          load:   /home/drros/llama.cpp/build/bin/llama-server
srv          load:   --chat-template-kwargs
srv          load:   {"preserve_thinking":true}
srv          load:   --draft-max
srv          load:   64
srv          load:   --draft-n-min
srv          load:   4
srv          load:   --host
srv          load:   127.0.0.1
srv          load:   --image-min-tokens
srv          load:   2048
srv          load:   --min-p
srv          load:   0.0
srv          load:   --mlock
srv          load:   --no-mmap
srv          load:   --no-mmproj-offload
srv          load:   --port
srv          load:   54915
srv          load:   --presence-penalty
srv          load:   0.0
srv          load:   --repeat-penalty
srv          load:   1.0
srv          load:   --spec-ngram-size-n
srv          load:   48
srv          load:   --spec-type
srv          load:   ngram-mod
srv          load:   --temperature
srv          load:   0.6
srv          load:   --top-k
srv          load:   20
srv          load:   --top-p
srv          load:   0.95
srv          load:   --webui-mcp-proxy
srv          load:   --alias
srv          load:   qwen3.6-27b-ud-q8-thinking-coding-vision
srv          load:   --ctx-size
srv          load:   262144
srv          load:   --cache-ram
srv          load:   65536
srv          load:   --swa-checkpoints
srv          load:   128
srv          load:   --kv-unified
srv          load:   --model
srv          load:   /mnt/ds1nfs/codellamaweights/qwen3.6-27b-q8-xl/Qwen3.6-27B-UD-Q8_K_XL.gguf
srv          load:   --mmproj
srv          load:   /mnt/ds1nfs/codellamaweights/qwen3.6-27b-q8-xl/mmproj-BF16.gguf
srv          load:   --parallel
srv          load:   6
srv          load:   --reasoning
srv          load:   on
srv          load:   --split-mode
srv          load:   tensor
srv          load:   --ubatch-size
srv          load:   2048

```
</details>
