Description
Name and Version
version: 8189 (4d828bd1a)
built with GNU 13.3.0 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
NVIDIA GeForce RTX 3090
Models
Qwen3.5-35B-A3B (UD-Q4_K_M quantization via Unsloth)
https://huggingface.co/unsloth/Qwen3.5-35B-A3B-UD-Q4_K_M-GGUF
Problem description & steps to reproduce
The Anthropic Messages API (`/v1/messages`) silently drops `thinking` content blocks when converting to the internal OpenAI chat format. In `tools/server/server-common.cpp`, the function `convert_anthropic_to_oai()` handles the content block types `text`, `image`, `tool_use`, and `tool_result`, but has no handler for `thinking` blocks: they are silently ignored, and prior assistant messages are converted without the `reasoning_content` field.
This was discovered while using Claude Code with Qwen3.5 via llama-server's Anthropic-compatible endpoint. The impact depends on how each model's chat template handles thinking in conversation history (see comment below for details on Qwen3.5's specific behavior).
Steps to reproduce:

- Serve a thinking-capable model with `llama-server` and tool calling enabled
- Use a client that calls `/v1/messages` with `thinking.type: "enabled"` and sends `thinking` blocks back in conversation history (e.g., Claude Code, or any Anthropic API client following the extended thinking spec)
- Observe that `thinking` blocks are absent from the converted OpenAI messages (visible via the `converted request` debug log)
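For illustration, a request of the kind that triggers the drop could look like the following. This is a hand-written sketch based on the Anthropic extended thinking spec, not a payload captured from this report; field values and the `signature` placeholder are assumptions:

```json
{
  "model": "qwen3.5-35b-a3b",
  "max_tokens": 1024,
  "thinking": { "type": "enabled", "budget_tokens": 512 },
  "messages": [
    { "role": "user", "content": "What branch am I on?" },
    {
      "role": "assistant",
      "content": [
        { "type": "thinking", "thinking": "Let me check the branch first.", "signature": "..." },
        { "type": "text", "text": "Checking now." }
      ]
    },
    { "role": "user", "content": "And what changed recently?" }
  ]
}
```

On the second turn, the assistant message's `thinking` block is what `convert_anthropic_to_oai()` currently discards.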
Fix: two changes in `tools/server/server-common.cpp`, function `convert_anthropic_to_oai()`:

- Add a handler for `thinking` blocks that accumulates reasoning content
- Set `reasoning_content` on the converted message so the chat template can use it
First Bad Commit
This has always been the case — convert_anthropic_to_oai() has never handled thinking blocks.
Relevant log output
```
slot process_toke: id 0 | task 10718 | n_decoded = 17, n_remaining = 31983, next token: 25 ':'
Grammar still awaiting trigger after token 248046 (`<|im_end|>`)
slot process_toke: id 0 | task 10718 | stopped by EOS
slot process_toke: id 0 | task 10718 | n_decoded = 18, n_remaining = 31982, next token: 248046 ''
prompt eval time = 9701.77 ms / 23996 tokens
eval time = 157.32 ms / 18 tokens
Parsed message: {"role":"assistant","content":"","reasoning_content":"Let me also check what branch you're on and what recent work has been done:"}
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":18}}
```

Related
- Eval bug: Qwen3-Coder-Next generates premature EOS instead of tool call (/continued response) #19513 has a similar symptom (premature EOS instead of a tool call) but a different root cause and model: it affected Qwen3-Coder-Next (non-thinking) on the Responses API due to contiguous assistant-message fragmentation, and was fixed by server: merge contiguous Responses input items into a single assistant message #19773.