Eval bug: Anthropic Messages API drops thinking content blocks during conversion #20090

@T0mSIlver

Description

Name and Version

version: 8189 (4d828bd1a)
built with GNU 13.3.0 for Linux x86_64

Operating systems

Linux

GGML backends

CUDA

Hardware

NVIDIA GeForce RTX 3090

Models

Qwen3.5-35B-A3B (UD-Q4_K_M quantization via Unsloth)
https://huggingface.co/unsloth/Qwen3.5-35B-A3B-UD-Q4_K_M-GGUF

Problem description & steps to reproduce

The Anthropic Messages API (/v1/messages) silently drops thinking content blocks when converting to the internal OpenAI chat format. In tools/server/server-common.cpp, the function convert_anthropic_to_oai() handles content block types text, image, tool_use, and tool_result, but has no handler for thinking blocks. They are silently ignored, and prior assistant messages are converted without the reasoning_content field.

This was discovered while using Claude Code with Qwen3.5 via llama-server's Anthropic-compatible endpoint. The impact depends on how each model's chat template handles thinking in conversation history (see comment below for details on Qwen3.5's specific behavior).

Steps to reproduce:

  1. Serve a thinking-capable model with llama-server and tool calling enabled
  2. Use a client that calls /v1/messages with thinking.type: "enabled" and sends thinking blocks back in conversation history (e.g., Claude Code, or any Anthropic API client following the extended thinking spec)
  3. Observe that thinking blocks are absent from the converted OpenAI messages (visible via the converted request debug log)

Fix: Two changes in tools/server/server-common.cpp, function convert_anthropic_to_oai():

  1. Add a handler for thinking blocks to accumulate reasoning content
  2. Set reasoning_content on the converted message so the chat template can use it

First Bad Commit

This has always been the case — convert_anthropic_to_oai() has never handled thinking blocks.

Relevant log output

slot process_toke: id  0 | task 10718 | n_decoded = 17, n_remaining = 31983, next token:    25 ':'
Grammar still awaiting trigger after token 248046 (`<|im_end|>`)
slot process_toke: id  0 | task 10718 | stopped by EOS
slot process_toke: id  0 | task 10718 | n_decoded = 18, n_remaining = 31982, next token: 248046 ''
prompt eval time =    9701.77 ms / 23996 tokens
       eval time =     157.32 ms /    18 tokens
Parsed message: {"role":"assistant","content":"","reasoning_content":"Let me also check what branch you're on and what recent work has been done:"}
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":18}}
