What happened?
qwen-code emits the legacy reasoning_content field on outgoing assistant messages, never the new reasoning field. vLLM ≥ 0.20 (per RFC #27755) reads only message.get("reasoning") and discards reasoning_content before the chat template ever runs. The Qwen3.6 chat template, in turn, looks up message.reasoning_content — but vLLM has already stripped it. Net effect: every prior <think> block renders empty.
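A possible client-side workaround, sketched here under assumptions rather than taken from qwen-code or vLLM: before sending the request, copy the legacy reasoning_content value into reasoning on every assistant message. The helper name mirror_reasoning_field is hypothetical, and whether carrying both fields is enough for a given vLLM build is untested here.

```python
from copy import deepcopy
from typing import Any


def mirror_reasoning_field(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of `messages` where assistant turns also carry `reasoning`.

    Hypothetical helper: it only mirrors the legacy `reasoning_content` value
    into `reasoning`, the field the report says vLLM >= 0.20 reads.
    """
    patched = deepcopy(messages)  # leave the caller's history untouched
    for message in patched:
        if message.get("role") != "assistant":
            continue
        legacy = message.get("reasoning_content")
        # Do not overwrite a `reasoning` value that is already present.
        if legacy and not message.get("reasoning"):
            message["reasoning"] = legacy
    return patched


history = [
    {"role": "user", "content": "List the files."},
    {"role": "assistant", "content": "Done.", "reasoning_content": "I should call ls."},
]
patched_history = mirror_reasoning_field(history)
```

The legacy field is left in place so that servers which still read reasoning_content keep working.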
Symptom: In multi-turn agent loops, the prior <think>...</think> tags are emitted but the content between them is empty. The Qwen3.6 chat template renders every preserved prior turn as <|im_start|>assistant\n<think>\n\n</think>\n\n<content> instead of <think>\n<prior-reasoning>\n</think>\n\n<content>. The tag wrapper is correct — only the chain-of-thought inside is dropped. This defeats both the template's default "interleaved thinking" preservation and the explicit preserve_thinking: true opt-in.
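To check for the symptom, one option is to scan a rendered prompt for assistant turns whose think block is empty. The sketch below assumes you can capture the rendered prompt as a string (how you obtain it is not covered here), and count_empty_think_blocks is a hypothetical helper that matches only the exact empty pattern quoted above.

```python
import re

# Exact empty-reasoning prefix described in the symptom above.
EMPTY_THINK = re.compile(r"<\|im_start\|>assistant\n<think>\n\n</think>\n\n")


def count_empty_think_blocks(rendered_prompt: str) -> int:
    """Count assistant turns rendered with an empty <think> block."""
    return len(EMPTY_THINK.findall(rendered_prompt))


sample_prompt = (
    "<|im_start|>assistant\n<think>\n\n</think>\n\nFirst answer<|im_end|>\n"
    "<|im_start|>assistant\n<think>\nprior reasoning\n</think>\n\nSecond answer<|im_end|>\n"
)
empty_turns = count_empty_think_blocks(sample_prompt)
```

A non-zero count on a history that originally contained reasoning suggests the chain-of-thought was dropped before the template ran.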
Setup: qwen-code 0.15.11 (npm) pointed at a local vLLM 0.20.x server hosting Qwen/Qwen3.6-35B-A3B.
What did you expect to happen?
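Prior reasoning should reach the chat template, so that each preserved prior turn renders as <think>\n<prior-reasoning>\n</think>\n\n<content>. Dropping it is contrary to Qwen3.6's own model-card recommendation: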
"Qwen3.6 has been additionally trained to preserve and leverage thinking traces from historical messages. … This capability is particularly beneficial for agent scenarios, where maintaining full reasoning context can enhance decision consistency and, in many cases, reduce overall token consumption by minimizing redundant reasoning. Additionally, it can improve KV cache utilization."
— https://huggingface.co/Qwen/Qwen3.6-35B-A3B
Client information
Run qwen to enter the interactive CLI, then run the /about command.
$ qwen /about
# paste output here
Login information
No response
Anything else we need to know?
No response