Regression: Reasoning/thinking output provided as regular output #9985

@mgoltzsche

LocalAI version:

LocalAI v4.3.1
container: localai/localai:v4.3.1-gpu-vulkan

Environment, CPU architecture, OS, and Version:

Ubuntu 24.04 host with an AMD Ryzen 7 5800X CPU and an AMD Radeon RX 6600 GPU

Linux max-machine 6.8.0-117-generic #117-Ubuntu SMP PREEMPT_DYNAMIC Tue May  5 19:26:24 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug

The LLM's reasoning output is returned inside the content field together with the regular output, instead of the two being returned in separate fields.
This is a regression: it worked in LocalAI v4.0.0 but stopped working at some point before LocalAI v4.3.1.

To Reproduce

  1. Start LocalAI, e.g. docker run -ti --rm --network=host --privileged -v "$(pwd)/data/models:/models" -v "$(pwd)/data/backends:/backends" localai/localai:v4.3.1-gpu-vulkan --address 127.0.0.1:8080
  2. Download the qwen3-4b model.
  3. Request chat completion: curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "qwen3-4b", "messages": [{"role": "user", "content": "Hello"}]}'
  4. Observe that the response JSON carries the reasoning output inside the content field, wrapped in <think> tags ahead of the actual answer, instead of in the separate reasoning field:
{"created":1779732183,"object":"chat.completion","id":"23fc7722-53af-48d7-be9e-6174f84e12b9","model":"qwen3-4b","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"\u003cthink\u003e\nOkay, the user sent \"Hello\". I need to respond appropriately. Since it's a greeting, I should reply with a friendly and welcoming message. Maybe start with a greeting, then ask how they're doing. Keep it simple and open-ended so they feel comfortable to share more. Also, make sure to mention that I'm here to help with any questions or needs they have. Avoid any technical jargon. Just a warm and inviting response.\n\u003c/think\u003e\n\nHello! How can I assist you today? 😊"}}],"usage":{"prompt_tokens":10,"completion_tokens":106,"total_tokens":116}}
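
The check in step 4 can be automated. The following is a minimal sketch, assuming only the response shape shown in this issue; `has_inline_reasoning` is a hypothetical helper name and not part of LocalAI, and the two sample strings are abbreviated stand-ins for the responses quoted here.

```python
import json


def has_inline_reasoning(response_json: str) -> bool:
    """Return True if any choice carries <think> markup inside message.content."""
    response = json.loads(response_json)
    for choice in response.get("choices", []):
        message = choice.get("message", {})
        content = message.get("content") or ""
        if "<think>" in content:
            return True
    return False


# Abbreviated stand-ins (assumption: same structure as the full responses).
regressed = r'{"choices":[{"message":{"role":"assistant","content":"\u003cthink\u003e\nOkay\n\u003c/think\u003e\n\nHello!"}}]}'
expected = r'{"choices":[{"message":{"role":"assistant","content":"\n\nHello!","reasoning":"Okay"}}]}'

print(has_inline_reasoning(regressed))  # True
print(has_inline_reasoning(expected))   # False
```

The JSON parser decodes the `\u003c` and `\u003e` escapes to `<` and `>`, so the check works on the decoded content rather than on the raw response text.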

Expected behavior

The reasoning output should be returned in the reasoning field, separate from the actual answer in the content field, as it was in LocalAI v4.0.0.
For example, here is a LocalAI v4.0.0 response:

{"created":1779731898,"object":"chat.completion","id":"653a17cd-c765-418d-b8df-3979ea5422dd","model":"qwen3-4b","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"\n\nHello! How can I assist you today? 😊","reasoning":"Okay, the user just said \"Hello\". I need to respond appropriately. Since they're greeting me, I should respond in a friendly and welcoming manner. Let me make sure to acknowledge their greeting and offer assistance.\n\nI should keep it simple and polite. Maybe start with a greeting back, then ask how I can help them. That way, it's open-ended and invites them to share more about what they need.\n\nI should avoid any technical jargon or complex language. The response should be easy to understand and conversational. Let me check for any possible misunderstandings. If they meant something specific by \"Hello\", but since it's just a greeting, I think a standard response is best.\n\nAlso, I should make sure to keep the tone positive and helpful. No need to add anything else unless they ask for more. Just a straightforward reply."}}],"usage":{"prompt_tokens":10,"completion_tokens":187,"total_tokens":197}}
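
Until the regression is fixed, a client could separate the two parts itself. This is only a workaround sketch, not the server-side fix: it assumes the reasoning always arrives as a single leading `<think>...</think>` block, as in the v4.3.1 response above. `split_reasoning` and `THINK_BLOCK` are hypothetical names, and stripping whitespace from the reasoning text is an assumption about what the v4.0.0 output looked like.

```python
import re

# Matches one leading <think>...</think> block, as seen in the v4.3.1 response.
THINK_BLOCK = re.compile(r"^\s*<think>(.*?)</think>", re.DOTALL)


def split_reasoning(message: dict) -> dict:
    """Return a copy of message with a leading <think> block moved to 'reasoning'."""
    content = message.get("content") or ""
    match = THINK_BLOCK.match(content)
    # Leave the message alone if there is nothing to split or the server
    # already provided a separate reasoning field.
    if match is None or "reasoning" in message:
        return dict(message)
    result = dict(message)
    result["reasoning"] = match.group(1).strip()
    result["content"] = content[match.end():]
    return result


msg = {
    "role": "assistant",
    "content": "<think>\nOkay, a greeting.\n</think>\n\nHello! How can I assist you today?",
}
fixed = split_reasoning(msg)
print(fixed["reasoning"])      # Okay, a greeting.
print(repr(fixed["content"]))  # '\n\nHello! How can I assist you today?'
```

The remaining content keeps its leading blank line, which matches the content value in the v4.0.0 response above.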

Logs

Additional context

Metadata

    Labels: bug