Bug (minor): gpt-oss output mixes up think mode and final response #884

@aittalam

Description

Contact Details

No response

What happened?

When running gpt-oss models with llamafile 0.10.0 alpha, the think-mode output and the final response are mixed together, and the tags separating the different parts of the answer are ignored:

build: 1770734049 (f47edb8c1) with cosmocc for cosmopolitan
software: llamafile 0.10.0-dev
model:    gpt-oss-20b-MXFP4.gguf
compute:  Apple Metal GPU

A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
>>> Hello!
analysisWe need to respond politely.assistantfinalHello! How can I help you today?
>>>
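For reference, gpt-oss uses the Harmony response format, where the reasoning and the answer are emitted on separate channels delimited by special tokens such as `<|channel|>analysis<|message|>` and `<|channel|>final<|message|>`. The garbled `analysis...assistantfinal...` output above looks like those special tokens are being stripped without the channels being split. A minimal sketch of channel-aware splitting, assuming the tokens survive in the raw completion (the token names follow the Harmony spec; the regex is an illustration, not llamafile's actual parser):

```python
import re

# Hypothetical raw completion with Harmony channel tokens intact
# (llamafile appears to strip these, producing "analysis...assistantfinal...").
raw = (
    "<|channel|>analysis<|message|>We need to respond politely.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>Hello! How can I help you today?"
)

def split_channels(text):
    """Return a dict mapping channel name -> message content."""
    channels = {}
    # Capture each <|channel|>NAME<|message|>BODY segment, where BODY runs
    # up to the next special token or the end of the string.
    for m in re.finditer(
        r"<\|channel\|>(\w+)<\|message\|>(.*?)(?=<\||$)", text, re.DOTALL
    ):
        channels[m.group(1)] = m.group(2)
    return channels

parts = split_channels(raw)
print(parts["analysis"])  # reasoning, analogous to Qwen3's <think> block
print(parts["final"])     # the user-facing answer
```

With the channels separated, a frontend could hide or fold the `analysis` text the same way it handles `<think>...</think>` blocks from Hermes-style models.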

This does not happen with other models that have a think mode, e.g. Qwen3:

build: 1771373822 (f47edb8c1) with cosmocc for cosmopolitan
software: llamafile 0.10.0-dev
model:    Qwen3-8B-Q6_K.gguf
compute:  Apple Metal GPU

format:   Hermes 2 Pro

A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
>>> Hello!
<think>
Okay, the user said "Hello!" so I need to respond appropriately. Since they're starting a conversation, I should greet them back and offer assistance. Let me make sure to keep it friendly and open-ended. Maybe something like, "Hello! How can I assist you today?" That sounds good. It's polite and invites them to ask any questions they might have. I should also keep the tone positive and helpful. Alright, that's all set.
</think>

Hello! How can I assist you today? 😊

Version

llamafile 0.10.0 build d32a080

What operating system are you seeing the problem on?

No response

Relevant log output
