Bug (minor): gpt-oss output mixes up think mode and final response #884
Closed
Description
Contact Details
No response
What happened?
When running gpt-oss models with the llamafile 0.10.0 alpha, the think-mode output and the final response are mixed together, and the tags separating the different parts of the answer are ignored:
build: 1770734049 (f47edb8c1) with cosmocc for cosmopolitan
software: llamafile 0.10.0-dev
model: gpt-oss-20b-MXFP4.gguf
compute: Apple Metal GPU
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
>>> Hello!
analysisWe need to respond politely.assistantfinalHello! How can I help you today?
>>>
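In the raw output above, the Harmony channel markers survive as the inline strings `analysis` and `assistantfinal` instead of being used to separate the two parts. As a minimal sketch (not llamafile's actual code, and assuming those literal markers; the real format uses special tokens that would normally be stripped), the expected post-processing would split the text like this:

```python
import re

def split_gpt_oss(text):
    # Observed raw output: the "analysis" channel (reasoning) runs
    # straight into "assistantfinal" (the user-facing reply).
    m = re.match(r"analysis(.*?)assistantfinal(.*)", text, re.S)
    if m:
        return {"think": m.group(1), "final": m.group(2)}
    # No markers found: treat everything as the final answer.
    return {"think": "", "final": text}

raw = ("analysisWe need to respond politely."
       "assistantfinalHello! How can I help you today?")
out = split_gpt_oss(raw)
# out["think"] -> "We need to respond politely."
# out["final"] -> "Hello! How can I help you today?"
```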
This does not happen with other models with think mode, e.g. Qwen3:
build: 1771373822 (f47edb8c1) with cosmocc for cosmopolitan
software: llamafile 0.10.0-dev
model: Qwen3-8B-Q6_K.gguf
compute: Apple Metal GPU
format: Hermes 2 Pro
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
>>> Hello!
<think>
Okay, the user said "Hello!" so I need to respond appropriately. Since they're starting a conversation, I should greet them back and offer assistance. Let me make sure to keep it friendly and open-ended. Maybe something like, "Hello! How can I assist you today?" That sounds good. It's polite and invites them to ask any questions they might have. I should also keep the tone positive and helpful. Alright, that's all set.
</think>
Hello! How can I assist you today? 😊
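By contrast, Qwen3 delimits its reasoning with explicit `<think>`/`</think>` tags, which are straightforward to separate from the reply. A hypothetical sketch of that split (for illustration only, not llamafile's implementation):

```python
import re

def split_think(text):
    # Reasoning is wrapped in <think>...</think>; the reply follows.
    m = re.search(r"<think>(.*?)</think>\s*(.*)", text, re.S)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", text.strip()

think, final = split_think(
    "<think>\nOkay, the user said hello.\n</think>\nHello! How can I assist you today?"
)
# think -> "Okay, the user said hello."
# final -> "Hello! How can I assist you today?"
```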
Version
llamafile 0.10.0 build d32a080
What operating system are you seeing the problem on?
No response
Relevant log output