Description
Combining a system message with response_format: {"type": "json_object"} in the chat completions API causes:
TemplateException(message: Optional('System message must be at the beginning.'))
Without response_format, the same system + user messages succeed.
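The error text suggests the chat template requires any system message to sit at index 0. One plausible (unverified) explanation: JSON mode prepends an internal JSON-instruction system prompt before rendering, which pushes the user-supplied system message to index 1 and trips the check. A minimal Python sketch of that hypothesis; none of these names come from SwiftLM source:

```python
# Hypothetical sketch of the failure mode; not SwiftLM source code.

def render_template(messages):
    """Stand-in for the chat template's validation pass."""
    for i, m in enumerate(messages):
        if m["role"] == "system" and i != 0:
            raise ValueError("System message must be at the beginning.")
    return messages

# Plain request: the user's system message is at index 0 -> renders fine.
plain = [
    {"role": "system", "content": "Extract facts."},
    {"role": "user", "content": "Raoh supervises Juza."},
]
render_template(plain)

# Hypothesis: JSON mode injects its own system prompt first, so the
# user's system message lands at index 1 and the template raises.
json_mode = [{"role": "system", "content": "Respond in JSON."}] + plain
try:
    render_template(json_mode)
    raised = False
except ValueError:
    raised = True
```

If this is the mechanism, the fix would likely be either merging the injected JSON prompt with the user's system message or relaxing the index check.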
Reproduction
# Works:
curl -s http://localhost:8002/v1/chat/completions -H 'Content-Type: application/json' \
-d '{"model":"qwen3.5-9b-mlx","messages":[{"role":"system","content":"Extract facts."},{"role":"user","content":"Raoh supervises Juza."}],"max_tokens":50}'
# Fails:
curl -s http://localhost:8002/v1/chat/completions -H 'Content-Type: application/json' \
-d '{"model":"qwen3.5-9b-mlx","messages":[{"role":"system","content":"Extract facts."},{"role":"user","content":"Raoh supervises Juza."}],"max_tokens":50,"response_format":{"type":"json_object"}}'
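The two request bodies above are intended to differ only in the response_format field; a quick Python check (payloads copied from the curl commands) makes the delta explicit:

```python
import json

# Payload from the working curl command.
works = {
    "model": "qwen3.5-9b-mlx",
    "messages": [
        {"role": "system", "content": "Extract facts."},
        {"role": "user", "content": "Raoh supervises Juza."},
    ],
    "max_tokens": 50,
}

# Identical payload plus the one field that triggers the TemplateException.
fails = dict(works, response_format={"type": "json_object"})

# The only added key is response_format.
extra_keys = set(fails) - set(works)
print(json.dumps(fails, indent=2))
```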
Impact
This blocks Mem0 and Graphiti (and likely other OpenAI-compatible clients) from working with SwiftLM out of the box, since both send a system message together with response_format: json_object.
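Until this is fixed server-side, one client-side workaround (a sketch only, not an endorsed SwiftLM API, and it may change model behavior relative to a true system prompt) is to fold the leading system message into the first user message before sending JSON-mode requests:

```python
def fold_system_into_user(messages):
    """Merge a leading system message into the first user message so the
    request carries no separate system role. Workaround sketch only."""
    if not messages or messages[0]["role"] != "system":
        return list(messages)
    system, rest = messages[0], messages[1:]
    out, merged = [], False
    for m in rest:
        if not merged and m["role"] == "user":
            # Prepend the system instructions to the first user turn.
            out.append({"role": "user",
                        "content": system["content"] + "\n\n" + m["content"]})
            merged = True
        else:
            out.append(m)
    return out

msgs = [
    {"role": "system", "content": "Extract facts."},
    {"role": "user", "content": "Raoh supervises Juza."},
]
patched = fold_system_into_user(msgs)
```

With no system role in the payload, the template's position check should no longer fire, at the cost of losing whatever special handling the template gives true system prompts.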
Environment
- macOS, M2 Ultra
- Qwen3.5-9B-MLX-4bit, Qwen3.5-35B-A3B-8bit
- SwiftLM latest release build