What changed
- Output tokens/sec now excludes thinking time. Reasoning models such as DeepSeek-R1, Qwen3-thinking, GLM, and gpt-oss stream their thoughts on a separate channel; previously this time was wrongly charged against TTFT and the tokens were counted as output. Anubis now decodes reasoning content and surfaces it as `<think>…</think>` in the response.
- Prefill (input) tokens/sec is now a first-class metric. It is visible on the TTFT card, in session history, in CSV export, and on the leaderboard, and is computed as `prompt_tokens / prompt_eval_duration`.
- Reasoning split. When a thinking model is used, Anubis records reasoning tokens and reasoning duration separately. The session detail view shows reasoning tok/s alongside output tok/s.
- Better error messages for backend failures — Ollama HTTP errors no longer surface as "timed out after 0 seconds."
- Leaderboard explorer gains Prefill tok/s and Reasoning tok/s columns.
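The metric split above can be sketched as follows. This is a minimal illustration, not Anubis's actual code: the `prompt_eval_count`, `prompt_eval_duration`, `eval_count`, and `eval_duration` fields match Ollama's final response (durations in nanoseconds), while `reasoning_tokens` and `reasoning_duration` are hypothetical names for the values Anubis records per session.

```python
# Illustrative sketch: derive prefill, reasoning, and output tok/s
# from an Ollama-style final response. Durations are nanoseconds.
# reasoning_tokens / reasoning_duration are assumed field names.

NS_PER_S = 1_000_000_000

def throughput(tokens: int, duration_ns: int) -> float:
    """Tokens per second; 0.0 if no time was recorded."""
    return tokens / (duration_ns / NS_PER_S) if duration_ns > 0 else 0.0

def session_metrics(resp: dict) -> dict:
    reasoning_tokens = resp.get("reasoning_tokens", 0)  # hypothetical field
    reasoning_ns = resp.get("reasoning_duration", 0)    # hypothetical field
    # Output throughput is measured on what remains after the
    # reasoning channel is subtracted out.
    output_tokens = resp["eval_count"] - reasoning_tokens
    output_ns = resp["eval_duration"] - reasoning_ns
    return {
        "prefill_tok_s": throughput(resp["prompt_eval_count"],
                                    resp["prompt_eval_duration"]),
        "reasoning_tok_s": throughput(reasoning_tokens, reasoning_ns),
        "output_tok_s": throughput(output_tokens, output_ns),
    }
```

For a non-thinking model the reasoning fields are absent, the reasoning rate reports 0.0, and output tok/s reduces to the familiar `eval_count / eval_duration`.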
Database
Local SQLite migration v6 adds reasoning_tokens and reasoning_duration columns; runs automatically on first launch.
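A migration of this shape can be sketched with Python's built-in `sqlite3` module. The `reasoning_tokens` and `reasoning_duration` column names come from the note above; the `sessions` table name, the column types, and the use of `PRAGMA user_version` for version tracking are assumptions for illustration.

```python
import sqlite3

def migrate_to_v6(conn: sqlite3.Connection) -> None:
    """Add reasoning columns if missing; safe to re-run."""
    # Column names are in position 1 of each PRAGMA table_info row.
    cols = {row[1] for row in conn.execute("PRAGMA table_info(sessions)")}
    with conn:  # one transaction; commits on success, rolls back on error
        if "reasoning_tokens" not in cols:
            conn.execute("ALTER TABLE sessions ADD COLUMN reasoning_tokens INTEGER")
        if "reasoning_duration" not in cols:
            conn.execute("ALTER TABLE sessions ADD COLUMN reasoning_duration REAL")
        conn.execute("PRAGMA user_version = 6")
```

Guarding each `ALTER TABLE` on the existing column set keeps the migration idempotent, which is what makes "runs automatically on first launch" safe even if launch is interrupted and retried.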
Notes
- Existing leaderboard entries from reasoning-model runs (pre-v3.1) likely have inflated TTFT and conflated reasoning with output throughput. New submissions are correct.
- Apple Intelligence backend is unchanged (no separate reasoning channel).
🤖 Generated with Claude Code