
Anubis OSS v3.1 — Reasoning-aware metrics & prefill speed

@uncSoft uncSoft released this 03 May 22:21

Fixes #17 and #18.

What changed

  • Output tokens/sec now excludes thinking time. Reasoning models like DeepSeek-R1, Qwen3-thinking, GLM, and gpt-oss stream their thoughts on a separate channel; previously this time was wrongly charged against TTFT and the tokens were counted as output. Anubis now decodes reasoning content and surfaces it as <think>…</think> in the response.
  • Prefill (input) tokens/sec is now a first-class metric. Visible on the TTFT card, in session history, in CSV export, and on the leaderboard. Computed as `prompt_tokens / prompt_eval_duration`.
  • Reasoning split. When a thinking model is used, Anubis records reasoning tokens and reasoning duration separately. The session detail view shows reasoning tok/s alongside output tok/s.
  • Better error messages for backend failures — Ollama HTTP errors no longer surface as "timed out after 0 seconds."
  • Leaderboard explorer gains Prefill tok/s and Reasoning tok/s columns.
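The metric split above can be sketched as follows. This is a minimal illustration, not Anubis's actual code: the `prompt_eval_count`, `prompt_eval_duration`, `eval_count`, and `eval_duration` fields follow the Ollama API's final-response shape (durations in nanoseconds), while the `reasoning_tokens` and `reasoning_duration` fields are hypothetical stand-ins for whatever the thinking channel reports.

```python
# Sketch: deriving prefill, reasoning, and output tok/s separately
# from an Ollama-style final response (durations in nanoseconds).
NS_PER_S = 1_000_000_000

def split_metrics(resp: dict) -> dict:
    """Compute the three throughput metrics from one response."""
    prefill_tps = resp["prompt_eval_count"] / (resp["prompt_eval_duration"] / NS_PER_S)
    reasoning_tokens = resp.get("reasoning_tokens", 0)
    reasoning_s = resp.get("reasoning_duration", 0) / NS_PER_S
    # Output throughput excludes both reasoning tokens and reasoning time,
    # so a thinking model is no longer "charged" for its hidden channel.
    output_tokens = resp["eval_count"] - reasoning_tokens
    output_s = resp["eval_duration"] / NS_PER_S - reasoning_s
    return {
        "prefill_tps": prefill_tps,
        "reasoning_tps": reasoning_tokens / reasoning_s if reasoning_s else 0.0,
        "output_tps": output_tokens / output_s if output_s else 0.0,
    }

metrics = split_metrics({
    "prompt_eval_count": 512, "prompt_eval_duration": 2 * NS_PER_S,
    "eval_count": 300, "eval_duration": 10 * NS_PER_S,
    "reasoning_tokens": 100, "reasoning_duration": 4 * NS_PER_S,
})
```

With these example numbers, the 100 reasoning tokens over 4 s (25 tok/s) are reported separately, and output throughput is computed from the remaining 200 tokens over 6 s rather than all 300 tokens over 10 s.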

Database

Local SQLite migration v6 adds reasoning_tokens and reasoning_duration columns; runs automatically on first launch.
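An idempotent version of that migration can be sketched as below. The `sessions` table name is a hypothetical stand-in (the release notes don't name the table); only the two column names come from the notes. Guarding on `PRAGMA table_info` makes the migration safe to run on every launch.

```python
# Sketch of a v6-style SQLite migration that adds the two reasoning
# columns only if they are missing, so rerunning it is a no-op.
import sqlite3

def migrate_v6(conn: sqlite3.Connection) -> None:
    # Column name is the second field of each PRAGMA table_info row.
    cols = {row[1] for row in conn.execute("PRAGMA table_info(sessions)")}
    for col in ("reasoning_tokens", "reasoning_duration"):
        if col not in cols:
            conn.execute(f"ALTER TABLE sessions ADD COLUMN {col} INTEGER DEFAULT 0")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, output_tokens INTEGER)")
migrate_v6(conn)
migrate_v6(conn)  # second run finds the columns and does nothing
```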

Notes

  • Existing leaderboard entries from reasoning-model runs (pre-v3.1) likely carry inflated TTFT values and conflate reasoning throughput with output throughput. New submissions are measured correctly.
  • Apple Intelligence backend is unchanged (no separate reasoning channel).

🤖 Generated with Claude Code