feat: /model auto — heuristic V4-Flash vs V4-Pro routing per turn #315

@Hmbown

Problem

DeepSeek-V4 ships in two tiers:

  • V4-Pro — flagship; better at hard reasoning, costs more per token.
  • V4-Flash — fast, cheap; great for simple tool calls and follow-ups.

Today, users pick one tier at session start (/model deepseek-v4-pro or /model deepseek-v4-flash) and stay on it for the whole session. That is wasteful in both directions: easy turns (a one-line file read, a git status) burn Pro tokens unnecessarily, while hard turns (a multi-file refactor) get lower-quality output on Flash.

Proposed solution

Add auto as a first-class model option alongside the explicit IDs. Same UX surface as today — users pick from the model menu — with a third entry:

/model deepseek-v4-pro      → always Pro
/model deepseek-v4-flash    → always Flash
/model auto                 → router picks per turn        ← new

The model picker (/model with no argument) lists auto first.
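A minimal sketch of how the three options could be represented and parsed. The names `Tier`, `ModelSelection`, and `picker_entries` are hypothetical and not taken from the existing code; the real change would presumably live in or near `normalize_model_name`.

```rust
use std::str::FromStr;

/// The two DeepSeek-V4 tiers named in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Pro,
    Flash,
}

/// What the user selected with `/model` (hypothetical type, not the existing config API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelSelection {
    /// Always use one tier.
    Fixed(Tier),
    /// Let the router pick a tier per turn.
    Auto,
}

impl FromStr for ModelSelection {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Case-insensitive match on the three values listed above.
        match s.trim().to_ascii_lowercase().as_str() {
            "auto" => Ok(ModelSelection::Auto),
            "deepseek-v4-pro" => Ok(ModelSelection::Fixed(Tier::Pro)),
            "deepseek-v4-flash" => Ok(ModelSelection::Fixed(Tier::Flash)),
            other => Err(format!("unknown model: {other}")),
        }
    }
}

/// Picker entries in display order, with `auto` listed first.
pub fn picker_entries() -> [&'static str; 3] {
    ["auto", "deepseek-v4-pro", "deepseek-v4-flash"]
}
```

Keeping `Auto` as its own variant, rather than a flag next to a model ID, means an explicit ID can never be mistaken for a routed turn.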

Routing heuristic (when auto is selected)

The dispatcher classifies each turn before sending and picks a tier:

Signal | Lean toward Flash | Lean toward Pro
Recent message length | short | long
Tool result count in context | many simple results | few complex results
User prompt complexity (keyword density: "design", "architect", "refactor", "audit", "analyze") | absent | present
Last turn's reasoning effort | off / low | high / max
User's /effort setting | off / low | high / max

The heuristic starts as hand-tuned thresholds; later, it could be replaced by a small classifier trained on session telemetry.
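One way the threshold-tuned version might look is a simple vote: each signal adds +1 toward Pro or -1 toward Flash, and the sum is compared to a cutoff. This is only a sketch under assumptions. The type names, the field names, and the default values (400 characters, cutoff 2) are placeholders, not measured or proposed numbers. The tool-result signal is left out because the issue does not define what makes a result "simple" or "complex".

```rust
/// Reasoning-effort levels, mirroring the values named in the signal table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Effort {
    Off,
    Low,
    High,
    Max,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Pro,
    Flash,
}

/// Per-turn inputs to the router (hypothetical field names).
pub struct TurnSignals<'a> {
    pub user_prompt: &'a str,
    pub last_turn_effort: Effort,
    pub effort_setting: Effort,
}

/// Tunable thresholds; the defaults below are placeholders.
pub struct Thresholds {
    /// Prompts at least this many characters long count as "long".
    pub long_prompt_chars: usize,
    /// Minimum vote total required to route to Pro.
    pub pro_score: i32,
}

impl Default for Thresholds {
    fn default() -> Self {
        Thresholds { long_prompt_chars: 400, pro_score: 2 }
    }
}

/// Keywords from the signal table that hint at a hard turn.
const PRO_KEYWORDS: [&str; 5] = ["design", "architect", "refactor", "audit", "analyze"];

/// +1 for high/max effort, -1 for off/low.
fn effort_vote(effort: Effort) -> i32 {
    match effort {
        Effort::High | Effort::Max => 1,
        Effort::Off | Effort::Low => -1,
    }
}

/// Sums one vote per signal and picks Pro when the total reaches the cutoff.
pub fn route(signals: &TurnSignals, thresholds: &Thresholds) -> Tier {
    let prompt = signals.user_prompt.to_lowercase();
    let mut score = 0;
    score += if prompt.chars().count() >= thresholds.long_prompt_chars { 1 } else { -1 };
    score += if PRO_KEYWORDS.iter().any(|k| prompt.contains(*k)) { 1 } else { -1 };
    score += effort_vote(signals.last_turn_effort);
    score += effort_vote(signals.effort_setting);
    if score >= thresholds.pro_score { Tier::Pro } else { Tier::Flash }
}
```

With these placeholder defaults, a short "git status" prompt at low effort routes to Flash, while a short prompt containing "refactor" at high effort on both settings reaches the cutoff and routes to Pro. Keeping the numbers in a `Thresholds` struct is what would let a config file override them.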

Users can override mid-session with an explicit ID such as /model deepseek-v4-pro; the override sticks until they switch back with /model auto.

Footer

The footer model chip shows auto·flash or auto·pro so the user always knows which tier the current turn ran on.
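The chip text could be produced by a small helper like the one below. The function name is hypothetical. Showing the full model ID when auto is not active is an assumption; the issue only specifies the two auto labels.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Pro,
    Flash,
}

/// Builds the footer chip text for the tier the current turn ran on.
/// `auto_active` is true when the user selected `/model auto`.
pub fn footer_chip(auto_active: bool, tier: Tier) -> String {
    let id = match tier {
        Tier::Pro => "pro",
        Tier::Flash => "flash",
    };
    if auto_active {
        format!("auto·{id}")
    } else {
        // Assumption: outside auto mode the chip shows the explicit model ID.
        format!("deepseek-v4-{id}")
    }
}
```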

Telemetry

/cost extends with a per-tier breakdown when auto is in use:

this session: 12 turns Pro ($0.45), 28 turns Flash ($0.03), saved ~$0.20 vs Pro-only
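A sketch of the bookkeeping behind that line, assuming the session already tracks turn counts and cost per tier. The struct and field names are hypothetical. How the Pro-only estimate is computed (for example, by re-pricing Flash token counts at Pro rates) is left open here and passed in as a plain number.

```rust
/// Turn count and accumulated cost for one tier.
#[derive(Default)]
pub struct TierUsage {
    pub turns: u32,
    pub cost_usd: f64,
}

/// Per-session totals when `auto` is in use (hypothetical struct).
#[derive(Default)]
pub struct SessionCost {
    pub pro: TierUsage,
    pub flash: TierUsage,
    /// Estimated cost had every turn run on Pro; computed elsewhere.
    pub pro_only_estimate_usd: f64,
}

impl SessionCost {
    /// Formats the per-tier breakdown in the shape shown above.
    pub fn summary(&self) -> String {
        let actual = self.pro.cost_usd + self.flash.cost_usd;
        // Clamp at zero so a bad estimate never reports negative savings.
        let saved = (self.pro_only_estimate_usd - actual).max(0.0);
        format!(
            "this session: {} turns Pro (${:.2}), {} turns Flash (${:.2}), saved ~${:.2} vs Pro-only",
            self.pro.turns, self.pro.cost_usd, self.flash.turns, self.flash.cost_usd, saved
        )
    }
}
```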

Why this matters

Cost is the #1 reason users churn off paid CLIs. Smart routing turns "I burned $5 today" into "I burned $0.80 today" without hurting quality on the turns that matter. That's a real differentiator vs Claude Code (no price-tier-down model) and vs Cursor (closed routing).

Folding it into /model (vs a separate /route command) keeps the UI surface flat — there's exactly one knob for "what model".

Acceptance criteria

  • /model auto registers as a valid model option in the picker.
  • Footer chip shows auto·flash or auto·pro when auto is active.
  • /cost shows a per-tier breakdown when auto is in use.
  • Heuristic threshold table documented; users can override via ~/.deepseek/config.toml.
  • At least one A/B run on a sample task showing Pro-only vs auto cost difference.
  • Existing /model deepseek-v4-pro / /model deepseek-v4-flash continue to work (no regression).

Related

  • crates/tui/src/commands/core.rs:model — slash command.
  • crates/tui/src/llm_client.rs — request dispatch.
  • crates/tui/src/config.rs:normalize_model_name — already accepts model strings; auto becomes a recognized value.
  • DeepSeek pricing: https://api-docs.deepseek.com/quick_start/pricing
  • /effort already exists for thinking budget; this is the model-tier analogue.

Labels: enhancement
