Problem
DeepSeek-V4 ships in two tiers:
- V4-Pro — flagship; better at hard reasoning, costs more per token.
- V4-Flash — fast, cheap; great for simple tool calls and follow-ups.
Today users pick one at session start (/model deepseek-v4-pro or /model deepseek-v4-flash) and stay there. That's wasteful in both directions: easy turns (a one-line file read, a git status) burn Pro tokens unnecessarily, while hard turns (a multi-file refactor) on Flash produce lower-quality output.
Proposed solution
Add auto as a first-class model option alongside the explicit IDs. Same UX surface as today — users pick from the model menu — with a third entry:
/model deepseek-v4-pro → always Pro
/model deepseek-v4-flash → always Flash
/model auto → router picks per turn ← new
The model picker (/model with no argument) lists auto first.
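As a rough illustration of how auto could sit next to the explicit IDs, here is a minimal sketch of parsing the /model argument. The names `Tier`, `ModelSelection`, `parse_model_arg`, and `picker_entries` are hypothetical and not taken from the crate; case-insensitive matching is also an assumption.

```rust
/// The two DeepSeek-V4 tiers named in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Pro,
    Flash,
}

/// What the user selected via `/model` (hypothetical type, not the crate's own).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelSelection {
    /// The router picks a tier for every turn.
    Auto,
    /// The user pinned one tier explicitly.
    Pinned(Tier),
}

/// Parse the argument of `/model <arg>`. Returns `None` for unknown names.
/// Trimming and lowercasing the input is an assumption of this sketch.
pub fn parse_model_arg(arg: &str) -> Option<ModelSelection> {
    match arg.trim().to_ascii_lowercase().as_str() {
        "auto" => Some(ModelSelection::Auto),
        "deepseek-v4-pro" => Some(ModelSelection::Pinned(Tier::Pro)),
        "deepseek-v4-flash" => Some(ModelSelection::Pinned(Tier::Flash)),
        _ => None,
    }
}

/// Entries listed by `/model` with no argument, with `auto` first.
pub fn picker_entries() -> [&'static str; 3] {
    ["auto", "deepseek-v4-pro", "deepseek-v4-flash"]
}
```

Keeping auto as a variant of the same selection type, rather than a separate flag, matches the "one knob" framing: the rest of the code only ever asks what the current selection is.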
Routing heuristic (when auto is selected)
The dispatcher classifies each turn before sending and picks a tier:
| Signal | Lean toward Flash | Lean toward Pro |
| --- | --- | --- |
| Recent message length | short | long |
| Tool result count in context | many simple results | few complex results |
| User prompt complexity (keyword density: "design", "architect", "refactor", "audit", "analyze") | absent | present |
| Last turn's reasoning effort | off / low | high / max |
| User's /effort setting | low / off | high / max |
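The first version uses hand-tuned thresholds; a later version could replace them with a small classifier trained on session telemetry.
Users can override the router mid-session with /model deepseek-v4-pro or /model deepseek-v4-flash; the override sticks until they switch back with /model auto.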
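The threshold-based first version could look roughly like the sketch below. Every name here (`TurnSignals`, `Effort`, `pro_score`, `route`, `tier_for_turn`) is hypothetical, and the weights, the 400-character cutoff, and the score threshold are placeholder assumptions to be tuned, not values from this proposal.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Pro,
    Flash,
}

/// Effort levels, ordered so that `>= Effort::High` means "high / max".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Effort {
    Off,
    Low,
    High,
    Max,
}

/// Signals collected for one turn (hypothetical shape).
pub struct TurnSignals<'a> {
    pub prompt: &'a str,
    pub simple_tool_results: usize,
    pub complex_tool_results: usize,
    pub last_turn_effort: Effort,
    pub user_effort: Effort,
}

/// Keywords listed in the signal table.
const PRO_KEYWORDS: [&str; 5] = ["design", "architect", "refactor", "audit", "analyze"];
/// Assumed cutoff for a "long" message; a placeholder to tune.
const LONG_PROMPT_CHARS: usize = 400;
/// Assumed score at or above which the turn goes to Pro.
const PRO_THRESHOLD: i32 = 2;

/// Add up the signals that lean toward Pro.
pub fn pro_score(s: &TurnSignals) -> i32 {
    let prompt = s.prompt.to_lowercase();
    let mut score = 0;
    if prompt.chars().count() >= LONG_PROMPT_CHARS {
        score += 1;
    }
    if PRO_KEYWORDS.iter().any(|k| prompt.contains(*k)) {
        score += 2; // keywords weigh more in this sketch
    }
    if s.complex_tool_results > s.simple_tool_results {
        score += 1;
    }
    if s.last_turn_effort >= Effort::High {
        score += 1;
    }
    if s.user_effort >= Effort::High {
        score += 1;
    }
    score
}

/// Router decision used when `auto` is selected.
pub fn route(s: &TurnSignals) -> Tier {
    if pro_score(s) >= PRO_THRESHOLD {
        Tier::Pro
    } else {
        Tier::Flash
    }
}

/// A pinned tier (explicit `/model deepseek-v4-...`) always wins over the router.
pub fn tier_for_turn(pinned: Option<Tier>, s: &TurnSignals) -> Tier {
    pinned.unwrap_or_else(|| route(s))
}
```

With these placeholder weights, a bare "git status" prompt with effort off scores 0 and goes to Flash, while a prompt containing "refactor" reaches the threshold on the keyword alone. A trained classifier could later replace `pro_score` without changing `tier_for_turn`.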
Footer
The footer model chip shows auto·flash or auto·pro so the user always knows which tier the current turn ran on.
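The chip label itself is a small formatting step. The sketch below assumes a hypothetical `footer_chip` helper; what the chip shows when a tier is pinned is not specified in this issue, so falling back to the full model ID is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Pro,
    Flash,
}

impl Tier {
    /// Short name used after the `auto·` prefix.
    fn short_name(self) -> &'static str {
        match self {
            Tier::Pro => "pro",
            Tier::Flash => "flash",
        }
    }

    /// Full model ID as accepted by `/model`.
    fn model_id(self) -> &'static str {
        match self {
            Tier::Pro => "deepseek-v4-pro",
            Tier::Flash => "deepseek-v4-flash",
        }
    }
}

/// Build the footer chip text for the tier the current turn ran on.
/// `auto_active` is true when the user selected `/model auto`.
pub fn footer_chip(auto_active: bool, tier: Tier) -> String {
    if auto_active {
        format!("auto·{}", tier.short_name())
    } else {
        // Assumption: a pinned model shows its plain ID.
        tier.model_id().to_string()
    }
}
```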
Telemetry
/cost gains a per-tier breakdown when auto is in use:
this session: 12 turns Pro ($0.45), 28 turns Flash ($0.03), saved ~$0.20 vs Pro-only
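One way the breakdown line could be computed is sketched below. The types and functions (`Prices`, `TurnUsage`, `Breakdown`, `breakdown`, `format_breakdown`) are hypothetical. The sketch makes two simplifying assumptions: a single blended per-token price per tier (real pricing may distinguish input, output, and cached tokens), and that a Flash turn would have used the same number of tokens on Pro. Prices are passed in rather than hard-coded, since actual values belong to the DeepSeek pricing page.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Pro,
    Flash,
}

/// USD per million tokens for each tier (caller supplies real values).
pub struct Prices {
    pub pro_per_mtok: f64,
    pub flash_per_mtok: f64,
}

/// Token usage recorded for one turn.
pub struct TurnUsage {
    pub tier: Tier,
    pub tokens: u64,
}

#[derive(Debug, Default, PartialEq)]
pub struct Breakdown {
    pub pro_turns: u32,
    pub pro_cost: f64,
    pub flash_turns: u32,
    pub flash_cost: f64,
    /// Estimated extra cost had every Flash turn run on Pro instead.
    pub saved_vs_pro_only: f64,
}

/// Aggregate per-tier turn counts, costs, and the estimated saving.
pub fn breakdown(turns: &[TurnUsage], p: &Prices) -> Breakdown {
    let mut b = Breakdown::default();
    for t in turns {
        let mtok = t.tokens as f64 / 1_000_000.0;
        match t.tier {
            Tier::Pro => {
                b.pro_turns += 1;
                b.pro_cost += mtok * p.pro_per_mtok;
            }
            Tier::Flash => {
                b.flash_turns += 1;
                b.flash_cost += mtok * p.flash_per_mtok;
                b.saved_vs_pro_only += mtok * (p.pro_per_mtok - p.flash_per_mtok);
            }
        }
    }
    b
}

/// Render the summary line in the format shown for `/cost`.
pub fn format_breakdown(b: &Breakdown) -> String {
    format!(
        "this session: {} turns Pro (${:.2}), {} turns Flash (${:.2}), saved ~${:.2} vs Pro-only",
        b.pro_turns, b.pro_cost, b.flash_turns, b.flash_cost, b.saved_vs_pro_only
    )
}
```

The saving is an estimate by construction, which is why the rendered line keeps the "~" in front of it.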
Why this matters
Cost is the #1 reason users churn off paid CLIs. Smart routing turns "I burned $5 today" into "I burned $0.80 today" without hurting quality on the turns that matter. That's a real differentiator vs Claude Code (no price-tier-down model) and vs Cursor (closed routing).
Folding it into /model (vs a separate /route command) keeps the UI surface flat — there's exactly one knob for "what model".
Acceptance criteria
- /model auto registers as a valid model option in the picker.
- Footer chip shows auto·flash or auto·pro when auto is active.
- /cost shows a per-tier breakdown when auto is in use.
- […] ~/.deepseek/config.toml.
- /model deepseek-v4-pro and /model deepseek-v4-flash continue to work (no regression).
Related
- crates/tui/src/commands/core.rs:model (slash command).
- crates/tui/src/llm_client.rs (request dispatch).
- crates/tui/src/config.rs:normalize_model_name (already accepts model strings; auto becomes a recognized value).
- DeepSeek pricing: https://api-docs.deepseek.com/quick_start/pricing
- /effort already exists for thinking budget; this is the model-tier analogue.