
refactor(config): drop preset abstraction, expose model + effort directly #1657

Merged
esengine merged 1 commit into main from worktree-agent-a35113ae
May 24, 2026



@esengine

Owner

Summary

Presets bundled (model, reasoning_effort, thinking) under names like flash / pro / auto. The bundling forced "max" as the cap and silently broke OpenAI-compatible endpoints that only accept the standard literal set. A user reported their self-hosted vLLM rejecting requests with:

1 validation error: reasoning_effort — Input should be 'none', 'low', 'medium' or 'high'

This PR rips out the preset layer and exposes model + effort directly across all three surfaces.

  • ReasoningEffort widens to low | medium | high | max
  • Default cap is now "high", which every OpenAI-compatible endpoint (vLLM, Azure, DeepSeek) accepts. "max" stays available as an opt-in DeepSeek extension
  • Dashboard + Desktop: settings panel shows a model picker + four effort buttons; both apply live through the existing settings bridge
  • CLI: /model and /effort slash commands; ModelPicker shows EFFORT and MODELS sections with full enumeration
  • Config: preset field removed, model persisted directly
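A minimal TypeScript sketch of the widened effort type and the preset-free config shape described above. `EFFORT_LEVELS`, `DEFAULT_EFFORT`, `Config`, `parseEffort`, and `loadConfig` are hypothetical names chosen for illustration, not the project's actual identifiers, and `"flash"` is only a placeholder model id. The point is that model and effort are stored as independent fields, with `"high"` as the fallback effort.

```typescript
// Hypothetical sketch: the four effort levels described in this PR.
// "max" is the DeepSeek extension; the other three are in the standard literal set.
type ReasoningEffort = "low" | "medium" | "high" | "max";

const EFFORT_LEVELS: readonly ReasoningEffort[] = ["low", "medium", "high", "max"];

// Default cap: the highest level standard OpenAI-compatible endpoints accept.
const DEFAULT_EFFORT: ReasoningEffort = "high";

// Hypothetical config shape: no `preset` bundle, model and effort stored directly.
interface Config {
  model: string;
  reasoningEffort: ReasoningEffort;
}

// Narrow an arbitrary string (e.g. from a settings file) to a ReasoningEffort,
// returning undefined when it is not one of the four levels.
function parseEffort(value: string): ReasoningEffort | undefined {
  const normalized = value.trim().toLowerCase();
  return (EFFORT_LEVELS as readonly string[]).includes(normalized)
    ? (normalized as ReasoningEffort)
    : undefined;
}

// Build a config from persisted fields, falling back to the default effort
// when the stored value is missing or invalid.
function loadConfig(raw: { model: string; reasoningEffort?: string }): Config {
  const effort =
    raw.reasoningEffort !== undefined ? parseEffort(raw.reasoningEffort) : undefined;
  return { model: raw.model, reasoningEffort: effort ?? DEFAULT_EFFORT };
}

const cfg = loadConfig({ model: "flash" });
console.log(cfg.reasoningEffort); // prints: high
```

Keeping the two fields independent is what lets a third-party endpoint run with the standard `"high"` cap while DeepSeek users opt into `"max"` explicitly.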

55 files changed, +890 / -1153.

Test plan

  • npm run typecheck (root + dashboard) clean
  • npm run lint clean (2 pre-existing warnings unrelated to this PR)
  • npm test — 3579 pass, 0 fail
  • npm run build:dashboard produces the expected dist/ (verified via dashboard-smoke tests)
  • Manual: /model and /effort in CLI apply live and persist across restart
  • Manual: dashboard settings page model + effort changes take effect immediately
  • Manual: desktop settings page model + effort changes take effect immediately
  • Manual: pointing CLI at a vLLM endpoint with default config succeeds (no more "max" rejection)

Closes the user-reported vLLM HTTP 400.


Presets bundled (model, reasoning_effort, thinking) under names like
flash/pro/auto. The bundling forced "max" as the cap and silently broke
OpenAI-compatible endpoints that only accept the standard literal set
(vLLM rejects "max" with an HTTP 400, as seen in the wild against a
self-hosted DeepSeek-V4-Pro).

Direct controls instead:

- ReasoningEffort widens to low | medium | high | max
- Default cap is now "high", the value every OpenAI-compatible
  endpoint accepts; users opt into "max" knowing it's DeepSeek-only
- Dashboard + desktop settings: model picker + effort buttons,
  both apply live through the existing settings bridge
- CLI: /model and /effort slash commands; ModelPicker shows EFFORT
  and MODELS sections with full enumeration
- Config: preset field removed, model field persisted directly
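The `/model` and `/effort` commands in the list above could be parsed along these lines. This is a hedged sketch, not the actual handler: `runSlashCommand`, `Settings`, and `CommandResult` are hypothetical, and persistence across restarts is left to the caller, which would write the returned settings back to the config.

```typescript
// Hypothetical sketch of parsing the /model and /effort slash commands.
type ReasoningEffort = "low" | "medium" | "high" | "max";
const EFFORT_LEVELS: readonly ReasoningEffort[] = ["low", "medium", "high", "max"];

interface Settings {
  model: string;
  reasoningEffort: ReasoningEffort;
}

// Either new settings to apply live, or a message to show the user.
type CommandResult =
  | { kind: "applied"; settings: Settings }
  | { kind: "error"; message: string };

function runSlashCommand(input: string, current: Settings): CommandResult {
  // Split "/effort high" into the command and its first argument.
  const [command, arg] = input.trim().split(/\s+/, 2);
  switch (command) {
    case "/model":
      if (!arg) return { kind: "error", message: "usage: /model <model-id>" };
      // Return a copy; the current settings object is never mutated.
      return { kind: "applied", settings: { ...current, model: arg } };
    case "/effort": {
      const level = EFFORT_LEVELS.find((e) => e === arg?.toLowerCase());
      if (!level) {
        return { kind: "error", message: `usage: /effort ${EFFORT_LEVELS.join("|")}` };
      }
      return { kind: "applied", settings: { ...current, reasoningEffort: level } };
    }
    default:
      return { kind: "error", message: `unknown command: ${command}` };
  }
}

const start: Settings = { model: "flash", reasoningEffort: "high" };
const result = runSlashCommand("/effort low", start);
if (result.kind === "applied") console.log(result.settings.reasoningEffort); // prints: low
```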
@esengine esengine merged commit 88fc19d into main May 24, 2026
4 checks passed
@esengine esengine deleted the worktree-agent-a35113ae branch May 24, 2026 05:37
esengine pushed a commit that referenced this pull request May 24, 2026
…moved, persisted usage stats, plan dispatch gate

Headline themes:
- Desktop: bundle the CLI-hosted React dashboard, retire Tauri+Preact duplicate (#1418)
- Config: drop preset abstraction; flash/pro are direct model selections (#1657, #1630)
- Stats: persist cumulative usage to session meta + auto-restore on startup (#1667, #1680, #1643, #1628)
- Plans: editMode="plan" enforced at the ToolRegistry dispatch gate (#1681); step advance fix (#1629)
- Context: fold once at turn start, drop pre-flight + byte-ceiling (#1642, #1646); collapsible compacted card (#1649)
- Subagents: per-skill flash/pro override + Settings UI (#1632)
- Desktop polish: sidebar drag-resize (#1688), responsive collapse (#1585), copy/edit overlay + msg-history nav (#1645), Esc closes the modal instead of the turn (#1685), QQ tab isolation (#1672), DiffCard for edits (#1662), theme-aware highlighting (#1655), system events toggle (#1654/#1650), macOS TCC inheritance (#1614), dashboard.enabled (#1612)
- Dashboard polish: persistent session URL (#1586, #1589, #1599), theme-aware highlighting (#1664), IME confirm-enter guard (#1689), code-fence lang fix (#1677), vendor chunk split (#1587), markdown table h-scroll (#1562)
- TUI: Alt+S input stash/recall; static history isolated from input rerenders (#1635); legacy mouse drop (#1637, #1648); multi-edit gated in review (#1647)
- Diff: SplitDiff column border holds under CJK (#1686)
- MCP: workspace roots passed to servers (#1625); codeCommand honors mcpServers (#1603)
- Config plumbing: (baseUrl, apiKey) resolved as a tuple (#1658); stale model id self-heal (#1663)

See CHANGELOG for the full list.
esengine pushed a commit that referenced this pull request May 25, 2026
Three stale-doc fixes:

- ARCHITECTURE.md §4.3: replace the removed /pro single-turn arming with the
  current /model flash|pro + settings.json model selection. Notes the
  removal in 0.50.0 (#1657, #1630).
- ARCHITECTURE.md §4.4: replace the nonexistent FAILURE_ESCALATION_THRESHOLD
  counter with the actual <<<NEEDS_PRO>>> model self-report mechanism. There
  is no failure counter; escalation is purely LLM-initiated and a no-op on the pro tier.
- benchmarks/real-world-cache/README.md: fix a 10× pricing error in the
  v4-flash cache-hit price ($0.028 → $0.0028) and the entirely wrong v4-pro pricing
  ($0.139/1.667/3.333 → $0.003625/0.435/0.87). Recalculated cost tables;
  the headline 99.82% hit ratio is unchanged, and savings now correctly show ~97.7%
  (flash) / ~98.9% (pro).
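The §4.4 bullet above describes escalation as a model self-report rather than a failure counter. A minimal sketch of that control flow, assuming a two-tier `"flash" | "pro"` choice: only the `<<<NEEDS_PRO>>>` marker comes from the text, while `handleTurnOutput` and its return shape are hypothetical, and when the escalated request is actually issued is not specified here.

```typescript
// Hypothetical sketch: escalation triggered only by the model emitting a marker.
type Tier = "flash" | "pro";

const NEEDS_PRO_MARKER = "<<<NEEDS_PRO>>>";

interface EscalationDecision {
  nextTier: Tier;
  escalated: boolean;
}

// No failure counter: the only input is the model's own output text.
// On the pro tier the marker is a no-op, since there is no higher tier.
function handleTurnOutput(output: string, tier: Tier): EscalationDecision {
  const requested = output.includes(NEEDS_PRO_MARKER);
  if (requested && tier === "flash") {
    return { nextTier: "pro", escalated: true };
  }
  return { nextTier: tier, escalated: false };
}
```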

Thanks @FriendsHL for catching this. The benchmark pricing in particular
is the public link for the cache-first defense; the old numbers would have been
embarrassing.
esengine added a commit that referenced this pull request May 25, 2026
#1657 dropped the preset abstraction and exposed reasoning effort
directly, keeping `max` as a DeepSeek-only extension that users opt
into knowing that standard OpenAI / vLLM / Azure endpoints reject it with a 400.
The TUI still advertised `max` everywhere (the `/effort` argsHint, slash-arg
picker, ModelPicker effort rows, and /effort handler), even when the
active endpoint was a third-party host that can't accept it. Users
on those endpoints saw `max` in every suggestion and reported it as
a preset-era leftover (#1794).

Endpoint-aware filter: when `loop.client.baseUrl` is not api.deepseek.com,
drop `max` from the choices the TUI surfaces:
  - `/effort` argsHint and argCompleter (autocomplete + arg picker)
  - ModelPicker effort rows
  - /effort handler's accept list + status / usage message
  - new `effortUsageNoMax` i18n key (EN / zh-CN / de) so the error
    on bad input doesn't itself name `max` as an option

`max` stays available on DeepSeek endpoints, as designed in #1657;
it is just no longer shown where it would return a 400.
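The endpoint-aware filter above can be sketched as follows. The hostname comparison, `isDeepSeekEndpoint`, `visibleEfforts`, and `handleEffort` are assumptions for illustration; the real change reads `loop.client.baseUrl` and uses a dedicated `effortUsageNoMax` i18n string, whereas this sketch simply builds the usage text from the filtered list so it cannot mention `max` on third-party hosts.

```typescript
// Hypothetical sketch of the endpoint-aware effort filter.
type ReasoningEffort = "low" | "medium" | "high" | "max";
const ALL_EFFORTS: readonly ReasoningEffort[] = ["low", "medium", "high", "max"];

type EffortResult =
  | { ok: true; effort: ReasoningEffort }
  | { ok: false; usage: string };

// Assumption: only the api.deepseek.com host accepts the "max" level.
// Unparseable base URLs are treated as third-party, so "max" stays hidden.
function isDeepSeekEndpoint(baseUrl: string): boolean {
  try {
    return new URL(baseUrl).hostname === "api.deepseek.com";
  } catch {
    return false;
  }
}

// The single list that argsHint, autocomplete, and picker rows would read from.
function visibleEfforts(baseUrl: string): ReasoningEffort[] {
  return isDeepSeekEndpoint(baseUrl)
    ? [...ALL_EFFORTS]
    : ALL_EFFORTS.filter((e) => e !== "max");
}

// /effort handler: accept only what the endpoint supports, and build the
// usage message from the same list.
function handleEffort(arg: string, baseUrl: string): EffortResult {
  const allowed = visibleEfforts(baseUrl);
  const match = allowed.find((e) => e === arg.trim().toLowerCase());
  if (match !== undefined) return { ok: true, effort: match };
  return { ok: false, usage: `usage: /effort ${allowed.join(" | ")}` };
}
```

Deriving every surface from one function is a design choice of this sketch; it keeps the hint, the picker, and the accept list from drifting apart.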

Fixes #1794.

Co-authored-by: reasonix <reasonix@deepseek.com>


1 participant