refactor(config): drop preset abstraction, expose model + effort directly #1657
Merged
…ctly

Presets bundled (model, reasoning_effort, thinking) under names like flash/pro/auto. The bundling forced "max" as the cap and silently broke OpenAI-compatible endpoints that only accept the standard literal set (vLLM rejects "max" with a 400; seen in the wild against a self-hosted DeepSeek-V4-Pro).

Direct controls instead:
- ReasoningEffort widens to low | medium | high | max
- Default cap is now "high", the value every OpenAI-compatible endpoint accepts; users opt into "max" knowing it's DeepSeek-only
- Dashboard + desktop settings: model picker + effort buttons, both apply live through the existing settings bridge
- CLI: /model and /effort slash commands; ModelPicker shows EFFORT and MODELS sections with full enumeration
- Config: preset field removed, model field persisted directly
esengine pushed a commit that referenced this pull request on May 24, 2026
…moved, persisted usage stats, plan dispatch gate

Headline themes:
- Desktop: bundle the CLI-hosted React dashboard, retire Tauri+Preact duplicate (#1418)
- Config: drop preset abstraction; flash/pro are direct model selections (#1657, #1630)
- Stats: persist cumulative usage to session meta + auto-restore on startup (#1667, #1680, #1643, #1628)
- Plans: editMode="plan" enforced at the ToolRegistry dispatch gate (#1681); step advance fix (#1629)
- Context: fold once at turn start, drop pre-flight + byte-ceiling (#1642, #1646); collapsible compacted card (#1649)
- Subagents: per-skill flash/pro override + Settings UI (#1632)
- Desktop polish: sidebar drag-resize (#1688), responsive collapse (#1585), copy/edit overlay + msg-history nav (#1645), Esc closes modal not turn (#1685), QQ tab isolation (#1672), DiffCard for edits (#1662), theme-aware highlighting (#1655), system events toggle (#1654/#1650), macOS TCC inheritance (#1614), dashboard.enabled (#1612)
- Dashboard polish: persistent session URL (#1586, #1589, #1599), theme-aware highlighting (#1664), IME confirm-enter guard (#1689), code-fence lang fix (#1677), vendor chunk split (#1587), markdown table h-scroll (#1562)
- TUI: Alt+S input stash/recall; static history isolated from input rerenders (#1635); legacy mouse drop (#1637, #1648); multi-edit gated in review (#1647)
- Diff: SplitDiff column border holds under CJK (#1686)
- MCP: workspace roots passed to servers (#1625); codeCommand honors mcpServers (#1603)
- Config plumbing: (baseUrl, apiKey) resolved as a tuple (#1658); stale model id self-heal (#1663)

See CHANGELOG for the full list.
esengine pushed a commit that referenced this pull request on May 25, 2026
Three stale-doc fixes:
- ARCHITECTURE.md §4.3: replace the removed /pro single-turn arming with the current /model flash|pro + settings.json model selection. Note the removal in 0.50.0 (#1657, #1630).
- ARCHITECTURE.md §4.4: replace the FAILURE_ESCALATION_THRESHOLD counter, which never existed, with the actual <<<NEEDS_PRO>>> model self-report mechanism. There is no failure counter; escalation is purely LLM-initiated and a no-op on the pro tier.
- benchmarks/real-world-cache/README.md: fix the 10× pricing error in v4-flash cache-hit ($0.028 → $0.0028) and the entirely wrong v4-pro pricing ($0.139/1.667/3.333 → $0.003625/0.435/0.87). Recalculated cost tables; the headline 99.82% hit ratio is unchanged, and savings now correctly show ~97.7% (flash) / ~98.9% (pro).

Thanks @FriendsHL for catching this. The benchmark pricing in particular is the public cache-first defense link, and the old numbers would have been embarrassing.
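The §4.4 description (model self-report, no failure counter, no-op on pro) can be illustrated with a small sketch. Only the `<<<NEEDS_PRO>>>` marker and the flash/pro tier names come from the commit message; the function name, the `Tier` type, and the idea that detection is a plain substring check are assumptions made for illustration, not the project's actual code.

```typescript
// Hypothetical sketch: names and detection strategy are assumptions.
type Tier = "flash" | "pro";

// Marker quoted from the commit message.
const NEEDS_PRO_MARKER = "<<<NEEDS_PRO>>>";

// Decide which tier the next turn should use, based only on what the
// model itself reported. No failure counter is involved.
function nextTier(current: Tier, modelOutput: string): Tier {
  // Already on pro: the marker has nothing to escalate to.
  if (current === "pro") return "pro";
  // On flash: escalate only if the model asked for it.
  return modelOutput.includes(NEEDS_PRO_MARKER) ? "pro" : current;
}
```

What the real implementation does after seeing the marker (retrying the turn, switching for the rest of the session, and so on) is not stated in the commit message, so the sketch stops at the tier decision.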
esengine added a commit that referenced this pull request on May 25, 2026
#1657 dropped the preset abstraction and exposed reasoning effort directly, keeping `max` as a DeepSeek-only extension that users opt into knowing standard OpenAI / vLLM / Azure reject it with 400. The TUI still advertised `max` everywhere (`/effort` argsHint, slash-arg picker, ModelPicker effort rows, /effort handler), even when the active endpoint was a third-party host that can't accept it. Users on those endpoints saw `max` in every suggestion and reported it as a preset-era leftover (#1794).

Endpoint-aware filter: when `loop.client.baseUrl` is not api.deepseek.com, drop `max` from the choices the TUI surfaces:
- `/effort` argsHint and argCompleter (autocomplete + arg picker)
- ModelPicker effort rows
- /effort handler's accept list + status / usage message
- new `effortUsageNoMax` i18n key (EN / zh-CN / de) so the error on bad input doesn't itself name `max` as an option

`max` stays available on DeepSeek endpoints. That's the design from #1657, just no longer visible where it would 400.

Fixes #1794.

Co-authored-by: reasonix <reasonix@deepseek.com>
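A minimal sketch of the endpoint-aware filter described above, assuming the check is a hostname comparison against `api.deepseek.com`. The helper names (`hostOf`, `effortChoices`, `parseEffortArg`) are hypothetical and the host parsing is deliberately simple (no IPv6 handling); the actual commit may structure this differently.

```typescript
// Hypothetical sketch: helper names are assumptions, not the project's API.
type ReasoningEffort = "low" | "medium" | "high" | "max";

const ALL_EFFORTS: readonly ReasoningEffort[] = ["low", "medium", "high", "max"];

// Extract a lowercase hostname from a base URL without relying on
// platform URL types. Does not handle bracketed IPv6 hosts.
function hostOf(baseUrl: string): string {
  const withoutScheme = baseUrl.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "");
  const authority = withoutScheme.split(/[/?#]/)[0] ?? "";
  const hostPort = authority.split("@").pop() ?? "";
  return (hostPort.split(":")[0] ?? "").toLowerCase();
}

// Effort values the UI should offer for a given endpoint:
// "max" is only surfaced when talking to api.deepseek.com.
function effortChoices(baseUrl: string): ReasoningEffort[] {
  const isDeepSeek = hostOf(baseUrl) === "api.deepseek.com";
  return ALL_EFFORTS.filter((e) => isDeepSeek || e !== "max");
}

// Validate a /effort argument against the endpoint's accept list.
// Returns null for anything the endpoint would not be offered.
function parseEffortArg(arg: string, baseUrl: string): ReasoningEffort | null {
  const wanted = arg.trim().toLowerCase();
  return effortChoices(baseUrl).find((e) => e === wanted) ?? null;
}
```

Deriving the hint text, the completer, the picker rows, and the handler's accept list from one function like `effortChoices` keeps the four surfaces from drifting apart, which is the failure mode #1794 reported.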
Summary
Presets bundled (model, reasoning_effort, thinking) under names like flash/pro/auto. The bundling forced `"max"` as the cap and silently broke OpenAI-compatible endpoints that only accept the standard literal set: a user reported their self-hosted vLLM rejecting requests with a 400.

This PR rips out the preset layer and exposes model + effort directly across all three surfaces.

- `ReasoningEffort` widens to `low | medium | high | max`
- Default cap is now `"high"`, accepted by every OpenAI-compatible endpoint (vLLM, Azure, DeepSeek). `"max"` stays available; users opt in knowing it's a DeepSeek extension
- CLI: `/model` and `/effort` slash commands; `ModelPicker` shows EFFORT and MODELS sections with full enumeration
- Config: `preset` field removed, `model` persisted directly

55 files changed, +890 / -1153.
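As a rough illustration of the direct controls listed above, the sketch below loads persisted settings that carry `model` and `effort` directly and falls back to `"high"` when the effort is missing or unrecognized. Only the four effort literals and the `"high"` default come from this PR; the `Settings` shape, `loadSettings`, and the `fallbackModel` parameter are hypothetical names chosen for the example.

```typescript
// Hypothetical sketch: type and function names are assumptions.
const REASONING_EFFORTS = ["low", "medium", "high", "max"] as const;
type ReasoningEffort = (typeof REASONING_EFFORTS)[number];

// Default from the PR: the value every OpenAI-compatible endpoint accepts.
const DEFAULT_EFFORT: ReasoningEffort = "high";

interface Settings {
  model: string;
  effort: ReasoningEffort;
}

// Type guard: true only for one of the four known literals.
function isReasoningEffort(value: unknown): value is ReasoningEffort {
  return (
    typeof value === "string" &&
    (REASONING_EFFORTS as readonly string[]).includes(value)
  );
}

// Build settings from a persisted object. Any legacy `preset` key is
// simply ignored; model and effort are read on their own.
function loadSettings(raw: Record<string, unknown>, fallbackModel: string): Settings {
  const rawModel = raw["model"];
  const model = typeof rawModel === "string" && rawModel.length > 0 ? rawModel : fallbackModel;
  const rawEffort = raw["effort"];
  const effort = isReasoningEffort(rawEffort) ? rawEffort : DEFAULT_EFFORT;
  return { model, effort };
}
```

With this shape, a settings file that still contains a stale `preset` entry loads without error, and `"max"` only takes effect when a user has written it explicitly.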
Test plan
- `npm run typecheck` (root + dashboard) clean
- `npm run lint` clean (2 pre-existing warnings unrelated to this PR)
- `npm test`: 3579 pass, 0 fail
- `npm run build:dashboard` produces the expected `dist/` (verified via dashboard-smoke tests)
- `/model` and `/effort` in CLI apply live and persist across restart
- … `"max"` rejection)

Closes the user-reported vLLM 400.