feat(loop): /max-tokens slash command to cap output tokens per turn (#2196) #2207

Merged
esengine merged 1 commit into esengine:main from nianyi778:feat/max-output-tokens
May 29, 2026
Conversation

@nianyi778

Problem

Closes #2196. There was no way to limit how many tokens the model generates per turn. Users with long reasoning tasks could burn 80K+ output tokens in a single turn with no recourse.

Change

Adds a /max-tokens <N|off> slash command (in the setup group, next to /effort and /budget) and a persisted maxOutputTokens config field:

/max-tokens 4096     # cap to 4096 output tokens next turn
/max-tokens          # show current value
/max-tokens off      # clear the cap (server default applies)

The value is sent as max_tokens in the API request and persists across restarts via ~/.reasonix/config.json.
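The three forms above (a number, no argument, `off`) could be parsed along these lines. This is a minimal sketch, not the handler from this PR: `parseMaxTokensArg` and the `MaxTokensAction` type are hypothetical names, and rejecting zero or non-integer input is an assumption about the desired validation.

```typescript
// Hypothetical result type; the real handler's return shape is not shown in this PR.
type MaxTokensAction =
  | { kind: "show" }
  | { kind: "set"; value: number }
  | { kind: "clear" }
  | { kind: "error"; message: string };

// Maps the raw slash-command argument to one of the three documented behaviours.
function parseMaxTokensArg(arg: string | undefined): MaxTokensAction {
  const trimmed = (arg ?? "").trim();

  // Bare "/max-tokens": show the current value.
  if (trimmed === "") return { kind: "show" };

  // "/max-tokens off": clear the cap so the server default applies.
  if (trimmed.toLowerCase() === "off") return { kind: "clear" };

  // Assumption: only plain positive integers such as "4096" are accepted.
  const value = /^\d+$/.test(trimmed) ? Number(trimmed) : NaN;
  if (!Number.isSafeInteger(value) || value <= 0) {
    return {
      kind: "error",
      message: `expected a positive integer or "off", got "${trimmed}"`,
    };
  }
  return { kind: "set", value };
}
```

Returning a tagged action instead of mutating state directly keeps the parsing step easy to unit-test separately from persistence and i18n.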

Files changed:

  • src/config.ts: field + loadMaxOutputTokens / saveMaxOutputTokens
  • src/loop.ts: maxOutputTokens on options + mutable field + configure() support
  • src/loop/streaming.ts: forwarded through StreamModelOptions
  • src/cli/ui/slash/commands.ts: new /max-tokens command definition
  • src/cli/ui/slash/handlers/model.ts: handler implementation
  • src/cli/ui/App.tsx: seeded from config on loop construction
  • i18n: EN / zh-CN / de strings
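For readers who want a picture of the config side, the load/save pair can be sketched as two pure functions over the JSON text of the config file. Everything here is an assumption for illustration: `readMaxOutputTokens`, `writeMaxOutputTokens`, and the `ReasonixConfigSketch` shape are hypothetical, and the real loadMaxOutputTokens / saveMaxOutputTokens also handle file I/O, which is left out.

```typescript
// Hypothetical config shape: only the field named in this PR is modelled.
interface ReasonixConfigSketch {
  maxOutputTokens?: number;
  [key: string]: unknown;
}

// Returns the persisted cap, or undefined when it is absent or invalid.
function readMaxOutputTokens(rawJson: string): number | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawJson);
  } catch {
    return undefined; // unreadable config behaves like "no cap"
  }
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const value = (parsed as ReasonixConfigSketch).maxOutputTokens;
  return typeof value === "number" && Number.isInteger(value) && value > 0
    ? value
    : undefined;
}

// Returns new JSON text with the cap set, or removed when value is undefined.
// Other keys in the config are preserved.
function writeMaxOutputTokens(rawJson: string, value: number | undefined): string {
  let config: ReasonixConfigSketch = {};
  try {
    const parsed: unknown = JSON.parse(rawJson);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      config = { ...(parsed as ReasonixConfigSketch) };
    }
  } catch {
    // Start from an empty config if the existing text is not valid JSON.
  }
  if (value === undefined) {
    delete config.maxOutputTokens;
  } else {
    config.maxOutputTokens = value;
  }
  return JSON.stringify(config, null, 2);
}
```

Removing the key on `off`, instead of storing a sentinel value, keeps "no cap" and "field never set" indistinguishable, which matches the undefined = server default behaviour described in this PR.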

@nianyi778

Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7d1e89e9bf

Comment thread src/loop.ts
toolSpecs,
signal,
reasoningEffort: this.reasoningEffort,
maxTokens: this.maxOutputTokens,

P2 Badge Apply /max-tokens to forced summaries

When /max-tokens is set and a turn hits the iteration cap, context guard, or all-suppressed storm path, the loop jumps to forceSummaryAfterIterLimit(), whose ctx.client.chat() call in src/loop/force-summary.ts still omits maxTokens. That summary generation is part of the same user turn, so the exact runaway-tool-loop scenarios this command is meant to control can still make an uncapped model call; please thread this.maxOutputTokens through summaryContext/ForceSummaryContext and pass it to the summary chat request too.

…sengine#2196)

Adds /max-tokens <N|off> slash command and persisted maxOutputTokens config
field so users can limit output tokens per turn, preventing runaway reasoning.

- config: maxOutputTokens field, load/saveMaxOutputTokens accessors
- loop: stored on loop instance, passed as max_tokens to chat and stream calls
- loop/streaming.ts: maxTokens forwarded through StreamModelOptions
- slash: new /max-tokens setup-group command with handler + i18n (EN/zh-CN/de)
- App.tsx: seeded from config on loop construction
@nianyi778 nianyi778 force-pushed the feat/max-output-tokens branch from 7d1e89e to 3ecd008 Compare May 29, 2026 02:54
@nianyi778

Author

Fixed in 3ecd0084. Added maxOutputTokens to ForceSummaryContext and threaded it to the client.chat() call inside forceSummaryAfterIterLimit, so forced-summary turns (stuck/context-guard/iter-limit) honour the cap too.
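Roughly, the threading described above looks like the sketch below. It is illustrative only: `ForceSummaryContextSketch`, `ChatRequestBody`, `buildSummaryRequest`, and the summary prompt text are hypothetical stand-ins, and only the `maxOutputTokens` field and the `max_tokens` request key come from this PR.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical request shape; only max_tokens is named in this PR.
interface ChatRequestBody {
  messages: ChatMessage[];
  max_tokens?: number;
}

// Hypothetical subset of ForceSummaryContext with the newly threaded field.
interface ForceSummaryContextSketch {
  messages: ChatMessage[];
  maxOutputTokens?: number;
}

// Builds the forced-summary request. The cap is forwarded when set and the
// key is omitted entirely when undefined, so the server default applies.
function buildSummaryRequest(ctx: ForceSummaryContextSketch): ChatRequestBody {
  const body: ChatRequestBody = {
    messages: [
      ...ctx.messages,
      // Placeholder prompt; the real summary prompt is not shown in this PR.
      { role: "user", content: "Summarize the work so far." },
    ],
  };
  if (ctx.maxOutputTokens !== undefined) {
    body.max_tokens = ctx.maxOutputTokens;
  }
  return body;
}
```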

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ecd00840c

Comment thread src/cli/ui/App.tsx
hooks: hookList,
hookCwd: currentRootDir,
reasoningEffort: initialReasoningEffort ?? loadReasoningEffort(),
maxOutputTokens: loadMaxOutputTokens(),

P2 Badge Propagate persisted maxOutputTokens to all loop constructors

This only seeds the persisted cap for the TUI loop. I checked the other production CacheFirstLoop constructors that already load persisted per-turn settings (src/cli/commands/desktop.ts buildRuntimeFor loads loadReasoningEffort()/loadMaxIterPerTurn(), and src/cli/commands/run.ts loads loadMaxIterPerTurn()), and neither passes loadMaxOutputTokens(), so a user who saved /max-tokens 4096 can restart into desktop or reasonix run and silently get uncapped model calls despite the config field being persisted. Please thread loadMaxOutputTokens() through those runtime constructors as well so the saved cap is consistently honored.
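One way to keep the constructors from drifting apart is a single helper that gathers the persisted per-turn settings, which each entry point then spreads into its loop options. The sketch below is an assumption about how that could look, not code from this repository: `persistedLoopOptions`, `PersistedLoaders`, and `PersistedLoopOptions` are hypothetical, and the loaders are injected so the example stays self-contained.

```typescript
// Hypothetical loader signatures standing in for the real config accessors.
interface PersistedLoaders {
  loadReasoningEffort: () => string | undefined;
  loadMaxOutputTokens: () => number | undefined;
}

// Hypothetical subset of the loop options that are persisted in config.
interface PersistedLoopOptions {
  reasoningEffort?: string;
  maxOutputTokens?: number;
}

// Collects persisted settings in one place. Explicit overrides (for example a
// CLI flag) win over the saved values, mirroring the
// `initialReasoningEffort ?? loadReasoningEffort()` pattern in App.tsx.
function persistedLoopOptions(
  loaders: PersistedLoaders,
  overrides: PersistedLoopOptions = {},
): PersistedLoopOptions {
  return {
    reasoningEffort: overrides.reasoningEffort ?? loaders.loadReasoningEffort(),
    maxOutputTokens: overrides.maxOutputTokens ?? loaders.loadMaxOutputTokens(),
  };
}
```

With a helper like this, the TUI, desktop, and `reasonix run` constructors would each pass `...persistedLoopOptions(loaders)`, so a newly persisted field only has to be added in one place.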

@esengine esengine left a comment

Owner

Reviewed: clean and correctly plumbed. maxOutputTokens flows config → loop.configure → this.maxOutputTokens → maxTokens: this.maxOutputTokens on the client.chat call (both call sites), with undefined = no cap (server default). The /max-tokens handler covers bare (show), a number (set), and off (clear), persists to config, and i18n is in EN/de/zh-CN. Closes a real gap (#2196: runaway reasoning burning 80K+ tokens). CI green. Merging.

@esengine esengine merged commit 9619913 into esengine:main May 29, 2026
4 checks passed

Development

Successfully merging this pull request may close these issues.

feat: maximum output tokens

2 participants