feat: add Codex fast mode toggle (/fast command) #6875
Merged
Add /fast slash command to toggle OpenAI Codex service_tier between
normal and priority ('fast') inference. Only exposed for models
registered in _FAST_MODE_BACKEND_CONFIG (currently gpt-5.4).
- Registry-based backend config for extensibility
- Dynamic command visibility (hidden from help/autocomplete for
non-supported models) via command_filter on SlashCommandCompleter
- service_tier flows through request_overrides from route resolution
- Omit max_output_tokens for Codex backend (rejects it)
- Persists to config.yaml under agent.service_tier
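A minimal sketch of how the registry lookup and the command visibility filter described in the bullets above could fit together. It is not the code from this PR: the dict layout, the helper names (supports_fast_mode, resolve_request_overrides, visible_commands), and the assumption that agent.service_tier stores "fast" or "normal" are all hypothetical. Only _FAST_MODE_BACKEND_CONFIG, openai-codex, gpt-5.4, and the priority tier come from the description.

```python
# Hypothetical layout: the real registry structure is not shown in this PR text.
# Maps (provider, model) -> request overrides applied when fast mode is on.
_FAST_MODE_BACKEND_CONFIG = {
    ("openai-codex", "gpt-5.4"): {"service_tier": "priority"},
}


def supports_fast_mode(provider: str, model: str) -> bool:
    """Return True when the provider/model pair is registered for fast mode."""
    return (provider, model) in _FAST_MODE_BACKEND_CONFIG


def resolve_request_overrides(provider: str, model: str, service_tier: str) -> dict:
    """Return per-request overrides for the active route, or {} when none apply."""
    if service_tier != "fast" or not supports_fast_mode(provider, model):
        return {}
    # Copy so callers cannot mutate the registry entry.
    return dict(_FAST_MODE_BACKEND_CONFIG[(provider, model)])


def visible_commands(commands: list[str], provider: str, model: str) -> list[str]:
    """Hide /fast from help and autocomplete for unsupported models."""
    return [c for c in commands if c != "/fast" or supports_fast_mode(provider, model)]
```

Supporting another model in this sketch would mean adding one registry entry; the filter and the override resolution would pick it up without further changes.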
Salvage cleanup: removed simple_term_menu/input() menu (banned),
bare /fast now shows status like /reasoning. Removed redundant
override resolution in _build_api_kwargs — single source of truth
via request_overrides from route.
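To illustrate the single-source-of-truth point, here is a hedged sketch of a kwargs builder that applies request_overrides as-is and skips max_output_tokens for the Codex backend. The function name, signature, and the "input" key are assumptions made for illustration; the real _build_api_kwargs is not shown here and may differ.

```python
from typing import Optional


def build_api_kwargs(
    model: str,
    messages: list,
    *,
    is_codex_backend: bool,
    request_overrides: Optional[dict] = None,
    max_output_tokens: int = 4096,  # hypothetical default
) -> dict:
    """Assemble request kwargs; overrides come only from route resolution."""
    kwargs = {"model": model, "input": messages}
    # Per the PR description, the Codex backend rejects max_output_tokens.
    if not is_codex_backend:
        kwargs["max_output_tokens"] = max_output_tokens
    # No second lookup of service_tier here: whatever the route resolved wins.
    kwargs.update(request_overrides or {})
    return kwargs
```

Because the builder never re-derives service_tier on its own, a stale config value cannot disagree with what route resolution decided.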
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
This was referenced Apr 10, 2026
Summary
Salvage of PR #6817 by @g-guthrie — cherry-picked onto current main with cleanups.
Adds a /fast slash command to toggle OpenAI Codex service_tier between normal and priority ("fast") inference. Only exposed for models registered in _FAST_MODE_BACKEND_CONFIG (currently gpt-5.4 on openai-codex).

What it does
- /fast: shows current tier status
- /fast fast: enables priority service tier
- /fast normal: disables priority tier
- /fast status: shows current status
- Persists to config.yaml under agent.service_tier
- Omits max_output_tokens for Codex backend (rejects that parameter)

Salvage cleanups from original PR
- Removed simple_term_menu menu (banned: rendering bugs in tmux/iTerm2)
- Removed input() fallback (hangs in prompt_toolkit event loop)
- Bare /fast now shows status (like /reasoning) instead of opening a menu
- Removed redundant override resolution in _build_api_kwargs: overrides flow solely through request_overrides from route resolution (single source of truth)

Files changed
- cli.py: /fast handler, service_tier config parsing, command visibility filter
- run_agent.py: service_tier + request_overrides on AIAgent, is_codex_backend guard for max_output_tokens
- hermes_cli/commands.py: CommandDef + command_filter on SlashCommandCompleter
- hermes_cli/config.py: service_tier in DEFAULT_CONFIG
- hermes_cli/models.py: _FAST_MODE_BACKEND_CONFIG registry, resolve functions

Live-verified
- openai-codex OAuth + gpt-5.4
- /fast → shows normal, /fast fast → enables, verified response works
- /fast normal → disables, verified response works
- /fast on claude-sonnet-4 → correctly blocked with message

Salvage of #6817. Will close the original PR after merge with credit.
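
The /fast subcommands listed under "What it does" can be sketched as a small pure function. This is an illustration only, not the handler in cli.py: the name handle_fast, its return shape, and the message strings are hypothetical, and it assumes the tier is tracked as "fast" or "normal".

```python
def handle_fast(arg: str, current_tier: str, supported: bool) -> tuple[str, str]:
    """Return (new_tier, message) for one /fast invocation."""
    if not supported:
        # Mirrors the live-verified case where /fast is blocked on claude-sonnet-4.
        return current_tier, "Fast mode is not available for the current model."
    arg = arg.strip().lower()
    if arg in ("", "status"):
        # Bare /fast reports status instead of opening a menu.
        return current_tier, f"Service tier: {current_tier}"
    if arg in ("fast", "normal"):
        return arg, f"Service tier set to {arg}"
    return current_tier, "Usage: /fast [fast|normal|status]"
```

Keeping the handler free of I/O would let the caller decide when to persist the new tier to config.yaml and makes the branches easy to unit test.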