fix(delegate): _load_config reads from disk every call, not cached CLI_CONFIG #15540
Open
it-helloprint wants to merge 1 commit into
…I_CONFIG

The previous `_load_config()` read `cli.CLI_CONFIG` first: a module-level dict populated ONCE at `cli.py` import. In long-running gateway processes (Discord / Telegram / Slack), this meant any edits to the `delegation.model` / `delegation.provider` keys in `~/.hermes/config.yaml` were invisible to running subagents. Users would edit the config without restarting anything (the docstring implied config was re-read on each dispatch), and `delegate_task` calls would silently inherit the parent session's model instead of the configured override.

Observed symptom (production Discord gateway, 2026-04-25):
- config.yaml `delegation.model: anthropic/claude-sonnet-4.6`
- Session running on `anthropic/claude-opus-4.7`
- Every `delegate_task` dispatch returned envelopes with `model: anthropic/claude-opus-4.7` despite the config pin
- Verified via sessions.db: all child sessions showed the parent's model
- Verified root cause: `_resolve_delegation_credentials` received `configured_model = None` / `configured_provider = None` (the pre-April config defaults the frozen `CLI_CONFIG` still held), hit the "no provider override" branch and returned null creds, so the child fell through to `effective_model = model or parent_agent.model`

Cost impact: ~$700 of unintended Opus burn on Sonnet-suitable mechanical coding work (4 PRs of CSV streaming / data migration / Filament refactor) before the routing miss was diagnosed.

Fix: flip the fallback order so `load_config()` (which reads the live file) is tried FIRST, and `CLI_CONFIG` is only the backup when the file read fails. The YAML file is <10 KB and `_load_config()` runs on the cold path (once per `delegate_task` call, not per API hit), so the ~5 ms disk read is worth the correctness guarantee.

CLI path still works identically: `load_config()` reads the same file that `load_cli_config` already reads. No behavioural change for CLI users, just a guarantee for gateway users that config edits take effect without a process restart.
Includes updated docstring explaining the fallback order and the historical failure mode so future readers see why disk-first matters.
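The failure mode can be reproduced in miniature. This sketch is illustrative only and is not the project's code: `read_config_from_disk()`, `write_config()` and `effective_child_model()` are hypothetical stand-ins for `load_config()`, a user editing the file, and the `model or parent_agent.model` fallthrough, and it uses a JSON file in a temp directory instead of `~/.hermes/config.yaml` so that it runs with only the standard library. It shows an import-time snapshot going stale while a per-call read picks up the edit.

```python
import json
import os
import tempfile

# Hypothetical stand-in for ~/.hermes/config.yaml (JSON instead of YAML so
# the sketch needs only the standard library).
CONFIG_PATH = os.path.join(tempfile.mkdtemp(), "config.json")


def read_config_from_disk() -> dict:
    """Re-read the config file on every call (plays the role of load_config())."""
    with open(CONFIG_PATH, encoding="utf-8") as fh:
        return json.load(fh)


def write_config(delegation: dict) -> None:
    """Simulate the user editing the delegation block on disk."""
    with open(CONFIG_PATH, "w", encoding="utf-8") as fh:
        json.dump({"delegation": delegation}, fh)


def effective_child_model(delegation: dict, parent_model: str) -> str:
    # Assumption: empty strings are treated as "no override" (None), then
    # `model or parent_model` mirrors `model or parent_agent.model`.
    configured = delegation.get("model") or None
    return configured or parent_model


# Process start: the config has no delegation override yet.
write_config({"model": "", "provider": ""})
CLI_CONFIG = read_config_from_disk()  # stand-in: snapshot taken ONCE, like cli.py import

# Later, while the gateway keeps running, the user pins a cheaper model.
write_config({"model": "anthropic/claude-sonnet-4.6", "provider": "openrouter"})

parent = "anthropic/claude-opus-4.7"
stale = effective_child_model(CLI_CONFIG["delegation"], parent)
live = effective_child_model(read_config_from_disk()["delegation"], parent)
print("snapshot:", stale)  # snapshot: anthropic/claude-opus-4.7
print("live read:", live)  # live read: anthropic/claude-sonnet-4.6
```

The in-memory dict never changes after it is built, so only the per-call read reflects the edit.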
This was referenced Apr 28, 2026
Closed
Summary
`_load_config()` in `tools/delegate_tool.py` reads `cli.CLI_CONFIG` before falling back to `hermes_cli.config.load_config()`. Because `CLI_CONFIG` is populated exactly once at `cli.py` module import and never refreshed, long-running gateway processes (Discord / Telegram / Slack) can't see edits made to `delegation.model` / `delegation.provider` in `~/.hermes/config.yaml` after startup. The result is a silent fallback: users edit the config without restarting anything, and subagent dispatches inherit the parent session's model instead of the configured override.

This flips the fallback order so `load_config()` (which reads the file on every call) is tried first, and `CLI_CONFIG` is only the backup.

Observed Symptom (production Discord gateway, 2026-04-25)
- Session running on `anthropic/claude-opus-4.7`
- `~/.hermes/config.yaml`: `delegation.model: anthropic/claude-sonnet-4.6`, `delegation.provider: openrouter`
- `delegate_task` dispatches returned envelopes with `"model": "anthropic/claude-opus-4.7"` despite the pin
- `sessions` table confirmed: all child sessions recorded the parent's model, not the configured one

Verified root cause:
1. `_load_config()` returned `CLI_CONFIG["delegation"]`, a dict frozen at the April 22 import (when `delegation.model` / `delegation.provider` were still empty strings).
2. `_resolve_delegation_credentials()` read `configured_model = None`, `configured_provider = None`.
3. `creds["model"] = None` → `effective_model = model or parent_agent.model` (line ~321 in `_build_child_agent`) → child ran Opus.

Cost Impact
In the incident that triggered this investigation, ~$700 of unintended Opus burn on Sonnet-suitable mechanical coding work (4 PRs: CSV streaming importer, data migration command, Filament Infolist grouping refactor, doc updates) before the routing miss was diagnosed. A user who's pinned a cheaper model is probably doing so because the task doesn't need Opus; they should not have to kill their gateway just to pick up a config edit.
Fix
`hermes_cli.config.load_config()` reads `~/.hermes/config.yaml` on every call. Try that first. Only if the disk read fails or returns empty do we fall back to the frozen `CLI_CONFIG`.

Cost of the extra read: a ~10 KB YAML file parsed once per `delegate_task` call (not per API hit, not per token). ~5 ms on a modern machine, on the delegation cold path. Rounding error relative to any LLM API call.

Diff Semantics
- CLI users: No change. `load_config()` reads the same file as `load_cli_config`, so whatever `CLI_CONFIG` had, `load_config()` returns an equivalent `delegation` block.
- Gateway users: Config edits take effect on the next `delegate_task` call instead of requiring a gateway restart.

Testing
Reproduced in isolation before the fix:
After the fix:
Full `delegate_task` flow exercised in `/tmp/trace_delegation_full.py` (not included in commit) confirms the child AIAgent's `self.model` stays `sonnet-4.6` through construction and past the first API call.

Docstring update
Added a section explaining the fallback order and the historical failure mode so future readers see why disk-first matters.
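For readers who want the shape of the fix, here is a minimal sketch of a disk-first loader with the kind of docstring described above. It is not the PR's code: `load_delegation_config()`, the injected `read_live` callable, and the module-level `CLI_CONFIG` stand-in are hypothetical, and catching only `OSError` / `ValueError` is an assumption (a real YAML loader would raise its own parse-error type).

```python
from typing import Callable, Optional

# Hypothetical stand-in for the frozen module-level dict (cli.CLI_CONFIG),
# captured once at import with the pre-edit, empty delegation defaults.
CLI_CONFIG: dict = {"delegation": {"model": "", "provider": ""}}


def load_delegation_config(
    read_live: Callable[[], Optional[dict]],
    snapshot: Optional[dict] = None,
) -> dict:
    """Return the ``delegation`` block, preferring a fresh disk read.

    Fallback order:
      1. ``read_live()`` (stands in for ``hermes_cli.config.load_config()``),
         which re-reads the config file on every call.
      2. The import-time ``snapshot`` (stands in for ``cli.CLI_CONFIG``),
         used only if the live read raises or returns nothing.

    Consulting the snapshot first is what let long-running gateways ignore
    edits to ``delegation.model`` / ``delegation.provider`` until a restart.
    """
    if snapshot is None:
        snapshot = CLI_CONFIG
    try:
        live = read_live()
    except (OSError, ValueError):  # assumed failure modes: unreadable/unparsable file
        live = None
    if live:
        return live.get("delegation") or {}
    return snapshot.get("delegation") or {}


def fresh_read() -> dict:
    # Simulates the file after the user pinned a cheaper delegation model.
    return {"delegation": {"model": "anthropic/claude-sonnet-4.6",
                           "provider": "openrouter"}}


def broken_read() -> dict:
    # Simulates a missing or unreadable config file.
    raise OSError("config.yaml unreadable")


print(load_delegation_config(fresh_read)["model"])  # anthropic/claude-sonnet-4.6
print(load_delegation_config(broken_read))          # {'model': '', 'provider': ''}
```

Injecting the reader keeps the fallback order testable without touching the real home directory; the snapshot is consulted only when the live read raises or comes back empty.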