fix(delegate): read fresh config.yaml so delegation.model edits take effect (#11999)#12053
fix(delegate): read fresh config.yaml so delegation.model edits take effect (#11999)#12053briandevans wants to merge 2 commits into
Conversation
…effect (NousResearch#11999) ``tools.delegate_tool._load_config`` short-circuited through ``cli.CLI_CONFIG``, a dict that's populated **once** at module-import time and is never re-read. In long-lived CLI sessions this meant ``delegation.model`` / ``delegation.provider`` edits to ``~/.hermes/config.yaml`` after startup were silently ignored — subagents kept running on the parent's model regardless of the user's override, defeating the entire point of cost-tiered delegation (Sonnet subagents under an Opus parent, say). The reporter's exact scenario (NousResearch#11999): set delegation: model: claude-sonnet-4-20250514 provider: anthropic in config.yaml, run the parent on ``claude-opus-4-6``, call ``delegate_task`` — the child reports ``claude-opus-4-6`` and the result metadata confirms ``"model": "claude-opus-4-6"``. Root cause ---------- The short-circuit at the top of ``_load_config``: try: from cli import CLI_CONFIG cfg = CLI_CONFIG.get("delegation", {}) if cfg: return cfg misfires in two ways: 1. ``CLI_CONFIG`` is a module-level dict built by ``cli.load_cli_config()`` at import time. Subsequent edits to ``config.yaml`` (including adding a new ``delegation:`` section) are never picked up by that process. ``/reset`` and config-reload commands don't re-run ``load_cli_config``, which matches the reporter's observation that the bug "persists across /reset commands and config reloads". 2. The ``if cfg:`` check always fires because cli.py's delegation defaults already populate a truthy dict (``{"max_iterations": 45, "default_toolsets": [...], "model": "", "provider": "", "base_url": "", "api_key": ""}``) even when the user hasn't set any override. That truthy-but-empty dict bypasses the persistent ``hermes_cli.config.load_config()`` path entirely, so ``configured_model = str(cfg.get("model") or "").strip() or None`` resolves to ``None`` and ``_build_child_agent`` falls back to ``parent_agent.model``. Fix --- Drop the stale ``CLI_CONFIG`` short-circuit and always read fresh via ``hermes_cli.config.load_config()``. That's the path gateway and cron code already use, so unifying everyone on it also removes a subtle CLI-only divergence. Performance cost is a single YAML parse + deep- copy of ``DEFAULT_CONFIG``, <1 ms in practice — and ``delegate_task`` is invoked at most once per user turn. Also harden the function so a malformed ``delegation: not-a-dict`` value coerces to ``{}`` instead of leaking a non-dict through to ``_resolve_delegation_credentials`` (which calls ``.get()`` on it). Narrow scope — explicitly not changed ------------------------------------- * ``_resolve_delegation_credentials`` — downstream of ``_load_config``. Verified correct given a fresh config dict; no changes needed. * ``_build_child_agent`` / ``AIAgent.__init__`` — same. * ``cli.CLI_CONFIG`` — leave as-is. Nothing else in the codebase writes to ``CLI_CONFIG["delegation"]`` at runtime (grep-verified), so dropping its use in ``_load_config`` loses no runtime-only data. Regression coverage ------------------- ``tests/tools/test_delegate.py`` gets a new ``TestLoadConfigFreshReads`` class with 5 tests: * ``test_ignores_stale_cli_config_cache`` — plants a stale CLI_CONFIG with empty-default delegation in ``sys.modules["cli"]`` and asserts ``_load_config()`` still reads the fresh user override from ``config.yaml``. On unpatched ``origin/main`` this fails with the exact reporter symptom: ``AssertionError: '' != 'claude-sonnet-4-20250514'``. * ``test_picks_up_delegation_override_written_after_startup`` — end-user scenario: edit config.yaml, call _load_config, get the edit. * ``test_non_dict_delegation_section_is_coerced_to_empty`` — hardens ``.get()`` callers downstream; also fails on ``origin/main`` (the old code returned the non-dict string verbatim). * ``test_returns_dict_when_config_missing_delegation_section`` — pins graceful handling when no delegation section is configured. * ``test_end_to_end_resolve_uses_fresh_config`` — integration through ``_resolve_delegation_credentials`` confirming the config flows all the way to ``effective_model = creds["model"] or parent.model``. Validation ---------- ``source venv/bin/activate && python -m pytest tests/tools/test_delegate.py tests/tools/test_delegate_toolset_scope.py -q`` -> 77 passed. CI-aligned broad suite: ``python -m pytest tests/ -q --ignore=tests/integration --ignore=tests/e2e --tb=line -n auto`` -> 12824 passed, 39 skipped, 15 pre-existing baseline failures (all reproduce on clean ``origin/main``; none in ``tools/delegate_tool.py`` or ``tests/tools/test_delegate.py``). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Fixes delegation config staleness so delegate_task picks up delegation.model / delegation.provider edits from ~/.hermes/config.yaml during long-lived CLI sessions.
Changes:
- Remove the
cli.CLI_CONFIGshort-circuit and always read delegation config viahermes_cli.config.load_config(). - Harden
_load_config()against malformed non-dictdelegation:values by coercing to{}. - Add regression tests covering fresh reads, stale CLI cache behavior, and malformed delegation sections.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/delegate_tool.py | Always loads delegation config fresh from disk and type-checks the delegation section. |
| tests/tools/test_delegate.py | Adds regression tests for #11999 covering config freshness and malformed config handling. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| the built-in defaults (empty-string ``model``/``provider`` and a | ||
| ``max_iterations`` float): those produced a truthy-but-empty dict |
There was a problem hiding this comment.
The docstring claims the defaults include a "max_iterations" float, but DEFAULT_CONFIG defines delegation.max_iterations as an int (e.g., hermes_cli/config.py:676). This is misleading documentation—please update the wording/type to match the actual config schema.
| the built-in defaults (empty-string ``model``/``provider`` and a | |
| ``max_iterations`` float): those produced a truthy-but-empty dict | |
| the built-in defaults (empty-string ``model``/``provider`` and an | |
| integer ``max_iterations``): those produced a truthy-but-empty dict |
| prev_home = os.environ.get("HERMES_HOME") | ||
| prev_key = os.environ.get("ANTHROPIC_API_KEY") | ||
| os.environ["HERMES_HOME"] = hhome | ||
| os.environ.setdefault("ANTHROPIC_API_KEY", "sk-test") | ||
| try: | ||
| # User adds override | ||
| with open(os.path.join(hhome, "config.yaml"), "w") as fh: | ||
| import yaml as _yaml | ||
| _yaml.safe_dump({ | ||
| "model": "claude-opus-4-6", | ||
| "delegation": { | ||
| "model": "claude-sonnet-4-20250514", | ||
| "provider": "anthropic", | ||
| }, | ||
| }, fh) | ||
| cfg = _load_config() | ||
| creds = _resolve_delegation_credentials(cfg, parent) | ||
| self.assertEqual(creds["model"], "claude-sonnet-4-20250514") | ||
| # and ``_build_child_agent`` will later do: | ||
| # effective_model = creds["model"] or parent.model | ||
| # which must now land on the delegation model, not parent's. | ||
| effective = creds["model"] or parent.model | ||
| self.assertEqual(effective, "claude-sonnet-4-20250514") | ||
| finally: | ||
| if prev_home is None: | ||
| os.environ.pop("HERMES_HOME", None) | ||
| else: | ||
| os.environ["HERMES_HOME"] = prev_home | ||
| if prev_key is None: | ||
| os.environ.pop("ANTHROPIC_API_KEY", None) | ||
| else: | ||
| os.environ["ANTHROPIC_API_KEY"] = prev_key | ||
|
|
||
|
|
There was a problem hiding this comment.
test_end_to_end_resolve_uses_fresh_config currently depends on real provider resolution (_resolve_delegation_credentials -> hermes_cli.runtime_provider.resolve_runtime_provider) and uses os.environ.setdefault("ANTHROPIC_API_KEY", ...), which won’t override an existing-but-empty value. This makes the test environment-dependent/flaky. Consider patching hermes_cli.runtime_provider.resolve_runtime_provider (or agent.anthropic_adapter.resolve_anthropic_token) to return a fixed credential bundle, and/or force-set ANTHROPIC_API_KEY within a patch.dict so the test is hermetic.
| prev_home = os.environ.get("HERMES_HOME") | |
| prev_key = os.environ.get("ANTHROPIC_API_KEY") | |
| os.environ["HERMES_HOME"] = hhome | |
| os.environ.setdefault("ANTHROPIC_API_KEY", "sk-test") | |
| try: | |
| # User adds override | |
| with open(os.path.join(hhome, "config.yaml"), "w") as fh: | |
| import yaml as _yaml | |
| _yaml.safe_dump({ | |
| "model": "claude-opus-4-6", | |
| "delegation": { | |
| "model": "claude-sonnet-4-20250514", | |
| "provider": "anthropic", | |
| }, | |
| }, fh) | |
| cfg = _load_config() | |
| creds = _resolve_delegation_credentials(cfg, parent) | |
| self.assertEqual(creds["model"], "claude-sonnet-4-20250514") | |
| # and ``_build_child_agent`` will later do: | |
| # effective_model = creds["model"] or parent.model | |
| # which must now land on the delegation model, not parent's. | |
| effective = creds["model"] or parent.model | |
| self.assertEqual(effective, "claude-sonnet-4-20250514") | |
| finally: | |
| if prev_home is None: | |
| os.environ.pop("HERMES_HOME", None) | |
| else: | |
| os.environ["HERMES_HOME"] = prev_home | |
| if prev_key is None: | |
| os.environ.pop("ANTHROPIC_API_KEY", None) | |
| else: | |
| os.environ["ANTHROPIC_API_KEY"] = prev_key | |
| with patch.dict( | |
| os.environ, | |
| {"HERMES_HOME": hhome, "ANTHROPIC_API_KEY": "sk-test"}, | |
| clear=False, | |
| ): | |
| # Ensure credential lookup is deterministic and does not | |
| # depend on any real provider/token resolution in the test | |
| # environment. | |
| with patch( | |
| "agent.anthropic_adapter.resolve_anthropic_token", | |
| return_value="sk-test", | |
| ): | |
| # User adds override | |
| with open(os.path.join(hhome, "config.yaml"), "w") as fh: | |
| import yaml as _yaml | |
| _yaml.safe_dump({ | |
| "model": "claude-opus-4-6", | |
| "delegation": { | |
| "model": "claude-sonnet-4-20250514", | |
| "provider": "anthropic", | |
| }, | |
| }, fh) | |
| cfg = _load_config() | |
| creds = _resolve_delegation_credentials(cfg, parent) | |
| self.assertEqual(creds["model"], "claude-sonnet-4-20250514") | |
| # and ``_build_child_agent`` will later do: | |
| # effective_model = creds["model"] or parent.model | |
| # which must now land on the delegation model, not parent's. | |
| effective = creds["model"] or parent.model | |
| self.assertEqual(effective, "claude-sonnet-4-20250514") |
…d-to-end test Addresses both Copilot inline review points on PR NousResearch#12053: 1. Docstring typo fix. The docstring said ``max_iterations`` was a ``float`` in the built-in defaults. It's an ``int`` — ``DEFAULT_CONFIG["delegation"]["max_iterations"] = 50``. Reworded the sentence and dropped the misleading type qualifier. 2. ``test_end_to_end_resolve_uses_fresh_config`` no longer depends on real provider resolution. Replaced the ad-hoc ``os.environ.setdefault("ANTHROPIC_API_KEY", ...)`` (which doesn't override an existing empty value and still calls into the real ``resolve_runtime_provider`` pipeline) with: * ``patch.dict(os.environ, ...)`` to force-set ``ANTHROPIC_API_KEY`` and ``HERMES_HOME`` atomically inside the block, and * ``patch(hermes_cli.runtime_provider.resolve_runtime_provider)`` returning a fixed credential bundle, so the test is hermetic and its outcome only depends on the code path under test — ``_load_config`` → ``_resolve_delegation_credentials`` → ``effective_model`` selection. Validation: ``source venv/bin/activate && python -m pytest tests/tools/test_delegate.py::TestLoadConfigFreshReads -q`` -> 5 passed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Addressed both Copilot review points in 1. Docstring type fix — reworded the sentence so the description of the built-in-defaults dict no longer labels 2. End-to-end test hermeticity — replaced the environment-dependent
Validation: The unrelated That test exercises |
…xplicit provider When a subagent is configured with a custom base_url but no explicit provider, it should use 'custom' as the provider rather than inheriting the parent's provider. This fixes credential resolution for subagents using custom endpoints like NVIDIA NIM. Also fixes _load_config() to always read fresh from config.yaml instead of using the stale CLI_CONFIG cache. This ensures delegation config changes are picked up without restarting the CLI. Fixes subagent credential resolution when using custom base_url without explicit provider (complements PR NousResearch#12053).
|
Closing to keep the queue clean — happy to reopen if this is still useful. |
What does this PR do?
Fixes #11999.
delegate_tasksilently ignoreddelegation.model/delegation.provideroverrides in~/.hermes/config.yaml— subagents kept running on the parent's model regardless of what the user configured, defeating the whole point of cost-tiered delegation (Sonnet subagents under an Opus parent, say). The reporter's exact scenario:Parent on
claude-opus-4-6. Expected: child runs on Sonnet. Actual: child runs on Opus; result metadata confirms"model": "claude-opus-4-6". Reporter also observed "persists across /reset commands and config reloads" — an important symptom that points at the actual root cause.Root cause
tools/delegate_tool.py::_load_config()short-circuited throughcli.CLI_CONFIG, a dict populated once at module-import time:Two independent bugs in this short-circuit:
Cache staleness.
CLI_CONFIGis built bycli.load_cli_config()at module import and is never re-read. An edit to~/.hermes/config.yamlafter the CLI starts — including adding a newdelegation:section — is invisible todelegate_task./resetand config-reload commands don't re-runload_cli_config, which is exactly the reporter's "persists across /reset" symptom.Truthy-but-empty defaults always win. cli.py populates
CLI_CONFIG["delegation"]with built-in defaults ({"max_iterations": 45, "default_toolsets": [...], "model": "", "provider": "", "base_url": "", "api_key": ""}) even when the user's config.yaml has no delegation section at all. That's a truthy dict (non-empty), soif cfg: return cfgfires and the persistenthermes_cli.config.load_config()path is never reached. Downstream,configured_model = str(cfg.get("model") or "").strip() or Noneresolves toNone,_resolve_delegation_credentialssetsprovider: None(inherit-from-parent), and_build_child_agentdoeseffective_model = model or parent_agent.model→ the parent model wins.Reproduced in a minimal sandbox:
Fix
Drop the stale
CLI_CONFIGshort-circuit. Always read fresh viahermes_cli.config.load_config():Also hardens against a malformed
delegation: not-a-dictvalue — the old code returned the non-dict verbatim, which then crashed_resolve_delegation_credentialson.get("model"). Coerce to{}so callers safely fall back to inherit-from-parent defaults.Precedence / compat table
delegation.modelafter CLI startupdelegate_taskdelegation:sectionload_config()returns DEFAULT_CONFIG delegation → inherit-from-parent (same result)delegation.model: ""explicitlydelegation: not-a-dict(malformed).get()→ AttributeError{}→ inherit-from-parentCLI_CONFIG["delegation"]Narrow scope — explicitly not changed
_resolve_delegation_credentials— downstream of_load_config, already correct given a fresh config dict. No change needed._build_child_agent/AIAgent.__init__— same; the bug is purely at the config-read layer.cli.CLI_CONFIG— left as-is. Grep confirms no runtime writer toCLI_CONFIG["delegation"]anywhere, so dropping its use in_load_configloses no runtime-only data.load_config()is a single YAML parse pluscopy.deepcopy(DEFAULT_CONFIG), measured <1 ms.delegate_taskis called at most once per user turn; the cost is negligible.Pre-empt reviewer Q&A
Q: Why not just re-populate
CLI_CONFIGon file-change or /reset?That would solve this bug but introduces a much larger invariant to maintain (every mutation point of every config section would have to either poll or be hooked). Reading fresh per-call is simpler, has a localized fix, and matches how gateway/cron already operate — so we're unifying instead of diverging further.
Q: Could this regress any path that relied on
CLI_CONFIG's import-time snapshot?Grepped every reader of
CLI_CONFIG["delegation"]— onlytools/delegate_tool.py::_load_config(). Nothing else reads this section, nothing writes to it at runtime.Q:
load_config()callsensure_hermes_home(). Side-effect concerns like #11983?For delegate_task specifically:
ensure_hermes_home()creates~/.hermessubdirs and seedsSOUL.mdif they don't exist. This is idempotent after first call.delegate_taskis only invoked from an AIAgent tool-dispatch loop where the parent agent has already initialized~/.hermes(it couldn't have gotten to tool-dispatch otherwise). So the side-effect is a confirmed no-op in this call context. Additionally, the pre-fix fallback path (which fires in gateway/cron) already usedload_config(), so this side effect was already part of the function's behaviour on the only paths that mattered.Q: Why coerce non-dict delegation to
{}instead of raising?_load_configis best-effort at all other exit points (catches any exception →{}). Keeping that contract means callers don't have to learn a new failure mode — they still fall through to inherit-from-parent. A warning-log could be nice follow-up but the behavioural contract is maintained.Regression coverage
tests/tools/test_delegate.pygetsTestLoadConfigFreshReads(5 tests):test_ignores_stale_cli_config_cache— plants a staleclimodule with empty-defaultCLI_CONFIGinsys.modules, writes an override toconfig.yaml, asserts_load_config()returns the override. Fails onorigin/mainwith the exact reporter symptom:AssertionError: '' != 'claude-sonnet-4-20250514'.test_picks_up_delegation_override_written_after_startup— end-user scenario: editconfig.yaml, call_load_config, get the edit.test_non_dict_delegation_section_is_coerced_to_empty— hardens.get()callers downstream; also fails onorigin/main(old code returned the non-dict string verbatim).test_returns_dict_when_config_missing_delegation_section— pins graceful handling whendelegation:isn't configured at all.test_end_to_end_resolve_uses_fresh_config— integration through_resolve_delegation_credentialsall the way toeffective_model = creds["model"] or parent.model.How to test
Focused suite:
→ 77 passed (72 pre-existing + 5 new).
Verify the regression is caught on origin/main:
→ 2 fail (
test_ignores_stale_cli_config_cache,test_non_dict_delegation_section_is_coerced_to_empty). Restore withgit stash pop.Manual repro (the sandbox script above), before & after.
CI-aligned broad suite:
→ 12824 passed, 39 skipped, 15 pre-existing baseline failures (dingtalk ×N, matrix, wsl ×2, approve_deny ×2, file_staleness ×2, local_interrupt_cleanup, plus a couple of known flakes in tools/). All reproduce on clean
origin/main; none intools/delegate_tool.pyortests/tools/test_delegate.py.Tested on: macOS 15 (Darwin 25.5.0), Python 3.11.15.
Related Issue
Fixes #11999
Type of Change
Changes Made
tools/delegate_tool.py: rewrite_load_configto always read fresh fromhermes_cli.config.load_config(); add defensive dict-coercion against malformed config values.tests/tools/test_delegate.py: addTestLoadConfigFreshReads— 5 regression tests (2 fail cleanly onorigin/main).Adjacent surfaces checked
_resolve_delegation_credentials— downstream; verified correct given fresh config. Unchanged._build_child_agent— receivescreds["model"]; verified it correctly wins overparent_agent.modelwhen present. Unchanged.AIAgent.__init__model resolution (run_agent.py:708,:772) — preserves user-supplied model. Unchanged.preserve_dots=self._anthropic_preserve_dots()call sites — unrelated. Unchanged.cli.CLI_CONFIGwriters — grep-confirmed none fordelegation. No data loss from dropping the short-circuit.Checklist
Code
fix(scope):)origin/mainwithout the fix)Documentation & Housekeeping
delegation.model/delegation.providerkeys behave as their docstrings already promisedNotes for reviewers
python -m pytest tests/ -q --ignore=tests/integration --ignore=tests/e2e --tb=short -n auto(matches.github/workflows/tests.yml).hermes_cli/**changes — the Nix workflow should not trigger.