Skip to content

fix: surface 'Always Approve' silent-failure when ruamel.yaml is missing (#27660)#27679

Open
xxxigm wants to merge 3 commits into
NousResearch:mainfrom
xxxigm:fix/27660-save-config-silent-failure
Open

fix: surface 'Always Approve' silent-failure when ruamel.yaml is missing (#27660)#27679
xxxigm wants to merge 3 commits into
NousResearch:mainfrom
xxxigm:fix/27660-save-config-silent-failure

Conversation

@xxxigm

@xxxigm xxxigm commented May 18, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Fixes the "Always Approve" silent-failure for destructive slash commands (/new, /reset, /undo, /clear) and the parallel /reload-mcp flow when ruamel.yaml is missing from the active venv.

Before this PR:

  • cli.save_config_value() called utils.atomic_roundtrip_yaml_update() which imports ruamel.yaml at function scope. When the import failed, save_config_value caught the bare ImportError, logged a generic "Failed to save config" line at ERROR level, and returned False.
  • gateway/run.py's _on_confirm handler ignored the return value, unconditionally logged "User opted out of destructive slash confirm", and appended "Future /clear, /new, /reset, and /undo will run without confirmation" to the user-visible reply.
  • Net effect: a user clicking "Always Approve" got the success-shaped reply, the gateway logged "opted out", but ~/.hermes/config.yaml was never written. The confirmation prompt reappeared on the next /new, and the user could click "Always Approve" indefinitely with zero effect.

After this PR the failure is detected, surfaced, and actionable.

Related Issue

Fixes #27660.

The same code shape exists in the /reload-mcp confirmation handler (_handle_mcp_reload), and is fixed in the same PR to keep the two parallel paths in sync.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Security fix
  • Documentation update
  • Tests (adding or improving test coverage)
  • Refactor (no behavior change)
  • New skill (bundled or hub)

Changes Made

utils.py — typed exception at the import site:

  • Added MissingYamlRoundtripDependency(ImportError). Its message contains the actionable pip-install hint (pip install ruamel.yaml==0.18.17) and preserves the original ImportError on .original.
  • atomic_roundtrip_yaml_update now wraps the function-local from ruamel.yaml import YAML in a try/except ImportError and raises MissingYamlRoundtripDependency instead of the bare module-not-found.

cli.py — structured success/failure for callers that need it:

  • New save_config_value_detailed(key, value) -> (bool, str | None) that surfaces a user-presentable error string on failure (with the install hint when the cause is missing ruamel.yaml; with TypeName: msg for other exceptions).
  • Existing save_config_value is preserved as a bool-only alias that delegates to save_config_value_detailed. Every existing caller stays binary-compatible.

gateway/run.py — both destructive-slash and /reload-mcp _on_confirm handlers:

  • Switched to save_config_value_detailed.
  • logger.info("User opted out...") now fires only after persist succeeded; failures get a logger.warning with the underlying reason.
  • The success/failure branches now produce different user replies:
    • Success: existing "Future /clear, /new, /reset, and /undo will run without confirmation. Re-enable via approvals.destructive_slash_confirm: true" note preserved.
    • Failure: replaced with "⚠️ Could not save the opt-out preference, so the confirmation prompt will keep appearing. Reason: <hint>. Fix: add approvals.destructive_slash_confirm: false to ~/.hermes/config.yaml manually."
  • Unexpected exceptions raised from save_config_value_detailed itself (rather than just a False return) are also caught and surfaced through the same warning block.

Tests — 11 new regression tests, 1 existing test updated:

  • tests/cli/test_cli_save_config_value.py (+8): TestSaveConfigValueDetailed pins the (bool, str|None) contract for success and the two failure modes; TestMissingYamlRoundtripDependency pins the typed-exception surface (subclass of ImportError, message contents, .original preservation, end-to-end via sys.modules['ruamel'] blocker).
  • tests/gateway/test_destructive_slash_confirm.py (+3): the direct bug: 'Always Approve' for destructive slash commands silently fails when ruamel.yaml is missing from venv #27660 regression test (failure must surface "Could not save" + the reason, and the misleading "without confirmation" note must be absent); the happy-path regression guard (success keeps the existing note, no warning); a coverage test for the raise-instead-of-return False path.
  • The pre-existing test_resolve_always_persists_opt_out_and_runs_execute was updated to patch save_config_value_detailed (the new symbol the gateway calls).

How to Test

  1. Check out the branch and set up the venv:

    python3 -m venv .venv && source .venv/bin/activate && pip install -e ".[all,dev]"
    
  2. Reproduce the original bug: 'Always Approve' for destructive slash commands silently fails when ruamel.yaml is missing from venv #27660 scenario locally:

    pip uninstall -y ruamel.yaml ruamel.yaml.clib
    python -c "from cli import save_config_value_detailed; \
               print(save_config_value_detailed('approvals.destructive_slash_confirm', False))"
    

    Expected: (False, 'ruamel.yaml is required to update user config files atomically (this is a declared core dependency in pyproject.toml). Re-install with: pip install ruamel.yaml==0.18.17. Underlying ImportError: ...')

    Re-install ruamel.yaml and confirm success:

    pip install ruamel.yaml==0.18.17
    python -c "from cli import save_config_value_detailed; \
               print(save_config_value_detailed('approvals.destructive_slash_confirm', False))"
    

    Expected: (True, None) and ~/.hermes/config.yaml updated.

  3. Run the new + existing regression tests:

    scripts/run_tests.sh tests/cli/test_cli_save_config_value.py tests/gateway/test_destructive_slash_confirm.py
    

    Expected: 25 passed (14 pre-existing + 11 new).

  4. Run the wider approvals/gates suite to confirm no cross-file regression:

    scripts/run_tests.sh tests/cli/test_cli_save_config_value.py \
                         tests/gateway/test_destructive_slash_confirm.py \
                         tests/hermes_cli/test_destructive_slash_confirm_gate.py \
                         tests/hermes_cli/test_mcp_reload_confirm_gate.py
    

    Expected: 35 passed.

  5. (Optional) end-to-end against a Telegram bot: with ruamel.yaml removed, send /new, click "Always Approve", confirm the reply now contains the "⚠️ Could not save the opt-out preference" block instead of the misleading success note.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (3 commits: fix(utils,cli), fix(gateway), test(cli,gateway))
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix (no unrelated commits)
  • I've run scripts/run_tests.sh tests/cli/test_cli_save_config_value.py tests/gateway/test_destructive_slash_confirm.py and all tests pass
  • I've added tests for my changes (11 new regression tests, 1 existing test migrated)
  • I've tested on my platform: macOS 15.2 (Darwin 24.6.0), Python 3.12

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — docstrings on MissingYamlRoundtripDependency, save_config_value_detailed, and the updated save_config_value describe the new contract.
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A (no config-key changes; only persistence-path error handling)
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — the new code path is pure Python; tests use tmp_path + monkeypatch and are hermetic.
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

Screenshots / Logs

$ scripts/run_tests.sh tests/cli/test_cli_save_config_value.py tests/gateway/test_destructive_slash_confirm.py
4 workers [25 items]
.........................                                                [100%]
============================== 25 passed in 1.65s ==============================

$ scripts/run_tests.sh tests/cli/test_cli_save_config_value.py \
                       tests/gateway/test_destructive_slash_confirm.py \
                       tests/hermes_cli/test_destructive_slash_confirm_gate.py \
                       tests/hermes_cli/test_mcp_reload_confirm_gate.py
4 workers [35 items]
...................................                                      [100%]
============================== 35 passed in 1.79s ==============================

Failure path verified manually (with ruamel.yaml uninstalled):

>>> save_config_value_detailed("approvals.destructive_slash_confirm", False)
(False,
 "ruamel.yaml is required to update user config files atomically "
 "(this is a declared core dependency in pyproject.toml). "
 "Re-install with: pip install ruamel.yaml==0.18.17. "
 "Underlying ImportError: No module named 'ruamel'")

xxxigm added 3 commits May 18, 2026 07:28
When ``ruamel.yaml`` is absent from the active venv (rare but real for
``pip install -e . --no-deps`` and partial installs), every call to
``atomic_roundtrip_yaml_update`` -- the only writer for user-edited
``config.yaml`` -- raised a bare ``ModuleNotFoundError: No module named
'ruamel'`` that propagated up to ``save_config_value`` in cli.py,
which caught ``Exception`` and returned ``False`` to its caller.  The
caller (typically the gateway's "Always Approve" handler for
destructive slash commands) had no way to tell a dependency failure
apart from a disk-full or permission error, so the failure was logged
as a generic "Failed to save config" line that users never see.

This change introduces two pieces of contract:

1. ``utils.MissingYamlRoundtripDependency`` -- a typed ``ImportError``
   subclass raised by ``atomic_roundtrip_yaml_update`` when ruamel.yaml
   is unimportable.  The exception message includes the actionable
   pip-install hint (``pip install ruamel.yaml==0.18.17``) so users hit
   by the issue can self-heal without grepping source.

2. ``cli.save_config_value_detailed(key, value) -> (bool, str | None)``
   -- a new variant of ``save_config_value`` that surfaces the
   underlying failure reason as a user-presentable string.  The
   existing ``save_config_value`` is preserved as a thin bool-only
   alias so every call site already in the codebase keeps working
   without modification.

Part of the fix for NousResearch#27660.  Behaviour change for callers: opt-in via
the new ``save_config_value_detailed`` API.
…e slash + reload-mcp

Both ``_maybe_confirm_destructive_slash`` and ``_handle_mcp_reload``
in gateway/run.py contained the same silent-failure pattern (NousResearch#27660):

  * call ``save_config_value("approvals.X", False)``
  * log ``"User opted out..."`` on the next line
  * unconditionally append a reply note claiming "future runs will
    skip the prompt"
  * catch any exception and log it at WARNING level

Because ``save_config_value`` returns ``False`` rather than raising
when the write fails (e.g. ruamel.yaml missing from the venv), the
log + reply note fired for every click of "Always Approve" regardless
of whether the opt-out had actually been persisted.  Users could click
the button indefinitely with no effect and no visible error.

This change wires both handlers into the new
``save_config_value_detailed`` API:

  * The "opted out" INFO log now fires only when persist succeeded.
  * A WARNING is logged with the underlying reason on failure.
  * The reply tells the user honestly what happened: success keeps the
    existing "future runs no confirmation" note; failure replaces it
    with a "⚠️ Could not save the opt-out preference" block that
    includes the actual error reason and a manual-edit fallback path.
  * Unexpected exceptions from ``save_config_value_detailed`` itself
    (rather than just a ``False`` return) are also surfaced now -- the
    same warning-block path catches them.

Fixes NousResearch#27660.
…ousResearch#27660)

Add 11 new regression tests across two files to prevent the NousResearch#27660
silent-failure pattern from coming back.

tests/cli/test_cli_save_config_value.py (+8):

  * TestSaveConfigValueDetailed -- pin the (bool, str|None) contract
    for the new save_config_value_detailed helper: success returns
    (True, None); ruamel.yaml missing returns (False, msg) where msg
    contains both "ruamel.yaml" and the "pip install ruamel.yaml"
    hint; arbitrary OSError still produces a typed message; the
    bool-only save_config_value alias keeps returning a bare bool so
    every existing caller stays compatible.

  * TestMissingYamlRoundtripDependency -- pin the typed exception
    surface: subclass of ImportError; message contains the install
    hint; .original preserves the underlying ImportError; raised
    end-to-end by atomic_roundtrip_yaml_update when ruamel.yaml is
    unimportable (via sys.modules['ruamel'] = blocker monkeypatch).

tests/gateway/test_destructive_slash_confirm.py (+3):

  * test_resolve_always_surfaces_persist_failure_to_user -- the
    direct NousResearch#27660 regression test: when save fails, the user reply
    contains "Could not save the opt-out preference" plus the
    underlying reason (e.g. "ruamel.yaml"), AND the misleading
    "future runs will skip without confirmation" note is absent.

  * test_resolve_always_no_warning_when_persist_succeeds -- happy
    path: existing success note is preserved, no "Could not save"
    warning appears.

  * test_resolve_always_handles_unexpected_exception -- if
    save_config_value_detailed *raises* rather than returning False,
    the handler still produces a coherent user reply (no stack
    trace escapes).

The pre-existing test_resolve_always_persists_opt_out_and_runs_execute
was updated to patch save_config_value_detailed (the new symbol the
gateway calls) as well as save_config_value (kept for any indirect
callers).
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/cli CLI entry point, hermes_cli/, setup wizard comp/gateway Gateway runner, session dispatch, delivery area/config Config system, migrations, profiles labels May 18, 2026
@teknium1

Copy link
Copy Markdown
Contributor

Thanks for the focused bug fix — I verified the underlying premise still holds on current main.

Problems

  • The /reload-mcp gateway code moved since this PR was opened. Current main defines it in gateway/slash_commands.py:3194, and that path still ignores save_config_value(...) at gateway/slash_commands.py:3235 before always appending the opt-out follow-up at gateway/slash_commands.py:3245. This PR edits gateway/run.py, so that half needs porting.
  • A sibling TUI /reload-mcp path has the same silent-persistence shape: tui_gateway/server.py:7688-7690 calls save_config_value and ignores its boolean result, while ui-tui/src/app/slash/commands/ops.ts:108-112 prints “future /reload-mcp will run without confirmation” whenever params.always is true.

Suggested changes

  • Port the gateway /reload-mcp fix into gateway/slash_commands.py on current main and add a regression test for the save_config_value(...) -> False branch.
  • Either handle the TUI /reload-mcp failure path too, or explicitly leave it as a follow-up so maintainers can decide scope.
  • If you want the CLI path to match the issue comment’s guidance, use the new detailed helper at cli.py:8646 and cli.py:8712 so terminal users see the actionable missing-ruamel.yaml reason.

Automated hermes-sweeper review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/config Config system, migrations, profiles comp/cli CLI entry point, hermes_cli/, setup wizard comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: 'Always Approve' for destructive slash commands silently fails when ruamel.yaml is missing from venv

3 participants