Skip to content

fix: preserve skill command cache on scan failure in scan_skill_commands()#18720

Open
henrytran1803 wants to merge 1 commit into
NousResearch:mainfrom
henrytran1803:fix/scan-skill-commands-preserve-cache
Open

fix: preserve skill command cache on scan failure in scan_skill_commands()#18720
henrytran1803 wants to merge 1 commit into
NousResearch:mainfrom
henrytran1803:fix/scan-skill-commands-preserve-cache

Conversation

@henrytran1803

Copy link
Copy Markdown

Summary

Fixes #18659

scan_skill_commands() unconditionally cleared the module-level _skill_commands dict to {} before entering the try block. When scanning failed (e.g. unreadable directory, broken import chain), the exception was silently swallowed by except Exception: pass, leaving the global empty — all 90+ skill slash commands were lost with zero user-facing error.

Changes

agent/skill_commands.pyscan_skill_commands():

  • Build the new command dict in a local variable (new_commands) instead of writing directly to the global
  • Assign to _skill_commands only on success (via else clause on the try/except)
  • On failure, log a warning (logger.warning("skill scan failed; preserving previous command cache", exc_info=True)) instead of silently discarding the error

tests/agent/test_skill_commands.pyTestScanPreservesCacheOnFailure:

  • test_outer_exception_preserves_existing_commands — verifies previously-cached commands survive a failed scan
  • test_outer_exception_logs_warning — verifies the warning is logged (no more silent failure)
  • test_empty_scan_replaces_cache — verifies successful empty scans still correctly replace the cache (no false preservation)
  • test_reload_skills_preserves_commands_on_scan_failure — verifies reload_skills() diff does not report skills as "removed" when the scan fails

Testing

All 45 existing + 4 new tests pass:

tests/agent/test_skill_commands.py  (39 tests — 35 existing + 4 new)
tests/agent/test_skill_commands_reload.py  (6 tests)

Zero regressions.

Design rationale

The fix follows the pattern suggested in the issue: build locally, assign on success. This is the same atomicity pattern used by reload_skills() (which snapshots before state and diffs against it). The else clause ensures the global is only touched when the scan completes without raising — if any outer exception occurs, the previous cache is preserved intact.

…nds()

scan_skill_commands() unconditionally cleared the module-level
_skill_commands dict before entering the try block. When scanning failed
(e.g. unreadable directory, import error), the exception was silently
swallowed and all 90+ skill slash commands were lost with zero user-facing
error.

Build the new dict in a local variable and assign to the global only on
success. On failure, log a warning and keep the previous cache intact.

Closes NousResearch#18659
@alt-glitch alt-glitch added type/bug Something isn't working comp/agent Core agent loop, run_agent.py, prompt builder tool/skills Skills system (list, view, manage) P2 Medium — degraded but workaround exists labels May 2, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Likely duplicate of #18668 — same fix (atomic local-build-then-swap) for the same root cause in scan_skill_commands() clearing _skill_commands before the try block. Both fix #18659.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists tool/skills Skills system (list, view, manage) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: scan_skill_commands unconditionally clears _skill_commands before try block, silently loses all skills on scan failure

2 participants