[Bug]: scan_skill_commands unconditionally clears _skill_commands before try block, silently loses all skills on scan failure #18659

@SimbaKingjoe

Bug Description

agent/skill_commands.py:222 unconditionally sets _skill_commands = {} before entering the try block that populates it. If any exception occurs between the clear and the first successful skill addition, all 90+ skill slash commands are silently lost with zero user-facing error.

# agent/skill_commands.py:221-227
global _skill_commands
_skill_commands = {}           # ← cleared BEFORE try — this is the problem
try:
    from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, ...
    from agent.skill_utils import get_external_skills_dirs, iter_skill_index_files
    disabled = _get_disabled_skill_names()
    ...
except Exception:
    pass                        # ← exception swallowed, _skill_commands stays empty
return _skill_commands

The _skill_commands = {} assignment at line 222 runs on every call, before any scanning has been attempted. The entire population logic then lives inside a try/except Exception: pass that silently discards any failure, so when the exception path is taken the global stays empty and the previously cached commands are gone.

Steps to Reproduce

  1. Make the skills directory unreadable: chmod 000 ~/.hermes/skills
  2. Start a new hermes session: hermes
  3. Observe that all skill slash commands are unavailable (e.g. /apple-notes, /claude-code, etc.)
  4. (Optionally restore permissions: chmod 755 ~/.hermes/skills, then run /reload-skills — skills will reappear because reload_skills() snapshots before={} and the rescan now succeeds, showing all 90 skills as "added")

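Alternatively, insert a temporary raise ImportError as the first statement inside the try block (immediately after line 223) to simulate a broken import chain, then trigger /reload-skills. The raise must sit inside the try block; placed between lines 222 and 223 it would run before try: and propagate instead of being swallowed.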

Expected Behavior

If scanning fails, the previously cached _skill_commands should be preserved. The user should still be able to use all previously loaded skill slash commands. An error should be surfaced (to the log, and ideally to the user) so the failure doesn't go unnoticed.

Actual Behavior

  • _skill_commands is unconditionally cleared to {} before the scan attempt.
  • If scanning fails, the exception is silently swallowed (except Exception: pass).
  • The user sees No new skills detected. 📚 0 skill(s) available or an alarming "90 skills removed" diff, with no indication that an error occurred.
  • All skill slash commands become unavailable.
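Both messages in the third bullet are what a before/after comparison would produce when the "after" snapshot is empty. The sketch below illustrates this under the assumption that the reload path diffs command names; diff_commands is a hypothetical helper, not the real reload_skills() implementation.

```python
from typing import Any, Dict, List, Tuple


def diff_commands(
    before: Dict[str, Any], after: Dict[str, Any]
) -> Tuple[List[str], List[str]]:
    """Return (added, removed) command names between two snapshots."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    return added, removed


# Reload case: 90 commands were cached, then a failed scan left the cache empty.
before = {f"/skill-{i}": {} for i in range(90)}
added, removed = diff_commands(before, {})
print(len(added), len(removed))

# Startup case: nothing was cached and the scan failed, so nothing changed.
startup_added, startup_removed = diff_commands({}, {})
print(len(startup_added), len(startup_removed))
```

In both cases the result is indistinguishable from a legitimate change (or lack of change) on disk, which is why the failure goes unnoticed without an explicit error.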

Affected Component

agent/skill_commands.py: the scan_skill_commands() function (lines 221-277), and by extension reload_skills() (lines 287-349) and get_skill_commands() (lines 280-284).

Also affected: cli.py:1868 (module-level _skill_commands = scan_skill_commands()), cli.py:6490 (/reload-skills handler), gateway/run.py:4890 (gateway /reload-skills handler), tui_gateway/server.py:4008 (TUI skill command listing).

Debug Report

N/A: there is no crash or stack trace. The bug manifests as silent data loss caused by the except Exception: pass on lines 275-276. Inspecting the log at ~/.hermes/logs/agent.log would show no error.

Proposed Fix

Build the new dict in a local variable, assign to the global only on success:

def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
    global _skill_commands
    new_commands: Dict[str, Dict[str, Any]] = {}
    try:
        ...
        # build new_commands instead of _skill_commands throughout
    except Exception:
        pass  # keep old _skill_commands intact; log the error
    else:
        _skill_commands = new_commands
    return _skill_commands

Operating System

macOS

Labels

  • P2: Medium, degraded but workaround exists
  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • tool/skills: Skills system (list, view, manage)
  • type/bug: Something isn't working
