Skip to content

scan_skill_commands: unconditionally clears _skill_commands before try block #18656

@SimbaKingjoe

Description

@SimbaKingjoe

Description

agent/skill_commands.py:222 unconditionally sets _skill_commands = {} before entering the try block that populates it. If any exception occurs between the clear and the first successful skill addition, all skill slash commands are silently lost.

# agent/skill_commands.py:221-227
global _skill_commands
_skill_commands = {}           # ← cleared BEFORE try
try:
    from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, ...
    from agent.skill_utils import get_external_skills_dirs, iter_skill_index_files
    disabled = _get_disabled_skill_names()
    ...
except Exception:
    pass                        # ← exception swallowed, stays empty
return _skill_commands

Trigger conditions

In current production code, tools.skills_tool is already cached in sys.modules before scan_skill_commands() runs (imported during tool discovery at startup), so the specific from tools.skills_tool import ... line cannot fail. get_disabled_skill_names() and iter_skill_index_files() are also defensively written.

However, the bug can trigger when SKILLS_DIR exists but os.walk fails (permission error, broken filesystem). More importantly, it is a latent risk for any future code added between line 222 and the try block, or any refactor that changes the import order or scanning logic.

Impact

Scenario 1: startup permanently broken

  1. scan_skill_commands() called at startup -> exception -> _skill_commands = {}
  2. User runs /reload-skills -> reload_skills() snapshots before={}, scans fail again -> after={}
  3. User sees: No new skills detected. 0 skill(s) available
  4. All 90+ skill slash commands are gone with zero indication

Scenario 2: hot-reload after startup

  1. Startup succeeds -> 90 skills loaded
  2. Something causes scan_skill_commands() to fail on re-invocation
  3. /reload-skills -> before=90 skills snapshotted, after=0 -> "Removed 90 skills"
  4. User sees 90 skills removed, 0 skill(s) available -- looks intentional but cause is opaque

Fix

Build the new dict in a local variable, assign to the global only on success:

def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
    global _skill_commands
    new_commands: Dict[str, Dict[str, Any]] = {}
    try:
        ...
        # build new_commands instead of _skill_commands
    except Exception:
        pass  # keep old _skill_commands intact
    else:
        _skill_commands = new_commands
    return _skill_commands

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2Medium — degraded but workaround existscomp/agentCore agent loop, run_agent.py, prompt builderduplicateThis issue or pull request already existstool/skillsSkills system (list, view, manage)type/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions