[Bug]: _build_skill_message fallback rglob can inject thousands of files into context from nested dependency dirs #18675

@SimbaKingjoe

Bug Description

agent/skill_commands.py:179-186 contains an unfiltered rglob("*") fallback that recursively scans the supporting-file directories. The fallback runs when skill_view() returns linked_files: null, which happens when none of the four subdirectories (references/, templates/, scripts/, assets/) contains a file matching skill_view's extension-based patterns. It then scans all four subdirectories with rglob("*"), with no directory exclusions, no depth limit, and no file-count cap.

If a skill contains, for example, scripts/node_modules/ or scripts/.venv/ with thousands of files, every one of them is listed in the supporting files block and injected into the context.

Root cause: scan rule mismatch

skill_view uses non-recursive glob(ext) for scripts/ and references/:

# skills_tool.py:1216 — NON-recursive, extension-filtered
for ext in ["*.py", "*.sh", "*.bash", "*.js", "*.ts", "*.rb"]:
    script_files.extend([str(f.relative_to(skill_dir)) for f in scripts_dir.glob(ext)])

The fallback in _build_skill_message uses recursive rglob("*") with no filter:

# skill_commands.py:183 — RECURSIVE, unfiltered
for f in sorted(subdir_path.rglob("*")):
    if f.is_file() and not f.is_symlink():
        rel = str(f.relative_to(skill_dir))
        supporting.append(rel)

When script_files is empty (e.g. scripts/ only contains node_modules/ with zero top-level .py/.sh/.js files), skill_view returns linked_files: null, and the unfiltered rglob fallback takes over.
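The mismatch between the two scan rules can be reproduced in isolation. The following is a minimal sketch, not the project's code: it builds a throwaway directory (the name demo-skill and the single index.js file are illustrative assumptions) and runs both styles of scan over the same scripts/ directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical skill layout: scripts/ holds only a nested dependency dir.
    skill_dir = Path(tmp) / "demo-skill"
    nested = skill_dir / "scripts" / "node_modules" / "fake-dep"
    nested.mkdir(parents=True)
    (nested / "index.js").write_text("// nested dependency file\n")

    scripts_dir = skill_dir / "scripts"

    # skill_view-style scan: top level only, filtered by extension.
    top_level = [
        f.relative_to(skill_dir).as_posix()
        for ext in ("*.py", "*.sh", "*.bash", "*.js", "*.ts", "*.rb")
        for f in scripts_dir.glob(ext)
    ]

    # Fallback-style scan: recursive, no filter.
    recursive = [
        f.relative_to(skill_dir).as_posix()
        for f in sorted(scripts_dir.rglob("*"))
        if f.is_file() and not f.is_symlink()
    ]

print(top_level)   # [] because index.js is not at the top level of scripts/
print(recursive)   # ['scripts/node_modules/fake-dep/index.js']
```

The empty first result is what leads to linked_files: null, and the second result is what the fallback then lists.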

Steps to Reproduce

# 1. Create a skill with a nested dependency directory
mkdir -p ~/.hermes/skills/test-rglob/scripts/node_modules/fake-dep/lib
cat > ~/.hermes/skills/test-rglob/SKILL.md << 'EOF'
---
name: test-rglob
description: Test skill
---
# Test
EOF

# 2. Populate node_modules with many files (no top-level .py/.sh/.js in scripts/)
for i in $(seq 1 5000); do
  echo "// file $i" > ~/.hermes/skills/test-rglob/scripts/node_modules/fake-dep/lib/module_$i.js
done

# 3. Invoke the skill — context gets 5000+ file paths injected
# The supporting files block in the skill message will list every file
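For readers who would rather not write into ~/.hermes, the same scenario can be approximated in a temporary directory. This is a sketch under assumptions: build_fixture and unfiltered_listing are hypothetical helper names, and unfiltered_listing only mirrors the fallback loop quoted above rather than calling the real _build_skill_message.

```python
import tempfile
from pathlib import Path


def build_fixture(root: Path, n_files: int) -> Path:
    """Create a throwaway skill whose scripts/ holds only a nested node_modules/."""
    skill_dir = root / "test-rglob"
    lib = skill_dir / "scripts" / "node_modules" / "fake-dep" / "lib"
    lib.mkdir(parents=True)
    (skill_dir / "SKILL.md").write_text(
        "---\nname: test-rglob\ndescription: Test skill\n---\n# Test\n"
    )
    for i in range(1, n_files + 1):
        (lib / f"module_{i}.js").write_text(f"// file {i}\n")
    return skill_dir


def unfiltered_listing(skill_dir: Path) -> list[str]:
    """Mirror the fallback loop: recursive, no exclusions, no cap."""
    supporting = []
    for subdir in ("references", "templates", "scripts", "assets"):
        subdir_path = skill_dir / subdir
        if subdir_path.exists():
            for f in sorted(subdir_path.rglob("*")):
                if f.is_file() and not f.is_symlink():
                    supporting.append(f.relative_to(skill_dir).as_posix())
    return supporting


with tempfile.TemporaryDirectory() as tmp:
    listing = unfiltered_listing(build_fixture(Path(tmp), 5000))

# Report how many paths, and roughly how many characters, would be listed.
print(len(listing), "paths,", sum(len(p) + 1 for p in listing), "characters")
```

The character count is only a rough proxy for context cost; actual token usage depends on the tokenizer.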

Expected Behavior

  1. rglob should skip well-known large directories: .git, node_modules, .venv, venv, __pycache__, .tox, .pytest_cache, .mypy_cache, .ruff_cache
  2. There should be a hard cap on supporting file count (e.g. 200)
  3. Empty files should be skipped
  4. The scanning rules between skill_view and the fallback should be consistent — if skill_view intentionally uses non-recursive glob, the fallback shouldn't silently upgrade to recursive rglob
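Points 1 to 3 above can be sketched as a single bounded scanner. This is an illustration under assumptions, not the project's implementation: list_supporting_files, SKIP_DIRS, and MAX_SUPPORTING_FILES are hypothetical names, and the sketch swaps rglob for os.walk so that excluded directories are pruned before they are traversed rather than filtered afterwards.

```python
import os
from pathlib import Path

# Hypothetical names; the skip list follows the directories named above.
SKIP_DIRS = frozenset({
    ".git", "node_modules", ".venv", "venv", "__pycache__",
    ".tox", ".pytest_cache", ".mypy_cache", ".ruff_cache",
})
MAX_SUPPORTING_FILES = 200


def list_supporting_files(
    skill_dir: Path, limit: int = MAX_SUPPORTING_FILES
) -> tuple[list[str], bool]:
    """Return (paths, truncated) for non-empty files under the four subdirs."""
    found: list[str] = []
    for subdir in ("references", "templates", "scripts", "assets"):
        root = skill_dir / subdir
        if not root.is_dir():
            continue
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune in place so os.walk never descends into skipped dirs.
            dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
            for name in sorted(filenames):
                path = Path(dirpath) / name
                if path.is_symlink() or not path.is_file():
                    continue
                if path.stat().st_size == 0:
                    continue  # skip empty files
                if len(found) >= limit:
                    return found, True  # at least one more file exists
                found.append(path.relative_to(skill_dir).as_posix())
    return found, False
```

Returning the truncation flag separately lets the caller decide how to word the notice, and the flag is only set when a file beyond the limit actually exists. The ordering is deterministic but can differ from sorted(rglob("*")), because os.walk lists a directory's files before descending into its subdirectories.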

Actual Behavior

The fallback unconditionally traverses every subdirectory with rglob("*"), potentially dumping tens of thousands of file paths into the skill message. This inflates context usage (token cost + cache invalidation), and in extreme cases can cause the agent call to fail due to context overflow.

Affected Component

  • Primary: agent/skill_commands.py:179-186, _build_skill_message() fallback supporting-files scan
  • Related: tools/skills_tool.py:1210, skill_view() also uses rglob("*") for assets/; the same problem applies, but it is less likely to trigger because assets/ rarely contains dependency directories
  • Downstream: build_skill_invocation_message() and build_preloaded_skills_prompt() — both call _build_skill_message

Debug Report

N/A — no crash or stack trace. The bug manifests as silently inflated context with thousands of unnecessary file paths. The agent call may succeed but incurs significant token overhead and cache invalidation.

Proposed Fix

--- a/agent/skill_commands.py
+++ b/agent/skill_commands.py
@@ -176,11 +176,30 @@ def _build_skill_message(
         if isinstance(entries, list):
             supporting.extend(entries)

+    _SKIP_DIRS = frozenset({
+        ".git", ".hg", ".svn",
+        "node_modules", ".venv", "venv", ".tox",
+        "__pycache__", ".pytest_cache", ".mypy_cache", ".ruff_cache",
+    })
+    _MAX_SUPPORTING_FILES = 200
+
     if not supporting and skill_dir:
         for subdir in ("references", "templates", "scripts", "assets"):
             subdir_path = skill_dir / subdir
             if subdir_path.exists():
                 for f in sorted(subdir_path.rglob("*")):
-                    if f.is_file() and not f.is_symlink():
+                    if any(part in _SKIP_DIRS for part in f.relative_to(skill_dir).parts):
+                        continue
+                    if f.is_file() and not f.is_symlink() and f.stat().st_size > 0:
                         rel = str(f.relative_to(skill_dir))
                         supporting.append(rel)
+                        if len(supporting) >= _MAX_SUPPORTING_FILES:
+                            supporting.append(
+                                f"... (truncated, {_MAX_SUPPORTING_FILES} file limit)"
+                            )
+                            break
+                if len(supporting) >= _MAX_SUPPORTING_FILES:
+                    break
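When matching path components against a skip set, it matters whether the components come from the absolute path or from the path relative to the skill directory. The sketch below illustrates the difference with pure paths; the skill location under a directory named venv is a hypothetical example, and skipped_absolute and skipped_relative are illustrative helpers, not functions from the codebase.

```python
from pathlib import PurePosixPath

SKIP_DIRS = frozenset({"node_modules", ".venv", "venv", "__pycache__"})


def skipped_absolute(path, skip=SKIP_DIRS):
    """Test every component, including ancestors of the skill directory."""
    return any(part in skip for part in path.parts)


def skipped_relative(path, skill_dir, skip=SKIP_DIRS):
    """Test only the components below the skill directory."""
    return any(part in skip for part in path.relative_to(skill_dir).parts)


# Hypothetical location: the skill happens to live under a dir named "venv".
skill_dir = PurePosixPath("/home/user/venv/skills/demo")
wanted = skill_dir / "scripts" / "run.py"
unwanted = skill_dir / "scripts" / "node_modules" / "dep" / "index.js"

print(skipped_absolute(wanted))               # True: the ancestor "venv" matches
print(skipped_relative(wanted, skill_dir))    # False
print(skipped_relative(unwanted, skill_dir))  # True
```

Testing only the relative components keeps the exclusion from depending on where the skill directory is installed.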

Operating System

macOS (the bug is platform-independent and affects all operating systems)

Metadata

    Labels

      • P2: Medium, degraded but workaround exists
      • comp/agent: Core agent loop, run_agent.py, prompt builder
      • tool/skills: Skills system (list, view, manage)
      • type/bug: Something isn't working
