fix(cron): sanitize invisible unicode in vetted skill content instead of hard-blocking #37245

Merged: teknium1 merged 1 commit into main from hermes/hermes-261b4548 on Jun 2, 2026

teknium1 commented Jun 2, 2026

Summary

Cron jobs that load a skill whose body contains a stray invisible-unicode character (zero-width space U+200B, BOM, bidi control) were permanently dead. The skills-attached prompt scan hard-blocked on any such char, even though skill bodies are already vetted at install time by skills_guard.py and these chars routinely appear in copy-pasted unicode docs and code examples.

The skills path now strips invisibles (logging the codepoints) and runs the cleaned prompt. The raw user-prompt path keeps the hard block — that is the actual #3968 injection surface, where a tiny directive prompt with a ZWSP is a smoking gun, not prose.
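
As a rough illustration of the two policies described above, here is a minimal sketch. It is not the code from this PR: the function names, the character class, the single injection pattern, and the error strings are all assumptions made for the example, and emoji ZWJ handling is left out.

```python
import re
from typing import Optional, Tuple

# Assumed character class; emoji ZWJ (U+200D) handling is left out for brevity.
_INVISIBLE_RE = re.compile(
    "[\u200b\u200c\u200e\u200f\u2060\ufeff\u202a-\u202e\u2066-\u2069]"
)
# One stand-in injection pattern; a real scanner would carry many more.
_INJECTION_RE = re.compile(
    r"ignore\s+(?:all\s+)?previous\s+instructions", re.IGNORECASE
)


def scan_raw_prompt(prompt: str) -> Optional[str]:
    """Raw user-prompt path: any invisible char is a hard block."""
    if _INVISIBLE_RE.search(prompt):
        return "blocked: invisible unicode in cron prompt"
    if _INJECTION_RE.search(prompt):
        return "blocked: injection pattern in cron prompt"
    return None


def scan_skill_assembled(prompt: str) -> Tuple[str, Optional[str]]:
    """Skills path: strip invisibles first, then scan the cleaned text."""
    cleaned = _INVISIBLE_RE.sub("", prompt)
    if _INJECTION_RE.search(cleaned):
        return cleaned, "blocked: injection pattern in skill content"
    return cleaned, None


def scan_assembled_cron_prompt(
    prompt: str, has_skills: bool
) -> Tuple[str, Optional[str]]:
    """Hypothetical dispatcher: returns (prompt_to_run, error)."""
    if has_skills:
        return scan_skill_assembled(prompt)
    return prompt, scan_raw_prompt(prompt)
```

Because the skills path matches against the cleaned text, a directive split by a zero-width space (for example `"ig\u200bnore previous instructions"`) is reassembled by the strip and still trips the pattern, while ordinary prose with a stray U+200B passes through cleaned.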

Changes

  • tools/cronjob_tools.py: add _strip_invisible_unicode() (preserves emoji ZWJ); _scan_cron_skill_assembled() now sanitizes invisibles instead of blocking and returns (cleaned_prompt, error).
  • cron/scheduler.py: _scan_assembled_cron_prompt() uses the cleaned prompt for the skills path; raw-prompt path unchanged (still hard-blocks).
  • Tests updated for the new contract + sanitize-not-block behavior.
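
One way a ZWJ-preserving strip could look is sketched below. This is a guess at the shape of `_strip_invisible_unicode()`, not its actual body: the code point set, the emoji range heuristic, the helper names, and the log message are all assumptions for illustration.

```python
import logging
from typing import List, Tuple

logger = logging.getLogger(__name__)

_ZWJ = 0x200D   # zero-width joiner, legitimate inside emoji sequences
_VS16 = 0xFE0F  # emoji variation selector, may sit between emoji and ZWJ

# Assumed set of "invisible" code points; the real list may differ.
_INVISIBLE = frozenset(
    [0x200B, 0x200C, _ZWJ, 0x200E, 0x200F, 0x2060, 0xFEFF]
    + list(range(0x202A, 0x202F))  # bidi embeddings/overrides U+202A..U+202E
    + list(range(0x2066, 0x206A))  # bidi isolates U+2066..U+2069
)


def _looks_like_emoji(ch: str) -> bool:
    """Rough heuristic: common emoji blocks only, not a full property check."""
    cp = ord(ch)
    return 0x1F000 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def _is_emoji_joiner(text: str, i: int) -> bool:
    """True if the ZWJ at index i sits between two emoji-like characters."""
    j = i - 1
    if j >= 0 and ord(text[j]) == _VS16:
        j -= 1  # skip a variation selector before the joiner
    before_ok = j >= 0 and _looks_like_emoji(text[j])
    after_ok = i + 1 < len(text) and _looks_like_emoji(text[i + 1])
    return before_ok and after_ok


def strip_invisible_unicode(text: str) -> Tuple[str, List[str]]:
    """Return (cleaned_text, removed_codepoints), keeping emoji ZWJ."""
    kept: List[str] = []
    removed: List[str] = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp not in _INVISIBLE or (cp == _ZWJ and _is_emoji_joiner(text, i)):
            kept.append(ch)
        else:
            removed.append(f"U+{cp:04X}")
    if removed:
        logger.warning("stripped invisible unicode from skill content: %s", removed)
    return "".join(kept), removed
```

Returning the removed code points alongside the cleaned text keeps the logging decision with the caller if needed; a ZWJ outside an emoji sequence (such as `"a\u200db"`) is treated like any other invisible and dropped.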

Why not the report's A/B/C re-architecture

The scan never touched the system prompt / memories / SOUL files — it only ever scanned cron prompt + loaded skill content. No layer rework needed; the fix is scoped to the one path that false-positives.

Validation

| Scenario | Before | After |
| --- | --- | --- |
| Skill body w/ stray U+200B | cron permanently blocked | char stripped, job runs |
| Raw user prompt w/ U+200B | blocked | still blocked (injection surface) |
| Skill w/ injection hidden behind U+200B | blocked | still blocked (sanitize doesn't bypass) |
| Emoji ZWJ in skill | allowed | allowed (preserved) |

E2E: planted-skill _build_job_prompt() run for all three cases — sanitized build, raw block, injection block all confirmed.
Targeted suites: tests/tools/test_cronjob_tools.py + tests/cron/test_cron_prompt_injection_skill.py → 70/70 pass.

Infographic

[Image: cron-sanitizes-invisible-unicode]

github-actions Bot commented Jun 2, 2026

🔎 Lint report: hermes/hermes-261b4548 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9637 on HEAD, 9637 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4991 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

… of hard-blocking

A stray zero-width space (U+200B), BOM, or bidi control in loaded skill
markdown permanently killed any cron that loaded it. The skills-attached
assembled-prompt scan hard-blocked on any invisible-unicode char, even
though skill bodies are already install-time vetted by skills_guard.py and
the chars commonly appear in copy-pasted unicode docs / code examples.

The skills path now strips invisibles (logging the codepoints) and runs the
cleaned prompt. The raw user-prompt path (_scan_cron_prompt) keeps the hard
block — that is the actual #3968 injection surface, where a small directive
prompt with a ZWSP is a smoking gun, not prose. Stripping does not let a real
injection slip through: the directive still matches after sanitization.

_scan_cron_skill_assembled now returns (cleaned_prompt, error).
teknium1 force-pushed the hermes/hermes-261b4548 branch from dc8d79b to 53be136 June 2, 2026 06:39
teknium1 merged commit 2c0d648 into main Jun 2, 2026
23 checks passed
teknium1 deleted the hermes/hermes-261b4548 branch June 2, 2026 07:29
changman pushed a commit to changman/hermes-agent that referenced this pull request Jun 10, 2026
… of hard-blocking (NousResearch#37245)
