fix(cron): split scanner so skill prose stops false-positiving exfil patterns by teknium1 · Pull Request #32339 · NousResearch/hermes-agent

teknium1 · 2026-05-26T01:13:01Z

Summary

Splits the cron prompt scanner into two tiers so legitimate skill markdown stops false-positiving on command-shape prose. Fixes the silent killing of every cron job that loads a skill containing security postmortems or runbooks.

The bug

Every PR-scout cron job (kilocode, codex, opencode, gemini-cli, cline, roocode, aider, openclaw, nanoclaw, ironclaw, claude-code — 11 jobs total) has been silently erroring out with:

Blocked: prompt matches threat pattern 'read_secrets'. Cron prompts must not contain injection or exfiltration payloads.

All 11 attach the bundled hermes-agent-dev skill. That skill contains a security postmortem section describing a previous vulnerability in prose:

namespace isolation blocked /proc/environ secret recovery, but the same child process could just cat ~/.hermes/.env. Locking the window while the door is open.

The runtime scanner added in #3968 doesn't distinguish "prose describing an attack" from "an actual exfiltration command." It just regex-matches cat\s+[^\n]*\.env against the assembled prompt and blocks.

This isn't theoretical — every PR-scout cron job has been broken for weeks. last_status: error on all 11.

Root cause

_scan_cron_prompt was designed for the small user-authored cron prompt at create/update time. Strict matching is correct there: a legit cron prompt has no reason to say cat .env or rm -rf /. #3968 then reused the same patterns to scan the assembled prompt at runtime — which includes loaded skill markdown. Skill bodies are large markdown documents that legitimately discuss attack patterns in security docs.

The strict pattern set has no way to tell prose from a command via regex.

Fix

Two pattern tiers, routed by context:

_scan_cron_prompt (strict, unchanged) — runs at create/update on the small user-authored prompt, and as runtime defense-in-depth when no skills are attached. All existing strict patterns preserved.
_scan_cron_skill_assembled (new, looser) — runs at runtime on the assembled prompt when skills are attached. Catches only patterns whose phrasing is unambiguous in any context:
- prompt_injection (ignore previous/all/above/prior instructions)
- disregard_rules (disregard your/all/any instructions/rules/guidelines)
- sys_prompt_override (system prompt override)
- deception_hide (do not tell the user)
- invisible-unicode markers
Command-shape patterns (read_secrets, ssh_backdoor, sudoers_mod, destructive_root_rm) are dropped because they false-positive on legitimate prose.

Skill bodies are already scanned at install time by skills_guard.py. The runtime cron scan is purely a tripwire for an obvious prompt-injection directive surviving a malicious skill install — not a substitute for install-time vetting.

Threat model

Still blocked:

Skill body containing literal ignore all previous instructions — prompt_injection
Skill body containing disregard your guidelines — disregard_rules
Skill body containing do not tell the user — deception_hide
Skill body containing invisible unicode markers — invisible_unicode_*
User cron prompt containing any strict pattern (incl. command-shape ones) at create or runtime

Now allowed (was the bug):

Skill prose describing attack commands ("the attacker could just cat ~/.hermes/.env") in postmortems, runbooks, security docs

Tests

tests/tools/test_cronjob_tools.py ........................................................ [56 tests, all passing]
tests/cron/test_cron_prompt_injection_skill.py ........................... [11 tests, all passing]
tests/tools/test_cron_prompt_injection.py ........ [8 tests, all passing]

Wider surface verified: 544/544 across tests/cron/, tests/tools/test_threat_patterns.py, tests/tools/test_memory_tool.py, tests/tools/test_cronjob_tools.py.

Changes:

Strict scanner tests (TestScanCronPrompt, 24 tests) preserved as-is — bare command patterns still block in user prompts.
New TestScanCronSkillAssembled class (6 tests) covers the looser scanner: clean passes, injection still blocks, invisible unicode still blocks, emoji ZWJ still allowed, descriptive attack-command prose now allowed, GitHub auth-header still allowed.
test_skill_with_env_exfil_payload_raises (which planted cat ~/.hermes/.env in a skill body and expected a block) replaced with test_skill_with_env_exfil_command_in_prose_is_allowed — same shape but documents the new correct behavior using the actual postmortem prose that caused the production bug.

Live verification

Confirmed end-to-end against the real broken job:

Job: kilocode-pr-scout, prompt-len=4747, skills=['hermes-agent-dev']
SUCCESS — assembled prompt length: 107041
Contains skill content marker: True

Was previously raising CronPromptInjectionBlocked: read_secrets and silently writing a "BLOCKED" output file every Monday.

Why not just whitelist the github skill again?

PR #3968 was followed by 783d117 ("avoid github skill false positives in scanner") — a narrow allowlist for curl -H "Authorization: token $GITHUB_TOKEN" https://api.github.com. This PR is the second wave of the same bug, this time hitting a different pattern. Rather than playing whack-a-mole with one allowlist per false-positive, this fix addresses the underlying conflation of "strict patterns for user prompts" with "patterns for skill markdown."

…sitiving The runtime cron prompt scanner (added in #3968 to plug the "malicious skill carrying an injection payload" gap) reuses the same critical-severity patterns as the create-time user-prompt scan against the *assembled* prompt — which includes loaded skill markdown. That works fine for narrow patterns like "ignore previous instructions" which never legitimately appear in prose. It catastrophically false- positives on command-shape patterns like `cat ~/.hermes/.env`, `authorized_keys`, `/etc/sudoers`, and `rm -rf /`, which routinely appear in security postmortems and runbooks as **descriptive prose** about attacks, not as actual commands. Concrete failure: the bundled `hermes-agent-dev` skill contains a security postmortem section saying "the attacker could just `cat ~/.hermes/.env`". Every PR-scout cron job that loaded this skill was silently blocked with `Blocked: prompt matches threat pattern 'read_secrets'`. All 11 scout jobs failed for weeks. Fix: split the scanner into two tiers and route by context: - `_scan_cron_prompt` (strict, unchanged behavior) runs against the small user-authored cron prompt at create/update and as a runtime defense-in-depth when no skills are attached. A legit user prompt has no business saying `cat .env`, so the strict patterns still apply there. - `_scan_cron_skill_assembled` (new, looser) runs against the assembled prompt when skills are attached. It only catches unambiguous prompt-injection directives ("ignore previous instructions", "disregard your rules", "system prompt override", "do not tell the user") plus invisible-unicode markers. Command- shape patterns are dropped because they false-positive on prose. This is defense-in-depth, not the only line of defense. Skill bodies are already scanned at install time by `skills_guard.py`; the runtime cron scan exists purely as a tripwire for an obvious injection directive surviving a malicious install. Catching prose mentions of commands was never the goal of #3968 — the test that planted a skill containing `cat ~/.hermes/.env` was the wrong shape of test for the threat model. Tests: - `_scan_cron_prompt` strict behavior preserved (56 existing tests unchanged: bare `cat .env`, `rm -rf /`, etc. still block). - New `TestScanCronSkillAssembled` class verifies the looser scanner: injection / disregard / system-override / do-not-tell-the-user / invisible-unicode still block; descriptive prose about attack commands is allowed; GitHub auth-header allowlist still works. - `test_skill_with_env_exfil_payload_raises` (planted `cat .env` in skill body) replaced with `test_skill_with_env_exfil_command _in_prose_is_allowed` documenting the new correct behavior with the real-world postmortem-style example that triggered the bug. - All 11 originally-failing PR-scout jobs validated end-to-end via `_build_job_prompt` — assembled prompts now build successfully with the `hermes-agent-dev` skill attached. Total: 75/75 tests in cron + cronjob_tools + threat scanner pass; 544/544 across the wider cron / memory / threat-pattern surface.

github-actions · 2026-05-26T01:13:40Z

🔎 Lint report: `fix/cron-scanner-skill-prose-fp` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9355 on HEAD, 9355 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4953 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

…sitiving (NousResearch#32339) The runtime cron prompt scanner (added in NousResearch#3968 to plug the "malicious skill carrying an injection payload" gap) reuses the same critical-severity patterns as the create-time user-prompt scan against the *assembled* prompt — which includes loaded skill markdown. That works fine for narrow patterns like "ignore previous instructions" which never legitimately appear in prose. It catastrophically false- positives on command-shape patterns like `cat ~/.hermes/.env`, `authorized_keys`, `/etc/sudoers`, and `rm -rf /`, which routinely appear in security postmortems and runbooks as **descriptive prose** about attacks, not as actual commands. Concrete failure: the bundled `hermes-agent-dev` skill contains a security postmortem section saying "the attacker could just `cat ~/.hermes/.env`". Every PR-scout cron job that loaded this skill was silently blocked with `Blocked: prompt matches threat pattern 'read_secrets'`. All 11 scout jobs failed for weeks. Fix: split the scanner into two tiers and route by context: - `_scan_cron_prompt` (strict, unchanged behavior) runs against the small user-authored cron prompt at create/update and as a runtime defense-in-depth when no skills are attached. A legit user prompt has no business saying `cat .env`, so the strict patterns still apply there. - `_scan_cron_skill_assembled` (new, looser) runs against the assembled prompt when skills are attached. It only catches unambiguous prompt-injection directives ("ignore previous instructions", "disregard your rules", "system prompt override", "do not tell the user") plus invisible-unicode markers. Command- shape patterns are dropped because they false-positive on prose. This is defense-in-depth, not the only line of defense. Skill bodies are already scanned at install time by `skills_guard.py`; the runtime cron scan exists purely as a tripwire for an obvious injection directive surviving a malicious install. Catching prose mentions of commands was never the goal of NousResearch#3968 — the test that planted a skill containing `cat ~/.hermes/.env` was the wrong shape of test for the threat model. Tests: - `_scan_cron_prompt` strict behavior preserved (56 existing tests unchanged: bare `cat .env`, `rm -rf /`, etc. still block). - New `TestScanCronSkillAssembled` class verifies the looser scanner: injection / disregard / system-override / do-not-tell-the-user / invisible-unicode still block; descriptive prose about attack commands is allowed; GitHub auth-header allowlist still works. - `test_skill_with_env_exfil_payload_raises` (planted `cat .env` in skill body) replaced with `test_skill_with_env_exfil_command _in_prose_is_allowed` documenting the new correct behavior with the real-world postmortem-style example that triggered the bug. - All 11 originally-failing PR-scout jobs validated end-to-end via `_build_job_prompt` — assembled prompts now build successfully with the `hermes-agent-dev` skill attached. Total: 75/75 tests in cron + cronjob_tools + threat scanner pass; 544/544 across the wider cron / memory / threat-pattern surface.

…sitiving (NousResearch#32339) The runtime cron prompt scanner (added in NousResearch#3968 to plug the "malicious skill carrying an injection payload" gap) reuses the same critical-severity patterns as the create-time user-prompt scan against the *assembled* prompt — which includes loaded skill markdown. That works fine for narrow patterns like "ignore previous instructions" which never legitimately appear in prose. It catastrophically false- positives on command-shape patterns like `cat ~/.hermes/.env`, `authorized_keys`, `/etc/sudoers`, and `rm -rf /`, which routinely appear in security postmortems and runbooks as **descriptive prose** about attacks, not as actual commands. Concrete failure: the bundled `hermes-agent-dev` skill contains a security postmortem section saying "the attacker could just `cat ~/.hermes/.env`". Every PR-scout cron job that loaded this skill was silently blocked with `Blocked: prompt matches threat pattern 'read_secrets'`. All 11 scout jobs failed for weeks. Fix: split the scanner into two tiers and route by context: - `_scan_cron_prompt` (strict, unchanged behavior) runs against the small user-authored cron prompt at create/update and as a runtime defense-in-depth when no skills are attached. A legit user prompt has no business saying `cat .env`, so the strict patterns still apply there. - `_scan_cron_skill_assembled` (new, looser) runs against the assembled prompt when skills are attached. It only catches unambiguous prompt-injection directives ("ignore previous instructions", "disregard your rules", "system prompt override", "do not tell the user") plus invisible-unicode markers. Command- shape patterns are dropped because they false-positive on prose. This is defense-in-depth, not the only line of defense. Skill bodies are already scanned at install time by `skills_guard.py`; the runtime cron scan exists purely as a tripwire for an obvious injection directive surviving a malicious install. Catching prose mentions of commands was never the goal of NousResearch#3968 — the test that planted a skill containing `cat ~/.hermes/.env` was the wrong shape of test for the threat model. Tests: - `_scan_cron_prompt` strict behavior preserved (56 existing tests unchanged: bare `cat .env`, `rm -rf /`, etc. still block). - New `TestScanCronSkillAssembled` class verifies the looser scanner: injection / disregard / system-override / do-not-tell-the-user / invisible-unicode still block; descriptive prose about attack commands is allowed; GitHub auth-header allowlist still works. - `test_skill_with_env_exfil_payload_raises` (planted `cat .env` in skill body) replaced with `test_skill_with_env_exfil_command _in_prose_is_allowed` documenting the new correct behavior with the real-world postmortem-style example that triggered the bug. - All 11 originally-failing PR-scout jobs validated end-to-end via `_build_job_prompt` — assembled prompts now build successfully with the `hermes-agent-dev` skill attached. Total: 75/75 tests in cron + cronjob_tools + threat scanner pass; 544/544 across the wider cron / memory / threat-pattern surface. #AI commit#

…sitiving (NousResearch#32339) The runtime cron prompt scanner (added in NousResearch#3968 to plug the "malicious skill carrying an injection payload" gap) reuses the same critical-severity patterns as the create-time user-prompt scan against the *assembled* prompt — which includes loaded skill markdown. That works fine for narrow patterns like "ignore previous instructions" which never legitimately appear in prose. It catastrophically false- positives on command-shape patterns like `cat ~/.hermes/.env`, `authorized_keys`, `/etc/sudoers`, and `rm -rf /`, which routinely appear in security postmortems and runbooks as **descriptive prose** about attacks, not as actual commands. Concrete failure: the bundled `hermes-agent-dev` skill contains a security postmortem section saying "the attacker could just `cat ~/.hermes/.env`". Every PR-scout cron job that loaded this skill was silently blocked with `Blocked: prompt matches threat pattern 'read_secrets'`. All 11 scout jobs failed for weeks. Fix: split the scanner into two tiers and route by context: - `_scan_cron_prompt` (strict, unchanged behavior) runs against the small user-authored cron prompt at create/update and as a runtime defense-in-depth when no skills are attached. A legit user prompt has no business saying `cat .env`, so the strict patterns still apply there. - `_scan_cron_skill_assembled` (new, looser) runs against the assembled prompt when skills are attached. It only catches unambiguous prompt-injection directives ("ignore previous instructions", "disregard your rules", "system prompt override", "do not tell the user") plus invisible-unicode markers. Command- shape patterns are dropped because they false-positive on prose. This is defense-in-depth, not the only line of defense. Skill bodies are already scanned at install time by `skills_guard.py`; the runtime cron scan exists purely as a tripwire for an obvious injection directive surviving a malicious install. Catching prose mentions of commands was never the goal of NousResearch#3968 — the test that planted a skill containing `cat ~/.hermes/.env` was the wrong shape of test for the threat model. Tests: - `_scan_cron_prompt` strict behavior preserved (56 existing tests unchanged: bare `cat .env`, `rm -rf /`, etc. still block). - New `TestScanCronSkillAssembled` class verifies the looser scanner: injection / disregard / system-override / do-not-tell-the-user / invisible-unicode still block; descriptive prose about attack commands is allowed; GitHub auth-header allowlist still works. - `test_skill_with_env_exfil_payload_raises` (planted `cat .env` in skill body) replaced with `test_skill_with_env_exfil_command _in_prose_is_allowed` documenting the new correct behavior with the real-world postmortem-style example that triggered the bug. - All 11 originally-failing PR-scout jobs validated end-to-end via `_build_job_prompt` — assembled prompts now build successfully with the `hermes-agent-dev` skill attached. Total: 75/75 tests in cron + cronjob_tools + threat scanner pass; 544/544 across the wider cron / memory / threat-pattern surface.

teknium1 merged commit ccd8993 into main May 26, 2026
26 checks passed

teknium1 deleted the fix/cron-scanner-skill-prose-fp branch May 26, 2026 01:20

alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/cron Cron scheduler and job management tool/skills Skills system (list, view, manage) labels May 26, 2026

Haderach-Ram mentioned this pull request May 26, 2026

Ecosystem Digest — 2026-05-26 Haderach-Ram/openclaw-radar#19

Open

alt-glitch mentioned this pull request Jun 10, 2026

fix(cron): don't strict-scan script-injected output in no-skills jobs #43223

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(cron): split scanner so skill prose stops false-positiving exfil patterns#32339

fix(cron): split scanner so skill prose stops false-positiving exfil patterns#32339
teknium1 merged 1 commit into
mainfrom
fix/cron-scanner-skill-prose-fp

teknium1 commented May 26, 2026

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

teknium1 commented May 26, 2026

Summary

The bug

Root cause

Fix

Threat model

Tests

Live verification

Why not just whitelist the github skill again?

Uh oh!

github-actions Bot commented May 26, 2026

🔎 Lint report: fix/cron-scanner-skill-prose-fp vs origin/main

ruff

ty (type checker)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

🔎 Lint report: `fix/cron-scanner-skill-prose-fp` vs `origin/main`