Skip to content

fix(cron): split scanner so skill prose stops false-positiving exfil patterns#32339

Merged
teknium1 merged 1 commit into
mainfrom
fix/cron-scanner-skill-prose-fp
May 26, 2026
Merged

fix(cron): split scanner so skill prose stops false-positiving exfil patterns#32339
teknium1 merged 1 commit into
mainfrom
fix/cron-scanner-skill-prose-fp

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Summary

Splits the cron prompt scanner into two tiers so legitimate skill markdown stops false-positiving on command-shape prose. Fixes the silent killing of every cron job that loads a skill containing security postmortems or runbooks.

The bug

Every PR-scout cron job (kilocode, codex, opencode, gemini-cli, cline, roocode, aider, openclaw, nanoclaw, ironclaw, claude-code — 11 jobs total) has been silently erroring out with:

Blocked: prompt matches threat pattern 'read_secrets'. Cron prompts must not contain injection or exfiltration payloads.

All 11 attach the bundled hermes-agent-dev skill. That skill contains a security postmortem section describing a previous vulnerability in prose:

namespace isolation blocked /proc/environ secret recovery, but the same child process could just cat ~/.hermes/.env. Locking the window while the door is open.

The runtime scanner added in #3968 doesn't distinguish "prose describing an attack" from "an actual exfiltration command." It just regex-matches cat\s+[^\n]*\.env against the assembled prompt and blocks.

This isn't theoretical — every PR-scout cron job has been broken for weeks. last_status: error on all 11.

Root cause

_scan_cron_prompt was designed for the small user-authored cron prompt at create/update time. Strict matching is correct there: a legit cron prompt has no reason to say cat .env or rm -rf /. #3968 then reused the same patterns to scan the assembled prompt at runtime — which includes loaded skill markdown. Skill bodies are large markdown documents that legitimately discuss attack patterns in security docs.

The strict pattern set has no way to tell prose from a command via regex.

Fix

Two pattern tiers, routed by context:

  • _scan_cron_prompt (strict, unchanged) — runs at create/update on the small user-authored prompt, and as runtime defense-in-depth when no skills are attached. All existing strict patterns preserved.

  • _scan_cron_skill_assembled (new, looser) — runs at runtime on the assembled prompt when skills are attached. Catches only patterns whose phrasing is unambiguous in any context:

    • prompt_injection (ignore previous/all/above/prior instructions)
    • disregard_rules (disregard your/all/any instructions/rules/guidelines)
    • sys_prompt_override (system prompt override)
    • deception_hide (do not tell the user)
    • invisible-unicode markers

    Command-shape patterns (read_secrets, ssh_backdoor, sudoers_mod, destructive_root_rm) are dropped because they false-positive on legitimate prose.

Skill bodies are already scanned at install time by skills_guard.py. The runtime cron scan is purely a tripwire for an obvious prompt-injection directive surviving a malicious skill install — not a substitute for install-time vetting.

Threat model

Still blocked:

  • Skill body containing literal ignore all previous instructionsprompt_injection
  • Skill body containing disregard your guidelinesdisregard_rules
  • Skill body containing do not tell the userdeception_hide
  • Skill body containing invisible unicode markers — invisible_unicode_*
  • User cron prompt containing any strict pattern (incl. command-shape ones) at create or runtime

Now allowed (was the bug):

  • Skill prose describing attack commands ("the attacker could just cat ~/.hermes/.env") in postmortems, runbooks, security docs

Tests

tests/tools/test_cronjob_tools.py ........................................................ [56 tests, all passing]
tests/cron/test_cron_prompt_injection_skill.py ........................... [11 tests, all passing]
tests/tools/test_cron_prompt_injection.py ........ [8 tests, all passing]

Wider surface verified: 544/544 across tests/cron/, tests/tools/test_threat_patterns.py, tests/tools/test_memory_tool.py, tests/tools/test_cronjob_tools.py.

Changes:

  • Strict scanner tests (TestScanCronPrompt, 24 tests) preserved as-is — bare command patterns still block in user prompts.
  • New TestScanCronSkillAssembled class (6 tests) covers the looser scanner: clean passes, injection still blocks, invisible unicode still blocks, emoji ZWJ still allowed, descriptive attack-command prose now allowed, GitHub auth-header still allowed.
  • test_skill_with_env_exfil_payload_raises (which planted cat ~/.hermes/.env in a skill body and expected a block) replaced with test_skill_with_env_exfil_command_in_prose_is_allowed — same shape but documents the new correct behavior using the actual postmortem prose that caused the production bug.

Live verification

Confirmed end-to-end against the real broken job:

Job: kilocode-pr-scout, prompt-len=4747, skills=['hermes-agent-dev']
SUCCESS — assembled prompt length: 107041
Contains skill content marker: True

Was previously raising CronPromptInjectionBlocked: read_secrets and silently writing a "BLOCKED" output file every Monday.

Why not just whitelist the github skill again?

PR #3968 was followed by 783d117 ("avoid github skill false positives in scanner") — a narrow allowlist for curl -H "Authorization: token $GITHUB_TOKEN" https://api.github.com. This PR is the second wave of the same bug, this time hitting a different pattern. Rather than playing whack-a-mole with one allowlist per false-positive, this fix addresses the underlying conflation of "strict patterns for user prompts" with "patterns for skill markdown."

…sitiving

The runtime cron prompt scanner (added in #3968 to plug the
"malicious skill carrying an injection payload" gap) reuses the same
critical-severity patterns as the create-time user-prompt scan against
the *assembled* prompt — which includes loaded skill markdown.

That works fine for narrow patterns like "ignore previous instructions"
which never legitimately appear in prose. It catastrophically false-
positives on command-shape patterns like `cat ~/.hermes/.env`,
`authorized_keys`, `/etc/sudoers`, and `rm -rf /`, which routinely
appear in security postmortems and runbooks as **descriptive prose**
about attacks, not as actual commands.

Concrete failure: the bundled `hermes-agent-dev` skill contains a
security postmortem section saying "the attacker could just
`cat ~/.hermes/.env`". Every PR-scout cron job that loaded this skill
was silently blocked with `Blocked: prompt matches threat pattern
'read_secrets'`. All 11 scout jobs failed for weeks.

Fix: split the scanner into two tiers and route by context:

  - `_scan_cron_prompt` (strict, unchanged behavior) runs against
    the small user-authored cron prompt at create/update and as a
    runtime defense-in-depth when no skills are attached. A legit
    user prompt has no business saying `cat .env`, so the strict
    patterns still apply there.

  - `_scan_cron_skill_assembled` (new, looser) runs against the
    assembled prompt when skills are attached. It only catches
    unambiguous prompt-injection directives ("ignore previous
    instructions", "disregard your rules", "system prompt override",
    "do not tell the user") plus invisible-unicode markers. Command-
    shape patterns are dropped because they false-positive on prose.

This is defense-in-depth, not the only line of defense. Skill bodies
are already scanned at install time by `skills_guard.py`; the runtime
cron scan exists purely as a tripwire for an obvious injection
directive surviving a malicious install. Catching prose mentions of
commands was never the goal of #3968 — the test that planted a skill
containing `cat ~/.hermes/.env` was the wrong shape of test for the
threat model.

Tests:
- `_scan_cron_prompt` strict behavior preserved (56 existing tests
  unchanged: bare `cat .env`, `rm -rf /`, etc. still block).
- New `TestScanCronSkillAssembled` class verifies the looser scanner:
  injection / disregard / system-override / do-not-tell-the-user /
  invisible-unicode still block; descriptive prose about attack
  commands is allowed; GitHub auth-header allowlist still works.
- `test_skill_with_env_exfil_payload_raises` (planted `cat .env`
  in skill body) replaced with `test_skill_with_env_exfil_command
  _in_prose_is_allowed` documenting the new correct behavior with
  the real-world postmortem-style example that triggered the bug.
- All 11 originally-failing PR-scout jobs validated end-to-end via
  `_build_job_prompt` — assembled prompts now build successfully
  with the `hermes-agent-dev` skill attached.

Total: 75/75 tests in cron + cronjob_tools + threat scanner pass;
544/544 across the wider cron / memory / threat-pattern surface.
@github-actions

Copy link
Copy Markdown
Contributor

🔎 Lint report: fix/cron-scanner-skill-prose-fp vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9355 on HEAD, 9355 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4953 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@teknium1 teknium1 merged commit ccd8993 into main May 26, 2026
26 checks passed
@teknium1 teknium1 deleted the fix/cron-scanner-skill-prose-fp branch May 26, 2026 01:20
@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/cron Cron scheduler and job management tool/skills Skills system (list, view, manage) labels May 26, 2026
bridge25 pushed a commit to bridge25/hermes-agent that referenced this pull request May 27, 2026
…sitiving (NousResearch#32339)

The runtime cron prompt scanner (added in NousResearch#3968 to plug the
"malicious skill carrying an injection payload" gap) reuses the same
critical-severity patterns as the create-time user-prompt scan against
the *assembled* prompt — which includes loaded skill markdown.

That works fine for narrow patterns like "ignore previous instructions"
which never legitimately appear in prose. It catastrophically false-
positives on command-shape patterns like `cat ~/.hermes/.env`,
`authorized_keys`, `/etc/sudoers`, and `rm -rf /`, which routinely
appear in security postmortems and runbooks as **descriptive prose**
about attacks, not as actual commands.

Concrete failure: the bundled `hermes-agent-dev` skill contains a
security postmortem section saying "the attacker could just
`cat ~/.hermes/.env`". Every PR-scout cron job that loaded this skill
was silently blocked with `Blocked: prompt matches threat pattern
'read_secrets'`. All 11 scout jobs failed for weeks.

Fix: split the scanner into two tiers and route by context:

  - `_scan_cron_prompt` (strict, unchanged behavior) runs against
    the small user-authored cron prompt at create/update and as a
    runtime defense-in-depth when no skills are attached. A legit
    user prompt has no business saying `cat .env`, so the strict
    patterns still apply there.

  - `_scan_cron_skill_assembled` (new, looser) runs against the
    assembled prompt when skills are attached. It only catches
    unambiguous prompt-injection directives ("ignore previous
    instructions", "disregard your rules", "system prompt override",
    "do not tell the user") plus invisible-unicode markers. Command-
    shape patterns are dropped because they false-positive on prose.

This is defense-in-depth, not the only line of defense. Skill bodies
are already scanned at install time by `skills_guard.py`; the runtime
cron scan exists purely as a tripwire for an obvious injection
directive surviving a malicious install. Catching prose mentions of
commands was never the goal of NousResearch#3968 — the test that planted a skill
containing `cat ~/.hermes/.env` was the wrong shape of test for the
threat model.

Tests:
- `_scan_cron_prompt` strict behavior preserved (56 existing tests
  unchanged: bare `cat .env`, `rm -rf /`, etc. still block).
- New `TestScanCronSkillAssembled` class verifies the looser scanner:
  injection / disregard / system-override / do-not-tell-the-user /
  invisible-unicode still block; descriptive prose about attack
  commands is allowed; GitHub auth-header allowlist still works.
- `test_skill_with_env_exfil_payload_raises` (planted `cat .env`
  in skill body) replaced with `test_skill_with_env_exfil_command
  _in_prose_is_allowed` documenting the new correct behavior with
  the real-world postmortem-style example that triggered the bug.
- All 11 originally-failing PR-scout jobs validated end-to-end via
  `_build_job_prompt` — assembled prompts now build successfully
  with the `hermes-agent-dev` skill attached.

Total: 75/75 tests in cron + cronjob_tools + threat scanner pass;
544/544 across the wider cron / memory / threat-pattern surface.
mathias3 pushed a commit to mathias3/hermes-agent that referenced this pull request May 28, 2026
…sitiving (NousResearch#32339)

The runtime cron prompt scanner (added in NousResearch#3968 to plug the
"malicious skill carrying an injection payload" gap) reuses the same
critical-severity patterns as the create-time user-prompt scan against
the *assembled* prompt — which includes loaded skill markdown.

That works fine for narrow patterns like "ignore previous instructions"
which never legitimately appear in prose. It catastrophically false-
positives on command-shape patterns like `cat ~/.hermes/.env`,
`authorized_keys`, `/etc/sudoers`, and `rm -rf /`, which routinely
appear in security postmortems and runbooks as **descriptive prose**
about attacks, not as actual commands.

Concrete failure: the bundled `hermes-agent-dev` skill contains a
security postmortem section saying "the attacker could just
`cat ~/.hermes/.env`". Every PR-scout cron job that loaded this skill
was silently blocked with `Blocked: prompt matches threat pattern
'read_secrets'`. All 11 scout jobs failed for weeks.

Fix: split the scanner into two tiers and route by context:

  - `_scan_cron_prompt` (strict, unchanged behavior) runs against
    the small user-authored cron prompt at create/update and as a
    runtime defense-in-depth when no skills are attached. A legit
    user prompt has no business saying `cat .env`, so the strict
    patterns still apply there.

  - `_scan_cron_skill_assembled` (new, looser) runs against the
    assembled prompt when skills are attached. It only catches
    unambiguous prompt-injection directives ("ignore previous
    instructions", "disregard your rules", "system prompt override",
    "do not tell the user") plus invisible-unicode markers. Command-
    shape patterns are dropped because they false-positive on prose.

This is defense-in-depth, not the only line of defense. Skill bodies
are already scanned at install time by `skills_guard.py`; the runtime
cron scan exists purely as a tripwire for an obvious injection
directive surviving a malicious install. Catching prose mentions of
commands was never the goal of NousResearch#3968 — the test that planted a skill
containing `cat ~/.hermes/.env` was the wrong shape of test for the
threat model.

Tests:
- `_scan_cron_prompt` strict behavior preserved (56 existing tests
  unchanged: bare `cat .env`, `rm -rf /`, etc. still block).
- New `TestScanCronSkillAssembled` class verifies the looser scanner:
  injection / disregard / system-override / do-not-tell-the-user /
  invisible-unicode still block; descriptive prose about attack
  commands is allowed; GitHub auth-header allowlist still works.
- `test_skill_with_env_exfil_payload_raises` (planted `cat .env`
  in skill body) replaced with `test_skill_with_env_exfil_command
  _in_prose_is_allowed` documenting the new correct behavior with
  the real-world postmortem-style example that triggered the bug.
- All 11 originally-failing PR-scout jobs validated end-to-end via
  `_build_job_prompt` — assembled prompts now build successfully
  with the `hermes-agent-dev` skill attached.

Total: 75/75 tests in cron + cronjob_tools + threat scanner pass;
544/544 across the wider cron / memory / threat-pattern surface.
Bryce-huang pushed a commit to wbkunlun/hermes-agent that referenced this pull request May 29, 2026
…sitiving (NousResearch#32339)

The runtime cron prompt scanner (added in NousResearch#3968 to plug the
"malicious skill carrying an injection payload" gap) reuses the same
critical-severity patterns as the create-time user-prompt scan against
the *assembled* prompt — which includes loaded skill markdown.

That works fine for narrow patterns like "ignore previous instructions"
which never legitimately appear in prose. It catastrophically false-
positives on command-shape patterns like `cat ~/.hermes/.env`,
`authorized_keys`, `/etc/sudoers`, and `rm -rf /`, which routinely
appear in security postmortems and runbooks as **descriptive prose**
about attacks, not as actual commands.

Concrete failure: the bundled `hermes-agent-dev` skill contains a
security postmortem section saying "the attacker could just
`cat ~/.hermes/.env`". Every PR-scout cron job that loaded this skill
was silently blocked with `Blocked: prompt matches threat pattern
'read_secrets'`. All 11 scout jobs failed for weeks.

Fix: split the scanner into two tiers and route by context:

  - `_scan_cron_prompt` (strict, unchanged behavior) runs against
    the small user-authored cron prompt at create/update and as a
    runtime defense-in-depth when no skills are attached. A legit
    user prompt has no business saying `cat .env`, so the strict
    patterns still apply there.

  - `_scan_cron_skill_assembled` (new, looser) runs against the
    assembled prompt when skills are attached. It only catches
    unambiguous prompt-injection directives ("ignore previous
    instructions", "disregard your rules", "system prompt override",
    "do not tell the user") plus invisible-unicode markers. Command-
    shape patterns are dropped because they false-positive on prose.

This is defense-in-depth, not the only line of defense. Skill bodies
are already scanned at install time by `skills_guard.py`; the runtime
cron scan exists purely as a tripwire for an obvious injection
directive surviving a malicious install. Catching prose mentions of
commands was never the goal of NousResearch#3968 — the test that planted a skill
containing `cat ~/.hermes/.env` was the wrong shape of test for the
threat model.

Tests:
- `_scan_cron_prompt` strict behavior preserved (56 existing tests
  unchanged: bare `cat .env`, `rm -rf /`, etc. still block).
- New `TestScanCronSkillAssembled` class verifies the looser scanner:
  injection / disregard / system-override / do-not-tell-the-user /
  invisible-unicode still block; descriptive prose about attack
  commands is allowed; GitHub auth-header allowlist still works.
- `test_skill_with_env_exfil_payload_raises` (planted `cat .env`
  in skill body) replaced with `test_skill_with_env_exfil_command
  _in_prose_is_allowed` documenting the new correct behavior with
  the real-world postmortem-style example that triggered the bug.
- All 11 originally-failing PR-scout jobs validated end-to-end via
  `_build_job_prompt` — assembled prompts now build successfully
  with the `hermes-agent-dev` skill attached.

Total: 75/75 tests in cron + cronjob_tools + threat scanner pass;
544/544 across the wider cron / memory / threat-pattern surface.
#AI commit#
mosaiq-systems pushed a commit to mosaiq-systems/hermes-agent that referenced this pull request May 29, 2026
…sitiving (NousResearch#32339)

The runtime cron prompt scanner (added in NousResearch#3968 to plug the
"malicious skill carrying an injection payload" gap) reuses the same
critical-severity patterns as the create-time user-prompt scan against
the *assembled* prompt — which includes loaded skill markdown.

That works fine for narrow patterns like "ignore previous instructions"
which never legitimately appear in prose. It catastrophically false-
positives on command-shape patterns like `cat ~/.hermes/.env`,
`authorized_keys`, `/etc/sudoers`, and `rm -rf /`, which routinely
appear in security postmortems and runbooks as **descriptive prose**
about attacks, not as actual commands.

Concrete failure: the bundled `hermes-agent-dev` skill contains a
security postmortem section saying "the attacker could just
`cat ~/.hermes/.env`". Every PR-scout cron job that loaded this skill
was silently blocked with `Blocked: prompt matches threat pattern
'read_secrets'`. All 11 scout jobs failed for weeks.

Fix: split the scanner into two tiers and route by context:

  - `_scan_cron_prompt` (strict, unchanged behavior) runs against
    the small user-authored cron prompt at create/update and as a
    runtime defense-in-depth when no skills are attached. A legit
    user prompt has no business saying `cat .env`, so the strict
    patterns still apply there.

  - `_scan_cron_skill_assembled` (new, looser) runs against the
    assembled prompt when skills are attached. It only catches
    unambiguous prompt-injection directives ("ignore previous
    instructions", "disregard your rules", "system prompt override",
    "do not tell the user") plus invisible-unicode markers. Command-
    shape patterns are dropped because they false-positive on prose.

This is defense-in-depth, not the only line of defense. Skill bodies
are already scanned at install time by `skills_guard.py`; the runtime
cron scan exists purely as a tripwire for an obvious injection
directive surviving a malicious install. Catching prose mentions of
commands was never the goal of NousResearch#3968 — the test that planted a skill
containing `cat ~/.hermes/.env` was the wrong shape of test for the
threat model.

Tests:
- `_scan_cron_prompt` strict behavior preserved (56 existing tests
  unchanged: bare `cat .env`, `rm -rf /`, etc. still block).
- New `TestScanCronSkillAssembled` class verifies the looser scanner:
  injection / disregard / system-override / do-not-tell-the-user /
  invisible-unicode still block; descriptive prose about attack
  commands is allowed; GitHub auth-header allowlist still works.
- `test_skill_with_env_exfil_payload_raises` (planted `cat .env`
  in skill body) replaced with `test_skill_with_env_exfil_command
  _in_prose_is_allowed` documenting the new correct behavior with
  the real-world postmortem-style example that triggered the bug.
- All 11 originally-failing PR-scout jobs validated end-to-end via
  `_build_job_prompt` — assembled prompts now build successfully
  with the `hermes-agent-dev` skill attached.

Total: 75/75 tests in cron + cronjob_tools + threat scanner pass;
544/544 across the wider cron / memory / threat-pattern surface.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…sitiving (NousResearch#32339)

The runtime cron prompt scanner (added in NousResearch#3968 to plug the
"malicious skill carrying an injection payload" gap) reuses the same
critical-severity patterns as the create-time user-prompt scan against
the *assembled* prompt — which includes loaded skill markdown.

That works fine for narrow patterns like "ignore previous instructions"
which never legitimately appear in prose. It catastrophically false-
positives on command-shape patterns like `cat ~/.hermes/.env`,
`authorized_keys`, `/etc/sudoers`, and `rm -rf /`, which routinely
appear in security postmortems and runbooks as **descriptive prose**
about attacks, not as actual commands.

Concrete failure: the bundled `hermes-agent-dev` skill contains a
security postmortem section saying "the attacker could just
`cat ~/.hermes/.env`". Every PR-scout cron job that loaded this skill
was silently blocked with `Blocked: prompt matches threat pattern
'read_secrets'`. All 11 scout jobs failed for weeks.

Fix: split the scanner into two tiers and route by context:

  - `_scan_cron_prompt` (strict, unchanged behavior) runs against
    the small user-authored cron prompt at create/update and as a
    runtime defense-in-depth when no skills are attached. A legit
    user prompt has no business saying `cat .env`, so the strict
    patterns still apply there.

  - `_scan_cron_skill_assembled` (new, looser) runs against the
    assembled prompt when skills are attached. It only catches
    unambiguous prompt-injection directives ("ignore previous
    instructions", "disregard your rules", "system prompt override",
    "do not tell the user") plus invisible-unicode markers. Command-
    shape patterns are dropped because they false-positive on prose.

This is defense-in-depth, not the only line of defense. Skill bodies
are already scanned at install time by `skills_guard.py`; the runtime
cron scan exists purely as a tripwire for an obvious injection
directive surviving a malicious install. Catching prose mentions of
commands was never the goal of NousResearch#3968 — the test that planted a skill
containing `cat ~/.hermes/.env` was the wrong shape of test for the
threat model.

Tests:
- `_scan_cron_prompt` strict behavior preserved (56 existing tests
  unchanged: bare `cat .env`, `rm -rf /`, etc. still block).
- New `TestScanCronSkillAssembled` class verifies the looser scanner:
  injection / disregard / system-override / do-not-tell-the-user /
  invisible-unicode still block; descriptive prose about attack
  commands is allowed; GitHub auth-header allowlist still works.
- `test_skill_with_env_exfil_payload_raises` (planted `cat .env`
  in skill body) replaced with `test_skill_with_env_exfil_command
  _in_prose_is_allowed` documenting the new correct behavior with
  the real-world postmortem-style example that triggered the bug.
- All 11 originally-failing PR-scout jobs validated end-to-end via
  `_build_job_prompt` — assembled prompts now build successfully
  with the `hermes-agent-dev` skill attached.

Total: 75/75 tests in cron + cronjob_tools + threat scanner pass;
544/544 across the wider cron / memory / threat-pattern surface.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/cron Cron scheduler and job management P1 High — major feature broken, no workaround tool/skills Skills system (list, view, manage) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants