fix(cron): don't strict-scan script-injected output in no-skills jobs#43223
Conversation
The runtime assembled-prompt scan (#3968 lineage) selected its pattern tier on has_skills alone. A script-driven, no-skills job injects its script's stdout into the prompt, and that blob was scanned with the STRICT user-prompt pattern set — so any command-shape string in the data feed (e.g. a triage bot ingesting a bug report that quotes `rm -rf /`) hard-blocked the job on every tick. Script output and context_from output are runtime DATA produced by operator-authored code — the same trust class as install-vetted skill markdown, not a user-authored directive prompt. Select the scan tier by what the assembled prompt CONTAINS: when it includes skill content OR injected data, use the looser _scan_cron_skill_assembled set (keeps unambiguous injection directives, drops command-shape patterns, sanitizes invisible unicode instead of blocking). Defense-in-depth is preserved: - The raw user prompt is still strict-scanned at create/update (api_server paths untouched) AND re-scanned strict at runtime even when the looser tier was selected for the data blob. - Plain no-script/no-skills jobs keep the strict scan on the whole assembled prompt. - Injection directives arriving via script stdout still block. Rejected alternative: removing destructive_root_rm from the strict set or a per-job skip_injection_scan flag — both weaken the guard globally.
🔎 Lint report:
|
tonydwb
left a comment
There was a problem hiding this comment.
Code Review Summary
Verdict: Approved
Looks Good
- Clean fix: adds HERMES_SCRIPT_OUTPUT_STRICT_SCAN to control strict scanning of script-injected output in no-skills cron jobs.
- Defaults to False to avoid false positives in script output, which is the safe default.
- 2 files changed, well-scoped.
- No security concerns.
Reviewed by Hermes Agent
TriageCategory: Repro: confirmed live, not theoretical — a production triage cron ( Genealogy — third false-positive round of the same design flawThis scanner has a recurring bug class: a new content source gets fed to a scan tier calibrated for a different trust class. Each round, a different source; each fix, the same shape.
Root cause of round 3 specifically: #5082 (cron pre-run script injection, merged 2026-04-04) and #32339 (the tier split, merged 2026-05-26) never met. The tier split only considered Precedent for reviewers
Related open work (complementary, no conflicts)
|
…NousResearch#43223) The runtime assembled-prompt scan (NousResearch#3968 lineage) selected its pattern tier on has_skills alone. A script-driven, no-skills job injects its script's stdout into the prompt, and that blob was scanned with the STRICT user-prompt pattern set — so any command-shape string in the data feed (e.g. a triage bot ingesting a bug report that quotes `rm -rf /`) hard-blocked the job on every tick. Script output and context_from output are runtime DATA produced by operator-authored code — the same trust class as install-vetted skill markdown, not a user-authored directive prompt. Select the scan tier by what the assembled prompt CONTAINS: when it includes skill content OR injected data, use the looser _scan_cron_skill_assembled set (keeps unambiguous injection directives, drops command-shape patterns, sanitizes invisible unicode instead of blocking). Defense-in-depth is preserved: - The raw user prompt is still strict-scanned at create/update (api_server paths untouched) AND re-scanned strict at runtime even when the looser tier was selected for the data blob. - Plain no-script/no-skills jobs keep the strict scan on the whole assembled prompt. - Injection directives arriving via script stdout still block. Rejected alternative: removing destructive_root_rm from the strict set or a per-job skip_injection_scan flag — both weaken the guard globally.
TL;DR
Cron jobs that use a
script(orcontext_from) inject collected data into the prompt. That data was being scanned with the strict pattern set meant for the user-typed prompt — so a feed that merely quotes a dangerous command (e.g. a triage bot ingesting a bug report containingrm -rf /) hard-blocked the job on every tick, forever. This PR scans injected data with the looser tier that skill content already uses, while keeping the user prompt strict.Behavior change
The runtime scanner (
_scan_assembled_cron_prompt, added in #21350/#3968) checks the assembled prompt before each tick and picks one of two pattern tiers:_scan_cron_prompt): injection directives + command shapes (rm -rf /,cat .env, exfil-curl, …)_scan_cron_skill_assembled, from fix(cron): split scanner so skill prose stops false-positiving exfil patterns #32339): injection directives only; command shapes dropped; invisible unicode stripped instead of blockedWhat gets which tier, per content type:
skills: [...]on the job)context_fromoutputThe last row is new defense-in-depth: when the loose tier is selected because of injected data, the raw user prompt is additionally re-scanned strict at runtime — so the user-authored surface keeps the full guarantee (it was already strict-scanned at create/update; that path is untouched).
Why the bug existed
Tier selection was keyed on
has_skillsalone. #5082 (script injection, Apr) and #32339 (the tier split, May) never met: a script-driven no-skills job falls into thehas_skills=Falsebranch, so its script stdout inherited user-prompt strictness. The two features composed into a job class the scanner design never considered.has_skillswas the wrong axis. The right axis is whether the assembled text contains runtime-injected content (data/skill markdown — where command-shape strings are expected and benign) vs. is purely the user's directive prompt (whererm -rf /is a smoking gun). Script output is produced by operator-authored code — the same trust class as install-vetted skill markdown.Live incident
A production triage cron (script feed of GitHub issues, no skills, 5-min tick) ran clean for hours, then got BLOCKED (
destructive_root_rm) on every tick the moment an open security issue quoting the root-delete command entered its queue. Its corpus has 112 such open items — bug reports pasting dangerous commands are normal data for this job class. Verified end-to-end post-fix: same job, same offending rows → prompt builds, agent runs.What is NOT weakened
api_server.py, cronjob tool) — untouched.ignore previous instructions, …) arriving via script stdout — still blocked by the loose tier._CRON_THREAT_PATTERNS— unmodified.Rejected alternatives: removing
destructive_root_rmfrom the strict set, or a per-jobskip_injection_scanflag — both weaken the guard globally, and the flag would itself be settable by an injected agent (cron jobs are API/tool-creatable). Skipping scans for trusted content classes was already rejected in #23470; tier-based scanning is the accepted precedent (#32339, #22605).Changes
cron/scheduler.py:_build_job_prompttrackshas_injected_data(script stdout — both success and error paths — andcontext_fromoutput) and passes it plus the rawuser_promptto_scan_assembled_cron_prompt, which now selects the tier onhas_skills OR has_injected_dataand strict-rescans the user prompt on the data-only path.tests/cron/test_cron_prompt_injection_skill.py: 7 new cases (payload literals built at runtime so the test file never contains them) — command shapes in script stdout / failed-script output /context_frompass; injection directive in stdout still blocks; dirty user prompt on a script job still blocks; plain-prompt path stays strict; invisible unicode in feed data sanitized.