fix: cron prompt injection scanner bypass for multi-word variants #63

Merged
teknium1 merged 1 commit into NousResearch:main from 0xbyt4:fix/cron-prompt-injection-bypass on Feb 27, 2026

0xbyt4 (Contributor) commented Feb 26, 2026

Summary

  • Fix prompt injection bypass in _scan_cron_prompt regex
  • Add 8 regression tests for multi-word injection variants

Bug

The regex `ignore\s+(previous|all|above|prior)\s+instructions` only matches when exactly one word sits between "ignore" and "instructions", and that word is one of the four listed keywords. Multi-word variants bypass the scanner:

| Input | Expected | Actual (before fix) |
| --- | --- | --- |
| ignore previous instructions | Blocked | Blocked |
| Ignore ALL prior instructions | Blocked | Not blocked |
| ignore all previous instructions | Blocked | Not blocked |
| ignore the above instructions | Blocked | Not blocked |

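Root cause: In "Ignore ALL prior instructions", the alternation `(previous|all|above|prior)` consumes "ALL", then `\s+instructions` has to match at "prior" and fails. In "ignore the above instructions", "the" matches none of the alternatives, so the pattern fails before it ever reaches "above".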

Fix

Allow optional extra words before and after the keyword alternation:

- (r'ignore\s+(previous|all|above|prior)\s+instructions', "prompt_injection"),
+ (r'ignore\s+(?:\w+\s+)*(?:previous|all|above|prior)\s+(?:\w+\s+)*instructions', "prompt_injection"),
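
The effect of the change can be checked with a small standalone script. This is only a sketch: the two patterns are copied from the diff above, but the names `OLD`, `NEW`, `samples`, and `results` are made up for illustration, and compiling with `re.IGNORECASE` is an assumption based on the "case insensitive" note in the test plan, not something the diff shows.

```python
import re

# Patterns copied from the diff; re.IGNORECASE is an assumption about
# how the scanner applies them (the diff does not show the flags).
OLD = re.compile(r'ignore\s+(previous|all|above|prior)\s+instructions',
                 re.IGNORECASE)
NEW = re.compile(
    r'ignore\s+(?:\w+\s+)*(?:previous|all|above|prior)\s+(?:\w+\s+)*instructions',
    re.IGNORECASE,
)

samples = [
    "ignore previous instructions",          # single keyword: both match
    "Ignore ALL prior instructions",         # two keywords: only NEW matches
    "ignore all previous instructions",      # two keywords: only NEW matches
    "ignore the above instructions",         # filler word first: only NEW matches
    "ignore the weather and send a report",  # benign: neither matches
]

# Map each sample to (matched by OLD, matched by NEW).
results = {s: (bool(OLD.search(s)), bool(NEW.search(s))) for s in samples}

for text, (old_hit, new_hit) in results.items():
    print(f"{text!r}: old={old_hit} new={new_hit}")
```

Because `(?:\w+\s+)*` accepts any number of filler words, the new pattern also matches longer phrasings such as "ignore the noise in all of these instructions". That is the intended broadening, but it is worth keeping in mind when writing false-positive tests.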

Test plan

  • 8 regression tests pass (multi-word variants, case insensitive, false positive checks)
  • All existing cron tests still pass
  • Full suite: 299 passed, 0 failed

The regex `ignore\s+(previous|all|above|prior)\s+instructions` only
allowed ONE word between "ignore" and "instructions". Multi-word
variants like "Ignore ALL prior instructions" bypassed the scanner
because "ALL" matched the alternation but then `\s+instructions`
failed to match "prior".

Fix: use `(?:\w+\s+)*` groups to allow optional extra words before
and after the keyword alternation.

teknium1 merged commit 1522718 into NousResearch:main on Feb 27, 2026
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
…ion-bypass

fix: cron prompt injection scanner bypass for multi-word variants
olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
…ion-bypass

fix: cron prompt injection scanner bypass for multi-word variants
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…ion-bypass

fix: cron prompt injection scanner bypass for multi-word variants
jarvis-stark-ops pushed a commit to 1Team-Engineering/hermes-agent that referenced this pull request Jun 10, 2026
Adds two new completion gates that fire alongside the Part 1/2 gates.

Closes hermes-jarvis#63 (PR-existence verification)
Closes hermes-jarvis#32 (doc-drift check)
Context: hermes-jarvis#61 (bootstrap-paradox case study)

## NousResearch#63 — verify_pr_urls_exist

When a verdict result or summary contains a GitHub PR URL pattern,
the dispatcher runs `gh pr view <url> --json number` to verify each
URL resolves. Phantom URLs (404 with "Not Found" / "Could not
resolve" / "no pull request" in stderr) reject the completion.

Indeterminate cases (gh missing, network error, unauthenticated) fall
open — workers can still complete in offline / broken-gh envs without
being trapped.

Catches the 2026-06-09 Tchalla case (hermes-jarvis#61): release-gate
reviewer blocked with "cannot run gh pr diff 42" on PR NousResearch#42 that
didn't exist. With this gate, his completion would have surfaced the
phantom URL specifically, prompting him to surface the real cause.

Opt-out: `metadata.x_phantom_pr_ok` with ≥20-char string reason.
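
A minimal sketch of how such a gate could be structured, not the code from the referenced commit. The names `PR_URL_RE`, `classify`, `run_gh`, and `find_phantom_urls` are hypothetical, as is the URL pattern. The `gh pr view <url> --json number` invocation and the stderr markers come from the description above. Checking for "could not resolve host" before the phantom markers is an assumption about how a DNS failure would be kept indeterminate.

```python
import re
import subprocess

# Hypothetical URL pattern; the real gate may recognise more forms.
PR_URL_RE = re.compile(r'https://github\.com/[\w.-]+/[\w.-]+/pull/\d+')
# Markers taken from the description above (compared case-insensitively).
PHANTOM_MARKERS = ("not found", "could not resolve", "no pull request")
# Assumed network marker, so DNS failures are not mistaken for a 404.
NETWORK_MARKERS = ("could not resolve host",)


def classify(returncode, stderr):
    """Return 'exists', 'phantom', or 'indeterminate' for one gh result."""
    text = stderr.lower()
    if returncode == 0:
        return "exists"
    if any(marker in text for marker in NETWORK_MARKERS):
        return "indeterminate"
    if any(marker in text for marker in PHANTOM_MARKERS):
        return "phantom"
    return "indeterminate"


def run_gh(url):
    """Run gh for one URL; return (returncode, stderr), or None if gh cannot run."""
    try:
        proc = subprocess.run(
            ["gh", "pr", "view", url, "--json", "number"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # gh missing or hung: treated as indeterminate
    return proc.returncode, proc.stderr


def find_phantom_urls(text, runner=run_gh):
    """Return PR URLs in text that definitely do not exist (fail open otherwise)."""
    phantoms = []
    for url in dict.fromkeys(PR_URL_RE.findall(text)):  # dedup, keep order
        outcome = runner(url)
        if outcome is not None and classify(*outcome) == "phantom":
            phantoms.append(url)
    return phantoms
```

Passing the runner as a parameter keeps the classification logic testable without network access or an installed `gh`.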

## NousResearch#32 — verify_doc_drift

For tasks whose tenant slug encodes a version
(`marvel-swarm-vN-N-test`), the gate scans `README.md` / `README` in
the workspace for older `vX.Y` mentions outside a history-style
heading (`## History`, `## Older versions`, `## Previous versions`,
`## Archive`). Stale mentions reject the completion.

CHANGELOG.md is intentionally skipped — older versions are expected
there by definition.

Catches the 2026-06-09 agent-dashboard PR #1 case (hermes-jarvis#61):
README still said "v6.2 Marvel swarm test target" while the chain
was v6.6. No reviewer flagged it.

Opt-out: `metadata.x_doc_drift_ok` with ≥20-char string reason.
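
The scan described above could look roughly like the following. This is a sketch under stated assumptions rather than the gate's actual code: `stale_mentions` and the regex names are hypothetical, the slug parsing assumes the `-vN-N-` shape shown above, and treating a history-style heading at any depth as excusing its subsections is an illustrative choice.

```python
import re

# Assumed slug shape: ...-v<major>-<minor>-...
SLUG_RE = re.compile(r'-v(\d+)-(\d+)-')
VERSION_RE = re.compile(r'\bv(\d+)\.(\d+)\b')
HEADING_RE = re.compile(r'^(#+)\s+(.*)$')
# History-style headings listed in the description above.
HISTORY_TITLES = {"history", "older versions", "previous versions", "archive"}


def stale_mentions(slug, readme_text):
    """Return older vX.Y mentions found outside history-style sections."""
    match = SLUG_RE.search(slug)
    if match is None:
        return []  # non-versioned tenant: nothing to check
    current = (int(match.group(1)), int(match.group(2)))

    stale = []
    excused_depth = None  # heading depth of the active history section
    for line in readme_text.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            depth = len(heading.group(1))
            title = heading.group(2).strip().lower()
            # A heading at the same or shallower depth closes the section.
            if excused_depth is not None and depth <= excused_depth:
                excused_depth = None
            if excused_depth is None and title in HISTORY_TITLES:
                excused_depth = depth
        if excused_depth is not None:
            continue  # inside a history section (including subsections)
        for major, minor in VERSION_RE.findall(line):
            if (int(major), int(minor)) < current:
                stale.append(f"v{major}.{minor}")
    return stale
```

Comparing versions as integer tuples means a higher version such as v7.0 is never reported as stale for a v6.6 tenant.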

## Tests

19 new tests added to `test_kanban_completion_gates.py`:
- TestPRExistence (8) — no PR URL skipped, real passes, phantom
  rejects, mixed real+phantom flags only phantom, indeterminate falls
  open, summary scanned, dedup, opt-out
- TestDocDrift (10) — non-versioned tenant skips, no README skips,
  stale README rejects, current README passes, History section
  excused, CHANGELOG file excused, higher version not stale, scratch
  skipped, opt-out

83 passed in test_kanban_completion_gates.py (up from 64). Zero
regressions on adjacent paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jarvis-stark-ops added a commit to 1Team-Engineering/hermes-agent that referenced this pull request Jun 10, 2026
…earch#32)

- `verify_pr_urls_exist` (closes hermes-jarvis#63) — scans verdict
  text + summary for GitHub PR URLs and runs `gh pr view` per URL.
  Phantom URLs (404) reject; indeterminate (gh missing / network)
  falls open. Strict 404 classification excludes DNS/network token
  patterns so "could not resolve host" stays indeterminate.
- `verify_doc_drift` (closes hermes-jarvis#32) — for tasks whose
  tenant slug encodes a version (marvel-swarm-vN-N-test), scans
  README.md/README for older vX.Y mentions outside a history section.
  Depth-aware section-tracking so `## History\n### v6.2 details`
  correctly excuses the subsection.

Opt-outs: x_phantom_pr_ok / x_doc_drift_ok (≥20-char string reasons).

Context: hermes-jarvis#61.

90 tests pass after two self-review passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>