
fix: allow ZWJ inside emoji grapheme clusters in context/memory/skills scanners #12673

Open
witt3rd wants to merge 1 commit into NousResearch:main from witt3rd:fix/emoji-zwj-scan-false-positive

@witt3rd witt3rd commented Apr 19, 2026


Problem

The invisible-unicode blocklist in _scan_context_content (and siblings in memory_tool and skills_guard) treats ZWJ (U+200D) as categorically malicious. But ZWJ is a required component of emoji grapheme clusters — any gendered emoji (🧙‍♂️, 👩‍⚕️, 🏃‍♀️), family emoji (👨‍👩‍👧), rainbow flag (🏳️‍🌈), or other multi-pictograph emoji is char + ZWJ + char [+ VS16].
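To make the structure of such a sequence concrete, here is a minimal sketch that splits 🧙‍♂️ into its codepoints using only the standard library. The helper name `describe` is hypothetical and is not part of this PR.

```python
import unicodedata


def describe(text):
    """Return (codepoint, name) pairs for every character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# 🧙‍♂️ written with explicit escapes so the invisible parts are visible in source.
wizard = "\U0001F9D9\u200D\u2642\uFE0F"

for codepoint, name in describe(wizard):
    print(codepoint, name)
```

The second entry is U+200D ZERO WIDTH JOINER, the same character the blocklist rejects, sitting between two visible pictographs.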

Any context file (SOUL.md, AGENTS.md, .hermes.md), memory entry, or skill file containing such emoji is silently replaced with:

[BLOCKED: SOUL.md contained potential prompt injection (invisible unicode U+200D). Content not loaded.]

Relationship to #28589

Upstream PR #28589, "fix(cron): allow emoji ZWJ sequences in prompts" (a salvage of the original #28164 by @outsourc-e), fixed exactly this false positive, but only for the cron prompt scanner (_scan_cron_prompt), via a cronjob_tools-local helper, _strip_legitimate_emoji_zwj.

That set the precedent — but three other scanners share the identical invisible-char blocklist and still reject legitimate emoji. This PR completes the job for those three, and factors the context check into a single shared helper rather than adding a fourth copy of the logic. tools/cronjob_tools.py is intentionally left untouched — #28589 already covers it.

Fix

New shared helper utils.find_unsafe_invisibles(content, blocklist):

  • ZWJ between two pictographic codepoints (skipping FE0E/FE0F) — allowed.
  • ZWJ elsewhere (e.g. hello‍world) — flagged.
  • All other blocklisted invisibles (ZWSP, ZWNJ, BOM, bidi overrides, word joiners) — unconditionally flagged.

Pictographic ranges checked: 1F000–1FFFF, 2600–27BF, 2300–23FF, 2B00–2BFF, plus ©, ®, and a handful of other individual symbols. Narrow by design: the check prefers a false positive (flagging a legitimate emoji ZWJ) over a false negative (letting a text-hiding ZWJ past).
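The rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the return type (a list of `(index, char)` pairs), the helper names `is_pictographic` and `_neighbor`, and the handling of string edges are all guesses, and the individual symbols beyond © and ® are omitted.

```python
from typing import Iterable, List, Optional, Tuple

ZWJ = "\u200d"
VARIATION_SELECTORS = {"\ufe0e", "\ufe0f"}

# Ranges listed in the PR description; single symbols other than
# (c) and (R) are left out of this sketch.
PICTOGRAPHIC_RANGES = (
    (0x1F000, 0x1FFFF),
    (0x2600, 0x27BF),
    (0x2300, 0x23FF),
    (0x2B00, 0x2BFF),
)
PICTOGRAPHIC_SINGLES = {0x00A9, 0x00AE}


def is_pictographic(ch: str) -> bool:
    """True if ch falls in one of the narrow pictographic ranges."""
    cp = ord(ch)
    if cp in PICTOGRAPHIC_SINGLES:
        return True
    return any(lo <= cp <= hi for lo, hi in PICTOGRAPHIC_RANGES)


def _neighbor(content: str, i: int, step: int) -> Optional[str]:
    """Nearest character from index i in direction step, skipping FE0E/FE0F."""
    j = i + step
    while 0 <= j < len(content) and content[j] in VARIATION_SELECTORS:
        j += step
    return content[j] if 0 <= j < len(content) else None


def find_unsafe_invisibles(
    content: str, blocklist: Iterable[str]
) -> List[Tuple[int, str]]:
    """Return (index, char) for each blocklisted char that is not a
    ZWJ sitting between two pictographic codepoints."""
    blocked = set(blocklist)
    unsafe = []
    for i, ch in enumerate(content):
        if ch not in blocked:
            continue
        if ch == ZWJ:
            left = _neighbor(content, i, -1)
            right = _neighbor(content, i, +1)
            if (
                left is not None
                and right is not None
                and is_pictographic(left)
                and is_pictographic(right)
            ):
                continue  # ZWJ inside an emoji sequence: allowed
        unsafe.append((i, ch))
    return unsafe


# Assumed example blocklist: ZWSP, ZWNJ, ZWJ, word joiner, BOM, RLO.
BLOCKLIST = ["\u200b", "\u200c", "\u200d", "\u2060", "\ufeff", "\u202e"]
print(find_unsafe_invisibles("wizard \U0001F9D9\u200D\u2642\uFE0F", BLOCKLIST))
print(find_unsafe_invisibles("hello\u200dworld", BLOCKLIST))
```

A caller would treat any non-empty result as grounds to block, which matches the "one unsafe is enough" behaviour described in the tests.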

Callers updated

  • agent/prompt_builder.py::_scan_context_content
  • tools/memory_tool.py::_scan_memory_content
  • tools/skills_guard.py::scan_file

Tests

6 new tests in tests/agent/test_prompt_builder.py::TestScanContextContent:

  • ZWJ inside 🧙‍♂️ — allowed
  • ZWJ inside multi-ZWJ 👨‍👩‍👧 — allowed
  • hello‍world — still blocked
  • Mixed legit emoji + injection ZWJ — blocked (one unsafe is enough)
  • ZWSP adjacent to emoji — still blocked (only ZWJ is context-whitelisted)
  • Existing test_invisible_unicode_blocked — still passes

221/221 tests pass across the affected modules (test_prompt_builder, test_memory_tool, test_skills_guard).

Repro

Before:

from agent.prompt_builder import _scan_context_content
content = "wizard \U0001F9D9\u200D\u2642\uFE0F"  # 🧙‍♂️ = 🧙 + ZWJ + ♂ + VS16
_scan_context_content(content, "SOUL.md")
# → "[BLOCKED: SOUL.md contained potential prompt injection (invisible unicode U+200D). Content not loaded.]"

After: returns content unchanged.

@witt3rd witt3rd force-pushed the fix/emoji-zwj-scan-false-positive branch from 270e8d4 to dbb92aa Compare April 21, 2026 14:51
@alt-glitch alt-glitch added the type/bug, P2, comp/agent, tool/memory, tool/skills, and comp/cron labels Apr 23, 2026
@witt3rd witt3rd force-pushed the fix/emoji-zwj-scan-false-positive branch 4 times, most recently from d904348 to bfa7c1b Compare April 29, 2026 16:35
@witt3rd witt3rd force-pushed the fix/emoji-zwj-scan-false-positive branch 4 times, most recently from 42b0021 to 72f532e Compare May 12, 2026 11:10
@forge-witt3rd


Friendly nudge — this PR ships from a fork, so workflows are gated on maintainer approval and statusCheckRollup is currently empty (no CI has run yet, neither pass nor fail). Tests pass locally; output is shown in the PR description.

If a maintainer could click Approve and run workflows when convenient, the green ticks will land and reviewers will have CI signal to lean on. Happy to rebase or address review comments any time.

Thanks! ⚒️

@witt3rd witt3rd force-pushed the fix/emoji-zwj-scan-false-positive branch from 72f532e to 00d1659 Compare May 18, 2026 15:58
@witt3rd witt3rd force-pushed the fix/emoji-zwj-scan-false-positive branch from 00d1659 to 3841dc0 Compare May 21, 2026 03:52
@witt3rd witt3rd changed the title from "fix: allow ZWJ inside emoji grapheme clusters in context/memory/cron/skills scanners" to "fix: allow ZWJ inside emoji grapheme clusters in context/memory/skills scanners" May 21, 2026
@witt3rd witt3rd force-pushed the fix/emoji-zwj-scan-false-positive branch from 3841dc0 to 517e859 Compare May 29, 2026 07:09
…s scanners

SOUL.md, memory entries, and skill files containing emoji ZWJ sequences
(e.g. 🧙‍♂️ = 🧙 + ZWJ + ♂ + VS16) were being silently blocked as
prompt-injection attempts. ZWJ (U+200D) is in the invisible-char
blocklist for good reason — it can hide text inside benign-looking
strings — but it is also required inside emoji sequences and has no way
to hide anything harmful there.

Upstream PR NousResearch#28589 ("fix(cron): allow emoji ZWJ sequences in prompts",
a salvage of NousResearch#28164) established the precedent for this fix, but only
applied it to the cron prompt scanner via a cronjob_tools-local helper
(_strip_legitimate_emoji_zwj). The identical false positive still
affects the other three scanners that share the same invisible-char
blocklist. This PR completes the job for those three, factoring the
context check into a single shared helper instead of adding a fourth
copy of the logic.

Added shared utils.find_unsafe_invisibles() that context-checks ZWJ:
allowed between two pictographic codepoints (skipping variation
selectors), flagged everywhere else. All other invisibles in the
blocklist remain unconditionally flagged.

Callers updated:
- agent/prompt_builder.py (_scan_context_content — blocks SOUL.md et al.)
- tools/memory_tool.py (_scan_memory_content — blocks memory add/update)
- tools/skills_guard.py (scan_file — blocks skill install)

tools/cronjob_tools.py is intentionally left untouched — PR NousResearch#28589
already fixes _scan_cron_prompt.

Adds 6 tests covering:
- ZWJ inside 🧙‍♂️ (gendered emoji) — allowed
- Multi-ZWJ family emoji 👨‍👩‍👧 — allowed
- ZWJ between letters (classic injection shape) — still blocked
- Mixed legit emoji + injection ZWJ — blocked (at least one unsafe ZWJ)
- ZWSP adjacent to emoji — still blocked (only ZWJ is context-whitelisted)

221/221 tests pass across the affected test modules.

Motivation: a user SOUL.md containing 🧙‍♂️ was being silently blocked from
loading, with a [BLOCKED: ... invisible unicode U+200D] marker leaking
into the system prompt in place of the actual identity content. The
scan was shooting itself in the foot on a legitimate, widely used emoji sequence.
@witt3rd witt3rd force-pushed the fix/emoji-zwj-scan-false-positive branch from 517e859 to f647451 Compare May 29, 2026 13:26