fix: allow ZWJ (U+200D) in legitimate emoji sequences #35808
Closes NousResearch#18581

The invisible-unicode scanner in threat_patterns.py flagged ALL U+200D (Zero Width Joiner) occurrences as potential prompt injection, which silently blocked SOUL.md and other context files containing standard emoji sequences like 🤸♀️, 👨👩👧, or ❤️🔥.

Root cause: INVISIBLE_CHARS includes U+200D, and scan_for_threats() performed a simple set intersection without considering that ZWJ is the standard Unicode mechanism (TR#51) for constructing gendered, family, and skin-tone emoji sequences.

Fix: mirror the proven approach from cronjob_tools.py by defining emoji codepoint ranges, skipping VS16 variation selectors, and checking whether each ZWJ sits between emoji codepoints. Strip legitimate emoji ZWJ before the invisible-char scan so that only suspicious ZWJ (between ASCII/CJK/Latin etc.) is flagged.

Changes:
- tools/threat_patterns.py: add _EMOJI_CP_RANGES, _is_emoji_cp(), _zwj_in_emoji_sequence(), and _strip_legitimate_emoji_zwj(); update scan_for_threats() to strip emoji ZWJ before scanning
- tests/tools/test_threat_patterns.py: add 10 new test cases covering emoji ZWJ allowlist, CJK/Latin injection blocklist, mixed content, skin-tone sequences, and edge positions
mohamedorigami-jpg
left a comment
Nice approach with the strip-vs-validate strategy. The test coverage is thorough -- especially the mixed emoji + malicious ZWJ test and the multi-ZWJ family emoji case.
One thing I noticed: the regex currently checks that ZWJ is preceded by an emoji codepoint (via the emoji set) but doesn't validate that the full ZWJ sequence is a known emoji. That's the right call for a detector -- you want under-blocking over false positives, and unknown emoji sequences are harmless anyway.
@mohamedorigami-jpg Thanks for the review! You're absolutely right: validating the full ZWJ sequence against a known-emoji database would be overkill for a threat detector. The emoji-codepoint-neighbour check gives us the right balance: standard emoji sequences pass, while anything else (ASCII/CJK/Latin-adjacent ZWJ) still gets caught. Glad the test coverage looks solid.
tonydwb
left a comment
Code Review Summary — PR #35808
Verdict: Approved ✅
Author: w1ndcn | Type: bugfix | Files: tools/threat_patterns.py, tests/tools/test_threat_patterns.py
Key Points
- Fixes a real false positive: U+200D (ZWJ) inside emoji sequences like 🤸♀️ and 👨👩👧 was flagged as invisible-unicode injection.
- Clean design: _EMOJI_CP_RANGES covers all Unicode 16.0 ZWJ emoji sequences; _zwj_in_emoji_sequence() walks past VS16 before checking neighbours; _strip_legitimate_emoji_zwj() strips before the existing intersection logic.
- Security preserved: ZWJ between ASCII, CJK, or Latin chars is still flagged. In mixed content (emoji + malicious ZWJ), the malicious ZWJ is still caught.
- 10 comprehensive test cases covering all edge cases.
No issues found.
Reviewed by Hermes Agent
Closes #18581
Problem
The invisible-unicode scanner in threat_patterns.py flags all U+200D (Zero Width Joiner) occurrences as potential prompt injection. This silently blocks SOUL.md and other context files that contain standard emoji sequences like 🤸♀️, 👨👩👧, or ❤️🔥, replacing them with [BLOCKED: SOUL.md contained potential prompt injection (invisible unicode U+200D)].

ZWJ is the standard Unicode mechanism (TR#51) for constructing gendered, family, and skin-tone emoji, and it is not an injection vector when used in emoji sequences.
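To make the false positive concrete, here is a minimal sketch of how an ordinary emoji trips a plain set-intersection check. The INVISIBLE_CHARS set below is a reduced, hypothetical stand-in (the real set in threat_patterns.py may contain other characters), and naive_invisible_hits() is an illustrative helper, not a function from the codebase.

```python
# Hypothetical, reduced stand-in for the scanner's INVISIBLE_CHARS set.
INVISIBLE_CHARS = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def naive_invisible_hits(content: str) -> set[str]:
    """Return the invisible characters present, via a plain set intersection."""
    return set(content) & INVISIBLE_CHARS


# "Woman cartwheeling" is encoded as four codepoints:
# U+1F938 (person cartwheeling) + U+200D (ZWJ) + U+2640 (female sign) + U+FE0F (VS16).
woman_cartwheeling = "\U0001F938\u200D\u2640\uFE0F"

print([f"U+{ord(c):04X}" for c in woman_cartwheeling])
# ['U+1F938', 'U+200D', 'U+2640', 'U+FE0F']

# The ZWJ inside the emoji is reported exactly like a hidden ZWJ in plain text.
print([f"U+{ord(c):04X}" for c in naive_invisible_hits(woman_cartwheeling)])
# ['U+200D']
```

Because the check only looks at which characters occur, not where they occur, it cannot tell this emoji apart from a ZWJ hidden inside a word.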
Root Cause
INVISIBLE_CHARS includes U+200D, and scan_for_threats() performs a simple char_set & INVISIBLE_CHARS intersection without considering whether the ZWJ sits inside a legitimate emoji grapheme cluster.
Fix
Mirror the proven approach already used in cronjob_tools.py: define emoji codepoint ranges, skip VS16 variation selectors, and check whether each ZWJ sits between emoji codepoints. Strip legitimate emoji ZWJ before the invisible-char scan so that only suspicious ZWJ (between ASCII / CJK / Latin etc.) is flagged.
Key design decisions
- _EMOJI_CP_RANGES: covers SMP pictographic ranges, Miscellaneous Symbols, Dingbats, regional indicators, skin tone modifiers, and gender symbols, which is sufficient for all ZWJ sequences defined in emoji-zwj-sequences.txt as of Unicode 16.0.
- _zwj_in_emoji_sequence() walks past U+FE0F variation selectors before checking neighbours, matching real emoji encoding (e.g. 🏳️🌈 is 🏳 + FE0F + 200D + 🌈).
- _strip_legitimate_emoji_zwj() removes legitimate ZWJ from a copy of the content, so the existing INVISIBLE_CHARS intersection logic works unchanged.
Security preserved
ZWJ between non-emoji characters is still flagged:
- hel\u200dlo (ASCII injection)
- 忽略\u200d指令 (CJK injection)
- café\u200dïgnore (Latin injection)
Changes
- tools/threat_patterns.py: add _EMOJI_CP_RANGES, _is_emoji_cp(), _zwj_in_emoji_sequence(), _strip_legitimate_emoji_zwj(); update scan_for_threats() to strip emoji ZWJ before scanning
- tests/tools/test_threat_patterns.py: add 10 new test cases (47 total, all passing)