fix: allow ZWJ (U+200D) in legitimate emoji sequences by w1ndcn · Pull Request #35808 · NousResearch/hermes-agent

w1ndcn · 2026-05-31T10:05:57Z

Problem

The invisible-unicode scanner in threat_patterns.py flags all U+200D (Zero Width Joiner) occurrences as potential prompt injection. This silently blocks SOUL.md and other context files that contain standard emoji sequences like 🤸‍♀️, 👨‍👩‍👧, or ❤️‍🔥, replacing them with [BLOCKED: SOUL.md contained potential prompt injection (invisible unicode U+200D)].

ZWJ is the standard Unicode mechanism (TR#51) for constructing gendered, family, and skin-tone emoji — not an injection vector when used in emoji sequences.
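To make the point concrete, here is a small sketch that splits two of the emoji above into code points. The helper name `codepoints` is hypothetical and is not part of this PR; it only shows that U+200D is a structural part of these sequences.

```python
def codepoints(text: str) -> list[str]:
    """Return the code points of `text` as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Rainbow flag: white flag + VS16 + ZWJ + rainbow
rainbow_flag = "\U0001F3F3\uFE0F\u200D\U0001F308"
print(codepoints(rainbow_flag))
# ['U+1F3F3', 'U+FE0F', 'U+200D', 'U+1F308']

# Family: man + ZWJ + woman + ZWJ + girl
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
print(codepoints(family))
# ['U+1F468', 'U+200D', 'U+1F469', 'U+200D', 'U+1F467']
```

Removing the ZWJ from either string would change what is rendered, so a scanner cannot treat every U+200D as noise.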

Root Cause

INVISIBLE_CHARS includes U+200D, and scan_for_threats() performs a simple char_set & INVISIBLE_CHARS intersection without considering whether the ZWJ sits inside a legitimate emoji grapheme cluster.
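A minimal reproduction of the false positive, as a sketch only. The three-character `INVISIBLE_CHARS` set and the `naive_invisible_hits` helper below are assumptions made for illustration; the real set and the real `scan_for_threats()` body in threat_patterns.py are not reproduced here.

```python
# Assumed subset for illustration; the real INVISIBLE_CHARS is larger.
INVISIBLE_CHARS = frozenset({"\u200b", "\u200d", "\u2066"})


def naive_invisible_hits(content: str) -> set[str]:
    """Context-free check: any invisible char anywhere counts as a hit."""
    return set(content) & INVISIBLE_CHARS


# Woman cartwheeling: person cartwheeling + ZWJ + female sign + VS16
woman_cartwheeling = "\U0001F938\u200D\u2640\uFE0F"
# ZWJ hidden inside an ASCII word
injected = "hel\u200dlo"

# Both inputs produce the same hit, so the check cannot tell them apart.
print(naive_invisible_hits(woman_cartwheeling) == {"\u200d"})  # True
print(naive_invisible_hits(injected) == {"\u200d"})            # True
```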

Fix

Mirror the proven approach already used in cronjob_tools.py — define emoji codepoint ranges, skip VS16 variation selectors, and check whether each ZWJ sits between emoji codepoints. Strip legitimate emoji ZWJ before the invisible-char scan so that only suspicious ZWJ (between ASCII / CJK / Latin etc.) is flagged.

Key design decisions

Emoji range table (_EMOJI_CP_RANGES): covers SMP pictographic ranges, Miscellaneous Symbols, Dingbats, regional indicators, skin tone modifiers, and gender symbols — sufficient for all ZWJ sequences defined in emoji-sequences.txt as of Unicode 16.0.
VS16 skip: _zwj_in_emoji_sequence() walks past U+FE0F variation selectors before checking neighbours, matching real emoji encoding (e.g. 🏳️‍🌈 is 🏳 + FE0F + 200D + 🌈).
Strip-then-scan: _strip_legitimate_emoji_zwj() removes legitimate ZWJ from a copy of the content, so the existing INVISIBLE_CHARS intersection logic works unchanged.
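The three decisions above can be sketched roughly as follows. This is not the code from the PR: the range table is deliberately coarse and incomplete, the function names only echo the ones listed under Changes, `INVISIBLE_CHARS` is an assumed subset, and `invisible_hits` is a hypothetical stand-in for the relevant part of `scan_for_threats()`. This sketch skips VS16 only on the left side of the ZWJ, since a variation selector follows its base character.

```python
ZWJ = "\u200d"
VS16 = "\ufe0f"

# Assumed subset for illustration; the real INVISIBLE_CHARS is larger.
INVISIBLE_CHARS = frozenset({"\u200b", "\u200d", "\u2066"})

# Illustrative, coarse ranges only; NOT the PR's _EMOJI_CP_RANGES table.
EMOJI_CP_RANGES = (
    (0x1F1E6, 0x1F1FF),  # regional indicators
    (0x1F300, 0x1FAFF),  # SMP pictographs, incl. skin tone modifiers
    (0x2600, 0x27BF),    # Miscellaneous Symbols and Dingbats
)


def is_emoji_cp(ch: str) -> bool:
    """True if the character falls inside one of the illustrative ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_CP_RANGES)


def zwj_in_emoji_sequence(text: str, i: int) -> bool:
    """True if the ZWJ at index i has emoji code points on both sides."""
    left = i - 1
    # Walk past VS16 so that e.g. flag + FE0F + ZWJ + rainbow is accepted.
    while left >= 0 and text[left] == VS16:
        left -= 1
    right = i + 1
    if left < 0 or right >= len(text):
        return False  # ZWJ at a text edge is never legitimate
    return is_emoji_cp(text[left]) and is_emoji_cp(text[right])


def strip_legitimate_emoji_zwj(text: str) -> str:
    """Return a copy of text with emoji-internal ZWJ removed."""
    return "".join(
        ch for i, ch in enumerate(text)
        if not (ch == ZWJ and zwj_in_emoji_sequence(text, i))
    )


def invisible_hits(content: str) -> set[str]:
    """Strip-then-scan: the intersection itself stays unchanged."""
    return set(strip_legitimate_emoji_zwj(content)) & INVISIBLE_CHARS


heart_on_fire = "\u2764\uFE0F\u200D\U0001F525"
print(invisible_hits(heart_on_fire) == set())       # True
print(invisible_hits("hel\u200dlo") == {"\u200d"})  # True
```

Because stripping operates on a copy, the original content is left untouched and any remaining U+200D is still reported by the existing intersection.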

Security preserved

Input	Result
🤸‍♀️ emoji	✅ allowed
👨‍👩‍👧 family emoji	✅ allowed
❤️‍🔥 heart-on-fire	✅ allowed
👩🏽‍💻 skin-tone emoji	✅ allowed
`hel\u200dlo` (ASCII injection)	🚫 flagged
`忽略\u200d指令` (CJK injection)	🚫 flagged
`café\u200dïgnore` (Latin injection)	🚫 flagged
ZWJ at text edges	🚫 flagged
U+200B, U+2066 etc.	🚫 still flagged

Changes

tools/threat_patterns.py: add _EMOJI_CP_RANGES, _is_emoji_cp(), _zwj_in_emoji_sequence(), _strip_legitimate_emoji_zwj(); update scan_for_threats() to strip emoji ZWJ before scanning
tests/tools/test_threat_patterns.py: add 10 new test cases (47 total, all passing)

Closes NousResearch#18581

The invisible-unicode scanner in threat_patterns.py flagged ALL U+200D (Zero Width Joiner) occurrences as potential prompt injection, which silently blocked SOUL.md and other context files containing standard emoji sequences like 🤸‍♀️, 👨‍👩‍👧, or ❤️‍🔥.

Root cause: INVISIBLE_CHARS includes U+200D, and scan_for_threats() performed a simple set intersection without considering that ZWJ is the standard Unicode mechanism (TR#51) for constructing gendered, family, and skin-tone emoji sequences.

Fix: mirror the proven approach from cronjob_tools.py (define emoji codepoint ranges, skip VS16 variation selectors, and check whether each ZWJ sits between emoji codepoints). Strip legitimate emoji ZWJ before the invisible-char scan so that only suspicious ZWJ (between ASCII/CJK/Latin etc.) is flagged.

Changes:
- tools/threat_patterns.py: add _EMOJI_CP_RANGES, _is_emoji_cp(), _zwj_in_emoji_sequence(), and _strip_legitimate_emoji_zwj(); update scan_for_threats() to strip emoji ZWJ before scanning
- tests/tools/test_threat_patterns.py: add 10 new test cases covering emoji ZWJ allowlist, CJK/Latin injection blocklist, mixed content, skin-tone sequences, and edge positions

mohamedorigami-jpg

Nice approach with the strip-vs-validate strategy. The test coverage is thorough -- especially the mixed emoji + malicious ZWJ test and the multi-ZWJ family emoji case.

One thing I noticed: the regex currently checks that ZWJ is preceded by an emoji codepoint (via the emoji set) but doesn't validate that the full ZWJ sequence is a known emoji. That's the right call for a detector -- you want under-blocking over false positives, and unknown emoji sequences are harmless anyway.

w1ndcn · 2026-05-31T12:27:14Z

@mohamedorigami-jpg Thanks for the review! You're absolutely right — validating the full ZWJ sequence against a known-emoji database would be overkill for a threat detector. The emoji-codepoint-neighbour check gives us the right balance: standard emoji sequences pass, while anything else (ASCII/CJK/Latin-adjacent ZWJ) still gets caught. Glad the test coverage looks solid.

tonydwb

Code Review Summary — PR #35808

Verdict: Approved ✅

Author: w1ndcn | Type: bugfix | Files: tools/threat_patterns.py, tests/tools/test_threat_patterns.py

Key Points

Fixes a real false positive: U+200D (ZWJ) in emoji sequences like 🤸‍♀️ and 👨‍👩‍👧 was flagged as invisible-unicode injection.
Clean design: _EMOJI_CP_RANGES covers all Unicode 16.0 ZWJ emoji sequences; _zwj_in_emoji_sequence() walks past VS16 before checking neighbours; _strip_legitimate_emoji_zwj() strips before the existing intersection logic.
Security preserved: ZWJ between ASCII, CJK, or Latin chars is still flagged. Mixed content (emoji + malicious ZWJ) catches the malicious one.
10 comprehensive test cases covering all edge cases.

No issues found.

Reviewed by Hermes Agent

alt-glitch added labels on May 31, 2026: type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder)

mohamedorigami-jpg approved these changes May 31, 2026


tonydwb approved these changes May 31, 2026

w1ndcn wants to merge 1 commit into NousResearch:main from w1ndcn:fix/zwj-emoji-false-positive-18581

4 participants
