[WIP] Fix double escaping of text in titles#19545
Conversation
…revent literal > in issue titles Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Fixes double-escaping of HTML entities (notably >, <, &) so sanitized titles/content don’t retain literal entities and remain idempotent.
Changes:
- Extend
decodeHtmlEntitiesto decode additional common named entities (gt/lt/amp) including some double-encoded forms. - Add sanitize-content tests for named entity decoding and idempotency.
- Add sanitize-title tests for title-specific named entity decoding and idempotency.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| actions/setup/js/sanitize_content_core.cjs | Expands HTML entity decoding behavior used early in the sanitization pipeline. |
| actions/setup/js/sanitize_content.test.cjs | Adds coverage for decoding >, <, & and idempotency in content sanitization. |
| actions/setup/js/sanitize_title.test.cjs | Adds coverage for decoding >, <, & and idempotency in title sanitization. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| it("should decode > in title to prevent literal > appearing in issues", () => { | ||
| // If an agent outputs a title with > (e.g. because the prompt context contained it), | ||
| // the sanitizer must decode it back to > so the issue title is not > in markdown. | ||
| expect(sanitizeTitle("value > threshold")).toBe("value > threshold"); | ||
| }); | ||
|
|
||
| it("should decode double-encoded &gt; in title to >", () => { | ||
| expect(sanitizeTitle("value &gt; threshold")).toBe("value > threshold"); | ||
| }); | ||
|
|
||
| it("should decode < in title and neutralize any resulting HTML tags", () => { | ||
| // <tag> → <tag> → convertXmlTags → (tag) | ||
| expect(sanitizeTitle("<script> injection")).toBe("(script) injection"); | ||
| }); | ||
|
|
||
| it("should decode & in title to &", () => { | ||
| expect(sanitizeTitle("cats & dogs")).toBe("cats & dogs"); | ||
| }); | ||
|
|
There was a problem hiding this comment.
Consider adding title-focused coverage for fully double-encoded entities like &amp;gt; / &amp;lt;...&amp;gt; and &amp;commat;user. These inputs are plausible after multiple encoding passes and can reveal ordering issues in the underlying HTML-entity decoding.
| // Decode named entity for @ symbol (including double-encoded variants) | ||
| // @ and &commat; → @ | ||
| result = result.replace(/&(?:amp;)?commat;/gi, "@"); | ||
|
|
||
| // Decode common named HTML entities (including double-encoded variants) | ||
| // These prevent HTML-encoded characters from persisting as literal entities | ||
| // in sanitized output (e.g. a title containing > instead of >). | ||
| // > and &gt; → > | ||
| result = result.replace(/&(?:amp;)?gt;/gi, ">"); | ||
| // < and &lt; → < (convertXmlTags will then neutralise any resulting tags) | ||
| result = result.replace(/&(?:amp;)?lt;/gi, "<"); | ||
| // & and &amp; → & (decoded after gt/lt so &gt; is already handled above) | ||
| result = result.replace(/&(?:amp;)?amp;/gi, "&"); |
There was a problem hiding this comment.
decodeHtmlEntities decodes &amp; to &, but the named-entity replacements for commat/gt/lt run before that. This means inputs like &amp;commat;user or &amp;lt;script&amp;gt; will be transformed into @user / <script> and then left as entities, which can re-enable @mention bypass or allow HTML entities to survive the sanitizer. Reorder decoding so & is normalized first (or run entity decoding in a bounded loop / allow multiple amp; layers) and then apply the specific named-entity decodes afterward so newly-unwrapped entities are also handled.
| // in sanitized output (e.g. a title containing > instead of >). | ||
| // > and &gt; → > | ||
| result = result.replace(/&(?:amp;)?gt;/gi, ">"); | ||
| // < and &lt; → < (convertXmlTags will then neutralise any resulting tags) |
There was a problem hiding this comment.
Typo/inconsistency in comment: the codebase uses American spelling "neutralize" throughout (e.g., function name neutralizeBotTriggers and other comments), but this comment says "neutralise". Consider updating to "neutralize" for consistency.
| // < and &lt; → < (convertXmlTags will then neutralise any resulting tags) | |
| // < and &lt; → < (convertXmlTags will then neutralize any resulting tags) |
| it("should decode > entity to > to prevent literal > in output", () => { | ||
| const result = sanitizeContent("value > threshold"); | ||
| expect(result).toBe("value > threshold"); | ||
| }); | ||
|
|
||
| it("should decode double-encoded &gt; entity to >", () => { | ||
| const result = sanitizeContent("value &gt; threshold"); | ||
| expect(result).toBe("value > threshold"); | ||
| }); | ||
|
|
||
| it("should decode < entity to < and then neutralize resulting tags", () => { | ||
| const result = sanitizeContent("<script> injection"); | ||
| // < → < and > → >, then convertXmlTags turns <script> into (script) | ||
| expect(result).toBe("(script) injection"); | ||
| }); | ||
|
|
||
| it("should decode & entity to &", () => { | ||
| const result = sanitizeContent("cats & dogs"); | ||
| expect(result).toBe("cats & dogs"); | ||
| }); | ||
|
|
||
| it("should decode double-encoded &amp; entity to &", () => { | ||
| const result = sanitizeContent("cats &amp; dogs"); | ||
| expect(result).toBe("cats & dogs"); | ||
| }); |
There was a problem hiding this comment.
The new entity-decoding tests cover &gt; and &amp; but not cases where the entire entity is double-encoded (e.g. &amp;gt;, &amp;lt;...&amp;gt;, &amp;commat;user). Adding assertions for these would both prevent regressions and catch the partial-decoding/ordering issue in decodeHtmlEntities.
decodeHtmlEntitiesinsanitize_content_core.cjsto also decode>,<,&(with double-encoded variants)sanitize_content.test.cjsfor named HTML entity decoding and idempotencysanitize_title.test.cjsfor title-specific HTML entity decoding and idempotency💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.