Skip to content

[WIP] Fix double escaping of text in titles#19545

Merged
pelikhan merged 3 commits intomainfrom
copilot/fix-double-escaping-in-titles
Mar 4, 2026
Merged

[WIP] Fix double escaping of text in titles#19545
pelikhan merged 3 commits intomainfrom
copilot/fix-double-escaping-in-titles

Conversation

Copy link
Contributor

Copilot AI commented Mar 4, 2026

  • Investigate double escaping issue in create-issue titles
  • Fix decodeHtmlEntities in sanitize_content_core.cjs to also decode >, <, & (with double-encoded variants)
  • Add tests to sanitize_content.test.cjs for named HTML entity decoding and idempotency
  • Add tests to sanitize_title.test.cjs for title-specific HTML entity decoding and idempotency
  • Validate: 302 tests pass (sanitize_content + sanitize_title), fmt/lint clean, Go tests pass
  • Code review completed and comments addressed
  • CodeQL check: no analyzable languages changed (JS only)

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

…revent literal > in issue titles

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@pelikhan pelikhan marked this pull request as ready for review March 4, 2026 09:07
Copilot AI review requested due to automatic review settings March 4, 2026 09:07
@pelikhan pelikhan merged commit c9f8a54 into main Mar 4, 2026
48 checks passed
@pelikhan pelikhan deleted the copilot/fix-double-escaping-in-titles branch March 4, 2026 09:10
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes double-escaping of HTML entities (notably &gt;, &lt;, &amp;) so sanitized titles/content don’t retain literal entities and remain idempotent.

Changes:

  • Extend decodeHtmlEntities to decode additional common named entities (gt/lt/amp) including some double-encoded forms.
  • Add sanitize-content tests for named entity decoding and idempotency.
  • Add sanitize-title tests for title-specific named entity decoding and idempotency.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
actions/setup/js/sanitize_content_core.cjs Expands HTML entity decoding behavior used early in the sanitization pipeline.
actions/setup/js/sanitize_content.test.cjs Adds coverage for decoding &gt;, &lt;, &amp; and idempotency in content sanitization.
actions/setup/js/sanitize_title.test.cjs Adds coverage for decoding &gt;, &lt;, &amp; and idempotency in title sanitization.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +244 to +262
it("should decode &gt; in title to prevent literal &gt; appearing in issues", () => {
// If an agent outputs a title with &gt; (e.g. because the prompt context contained it),
// the sanitizer must decode it back to > so the issue title is not &gt; in markdown.
expect(sanitizeTitle("value &gt; threshold")).toBe("value > threshold");
});

it("should decode double-encoded &amp;gt; in title to >", () => {
expect(sanitizeTitle("value &amp;gt; threshold")).toBe("value > threshold");
});

it("should decode &lt; in title and neutralize any resulting HTML tags", () => {
// &lt;tag&gt; → <tag> → convertXmlTags → (tag)
expect(sanitizeTitle("&lt;script&gt; injection")).toBe("(script) injection");
});

it("should decode &amp; in title to &", () => {
expect(sanitizeTitle("cats &amp; dogs")).toBe("cats & dogs");
});

Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding title-focused coverage for fully double-encoded entities like &amp;amp;gt; / &amp;amp;lt;...&amp;amp;gt; and &amp;amp;commat;user. These inputs are plausible after multiple encoding passes and can reveal ordering issues in the underlying HTML-entity decoding.

Copilot uses AI. Check for mistakes.
Comment on lines 661 to +673
// Decode named entity for @ symbol (including double-encoded variants)
// &commat; and &amp;commat; → @
result = result.replace(/&(?:amp;)?commat;/gi, "@");

// Decode common named HTML entities (including double-encoded variants)
// These prevent HTML-encoded characters from persisting as literal entities
// in sanitized output (e.g. a title containing &gt; instead of >).
// &gt; and &amp;gt; → >
result = result.replace(/&(?:amp;)?gt;/gi, ">");
// &lt; and &amp;lt; → < (convertXmlTags will then neutralise any resulting tags)
result = result.replace(/&(?:amp;)?lt;/gi, "<");
// &amp; and &amp;amp; → & (decoded after gt/lt so &amp;gt; is already handled above)
result = result.replace(/&(?:amp;)?amp;/gi, "&");
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decodeHtmlEntities decodes &amp;amp; to &, but the named-entity replacements for commat/gt/lt run before that. This means inputs like &amp;amp;commat;user or &amp;amp;lt;script&amp;amp;gt; will be transformed into &commat;user / &lt;script&gt; and then left as entities, which can re-enable @mention bypass or allow HTML entities to survive the sanitizer. Reorder decoding so &amp; is normalized first (or run entity decoding in a bounded loop / allow multiple amp; layers) and then apply the specific named-entity decodes afterward so newly-unwrapped entities are also handled.

Copilot uses AI. Check for mistakes.
// in sanitized output (e.g. a title containing &gt; instead of >).
// &gt; and &amp;gt; → >
result = result.replace(/&(?:amp;)?gt;/gi, ">");
// &lt; and &amp;lt; → < (convertXmlTags will then neutralise any resulting tags)
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo/inconsistency in comment: the codebase uses American spelling "neutralize" throughout (e.g., function name neutralizeBotTriggers and other comments), but this comment says "neutralise". Consider updating to "neutralize" for consistency.

Suggested change
// &lt; and &amp;lt; → < (convertXmlTags will then neutralise any resulting tags)
// &lt; and &amp;lt; → < (convertXmlTags will then neutralize any resulting tags)

Copilot uses AI. Check for mistakes.
Comment on lines +1667 to +1691
it("should decode &gt; entity to > to prevent literal &gt; in output", () => {
const result = sanitizeContent("value &gt; threshold");
expect(result).toBe("value > threshold");
});

it("should decode double-encoded &amp;gt; entity to >", () => {
const result = sanitizeContent("value &amp;gt; threshold");
expect(result).toBe("value > threshold");
});

it("should decode &lt; entity to < and then neutralize resulting tags", () => {
const result = sanitizeContent("&lt;script&gt; injection");
// &lt; → < and &gt; → >, then convertXmlTags turns <script> into (script)
expect(result).toBe("(script) injection");
});

it("should decode &amp; entity to &", () => {
const result = sanitizeContent("cats &amp; dogs");
expect(result).toBe("cats & dogs");
});

it("should decode double-encoded &amp;amp; entity to &", () => {
const result = sanitizeContent("cats &amp;amp; dogs");
expect(result).toBe("cats & dogs");
});
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new entity-decoding tests cover &amp;gt; and &amp;amp; but not cases where the entire entity is double-encoded (e.g. &amp;amp;gt;, &amp;amp;lt;...&amp;amp;gt;, &amp;amp;commat;user). Adding assertions for these would both prevent regressions and catch the partial-decoding/ordering issue in decodeHtmlEntities.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants