Skip to content

fix: strip on* event handlers and style attributes from allowlisted HTML tags in convertXmlTags()#22988

Merged
pelikhan merged 5 commits intomainfrom
copilot/fix-html-attribute-injection
Mar 25, 2026
Merged

fix: strip on* event handlers and style attributes from allowlisted HTML tags in convertXmlTags()#22988
pelikhan merged 5 commits intomainfrom
copilot/fix-html-attribute-injection

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 25, 2026

convertXmlTags() passed allowlisted tags through verbatim — attributes and all. Both <details ontoggle="alert(document.cookie)"> and <span style="position:fixed;..."> survived the full sanitizeContentCore pipeline unchanged, contradicting the documented sanitization goal.

Changes

  • sanitize_content_core.cjsconvertXmlTags(): Added stripDangerousAttributes() helper that removes on* event handlers and style attributes from allowlisted tags before returning them. Disallowed tags continue to be converted to parentheses format unchanged.
// Before: allowed tags returned verbatim including all attributes
if (allowedTags.includes(tagName)) {
  return match; // <details ontoggle="bad()"> passed through as-is
}

// After: dangerous attributes stripped before returning
if (allowedTags.includes(tagName)) {
  const sanitizedContent = stripDangerousAttributes(tagContent);
  return `<${sanitizedContent}>`;
}
  • Attribute stripping covers: all quoting forms ("val", 'val', val, bare), case-insensitive (ONCLICK, OnStyle), multiple dangerous attrs in a single tag. Safe attributes (title, class, open, lang, id, etc.) are preserved.

  • Regex design: uses \s+ (not \s*) before the attribute name to avoid false matches inside tag names (e.g. matching "ong" inside strong). Unquoted value pattern is [^\s>"'\]*— excludes>`, whitespace, and all quote characters.

  • Tests: 15 new cases in a "dangerous attribute stripping" describe block covering the two confirmed attack vectors (ontoggle, CSS overlay via style), all quoting forms, case-insensitivity, multi-attribute stripping, safe-attribute preservation, whitespace around =, and the tag-name/attribute boundary.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/repos/github/gh-aw/contents/.github%2Fworkflows%2Faudit-workflows.md
    • Triggering command: /opt/hostedtoolcache/node/24.14.0/x64/bin/node /opt/hostedtoolcache/node/24.14.0/x64/bin/node --experimental-import-meta-resolve --require /home/REDACTED/work/gh-aw/gh-aw/actions/setup/js/node_modules/vitest/suppress-warnings.cjs --conditions node --conditions development /home/REDACTED/work/gh-aw/gh-aw/actions/setup/js/node_modules/vitest/dist/workers/forks.js (http block)
  • invalid.example.invalid
    • Triggering command: /usr/lib/git-core/git-remote-https /usr/lib/git-core/git-remote-https origin https://invalid.example.invalid/nonexistent-repo.git e/git init�� git git tions/setup/js/node_modules/.bin/git user.email test@example.comcheckout /git git bran�� -M main k/gh-aw/gh-aw/actions/setup/js/node_modules/.bin/git /tmp/bare-incremgit gin/feature-branadd cal/bin/git git (dns block)

If you need me to access, download, or install something from one of these locations, you can either:


💬 Send tasks to Copilot coding agent from Slack and Teams to turn conversations into code. Copilot posts an update in your thread when it's finished.

Copilot AI and others added 3 commits March 25, 2026 21:56
…in convertXmlTags()

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/beb0e828-1969-46c4-b4c5-bd52b7e12530

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
… unquoted value pattern, add simple unquoted test

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/beb0e828-1969-46c4-b4c5-bd52b7e12530

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…-case tests

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/beb0e828-1969-46c4-b4c5-bd52b7e12530

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix HTML attribute injection in convertXmlTags() fix: strip on* event handlers and style attributes from allowlisted HTML tags in convertXmlTags() Mar 25, 2026
Copilot AI requested a review from pelikhan March 25, 2026 22:02
@pelikhan pelikhan marked this pull request as ready for review March 25, 2026 22:04
Copilot AI review requested due to automatic review settings March 25, 2026 22:04
@pelikhan pelikhan enabled auto-merge (squash) March 25, 2026 22:04
@github-actions github-actions bot mentioned this pull request Mar 25, 2026
@pelikhan pelikhan merged commit ef4c1d6 into main Mar 25, 2026
69 of 115 checks passed
@pelikhan pelikhan deleted the copilot/fix-html-attribute-injection branch March 25, 2026 22:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the convertXmlTags() sanitization step so allowlisted HTML tags no longer pass through dangerous attributes (notably on* event handlers and style) unchanged.

Changes:

  • Add stripDangerousAttributes() in convertXmlTags() to remove on* and style attributes from allowlisted tags before preserving them.
  • Add a dedicated test block covering dangerous-attribute stripping behavior across quoting/casing/whitespace variants.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
actions/setup/js/sanitize_content_core.cjs Strips on* event handlers and style from allowlisted HTML tags during XML/HTML tag conversion.
actions/setup/js/sanitize_content.test.cjs Adds test cases validating the new dangerous-attribute stripping behavior for allowlisted tags.
Comments suppressed due to low confidence (1)

actions/setup/js/sanitize_content_core.cjs:591

  • stripDangerousAttributes() can match and remove text inside quoted attribute values because it looks for \s+(?:on\w+|style) anywhere in tagContent and the =value part is optional. Example: <span title="foo onclick bar"> would have the substring " onclick" removed even though it's part of title, corrupting safe attributes. Consider tokenizing attributes (scan chars while tracking quote state) and removing attributes by name, or at minimum require an = and ensure the match occurs at an attribute boundary outside quotes.
  function stripDangerousAttributes(tagContent) {
    // Match: one-or-more whitespace + (on* | style) + optional =value
    // Value forms: "...", '...', or unquoted (no whitespace / > / quote chars), or bare (no =)
    // The unquoted form excludes >, whitespace, and all quote characters (', ", `) so it
    // cannot consume the closing > of the tag or straddle other attribute values.
    return tagContent.replace(/\s+(?:on\w+|style)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"'`]*))?/gi, "");
  }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +577 to +590
* Note: `\s+` (requiring at least one whitespace before the attribute name) is
* intentional — HTML attributes are always separated from the tag name and from
* each other by at least one whitespace character. Using `\s*` would risk false
* matches inside tag names (e.g. matching "ong" inside "strong").
*
* @param {string} tagContent - Tag content without surrounding angle brackets
* @returns {string} Tag content with dangerous attributes removed
*/
function stripDangerousAttributes(tagContent) {
// Match: one-or-more whitespace + (on* | style) + optional =value
// Value forms: "...", '...', or unquoted (no whitespace / > / quote chars), or bare (no =)
// The unquoted form excludes >, whitespace, and all quote characters (', ", `) so it
// cannot consume the closing > of the tag or straddle other attribute values.
return tagContent.replace(/\s+(?:on\w+|style)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"'`]*))?/gi, "");
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relying on \s+ before on*/style assumes attributes are always whitespace-separated, but browsers commonly treat <tag a="1"onclick="..."> (no whitespace between attributes) as having an onclick attribute. This means dangerous attributes may still survive on allowlisted tags. Update the stripping logic to handle attribute boundaries even when whitespace is missing (ideally via a small HTML-attribute tokenizer), and add a regression test for the no-whitespace case.

This issue also appears on line 585 of the same file.

Suggested change
* Note: `\s+` (requiring at least one whitespace before the attribute name) is
* intentional HTML attributes are always separated from the tag name and from
* each other by at least one whitespace character. Using `\s*` would risk false
* matches inside tag names (e.g. matching "ong" inside "strong").
*
* @param {string} tagContent - Tag content without surrounding angle brackets
* @returns {string} Tag content with dangerous attributes removed
*/
function stripDangerousAttributes(tagContent) {
// Match: one-or-more whitespace + (on* | style) + optional =value
// Value forms: "...", '...', or unquoted (no whitespace / > / quote chars), or bare (no =)
// The unquoted form excludes >, whitespace, and all quote characters (', ", `) so it
// cannot consume the closing > of the tag or straddle other attribute values.
return tagContent.replace(/\s+(?:on\w+|style)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"'`]*))?/gi, "");
* This function matches dangerous attributes at HTML attribute boundaries:
* the start of the tag content or after whitespace, quotes, "/", or ">".
* This correctly handles cases like a="1"onclick="..." where there is no
* whitespace between attributes, while avoiding matches inside tag names
* such as "strong".
*
* @param {string} tagContent - Tag content without surrounding angle brackets
* @returns {string} Tag content with dangerous attributes removed
*/
function stripDangerousAttributes(tagContent) {
// Match: attribute boundary + (on* | style) + optional =value
// Boundary: start-of-string or one of whitespace, quotes, "/", or ">"
// Value forms: "...", '...', or unquoted (no whitespace / > / quote chars), or bare (no =)
return tagContent.replace(
/(^|[\s"'`\/>])(?:on\w+|style)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"'`]*))?/gi,
"$1"
);

Copilot uses AI. Check for mistakes.
Comment on lines +530 to +533
it("should strip multiple dangerous attributes from a single tag", () => {
const result = sanitizeContent('<span onclick="bad()" style="position:fixed" title="ok">text</span>');
expect(result).toBe('<span title="ok">text</span>');
});
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests cover whitespace-separated attributes and tag-name/attribute concatenation, but they don't cover two important edge cases for this stripping approach: (1) no whitespace between attributes after a quoted value (e.g. <span title="ok"onclick="bad()">) and (2) safe attribute values containing substrings like " onclick"/" style" (to ensure the stripper doesn’t mutate values). Adding regression tests for these will help prevent bypasses and unintended content changes.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HTML attribute injection not sanitized in convertXmlTags(): event handlers and CSS survive safe-outputs pipeline

3 participants