Conversation
…in convertXmlTags() Agent-Logs-Url: https://github.com/github/gh-aw/sessions/beb0e828-1969-46c4-b4c5-bd52b7e12530 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
… unquoted value pattern, add simple unquoted test Agent-Logs-Url: https://github.com/github/gh-aw/sessions/beb0e828-1969-46c4-b4c5-bd52b7e12530 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…-case tests Agent-Logs-Url: https://github.com/github/gh-aw/sessions/beb0e828-1969-46c4-b4c5-bd52b7e12530 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Updates the convertXmlTags() sanitization step so allowlisted HTML tags no longer pass through dangerous attributes (notably on* event handlers and style) unchanged.
Changes:
- Add
stripDangerousAttributes()inconvertXmlTags()to removeon*andstyleattributes from allowlisted tags before preserving them. - Add a dedicated test block covering dangerous-attribute stripping behavior across quoting/casing/whitespace variants.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| actions/setup/js/sanitize_content_core.cjs | Strips on* event handlers and style from allowlisted HTML tags during XML/HTML tag conversion. |
| actions/setup/js/sanitize_content.test.cjs | Adds test cases validating the new dangerous-attribute stripping behavior for allowlisted tags. |
Comments suppressed due to low confidence (1)
actions/setup/js/sanitize_content_core.cjs:591
stripDangerousAttributes()can match and remove text inside quoted attribute values because it looks for\s+(?:on\w+|style)anywhere intagContentand the=valuepart is optional. Example:<span title="foo onclick bar">would have the substring" onclick"removed even though it's part oftitle, corrupting safe attributes. Consider tokenizing attributes (scan chars while tracking quote state) and removing attributes by name, or at minimum require an=and ensure the match occurs at an attribute boundary outside quotes.
function stripDangerousAttributes(tagContent) {
// Match: one-or-more whitespace + (on* | style) + optional =value
// Value forms: "...", '...', or unquoted (no whitespace / > / quote chars), or bare (no =)
// The unquoted form excludes >, whitespace, and all quote characters (', ", `) so it
// cannot consume the closing > of the tag or straddle other attribute values.
return tagContent.replace(/\s+(?:on\w+|style)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"'`]*))?/gi, "");
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| * Note: `\s+` (requiring at least one whitespace before the attribute name) is | ||
| * intentional — HTML attributes are always separated from the tag name and from | ||
| * each other by at least one whitespace character. Using `\s*` would risk false | ||
| * matches inside tag names (e.g. matching "ong" inside "strong"). | ||
| * | ||
| * @param {string} tagContent - Tag content without surrounding angle brackets | ||
| * @returns {string} Tag content with dangerous attributes removed | ||
| */ | ||
| function stripDangerousAttributes(tagContent) { | ||
| // Match: one-or-more whitespace + (on* | style) + optional =value | ||
| // Value forms: "...", '...', or unquoted (no whitespace / > / quote chars), or bare (no =) | ||
| // The unquoted form excludes >, whitespace, and all quote characters (', ", `) so it | ||
| // cannot consume the closing > of the tag or straddle other attribute values. | ||
| return tagContent.replace(/\s+(?:on\w+|style)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"'`]*))?/gi, ""); |
There was a problem hiding this comment.
Relying on \s+ before on*/style assumes attributes are always whitespace-separated, but browsers commonly treat <tag a="1"onclick="..."> (no whitespace between attributes) as having an onclick attribute. This means dangerous attributes may still survive on allowlisted tags. Update the stripping logic to handle attribute boundaries even when whitespace is missing (ideally via a small HTML-attribute tokenizer), and add a regression test for the no-whitespace case.
This issue also appears on line 585 of the same file.
| * Note: `\s+` (requiring at least one whitespace before the attribute name) is | |
| * intentional — HTML attributes are always separated from the tag name and from | |
| * each other by at least one whitespace character. Using `\s*` would risk false | |
| * matches inside tag names (e.g. matching "ong" inside "strong"). | |
| * | |
| * @param {string} tagContent - Tag content without surrounding angle brackets | |
| * @returns {string} Tag content with dangerous attributes removed | |
| */ | |
| function stripDangerousAttributes(tagContent) { | |
| // Match: one-or-more whitespace + (on* | style) + optional =value | |
| // Value forms: "...", '...', or unquoted (no whitespace / > / quote chars), or bare (no =) | |
| // The unquoted form excludes >, whitespace, and all quote characters (', ", `) so it | |
| // cannot consume the closing > of the tag or straddle other attribute values. | |
| return tagContent.replace(/\s+(?:on\w+|style)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"'`]*))?/gi, ""); | |
| * This function matches dangerous attributes at HTML attribute boundaries: | |
| * the start of the tag content or after whitespace, quotes, "/", or ">". | |
| * This correctly handles cases like a="1"onclick="..." where there is no | |
| * whitespace between attributes, while avoiding matches inside tag names | |
| * such as "strong". | |
| * | |
| * @param {string} tagContent - Tag content without surrounding angle brackets | |
| * @returns {string} Tag content with dangerous attributes removed | |
| */ | |
| function stripDangerousAttributes(tagContent) { | |
| // Match: attribute boundary + (on* | style) + optional =value | |
| // Boundary: start-of-string or one of whitespace, quotes, "/", or ">" | |
| // Value forms: "...", '...', or unquoted (no whitespace / > / quote chars), or bare (no =) | |
| return tagContent.replace( | |
| /(^|[\s"'`\/>])(?:on\w+|style)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"'`]*))?/gi, | |
| "$1" | |
| ); |
| it("should strip multiple dangerous attributes from a single tag", () => { | ||
| const result = sanitizeContent('<span onclick="bad()" style="position:fixed" title="ok">text</span>'); | ||
| expect(result).toBe('<span title="ok">text</span>'); | ||
| }); |
There was a problem hiding this comment.
Tests cover whitespace-separated attributes and tag-name/attribute concatenation, but they don't cover two important edge cases for this stripping approach: (1) no whitespace between attributes after a quoted value (e.g. <span title="ok"onclick="bad()">) and (2) safe attribute values containing substrings like " onclick"/" style" (to ensure the stripper doesn’t mutate values). Adding regression tests for these will help prevent bypasses and unintended content changes.
convertXmlTags()passed allowlisted tags through verbatim — attributes and all. Both<details ontoggle="alert(document.cookie)">and<span style="position:fixed;...">survived the fullsanitizeContentCorepipeline unchanged, contradicting the documented sanitization goal.Changes
sanitize_content_core.cjs—convertXmlTags(): AddedstripDangerousAttributes()helper that removeson*event handlers andstyleattributes from allowlisted tags before returning them. Disallowed tags continue to be converted to parentheses format unchanged.Attribute stripping covers: all quoting forms (
"val",'val',val, bare), case-insensitive (ONCLICK,OnStyle), multiple dangerous attrs in a single tag. Safe attributes (title,class,open,lang,id, etc.) are preserved.Regex design: uses
\s+(not\s*) before the attribute name to avoid false matches inside tag names (e.g. matching "ong" insidestrong). Unquoted value pattern is[^\s>"'\]*— excludes>`, whitespace, and all quote characters.Tests: 15 new cases in a
"dangerous attribute stripping"describe block covering the two confirmed attack vectors (ontoggle, CSS overlay viastyle), all quoting forms, case-insensitivity, multi-attribute stripping, safe-attribute preservation, whitespace around=, and the tag-name/attribute boundary.Warning
Firewall rules blocked me from connecting to one or more addresses (expand for details)
I tried to connect to the following addresses, but was blocked by firewall rules:
https://api.github.com/repos/github/gh-aw/contents/.github%2Fworkflows%2Faudit-workflows.md/opt/hostedtoolcache/node/24.14.0/x64/bin/node /opt/hostedtoolcache/node/24.14.0/x64/bin/node --experimental-import-meta-resolve --require /home/REDACTED/work/gh-aw/gh-aw/actions/setup/js/node_modules/vitest/suppress-warnings.cjs --conditions node --conditions development /home/REDACTED/work/gh-aw/gh-aw/actions/setup/js/node_modules/vitest/dist/workers/forks.js(http block)invalid.example.invalid/usr/lib/git-core/git-remote-https /usr/lib/git-core/git-remote-https origin https://invalid.example.invalid/nonexistent-repo.git e/git init�� git git tions/setup/js/node_modules/.bin/git user.email test@example.comcheckout /git git bran�� -M main k/gh-aw/gh-aw/actions/setup/js/node_modules/.bin/git /tmp/bare-incremgit gin/feature-branadd cal/bin/git git(dns block)If you need me to access, download, or install something from one of these locations, you can either:
💬 Send tasks to Copilot coding agent from Slack and Teams to turn conversations into code. Copilot posts an update in your thread when it's finished.