fix: [0.8.x] preserve trailing whitespace in ProcessingInstruction data #962
Merged
karfau merged 1 commit into `xmldom:release-0.8.x` on Mar 7, 2026
Per XML spec §2.6, PI data extends to immediately before `?>`. Trailing whitespace inside the PI boundary is content, not a separator. Remove `\s*` before `$` in the `parseInstruction` regex so it is no longer stripped. Addresses the trailing-whitespace sub-issue from xmldom#42, backporting xmldom#498 behaviour to the 0.8.x line.
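To make the §2.6 rule concrete, here is a minimal, regex-free sketch of how the inside of a PI splits into target and data. `splitPI` and `firstPI` are hypothetical helpers written for illustration, not xmldom API. The sketch assumes a well-formed PI and treats all whitespace directly after the target as the separator; everything after that, including trailing whitespace, is data.

```javascript
'use strict';

// Hypothetical helper (not part of xmldom): split the text between '<?' and
// '?>' into target and data, following the XML 1.0 §2.6 production.
function splitPI(inner) {
  // XML whitespace (S): space, tab, carriage return, line feed.
  const isSpace = (c) => c === ' ' || c === '\t' || c === '\r' || c === '\n';
  let i = 0;
  // The target runs up to the first whitespace character.
  while (i < inner.length && !isSpace(inner[i])) i++;
  const target = inner.slice(0, i);
  // Whitespace right after the target is the separator, not data.
  while (i < inner.length && isSpace(inner[i])) i++;
  // Everything else up to '?>' is data, trailing whitespace included.
  return { target, data: inner.slice(i) };
}

// Hypothetical helper: locate the first PI in a string and split it.
function firstPI(xml) {
  const start = xml.indexOf('<?');
  if (start === -1) return null;
  const end = xml.indexOf('?>', start + 2);
  if (end === -1) return null;
  return splitPI(xml.slice(start + 2, end));
}

console.log(JSON.stringify(firstPI('<?pi data  ?><root/>')));
// {"target":"pi","data":"data  "}
```

The two spaces before `?>` survive in `data`, which is the behaviour this fix restores in the regex-based parser.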
karfau (Member) approved these changes on Mar 7, 2026 and left a comment:
Awesome, thx a lot.
Sadly I cannot provide an ETA for releasing it to npm, as I have very limited time for this project right now.
Target branch: `release-0.8.x`

## What
Remove `\s*` from the `parseInstruction` regex in `lib/sax.js` so that trailing whitespace inside a processing instruction is preserved instead of silently stripped.

## Why
Per XML spec §2.6, PI data is everything between the mandatory separator whitespace after the target and the closing `?>`. Trailing whitespace inside the PI boundary is content; no rule says to strip it. Conforming parsers (sax-js, libexpat) preserve it.

This was already fixed on `master`/`0.9.x` as a side effect of the large DOCTYPE rewrite in PR #498 (22k lines). This PR is the minimal, non-breaking backport for the maintained `0.8.x` line.

## How
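The regex change can be reduced to a small before/after sketch. The two patterns are an approximation of the one in `parseInstruction` and may differ in detail from the actual 0.8.x source; `matchPI` is a hypothetical stand-in for the parser that, like `parseInstruction`, matches only the text from `<?` up to, but excluding, `?>`.

```javascript
'use strict';

// Assumption: approximations of the parseInstruction pattern in lib/sax.js;
// the exact 0.8.x regex may differ. Only the `\s*` before `$` matters here.
const OLD_PI = /^<\?(\S*)\s*([\s\S]*?)\s*$/; // trailing \s* eats whitespace from data
const NEW_PI = /^<\?(\S*)\s*([\s\S]*?)$/;    // data group now runs up to $

// Hypothetical stand-in for the parser: match the substring from '<?' up to,
// but excluding, '?>', so `$` anchors immediately before '?>'.
function matchPI(source, pattern) {
  const start = source.indexOf('<?');
  if (start === -1) return null;
  const end = source.indexOf('?>', start);
  if (end === -1) return null;
  const match = source.substring(start, end).match(pattern);
  return match ? { target: match[1], data: match[2] } : null;
}

const xml = '<?target some data  ?><root/>';
console.log(JSON.stringify(matchPI(xml, OLD_PI).data)); // "some data"
console.log(JSON.stringify(matchPI(xml, NEW_PI).data)); // "some data  "
```

With the old pattern, a declaration-like PI such as `<?xml version="1.0" ?>` likewise loses the space before `?>`, which is the kind of output the updated `not-wf` snapshots had recorded.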
`parseInstruction` builds a substring that already excludes `?>`, so `$` anchors immediately before it. The `\s*` before `$` was greedily consuming any trailing whitespace from the PI data before it was passed to `domBuilder.processingInstruction`. Removing it, while keeping `*?` on the data group to minimise the diff, is the complete fix: with `$` directly after the lazy group, the group has to extend to the end of the substring anyway, so `*?` now matches exactly what `*` would.

Five existing snapshots in `test/xmltest/__snapshots__/not-wf.test.js.snap` were updated: they captured the old, buggy behaviour (trailing space stripped from XML-declaration-like PIs in the not-well-formed corpus). The updated snapshots reflect the now-correct output.

## Scope
- `nodeName: undefined`: already fixed in current source (`dom.js:1209`), not touched
- `childNodes`: architectural, out of scope

## Changes
- `lib/sax.js`: remove `\s*` from the `parseInstruction` regex
- `test/dom/processing-instruction.test.js`
- `test/xmltest/__snapshots__/not-wf.test.js.snap`: five snapshots updated

## Checklist
- `npm test`: 673 tests pass (671 baseline + 2 new)
- `npm run lint`: clean
- `npm run format`: clean
- `index.d.ts` or `readme.md` update needed

Addresses the trailing-whitespace sub-issue from #42, backporting #498 behaviour to 0.8.x.