Skip to content

fix: [0.8.x] preserve trailing whitespace in ProcessingInstruction data#962

Merged
karfau merged 1 commit intoxmldom:release-0.8.xfrom
stevenobiajulu:fix/42-pi-trailing-whitespace
Mar 7, 2026
Merged

fix: [0.8.x] preserve trailing whitespace in ProcessingInstruction data#962
karfau merged 1 commit intoxmldom:release-0.8.xfrom
stevenobiajulu:fix/42-pi-trailing-whitespace

Conversation

@stevenobiajulu
Copy link
Copy Markdown
Contributor

Target branch: release-0.8.x

What

Remove \s* from the parseInstruction regex in lib/sax.js so that trailing whitespace inside a processing instruction is preserved instead of silently stripped.

Why

Per XML spec §2.6, PI data is everything between the mandatory separator whitespace after the target and the closing ?>. Trailing whitespace inside the PI boundary is content — there is no rule to strip it. Conforming parsers (sax-js, libexpat) preserve it.

This was already fixed on master/0.9.x as a side-effect of the large DOCTYPE rewrite in PR #498 (22k lines). This PR is the minimal, non-breaking backport for the maintained 0.8.x line.

How

parseInstruction builds a substring that already excludes ?>, so $ anchors immediately before it. The \s* before $ was greedily consuming any trailing whitespace from PI data before passing it to domBuilder.processingInstruction. Removing it — while keeping *? on the data group to minimise diff — is the complete fix.

// before
source.substring(start, end).match(/^<\?(\S*)\s*([\s\S]*?)\s*$/)
// after
source.substring(start, end).match(/^<\?(\S*)\s*([\s\S]*?)$/)

Five existing snapshots in test/xmltest/__snapshots__/not-wf.test.js.snap were updated: they captured the old buggy behaviour (trailing space stripped from XML-declaration-like PIs in the not-well-formed corpus). The updated snapshots reflect the now-correct output.

Scope

  • nodeName: undefined — already fixed in current source (dom.js:1209), not touched
  • XML declaration in childNodes — architectural, out of scope

Changes

File Change
lib/sax.js Remove \s* from parseInstruction regex
test/dom/processing-instruction.test.js New file: 2 tests for trailing space and trailing newline
test/xmltest/__snapshots__/not-wf.test.js.snap 5 snapshot updates — stale snapshots reflecting old (buggy) behaviour

Checklist

  • npm test — 673 tests pass (671 baseline + 2 new)
  • npm run lint — clean
  • npm run format — clean
  • No API surface change — no index.d.ts or readme.md update needed

Addresses the trailing-whitespace sub-issue from #42, backporting #498 behaviour to 0.8.x.

Per XML spec §2.6, PI data extends to immediately before '?>'.
Trailing whitespace inside the PI boundary is content, not separator.
Remove \s* before $ in parseInstruction regex so it is no longer stripped.

Addresses the trailing-whitespace sub-issue from xmldom#42, backporting xmldom#498
behaviour to the 0.8.x line.
Copy link
Copy Markdown
Member

@karfau karfau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thx a lot.
Sadly I can not Providence an ETA for releasing it to npm, as I have very limited time for this project right now.

@karfau karfau added bug Something isn't working spec:XML https://www.w3.org/TR/xml11/ labels Mar 7, 2026
@karfau karfau merged commit ac40424 into xmldom:release-0.8.x Mar 7, 2026
21 checks passed
@karfau karfau added this to the 0.8.12 milestone Mar 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working spec:XML https://www.w3.org/TR/xml11/

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants