Summary
lib/helpers/fromDataURI.js uses a DATA_URL_PATTERN regex that doesn't match RFC 2397, so several valid data URIs throw Invalid URL. We should fix this inline rather than pulling in a runtime dependency.
Failing cases
All three of these are RFC 2397 compliant and currently rejected:
- data:;base64,MTIz (no media type)
- data:application/octet-stream,123 (no parameters before the comma)
- data:text/plain;charset=US-ASCII,123 (non-base64 parameter)
The current regex requires [^;]+; for the first segment, so any URI without a ; before the body fails to match.
Why inline rather than a dependency
PR #7295 proposed swapping in data-uri-to-buffer. The tradeoffs there: v6+ is ESM-only and bumps engines.node to >=14, so we'd need to pin v3, which is unmaintained. For a fix of this size, a runtime dep isn't worth the supply-chain surface.
Proposed regex
RFC 2397 grammar: "data:" [ mediatype ] [ ";base64" ] "," data where mediatype := [ type "/" subtype ] *( ";" parameter ).
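A compliant inline pattern in lib/helpers/fromDataURI.js, matched against the part of the URI after the data: prefix (the pattern itself does not include the scheme):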
const DATA_URL_PATTERN = /^([^,;]+\/[^,;]+)?((?:;[^,;=]+=[^,;]+)*)(;base64)?,([\s\S]*)$/;
Then in fromDataURI:
- group 1: media type (may be empty)
- group 2: parameter list (e.g. ;charset=US-ASCII)
- group 3: ;base64 marker (truthy when present)
- group 4: body
For the Blob path, the type passed to new Blob should be mediaType + parameters so charset survives.
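To make the group mapping concrete, here is a minimal sketch of how the proposed pattern could be consumed. It assumes the data: prefix is stripped before matching, and parseDataURI is a hypothetical name used only for illustration, not the helper's real signature. Error types are simplified to plain Error.

```javascript
// Pattern copied from the proposal; it is applied to the text after "data:".
const DATA_URL_PATTERN = /^([^,;]+\/[^,;]+)?((?:;[^,;=]+=[^,;]+)*)(;base64)?,([\s\S]*)$/;

// Hypothetical helper for illustration only (not the actual fromDataURI API).
function parseDataURI(uri) {
  if (!uri.startsWith('data:')) {
    throw new Error('Unsupported protocol');
  }
  const match = DATA_URL_PATTERN.exec(uri.slice('data:'.length));
  if (!match) {
    throw new Error('Invalid URL');
  }
  const mediaType = match[1] || '';          // group 1: undefined when absent
  const parameters = match[2];               // group 2: '' or e.g. ';charset=US-ASCII'
  const isBase64 = Boolean(match[3]);        // group 3: ';base64' marker
  const body = decodeURIComponent(match[4]); // group 4: percent-decoded body
  return {
    // Concatenating both groups keeps charset in the type handed to new Blob.
    contentType: mediaType + parameters,
    isBase64,
    buffer: Buffer.from(body, isBase64 ? 'base64' : 'utf8'),
  };
}

// The three cases listed as currently rejected.
for (const uri of [
  'data:;base64,MTIz',
  'data:application/octet-stream,123',
  'data:text/plain;charset=US-ASCII,123',
]) {
  const { contentType, buffer } = parseDataURI(uri);
  console.log(JSON.stringify(contentType), buffer.toString('utf8'));
}
```

With this sketch, each of the three URIs decodes to the text 123, and only the last one carries a non-empty parameter list.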
Test coverage to add
Port the test cases from #7295 into tests/unit/helpers/fromDataURI.test.js (vitest, not the legacy mocha file). The matrix worth keeping:
- Base64 and non-base64 body, both with and without media type
- Charset parameter present, both base64 and non-base64
- URL-encoded body values get decoded
- Blob output preserves the full content type including parameters
- datax:,hi throws Unsupported protocol
- data:hi throws Invalid URL
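The matrix above could be laid out as a table of inputs and expectations. The sketch below runs on plain Node (18 or later, for the global Blob) so it is self-contained. The fromDataURI defined here is a compact stand-in written for this example, not the repository's helper, and in the real vitest file each row would presumably become its own expect assertion against the imported helper.

```javascript
// Stand-in for the helper under test (assumption: the real tests would import
// lib/helpers/fromDataURI.js instead of defining this).
const PATTERN = /^([^,;]+\/[^,;]+)?((?:;[^,;=]+=[^,;]+)*)(;base64)?,([\s\S]*)$/;

function fromDataURI(uri, asBlob) {
  const protocol = uri.slice(0, uri.indexOf(':'));
  if (protocol !== 'data') throw new Error('Unsupported protocol ' + protocol);
  const match = PATTERN.exec(uri.slice('data:'.length));
  if (!match) throw new Error('Invalid URL');
  const buffer = Buffer.from(decodeURIComponent(match[4]), match[3] ? 'base64' : 'utf8');
  return asBlob ? new Blob([buffer], { type: (match[1] || '') + match[2] }) : buffer;
}

// [uri, expected decoded text]
const decodeCases = [
  ['data:;base64,MTIz', '123'],
  ['data:,123', '123'],
  ['data:application/octet-stream;base64,MTIz', '123'],
  ['data:application/octet-stream,123', '123'],
  ['data:text/plain;charset=US-ASCII;base64,MTIz', '123'],
  ['data:text/plain;charset=US-ASCII,123', '123'],
  ['data:text/plain,a%20b', 'a b'], // URL-encoded body gets decoded
];

// [uri, expected start of the error message]
const errorCases = [
  ['datax:,hi', 'Unsupported protocol'],
  ['data:hi', 'Invalid URL'],
];

function collectFailures() {
  const failures = [];
  for (const [uri, expected] of decodeCases) {
    const actual = fromDataURI(uri, false).toString('utf8');
    if (actual !== expected) failures.push(uri + ' decoded to ' + actual);
  }
  for (const [uri, expected] of errorCases) {
    let message = null;
    try {
      fromDataURI(uri, false);
    } catch (err) {
      message = err.message;
    }
    if (message === null || !message.startsWith(expected)) {
      failures.push(uri + ' did not throw ' + expected);
    }
  }
  // Compare case-insensitively: Blob implementations may lowercase the type.
  const blob = fromDataURI('data:text/plain;charset=US-ASCII,123', true);
  if (blob.type.toLowerCase() !== 'text/plain;charset=us-ascii') {
    failures.push('Blob type lost parameters: ' + blob.type);
  }
  return failures;
}

console.log(collectFailures());
```

One detail worth keeping in the ported tests: the Blob type comparison should be case-insensitive (or expect lowercase), since the charset value may not round-trip with its original casing.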
Related