Fix fromDataURI regex to match RFC 2397 (no runtime dep) #10808

@jasonsaayman

Summary

lib/helpers/fromDataURI.js uses a DATA_URL_PATTERN regex that doesn't match RFC 2397, so several valid data URIs throw Invalid URL. We should fix this inline rather than pulling in a runtime dependency.

Failing cases

All three of these are RFC 2397 compliant and currently rejected:

  • data:;base64,MTIz — no media type
  • data:application/octet-stream,123 — no parameters before the comma
  • data:text/plain;charset=US-ASCII,123 — non-base64 parameter

The current regex requires [^;]+; for the first segment, that is, at least one character followed by a ;. A URI with an empty media type, or with no ; before the body, therefore fails to match.

Why inline rather than a dependency

PR #7295 proposed swapping in data-uri-to-buffer. The tradeoffs there: v6+ is ESM-only and bumps engines.node to >=14, so we'd need to pin v3, which is unmaintained. For a fix of this size, a runtime dep isn't worth the supply-chain surface.

Proposed regex

RFC 2397 grammar:

    dataurl   := "data:" [ mediatype ] [ ";base64" ] "," data
    mediatype := [ type "/" subtype ] *( ";" parameter )

A compliant inline pattern in lib/helpers/fromDataURI.js (it carries no data: prefix, so it assumes the protocol has already been stripped from the URI):

const DATA_URL_PATTERN = /^([^,;]+\/[^,;]+)?((?:;[^,;=]+=[^,;]+)*)(;base64)?,([\s\S]*)$/;

Then in fromDataURI:

  • group 1: media type (undefined when absent, because the group is optional)
  • group 2: parameter list (e.g. ;charset=US-ASCII)
  • group 3: ;base64 marker (truthy when present)
  • group 4: body

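For the Blob path, the type passed to new Blob should be mediaType + parameters so charset survives. An absent media type should default to an empty string before concatenating, so the type never contains a literal "undefined".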
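A minimal sketch of how the four groups could be consumed, assuming the pattern runs on the URI with the data: prefix removed. parseDataURI is a hypothetical helper used only for illustration and is not an existing axios function. The field names are assumptions too.

```javascript
// Proposed pattern from this issue; it runs on the URI after "data:" is stripped.
const DATA_URL_PATTERN = /^([^,;]+\/[^,;]+)?((?:;[^,;=]+=[^,;]+)*)(;base64)?,([\s\S]*)$/;

// Hypothetical helper: splits a data URI into its RFC 2397 parts,
// or returns null when the pattern does not match.
function parseDataURI(uri) {
  const match = DATA_URL_PATTERN.exec(uri.slice('data:'.length));
  if (!match) return null;

  // Optional groups that did not participate are `undefined`, so default
  // them before concatenating to avoid a literal "undefined" in the type.
  const mediaType = match[1] || '';
  const parameters = match[2] || '';

  return {
    contentType: mediaType + parameters, // candidate value for the Blob `type`
    isBase64: Boolean(match[3]),
    body: match[4],
  };
}

// The three failing cases from this issue, plus one that should stay invalid.
console.log(parseDataURI('data:;base64,MTIz'));
// contentType: '', isBase64: true, body: 'MTIz'
console.log(parseDataURI('data:application/octet-stream,123'));
// contentType: 'application/octet-stream', isBase64: false, body: '123'
console.log(parseDataURI('data:text/plain;charset=US-ASCII,123'));
// contentType: 'text/plain;charset=US-ASCII', isBase64: false, body: '123'
console.log(parseDataURI('data:hi'));
// null (no comma, so no match)
```

Returning the content type as a plain string keeps the sketch independent of whichever Blob implementation the helper ends up using.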

Test coverage to add

Port the test cases from #7295 into tests/unit/helpers/fromDataURI.test.js (vitest, not the legacy mocha file). The matrix worth keeping:

  • Base64 and non-base64 body, both with and without media type
  • Charset parameter present, both base64 and non-base64
  • URL-encoded body values get decoded
  • Blob output preserves the full content type including parameters
  • datax:,hi throws Unsupported protocol
  • data:hi throws Invalid URL
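The matrix above could be expressed as table-driven rows along these lines. This is a sketch only: decodeDataURI is a hypothetical stand-in for the Buffer path of fromDataURI, it throws plain Error objects rather than the project's own error class, and the plain loop stands in for the test runner. In vitest each row would more naturally become an it.each case.

```javascript
const DATA_URL_PATTERN = /^([^,;]+\/[^,;]+)?((?:;[^,;=]+=[^,;]+)*)(;base64)?,([\s\S]*)$/;

// Hypothetical stand-in for the helper under test, for illustration only.
function decodeDataURI(uri) {
  const colon = uri.indexOf(':');
  const protocol = colon === -1 ? '' : uri.slice(0, colon);
  if (protocol !== 'data') throw new Error('Unsupported protocol ' + protocol);

  const match = DATA_URL_PATTERN.exec(uri.slice(colon + 1));
  if (!match) throw new Error('Invalid URL');

  // URL-decode the body first, then interpret it as base64 or UTF-8 text.
  const buffer = Buffer.from(decodeURIComponent(match[4]), match[3] ? 'base64' : 'utf8');
  return { type: (match[1] || '') + (match[2] || ''), text: buffer.toString('utf8') };
}

// [input, expected content type, expected decoded text]
const accepted = [
  ['data:;base64,MTIz', '', '123'],
  ['data:,123', '', '123'],
  ['data:text/plain;base64,MTIz', 'text/plain', '123'],
  ['data:application/octet-stream,123', 'application/octet-stream', '123'],
  ['data:text/plain;charset=US-ASCII,123', 'text/plain;charset=US-ASCII', '123'],
  ['data:text/plain;charset=US-ASCII;base64,MTIz', 'text/plain;charset=US-ASCII', '123'],
  ['data:text/plain,hello%20world', 'text/plain', 'hello world'],
];

// [input, expected start of the error message]
const rejected = [
  ['datax:,hi', 'Unsupported protocol'],
  ['data:hi', 'Invalid URL'],
];

// Returns the inputs whose behavior differs from the expectation.
function runMatrix() {
  const failures = [];
  for (const [input, type, text] of accepted) {
    const result = decodeDataURI(input);
    if (result.type !== type || result.text !== text) failures.push(input);
  }
  for (const [input, message] of rejected) {
    try {
      decodeDataURI(input);
      failures.push(input); // should have thrown
    } catch (err) {
      if (!err.message.startsWith(message)) failures.push(input);
    }
  }
  return failures;
}

console.log(runMatrix()); // an empty array means every row behaved as expected
```

The Blob assertion is left out of the sketch on purpose, since it depends on which Blob implementation the test environment provides.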
