Replace U+0000 with U+FFFD on input per CommonMark §2.3 #263

Merged
frostming merged 1 commit into frostming:master from gistrec:fix/replace-nul-with-fffd
May 28, 2026

gistrec commented May 9, 2026

Summary

CommonMark §2.3 requires literal U+0000 characters to be replaced with U+FFFD.
Marko already does this for entity references like &#0; and &#x0;, but literal NUL characters in the input are currently preserved in the rendered HTML.
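
As a point of comparison for the entity-reference behaviour, Python's standard-library `html.unescape` maps both spellings of the NUL character reference to U+FFFD. This is only an illustration of the rule; it is not a claim about how Marko decodes entities internally.

```python
import html

# Both the decimal and the hexadecimal NUL references decode to U+FFFD.
decoded_refs = {ref: html.unescape(ref) for ref in ("&#0;", "&#x0;")}

for ref, decoded in decoded_refs.items():
    print(ref, "->", f"U+{ord(decoded):04X}")
```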

Reproduction

import marko
print(repr(marko.convert("a\x00b")))
# current:  '<p>a\x00b</p>\n'
# expected: '<p>a�b</p>\n'

print(repr(marko.convert("`a\x00b`")))
# current: '<p><code>a\x00b</code></p>\n'

print(repr(marko.convert("&#0;")))
# already correct: '<p>�</p>\n'

The same issue also happens inside fenced code blocks.

Fix

Apply the \x00 → \ufffd substitution in _preprocess_text() after line-ending normalization.
This makes all parser and renderer paths see the same sanitized input, and keeps the behavior consistent with the existing entity-reference handling.
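
A minimal sketch of the ordering described above, assuming a standalone helper. The function name `preprocess_text` and its body are hypothetical stand-ins for illustration, not Marko's actual `_preprocess_text()` implementation.

```python
def preprocess_text(text: str) -> str:
    """Hypothetical stand-in for the preprocessing step; not Marko's real code."""
    # Step 1: normalize line endings, so CRLF and lone CR both become LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Step 2: replace every literal NUL (U+0000) with U+FFFD.
    return text.replace("\x00", "\ufffd")


sample = "a\x00b\r\n`c\x00d`\r"
cleaned = preprocess_text(sample)
print(repr(cleaned))
```

Doing the replacement once on the raw input, rather than in each renderer method, means text, code spans, and code blocks cannot drift apart in how they treat NUL.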

Tests

Added regression tests for:

  • NUL in plain text
  • NUL inside a code span
  • NUL inside a fenced code block

All existing tests still pass.
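
The three cases above could be checked with a small helper like the following sketch. The names `CASES`, `check_nul_replaced`, and `stand_in_convert` are hypothetical and are not taken from the PR's test suite; the stand-in converter only exists so the example runs without Marko installed.

```python
from typing import Callable, List

NUL, REPLACEMENT = "\x00", "\ufffd"

# One input per regression case listed above.
CASES = {
    "plain text": "a\x00b",
    "code span": "`a\x00b`",
    "fenced code block": "```\na\x00b\n```",
}


def check_nul_replaced(convert: Callable[[str], str]) -> List[str]:
    """Return the names of cases whose output keeps a NUL or lacks U+FFFD."""
    failures = []
    for name, source in CASES.items():
        rendered = convert(source)
        if NUL in rendered or REPLACEMENT not in rendered:
            failures.append(name)
    return failures


def stand_in_convert(source: str) -> str:
    # Hypothetical converter: sanitizes input, then wraps it in a paragraph.
    return "<p>" + source.replace(NUL, REPLACEMENT) + "</p>\n"


failures = check_nul_replaced(stand_in_convert)
print(failures)
```

With a patched Marko installed, passing `marko.convert` instead of the stand-in would exercise the real parser and renderer paths.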

CommonMark §2.3 ("Insecure characters") requires that literal U+0000
code points in source be replaced with U+FFFD. Marko's entity-reference
path already honours this for &#0; / &#x0;, but literal NULs in the
input were passed straight through to text, code spans, and code blocks.

Add the substitution to _preprocess_text so that all parser/renderer
paths see a sanitised buffer. Add regression tests covering NUL in
plain text, code spans, and fenced code blocks.
frostming merged commit 15b95ab into frostming:master May 28, 2026
19 checks passed