Replace U+0000 with U+FFFD on input per CommonMark §2.3 by gistrec · Pull Request #263 · frostming/marko

gistrec · 2026-05-09T17:15:03Z

Summary

CommonMark §2.3 requires literal U+0000 characters to be replaced with U+FFFD.
Marko already does this for numeric character references such as &#0;, but literal NUL characters in the input are currently preserved in the rendered HTML.

Reproduction

import marko
print(repr(marko.convert("a\x00b")))
# current:  '<p>a\x00b</p>\n'
# expected: '<p>a�b</p>\n'

print(repr(marko.convert("`a\x00b`")))
# current: '<p><code>a\x00b</code></p>\n'

print(repr(marko.convert("&#0;")))
# already correct: '<p>�</p>\n'

The same issue also happens inside fenced code blocks.

Fix

Apply the \x00 to � substitution in _preprocess_text() after line-ending normalization.
This makes all parser and renderer paths see the same sanitized input, and keeps the behavior consistent with the existing entity-reference handling.
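
As a rough illustration of the ordering described above, the sketch below uses a hypothetical standalone function, preprocess_text, as a stand-in for Marko's _preprocess_text(). It is not the code from this PR; the exact form of the line-ending normalization (CRLF to LF only) is an assumption made for the example.

```python
REPLACEMENT_CHARACTER = "\ufffd"  # U+FFFD


def preprocess_text(text: str) -> str:
    """Hypothetical stand-in for Marko's _preprocess_text(), for illustration only."""
    # Step 1 (assumed form): normalize CRLF line endings to LF.
    text = text.replace("\r\n", "\n")
    # Step 2: replace every literal NUL with U+FFFD, per CommonMark §2.3.
    return text.replace("\x00", REPLACEMENT_CHARACTER)
```

Because the substitution happens once on the whole input, later stages (inline text, code spans, fenced code blocks) never see a NUL and need no special handling of their own.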

Tests

Added regression tests for:

NUL in plain text
NUL inside a code span
NUL inside a fenced code block

All existing tests still pass.
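
The three cases listed above could be checked along the following lines. This is a sketch, not the tests added in this PR: the helper name check_nul_replaced and the sample inputs are hypothetical, and the helper takes the converter as a parameter so it does not depend on any particular Marko version.

```python
from typing import Callable

REPLACEMENT_CHARACTER = "\ufffd"  # U+FFFD

# Hypothetical sample inputs, one per case; the fenced block uses a tilde fence.
CASES = {
    "plain text": "a\x00b",
    "code span": "`a\x00b`",
    "fenced code block": "~~~\na\x00b\n~~~\n",
}


def check_nul_replaced(convert: Callable[[str], str]) -> None:
    """Assert that no literal NUL survives conversion and that U+FFFD appears instead."""
    for name, source in CASES.items():
        html = convert(source)
        assert "\x00" not in html, f"NUL leaked through in {name}"
        assert REPLACEMENT_CHARACTER in html, f"U+FFFD missing in {name}"
```

With a Marko build that includes this change installed, the helper would be called as check_nul_replaced(marko.convert).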

CommonMark §2.3 ("Insecure characters") requires that literal U+0000 code points in source be replaced with U+FFFD. Marko's entity-reference path already honours this for numeric character references such as &#0;, but literal NULs in the input were passed straight through to text, code spans, and code blocks. Add the substitution to _preprocess_text so that all parser/renderer paths see a sanitised buffer. Add regression tests covering NUL in plain text, code spans, and fenced code blocks.

frostming approved these changes May 28, 2026

frostming merged commit 15b95ab into frostming:master May 28, 2026
19 checks passed

frostming merged 1 commit into frostming:master from gistrec:fix/replace-nul-with-fffd