Description
Summary
When a Markdown file contains an HTML comment with a malformed closer (-- > instead of -->), mkdocs build can consume unbounded memory and get killed by the OOM killer. The parser appears to treat the remainder of the file (and potentially subsequent files) as one giant open comment, leading to catastrophic backtracking / token growth.
Environment
- MkDocs: 1.6.1
- Python: 3.11.2
- OS: Ubuntu 22.04 (repro’d locally and in GitHub Actions ubuntu-latest)
- Theme: occurs even with built-in mkdocs theme (no plugins, no extensions)
- Plugins: none (explicitly plugins: [])
- Markdown extensions: none (explicitly markdown_extensions: [])
Also reproduced with Material and typical extensions, but the root cause reproduces with the built-in theme and no plugins or extensions.
Minimal Reproducer
Create these two files:
mkdocs.yml

    site_name: Repro
    docs_dir: docs
    theme:
      name: mkdocs
    plugins: []
    markdown_extensions: []
    nav:
      - Home: index.md

docs/index.md

    # Hello
    <!-- This malformed comment never closes correctly -- >
    Some content that will never be "seen" by the parser properly.

Run:

    mkdocs build -v

Expected
- Build finishes, or at worst emits a validation error about a malformed HTML comment and continues parsing safely as normal text.
Actual
- Memory usage grows rapidly and the process gets killed by the OOM killer locally (killed), and shows as “The operation was canceled” in GitHub Actions after the runner terminates the job. No Python traceback is printed.
Observed Behavior Notes
- The issue reproduces with the bare parser configuration (no plugins/extensions), which points to Markdown/HTML handling in the core stack (Python-Markdown’s HTML block/comment parsing).
- With larger doc trees, a single malformed closer in one file can tank the entire build.
- Fixing the closer from -- > → --> immediately resolves the OOM.
Why this likely happens (hypothesis)
- The HTML comment tokenizer/regex expects --> and fails to find a terminator when it encounters -- >. The parser then treats the remainder as part of the same comment block, effectively creating an enormous single token/buffer. That can trigger catastrophic backtracking or simply allocate until OOM.
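To make the hypothesis concrete, here is a minimal Python sketch of a scanner that only recognizes `-->` as a terminator. It is not Python-Markdown's actual code; `naive_comment_end` is a hypothetical helper written for illustration, and it only shows how the rest of the input ends up inside the "comment". It does not reproduce the memory growth itself.

```python
def naive_comment_end(text: str, start: int) -> int:
    """Return the index just past the comment opened at `start`.

    Hypothetical scanner: if no `-->` exists after the opener, the comment
    is assumed to run to the end of the input.
    """
    close = text.find("-->", start + len("<!--"))
    return len(text) if close == -1 else close + len("-->")


doc = (
    "# Hello\n"
    "<!-- This malformed comment never closes correctly -- >\n"
    "Some content that will never be seen.\n"
)
start = doc.find("<!--")
end = naive_comment_end(doc, start)

# Everything from the opener to the end of the document is swallowed.
swallowed = doc[start:end]
```

With the closer corrected to `-->`, the same helper stops right after the comment and the following line is parsed as normal text.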
Impact
- Reliability: a single typo in a doc can take down CI and local builds.
- Security/DoS: a crafted Markdown line containing <!-- … -- > could be used to cause a denial-of-service on documentation pipelines.
Workarounds
- Search & replace malformed closers in docs:

      grep -RIn "\-\- >" docs   # find offenders
      # replace `-- >` with `-->`

- As a preventive CI check:

      grep -RIn "\-\- >" docs && { echo "Malformed HTML comment found"; exit 1; }
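For pipelines that prefer a script over grep, the same check can be sketched in Python. This is only an illustration: `find_malformed_closers` and `check_docs` are hypothetical names, and the script flags a closer only when it follows a `<!--` opener, which is slightly stricter than the grep pattern above.

```python
import re
from pathlib import Path

OPENER = "<!--"
# Matches a proper closer ("-->") or the typo variant with whitespace before ">".
CLOSER = re.compile(r"--(\s*)>")


def find_malformed_closers(text: str) -> list[int]:
    """Return 1-based line numbers of comment closers written as `-- >`."""
    bad_lines = []
    pos = 0
    while (start := text.find(OPENER, pos)) != -1:
        match = CLOSER.search(text, start + len(OPENER))
        if match is None:
            break  # unterminated comment: no closer left to classify
        if match.group(1):  # whitespace between "--" and ">"
            bad_lines.append(text.count("\n", 0, match.start()) + 1)
        pos = match.end()
    return bad_lines


def check_docs(docs_dir: str = "docs") -> int:
    """Print offenders under `docs_dir`; return 1 if any were found, else 0."""
    failed = False
    for path in sorted(Path(docs_dir).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for line_no in find_malformed_closers(text):
            print(f"{path}:{line_no}: malformed HTML comment closer")
            failed = True
    return 1 if failed else 0
```

In CI, a small wrapper can call `sys.exit(check_docs())` before `mkdocs build` so the job fails fast with a file and line number instead of an OOM kill.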
Proposed Fix (ideas)
- In the HTML comment parser:
  - Fail closed safely: If a proper --> terminator isn’t found within a bounded lookahead, treat the sequence as plain text rather than an open comment block.
  - Tolerant close: Optionally accept --\s*> as a terminator (normalizing/trimming spaces) to be robust to the specific typo.
  - Guardrails: Impose a maximum comment block length and switch to abort-to-text mode once it is exceeded, preventing unbounded accumulation/backtracking.
- Add a regression test with the minimal repro above to ensure parsing does not OOM and produces deterministic output (ideally with a warning).
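The three parser ideas can be combined in a short Python sketch. This is not a patch against Python-Markdown: `scan_comment`, `TOLERANT_CLOSE`, and the `MAX_COMMENT_LEN` value are assumptions made up for illustration, and a real fix would need to fit the library's existing HTML handling.

```python
import re

# Tolerant terminator: "-->" plus the typo variant with whitespace before ">".
TOLERANT_CLOSE = re.compile(r"--\s*>")
MAX_COMMENT_LEN = 64 * 1024  # hypothetical guardrail, in characters


def scan_comment(text: str, start: int, max_len: int = MAX_COMMENT_LEN):
    """Classify the `<!--` found at `start`.

    Returns ("comment", end) when a terminator is found within `max_len`
    characters of the opener. Otherwise returns ("text", start + 4) so the
    caller can emit the opener as plain text and resume normal parsing.
    """
    if not text.startswith("<!--", start):
        raise ValueError("start must point at '<!--'")
    body_start = start + 4
    # Bounded lookahead: never search past the guardrail.
    window_end = min(len(text), body_start + max_len)
    match = TOLERANT_CLOSE.search(text, body_start, window_end)
    if match is None:
        return ("text", body_start)
    return ("comment", match.end())
```

A regression test along the lines proposed above could then assert that the minimal repro yields a bounded comment and that the following content is left for normal parsing, for example by checking that `scan_comment(doc, doc.find("<!--"))` returns a `"comment"` result ending right after `-- >`.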
Versions Tested
- MkDocs 1.6.1