Fix parsing larger HTML blocks in MDX files #1924

Merged
mre merged 1 commit into master from fix-1923 on Nov 18, 2025

mre commented Nov 18, 2025

Member

Previously, each HTML line inside a Markdown file was parsed independently. This worked reasonably well, but when individual lines were not valid HTML on their own, we would skip link checking for them.

The fix is to accumulate an HTML block until it is closed and only then parse it.

Fixes #1923
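
For illustration, here is a rough sketch of why per-line parsing can miss a link that whole-block parsing finds. This is not the code from this PR: `extract_hrefs`, `links_per_line`, and `links_whole_block` are hypothetical helpers, and the href matching is deliberately naive (double-quoted `href` values inside a complete tag only).

```rust
/// Naive, illustrative href extractor (hypothetical helper, not the real
/// extractor): returns every double-quoted `href="..."` value that sits
/// inside a complete `<...>` tag.
fn extract_hrefs(html: &str) -> Vec<String> {
    let mut links = Vec::new();
    let mut rest = html;
    while let Some(open) = rest.find('<') {
        let after_open = &rest[open + 1..];
        // A tag without a closing `>` is incomplete, so nothing can be extracted.
        let Some(close) = after_open.find('>') else { break };
        let tag = &after_open[..close];
        if let Some(pos) = tag.find("href=\"") {
            // Skip the 6 bytes of `href="` and read up to the closing quote.
            let value = &tag[pos + 6..];
            if let Some(end) = value.find('"') {
                links.push(value[..end].to_string());
            }
        }
        rest = &after_open[close + 1..];
    }
    links
}

/// Old behaviour as described above: every line is parsed on its own.
fn links_per_line(block: &str) -> Vec<String> {
    block.lines().flat_map(extract_hrefs).collect()
}

/// New behaviour as described above: the complete block is parsed once.
fn links_whole_block(block: &str) -> Vec<String> {
    extract_hrefs(block)
}
```

With a tag split across lines, such as `<a`, then `href="https://example.com"`, then `>link</a>`, no single line contains a complete tag, so `links_per_line` finds nothing while `links_whole_block` finds the URL.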

mre commented Nov 18, 2025

Member Author

@katrinafyi, @thomas-zahner FYI

@katrinafyi

Member

Very quick! I'm wondering whether this affects parsing Markdown inside HTML, which should work according to the spec. I think this should already be handled by pulldown-cmark, but maybe add a test case with a Markdown link inside an HTML block?

mre commented Nov 18, 2025

Member Author

Your concern was worth checking, but our implementation handles it correctly. 👍
I've added the test.

Some notes for future reference:

  • `pulldown_cmark` treats block-level HTML tags (`<div>`, `</div>`) as separate HTML blocks
  • Markdown content between HTML blocks is parsed as normal Markdown paragraphs
  • We only accumulate HTML chunks within a single HTML block (between `Start(HtmlBlock)` and `End(HtmlBlock)`)
  • Markdown events like `Start(Link)` are processed normally and separately
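
The notes above could be modelled roughly like this. It is a sketch under assumptions, not the actual implementation: the `Event` enum is a simplified stand-in for the parser's event stream (not the real `pulldown_cmark` types), and `Extracted` and `walk` are hypothetical names.

```rust
/// Simplified stand-in for a Markdown parser's event stream.
/// Assumption: loosely modelled on the events named in the notes above;
/// these are NOT the real pulldown_cmark types.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    StartHtmlBlock,
    Html(String),
    EndHtmlBlock,
    StartLink(String),
    Text(String),
}

/// What the walker passes on for link checking (hypothetical).
#[derive(Debug, PartialEq)]
enum Extracted {
    /// A complete HTML block, ready to be parsed as HTML in one go.
    HtmlBlock(String),
    /// A link that the Markdown parser already recognised.
    MarkdownLink(String),
}

fn walk(events: &[Event]) -> Vec<Extracted> {
    let mut out = Vec::new();
    // `Some(buffer)` while we are inside an HTML block, `None` otherwise.
    let mut buffer: Option<String> = None;
    for event in events {
        match event {
            Event::StartHtmlBlock => buffer = Some(String::new()),
            Event::Html(chunk) => match buffer.as_mut() {
                // Inside a block: accumulate instead of parsing the chunk alone.
                Some(buf) => buf.push_str(chunk),
                // Outside a block: pass the chunk on by itself.
                None => out.push(Extracted::HtmlBlock(chunk.clone())),
            },
            Event::EndHtmlBlock => {
                // The block is closed: emit the accumulated HTML once.
                if let Some(buf) = buffer.take() {
                    out.push(Extracted::HtmlBlock(buf));
                }
            }
            // Markdown links are handled separately and never buffered.
            Event::StartLink(url) => out.push(Extracted::MarkdownLink(url.clone())),
            Event::Text(_) => {}
        }
    }
    out
}
```

In this model, a Markdown link between a `<div>` block and a `</div>` block comes out as its own `MarkdownLink`, while several `Html` chunks inside one block come out as a single `HtmlBlock` string.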

mre commented Nov 18, 2025

Member Author

Huh, looks like the tests failed, but for an unrelated issue. We should move all tests to "example URLs" at some point and not depend on external servers. 😖

mre commented Nov 18, 2025

Member Author

Okay, the test fixed itself. My favorite kind of fix. Merging.

mre merged commit 0ba0195 into master on Nov 18, 2025
11 of 12 checks passed
mre deleted the fix-1923 branch on November 18, 2025 16:19
mre mentioned this pull request on Nov 17, 2025

@thomas-zahner

Member

Awesome, I see no problems. That was really quick 🚀

mre mentioned this pull request on Dec 5, 2025

Development

Successfully merging this pull request may close these issues.

Certain links in mdx files are skipped

3 participants