fix(markdown): stop WYSIWYG hard breaks rendering as literal "<br />" #13076
The help/FAQ pages (e.g. `/help/faq/editing`) showed literal `<br />` text because Open Library has two markdown engines that disagree on hard breaks:

* Display: OLMarkdown (vendored Python-Markdown) turns every newline into an injected `<br/>` and has no escape syntax for hard breaks.
* Editing: the Tiptap WYSIWYG component (tiptap-markdown) serializes a hard break as CommonMark `\` + newline.

When editor-saved content is displayed, OLMarkdown appends its `<br/>` right after the editor's trailing `\`; the backslash escapes the `<`, so readers see the literal text `<br />`. Every multi-line paragraph edited via the editor accumulated this (along with stray backslashes; see the page history).

Fixes both sides:

* Editor: a new `OLHardBreak` node serializes hard breaks as a bare newline (OLMarkdown's dialect) instead of CommonMark `\`. Re-saving a page now also cleans up legacy backslash/`<br>` cruft.
* Renderer: `LineBreaksPreprocessor` strips a lone trailing `\` before injecting `<br/>`, so already-corrupted pages display correctly without a re-save, and no longer glues a `<br/>` onto the line above a link-reference definition.

Adds JS round-trip tests and Python render tests; both fail without the fix.

Ref: internetarchive#13074
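A simplified Python model of the renderer-side rule, for illustration only. It is not the actual `olmarkdown.py` code, and the names `add_line_breaks`, `strip_lone_trailing_backslash` and `LINK_REF_RE` are hypothetical. It assumes a `<br/>` is injected only between two non-blank lines, reads "lone" as a single `\` not preceded by another backslash, and uses a rough regex for link-reference definitions.

```python
import re

# Hypothetical, simplified pattern for a link-reference definition line,
# e.g. "[1]: https://example.org" (up to three leading spaces allowed).
LINK_REF_RE = re.compile(r"^ {0,3}\[[^\]]+\]:\s*\S")


def strip_lone_trailing_backslash(line: str) -> str:
    # Drop a single trailing CommonMark hard-break backslash. Two trailing
    # backslashes are an escaped literal backslash, so they are left alone.
    if line.endswith("\\") and not line.endswith("\\\\"):
        return line[:-1]
    return line


def add_line_breaks(lines: list[str]) -> list[str]:
    # Inject <br/> at line ends, OLMarkdown-style (simplified model).
    out = []
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else None
        if (
            nxt is None                  # last line: nothing to break before
            or not line.strip()          # blank line: block boundary
            or not nxt.strip()           # next line blank: end of paragraph
            or LINK_REF_RE.match(nxt)    # don't glue <br/> above a reference definition
        ):
            out.append(line)
        else:
            # Strip the editor's "\" first so it cannot escape the "<" of <br/>.
            out.append(strip_lone_trailing_backslash(line) + "<br/>")
    return out


source = [
    "First line\\",  # legacy editor output: CommonMark hard break
    "Second line",
    "",
    "See [the docs][1].",
    "[1]: https://example.org",
]
for rendered in add_line_breaks(source):
    print(repr(rendered))
```

Running it prints `'First line<br/>'` for the first line, with no leftover backslash, while `'See [the docs][1].'` stays free of `<br/>` because the next line is a reference definition.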
Verification

Reproduced end-to-end on a local dev instance, plus before/after through the real renderer and red/green tests.

[Screenshots: Reproduction; Before / after]
Closes #13074
Problem
The help/FAQ pages (e.g. `/help/faq/editing`) render literal `<br />` as visible text. The root cause is that Open Library has two markdown engines that disagree on hard breaks:

| | Engine | Hard break |
|---|---|---|
| Display | OLMarkdown (vendored Python-Markdown) | every newline → injected `<br/>`; no escape syntax |
| Editing (WYSIWYG) | Tiptap (`tiptap-markdown` / markdown-it) | CommonMark `\` + newline |

When editor-saved content is displayed, OLMarkdown appends its `<br/>` right after the editor's trailing `\`. The backslash escapes the `<`, so the tag renders as the literal text `<br />`. This was verified against the real renderer.

This affects every multi-line paragraph edited via the WYSIWYG editor, not just the orphaned footnote block on that FAQ page (the "stray forward slashes removal" edit in the page history was cleaning up these same backslashes).
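To make the failure mode concrete, here is a toy Python model of the two steps involved. Both helpers (`inject_line_breaks`, `apply_backslash_escapes`) are hypothetical simplifications, not OLMarkdown's actual code: the escape pass ignores everything Python-Markdown does besides backslash escapes, and the output shows `<br/>` rather than the `<br />` seen on the live page.

```python
import html
import re

# ASCII punctuation that a Markdown backslash can escape.
ESCAPE_RE = re.compile(r"\\([!-/:-@\[-`{-~])")


def inject_line_breaks(text: str) -> str:
    # Toy stand-in for OLMarkdown's preprocessor: every newline gets a <br/>.
    return text.replace("\n", "<br/>\n")


def apply_backslash_escapes(text: str) -> str:
    # Toy stand-in for inline escape handling: "\X" becomes a literal X,
    # HTML-escaped so that an escaped "<" can no longer open a tag.
    return ESCAPE_RE.sub(lambda m: html.escape(m.group(1)), text)


def render(source: str) -> str:
    # Line breaks are injected before escapes are processed, so the
    # editor's trailing "\" ends up directly in front of "<br/>".
    return apply_backslash_escapes(inject_line_breaks(source))


print(render("First line\nSecond line"))    # plain newline: a real <br/> tag
print(render("First line\\\nSecond line"))  # editor's "\" + newline: escaped "<"
```

The first call returns `First line<br/>\nSecond line`, a real tag. The second returns `First line&lt;br/>\nSecond line`, which a browser displays as visible `<br/>` text instead of a line break.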
Steps to reproduce
Via the WYSIWYG editor (the real-world path: any page that mounts `<ol-markdown-editor>`, such as a `/type/page` wiki/help page or a work description), opening a multi-line page in the editor and saving it makes the displayed page show literal `<br />`. With `breaks: true`, markdown-it turns every existing single-newline line break into a hard-break node, so a plain open→save corrupts the whole page at once.

Renderer only (no editor needed): save page source with a line ending in a single backslash, then view it and the literal `<br />` appears.

Fix (editor + renderer)
* Editor: the new `OLHardBreak` node (`openlibrary/components/lit/hard-break.js`, wired into `editor-core.js`) serializes hard breaks as a bare newline (OLMarkdown's dialect) instead of CommonMark `\`. Re-saving a page through the editor now also cleans up legacy backslash/`<br>` cruft.
* Renderer: `LineBreaksPreprocessor` (`olmarkdown.py`) normalizes away a lone trailing CommonMark hard-break `\`, so already-corrupted pages display correctly without needing a re-save (no leaked `<br />`, no stray `\`). It also no longer glues a `<br/>` onto the line directly above a link-reference definition.
* Dependencies: `@tiptap/extension-hard-break` is now an explicit dependency (previously a transitive/phantom dependency via `starter-kit`).

Tests
* JS round-trip (`tests/unit/js/OLMarkdownEditor.test.js`): hard breaks serialize without backslashes/`<br>`.
* Python render (`openlibrary/tests/core/test_olmarkdown.py`): the hard-break and reference-block cases.

Notes
* Switching the renderer to `markdown-it-py` server-side (to match the editor) would remove the dialect mismatch entirely; that is worth a separate issue.

Testing
* `npm run test:js`: 88 passed (OLMarkdownEditor suite)
* `pytest openlibrary/tests/core/test_olmarkdown.py`: 5 passed
* `pre-commit` (ruff / mypy / eslint / POT): clean
* Before/after through the real renderer (literal `<br />` → clean line breaks)
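The editor-side change is in JavaScript, but the serialization rule it changes can be modeled in a few lines of Python. This is a hypothetical sketch, not the `OLHardBreak` implementation: a paragraph is represented as the list of text runs between hard-break nodes, `parse_legacy` approximates how a re-save absorbs legacy trailing `\` and `<br>` markers into hard breaks, and the two serializers contrast CommonMark output with OLMarkdown's bare-newline dialect.

```python
import re

# Matches a literal <br>, <br/> or <br /> at the end of a line.
TRAILING_BR_RE = re.compile(r"<br\s*/?>$", re.IGNORECASE)


def parse_legacy(source: str) -> list[str]:
    # Hypothetical model: each line is one run of text, separated from the
    # next by a hard break; legacy end-of-line markers are dropped.
    runs = []
    for line in source.split("\n"):
        line = TRAILING_BR_RE.sub("", line)
        if line.endswith("\\") and not line.endswith("\\\\"):
            line = line[:-1]  # lone CommonMark hard-break backslash
        runs.append(line)
    return runs


def serialize_commonmark(runs: list[str]) -> str:
    # Old behaviour: each hard break written as "\" + newline.
    return "\\\n".join(runs)


def serialize_ol(runs: list[str]) -> str:
    # New behaviour: each hard break written as a bare newline (OLMarkdown dialect).
    return "\n".join(runs)


legacy = "First line\\\nSecond line<br>\nThird line"
runs = parse_legacy(legacy)
print(repr(serialize_commonmark(runs)))
print(repr(serialize_ol(runs)))
```

In this model, saving in the OLMarkdown dialect is stable: parsing the bare-newline output again yields the same runs, so repeated open→save cycles no longer accumulate backslashes.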