
fix(markdown): stop WYSIWYG hard breaks rendering as literal "<br />" #13076

Merged
mekarpeles merged 1 commit into internetarchive:master from lokesh:13074/fix/wysiwyg-hardbreak-br-leak on Jul 2, 2026

@lokesh lokesh commented Jun 30, 2026

Collaborator

Closes #13074

Problem

The help/FAQ pages (e.g. /help/faq/editing) render literal <br /> as visible text. The root cause is that Open Library has two markdown engines that disagree on hard breaks:

| Role | Engine | Hard break |
| --- | --- | --- |
| Display | OLMarkdown (vendored Python-Markdown) | every newline → injected `<br/>`; no escape syntax |
| Editing | Tiptap WYSIWYG (tiptap-markdown / markdown-it) | serializes a hard break as CommonMark `\` + newline |

When editor-saved content is displayed, OLMarkdown appends its <br/> right after the editor's trailing \. The backslash escapes the <, so the tag renders as the literal text <br />. Verified against the real renderer:

editor saves:  'first line\\\nsecond line'
OLMarkdown  →  'first line&lt;br /&gt;\n   second line'   # literal <br /> shown to the reader

This affects every multi-line paragraph edited via the WYSIWYG editor, not just the orphaned footnote block on that FAQ page. The "stray forward slashes removal" edit in the page history was cleaning up these same backslashes.
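
The interaction described above can be reproduced with a toy model. This is only an illustrative sketch, not the vendored Python-Markdown code: `toy_render` and its tokenizer are hypothetical, and they assume just two behaviors stated in this PR (every newline gets an injected `<br />`, and a backslash makes the following character literal).

```python
import html
import re

# Hypothetical tokenizer for the toy model: a backslash escape, an injected
# <br /> tag, or any other single character (including newlines).
_TOKEN = re.compile(r"\\(.)|(<br />)|(.)", re.DOTALL)


def toy_render(text: str) -> str:
    """Toy model of the bug; NOT the real OLMarkdown implementation."""
    # Step 1 (display engine behavior): inject <br /> before every newline.
    with_breaks = text.replace("\n", "<br />\n")
    out = []
    for match in _TOKEN.finditer(with_breaks):
        escaped, tag, plain = match.groups()
        if tag:
            # An intact <br /> passes through as real markup.
            out.append(tag)
        else:
            # Escaped or ordinary characters are emitted as literal text,
            # so "<" and ">" become HTML entities.
            out.append(html.escape(escaped or plain, quote=False))
    return "".join(out)


# Editor-style hard break (backslash + newline): the tag leaks as text.
print(toy_render("first line\\\nsecond line"))
# Plain newline: the injected tag survives as markup.
print(toy_render("first line\nsecond line"))
```

In this model the trailing backslash consumes the `<` of the injected tag, so the remainder of the tag can no longer be recognized as markup and is escaped as text.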

Steps to reproduce

Via the WYSIWYG editor (the real-world path — any page that mounts <ol-markdown-editor>: a /type/page wiki/help page, or a work description):

  1. Open the editor and type a line, press Shift+Enter (a hard break — plain Enter makes a new paragraph, which is fine), type a second line.
  2. Save, then view the rendered page → the line break shows as the literal text <br />.
    • Opening an existing multi-line page in the editor and simply re-saving also reproduces it: with breaks: true, markdown-it turns every existing single-newline line break into a hard-break node, so a plain open→save corrupts the whole page at once.
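
The open→save corruption in the last point can be sketched with a toy round trip. This is a simplified model under stated assumptions, not the tiptap-markdown or markdown-it code: `parse_paragraph` and `serialize` are hypothetical helpers that only mimic "every single newline becomes a hard-break node" and "a hard-break node is written out with a chosen separator".

```python
# Sentinel standing in for a hard-break node in the toy document model.
HARD_BREAK = object()


def parse_paragraph(text: str) -> list:
    """Toy parse with `breaks: true` semantics: each newline is a hard break."""
    nodes = []
    for index, line in enumerate(text.split("\n")):
        if index:
            nodes.append(HARD_BREAK)
        nodes.append(line)
    return nodes


def serialize(nodes: list, hard_break: str) -> str:
    """Write nodes back to text, emitting `hard_break` for each break node."""
    return "".join(hard_break if node is HARD_BREAK else node for node in nodes)


source = "one\ntwo\nthree"

# CommonMark-style serialization: a plain open→save adds a backslash per line.
commonmark_saved = serialize(parse_paragraph(source), "\\\n")

# Bare-newline serialization: the same open→save is a no-op.
bare_saved = serialize(parse_paragraph(source), "\n")

print(repr(commonmark_saved))
print(repr(bare_saved))
```

Under these assumptions, a page that was never edited by hand still gains a trailing backslash on every wrapped line the first time it passes through the editor, which is why a single re-save can affect the whole page.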

Renderer only (no editor needed) — save page source with a line ending in a single backslash:

first line\
second line

→ view it and the literal <br /> appears.

Fix (editor + renderer)

  • Editor: a new OLHardBreak node (openlibrary/components/lit/hard-break.js, wired into editor-core.js) serializes hard breaks as a bare newline (OLMarkdown's dialect) instead of CommonMark \ + newline. Re-saving a page through the editor now also cleans up legacy backslash/<br> cruft.
  • Renderer: LineBreaksPreprocessor (olmarkdown.py) normalizes away a lone trailing CommonMark hard-break \, so already-corrupted pages display correctly without needing a re-save (no leaked <br />, no stray \). It also no longer glues a <br/> onto the line directly above a link-reference definition.
  • Added @tiptap/extension-hard-break as an explicit dependency (was a transitive/phantom dep via starter-kit).
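
The renderer-side normalization could look roughly like the sketch below. It is an assumption-laden illustration, not the code in olmarkdown.py: `strip_commonmark_hard_breaks` is a hypothetical name, and "lone trailing backslash" is interpreted here as a single backslash, not preceded by another backslash, sitting directly before a newline.

```python
import re

# Hypothetical pattern: one backslash that is not preceded by another
# backslash and is immediately followed by a newline.
_LONE_TRAILING_BACKSLASH = re.compile(r"(?<!\\)\\(?=\n)")


def strip_commonmark_hard_breaks(text: str) -> str:
    """Drop CommonMark hard-break backslashes before line-break injection.

    Sketch only; the real change lives in LineBreaksPreprocessor.
    """
    return _LONE_TRAILING_BACKSLASH.sub("", text)


# Editor-saved hard break: the backslash is removed, the newline is kept.
print(repr(strip_commonmark_hard_breaks("first line\\\nsecond line")))
# Escaped backslash (two backslashes) before a newline is left alone.
print(repr(strip_commonmark_hard_breaks("path\\\\\nnext")))
```

Running the normalization before `<br/>` injection means the renderer never sees a backslash directly in front of the injected tag, while escaped backslashes and text that merely ends in a backslash are left untouched.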

Tests

  • JS round-trip tests (tests/unit/js/OLMarkdownEditor.test.js): hard breaks serialize without backslashes/<br>.
  • Python render tests (openlibrary/tests/core/test_olmarkdown.py): the hard-break and reference-block cases.
  • Both new test sets were confirmed to fail without the fix (see the verification comment for before/after rendered output and red/green results).

Notes

  • The orphaned footnote block on the FAQ page is still worth deleting from the wiki source; this change makes the renderer degrade gracefully instead of emitting literal markup.
  • Longer term, converging on a single markdown engine (e.g. markdown-it-py server-side, to match the editor) would remove the dialect mismatch entirely — worth a separate issue.

Testing

  • npm run test:js — 88 passed (OLMarkdownEditor suite)
  • pytest openlibrary/tests/core/test_olmarkdown.py — 5 passed
  • pre-commit (ruff / mypy / eslint / POT) clean
  • Manual before/after on a local dev instance (literal <br /> → clean line breaks)

The help/FAQ pages (e.g. /help/faq/editing) showed literal `<br />` text
because Open Library has two markdown engines that disagree on hard breaks:

* Display: OLMarkdown (vendored Python-Markdown) turns every newline into an
  injected `<br/>` and has no escape syntax for hard breaks.
* Editing: the Tiptap WYSIWYG component (tiptap-markdown) serializes a hard
  break as CommonMark `\` + newline.

When editor-saved content is displayed, OLMarkdown appends its `<br/>` right
after the editor's trailing `\`; the backslash escapes the `<`, so readers see
the literal text `<br />`. Every multi-line paragraph edited via the editor
accumulated this (and stray backslashes — see the page history).

Fixes both sides:

* Editor: a new OLHardBreak node serializes hard breaks as a bare newline
  (OLMarkdown's dialect) instead of CommonMark `\`. Re-saving a page now also
  cleans up legacy backslash/`<br>` cruft.
* Renderer: LineBreaksPreprocessor strips a lone trailing `\` before injecting
  `<br/>`, so already-corrupted pages display correctly without a re-save, and
  no longer glues a `<br/>` onto the line above a link-reference definition.

Adds JS round-trip tests and Python render tests; both fail without the fix.

Ref: internetarchive#13074
@lokesh lokesh force-pushed the 13074/fix/wysiwyg-hardbreak-br-leak branch from 1fba18b to 692c77a Compare June 30, 2026 22:30

lokesh commented Jun 30, 2026

Collaborator Author

Verification

Reproduced end-to-end on a local dev instance, plus before/after through the real renderer and red/green tests.

Reproduction

A /type/page whose body contains WYSIWYG-style hard breaks (\ + newline, exactly what tiptap-markdown serializes):

First line of a paragraph\
Second line after a hard break\
Third line after another hard break.

Another paragraph that simply
wraps across two source lines.

Before / after — real OLMarkdown output on the running site

Before (master): the trailing \ escapes the injected <br/>, so readers see literal text:

<p>First line of a paragraph&lt;br /&gt;
   Second line after a hard break&lt;br /&gt;
   Third line after another hard break.
</p>

After (this PR): clean line breaks, no literal markup:

<p>First line of a paragraph<br/>
   Second line after a hard break<br/>
   Third line after another hard break.
</p>

(Rendered before/after screenshots from the running page are attached below.)

Tests fail on master, pass on this PR

Python — the new test_olmarkdown.py cases against the unmodified renderer:

master:  AssertionError: assert '&lt;br' not in '<p>first li...d line</p>'   →  2 failed, 3 passed
this PR:                                                                          5 passed

JS — removing OLHardBreak from the editor config (i.e. master's behavior):

without fix:  2 failed, 86 passed   (hard breaks serialize as CommonMark "\")
this PR:      88 passed

No regression

Escaped backslashes (\\), paragraph breaks (\n\n), indented code, and link-reference definitions render byte-identically before and after — only the spurious hard-break \ / leaked <br /> changes.

@lokesh lokesh requested a review from Sadashii June 30, 2026 22:51
@mekarpeles mekarpeles self-assigned this Jul 2, 2026
@mekarpeles mekarpeles merged commit 29b531f into internetarchive:master Jul 2, 2026
4 of 5 checks passed

Development

Successfully merging this pull request may close these issues.

Help page /help/faq/editing renders orphaned markdown reference-link definitions as visible junk (literal <br />, escaped chars)

2 participants