Math: Decode HTML entities in latex attribute on load #77789
Size Change: +34 B (0%). Total Size: 8.18 MB
Flaky tests detected in 33d1e6b. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/25101069571
Pull request overview
This PR fixes entity-encoded LaTeX round-tripping in the Math block for users without unfiltered_html by decoding HTML entities on editor load before rendering to MathML, and normalizing the in-editor latex attribute to the decoded form. It also adds a Jest suite to cover the new behavior and the “self-heal” re-rendering of stored mathML.
Changes:
- Decode HTML entities in the Math block's `latex` attribute before passing it to `@wordpress/latex-to-mathml` on mount.
- Normalize the block's `latex` attribute to the decoded form on load (without making the change persistent/undoable).
- Add Jest tests validating decoding/normalization and MathML recomputation behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/block-library/src/math/edit.js | Decodes entity-encoded LaTeX before initial MathML generation and optionally normalizes latex on load. |
| packages/block-library/src/math/test/edit.js | Adds unit tests for decoding/normalization and recomputation/self-heal behavior. |
test( 'normalizes the latex attribute when entities are present', async () => {
	const setAttributes = jest.fn();
	render(
		<MathEdit
			attributes={ {
				latex: 'a &amp; b',
				mathML: '',
			} }
			setAttributes={ setAttributes }
			isSelected={ false }
		/>
	);

	await waitFor( () => {
		expect( setAttributes ).toHaveBeenCalledWith(
			expect.objectContaining( { latex: 'a & b' } )
		);
	} );
} );
Given wp_kses can double-encode previously encoded entities (e.g. `&amp;` → `&amp;amp;`)
I don't think it does this.
npx wp-env run cli --env-cwd=/var/www/html wp eval 'echo wp_kses_post( wp_kses_post("&") );'
yields `&amp;`
if ( initialLatex.current ) {
	__unstableMarkNextChangeAsNotPersistent();
	setAttributes( {
		mathML: module.default( initialLatex.current, {
			// `wp_kses` runs on block attributes for users without
			// `unfiltered_html`, encoding `&` to `&amp;` and similar.
			// LaTeX uses `&` (e.g. as a column separator in `pmatrix`),
			// so decode entities before rendering.
For users without `unfiltered_html`, `wp_kses` runs over block attribute values on save and encodes `&` to `&amp;`. LaTeX uses `&` as a column separator (e.g. in `pmatrix`), so on reload the editor's `useEffect` was handing temml a corrupted source string and overwriting the correct `mathML` in state. The next save (autosave, preview, or manual) persisted the broken markup, breaking both the editor preview and the front-end render.
Decode entities before passing the latex to temml, and write the decoded value back to the attribute when it differs.
Fixes #77787
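The corruption this commit message describes can be illustrated with a small, self-contained sketch. It is not the block's code: `decodeBasicEntities` is a hypothetical stand-in for `decodeEntities()` from `@wordpress/html-entities` that handles only a few named entities, and `splitCells` is a toy model of how a LaTeX renderer treats `&` in a matrix row, not temml's actual parser.

```javascript
// Hypothetical stand-in for decodeEntities() from @wordpress/html-entities.
// It decodes only the named entities listed below, in a single pass.
function decodeBasicEntities( text ) {
	const named = { '&lt;': '<', '&gt;': '>', '&quot;': '"', '&amp;': '&' };
	return text.replace( /&(?:lt|gt|quot|amp);/g, ( match ) => named[ match ] );
}

// Toy model of a matrix row parser (an assumption, not temml's real logic):
// every `&` starts a new cell.
function splitCells( row ) {
	return row.split( '&' ).map( ( cell ) => cell.trim() );
}

// What the attribute looks like after kses has entity-encoded `a & b`.
const stored = 'a &amp; b';

// Without decoding, the `&` inside `&amp;` is read as a column separator.
const corruptedCells = splitCells( stored );

// Decoding first restores the cells the user typed.
const healedCells = splitCells( decodeBasicEntities( stored ) );

console.log( corruptedCells ); // [ 'a', 'amp; b' ]
console.log( healedCells ); // [ 'a', 'b' ]
```

The single-pass replacement matters in the stand-in: a value such as `&amp;lt;` decodes one level to `&lt;` rather than all the way to `<`.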
The mount-time effect captures `initialLatex` at render and then reads it inside the dynamic-import `.then()` callback. The user can edit the textarea before the import resolves, in which case the normalization would overwrite their edit with the stale captured value. Replace the `initialLatex` ref with a `latestLatex` ref that is reassigned on every render, so the post-resolution callback works with whatever value is current. Drop the `latex` write entirely when the decoded value matches, which also covers the case where the captured value no longer applies. Tighten the test suite alongside: assert `__unstableMarkNextChangeAsNotPersistent` is called when the attribute is normalized, exercise the rerender-before-resolve race, and assert full `setAttributes` payloads.
The `react-hooks/immutability` (and `react-hooks/refs`) lint rules reject ref mutation during render. Rename to `latestLatexRef` and move the assignment into a dependency-less `useEffect` that runs after every commit. The dynamic-import callback still reads the freshest value because the import promise resolves on a later microtask than the effect that syncs the ref.
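The race handled by the two commit messages above can be sketched without React. This is an illustration under stated assumptions, not the code in `edit.js`: `loadRenderer` stands in for the dynamic `import()` of `@wordpress/latex-to-mathml`, `decodeBasicEntities` stands in for `decodeEntities()`, a plain `{ current }` object plays the role of `latestLatexRef`, and the `__unstableMarkNextChangeAsNotPersistent()` call is left out. All function names here are hypothetical.

```javascript
// Hypothetical stand-in for decodeEntities() from @wordpress/html-entities.
function decodeBasicEntities( text ) {
	const named = { '&lt;': '<', '&gt;': '>', '&quot;': '"', '&amp;': '&' };
	return text.replace( /&(?:lt|gt|quot|amp);/g, ( match ) => named[ match ] );
}

// Stand-in for the dynamically imported renderer: resolves asynchronously
// and exposes a fake `default` export that wraps the source in <math>.
function loadRenderer() {
	return Promise.resolve( {
		default: ( latex ) => `<math>${ latex }</math>`,
	} );
}

// Simulates the mount-time effect. The ref is read when the import
// resolves, not when the effect starts, so a later edit is respected.
function runMountEffect( latestLatexRef, setAttributes ) {
	return loadRenderer().then( ( module ) => {
		const current = latestLatexRef.current;
		if ( ! current ) {
			return;
		}
		const decoded = decodeBasicEntities( current );
		const next = { mathML: module.default( decoded ) };
		if ( decoded !== current ) {
			// Only write `latex` back when decoding actually changed it.
			next.latex = decoded;
		}
		setAttributes( next );
	} );
}

// The user edits the field after mount but before the import resolves.
async function demo() {
	const calls = [];
	const ref = { current: 'a &amp; b' };
	const pending = runMountEffect( ref, ( attrs ) => calls.push( attrs ) );
	ref.current = 'x + y';
	await pending;
	return calls;
}
```

In `demo()`, the single `setAttributes` call carries `mathML` rendered from `x + y` and no `latex` key, so the stale entity-encoded value never overwrites the edit. When no edit happens, the same function writes both `mathML` and the decoded `latex`.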
What?
Closes #77787
Decodes HTML entities in the Math block's `latex` attribute before passing it to the LaTeX-to-MathML renderer, and normalizes the stored attribute back to its decoded form on load.
Why?
For users without the `unfiltered_html` capability (e.g. Contributors), WordPress runs `wp_kses` over each parsed block attribute on save (`filter_block_kses_value` in `wp-includes/blocks.php`). That call normalizes entities, turning `&` into `&amp;`. LaTeX uses `&` as a column separator in environments like `pmatrix`, so a perfectly valid input like `\begin{pmatrix} a & b \\ c & d \end{pmatrix}` is reloaded as `\begin{pmatrix} a &amp; b \\ c &amp; d \end{pmatrix}`. The block's mount-time `useEffect` was feeding that corrupted source straight to temml, which treated each `&` as a separator and left `amp;` as literal text in the cells. The recomputed (broken) `mathML` was then written into editor state and persisted on the next save (autosave, "Preview in new tab", or manual save), so the bug propagated to the front-end as well as the editor preview.
How?
In `packages/block-library/src/math/edit.js`, the `useEffect` that re-derives `mathML` on mount now:
- Runs `decodeEntities()` (from `@wordpress/html-entities`) on the latex before passing it to temml.
- Writes the decoded value back to the `latex` attribute when it differs, so the in-editor representation matches what the user originally typed.

The `__unstableMarkNextChangeAsNotPersistent()` call is preserved, so this normalization doesn't dirty the post or land in the undo stack.
Because `@wordpress/latex-to-mathml` is loaded via a dynamic `import()`, there is a window between mount and resolution during which the user can already be typing in the textarea. The effect reads the latex through a `latestLatex` ref that is refreshed on every render, so the post-resolution callback works with the current value rather than the one captured at mount; a user edit made during that window is not clobbered by the normalization.
The fix is one-sided: the on-disk format is unchanged (kses re-encodes on the next save), so existing posts continue to round-trip cleanly. Posts that were saved with a corrupted `mathML` body during the bug self-heal on the next editor load: the effect overwrites the stale `mathML` with a freshly rendered one.
A small Jest suite under `packages/block-library/src/math/test/edit.js` covers:
- `latex` decoded before being passed to the renderer,
- `latex` attribute normalized when entities are present (and `__unstableMarkNextChangeAsNotPersistent` called),
- `mathML` recomputed and replacing any previously stored value (self-heal),

Testing Instructions
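1. Log in as a user without `unfiltered_html`, e.g. a Contributor.
2. Add a Math block with the LaTeX `\begin{pmatrix} a & b \\ c & d \end{pmatrix}`.
3. Save the post, reload the editor, and confirm the matrix renders as `(a b / c d)`, with no `amp;` text in any cell.
4. Repeat as a user with more privileges (one who has `unfiltered_html`) and confirm there is no behavior change.

Testing Instructions for Keyboard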
This change has no UI surface; existing block keyboard interaction is unchanged.
Screenshots or screencast
[Screenshots not preserved. Captions: `amp;` literals after reload as Contributor; `a`, `b`, `c`, `d`.]
Use of AI Tools
This PR was authored with the help of Claude Code (Anthropic). The model investigated the bug, identified `filter_block_kses_value` as the source of the entity encoding, drafted the fix and the unit tests, and wrote the commit messages and this PR description. All output was reviewed and verified end-to-end (manual editor + front-end test as a Contributor, full unit + fixture test runs) before commit.