fix(markdown): decode HTML entities in code blocks and inline code by esengine · Pull Request #658 · esengine/DeepSeek-Reasonix

esengine · 2026-05-11T07:35:43Z

Summary

The reporter showed a JSON snippet rendered as `{ &quot;apiKey&quot;: &quot;...&quot; }` inside a code fence: the model emitted literal HTML entities instead of `"`. marked passes the entities through verbatim (its tokens carry raw text, not HTML-escaped output), and our renderer hands that to the terminal unchanged. Terminals don't render entities, so `&quot;` / `&amp;` / `&lt;` leak as visible artifacts.

This is a known LLM artifact: models sometimes HTML-escape inside code blocks — especially on JSON / HTML / XML output — because their training data had plenty of HTML-encoded code in web posts and docs. Claude Code and Cursor both decode entities at the rendering boundary; doing the same here.

Scope

New src/cli/ui/html-entities.ts with decodeHtmlEntities(): handles the five common named entities (quot / apos / amp / lt / gt) and nbsp, plus numeric forms in decimal and hex (`&#34;` / `&#x22;`). Unknown named entities pass through so text that quotes entity names by name doesn't get corrupted. Fast-path early return when no & is present, so text without entities pays nothing.
Decode applied at four sites: CodeBlock text and codespan text in both markdown.tsx and markdown-lines.ts.
Prose paragraph text is left alone — limiting scope to code keeps the edge case of "user genuinely wrote about &" working in prose.
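
For readers who want a concrete picture of the behavior listed above, here is a minimal sketch of a decoder with the same shape. It is not the code merged in this PR: the regex, the `NAMED_ENTITIES` map, the code point range check, and applying case-insensitivity to entity names are all assumptions made for illustration, and the real src/cli/ui/html-entities.ts may differ.

```typescript
// Hypothetical sketch; names and details here are assumptions, not the merged code.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const NAMED_ENTITIES = new Map<string, string>([
  ["quot", '"'],
  ["apos", "'"],
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["nbsp", "\u00A0"],
]);

// Matches &#34; (decimal), &#x22; (hex) and &name; forms.
// Fragments without a terminating semicolon never match, so they are left alone.
const ENTITY_RE = /&(?:#(\d+)|#x([0-9a-f]+)|([a-z][a-z0-9]*));/gi;

export function decodeHtmlEntities(text: string): string {
  // Fast path: without an ampersand there is nothing to decode.
  if (!text.includes("&")) return text;

  return text.replace(
    ENTITY_RE,
    (match: string, dec?: string, hex?: string, name?: string): string => {
      if (name !== undefined) {
        // Unknown names pass through unchanged.
        return NAMED_ENTITIES.get(name.toLowerCase()) ?? match;
      }
      const codePoint =
        dec !== undefined ? Number.parseInt(dec, 10) : Number.parseInt(hex ?? "", 16);
      // String.fromCodePoint throws on values above U+10FFFF, so keep those as-is.
      if (!Number.isFinite(codePoint) || codePoint > 0x10ffff) return match;
      return String.fromCodePoint(codePoint);
    },
  );
}
```

Because `String.prototype.replace` does not rescan its own output, this sketch decodes exactly one level: `&amp;quot;` becomes `&quot;`, not `"`. Whether the merged implementation behaves the same way on double-escaped input is not stated in this PR.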

Closes #657

Test plan

npm run verify — 2601 passed (added 9), 2 skipped
tests/html-entities.test.ts covers: no-& fast path, five named entities, `&nbsp;` → NBSP, decimal + hex numeric (incl. emoji codepoint), unknown-name pass-through, case-insensitivity, malformed & fragments left alone, and the literal real-world JSON pattern from the issue
Manual: ask the model to emit a JSON code block and verify quotes render as `"` not `&quot;`

The reporter showed a JSON snippet rendered as `{ &quot;apiKey&quot;: &quot;...&quot; }` inside a code fence: the model emitted literal HTML entities instead of `"`. marked passes the entities through verbatim (its tokens carry the raw text, not HTML-escaped output), and our renderer rightly hands that to the terminal unchanged. Terminals don't render entities, so they leak as visible `&quot;` / `&amp;` / `&lt;` etc.

This is a known LLM artifact: models sometimes HTML-escape inside code blocks, especially on JSON / HTML / XML output, because their training saw a lot of HTML-encoded code in web posts and docs. Both Claude Code and Cursor decode entities at the rendering boundary; doing the same here.

Scope: only code blocks and inline code spans (the contexts where models leak entities the most). Prose paragraphs are left alone: if someone genuinely writes "use the `&amp;` entity to escape ampersand" in non-code text, the entity name stays visible. Numeric forms (`&#34;` / `&#x22;`) and the common named forms (quot / apos / amp / lt / gt, plus nbsp) decode; unknown names pass through so we don't corrupt text that quotes entity names.

Closes #657


esengine merged commit cf0e920 into main May 11, 2026
3 checks passed

esengine deleted the fix/issue-657-decode-html-entities branch May 11, 2026 07:38

esengine mentioned this pull request May 11, 2026

chore(release): 0.39.0 #659

Merged
