Skip to content

fix(desktop): integrate KaTeX normalisation into math pre-pass and tighten classifier / 整合 KaTeX 标准化到数学预处理并收紧分类器#3376

Merged
esengine merged 9 commits into
main-v2from
merge-pr-3287-markdown-fix
Jun 7, 2026
Merged

fix(desktop): integrate KaTeX normalisation into math pre-pass and tighten classifier / 整合 KaTeX 标准化到数学预处理并收紧分类器#3376
esengine merged 9 commits into
main-v2from
merge-pr-3287-markdown-fix

Conversation

@SivanCola

Copy link
Copy Markdown
Collaborator

Summary

Builds on the remark-math refactor (3257775) by integrating latexNormalizeForKatex into the normalizeMath pre-pass so that KaTeX-specific normalisations (text-mode escapes, |\vert) apply to all recognised math sources automatically.

Changes

  • Extract normalizeMath to mathNormalize.ts and call latexNormalizeForKatex on display math (Step 3), text-mode pairs (Step 4), and inline math (Step 5). Previously latexNormalizeForKatex was exported but never called from the production render path.

  • Add TEXT_MODE_PAIR step (Step 4) that matches $\cmd{...}extra$ as a whole — e.g. $\text{cost is $5} + x^2$ — so that a stray $ inside \text{} is escaped to \textdollar{} rather than splitting the inline-math boundary. The [^$]*? tail after the closing brace also captures trailing content so $\text{a} + x^2$ is handled correctly.

  • Tighten single-letter classifier in mathClassify.ts from /^[A-Za-z]$/ to /^[a-z]$/ to avoid false positives on $I$, $A$, $V$ (Roman numerals, acronyms).

  • Remove dead \| protection from latexNormalize.ts — the source[i-1] === "\\" check could never trigger for the intended \| case (already consumed by readCommand in the \ branch), but incorrectly prevented |\vert conversion after \\ (LaTeX line break).

Test coverage

  • 79 golden tests pass (up from 75).
  • New: \\| line-break + pipe conversion, TEXT_MODE_PAIR with trailing content, uppercase single-letter classifier rejection.

Verification

cd desktop/frontend && npx tsx src/__tests__/math-golden.test.ts
# 79 passed, 0 failed

SivanCola added 6 commits June 5, 2026 19:22
…parser

Replace the hand-written remarkLatexDelimiters AST walker with the mature
remark-math + rehype-katex pipeline, keeping only thin pre-processing:

- normalizeMath: convert \(…\)/\[…\] → $…$/$$…$$ (LLM-native delimiters)
- isLikelyInlineMath classifier: skip non-math $…$ pairs (currency, env vars)
- latexNormalizeForKatex: KaTeX-specific escaping (| → \vert, \text{} chars)
- $$…$$ processed before $…$ to avoid cross-matching

Removes ~300 lines of custom parser code (remarkLatexDelimiters, splitText,
CodeOrMath, CodeBlockOrMath, BlockMath, InlineMath) in favor of remark-math.
… tighten classifier

- Extract normalizeMath from Markdown.tsx into mathNormalize.ts and
  run latexNormalizeForKatex on all recognised math sources so KaTeX
  text-mode escapes (\textdollar{}, \#, etc.) and |→\vert conversion
  apply automatically.
- Add TEXT_MODE_PAIR step that matches $\cmd{...}extra$ as a whole
  (e.g. $\text{cost is $5} + x^2$) so inner $ is escaped to
  \textdollar{} instead of splitting the inline-math boundary.
- Tighten isLikelyInlineMath single-letter check from /[A-Za-z]/ to
  /[a-z]/ to avoid false positives on $I$, $A$, $V$ (Roman
  numerals / acronyms).
- Update golden tests: add \| line-break + pipe coverage, TEXT_MODE_PAIR
  trailing-content cases, and uppercase single-letter classifier checks.
@SivanCola SivanCola requested a review from esengine as a code owner June 6, 2026 16:55
@github-actions github-actions Bot added desktop Wails desktop app (desktop/**) v2 Go rewrite (1.x) — main-v2 branch, active development labels Jun 6, 2026
@SivanCola

Copy link
Copy Markdown
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82e3db30c9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread desktop/frontend/pnpm-lock.yaml
@esengine

esengine commented Jun 7, 2026

Copy link
Copy Markdown
Owner

Thanks for this — the pipeline logic is sound and the \| dead-code removal is a real fix, but a careful review turned up a user-visible regression that blocks merge:

Must-fix

  1. Fullwidth (U+FF04) leaks into rendered prose. Classifier-rejected $…$ pairs are wrapped in fullwidth (mathNormalize.ts) so remark-math skips them, but nothing converts them back afterward. So costs $5$ today renders as costs $5$ today, and $I$ think$I$ think — literally different wide glyphs shown to the user. Any balanced single-$ pair the classifier rejects becomes mojibake-looking dollar signs. Please revert these to normal $ after remark-math runs (or otherwise ensure the user never sees U+FF04). A multi-$ prose test like it costs $5 and $10 total would pin this.

  2. ~66 lines of dead CSS. .inline-math, .inline-math__source, .block-math, .block-math__copy, .block-math-fallback in styles.css have no .tsx references in this PR (orphaned from a larger feature). Please drop them here.

Heads-up: the green desktop CI job stubs the embedded frontend and does not run the tsx golden suite, so the passing checks don't validate this PR's logic — worth keeping in mind.

Nice-to-have: the DM/IM/LB sentinels in mathNormalize.ts use NUL bytes, which makes git treat the file as binary (undiffable in review). A non-NUL marker would keep it diffable.

Happy to re-review once (1) and (2) are addressed.

@SivanCola

Copy link
Copy Markdown
Collaborator Author

Addressed the must-fix items in cd128a5.

Changes:

  • Non-math dollar pairs now use Markdown dollar entities ($) instead of fullwidth U+FF04, so remark-math skips them while rendered prose still shows normal $.
  • Added golden coverage for $I$ think and it costs $5 and $10 total.
  • Removed the orphaned .inline-math*, .block-math*, and .block-math-fallback CSS.
  • Replaced the NUL-byte sentinels in mathNormalize.ts with ASCII markers so the file is no longer binary.

Verified:

  • corepack pnpm@10.15.0 test:all (81 passed)
  • ReactMarkdown render smoke: currency/prose dollars render as normal $ with no KaTeX; real $x^2$ still renders through KaTeX.

@SivanCola

Copy link
Copy Markdown
Collaborator Author

Added follow-up fix in 255094a to keep the math pre-pass out of Markdown code regions.

Changes:

  • Protect fenced code blocks and inline code spans before normalizing math, then restore them as literal text.
  • Added golden coverage for $PATH$ in inline code, multiple code spans, and fenced shell blocks.

Verified:

  • corepack pnpm@10.15.0 test:all (85 passed)
  • git diff --check

Note: plain pnpm typecheck / pnpm build in this local worktree still require generated Wails bindings, so the focused math suite is the relevant verification for this fix.

@esengine esengine merged commit 105ec7e into main-v2 Jun 7, 2026
16 of 17 checks passed
@esengine esengine deleted the merge-pr-3287-markdown-fix branch June 7, 2026 06:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

desktop Wails desktop app (desktop/**) v2 Go rewrite (1.x) — main-v2 branch, active development

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants