fix(desktop): integrate KaTeX normalisation into math pre-pass and tighten classifier #3376
…parser
Replace the hand-written remarkLatexDelimiters AST walker with the mature
remark-math + rehype-katex pipeline, keeping only thin pre-processing:
- normalizeMath: convert \(…\)/\[…\] → $…$/$$…$$ (LLM-native delimiters)
- isLikelyInlineMath classifier: skip non-math $…$ pairs (currency, env vars)
- latexNormalizeForKatex: KaTeX-specific escaping (| → \vert, \text{} chars)
- $$…$$ processed before $…$ to avoid cross-matching
Removes ~300 lines of custom parser code (remarkLatexDelimiters, splitText,
CodeOrMath, CodeBlockOrMath, BlockMath, InlineMath) in favor of remark-math.
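The delimiter conversion and its ordering can be illustrated with a minimal sketch. `convertDelimiters` is a hypothetical name, not the PR's `normalizeMath`, and the regexes are assumptions; the real pre-pass also runs the classifier and KaTeX escaping, which are omitted here.

```typescript
// Hypothetical sketch of the delimiter rewrite only (not the PR's normalizeMath).
// Display math is rewritten before inline math so the two patterns cannot cross-match.
function convertDelimiters(source: string): string {
  // \[ ... \]  ->  $$ ... $$   ([\s\S] lets the body span several lines)
  const display = source.replace(
    /\\\[([\s\S]+?)\\\]/g,
    (_match: string, body: string) => "$$" + body + "$$",
  );
  // \( ... \)  ->  $ ... $
  return display.replace(
    /\\\(([\s\S]+?)\\\)/g,
    (_match: string, body: string) => "$" + body + "$",
  );
}
```

Replacer functions are used instead of replacement strings because `$` has special meaning in `String.prototype.replace` replacement patterns. A known limitation of this sketch: a line break with optional spacing such as `\\[2pt]` would be misread as an opening `\[`.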
… tighten classifier
- Extract normalizeMath from Markdown.tsx into mathNormalize.ts and
run latexNormalizeForKatex on all recognised math sources so KaTeX
text-mode escapes (\textdollar{}, \#, etc.) and |→\vert conversion
apply automatically.
- Add TEXT_MODE_PAIR step that matches $\cmd{...}extra$ as a whole
(e.g. $\text{cost is $5} + x^2$) so inner $ is escaped to
\textdollar{} instead of splitting the inline-math boundary.
- Tighten isLikelyInlineMath single-letter check from /[A-Za-z]/ to
/[a-z]/ to avoid false positives on $I$, $A$, $V$ (Roman
numerals / acronyms).
- Update golden tests: add \\| line-break + pipe coverage, TEXT_MODE_PAIR
trailing-content cases, and uppercase single-letter classifier checks.
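A rough sketch of the two behaviours in this commit, under stated assumptions: the regex below is a guess at the shape of TEXT_MODE_PAIR (restricted here to `\text`-style commands with one un-nested brace group), and `escapeTextModeDollars` and `isSingleLetterMath` are hypothetical helper names rather than functions from the PR.

```typescript
// Assumed shape of TEXT_MODE_PAIR: "$", a \text-style command, one brace group
// (which may contain a stray "$"), a lazy tail without "$", then the closing "$".
const TEXT_MODE_PAIR = /\$(\\text[a-z]*)\{([^{}]*)\}([^$]*?)\$/g;

// Hypothetical helper: escape "$" inside the brace group so it cannot end the math span.
function escapeTextModeDollars(source: string): string {
  return source.replace(
    TEXT_MODE_PAIR,
    (_match: string, cmd: string, body: string, tail: string) => {
      // Escape a bare "$"; leave an already escaped "\$" untouched.
      const safeBody = body.replace(/\\?\$/g, (hit: string) =>
        hit === "$" ? "\\textdollar{}" : hit,
      );
      return "$" + cmd + "{" + safeBody + "}" + tail + "$";
    },
  );
}

// Hypothetical helper mirroring the tightened check: only a lowercase
// single letter counts as likely math, so $I$, $A$, $V$ are rejected.
function isSingleLetterMath(content: string): boolean {
  return /^[a-z]$/.test(content.trim());
}
```

With this sketch, `$\text{cost is $5} + x^2$` keeps its outer delimiters and the inner `$` becomes `\textdollar{}`, while `$\text{a} + x^2$` passes through unchanged.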
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 82e3db30c9
|
Thanks for this — the pipeline logic is sound and the Must-fix
Heads-up: the green Nice-to-have: the Happy to re-review once (1) and (2) are addressed. |
|
Addressed the must-fix items in cd128a5. Changes:
Verified:
|
|
Added follow-up fix in 255094a to keep the math pre-pass out of Markdown code regions. Changes:
Verified:
Note: plain |
Summary
Builds on the remark-math refactor (3257775) by integrating latexNormalizeForKatex into the normalizeMath pre-pass so that KaTeX-specific normalisations (text-mode escapes, | → \vert) apply to all recognised math sources automatically.
Changes
- Extract normalizeMath to mathNormalize.ts and call latexNormalizeForKatex on display math (Step 3), text-mode pairs (Step 4), and inline math (Step 5). Previously latexNormalizeForKatex was exported but never called from the production render path.
- Add TEXT_MODE_PAIR step (Step 4) that matches $\cmd{...}extra$ as a whole (e.g. $\text{cost is $5} + x^2$) so that a stray $ inside \text{} is escaped to \textdollar{} rather than splitting the inline-math boundary. The [^$]*? tail after the closing brace also captures trailing content so $\text{a} + x^2$ is handled correctly.
- Tighten single-letter classifier in mathClassify.ts from /^[A-Za-z]$/ to /^[a-z]$/ to avoid false positives on $I$, $A$, $V$ (Roman numerals, acronyms).
- Remove dead \| protection from latexNormalize.ts: the source[i-1] === "\\" check could never trigger for the intended \| case (already consumed by readCommand in the \ branch), but incorrectly prevented | → \vert conversion after \\ (LaTeX line break).
Test coverage
\\| line-break + pipe conversion, TEXT_MODE_PAIR with trailing content, uppercase single-letter classifier rejection.
Verification
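For readers tracing the pipe fix listed under Changes, the following is a minimal sketch of a scanner in which the backslash branch consumes whole commands, so no look-behind check on the previous character is needed. `pipesToVert` is a hypothetical name, the trailing space after `\vert` is an assumption, and this is not the code in latexNormalize.ts.

```typescript
// Hypothetical scanner: rewrite bare "|" to "\vert " while leaving commands intact.
function pipesToVert(source: string): string {
  let out = "";
  let i = 0;
  while (i < source.length) {
    const ch = source.charAt(i);
    if (ch === "\\") {
      // Consume the whole command: a run of letters (\alpha) or a single
      // symbol (\|, \\, \{). An escaped pipe never reaches the "|" branch.
      let j = i + 1;
      if (/[A-Za-z]/.test(source.charAt(j))) {
        while (j < source.length && /[A-Za-z]/.test(source.charAt(j))) j++;
      } else if (j < source.length) {
        j++;
      }
      out += source.slice(i, j);
      i = j;
    } else if (ch === "|") {
      // Bare pipe, including one that directly follows a "\\" line break.
      out += "\\vert ";
      i++;
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}
```

Because `\\` is consumed as one two-character command, the pipe in `x\\|y` is seen as bare and converted, while the pipe in `\|` is swallowed together with its backslash and left as-is.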