fix(cli): render LaTeX-style output as Unicode in the TUI #25802
scidomino merged 3 commits into google-gemini:main
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a renderer-side post-processor to improve the readability of model responses in the TUI. By converting LaTeX-style math and formatting tokens into standard Unicode, it prevents broken-looking raw backslash sequences from appearing in the terminal. The implementation is designed to be deterministic and safe, ensuring that technical content like file paths and code snippets remains preserved while math-heavy output is rendered cleanly.
|
Code Review
This pull request introduces a utility to convert LaTeX-style syntax into terminal-friendly Unicode, addressing issues where raw LaTeX commands appeared broken in the CLI output. The implementation includes support for Greek letters, mathematical symbols, sub/superscripts, and basic formatting, with integration into the markdown parser that protects code spans and URLs. Feedback was provided regarding the convertLineBreaks function, which could inadvertently mangle Windows network paths or escaped backslashes in regular prose; it is recommended to restrict this specific conversion to math-mode environments.
out = convertTextFormatting(out);
out = convertFractionsAndRoots(out);
out = convertEscapedSpecials(out);
out = convertLineBreaks(out);
The convertLineBreaks function aggressively converts \\ to a newline character. While this is standard in LaTeX math environments, it is highly problematic in arbitrary prose for a developer-focused CLI. Specifically, it will mangle Windows network paths (e.g., \\server\share) and escaped backslashes in code-like text (e.g., C:\\Path) by turning them into newlines.
Since applyProseConversions is intended for text outside of explicit math delimiters, it is safer to remove this conversion here. Legitimate LaTeX line breaks should be wrapped in math delimiters (handled by applyMathModeConversions) or use standard Markdown newlines.
out = convertLineBreaks(out);
out = convertNamedCommands(out);
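The hazard described in this comment can be reproduced with a minimal sketch. Both functions below are hypothetical stand-ins written for illustration; the PR's actual `convertLineBreaks` and `applyMathModeConversions` are not shown in this thread, and the delimiter regex is an assumption.

```typescript
// Hypothetical stand-ins for the PR's helpers, for illustration only.

// Naive version: rewrites every `\\` to a newline, wherever it appears.
function naiveConvertLineBreaks(text: string): string {
  return text.replace(/\\\\/g, '\n');
}

// Restricted version: only rewrites `\\` inside $...$ or $$...$$ spans.
function convertLineBreaksInMathOnly(text: string): string {
  return text.replace(
    /(\$\$?)([^$]+)\1/g,
    (_match, delim: string, body: string) =>
      delim + body.replace(/\\\\/g, '\n') + delim,
  );
}

// The source text here is: Copy it to \\server\share first.
const uncPath = 'Copy it to \\\\server\\share first.';
// The source text here is: $a \\ b$
const mathSpan = '$a \\\\ b$';

// The UNC prefix is turned into a newline by the naive version.
console.log(JSON.stringify(naiveConvertLineBreaks(uncPath)));
// The restricted version leaves prose alone and still breaks lines in math.
console.log(JSON.stringify(convertLineBreaksInMathOnly(uncPath)));
console.log(JSON.stringify(convertLineBreaksInMathOnly(mathSpan)));
```

Scoping the rewrite to delimited spans trades a little coverage (undelimited `\\` is no longer converted) for never touching paths in prose.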
|
Please sign the CLA and then I can review. Also take a look at the failing tests. |
jacob314 left a comment
Summary
Great work! The PR implements a robust, well-documented, and conservative LaTeX to Unicode converter for the CLI frontend. The approach of intercepting and replacing these tokens before they hit the markdown tokenizer is smart, and the regexes correctly avoid over-matching things like Windows file paths and currency.
Findings
- Correctness: Excellent. The approach to masking inline code spans and URLs, combined with the conservative heuristic for when to strip `$`, prevents regressions on non-math text. The test coverage is comprehensive.
- Maintainability: The logic is cleanly separated into its own file (`latexToUnicode.ts`) with distinct, well-documented transformation steps. Freezing the mapping dictionaries is a nice touch.
- Efficiency: The short-circuit `if (input.indexOf('\\') === -1 && input.indexOf('$') === -1)` correctly keeps the hot path fast for plain text.
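For readers who have not opened the diff, the maintainability and efficiency points can be pictured with a small sketch. The table contents, the name `NAMED_COMMANDS`, and the function `convertNamedCommandsSketch` are assumptions made for illustration; only the short-circuit condition is quoted from the review.

```typescript
// Illustrative subset; the PR's real table is far larger and may be shaped differently.
const NAMED_COMMANDS: Readonly<Record<string, string>> = Object.freeze({
  to: '→',
  dots: '…',
  leq: '≤',
  alpha: 'α',
  infty: '∞',
});

function convertNamedCommandsSketch(input: string): string {
  // Fast path: text without `\` or `$` is returned untouched.
  if (input.indexOf('\\') === -1 && input.indexOf('$') === -1) {
    return input;
  }
  // Match `\name` as a whole word, so `\to` never fires inside `\tools`.
  return input.replace(/\\([A-Za-z]+)/g, (match, name: string) => {
    const known = Object.prototype.hasOwnProperty.call(NAMED_COMMANDS, name);
    const mapped = known ? NAMED_COMMANDS[name] : undefined;
    // Unknown commands (for example path segments) pass through as-is.
    return mapped ?? match;
  });
}

console.log(convertNamedCommandsSketch('A \\to B')); // A → B
console.log(convertNamedCommandsSketch('C:\\Users\\foo')); // C:\Users\foo
```

Freezing the table guards against accidental mutation at runtime, and the `hasOwnProperty` check keeps inherited keys such as `constructor` from being treated as commands.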
Nitpicks
packages/cli/src/ui/utils/markdownParsingUtils.ts
In convertLatexPreservingSpans, the MASK_PATTERN replaces matched placeholders with their original strings. In the incredibly unlikely event that a user inputs the exact private use area sentinel \uE0000\uE000 and preserved is empty, preserved[0] will be undefined, and replace will output the literal string "undefined".
You can make this completely bulletproof with a fallback:
return converted.replace(
MASK_PATTERN,
(match, i: string) => preserved[Number(i)] ?? match,
);
Conclusion
Approved. The implementation is solid, the logic is sound, and all tests pass beautifully. 🚀
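The mask-and-restore flow behind this nitpick can be sketched as follows. This is not the PR's `convertLatexPreservingSpans`; the sentinel constant, the span regex, and `toyConvert` are assumptions chosen to keep the example self-contained. Only the restore callback mirrors the suggestion above.

```typescript
// Sketch only: these constants and the span regex are assumptions, not the PR's code.
const SENTINEL = '\uE000'; // a Private Use Area code point
const MASK_PATTERN = /\uE000(\d+)\uE000/g;
// Inline code spans or bare http(s) URLs.
const PRESERVE_PATTERN = /`[^`]*`|https?:\/\/\S+/g;

function convertPreservingSpans(
  input: string,
  convert: (text: string) => string,
): string {
  const preserved: string[] = [];
  // 1. Swap each protected span for an indexed placeholder.
  const masked = input.replace(PRESERVE_PATTERN, (span) => {
    preserved.push(span);
    return `${SENTINEL}${preserved.length - 1}${SENTINEL}`;
  });
  // 2. Run the conversion on the masked text only.
  const converted = convert(masked);
  // 3. Restore; fall back to the raw match when the index is unknown.
  return converted.replace(
    MASK_PATTERN,
    (match, i: string) => preserved[Number(i)] ?? match,
  );
}

// Toy converter standing in for the real LaTeX pass.
const toyConvert = (text: string): string => text.replace(/\\to/g, '→');

// Prose is converted, the backticked span is not.
console.log(convertPreservingSpans('A \\to B, but `A \\to B` stays', toyConvert));
// A stray sentinel with nothing preserved comes back unchanged.
console.log(convertPreservingSpans('stray \uE0000\uE000 sentinel', toyConvert));
```

Without the `?? match` fallback, the second call would interpolate `undefined` into the output, which is the leak the review describes.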
- Remove convertLineBreaks from applyProseConversions so `\\` outside math delimiters is no longer rewritten to a newline. This protects Windows UNC paths (`\\server\share`) and escaped backslashes in code-like prose. LaTeX line breaks inside `$...$` / `$$...$$` are still handled by applyMathModeConversions.
- Add `?? match` fallback to the MASK_PATTERN restore step so a stray PUA sentinel in user input can never cause "undefined" to leak into rendered output.
- Add tests for Windows UNC paths and math-mode-only line breaks.

Addresses review comments from @gemini-code-assist and @jacob314 on PR google-gemini#25802.
|
Thanks for the reviews @jacob314 and @gemini-code-assist! Pushed 2649238 addressing both pieces of feedback:
1. Good catch on the UNC-path hazard.
2. Applied verbatim. If the PUA sentinel ever appears in user input with a stale index, the […]
All 76 tests pass locally ([…]
The failing […] |
|
Thanks for the update, @dimssu! I appreciate you addressing the feedback regarding |
Model responses for math and CS content frequently include LaTeX tokens
like `$\{P_0, \dots, P_n\}$` or `$\to$`. Terminals cannot render LaTeX,
so these appeared as raw backslash sequences that looked broken.
Add a conservative post-processor (`convertLatexToUnicode`) and wire it
into `parseMarkdownToANSI` — the single choke point for all inline text
in the TUI (paragraphs, headers, list items, table cells). Inline code
spans and bare URLs are masked with a private-use-area sentinel during
the conversion pass so their contents remain verbatim.
What gets converted:
- `$...$` / `$$...$$` math delimiters: stripped only when the content
contains a LaTeX marker (`\\`, `_`, `^`) or is a single variable letter.
Currency like `$5.99` and shell-style `$USER $HOME` stay intact.
- Named commands -> Unicode: `\to`, `\rightarrow`, `\dots`, `\times`,
`\leq`, `\geq`, `\neq`, `\approx`, `\infty`, `\in`, `\forall`, `\exists`,
`\sum`, `\prod`, `\int`, `\partial`, `\nabla`, full Greek alphabet,
common operator names (`\log`, `\sin`, ...), arrows, set theory, logic.
- Escaped specials: `\{`, `\}`, `\_`, `\%`, `\&`, `\#`, `\$`, `\|`, `\ `.
- Text wrappers: `\textbf{x}` -> `**x**`, `\textit{x}` -> `*x*`,
`\text{x}`/`\mathrm{x}` -> `x`.
- `\frac{a}{b}` -> `(a)/(b)`, `\sqrt{x}` -> `√(x)`.
- Sub/superscripts (inside math only) where every char maps to Unicode:
`x_0` -> `x₀`, `E = mc^2` -> `E = mc²`, `x_{12}` -> `x₁₂`.
Unknown `\foo` sequences pass through untouched, so Windows paths, regex
escapes, and unrelated backslash content are preserved.
Fixes google-gemini#25656
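The delimiter rule in the first bullet of this commit message is the subtlest one, so here is a rough sketch of how such a heuristic might look. It is reconstructed from the description only: `looksLikeMath` and `stripMathDelimiters` are hypothetical names, and the real implementation likely handles more cases (for instance, a shell variable containing an underscore would fool this simplified version).

```typescript
// Assumed heuristic, reconstructed from the commit message; not the PR's code.
function looksLikeMath(body: string): boolean {
  const trimmed = body.trim();
  // A LaTeX marker (backslash, underscore, caret) or a single variable letter.
  return /[\\_^]/.test(trimmed) || /^[A-Za-z]$/.test(trimmed);
}

function stripMathDelimiters(text: string): string {
  return text.replace(
    /(\$\$?)([^$\n]+?)\1/g,
    (match, _delim: string, body: string) =>
      looksLikeMath(body) ? body : match,
  );
}

console.log(stripMathDelimiters('Let $x$ map via $\\to$')); // Let x map via \to
console.log(stripMathDelimiters('From $5 to $10')); // From $5 to $10
console.log(stripMathDelimiters('echo $USER $HOME')); // echo $USER $HOME
```

Only the delimiters are removed at this stage; turning `\to` into an arrow is left to the later named-command pass.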
2649238 to b922600
|
Quick courtesy ping in case this slipped off the radar. All checks are green, CLA's signed, and the review feedback's been addressed. Happy to address anything else if more eyes land on it. |
…ini#25802) Co-authored-by: cynthialong0-0 <82900738+cynthialong0-0@users.noreply.github.com>
Summary
Model responses for math, CS, and algorithms content frequently include LaTeX tokens like `$\{P_0, \dots, P_n\}$` or `$\to$`. Terminals cannot natively render LaTeX, so these appeared as raw backslash sequences that made the output look broken (see screenshot in the linked issue).

This PR adds a conservative renderer-side post-processor that converts the common LaTeX idioms to terminal-friendly Unicode/plain text. Unknown `\foo` sequences pass through untouched so Windows paths, regex escapes, and unrelated backslash content are preserved.

Details

Where the fix lives. All inline text in the TUI flows through `parseMarkdownToANSI` in `packages/cli/src/ui/utils/markdownParsingUtils.ts` (called from `InlineMarkdownRenderer` and `TableRenderer`). Code fences go through `colorizeCode` separately. Applying the conversion at the top of `parseMarkdownToANSI` covers paragraphs, headers, list items, and table cells. Inline code spans (`...`) and bare URLs are masked with a Private-Use-Area sentinel during the conversion pass so their contents remain verbatim.

What gets converted (new `latexToUnicode.ts`):
- `$...$` / `$$...$$` math delimiters: stripped only when the content contains a LaTeX marker (`\`, `_`, `^`) or is a single variable letter. Currency like `$5.99`, two-dollar-amount prose like `From $5 to $10`, and shell-style `$USER $HOME` stay intact.
- Named commands → Unicode: `\to`, `\rightarrow`, `\Rightarrow`, `\leftarrow`, `\mapsto`, `\dots`/`\ldots`/`\cdots`, `\times`, `\cdot`, `\pm`, `\leq`/`\le`, `\geq`/`\ge`, `\neq`/`\ne`, `\approx`, `\equiv`, `\sim`, `\cong`, `\propto`, `\infty`, `\in`, `\notin`, `\subset`, `\supset`, `\subseteq`, `\supseteq`, `\cup`, `\cap`, `\setminus`, `\emptyset`, `\forall`, `\exists`, `\neg`, `\land`/`\wedge`, `\lor`/`\vee`, `\oplus`, `\otimes`, `\implies`, `\iff`, `\sum`, `\prod`, `\int`, `\iint`, `\oint`, `\partial`, `\nabla`, `\aleph`, `\ell`, `\hbar`, bracket commands, operator names (`\log`, `\ln`, `\exp`, `\sin`, `\cos`, `\tan`, trig/hyperbolic family, `\max`, `\min`, `\sup`, `\inf`, `\lim`, `\gcd`, `\det`, `\dim`, `\arg`, `\mod`).
- Full Greek alphabet (`\alpha` … `\Omega`) plus `\var*` variants.
- Escaped specials: `\{`, `\}`, `\_`, `\%`, `\&`, `\#`, `\$`, `\|`, `\ `.
- Text wrappers: `\textbf{x}` → `**x**`, `\textit{x}`/`\emph{x}` → `*x*`, `\text{x}`/`\mathrm{x}`/`\mathbb{x}` → `x` (fed back through the existing inline parser so bold/italic actually render in the TUI).
- `\frac{a}{b}` → `(a)/(b)`, `\sqrt{x}` → `√(x)`, `\sqrt[3]{x}` → `3√(x)`.
- `\\` line break → `\n`.
- Sub/superscripts (inside math only): `x_0` → `x₀`, `E = mc^2` → `E = mc²`, `x_{12}` → `x₁₂`, with common letter sub/sup (`x_n` → `xₙ`, `x^i` → `xⁱ`). Outside math, `_` and `^` are left alone so `file_name` / `foo^bar` survive. If any character in an operand has no Unicode mapping, the whole operand is left as-is to avoid half-converted output.

Before / after (the exact examples from #25656):
becomes
Why not a prompt-level fix? A renderer fix is deterministic, testable, and works against any model output that leaks LaTeX, not just current Gemini versions. It also follows the `help wanted` label and @Adib234's comment on the issue asking for "a pull request that improves the rendering or escaping logic for these characters in the core package."

Design notes, edge cases, and trade-offs were discussed up-front in my comment on the issue before coding.
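The all-or-nothing rule for sub/superscripts described under Details can be sketched roughly like this. The tables are a tiny illustrative subset and the names `SUBSCRIPTS`, `SUPERSCRIPTS`, `mapAll`, and `convertScripts` are assumptions; the code in `latexToUnicode.ts` may be organized differently.

```typescript
// Small illustrative tables; the PR's real mappings cover many more characters.
const SUBSCRIPTS: Readonly<Record<string, string>> = Object.freeze({
  '0': '₀', '1': '₁', '2': '₂', n: 'ₙ',
});
const SUPERSCRIPTS: Readonly<Record<string, string>> = Object.freeze({
  '2': '²', i: 'ⁱ', n: 'ⁿ',
});

// Maps every character of the operand, or returns undefined if any is unmapped.
function mapAll(
  operand: string,
  table: Readonly<Record<string, string>>,
): string | undefined {
  let out = '';
  for (let k = 0; k < operand.length; k++) {
    const mapped = table[operand.charAt(k)];
    if (mapped === undefined) return undefined; // one miss aborts the operand
    out += mapped;
  }
  return out;
}

// Intended to be called on math-mode content only.
function convertScripts(math: string): string {
  return math.replace(
    /([_^])(?:\{([^{}]+)\}|([A-Za-z0-9]))/g,
    (match, op: string, braced: string | undefined, single: string | undefined) => {
      const operand = braced ?? single ?? '';
      const table = op === '_' ? SUBSCRIPTS : SUPERSCRIPTS;
      return mapAll(operand, table) ?? match;
    },
  );
}

console.log(convertScripts('x_0')); // x₀
console.log(convertScripts('E = mc^2')); // E = mc²
console.log(convertScripts('x_{12}')); // x₁₂
console.log(convertScripts('x_{1q}')); // x_{1q} (q has no mapping here)
```

Returning the original match on any miss is what avoids half-converted output such as `x₁q`.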
Related Issues
Fixes #25656
How to Validate
1. `npm install && npm run build`
2. `npm run preflight`: all tests, lint, typecheck, and format pass locally.
3. `cd packages/cli && npx vitest run src/ui/utils/latexToUnicode.test.ts src/ui/utils/markdownParsingUtils.test.ts src/ui/utils/MarkdownDisplay.test.tsx src/ui/utils/TableRenderer.test.tsx`
4. `npm start`, then prompt Gemini with a request that triggers LaTeX output (e.g. "Explain the halting problem with math notation"). Verify:
   - […] `$\to$` or `\alpha` stay verbatim (backticked).
   - […] `C:\Users\foo`" is unchanged.

Pre-Merge Checklist
- […] `latexToUnicode.test.ts` with 51 assertions plus a new LaTeX sub-describe block in `markdownParsingUtils.test.ts`)