fix(cli): render LaTeX-style output as Unicode in the TUI #25802

Merged
scidomino merged 3 commits into google-gemini:main from dimssu:fix/issue-25656-latex-markdown-rendering on May 4, 2026

@dimssu dimssu commented Apr 22, 2026

Summary

Model responses for math, CS and algorithms content frequently include LaTeX tokens like $\{P_0, \dots, P_n\}$ or $\to$. Terminals cannot natively render LaTeX, so these appeared as raw backslash sequences that made the output look broken (see screenshot in the linked issue).

This PR adds a conservative renderer-side post-processor that converts the common LaTeX idioms to terminal-friendly Unicode/plain text. Unknown \foo sequences pass through untouched so Windows paths, regex escapes, and unrelated backslash content are preserved.
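The pass-through behaviour for unknown commands can be illustrated with a minimal sketch. The table below is a hypothetical, heavily abbreviated stand-in for the PR's real mapping, and `convertKnownCommands` is an illustrative name rather than a function from the PR.

```typescript
// Hypothetical, abbreviated command table; the real mapping is much larger.
const NAMED_COMMANDS: ReadonlyMap<string, string> = new Map([
  ['to', '→'],
  ['dots', '…'],
  ['leq', '≤'],
  ['alpha', 'α'],
  ['in', '∈'],
]);

// A backslash followed by a run of ASCII letters, e.g. `\to` or `\Users`.
// Greedy matching means `\infty` is looked up whole, never as `\in` + `fty`.
const COMMAND_PATTERN = /\\([A-Za-z]+)/g;

function convertKnownCommands(input: string): string {
  return input.replace(
    COMMAND_PATTERN,
    // Unknown names fall back to the original text, so they pass through untouched.
    (match: string, name: string) => NAMED_COMMANDS.get(name) ?? match,
  );
}
```

With this table, `a \to b` becomes `a → b`, while `C:\Users\foo` is returned unchanged because neither `Users` nor `foo` is a known command. A path segment that happens to equal a known command name would still be rewritten by this simplified version.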

Details

Where the fix lives. All inline text in the TUI flows through parseMarkdownToANSI in packages/cli/src/ui/utils/markdownParsingUtils.ts (called from InlineMarkdownRenderer and TableRenderer). Code fences go through colorizeCode separately. Applying the conversion at the top of parseMarkdownToANSI covers paragraphs, headers, list items, and table cells. Inline code spans (`...`) and bare URLs are masked with a Private-Use-Area sentinel during the conversion pass so their contents remain verbatim.
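The masking step described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the sentinel character, the protected-span regex, and the function name `convertPreservingSpans` are all hypothetical, and the conversion itself is passed in as a callback.

```typescript
// Assumed sentinel: a single Private-Use-Area code point.
const SENTINEL = '\uE000';
// Inline code spans (single backticks) or bare http(s) URLs.
const PROTECTED_PATTERN = /`[^`\n]+`|https?:\/\/[^\s)]+/g;
// Placeholder shape: sentinel, decimal index, sentinel.
const MASK_PATTERN = new RegExp(`${SENTINEL}(\\d+)${SENTINEL}`, 'g');

function convertPreservingSpans(
  input: string,
  convert: (text: string) => string,
): string {
  const preserved: string[] = [];
  // 1. Swap each protected span for an indexed placeholder.
  const masked = input.replace(PROTECTED_PATTERN, (span: string) => {
    preserved.push(span);
    return `${SENTINEL}${preserved.length - 1}${SENTINEL}`;
  });
  // 2. Run the conversion on the masked text, then restore the originals.
  //    A placeholder with no stored span is left as-is.
  return convert(masked).replace(
    MASK_PATTERN,
    (match: string, index: string) => preserved[Number(index)] ?? match,
  );
}
```

Because the converter only ever sees placeholders, it cannot rewrite anything inside a code span or URL, whatever rules it applies.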

What gets converted (new latexToUnicode.ts):

  • $...$ / $$...$$ math delimiters: stripped only when the content contains a LaTeX marker (\, _, ^) or is a single variable letter. Currency like $5.99, two-dollar-amount prose like From $5 to $10, and shell-style $USER $HOME stay intact.
  • Named commands → Unicode: \to, \rightarrow, \Rightarrow, \leftarrow, \mapsto, \dots/\ldots/\cdots, \times, \cdot, \pm, \leq/\le, \geq/\ge, \neq/\ne, \approx, \equiv, \sim, \cong, \propto, \infty, \in, \notin, \subset, \supset, \subseteq, \supseteq, \cup, \cap, \setminus, \emptyset, \forall, \exists, \neg, \land/\wedge, \lor/\vee, \oplus, \otimes, \implies, \iff, \sum, \prod, \int, \iint, \oint, \partial, \nabla, \aleph, \ell, \hbar, bracket commands, operator names (\log, \ln, \exp, \sin, \cos, \tan, trig/hyperbolic family, \max, \min, \sup, \inf, \lim, \gcd, \det, \dim, \arg, \mod).
  • Full Greek alphabet (\alpha through \Omega) plus \var* variants.
  • Escaped specials: \{, \}, \_, \%, \&, \#, \$, \|, \ .
  • Text-formatting wrappers: \textbf{x} → **x**, \textit{x}/\emph{x} → *x*, \text{x}/\mathrm{x}/\mathbb{x} → x (fed back through the existing inline parser so bold/italic actually render in the TUI).
  • \frac{a}{b} → (a)/(b), \sqrt{x} → √(x), \sqrt[3]{x} → 3√(x).
  • \\ line break → \n.
  • Subscripts/superscripts only inside math delimiters: x_0 → x₀, E = mc^2 → E = mc², x_{12} → x₁₂, with common letter sub/sup (x_n → xₙ, x^i → xⁱ). Outside math, _ and ^ are left alone so file_name / foo^bar survive. If any character in an operand has no Unicode mapping, the whole operand is left as-is to avoid half-converted output.
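The delimiter heuristic from the first bullet can be sketched as below. This is a simplified reading of the stated rule (strip only when the body contains `\`, `_`, `^`, or is a single letter), not the PR's implementation; the regexes and the name `stripMathDelimiters` are assumptions.

```typescript
// One-line `$...$` or `$$...$$` span; the backreference forces matching delimiters.
const MATH_SPAN = /(\${1,2})([^$\n]+?)\1/g;
const LATEX_MARKER = /[\\_^]/;
const SINGLE_LETTER = /^[A-Za-z]$/;

function stripMathDelimiters(input: string): string {
  return input.replace(
    MATH_SPAN,
    (match: string, _delim: string, body: string) => {
      const trimmed = body.trim();
      // Only treat the span as math when it looks like LaTeX.
      const looksLikeMath =
        LATEX_MARKER.test(trimmed) || SINGLE_LETTER.test(trimmed);
      return looksLikeMath ? trimmed : match;
    },
  );
}
```

In `From $5 to $10`, the candidate body `5 to` has no marker and is not a single letter, so the text is returned unchanged. A lone `$5.99` never forms a span at all. This sketch would still strip a pair of shell variables if the text between the dollar signs contained an underscore, so a real implementation presumably needs further guards.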

Before / after (the exact examples from #25656):

A set of processes $\{P_0, P_1, \dots, P_n\}$ exists...
If the graph contains no cycles $\to$ No Deadlock.

becomes

A set of processes {P₀, P₁, …, Pₙ} exists...
If the graph contains no cycles → No Deadlock.
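The subscripts in this example rely on the all-or-nothing rule for operands. A minimal sketch of that rule, assuming it is applied only to text already identified as a math body; the tables are hypothetical and abbreviated, and `convertScripts` is an illustrative name:

```typescript
// Hypothetical, abbreviated tables; the real ones cover more characters.
const SUBSCRIPTS: ReadonlyMap<string, string> = new Map([
  ['0', '₀'], ['1', '₁'], ['2', '₂'], ['n', 'ₙ'], ['i', 'ᵢ'],
]);
const SUPERSCRIPTS: ReadonlyMap<string, string> = new Map([
  ['0', '⁰'], ['1', '¹'], ['2', '²'], ['n', 'ⁿ'], ['i', 'ⁱ'],
]);

// Operand is either a braced group such as `{12}` or a single character.
const SCRIPT_PATTERN = /([_^])(?:\{([^{}]+)\}|([A-Za-z0-9]))/g;

// Returns undefined if any character lacks a mapping (all-or-nothing).
function mapAll(
  operand: string,
  table: ReadonlyMap<string, string>,
): string | undefined {
  let out = '';
  for (const ch of operand.split('')) {
    const mapped = table.get(ch);
    if (mapped === undefined) return undefined;
    out += mapped;
  }
  return out;
}

function convertScripts(mathBody: string): string {
  return mathBody.replace(
    SCRIPT_PATTERN,
    (
      match: string,
      op: string,
      braced: string | undefined,
      single: string | undefined,
    ) => {
      const operand = braced ?? single ?? '';
      const table = op === '_' ? SUBSCRIPTS : SUPERSCRIPTS;
      // Leave the original text untouched when the operand cannot be fully mapped.
      return mapAll(operand, table) ?? match;
    },
  );
}
```

Given these tables, `P_0, P_1, P_n` maps to `P₀, P₁, Pₙ`, while `x_{1q}` stays as written because `q` has no entry.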

Why not a prompt-level fix? A renderer fix is deterministic, testable, and works against any model output that leaks LaTeX, not just current Gemini versions. It also follows the help wanted label and @Adib234's comment on the issue, which asked for "a pull request that improves the rendering or escaping logic for these characters in the core package."

Design notes, edge cases, and trade-offs were discussed up-front in my comment on the issue before coding.

Related Issues

Fixes #25656

How to Validate

  1. npm install && npm run build
  2. npm run preflight — all tests, lint, typecheck, format pass locally.
  3. Targeted unit tests: cd packages/cli && npx vitest run src/ui/utils/latexToUnicode.test.ts src/ui/utils/markdownParsingUtils.test.ts src/ui/utils/MarkdownDisplay.test.tsx src/ui/utils/TableRenderer.test.tsx
  4. Manual TUI check — npm start, then prompt Gemini with a request that triggers LaTeX output (e.g. "Explain the halting problem with math notation"). Verify:
    • LaTeX arrows/sets/Greek letters render as Unicode (→, ∈, α, …).
    • Inline code spans containing $\to$ or \alpha stay verbatim (backticked).
    • Fenced code blocks are untouched.
    • A message like "It costs $5.99 total" is unchanged.
    • A message like "The path is C:\Users\foo" is unchanged.

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed) — no user-facing config change
  • Added/updated tests (new latexToUnicode.test.ts with 51 assertions plus a new LaTeX sub-describe block in markdownParsingUtils.test.ts)
  • Noted breaking changes (if any) — none
  • Validated on required platforms/methods:
    • MacOS
      • npm run
      • npx
      • Docker
      • Podman
      • Seatbelt
    • Windows
      • npm run
      • npx
      • Docker
    • Linux
      • npm run
      • npx
      • Docker

@dimssu dimssu requested a review from a team as a code owner April 22, 2026 13:27
google-cla Bot commented Apr 22, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a renderer-side post-processor to improve the readability of model responses in the TUI. By converting LaTeX-style math and formatting tokens into standard Unicode, it prevents broken-looking raw backslash sequences from appearing in the terminal. The implementation is designed to be deterministic and safe, ensuring that technical content like file paths and code snippets remains preserved while math-heavy output is rendered cleanly.

Highlights

  • LaTeX to Unicode Rendering: Implemented a post-processor that converts common LaTeX idioms (e.g., arrows, Greek letters, math operators) into terminal-friendly Unicode characters.
  • Safe Processing: Ensured that non-LaTeX content like Windows paths, regex escapes, and currency amounts remain untouched by using a conservative, regex-based conversion approach.
  • Code Span Preservation: Added a masking mechanism to protect inline code spans and URLs from being incorrectly processed during the LaTeX conversion pass.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a utility to convert LaTeX-style syntax into terminal-friendly Unicode, addressing issues where raw LaTeX commands appeared broken in the CLI output. The implementation includes support for Greek letters, mathematical symbols, sub/superscripts, and basic formatting, with integration into the markdown parser that protects code spans and URLs. Feedback was provided regarding the convertLineBreaks function, which could inadvertently mangle Windows network paths or escaped backslashes in regular prose; it is recommended to restrict this specific conversion to math-mode environments.

out = convertTextFormatting(out);
out = convertFractionsAndRoots(out);
out = convertEscapedSpecials(out);
out = convertLineBreaks(out);

high

The convertLineBreaks function aggressively converts \\ to a newline character. While this is standard in LaTeX math environments, it is highly problematic in arbitrary prose for a developer-focused CLI. Specifically, it will mangle Windows network paths (e.g., \\server\share) and escaped backslashes in code-like text (e.g., C:\\Path) by turning them into newlines.

Since applyProseConversions is intended for text outside of explicit math delimiters, it is safer to remove this conversion here. Legitimate LaTeX line breaks should be wrapped in math delimiters (handled by applyMathModeConversions) or use standard Markdown newlines.

Suggested change
- out = convertLineBreaks(out);
+ out = convertNamedCommands(out);
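One way to picture the recommended restriction is to rewrite `\\` only inside delimited math spans. The sketch below is an assumption-laden illustration: the span regex and the name `convertLineBreaksInMathOnly` are hypothetical and do not come from the PR.

```typescript
// `$...$` or `$$...$$` span; the backreference forces matching delimiters.
const MATH_SPAN = /(\${1,2})([^$]+?)\1/g;

function convertLineBreaksInMathOnly(input: string): string {
  return input.replace(
    MATH_SPAN,
    (_match: string, delim: string, body: string) =>
      // Only the span body is scanned for `\\`; surrounding prose is untouched.
      `${delim}${body.replace(/\\\\/g, '\n')}${delim}`,
  );
}
```

A UNC path such as `\\server\share` contains no dollar signs, so it never enters the callback and is returned unchanged. Prose that places a double backslash between two unrelated dollar signs would still be affected here, which is why this step would need to be combined with a check that the span really is math.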

@gemini-cli gemini-cli Bot added the labels area/core (Issues related to User Interface, OS Support, Core Functionality) and help wanted (We will accept PRs from all issues marked as "help wanted". Thanks for your support!) on Apr 22, 2026
@jacob314

Please sign the CLA and then I can review. Also take a look at the failing tests.

@jacob314 jacob314 left a comment

Summary

Great work! The PR implements a robust, well-documented, and conservative LaTeX to Unicode converter for the CLI frontend. The approach of intercepting and replacing these tokens before they hit the markdown tokenizer is smart, and the regexes correctly avoid over-matching things like Windows file paths and currency.

Findings

  • Correctness: Excellent. The approach to masking inline code spans and URLs, combined with the conservative heuristic for when to strip $, prevents regressions on non-math text. The test coverage is comprehensive.
  • Maintainability: The logic is cleanly separated into its own file (latexToUnicode.ts) with distinct, well-documented transformation steps. Freezing the mapping dictionaries is a nice touch.
  • Efficiency: The short-circuit if (input.indexOf('\\') === -1 && input.indexOf('$') === -1) correctly keeps the hot path fast for plain text.
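The fast path praised in the Efficiency finding can be shown in isolation. In this sketch only the `indexOf` condition is taken from the review; the wrapper name `convertWithFastPath` and the `slowPath` callback are hypothetical stand-ins for the real conversion pipeline.

```typescript
function convertWithFastPath(
  input: string,
  slowPath: (text: string) => string,
): string {
  // No backslash and no dollar sign: nothing LaTeX-like can be present,
  // so skip every regex pass and return the input as-is.
  if (input.indexOf('\\') === -1 && input.indexOf('$') === -1) {
    return input;
  }
  return slowPath(input);
}
```

Most chat output is plain prose, so two linear scans are enough to reject it before any of the more expensive replacements run.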

Nitpicks

packages/cli/src/ui/utils/markdownParsingUtils.ts
In convertLatexPreservingSpans, the MASK_PATTERN replaces matched placeholders with their original strings. In the incredibly unlikely event that a user inputs the exact private-use-area placeholder sequence (\uE000, then 0, then \uE000) while preserved is empty, preserved[0] will be undefined, and replace will output the literal string "undefined".
You can make this completely bulletproof with a fallback:

  return converted.replace(
    MASK_PATTERN,
    (match, i: string) => preserved[Number(i)] ?? match,
  );

Conclusion

Approved. The implementation is solid, the logic is sound, and all tests pass beautifully. 🚀

dimssu added a commit to dimssu/gemini-cli that referenced this pull request Apr 23, 2026
- Remove convertLineBreaks from applyProseConversions so `\\` outside
  math delimiters is no longer rewritten to a newline. This protects
  Windows UNC paths (`\\server\share`) and escaped backslashes in
  code-like prose. LaTeX line breaks inside `$...$` / `$$...$$` are
  still handled by applyMathModeConversions.
- Add `?? match` fallback to the MASK_PATTERN restore step so a
  stray PUA sentinel in user input can never cause "undefined" to
  leak into rendered output.
- Add tests for Windows UNC paths and math-mode-only line breaks.

Addresses review comments from @gemini-code-assist and @jacob314
on PR google-gemini#25802.

dimssu commented Apr 23, 2026

Thanks for the reviews @jacob314 and @gemini-code-assist! Pushed 2649238 addressing both pieces of feedback:

1. convertLineBreaks removed from prose mode (@gemini-code-assist)

Good catch on the UNC-path hazard. applyProseConversions no longer rewrites \\ to a newline — that was too aggressive for a developer CLI where \\server\share, C:\\Path, and escaped backslashes in code samples are common. \\ line breaks inside $...$ / $$...$$ still convert via applyMathModeConversions, which is where the LaTeX spec actually places them. Added two tests (leaves Windows UNC paths alone, leaves \\\\ alone outside math mode) and updated the existing line-break test to assert the math-mode form.

2. ?? match fallback on MASK_PATTERN (@jacob314)

Applied verbatim. If the PUA sentinel ever appears in user input with a stale index, the replace callback now returns the literal match instead of undefined. Added a comment explaining the defensive fallback.

All 76 tests pass locally (latexToUnicode.test.ts + markdownParsingUtils.test.ts).

The failing Slow Test - Win - others check is sandboxManager.integration.test.ts > allows workspace execution timing out after 60s — unrelated to this PR (no code touched outside packages/cli/src/ui/utils/), and it appears to be a known flake on the Windows runner. The CI aggregate job and cla/google failure are upstream (CLA signature pending).

@gemini-code-assist

Thanks for the update, @dimssu! I appreciate you addressing the feedback regarding convertLineBreaks in prose mode. The change to restrict \\ conversion to math mode within applyMathModeConversions and the addition of tests for Windows UNC paths and escaped backslashes outside math mode are excellent improvements. This ensures that legitimate non-LaTeX content is preserved, which was the main concern. Great work!

dimssu added 2 commits April 24, 2026 04:42
Model responses for math and CS content frequently include LaTeX tokens
like `$\{P_0, \dots, P_n\}$` or `$\to$`. Terminals cannot render LaTeX,
so these appeared as raw backslash sequences that looked broken.

Add a conservative post-processor (`convertLatexToUnicode`) and wire it
into `parseMarkdownToANSI` — the single choke point for all inline text
in the TUI (paragraphs, headers, list items, table cells). Inline code
spans and bare URLs are masked with a private-use-area sentinel during
the conversion pass so their contents remain verbatim.

What gets converted:

- `$...$` / `$$...$$` math delimiters: stripped only when the content
  contains a LaTeX marker (`\\`, `_`, `^`) or is a single variable letter.
  Currency like `$5.99` and shell-style `$USER $HOME` stay intact.
- Named commands -> Unicode: `\to`, `\rightarrow`, `\dots`, `\times`,
  `\leq`, `\geq`, `\neq`, `\approx`, `\infty`, `\in`, `\forall`, `\exists`,
  `\sum`, `\prod`, `\int`, `\partial`, `\nabla`, full Greek alphabet,
  common operator names (`\log`, `\sin`, ...), arrows, set theory, logic.
- Escaped specials: `\{`, `\}`, `\_`, `\%`, `\&`, `\#`, `\$`, `\|`, `\ `.
- Text wrappers: `\textbf{x}` -> `**x**`, `\textit{x}` -> `*x*`,
  `\text{x}`/`\mathrm{x}` -> `x`.
- `\frac{a}{b}` -> `(a)/(b)`, `\sqrt{x}` -> `√(x)`.
- Sub/superscripts (inside math only) where every char maps to Unicode:
  `x_0` -> `x₀`, `E = mc^2` -> `E = mc²`, `x_{12}` -> `x₁₂`.

Unknown `\foo` sequences pass through untouched, so Windows paths, regex
escapes, and unrelated backslash content are preserved.

Fixes google-gemini#25656

dimssu commented Apr 30, 2026

Quick courtesy ping in case this slipped off the radar. All checks are green, CLA's signed, and the review feedback's been addressed. Happy to address anything else if more eyes land on it.

@scidomino scidomino added this pull request to the merge queue May 4, 2026
@scidomino scidomino self-requested a review May 4, 2026 18:05
Merged via the queue into google-gemini:main with commit 77f4be1 May 4, 2026
27 checks passed
TirthNaik-99 pushed a commit to TirthNaik-99/gemini-cli that referenced this pull request May 4, 2026
…ini#25802)

Co-authored-by: cynthialong0-0 <82900738+cynthialong0-0@users.noreply.github.com>
kimjune01 pushed a commit to kimjune01/gemini-cli-claude that referenced this pull request May 6, 2026
…ini#25802)

Co-authored-by: cynthialong0-0 <82900738+cynthialong0-0@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Markdown rendering issue with LaTeX-style syntax ($, \, etc.)

4 participants