Skip to content

Fix Gemini legacy thought message formatting#71

Merged
Zhang-Henry merged 3 commits intomainfrom
fix/gemini-legacy-thought-format
Mar 22, 2026
Merged

Fix Gemini legacy thought message formatting#71
Zhang-Henry merged 3 commits intomainfrom
fix/gemini-legacy-thought-format

Conversation

@bbsngg
Copy link
Copy Markdown
Contributor

@bbsngg bbsngg commented Mar 22, 2026

Summary

  • split legacy Gemini [Thought: true] assistant text into proper thinking vs final message blocks in the chat UI
  • normalize streamed Gemini assistant text at the end of a turn so realtime rendering matches session replay
  • normalize Gemini assistant session persistence so new session JSONL files no longer store inline legacy thought markers

Testing

  • npm run typecheck
  • npm run build

@Zhang-Henry
Copy link
Copy Markdown
Collaborator

Zhang-Henry commented Mar 22, 2026

Code Review — fix/gemini-legacy-thought-format

Nice work splitting the [Thought: true] markers into proper thinking/message blocks. A few issues to address before merge, ordered by severity with fix-difficulty estimates.


HIGH

1. Missing role: 'assistant' on thinking entries (server persistence)

In normalizePersistedGeminiAssistantEntries (server/gemini-cli.js), thinking entries are persisted without a role field:

const entries = legacySegments.slice(0, -1).map((segment) => ({
    type: 'thinking',
    content: segment
    // ← no role
}));

Only the final message entry gets role: 'assistant'. Any downstream code that filters or groups session entries by role (e.g. convertSessionMessages) will silently skip these. One-line fix — add role: 'assistant' to the thinking map.

2. Duplicated splitLegacyGeminiThoughtContent — server (JS) vs client (TS)

The same parsing logic lives in two places with subtly different signatures:

  • server/gemini-cli.js → returns string[] | null
  • src/components/chat/utils/chatFormatting.ts → returns { content, isThinking }[] | null

These will inevitably drift (e.g. if someone adjusts the regex in one but not the other). Needs an arch decision:

  • Option A: Extract into a shared utils/ module imported by both server and client.
  • Option B: Keep separate but add a comment cross-referencing the other copy, and add a shared test fixture that exercises both.

Either way this should be resolved before merge — duplicated parsing logic across the stack boundary is a maintenance risk.


MEDIUM

3. unescapeWithMathProtection applied to previously untouched code paths

The diff adds unescapeWithMathProtection() calls to handleSimpleAssistantMessage and the structured-data text handler in useChatRealtimeHandlers.ts, where these paths previously only ran decodeHtmlEntities + formatUsageLimitText. This changes rendering behavior for all assistant messages in those paths, not just Gemini legacy-thought messages.

If this is intentional (fixing a pre-existing bug where math escaping was missing), please document it in the PR description as a separate behavioral change. If accidental, scope the unescapeWithMathProtection call to only run when legacy thought markers are detected.

4. Streaming → finalize visual jump

During streaming, the user sees the raw text including [Thought: true] markers. When finalizeStreamingMessage fires, it splits the content into separate thought/message blocks, causing the message to visually restructure. This is especially noticeable on long responses.

Hardest to fix — would need the streaming handler to incrementally detect and split thought markers as they arrive. Not necessarily a blocker, but worth noting as a known UX issue and a follow-up item.

5. No unit tests for the new parsing function

splitLegacyGeminiThoughtContent is pure and deterministic — ideal candidate for unit tests. Suggested cases:

  • No markers → null
  • Single marker (only 1 segment after split) → null
  • Multiple markers → correct segment split with isThinking flags
  • Case variations ([thought: TRUE], [Thought:true])
  • Leading/trailing whitespace and newlines around markers
  • Edge: marker at very start or very end of string

This is especially important given issue #2 — if the function is duplicated, tests are the safety net against drift.

bbsngg and others added 2 commits March 22, 2026 13:29
… add unit tests

- Unify server/client splitLegacyGeminiThoughtContent to return { content, isThinking } objects
- Extract buildAssistantMessages helper to eliminate 5 duplicate legacy segment patterns
- Add vitest with 24 unit tests covering splitLegacyGeminiThoughtContent, buildAssistantMessages, and normalizePersistedGeminiAssistantEntries
- Add safety comment on finalizeStreamingMessage for non-Gemini provider compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… parser, revert unintended unescapeWithMathProtection

- Add role: 'assistant' to thinking entries in normalizePersistedGeminiAssistantEntries
- Extract splitLegacyGeminiThoughtContent into shared/geminiThoughtParser.js to eliminate server/client duplication
- Revert unescapeWithMathProtection from handleSimpleAssistantMessage and structured-data text handler (was unintentionally added to previously untouched code paths)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Zhang-Henry Zhang-Henry merged commit c3562bb into main Mar 22, 2026
1 check passed
@bbsngg bbsngg deleted the fix/gemini-legacy-thought-format branch March 22, 2026 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants