feat: add Gemini CLI / AI Studio session import support by FBISiri · Pull Request #204 · MemPalace/mempalace

FBISiri · 2026-04-08T03:40:24Z

What does this PR do?

Adds Gemini CLI and Google AI Studio session import support to normalize.py.

MemPalace already has a Gemini CLI integration guide, but normalize.py couldn't parse Gemini session files. This PR fills that gap.

Supported formats

1. Gemini API contents format — used by Gemini CLI session files (~/.gemini/sessions/*.json):

{"contents": [{"role": "user", "parts": [{"text": "..."}]}, ...]}

2. Flat messages list with role: "model" (Gemini convention):

[{"role": "user", "content": "..."}, {"role": "model", "content": "..."}]

Design decisions

Disambiguation: requires at least one role: "model" entry to match. This prevents false positives against Claude/ChatGPT formats that use role: "assistant". Placed after ChatGPT parser and before Slack in the detection chain.
Multi-part text: Gemini API allows multiple text parts per turn — they're joined with spaces.
Non-text parts: inline_data, function_call, etc. are gracefully skipped (only text parts are extracted).
Zero new dependencies: pure JSON parsing, consistent with the existing parsers.

How to test

python -m pytest tests/test_normalize.py -v

6 new test cases:

test_gemini_contents_format — standard Gemini CLI session
test_gemini_flat_messages_format — simplified export
test_gemini_multi_part_text — multiple text blocks in one turn
test_gemini_skips_non_text_parts — images/functions ignored
test_gemini_single_message_returns_none — edge case

Checklist

Tests pass (python -m pytest tests/ -v)
No hardcoded paths
Follows existing code patterns (same style as _try_claude_ai_json, _try_chatgpt_json)

bgauryy · 2026-04-08T18:26:28Z

PR Review: feat: add Gemini CLI / AI Studio session import support

Executive Summary

Aspect	Value
PR Goal	Add Gemini CLI and Google AI Studio session import support to `normalize.py`
Files Changed	2 (`mempalace/normalize.py`, `tests/test_normalize.py`)
Risk Level	🟡 MEDIUM — parser precedence bug silently drops assistant responses for one layout
Review Effort	2/5 — focused change, clear scope
Recommendation	🔄 REQUEST_CHANGES

Affected Areas: normalize.py parser chain, _try_gemini_json new parser, test suite

Business Impact: Users importing Gemini CLI sessions via {"messages": [...]} wrapper with 2+ user turns would get silently broken transcripts (assistant responses dropped).

Flow Changes: New _try_gemini_json parser inserted into the JSON parser chain between _try_chatgpt_json and _try_slack_json. Parser ordering creates a precedence conflict with _try_claude_ai_json for one of the three documented layouts.

Ratings

Aspect	Score
Correctness	3/5
Security	5/5
Performance	5/5
Maintainability	4/5

PR Health

Has clear description
References ticket/issue (if applicable)
Appropriate size (or justified if large)
Has relevant tests (if applicable)

High Priority Issues

(Must fix before merge)

🐛 #1: Parser precedence bug — `_try_claude_ai_json` silently claims Gemini `{"messages": [...]}` data

Location: mempalace/normalize.py:63 | Confidence: ✅ HIGH

_try_claude_ai_json runs before _try_gemini_json in the parser chain. When Gemini data uses the {"messages": [...]} wrapper (Layout 2a), _try_claude_ai_json extracts the list via data.get("messages", ...), captures all role="user" entries, but silently skips role="model" entries (not in its accepted roles: user, human, assistant, ai). If there are 2+ user messages, it returns a broken transcript with only user turns — the Gemini parser never executes.

Trace for input {"messages": [{"role": "user", "content": "Q1"}, {"role": "model", "content": "A1"}, {"role": "user", "content": "Q2"}, {"role": "model", "content": "A2"}]}:

_try_claude_ai_json → extracts messages list → captures ("user", "Q1"), skips model, captures ("user", "Q2"), skips model
len(messages) = 2 >= 2 → returns broken transcript: > Q1\n\n> Q2\n
_try_gemini_json never runs

Fix (choose one):

Option A — Move Gemini parser before Claude AI in the chain (minimal change, safe due to has_model_role guard):

- for parser in (_try_claude_ai_json, _try_chatgpt_json, _try_gemini_json, _try_slack_json):
+ for parser in (_try_gemini_json, _try_claude_ai_json, _try_chatgpt_json, _try_slack_json):

Option B — Add assistant-role guard to _try_claude_ai_json (more defensive, prevents the class of bug):

+ has_assistant = any(r == "assistant" for r, _ in messages)
- if len(messages) >= 2:
+ if len(messages) >= 2 and has_assistant:
      return _messages_to_transcript(messages)

Option A is recommended — _try_gemini_json already has its own has_model_role discriminator that prevents false positives against Claude/ChatGPT data.

🔗 #2: Missing test for Layout 2a — the exact layout affected by bug #1

Location: tests/test_normalize.py (new tests) | Confidence: ✅ HIGH

The PR documents three supported layouts but only tests two:

✅ Layout 1: {"contents": [...]} — tested in test_gemini_contents_format
❌ Layout 2a: {"messages": [...]} — NOT tested
✅ Layout 2b: flat list [...] — tested in test_gemini_flat_messages_format

The missing test is for the exact layout affected by bug #1. Adding this test would have caught the parser precedence issue.

def test_gemini_messages_wrapper_format():
    """Gemini data wrapped in {"messages": [...]} should parse correctly."""
    data = {
        "messages": [
            {"role": "user", "content": "What is Python?"},
            {"role": "model", "content": "A programming language."},
            {"role": "user", "content": "And Java?"},
            {"role": "model", "content": "Also a programming language."},
        ]
    }
    # write to tmp file, normalize, assert both user and model content present

Medium Priority Issues

(Should fix, not blocking)

🔄 #3: Test file will have merge conflicts with current `main`

Location: tests/test_normalize.py | Confidence: ✅ HIGH

The PR's test diff targets an older base where tests/test_normalize.py has bare functions (test_empty()) with tempfile/os.unlink. Current main has been restructured into pytest classes (TestNormalizePlainText, TestClaudeAiJson, etc.) using the tmp_path fixture. The new tests will conflict on merge.

The new tests should be rebased onto current main and restructured as a TestGeminiJson class using tmp_path:

class TestGeminiJson:
    def test_contents_format(self, tmp_path):
        data = {"contents": [
            {"role": "user", "parts": [{"text": "Capital of France?"}]},
            {"role": "model", "parts": [{"text": "Paris."}]},
        ]}
        f = tmp_path / "gemini.json"
        f.write_text(json.dumps(data))
        result = normalize(str(f))
        assert "> Capital of France?" in result
        assert "Paris." in result

🎨 #4: Tests use `tempfile`/`os.unlink` instead of pytest `tmp_path`

Location: tests/test_normalize.py (all new test functions) | Confidence: ✅ HIGH

All 5 new tests use tempfile.NamedTemporaryFile with manual os.unlink(f.name). The codebase convention (every existing test) uses pytest's tmp_path fixture. The tempfile pattern has a cleanup risk: if an assertion fails before os.unlink(), the temp file leaks. tmp_path handles cleanup automatically.

Related to #3 — fixing this during rebase would address both issues.

Low Priority Issues

(Nice to have)

🎨 #5: `elif not text:` is always true — simplify to `else:`

Location: mempalace/normalize.py:+279 (in _try_gemini_json) | Confidence: ✅ HIGH

text = ""
parts = item.get("parts")
if isinstance(parts, list):
    # ... sets text from parts ...
elif not text:  # ← always True here since text="" and parts wasn't a list
    text = _extract_content(item.get("content", ""))

Since text = "" is assigned just above and the elif branch only executes when isinstance(parts, list) is False, not text is always True. This should be else: for clarity.

- elif not text:
+ else:
      text = _extract_content(item.get("content", ""))

🎨 #6: Module docstring mentions "Gemini CLI" but not "Google AI Studio"

Location: mempalace/normalize.py:10 | Confidence: ✅ HIGH

The PR title and body say this supports both "Gemini CLI" and "Google AI Studio" sessions. The docstring update only adds "Gemini CLI JSON sessions". Consider:

-    - Gemini CLI JSON sessions
+    - Gemini CLI / Google AI Studio JSON sessions

Flow Impact Analysis

normalize()
  └─ _try_normalize_json(content)
       ├─ _try_claude_code_jsonl(content)    # JSONL, runs on raw string
       ├─ json.loads(content) → data
       ├─ _try_claude_ai_json(data)          # ⚠️ Claims {"messages":[...]} with 2+ user msgs
       ├─ _try_chatgpt_json(data)            # Requires "mapping" key — no conflict
       ├─ _try_gemini_json(data)             # NEW — handles "contents", "messages", list
       └─ _try_slack_json(data)              # Requires type="message" items — no conflict

Conflict zone: _try_claude_ai_json and _try_gemini_json both handle {"messages": [...]} and top-level lists. The has_model_role guard in the Gemini parser is correct, but it never gets a chance to run when the Claude parser claims the data first.

Safe layouts (no precedence issue):

{"contents": [...]} — Claude parser doesn't check this key
[{"role": "user", ...}, {"role": "model", ...}] (flat, ≤1 user msg) — Claude parser returns None (< 2 messages)

Unsafe layout:

{"messages": [{"role": "user", ...}, {"role": "model", ...}, {"role": "user", ...}, ...]} with 2+ user messages

Created by Octocode MCP https://octocode.ai 🔍🐙

web3guru888

✨ Review of #204 — feat: add Gemini CLI / AI Studio session import support

Scope: +150/−1 · 2 file(s)

mempalace/normalize.py (modified: +68/−1)
tests/test_normalize.py (modified: +82/−0)

Strengths

✅ Includes test coverage

🟢 Approved — clean, well-structured PR. Good work @FBISiri!

_{🏛️ Reviewed by MemPalace-AGI · Autonomous research system with perfect memory · Showcase: Truth Palace of Atlantis}

igorls · 2026-05-08T11:04:13Z

Hi, thanks for the contribution.

This PR has merge conflicts with develop, and the branch has not been updated in over 7 days, which puts it before our most recent release. The conflicts are likely against work that landed in that release.

Could you rebase onto develop so we can take another look?

If this change is no longer relevant, feel free to close the PR.

(This message is part of a periodic backlog pass, sent to all open PRs that match this state.)

Adds _try_gemini_json parser to normalize.py for three layouts: 1. Gemini API contents format (~/.gemini/sessions/*.json): {"contents": [{"role": "user", "parts": [{"text": "..."}]}, ...]} 2. Messages-wrapper variant: {"messages": [{"role": "user", ...}, {"role": "model", ...}]} 3. Flat top-level list with role="model". This complements the existing _try_gemini_jsonl parser (which handles ~/.gemini/tmp/<hash>/chats/session-*.jsonl with session_metadata sentinel) — JSONL covers Gemini CLI runtime sessions, JSON covers exported / Studio-saved transcripts. ## Review feedback addressed (PR MemPalace#204) bgauryy review: - MemPalace#1 Parser-precedence bug: _try_gemini_json runs *before* _try_claude_ai_json so the {"messages":[..., role=model, ...]} layout is no longer silently claimed by the Claude parser. The Gemini parser's has_model_role guard prevents false-positives against Claude / ChatGPT data. - MemPalace#2 Layout 2a coverage: TestGeminiJson.test_messages_wrapper_format + test_messages_wrapper_does_not_get_claimed_by_claude pin the fix in place. - MemPalace#3 Test conflicts with current main: rebased onto develop; tests restructured into TestGeminiJson class. - MemPalace#4 tempfile/os.unlink → pytest tmp_path everywhere. - MemPalace#5 elif not text → else (the elif branch was dead). - MemPalace#6 Module docstring updated to mention Google AI Studio. Tests: 9 new cases in TestGeminiJson covering all three layouts, multi-part text joining, non-text part skipping, has_model_role disambiguation, dispatch-chain regression for review MemPalace#1.

FBISiri · 2026-05-09T06:06:04Z

@bgauryy thanks for the thorough review — all six points addressed in the rebase (a9a2c35).

High priority

Congratulations Milla #1 Parser precedence bug: went with Option A — _try_gemini_json now runs before _try_claude_ai_json in the dispatch chain. The Gemini parser's has_model_role guard (returns None if no role="model" entry is seen) prevents false-positives against Claude / ChatGPT data, so the reordering is safe.
Integrating MemPalace with SoulForge's code intelligence system #2 Layout 2a coverage: added two regression tests — test_messages_wrapper_format (unit-level) and test_messages_wrapper_does_not_get_claimed_by_claude (full normalize() pipeline). Both fail without the precedence fix.

Medium priority

feat: Hermes memory provider integration #3 Test conflicts with current develop: rebased the branch onto current develop. develop has since gained its own _try_gemini_jsonl (for ~/.gemini/tmp/.../chats/session-*.jsonl runtime sessions) — that's a complementary parser to the JSON one this PR adds. Coexistence intentional and called out in the docstring.
Using AAAK as language for agents #4 tempfile/os.unlink → tmp_path: all new tests now use the tmp_path fixture inside a TestGeminiJson pytest class, matching the existing convention.

Low priority

feat: add OpenAI Codex CLI JSONL normalizer #5 elif not text → else: fixed.
fix: handle Windows file-lock errors in test cleanup #6 Module docstring: now says Gemini CLI / Google AI Studio JSON sessions.

tests/test_normalize.py runs clean locally (126 passed, including the 9 new TestGeminiJson cases). PR is back to mergeable — git pull should be a fast-forward.

@igorls heads up — rebased and force-pushed onto current develop per your earlier ask.

web3guru888 approved these changes Apr 11, 2026

View reviewed changes

bensig changed the base branch from main to develop April 11, 2026 22:23

bensig requested review from bensig, igorls and milla-jovovich as code owners April 11, 2026 22:23

igorls added area/cli CLI commands area/mining File and conversation mining enhancement New feature or request labels Apr 14, 2026

igorls added the needs-rebase PR has merge conflicts with develop and needs rebase label May 8, 2026

FBISiri force-pushed the feat/gemini-cli-import branch from 1ed408a to a9a2c35 Compare May 9, 2026 06:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add Gemini CLI / AI Studio session import support#204

feat: add Gemini CLI / AI Studio session import support#204
FBISiri wants to merge 1 commit into
MemPalace:developfrom
FBISiri:feat/gemini-cli-import

FBISiri commented Apr 8, 2026

Uh oh!

bgauryy commented Apr 8, 2026

Uh oh!

web3guru888 left a comment

Uh oh!

igorls commented May 8, 2026

Uh oh!

FBISiri commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

FBISiri commented Apr 8, 2026

What does this PR do?

Supported formats

Design decisions

How to test

Checklist

Uh oh!

bgauryy commented Apr 8, 2026

PR Review: feat: add Gemini CLI / AI Studio session import support

Executive Summary

Ratings

PR Health

High Priority Issues

🐛 #1: Parser precedence bug — _try_claude_ai_json silently claims Gemini {"messages": [...]} data

🔗 #2: Missing test for Layout 2a — the exact layout affected by bug #1

Medium Priority Issues

🔄 #3: Test file will have merge conflicts with current main

🎨 #4: Tests use tempfile/os.unlink instead of pytest tmp_path

Low Priority Issues

🎨 #5: elif not text: is always true — simplify to else:

🎨 #6: Module docstring mentions "Gemini CLI" but not "Google AI Studio"

Flow Impact Analysis

Uh oh!

web3guru888 left a comment

Choose a reason for hiding this comment

Strengths

Uh oh!

igorls commented May 8, 2026

Uh oh!

FBISiri commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

🐛 #1: Parser precedence bug — `_try_claude_ai_json` silently claims Gemini `{"messages": [...]}` data

🔄 #3: Test file will have merge conflicts with current `main`

🎨 #4: Tests use `tempfile`/`os.unlink` instead of pytest `tmp_path`

🎨 #5: `elif not text:` is always true — simplify to `else:`