fix: handle large claude.ai exports and multi-conversation "messages" key by z3tz3r0 · Pull Request #676 · MemPalace/mempalace

z3tz3r0 · 2026-04-12T05:57:09Z

Summary

Root cause 1: MAX_FILE_SIZE in convo_miner.py was 10 MB — claude.ai exports routinely exceed this (21+ MB for active users). Files were silently skipped with zero feedback. Raised to 100 MB and added a warning when files are skipped.
Root cause 2: _try_claude_ai_json in normalize.py only detected multi-conversation exports using the "chat_messages" key (privacy export). Standard claude.ai exports use "messages"; these fell through to the flat-messages parser, which failed silently (conversation dicts have no "role" at top level), producing 0 drawers.
Parser fix: Now checks for both "chat_messages" and "messages" at the conversation object level, and processes each conversation into a separate transcript section instead of concatenating all 844+ conversations into one.
Tests: 3 new test cases for multi-conversation parsing ("messages" key, per-conversation separation, short conversation filtering).
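A minimal sketch of the size guard described in root cause 1. It assumes the limit is a byte count compared against `os.path.getsize` and that the warning goes to stderr. The helper name `should_mine` and the message wording are hypothetical; the actual check in `convo_miner.py` may be structured differently.

```python
import os
import sys

# Assumption: MAX_FILE_SIZE is a byte count; the PR raises it from 10 MB to 100 MB.
MAX_FILE_SIZE = 100 * 1024 * 1024


def should_mine(path: str, max_size: int = MAX_FILE_SIZE) -> bool:
    """Return True if `path` is within the size limit.

    Hypothetical helper: instead of skipping an oversized file silently,
    it prints a warning that names the file and its size, then returns False.
    """
    size = os.path.getsize(path)
    if size > max_size:
        print(
            f"WARNING: skipping {path}: {size / 1_048_576:.1f} MB exceeds "
            f"the {max_size / 1_048_576:.0f} MB limit",
            file=sys.stderr,
        )
        return False
    return True
```

Warning and returning `False` (rather than raising) keeps the rest of a mining run going while making every skipped file visible.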
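The parser change for root cause 2 can be sketched as follows. The sketch assumes each conversation in the export is a dict whose turns live under either `chat_messages` or `messages`, and that each turn carries a `sender` (`human`/`assistant`) and a `text` field. The function names, the `min_messages` threshold, and the `role: text` transcript layout are illustrative assumptions, not MemPalace's actual `_try_claude_ai_json`.

```python
from typing import Any, Dict, List, Optional

# Privacy exports use "chat_messages"; standard exports use "messages".
CONVERSATION_KEYS = ("chat_messages", "messages")


def conversation_turns(convo: Dict[str, Any]) -> Optional[List[Any]]:
    """Return the list of turns for one conversation, or None if absent."""
    for key in CONVERSATION_KEYS:
        turns = convo.get(key)
        if isinstance(turns, list):
            return turns
    return None


def split_export(data: Any, min_messages: int = 2) -> List[str]:
    """Render each conversation as its own transcript section.

    Conversations with fewer than `min_messages` non-empty turns are dropped
    (a hypothetical threshold standing in for short-conversation filtering).
    """
    if not isinstance(data, list):
        return []
    sections: List[str] = []
    for convo in data:
        if not isinstance(convo, dict):
            continue
        turns = conversation_turns(convo)
        if turns is None:
            continue
        lines = []
        for turn in turns:
            if not isinstance(turn, dict):
                continue
            # Assumption: the speaker is in "sender", falling back to "role".
            speaker = turn.get("sender") or turn.get("role")
            text = turn.get("text")
            if not isinstance(text, str) or not text.strip():
                continue
            if speaker == "human":
                speaker = "user"
            if speaker in ("user", "assistant"):
                lines.append(f"{speaker}: {text.strip()}")
        if len(lines) >= min_messages:
            sections.append("\n".join(lines))
    return sections
```

With only `"chat_messages"` in `CONVERSATION_KEYS`, every conversation in a standard export would return `None` and produce nothing, which mirrors the 0-drawer symptom. Emitting one section per conversation, rather than one concatenated blob, is what lets drawers be created per conversation.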

Note: #646 was closed via #667, but #667 addresses paginated export/read-back — it does not touch convo_miner.py or the MAX_FILE_SIZE skip, nor the "messages" key mismatch in the parser. The two root causes reported in #646 remain unfixed on main.

Test plan

pytest tests/ -v — 592 passed (589 base + 3 new), 0 failed
New tests verify: "messages" key parsing, per-conversation separation, short conversation filtering
Mine a real claude.ai conversations.json export (> 10 MB) and verify drawers are created per conversation
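For the manual step, a quick pre-check of the export can confirm that it exceeds the old 10 MB limit and show which key its conversations use. This is a hypothetical helper, not part of MemPalace. It assumes `conversations.json` is a top-level JSON list of conversation objects and treats 10 MB as 10 MiB.

```python
import json
import os
from collections import Counter


def inspect_export(path: str) -> dict:
    """Summarize a claude.ai conversations.json before mining it."""
    size_mb = os.path.getsize(path) / 1_048_576
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    keys = Counter()
    if isinstance(data, list):
        for convo in data:
            # Count which key holds each conversation's turns.
            if not isinstance(convo, dict):
                keys["not a dict"] += 1
            elif isinstance(convo.get("chat_messages"), list):
                keys["chat_messages"] += 1
            elif isinstance(convo.get("messages"), list):
                keys["messages"] += 1
            else:
                keys["no turns"] += 1
    return {
        "size_mb": round(size_mb, 1),
        "over_old_limit": size_mb > 10,  # assumption: old limit was 10 MiB
        "conversations": len(data) if isinstance(data, list) else 0,
        "keys": dict(keys),
    }
```

After mining, the number of per-conversation sections or drawers can be compared against `conversations` (minus any short conversations that are filtered out).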

Addresses #646

… key

Two bugs in claude.ai export mining:

1. MAX_FILE_SIZE was 10 MB, but claude.ai conversation exports routinely exceed this (21+ MB for active users). Files were silently skipped with no warning. Raised to 100 MB and added a warning message when files are skipped due to size.
2. _try_claude_ai_json only detected multi-conversation exports when conversations used the "chat_messages" key (privacy export format). Standard exports use "messages" instead; these fell through to the flat-messages parser, which failed silently (conversation dicts have no "role" key at top level), producing 0 drawers. Now checks for both "chat_messages" and "messages" at the conversation level, and processes each conversation into a separate transcript section instead of concatenating all into one.

Adds 3 tests for multi-conversation parsing.

Addresses MemPalace#646

igorls · 2026-05-08T10:59:38Z

Hi, thanks for the contribution.

This PR has merge conflicts with develop, and the branch has not been updated in over 7 days, which puts it before our most recent release. The conflicts are likely against work that landed in that release.

Could you rebase onto develop so we can take another look?

If this change is no longer relevant, feel free to close the PR.

(This message is part of a periodic backlog pass, sent to all open PRs that match this state.)

z3tz3r0 requested review from bensig, igorls and milla-jovovich as code owners April 12, 2026 05:57

z3tz3r0 mentioned this pull request Apr 12, 2026

bug: _try_claude_ai_json parser silently produces 0 drawers on claude.ai export (conversations.json) #646

Closed

mvalentsev mentioned this pull request Apr 12, 2026

fix: parse Claude.ai privacy export with messages key and sender field (#677) #685

Merged

igorls changed the base branch from main to develop April 13, 2026 04:46

igorls added the area/mining (File and conversation mining) and bug (Something isn't working) labels Apr 14, 2026

igorls added the needs-rebase (PR has merge conflicts with develop and needs rebase) label May 8, 2026

z3tz3r0 wants to merge 1 commit into MemPalace:develop from z3tz3r0:fix/claude-ai-export-mining
