feat(convo_miner): auto-route AI tool sessions to wing_api #1236

Merged
milla-jovovich merged 1 commit into develop from feat/convo-miner-wing-api-auto-route on May 21, 2026
Conversation

@milla-jovovich milla-jovovich commented Apr 27, 2026

Collaborator

When mempalace mine --mode convos is invoked against a directory inside a known AI-tool storage path (Claude Code, Codex CLI, Gemini CLI), the destination wing now auto-defaults to wing_api rather than the directory basename. Conversations from external API-keyed tools land grouped under a single dedicated wing for visibility.

Detected paths (exact-segment match — substrings like .gemini-backup or .codex-archive do NOT match):

  • any segment .codex (Codex CLI sessions / archives)
  • any segment .gemini (Gemini CLI sessions under ~/.gemini/tmp/...)
  • the consecutive segment pair .claude/projects (Claude Code). .claude alone is NOT matched - that is the settings/config dir, not a conversation source.
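
The exact-segment rule above can be illustrated with a minimal sketch. This is not the PR's code: `is_ai_tool_path_sketch` and the two constants are hypothetical names, and the sketch compares the raw path segments without resolving symlinks.

```python
from pathlib import PurePosixPath

# Hypothetical constants for illustration; the PR's actual names may differ.
SINGLE_SEGMENTS = {".codex", ".gemini"}
SEGMENT_PAIRS = {(".claude", "projects")}


def is_ai_tool_path_sketch(path: str) -> bool:
    """Return True if a whole path segment identifies an AI-tool store."""
    parts = PurePosixPath(path).parts
    # Whole-segment comparison: ".gemini-backup" is not equal to ".gemini".
    if any(part in SINGLE_SEGMENTS for part in parts):
        return True
    # Consecutive pair: ".claude" must be followed directly by "projects".
    return any(pair in SEGMENT_PAIRS for pair in zip(parts, parts[1:]))


print(is_ai_tool_path_sketch("/home/me/.claude/projects/demo"))  # True
print(is_ai_tool_path_sketch("/home/me/.claude"))                # False
print(is_ai_tool_path_sketch("/home/me/.gemini-backup/tmp"))     # False
```

Comparing whole segments instead of searching the path string is what keeps `.gemini-backup` and `.codex-archive` from matching.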

Wing-resolution precedence (first match wins):

  1. Explicit --wing argument from the user - always wins
  2. AI-tool path detection -> wing_api
  3. Basename fallback (existing behavior, unchanged)

Two new helpers split out of mine_convos for unit-test coverage:

  • _is_ai_tool_path(path: Path) -> bool
  • _resolve_wing(convo_path: Path, wing: Optional[str]) -> str

mine_convos now calls _resolve_wing in place of its inline basename logic. No other call sites or downstream consumers change.
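
A rough sketch of the three-step precedence, under stated assumptions: `resolve_wing_sketch`, `normalize_stub`, and `has_codex` are hypothetical names, the detector is passed in as a callable only to keep the example self-contained, and the basename normalization shown here is a stand-in whose behavior may differ from the project's own helper.

```python
import re
from pathlib import Path
from typing import Callable, Optional

AI_TOOL_WING = "wing_api"


def normalize_stub(name: str) -> str:
    """Stand-in normalizer (assumption); the project's helper may behave differently."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def resolve_wing_sketch(
    convo_path: Path,
    wing: Optional[str],
    is_ai_tool_path: Callable[[Path], bool],
) -> str:
    # 1. An explicit wing wins; "" and None are both falsy and fall through.
    if wing:
        return wing
    # 2. An AI-tool storage path routes to the dedicated wing.
    if is_ai_tool_path(convo_path):
        return AI_TOOL_WING
    # 3. Otherwise derive the wing from the directory basename.
    return normalize_stub(convo_path.name)


def has_codex(p: Path) -> bool:
    return ".codex" in p.parts  # toy detector for this demo only


print(resolve_wing_sketch(Path("/home/me/.codex/sessions"), "research", has_codex))  # research
print(resolve_wing_sketch(Path("/home/me/.codex/sessions"), "", has_codex))          # wing_api
print(resolve_wing_sketch(Path("/home/me/My-Notes"), None, has_codex))               # my_notes
```

Keeping the resolution in a small pure function like this is what makes each precedence step testable without running a full mine.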

Test coverage:

  • 15 unit tests covering positive matches (Claude Code subdir + root, Codex root + sessions, Gemini root + chats), negative cases (.claude alone is settings dir, unrelated paths, substring no-match on .gemini-backup / .codex-archive), explicit --wing override, auto-route trio, basename fallback, empty-string-as-no-wing.
  • End-to-end smoke test (manual): real-shape Claude Code JSONL fixture mined via the actual CLI; sqlite read-back of /tmp palace confirms drawers landed with wing='wing_api' and verbatim content preserved; mempalace search --wing wing_api returns expected content ranked.
  • Full pytest sweep: 1388 baseline + 15 new = 1403 passed, zero regressions.

Closes part of #59 for the auto-routing UX.

Design context:

This change reflects Aya's product call that conversations from
API-keyed AI tools should land in a structural wing_api rather than be
scattered across topical wings derived from directory basenames. Igor's
ADR-0017 in mempalace-ts proposes the alternative of source-prefix
metadata (source LIKE 'api/%') with topical wing assignment instead;
that approach has architectural merit (wings stay topical) but does not
deliver the single-wing visibility users get here. Open for review
discussion - explicit --wing flag and basename fallback both unchanged,
so this is additive and reversible.
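
To make the trade-off concrete, here is a small sketch that contrasts the two lookups against an in-memory SQLite table. The table name, column names, and rows are assumptions invented for illustration; they do not describe the real palace schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: table and column names are assumptions for illustration.
conn.execute("CREATE TABLE drawers (id INTEGER PRIMARY KEY, wing TEXT, source TEXT)")
conn.executemany(
    "INSERT INTO drawers (wing, source) VALUES (?, ?)",
    [
        ("wing_api", "api/claude-code"),       # structural wing (this PR's approach)
        ("wing_api", "api/codex-cli"),
        ("wing_myproject", "api/gemini-cli"),  # topical wing plus source prefix (ADR-style)
        ("wing_myproject", "files/readme"),
    ],
)

# This PR: one equality filter on the wing finds the auto-routed sessions.
by_wing = conn.execute(
    "SELECT COUNT(*) FROM drawers WHERE wing = 'wing_api'"
).fetchone()[0]

# ADR-0017 alternative: sessions stay in topical wings; a prefix filter finds them.
by_source = conn.execute(
    "SELECT COUNT(*) FROM drawers WHERE source LIKE 'api/%'"
).fetchone()[0]

print(by_wing, by_source)  # 2 3
```

In this toy data the prefix filter also finds the session stored in a topical wing, while the wing filter only sees rows that were routed to `wing_api`.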

@milla-jovovich milla-jovovich force-pushed the feat/convo-miner-wing-api-auto-route branch from c2f5d71 to 4098c54 April 27, 2026 08:59

@bensig bensig left a comment

Contributor

Approve. Clean separation of concerns, correct path-matching, full test coverage, and the wing-resolution precedence is exactly right.

Full pytest on this branch: 1456 passed, 1 skipped. Of those, 19 are in `test_convo_miner.py`, which matches the +15 new tests the PR body promised.

What I checked

Path matching is right and defensive

  • `path.resolve().parts` handles symlinks and relative paths correctly. A user who symlinks a Claude transcript dir somewhere else still gets routed to `wing_api` because resolve surfaces the original `.claude/projects/` path.
  • `try/except (OSError, RuntimeError)` around `resolve()` catches the rare case of a broken symlink or path-too-long without crashing the mine.
  • Exact-segment match is the key correctness detail. `.gemini-backup` and `.codex-archive` correctly do NOT match — that would have been an easy mistake. The negative tests cover both.
  • `.claude/projects` requires the consecutive-segment pair, not bare `.claude` (which is the settings dir, not conversations). Also tested.
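
The defensive resolve described in the first two bullets might look roughly like the sketch below. `resolved_parts_sketch` is a hypothetical name, and falling back to the unresolved segments on failure is an assumption; the PR may handle the error differently.

```python
from pathlib import Path
from typing import Tuple


def resolved_parts_sketch(path: Path) -> Tuple[str, ...]:
    """Return the segments of the resolved path, or of the raw path on failure."""
    try:
        # resolve() makes the path absolute and follows symlinks.
        return path.resolve().parts
    except (OSError, RuntimeError):
        # Assumed fallback: match against the path as given rather than abort.
        return path.parts


parts = resolved_parts_sketch(Path("relative/dir"))
print(Path("relative/dir").is_absolute(), Path(*parts).is_absolute())  # False True
```

Because matching runs on the resolved segments, a symlink that points into `.claude/projects` is judged by its target rather than by the link's own name.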

Wing-resolution precedence is correct

  1. Explicit `--wing` always wins. User intent sacrosanct.
  2. AI-tool path → `wing_api` when no explicit wing.
  3. `normalize_wing_name(basename)` fallback uses the shared helper from `config.py` — same source of truth as `cmd_init`, `room_detector_local`, and `miner.load_config`. Slots cleanly into the #1194 consolidation work that landed yesterday.

The empty-string handling (`if wing:` is falsy on `""`) matches the #1097 "empty-string as no filter" pattern that's now consistent across the codebase.

Tests cover the right surface

The 15 new test cases hit:

  • Positive matches: Claude Code subdir + root, Codex root + sessions, Gemini root + chats
  • Negative cases: `.claude` alone (settings, NOT conversations), unrelated paths, substring no-match on `.gemini-backup` / `.codex-archive`
  • Override paths: explicit `--wing` beats auto-route, basename fallback on non-AI paths, empty-string treated as no-wing

The negative cases are the ones that prove the matching is exact rather than fuzzy. Good discipline.

Architecturally sound

Routing API-driven conversations to a dedicated `wing_api` (separate from project wings) is the right default. They're a different kind of content — general LLM exchanges that can span any topic — so segregating them into one wing makes search and graph traversal cleaner. Users who want them in a specific project wing pass `--wing`; users who do nothing get something semantically reasonable.

Minor observations (not blockers)

  1. `wing_api` is hardcoded in `_resolve_wing`. Could be promoted to a module-level constant (`_AI_TOOL_DEFAULT_WING = "wing_api"`) for visibility, but not material.

  2. The detection list is closed. As more AI-tool ecosystems land (Cursor, Continue, Aider, Zed AI, etc.), this set needs extension. Could become config-driven later (env var or `config.json` key like `ai_tool_path_segments`). Out of scope for this PR; worth a follow-up issue if the list grows.

  3. Edge case worth knowing: if a user mines `~/.claude/projects/-Users-me-Projects-MyProject/` and wants those Claude conversations specifically in `wing_myproject`, they need `--wing myproject`. The default (`wing_api`) is more useful for the majority case where Claude conversations are general-purpose, but worth a doc note that "explicit per-project routing of AI-tool conversations is one flag away."

Closes part of #59. Ship it.

@alisacorporation

alisacorporation commented Apr 28, 2026


@milla-jovovich

photo_2026-04-28_11-40-23

@igorls igorls added the area/mining (File and conversation mining) and enhancement (New feature or request) labels May 2, 2026
@milla-jovovich

Collaborator Author

approved.

@milla-jovovich milla-jovovich merged commit 60d460b into develop May 21, 2026
6 checks passed
@igorls igorls mentioned this pull request May 24, 2026
3 tasks
arnoldwender pushed a commit to arnoldwender/mempalace that referenced this pull request May 24, 2026
Bumps version 3.3.5 → 3.3.6 across pyproject.toml, version.py, plugin
manifests (.claude-plugin/plugin.json, .claude-plugin/marketplace.json,
.codex-plugin/plugin.json), README badge, and uv.lock. Flips CHANGELOG.md
from ``[Unreleased]`` to ``[3.3.6] — 2026-05-24`` and backfills the
major user-facing entries that landed without changelog entries during
the cycle:

Features:
- MemPalace#1555 office-document mining via --mode extract + virtual line numbers
- MemPalace#1584 surgical closet pointers with date+line locators (Tier 6a)
- MemPalace#1558 + MemPalace#1560 within-wing hallways (entity co-occurrence graph)
- MemPalace#1565 cross-wing tunnels auto-promoted from hallways
- MemPalace#1578 Hebbian potentiation + Ebbinghaus decay on hallways/tunnels
- MemPalace#1236 API-tool transcripts auto-route to wing_api
- MemPalace#711 hooks.auto_save toggle for silent-mode sessions
- MemPalace#1605 COCA content-word filter for entity detection
- MemPalace#1557 case-insensitive entity matching at mine time
- MemPalace#1483 multilingual embeddings (embeddinggemma-300m) by default

Bug Fixes (selected, user-visible):
- MemPalace#1540 silent data loss in three unchunked upsert sites
- MemPalace#1538 paragraph chunker oversized chunks
- MemPalace#1554 per-file chunk cap too low for transcripts
- MemPalace#1562 Windows hook subprocess/ChromaDB deadlock
- MemPalace#1529 create_tunnel corrupted hyphenated wing names
- MemPalace#1424 save-hook truncated hyphenated project folders
- MemPalace#1383 KG cache duplicated graphs for symlinked/cased paths
- MemPalace#1466 silent symlink skip now logged
- MemPalace#1441 macOS stock-bash 3.2 hook compatibility
- MemPalace#1500 / MemPalace#1513 structured JSON-RPC errors on bad MCP input
- MemPalace#1523 VACUUM + FTS5 rebuild after repair
- MemPalace#1548 FTS5 validation at end of mine
- plus MemPalace#1216, MemPalace#1408, MemPalace#1438, MemPalace#1439, MemPalace#1445, MemPalace#1452, MemPalace#1459, MemPalace#1461, MemPalace#1466,
  MemPalace#1470, MemPalace#1477, MemPalace#1485, MemPalace#1500, MemPalace#1513, MemPalace#1528, MemPalace#1532, MemPalace#1543, MemPalace#1546, MemPalace#1585

Performance:
- MemPalace#1474 convo miner pre-fetches mined-set
- MemPalace#1487 rebuild_index progress callback
- MemPalace#1530 MCP cold-start diagnostics + opt-in warmup

Lint passes (ruff 0.15.14); mempalace-mcp entry point alignment
verified per RELEASING.md.
4 participants