Commit 2e6ced9

jphein and claude authored
feat(cli): mempalace mined + purge --source-file (#7)
* feat(cli): mempalace mined + purge --source-file

Fills two ergonomic gaps in the mining-management surface that JP asked for:

1. **mempalace purge --source-file <path>**: extends the existing purge command with a third filter alongside --wing and --room. Single-filter or combined ($and) usage works the same way as for wing/room; the filter is an exact match on metadata.source_file. An end-to-end test exercises a real palace and confirms that only the matching drawers are deleted, leaving siblings intact.

2. **mempalace mined**: companion to `mempalace status`, but groups by wing × source_file rather than wing × room. Answers "which files have I mined into this wing?" so an operator can pick targets for the new --source-file purge. Skips drawers without a source_file metadata key (diary entries, kg drawers, etc.). Honors --wing and --limit (default 50 sources per wing, --limit 0 to show all). Uses the same paginated col.get() pattern as miner.status() to handle 100K+ drawer palaces without tripping sqlite's max-variable limit.

Together these close the "removing manually mined data" half of JP's ask. The "automining when files change" half is queued for a separate file-watcher daemon PR. The pre-existing `mempalace mine <dir>` already covers "adding" (and re-mines on mtime change via bulk_check_mined dedup), so no command is needed for refresh.

Tests: 8 new test cases (3 for purge --source-file with mock + real palace; 3 for cmd_mined including end-to-end groupBy + wing filter + no-palace error path; 2 updated to add the new arg to the test namespace builder). Full suite: 1564 passed, 1 skipped, lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cli): address Copilot review on #4

Four findings from Copilot's review pass on the mining-management CLI:

1. **--wing filter pushed into the query** (Copilot cli.py:800). Previously cmd_mined fetched every drawer in the palace and filtered in Python, doing O(total) work even for "mempalace mined --wing X" on a 150K-drawer palace where X has 12K. Now it pre-fetches IDs via col.get(where=...) to size the loop, then passes where= on each batch get(). On the canonical palace, --wing filters now scan ~10x less data.

2. **Reject negative --limit at parse time** (Copilot cli.py:818). argparse type=int silently accepted -1, producing nonsensical "... -2 more" output. Adds a typed _nonneg_int validator that raises ArgumentTypeError on negatives; argparse turns that into a clean exit code 2 with a usage error message.

3. **Update module __doc__** (Copilot cli.py:1495). main() uses the module docstring as the help epilog. The hard-coded command list stopped at "status"; added "mined" and "purge --source-file" so `mempalace --help` reflects the new commands.

4. **Dispatch test for `mempalace mined`** (Copilot cli.py:1542). Sibling commands have main()-level dispatch tests; mined didn't. A typo or registration regression would have slipped through unit coverage. Added test_main_mined_dispatches and a paired test_main_mined_rejects_negative_limit (verifies #2).

Tests: 68 cli tests pass; full suite green at 1566 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 69768fc commit 2e6ced9

2 files changed

Lines changed: 312 additions & 12 deletions

mempalace/cli.py

Lines changed: 132 additions & 9 deletions
@@ -18,6 +18,8 @@
     mempalace wake-up                     Show L0 + L1 wake-up context
     mempalace wake-up --wing my_app       Wake-up for a specific project
     mempalace status                      Show what's been filed
+    mempalace mined                       List mined source files grouped by wing
+    mempalace purge --source-file <path>  Remove drawers mined from a specific file
 
 Examples:
     mempalace init ~/projects/my_app
@@ -677,15 +679,19 @@ def cmd_purge(args):
         print(f"\n No palace found at {palace_path}")
         return
 
-    if args.wing and args.room:
-        where = {"$and": [{"wing": args.wing}, {"room": args.room}]}
-    elif args.wing:
-        where = {"wing": args.wing}
-    elif args.room:
-        where = {"room": args.room}
-    else:
-        print(" Error: specify --wing and/or --room")
+    source_file = getattr(args, "source_file", None)
+    clauses = []
+    if args.wing:
+        clauses.append({"wing": args.wing})
+    if args.room:
+        clauses.append({"room": args.room})
+    if source_file:
+        clauses.append({"source_file": source_file})
+
+    if not clauses:
+        print(" Error: specify at least one of --wing, --room, --source-file")
         return
+    where = clauses[0] if len(clauses) == 1 else {"$and": clauses}
 
     backend = ChromaBackend()
     try:
@@ -711,6 +717,8 @@ def cmd_purge(args):
         label_parts.append(f"wing={args.wing}")
     if args.room:
         label_parts.append(f"room={args.room}")
+    if source_file:
+        label_parts.append(f"source-file={source_file}")
     label = " ".join(label_parts)
 
     if match_count == 0:
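
The filter construction in `cmd_purge` above can be sketched in isolation. This is a minimal illustration, not the project's code: `build_where` is a hypothetical helper name introduced here, and it only mirrors the clause-list logic shown in the diff. It appears that a single clause is passed bare because Chroma is understood to reject a `$and` list with fewer than two expressions; treat that rationale as an assumption.

```python
from typing import Optional


def build_where(wing: Optional[str] = None,
                room: Optional[str] = None,
                source_file: Optional[str] = None) -> Optional[dict]:
    """Hypothetical helper: combine the given filters into one where-clause."""
    clauses = []
    if wing:
        clauses.append({"wing": wing})
    if room:
        clauses.append({"room": room})
    if source_file:
        clauses.append({"source_file": source_file})
    if not clauses:
        # Caller is expected to report "specify at least one filter".
        return None
    # One clause is returned as-is; "$and" wraps two or more clauses.
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}


single = build_where(source_file="/notes/todo.md")
combined = build_where(wing="my_app", source_file="/notes/todo.md")
```

Here `single` is the bare `{"source_file": ...}` dict, while `combined` wraps both clauses in `{"$and": [...]}`, so adding a fourth filter later would only need one more `append`.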
@@ -741,6 +749,91 @@ def cmd_status(args):
     status(palace_path=palace_path)
 
 
+def cmd_mined(args):
+    """List mined source files grouped by wing.
+
+    Companion to ``status`` (which groups by wing × room) — answers "which
+    files have I mined into this wing?" so an operator can pick targets
+    for ``mempalace purge --source-file <path>``.
+
+    Skips drawers without a ``source_file`` metadata key (typically
+    diary entries, kg drawers, manually-added entries).
+    """
+    from collections import defaultdict
+
+    from .backends.chroma import ChromaBackend
+    from .migrate import contains_palace_database
+
+    palace_path = os.path.abspath(
+        os.path.expanduser(args.palace) if args.palace else MempalaceConfig().palace_path
+    )
+
+    if not os.path.isdir(palace_path) or not contains_palace_database(palace_path):
+        print(f"\n No palace found at {palace_path}")
+        return
+
+    backend = ChromaBackend()
+    try:
+        col = backend.get_collection(palace_path, "mempalace_drawers")
+    except Exception as e:
+        print(f"\n Error reading palace: {e}")
+        return
+
+    # Wing-by-source aggregation. Mirrors miner.status's pagination so
+    # palaces with hundreds of thousands of drawers don't trip SQLite's
+    # max-variable limit on a single col.get(limit=total). When --wing
+    # is given, push the filter into the query so we scan only that
+    # wing rather than the full collection (Copilot finding on #4).
+    where = {"wing": args.wing} if args.wing else None
+    if where is not None:
+        try:
+            scope = col.get(where=where, include=[])
+            scope_ids = scope.get("ids") if isinstance(scope, dict) else getattr(scope, "ids", [])
+            total = len(scope_ids or [])
+        except Exception:
+            total = col.count()
+    else:
+        total = col.count()
+    wing_sources: dict = defaultdict(lambda: defaultdict(int))
+    batch_size = 5000
+    offset = 0
+    while offset < total:
+        kwargs = {"limit": batch_size, "offset": offset, "include": ["metadatas"]}
+        if where is not None:
+            kwargs["where"] = where
+        r = col.get(**kwargs)
+        batch = r.get("metadatas") if isinstance(r, dict) else getattr(r, "metadatas", [])
+        if not batch:
+            break
+        for m in batch:
+            m = m or {}
+            src = m.get("source_file")
+            if not src:
+                continue
+            wing = m.get("wing", "?")
+            wing_sources[wing][src] += 1
+        offset += len(batch)
+
+    if not wing_sources:
+        scope = f" in wing={args.wing}" if args.wing else ""
+        print(f"\n No mined source files found{scope}.\n")
+        return
+
+    print(f"\n{'=' * 55}")
+    print(" MemPalace Mined — sources by wing")
+    print(f"{'=' * 55}\n")
+    for wing in sorted(wing_sources):
+        sources = sorted(wing_sources[wing].items(), key=lambda x: x[1], reverse=True)
+        print(f" WING: {wing} ({len(sources)} sources, {sum(c for _, c in sources)} drawers)")
+        shown = sources if args.limit == 0 else sources[: args.limit]
+        for src, count in shown:
+            print(f" {count:5} {src}")
+        if args.limit and len(sources) > args.limit:
+            print(f" ... {len(sources) - args.limit} more (use --limit 0 to show all)")
+        print()
+    print(f"{'=' * 55}\n")
+
+
 def cmd_repair_status(args):
     """Read-only HNSW capacity health check (#1222)."""
     from .repair import status as repair_status
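
The paginated wing × source_file aggregation in `cmd_mined` above can be tried without a palace. The sketch below is a simplification under stated assumptions: `FakeCollection` is a hypothetical in-memory stand-in, not the Chroma collection API, and `group_sources` is an illustrative name. Unlike the diff, it loops until an empty batch instead of pre-computing `total`.

```python
from collections import defaultdict


class FakeCollection:
    """Hypothetical in-memory stand-in for the drawers collection."""

    def __init__(self, metadatas):
        self._metas = list(metadatas)

    def get(self, limit, offset, include=None, where=None):
        rows = self._metas
        if where:
            # Exact-match filter on every key in the where-clause.
            rows = [m for m in rows
                    if all((m or {}).get(k) == v for k, v in where.items())]
        return {"metadatas": rows[offset:offset + limit]}


def group_sources(col, wing=None, batch_size=2):
    """Count drawers per (wing, source_file), fetching one page at a time."""
    where = {"wing": wing} if wing else None
    counts = defaultdict(lambda: defaultdict(int))
    offset = 0
    while True:
        kwargs = {"limit": batch_size, "offset": offset, "include": ["metadatas"]}
        if where is not None:
            kwargs["where"] = where  # filter in the query, not in Python
        batch = col.get(**kwargs)["metadatas"]
        if not batch:
            break
        for meta in batch:
            meta = meta or {}
            src = meta.get("source_file")
            if not src:
                continue  # e.g. diary entries without a source file
            counts[meta.get("wing", "?")][src] += 1
        offset += len(batch)
    return {w: dict(s) for w, s in counts.items()}


drawers = [
    {"wing": "app", "source_file": "a.md"},
    {"wing": "app", "source_file": "a.md"},
    {"wing": "app", "source_file": "b.md"},
    {"wing": "notes"},  # no source_file: skipped
    {"wing": "notes", "source_file": "c.md"},
]
all_wings = group_sources(FakeCollection(drawers))
notes_only = group_sources(FakeCollection(drawers), wing="notes")
```

The small `batch_size` forces several pages so the offset bookkeeping is exercised; the real command uses 5000 per page.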
@@ -1398,14 +1491,43 @@ def main():
 
     p_purge = sub.add_parser(
         "purge",
-        help="Delete drawers by wing and/or room (filtered delete via chromadb)",
+        help="Delete drawers by wing, room, and/or source-file (filtered delete via chromadb)",
     )
     p_purge.add_argument("--wing", help="Wing to purge")
     p_purge.add_argument("--room", help="Room to purge (without --wing, purges across ALL wings)")
+    p_purge.add_argument(
+        "--source-file",
+        help="Source-file path to purge (matches metadata.source_file exactly)",
+    )
     p_purge.add_argument("--yes", "-y", action="store_true", help="Skip confirmation prompt")
 
     sub.add_parser("status", help="Show what's been filed")
 
+    p_mined = sub.add_parser(
+        "mined",
+        help="List mined source files grouped by wing (companion to status, which groups by room)",
+    )
+    p_mined.add_argument("--wing", help="Show only this wing")
+
+    def _nonneg_int(value: str) -> int:
+        # Reject negative --limit values; argparse's bare type=int would
+        # silently accept e.g. -1 and produce nonsensical "... -2 more"
+        # output (Copilot finding on jphein/mempalace#4).
+        try:
+            n = int(value)
+        except (TypeError, ValueError):
+            raise argparse.ArgumentTypeError(f"expected non-negative integer, got {value!r}")
+        if n < 0:
+            raise argparse.ArgumentTypeError(f"--limit must be >= 0 (got {n})")
+        return n
+
+    p_mined.add_argument(
+        "--limit",
+        type=_nonneg_int,
+        default=50,
+        help="Show at most this many sources per wing (default 50; 0 means show all)",
+    )
+
     args = parser.parse_args()
 
     if not args.command:
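
The parse-time validation added above can be exercised with a standalone parser. This is a minimal sketch: `nonneg_int` and `parse_limit` are illustrative names, and the `demo` parser is a hypothetical stand-in for the real `mempalace mined` subparser. It relies only on standard `argparse` behaviour, where an `ArgumentTypeError` raised by a `type=` callable becomes a usage error with exit status 2.

```python
import argparse


def nonneg_int(value: str) -> int:
    """argparse type= callable that rejects negative integers."""
    try:
        n = int(value)
    except (TypeError, ValueError):
        raise argparse.ArgumentTypeError(f"expected non-negative integer, got {value!r}")
    if n < 0:
        raise argparse.ArgumentTypeError(f"must be >= 0 (got {n})")
    return n


def parse_limit(argv):
    """Parse --limit from argv using a throwaway demo parser."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--limit", type=nonneg_int, default=50)
    return parser.parse_args(argv).limit


default_limit = parse_limit([])               # falls back to the default
show_all = parse_limit(["--limit", "0"])      # 0 is allowed and means "show all"

try:
    parse_limit(["--limit=-1"])               # rejected during parsing
    rejected_code = None
except SystemExit as exc:
    rejected_code = exc.code
```

Validating in the `type=` callable keeps the check next to the flag definition, so the command body never sees a negative limit.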
@@ -1444,6 +1566,7 @@ def main():
         "migrate": cmd_migrate,
         "purge": cmd_purge,
         "status": cmd_status,
+        "mined": cmd_mined,
     }
     dispatch[args.command](args)
 