docs(spec): postgres + pgvector + apache age substrate migration#19
Merged
Conversation
Full substrate-swap design for moving mempalace from ChromaDB + sqlite-KG to a unified Postgres deployment with pgvector for embeddings and Apache AGE for the knowledge graph. Composes on top of upstream's existing backend-seam work rather than reinventing it. Compositions with upstream: - MemPalace#413 (BaseCollection seam, merged) — foundation - MemPalace#665 (pgvector / pg_sorted_heap backend, OPEN) — storage half, primary anchor; fork-port if it stalls - MemPalace#574 (LanceDB substrate-swap + `mempalace migrate` ergonomics, OPEN) — template for the migration CLI shape + dimension-mismatch guard Genuinely fork-side: - `mempalace.knowledge_graph_age` — AGE-backed KG (Cypher MERGE for entities, CREATE for edges; properties carry today's sqlite-KG columns: relation_type, source, valid_from, valid_to, confidence) - `mempalace migrate-to-postgres` CLI — 7-phase, restartable, idempotent migration tool; checkpoint per phase in backend_meta; full ChromaDB palace + sqlite KG → unified Postgres in ~7-15 min for the canonical 160K-drawer palace Operational shape: - Offline batch + cutover (decided against dual-write streaming): brief daemon downtime, rollback is "restart daemon on the backup palace" - Backend selection via MEMPALACE_BACKEND=postgres + MEMPALACE_KG_BACKEND=age env vars; readable from config.json too - Composes with palace-daemon's existing /search /context /graph endpoints — substrate swap happens below the BaseCollection ABC, so the daemon's HTTP surface is unchanged - Rows 33/34 daemon-strict routing is backend-agnostic — JSON-RPC forwarding to /mcp doesn't open a local backend in strict mode Non-goals (deferred to v2): - Embedder pluggability (v1 stays on ONNX MiniLM-L6-v2 / 384d fixed) - Backup endpoint rewrite (palace-daemon territory, not mempalace) - Concurrent online migration with dual-write Next: writing-plans skill to decompose the implementable surfaces (pg backend integration, AGE KG, migration tool) into ordered tasks with file paths and test coverage targets. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR adds a fork-side design spec describing how MemPalace would migrate from the current ChromaDB + SQLite knowledge graph substrate to a unified Postgres deployment using pgvector (embeddings) and Apache AGE (knowledge graph), including schema, migration phases, cutover, rollback, and testing strategy.
Changes:
- Introduces an end-to-end architecture for a Postgres-based backend + AGE-backed knowledge graph.
- Proposes a Postgres schema (drawers/closets/meta) and indexing strategy.
- Specifies a restartable, checkpointed
mempalace migrate-to-postgresCLI with operational cutover/rollback steps and a test plan.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+102
to
+110
| embedding vector(384), -- nullable: closets may not have one | ||
| source_file text, | ||
| mtime double precision, | ||
| normalize_version integer, | ||
| importance integer DEFAULT 0 | ||
| ); | ||
|
|
||
| CREATE INDEX drawers_embedding_hnsw_idx ON mempalace.drawers | ||
| USING hnsw (embedding vector_cosine_ops) |
Comment on lines
+132
to
+135
| topic text, | ||
| metadata jsonb NOT NULL DEFAULT '{}' | ||
| ); | ||
|
|
Comment on lines
+201
to
+205
| Each phase wrapped in a transaction. Phase checkpoint written to `backend_meta` on success. Re-run with `--resume` (or just rerunning the same command — `--resume` is the default behaviour when checkpoint rows exist) skips completed phases. | ||
|
|
||
| | # | Phase | Action | Idempotent? | Checkpoint key | | ||
| |---|-------|--------|-------------|----------------| | ||
| | 0 | **Preflight** | Connect to both source and target. Verify `pgvector` + `age` extensions are installed. Check that ChromaDB palace embedding-model/dim matches what we'll write into `backend_meta`. Check `PALACE_DAEMON_URL` is not reachable (refuses to proceed if daemon is responsive — would silently lose writes). | Yes (read-only checks) | — | |
Comment on lines
+207
to
+208
| | 2 | **Drawers** | Stream `chromadb.col.get(include=["embeddings","documents","metadatas"])` in batches of `--batch-size`. Build `unnest(...)` arrays. One `INSERT ... SELECT FROM unnest(...) ON CONFLICT (id) DO NOTHING` per batch. Track running count in `backend_meta.migration_drawer_count` for visibility. | Yes (`ON CONFLICT DO NOTHING`) | `migration_phase_drawers` | | ||
| | 3 | **Closets** | Same shape as drawers, smaller cardinality. Usually one or two batches. | Yes | `migration_phase_closets` | |
Comment on lines
+171
to
+183
| **Edge label** `RELATION` (carries all current sqlite-KG columns as edge properties): | ||
| ```cypher | ||
| (s:Entity)-[r:RELATION { | ||
| relation_type: 'works_on', | ||
| source: 'drawer_abc123', | ||
| valid_from: '2026-04-25', | ||
| valid_to: null, | ||
| confidence: 0.9 | ||
| }]->(o:Entity) | ||
| ``` | ||
|
|
||
| Why one edge label instead of one per relation type? Mempalace's KG today stores `relation_type` as a string column on a single `kg_triples` table — already homogeneous. Keeping AGE's edge label uniform preserves that simplicity; queries can filter on `r.relation_type` exactly as the current sqlite KG queries filter on `relation_type =`. A future redesign could shard relation types into separate edge labels for performance, but that's not needed today. | ||
|
|
Comment on lines
+216
to
+218
| - Each phase transactional. Crash mid-phase → re-run picks up where it left off based on the `migration_*_count` watermark or `IF NOT EXISTS` semantics. | ||
| - Phase 2/3 use `ON CONFLICT DO NOTHING` — re-running over already-copied rows is a no-op, not an error. | ||
| - Phase 5's edge creation is the most fragile (no natural primary key on edges in AGE). The watermark count + a "skip first N triples" resume strategy avoids duplicates. |
3 tasks
This was referenced May 24, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Full substrate-swap design for moving mempalace from ChromaDB + sqlite-KG to a unified Postgres deployment with pgvector for embeddings and Apache AGE for the knowledge graph. One DB, two extensions, one connection pool, one backup story, one ACID boundary.
What's in the spec
mempalace.backends.postgres+mempalace.knowledge_graph_age→ Postgres 15+ withpgvector+agedrawers(vector(384) embeddings, HNSW + GIN + partial BTrees backing fork rows 1 / 7),closets,backend_meta(version + dimension-mismatch guard), AGE graphmempalace_kgmempalace migrate-to-postgresCLI, 7 phases, restartable, idempotent, checkpointed viabackend_meta. Targets ~7-15 min for the canonical 160K-drawer palace..chromadb-backup-<date>directory preserved indefinitely; "restart daemon on backup" is a deterministic recovery pathUpstream composition
BaseCollectionABCpgvectorbackendmempalace migrateergonomics + dimension-mismatch guardComments will go up on MemPalace#665 and MemPalace#574 after this spec merges to main, pointing at the stable URL.
Test plan
Out of scope / deferred
🤖 Generated with Claude Code