refactor(sources): RFC 002 §9 scaffolding - BaseSourceAdapter, registry, PalaceContext (#1014)
…ry, PalaceContext

Lands the read-side contract so third-party adapter authors (@Perseusxrltd, @JakobSachs, @adv3nt3, @zendesk-thittesdorf, @mfhens, @roip, @MrDys) have a stable target matching what RFC 001 §10 landed on the write side in #995.

Scope (this PR):
- mempalace/sources/base.py: BaseSourceAdapter ABC with kwargs-only ingest() / describe_schema() and default is_current() / source_summary() / close() (§1.1–1.2). Typed records: SourceRef, SourceItemMetadata, DrawerRecord, RouteHint, SourceSummary, AdapterSchema, FieldSpec (§1.3, §5.2). Error classes: SourceNotFoundError, AuthRequiredError, AdapterClosedError, TransformationViolationError, SchemaConformanceError (§2.7). Class-level identity contract: name / adapter_version / capabilities / supported_modes / declared_transformations / default_privacy_class (§2.1, §1.4, §1.5, §6).
- mempalace/sources/transforms.py: reference implementations of the 13 reserved transformations (§1.4). Seven are pure functions (utf8_replace_invalid, newline_normalize, whitespace_trim, whitespace_collapse_internal, line_trim, line_join_spaces, blank_line_drop); the six adapter-specific ones (strip_tool_chrome, tool_result_truncate, tool_result_omitted, spellcheck_user, synthesized_marker, speaker_role_assignment) are identity shims that the conversations adapter will override when migrated. get_transformation(name) resolves by reserved name.
- mempalace/sources/registry.py: entry-point discovery via importlib.metadata.entry_points(group="mempalace.sources") + explicit register()/unregister() surface (§3.1–3.2). resolve_adapter_for_source() implements the §3.3 priority order; crucially, there is no auto-detection on the read side (§3.3 is explicit that user intent is never inferred from on-disk artifacts).
- mempalace/sources/context.py: PalaceContext facade (§9) bundling the drawer/closet collections, knowledge graph, palace path, adapter identity, and progress hooks that core passes into adapter.ingest(). upsert_drawer() applies the spec-mandated adapter_name/adapter_version stamps from §5.1. skip_current_item() signals laziness; emit() dispatches to hooks and swallows hook exceptions.
- mempalace/knowledge_graph.py: add_triple() gains optional source_drawer_id and adapter_name kwargs (§5.5). A backwards-compatible column migration auto-adds the new columns on open of a pre-RFC 002 palace (PRAGMA table_info, then ALTER TABLE ADD COLUMN), matching the pattern used for any new palace-side provenance fields.
- pyproject.toml: mempalace.sources entry-point group declared. Empty on the first-party side for now (miners migrate in a follow-up); the group being present means third-party packages can begin registering today.

Out of scope (explicit follow-ups):
- miner.py → mempalace/sources/filesystem.py. Behavior-preserving rename that also moves READABLE_EXTENSIONS, detect_room(), detect_hall() into the adapter (§9). Larger refactor; lands separately.
- convo_miner.py + normalize.py → mempalace/sources/conversations.py. The format-detection if-chain in normalize.py becomes per-format plugins; declared_transformations enumerates what the current pipeline already does to source bytes (§1.4 existing-code mapping).
- Closet post-step wired into the conversations adapter (§1.7).
- CLI --source flag + --mode deprecation alias (§3.3).
- MCP mempalace_mine tool source parameter.
- AbstractSourceAdapterContractSuite (§7.1–7.3): byte-preservation round-trip and declared-transformation round-trip tests.
- Privacy-class floor enforcement (§6.2); depends on #389 for secrets_possible scanning.

Tests: 1018 passed (up from ~990 on develop), +27 targeted tests covering the ABC instantiation rules, typed records, all reserved transformations, the registry register/get/unregister surface, PalaceContext upsert + skip + emit semantics, and both the new KG provenance kwargs and the backwards-compatible legacy-schema migration.

Refs: #989 (RFC 002 tracking), #990 (RFC 002 spec), #995 (RFC 001 §10 cleanup — sibling PR on the write side).
Pull request overview
Introduces the RFC 002 “read-side” source adapter scaffolding so third-party adapters can plug into MemPalace via a stable mempalace.sources contract (mirroring the RFC 001 backend seam).
Changes:
- Added `mempalace.sources` public API: `BaseSourceAdapter` ABC, typed ingest records, adapter registry, and `PalaceContext` facade.
- Implemented reserved transformation reference functions + a resolver in `mempalace/sources/transforms.py`.
- Extended `KnowledgeGraph.add_triple()` with optional provenance fields and added an on-open SQLite schema migration for legacy palaces.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| mempalace/sources/base.py | Defines the adapter contract (ABC), typed records, and source-adapter error types. |
| mempalace/sources/context.py | Adds the PalaceContext facade and drawer upsert helper used by adapters. |
| mempalace/sources/registry.py | Adds registry + entry-point discovery for mempalace.sources adapters. |
| mempalace/sources/transforms.py | Adds reference implementations for reserved transformations + lookup helper. |
| mempalace/sources/__init__.py | Exposes the new sources subsystem public surface. |
| mempalace/knowledge_graph.py | Adds provenance kwargs to add_triple() and schema auto-migration for new columns. |
| pyproject.toml | Declares the mempalace.sources entry-point group. |
| tests/test_sources.py | Adds targeted tests for the new source adapter scaffolding and KG migration. |
    RESERVED_TRANSFORMATIONS: dict[str, Callable[..., str]] = {
        "utf8_replace_invalid": utf8_replace_invalid,
        "newline_normalize": newline_normalize,
        "whitespace_trim": whitespace_trim,
        "whitespace_collapse_internal": whitespace_collapse_internal,
Type annotations for the reserved transformation registry imply every function is callable like (...)->str, but the registry includes both a bytes->str transform (utf8_replace_invalid) and str->str transforms. Consider introducing a dedicated Transformation type/Protocol (or using overloads / Callable[[Any], str]) so static type checkers and adapter authors don’t get misleading signatures.
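
One way to read this suggestion is a callable Protocol that accepts either bytes or str and always returns str. The sketch below is illustrative only: the function bodies are simplified stand-ins rather than the reference implementations in transforms.py, and `SKETCH_REGISTRY` and `lookup` are hypothetical names.

```python
from typing import Dict, Protocol, Union


class Transformation(Protocol):
    """Anything callable as (bytes | str) -> str."""

    def __call__(self, data: Union[bytes, str]) -> str: ...


def utf8_replace_invalid(data: Union[bytes, str]) -> str:
    # Stand-in body: decode bytes, replacing invalid sequences with U+FFFD.
    if isinstance(data, bytes):
        return data.decode("utf-8", errors="replace")
    return data


def whitespace_trim(data: Union[bytes, str]) -> str:
    # Stand-in body: accept bytes for signature uniformity, then strip.
    text = data if isinstance(data, str) else data.decode("utf-8", errors="replace")
    return text.strip()


# One registry type now covers both the bytes->str and the str->str entries.
SKETCH_REGISTRY: Dict[str, Transformation] = {
    "utf8_replace_invalid": utf8_replace_invalid,
    "whitespace_trim": whitespace_trim,
}


def lookup(name: str) -> Transformation:
    """Hypothetical resolver: raises KeyError for unknown names."""
    return SKETCH_REGISTRY[name]


decoded = lookup("utf8_replace_invalid")(b"ok\xff")
trimmed = lookup("whitespace_trim")("  hi \n")
```

The trade-off is that every transform has to tolerate both input types, which is slightly looser than separate bytes and str signatures but gives type checkers one honest signature to verify.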
    """Deterministic drawer id: ``<sha1(source_file)>_<chunk_index>``.

    Matches the shape existing miners rely on (``source_file`` + chunk index
    pair) while keeping the id chroma-safe (no separators that collide with
    existing metadata values). Adapters that need a different id scheme can
    bypass :meth:`PalaceContext.upsert_drawer` and write through
    ``drawer_collection.upsert`` directly.
    """
    import hashlib

    digest = hashlib.sha1(record.source_file.encode("utf-8")).hexdigest()[:16]
Drawer IDs are derived from sha1(source_file) truncated to 16 hex chars (64 bits). That increases the (small but real) risk of collisions and overwriting drawers compared to the existing miners’ sha256-based IDs (typically 24 hex chars). Consider switching to sha256 and a longer prefix (or reusing the existing drawer-id scheme) to keep collision risk negligible.
Suggested change:

    - """Deterministic drawer id: ``<sha1(source_file)>_<chunk_index>``.
    - Matches the shape existing miners rely on (``source_file`` + chunk index
    - pair) while keeping the id chroma-safe (no separators that collide with
    - existing metadata values). Adapters that need a different id scheme can
    - bypass :meth:`PalaceContext.upsert_drawer` and write through
    - ``drawer_collection.upsert`` directly.
    - """
    - import hashlib
    - digest = hashlib.sha1(record.source_file.encode("utf-8")).hexdigest()[:16]
    + """Deterministic drawer id: ``<sha256(source_file)[:24]>_<chunk_index>``.
    + Matches the shape existing miners rely on (``source_file`` + chunk index
    + pair) while keeping the id chroma-safe (no separators that collide with
    + existing metadata values). Using a longer SHA-256 prefix keeps collision
    + risk negligible while preserving the existing id layout. Adapters that
    + need a different id scheme can bypass :meth:`PalaceContext.upsert_drawer`
    + and write through ``drawer_collection.upsert`` directly.
    + """
    + import hashlib
    + digest = hashlib.sha256(record.source_file.encode("utf-8")).hexdigest()[:24]
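
To make the 64-bit versus 96-bit comparison concrete, the sketch below builds an id in the suggested `<sha256(source_file)[:24]>_<chunk_index>` shape and estimates collision probability with the standard birthday approximation p ≈ 1 − exp(−n(n−1)/2^(bits+1)). The function names are hypothetical and the corpus size is an arbitrary illustration, not a measured palace size.

```python
import hashlib
import math


def build_drawer_id(source_file: str, chunk_index: int, hex_chars: int = 24) -> str:
    """Hypothetical helper mirroring the suggested id layout."""
    digest = hashlib.sha256(source_file.encode("utf-8")).hexdigest()[:hex_chars]
    return f"{digest}_{chunk_index}"


def collision_probability(n_items: int, bits: int) -> float:
    """Birthday approximation for n_items uniformly random `bits`-bit values."""
    exponent = -n_items * (n_items - 1) / 2 ** (bits + 1)
    # -expm1(x) equals 1 - exp(x) but stays accurate for tiny probabilities.
    return -math.expm1(exponent)


drawer_id = build_drawer_id("/notes/2024/plan.md", 3)
n_sources = 10**6  # illustrative number of distinct source files
p64 = collision_probability(n_sources, 64)  # 16 hex chars
p96 = collision_probability(n_sources, 96)  # 24 hex chars
print(drawer_id, p64, p96)
```

Because the digest covers only `source_file`, the relevant n is the number of distinct source files, not the number of chunks.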
    def test_source_ref_options_default_is_empty_dict():
        # Frozen dataclass must not share default_factory=list instance across instances.
This test comment mentions default_factory=list, but SourceRef.options uses default_factory=dict. Updating the comment avoids confusion about what’s being validated (non-shared dict state across instances).
Suggested change:

    - # Frozen dataclass must not share default_factory=list instance across instances.
    + # Frozen dataclass must not share a default_factory=dict instance across instances.
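
The property the test guards can be shown with a small stand-in. `SourceRefSketch` and its `uri` field are hypothetical; only the `options` field with `default_factory=dict` is taken from the review comment.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRefSketch:
    uri: str  # hypothetical field, for illustration only
    # default_factory builds a fresh dict for every instance.
    options: dict = field(default_factory=dict)


first = SourceRefSketch("a")
second = SourceRefSketch("b")

# frozen=True blocks attribute rebinding, not mutation of the dict itself,
# so a shared default would leak state between instances.
first.options["wing"] = "custom"
print(second.options)
```

A plain `options: dict = {}` default would be rejected by `dataclasses` at class-definition time, which is why the factory form is used.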
    """)
    self._migrate_schema(conn)
    conn.commit()
On a brand-new DB, _init_db creates triples without the new RFC 002 columns and then immediately runs _migrate_schema to ALTER TABLE them in. Consider adding source_drawer_id and adapter_name directly to the CREATE TABLE IF NOT EXISTS triples definition (keeping _migrate_schema for legacy palaces) so the canonical schema is self-contained and avoids extra ALTERs on fresh installs.
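
The legacy-palace path (PRAGMA table_info, then ALTER TABLE ADD COLUMN) can be sketched as below. This is an illustration against an in-memory database, not the actual `_migrate_schema`: the legacy column list and the TEXT column types are assumptions, and `migrate_triples` is a hypothetical name.

```python
import sqlite3

# Column names from the PR; the TEXT types are an assumption.
NEW_COLUMNS = {"source_drawer_id": "TEXT", "adapter_name": "TEXT"}


def migrate_triples(conn: sqlite3.Connection) -> None:
    """Add any missing provenance columns; safe to call repeatedly."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(triples)")}
    for name, sql_type in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE triples ADD COLUMN {name} {sql_type}")
    conn.commit()


conn = sqlite3.connect(":memory:")
# Hypothetical pre-RFC 002 table shape.
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.execute("INSERT INTO triples VALUES ('a', 'knows', 'b')")

migrate_triples(conn)
migrate_triples(conn)  # second call finds both columns and does nothing

columns = [row[1] for row in conn.execute("PRAGMA table_info(triples)")]
legacy_row = conn.execute("SELECT * FROM triples").fetchone()
```

Existing rows get NULL in the added columns, so legacy triples simply carry no provenance until they are rewritten.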
    # now, we provide identity shims that raise if invoked without adapter-supplied
    # context. Adapters that declare these MUST either override with a concrete
    # implementation or provide a namespaced reference under
    # ``mempalace.sources.transforms.<adapter_name>_<transform_name>`` (per the
    # module docstring). The conformance suite looks up the adapter-specific
    # implementation first, falling back to these only when none exists.
The comment above these adapter-specific transformations says the identity shims "raise if invoked" without adapter context, but the implementations currently just return the input unchanged. Please align the comment with the actual behavior (identity shims), or make the functions raise so the conformance suite can’t silently accept missing adapter-specific references.
Suggested change:

    - # now, we provide identity shims that raise if invoked without adapter-supplied
    - # context. Adapters that declare these MUST either override with a concrete
    - # implementation or provide a namespaced reference under
    - # ``mempalace.sources.transforms.<adapter_name>_<transform_name>`` (per the
    - # module docstring). The conformance suite looks up the adapter-specific
    - # implementation first, falling back to these only when none exists.
    + # now, we provide identity shims that leave the input unchanged when no
    + # adapter-specific implementation is available. Adapters that declare these
    + # MUST either override with a concrete implementation or provide a namespaced
    + # reference under
    + # ``mempalace.sources.transforms.<adapter_name>_<transform_name>`` (per the
    + # module docstring). The conformance suite looks up the adapter-specific
    + # implementation first, falling back to these identity shims only when none
    + # exists.
Five findings from the automated review, fixed with targeted tests where behavior changed:

1. Transformation Protocol (transforms.py). The registry mixed a bytes-to-str transform (utf8_replace_invalid) with str-to-str transforms under a single Callable[..., str] type, misleading static type checkers and adapter authors. Introduced a Transformation Protocol with __call__(data: bytes|str) -> str and retyped the registry + get_transformation return.
2. Drawer-id collision risk (context.py). Switched _build_drawer_id from sha1[:16]=64 bits to sha256[:24]=96 bits. 64 bits sits uncomfortably close to the birthday bound for palace-sized corpora; 96 bits keeps the collision probability negligible while preserving the existing <prefix>_<chunk> layout adapters rely on.
3. Fresh-schema KG columns (knowledge_graph.py). source_drawer_id and adapter_name now live in the canonical CREATE TABLE so new palaces don't take an ALTER round-trip on first open. _migrate_schema stays for legacy palaces (SQLite has no ADD COLUMN IF NOT EXISTS, so PRAGMA introspection is still needed there).
4. Identity-shim comment (transforms.py). Comment said the adapter-specific transforms "raise if invoked without adapter context" but they return the input unchanged. Updated the comment to match the actual identity-shim behavior, as Copilot suggested.
5. Test docstring (test_sources.py). Comment mentioned default_factory=list but SourceRef.options uses default_factory=dict. Corrected.

Tests: 1020 passed (up from 1018), +2 new tests for the sha256 id shape and the fresh-schema column presence on new palaces.
Thanks @copilot, all five review items addressed in 89904ed. Recap:
Tests: 1020 passed (up from 1018), +2 targeted tests. Lint + format clean.
…uard

Merges MemPalace#990 (RFC 002 spec), MemPalace#1014 (BaseSourceAdapter/PalaceContext scaffolding), MemPalace#1013 (Layer3.search_raw None guard), MemPalace#1012 (docs), MemPalace#1010 (chromadb >=1.5.4), and MemPalace#998 (sweeper/tandem transcript safety net).

Fork changes preserved:
- quarantine_stale_hnsw() in chroma.py (guards HNSW/sqlite drift segfault)
- get-then-create instead of get_or_create (guards ChromaDB 1.5.x metadata segfault)
- paginated status() loop (guards SQLite variable limit on large palaces)
- searcher hits-loop, BM25 fallback, _count_in_scope
- .jsonl exempt from JUNK_FILE_SIZE cap (Claude Code transcripts can be large)
- _validate_where() + operator constants taken from upstream

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Version bumps across pyproject.toml, mempalace/version.py, README badge, uv.lock, and plugin manifests (.claude-plugin/*, .codex-plugin/*). CHANGELOG aligned with main (post-3.3.1) and a new [3.3.2] section added covering the 11 PRs merged on develop since v3.3.1 — silent-transcript-drop fix + tandem sweeper (#998), None-metadata guards (#999, #1013), chromadb ≥1.5.4 for Py 3.13/3.14 (#1010), Windows Unicode (#681), HNSW quarantine recovery (#1000), PID stacking guard (#1023), doc-path cleanup (#996, #1012), and RFC 001/002 internal scaffolding (#995, #1014, #990).
…#1484

Four issues raised in the automated review (2026-05-13T01:40Z):

1. **opencode_session_version missing from metadata** (high)
   `is_current()` at opencode.py:391 compares `existing_metadata.get("opencode_session_version")` against the new `SourceItemMetadata.version`. Without the metadata key being written on first ingest, the comparison always falls back to "exists → current" and incremental ingest can never detect updates to existing sessions. Now populated as `str(time_updated or time_created or 0)`, the same value as the version yielded in SourceItemMetadata above.
2. **PalaceContext._skip_requested encapsulation violation** (medium)
   The adapter was reading and writing the private flag directly. Added `PalaceContext.is_skip_requested()` public method (read-only) so adapters can short-circuit expensive work (SQL query, transcript build, chunking) when core has signaled skip. Core still owns the reset; adapters MUST NOT clear it, per the new docstring. This is a small companion change to the upstream RFC 002 scaffolding (MemPalace#1014); justified because the spec's "core checks between yields" pattern doesn't hold for Python generators (the adapter's code runs between yields, not core's). The check needs to be available to the adapter.
3. **filed_at generated inside chunk loop** (medium)
   For consistency across chunks of the same session, `filed_at` is now computed once per session and reused for every chunk's metadata. Also pre-computes `session_version` for the same reason.
4. **PEP 8 import placement** (medium)
   `import json as _json` was mid-file in transforms.py; hoisted to the top with the other imports. Also removed an unused `import json` from opencode.py that ruff caught.

Tests: 57 pass (28 opencode + 29 base sources); ruff clean on all three modified files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
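
The generator point in item 2 can be illustrated with a toy model. Everything here is a simplified stand-in: `ContextSketch`, `reset_skip`, `ingest`, and `core_loop` are hypothetical, and only the `skip_current_item()` / `is_skip_requested()` split with a core-owned reset follows the commit message.

```python
class ContextSketch:
    """Toy stand-in for PalaceContext's skip flag."""

    def __init__(self):
        self._skip_requested = False

    def skip_current_item(self):  # called by core
        self._skip_requested = True

    def is_skip_requested(self):  # read-only view for adapters
        return self._skip_requested

    def reset_skip(self):  # hypothetical name; core-owned reset
        self._skip_requested = False


def ingest(ctx, sessions):
    """Adapter side: yield metadata first, then drawers unless core said skip."""
    for session_id, chunks in sessions.items():
        yield ("meta", session_id)
        # Execution resumes HERE, in adapter code, once core asks for the next
        # value. Core cannot run a check at this point, so the adapter has to.
        if ctx.is_skip_requested():
            continue  # skip the expensive chunking for this session
        for chunk in chunks:
            yield ("drawer", session_id, chunk)


def core_loop(ctx, generator, already_current):
    """Core side: reacts to metadata and collects drawers."""
    drawers = []
    for item in generator:
        if item[0] == "meta":
            ctx.reset_skip()
            if item[1] in already_current:
                ctx.skip_current_item()
        else:
            drawers.append(item)
    return drawers


ctx = ContextSketch()
sessions = {"s1": ["a", "b"], "s2": ["c"]}
result = core_loop(ctx, ingest(ctx, sessions), already_current={"s1"})
```

In this model the adapter never clears the flag; it only reads it, which is the division of responsibility the new docstring describes.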
…daemon-routed integration recipe (#106)

* feat(sources): OpenCode adapter on RFC 002 contract

  Adds mempalace/sources/opencode.py: an OpenCodeSourceAdapter subclass of BaseSourceAdapter that ingests OpenCode AI-coding-CLI session transcripts from OpenCode's local SQLite store (~/.local/share/opencode/opencode.db) into the palace as DrawerRecords formatted to match convo_miner's exchange-pair shape.

  The adapter:
  * Yields SourceItemMetadata then DrawerRecords per session.
  * Each session becomes one source_file shaped as opencode://<absolute-db-path>#session=<sid>; chunks are chunked_content exchange-pair drawers.
  * Declares 8 transformations (6 opencode-namespaced + 2 reserved); every name resolves to a reference implementation on mempalace.sources.transforms per RFC 002 §7.3.
  * Implements is_current honoring opencode_session_version when present, falling back to "metadata exists → assume current" for append-only safety on older drawers.
  * Routes wing from session.directory basename (or explicit options['wing'] override); room from detect_convo_room on the rendered transcript; hall from convo_miner._detect_hall_cached.
  * Stamps universal §5.1 metadata (wing, room, hall, filed_at, added_by, ingest_mode, extract_mode, privacy_class) plus the declared per-adapter schema (session_id, session_title, project_dir, session_created_at, message_count, opencode_db_path).
  * default_privacy_class = "pii_potential": AI sessions leak everything; users opt in explicitly to laxer floors.

  mempalace/sources/transforms.py: adds 6 opencode-namespaced transformations (extract_text_parts, skip_tool_echo, skip_file_injection, role_coerce, same_role_merge, format_exchange). Each operates on the role-tab-prefixed line stream the adapter's canonical_source_bytes produces; declared in declaration order so the conformance round-trip test reproduces drawer content exactly.

  pyproject.toml: registers the adapter under the [project.entry-points."mempalace.sources"] group as opencode = "mempalace.sources.opencode:OpenCodeSourceAdapter".

  tests/test_sources_opencode.py: 28 tests covering
  * class identity, capabilities, schema shape
  * SourceNotFoundError on missing DB / missing tables
  * AdapterClosedError after close()
  * source_summary item count + missing-DB path
  * ingest yields metadata then drawers per session
  * cancelled / single-turn sessions skipped
  * universal + schema metadata fields on every drawer (flat-scalar)
  * RouteHint carries wing + room
  * wing routing groups by session.directory
  * explicit options['wing'] wins over directory derivation
  * skip_current_item short-circuits drawer emit per RFC 002 §1.2
  * is_current with/without opencode_session_version
  * tool-input / tool-output / tool-echo / file-injection parts are stripped from drawer content
  * declared-transformation round-trip reproduces chunk content (RFC 002 §7.3)
  * empty DB, single-message session edge cases
  * Unicode (BMP + non-BMP) preserved through transcript
  * registry resolves the adapter when registered explicitly
  * byte_preserving capability is NOT advertised (declared-lossy)

  tests/fixtures/opencode/sample_session_2026_05_12/: builder script and README documenting the live opencode-ai 1.14.39 schema captured verbatim from JP's local install on 2026-05-12. No recorded .db ships (real-session content is unsanitizable user-private data); build_fixture.py reproduces the schema and populates it with synthetic-but-realistic exchanges the tests consume.

  tests/test_corpus_origin_integration.py: extends the §-section allowlist to include the new test file (existing allowlist already covers mempalace/sources/).

  Reverse-engineering credit: the OpenCode SQLite schema, json_extract paths, tool-echo / file-injection skip filters, and same-role merge originated in @JakobSachs's PR #23 (feat: add OpenCode SQLite session database support, base=develop). This adapter rebuilds those primitives on the RFC 002 contract so OpenCode support can ship as a registered adapter rather than as a normalize.py branch; see the #23 coordination thread.

  Test suite: 1876 passed, 7 skipped, 106 deselected (28 new opencode tests, no regressions).

  Co-authored-by: Jakob Sachs <28728963+JakobSachs@users.noreply.github.com>
  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(sources/opencode): address Gemini Code Assist review on MemPalace#1484

  Four issues raised in the automated review (2026-05-13T01:40Z):

  1. **opencode_session_version missing from metadata** (high)
     `is_current()` at opencode.py:391 compares `existing_metadata.get("opencode_session_version")` against the new `SourceItemMetadata.version`. Without the metadata key being written on first ingest, the comparison always falls back to "exists → current" and incremental ingest can never detect updates to existing sessions. Now populated as `str(time_updated or time_created or 0)`, the same value as the version yielded in SourceItemMetadata above.
  2. **PalaceContext._skip_requested encapsulation violation** (medium)
     The adapter was reading and writing the private flag directly. Added `PalaceContext.is_skip_requested()` public method (read-only) so adapters can short-circuit expensive work (SQL query, transcript build, chunking) when core has signaled skip. Core still owns the reset; adapters MUST NOT clear it, per the new docstring. This is a small companion change to the upstream RFC 002 scaffolding (MemPalace#1014); justified because the spec's "core checks between yields" pattern doesn't hold for Python generators (the adapter's code runs between yields, not core's). The check needs to be available to the adapter.
  3. **filed_at generated inside chunk loop** (medium)
     For consistency across chunks of the same session, `filed_at` is now computed once per session and reused for every chunk's metadata. Also pre-computes `session_version` for the same reason.
  4. **PEP 8 import placement** (medium)
     `import json as _json` was mid-file in transforms.py; hoisted to the top with the other imports. Also removed an unused `import json` from opencode.py that ruff caught.

  Tests: 57 pass (28 opencode + 29 base sources); ruff clean on all three modified files.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(sources/opencode): address @igorls review on MemPalace#1484

  Three blockers + one minor cleanup from the maintainer review at 2026-05-13T02:52Z:

  1. **ruff F401: unused `os` import** in tests/test_sources_opencode.py:17
     Dropped. No call sites used it.
  2. **ruff E402: module-level import not at top** in tests
     The `sys.path.insert(0, FIXTURE_DIR); import build_fixture` pattern tripped E402 (the `# noqa: E402` was suppressing a legitimate complaint). Refactored to `importlib.util.spec_from_file_location` + `module_from_spec` per @igorls's suggestion, which keeps the fixture loader at top of file with the other imports, with no sys.path mutation at module scope. Also registers the loaded module in `sys.modules` so `dataclasses` and typing introspection inside the fixture builder can resolve `cls.__module__` correctly.
  3. **Route-hint wing mismatch** (RFC 002 §2.5 violation)
     `_route_hint_for()` (lazy-fetch SourceItemMetadata stage) computed wing from `directory` only; `_wing_for()` (eager DrawerRecord stage) honored `source.options["wing"]` first. When a user passed `options={"wing": "Custom Wing"}`, the metadata hint said `"<dirname>"` while the actual drawers said `"custom_wing"`, so core could make wrong skip/routing decisions on the gap. Fix: `_route_hint_for(source, directory)` now delegates to `_wing_for` so both stages apply identical precedence.
  4. **Unjustified `# noqa: F401` on `AuthRequiredError`** (minor)
     The import claimed re-export "used in docstrings" but `__all__` only exposes `OpenCodeSourceAdapter` + `session_source_file`. Dropped the import + the noqa.

  Tests: 57 pass (28 opencode + 29 base sources); ruff clean.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* style(sources/opencode): ruff format with CI's ruff 0.4.x

  CI's lint job ran on commit 13353d9 and failed `ruff format --check .` even though local `ruff format --check` was clean. Cause: ruff version mismatch. CI installs `>=0.4.0,<0.5` (per ci.yml lint job), local env has ruff 0.15.12. Different major versions format differently; 0.15-formatted source isn't 0.4.x-format-clean.

  Reformatted `mempalace/sources/opencode.py` and `tests/test_sources_opencode.py` with `uvx --from "ruff>=0.4.0,<0.5" ruff format` so CI's check passes. Changes are whitespace-only, with no semantic diff. Tests still pass 28/28. Lint clean under 0.4.x.

  The 29 other files that local ruff 0.15.12 wants to reformat are upstream's own files and pass upstream's CI as-is; left untouched.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* style(tests/fixtures/opencode): ruff format build_fixture.py with 0.4.x

  Missed in the previous format pass (f94e3fe), which only touched the two top-level files. CI's `ruff format --check .` scans the whole tree and caught it. Whitespace-only changes.

* feat: add OpenCode MCP integration for MemPalace

* fix: use python -m mempalace.mcp_server for robustness

* docs(integrations): OpenCode integration recipe + cherry-pick fork-changes entries

  Adds the three-direction OpenCode + MemPalace integration recipe:
  - ``docs/integrations/opencode.md``: full setup guide covering the read (MCP), push (live-capture plugin), and pull (retrospective backfill) paths for daemon-routed deployments.
  - ``examples/opencode/opencode.jsonc.example``: copy-paste user config pointing at the palace-daemon wrapper.
  - ``examples/opencode/option-k-plugin-daemon-routing.patch``: a re-applicable diff for option-K's ``opencode-plugin-mempalace`` v1.2.1 issue #1 (isInitialized passes ``--palace`` which bypasses ``PALACE_DAEMON_URL`` routing).

  Also adds two fork-changes.yaml entries for the cherry-picked upstream PRs already in this branch:
  - ``opencode-mcp-config-cherry-pick-1567`` (commit ba16b82)
  - ``opencode-source-adapter-cherry-pick-1484`` (commit 2ffe652)

  The recipe's own fork-changes.yaml entry is added in the next commit once this commit's SHA is known (avoids the self-referencing-commit anti-pattern flagged in the worktree handoff).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): add opencode-integration-recipe entry pointing at 60dc9e6

  Companion to 60dc9e6 (the OpenCode integration recipe commit). Split out per the worktree handoff to avoid the self-referencing-commit-SHA anti-pattern: the YAML entry now points at the prior docs commit, not at itself.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(opencode-integration): bundled live-capture plugin + split option-K patches

  The previously combined option-K patch (`option-k-plugin-daemon-routing.patch`) mixed two unrelated fixes against two different files and was failing `patch --dry-run` once Fix 1 was applied. Split into:
  - `option-k-plugin-daemon-routing.patch`: Fix 1 only (mempalace-cli.js, isInitialized daemon detection, option-K#1).
  - `option-k-plugin-message-updated.patch`: Fix 2 (index.js, subscribe to `message.updated` instead of the non-existent `chat.message`, filed upstream as option-K#4).

  End-to-end testing with both patches applied surfaced a third bug (option-K#5): the plugin's `mempalace mine <dir>` call hits the daemon, which evaluates `<dir>` against ITS OWN filesystem. For remote-daemon setups (palace-daemon on a different host from OpenCode) the path doesn't exist on the daemon's filesystem and the call returns 400. The option-K plugin is architecturally incompatible with multi-host deployments.

  Ships a self-contained replacement at `examples/opencode/live-capture/`:
  - `mempalace-live-capture.js`: minimal OpenCode plugin that subscribes to session.idle / session.deleted / session.status[idle] and spawns the Python helper. Detached subprocess, debounced per session, logs to ~/.local/share/opencode/mempalace-live-capture.log.
  - `capture-session.py`: Python helper that reads OpenCode's local SQLite session DB, extracts the role-pair transcript via the in-tree `OpenCodeSourceAdapter` helpers, and POSTs to the daemon's `/silent-save` endpoint. Stdlib-only, no extra pip deps.

  Verified end-to-end against the canonical daemon at disks.jphe.in:8085: a fresh opencode session ends with the transcript landing in wing_opencode_<basename>/room=diary, retrievable via mempalace_search.

  `docs/integrations/opencode.md` now documents both deployment paths (bundled plugin for remote-daemon, option-K + patches for local palaces) and explicitly notes that `experimental.chat.system.transform` does not exist in the OpenCode plugin API (so per-turn system-prompt injection is not available; agents recall memories via explicit MCP tool calls).

  Filed:
  - option-K/opencode-plugin-mempalace#4
  - option-K/opencode-plugin-mempalace#5

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): add commit ref for opencode-live-capture-plugin entry

  Closes the YAML→render loop: scripts/check-docs.sh now verifies the commit hash resolves and FORK_CHANGELOG.md matches the manifest.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jakob Sachs <28728963+JakobSachs@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Dxrk System <dxrk@local>
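
The fixture-loading refactor mentioned in the @igorls review fix (`importlib.util.spec_from_file_location` + `module_from_spec`, with the module registered in `sys.modules`) can be sketched as follows. The helper name, the module name `build_fixture_sketch`, and the fixture contents are hypothetical; the real tests load `build_fixture.py` from the fixture directory.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def load_module_from_path(name: str, path: Path) -> ModuleType:
    """Load a module from an explicit file path without touching sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing: dataclasses may look up
    # sys.modules[cls.__module__] while processing string annotations.
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        sys.modules.pop(name, None)  # do not leave a half-initialised module
        raise
    return module


# Hypothetical fixture source, standing in for build_fixture.py.
FIXTURE_SOURCE = '''
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Exchange:
    user: str
    assistant: str
'''

with tempfile.TemporaryDirectory() as tmp:
    fixture_path = Path(tmp) / "build_fixture_sketch.py"
    fixture_path.write_text(FIXTURE_SOURCE, encoding="utf-8")
    fixture = load_module_from_path("build_fixture_sketch", fixture_path)

exchange = fixture.Exchange(user="hi", assistant="hello")
```

Skipping the `sys.modules` registration can make the dataclass definition fail on some Python versions when annotations are strings, which appears to be the situation the commit message describes.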
Summary
Lands the read-side plugin contract so third-party source adapters can publish `pip install mempalace-source-<name>` packages against a stable target, matching what #995 did for storage backends on the write side.

Sibling to RFC 001 / #743 (write) and RFC 002 / #990 (spec). Tracking issue: #989.
What's in this PR
- `mempalace/sources/base.py`: `BaseSourceAdapter` ABC with kwargs-only `ingest()` / `describe_schema()` and default implementations of `is_current()` / `source_summary()` / `close()` (§1.1–1.2). Typed records: `SourceRef`, `SourceItemMetadata`, `DrawerRecord`, `RouteHint`, `SourceSummary`, `AdapterSchema`, `FieldSpec` (§1.3, §5.2). Error classes (§2.7). Class-level identity contract: `name`, `adapter_version`, `capabilities`, `supported_modes`, `declared_transformations`, `default_privacy_class` (§2.1, §1.4, §1.5, §6).
- `mempalace/sources/transforms.py`: reference implementations of the 13 reserved transformations (§1.4). Seven ship as pure functions: `utf8_replace_invalid`, `newline_normalize`, `whitespace_trim`, `whitespace_collapse_internal`, `line_trim`, `line_join_spaces`, `blank_line_drop`. The six adapter-specific ones (`strip_tool_chrome`, `tool_result_truncate`, `tool_result_omitted`, `spellcheck_user`, `synthesized_marker`, `speaker_role_assignment`) ship as identity shims the conversations adapter will override when migrated. `get_transformation(name)` resolves reserved names.
- `mempalace/sources/registry.py`: entry-point discovery via `importlib.metadata.entry_points(group="mempalace.sources")` + `register()` / `unregister()` (§3.1–3.2). `resolve_adapter_for_source()` implements the §3.3 priority order. Crucially: no auto-detection on the read side (§3.3 is explicit that user intent is never inferred from on-disk artifacts).
- `mempalace/sources/context.py`: `PalaceContext` facade (§9) bundling drawer/closet collections, knowledge graph, palace path, adapter identity, and progress hooks. `upsert_drawer()` applies the spec-mandated `adapter_name` / `adapter_version` stamps from §5.1 so adapters don't need to populate them. `skip_current_item()` sets the skip flag that signals laziness. `emit()` dispatches to hooks and swallows hook exceptions.
- `mempalace/knowledge_graph.py`: `add_triple()` gains optional `source_drawer_id` and `adapter_name` kwargs (§5.5). A backwards-compatible schema migration auto-adds the new columns when a pre-RFC 002 palace is opened (`PRAGMA table_info` → `ALTER TABLE ADD COLUMN`), so existing palaces upgrade transparently.
- `pyproject.toml`: `mempalace.sources` entry-point group declared (empty on the first-party side for now). Third-party packages can begin registering today; declaring the group is the enabling bit.

Explicitly out of scope (follow-up PRs)
- `miner.py` → `mempalace/sources/filesystem.py`. Behavior-preserving rename, with `READABLE_EXTENSIONS`, `detect_room()`, `detect_hall()` moving into the adapter.
- `convo_miner.py` + `normalize.py` → `mempalace/sources/conversations.py`. Format-detection `if`-chain becomes per-format plugins; `declared_transformations` enumerates what the current pipeline already does to source bytes (§1.4 existing-code mapping).
- `--source` flag + `--mode` deprecation alias (§3.3).
- `mempalace_mine` tool `source` parameter.
- `AbstractSourceAdapterContractSuite` (§7.1–7.3): byte-preservation + declared-transformation round-trip tests.
- `secrets_possible` scanning.

Test plan
- `uv run python -m pytest tests/ --ignore=tests/benchmarks`: 1018 passed (+27 targeted tests for this PR).
- `uv run ruff check .`: clean.
- `uv run ruff format --check .`: clean.
- `tests/test_sources.py`:
  - … `TypeError`)
  - … `field(default_factory=dict)` doesn't share state)
  - `get_transformation` resolves reserved names, rejects unknown
  - `register` / `get_adapter` / `get_adapter_class` / `unregister`, caching semantics, unknown-name `KeyError`
  - `resolve_adapter_for_source` priority order; default = `filesystem`
  - `PalaceContext.upsert_drawer` stamps `adapter_name` / `adapter_version` / `source_file` / `chunk_index`
  - `PalaceContext.skip_current_item` sets flag; `emit` dispatches and swallows hook errors
  - `KnowledgeGraph.add_triple` accepts new kwargs; writes to new columns
  - `add_triple` callers unchanged

Coordination
cc @Perseusxrltd @JakobSachs @adv3nt3 @zendesk-thittesdorf @mfhens @roip @MrDys: this is the §9 spec surface called out in #989. If you're working on Cursor/OpenCode/Pi/git/factory source adapters, this is the ABC to target. Once this merges, the ~next PR migrates `miner.py` / `convo_miner.py` onto the same contract (so we have two first-party reference adapters) and then the in-flight source-ingester PRs can align.

Refs: #989 (RFC 002 tracking), #990 (RFC 002 spec), #995 (RFC 001 §10 write-side sibling).
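As a companion to the knowledge-graph migration described above (inspect with `PRAGMA table_info`, then `ALTER TABLE ADD COLUMN` for anything missing), the pattern can be sketched with the standard `sqlite3` module. This is a minimal illustration under assumptions, not the code in `mempalace/knowledge_graph.py`: the helper name `ensure_columns`, the `triples` table, and its existing columns are hypothetical, and the new column names are assumed to match the new `add_triple()` kwargs.

```python
import sqlite3


def ensure_columns(conn, table, columns):
    """Add any columns from `columns` missing on `table`; return the names added.

    `columns` maps column name -> SQL type declaration. Table and column names
    are interpolated into SQL, so they must come from trusted code, never from
    user input.
    """
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, decl in columns.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {decl}")
            added.append(name)
    conn.commit()
    return added


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical pre-migration schema.
    conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
    new_columns = {"source_drawer_id": "TEXT", "adapter_name": "TEXT"}
    print(ensure_columns(conn, "triples", new_columns))  # first open: adds both
    print(ensure_columns(conn, "triples", new_columns))  # later opens: nothing to do
```

Because the check runs on every open and only adds what is absent, it is idempotent, and rows written before the migration simply read back `NULL` in the new columns.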