refactor(sources): RFC 002 §9 scaffolding — BaseSourceAdapter, registry, PalaceContext #1014

Merged
igorls merged 2 commits into develop from refactor/rfc-002-sources-scaffolding
Apr 18, 2026

igorls commented Apr 18, 2026

Member

Summary

Lands the read-side plugin contract so third-party source adapters can publish pip install mempalace-source-<name> packages against a stable target, matching what #995 did for storage backends on the write side.

Sibling to RFC 001 / #743 (write) and RFC 002 / #990 (spec). Tracking issue: #989.

What's in this PR

  • mempalace/sources/base.py: BaseSourceAdapter ABC with kwargs-only ingest() / describe_schema() and default implementations of is_current() / source_summary() / close() (§1.1–1.2). Typed records: SourceRef, SourceItemMetadata, DrawerRecord, RouteHint, SourceSummary, AdapterSchema, FieldSpec (§1.3, §5.2). Error classes (§2.7). Class-level identity contract: name, adapter_version, capabilities, supported_modes, declared_transformations, default_privacy_class (§2.1, §1.4, §1.5, §6).

  • mempalace/sources/transforms.py — reference implementations of the 13 reserved transformations (§1.4): utf8_replace_invalid, newline_normalize, whitespace_trim, whitespace_collapse_internal, line_trim, line_join_spaces, blank_line_drop as pure functions. The six adapter-specific ones (strip_tool_chrome, tool_result_truncate, tool_result_omitted, spellcheck_user, synthesized_marker, speaker_role_assignment) ship as identity shims the conversations adapter will override when migrated. get_transformation(name) resolves reserved names.

  • mempalace/sources/registry.py — entry-point discovery via importlib.metadata.entry_points(group="mempalace.sources") + register() / unregister() (§3.1–3.2). resolve_adapter_for_source() implements the §3.3 priority order. Crucially: no auto-detection on the read side (§3.3 is explicit about that — user intent never inferred from on-disk artifacts).

  • mempalace/sources/context.py: PalaceContext facade (§9) bundling drawer/closet collections, knowledge graph, palace path, adapter identity, and progress hooks. upsert_drawer() applies the spec-mandated adapter_name/adapter_version stamps from §5.1 so adapters don't need to populate them. skip_current_item() signals laziness. emit() dispatches to hooks and swallows hook exceptions.

  • mempalace/knowledge_graph.py: add_triple() gains optional source_drawer_id and adapter_name kwargs (§5.5). Backwards-compatible schema migration auto-adds the new columns on open of a pre-RFC 002 palace (PRAGMA table_info → ALTER TABLE ADD COLUMN), so existing palaces upgrade transparently.

  • pyproject.toml: mempalace.sources entry-point group declared (empty on the first-party side for now). Third-party packages can begin registering today; the group being declared is the enabling bit.

Explicitly out of scope (follow-up PRs)

  • miner.py → mempalace/sources/filesystem.py. Behavior-preserving rename + READABLE_EXTENSIONS, detect_room(), detect_hall() moving into the adapter.
  • convo_miner.py + normalize.py → mempalace/sources/conversations.py. Format-detection if-chain becomes per-format plugins; declared_transformations enumerates what the current pipeline already does to source bytes (§1.4 existing-code mapping).
  • Closet post-step wired into the conversations adapter (§1.7).
  • CLI --source flag + --mode deprecation alias (§3.3).
  • MCP mempalace_mine tool source parameter.
  • AbstractSourceAdapterContractSuite (§7.1–7.3): byte-preservation + declared-transformation round-trip tests.
  • Privacy-class floor enforcement (§6.2) — depends on feat: add sensitive content scanner for palace drawers #389 for secrets_possible scanning.

Test plan

  • uv run python -m pytest tests/ --ignore=tests/benchmarks: 1018 passed (+27 targeted tests for this PR).
  • uv run ruff check . — clean.
  • uv run ruff format --check . — clean.
  • Coverage in tests/test_sources.py:
    • ABC instantiation enforcement (missing required methods → TypeError)
    • Typed records + default values isolation (frozen dataclass field(default_factory=dict) doesn't share state)
    • All 13 reserved transformations present in the registry
    • Pure reserved transforms: correct input → output for each
    • get_transformation resolves reserved names, rejects unknown
    • Registry: register / get_adapter / get_adapter_class / unregister, caching semantics, unknown-name KeyError
    • resolve_adapter_for_source priority order; default = filesystem
    • PalaceContext.upsert_drawer stamps adapter_name / adapter_version / source_file / chunk_index
    • PalaceContext.skip_current_item sets flag; emit dispatches and swallows hook errors
    • KnowledgeGraph.add_triple accepts new kwargs; writes to new columns
    • Legacy palaces without the new columns auto-migrate on open
    • Backwards-compat: existing add_triple callers unchanged
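
The first two coverage items (ABC instantiation enforcement and default-value isolation on frozen dataclasses) rest on standard Python behavior, which the sketch below demonstrates. AdapterSketch and SourceRefSketch are hypothetical stand-ins; their fields and signatures are assumptions and do not reproduce BaseSourceAdapter or SourceRef.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass(frozen=True)
class SourceRefSketch:
    """Hypothetical stand-in for SourceRef; only `options` mirrors the PR text."""

    uri: str
    options: dict[str, Any] = field(default_factory=dict)


class AdapterSketch(ABC):
    """Hypothetical stand-in for BaseSourceAdapter."""

    @abstractmethod
    def ingest(self, *, source: SourceRefSketch) -> Iterator[str]:
        """Keyword-only: callers must write ingest(source=...)."""

    @abstractmethod
    def describe_schema(self) -> dict[str, Any]:
        """Return the adapter's declared metadata schema."""

    def close(self) -> None:
        """Default implementation; subclasses may override."""


class Incomplete(AdapterSketch):
    # describe_schema() is deliberately missing.
    def ingest(self, *, source: SourceRefSketch) -> Iterator[str]:
        yield source.uri


try:
    Incomplete()
    instantiation_error = None
except TypeError as exc:
    instantiation_error = exc

a = SourceRefSketch("file:///a")
b = SourceRefSketch("file:///b")
# frozen=True blocks rebinding a.options, but the dict itself stays mutable,
# so each instance needs its own dict from default_factory.
a.options["wing"] = "x"
```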

Coordination

cc @Perseusxrltd @JakobSachs @adv3nt3 @zendesk-thittesdorf @mfhens @roip @MrDys — this is the §9 spec surface called out in #989. If you're working on Cursor/OpenCode/Pi/git/factory source adapters, this is the ABC to target. Once this merges, the ~next PR migrates miner.py / convo_miner.py onto the same contract (so we have two first-party reference adapters) and then the in-flight source-ingester PRs can align.

Refs: #989 (RFC 002 tracking), #990 (RFC 002 spec), #995 (RFC 001 §10 write-side sibling).

…ry, PalaceContext

Lands the read-side contract so third-party adapter authors (@Perseusxrltd,
@JakobSachs, @adv3nt3, @zendesk-thittesdorf, @mfhens, @roip, @MrDys) have a
stable target matching what RFC 001 §10 landed on the write side in #995.

Scope (this PR):

- mempalace/sources/base.py: BaseSourceAdapter ABC with kwargs-only
  ingest() / describe_schema() and default is_current() / source_summary()
  / close() (§1.1–1.2). Typed records: SourceRef, SourceItemMetadata,
  DrawerRecord, RouteHint, SourceSummary, AdapterSchema, FieldSpec (§1.3,
  §5.2). Error classes: SourceNotFoundError, AuthRequiredError,
  AdapterClosedError, TransformationViolationError, SchemaConformanceError
  (§2.7). Class-level identity contract: name / adapter_version /
  capabilities / supported_modes / declared_transformations /
  default_privacy_class (§2.1, §1.4, §1.5, §6).

- mempalace/sources/transforms.py: reference implementations of the 13
  reserved transformations (§1.4) — utf8_replace_invalid, newline_normalize,
  whitespace_trim, whitespace_collapse_internal, line_trim, line_join_spaces,
  blank_line_drop — as pure functions, plus identity shims for the six
  adapter-specific ones (strip_tool_chrome, tool_result_truncate,
  tool_result_omitted, spellcheck_user, synthesized_marker,
  speaker_role_assignment) that the conversations adapter will override
  when migrated. get_transformation(name) resolves by reserved name.

- mempalace/sources/registry.py: entry-point discovery via
  importlib.metadata.entry_points(group="mempalace.sources") + explicit
  register()/unregister() surface (§3.1–3.2). resolve_adapter_for_source()
  implements the §3.3 priority order; crucially, no auto-detection on the
  read side (§3.3 is explicit about that — user intent never inferred from
  on-disk artifacts).

- mempalace/sources/context.py: PalaceContext facade (§9) bundling the
  drawer/closet collections, knowledge graph, palace path, adapter identity,
  and progress hooks core passes into adapter.ingest(). upsert_drawer()
  applies the spec-mandated adapter_name/adapter_version stamps from §5.1.
  skip_current_item() signals laziness; emit() dispatches to hooks and
  swallows hook exceptions.

- mempalace/knowledge_graph.py: add_triple() gains optional source_drawer_id
  and adapter_name kwargs (§5.5). Backwards-compatible column migration
  auto-adds the new columns on open of a pre-RFC 002 palace (PRAGMA
  table_info then ALTER TABLE ADD COLUMN), matching the pattern used for
  any new palace-side provenance fields.

- pyproject.toml: mempalace.sources entry-point group declared. Empty on
  the first-party side for now — miners migrate in a follow-up; the group
  being present means third-party packages can begin registering today.
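
A rough sketch of how a few of the reserved transformations and the name resolver in the scope list above might be shaped. The normative semantics live in RFC 002 §1.4, which is not quoted on this page, so every function body here is an assumption inferred from the transformation names, and raising KeyError for unknown names is also an assumption.

```python
def utf8_replace_invalid(data: bytes) -> str:
    # Assumed semantics: decode as UTF-8, substituting U+FFFD for invalid bytes.
    return data.decode("utf-8", errors="replace")


def newline_normalize(text: str) -> str:
    # Assumed semantics: CRLF and lone CR both become LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def whitespace_trim(text: str) -> str:
    # Assumed semantics: strip leading and trailing whitespace.
    return text.strip()


def blank_line_drop(text: str) -> str:
    # Assumed semantics: remove lines that are empty or whitespace-only.
    return "\n".join(line for line in text.split("\n") if line.strip())


def identity_shim(text: str) -> str:
    # Adapter-specific names ship as identity shims until an adapter overrides them.
    return text


RESERVED = {
    "utf8_replace_invalid": utf8_replace_invalid,
    "newline_normalize": newline_normalize,
    "whitespace_trim": whitespace_trim,
    "blank_line_drop": blank_line_drop,
    "strip_tool_chrome": identity_shim,
}


def get_transformation(name: str):
    """Resolve a reserved name; unknown names are rejected (KeyError assumed)."""
    try:
        return RESERVED[name]
    except KeyError:
        raise KeyError(f"unknown reserved transformation: {name}") from None
```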

Out of scope (explicit follow-ups):

- miner.py → mempalace/sources/filesystem.py. Behavior-preserving rename
  that also moves READABLE_EXTENSIONS, detect_room(), detect_hall() into
  the adapter (§9). Larger refactor; lands separately.
- convo_miner.py + normalize.py → mempalace/sources/conversations.py. The
  format-detection if-chain in normalize.py becomes per-format plugins;
  declared_transformations enumerates what the current pipeline already
  does to source bytes (§1.4 existing-code mapping).
- Closet post-step wired into the conversations adapter (§1.7).
- CLI --source flag + --mode deprecation alias (§3.3).
- MCP mempalace_mine tool source parameter.
- AbstractSourceAdapterContractSuite (§7.1–7.3): byte-preservation round-
  trip and declared-transformation round-trip tests.
- Privacy-class floor enforcement (§6.2); depends on #389 for
  secrets_possible scanning.

Tests: 1018 passed (up from ~990 on develop), +27 targeted tests covering
the ABC instantiation rules, typed records, all reserved transformations,
the registry register/get/unregister surface, PalaceContext upsert + skip +
emit semantics, and both the new KG provenance kwargs and backwards-
compatible legacy-schema migration.
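
The emit semantics mentioned above (dispatch to every hook, swallow hook exceptions) could look roughly like this. ContextSketch is a hypothetical stand-in for PalaceContext, and the hook signature of (event, payload) is an assumption.

```python
from typing import Any, Callable

Hook = Callable[[str, dict[str, Any]], None]


class ContextSketch:
    """Hypothetical stand-in for PalaceContext; only emit() is modeled."""

    def __init__(self, hooks: list[Hook]) -> None:
        self._hooks = hooks

    def emit(self, event: str, **payload: Any) -> None:
        for hook in self._hooks:
            try:
                hook(event, payload)
            except Exception:
                # A faulty progress hook must not abort ingestion,
                # so the error is swallowed and the next hook still runs.
                continue


seen: list[tuple[str, dict[str, Any]]] = []


def broken_hook(event: str, payload: dict[str, Any]) -> None:
    raise RuntimeError("boom")


def recording_hook(event: str, payload: dict[str, Any]) -> None:
    seen.append((event, payload))


ctx = ContextSketch([broken_hook, recording_hook])
ctx.emit("item_done", source_file="a.txt")
```

Swallowing hook errors keeps progress reporting strictly advisory: a UI callback that fails cannot corrupt or interrupt an ingest run.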

Refs: #989 (RFC 002 tracking), #990 (RFC 002 spec), #995 (RFC 001 §10
cleanup — sibling PR on the write side).
Copilot AI review requested due to automatic review settings April 18, 2026 19:06

Copilot AI left a comment

Contributor

Pull request overview

Introduces the RFC 002 “read-side” source adapter scaffolding so third-party adapters can plug into MemPalace via a stable mempalace.sources contract (mirroring the RFC 001 backend seam).

Changes:

  • Added mempalace.sources public API: BaseSourceAdapter ABC, typed ingest records, adapter registry, and PalaceContext facade.
  • Implemented reserved transformation reference functions + a resolver in mempalace/sources/transforms.py.
  • Extended KnowledgeGraph.add_triple() with optional provenance fields and added an on-open SQLite schema migration for legacy palaces.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| mempalace/sources/base.py | Defines the adapter contract (ABC), typed records, and source-adapter error types. |
| mempalace/sources/context.py | Adds the PalaceContext facade and drawer upsert helper used by adapters. |
| mempalace/sources/registry.py | Adds registry + entry-point discovery for mempalace.sources adapters. |
| mempalace/sources/transforms.py | Adds reference implementations for reserved transformations + lookup helper. |
| mempalace/sources/__init__.py | Exposes the new sources subsystem public surface. |
| mempalace/knowledge_graph.py | Adds provenance kwargs to add_triple() and schema auto-migration for new columns. |
| pyproject.toml | Declares the mempalace.sources entry-point group. |
| tests/test_sources.py | Adds targeted tests for the new source adapter scaffolding and KG migration. |

Comment thread mempalace/sources/transforms.py Outdated
Comment on lines +149 to +153
RESERVED_TRANSFORMATIONS: dict[str, Callable[..., str]] = {
"utf8_replace_invalid": utf8_replace_invalid,
"newline_normalize": newline_normalize,
"whitespace_trim": whitespace_trim,
"whitespace_collapse_internal": whitespace_collapse_internal,

Copilot AI Apr 18, 2026

Type annotations for the reserved transformation registry imply every function is callable like (...)->str, but the registry includes both a bytes->str transform (utf8_replace_invalid) and str->str transforms. Consider introducing a dedicated Transformation type/Protocol (or using overloads / Callable[[Any], str]) so static type checkers and adapter authors don’t get misleading signatures.
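
One possible shape for the dedicated type suggested above, using typing.Protocol with a positional-only parameter. This is a sketch only: the two registered functions are simplified stand-ins written to accept both bytes and str so that they satisfy the protocol, and they are not the PR's implementations.

```python
from typing import Protocol, Union


class Transformation(Protocol):
    """Callable that accepts raw bytes or text and always returns text."""

    def __call__(self, data: Union[bytes, str], /) -> str: ...


def utf8_replace_invalid(data: Union[bytes, str], /) -> str:
    # Stand-in: decode bytes leniently, pass text through unchanged.
    if isinstance(data, bytes):
        return data.decode("utf-8", errors="replace")
    return data


def whitespace_trim(data: Union[bytes, str], /) -> str:
    # Stand-in: normalize to text first, then strip the ends.
    text = data.decode("utf-8", errors="replace") if isinstance(data, bytes) else data
    return text.strip()


RESERVED_TRANSFORMATIONS: dict[str, Transformation] = {
    "utf8_replace_invalid": utf8_replace_invalid,
    "whitespace_trim": whitespace_trim,
}
```

Because parameter types are contravariant, a function that accepts only bytes would not structurally match this protocol under a strict checker, which is why both stand-ins take the union.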

Comment thread mempalace/sources/context.py Outdated
Comment on lines +129 to +139
"""Deterministic drawer id: ``<sha1(source_file)>_<chunk_index>``.

Matches the shape existing miners rely on (``source_file`` + chunk index
pair) while keeping the id chroma-safe (no separators that collide with
existing metadata values). Adapters that need a different id scheme can
bypass :meth:`PalaceContext.upsert_drawer` and write through
``drawer_collection.upsert`` directly.
"""
import hashlib

digest = hashlib.sha1(record.source_file.encode("utf-8")).hexdigest()[:16]

Copilot AI Apr 18, 2026

Drawer IDs are derived from sha1(source_file) truncated to 16 hex chars (64 bits). That increases the (small but real) risk of collisions and overwriting drawers compared to the existing miners’ sha256-based IDs (typically 24 hex chars). Consider switching to sha256 and a longer prefix (or reusing the existing drawer-id scheme) to keep collision risk negligible.
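
To put numbers on the collision concern, the standard birthday approximation can be evaluated for both prefix lengths. The corpus size below is a hypothetical figure chosen for illustration, not a number from the PR; since the digest is taken over source_file, the relevant count is distinct source files.

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Birthday approximation: p = 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = n_items * (n_items - 1) / 2.0 ** (bits + 1)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-exponent)


N_SOURCE_FILES = 10_000_000  # hypothetical corpus size, not from the PR

p64 = collision_probability(N_SOURCE_FILES, 64)  # sha1[:16], 16 hex chars
p96 = collision_probability(N_SOURCE_FILES, 96)  # sha256[:24], 24 hex chars
print(f"64-bit prefix: {p64:.3e}")
print(f"96-bit prefix: {p96:.3e}")
```

Under this approximation each additional 32 bits of prefix divides the collision probability by roughly 2^32.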

Suggested change
- """Deterministic drawer id: ``<sha1(source_file)>_<chunk_index>``.
- Matches the shape existing miners rely on (``source_file`` + chunk index
- pair) while keeping the id chroma-safe (no separators that collide with
- existing metadata values). Adapters that need a different id scheme can
- bypass :meth:`PalaceContext.upsert_drawer` and write through
- ``drawer_collection.upsert`` directly.
- """
- import hashlib
- digest = hashlib.sha1(record.source_file.encode("utf-8")).hexdigest()[:16]
+ """Deterministic drawer id: ``<sha256(source_file)[:24]>_<chunk_index>``.
+ Matches the shape existing miners rely on (``source_file`` + chunk index
+ pair) while keeping the id chroma-safe (no separators that collide with
+ existing metadata values). Using a longer SHA-256 prefix keeps collision
+ risk negligible while preserving the existing id layout. Adapters that
+ need a different id scheme can bypass :meth:`PalaceContext.upsert_drawer`
+ and write through ``drawer_collection.upsert`` directly.
+ """
+ import hashlib
+ digest = hashlib.sha256(record.source_file.encode("utf-8")).hexdigest()[:24]

Comment thread tests/test_sources.py Outdated


def test_source_ref_options_default_is_empty_dict():
# Frozen dataclass must not share default_factory=list instance across instances.

Copilot AI Apr 18, 2026

This test comment mentions default_factory=list, but SourceRef.options uses default_factory=dict. Updating the comment avoids confusion about what’s being validated (non-shared dict state across instances).

Suggested change
- # Frozen dataclass must not share default_factory=list instance across instances.
+ # Frozen dataclass must not share a default_factory=dict instance across instances.

Comment on lines 95 to 97
""")
self._migrate_schema(conn)
conn.commit()

Copilot AI Apr 18, 2026

On a brand-new DB, _init_db creates triples without the new RFC 002 columns and then immediately runs _migrate_schema to ALTER TABLE them in. Consider adding source_drawer_id and adapter_name directly to the CREATE TABLE IF NOT EXISTS triples definition (keeping _migrate_schema for legacy palaces) so the canonical schema is self-contained and avoids extra ALTERs on fresh installs.
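
The legacy-palace migration pattern discussed here (inspect with PRAGMA table_info, then ALTER TABLE ADD COLUMN for whatever is missing) can be sketched against an in-memory database. The three-column legacy table and the TEXT column types are assumptions for illustration; the real triples table and the real _migrate_schema may differ.

```python
import sqlite3

# Column types are assumed; the PR only names the two columns.
NEW_COLUMNS = {"source_drawer_id": "TEXT", "adapter_name": "TEXT"}


def migrate_schema(conn: sqlite3.Connection) -> list[str]:
    """Add any missing provenance columns to `triples`; return the ones added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(triples)")}
    added = []
    for name, sql_type in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE triples ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
# Hypothetical legacy layout without the RFC 002 columns.
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
first_run = migrate_schema(conn)
second_run = migrate_schema(conn)  # idempotent: nothing left to add
```

The introspection step is what makes the migration safe to run on every open, since SQLite rejects adding a column that already exists.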

Comment thread mempalace/sources/transforms.py Outdated
Comment on lines +93 to +98
# now, we provide identity shims that raise if invoked without adapter-supplied
# context. Adapters that declare these MUST either override with a concrete
# implementation or provide a namespaced reference under
# ``mempalace.sources.transforms.<adapter_name>_<transform_name>`` (per the
# module docstring). The conformance suite looks up the adapter-specific
# implementation first, falling back to these only when none exists.

Copilot AI Apr 18, 2026

The comment above these adapter-specific transformations says the identity shims "raise if invoked" without adapter context, but the implementations currently just return the input unchanged. Please align the comment with the actual behavior (identity shims), or make the functions raise so the conformance suite can’t silently accept missing adapter-specific references.

Suggested change
- # now, we provide identity shims that raise if invoked without adapter-supplied
- # context. Adapters that declare these MUST either override with a concrete
- # implementation or provide a namespaced reference under
- # ``mempalace.sources.transforms.<adapter_name>_<transform_name>`` (per the
- # module docstring). The conformance suite looks up the adapter-specific
- # implementation first, falling back to these only when none exists.
+ # now, we provide identity shims that leave the input unchanged when no
+ # adapter-specific implementation is available. Adapters that declare these
+ # MUST either override with a concrete implementation or provide a namespaced
+ # reference under
+ # ``mempalace.sources.transforms.<adapter_name>_<transform_name>`` (per the
+ # module docstring). The conformance suite looks up the adapter-specific
+ # implementation first, falling back to these identity shims only when none
+ # exists.

Five findings from the automated review, fixed with targeted tests where
behavior changed:

1. Transformation Protocol (transforms.py). The registry mixed a bytes-to-str
   transform (utf8_replace_invalid) with str-to-str transforms under a single
   Callable[..., str] type, misleading static type checkers and adapter
   authors. Introduced a Transformation Protocol with __call__(data: bytes|str)
   -> str and retyped the registry + get_transformation return.

2. Drawer-id collision risk (context.py). Switched _build_drawer_id from
   sha1[:16]=64 bits to sha256[:24]=96 bits. 64 bits sits uncomfortably
   close to the birthday bound for palace-sized corpora; 96 bits keeps the
   collision probability negligible while preserving the existing
   <prefix>_<chunk> layout adapters rely on.

3. Fresh-schema KG columns (knowledge_graph.py). source_drawer_id and
   adapter_name now live in the canonical CREATE TABLE so new palaces don't
   take an ALTER round-trip on first open. _migrate_schema stays for legacy
   palaces (SQLite has no ADD COLUMN IF NOT EXISTS, so PRAGMA introspection
   is still needed there).

4. Identity-shim comment (transforms.py). Comment said the adapter-specific
   transforms "raise if invoked without adapter context" but they return
   the input unchanged. Updated the comment to match the actual identity-
   shim behavior Copilot suggested.

5. Test docstring (test_sources.py). Comment mentioned default_factory=list
   but SourceRef.options uses default_factory=dict. Corrected.

Tests: 1020 passed (up from 1018), +2 new tests for the sha256 id shape
and the fresh-schema column presence on new palaces.
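
The id shape from item 2, `<sha256(source_file)[:24]>_<chunk_index>`, can be sketched as a standalone function. The real _build_drawer_id takes a record object; the two-argument signature here is a simplification for illustration.

```python
import hashlib


def build_drawer_id(source_file: str, chunk_index: int) -> str:
    """Deterministic id: 24 hex chars (96 bits) of SHA-256, then the chunk index."""
    digest = hashlib.sha256(source_file.encode("utf-8")).hexdigest()[:24]
    return f"{digest}_{chunk_index}"


drawer_id = build_drawer_id("notes/a.md", 3)
prefix, _, chunk = drawer_id.rpartition("_")
print(drawer_id)
```

Hex digits never contain an underscore, so splitting on the last underscore recovers the prefix and the chunk index unambiguously.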

igorls commented Apr 18, 2026

Member Author

Thanks @copilot — all five review items addressed in 89904ed. Recap:

  1. Mixed signature types in RESERVED_TRANSFORMATIONS (transforms.py). Introduced a Transformation Protocol with __call__(data: bytes | str, /) -> str and retyped both the registry (dict[str, Transformation]) and get_transformation return. Static type checkers now accept the utf8_replace_invalid bytes→str shape alongside the str→str transforms cleanly.

  2. Drawer-id collision risk (context.py). Switched _build_drawer_id from sha1[:16] (64 bits) to sha256[:24] (96 bits) per your suggestion. 64 bits sits too close to the birthday bound for palace-sized corpora; 96 keeps collision probability negligible. Added a regression test that asserts both the sha256 shape and that the old sha1 scheme is no longer produced.

  3. Fresh-schema columns moved into CREATE TABLE (knowledge_graph.py). source_drawer_id and adapter_name now live in the canonical CREATE TABLE IF NOT EXISTS triples so new palaces get them directly — no ALTER round-trip on first open. _migrate_schema stays for legacy palaces (SQLite still has no ADD COLUMN IF NOT EXISTS). Added a test confirming fresh palaces have the columns without relying on the migration path.

  4. Identity-shim comment (transforms.py). Comment said the six adapter-specific transforms "raise if invoked without adapter context" but they return the input unchanged. Rewrote the comment per your suggested wording so docs and behavior agree.

  5. Test docstring typo (test_sources.py). default_factory=list → default_factory=dict (matches SourceRef.options).

Tests: 1020 passed (up from 1018), +2 targeted tests. Lint + format clean.

Copilot AI commented Apr 18, 2026

Contributor

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • chroma-onnx-models.s3.amazonaws.com
    • Triggering command: /usr/bin/python3 python3 -m pytest tests/ --ignore=tests/benchmarks -q (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

@igorls igorls merged commit 66090b2 into develop Apr 18, 2026
8 checks passed
jphein added a commit to techempower-org/mempalace that referenced this pull request Apr 19, 2026
…uard

Merges MemPalace#990 (RFC 002 spec), MemPalace#1014 (BaseSourceAdapter/PalaceContext scaffolding),
MemPalace#1013 (Layer3.search_raw None guard), MemPalace#1012 (docs), MemPalace#1010 (chromadb >=1.5.4),
and MemPalace#998 (sweeper/tandem transcript safety net).

Fork changes preserved:
- quarantine_stale_hnsw() in chroma.py (guards HNSW/sqlite drift segfault)
- get-then-create instead of get_or_create (guards ChromaDB 1.5.x metadata segfault)
- paginated status() loop (guards SQLite variable limit on large palaces)
- searcher hits-loop, BM25 fallback, _count_in_scope
- .jsonl exempt from JUNK_FILE_SIZE cap (Claude Code transcripts can be large)
- _validate_where() + operator constants taken from upstream

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
igorls added a commit that referenced this pull request Apr 19, 2026
Version bumps across pyproject.toml, mempalace/version.py, README badge,
uv.lock, and plugin manifests (.claude-plugin/*, .codex-plugin/*).

CHANGELOG aligned with main (post-3.3.1) and a new [3.3.2] section added
covering the 11 PRs merged on develop since v3.3.1 — silent-transcript-drop
fix + tandem sweeper (#998), None-metadata guards (#999, #1013),
chromadb ≥1.5.4 for Py 3.13/3.14 (#1010), Windows Unicode (#681),
HNSW quarantine recovery (#1000), PID stacking guard (#1023), doc-path
cleanup (#996, #1012), and RFC 001/002 internal scaffolding (#995, #1014, #990).
@igorls igorls mentioned this pull request Apr 19, 2026
8 tasks
jphein added a commit to techempower-org/mempalace that referenced this pull request May 13, 2026
…#1484

Four issues raised in the automated review (2026-05-13T01:40Z):

1. **opencode_session_version missing from metadata** (high)
   `is_current()` at opencode.py:391 compares `existing_metadata.get(
   "opencode_session_version")` against the new `SourceItemMetadata.version`.
   Without the metadata key being written on first ingest, the comparison
   always falls back to "exists → current" and incremental ingest can never
   detect updates to existing sessions. Now populated as
   `str(time_updated or time_created or 0)` — same value as the version
   yielded in SourceItemMetadata above.

2. **PalaceContext._skip_requested encapsulation violation** (medium)
   The adapter was reading and writing the private flag directly. Added
   `PalaceContext.is_skip_requested()` public method (read-only) so adapters
   can short-circuit expensive work (SQL query, transcript build, chunking)
   when core has signaled skip. Core still owns the reset — adapters MUST
   NOT clear it, per the new docstring. This is a small companion change to
   the upstream RFC 002 scaffolding (MemPalace#1014); justified because the spec's
   "core checks between yields" pattern doesn't hold for Python generators
   (the adapter's code runs between yields, not core's). The check needs to
   be available to the adapter.

3. **filed_at generated inside chunk loop** (medium)
   For consistency across chunks of the same session, `filed_at` is now
   computed once per session and reused for every chunk's metadata. Also
   pre-computes `session_version` for the same reason.

4. **PEP 8 import placement** (medium)
   `import json as _json` was mid-file in transforms.py; hoisted to the
   top with the other imports.
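
Item 2's point about generators can be illustrated with a toy loop: after each yield, control returns to the adapter only when core asks for the next item, so the adapter itself has to consult the flag before doing expensive work. ContextSketch, reset_skip, and the tuple-shaped items are all hypothetical; only skip_current_item() and is_skip_requested() are names taken from this page.

```python
from typing import Iterator


class ContextSketch:
    """Hypothetical stand-in for PalaceContext's skip flag."""

    def __init__(self) -> None:
        self._skip_requested = False

    def skip_current_item(self) -> None:
        self._skip_requested = True

    def is_skip_requested(self) -> bool:
        return self._skip_requested

    def reset_skip(self) -> None:
        # Core-owned reset; the method name is an assumption.
        self._skip_requested = False


def ingest(sessions: dict[str, list[str]], ctx: ContextSketch) -> Iterator[tuple]:
    for session_id, chunks in sessions.items():
        yield ("metadata", session_id)
        # Execution resumes here after core has seen the metadata item,
        # so this is the adapter's first chance to observe a skip request.
        if ctx.is_skip_requested():
            continue
        for chunk in chunks:
            yield ("drawer", session_id, chunk)


ctx = ContextSketch()
emitted = []
for item in ingest({"s1": ["a", "b"], "s2": ["c"]}, ctx):
    emitted.append(item)
    if item[0] == "metadata":
        ctx.reset_skip()  # core clears the flag at each new item
        if item[1] == "s1":
            ctx.skip_current_item()  # pretend s1 is already current
```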

Also removed an unused `import json` from opencode.py that ruff caught.

Tests: 57 pass (28 opencode + 29 base sources); ruff clean on all three
modified files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jphein added a commit to techempower-org/mempalace that referenced this pull request May 22, 2026
…daemon-routed integration recipe (#106)

* feat(sources): OpenCode adapter on RFC 002 contract

Adds mempalace/sources/opencode.py — an OpenCodeSourceAdapter
subclass of BaseSourceAdapter that ingests OpenCode AI-coding-CLI
session transcripts from OpenCode's local SQLite store
(~/.local/share/opencode/opencode.db) into the palace as
DrawerRecords formatted to match convo_miner's exchange-pair shape.

The adapter:
  * Yields SourceItemMetadata then DrawerRecords per session.
  * Each session becomes one source_file shaped as
    opencode://<absolute-db-path>#session=<sid>; chunks are
    chunked_content exchange-pair drawers.
  * Declares 8 transformations (6 opencode-namespaced + 2 reserved);
    every name resolves to a reference implementation on
    mempalace.sources.transforms per RFC 002 §7.3.
  * Implements is_current honoring opencode_session_version when
    present, falling back to "metadata exists → assume current"
    for append-only safety on older drawers.
  * Routes wing from session.directory basename (or explicit
    options['wing'] override); room from detect_convo_room on the
    rendered transcript; hall from convo_miner._detect_hall_cached.
  * Stamps universal §5.1 metadata (wing, room, hall, filed_at,
    added_by, ingest_mode, extract_mode, privacy_class) plus the
    declared per-adapter schema (session_id, session_title,
    project_dir, session_created_at, message_count, opencode_db_path).
  * default_privacy_class = "pii_potential" — AI sessions leak
    everything; users opt in explicitly to laxer floors.
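
The is_current behavior listed above (honor opencode_session_version when present, otherwise treat existing metadata as current) might be sketched like this. The function signatures are assumptions, as is returning False when nothing has been stored yet; the version derivation uses the `str(time_updated or time_created or 0)` expression quoted elsewhere on this page.

```python
from typing import Any, Optional


def session_version(time_updated: Optional[int], time_created: Optional[int]) -> str:
    # Same expression the review fix stamps into drawer metadata.
    return str(time_updated or time_created or 0)


def is_current(item_version: str, existing_metadata: Optional[dict[str, Any]]) -> bool:
    """Decide whether a stored session still matches the source item."""
    if existing_metadata is None:
        return False  # assumption: nothing stored yet means ingest is needed
    stored = existing_metadata.get("opencode_session_version")
    if stored is None:
        return True  # older drawer without a version stamp: assume current
    return stored == item_version
```

The fallback branch is why the version key has to be written on first ingest: without it, every later comparison lands on "assume current" and updates are never detected.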

mempalace/sources/transforms.py: adds 6 opencode-namespaced
transformations (extract_text_parts, skip_tool_echo,
skip_file_injection, role_coerce, same_role_merge, format_exchange).
Each operates on the role-tab-prefixed line stream the adapter's
canonical_source_bytes produces; declared in declaration order so
the conformance round-trip test reproduces drawer content exactly.

pyproject.toml: registers the adapter under the
[project.entry-points."mempalace.sources"] group as
opencode = "mempalace.sources.opencode:OpenCodeSourceAdapter".

tests/test_sources_opencode.py: 28 tests covering
  * class identity, capabilities, schema shape
  * SourceNotFoundError on missing DB / missing tables
  * AdapterClosedError after close()
  * source_summary item count + missing-DB path
  * ingest yields metadata then drawers per session
  * cancelled / single-turn sessions skipped
  * universal + schema metadata fields on every drawer (flat-scalar)
  * RouteHint carries wing + room
  * wing routing groups by session.directory
  * explicit options['wing'] wins over directory derivation
  * skip_current_item short-circuits drawer emit per RFC 002 §1.2
  * is_current with/without opencode_session_version
  * tool-input / tool-output / tool-echo / file-injection parts
    are stripped from drawer content
  * declared-transformation round-trip reproduces chunk content
    (RFC 002 §7.3)
  * empty DB, single-message session edge cases
  * Unicode (BMP + non-BMP) preserved through transcript
  * registry resolves the adapter when registered explicitly
  * byte_preserving capability is NOT advertised (declared-lossy)

tests/fixtures/opencode/sample_session_2026_05_12/: builder script
and README documenting the live opencode-ai 1.14.39 schema captured
verbatim from JP's local install on 2026-05-12. No recorded .db
ships (real-session content is unsanitizable user-private data);
build_fixture.py reproduces the schema and populates it with
synthetic-but-realistic exchanges the tests consume.

tests/test_corpus_origin_integration.py: extends the §-section
allowlist to include the new test file (existing allowlist already
covers mempalace/sources/).

Reverse-engineering credit: the OpenCode SQLite schema, json_extract
paths, tool-echo / file-injection skip filters, and same-role merge
originated in @JakobSachs's PR #23 (feat: add OpenCode SQLite
session database support, base=develop). This adapter rebuilds those
primitives on the RFC 002 contract so OpenCode support can ship as a
registered adapter rather than as a normalize.py branch — see #23
coordination thread.

Test suite: 1876 passed, 7 skipped, 106 deselected (28 new opencode
tests, no regressions).

Co-authored-by: Jakob Sachs <28728963+JakobSachs@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(sources/opencode): address Gemini Code Assist review on MemPalace#1484

Four issues raised in the automated review (2026-05-13T01:40Z):

1. **opencode_session_version missing from metadata** (high)
   `is_current()` at opencode.py:391 compares `existing_metadata.get(
   "opencode_session_version")` against the new `SourceItemMetadata.version`.
   Without the metadata key being written on first ingest, the comparison
   always falls back to "exists → current" and incremental ingest can never
   detect updates to existing sessions. Now populated as
   `str(time_updated or time_created or 0)` — same value as the version
   yielded in SourceItemMetadata above.

2. **PalaceContext._skip_requested encapsulation violation** (medium)
   The adapter was reading and writing the private flag directly. Added
   `PalaceContext.is_skip_requested()` public method (read-only) so adapters
   can short-circuit expensive work (SQL query, transcript build, chunking)
   when core has signaled skip. Core still owns the reset — adapters MUST
   NOT clear it, per the new docstring. This is a small companion change to
   the upstream RFC 002 scaffolding (MemPalace#1014); justified because the spec's
   "core checks between yields" pattern doesn't hold for Python generators
   (the adapter's code runs between yields, not core's). The check needs to
   be available to the adapter.

3. **filed_at generated inside chunk loop** (medium)
   For consistency across chunks of the same session, `filed_at` is now
   computed once per session and reused for every chunk's metadata. Also
   pre-computes `session_version` for the same reason.

4. **PEP 8 import placement** (medium)
   `import json as _json` was mid-file in transforms.py; hoisted to the
   top with the other imports.
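
Items 1 and 3 can be illustrated with a minimal sketch. Only the
`opencode_session_version` key and the `str(time_updated or time_created or 0)`
expression come from the description above; the helper names, signatures, and
metadata shape below are assumptions for illustration, not the adapter's
actual code.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def session_version(time_updated: Optional[int], time_created: Optional[int]) -> str:
    """Last update time, else creation time, else "0", as a string."""
    return str(time_updated or time_created or 0)


def build_chunk_metadata(
    chunks: List[str], time_updated: Optional[int], time_created: Optional[int]
) -> List[Dict[str, object]]:
    # Computed once per session so every chunk carries identical values.
    filed_at = datetime.now(timezone.utc).isoformat()
    version = session_version(time_updated, time_created)
    return [
        {"chunk_index": i, "filed_at": filed_at, "opencode_session_version": version}
        for i, _ in enumerate(chunks)
    ]


def is_current(existing_metadata: Optional[Dict[str, object]], new_version: str) -> bool:
    # Hypothetical stand-in for the adapter's is_current().
    if existing_metadata is None:
        return False  # nothing stored yet, so the session must be ingested
    stored = existing_metadata.get("opencode_session_version")
    if stored is None:
        return True  # the "exists → current" fallback: updates go undetected
    return stored == new_version
```

With the key written on first ingest, a later `time_updated` produces a
different version string and `is_current()` returns `False`, which is what
lets incremental ingest pick up the change.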

Also removed an unused `import json` from opencode.py that ruff caught.

Tests: 57 pass (28 opencode + 29 base sources); ruff clean on all three
modified files.
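
The generator point in item 2 may be easier to see in a toy sketch.
`ToyContext`, `request_skip()`, `reset_skip()`, and this `ingest()` are
hypothetical stand-ins; only `is_skip_requested()` is named in the change
above, and how core actually sets and resets the flag is assumed here.

```python
from typing import Iterator, List


class ToyContext:
    """Hypothetical stand-in for PalaceContext."""

    def __init__(self) -> None:
        self._skip_requested = False

    def request_skip(self) -> None:  # assumed core-side setter
        self._skip_requested = True

    def reset_skip(self) -> None:  # assumed core-side reset; adapters never call it
        self._skip_requested = False

    def is_skip_requested(self) -> bool:  # read-only view for adapters
        return self._skip_requested


def ingest(ctx: ToyContext, sessions: List[str], work_log: List[str]) -> Iterator[str]:
    for session in sessions:
        yield f"meta:{session}"  # core inspects this and may request a skip
        # Execution resumes here on the next next() call: adapter code, not core's.
        if ctx.is_skip_requested():
            continue  # skip the expensive work, leave the flag alone
        work_log.append(session)  # stands in for SQL query / transcript build
        yield f"drawer:{session}"


# Toy "core" loop: skip s1, ingest s2.
ctx, work_log, seen = ToyContext(), [], []
for item in ingest(ctx, ["s1", "s2"], work_log):
    seen.append(item)
    if item == "meta:s1":
        ctx.request_skip()
    elif item.startswith("meta:"):
        ctx.reset_skip()
```

Because the code after each `yield` belongs to the adapter, the adapter has to
be the one that reads the flag; core only gets control back at the next yield.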

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(sources/opencode): address @igorls review on MemPalace#1484

Three blockers + one minor cleanup from the maintainer review at
2026-05-13T02:52Z:

1. **ruff F401 — unused `os` import** in tests/test_sources_opencode.py:17
   Dropped. No call sites used it.

2. **ruff E402 — module-level import not at top** in tests
   The `sys.path.insert(0, FIXTURE_DIR); import build_fixture` pattern
   tripped E402 (the `# noqa: E402` was suppressing a legitimate
   complaint). Refactored to `importlib.util.spec_from_file_location` +
   `module_from_spec` per @igorls's suggestion — keeps the fixture
   loader at top of file with the other imports, no sys.path mutation
   at module scope. Also registers the loaded module in `sys.modules`
   so `dataclasses` and typing introspection inside the fixture builder
   can resolve `cls.__module__` correctly.

3. **Route-hint wing mismatch** (RFC 002 §2.5 violation)
   `_route_hint_for()` (lazy-fetch SourceItemMetadata stage) computed
   wing from `directory` only; `_wing_for()` (eager DrawerRecord stage)
   honored `source.options["wing"]` first. When a user passed
   `options={"wing": "Custom Wing"}`, the metadata hint said
   `"<dirname>"` while the actual drawers said `"custom_wing"` — core
   could make wrong skip/routing decisions on the gap.

   Fix: `_route_hint_for(source, directory)` now delegates to
   `_wing_for` so both stages apply identical precedence.

4. **Unjustified `# noqa: F401` on `AuthRequiredError`** (minor)
   The import claimed re-export "used in docstrings" but `__all__`
   only exposes `OpenCodeSourceAdapter` + `session_source_file`.
   Dropped the import + the noqa.
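
The loader pattern from item 2 looks roughly like the sketch below. The
function name `load_module_from_path` is an assumption for illustration; the
`importlib.util` calls are standard library, and the real test file may
structure this differently.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_module_from_path(name: str, path: Path) -> ModuleType:
    """Load a module from an explicit file path without touching sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so dataclasses/typing can look up cls.__module__.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module
```

A call such as `load_module_from_path("build_fixture", fixture_dir / "build_fixture.py")`
(with `fixture_dir` being whatever path the tests use) would then replace the
`sys.path.insert` plus late `import` pair.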

Tests: 57 pass (28 opencode + 29 base sources); ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* style(sources/opencode): ruff format with CI's ruff 0.4.x

CI's lint job ran on commit 13353d9 and failed `ruff format --check .`
even though local `ruff format --check` was clean. Cause: a ruff version
mismatch. CI installs `>=0.4.0,<0.5` (per the ci.yml lint job), while the
local env has ruff 0.15.12. The two release lines format differently, so
0.15-formatted source isn't 0.4.x-format-clean.

Reformatted `mempalace/sources/opencode.py` and
`tests/test_sources_opencode.py` with `uvx --from "ruff>=0.4.0,<0.5"
ruff format` so CI's check passes. Changes are whitespace-only — no
semantic diff.

Tests still pass 28/28. Lint clean under 0.4.x. The 29 other files
that local ruff 0.15.12 wants to reformat are upstream's own files
and pass upstream's CI as-is; left untouched.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* style(tests/fixtures/opencode): ruff format build_fixture.py with 0.4.x

Missed in the previous format pass (f94e3fe), which only touched the two
top-level files. CI's `ruff format --check .` scans the whole tree and
caught it.

Whitespace-only changes.

* feat: add OpenCode MCP integration for MemPalace

* fix: use python -m mempalace.mcp_server for robustness

* docs(integrations): OpenCode integration recipe + cherry-pick fork-changes entries

Adds the three-direction OpenCode + MemPalace integration recipe:

- ``docs/integrations/opencode.md`` — full setup guide covering the
  read (MCP), push (live-capture plugin), and pull (retrospective
  backfill) paths for daemon-routed deployments.
- ``examples/opencode/opencode.jsonc.example`` — copy-paste user
  config pointing at the palace-daemon wrapper.
- ``examples/opencode/option-k-plugin-daemon-routing.patch`` — a
  re-applicable diff for option-K's ``opencode-plugin-mempalace``
  v1.2.1 issue #1 (isInitialized passes ``--palace`` which bypasses
  ``PALACE_DAEMON_URL`` routing).

Also adds two fork-changes.yaml entries for the cherry-picked
upstream PRs already in this branch:

- ``opencode-mcp-config-cherry-pick-1567`` (commit ba16b82)
- ``opencode-source-adapter-cherry-pick-1484`` (commit 2ffe652)

The recipe's own fork-changes.yaml entry is added in the next commit
once this commit's SHA is known (avoids the self-referencing-commit
anti-pattern flagged in the worktree handoff).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): add opencode-integration-recipe entry pointing at 60dc9e6

Companion to 60dc9e6 (the OpenCode integration recipe commit). Split
out per the worktree handoff to avoid the self-referencing-commit-SHA
anti-pattern: the YAML entry now points at the prior docs commit,
not at itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(opencode-integration): bundled live-capture plugin + split option-K patches

The previously combined option-K patch (`option-k-plugin-daemon-routing.patch`)
mixed two unrelated fixes against two different files and was failing
`patch --dry-run` once Fix 1 was applied. Split into:

- `option-k-plugin-daemon-routing.patch` — Fix 1 only (mempalace-cli.js,
  isInitialized daemon detection, option-K#1).
- `option-k-plugin-message-updated.patch` — Fix 2 (index.js, subscribe
  to `message.updated` instead of the non-existent `chat.message`,
  filed upstream as option-K#4).

End-to-end testing with both patches applied surfaced a third bug
(option-K#5): the plugin's `mempalace mine <dir>` call hits the daemon,
which evaluates `<dir>` against ITS OWN filesystem. For remote-daemon
setups (palace-daemon on a different host from OpenCode) the path
doesn't exist on the daemon's filesystem and the call returns 400.
The option-K plugin is architecturally incompatible with multi-host
deployments.

Ships a self-contained replacement at `examples/opencode/live-capture/`:

- `mempalace-live-capture.js` — minimal OpenCode plugin that subscribes
  to session.idle / session.deleted / session.status[idle] and spawns
  the Python helper. Detached subprocess, debounced per session,
  logs to ~/.local/share/opencode/mempalace-live-capture.log.
- `capture-session.py` — Python helper that reads OpenCode's local
  SQLite session DB, extracts the role-pair transcript via the in-tree
  `OpenCodeSourceAdapter` helpers, and POSTs to the daemon's
  `/silent-save` endpoint. Stdlib-only, no extra pip deps.
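
A stdlib-only POST of this kind might look like the sketch below. This is not
the contents of `capture-session.py`: the function names and the payload field
names (`session_id`, `transcript`, `role`, `text`) are hypothetical, since the
daemon's request schema is not shown here; only the `/silent-save` path and
the stdlib-only constraint come from the description above.

```python
import json
import urllib.request
from typing import List, Tuple


def build_silent_save_request(
    daemon_url: str, session_id: str, turns: List[Tuple[str, str]]
) -> urllib.request.Request:
    # Payload field names are hypothetical; check the daemon's real schema.
    payload = {
        "session_id": session_id,
        "transcript": [{"role": role, "text": text} for role, text in turns],
    }
    return urllib.request.Request(
        url=daemon_url.rstrip("/") + "/silent-save",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    # Performs the network call and returns the HTTP status code.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Splitting request construction from sending keeps the first half testable
without a running daemon.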

Verified end-to-end against the canonical daemon at disks.jphe.in:8085:
a fresh opencode session ends with the transcript landing in
wing_opencode_<basename>/room=diary, retrievable via mempalace_search.

`docs/integrations/opencode.md` now documents both deployment paths
(bundled plugin for remote-daemon, option-K + patches for local
palaces) and explicitly notes that
`experimental.chat.system.transform` does not exist in the OpenCode
plugin API (so per-turn system-prompt injection is not available;
agents recall memories via explicit MCP tool calls).

Filed:
- option-K/opencode-plugin-mempalace#4
- option-K/opencode-plugin-mempalace#5

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): add commit ref for opencode-live-capture-plugin entry

Closes the YAML→render loop: scripts/check-docs.sh now verifies the
commit hash resolves and FORK_CHANGELOG.md matches the manifest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jakob Sachs <28728963+JakobSachs@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Dxrk System <dxrk@local>
jphein added a commit to techempower-org/mempalace that referenced this pull request May 22, 2026
…#1484
