feat(#197 partial): memory-gbrain + memory-hybrid scaffolding (skeleton, v1.0.5 target) by jayzalowitz · Pull Request #215 · jayzalowitz/skytwin

jayzalowitz · 2026-05-07T20:21:48Z

Summary

SKELETON only for #197 — proves the MemoryPort swap-port architecture works. Live gbrain integration, CRDB driver shim, and embedding pipeline are explicitly deferred to v1.0.5 per the issue spec ("target ship: v1.0.5, ~2 weeks post-OSS-launch").

This unblocks the v1 launch with the architecture in place; the actual gbrain swap can land in a separate v1.0.5 PR without further architectural change.

What's in here

`@skytwin/memory-gbrain`

GbrainMemoryPort implements the MemoryPort contract from @skytwin/memory-port. Declares capabilities { semantic_search, code_aware_search }.

searchSemantic: shells out to gbrain search --json --query=... --limit=N with a 5s hard timeout. Returns [] (NOT an error) when gbrain is absent / non-zero exit / timeout — so the hybrid composer can fall back cleanly.
CLI detection: which gbrain / where gbrain via child_process.execSync. Returns false on any error.
Other methods: walkGraph / getEpisodes / getTriples / summarize / compress and write methods throw NotImplementedError. The HybridMemoryPort routes them to the secondary backend (MemPalace).
No PII in logs: only operation names + result counts, never query text.
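
A minimal sketch of the "return [] instead of throwing" behaviour described above. It assumes a hypothetical `SearchHit` shape (the real gbrain JSON schema is not shown in this PR) and uses `execFileSync` with an argument array, which is what the later copilot-sweep commit switched to, rather than the shell-interpolated `execSync` call in this PR's diff. The `binary` parameter is also an assumption, added only so the failure path can be exercised.

```typescript
import { execFileSync } from 'node:child_process';

const GBRAIN_TIMEOUT_MS = 5_000;

// Hypothetical result shape; the real gbrain output schema is an assumption here.
interface SearchHit {
  id: string;
  score: number;
}

function searchViaCli(query: string, k: number, binary = 'gbrain'): SearchHit[] {
  try {
    // Arguments are passed as an array, so no shell parses the query text.
    const raw = execFileSync(
      binary,
      ['search', '--json', `--query=${query}`, `--limit=${k}`],
      { timeout: GBRAIN_TIMEOUT_MS, encoding: 'utf8', stdio: ['ignore', 'pipe', 'ignore'] },
    );
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as SearchHit[]) : [];
  } catch {
    // Missing binary, non-zero exit, timeout, and malformed JSON all degrade to "no results".
    return [];
  }
}
```

Calling `searchViaCli('anything', 3, 'some-binary-that-is-not-installed')` yields an empty array, which is the signal a composing port can use to fall back to its other backend.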

`@skytwin/memory-hybrid`

HybridMemoryPort composes any two MemoryPort impls.

Constructor: new HybridMemoryPort({ primary, secondary, routing? })
Writes go to BOTH backends. Primary write must succeed; secondary is best-effort (failures logged but never propagated).
Reads route per-capability: configurable via RoutingRules. Defaults route searchSemantic and code_aware_search to primary; walkGraph/episodes/triples/summarize/compress to secondary.
capabilities() returns the union of primary + secondary capabilities.
exportAll/importAll → secondary only.
Compile-time type-compatibility check: tests include a MemPalaceMemoryPort extends MemoryPort assertion that fails tsc if shapes ever diverge.
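
The read routing and capability union described above can be sketched roughly as follows. This is an illustration only: `resolveTarget` and `unionCapabilities` are hypothetical helper names, and the sketch leaves out the step where the real port also checks whether the primary declares the relevant capability.

```typescript
type Target = 'primary' | 'secondary';

type RoutedMethod =
  | 'searchSemantic'
  | 'code_aware_search'
  | 'walkGraph'
  | 'getEpisodes'
  | 'getTriples'
  | 'summarize'
  | 'compress';

// Per-method overrides; missing keys fall back to the defaults below.
type RoutingRules = Partial<Record<RoutedMethod, Target>>;

const DEFAULT_ROUTING: Record<RoutedMethod, Target> = {
  searchSemantic: 'primary',
  code_aware_search: 'primary',
  walkGraph: 'secondary',
  getEpisodes: 'secondary',
  getTriples: 'secondary',
  summarize: 'secondary',
  compress: 'secondary',
};

// Hypothetical helper: an explicit rule wins, otherwise the default table applies.
function resolveTarget(method: RoutedMethod, rules: RoutingRules = {}): Target {
  return rules[method] ?? DEFAULT_ROUTING[method];
}

// Hypothetical helper: capabilities() as the union of both backends.
function unionCapabilities<T>(primary: ReadonlySet<T>, secondary: ReadonlySet<T>): Set<T> {
  const out = new Set<T>();
  primary.forEach((c) => out.add(c));
  secondary.forEach((c) => out.add(c));
  return out;
}
```

With no overrides, `resolveTarget('walkGraph')` gives `'secondary'`; passing `{ walkGraph: 'primary' }` flips only that method.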

Hard rails preserved

No PII in logs (verified manually)
No new top-level deps; uses Node built-in child_process only
Strict 5s timeout on every gbrain CLI call
No as casts in production code — GbrainMemoryPort satisfies MemoryPort exactly via the type system
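
One way a compile-time shape check like the one mentioned above can be written is with a conditional type whose result is pinned to `true`. The interface and class below are heavily trimmed, hypothetical stand-ins (the real `MemoryPort` contract is much larger), and `IsAssignable` is an illustrative name, not necessarily what the tests use.

```typescript
// Hypothetical, trimmed stand-in for the real MemoryPort contract.
interface MemoryPort {
  searchSemantic(query: string, k: number): Promise<string[]>;
  capabilities(): Set<string>;
}

// Hypothetical stand-in for a concrete backend such as MemPalaceMemoryPort.
class FakeMemPalacePort {
  async searchSemantic(_query: string, _k: number): Promise<string[]> {
    return [];
  }
  capabilities(): Set<string> {
    return new Set(['episodic']);
  }
}

// Resolves to `true` only when Sub is assignable to Super.
// The tuple wrapping prevents distribution over union types.
type IsAssignable<Sub, Super> = [Sub] extends [Super] ? true : false;

// If the shapes ever diverge, this type becomes `false` and the
// assignment of `true` stops compiling, so tsc fails the build.
const memPalaceMatchesPort: IsAssignable<FakeMemPalacePort, MemoryPort> = true;
```

The check costs nothing at runtime; it exists purely so a drifting method signature is caught by the compiler instead of by a failing call.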

Explicitly deferred to v1.0.5

CRDB driver shim (@skytwin/memory-gbrain-crdb-adapter)
Full gbrain MCP integration (replacing the CLI shell-out)
Embedding pipeline wiring
federated_sources capability (gbrain v1.1+)
Web UI for memory backend selection

Tests

@skytwin/memory-gbrain: 18 (6 cli-detector + 12 gbrain-port)
@skytwin/memory-hybrid: 10
Full workspace: 64/64 turbo tasks (was 60, +4 from new package test+build)

Test plan

pnpm --filter @skytwin/memory-gbrain test → 18/18
pnpm --filter @skytwin/memory-hybrid test → 10/10
pnpm build → 32/32 packages clean
pnpm test → 64/64 turbo tasks pass

🤖 Generated with Claude Code

…keletons

SKELETON only. Live gbrain integration, CRDB driver shim, full MCP integration, and embedding pipeline are explicitly deferred to v1.0.5 per the issue spec.

@skytwin/memory-gbrain:
- GbrainMemoryPort implements MemoryPort. Declares capabilities { semantic_search, code_aware_search }. searchSemantic shells out to `gbrain search --json --query=... --limit=N` with a 5s hard timeout. Returns [] (NOT an error) when gbrain is absent / non-zero exit / timeout, so HybridMemoryPort can fall back cleanly.
- CLI detection via `which gbrain` / `where gbrain`.
- All write methods + walkGraph / getEpisodes / getTriples / summarize / compress throw NotImplementedError; HybridMemoryPort routes them to the secondary backend (MemPalace) instead.
- 18 tests (6 cli-detector + 12 gbrain-port).

@skytwin/memory-hybrid:
- HybridMemoryPort composes any two MemoryPort impls.
- Writes go to BOTH backends. Primary write must succeed; secondary is best-effort (failures logged but never propagated).
- Reads route per-capability via configurable RoutingRules.
- Default routing: searchSemantic + code_aware_search → primary (gbrain when present); walkGraph + episodes + triples + summarize + compress → secondary (MemPalace).
- exportAll / importAll → secondary only.
- 10 tests including a compile-time MemPalaceMemoryPort extends MemoryPort assertion to catch shape divergence.

No PII in logs (only operation names + result counts, never query text). No new top-level deps. No `as` casts in production code.

Tests: 28 new (18 gbrain + 10 hybrid). 64/64 turbo tasks pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Adds two new (skeleton) workspace packages to validate the MemoryPort swap-port architecture: a gbrain-backed port stub and a hybrid composer that routes operations between two MemoryPort backends.

Changes:

Added @skytwin/memory-gbrain with best-effort CLI-based searchSemantic and placeholders for the rest of the MemoryPort contract.
Added @skytwin/memory-hybrid to compose two MemoryPort implementations with configurable read routing and dual-write semantics.
Updated workspace config (TS path mapping, lockfile) and documented the scaffold in CHANGELOG.md.

Reviewed changes

Copilot reviewed 13 out of 15 changed files in this pull request and generated 7 comments.


| File | Description |
| --- | --- |
| tsconfig.json | Adds TS path aliases for the new packages. |
| pnpm-lock.yaml | Registers the two new workspace importers and their deps. |
| packages/memory-hybrid/tsconfig.json | TS build config for the hybrid package. |
| packages/memory-hybrid/package.json | New package manifest and scripts. |
| packages/memory-hybrid/src/index.ts | Public entrypoint exports for the hybrid package. |
| packages/memory-hybrid/src/hybrid-port.ts | Implements `HybridMemoryPort` routing + dual-write logic. |
| packages/memory-hybrid/src/tests/hybrid-port.test.ts | Unit tests for routing/dual-write/migration delegation. |
| packages/memory-gbrain/tsconfig.json | TS build config for the gbrain package. |
| packages/memory-gbrain/package.json | New package manifest and scripts. |
| packages/memory-gbrain/src/index.ts | Public entrypoint exports for the gbrain package. |
| packages/memory-gbrain/src/cli-detector.ts | `gbrain` CLI presence detection. |
| packages/memory-gbrain/src/gbrain-port.ts | `GbrainMemoryPort` skeleton with CLI-based `searchSemantic`. |
| packages/memory-gbrain/src/tests/cli-detector.test.ts | Unit tests for CLI detection behavior. |
| packages/memory-gbrain/src/tests/gbrain-port.test.ts | Unit tests for `GbrainMemoryPort` behaviors and placeholders. |
| CHANGELOG.md | Adds an unreleased entry documenting the scaffold packages. |

Files not reviewed (1)

pnpm-lock.yaml: Language not supported


+    try {
+      const raw = execSync(
+        `gbrain search --json --query=${JSON.stringify(_query)} --limit=${k}`,
+        { timeout: GBRAIN_TIMEOUT_MS, encoding: 'utf8', stdio: ['ignore', 'pipe', 'ignore'] },
+      );


+    }
+  }
+
+  capabilities(): Set<MemoryCapability> {


+  // ── Write — best-effort dual-write ───────────────────────────────
+
+  async recordSignal(s: RawSignal): Promise<void> {
+    await this.primary.recordSignal(s);
+    await this.bestEffortSecondary('recordSignal', () => this.secondary.recordSignal(s));
+  }
+
+  async recordEntity(e: KnowledgeEntity): Promise<void> {
+    await this.primary.recordEntity(e);
+    await this.bestEffortSecondary('recordEntity', () => this.secondary.recordEntity(e));
+  }
+
+  async recordTriple(t: KnowledgeTriple): Promise<void> {
+    await this.primary.recordTriple(t);
+    await this.bestEffortSecondary('recordTriple', () => this.secondary.recordTriple(t));
+  }
+
+  async recordEpisode(e: Episode): Promise<void> {
+    await this.primary.recordEpisode(e);
+    await this.bestEffortSecondary('recordEpisode', () => this.secondary.recordEpisode(e));


+/**
+ * Per-method routing overrides. Each key is a MemoryPort method name. A value
+ * of 'primary' routes to the primary port; 'secondary' routes to the
+ * secondary port. Missing keys fall back to the defaults.
+ */
+export interface RoutingRules {
+  searchSemantic?: 'primary' | 'secondary';
+  code_aware_search?: 'primary' | 'secondary';
+  walkGraph?: 'primary' | 'secondary';
+  getEpisodes?: 'primary' | 'secondary';
+  getTriples?: 'primary' | 'secondary';
+  summarize?: 'primary' | 'secondary';
+  compress?: 'primary' | 'secondary';
+}


+ *   2. Whether the primary port declares the relevant capability, OR
+ *   3. The DEFAULT_ROUTING table (primary for semantic/code-aware, secondary
+ *      for graph/episodic/triple/summarize/compress).
+ *


+  /**
+   * Run a secondary write in the background. Errors are logged and swallowed
+   * so they never interfere with the primary write result.
+   */
+  private async bestEffortSecondary(
+    operation: string,
+    fn: () => Promise<void>,
+  ): Promise<void> {
+    try {
+      await fn();
+    } catch (err: unknown) {


+    "@skytwin/memory-port": "workspace:*",
+    "@skytwin/shared-types": "workspace:*"


… epics

Audited Copilot review comments across PRs #198, #206-#215, #218, #219, Themed follow-ups will land separately.

Security
- Shell injection in @skytwin/memory-gbrain searchSemantic (#215): switch from execSync with shell to execFileSync (no shell), so query metacharacters cannot inject. Added regression test.
- redactPII / redactPayload skipped arrays, so PII in array-of-object payloads leaked to provenance and API responses (#209, #210, #211). Both helpers now recurse arrays. Tests updated (one previously asserted the bug as expected behavior).
- /api/capabilities/suggestions spread the row, leaking raw evidence_sources alongside the redacted preview (#211). Switched to explicit field projection.
- Email-redaction regex [A-Z|a-z] matched literal | as TLD char (#211). Fixed to [A-Za-z].

Correctness
- Credential vault never engaged: only the worker creates DbTokenStore but never called setKeyCache, so at-rest encryption + lazy migration were dead weight (#212). Worker now owns a KeyCache and wires it; cross-process unlock IPC is a #212 follow-up.
- DXT routes broken under real auth: getUserId read req.user?.id but production middleware sets req.authenticatedUserId (#219). Switched field; tests updated to mirror production middleware.
- Twin MCP provenance only fired on success (#209). Wrapped each tool handler in try/finally so audit fires for both success and failure; provenance failures never mask the underlying tool result.
- Migration 027 used inline INDEX (...) WHERE inside CREATE TABLE, which CockroachDB does not accept (#198). Pulled the partial indexes out as standalone CREATE INDEX IF NOT EXISTS (idempotent).
- Onboarding hard-coded first_run_choice='about-me' at three call sites (#208). Track _wizardState.firstRunChoice on welcome-screen selection, read everywhere downstream.
- Briefing generator capped at LIMIT 500 silently dropped users past the cap (#206). Switched to a 500-row paged scan.
- Zero-trust mode helpers exist but aren't wired into the decision pipeline (#222). Updated CHANGELOG and capability-detail copy to be honest; #222 follow-up tracks the wiring.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
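
To illustrate two of the security fixes listed above (redaction that recurses into arrays, and the corrected email character class), here is a small sketch. It is not the project's `redactPII` / `redactPayload` code: the function body, the regex, and the `[redacted-email]` placeholder are all assumptions made for the example.

```typescript
// Assumed email pattern for this sketch; the TLD class is [A-Za-z], not [A-Z|a-z],
// because a `|` inside a character class is a literal pipe, not alternation.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function redactPayload(value: unknown): unknown {
  if (typeof value === 'string') {
    return value.replace(EMAIL_RE, '[redacted-email]');
  }
  if (Array.isArray(value)) {
    // The class of bug described above: skipping this branch leaks PII inside arrays.
    return value.map((item) => redactPayload(item));
  }
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = redactPayload(inner);
    }
    return out;
  }
  return value; // numbers, booleans, null, undefined pass through unchanged
}
```

For instance, `redactPayload({ evidence: [{ from: 'cfo@example.com' }] })` returns a structure with the address replaced at every depth, including inside the array.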

… epics (#226) * fix(copilot-sweep batch 1): security + correctness across this week's epics Audited Copilot review comments across PRs #198, #206-#215, #218, #219, Themed follow-ups will land separately. Security - Shell injection in @skytwin/memory-gbrain searchSemantic (#215): switch from execSync with shell to execFileSync (no shell), so query metacharacters cannot inject. Added regression test. - redactPII / redactPayload skipped arrays — PII in array-of-object payloads leaked to provenance and API responses (#209, #210, #211). Both helpers now recurse arrays. Tests updated (one previously asserted the bug as expected behavior). - /api/capabilities/suggestions spread the row, leaking raw evidence_sources alongside the redacted preview (#211). Switched to explicit field projection. - Email-redaction regex [A-Z|a-z] matched literal | as TLD char (#211). Fixed to [A-Za-z]. Correctness - Credential vault never engaged: only the worker creates DbTokenStore but never called setKeyCache, so at-rest encryption + lazy migration were dead weight (#212). Worker now owns a KeyCache and wires it; cross-process unlock IPC is a #212 follow-up. - DXT routes broken under real auth: getUserId read req.user?.id but production middleware sets req.authenticatedUserId (#219). Switched field; tests updated to mirror production middleware. - Twin MCP provenance only fired on success (#209). Wrapped each tool handler in try/finally so audit fires for both success and failure; provenance failures never mask the underlying tool result. - Migration 027 used inline INDEX (...) WHERE inside CREATE TABLE, which CockroachDB does not accept (#198). Pulled the partial indexes out as standalone CREATE INDEX IF NOT EXISTS — idempotent. - Onboarding hard-coded first_run_choice='about-me' at three call sites (#208). Track _wizardState.firstRunChoice on welcome-screen selection, read everywhere downstream. - Briefing generator capped at LIMIT 500 silently dropped users past the cap (#206). 
Switched to a 500-row paged scan. - Zero-trust mode helpers exist but aren't wired into the decision pipeline (#222). Updated CHANGELOG and capability-detail copy to be honest; #222 follow-up tracks the wiring. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(copilot-sweep batch 1, post-/review): address Copilot inline comments on #226 Per the new merge gate (CLAUDE.md), every Copilot inline comment is addressed before merge. Five comments on #226, all valid: - Migration 027 idempotency: if the inline `INDEX (...) WHERE ...` form ever ran successfully (CRDB version-dependent), it would have created an auto-named partial index. Added defensive `DROP INDEX IF EXISTS <table>@<auto-name>` for both mcp_servers_last_active_at_idx and app_suggestions_user_id_idx so a re-run doesn't leave duplicate partial indexes covering the same predicate. - briefing-generator: switched OFFSET pagination to keyset pagination (`AND user_id > $last`). OFFSET pagination on DISTINCT scales quadratically with user count — the briefing job's runtime would grow unboundedly. Keyset stays linear and the (user_id, status) index serves the range scan directly. - DXT route docstring: removed the "other route modules use the same order" claim — it was misleading (other routes still vary in precedence). Now just notes a shared-helper #226 follow-up if the ordering proves load-bearing. - gbrain-port test: renamed `mockExecSync` → `mockExecFileSync` so the variable name matches the API under test (execFileSync). Tests: all 421 api + 48 worker + 20 memory-gbrain tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(CHANGELOG): add post-/review fixes subsection for #226 Per CLAUDE.md "Review Discipline": post-/review fixes get their own commits AND a CHANGELOG subsection so the audit trail of "what review caught" is visible to future readers. 
VERSION not bumped — this batch lands as part of the stacked sweep (#226 → #232); a single VERSION bump will follow when the full chain merges, to avoid CHANGELOG/VERSION conflicts on each cascade rebase. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
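
The OFFSET-to-keyset change described in the post-/review commit above can be illustrated with an in-memory stand-in for the query. Everything here is a sketch under assumptions: `scanAllUserIds`, `FetchPage`, and `makeInMemoryFetch` are hypothetical names, the fetch is synchronous for brevity (a real database call would be async), and the empty string is assumed to sort before every real user id.

```typescript
// Stand-in for a query shaped like:
//   SELECT DISTINCT user_id ... WHERE user_id > $last ORDER BY user_id LIMIT $limit
type FetchPage = (lastUserId: string, limit: number) => string[];

function scanAllUserIds(fetchPage: FetchPage, pageSize = 500): string[] {
  const seen: string[] = [];
  let last = ''; // assumption: '' sorts before every real user id
  for (;;) {
    const page = fetchPage(last, pageSize);
    seen.push(...page);
    const tail = page[page.length - 1];
    // A short (or empty) page means the range scan reached the end.
    if (page.length < pageSize || tail === undefined) return seen;
    last = tail; // resume strictly after the last key seen, no OFFSET needed
  }
}

// In-memory fake used only to exercise the loop.
function makeInMemoryFetch(sortedIds: string[]): FetchPage {
  return (last, limit) => sortedIds.filter((id) => id > last).slice(0, limit);
}
```

Each page request carries only the last key seen, so an indexed database can seek straight to the next range instead of re-reading and discarding the skipped rows as OFFSET does.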

…efault) Promotes @skytwin/memory-gbrain from a CLI-shellout skeleton (PR #215) to a real, in-process, CockroachDB-backed memory layer. Default MemoryPort for new installs is now gbrain — vector embeddings + tsvector full-text search fused via Reciprocal Rank Fusion. No separate Postgres process, no external CLI install — gbrain runs against the SkyTwin DB stack directly. Per user direction: gbrain is the default, mempalace is the second option, and everything works against CRDB where possible. Ships: - 040-gbrain-memory.sql: brain_pages (FLOAT8[] embedding + TSVECTOR with inverted index), brain_entities, brain_triples, brain_episodes, brain_signals, brain_settings, brain_embedding_jobs (FOR UPDATE SKIP LOCKED queue). - @skytwin/memory-gbrain-crdb-adapter (NEW): repository.ts (CRDB-backed + hybridSearch), in-memory-repository.ts (test-friendly mirror), embedding.ts (HashEmbeddingProvider deterministic fallback + OpenAiEmbeddingProvider for any /v1/embeddings endpoint), rrf.ts. - @skytwin/memory-gbrain: EmbeddedGbrainMemoryPort with the full MemoryPort surface (semantic_search, code_aware_search, temporal_triples, episodic, graph_walk); searchCodeAware boost; hasExternalGbrainConfig() detection. - @skytwin/memory-hybrid: diagnostics counters + capability-aware fallback. - apps/api/src/memory-setup.ts: per-user backend factory (default 'gbrain'; MEMORY_BACKEND env override; per-user brain_settings.backend wins). - apps/api/src/routes/memory-config.ts: GET/POST /api/memory-config, /dismiss-notification, /diagnostics. - apps/web memory-settings page with the "your twin got smarter" notice. - docs/memory-swap.md: backends-at-a-glance, env knobs, rollback path. Tests: 145+ new (49 CRDB adapter + 50 memory-gbrain + 9 hybrid diagnostics + 21 api memory-setup/routes + 6 DB-gated integration). Full suite: 70/70 turbo tasks pass. Closes #197. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
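
For readers unfamiliar with Reciprocal Rank Fusion, the fusion step mentioned above can be sketched as below. This is a generic RRF illustration, not the contents of `rrf.ts`: the function name and signature are hypothetical, and the constant `k = 60` is a commonly used default assumed here because the commit message does not state the value.

```typescript
// Each ranking is a list of document ids, best match first.
// A document's fused score is the sum over rankings of 1 / (k + rank), with rank starting at 1.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Example: a vector ranking and a full-text ranking over the same corpus.
const vectorRanking = ['a', 'b', 'c'];
const textRanking = ['b', 'd', 'a'];
const fused = reciprocalRankFusion([vectorRanking, textRanking]);
// fused is ['b', 'a', 'd', 'c']: documents found by both searches rise to the top.
```

Because only ranks are used, the two searches never need comparable score scales, which is why RRF suits mixing cosine similarity with full-text rank.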

…efault) (#250) * feat(#197): gbrain memory backend + CRDB adapter + hybrid composer (default) Promotes @skytwin/memory-gbrain from a CLI-shellout skeleton (PR #215) to a real, in-process, CockroachDB-backed memory layer. Default MemoryPort for new installs is now gbrain — vector embeddings + tsvector full-text search fused via Reciprocal Rank Fusion. No separate Postgres process, no external CLI install — gbrain runs against the SkyTwin DB stack directly. Per user direction: gbrain is the default, mempalace is the second option, and everything works against CRDB where possible. Ships: - 040-gbrain-memory.sql: brain_pages (FLOAT8[] embedding + TSVECTOR with inverted index), brain_entities, brain_triples, brain_episodes, brain_signals, brain_settings, brain_embedding_jobs (FOR UPDATE SKIP LOCKED queue). - @skytwin/memory-gbrain-crdb-adapter (NEW): repository.ts (CRDB-backed + hybridSearch), in-memory-repository.ts (test-friendly mirror), embedding.ts (HashEmbeddingProvider deterministic fallback + OpenAiEmbeddingProvider for any /v1/embeddings endpoint), rrf.ts. - @skytwin/memory-gbrain: EmbeddedGbrainMemoryPort with the full MemoryPort surface (semantic_search, code_aware_search, temporal_triples, episodic, graph_walk); searchCodeAware boost; hasExternalGbrainConfig() detection. - @skytwin/memory-hybrid: diagnostics counters + capability-aware fallback. - apps/api/src/memory-setup.ts: per-user backend factory (default 'gbrain'; MEMORY_BACKEND env override; per-user brain_settings.backend wins). - apps/api/src/routes/memory-config.ts: GET/POST /api/memory-config, /dismiss-notification, /diagnostics. - apps/web memory-settings page with the "your twin got smarter" notice. - docs/memory-swap.md: backends-at-a-glance, env knobs, rollback path. Tests: 145+ new (49 CRDB adapter + 50 memory-gbrain + 9 hybrid diagnostics + 21 api memory-setup/routes + 6 DB-gated integration). Full suite: 70/70 turbo tasks pass. Closes #197. 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(#197 post-/review): brain_settings default = 'gbrain' (was 'hybrid') /review caught a real bug: the migration's `brain_settings.backend DEFAULT 'hybrid'` disagreed with the factory's `'gbrain'` default in `apps/api/src/memory-setup.ts`. Failure mode: a fresh user (no brain_settings row) hitting POST /api/memory-config/dismiss-notification triggered upsertSettings({hybrid_notification_dismissed:true}). The COALESCE in INSERT defaulted backend → 'hybrid' even though the factory considers a missing row to mean 'gbrain'. Result: dismissing the notification silently flipped the user's backend. Fixed in three places (must stay in sync — comment links the others): - packages/db/src/migrations/040-gbrain-memory.sql: column DEFAULT 'gbrain' - packages/memory-gbrain-crdb-adapter/src/repository.ts: upsertSettings COALESCE 'gbrain' - packages/memory-gbrain-crdb-adapter/src/in-memory-repository.ts: upsertSettings fallback 'gbrain' Plus a regression test on the in-memory store. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(#197): persona-driven E2E + realistic-corpus + robustness + migration Adds 50+ tests that drive the gbrain memory layer with realistic data and edge cases — addressing the user's "deeply test this with realish examples" request. The point isn't unit coverage (we already had that); it's "does the system actually build a profile when fed real-life data?" New test files: - realistic-corpus.ts fixture: ~30 labeled signals (Gmail, calendar, notes, code, chat) modelled after a real twin's first month, plus deterministic noise generators to scale to 500. - realistic-retrieval.test.ts: R@5/P@5 floor with labeled relevance, hybrid-vs-text-only ablation, multi-user isolation under load (6 users, 500 signals each, no cross-talk). 
- persona-sam-patel.ts fixture: a 6-week storyline for a Series A founder (fundraise prep → VC meetings → term sheet → hiring loops → close → vacation), with tagged signals + entities + triples + episodes. - persona-simulation.test.ts: drives the full storyline end-to-end and checks every load-bearing twin behaviour: entity recognition, graph walks (Mahesh → Anchor VC → Beacon Series A), triple filters, time-bounded episode lookup, semantic search on natural-language founder questions, profile summarisation, full export → import round trip with answer parity, and week-by-week incremental emergence. - concurrent-worker.test.ts: 200 parallel recordSignal calls; failed embeddings get queued; worker drains the queue with FOR UPDATE SKIP LOCKED semantics; failed jobs exhaust retries and stop blocking. - migration.test.ts: mempalace-flavoured export → gbrain importAll → imported content is searchable; idempotent re-import skips dupes; export → import → re-export histogram parity. - robustness.test.ts: every degraded mode — embedding throws / times out / returns junk, queries empty / oversize / punctuation-only, mixed-dim vector corpus (model migration), pages with null embedding, OpenAI HTTP abort/timeout, multi-tenant safety under partial failure. - memory-config-roundtrip.test.ts: real Express + real factory + real EmbeddedGbrainMemoryPort + real HybridMemoryPort end-to-end. Stubs only the @skytwin/db query layer. Verifies the dismiss-notification fix (default backend STAYS gbrain on a fresh user). Test totals: 86 memory-gbrain (was 18) + 50 CRDB adapter + 19 hybrid + 26 api = 181 tests across the new memory subsystem. Full suite: 70/70 turbo tasks pass. Honest about hash-trick limits: the persona test asserts ≥80% recall across founder questions rather than 100%, because the deterministic fallback embedding is intentionally weak. 
With OpenAI text-embedding-3-small the same test suite runs at materially higher recall — but the floor here catches retrieval-pipeline regressions without flaking on embedding quality. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test(#197): full E2E fake-user → DecisionMaker — surfaces real safety finding Constructs a complete fake user (Bob Patel, Series A SaaS founder) with realistic preferences, behavioural patterns, traits, and trust tier MODERATE_AUTONOMY. Wires the actual DecisionMaker + TwinService + PolicyEvaluator against in-memory ports and feeds a realistic inbox through the pipeline. This is the user's "would the system actually do the email" check. Surfaces a real finding: with the rule-based fallback CandidateGenerator, the DecisionMaker auto-archives BOARD CHAIR and CFO emails because the candidates are content-blind — `archive_email` is generated for every EMAIL_TRIAGE situation regardless of sender. Bob's high-confidence preference "board threads always require approval" doesn't gate the candidate; it just informs scoring. Result at MODERATE_AUTONOMY: [AUTO-EXECUTE ] archive_email — Stratechery newsletter ← right [AUTO-EXECUTE ] archive_email — Board chair: May meeting ← WRONG [AUTO-EXECUTE ] archive_email — CFO: Q2 forecast review ← WRONG [NEEDS APPROVAL] accept_invite — Eng leadership 1:1 ← right [AUTO-EXECUTE ] snooze_reminder — Adobe Creative Cloud ← right [AUTO-EXECUTE ] escalate_to_user — Friendly check-in ← right Production safeguards against this: 1. OBSERVER / SUGGEST trust tier always gates everything (test asserts). 2. A sender-aware `CandidateGenerator` reads sender + content and produces an irreversible `flag_for_manual_review` candidate for protected senders. The included `protectiveGenerator` demonstrates this — same shape as the LLM strategy that runs in production. 
With the protective generator wired in:

[AUTO-EXECUTE ] archive_email — Stratechery newsletter
[NEEDS APPROVAL] flag_for_manual_review — Board chair
[NEEDS APPROVAL] flag_for_manual_review — CFO

16 tests across two describe blocks. Also exports LabelInferencePort and SenderLabelHint from @skytwin/decision-engine so tests can build the custom Gmail-history-aware label hint port (#122).

Full suite: 70/70 turbo tasks pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(#197): SenderAwareCandidateGenerator + memory-enriched DecisionContext

Closes the safety gap surfaced by the fake-user E2E: the rule-based candidate generator was content-blind, so at MODERATE_AUTONOMY the twin auto-archived board chair / CFO / legal emails the same way it auto-archived newsletters. This patch lands two production wirings:

1. New @skytwin/decision-engine export `SenderAwareCandidateGenerator`, a CandidateGenerator that wraps the rule-based generator with a pre-pass on `decision.rawData.from` and decision content. When the sender or subject matches a protected pattern (board/chair/cfo/coo/ceo/founder/partner/investor/legal/counsel/attorney/sec/audit/compliance/tax) or content mentions a protected topic (term sheet/wire transfer/signed/nda/equity/cap table/board deck/earnings/payroll), the generator SUPPRESSES the rule-based candidate set entirely and emits ONLY a `flag_for_manual_review` candidate (irreversible, CONFIDENCE: CONFIRMED). The built-in policy NO_IRREVERSIBLE_WITHOUT_APPROVAL gates this through the approval queue at every trust tier.

Suppressing the base set (rather than just prepending the flag) is load-bearing: if archive_email is in the candidate list it scores higher than flag (lower risk because reversible) and would auto-execute anyway, which is the very bug we are fixing.

Configurable via `protectedPattern` and `protectedSubjectPattern` constructor options; defaults match common corporate email surface area.

2. 
Wired SenderAwareCandidateGenerator into events.ts as the rule-based fallback. Used both: - directly as the DecisionMaker's CandidateGenerator when no LLM client is configured - as the inner RuleBasedCandidateGenerator that LLM strategies fall back to when LLM calls fail This means the safety improvement applies both to users without LLM keys (rule-based by default) and to LLM users when their LLM call fails — there is no path through events.ts that auto-archives a board email at MODERATE_AUTONOMY+. 3. Wired episodicMemories into DecisionContext. mempalaceRepository .getEpisodes is fetched in parallel with patterns/traits/temporal profile, mapped onto the EpisodicMemory shape, and passed to DecisionMaker.evaluate. The existing scoreCandidate boost (decision-maker.ts:1285+) consumes this field to weight candidates that match historically-positive past decisions. Closes the "twin's memory of past decisions affects current decisions" loop that was structurally present (the field existed) but unwired. Tests: - 12 unit tests for SenderAwareCandidateGenerator covering: protected senders (board/CFO/legal/investor), protected subjects (term sheet, wire transfer, cap table), routine email passthrough, non-email situations passthrough, custom pattern overrides. - 3 integration tests for the events.ts wiring: board chair email selects flag_for_manual_review and does not auto-execute; routine newsletter selects archive/label; mempalaceRepository.getEpisodes is called with the right (userId, {domain, situationType, limit}). - Updated existing events-routes.test.ts mock to include SenderAwareCandidateGenerator + emailLabelRepository + mempalaceRepository. Full suite: 70/70 turbo tasks pass. 
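
The suppress-and-flag behaviour described for SenderAwareCandidateGenerator can be pictured with the simplified sketch below. It is not the production class: the `Email` and `Candidate` shapes, both function names, and the abbreviated regexes (small subsets of the patterns listed in the commit message) are assumptions for illustration.

```typescript
interface Email {
  from: string;
  subject: string;
}

interface Candidate {
  action: string;
  reversible: boolean;
}

// Abbreviated, assumed subsets of the protected patterns named in the commit message.
const PROTECTED_SENDER = /\b(board|chair|cfo|ceo|legal|counsel|investor)\b/i;
const PROTECTED_TOPIC = /\b(term sheet|wire transfer|cap table|payroll)\b/i;

// Stand-in for the content-blind rule-based generator.
function ruleBasedCandidates(_email: Email): Candidate[] {
  return [{ action: 'archive_email', reversible: true }];
}

function senderAwareCandidates(email: Email): Candidate[] {
  const isProtected =
    PROTECTED_SENDER.test(email.from) ||
    PROTECTED_SENDER.test(email.subject) ||
    PROTECTED_TOPIC.test(email.subject);
  if (isProtected) {
    // Replace the base set entirely: if archive_email stayed in the list,
    // it could still win scoring and auto-execute.
    return [{ action: 'flag_for_manual_review', reversible: false }];
  }
  return ruleBasedCandidates(email);
}
```

A board-chair email yields only the irreversible flag candidate, which an approval-gating policy can then hold for the user, while a routine newsletter falls through to the rule-based set.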
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(#197): embedding-backfill worker — drains brain_embedding_jobs queue Required when an external embedding provider (OpenAI, Ollama, vLLM) is configured: the synchronous embed call inside recordSignal can fail (rate limit, network, timeout). The write path persists the page row unembedded and queues a job to brain_embedding_jobs. Without this worker the queue never drains and search recall silently degrades — pages exist in tsvector index but not the vector index, so RRF gives them only the text-rank contribution. What ships: - apps/worker/src/jobs/embedding-backfill.ts: - `runEmbeddingBackfillJob({ batchSize, embedding })` — single-cycle drain. Leases up to batchSize jobs via SELECT FOR UPDATE SKIP LOCKED, embeds, persists, marks done. Failed jobs go through markJobFailed which auto-retries up to 3 times (the brain_embedding_jobs CHECK constraint flips status to 'failed' on the 4th attempt). - `getWorkerEmbeddingProvider()` — env-driven provider selection that mirrors `apps/api/src/memory-setup.ts` exactly. Same selection logic on both sides is load-bearing: if API embeds rows with OpenAI but worker embeds with hash-trick, cosine across them collapses. - Returns a structured `EmbeddingBackfillSummary` with attempted / succeeded / failed / pendingAfter counters that the worker loop logs on each non-empty cycle. - apps/worker/src/index.ts: scheduled at 30s intervals alongside the existing metrics-rollup / changelog-poll / domain-extraction / federation-sync jobs. SKIP LOCKED makes it safe under multiple worker instances simultaneously. 
Tests: 12 cases in apps/worker/src/__tests__/embedding-backfill.test.ts: - happy path (drains queue, marks each done, respects batchSize) - failure handling (embedding throws → markJobFailed; lease throws → cycle stops cleanly; markJobFailed itself throws → run continues) - pendingAfter from DB and graceful pending-query failure - env-driven provider choice (hash default, OpenAI when key set, OPENAI_EMBEDDING_MODEL override, fallback to OPENAI_API_KEY) Full suite: 70/70 turbo tasks pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(#197): feedback → episode loop closes the learning circle When a user approves or rejects an action via POST /api/approvals/:id/respond, also persist an Episode into the memory layer. The next time a similar decision is evaluated, mempalaceRepository.getEpisodes (already wired in events.ts as part of this PR) pulls that episode into DecisionContext.episodicMemories. DecisionMaker.calculateEpisodicBoost (decision-maker.ts:1285+) consumes the episode's `actionTaken` + `utilityScore` to tilt the candidate score: - approve → utility 0.9 → next time the same candidate appears it gets a positive boost on score, making auto-execute more likely. - reject → utility 0.0 → next time the same candidate's score gets no boost (and other candidates with non-zero utility from past approvals leapfrog it). This closes the loop on the memory architecture: the twin's memory of what the user *actually decided* feeds back into the next decision, without any manual preference editing. The previous behaviour was that approvals only updated the TwinService preferences (which influence candidate confidence); episodes are a different signal — they record the SPECIFIC action that won, not just the user's domain-level pref. 
Implementation:
- apps/api/src/routes/approvals.ts: after `processFeedback` returns and before the (optional) execution branch, look up the originating decision and call `mempalaceRepository.createEpisode` with the approval outcome. Wrapped in a try/catch: episode persistence is best-effort and never blocks the approval response on a memory-layer hiccup.
- The episode shape carries the full breadcrumb: userId, situationSummary (from the decision's interpreted summary, with a synthetic fallback), domain, situationType, actionTaken (from the candidate that the user approved/rejected), feedbackType, feedbackDetail (the user's reason), decisionId (so callers can join back), and utilityScore.

Tests: 4 cases in apps/api/src/__tests__/feedback-loop.test.ts:
- approve path → createEpisode called with utility 0.9
- reject path → createEpisode called with utility 0.0 + reason text
- createEpisode throws → approval still returns 200 (best-effort)
- decision row missing interpreted summary → synthetic fallback

Full suite: 70/70 turbo tasks pass; api 535 / worker 83 / memory-gbrain 86.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(#197): gbrain MemoryPort writes on ingest + corrections E2E

Two improvements that close the remaining loop in the gbrain memory layer:

1. **events.ts now writes inbound signals to gbrain.** Previously, every inbound event landed in the legacy `signals` / `decisions` tables but `brain_pages` stayed empty in production, meaning searchSemantic returned nothing, even with the gbrain backend explicitly selected. The new `recordSignalToMemory` helper calls `getMemoryPortForUser(userId).port.recordSignal(...)` on every ingest (fire-and-forget, so memory hiccups don't block the decision pipeline).
2. **approvals.ts now writes the resulting episode to gbrain too.** The prior commit added the legacy mempalaceRepository.createEpisode call; this layer adds a parallel `port.recordEpisode` so the gbrain backend's semantic index covers approved/rejected outcomes. Future similar signals' searchSemantic queries surface these episodes directly.

Tests:
- apps/api/src/__tests__/gbrain-write-on-events.test.ts: real Express round-trip with a stubbed @skytwin/db query layer; asserts that an inbound /api/events/ingest results in INSERT INTO brain_pages firing via the MemoryPort path (not just brain_signals).
- packages/decision-engine/src/__tests__/twin-learns-from-corrections.test.ts: 5 cases proving DecisionMaker.calculateEpisodicBoost actually shifts outcomes when episodicMemories carry feedback:
  * baseline (no memory) selects deterministically
  * rejection episode does not improve the rejected action's rank
  * heavy rejections cannot improve the rejected action's rank
  * approval reinforcement keeps the approved winner
  * memory only matters when episode.actionTaken matches the candidate

These tests run the REAL DecisionMaker.evaluate against in-memory TwinService + PolicyEvaluator ports, so the assertions exercise the exact production scoring code path, not a mock.

Full suite: 70/70 turbo tasks pass; api 536, decision-engine 109.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(#197): assistant chat uses MemoryPort.searchSemantic alongside legacy ILIKE

Wires the chat assistant's `MemoryContextProvider` to call the user's selected MemoryPort (`getMemoryPortForUser`) in parallel with the legacy `mempalaceRepository.searchEpisodes` ILIKE path, dedupes by summary, and returns the merged top-K. This means chat answers automatically benefit from gbrain's vector + tsvector RRF retrieval when the gbrain backend has indexed pages, without losing the cold-install behavior where mempalace's ILIKE returns recent episodes immediately.
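A minimal sketch of that parallel merge, assuming both retrieval paths can be wrapped as zero-argument async functions. `MemoryHit`, `dedupeBySummary`, and `mergedSearch` are hypothetical names for illustration, not the real `MemoryContextProvider` API. The dedupe key is the lowercased, trimmed summary, and a side that rejects simply contributes no hits.

```typescript
// Hypothetical hit shape; the real provider returns richer objects.
interface MemoryHit {
  summary: string;
  source: 'semantic' | 'ilike';
}

// Merge result lists in order, keeping the first hit for each summary
// (compared case-insensitively), then truncate to topK.
function dedupeBySummary(lists: MemoryHit[][], topK: number): MemoryHit[] {
  const seen = new Set<string>();
  const merged: MemoryHit[] = [];
  for (const list of lists) {
    for (const hit of list) {
      const key = hit.summary.trim().toLowerCase();
      if (seen.has(key)) continue;
      seen.add(key);
      merged.push(hit);
    }
  }
  return merged.slice(0, topK);
}

// Run both searches concurrently. Promise.allSettled never rejects, so a
// failure on one side degrades to an empty list instead of failing the call.
async function mergedSearch(
  semantic: () => Promise<MemoryHit[]>,
  ilike: () => Promise<MemoryHit[]>,
  topK: number,
): Promise<MemoryHit[]> {
  const settled = await Promise.allSettled([semantic(), ilike()]);
  const lists = settled.map((r) => (r.status === 'fulfilled' ? r.value : []));
  return dedupeBySummary(lists, topK);
}
```

Listing the semantic side first means that when the same episode surfaces from both sources, the semantic hit is the one kept; that ordering is a choice made for this sketch.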
Why both:
- Hot install with gbrain: the semantic side surfaces vector-relevant pages the ILIKE keyword search would miss (e.g. "what did the CFO say?" returns CFO threads even when the user didn't type the literal word "CFO" in their question). Mempalace ILIKE then catches anything in the legacy table that hasn't been re-indexed yet.
- Cold install: brain_pages is empty so semantic returns []. The mempalace path serves chat answers without a wait for the worker to backfill embeddings.
- Both run in parallel; the slower of the two does not gate the chat response. Per-side errors are caller-swallowed.

The dedupe is by lowercased summary text: the same episode often surfaces from both sources, especially after `recordEpisode` has dual-written it.

Full suite: 70/70 turbo tasks pass; api 536 passing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(#197): full-loop E2E: signal → approval → next signal carries the episode

Drives the entire memory-feedback loop through real Express route handlers:

POST /api/events/ingest (board chair email, sender-aware path)
  → flag_for_manual_review candidate, autoExecute=false, approval created
POST /api/approvals/:id/respond (user rejects)
  → mempalaceRepository.createEpisode called with utility 0.0, feedback_type='reject', action_taken='flag_for_manual_review'
  → episodeStore now has the rejection row
POST /api/events/ingest (similar board email)
  → mempalaceRepository.getEpisodes called, returns the rejection episode
  → DecisionContext.episodicMemories carries it
  → DecisionMaker.calculateEpisodicBoost weighs it

This proves the wiring is intact across all three route handlers and the DB-backed memory store. The unit-level proof that boost actually shifts scoring lives in packages/decision-engine/src/__tests__/twin-learns-from-corrections.test.ts.

Full suite: 70/70 turbo tasks pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(#197): memory dashboard: show users what their twin remembers

Closes the "show me the goods" gap: until now, all the memory infrastructure was invisible to users. The dashboard surface makes the value visible.

Ships:

1. **`GET /api/memory-config/dashboard`**: operator + user-facing view:
   - `index`: total pages, embedded pages, pending embedding jobs
   - `episodes.recent[]`: last 10 episodes (summary, action, feedback)
   - `episodes.feedbackCounts`: histogram (approve / reject / undo / pending)
   - `entities.total`, `topByRecency` (last 10), `topByType` (top 5)

   Each query is independently failure-handled via .catch(() => default), so a partial DB hiccup degrades gracefully rather than 500ing the whole dashboard.
2. **`apps/web/public/js/pages/memory-settings.js`**: new "What your twin remembers" card under the existing backend selector:
   - Recent decisions table with timestamps, action, feedback badge, and the situation summary.
   - Feedback count strip (✓ approved, ✗ rejected, etc.)
   - Top entities by recency + entity-type histogram.
   - All three dashboard / config / diagnostics endpoints fetched in parallel for snappy load.

Tests: 5 new cases in memory-config-routes.test.ts covering:
- 400 on invalid userId
- empty-state shape
- feedback counts aggregated correctly
- top entities sorted by recency, type histogram by count
- partial DB failure → graceful degraded response

Full suite: 70/70 turbo tasks pass; api 541 / 558 (added 5).

This makes the gbrain memory layer's value legible to the user: they can see entities accumulating, episodes recording approve/reject signals, embeddings backfilling. The "twin learns" loop is now visible end-to-end.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(#197 post-/review): address Copilot findings + real MemPalaceMemoryPort

Addresses every finding from Copilot's PR #250 review plus the merge conflict with main.

Blockers fixed:

1. **Migration PK types**: brain_pages/brain_entities/brain_triples/brain_episodes/brain_signals all had `id UUID PRIMARY KEY`. Production signal IDs are not UUIDs (e.g. `sig_gmail_abc123`); they're connector-assigned opaque strings. Forcing UUID would 500 every recordSignal in prod. Changed to `id STRING PRIMARY KEY DEFAULT gen_random_uuid()::STRING`. brain_settings.user_id stays UUID (real FK to users) and brain_embedding_jobs.id stays UUID (internal-only).
2. **StubMempalacePort replaced**: selecting `mempalace` (or relying on hybrid secondary) used to drop all legacy mempalace data on the floor. Now wires a real `MemPalaceMemoryPort` with a proper `MemPalaceRepos` adapter against `mempalaceRepository`. Covers knowledgeGraph (upsertEntity/getEntities/findEntity/addTriple/queryTriples/invalidateTriple) and episode (createEpisode/getEpisodes/getEpisodeByDecision/updateEpisode/searchEpisodes). Palace / closet / entityCode methods throw (they're never reached via MemoryPort, but throwing makes any future regression loud).

Other bugs Copilot flagged:

3. **`pendingEmbeddingJobs` per-user**: the dashboard was showing the global queue depth instead of the user's. Added optional `userId` parameter; defaults to global for the worker drain telemetry but the API route now passes userId so the dashboard reports the right number in multi-tenant installs.
4. **`candidatePoolSize` computed per-query**: docstring promised `max(k*4, 40)` but constructor hard-coded 40, truncating recall on large-K queries. Store the user override as a sentinel and apply the max-based default in `searchInternal`.
5. **In-memory `embeddingModel` parity**: when `embed()` rejected in the in-memory path, we still set `embeddingModel: this.embedding.model`, leaving pages with non-null model + null embedding. The CRDB path conditionally sets only when embedding succeeded. Matched both paths.
6. **`event.target.closest` guard**: memory-settings click delegator could throw on text-node clicks. Guard with `instanceof Element` per CLAUDE.md frontend event-handling discipline.
7. **`getEntitiesByType` routed through `resolveReadPort`**: was hard-wired to secondary, sending entity reads to the secondary even when the primary (gbrain) could serve them. Added a routing rule defaulting to primary; fallback still kicks in when capability is absent.
8. **docs/memory-swap.md capability table**: claimed `mempalace` had no semantic_search; the real `MemPalaceMemoryPort` declares it (backed by ILIKE). Updated to show `ILIKE` in the cell + a note explaining when to prefer each backend.

Plus a rebase onto main (#248 first-run dashboard merged in between). The conflict was in CHANGELOG.md; both entries are now stacked under unreleased.

Full suite: 70/70 turbo tasks pass; api 542, decision-engine 109, memory-gbrain 86.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(#197): /document-release sync: arch-philosophy, cockroach-architecture, technical-spec

Post-ship documentation update for the gbrain memory backend ship.
- docs/architecture-philosophy.md: memory port row updated to reflect gbrain (default, CRDB-native) + mempalace (selectable fallback). The "interim" framing was obsolete: gbrain is the default.
- docs/cockroach-architecture.md: added the 7 brain_* tables to the schema reference. Documented the STRING-PK choice (production signal ids aren't UUIDs; the table reflects that contract).
- docs/technical-spec.md: package layout shows the 5 new memory-* packages. Build dependency chain updated to include them in topological order.

Full suite: 70/70 turbo tasks pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(#197 polish): SSE live memory dashboard + Ollama recipe + CRDB harness

- SSE: events.ts emits `memory:page-indexed` after a successful `recordSignalToMemory`; approvals.ts emits `memory:episode-recorded` after `mempalaceRepository.createEpisode`. Web sse-client.js subscribes and dispatches `sse:memory:*` CustomEvents; memory-settings.js wires a module-singleton listener (1s debounce) that re-renders the dashboard without polling.
- Ollama recipe in docs/memory-swap.md: zero-cloud local embeddings via the OpenAI-compatible /v1/embeddings endpoint (nomic-embed-text default).
- CRDB integration harness: packages/memory-gbrain-crdb-adapter ships `scripts/run-crdb-integration.sh` (Docker-based) and a `test:crdb` package script. Spins a hermetic CRDB, applies migration 040, seeds a test user, and runs the 6 DB-gated integration tests.
- Tests: feedback-loop.test.ts now mocks `createEpisode` resolved value so the SSE emit path is reachable, plus an assertion that `memory:episode-recorded` is emitted. gbrain-write-on-events.test.ts gains a parallel assertion for `memory:page-indexed`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(#197): align memory-swap prose with corrected capability table

The bullet on lines 24-25 still said mempalace "declares no semantic_search capability". That was true at the start of #197 but MemPalaceMemoryPort.capabilities() now returns 'semantic_search' (ILIKE-backed). The table below already reflected this; the prose did not. Tightens the wording to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(#197 post-/review): harden CRDB integration harness

Three issues found by the /review pass on commits aa78f13..HEAD:

1. Migration was applied twice. The first apply ran against `skytwin_test` *before* the inlined `users` table existed there, so the brain_* FK references failed silently (psql -f exits 0 on per-statement errors without ON_ERROR_STOP). The second apply then worked because the tables already partially existed. Reordered to create-db → create-users-in-test-db → apply-migration-once.
2. Added `-v ON_ERROR_STOP=on` to every psql invocation so any future schema regression fails the harness loudly instead of being masked by `>/dev/null`.
3. The cockroach-ready wait loop completed silently after 30s even on total startup failure; now sets a `ready` flag and bails with the container's last 20 log lines if the DB never accepts connections.

Also tightened TEST_USER_ID parsing: `-A` unaligned output + tr against `[:space:]` instead of just ` \n`, plus an empty-result check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(#197 post-/document-release): restore Embedded LLM round-3 CHANGELOG entry

The "Embedded LLM downloader: round-3 review fixes (#187 AC#2 follow-up)" entry landed on main via PR #249 (commit c6e93de) after this branch last rebased. The branch did not pull it in, so squash-merging would have deleted it from main as a side effect. Restored verbatim from origin/main:CHANGELOG.md so the squash diff is purely additive.

Caught by the /document-release cross-doc consistency pass, exactly the kind of silent regression that motivated adding the pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(#197 post-/review r2): address Copilot round-2 findings

Five findings from Copilot's re-review on commit 50f297c:

1. **brain_embedding_jobs.page_id FK type mismatch (HIGH)**: column was declared `UUID` but `brain_pages.id` is `STRING`. CRDB would reject the FK at apply time, or accept it and reject any insert with a non-UUID page id (which is most signal-derived pages, `sig_gmail_abc123` etc.). Fixed in migration 040 to `page_id STRING` matching `brain_pages.id` exactly.
2. **assistant.ts dedupe comment was wrong**: outer comment said "dedupe by (summary, occurredAt)" but the implementation uses just summary. Updated the comment to reflect the actual logic and explain WHY occurredAt can't be in the key (gbrain hits never carry one; including it would defeat cross-source dedupe entirely).
3. **hybrid-port.ts resolveReadPort docstring drift**: claimed a 3-step priority (override → routing table → capability fallback) but the implementation collapses steps 1+2 (the override IS the routing table) and step 3 is the same as the capability fallback inside step 2. Rewrote the docstring to match the actual logic.
4. **.claude/scheduled_tasks.lock leaked into the PR**: runtime session lock metadata (sessionId / pid / ts) was getting committed on every session. `git rm --cached` to untrack, added to .gitignore so future sessions don't re-add it. This is technically a removal-from-main but is the right long-term shape.
5. **BrainPageRow.embedding type / parsePageRow runtime mismatch**: types.ts declared `number[] | null` but parsePageRow defensively checks `typeof === 'string'` for the pg array-literal case, which strict mode flagged as always-false. Introduced a `RawBrainPageRow` type with `embedding: number[] | string | null` for the raw DB shape, and parsePageRow narrows it to `BrainPageRow` (with `number[] | null`) for downstream consumers. No behaviour change.

Verified: pnpm --filter @skytwin/api test → 544 pass; memory-* tests → 155 pass. No regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
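The raw-versus-parsed row split from finding 5 can be illustrated with a small sketch. The type names `RawBrainPageRow` and `BrainPageRow`, the function name `parsePageRow`, and the `embedding` unions come from the commit message. Everything else here is an assumption: the `id` field is a stand-in for the real columns, `parseEmbedding` is a hypothetical helper, and the accepted string shapes (`[1,2]` or `{1,2}`) are a guess at what the driver might return.

```typescript
// Raw DB shape: the driver may hand back the embedding as a string literal.
interface RawBrainPageRow {
  id: string;
  embedding: number[] | string | null;
}

// Shape seen by downstream consumers: always a parsed array or null.
interface BrainPageRow {
  id: string;
  embedding: number[] | null;
}

// Hypothetical helper: accepts an array, a bracketed or braced literal, or null.
function parseEmbedding(raw: number[] | string | null): number[] | null {
  if (raw === null) return null;
  if (Array.isArray(raw)) return raw;
  // Strip one leading '[' or '{' and one trailing ']' or '}'.
  const inner = raw.trim().replace(/^[\[{]/, '').replace(/[\]}]$/, '');
  if (inner.trim() === '') return [];
  const values = inner
    .split(',')
    .map((part) => (part.trim() === '' ? NaN : Number(part)));
  // Treat any unparseable element as "no usable embedding".
  return values.every((v) => Number.isFinite(v)) ? values : null;
}

// Narrow the raw row to the consumer-facing type in one place.
function parsePageRow(raw: RawBrainPageRow): BrainPageRow {
  return { id: raw.id, embedding: parseEmbedding(raw.embedding) };
}
```

Because the `string` case now appears in the raw type, the `typeof === 'string'` style check is no longer an always-false comparison under strict mode, while consumers of `BrainPageRow` still only see `number[] | null`.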

Copilot AI review requested due to automatic review settings May 7, 2026 20:21

jayzalowitz added the enhancement (New feature or request) and capability-loop (Capability Acquisition Loop epic + children (OSS launch v1)) labels May 7, 2026

Copilot started reviewing on behalf of jayzalowitz May 7, 2026 20:22

Copilot AI reviewed May 7, 2026

jayzalowitz merged commit 086f979 into main May 7, 2026
12 checks passed

jayzalowitz mentioned this pull request May 8, 2026

fix(copilot-sweep batch 1): security + correctness across this week's epics #226

Merged

5 tasks

This was referenced May 9, 2026

Epic: Replace @skytwin/mempalace with GBrain — adopt the open-source production brain as SkyTwin's memory layer #138

Closed

feat(#197): gbrain memory backend + CRDB adapter + hybrid composer (default) #250

Merged

jayzalowitz merged 1 commit into main from feat/memory-gbrain-port

2 participants

		"@skytwin/memory-port": "workspace:*",
		"@skytwin/shared-types": "workspace:*"
