feat(#187 follow-up): first-run dashboard "needs a brain" prompt #248

Merged
jayzalowitz merged 4 commits into main from feat/dashboard-brain-prompt
May 9, 2026

Conversation

@jayzalowitz

Owner

Summary

A tiny but launch-critical UX gap: a brand-new user lands on the dashboard with no AI provider configured and an empty state that offers no path forward. They're left to discover Settings → AI brain on their own.

This PR surfaces a banner when zero AI providers are enabled AND zero decisions exist yet, the natural first-run state. Two CTAs route to Settings: "Set up the local brain" and "Or bring your own API key".

Privacy framing leads: "SkyTwin runs locally on your machine — no API keys, no per-message cost, your data never leaves the device. Pick up a free model in 5 minutes, or bring your own paid provider if you'd rather."

That's the real first-run differentiator vs. ChatGPT-style assistants. The dominant first-run fear is "what is this thing going to do with my data?" — leading with local-first answers it before they have to ask.

Why now

Once #247 (model downloader) lands, the local-brain path is fully self-serve in 5 minutes. But the dashboard didn't tell the user that path exists. This PR closes that gap and means the dashboard works as a first-run surface even before the user opens Settings.

Independent of #247 — this routes to Settings either way. Once #247 merges, the route lands on the model downloader card. Until then, it lands on the AI provider config (still self-serve, just paid).

Implementation

  • apps/web/public/js/pages/dashboard.js — new renderBrainPrompt(); gated by enabledProviderCount === 0 && recentDecisions.length === 0 and !tourMode (seeded demo user has providers pre-configured)
  • Reuses GET /api/settings/:userId via fetchSettings; no new API endpoint
  • One new dashboard card, slotted between the "since last visit" banner and the Connect Google hero

Test plan

  • pnpm --filter @skytwin/web build — clean
  • New user (no providers, no decisions, not in tour) — banner appears
  • Tour mode user — banner hidden
  • User with 1+ enabled provider — banner hidden
  • User with no providers but ≥1 decision — banner hidden (they got past first-run somehow)
  • CTAs route to #/settings

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 9, 2026 17:22

Copilot AI left a comment

Pull request overview

Adds a first-run onboarding banner to the web dashboard so brand-new users who have no enabled AI providers and no decisions yet get an obvious path to configure an “AI brain” via Settings.

Changes:

  • Add renderBrainPrompt() dashboard card with local-first privacy framing and two CTAs to #/settings.
  • Fetch user settings on the dashboard (reusing existing GET /api/settings/:userId) to detect whether any AI providers are enabled.
  • Document the UX change in CHANGELOG.md.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File | Description
CHANGELOG.md | Adds an unreleased entry describing the first-run “needs a brain” dashboard prompt.
apps/web/public/js/pages/dashboard.js | Fetches settings and conditionally renders a first-run banner when no providers are enabled and no decisions exist.

Comment thread apps/web/public/js/pages/dashboard.js Outdated
<span class="card-title">Your twin needs a brain to start</span>
</div>
<div class="card-subtitle" style="margin-bottom: 0.75rem; line-height: 1.5;">
SkyTwin runs <strong>locally on your machine</strong> — no API keys, no per-message cost, your data never leaves the device. Pick up a free model in 5 minutes, or bring your own paid provider if you'd rather.
Comment thread apps/web/public/js/pages/dashboard.js Outdated
Comment on lines 282 to 299
@@ -278,6 +295,7 @@ export async function renderDashboard(container, userId) {
fetchBriefing(userId),
fetchLatestTwinBriefing(userId, 'daily').catch(() => null),
slowFetch(`lifebooks-${userId}`, fetchLifebooks, [userId]),
slowFetch(`settings-${userId}`, fetchSettings, [userId]),
]);
Comment thread apps/web/public/js/pages/dashboard.js Outdated
Comment on lines +363 to +369
const aiProviders = settingsData?.status === 'fulfilled'
? (settingsData.value?.aiProviders ?? [])
: [];
const enabledProviderCount = Array.isArray(aiProviders)
? aiProviders.filter((p) => p?.enabled).length
: 0;
const showBrainPrompt = !tourMode && enabledProviderCount === 0 && recentDecisions.length === 0;
jayzalowitz added a commit that referenced this pull request May 9, 2026
Three findings from Copilot's review of PR #248, all addressed:

- **Privacy copy was misleading**: "your data never leaves the
  device" was true only for the local brain path; the BYO API key
  path sends data to the third-party. Restructured the card so the
  privacy claim is scoped per-option: "Local brain — runs on your
  machine, no API keys, no per-message cost, your data never leaves
  the device" / "API key — uses Anthropic / OpenAI / Google. Faster
  on a small laptop, but each message goes to that provider."
  Headline now neutral: "Pick how your twin thinks."

- **Stale provider state**: settingsData was wrapped in slowFetch
  with a 30s TTL, so a user who enabled their first AI provider in
  Settings and bounced back to the dashboard could keep seeing the
  prompt for half a minute. Switched to a direct fetchSettings()
  call — the endpoint is small and only relevant for first-run
  renders, so the cache savings weren't earned.

- **False positive on transient API failure**: Both fetchSettings
  and fetchDecisions falling back to empty arrays meant a single
  blip would surface the onboarding prompt to users who actually
  have providers and decisions. Now requires positive evidence
  before showing: settingsFulfilled && decisionsFulfilled.

Web build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jayzalowitz

Owner Author

Pushed b00192b addressing all 3 Copilot findings:

Privacy copy was misleading (line 253) — fixed. "Your data never leaves the device" was true only for the local brain path; the BYO API-key path obviously sends each message to the third-party. Restructured the card so the privacy claim is scoped per-option:

Local brain — runs on your machine, no API keys, no per-message cost, your data never leaves the device.
API key — uses Anthropic / OpenAI / Google. Faster on a small laptop, but each message goes to that provider.

Headline is now neutral: "Pick how your twin thinks."

Stale provider state via slowFetch (line 299) — fixed. Dropped the slowFetch wrapper. The endpoint is small, this fetch is only relevant for first-run users (who don't have many other dashboard re-renders happening), and the cache savings weren't earned vs. the cost of "user enables a provider and the prompt sticks around for 30s."

False positive on transient API failure (line 369) — fixed. Both fetchSettings and fetchDecisions falling back to empty arrays meant a single blip would surface the onboarding banner to users who actually have providers + decisions. showBrainPrompt now requires positive evidence:

const showBrainPrompt = !tourMode
  && settingsFulfilled
  && decisionsFulfilled
  && enabledProviderCount === 0
  && recentDecisions.length === 0;

Web build clean.
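
The positive-evidence rule above can be sketched as a pure function over `Promise.allSettled`-style results. This is only an illustration under stated assumptions: `computeShowBrainPrompt`, its argument names, and the idea that the decisions fetch resolves to an array are hypothetical, and the truthy `enabled` check mirrors this revision of the PR rather than the final code.

```javascript
// Hypothetical helper (not from dashboard.js): decide whether the first-run
// prompt may be shown. A rejected fetch means "unknown", never "empty", so
// the prompt stays hidden unless both fetches actually succeeded.
function computeShowBrainPrompt({ tourMode, settingsResult, decisionsResult }) {
  const settingsFulfilled = settingsResult?.status === 'fulfilled';
  const decisionsFulfilled = decisionsResult?.status === 'fulfilled';
  if (tourMode || !settingsFulfilled || !decisionsFulfilled) return false;

  // Assumed shapes: settings carries an aiProviders array, decisions is an array.
  const providers = Array.isArray(settingsResult.value?.aiProviders)
    ? settingsResult.value.aiProviders
    : [];
  const decisions = Array.isArray(decisionsResult.value) ? decisionsResult.value : [];

  const enabledProviderCount = providers.filter((p) => p?.enabled).length;
  return enabledProviderCount === 0 && decisions.length === 0;
}
```

With this shape, a transient failure of either request returns `false`, so an established user never sees the onboarding banner because of a network blip.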

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread apps/web/public/js/pages/dashboard.js Outdated
Comment on lines +286 to +306
@@ -278,6 +299,11 @@ export async function renderDashboard(container, userId) {
fetchBriefing(userId),
fetchLatestTwinBriefing(userId, 'daily').catch(() => null),
slowFetch(`lifebooks-${userId}`, fetchLifebooks, [userId]),
// Not slow-cached: a user enabling their first AI provider in
// Settings should make the first-run prompt go away on the next
// dashboard render, not 30s later. The endpoint is small and
// only relevant on first-run renders anyway.
fetchSettings(userId),
Comment thread apps/web/public/js/pages/dashboard.js Outdated
? (settingsData.value?.aiProviders ?? [])
: [];
const enabledProviderCount = Array.isArray(aiProviders)
? aiProviders.filter((p) => p?.enabled).length
jayzalowitz added a commit that referenced this pull request May 9, 2026
…ntion

Two findings from Copilot's re-review of PR #248:

- **fetchSettings on every render is noisy**: dropping it into the
  Promise.allSettled batch meant SSE-driven re-renders hit /api/settings
  every tick once the user had any activity. Restructured: only fetch
  when the cheap prerequisites (`!tourMode && recentDecisions.length === 0`)
  already point at first-run. After the user gets any decision the fetch
  doesn't happen at all.

- **Provider-enabled check disagreed with Settings UI**: settings.js
  treats providers as enabled unless `enabled === false` (`p.enabled !==
  false`); my dashboard code only counted truthy values. An existing
  user with providers stored without an explicit `enabled` field would
  have been incorrectly shown the first-run banner. Aligned to
  `p?.enabled !== false`.

Web build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jayzalowitz

Owner Author

Pushed c9fa927 addressing both findings from your re-review:

  • Conditional fetch (line 306): fetchSettings removed from the Promise.allSettled batch. Now gated on the cheap prerequisites (!tourMode && decisionsFulfilled && recentDecisions.length === 0) — once the user has any activity, the fetch doesn't run at all. SSE-driven re-renders no longer hit /api/settings every tick.

  • Provider enablement convention (line 382): aligned with apps/web/public/js/pages/settings.js (lines 1220, 1340). Now uses p?.enabled !== false so existing rows without an explicit enabled field are treated as on, matching the Settings UI's convention.
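
The difference between the two enablement checks is easiest to see on a row that has no `enabled` field at all. The provider rows below are made-up examples, not data from the repository.

```javascript
// Made-up provider rows for illustration only.
const providers = [
  { name: 'a', enabled: true },
  { name: 'b' },                  // legacy row: no explicit `enabled` field
  { name: 'c', enabled: false },
];

// Truthy check: a row without the field counts as disabled.
const truthyCount = providers.filter((p) => p?.enabled).length;

// Settings-style check: a row is enabled unless it is explicitly set to false.
const conventionCount = providers.filter((p) => p?.enabled !== false).length;

console.log(truthyCount, conventionCount); // prints "1 2"
```

Under the truthy check, a user whose only provider is the legacy row `b` would count as having zero providers and would see the first-run banner.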

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment on lines +365 to +380
// First-run "needs a brain" prompt. Two prerequisites are cheap and
// already known here: tour mode (always-off) and recentDecisions
// (zero only on first-run-ish accounts). Only when both clear do we
// pay for a settings fetch — keeps SSE-driven re-renders from hitting
// /api/settings on every tick once the user has any history.
// Provider enablement matches settings.js: a provider is "enabled"
// unless `enabled === false`, so existing rows without an explicit
// field are treated as on (same convention the Settings UI uses).
const decisionsFulfilled = decisions?.status === 'fulfilled';
let showBrainPrompt = false;
if (!tourMode && decisionsFulfilled && recentDecisions.length === 0) {
try {
const settings = await fetchSettings(userId);
const aiProviders = Array.isArray(settings?.aiProviders) ? settings.aiProviders : [];
const enabledProviderCount = aiProviders.filter((p) => p?.enabled !== false).length;
showBrainPrompt = enabledProviderCount === 0;
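
The excerpt above is cut off inside the `try`. A minimal self-contained sketch of the same gating follows; the wrapper name `resolveShowBrainPrompt`, the injected `fetchSettings` parameter, and the fail-closed `catch` are assumptions made so the example runs on its own, not the merged code.

```javascript
// Hypothetical wrapper around the gating logic quoted above. `fetchSettings`
// is injected so the sketch runs without the real API client.
async function resolveShowBrainPrompt({
  tourMode,
  decisionsFulfilled,
  recentDecisions,
  userId,
  fetchSettings,
}) {
  // Cheap prerequisites first: skip the network call unless this looks like first run.
  if (tourMode || !decisionsFulfilled || recentDecisions.length !== 0) return false;
  try {
    const settings = await fetchSettings(userId);
    const aiProviders = Array.isArray(settings?.aiProviders) ? settings.aiProviders : [];
    // Enabled unless explicitly `false`, matching the convention described above.
    return aiProviders.filter((p) => p?.enabled !== false).length === 0;
  } catch {
    // Assumption: a failed settings fetch hides the prompt (no positive evidence).
    return false;
  }
}
```

Because the early return happens before `fetchSettings` is called, a user with any decision history never triggers the settings request.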
jayzalowitz and others added 3 commits May 9, 2026 14:38
Surface a banner on the dashboard when a new user has zero AI providers
enabled AND zero decisions yet — the natural first-run state where the
dashboard would otherwise look empty and offer no path forward.

Two CTAs route to Settings: "Set up the local brain" (where the #187
AC#2 model downloader lives) and "Or bring your own API key". Copy
leads with the privacy framing — "runs locally on your machine — no
API keys, no per-message cost, your data never leaves the device" —
because that's the real first-run differentiator.

Skipped in tour mode (seeded demo user has providers pre-configured).

Reuses GET /api/settings/:userId via fetchSettings — no new API.
Build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jayzalowitz jayzalowitz force-pushed the feat/dashboard-brain-prompt branch from c9fa927 to 030c950 Compare May 9, 2026 18:38
Copilot's third-round review of PR #248: in the post-OAuth first-scan
window, the dashboard re-renders every 4s. My conditional fetchSettings
fired on every render for users who'd connected Google but hadn't
configured an AI provider — 4s API hits forever until they got a
decision.

Memoize via a module-level cache with 5s TTL. Short enough that "user
enables a provider in Settings → returns to dashboard" feels instant;
long enough to absorb the 4s polling storm. Kept separate from the
30s _slowCache that other dashboard fetches use — same idea, different
freshness budget.

Web build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
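
The 5s memo described in the commit message above could look roughly like the sketch below. All names here (`SETTINGS_TTL_MS`, `settingsCache`, `fetchSettingsMemoized`) and the injectable clock are hypothetical; the actual cache in dashboard.js may be structured differently.

```javascript
// Hypothetical module-level memo with a short TTL.
const SETTINGS_TTL_MS = 5000;
const settingsCache = new Map(); // userId -> { at, promise }

function fetchSettingsMemoized(userId, fetchSettings, now = Date.now) {
  const hit = settingsCache.get(userId);
  // Reuse the cached promise while it is younger than the TTL.
  if (hit && now() - hit.at < SETTINGS_TTL_MS) return hit.promise;

  const promise = Promise.resolve(fetchSettings(userId)).catch((err) => {
    // Do not keep failures cached: the next render should retry.
    if (settingsCache.get(userId)?.promise === promise) settingsCache.delete(userId);
    throw err;
  });
  settingsCache.set(userId, { at: now(), promise });
  return promise;
}
```

Caching the promise rather than the resolved value means two renders that overlap within the window share one in-flight request.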
@jayzalowitz jayzalowitz merged commit 34a2879 into main May 9, 2026
8 checks passed
@jayzalowitz jayzalowitz deleted the feat/dashboard-brain-prompt branch May 9, 2026 18:56
jayzalowitz added a commit that referenced this pull request May 11, 2026
…ryPort

Addresses every finding from Copilot's PR #250 review plus the merge
conflict with main.

Blockers fixed:

1. **Migration PK types** — brain_pages/brain_entities/brain_triples/
   brain_episodes/brain_signals all had `id UUID PRIMARY KEY`. Production
   signal IDs are not UUIDs (e.g. `sig_gmail_abc123`); they're
   connector-assigned opaque strings. Forcing UUID would 500 every
   recordSignal in prod. Changed to `id STRING PRIMARY KEY DEFAULT
   gen_random_uuid()::STRING`. brain_settings.user_id stays UUID (real FK
   to users) and brain_embedding_jobs.id stays UUID (internal-only).

2. **StubMempalacePort replaced** — selecting `mempalace` (or relying
   on hybrid secondary) used to drop all legacy mempalace data on the
   floor. Now wires a real `MemPalaceMemoryPort` with a proper
   `MemPalaceRepos` adapter against `mempalaceRepository`. Covers
   knowledgeGraph (upsertEntity/getEntities/findEntity/addTriple/
   queryTriples/invalidateTriple) and episode (createEpisode/getEpisodes/
   getEpisodeByDecision/updateEpisode/searchEpisodes). Palace / closet /
   entityCode methods throw (they're never reached via MemoryPort, but
   throwing makes any future regression loud).

Other bugs Copilot flagged:

3. **`pendingEmbeddingJobs` per-user** — the dashboard was showing the
   global queue depth instead of the user's. Added optional `userId`
   parameter; defaults to global for the worker drain telemetry but
   the API route now passes userId so the dashboard reports the right
   number in multi-tenant installs.

4. **`candidatePoolSize` computed per-query** — docstring promised
   `max(k*4, 40)` but constructor hard-coded 40, truncating recall on
   large-K queries. Store the user override as a sentinel and apply the
   max-based default in `searchInternal`.

5. **In-memory `embeddingModel` parity** — when `embed()` rejected in the
   in-memory path, we still set `embeddingModel: this.embedding.model`,
   leaving pages with non-null model + null embedding. The CRDB path
   conditionally sets only when embedding succeeded. Matched both paths.

6. **`event.target.closest` guard** — memory-settings click delegator
   could throw on text-node clicks. Guard with
   `instanceof Element` per CLAUDE.md frontend event-handling discipline.

7. **`getEntitiesByType` routed through `resolveReadPort`** — was
   hard-wired to secondary, sending entity reads to the secondary even
   when the primary (gbrain) could serve them. Added a routing rule
   defaulting to primary; fallback still kicks in when capability is
   absent.

8. **docs/memory-swap.md capability table** — claimed `mempalace` had no
   semantic_search; the real `MemPalaceMemoryPort` declares it (backed
   by ILIKE). Updated to show `ILIKE` in the cell + a note explaining
   when to prefer each backend.
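
Item 4's sentinel approach can be illustrated as follows. The class name `SearchConfig`, the `poolSizeFor` method, and the use of `null` as the sentinel are assumptions for the sketch; only the `max(k*4, 40)` default comes from the description above.

```javascript
// Hypothetical sketch of a per-query candidate pool size. `null` is the
// sentinel for "no user override".
class SearchConfig {
  constructor({ candidatePoolSize = null } = {}) {
    this.candidatePoolSizeOverride = candidatePoolSize;
  }

  // Resolve at query time so large-k queries get a proportionally larger pool.
  poolSizeFor(k) {
    return this.candidatePoolSizeOverride ?? Math.max(k * 4, 40);
  }
}

console.log(new SearchConfig().poolSizeFor(5));   // prints 40
console.log(new SearchConfig().poolSizeFor(25));  // prints 100
console.log(new SearchConfig({ candidatePoolSize: 64 }).poolSizeFor(25)); // prints 64
```

Resolving the default in the constructor instead would freeze it at 40, which is the truncation the fix describes.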

Plus a rebase onto main (#248 first-run dashboard merged in between).
The conflict was in CHANGELOG.md — both entries are now stacked under
unreleased.

Full suite: 70/70 turbo tasks pass; api 542, decision-engine 109,
memory-gbrain 86.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jayzalowitz added a commit that referenced this pull request May 11, 2026
…efault) (#250)

* feat(#197): gbrain memory backend + CRDB adapter + hybrid composer (default)

Promotes @skytwin/memory-gbrain from a CLI-shellout skeleton (PR #215) to a
real, in-process, CockroachDB-backed memory layer. Default MemoryPort for
new installs is now gbrain — vector embeddings + tsvector full-text search
fused via Reciprocal Rank Fusion. No separate Postgres process, no external
CLI install — gbrain runs against the SkyTwin DB stack directly.

Per user direction: gbrain is the default, mempalace is the second option,
and everything works against CRDB where possible.
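
For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal sketch of the general technique: each document scores the sum of `1 / (k + rank)` across the ranked lists it appears in. The function name and the constant `k = 60` (a commonly used default) are assumptions; the repository's rrf.ts may differ.

```javascript
// Reciprocal Rank Fusion over lists of ids, best match first.
// score(d) = sum over lists of 1 / (k + rank), with 1-based ranks.
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const vectorHits = ['p1', 'p2', 'p3']; // e.g. nearest neighbours by embedding
const textHits = ['p3', 'p1', 'p4'];   // e.g. full-text matches
console.log(reciprocalRankFusion([vectorHits, textHits]));
// prints [ 'p1', 'p3', 'p2', 'p4' ]
```

Pages that appear in both lists (`p1`, `p3`) outrank pages found by only one retriever, which is why a page missing its embedding still surfaces but with a lower fused score.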

Ships:
- 040-gbrain-memory.sql: brain_pages (FLOAT8[] embedding + TSVECTOR with
  inverted index), brain_entities, brain_triples, brain_episodes,
  brain_signals, brain_settings, brain_embedding_jobs (FOR UPDATE SKIP
  LOCKED queue).
- @skytwin/memory-gbrain-crdb-adapter (NEW): repository.ts (CRDB-backed +
  hybridSearch), in-memory-repository.ts (test-friendly mirror),
  embedding.ts (HashEmbeddingProvider deterministic fallback +
  OpenAiEmbeddingProvider for any /v1/embeddings endpoint), rrf.ts.
- @skytwin/memory-gbrain: EmbeddedGbrainMemoryPort with the full MemoryPort
  surface (semantic_search, code_aware_search, temporal_triples, episodic,
  graph_walk); searchCodeAware boost; hasExternalGbrainConfig() detection.
- @skytwin/memory-hybrid: diagnostics counters + capability-aware fallback.
- apps/api/src/memory-setup.ts: per-user backend factory (default 'gbrain';
  MEMORY_BACKEND env override; per-user brain_settings.backend wins).
- apps/api/src/routes/memory-config.ts: GET/POST /api/memory-config,
  /dismiss-notification, /diagnostics.
- apps/web memory-settings page with the "your twin got smarter" notice.
- docs/memory-swap.md: backends-at-a-glance, env knobs, rollback path.

Tests: 145+ new (49 CRDB adapter + 50 memory-gbrain + 9 hybrid diagnostics
+ 21 api memory-setup/routes + 6 DB-gated integration). Full suite: 70/70
turbo tasks pass.

Closes #197.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(#197 post-/review): brain_settings default = 'gbrain' (was 'hybrid')

/review caught a real bug: the migration's `brain_settings.backend DEFAULT 'hybrid'`
disagreed with the factory's `'gbrain'` default in `apps/api/src/memory-setup.ts`.

Failure mode: a fresh user (no brain_settings row) hitting
POST /api/memory-config/dismiss-notification triggered upsertSettings({hybrid_notification_dismissed:true}).
The COALESCE in INSERT defaulted backend → 'hybrid' even though the factory
considers a missing row to mean 'gbrain'. Result: dismissing the notification
silently flipped the user's backend.

Fixed in three places (must stay in sync — comment links the others):
- packages/db/src/migrations/040-gbrain-memory.sql: column DEFAULT 'gbrain'
- packages/memory-gbrain-crdb-adapter/src/repository.ts: upsertSettings COALESCE 'gbrain'
- packages/memory-gbrain-crdb-adapter/src/in-memory-repository.ts: upsertSettings fallback 'gbrain'

Plus a regression test on the in-memory store.
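
The failure mode is easier to see in a toy version of the in-memory store. Everything below is a hypothetical sketch (the `Map`, the function names, and the row shape are assumed); the point it illustrates is that the upsert fallback and the "missing row" default must be the same constant.

```javascript
// Hypothetical in-memory mirror of the settings upsert. DEFAULT_BACKEND must
// match the factory default, otherwise a partial update changes the backend.
const DEFAULT_BACKEND = 'gbrain';
const settingsByUser = new Map();

function upsertSettings(userId, patch) {
  const existing = settingsByUser.get(userId);
  const next = {
    // Same role as the COALESCE in the SQL upsert: patch, then row, then default.
    backend: patch.backend ?? existing?.backend ?? DEFAULT_BACKEND,
    hybrid_notification_dismissed:
      patch.hybrid_notification_dismissed ?? existing?.hybrid_notification_dismissed ?? false,
  };
  settingsByUser.set(userId, next);
  return next;
}

function resolveBackend(userId) {
  // A missing row means "use the default", same as the factory.
  return settingsByUser.get(userId)?.backend ?? DEFAULT_BACKEND;
}
```

A regression check in this style compares `resolveBackend` before and after a dismiss-only upsert on a fresh user and expects the value to be unchanged.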

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(#197): persona-driven E2E + realistic-corpus + robustness + migration

Adds 50+ tests that drive the gbrain memory layer with realistic data and
edge cases — addressing the user's "deeply test this with realish examples"
request. The point isn't unit coverage (we already had that); it's
"does the system actually build a profile when fed real-life data?"

New test files:
- realistic-corpus.ts fixture: ~30 labeled signals (Gmail, calendar, notes,
  code, chat) modelled after a real twin's first month, plus deterministic
  noise generators to scale to 500.
- realistic-retrieval.test.ts: R@5/P@5 floor with labeled relevance,
  hybrid-vs-text-only ablation, multi-user isolation under load (6 users,
  500 signals each, no cross-talk).
- persona-sam-patel.ts fixture: a 6-week storyline for a Series A founder
  (fundraise prep → VC meetings → term sheet → hiring loops → close →
  vacation), with tagged signals + entities + triples + episodes.
- persona-simulation.test.ts: drives the full storyline end-to-end and
  checks every load-bearing twin behaviour: entity recognition, graph
  walks (Mahesh → Anchor VC → Beacon Series A), triple filters,
  time-bounded episode lookup, semantic search on natural-language
  founder questions, profile summarisation, full export → import round
  trip with answer parity, and week-by-week incremental emergence.
- concurrent-worker.test.ts: 200 parallel recordSignal calls; failed
  embeddings get queued; worker drains the queue with FOR UPDATE SKIP
  LOCKED semantics; failed jobs exhaust retries and stop blocking.
- migration.test.ts: mempalace-flavoured export → gbrain importAll →
  imported content is searchable; idempotent re-import skips dupes;
  export → import → re-export histogram parity.
- robustness.test.ts: every degraded mode — embedding throws / times out /
  returns junk, queries empty / oversize / punctuation-only, mixed-dim
  vector corpus (model migration), pages with null embedding, OpenAI HTTP
  abort/timeout, multi-tenant safety under partial failure.
- memory-config-roundtrip.test.ts: real Express + real factory + real
  EmbeddedGbrainMemoryPort + real HybridMemoryPort end-to-end. Stubs only
  the @skytwin/db query layer. Verifies the dismiss-notification fix
  (default backend STAYS gbrain on a fresh user).

Test totals: 86 memory-gbrain (was 18) + 50 CRDB adapter + 19 hybrid +
26 api = 181 tests across the new memory subsystem. Full suite: 70/70
turbo tasks pass.

Honest about hash-trick limits: the persona test asserts ≥80% recall
across founder questions rather than 100%, because the deterministic
fallback embedding is intentionally weak. With OpenAI text-embedding-3-small
the same test suite runs at materially higher recall — but the floor here
catches retrieval-pipeline regressions without flaking on embedding quality.
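
The R@5/P@5 floors mentioned above rest on two standard metrics. A minimal sketch, with made-up signal ids and hypothetical function names:

```javascript
// Recall@k: fraction of the relevant items that appear in the top k results.
function recallAtK(rankedIds, relevantIds, k) {
  if (relevantIds.size === 0) return 0;
  const hits = rankedIds.slice(0, k).filter((id) => relevantIds.has(id)).length;
  return hits / relevantIds.size;
}

// Precision@k: fraction of the top k results that are relevant.
function precisionAtK(rankedIds, relevantIds, k) {
  if (k <= 0) return 0;
  const hits = rankedIds.slice(0, k).filter((id) => relevantIds.has(id)).length;
  return hits / k;
}

const ranked = ['s1', 's7', 's3', 's9', 's2', 's4']; // made-up search output
const relevant = new Set(['s1', 's3', 's4']);        // made-up labels
console.log(recallAtK(ranked, relevant, 5).toFixed(2));    // prints "0.67"
console.log(precisionAtK(ranked, relevant, 5).toFixed(2)); // prints "0.40"
```

A test that asserts a floor such as recall of at least 0.8 fails when the retrieval pipeline regresses, while tolerating the noise of a weak embedding.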

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(#197): full E2E fake-user → DecisionMaker — surfaces real safety finding

Constructs a complete fake user (Bob Patel, Series A SaaS founder) with
realistic preferences, behavioural patterns, traits, and trust tier
MODERATE_AUTONOMY. Wires the actual DecisionMaker + TwinService +
PolicyEvaluator against in-memory ports and feeds a realistic inbox
through the pipeline.

This is the user's "would the system actually do the email" check.

Surfaces a real finding: with the rule-based fallback CandidateGenerator,
the DecisionMaker auto-archives BOARD CHAIR and CFO emails because the
candidates are content-blind — `archive_email` is generated for every
EMAIL_TRIAGE situation regardless of sender. Bob's high-confidence
preference "board threads always require approval" doesn't gate the
candidate; it just informs scoring. Result at MODERATE_AUTONOMY:

  [AUTO-EXECUTE  ] archive_email — Stratechery newsletter        ← right
  [AUTO-EXECUTE  ] archive_email — Board chair: May meeting       ← WRONG
  [AUTO-EXECUTE  ] archive_email — CFO: Q2 forecast review        ← WRONG
  [NEEDS APPROVAL] accept_invite — Eng leadership 1:1             ← right
  [AUTO-EXECUTE  ] snooze_reminder — Adobe Creative Cloud         ← right
  [AUTO-EXECUTE  ] escalate_to_user — Friendly check-in           ← right

Production safeguards against this:
  1. OBSERVER / SUGGEST trust tier always gates everything (test asserts).
  2. A sender-aware `CandidateGenerator` reads sender + content and
     produces an irreversible `flag_for_manual_review` candidate for
     protected senders. The included `protectiveGenerator` demonstrates
     this — same shape as the LLM strategy that runs in production.

With the protective generator wired in:

  [AUTO-EXECUTE  ] archive_email           — Stratechery newsletter
  [NEEDS APPROVAL] flag_for_manual_review  — Board chair
  [NEEDS APPROVAL] flag_for_manual_review  — CFO

16 tests across two describe blocks. Also exports LabelInferencePort and
SenderLabelHint from @skytwin/decision-engine so tests can build the
custom Gmail-history-aware label hint port (#122).

Full suite: 70/70 turbo tasks pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(#197): SenderAwareCandidateGenerator + memory-enriched DecisionContext

Closes the safety gap surfaced by the fake-user E2E: the rule-based
candidate generator was content-blind, so at MODERATE_AUTONOMY the twin
auto-archived board chair / CFO / legal emails the same way it auto-archived
newsletters. This patch lands three production wirings:

1. New @skytwin/decision-engine export `SenderAwareCandidateGenerator` —
   a CandidateGenerator that wraps the rule-based generator with a
   pre-pass on `decision.rawData.from` and decision content. When the
   sender or subject matches a protected pattern (board/chair/cfo/coo/
   ceo/founder/partner/investor/legal/counsel/attorney/sec/audit/
   compliance/tax) or content mentions a protected topic
   (term sheet/wire transfer/signed/nda/equity/cap table/board deck/
   earnings/payroll), the generator SUPPRESSES the rule-based candidate
   set entirely and emits ONLY a `flag_for_manual_review` candidate
   (irreversible, CONFIDENCE: CONFIRMED). The built-in policy
   NO_IRREVERSIBLE_WITHOUT_APPROVAL gates this through the approval
   queue at every trust tier.

   Suppressing the base set (rather than just prepending the flag) is
   load-bearing: if archive_email is in the candidate list it scores
   higher than flag (lower risk because reversible) and would auto-execute
   anyway — the very bug we are fixing.

   Configurable via `protectedPattern` and `protectedSubjectPattern`
   constructor options; defaults match common corporate email surface area.

2. Wired SenderAwareCandidateGenerator into events.ts as the rule-based
   fallback. Used both:
     - directly as the DecisionMaker's CandidateGenerator when no LLM
       client is configured
     - as the inner RuleBasedCandidateGenerator that LLM strategies fall
       back to when LLM calls fail

   This means the safety improvement applies both to users without LLM
   keys (rule-based by default) and to LLM users when their LLM call
   fails — there is no path through events.ts that auto-archives a board
   email at MODERATE_AUTONOMY+.

3. Wired episodicMemories into DecisionContext. mempalaceRepository
   .getEpisodes is fetched in parallel with patterns/traits/temporal
   profile, mapped onto the EpisodicMemory shape, and passed to
   DecisionMaker.evaluate. The existing scoreCandidate boost
   (decision-maker.ts:1285+) consumes this field to weight candidates
   that match historically-positive past decisions. Closes the
   "twin's memory of past decisions affects current decisions" loop
   that was structurally present (the field existed) but unwired.
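
The suppress-rather-than-prepend behaviour from item 1 can be sketched as a thin wrapper. The function name, the abbreviated pattern lists, the `subject`/`body` fields, and the candidate shape are assumptions for illustration; only `rawData.from`, `EMAIL_TRIAGE`, and `flag_for_manual_review` come from the description above.

```javascript
// Hypothetical sketch of a sender-aware wrapper; patterns are abbreviated.
const PROTECTED_SENDER = /\b(board|chair|cfo|legal|counsel|investor)\b/i;
const PROTECTED_TOPIC = /\b(term sheet|wire transfer|cap table|payroll)\b/i;

function generateCandidates(decision, baseGenerator) {
  const from = decision.rawData?.from ?? '';
  const text = `${decision.rawData?.subject ?? ''} ${decision.rawData?.body ?? ''}`;
  const isProtected =
    decision.situationType === 'EMAIL_TRIAGE' &&
    (PROTECTED_SENDER.test(from) || PROTECTED_SENDER.test(text) || PROTECTED_TOPIC.test(text));

  if (isProtected) {
    // Replace the base set instead of prepending to it: if archive_email stayed
    // in the list it could still outscore the flag and auto-execute.
    return [{ action: 'flag_for_manual_review', reversible: false }];
  }
  return baseGenerator(decision);
}
```

Non-email situations and routine mail pass straight through to the wrapped generator, so the wrapper changes behaviour only for protected messages.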

Tests:
- 12 unit tests for SenderAwareCandidateGenerator covering: protected
  senders (board/CFO/legal/investor), protected subjects (term sheet,
  wire transfer, cap table), routine email passthrough, non-email
  situations passthrough, custom pattern overrides.
- 3 integration tests for the events.ts wiring: board chair email
  selects flag_for_manual_review and does not auto-execute; routine
  newsletter selects archive/label; mempalaceRepository.getEpisodes
  is called with the right (userId, {domain, situationType, limit}).
- Updated existing events-routes.test.ts mock to include
  SenderAwareCandidateGenerator + emailLabelRepository + mempalaceRepository.

Full suite: 70/70 turbo tasks pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(#197): embedding-backfill worker — drains brain_embedding_jobs queue

Required when an external embedding provider (OpenAI, Ollama, vLLM) is
configured: the synchronous embed call inside recordSignal can fail
(rate limit, network, timeout). The write path persists the page row
unembedded and queues a job to brain_embedding_jobs. Without this worker
the queue never drains and search recall silently degrades — pages exist
in tsvector index but not the vector index, so RRF gives them only the
text-rank contribution.

What ships:

- apps/worker/src/jobs/embedding-backfill.ts:
  - `runEmbeddingBackfillJob({ batchSize, embedding })` — single-cycle
    drain. Leases up to batchSize jobs via SELECT FOR UPDATE SKIP LOCKED,
    embeds, persists, marks done. Failed jobs go through markJobFailed
    which auto-retries up to 3 times (the brain_embedding_jobs CHECK
    constraint flips status to 'failed' on the 4th attempt).
  - `getWorkerEmbeddingProvider()` — env-driven provider selection that
    mirrors `apps/api/src/memory-setup.ts` exactly. Same selection logic
    on both sides is load-bearing: if API embeds rows with OpenAI but
    worker embeds with hash-trick, cosine across them collapses.
  - Returns a structured `EmbeddingBackfillSummary` with attempted /
    succeeded / failed / pendingAfter counters that the worker loop
    logs on each non-empty cycle.

- apps/worker/src/index.ts: scheduled at 30s intervals alongside the
  existing metrics-rollup / changelog-poll / domain-extraction /
  federation-sync jobs. SKIP LOCKED makes it safe under multiple worker
  instances simultaneously.
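
The single-cycle drain described above could look roughly like the sketch below. `JobStoreSketch`, `runBackfillCycleSketch`, and every method name are hypothetical stand-ins for the real queue helpers; the actual `runEmbeddingBackfillJob` signature and the `SELECT FOR UPDATE SKIP LOCKED` lease query are not reproduced here. Returning `pendingAfter: null` when the pending count fails is also an assumption.

```typescript
interface EmbeddingJobSketch { id: string; pageId: string; text: string }

// Hypothetical storage port; the real code leases rows from brain_embedding_jobs.
interface JobStoreSketch {
  lease(batchSize: number): Promise<EmbeddingJobSketch[]>;
  saveEmbedding(pageId: string, vector: number[]): Promise<void>;
  markDone(jobId: string): Promise<void>;
  markFailed(jobId: string, reason: string): Promise<void>;
  countPending(): Promise<number>;
}

interface BackfillSummarySketch {
  attempted: number;
  succeeded: number;
  failed: number;
  pendingAfter: number | null;
}

async function runBackfillCycleSketch(
  store: JobStoreSketch,
  embed: (text: string) => Promise<number[]>,
  batchSize: number,
): Promise<BackfillSummarySketch> {
  const summary: BackfillSummarySketch = { attempted: 0, succeeded: 0, failed: 0, pendingAfter: null };
  let jobs: EmbeddingJobSketch[] = [];
  try {
    jobs = await store.lease(batchSize);
  } catch {
    return summary; // lease failed: stop the cycle cleanly
  }
  for (const job of jobs) {
    summary.attempted += 1;
    try {
      const vector = await embed(job.text);
      await store.saveEmbedding(job.pageId, vector);
      await store.markDone(job.id);
      summary.succeeded += 1;
    } catch (err) {
      summary.failed += 1;
      try {
        await store.markFailed(job.id, String(err));
      } catch {
        // markFailed itself failed: keep draining the rest of the batch
      }
    }
  }
  try {
    summary.pendingAfter = await store.countPending();
  } catch {
    // pending count is telemetry only; leave it as null
  }
  return summary;
}
```

The three nested try/catch blocks mirror the failure cases listed in the tests below: a failed embed marks the job failed, a failed lease ends the cycle, and a failed `markFailed` does not abort the batch.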

Tests: 12 cases in apps/worker/src/__tests__/embedding-backfill.test.ts:
- happy path (drains queue, marks each done, respects batchSize)
- failure handling (embedding throws → markJobFailed; lease throws →
  cycle stops cleanly; markJobFailed itself throws → run continues)
- pendingAfter from DB and graceful pending-query failure
- env-driven provider choice (hash default, OpenAI when key set,
  OPENAI_EMBEDDING_MODEL override, fallback to OPENAI_API_KEY)

Full suite: 70/70 turbo tasks pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(#197): feedback → episode loop closes the learning circle

When a user approves or rejects an action via POST /api/approvals/:id/respond,
also persist an Episode into the memory layer. The next time a similar
decision is evaluated, mempalaceRepository.getEpisodes (already wired in
events.ts as part of this PR) pulls that episode into
DecisionContext.episodicMemories. DecisionMaker.calculateEpisodicBoost
(decision-maker.ts:1285+) consumes the episode's `actionTaken` +
`utilityScore` to tilt the candidate score:

  - approve → utility 0.9 → next time the same candidate appears it
    gets a positive boost on score, making auto-execute more likely.
  - reject → utility 0.0 → next time the same candidate's score gets
    no boost (and other candidates with non-zero utility from past
    approvals leapfrog it).

This closes the loop on the memory architecture: the twin's memory of
what the user *actually decided* feeds back into the next decision,
without any manual preference editing. The previous behaviour was that
approvals only updated the TwinService preferences (which influence
candidate confidence); episodes are a different signal — they record
the SPECIFIC action that won, not just the user's domain-level pref.
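
As an illustration only, the approve/reject utilities above could feed a boost like the one sketched here. The real `DecisionMaker.calculateEpisodicBoost` formula is not shown in this commit; `episodicBoostSketch`, the mean-utility rule, and the `maxBoost` weight of 0.1 are all assumptions.

```typescript
type FeedbackTypeSketch = "approve" | "reject";

interface EpisodeSketch {
  actionTaken: string;
  feedbackType: FeedbackTypeSketch;
  utilityScore: number;
}

// Utilities taken from the commit text: approve -> 0.9, reject -> 0.0.
function utilityFor(feedback: FeedbackTypeSketch): number {
  return feedback === "approve" ? 0.9 : 0.0;
}

// Hypothetical boost: scaled mean utility of episodes whose actionTaken
// matches the candidate. Episodes for other actions are ignored.
function episodicBoostSketch(
  candidateAction: string,
  episodes: EpisodeSketch[],
  maxBoost = 0.1,
): number {
  const matching = episodes.filter((e) => e.actionTaken === candidateAction);
  if (matching.length === 0) return 0;
  const mean = matching.reduce((sum, e) => sum + e.utilityScore, 0) / matching.length;
  return maxBoost * mean;
}

const approved: EpisodeSketch = {
  actionTaken: "archive",
  feedbackType: "approve",
  utilityScore: utilityFor("approve"),
};
const rejected: EpisodeSketch = {
  actionTaken: "flag_for_manual_review",
  feedbackType: "reject",
  utilityScore: utilityFor("reject"),
};
```

Under these assumptions an approved `archive` episode lifts a later `archive` candidate, a rejected `flag_for_manual_review` episode contributes no boost, and neither episode affects a candidate with a different action.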

Implementation:

- apps/api/src/routes/approvals.ts: after `processFeedback` returns and
  before the (optional) execution branch, look up the originating
  decision and call `mempalaceRepository.createEpisode` with the
  approval outcome. Wrapped in a try/catch — episode persistence is
  best-effort; never blocks the approval response on a memory-layer
  hiccup.

- The episode shape carries the full breadcrumb: userId, situationSummary
  (from the decision's interpreted summary, with a synthetic fallback),
  domain, situationType, actionTaken (from the candidate that the user
  approved/rejected), feedbackType, feedbackDetail (the user's reason),
  decisionId (so callers can join back), and utilityScore.

Tests: 4 cases in apps/api/src/__tests__/feedback-loop.test.ts:
  - approve path → createEpisode called with utility 0.9
  - reject path → createEpisode called with utility 0.0 + reason text
  - createEpisode throws → approval still returns 200 (best-effort)
  - decision row missing interpreted summary → synthetic fallback

Full suite: 70/70 turbo tasks pass; api 535 / worker 83 / memory-gbrain 86.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(#197): gbrain MemoryPort writes on ingest + corrections E2E

Two improvements that close the remaining loop in the gbrain memory layer:

1. **events.ts now writes inbound signals to gbrain.** Previously, every
   inbound event landed in the legacy `signals` / `decisions` tables but
   `brain_pages` stayed empty in production — meaning searchSemantic
   returned nothing, even with the gbrain backend explicitly selected.
   The new `recordSignalToMemory` helper calls
   `getMemoryPortForUser(userId).port.recordSignal(...)` on every ingest
   (fire-and-forget, so memory hiccups don't block the decision pipeline).

2. **approvals.ts now writes the resulting episode to gbrain too.** The
   prior commit added the legacy mempalaceRepository.createEpisode call;
   this layer adds a parallel `port.recordEpisode` so the gbrain backend's
   semantic index covers approved/rejected outcomes. Future similar
   signals' searchSemantic queries surface these episodes directly.

Tests:
- apps/api/src/__tests__/gbrain-write-on-events.test.ts: real Express
  round-trip with a stubbed @skytwin/db query layer; asserts that an
  inbound /api/events/ingest results in INSERT INTO brain_pages firing
  via the MemoryPort path (not just brain_signals).

- packages/decision-engine/src/__tests__/twin-learns-from-corrections.test.ts:
  5 cases proving DecisionMaker.calculateEpisodicBoost actually shifts
  outcomes when episodicMemories carry feedback:
    * baseline (no memory) selects deterministically
    * rejection episode does not improve the rejected action's rank
    * heavy rejections cannot improve the rejected action's rank
    * approval reinforcement keeps the approved winner
    * memory only matters when episode.actionTaken matches the candidate

  These tests run the REAL DecisionMaker.evaluate against in-memory
  TwinService + PolicyEvaluator ports — so the assertions exercise the
  exact production scoring code path, not a mock.

Full suite: 70/70 turbo tasks pass; api 536, decision-engine 109.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(#197): assistant chat uses MemoryPort.searchSemantic alongside legacy ILIKE

Wires the chat assistant's `MemoryContextProvider` to call the user's
selected MemoryPort (`getMemoryPortForUser`) in parallel with the legacy
`mempalaceRepository.searchEpisodes` ILIKE path, dedupes by summary, and
returns the merged top-K. This means chat answers automatically benefit
from gbrain's vector + tsvector RRF retrieval when the gbrain backend
has indexed pages — without losing the cold-install behavior where
mempalace's ILIKE returns recent episodes immediately.

Why both:
  - Hot install with gbrain: the semantic side surfaces vector-relevant
    pages the ILIKE keyword search would miss (e.g. "what did the CFO
    say?" returns CFO threads even when the user didn't type the literal
    word "CFO" in their question). Mempalace ILIKE then catches anything
    in the legacy table that hasn't been re-indexed yet.
  - Cold install: brain_pages is empty so semantic returns []. The
    mempalace path serves chat answers without a wait for the worker to
    backfill embeddings.
  - Both run in parallel; the slower of the two does not gate the chat
    response. Errors on either side are swallowed by the caller.

The dedupe is by lowercased summary text, because the same episode often
surfaces from both sources, especially after `recordEpisode` has dual-written it.
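
A minimal sketch of the merge, assuming both searches are passed in as plain async functions. `MemoryHitSketch` and `gatherMemoryContextSketch` are hypothetical names, and placing semantic hits ahead of legacy hits is an assumption about ordering, not something this commit states.

```typescript
interface MemoryHitSketch {
  summary: string;
  source: "semantic" | "legacy";
}

async function gatherMemoryContextSketch(
  searchSemantic: () => Promise<MemoryHitSketch[]>,
  searchLegacy: () => Promise<MemoryHitSketch[]>,
  topK: number,
): Promise<MemoryHitSketch[]> {
  // Start both searches at once; a failure on one side degrades to [].
  const [semantic, legacy] = await Promise.all([
    searchSemantic().catch(() => [] as MemoryHitSketch[]),
    searchLegacy().catch(() => [] as MemoryHitSketch[]),
  ]);
  const seen = new Set<string>();
  const merged: MemoryHitSketch[] = [];
  for (const hit of [...semantic, ...legacy]) {
    const key = hit.summary.toLowerCase(); // dedupe key: lowercased summary only
    if (seen.has(key)) continue;
    seen.add(key);
    merged.push(hit);
  }
  return merged.slice(0, topK);
}
```

On a cold install the semantic side resolves to an empty list, so the merged result is simply the legacy hits.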

Full suite: 70/70 turbo tasks pass; api 536 passing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(#197): full-loop E2E — signal → approval → next signal carries the episode

Drives the entire memory-feedback loop through real Express route handlers:

  POST /api/events/ingest         (board chair email — sender-aware path)
  → flag_for_manual_review candidate, autoExecute=false, approval created

  POST /api/approvals/:id/respond (user rejects)
  → mempalaceRepository.createEpisode called with utility 0.0,
    feedback_type='reject', action_taken='flag_for_manual_review'
  → episodeStore now has the rejection row

  POST /api/events/ingest         (similar board email)
  → mempalaceRepository.getEpisodes called, returns the rejection
    episode → DecisionContext.episodicMemories carries it →
    DecisionMaker.calculateEpisodicBoost weighs it

This proves the wiring is intact across all three route handlers and the
DB-backed memory store. The unit-level proof that boost actually shifts
scoring lives in
packages/decision-engine/src/__tests__/twin-learns-from-corrections.test.ts.

Full suite: 70/70 turbo tasks pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(#197): memory dashboard — show users what their twin remembers

Closes the "show me the goods" gap: until now, all the memory infrastructure
was invisible to users. The dashboard surface makes the value visible.

Ships:

1. **`GET /api/memory-config/dashboard`** — operator + user-facing view:
   - `index`: total pages, embedded pages, pending embedding jobs
   - `episodes.recent[]`: last 10 episodes (summary, action, feedback)
   - `episodes.feedbackCounts`: histogram (approve / reject / undo / pending)
   - `entities.total`, `topByRecency` (last 10), `topByType` (top 5)

   Each query is independently failure-handled via .catch(() => default),
   so a partial DB hiccup degrades gracefully rather than 500ing the
   whole dashboard.

2. **`apps/web/public/js/pages/memory-settings.js`** — new "What your
   twin remembers" card under the existing backend selector:
   - Recent decisions table with timestamps, action, feedback badge,
     and the situation summary.
   - Feedback count strip (✓ approved, ✗ rejected, etc.)
   - Top entities by recency + entity-type histogram.
   - All three dashboard / config / diagnostics endpoints fetched in
     parallel for snappy load.
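
The per-query `.catch(() => default)` pattern from item 1 can be sketched as follows. The query function names and `loadDashboardSketch` are hypothetical; only the `index` / `episodes.recent` / `entities.total` response keys come from the endpoint description above, and the real payload carries more fields.

```typescript
interface DashboardQueriesSketch {
  countPages: () => Promise<number>;
  recentEpisodes: () => Promise<string[]>;
  countEntities: () => Promise<number>;
}

async function loadDashboardSketch(queries: DashboardQueriesSketch) {
  // Each query carries its own fallback, so one rejection cannot fail Promise.all.
  const [totalPages, recent, totalEntities] = await Promise.all([
    queries.countPages().catch(() => 0),
    queries.recentEpisodes().catch(() => [] as string[]),
    queries.countEntities().catch(() => 0),
  ]);
  return {
    index: { totalPages },
    episodes: { recent },
    entities: { total: totalEntities },
  };
}
```

Attaching the fallback to each promise, rather than wrapping the whole `Promise.all` in one try/catch, is what keeps the healthy sections of the response populated when a single query fails.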

Tests: 5 new cases in memory-config-routes.test.ts covering:
   - 400 on invalid userId
   - empty-state shape
   - feedback counts aggregated correctly
   - top entities sorted by recency, type histogram by count
   - partial DB failure → graceful degraded response

Full suite: 70/70 turbo tasks pass; api 541 / 558 (added 5).

This makes the gbrain memory layer's value legible to the user — they
can see entities accumulating, episodes recording approve/reject signals,
embeddings backfilling. The "twin learns" loop is now visible end-to-end.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(#197 post-/review): address Copilot findings + real MemPalaceMemoryPort

Addresses every finding from Copilot's PR #250 review plus the merge
conflict with main.

Blockers fixed:

1. **Migration PK types** — brain_pages/brain_entities/brain_triples/
   brain_episodes/brain_signals all had `id UUID PRIMARY KEY`. Production
   signal IDs are not UUIDs (e.g. `sig_gmail_abc123`); they're
   connector-assigned opaque strings. Forcing UUID would 500 every
   recordSignal in prod. Changed to `id STRING PRIMARY KEY DEFAULT
   gen_random_uuid()::STRING`. brain_settings.user_id stays UUID (real FK
   to users) and brain_embedding_jobs.id stays UUID (internal-only).

2. **StubMempalacePort replaced** — selecting `mempalace` (or relying
   on hybrid secondary) used to drop all legacy mempalace data on the
   floor. Now wires a real `MemPalaceMemoryPort` with a proper
   `MemPalaceRepos` adapter against `mempalaceRepository`. Covers
   knowledgeGraph (upsertEntity/getEntities/findEntity/addTriple/
   queryTriples/invalidateTriple) and episode (createEpisode/getEpisodes/
   getEpisodeByDecision/updateEpisode/searchEpisodes). Palace / closet /
   entityCode methods throw (they're never reached via MemoryPort, but
   throwing makes any future regression loud).

Other bugs Copilot flagged:

3. **`pendingEmbeddingJobs` per-user** — the dashboard was showing the
   global queue depth instead of the user's. Added optional `userId`
   parameter; defaults to global for the worker drain telemetry but
   the API route now passes userId so the dashboard reports the right
   number in multi-tenant installs.

4. **`candidatePoolSize` computed per-query** — docstring promised
   `max(k*4, 40)` but constructor hard-coded 40, truncating recall on
   large-K queries. Store the user override as a sentinel and apply the
   max-based default in `searchInternal`.

5. **In-memory `embeddingModel` parity** — when `embed()` rejected in the
   in-memory path, we still set `embeddingModel: this.embedding.model`,
   leaving pages with non-null model + null embedding. The CRDB path
   conditionally sets only when embedding succeeded. Matched both paths.

6. **`event.target.closest` guard** — memory-settings click delegator
   could throw on text-node clicks. Guard with
   `instanceof Element` per CLAUDE.md frontend event-handling discipline.

7. **`getEntitiesByType` routed through `resolveReadPort`** — was
   hard-wired to secondary, sending entity reads to the secondary even
   when the primary (gbrain) could serve them. Added a routing rule
   defaulting to primary; fallback still kicks in when capability is
   absent.

8. **docs/memory-swap.md capability table** — claimed `mempalace` had no
   semantic_search; the real `MemPalaceMemoryPort` declares it (backed
   by ILIKE). Updated to show `ILIKE` in the cell + a note explaining
   when to prefer each backend.
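
For item 4, a minimal sketch of the sentinel approach. `SemanticSearcherSketch` and `resolvePoolSize` are hypothetical names, and using `undefined` as the sentinel is an assumption; the commit only says the override is stored as a sentinel and the `max(k*4, 40)` default is applied in `searchInternal`.

```typescript
class SemanticSearcherSketch {
  // undefined means "no user override": the default is resolved per query.
  private readonly poolOverride: number | undefined;

  constructor(options: { candidatePoolSize?: number } = {}) {
    this.poolOverride = options.candidatePoolSize;
  }

  // Called once per query with that query's k.
  resolvePoolSize(k: number): number {
    return this.poolOverride ?? Math.max(k * 4, 40);
  }
}

const defaultSearcher = new SemanticSearcherSketch();
const pinnedSearcher = new SemanticSearcherSketch({ candidatePoolSize: 10 });
```

With no override, `resolvePoolSize(5)` yields 40 and `resolvePoolSize(25)` yields 100, so large-K queries are no longer truncated to a pool of 40.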

Plus a rebase onto main (#248 first-run dashboard merged in between).
The conflict was in CHANGELOG.md — both entries are now stacked under
unreleased.

Full suite: 70/70 turbo tasks pass; api 542, decision-engine 109,
memory-gbrain 86.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(#197): /document-release sync — arch-philosophy, cockroach-architecture, technical-spec

Post-ship documentation update for the gbrain memory backend.

- docs/architecture-philosophy.md: memory port row updated to reflect
  gbrain (default, CRDB-native) + mempalace (selectable fallback). The
  "interim" framing was obsolete — gbrain is the default.
- docs/cockroach-architecture.md: added the 7 brain_* tables to the
  schema reference. Documented the STRING-PK choice (production signal
  ids aren't UUIDs; the table reflects that contract).
- docs/technical-spec.md: package layout shows the 5 new memory-* packages.
  Build dependency chain updated to include them in topological order.

Full suite: 70/70 turbo tasks pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(#197 polish): SSE live memory dashboard + Ollama recipe + CRDB harness

- SSE: events.ts emits `memory:page-indexed` after a successful
  `recordSignalToMemory`; approvals.ts emits `memory:episode-recorded`
  after `mempalaceRepository.createEpisode`. Web sse-client.js subscribes
  and dispatches `sse:memory:*` CustomEvents; memory-settings.js wires a
  module-singleton listener (1s debounce) that re-renders the dashboard
  without polling.
- Ollama recipe in docs/memory-swap.md — zero-cloud local embeddings via
  the OpenAI-compatible /v1/embeddings endpoint (nomic-embed-text default).
- CRDB integration harness: packages/memory-gbrain-crdb-adapter ships
  `scripts/run-crdb-integration.sh` (Docker-based) and a `test:crdb`
  package script. Spins a hermetic CRDB, applies migration 040, seeds a
  test user, and runs the 6 DB-gated integration tests.
- Tests: feedback-loop.test.ts now mocks `createEpisode` resolved value
  so the SSE emit path is reachable, plus an assertion that
  `memory:episode-recorded` is emitted. gbrain-write-on-events.test.ts
  gains a parallel assertion for `memory:page-indexed`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(#197): align memory-swap prose with corrected capability table

The bullet on lines 24-25 still said mempalace "declares no semantic_search
capability" — that was true at the start of #197 but
MemPalaceMemoryPort.capabilities() now returns 'semantic_search'
(ILIKE-backed). The table below already reflected this; the prose did not.
Tightens the wording to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(#197 post-/review): harden CRDB integration harness

Three issues found by the /review pass on commits aa78f13..HEAD:

1. Migration was applied twice. The first apply ran against `skytwin_test`
   *before* the inlined `users` table existed there, so the brain_* FK
   references failed silently (psql -f exits 0 on per-statement errors
   without ON_ERROR_STOP). The second apply then worked because the
   tables already partially existed. Reordered to create-db →
   create-users-in-test-db → apply-migration-once.
2. Added `-v ON_ERROR_STOP=on` to every psql invocation so any future
   schema regression fails the harness loudly instead of being masked
   by `>/dev/null`.
3. The cockroach-ready wait loop completed silently after 30s even on
   total startup failure; now sets a `ready` flag and bails with the
   container's last 20 log lines if the DB never accepts connections.
   Also tightened TEST_USER_ID parsing: `-A` unaligned output + tr against
   `[:space:]` instead of just ` \n`, plus an empty-result check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(#197 post-/document-release): restore Embedded LLM round-3 CHANGELOG entry

The "Embedded LLM downloader: round-3 review fixes (#187 AC#2 follow-up)"
entry landed on main via PR #249 (commit c6e93de) after this branch
last rebased. The branch did not pull it in, so squash-merging would
have deleted it from main as a side effect.

Restored verbatim from origin/main:CHANGELOG.md so the squash diff is
purely additive. Caught by the /document-release cross-doc consistency
pass — exactly the kind of silent regression that motivated adding the
pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(#197 post-/review r2): address Copilot round-2 findings

Five findings from Copilot's re-review on commit 50f297c:

1. **brain_embedding_jobs.page_id FK type mismatch (HIGH)** — column was
   declared `UUID` but `brain_pages.id` is `STRING`. CRDB would reject
   the FK at apply time, or accept it and reject any insert with a
   non-UUID page id (which is most signal-derived pages —
   `sig_gmail_abc123` etc.). Fixed in migration 040 to `page_id STRING`
   matching `brain_pages.id` exactly.

2. **assistant.ts dedupe comment was wrong** — outer comment said
   "dedupe by (summary, occurredAt)" but the implementation uses just
   summary. Updated the comment to reflect the actual logic and
   explain WHY occurredAt can't be in the key (gbrain hits never carry
   one; including it would defeat cross-source dedupe entirely).

3. **hybrid-port.ts resolveReadPort docstring drift** — claimed a
   3-step priority (override → routing table → capability fallback)
   but the implementation collapses steps 1+2 (the override IS the
   routing table) and step 3 is the same as the capability fallback
   inside step 2. Rewrote the docstring to match the actual logic.

4. **.claude/scheduled_tasks.lock leaked into the PR** — runtime
   session lock metadata (sessionId / pid / ts) was getting committed
   on every session. `git rm --cached` to untrack, added to
   .gitignore so future sessions don't re-add it. This is technically
   a removal-from-main but is the right long-term shape.

5. **BrainPageRow.embedding type / parsePageRow runtime mismatch** —
   types.ts declared `number[] | null` but parsePageRow defensively
   checks `typeof === 'string'` for the pg array-literal case, which
   strict mode flagged as always-false. Introduced a `RawBrainPageRow`
   type with `embedding: number[] | string | null` for the raw DB
   shape, and parsePageRow narrows it to `BrainPageRow` (with
   `number[] | null`) for downstream consumers. No behaviour change.
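
The raw/parsed split in item 5 could look like the sketch below. The row types are cut down to two fields for illustration, the `Sketch` suffixes mark everything as hypothetical, and the accepted literal formats (`[1,2]` or `{1,2}`) are an assumption about what the driver returns.

```typescript
// Raw DB shape: the driver may hand back the vector as a string literal.
interface RawBrainPageRowSketch {
  id: string;
  embedding: number[] | string | null;
}

// Narrowed shape that downstream consumers see.
interface BrainPageRowSketch {
  id: string;
  embedding: number[] | null;
}

function parseEmbeddingSketch(raw: number[] | string | null): number[] | null {
  if (raw === null) return null;
  if (typeof raw !== "string") return raw; // already a number[]
  // Strip one leading "[" or "{" and one trailing "]" or "}" (format is an assumption).
  const inner = raw.trim().replace(/^[\[{]/, "").replace(/[\]}]$/, "");
  if (inner === "") return [];
  return inner.split(",").map((part) => Number(part));
}

function parsePageRowSketch(row: RawBrainPageRowSketch): BrainPageRowSketch {
  return { id: row.id, embedding: parseEmbeddingSketch(row.embedding) };
}
```

Because the raw type admits `string`, the `typeof raw !== "string"` check is no longer flagged as always-false, while callers still only ever see `number[] | null`.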

Verified: pnpm --filter @skytwin/api test → 544 pass; memory-* tests
→ 155 pass. No regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>