feat: milestone 1.5 scope expansion (v0.2.0.0) #5
Merged
Adds execution router, whatWouldIDo query API, twin export, proactive evaluator, preference archaeology, cross-domain correlation, undo-with-learning, and golden path E2E test. 163 tests passing across 15 packages.

Phase 1: DB migrations (6 tables, 5 column adds), shared types (18 new interfaces)
Phase 2: Execution router package, whatWouldIDo + ProactiveEvaluator, twin export + archaeology
Phase 3: Briefings API, skill-gaps API, CrossDomainCorrelator, undo feedback flow
Phase 4: Golden path E2E integration test covering the full decision pipeline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove trust tier self-declaration from /ask request body (Safety Invariant #3)
- Replace mock policy evaluator with real PolicyEvaluator via DB adapter (#1)
- Use no-op DecisionRepository in /ask to prevent persisting synthetic predictions
- Return modifiedRiskAssessment in RoutingDecision so callers get the bumped risk (#7)
- Change OpenClaw reversibilityGuarantee to 'none' since rollback always fails (#5)
- Split migration NOT NULL DEFAULT into a safe 3-step pattern for CockroachDB
- Add missing FK indexes on skill_gap_log, twin_exports, briefings tables
- Make undoReasoning optional for undo feedback to preserve API compatibility
- Stop fallback chain on non-completed status to prevent duplicate execution
- Fix route ordering: /export/:userId now defined before /:userId in twin router

All 164 tests pass. Build clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
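The commit does not spell out the 3-step pattern for adding a NOT NULL DEFAULT column. A common reading, assumed here, is: add the column as nullable, backfill it, then attach the default and the NOT NULL constraint, executing each statement separately to stay clear of CockroachDB's restrictions on schema changes inside explicit transactions. The helper name and the example table and column are hypothetical.

```typescript
// Hypothetical helper: emits one add-nullable / backfill / constrain sequence.
// Identifiers are interpolated directly, so they must be trusted migration
// constants, never user input.
function notNullDefaultColumnSteps(
  table: string,
  column: string,
  sqlType: string,
  defaultSql: string,
): string[] {
  return [
    // Step 1: add the column as nullable, with no default.
    `ALTER TABLE ${table} ADD COLUMN IF NOT EXISTS ${column} ${sqlType}`,
    // Step 2: backfill existing rows explicitly.
    `UPDATE ${table} SET ${column} = ${defaultSql} WHERE ${column} IS NULL`,
    // Step 3: attach the default for future inserts, then enforce NOT NULL.
    `ALTER TABLE ${table} ALTER COLUMN ${column} SET DEFAULT ${defaultSql}`,
    `ALTER TABLE ${table} ALTER COLUMN ${column} SET NOT NULL`,
  ];
}

// Usage (hypothetical table and column): run each statement on its own.
for (const sql of notNullDefaultColumnSteps("decisions", "undo_count", "INT8", "0")) {
  console.log(`${sql};`);
}
```

Keeping step 3 last means the constraint is only added once no NULLs remain, so it cannot fail on legacy rows.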
…back-utils

- 43 tests for InferenceEngine (calculateConfidence, detectContradictions, analyzeEvidence, mergeInference, valuesAreConsistent, updateInferencesFromFeedback)
- 26 tests for DecisionMaker (generateCandidates per situation type, calculatePatternBoost, calculateTraitAdjustment, shouldAutoExecute per trust tier, zero candidates escalation)
- 20 tests for feedback utils (mapFeedbackType, parseUndoReasoning)
- 7 tests for rate limiting (checkRateLimit per trust tier, window reset)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
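The rate-limit tests exercise checkRateLimit per trust tier with a window reset. A minimal fixed-window sketch of that behaviour follows; the tier names, the per-minute limits, and the function signature are placeholders, not the repository's values.

```typescript
// Placeholder tiers and limits: the PR names trust tiers but not their values.
type TrustTier = "observer" | "suggest" | "autonomous";

const LIMITS: Record<TrustTier, number> = { observer: 10, suggest: 30, autonomous: 100 };
const WINDOW_MS = 60_000;

interface RateWindow {
  start: number; // epoch ms when the current window opened
  count: number; // requests admitted in this window
}

const windows = new Map<string, RateWindow>();

// Returns true if the request is admitted, false if the user is over their tier's limit.
function checkRateLimit(userId: string, tier: TrustTier, now: number = Date.now()): boolean {
  const w = windows.get(userId);
  // Open a fresh window when none exists or the previous one has elapsed.
  if (!w || now - w.start >= WINDOW_MS) {
    windows.set(userId, { start: now, count: 1 });
    return true;
  }
  if (w.count >= LIMITS[tier]) return false;
  w.count += 1;
  return true;
}
```

Passing `now` explicitly keeps window-reset tests deterministic without fake timers.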
Milestone 1.5 scope expansion: 7 new capabilities, 6 new DB tables, 96 new tests (260 total), 9 safety fixes from pre-landing review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Configured by /setup-deploy. Project is pre-deployment (no platform, no deploy workflows). /land-and-deploy will merge-only and skip deploy verification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This was referenced Apr 23, 2026
jayzalowitz added a commit that referenced this pull request May 5, 2026
Closes 7 of 20 findings from .context/ux-review/FINDINGS.md (the load-bearing ones); polish is queued for a follow-up. All seven are visible to non-technical users on first contact with the app.

Centralized friendly errors (P0 #4)
- New ApiError class with a kind discriminant and a friendlyMessage field.
- "API proxy error" no longer leaks to users; it is translated to "Can't reach SkyTwin right now. We'll keep trying."
- renderApiError(err, {context, retry}) + wireApiRetry give every page the same calm error card with a Try again button.

Approvals/Decisions/Audit/Chat use the helper (P0 #5)
- Pre-fix, Approvals showed "0 waiting / You're all caught up" when the API was down, which was indistinguishable from a genuinely empty queue. Real approvals could be missed. Now: a distinct offline state.

UUID badge → friendly fallback (P0 #2)
- The header user badge and Settings footer used to show a raw "11111111-2222-..." when the user record couldn't load.
- Now shows "You (1111…)" with the userId in a tooltip for devs.

Connection status as actionable header banner (P1 #12)
- Promoted out of the bottom-left footer, where most users wouldn't see it.
- Calm yellow banner with an animated dot and a Retry now button.
- Hidden when connected, so the happy path has no extra chrome.
- Respects prefers-reduced-motion.

Demo preview card static fallback (P1 #6)
- Pre-fix, the "Try one — see how it thinks" card was hidden via display:none when /api/v1/demo/info failed, so onboarding step 1 became a wall of text.
- Now: the card always renders; clicks return pre-canned sample answers with the same visual treatment plus a "Live preview offline" caveat.

Mobile bottom-nav fixes (P0 #3)
- Replaced single-letter H/!/D/T/S icons with inline-SVG icons.
- Added a Chat link (it was missing entirely from mobile).
- Increased page-content padding-bottom from 4rem to 5.5rem so the nav stops overlapping page content (the composer hint was hidden under it).

Voice mismatch (P1 #13)
- Sidebar "My learnings" → "What I've learned", matching the page header.

Full backend test suite still green across 40 packages.
Browser-agent visual verification screenshots in .context/ux-review/14- through 19-.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
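The ApiError change centralizes a `kind` discriminant and a `friendlyMessage`. The sketch below assumes a small hypothetical set of kinds and a status-based classifier (`toApiError`, `LoadResult`, and `approvalsHeadline` are illustrative names, not the app's code). It also shows why the Approvals fix matters: an error result must never render as the empty "all caught up" state.

```typescript
// Hypothetical set of kinds; the PR only specifies a `kind` discriminant and `friendlyMessage`.
type ApiErrorKind = "offline" | "server" | "not_found" | "unknown";

const FRIENDLY: Record<ApiErrorKind, string> = {
  offline: "Can't reach SkyTwin right now. We'll keep trying.",
  server: "Something went wrong on our side. Please try again.",
  not_found: "We couldn't find that.",
  unknown: "Something unexpected happened. Please try again.",
};

class ApiError extends Error {
  readonly kind: ApiErrorKind;
  readonly friendlyMessage: string;
  readonly status?: number;

  constructor(kind: ApiErrorKind, technicalMessage: string, status?: number) {
    super(technicalMessage); // raw detail stays on .message for logs and devtools
    this.name = "ApiError";
    this.kind = kind;
    this.status = status;
    this.friendlyMessage = FRIENDLY[kind];
  }
}

// Classify a failed request; status is null when fetch itself failed.
function toApiError(status: number | null, body: string): ApiError {
  if (status === null || status === 502 || status === 503 || status === 504 || /API proxy error/i.test(body)) {
    return new ApiError("offline", body || "network error", status ?? undefined);
  }
  if (status === 404) return new ApiError("not_found", body, status);
  if (status >= 500) return new ApiError("server", body, status);
  return new ApiError("unknown", body, status);
}

type LoadResult<T> = { ok: true; items: T[] } | { ok: false; error: ApiError };

// The Approvals bug in one function: only a successful empty load is "all caught up".
function approvalsHeadline(r: LoadResult<unknown>): string {
  if (!r.ok) return r.error.friendlyMessage;
  return r.items.length === 0 ? "You're all caught up" : `${r.items.length} waiting`;
}
```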
This was referenced May 7, 2026
jayzalowitz added a commit that referenced this pull request May 22, 2026
Codex /review on the cumulative #350 diff caught 4 P1 + 2 P2 issues that my prior Claude /review passes (scoped to each new commit) missed. Cross-model agreement was 0%: different scopes catch different things, which is exactly what the merge gate's two-reviewer promise is for.

P1 fixes (default-flow blockers):

1. apps/worker/src/index.ts resolveGoogleConfig: required both clientId AND clientSecret. The bundled PKCE flow mints tokens with no clientSecret, so the worker logged "credentials not configured; skipping Google connectors" and never processed a single signal on the grandma-grade default install. OAuth worked; the twin did nothing. Fix: mirror api oauth.ts's three-layer resolve (env → DB → bundled) and accept an empty clientSecret as the PKCE signal (refreshAccessToken already handles it correctly). Service-manager already injects SKYTWIN_DEFAULT_GOOGLE_CLIENT_ID into the worker env via buildChildEnv.

2. bin/skytwin-install: `pnpm db:migrate` ran before DATABASE_URL was exported, so @skytwin/db fell back to localhost:26257. If the user set SKYTWIN_DB_PORT to dodge a collision, or `localhost` resolved to ::1 instead of the 127.0.0.1 listener, migrations silently landed on the wrong socket. Fix: build the URL from the same env vars bin/skytwin-db uses.

3. apps/web/public/js/pages/onboarding.js: the desktop pendingKey onComplete stored userId + session token + hash but skipped KEY_ONBOARDED, hideWizard(), and skyTwinSetUserId(). The dashboard rendered #/connect-gmail BEHIND the still-visible onboarding modal, so sign-in looked stuck and a reload reopened first-run. Fix: mirror the tour path's full three-step teardown.

4. apps/web/public/js/pages/connect-gmail.js: the final OAuth step used `window.location.href = data.url`, which inside Electron's renderer loads accounts.google.com in an embedded UA that Google rejects as disallowed_useragent. Fix: route through startGoogleSignIn, which detects Electron and uses openExternal + pendingKey poll.
Plumbed the `include` param through getGoogleAuthUrl + startGoogleSignIn so the Gmail scope opt-in survives the routing change.

P2 fixes:

5. apps/web/public/js/pages/connect-gmail.js: PUT /api/credentials/google bootstrap-without-session is a self-hoster edge case; the default bundled-client launch path doesn't reach it. Documented inline and in the launch plan, with operator workarounds noted; a proper bootstrap-token mechanism is its own scoped change.

6. bin/skytwin-db is_running: the fallback path returned true on ANY port listener, so a stray postgres or leftover container would make cmd_start skip launching CRDB. A new is_crdb_responding() helper runs SELECT 1 to verify the listener speaks CRDB before short-circuiting.

Tests: 697 API + 115 worker, all green. No new test files added; the fixes either mirror established patterns (#1, #3, #4) or harden bash fallbacks the existing test harness doesn't exercise (#2, #5, #6).

GATE: codex re-review pending after CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
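Fix #1 comes down to the resolution order and to treating an empty secret as PKCE rather than "not configured". A sketch under assumptions: the GOOGLE_CLIENT_ID/GOOGLE_CLIENT_SECRET env names, the `dbCreds` shape, and the helpers `isConfigured` and `refreshBody` are hypothetical. Only the env → DB → bundled order, SKYTWIN_DEFAULT_GOOGLE_CLIENT_ID, the `source` values, and omitting client_secret in PKCE mode come from this PR.

```typescript
interface GoogleClientConfig {
  clientId: string;
  clientSecret: string; // "" means a public PKCE client, not a missing credential
  source: "user-supplied" | "bundled" | "unset";
}

function resolveGoogleConfig(
  env: Record<string, string | undefined>,
  dbCreds: { clientId: string; clientSecret?: string } | null,
): GoogleClientConfig {
  // Layer 1: operator env vars (hypothetical names).
  if (env.GOOGLE_CLIENT_ID) {
    return { clientId: env.GOOGLE_CLIENT_ID, clientSecret: env.GOOGLE_CLIENT_SECRET ?? "", source: "user-supplied" };
  }
  // Layer 2: credentials saved through the Setup page.
  if (dbCreds?.clientId) {
    return { clientId: dbCreds.clientId, clientSecret: dbCreds.clientSecret ?? "", source: "user-supplied" };
  }
  // Layer 3: the bundled desktop client id, which never has a secret.
  if (env.SKYTWIN_DEFAULT_GOOGLE_CLIENT_ID) {
    return { clientId: env.SKYTWIN_DEFAULT_GOOGLE_CLIENT_ID, clientSecret: "", source: "bundled" };
  }
  return { clientId: "", clientSecret: "", source: "unset" };
}

// Only a client id is required; the old "id AND secret" check is what skipped connectors.
function isConfigured(cfg: GoogleClientConfig): boolean {
  return cfg.clientId !== "";
}

// Build a refresh_token request body, omitting client_secret in PKCE mode.
function refreshBody(cfg: GoogleClientConfig, refreshToken: string): URLSearchParams {
  const body = new URLSearchParams({
    grant_type: "refresh_token",
    refresh_token: refreshToken,
    client_id: cfg.clientId,
  });
  if (cfg.clientSecret) body.set("client_secret", cfg.clientSecret);
  return body;
}
```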
jayzalowitz added a commit that referenced this pull request May 23, 2026
…ent, Gmail BYO wizard, tier-2 polish (#350)

* v0.6.56.0 feat(install): grandma-grade install — native CockroachDB + Docker validation matrix

Before this change a non-technical user had to install Docker Desktop (~700MB, EULA, "open it once" gotcha) and a 9.6GB Ollama model just to run the default install. Now the only prerequisites are Node 20+ and pnpm, and bin/skytwin-install installs both automatically.

- Native CockroachDB binary (hash-verified against the published .sha256sum) installed into ~/.local/share/skytwin/bin/cockroach. New bin/skytwin-db control surface. Docker stays supported as an opt-in via SKYTWIN_USE_DOCKER=true.
- The Electron desktop app bundles per-platform CRDB binaries (darwin arm64/amd64, linux amd64/arm64, win amd64) via electron-builder extraResources. A new CockroachManager spawns the bundled binary against app.getPath('userData')/crdb-data.
- Embedded llama.cpp becomes the default LLM fallback when both the binary and the model are present. (Gate fixed: the old version added the provider on the binary alone, breaking dev machines with Homebrew llama-cli but no model.)
- Docker validation harness (bin/validate-installs ubuntu|debian|fedora) drives install.sh end-to-end in fresh containers and asserts that localhost:3200 responds. It caught a pre-existing migration bug (migration 055 used the CRDB-reserved word `do` as a table alias) that fresh installs hit every time.
- GitHub Actions workflow .github/workflows/install-validation.yml runs the matrix on every PR that touches the install pipeline.

Tasks: ubuntu PASS, debian PASS, fedora PASS. All 678 API tests + 173 desktop tests pass. README and CHANGELOG updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): post-/review hardenings — single-instance lock, IPv6 binding, graceful drain

Addresses findings from the in-PR /review pass against the original v0.6.56.0 commit.
Critical fixes:

- **Single-instance lock in Electron main.** Without app.requestSingleInstanceLock(), a second launch (double-click in the dock, or login-item + manual click) raced CockroachManager.start() against the running instance: both saw port-not-bound, both spawned `cockroach start-single-node` against the same data dir, the loser hit CRDB's LOCK file with a cryptic error, and the user saw no UI feedback.
- **Bind 127.0.0.1 by default, not 'localhost'.** CRDB runs --insecure here; on systems whose /etc/hosts maps localhost to the IPv6 unspecified address (::), the previous default would have broadcast the cluster to the LAN.
- **bin/skytwin-db tmpdir cleanup via EXIT trap.** Failed downloads, sha-mismatch errors, and "could not locate cockroach binary" all previously leaked ~70MB /tmp files. The trap now fires on every exit path.

High-impact fixes:

- **electron-builder extraResources dedup.** The old config shipped all 5 platforms' CRDB binaries (~700MB) inside every artifact. Per-platform mac/win/linux blocks now ship only the host arch's binary.
- **CRDB graceful drain via `cockroach node drain`, then SIGTERM with a 30s timeout.** The previous 5s SIGKILL would have corrupted the WAL mid-flush.
- **bin/skytwin-db honors XDG_DATA_HOME.** Falls back to ~/.local/share/skytwin per spec when unset.
- **SKYTWIN_DB_BINARY_URL_BASE allowlist** (https-only, normal-looking hostname). Stops SSRF / file:// / ftp:// override attempts. SHA-256 verification is still the real defense; this is belt-and-suspenders.
- **Per-service logs to $ROOT/.logs/ instead of /tmp/.** systemd PrivateTmp=yes and tmpfiles.d cleanup were wiping the exact logs needed to debug a failed install attempt.
- **find -perm portability.** The old `-perm -u+x` is GNU-only; BSD find on macOS rejects the syntax and emits nothing through the pipe, leading to a confusing "Could not locate cockroach binary" on Apple Silicon.
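The URL-base allowlist and the SHA-256 check can be illustrated in TypeScript, although the PR implements them in bin/skytwin-db. This sketch interprets "normal-looking hostname" as plain DNS labels with an alphabetic TLD (an assumption that also rejects raw IP literals), and accepts either a bare digest or a `<hex>  <filename>` line from a .sha256sum file. Both function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Assumed reading of "normal-looking hostname": DNS labels plus a letters-only TLD.
const HOSTNAME_RE = /^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$/;

function isAllowedBinaryUrlBase(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a URL at all
  }
  if (url.protocol !== "https:") return false; // blocks http:, file:, ftp:
  if (url.username || url.password) return false; // no user@host tricks
  return HOSTNAME_RE.test(url.hostname);
}

// The real defense: compare the download against the published digest.
function sha256Matches(data: Buffer, sha256sumLine: string): boolean {
  const expected = sha256sumLine.trim().split(/\s+/)[0].toLowerCase();
  const actual = createHash("sha256").update(data).digest("hex");
  return actual === expected;
}
```

The allowlist narrows what an override can point at; the digest check is what actually guarantees the bytes, which matches the PR's "belt-and-suspenders" framing.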
Tests:

- 6 cockroach-manager tests pass (added one pinning the 127.0.0.1 default)
- 174/197 desktop tests pass (24 unrelated skipped)
- 678/702 API tests pass (24 unrelated skipped)
- Ubuntu Docker validation: PASS (re-ran with all hardenings)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): pin CRDB --log-dir to userData/crdb-logs (post-Copilot)

Copilot review #1: waitForReady's timeout error said "Check logs in ${dataDir}/logs", but `cockroach start-single-node` wasn't being invoked with --log-dir, so CRDB defaulted to a platform-dependent location the user couldn't find from the error message. Now we pass --log-dir and mkdir it ahead of time; the error message and reality match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): stricter CRDB readiness check + always ensure DB exists (post-Copilot)

Copilot review comment #2 (cockroach-manager.ts:102-110): the previous `isReady()` accepted ANY TCP listener on port 26257 as proof CRDB was up, and the start path skipped `ensureDatabase()` when the port was already bound. Two real failure modes:

1. A non-CRDB process binds 26257 first (test leftover, port collision, unrelated tool). CockroachManager treats it as "already running", never spawns CRDB, and the API silently connects to the wrong service.
2. A partial first run left CRDB running but missed the CREATE DATABASE step (e.g. a crash between start and ensureDatabase). The next launch sees the port bound and returns early, and the API dies with "database skytwin does not exist."

New behavior:

- `portListening()` is the cheap TCP check.
- `isCrdbResponding()` confirms the listener is actually CRDB by running `cockroach sql -e 'SELECT 1'` (2s timeout). Only this verdict is trusted as "running."
- `start()` always calls `ensureDatabase()`, even when CRDB is already responding, which covers the partial-run heal path.

Tests unchanged; the new helper is a private detail.
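The new start() rules reduce to a small decision table. Below is a pure, hypothetical model (`planStart` is an illustrative name); the real CockroachManager performs the probes and process I/O itself. The PR does not say what happens when a non-CRDB process holds the port, so failing fast here is an assumption.

```typescript
type StartAction = "spawn" | "waitForReady" | "ensureDatabase";

// Decide what start() should do from the two probes described above.
function planStart(portListening: boolean, crdbResponding: boolean): StartAction[] {
  if (crdbResponding) {
    // Already running, but still heal a partial first run that never created the database.
    return ["ensureDatabase"];
  }
  if (portListening) {
    // Something owns 26257 but failed `cockroach sql -e 'SELECT 1'`: not our CRDB.
    throw new Error("port 26257 is bound by a process that is not CockroachDB");
  }
  return ["spawn", "waitForReady", "ensureDatabase"];
}
```

Every non-throwing path ends in ensureDatabase, which is the property the second failure mode needs. This sketch does not distinguish a CRDB that is still booting; a real implementation might retry the probe before concluding there is a conflict.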
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: sync CONTRIBUTING + cockroach-architecture + technical-spec with v0.6.56.0 install

Post-merge-gate /document-release pass. The codebase changed from "docker-compose up -d cockroachdb" to "bin/skytwin-db install && start && ensure-db", but three docs still had the old instructions:

- CONTRIBUTING.md: Getting Started step 3 now uses bin/skytwin-db, with a pointer to bin/validate-installs for fresh-install regression testing before opening a PR.
- docs/cockroach-architecture.md: new "Native binary" section as the default; Docker Compose kept as a legacy/opt-in subsection.
- docs/technical-spec.md: Getting Started uses bin/skytwin-db; the admin UI is now documented at 127.0.0.1:26258 (native path), with 8080 noted as the legacy Docker default; the DATABASE_URL example is updated to 127.0.0.1 with a one-line explanation of why we avoid 'localhost' under --insecure.

No code changes. The CHANGELOG entry already covers the underlying behavior; this is pure doc-drift reconciliation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): add post-/review fixes subsection for v0.6.56.0

Per the CLAUDE.md convention, this keeps the original-cut vs review-caught diff readable in the release notes without forcing readers into git log spelunking.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.6.57.0 fix(desktop): DATABASE_URL routing + migration cascade + Google PKCE

The v0.6.56 desktop bundle technically launched, but every Gmail/Calendar query returned a 500 because none of its 57 migrations had actually applied. Root cause: packages/db ignored DATABASE_URL and connected to the default localhost:26257, so migrations landed on whatever stray docker-compose CRDB happened to be on that port instead of the bundled one. Independently, end users still hit a "create your own Google Cloud OAuth app" wall on sign-in. This release closes both.
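Given the root cause described above, the fixed resolution order can be sketched as follows: DATABASE_URL wins, the DATABASE_HOST/PORT/NAME variables are the legacy fallback, and the result is resolved lazily on first use. The defaults (127.0.0.1, 26257, database skytwin, user root) and the function names are assumptions for illustration, not packages/db's actual code.

```typescript
interface DbTarget {
  host: string;
  port: number;
  database: string;
  user: string;
}

type Env = Record<string, string | undefined>;

function resolveDbTarget(env: Env): DbTarget {
  if (env.DATABASE_URL) {
    // e.g. postgresql://root@127.0.0.1:26257/skytwin?sslmode=disable
    const u = new URL(env.DATABASE_URL);
    return {
      host: u.hostname,
      port: u.port ? Number(u.port) : 26257,
      database: u.pathname.replace(/^\//, "") || "skytwin",
      user: decodeURIComponent(u.username) || "root",
    };
  }
  // Legacy fallback: individual variables.
  return {
    host: env.DATABASE_HOST ?? "127.0.0.1",
    port: Number(env.DATABASE_PORT ?? "26257"),
    database: env.DATABASE_NAME ?? "skytwin",
    user: "root",
  };
}

// Resolve on first use rather than at import time, so variables injected into
// process.env after this module loads (as service-manager does) are still seen.
let cachedTarget: DbTarget | null = null;

function getDbTarget(env: Env = process.env): DbTarget {
  if (cachedTarget === null) cachedTarget = resolveDbTarget(env);
  return cachedTarget;
}
```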
DATABASE_URL routing

- packages/db/src/connection.ts parses DATABASE_URL first, with DATABASE_HOST/PORT/NAME as a legacy fallback. It re-evaluates on the first getPool() call so service-manager's env injection (Electron main runs migrations in-process) takes effect.

Migration cascade

- 023 split into 023 (column add, always safe) + 057 (FK-chain dedupe + unique index, runs after the full schema is in place). The earlier dedupe inside 023 failed because it referenced decision_outcomes.execution_plan_id from migration 055.
- 046 replaces crdb_internal.force_error() with SELECT 1/0 WHERE …; the bundled CRDB v23.2 locks crdb_internal behind allow_unsafe_internals.

Desktop bundle assembly

- ServiceManager runs migrations in-process via the named up() export instead of spawning a child node process (this sidesteps 001-initial.ts's CLI guard, asar visibility, and ESM-from-CJS dynamic-import quirks all at once).
- pnpm deploy --prod for self-contained api/worker/web bundles (~45 MB each vs ~14 GB from a naive cp -RL of pnpm symlinks).
- apps/web Express server spawned alongside api + worker; the previous bundle returned ECONNREFUSED on localhost:3200.
- Per-installation SESSION_SECRET auto-generated in Electron main, persisted at userData/secrets/session-secret (mode 0o600).
- USE_MOCK_IRONCLAW defaults to true in the bundle.
- vitest excludes apps/desktop/dist-electron/ so packaged-app test copies don't break the suite.
- packages/db/package.json's build script copies *.sql to dist/.
- apps/desktop/dist-electron/ added to .gitignore.

Google OAuth PKCE

- @skytwin/connectors: new generatePkcePair() (RFC 7636 §4); generateAuthUrl() accepts code_challenge; exchangeCode() sends code_verifier instead of client_secret when the secret is empty; refreshAccessToken() omits client_secret on refresh in PKCE mode.
- apps/api/src/routes/oauth.ts: a server-local Map<state,codeVerifier> keeps the verifier off the Google round-trip (consume-on-read, so a replayed callback can't redeem twice).
  Honors a bundle-default SKYTWIN_DEFAULT_GOOGLE_CLIENT_ID so end users skip the "paste your client_id+secret" Setup screen.
- apps/desktop/src/service-manager.ts injects the build-time client_id into the spawned API. The constant ships empty in this commit: register a Verified OAuth client of type "Desktop app" in the SkyTwin Google Cloud project and bake the client_id in (or pass it at build time) before the first signed release.

Tests

- 11 new tests in packages/connectors/src/__tests__/google-oauth-pkce.test.ts covering pair generation, S256 challenge derivation, URL params, and both token-exchange and refresh request shapes in PKCE vs confidential modes.
- Full suite still green: 3,084 tests across 20+ packages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): replace bash-only copy-sql with cross-platform Node script

The @skytwin/db build's `tsc && bash -c 'mkdir -p … && cp …'` failed on Windows CI runners: cmd.exe can't run `bash -c` natively, and even with Git's bash on PATH the `2>/dev/null || true` segment was parsed as "'true'' is not recognized as an internal or external command". Replaced with packages/db/scripts/copy-sql.cjs (pure Node, no shell), which walks src/{migrations,schemas} and copies the *.sql files into dist/. Same observable behaviour on macOS/Linux (56 migration files + 1 schema file in dist/migrations and dist/schemas), and it now also works on Windows, where the "Desktop — Windows (NSIS installer)" job was failing the @skytwin/db build step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(desktop): bake the registered SkyTwin Desktop OAuth client_id

Replaced the empty BUNDLED_GOOGLE_CLIENT_ID placeholder in apps/desktop/src/service-manager.ts with the real client_id from the "SkyTwin Desktop" OAuth client (type: Desktop app) registered in the skytwin-492700 Google Cloud project on 2026-05-22.
End users now click "Sign in with Google" in the dashboard and go straight to Google's consent screen, with no more "create your own Google Cloud OAuth app and paste your client_id + secret" friction. PKCE binds each auth code to a per-flow code_verifier that the API holds in memory (see apps/api/src/routes/oauth.ts), so the public client_id alone redeems nothing. The redirect lands on http://127.0.0.1:<port>/api/oauth/google/callback and never traverses our infrastructure. Tokens stay on the user's machine, encrypted by credential-vault.

Override at build time via the SKYTWIN_DEFAULT_GOOGLE_CLIENT_ID env var if shipping a forked SkyTwin build that should consent under a different brand.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): note the registered SkyTwin Desktop OAuth client_id

The previous CHANGELOG entry for v0.6.57.0 said the bundled client_id was empty and needed to be filled in before release. It's now populated with the real value from the "SkyTwin Desktop" OAuth client registered in skytwin-492700 on 2026-05-22. Updated the changelog so the historical record matches what actually shipped.

Also notes the consent-screen state: Testing mode, pending Google verification for the Gmail/Calendar sensitive scopes. Listed test users sign in cleanly; other users see the "unverified app" warning until verification completes (separate effort).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(verification): GitHub Pages site + Google OAuth verification plan

Adds the public-facing pages Google requires for OAuth brand verification + sensitive-scope review:

- docs/index.html: homepage describing SkyTwin's functionality, the Google scopes we request, and why.
- docs/privacy.html: privacy policy disclosing how Google user data is accessed, used, and stored (locally), plus the Limited Use compliance statement.
- docs/terms.html: Apache-2.0-aligned terms of service.
- docs/_config.yml: Jekyll config that excludes the existing technical-spec markdown from being served as site pages (they're written for GitHub rendering and would break as Jekyll output).
- docs/google-verification.md: status tracker + ready-to-paste scope justifications for the OAuth consent-screen review. Documents the three-tier verification path: brand verification (days), sensitive-scope review (weeks), restricted-scope security assessment (months + $$$).

Hosted at https://jayzalowitz.github.io/skytwin/ once GitHub Pages is enabled on this repo's `docs/` folder. github.io is auto-verified by Google's brand-verification checks, so there's no Search Console dance.

This commit only ships the content. Wiring the consent-screen homepage/privacy URLs and publishing the app are manual steps in the Google Cloud Console + GitHub Pages settings; see docs/google-verification.md for the punch list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: allow manual workflow_dispatch on Build & Package

A docs-only push to a feature branch skips the desktop/mobile jobs because of the path filter on changes, but we still want a way to re-trigger them against the cumulative branch state, for example when an earlier desktop-touching commit's run was cancelled by a subsequent docs commit (cancel-in-progress concurrency). Manual dispatch via gh workflow run is the lightest-weight escape hatch. No behaviour change for push / pull_request triggers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(oauth): tiered scope policy — Calendar bundled, Gmail BYO

Solves the restricted-scope verification problem at $0 cost. Until SkyTwin can fund a ~$15k–$50k annual CASA Tier 2/3 security assessment for the bundled OAuth client, Gmail's restricted scopes are reserved for users who paste their own Google Cloud OAuth credentials into the Setup page.
Calendar + identity flow through the bundled client and clear normal sensitive-scope app review (days–weeks, no fee).

What changed in apps/api/src/routes/oauth.ts:

- resolveGoogleConfig() now reports a `source` field: 'user-supplied' (env vars or DB-stored from Setup), 'bundled' (SKYTWIN_DEFAULT_GOOGLE_CLIENT_ID), or 'unset'.
- New resolveRequestedScopes({source, includeGmail}) computes the scope set and returns a `skipped` list reporting the requested capabilities that were dropped. The bundled source drops Gmail; the user-supplied source allows it.
- /google/authorize honors ?include=gmail (also accepts ?scopes=gmail and ?gmail=true); requests dropped under the bundled gate return HTTP 412 with code GMAIL_REQUIRES_BYO_CLIENT and a help URL pointing at the user-facing walkthrough at /connect-gmail.
- 6 new tests in oauth-scope-tiers.test.ts lock in the gating across every (source, includeGmail) combination.

What changed in docs/:

- docs/google-verification.md rewritten end-to-end as the staged rollout plan: brand verification status, sensitive-scope review for Calendar, restricted-scope tier for Gmail with the BYO escape hatch, scope justifications ready to paste into Google's submission form, demo-video script, and the issue draft for #351.
- docs/connect-gmail.html: five-minute step-by-step walkthrough (create GCP project → enable Gmail API → configure consent screen → create OAuth client → paste into SkyTwin Setup). Linked from docs/index.html.

What needs to happen separately:

- PR #350 merges → GitHub Pages goes live → brand verification can be submitted.
- Calendar-scope review (Tier 1): submit through the GCP console once Pages serves the privacy policy URL.
- Restricted-scope verification for Gmail: tracked in #351, only when SkyTwin can sustain the annual CASA fee.

Tests: 684 api tests passing, including the 6 new tier-gating ones.
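The tier gate reduces to a pure function of `source` and `includeGmail`. In this sketch the scope URLs are placeholders (the PR does not list the exact scopes), and `authorizeResponse` is a hypothetical stand-in for the /google/authorize handler: 302 represents the redirect to Google, 412 carries GMAIL_REQUIRES_BYO_CLIENT, and the 503 for an unset client mirrors the NO_GOOGLE_CLIENT_CONFIGURED tagging added later in this PR.

```typescript
type ConfigSource = "user-supplied" | "bundled" | "unset";

// Placeholder scope lists; the PR does not enumerate the exact scopes.
const BASE_SCOPES = ["openid", "email", "profile", "https://www.googleapis.com/auth/calendar"];
const GMAIL_SCOPES = ["https://www.googleapis.com/auth/gmail.modify"];

interface ScopeResolution {
  scopes: string[];
  skipped: string[]; // capabilities requested but not granted under this source
}

function resolveRequestedScopes(opts: { source: ConfigSource; includeGmail: boolean }): ScopeResolution {
  const scopes = [...BASE_SCOPES];
  const skipped: string[] = [];
  if (opts.includeGmail) {
    // Restricted Gmail scopes only ride on a user-supplied (BYO) client.
    if (opts.source === "user-supplied") scopes.push(...GMAIL_SCOPES);
    else skipped.push("gmail");
  }
  return { scopes, skipped };
}

interface AuthorizeOutcome {
  status: number;
  body: Record<string, unknown>;
}

function authorizeResponse(source: ConfigSource, includeGmail: boolean): AuthorizeOutcome {
  if (source === "unset") return { status: 503, body: { code: "NO_GOOGLE_CLIENT_CONFIGURED" } };
  const { scopes, skipped } = resolveRequestedScopes({ source, includeGmail });
  if (skipped.includes("gmail")) {
    return { status: 412, body: { code: "GMAIL_REQUIRES_BYO_CLIENT", help: "#/connect-gmail" } };
  }
  return { status: 302, body: { scopes } }; // stands in for the redirect to Google's consent URL
}
```

Keeping the gate pure is what makes an "every (source, includeGmail) combination" test matrix cheap to write.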
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(web): in-app Gmail-setup wizard at /#/connect-gmail

Reframes Gmail BYO as the launch Gmail experience (not a fallback) and ships the matching in-app guided wizard, so users don't have to leave the dashboard to wire up the Gmail features that are core to SkyTwin's value.

What landed

- apps/web/public/js/pages/connect-gmail.js: a 5-step wizard with progress dots; per-step deep links into the GCP Console (opened in the user's existing browser session, so SkyTwin never sees Google credentials); and a final paste-and-connect form that PUTs to /api/credentials/google and redirects through /api/oauth/google/authorize?include=gmail&userId=… so the consent dance happens against the user's just-saved client.
- apps/web/public/js/app.js: route '/connect-gmail' → renderConnectGmail.
- Singleton-delegator click handler wired with the _listenerWired guard + hash gating, matching the CLAUDE.md frontend convention.
- Client-side validation catches the common "you pasted the wrong thing" cases before the server roundtrip (Client ID must end in `.apps.googleusercontent.com`, Client Secret length check).
- Step 5 pre-fills any previously saved creds from /api/credentials/google so a partial setup survives a refresh.

Reframing: Gmail BYO is the product, not a workaround

- docs/connect-gmail.html: rewritten intro to clarify "this is how every SkyTwin user wires up Gmail today" (not "5-minute setup, one-time", which read as optional).
- docs/google-verification.md: the Tier 2 section now states "this is the launch Gmail experience, not a fallback" and links the in-app wizard alongside the public-web mirror.
- CHANGELOG: same reframe; mentions the wizard explicitly.
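The wizard's client-side checks can be sketched as a pure validator. The `.apps.googleusercontent.com` suffix rule comes from this commit; the 20-character minimum and the "secret looks like a Client ID" check are assumptions standing in for the unspecified length check and "wrong thing" cases. The wizard itself is plain JavaScript, and `validateGoogleCreds` is a hypothetical name.

```typescript
// Placeholder minimum; the PR only says "Client Secret length check".
const MIN_SECRET_LENGTH = 20;
const CLIENT_ID_SUFFIX = ".apps.googleusercontent.com";

// Returns human-readable problems; an empty array means "send it to the server".
function validateGoogleCreds(clientId: string, clientSecret: string): string[] {
  const errors: string[] = [];
  const id = clientId.trim();
  const secret = clientSecret.trim();
  if (!id.endsWith(CLIENT_ID_SUFFIX)) {
    errors.push(`Client ID should end in ${CLIENT_ID_SUFFIX}`);
  }
  if (secret.length < MIN_SECRET_LENGTH) {
    errors.push("Client Secret looks too short; copy the whole value");
  }
  // Common mix-up (assumed): the Client ID pasted into the secret field.
  if (secret.endsWith(CLIENT_ID_SUFFIX)) {
    errors.push("That looks like a Client ID; paste the Client Secret here");
  }
  return errors;
}
```

Catching these locally avoids a server round trip and an opaque OAuth error for the most frequent copy-paste slips.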
Help-URL routing

- The 412 `GMAIL_REQUIRES_BYO_CLIENT` response from /api/oauth/google/authorize?include=gmail now carries both `help: '#/connect-gmail'` (in-app SPA route) and `docs: 'https://jayzalowitz.github.io/skytwin/connect-gmail.html'` (public-web mirror), so callers in either context can route the user to the right surface.

Why this and not a SkyTwin-driven embedded BrowserWindow

An earlier design sketch had SkyTwin opening child BrowserWindows that drove the GCP Console with a sidebar walkthrough. Killed on reflection: it would require the user to sign into Google inside a SkyTwin-managed browser, which captures the session and credentials in SkyTwin's process memory, exactly the exposure BYO is designed to avoid. The wizard instead opens each GCP Console URL in the user's *own* default browser (`target="_blank"` in dashboard context; in desktop context, Electron's setWindowOpenHandler routes the same anchor to the OS browser, already wired via apps/desktop/src/main.ts's open-external IPC). The user clicks the Google buttons themselves; SkyTwin only navigates the wizard forward.

Tests

- 6 scope-tier tests still passing (resolveRequestedScopes contract unchanged).
- Full api suite green (684 passing, 24 skipped).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs+web: launch plan + dashboard Gmail follow-up CTA

Two pieces of forward-pushing work:

1. docs/launch-plan.md: the concrete roadmap from "code in a feature branch" to "grandma can download the app." Three tiers: Tier 1 = launch blockers (PR merge, brand verification, code signing, demo video, release tag, README rewrite). Tier 2 = first-month polish (auto-update, PKCE store in DB, onboarding deep-link, sample-profile polish). Tier 3 = strategic / post-launch (CASA assessment for Gmail #351, mobile stores, hosted variant). Each Tier 1 item names the dependency (purchase / review / merge) and the owner (us vs. Google).
   Costs are itemised: $99 Apple + ~$400 Windows EV cert + ~$15 domain = $500–$1000/year recurring to start. The CASA assessment is the deferred $15k–$50k sitting behind a usage trigger. A "What is explicitly NOT in launch scope" section names the tempting Tier-3 items (federation sync, MCP marketplace, hosted product) so they don't crowd out the boring Tier-1 work.

2. Dashboard Gmail follow-up CTA: apps/web/public/js/pages/dashboard-view.js gets a new renderConnectGmailHero() card that surfaces immediately after the bundled Google sign-in completes (Calendar + identity granted, Gmail scopes absent). It links to the in-app wizard at /#/connect-gmail with the "Why is this step needed?" external doc alongside. Without this nudge, users would finish bundled sign-in, look at an empty Approvals queue, and not know that SkyTwin's inbox features need a second 5-minute step. Three-line check on Gmail state: the card returns '' when (a) tour mode is active, (b) Google isn't connected yet (the existing ConnectGoogleHero owns that state), or (c) Gmail scopes are already present on the OAuth token. It is driven by the `scopes` array the /oauth/google/status endpoint already returns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(oauth): tier 2 launch polish — PKCE DB store + onboarding deep-link

Three Tier-2 launch-plan items in one PR, each isolated to its own subsystem:

2.2 The PKCE verifier store now lives in CRDB (`oauth_pkce_pending` table + `oauthPkcePendingRepository`). A desktop restart between /authorize and /callback no longer drops the verifier and breaks sign-in. `consume()` is a single DELETE...RETURNING, so the same replay-protection property survives the move off the in-memory Map.

2.3 OAuth /authorize accepts a whitelisted `next=connect-gmail` deep-link.
    After Google consent, the onboarding wizard lands the user on /#/connect-gmail (with a "Calendar connected — now let's hook up Gmail" banner) instead of dropping them on the dashboard root and making them discover the follow-up CTA card. Free-form `next` URLs are explicitly NOT accepted, since that would be an open redirect; the whitelist is the security boundary.

2.4 An unset bundled client_id now bounces the user into the same connect-gmail wizard instead of showing a generic error toast. The 503 is tagged with `code: 'NO_GOOGLE_CLIENT_CONFIGURED'`; ApiError plumbs structured `code`/`help`/`docs` through to the dashboard, and the onboarding wizard branches on the code to route the user. The connect-gmail wizard's final OAuth call uses ?newUser=true when no userId is in localStorage, so brand-new onboarding users finish the flow.

Tests: 5 new for the PKCE repository, 5 new for the next= state round-trip (HMAC tampering breaks signature verification, unknown next= drops to null, etc.). All 689 API tests + 295 DB tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(oauth): post-/review fixes for tier-2 polish

- parseSignedState: use hasOwnProperty.call on NEXT_HASH_ROUTES instead of a bracket lookup, so a `next=constructor` (or __proto__, toString, …) tag can't reach the inherited Object property and slip past the truthy check. A new test loops over the four common prototype keys and asserts nextHash stays null. Today's only consumer would have rendered a stringified function into the redirect URL: broken, not exploitable, but worth closing.
- The onboarding NO_GOOGLE_CLIENT_CONFIGURED handler now re-enables the "Continue with Google" button before changing window.location.hash, so a synchronous re-render can't leave the button stuck on "Redirecting…".
- Operator note in the PKCE-store comment block clarifying that migration 058 must run before the API serves traffic.
We deliberately do NOT fall back to an in-memory Map — that would defeat the cross-restart guarantee the move to DB is meant to provide. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(oauth): desktop new-user flow auto-advances via pendingKey poll The desktop newUser sign-in opens Google in the system browser; the callback fires there and there's no IPC back to the Electron app. The old code re-enabled the wizard button with a "return here and continue" error and the user had to manually click again — a real grandma-blocker the existing TODO admitted to ("the web flow advances via redirect, desktop currently does not"). This closes the gap: - Wizard generates a UUIDv4 pendingKey client-side (crypto.randomUUID). - /authorize validates UUID shape (anti-SQL-injection / anti-traversal, not crypto-grade) and threads it through HMAC-signed state as `key=<uuid>`. parseSignedState re-validates on read. - /callback writes resulting userId + accountEmail + scopes + nextHash to a new oauth_pending_signin table (migration 059) keyed by the pendingKey, then renders the existing "close this tab" HTML. - New GET /api/oauth/google/pending/:key endpoint (public — the unguessable random key IS the authorization). Consume-on-read (DELETE...RETURNING) so a leaked key can only be redeemed once. Mirrors the existing pollUntilConnected pattern. - google-signin.js polls the new endpoint when (desktop && newUser); fires onComplete with { userId, nextHash } on success. - Onboarding wizard's onComplete sets userId in localStorage and routes to the deep-link target — auto-advance, no second click. Tests: 6 new for oauthPendingSigninRepository (replay protection, expiry defence-in-depth, scope-shape coercion); 4 new for the isValidPendingKey gate + key= state encoding (rejecting SQL injection, path traversal, wrong-version UUIDs, uppercase, etc.). All 694 API tests + 301 DB tests pass. 
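The state handling described in the tier-2 and pendingKey commits (HMAC-signed state, a whitelisted `next=` tag looked up with `hasOwnProperty.call`, and a lowercase UUIDv4 `key=` that is re-validated on read) can be sketched roughly as below. This is a minimal illustration, not the real `parseSignedState` in `apps/api/src/routes/oauth.ts`: the `payload.signature` layout, the `encodeState`/`parseState` names, and the contents of the route table are all hypothetical.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical whitelist: short `next` tags mapped to in-app hash routes.
const NEXT_HASH_ROUTES: Record<string, string> = {
  "connect-gmail": "#/connect-gmail",
};

// Lowercase UUIDv4 only: version nibble 4, variant nibble 8/9/a/b.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/;

export function isValidPendingKey(key: unknown): key is string {
  return typeof key === "string" && UUID_V4.test(key);
}

function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("base64url");
}

// Hypothetical encoding: "<urlencoded params>.<base64url HMAC>".
export function encodeState(secret: string, next?: string, key?: string): string {
  const params = new URLSearchParams({ nonce: randomBytes(16).toString("hex") });
  if (next !== undefined) params.set("next", next);
  if (key !== undefined) params.set("key", key);
  const payload = params.toString();
  return `${payload}.${sign(payload, secret)}`;
}

export interface ParsedState {
  nextHash: string | null;
  pendingKey: string | null;
}

export function parseState(secret: string, state: string): ParsedState | null {
  // base64url signatures never contain ".", so the last dot splits payload from signature.
  const dot = state.lastIndexOf(".");
  if (dot < 0) return null;
  const payload = state.slice(0, dot);
  const given = Buffer.from(state.slice(dot + 1));
  const expected = Buffer.from(sign(payload, secret));
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;

  const params = new URLSearchParams(payload);
  const next = params.get("next");
  const key = params.get("key");
  return {
    // Own-property check: "constructor", "__proto__", "toString" never match.
    nextHash:
      next !== null && Object.prototype.hasOwnProperty.call(NEXT_HASH_ROUTES, next)
        ? NEXT_HASH_ROUTES[next]
        : null,
    // Re-validate on read even though the HMAC already covers the payload.
    pendingKey: isValidPendingKey(key) ? key : null,
  };
}
```

An unknown or prototype-named `next` degrades to `null` rather than failing the whole sign-in, matching the "unknown next= drops to null" behaviour the tests describe.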
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * security(oauth): harden /pending/:key — mint session, rate-limit, sql-filter expiry Post-/review fixes for the desktop pendingKey endpoint: 1. CRITICAL — don't return bare userId. The pre-existing POST /api/sessions accepts any userId from a localhost caller and returns a 7-day session token (unchanged by this PR — it's been the QR-pairing trust model). Returning userId from /pending/:key would chain those two endpoints: leaked key → consume → forge a session as that user. Instead, /pending/:key now mints the session itself in-process and returns the token alongside the userId. The pendingKey IS the credential; consume-on-read makes it one-shot. Client stashes the token under KEY_SESSION_TOKEN so subsequent API calls flow through Authorization: Bearer exactly like QR pairing. 2. MEDIUM — per-IP rate limit on /pending/:key. Without it, an attacker who exfiltrated a partial key (truncated log, side channel) could enumerate the remainder at line rate. Also a basic DoS vector. Wraps the same checkNewUserRateLimit() that already gates ?newUser=true. 3. MEDIUM — silent failure on remember() now logs with userId + pendingKey so the operator can correlate a wizard timeout with a real DB failure rather than chasing a phantom Google issue. 4. MEDIUM — consume() WHERE now filters by expires_at >= $now in SQL so a poll arriving past TTL doesn't delete the row before sweepExpired() reclaims it. Without this, network jitter on the client could destroy a row mid-handoff and the legitimate wizard would 404 even though the OAuth round-trip succeeded. 8. NIT — migration comment "128-bit" → "122-bit / UUIDv4" to stop overstating entropy. All 694 API tests + 301 DB tests pass. 
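A per-IP fixed-window limiter like the one wrapped around `/pending/:key` can be sketched as a small factory. Each call returns a limiter with its own `Map`, which is the property the later "dedicated poll bucket" fix relies on: filling one bucket cannot starve another. The `createRateLimiter` name, the injectable clock, and the fixed-window algorithm are assumptions for illustration; only the 5 per 60 s and 120 per minute numbers come from the commit messages.

```typescript
// Hypothetical fixed-window, per-key (e.g. per-IP) rate limiter.
export interface RateLimiter {
  /** Returns true if the hit is allowed, false if the caller should answer 429. */
  check(key: string): boolean;
}

export function createRateLimiter(
  maxHits: number,
  windowMs: number,
  now: () => number = Date.now,
): RateLimiter {
  // Each limiter owns its own Map, so separate limiters never share counters.
  // No eviction in this sketch; a long-running server would sweep stale keys.
  const windows = new Map<string, { start: number; count: number }>();
  return {
    check(key: string): boolean {
      const t = now();
      const w = windows.get(key);
      if (w === undefined || t - w.start >= windowMs) {
        // First hit, or the previous window expired: open a fresh window.
        windows.set(key, { start: t, count: 1 });
        return true;
      }
      if (w.count >= maxHits) return false;
      w.count += 1;
      return true;
    },
  };
}

// Hypothetical buckets sized like the ones described in the commits.
export const newUserLimiter = createRateLimiter(5, 60_000);
export const pendingPollLimiter = createRateLimiter(120, 60_000);
```

A wizard polling every 2 s makes 30 requests a minute, which would exhaust the 5-hit bucket within about 10 s but fits comfortably in a 120-hit bucket.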
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(oauth): atomicity + dedicated poll bucket for /pending — closes 2nd /review Second /review pass on the security-hardening commit (9509269) flagged two real regressions I introduced: 1. Atomicity gap. consume() already DELETEd the pending row before sessionRepository.create() ran; a transient CRDB failure or missing-table error mid-call would strand the user with no session AND no recoverable pending row. The poll loop would then silently exhaust its 5-min budget. Now the consume + session INSERT happen in a single withTransaction() — if the INSERT throws, the DELETE rolls back and the user can retry. 2. Rate-limiter starvation. /pending/:key was sharing the checkNewUserRateLimit bucket (5 hits / 60 s). The wizard polls every 2 s for 5 minutes = 30 hits/min from the same IP — would 429 after ~10 seconds and quietly time out at 5 min, exactly re-introducing the grandma-blocker this endpoint exists to fix. New checkPendingPollRateLimit() backed by its own Map; capped at 120/min so a normal poll loop runs comfortably with headroom for retries and jitter. New test covers the cross-starvation case (filling the authorize bucket leaves the poll bucket untouched). 3. Truncate pendingKey to 8-char prefix in the failure log (the key is 5-min-lived but log aggregators may index it longer). All 697 API tests + 301 DB tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(review): address Copilot findings on commit fbfc11e Five real findings on the latest commit; the sixth (cockroach-manager early-return skipping ensureDatabase) was already addressed in e3c3951 and is just re-flagged on a stale view — the SQL probe and unconditional ensureDatabase call are in the code today. 1. parseDatabaseUrl ssl handling. Only `sslmode=disable` was mapped to `false`; everything else returned undefined and fell back to the env default (false). 
A `DATABASE_URL=…?sslmode=require` against a secure CRDB cluster would silently connect over plaintext. New sslConfigForSslmode() maps disable/require/verify-ca/verify-full to the corresponding pg.PoolConfig.ssl shape; unknown values still fall through (typo-tolerant against env override). 2. cockroach-manager stop() — proc.kill('SIGTERM') after gracefulQuit() was unconditional; if the drain already caused CRDB to exit, the SIGTERM throws ESRCH and a clean shutdown becomes an exception. Now we check `proc.exitCode === null` first and wrap in try/catch anyway as belt-and-suspenders. Also fixed the stale "cockroach quit" comment — the implementation uses `cockroach node drain`. 3. apps/desktop/package.json scripts had `tsc &&electron` (no space after &&) on five lines and `tsc && electron-builder` (with space) on the same lines later. POSIX parses `&&token` correctly so this isn't a runtime bug, but the inconsistency is real. Normalized. 4. JSDoc for startGoogleSignIn's `onComplete` callback omitted the sessionToken field that pollUntilPendingResolved emits. Updated the type signature + docstring so future callers see the contract. 5. build-single-binary.sh bash dependency for the Windows package script — documented inline. Windows GitHub runners ship Git Bash so CI works; local Windows devs need Git Bash / WSL / MSYS. A future Node port would remove the constraint but no CI gates on it today. All 301 DB tests + 175 desktop tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(web): promote tour-mode CTA on welcome screen (launch-plan 2.6) Welcome-screen tour link was a tiny gray text link ("Explore with a sample profile instead →", 0.82rem, --text-muted) — easy to miss next to three large CTA buttons. Promoted to a btn-outline btn-lg card with an "or" horizontal divider above it, matching the visual rhythm of the existing choices but clearly framed as the alternative no-sign-in path. 
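Item 1 of the Copilot-findings commit maps `sslmode` values from `DATABASE_URL` onto node-postgres's `ssl` pool option. The commit does not show the exact shapes `sslConfigForSslmode()` returns, so the mapping below is an assumption modelled on libpq's documented `sslmode` meanings (including treating `verify-ca` as "verify the chain, skip the hostname check"). It uses a local type instead of importing `pg` so it runs standalone.

```typescript
// Local stand-in for the subset of pg.PoolConfig["ssl"] used here.
type SslOption =
  | false
  | {
      rejectUnauthorized: boolean;
      checkServerIdentity?: (host: string, cert: unknown) => Error | undefined;
    };

export function sslConfigForSslmode(mode: string | null): SslOption | undefined {
  switch (mode) {
    case "disable":
      return false;
    case "require":
      // Encrypt the connection, but do not verify the certificate chain.
      return { rejectUnauthorized: false };
    case "verify-ca":
      // Verify the chain; a no-op identity check skips hostname matching.
      return { rejectUnauthorized: true, checkServerIdentity: () => undefined };
    case "verify-full":
      // Verify the chain and the hostname (Node's TLS default behaviour).
      return { rejectUnauthorized: true };
    default:
      // Absent or unknown value: undefined lets the caller fall back to its env default.
      return undefined;
  }
}

export function sslFromDatabaseUrl(url: string): SslOption | undefined {
  return sslConfigForSslmode(new URL(url).searchParams.get("sslmode"));
}
```

The key property is that `require` no longer falls through to the plaintext default.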
Conditional-on-demo-availability is preserved: CTA + divider both live inside the same #onb-tour-row div, both reveal when fetchDemoInfo() returns available=true. Non-localhost / non-dev-bypass deployments still get a clean welcome screen with no broken tour link. No behavioural change to /api/v1/demo/{info,preview} or skyTwinExitTour. CHANGELOG (Unreleased) + launch-plan §2.6 (now "partial — Unreleased") + README "first 60 seconds" walkthrough updated to match the new label. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * a11y(web): expose "or" semantic on tour-CTA divider (post-/review) The /review pass on 8c882ad flagged that the "or" divider used aria-hidden="true" on the entire wrapper div, hiding both the decorative lines AND the semantic "or" word from screen readers. Result: AT users go from the third primary CTA straight to "Try with a sample profile" without the alternative-path framing that's visually obvious. Fix: move aria-hidden to just the three inner spans (two lines + text), promote the wrapper to role="separator" with aria-label="or" so the relationship between the two button groups is announced once, correctly, without the decorative SVG noise. Visual identical. No behavioural change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(desktop): rename @skytwin/desktop → skytwin-desktop (Windows NSIS) Windows NSIS build was failing after 2.5h with: File: failed creating mmap of "...@skytwindesktop-0.3.0-x64.nsis.7z" Error in macro x64_app_files on macroline 1 Error in macro extractEmbeddedAppPackage on macroline 8 !include: error in script: "installSection.nsh" on line 66 Error in script "<stdin>" on line 199 -- aborting creation process Root cause: electron-builder derives the intermediate .nsis.7z filename from package.json `name`. 
The npm scoped name `@skytwin/desktop` gets flattened to `@skytwindesktop` (only the `/` is stripped, not the `@`), so the .nsis.7z lives at `...\@skytwindesktop-0.3.0-x64.nsis.7z`. NSIS's makensis trips on @-prefixed paths in the File include macro and fails to mmap the archive even though the file was written successfully. Confirmed by the same-bundle pattern on this PR's prior CI run: - macOS (DMG): same bundle, same @-prefixed intermediate — ✓ packaged - Linux (AppImage/deb/rpm): same — ✓ packaged - Windows (NSIS): same — ✗ mmap of @-filename DMG and AppImage don't use makensis, so they sail through. Fix: drop the `@scope/` prefix from the desktop package's npm name. It's a leaf consumer (no other workspace package imports from it — verified with grep), and pnpm-lock.yaml keys workspace entries by directory path, not by npm name, so the lockfile is unchanged. `pnpm install --frozen-lockfile` passes locally. Updates: - apps/desktop/package.json — name field + the embedded help-text in the placeholder `build` script. - Root package.json — 6 desktop:* scripts that use `pnpm --filter`. - .github/workflows/build.yml — mac/win/linux package steps. - .github/workflows/release.yml — build + 3 publish-always steps. - apps/desktop/scripts/build-single-binary.sh — help-text echo. - apps/desktop/src/headless.ts — invocation comment. The package directory + workspace location are unchanged; only the public `name` string flips. CHANGELOG references stay as-is (they're historical). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(review): codex /review findings — worker, install, onboarding, OAuth Codex /review on the cumulative #350 diff caught 4 P1 + 2 P2 issues that my prior Claude /review passes (scoped to each new commit) missed. Cross-model agreement was 0% — different scopes catch different things, which is exactly what the merge gate's two-reviewer promise is for. P1 fixes (default-flow blockers): 1. 
apps/worker/src/index.ts resolveGoogleConfig — required both clientId AND clientSecret. The bundled PKCE flow mints tokens with no clientSecret, so worker logged "credentials not configured; skipping Google connectors" and never processed a single signal on the grandma-grade default install. OAuth worked, twin did nothing. Fix: mirror api oauth.ts three-layer resolve (env → DB → bundled), accept empty clientSecret as the PKCE signal (refreshAccessToken already handles it correctly). Service-manager already injects SKYTWIN_DEFAULT_GOOGLE_CLIENT_ID into worker env via buildChildEnv. 2. bin/skytwin-install — `pnpm db:migrate` ran before DATABASE_URL was exported, so @skytwin/db fell back to localhost:26257. If the user set SKYTWIN_DB_PORT to dodge a collision or `localhost` resolved to ::1 instead of the 127.0.0.1 listener, migrations silently landed on the wrong socket. Build the URL from the same env vars bin/skytwin-db uses. 3. apps/web/public/js/pages/onboarding.js — desktop pendingKey onComplete stored userId + session token + hash but skipped KEY_ONBOARDED, hideWizard(), and skyTwinSetUserId(). Dashboard rendered #/connect-gmail BEHIND the still-visible onboarding modal — sign-in looked stuck, reload reopened first-run. Mirror the tour path's full three-step teardown. 4. apps/web/public/js/pages/connect-gmail.js — final OAuth step used `window.location.href = data.url` which inside Electron's renderer loads accounts.google.com in an embedded UA, rejected as disallowed_useragent. Route through startGoogleSignIn which detects Electron and uses openExternal + pendingKey poll. Plumbed `include` param through getGoogleAuthUrl + startGoogleSignIn so the Gmail scope opt-in survives the routing change. P2 fixes: 5. apps/web/public/js/pages/connect-gmail.js — PUT /api/credentials/google bootstrap-without-session is a self-hoster edge case. Default bundled-client launch path doesn't reach it.
Documented inline + launch-plan, with operator workarounds noted; a proper bootstrap token mechanism is its own scoped change. 6. bin/skytwin-db is_running — fallback path returned true on ANY port listener. A stray postgres / leftover container would make cmd_start skip launching CRDB. New is_crdb_responding() helper runs SELECT 1 to verify the listener speaks CRDB before short-circuiting. Tests: 697 API + 115 worker — all green. No new test files added; fixes either mirror established patterns (#1, #3, #4) or harden bash fallbacks the existing test harness doesn't exercise (#2, #5, #6). GATE: codex re-review pending after CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(windows): exclude workspace from Defender to stop makensis mmap fail Two consecutive Windows CI runs failed at the same step with the same error, on different commits and different filenames: - 71bdee5: makensis File: failed creating mmap of "...@skytwindesktop-0.3.0-x64.nsis.7z" - 3a25132: makensis File: failed creating mmap of "...skytwin-desktop-0.3.0-x64.nsis.7z" The rename from @skytwin/desktop -> skytwin-desktop in 2da3549 removed the leading @ from the filename. The error still reproduced verbatim against the new name, so the @ theory was wrong. Actual root cause: makensis is a 32-bit process that opens the freshly-written .nsis.7z intermediate via mmap to embed it into the final installer.exe. On the GitHub Actions windows-latest runner, Windows Defender's real-time scanner opens that same .7z to scan it the moment it's written. Defender's open holds a sharing lock; makensis's mmap call races against it and returns failure. This is documented in electron-userland/electron-builder#6107. Fix: add an ExclusionPath for the build workspace + electron-builder cache dirs before the Package step runs.
Defender stays active on the runner overall (so signtool's signing pass on cockroach.exe / SkyTwin.exe still gets scanned), but the staging dirs that NSIS reads back are excluded from scanning. Uses Add-MpPreference -ExclusionPath which only requires the admin shell the runner already has, no policy changes. Why not disable Defender entirely: - Disabling RT scanning leaves the signtool steps unprotected, and we sign two .exe files (cockroach.exe + SkyTwin.exe) before makensis runs. - Exclusion is the surgical fix; disable is the sledgehammer. Why not nsis-web (download payload at install time): - That target requires a release URL the payload is hosted at; CI runs don't tag releases. - Scope creep for fixing a CI race condition. Expected outcome: Windows job clears the makensis step on first try (previously failed at ~2h26 to 2h36 with the same error). If it still fails post-exclusion, the next diagnosis target is bundle size vs 32-bit makensis address space, but exclusion is overwhelmingly the most likely cause given the timing reproducibility. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(ci): tarball embedded apps + nsis tuning to unstick Windows Three Windows CI runs failed in a row, each at the same makensis mmap step ~2.5h into the job. Three problems were stacked: 1. The .nsis.7z intermediate was racing Windows Defender's RT scanner. Defender holds a sharing handle while it scans the freshly-written .7z; 32-bit makensis mmap-opens the same file and gets a sharing violation (ERROR_SHARING_VIOLATION) surfaced as `File: failed creating mmap of`. 2. electron-builder's win-unpacked copy step was spending an hour just writing the ~10,000 loose files in dist/embedded/{api,worker,web}/ (pnpm-deploy node_modules trees, multiplied by 3 apps). NTFS small-file throughput on the GitHub Actions runner is much worse than APFS / ext4 — the macOS+Linux desktop builds finished in 9 and 4 minutes against the same input. 3.
`differentialPackage: true` (electron-builder default) was running an extra .blockmap generation pass on the already-slow .nsis.7z. The blockmap is for electron-updater delta downloads we don't ship yet (gated on §1.5 release tag + signing certs). Already-shipped Defender exclusion (a4b2e09) addresses #1. This commit addresses #2 and #3: #2 fix: `apps/desktop/scripts/build-single-binary.sh` now tars the embedded api/worker/web trees into a single `apps.tar.gz` after the pnpm-deploy + strip-self-symlinks step. The extraResources filter in `apps/desktop/package.json` shrinks to {apps.tar.gz, bundle-manifest.json}. `apps/desktop/src/service-manager.ts` gains an `ensureEmbeddedRoot()` method that extracts the tarball to `<userData>/embedded/` on first launch, gated by a `.version` marker so subsequent launches (and post-upgrade launches) handle the extract correctly. `startApi`, `startWeb`, `startWorker`, and `runMigrations` all consume the extracted path. #3 fix: `nsis.differentialPackage: false` + `compression: "normal"` pinned explicitly so a future electron-builder bump can't silently switch to LZMA-max and regress build time. Trade-offs documented in CHANGELOG. User-facing first-launch latency gains ~5-15s for the one-time tar extract; subsequent launches see no change (the sentinel-file existence check takes microseconds). Installer size shrinks by the tar.gz compression ratio (~30-40% on node_modules). Tests: - platform-utils.test.ts updated for the new extraResources shape (apps.tar.gz + bundle-manifest.json, no more api/**/* etc.) + negative assertions that the old loose patterns are gone. - 175 desktop tests pass. - tsc --noEmit clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(ci): cache CRDB binaries + parallelize the download loop Two stacked wins on top of the already-shipped tarball + Defender exclusion work: 1. CRDB binary cache.
The 5-platform CRDB set (~700MB compressed, ~140MB per platform) was being re-downloaded on every desktop CI job because actions/cache@v4 was only pointed at electron and electron-builder caches. Three desktop jobs x 5 binaries x ~10s each = ~150s spent on cold-cache work that's identical between runs. Cache path now includes ~/.cache/skytwin/crdb-binaries (where bin/skytwin-db's download helper stages the archives) and the cache key hashes build-single-binary.sh too so a SKYTWIN_CRDB_VERSION bump invalidates correctly. 2. Parallel downloads in build-single-binary.sh. The `for entry in CRDB_TARGETS; bundle_crdb_binary "$entry"; done` loop blocks on each platform sequentially even though every call has independent inputs and outputs. Backgrounded with `&` + reaped via `wait $pid` in a follow-up loop. Cold-cache wall time drops from ~25-50s to ~5-10s (limited by the slowest single download). Warm cache short-circuits at the early `already bundled, skipping` return so parallelism is a no-op there. `set -e` alone doesn't propagate failures from backgrounded functions, so an explicit `crdb_failed` flag walks the wait results and `exit 1`s if any child returned non-zero. Without that, a corrupt download (sha mismatch -> exit 3) would silently leave the binary missing and electron-builder would fail later with a confusing "missing extraResources" error. Net expected savings: roughly 1-3 minutes off each desktop build on warm cache (the dominant case after the first run), ~30s on cold cache. Doesn't change the long-pole job (Linux at 14m, three output formats), so the end-to-end wall time stays around 17 min — but the runtime spent on redundant network is gone. Tests: bash -n clean, workflow YAML valid (would fail at GH parse if not). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
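The first-launch gate described for `ensureEmbeddedRoot()` in the tarball commit (extract `apps.tar.gz` into `<userData>/embedded/` unless a `.version` marker already matches the bundle) might look roughly like this. The sketch is hypothetical: it shells out to the system `tar` (GNU tar on Linux, bsdtar on macOS and on Windows 10+), which may not be what the real service-manager does, and the signature and return shape are assumptions.

```typescript
import { execFileSync } from "node:child_process";
import { existsSync, mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { join } from "node:path";

/**
 * Hypothetical sketch: make sure `tarball` has been extracted into
 * `<userDataDir>/embedded` for this exact bundle version.
 */
export function ensureEmbeddedRoot(
  tarball: string,
  userDataDir: string,
  bundleVersion: string,
): { root: string; extracted: boolean } {
  const root = join(userDataDir, "embedded");
  const marker = join(root, ".version");

  // Fast path: a previous launch already extracted this exact version.
  if (existsSync(marker) && readFileSync(marker, "utf8").trim() === bundleVersion) {
    return { root, extracted: false };
  }

  // Missing or stale marker (first launch, or after an upgrade): wipe and re-extract.
  rmSync(root, { recursive: true, force: true });
  mkdirSync(root, { recursive: true });
  execFileSync("tar", ["-xzf", tarball, "-C", root]);

  // Write the marker last, so an interrupted extract is retried on the next launch.
  writeFileSync(marker, bundleVersion);
  return { root, extracted: true };
}
```

Writing the marker only after a successful extract is what makes a crash mid-extract self-healing; the cost is that the only per-launch work on the happy path is one small file read.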
jayzalowitz added a commit that referenced this pull request May 23, 2026
…t fixes (#352) * v0.6.56.0 feat(install): grandma-grade install — native CockroachDB + Docker validation matrix Before this change a non-technical user had to install Docker Desktop (~700MB, EULA, "open it once" gotcha) and a 9.6GB Ollama model just to run the default install. Now the only prerequisites are Node 20+ and pnpm — both installed automatically by bin/skytwin-install. - Native CockroachDB binary (hash-verified against published .sha256sum) installed into ~/.local/share/skytwin/bin/cockroach. New bin/skytwin-db control surface. Docker stays supported as an opt-in via SKYTWIN_USE_DOCKER=true. - Electron desktop app bundles per-platform CRDB binaries (darwin arm64/amd64, linux amd64/arm64, win amd64) via electron-builder extraResources. New CockroachManager spawns the bundled binary against app.getPath('userData')/crdb-data. - Embedded llama.cpp becomes the default LLM fallback when both binary and model are present (gate fixed — old version added the provider on binary alone, breaking dev machines with Homebrew llama-cli but no model). - Docker validation harness (bin/validate-installs ubuntu|debian|fedora) drives install.sh end-to-end in fresh containers and asserts localhost:3200 responds. Caught a pre-existing migration bug (migration 055 used `do` as a CRDB-reserved table alias) that fresh installs hit every time. - GitHub Actions workflow .github/workflows/install-validation.yml runs the matrix on every PR that touches the install pipeline. Tasks: ubuntu PASS, debian PASS, fedora PASS. All 678 API tests + 173 desktop tests pass. README and CHANGELOG updated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(install): post-/review hardenings — single-instance lock, IPv6 binding, graceful drain Addresses findings from the in-PR /review pass against the original v0.6.56.0 commit. 
Critical fixes: - **Single-instance lock in Electron main.** Without app.requestSingleInstanceLock(), a second launch (double-click in dock, login-item + manual click) raced CockroachManager.start() against the running instance — both saw port-not-bound, both spawned `cockroach start-single-node` against the same data dir, the loser hit CRDB's LOCK file with a cryptic error and the user saw no UI feedback. - **Bind 127.0.0.1 by default, not 'localhost'.** CRDB runs --insecure here; on systems whose /etc/hosts maps localhost to the IPv6 unspecified address (::), the previous default would have broadcast the cluster to the LAN. - **bin/skytwin-db tmpdir cleanup via EXIT trap.** Failed downloads, sha-mismatch errors, and "could not locate cockroach binary" all previously leaked ~70MB /tmp files. Trap fires on every exit path now. High-impact fixes: - **electron-builder extraResources dedup.** Old config shipped all 5 platforms' CRDB binaries (~700MB) inside every artifact. Per-platform mac/win/linux blocks now ship only the host arch's binary. - **CRDB graceful drain via `cockroach node drain` then SIGTERM with 30s timeout.** Previous 5s SIGKILL would have corrupted WAL mid-flush. - **bin/skytwin-db honors XDG_DATA_HOME.** Falls back to ~/.local/share/skytwin per spec when unset. - **SKYTWIN_DB_BINARY_URL_BASE allowlist** (https-only, normal-looking hostname). Stops SSRF / file:// / ftp:// override attempts. SHA-256 verify is still the real defense; this is belt-and-suspenders. - **Per-service logs to $ROOT/.logs/ instead of /tmp/.** systemd PrivateTmp=yes and tmpfiles.d cleanup were wiping the exact logs needed to debug a failed install attempt. - **find -perm portability.** Old `-perm -u+x` is GNU-only; BSD find on macOS rejects the syntax and emits nothing through the pipe, leading to a confusing "Could not locate cockroach binary" on Apple Silicon. 
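The shutdown order described here (a drain step first, then SIGTERM with a 30 s grace period instead of a 5 s SIGKILL), together with the `proc.exitCode === null` guard from the Copilot-findings commit, can be sketched generically. The `drain` callback stands in for `cockroach node drain`; the helper name and the choice to report a timeout to the caller (rather than escalating to SIGKILL) are assumptions of this sketch.

```typescript
import type { ChildProcess } from "node:child_process";

function hasExited(proc: ChildProcess): boolean {
  // exitCode is set on a normal exit; signalCode is set when a signal ended it.
  return proc.exitCode !== null || proc.signalCode !== null;
}

/**
 * Hypothetical graceful-stop helper: drain, then SIGTERM only if the process
 * is still alive, then wait up to `graceMs` for it to exit.
 * Resolves true if the process exited within the grace period.
 */
export async function stopGracefully(
  proc: ChildProcess,
  drain: () => Promise<void>,
  graceMs = 30_000,
): Promise<boolean> {
  if (hasExited(proc)) return true;

  // Subscribe before draining, so an exit caused by the drain is not missed.
  const exited = new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), graceMs);
    proc.once("exit", () => {
      clearTimeout(timer);
      resolve(true);
    });
  });

  try {
    await drain();
  } catch {
    // A failed drain should not block shutdown; fall through to SIGTERM.
  }

  // The drain may already have ended the process; signalling an exited
  // process can fail, so check first and wrap anyway.
  if (!hasExited(proc)) {
    try {
      proc.kill("SIGTERM");
    } catch {
      // Already gone.
    }
  }
  return exited;
}
```

Leaving the timeout decision to the caller keeps the destructive step (a hard kill that could interrupt a WAL flush) an explicit choice rather than a default.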
Tests: - 6 cockroach-manager tests pass (added one pinning the 127.0.0.1 default) - 174/197 desktop tests pass (24 unrelated skipped) - 678/702 API tests pass (24 unrelated skipped) - Ubuntu Docker validation: PASS (re-ran with all hardenings) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(install): pin CRDB --log-dir to userData/crdb-logs (post-Copilot) Copilot review #1: waitForReady's timeout error said "Check logs in ${dataDir}/logs" but `cockroach start-single-node` wasn't being invoked with --log-dir, so CRDB defaulted to a platform-dependent location the user couldn't find from the error message. Now we pass --log-dir and mkdir it ahead of time; the error message and reality match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(install): stricter CRDB readiness check + always ensure DB exists (post-Copilot) Copilot review comment #2 (cockroach-manager.ts:102-110): the previous `isReady()` accepted ANY TCP listener on port 26257 as proof CRDB was up, and the start path skipped `ensureDatabase()` when the port was already bound. Two real failure modes: 1. A non-CRDB process binds 26257 first (test leftover, port collision, unrelated tool). CockroachManager treats it as "already running," never spawns CRDB, the API silently connects to the wrong service. 2. A partial first run left CRDB running but missed the CREATE DATABASE step (e.g. crash between start and ensureDatabase). The next launch sees the port bound, returns early, the API dies with "database skytwin does not exist." New behavior: - `portListening()` is the cheap TCP check. - `isCrdbResponding()` confirms the listener is actually CRDB by running `cockroach sql -e 'SELECT 1'` (2s timeout). Only this verdict is trusted as "running." - `start()` always calls `ensureDatabase()` even when CRDB is already responding — covers the partial-run heal path. Tests unchanged; the new helper is a private detail. 
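The two-stage readiness check above (a cheap TCP probe, then `SELECT 1` through the `cockroach` CLI with a 2 s timeout, where only the SQL verdict counts as "running") could be sketched as follows. The helper names mirror the commit, but their bodies, the `--insecure`/`--host` flags passed, the `crdbRunning` wrapper, and the binary-path parameter are assumptions of this sketch.

```typescript
import { execFile } from "node:child_process";
import { connect } from "node:net";

/** Cheap check: is anything accepting TCP connections on host:port? */
export function portListening(host: string, port: number, timeoutMs = 1_000): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = connect({ host, port });
    const done = (result: boolean) => {
      socket.destroy();
      resolve(result);
    };
    socket.setTimeout(timeoutMs, () => done(false));
    socket.once("connect", () => done(true));
    socket.once("error", () => done(false));
  });
}

/** Stricter check: does the listener actually answer SQL as CockroachDB? */
export function isCrdbResponding(
  cockroachBin: string,
  host: string,
  port: number,
  timeoutMs = 2_000,
): Promise<boolean> {
  return new Promise((resolve) => {
    execFile(
      cockroachBin,
      ["sql", "--insecure", `--host=${host}:${port}`, "-e", "SELECT 1"],
      { timeout: timeoutMs },
      // Any error (missing binary, non-zero exit, timeout) means "not CRDB".
      (err) => resolve(err === null),
    );
  });
}

/** Only the SQL verdict is trusted; the TCP probe just short-circuits "nothing there". */
export async function crdbRunning(cockroachBin: string, host: string, port: number): Promise<boolean> {
  if (!(await portListening(host, port))) return false;
  return isCrdbResponding(cockroachBin, host, port);
}
```

With this split, a stray non-CRDB listener on 26257 yields `false`, so the manager spawns its own instance instead of silently pointing the API at the wrong service.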
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: sync CONTRIBUTING + cockroach-architecture + technical-spec with v0.6.56.0 install Post-merge-gate /document-release pass. The codebase changed from "docker-compose up -d cockroachdb" to "bin/skytwin-db install && start && ensure-db" but three docs still had the old instructions: - CONTRIBUTING.md: Getting Started step 3 now uses bin/skytwin-db, with a pointer to bin/validate-installs for fresh-install regression testing before opening a PR. - docs/cockroach-architecture.md: new "Native binary" section as the default; Docker Compose kept as a legacy/opt-in subsection. - docs/technical-spec.md: Getting Started uses bin/skytwin-db; admin UI now documented at 127.0.0.1:26258 (native path) with 8080 noted as the legacy Docker default; DATABASE_URL example updated to 127.0.0.1 with a one-line explanation of why we avoid 'localhost' under --insecure. No code changes. CHANGELOG entry already covers the underlying behavior; this is pure doc-drift reconciliation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(changelog): add post-/review fixes subsection for v0.6.56.0 Per CLAUDE.md convention — keeps the original-cut vs review-caught diff readable in the release notes without forcing readers into git log spelunking. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.6.57.0 fix(desktop): DATABASE_URL routing + migration cascade + Google PKCE The v0.6.56 desktop bundle technically launched but every Gmail/Calendar query 500'd because none of its 57 migrations had actually applied. Root cause: packages/db ignored DATABASE_URL and connected to the default localhost:26257 — migrations landed on whatever stray docker-compose CRDB happened to be on that port instead of the bundled one. Independently, end users still hit a "create your own Google Cloud OAuth app" wall on sign-in. This release closes both. 
DATABASE_URL routing - packages/db/src/connection.ts parses DATABASE_URL first, with DATABASE_HOST/PORT/NAME as legacy fallback. Re-evaluates on the first getPool() call so service-manager's env injection (Electron main runs migrations in-process) takes effect. Migration cascade - 023 split into 023 (column add, always safe) + 057 (FK-chain dedupe + unique index, runs after the full schema is in place). Earlier in-23 dedupe failed because it referenced decision_outcomes.execution_plan_id from migration 055. - 046 replaces crdb_internal.force_error() with SELECT 1/0 WHERE …; bundled CRDB v23.2 locks crdb_internal behind allow_unsafe_internals. Desktop bundle assembly - ServiceManager runs migrations in-process via the named up() export instead of spawning child node (defeats 001-initial.ts's CLI guard, asar visibility, and ESM-from-CJS dynamic-import quirks all at once). - pnpm deploy --prod for self-contained api/worker/web bundles (~45 MB each vs ~14 GB from a naive cp -RL of pnpm symlinks). - apps/web Express server spawned alongside api + worker — previous bundle returned ECONNREFUSED on localhost:3200. - Per-installation SESSION_SECRET auto-generated in Electron main, persisted at userData/secrets/session-secret (mode 0o600). - USE_MOCK_IRONCLAW defaults to true in the bundle. - vitest excludes apps/desktop/dist-electron/ so packaged-app test copies don't break the suite. - packages/db/package.json's build script copies *.sql to dist/. - apps/desktop/dist-electron/ added to .gitignore. Google OAuth PKCE - @skytwin/connectors: new generatePkcePair() (RFC 7636 §4), generateAuthUrl() accepts code_challenge, exchangeCode() sends code_verifier instead of client_secret when secret is empty, refreshAccessToken() omits client_secret on refresh in PKCE mode. - apps/api/src/routes/oauth.ts: server-local Map<state,codeVerifier> keeps the verifier off the Google round-trip (consume-on-read so a replayed callback can't redeem twice). 
Honors a bundle-default SKYTWIN_DEFAULT_GOOGLE_CLIENT_ID so end users skip the "paste your client_id+secret" Setup screen. - apps/desktop/src/service-manager.ts injects the build-time client_id into the spawned API. The constant ships empty in this commit — register a Verified OAuth client of type "Desktop app" in the SkyTwin Google Cloud project and bake the client_id in (or pass at build time) before the first signed release. Tests - 11 new tests in packages/connectors/src/__tests__/google-oauth-pkce.test.ts covering pair generation, S256 challenge derivation, URL params, and both token-exchange + refresh request shapes in PKCE vs confidential modes. - Full suite still green: 3,084 tests across 20+ packages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): replace bash-only copy-sql with cross-platform Node script The @skytwin/db build's `tsc && bash -c 'mkdir -p … && cp …'` failed on Windows CI runners: cmd.exe can't run `bash -c` natively, and even with Git's bash on PATH the `2>/dev/null || true` segment was parsed as "'true'' is not recognized as an internal or external command". Replaced with packages/db/scripts/copy-sql.cjs — pure Node, no shell — which walks src/{migrations,schemas} and copies the *.sql files into dist/. Same observable behaviour on macOS/Linux (56 migration files + 1 schema file in dist/migrations and dist/schemas), now also working on Windows where Desktop — Windows (NSIS installer) was failing the @skytwin/db build step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(desktop): bake the registered SkyTwin Desktop OAuth client_id Replaced the empty BUNDLED_GOOGLE_CLIENT_ID placeholder in apps/desktop/src/service-manager.ts with the real client_id from the "SkyTwin Desktop" OAuth client (type: Desktop app) registered in the skytwin-492700 Google Cloud project on 2026-05-22. 
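The PKCE pieces listed under "Google OAuth PKCE" in v0.6.57.0 (a verifier/challenge pair per RFC 7636 §4 using the S256 method, and a token request that sends `code_verifier` instead of `client_secret` when the secret is empty) can be sketched as below. The function names echo the commit, but the signatures and the `tokenExchangeBody` helper are assumptions; the form fields are the standard OAuth 2.0 / RFC 7636 ones.

```typescript
import { createHash, randomBytes } from "node:crypto";

/** RFC 7636 §4.2 S256: code_challenge = BASE64URL(SHA256(ASCII(code_verifier))). */
export function challengeFor(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

/** RFC 7636 §4.1: 32 random bytes give a 43-char base64url verifier (limit is 43..128). */
export function generatePkcePair(): { codeVerifier: string; codeChallenge: string } {
  const codeVerifier = randomBytes(32).toString("base64url");
  return { codeVerifier, codeChallenge: challengeFor(codeVerifier) };
}

/** Hypothetical builder for the authorization-code exchange request body. */
export function tokenExchangeBody(opts: {
  code: string;
  clientId: string;
  clientSecret: string;
  redirectUri: string;
  codeVerifier?: string;
}): URLSearchParams {
  const body = new URLSearchParams({
    grant_type: "authorization_code",
    code: opts.code,
    client_id: opts.clientId,
    redirect_uri: opts.redirectUri,
  });
  if (opts.clientSecret !== "") {
    // Confidential client: authenticate with the secret.
    body.set("client_secret", opts.clientSecret);
  } else if (opts.codeVerifier !== undefined) {
    // PKCE mode: an empty secret marks a public client, so prove possession instead.
    body.set("code_verifier", opts.codeVerifier);
  }
  return body;
}
```

Because only the server that generated the verifier can produce it at exchange time, a leaked public `client_id` or an intercepted auth code is not enough to redeem tokens.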
End users now click "Sign in with Google" in the dashboard and get straight to Google's consent screen — no more "create your own Google Cloud OAuth app and paste your client_id + secret" friction.

PKCE binds each auth code to a per-flow code_verifier that the API holds in memory (see apps/api/src/routes/oauth.ts), so the public client_id alone redeems nothing. The redirect lands on http://127.0.0.1:<port>/api/oauth/google/callback and never traverses our infrastructure. Tokens stay on the user's machine, encrypted by credential-vault.

Override at build time via SKYTWIN_DEFAULT_GOOGLE_CLIENT_ID env if shipping a forked SkyTwin build that should consent under a different brand.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): note the registered SkyTwin Desktop OAuth client_id

The previous CHANGELOG entry for v0.6.57.0 said the bundled client_id was empty and needed to be filled in before release. It's now populated with the real value from the "SkyTwin Desktop" OAuth client registered in skytwin-492700 on 2026-05-22. Updated the changelog so the historical record matches what actually shipped.

Also notes the consent-screen state: Testing mode pending Google verification for Gmail/Calendar sensitive scopes — listed test users sign in cleanly; other users see the "unverified app" warning until verification completes (separate effort).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(verification): GitHub Pages site + Google OAuth verification plan

Adds the public-facing pages Google requires for OAuth brand verification + sensitive-scope review:

- docs/index.html: Homepage describing SkyTwin's functionality, the Google scopes we request, and why.
- docs/privacy.html: Privacy policy disclosing how Google user data is accessed, used, stored (locally), and the Limited Use compliance statement.
- docs/terms.html: Apache-2.0-aligned terms of service.
- docs/_config.yml: Jekyll config that excludes the existing technical-spec markdown from being served as site pages (they're written for GitHub rendering and would break as Jekyll output).
- docs/google-verification.md: Status tracker + ready-to-paste scope justifications for the OAuth consent-screen review. Documents the three-tier verification path: brand verification (days), sensitive-scope review (weeks), restricted-scope security assessment (months + $$$).

Hosted at https://jayzalowitz.github.io/skytwin/ once GitHub Pages is enabled on this repo's `docs/` folder. github.io is auto-verified by Google's brand-verification checks, so no Search Console dance.

This commit only ships the content. Wiring the consent-screen homepage/privacy URLs and publishing the app are manual steps in the Google Cloud Console + GitHub Pages settings — see docs/google-verification.md for the punch list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: allow manual workflow_dispatch on Build & Package

A docs-only push to a feature branch skips the desktop/mobile jobs because of the path filter on changes, but we still want a way to re-trigger them against the cumulative branch state — for example, when an earlier desktop-touching commit's run was cancelled by a subsequent docs commit (cancel-in-progress concurrency). Manual dispatch via gh workflow run is the lightest-weight escape hatch.

No behaviour change for push / pull_request triggers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(oauth): tiered scope policy — Calendar bundled, Gmail BYO

Solves the restricted-scope verification problem at $0 cost. Until SkyTwin can fund a ~$15k–$50k annual CASA Tier 2/3 security assessment for the bundled OAuth client, Gmail's restricted scopes are reserved for users who paste their own Google Cloud OAuth credentials into the Setup page.
Calendar + identity flow through the bundled client and clear with normal sensitive-scope app review (days–weeks, no fee).

What changed in apps/api/src/routes/oauth.ts:
- resolveGoogleConfig() now reports a `source` field: 'user-supplied' (env vars or DB-stored from Setup), 'bundled' (SKYTWIN_DEFAULT_GOOGLE_CLIENT_ID), or 'unset'.
- New resolveRequestedScopes({source, includeGmail}) computes the scope set + returns a `skipped` list reporting capabilities that were silently dropped. The bundled source drops Gmail; the user-supplied source allows it.
- /google/authorize honors ?include=gmail (also accepts ?scopes=gmail and ?gmail=true); requests dropped under the bundled gate return HTTP 412 with code GMAIL_REQUIRES_BYO_CLIENT and a help URL pointing at the user-facing walkthrough at /connect-gmail.
- 6 new tests in oauth-scope-tiers.test.ts lock in the gating across every (source, includeGmail) combination.

What changed in docs/:
- docs/google-verification.md rewritten end-to-end as the staged rollout plan: brand verification status, sensitive-scope review for Calendar, restricted-scope tier for Gmail with the BYO escape hatch, scope justifications ready to paste into Google's submission form, demo-video script, and the issue draft for #351.
- docs/connect-gmail.html — five-minute step-by-step walkthrough (create GCP project → enable Gmail API → configure consent screen → create OAuth client → paste into SkyTwin Setup). Linked from docs/index.html.

What needs to happen separately:
- PR #350 merges → GitHub Pages goes live → brand verification can be submitted.
- Calendar-scope review (Tier 1) — submit through the GCP console once Pages serves the privacy policy URL.
- Restricted-scope verification for Gmail — tracked in #351, only when SkyTwin can sustain the annual CASA fee.

Tests: 684 api tests passing including the 6 new tier-gating ones.
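A minimal sketch of the tier gate described above, assuming only the shapes this commit names (`source`, `includeGmail`, `skipped`). The scope strings are placeholders rather than SkyTwin's actual request list, `resolveRequestedScopesSketch` and `gmailGateResponse` are hypothetical names, and treating `'unset'` like `'bundled'` is an assumption.

```typescript
// Source values as reported by resolveGoogleConfig() in this commit.
type ClientSource = "user-supplied" | "bundled" | "unset";

interface ScopeResolution {
  scopes: string[];
  skipped: string[]; // capabilities dropped by the tier gate
}

// Placeholder scope lists (assumption); SkyTwin's real lists may differ.
const BASE_SCOPES = ["openid", "email", "https://www.googleapis.com/auth/calendar.events"];
const GMAIL_SCOPES = ["https://www.googleapis.com/auth/gmail.modify"];

function resolveRequestedScopesSketch(opts: {
  source: ClientSource;
  includeGmail: boolean;
}): ScopeResolution {
  const scopes = [...BASE_SCOPES];
  const skipped: string[] = [];
  if (opts.includeGmail) {
    if (opts.source === "user-supplied") {
      // BYO client: the user's own GCP project carries the restricted scopes.
      scopes.push(...GMAIL_SCOPES);
    } else {
      // Bundled (or unset) client: never request restricted Gmail scopes.
      skipped.push("gmail");
    }
  }
  return { scopes, skipped };
}

// How a route handler might turn a dropped Gmail request into the 412.
function gmailGateResponse(
  res: ScopeResolution,
): { status: number; body: { code: string } } | null {
  return res.skipped.includes("gmail")
    ? { status: 412, body: { code: "GMAIL_REQUIRES_BYO_CLIENT" } }
    : null;
}
```

Returning `skipped` instead of throwing keeps the scope computation pure, so the HTTP decision (412 vs. proceed) stays in the route and the gate can be tested across every (source, includeGmail) pair without a server.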
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(web): in-app Gmail-setup wizard at /#/connect-gmail

Reframes Gmail BYO as the launch Gmail experience (not a fallback) and ships the matching in-app guided wizard, so users don't have to leave the dashboard to wire up Gmail features that are core SkyTwin value.

What landed
- apps/web/public/js/pages/connect-gmail.js — 5-step wizard with progress dots, per-step deep links into GCP Console (opened in the user's existing browser session — SkyTwin never sees Google credentials), final paste-and-connect form that PUTs to /api/credentials/google and redirects through /api/oauth/google/authorize?include=gmail&userId=… so the consent dance happens against the user's just-saved client.
- apps/web/public/js/app.js route '/connect-gmail' → renderConnectGmail.
- Singleton-delegator click handler wired with the _listenerWired guard + hash gating, matching the CLAUDE.md frontend convention.
- Client-side validation catches the common "you pasted the wrong thing" cases before the server roundtrip (Client ID must end in `.apps.googleusercontent.com`, Client Secret length check).
- Step 5 pre-fills any previously-saved creds from /api/credentials/google so a partial setup survives a refresh.

Reframing — Gmail BYO is the product, not a workaround
- docs/connect-gmail.html: rewritten intro to clarify "this is how every SkyTwin user wires up Gmail today" (not "5-minute setup, one-time" — that read as optional).
- docs/google-verification.md: Tier 2 section now states "this is the launch Gmail experience, not a fallback" and links the in-app wizard alongside the public-web mirror.
- CHANGELOG: same reframe; mentions the wizard explicitly.
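The client-side check can be sketched as a pure function that returns a list of problems before any network call. This is a TypeScript illustration rather than the wizard's JavaScript; `validateGoogleCreds`, the messages, the swapped-fields check, and the `MIN_SECRET_LENGTH` value of 20 are all assumptions, since the commit only says there is a length check.

```typescript
// Hypothetical helper; the wizard's actual checks and messages may differ.
const CLIENT_ID_SUFFIX = ".apps.googleusercontent.com";
// Assumed threshold: the commit mentions a length check but not the number.
const MIN_SECRET_LENGTH = 20;

function validateGoogleCreds(clientId: string, clientSecret: string): string[] {
  const problems: string[] = [];
  const id = clientId.trim();
  const secret = clientSecret.trim();

  // Must end with the suffix and have something in front of it.
  if (!id.endsWith(CLIENT_ID_SUFFIX) || id.length === CLIENT_ID_SUFFIX.length) {
    problems.push("Client ID should end in .apps.googleusercontent.com");
  }
  // Assumed extra check for the two fields being pasted the wrong way round.
  if (secret.endsWith(CLIENT_ID_SUFFIX)) {
    problems.push("The Client Secret field contains a Client ID");
  } else if (secret.length < MIN_SECRET_LENGTH) {
    problems.push("Client Secret looks too short; copy the full value");
  }
  return problems;
}
```

An empty array means the form can PUT to /api/credentials/google; anything else is shown inline, so obvious paste mistakes never cost a server round trip or a failed Google consent screen.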
Help-URL routing
- The 412 `GMAIL_REQUIRES_BYO_CLIENT` response from /api/oauth/google/authorize?include=gmail now carries both `help: '#/connect-gmail'` (in-app SPA route) and `docs: 'https://jayzalowitz.github.io/skytwin/connect-gmail.html'` (public-web mirror) so callers in either context can route the user to the right surface.

Why this and not a SkyTwin-driven embedded BrowserWindow
An earlier design sketch had SkyTwin opening child BrowserWindows driving the GCP Console with a sidebar walkthrough. Killed on reflection: it would require the user to sign into Google inside a SkyTwin-managed browser, which captures the session and credentials in SkyTwin's process memory — exactly the threat model BYO is designed to avoid. The wizard now opens each GCP Console URL in the user's *own* default browser (`target="_blank"` in dashboard context; Electron's setWindowOpenHandler routes the same anchor to the OS browser in desktop context — already wired via apps/desktop/src/main.ts's open-external IPC). User clicks the Google buttons themselves; SkyTwin only navigates the wizard forward.

Tests
- 6 scope-tier tests still passing (resolveRequestedScopes contract unchanged).
- Full api suite green (684 passing, 24 skipped).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs+web: launch plan + dashboard Gmail follow-up CTA

Two pieces of forward-pushing work:

1. docs/launch-plan.md — the actual roadmap from "code in a feature branch" to "grandma can download the app." Three tiers:
   Tier 1 = launch blockers (PR merge, brand verification, code signing, demo video, release tag, README rewrite).
   Tier 2 = first-month polish (auto-update, PKCE store in DB, onboarding deep-link, sample-profile polish).
   Tier 3 = strategic / post-launch (CASA assessment for Gmail #351, mobile stores, hosted variant).
   Each Tier 1 item names the dependency (purchase / review / merge) and the owner (us vs. Google).
   Costs are itemised: $99 Apple + ~$400 Windows EV cert + ~$15 domain = $500–$1000/year recurring to start. CASA assessment is the deferred $15k–$50k sitting behind a usage trigger. "What is explicitly NOT in launch scope" section names the tempting Tier-3 items (federation sync, MCP marketplace, hosted product) so they don't crowd out the boring Tier-1 work.

2. dashboard Gmail follow-up CTA — apps/web/public/js/pages/dashboard-view.js gets a new renderConnectGmailHero() card that surfaces immediately after the bundled Google sign-in completes (Calendar + identity granted, Gmail scopes absent). Links to the in-app wizard at /#/connect-gmail with the "Why is this step needed?" external doc alongside. Without this nudge users would finish bundled sign-in, look at an empty Approvals queue, and not know SkyTwin's inbox features need a second 5-minute step.
   Three-condition check on Gmail state: the card returns '' when (a) tour mode is active, (b) Google isn't connected yet (the existing ConnectGoogleHero owns that state), or (c) Gmail scopes are already present on the OAuth token. Driven by the `scopes` array the /oauth/google/status endpoint already returns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(oauth): tier 2 launch polish — PKCE DB store + onboarding deep-link

Three Tier-2 launch-plan items in one PR, each isolated to its own subsystem:

2.2 PKCE verifier store now lives in CRDB (`oauth_pkce_pending` table + `oauthPkcePendingRepository`). A desktop restart between /authorize and /callback no longer drops the verifier and breaks sign-in. `consume()` is a single DELETE...RETURNING so the same replay-protection property survives the move off the in-memory Map.

2.3 OAuth /authorize accepts a whitelisted `next=connect-gmail` deep-link.
    After Google consent the onboarding wizard lands the user on /#/connect-gmail (with a "Calendar connected — now let's hook up Gmail" banner) instead of dropping them on the dashboard root and making them discover the follow-up CTA card. Free-form `next` URLs are explicitly NOT accepted — that would be an open redirect; the whitelist is the security boundary.

2.4 Unset bundled client_id now bounces the user into the same connect-gmail wizard instead of showing a generic error toast. The 503 is tagged with `code: 'NO_GOOGLE_CLIENT_CONFIGURED'`; ApiError plumbs structured `code`/`help`/`docs` through to the dashboard, and the onboarding wizard branches on the code to route the user. The connect-gmail wizard's final OAuth call uses ?newUser=true when no userId is in localStorage so brand-new onboarding users finish the flow.

Tests: 5 new for the PKCE repository, 5 new for the next= state round-trip (HMAC tampering breaks signature verification, unknown next= drops to null, etc.). All 689 API tests + 295 DB tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(oauth): post-/review fixes for tier-2 polish

- parseSignedState: use hasOwnProperty.call on NEXT_HASH_ROUTES instead of bracket lookup so a `next=constructor` (or __proto__, toString, …) tag can't reach the inherited Object property and slip past the truthy check. New test loops the four common prototype keys and asserts nextHash stays null. Today's only consumer would have rendered a stringified function into the redirect URL — broken, not exploitable, but worth closing.
- Onboarding NO_GOOGLE_CLIENT_CONFIGURED handler now re-enables the "Continue with Google" button before changing window.location.hash, so a synchronous re-render can't leave the button stuck on "Redirecting…".
- Operator note in the PKCE-store comment block clarifying that migration 058 must run before the API serves traffic.
  We deliberately do NOT fall back to an in-memory Map — that would defeat the cross-restart guarantee the move to DB is meant to provide.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(oauth): desktop new-user flow auto-advances via pendingKey poll

The desktop newUser sign-in opens Google in the system browser; the callback fires there and there's no IPC back to the Electron app. The old code re-enabled the wizard button with a "return here and continue" error and the user had to manually click again — a real grandma-blocker the existing TODO admitted to ("the web flow advances via redirect, desktop currently does not").

This closes the gap:
- Wizard generates a UUIDv4 pendingKey client-side (crypto.randomUUID).
- /authorize validates UUID shape (anti-SQL-injection / anti-traversal, not crypto-grade) and threads it through HMAC-signed state as `key=<uuid>`. parseSignedState re-validates on read.
- /callback writes the resulting userId + accountEmail + scopes + nextHash to a new oauth_pending_signin table (migration 059) keyed by the pendingKey, then renders the existing "close this tab" HTML.
- New GET /api/oauth/google/pending/:key endpoint (public — the unguessable random key IS the authorization). Consume-on-read (DELETE...RETURNING) so a leaked key can only be redeemed once. Mirrors the existing pollUntilConnected pattern.
- google-signin.js polls the new endpoint when (desktop && newUser); fires onComplete with { userId, nextHash } on success.
- Onboarding wizard's onComplete sets userId in localStorage and routes to the deep-link target — auto-advance, no second click.

Tests: 6 new for oauthPendingSigninRepository (replay protection, expiry defence-in-depth, scope-shape coercion); 4 new for the isValidPendingKey gate + key= state encoding (rejecting SQL injection, path traversal, wrong-version UUIDs, uppercase, etc.). All 694 API tests + 301 DB tests pass.
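Both state tags are checked on read in parseSignedState. A hedged sketch of the two guards: an own-property lookup for the `next` whitelist (the post-/review fix above) and a lowercase UUIDv4 shape check for `key`. `NEXT_HASH_ROUTES` is named in this PR, but its shape here, `resolveNextHash`, `isValidPendingKeySketch`, and the regex are illustrative assumptions.

```typescript
// Whitelisted deep-link tags mapped to SPA hash routes. Only connect-gmail is
// named in this PR; the object's shape here is illustrative.
const NEXT_HASH_ROUTES: Record<string, string> = {
  "connect-gmail": "#/connect-gmail",
};

function resolveNextHash(tag: string | undefined): string | null {
  if (!tag) return null;
  // Own-property check: NEXT_HASH_ROUTES["constructor"] would return the
  // inherited Object constructor (truthy) and slip past a plain lookup.
  return Object.prototype.hasOwnProperty.call(NEXT_HASH_ROUTES, tag)
    ? NEXT_HASH_ROUTES[tag]
    : null;
}

// Lowercase canonical UUIDv4: version nibble 4, variant nibble 8, 9, a or b.
// A shape gate only; the entropy comes from crypto.randomUUID() on the client.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/;

function isValidPendingKeySketch(key: unknown): key is string {
  return typeof key === "string" && UUID_V4.test(key);
}
```

Because the regex is anchored and admits only hex digits and hyphens, SQL metacharacters, path separators, and uppercase or wrong-version UUIDs all fail the same check.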
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* security(oauth): harden /pending/:key — mint session, rate-limit, sql-filter expiry

Post-/review fixes for the desktop pendingKey endpoint:

1. CRITICAL — don't return bare userId. The pre-existing POST /api/sessions accepts any userId from a localhost caller and returns a 7-day session token (unchanged by this PR — it's been the QR-pairing trust model). Returning userId from /pending/:key would chain those two endpoints: leaked key → consume → forge a session as that user. Instead, /pending/:key now mints the session itself in-process and returns the token alongside the userId. The pendingKey IS the credential; consume-on-read makes it one-shot. Client stashes the token under KEY_SESSION_TOKEN so subsequent API calls flow through Authorization: Bearer exactly like QR pairing.

2. MEDIUM — per-IP rate limit on /pending/:key. Without it, an attacker who exfiltrated a partial key (truncated log, side channel) could enumerate the remainder at line rate. Also a basic DoS vector. Wraps the same checkNewUserRateLimit() that already gates ?newUser=true.

3. MEDIUM — silent failure on remember() now logs with userId + pendingKey so the operator can correlate a wizard timeout with a real DB failure rather than chasing a phantom Google issue.

4. MEDIUM — consume() WHERE now filters by expires_at >= $now in SQL so a poll arriving past TTL doesn't delete the row before sweepExpired() reclaims it. Without this, network jitter on the client could destroy a row mid-handoff and the legitimate wizard would 404 even though the OAuth round-trip succeeded.

8. NIT — migration comment "128-bit" → "122-bit / UUIDv4" to stop overstating entropy.

All 694 API tests + 301 DB tests pass.
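The consume-on-read handoff can be sketched as one transaction: delete the pending row with the SQL-side expiry filter from item 4, then mint the session token in the same transaction so a failed INSERT also rolls back the DELETE (the atomicity fix in the follow-up commit). `Queryable`, `redeemPendingKey`, the table and column names, and the 7-day expiry are assumptions; with node-postgres, BEGIN/COMMIT must run on one checked-out client, not on the pool.

```typescript
import { randomBytes } from "node:crypto";

// Minimal query surface; a node-postgres PoolClient fits this shape.
interface Queryable {
  query(text: string, params?: unknown[]): Promise<{ rows: any[] }>;
}

interface Redeemed {
  userId: string;
  sessionToken: string;
}

const SESSION_TTL_MS = 7 * 24 * 60 * 60 * 1000; // assumed 7-day session

// Table and column names are illustrative, not taken from migration 059.
async function redeemPendingKey(db: Queryable, pendingKey: string, now: Date): Promise<Redeemed | null> {
  await db.query("BEGIN");
  try {
    // Consume-on-read: the row is deleted as it is read, so a key redeems once.
    // A poll after expires_at matches nothing and leaves the row for the sweeper.
    const { rows } = await db.query(
      "DELETE FROM oauth_pending_signin WHERE pending_key = $1 AND expires_at >= $2 RETURNING user_id",
      [pendingKey, now],
    );
    if (rows.length === 0) {
      await db.query("ROLLBACK");
      return null;
    }
    const sessionToken = randomBytes(32).toString("hex");
    await db.query(
      "INSERT INTO sessions (token, user_id, expires_at) VALUES ($1, $2, $3)",
      [sessionToken, rows[0].user_id, new Date(now.getTime() + SESSION_TTL_MS)],
    );
    await db.query("COMMIT");
    return { userId: rows[0].user_id, sessionToken };
  } catch (err) {
    // A failed session INSERT undoes the DELETE too, so the wizard can poll again.
    await db.query("ROLLBACK");
    throw err;
  }
}
```

Returning the minted token, never a bare userId, is what keeps a leaked pendingKey from being chained into POST /api/sessions.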
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(oauth): atomicity + dedicated poll bucket for /pending — closes 2nd /review

Second /review pass on the security-hardening commit (9509269) flagged two real regressions I introduced:

1. Atomicity gap. consume() already DELETEd the pending row before sessionRepository.create() ran; a transient CRDB failure or missing-table error mid-call would strand the user with no session AND no recoverable pending row. The poll loop would then silently exhaust its 5-min budget. Now the consume + session INSERT happen in a single withTransaction() — if the INSERT throws, the DELETE rolls back and the user can retry.

2. Rate-limiter starvation. /pending/:key was sharing the checkNewUserRateLimit bucket (5 hits / 60 s). The wizard polls every 2 s for 5 minutes = 30 hits/min from the same IP — would 429 after ~10 seconds and quietly time out at 5 min, exactly re-introducing the grandma-blocker this endpoint exists to fix. New checkPendingPollRateLimit() backed by its own Map; capped at 120/min so a normal poll loop runs comfortably with headroom for retries and jitter. New test covers the cross-starvation case (filling the authorize bucket leaves the poll bucket untouched).

3. Truncate pendingKey to 8-char prefix in the failure log (the key is 5-min-lived but log aggregators may index it longer).

All 697 API tests + 301 DB tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): address Copilot findings on commit fbfc11e

Five real findings on the latest commit; the sixth (cockroach-manager early-return skipping ensureDatabase) was already addressed in e3c3951 and is just re-flagged on a stale view — the SQL probe and unconditional ensureDatabase call are in the code today.

1. parseDatabaseUrl ssl handling. Only `sslmode=disable` was mapped to `false`; everything else returned undefined and fell back to the env default (false).
   A `DATABASE_URL=…?sslmode=require` against a secure CRDB cluster would silently connect over plaintext. New sslConfigForSslmode() maps disable/require/verify-ca/verify-full to the corresponding pg.PoolConfig.ssl shape; unknown values still fall through (typo-tolerant against env override).

2. cockroach-manager stop() — proc.kill('SIGTERM') after gracefulQuit() was unconditional; if the drain already caused CRDB to exit, the SIGTERM throws ESRCH and a clean shutdown becomes an exception. Now we check `proc.exitCode === null` first and wrap in try/catch anyway as belt-and-suspenders. Also fixed the stale "cockroach quit" comment — the implementation uses `cockroach node drain`.

3. apps/desktop/package.json scripts had `tsc &&electron` (no space after &&) on five lines and `tsc && electron-builder` (with space) on the same lines later. POSIX parses `&&token` correctly so this isn't a runtime bug, but the inconsistency is real. Normalized.

4. JSDoc for startGoogleSignIn's `onComplete` callback omitted the sessionToken field that pollUntilPendingResolved emits. Updated the type signature + docstring so future callers see the contract.

5. build-single-binary.sh bash dependency for the Windows package script — documented inline. Windows GitHub runners ship Git Bash so CI works; local Windows devs need Git Bash / WSL / MSYS. A future Node port would remove the constraint but no CI gates on it today.

All 301 DB tests + 175 desktop tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(web): promote tour-mode CTA on welcome screen (launch-plan 2.6)

Welcome-screen tour link was a tiny gray text link ("Explore with a sample profile instead →", 0.82rem, --text-muted) — easy to miss next to three large CTA buttons. Promoted to a btn-outline btn-lg card with an "or" horizontal divider above it, matching the visual rhythm of the existing choices but clearly framed as the alternative no-sign-in path.
Conditional-on-demo-availability is preserved: CTA + divider both live inside the same #onb-tour-row div, both reveal when fetchDemoInfo() returns available=true. Non-localhost / non-dev-bypass deployments still get a clean welcome screen with no broken tour link.

No behavioural change to /api/v1/demo/{info,preview} or skyTwinExitTour. CHANGELOG (Unreleased) + launch-plan §2.6 (now "partial — Unreleased") + README "first 60 seconds" walkthrough updated to match the new label.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* a11y(web): expose "or" semantic on tour-CTA divider (post-/review)

The /review pass on 8c882ad flagged that the "or" divider used aria-hidden="true" on the entire wrapper div, hiding both the decorative lines AND the semantic "or" word from screen readers. Result: AT users go from the third primary CTA straight to "Try with a sample profile" without the alternative-path framing that's visually obvious.

Fix: move aria-hidden to just the three inner spans (two lines + text), promote the wrapper to role="separator" with aria-label="or" so the relationship between the two button groups is announced once, correctly, without the decorative SVG noise. Visually identical. No behavioural change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(desktop): rename @skytwin/desktop → skytwin-desktop (Windows NSIS)

Windows NSIS build was failing after 2.5h with:

    File: failed creating mmap of "...@skytwindesktop-0.3.0-x64.nsis.7z"
    Error in macro x64_app_files on macroline 1
    Error in macro extractEmbeddedAppPackage on macroline 8
    !include: error in script: "installSection.nsh" on line 66
    Error in script "<stdin>" on line 199 -- aborting creation process

Root cause: electron-builder derives the intermediate .nsis.7z filename from package.json `name`.
The npm scoped name `@skytwin/desktop` gets flattened to `@skytwindesktop` (only the `/` is stripped, not the `@`), so the .nsis.7z lives at `...\@skytwindesktop-0.3.0-x64.nsis.7z`. NSIS's makensis trips on @-prefixed paths in the File include macro and fails to mmap the archive even though the file was written successfully.

Confirmed by the same-bundle pattern on this PR's prior CI run:
- macOS (DMG): same bundle, same @-prefixed intermediate — ✓ packaged
- Linux (AppImage/deb/rpm): same — ✓ packaged
- Windows (NSIS): same — ✗ mmap of @-filename

DMG and AppImage don't use makensis, so they sail through.

Fix: drop the `@scope/` prefix from the desktop package's npm name. It's a leaf consumer (no other workspace package imports from it — verified with grep), and pnpm-lock.yaml keys workspace entries by directory path, not by npm name, so the lockfile is unchanged. `pnpm install --frozen-lockfile` passes locally.

Updates:
- apps/desktop/package.json — name field + the embedded help-text in the placeholder `build` script.
- Root package.json — 6 desktop:* scripts that use `pnpm --filter`.
- .github/workflows/build.yml — mac/win/linux package steps.
- .github/workflows/release.yml — build + 3 publish-always steps.
- apps/desktop/scripts/build-single-binary.sh — help-text echo.
- apps/desktop/src/headless.ts — invocation comment.

The package directory + workspace location are unchanged; only the public `name` string flips. CHANGELOG references stay as-is (they're historical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): codex /review findings — worker, install, onboarding, OAuth

Codex /review on the cumulative #350 diff caught 4 P1 + 2 P2 issues that my prior Claude /review passes (scoped to each new commit) missed. Cross-model agreement was 0% — different scopes catch different things, which is exactly what the merge gate's two-reviewer promise is for.

P1 fixes (default-flow blockers):

1. apps/worker/src/index.ts resolveGoogleConfig — required both clientId AND clientSecret. The bundled PKCE flow mints tokens with no clientSecret, so the worker logged "credentials not configured; skipping Google connectors" and never processed a single signal on the grandma-grade default install. OAuth worked, twin did nothing. Fix: mirror api oauth.ts three-layer resolve (env → DB → bundled), accept empty clientSecret as the PKCE signal (refreshAccessToken already handles it correctly). Service-manager already injects SKYTWIN_DEFAULT_GOOGLE_CLIENT_ID into worker env via buildChildEnv.

2. bin/skytwin-install — `pnpm db:migrate` ran before DATABASE_URL was exported, so @skytwin/db fell back to localhost:26257. If the user set SKYTWIN_DB_PORT to dodge a collision or `localhost` resolved to ::1 instead of the 127.0.0.1 listener, migrations silently landed on the wrong socket. Build the URL from the same env vars bin/skytwin-db uses.

3. apps/web/public/js/pages/onboarding.js — desktop pendingKey onComplete stored userId + session token + hash but skipped KEY_ONBOARDED, hideWizard(), and skyTwinSetUserId(). Dashboard rendered #/connect-gmail BEHIND the still-visible onboarding modal — sign-in looked stuck, reload reopened first-run. Mirror the tour path's full three-step teardown.

4. apps/web/public/js/pages/connect-gmail.js — final OAuth step used `window.location.href = data.url` which inside Electron's renderer loads accounts.google.com in an embedded UA, rejected as disallowed_useragent. Route through startGoogleSignIn which detects Electron and uses openExternal + pendingKey poll. Plumbed `include` param through getGoogleAuthUrl + startGoogleSignIn so the Gmail scope opt-in survives the routing change.

P2 fixes:

5. apps/web/public/js/pages/connect-gmail.js — PUT /api/credentials/google bootstrap-without-session is a self-hoster edge case. Default bundled-client launch path doesn't reach it.
   Documented inline + launch-plan, with operator workarounds noted; a proper bootstrap token mechanism is its own scoped change.

6. bin/skytwin-db is_running — fallback path returned true on ANY port listener. A stray postgres / leftover container would make cmd_start skip launching CRDB. New is_crdb_responding() helper runs SELECT 1 to verify the listener speaks CRDB before short-circuiting.

Tests: 697 API + 115 worker — all green. No new test files added; fixes either mirror established patterns (#1, #3, #4) or harden bash fallbacks the existing test harness doesn't exercise (#2, #5, #6).

GATE: codex re-review pending after CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(windows): exclude workspace from Defender to stop makensis mmap fail

Two consecutive Windows CI runs failed at the same step with the same error, on different commits and different filenames:
- 71bdee5b: makensis File: failed creating mmap of "...@skytwindesktop-0.3.0-x64.nsis.7z"
- 3a251323: makensis File: failed creating mmap of "...skytwin-desktop-0.3.0-x64.nsis.7z"

The rename from @skytwin/desktop -> skytwin-desktop in 2da3549b removed the leading @ from the filename. The error still reproduced verbatim against the new name, so the @ theory was wrong.

Actual root cause: makensis is a 32-bit process that opens the freshly-written .nsis.7z intermediate via mmap to embed it into the final installer.exe. On the GitHub Actions windows-latest runner, Windows Defender's real-time scanner opens that same .7z to scan it the moment it's written. Defender's open holds a sharing lock; makensis's mmap call races against it and returns failure. This is documented in electron-userland/electron-builder#6107.

Fix: add an ExclusionPath for the build workspace + electron-builder cache dirs before the Package step runs.
Defender stays active on the runner overall (so signtool's signing pass on cockroach.exe / SkyTwin.exe still gets scanned), but the staging dirs that NSIS reads back are out of bounds. Uses Add-MpPreference -ExclusionPath which only requires the admin shell the runner already has, no policy changes. Why not disable Defender entirely: - Disabling RT scanning leaves the signtool steps unprotected, and we sign two .exe files (cockroach.exe + SkyTwin.exe) before makensis runs. - Exclusion is the surgical fix; disable is the sledgehammer. Why not nsis-web (download payload at install time): - That target requires a release URL the payload is hosted at; CI runs don't tag releases. - Scope creep for fixing a CI race condition. Expected outcome: Windows job clears the makensis step on first try (previously failed at ~2h26 to 2h36 with same error). If it still fails post-exclusion, the next diagnosis target is bundle size vs 32-bit makensis address space, but exclusion is overwhelmingly the most likely cause given the timing reproducibility. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(ci): tarball embedded apps + nsis tuning to unstick Windows Three Windows CI runs failed in a row, each at the same makensis mmap step ~2.5h into the job. Three problems were stacked: 1. The .nsis.7z intermediate was racing Windows Defender's RT scanner. Defender holds a sharing handle while it scans the freshly-written .7z; 32-bit makensis mmap-opens the same file and gets ESHARING surfaced as `File: failed creating mmap of`. 2. electron-builder's win-unpacked copy step was spending an hour just writing the ~10,000 loose files in dist/embedded/{api,worker,web}/ (pnpm-deploy node_modules trees, multiplied by 3 apps). NTFS small-file throughput on the GitHub Actions runner is much worse than APFS / ext4 — the macOS+Linux desktop builds finished in 9 and 4 minutes against the same input. 3. 
`differentialPackage: true` (electron-builder default) was running an extra .blockmap generation pass on the already-slow .nsis.7z. The blockmap is for electron-updater delta downloads we don't ship yet (gated on §1.5 release tag + signing certs). Already-shipped Defender exclusion (a4b2e09) addresses #1. This commit addresses #2 and #3: #2 fix: `apps/desktop/scripts/build-single-binary.sh` now tars the embedded api/worker/web trees into a single `apps.tar.gz` after the pnpm-deploy + strip-self-symlinks step. The extraResources filter in `apps/desktop/package.json` shrinks to {apps.tar.gz, bundle-manifest. json}. `apps/desktop/src/service-manager.ts` gains an `ensureEmbeddedRoot()` method that extracts the tarball to `<userData>/embedded/` on first launch, gated by a `.version` marker so subsequent launches (and post-upgrade launches) handle the extract correctly. `startApi`, `startWeb`, `startWorker`, and `runMigrations` all consume the extracted path. #3 fix: `nsis.differentialPackage: false` + `compression: "normal"` pinned explicitly so a future electron-builder bump can't silently switch to LZMA-max and regress build time. Trade-offs documented in CHANGELOG. User-facing first-launch latency gains ~5-15s for the one-time tar extract; subsequent launches see no change (sentinel-file existence check is microsecond). Installer size shrinks by the tar.gz compression ratio (~30-40% on node_modules). Tests: - platform-utils.test.ts updated for the new extraResources shape (apps.tar.gz + bundle-manifest.json, no more api/**/* etc.) + negative assertions that the old loose patterns are gone. - 175 desktop tests pass. - tsc --noEmit clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(ci): cache CRDB binaries + parallelize the download loop Two stacked wins on top of the already-shipped tarball + Defender exclusion work: 1. CRDB binary cache. 
The 5-platform CRDB set (~700MB compressed, ~140MB per platform) was being re-downloaded on every desktop CI job because actions/cache@v4 was only pointed at electron and electron-builder caches. Three desktop jobs x 5 binaries x ~10s each = ~150s spent on cold-cache work that's identical between runs. Cache path now includes ~/.cache/skytwin/crdb-binaries (where bin/skytwin-db's download helper stages the archives) and the cache key hashes build-single-binary.sh too so a SKYTWIN_CRDB_VERSION bump invalidates correctly. 2. Parallel downloads in build-single-binary.sh. The `for entry in "${CRDB_TARGETS[@]}"; do bundle_crdb_binary "$entry"; done` loop blocks on each platform sequentially even though every call has independent inputs and outputs. Backgrounded with `&` + reaped via `wait $pid` in a follow-up loop. Cold-cache wall time drops from ~25-50s to ~5-10s (limited by the slowest single download). Warm cache short-circuits at the early `already bundled, skipping` return so parallelism is a no-op there. `set -e` alone doesn't propagate failures from backgrounded functions, so an explicit `crdb_failed` flag walks the wait results and `exit 1`s if any child returned non-zero. Without that, a corrupt download (sha mismatch -> exit 3) would silently leave the binary missing and electron-builder would fail later with a confusing "missing extraResources" error. Net expected savings: roughly 1-3 minutes off each desktop build on warm cache (the dominant case after the first run), ~30s on cold cache. Doesn't change the long-pole job (Linux at 14m, three output formats), so the end-to-end wall time stays around 17 min, but the time spent on redundant network transfers is gone. Tests: bash -n clean, workflow YAML valid (would fail at GH parse if not). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: fix release.yml pnpm-setup + build.yml auto-publish on tagged main Two CI failures surfaced after PR #350 merged + v0.6.57.0 was tagged: 1.
release.yml — all three platforms failed at pnpm/action-setup with ERR_PNPM_BAD_PM_VERSION ("Multiple versions of pnpm specified"). The action was pinned at @v4 with `version: 9`, but every package in the workspace has `packageManager: "pnpm@..."` in package.json which @v4 also reads. With both inputs present, v4 errors instead of picking one. Aligning with build.yml's @v5 usage (which reads only from packageManager) clears it. 2. build.yml desktop-macOS — once v0.6.57.0 landed on main, electron-builder's auto-publish heuristic saw the tag and tried to upload the just-built .dmg to GitHub Releases. build.yml is the PR/main validation gate, not the publish path — it doesn't set GH_TOKEN, so the upload threw "GitHub Personal Access Token is not set". Passing `--publish never` explicitly to every `electron-builder` invocation in build.yml short-circuits the auto-publish detection, regardless of branch/tag context. release.yml is the only path that should publish, and it already passes `--publish always` with the right token. Also bring release.yml up to the same parity as build.yml for the fixes that landed during PR #350: - CRDB binary cache path (~/.cache/skytwin/crdb-binaries) in the actions/cache@v4 step on all three OS variants, with the same cache key shape so warm caches transfer between the two workflows. - Defender ExclusionPath step on the windows-latest variant before electron-builder runs — same makensis mmap-race fix. Net: re-running release.yml against tag v0.6.57.0 (either via the tag delete+re-push or via workflow_dispatch) should produce the .dmg + .exe + .AppImage + .deb + .rpm artifacts the README rewrite needs. main CI's macOS job is fixed for any future tag-on-main scenario. 
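The first-launch extraction described in the tarball commit above (`ensureEmbeddedRoot()` gated by a `.version` marker) could look roughly like the TypeScript sketch below. It assumes the marker stores the app version string and that the tar extraction is passed in as a callback; the signature and parameter names are illustrative, not the actual `service-manager.ts` code.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical signature: the real method in apps/desktop/src/service-manager.ts
// is not shown in the PR. Extraction is injected so the sketch does not depend
// on any particular tar library.
function ensureEmbeddedRoot(
  userDataDir: string,
  appVersion: string,
  extractTarball: (destDir: string) => void,
): string {
  const embeddedRoot = path.join(userDataDir, "embedded");
  const marker = path.join(embeddedRoot, ".version");

  // Fast path: a marker written by this exact version means the tree is ready.
  if (fs.existsSync(marker) && fs.readFileSync(marker, "utf8").trim() === appVersion) {
    return embeddedRoot;
  }

  // Missing or stale marker (first launch, or first launch after an upgrade):
  // clear any old tree and extract a fresh one.
  fs.rmSync(embeddedRoot, { recursive: true, force: true });
  fs.mkdirSync(embeddedRoot, { recursive: true });
  extractTarball(embeddedRoot);

  // Write the marker last so an interrupted extract is retried next launch.
  fs.writeFileSync(marker, appVersion);
  return embeddedRoot;
}
```

Writing the marker only after extraction succeeds means a crash mid-extract leaves no marker, so the next launch starts over instead of running against a half-populated tree.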
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * v0.6.58.0 fix: address Copilot review on PR #352 Bumped MICRO → PATCH because Copilot's review surfaced four real bugs in code that landed via #350's squash, and addressing them in #352 expands scope beyond pure CI workflow fixes. CI workflow fixes (original #352 scope): - release.yml: pnpm/action-setup @v4+version conflict (ERR_PNPM_BAD_PM_VERSION) - build.yml: electron-builder auto-publish on tagged main without GH_TOKEN - release.yml parity with build.yml's PR #350 fixes (CRDB cache, Defender) Copilot fixes: - install.sh worktree detection — `[ -d .git ]` → `[ -e .git ]` so gitlink files (Conductor worktrees, etc.) hit the fetch+merge branch the header comment already promised. Previously fell through to "no .git directory, use as-is" silently. - oauth-pending-signin-repository.remember() now actually calls sweepExpired() best-effort. Header docstring promised it; code never did. Abandoned OAuth flows were growing the table monotonically. - generatePendingKey() guards `crypto.getRandomValues` too — the polyfill path was guarded only on `crypto.randomUUID`, so an environment with no crypto global threw a useless ReferenceError instead of a typed "browser too old" error pointing at the existing-user fallback. - connection.ts sslConfigForSslmode() throws on unknown sslmode instead of silently downgrading. A typo like `sslmode=requier` previously fell through to DATABASE_SSL (default false) and shipped a plaintext connection against what should have been a secure cluster. Also added explicit `allow` and `prefer` handlers matching libpq semantics. Tests: 301 DB tests pass (no sslmode typos in repo). JS/TS/bash syntax clean. The sslConfigForSslmode change is intentionally not test-covered in this PR — it's a private function and adding a test file for it is its own scope. Documented as a follow-up. 
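A sketch of the stricter `sslConfigForSslmode()` behavior, assuming it maps a libpq-style `sslmode` string onto a node-postgres `ssl` option. Only two behaviors come from the PR: unknown modes throw, and (per the follow-up test plan) `allow`/`prefer` return `undefined`. The mappings for `disable`, `require`, and the `verify-*` modes are assumptions.

```typescript
// Shape of node-postgres's `ssl` option used here: `false` disables TLS,
// an object enables it, `undefined` defers to the caller's default.
type SslConfig = false | { rejectUnauthorized: boolean } | undefined;

function sslConfigForSslmode(sslmode: string | undefined): SslConfig {
  switch (sslmode) {
    case undefined:
      return undefined; // no sslmode in the URL: caller applies its own default
    case "disable":
      return false;
    case "allow":
    case "prefer":
      return undefined; // defer to the caller's default rather than force TLS on or off
    case "require":
      return { rejectUnauthorized: false }; // encrypt, but skip certificate checks (assumption)
    case "verify-ca":
    case "verify-full":
      return { rejectUnauthorized: true };
    default:
      // A typo like "requier" must fail loudly instead of silently
      // falling back to a plaintext connection.
      throw new Error(`Unknown sslmode "${sslmode}"`);
  }
}
```

With Node's TLS defaults, `rejectUnauthorized: true` also checks the server hostname, so this sketch treats `verify-ca` as strictly as `verify-full`.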
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(review): address /review findings on PR #352 Adversarial review caught four real issues in the previous v0.6.58.0 commit. Fixes in order of severity: [HIGH] connection.ts — module-load throw cascade. The new sslmode throw was firing at @skytwin/db IMPORT time (via FROM_URL initializer), which would have crashed every consumer on bad input: unrelated test files, migration scripts, type-checker tooling. Wrapped the module-load parse in a try/catch so import always succeeds; getPool() re-parses fresh on each first call and lets the throw propagate from there. This matches the contract the sslConfigForSslmode docstring already promised. [HIGH] build.yml double-publisher race. tag push (v*.*.*) fires BOTH build.yml AND release.yml — both target the same GitHub Release. The legacy release: job at build.yml:419 used softprops/action-gh-release while release.yml uses electron-builder's GH publisher. They race for the same release artifacts; one wins, the other duplicates or fails. Deleted the build.yml release job — release.yml is canonical (it has the code-signing env wiring CSC_LINK/APPLE_ID/etc that build.yml never had). build.yml stays as validation-only. [MEDIUM] oauth-pending-signin-repository.remember — three concerns: 1. Empty `.catch(() => {})` silently swallowed sweep failures, killing the only observability operators had into table-growth bugs. Now logs via console.warn before swallowing. 2. `this.sweepExpired()` would TypeError if a caller destructures (`const { remember } = repo`). Switched to explicit reference `oauthPendingSigninRepository.sweepExpired()`. 3. oauth.ts:833 already called sweepExpired explicitly before the remember call — now redundant (two pool connections per callback under burst). Removed the caller-side sweep; repo owns housekeeping. Tests: 697 API + 301 DB all pass. 
No new tests in this commit; the sslmode behavior change deserves regression coverage in a follow-up (test for `sslmode=requier` throws + `sslmode=allow|prefer` returns undefined). Out of scope for a /review fix-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
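The `remember()` housekeeping fix from the `/review` commit above can be sketched as follows. The in-memory `Map` stands in for the database table and the row shape is hypothetical. The points being illustrated are the explicit object reference, which stays safe under destructuring, and a `.catch` that logs before swallowing.

```typescript
interface PendingSignin { key: string; expiresAt: number }

interface PendingSigninRepo {
  sweepExpired(now?: number): Promise<number>;
  remember(row: PendingSignin): Promise<void>;
}

// Hypothetical in-memory stand-in for the DB-backed table.
const rows = new Map<string, PendingSignin>();

const oauthPendingSigninRepository: PendingSigninRepo = {
  async sweepExpired(now: number = Date.now()): Promise<number> {
    let removed = 0;
    for (const [key, row] of rows) {
      if (row.expiresAt <= now) {
        rows.delete(key); // deleting the current entry during iteration is safe for Map
        removed++;
      }
    }
    return removed;
  },

  async remember(row: PendingSignin): Promise<void> {
    rows.set(row.key, row);
    // Explicit reference instead of `this`, so `const { remember } = repo` still works.
    await oauthPendingSigninRepository.sweepExpired().catch((err: unknown) => {
      // Best-effort: never fail the sign-in, but keep the failure observable.
      console.warn("[oauth-pending] sweepExpired failed", err);
    });
  },
};
```

Here `remember()` awaits the sweep for simplicity; a fire-and-forget `void ...catch(...)` would keep the sweep off the sign-in latency path at the cost of harder testing.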
jayzalowitz added a commit that referenced this pull request on May 23, 2026
Five consecutive failures of release.yml on tag v0.6.58.0: 1. pnpm/action-setup v4+version conflict (PR #352 fix) 2. `pnpm --filter skytwin-desktop build` skipped workspace deps (PR #353 fix) 3. pnpm `--` separator broke electron-builder arg parsing (PR #354 fix) 4. Empty CSC_LINK env var made electron-builder treat CWD as cert path (PR #355 attempted fix — did not actually work, see #5) 5. CSC_IDENTITY_AUTO_DISCOVERY=false isn't enough because CSC_LINK="" (set-to-empty-string, not unset) still triggers the path-resolve code path Each fix revealed the next bug because release.yml was never tested end-to-end — it's been broken since the file was committed. At 5 fixes deep, the right move is to stop fixing release.yml and use the known-working publisher pattern instead. build.yml already builds artifacts successfully on tag push via its desktop-mac/desktop-windows/desktop-linux/mobile-* matrix. PR #352 deleted build.yml's softprops-based release: job specifically to avoid double-publishing with release.yml. With release.yml deleted, that conflict is gone — restore the simpler chain: - Desktop+mobile matrix builds artifacts (already works, --publish never). - New release: job downloads via actions/download-artifact and creates a draft GitHub Release via softprops/action-gh-release@v3. Trade-off: - Lose: electron-builder's GitHub publisher integration (auto-updater channel YAML). When code signing + auto-update become priorities, add release.yml back with the lessons from #352-#355 baked in OR switch to a single workflow with electron-builder publish. - Gain: artifacts actually publish today, on an unsigned-build basis, which is what the launch plan §1.6 README rewrite needs. After this lands: re-tag v0.6.58.0 (5th attempt). build.yml's matrix runs as before, plus the new release: job downloads + publishes a draft. Operator manually clicks Publish in the GitHub UI to make the release live. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 25, 2026
jayzalowitz added a commit that referenced this pull request on Jun 6, 2026
Correctness: - deadline urgency: stale (past-relative-to-now) deadlines no longer read as critical; far-out deadlines no longer DOWNGRADE a type's default urgency (#1/#2) - security markers curated to specific phrases — kill false positives on shipping notices / "welcome back" / articles (#3); marker check also applied on the LLM path so escalate-only holds regardless of classifier (safety defense-in-depth) - digest emits signalRefs[] so citation chips actually render (#4) - scope gate now covers calendar RSVP/invite write actions (#5) - commitment extractor: clause-level negation (keep real commitments sharing a sentence with "if I…") (#6); "by <person>" no longer a deadline hint (#7) - entity resolver compares full normalized string, not the truncated slug (#10) Hardening/robustness: - demo-guard isLocalDbTarget: exact host match, not substring (#8) - provisionNewUser is genuinely best-effort (try/catch) — never 500s after the user row exists - briefing-generator pinned to prompt v1 until it consumes v2 structured output (avoids requesting+discarding todos/topics); v2 deterministic_fallback fixed - briefing test mock provides userRepository.getLocale so the LLM-prose path is actually exercised (#13) Regression tests added for each. Full suite green (70/70 tasks). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
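One way to express the two deadline-urgency rules above (stale deadlines are ignored; a distant deadline never lowers a type's default urgency) is to take the higher of the two ranks. The hour thresholds and the four-level `Urgency` scale below are assumptions for illustration; the PR does not state the real cut-offs.

```typescript
type Urgency = "low" | "medium" | "high" | "critical";
const RANK: Record<Urgency, number> = { low: 0, medium: 1, high: 2, critical: 3 };
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical thresholds: the PR does not state the real cut-offs.
function urgencyFromDeadline(deadline: Date, now: Date): Urgency | null {
  const hoursLeft = (deadline.getTime() - now.getTime()) / HOUR_MS;
  if (hoursLeft < 0) return null; // stale: a past deadline says nothing about urgency now
  if (hoursLeft <= 24) return "critical";
  if (hoursLeft <= 72) return "high";
  if (hoursLeft <= 24 * 7) return "medium";
  return "low";
}

function assessUrgency(typeDefault: Urgency, deadline: Date | undefined, now: Date): Urgency {
  const fromDeadline = deadline ? urgencyFromDeadline(deadline, now) : null;
  if (fromDeadline === null) return typeDefault;
  // A deadline may raise urgency but never lower the situation type's default.
  return RANK[fromDeadline] > RANK[typeDefault] ? fromDeadline : typeDefault;
}
```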
jayzalowitz added a commit that referenced this pull request on Jun 10, 2026
) (#488) * feat(decision-engine): SignalText multi-source accessor + capability matrix (spec 07, #480) Normalize any RawSignal (email/calendar/filesystem/voice) into a channel-agnostic SignalText so commitment/deadline/security/cluster/entity capabilities are source-agnostic. Extends AuthoringTier with authored_originated/received_shared; adds a tested capability×source coverage matrix. Foundation for #475/#476/#479. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(api,db): observer default + new-user provisioning + seedUpsert (spec 10, #483) - LOCKED: new users default to trust_tier 'observer' (users.ts) — matches DB default + CLAUDE.md; resolves the 3-way conflict that forced 'suggest'. - provisionNewUser: eager empty twin profile + conservative autonomy defaults (no spend caps, so the built-in NO_SPEND_WITHOUT_LIMIT gate blocks spend until the user sets a budget — safe by construction). - seedUpsert/buildUpsertSql: shared, tested idempotent upsert helper for re-runnable seeds (used by spec 09). Existing seed.ts already idempotent. Part C (promotion soak-floor hoursInCurrentTier + tier-ladder intro) still TODO. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(worker,db): enforce promotion soak-floor via hoursInCurrentTier (spec 10 Part C, #483) Daily promotion-eligibility job now populates hoursInCurrentTier (from last tier change or account creation), so the engine actually enforces minDurationInTierHours (24h observer->suggest, etc). Closes the documented gap where the floor was skipped in the auto path. Fail-safe 0 keeps a promotion blocked when time can't be derived. Tier-ladder intro UI folds into spec 08. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(decision-engine): deadline extraction feeds urgency (spec 03, #476) extractDeadline parses absolute/relative dates (chrono-node) from any text-bearing signal (SignalText-compatible) and returns the earliest credible FUTURE deadline. 
situation-interpreter.enrichDeadline stamps rawEvent.deadline when the connector didn't, so the existing assessUrgency consumer finally gets fed. Rejects past dates + no-match. v1 leaves per-user-timezone resolution to spec 12. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(decision-engine): commitment extraction from authored content (spec 02, #475) extractCommitments surfaces the user's own stated obligations ("I'll send the draft tomorrow" -> "Send the draft tomorrow") from authored SignalText. Gated to authoredByUser + the commitments source allowlist (safety invariant #8: never from inbound content). Rule extractor handles modal forms, excludes questions/past/third-party/hypotheticals, dedups, and emits a deadlineHint for spec 03. CommitmentStrategy seam left for an LLM path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(decision-engine): inbound security-alert classifier, escalate-only (spec 06, #479) Adds SituationType.SECURITY_ALERT (enums.ts). classifySituation matches inbound account-security markers FIRST (precedence over finance/email), urgency=high, domain=security. The candidate generator emits ONLY a human-review escalation that says "open the provider directly" with link-free parameters — never an auto-executable action, never a URL from the untrusted body (safety invariant #8). Provenance stays untrusted_external regardless of claimed sender. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(decision-engine): signal topic clustering for the digest (spec 04, #477) clusterSignals groups awareness signals into life-domain topic clusters for the Topics section. Anchors to known domains (beats the reference product's mis-filing), guarantees complete + non-overlapping partition, caps cluster count with overflow merged into "More updates" (logged via onMerge). Deterministic fallback ships; ClusterStrategy seam for an LLM path. 
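The cap-with-overflow behavior of `clusterSignals` can be sketched as below. This assumes each signal already carries a resolved life-domain label (the PR's domain anchoring would happen before this step) and uses hypothetical `Signal`/`Cluster` shapes. Each signal lands in exactly one cluster, which yields the complete, non-overlapping partition the commit describes.

```typescript
interface Signal { id: string; domain: string }
interface Cluster { label: string; signalIds: string[] }

const OVERFLOW_LABEL = "More updates";

function clusterSignals(
  signals: Signal[],
  maxClusters: number,
  onMerge?: (mergedLabels: string[]) => void,
): Cluster[] {
  // Group by domain: each signal goes into exactly one group.
  const byDomain = new Map<string, string[]>();
  for (const s of signals) {
    const ids = byDomain.get(s.domain) ?? [];
    ids.push(s.id);
    byDomain.set(s.domain, ids);
  }

  // Largest clusters first, so the cap folds away the thinnest topics.
  // Array.prototype.sort is stable, so ties keep first-seen order.
  const clusters: Cluster[] = Array.from(byDomain, ([label, signalIds]) => ({ label, signalIds }))
    .sort((a, b) => b.signalIds.length - a.signalIds.length);
  if (clusters.length <= maxClusters) return clusters;

  // Keep maxClusters - 1 real topics and merge the rest into one overflow bucket.
  const kept = clusters.slice(0, Math.max(0, maxClusters - 1));
  const overflow = clusters.slice(kept.length);
  onMerge?.(overflow.map((c) => c.label)); // report what was merged, as the commit logs it
  kept.push({ label: OVERFLOW_LABEL, signalIds: overflow.flatMap((c) => c.signalIds) });
  return kept;
}
```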
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(decision-engine): source-coverage model for graceful degradation (spec 13, #487) computeCoverage evaluates the capability x source matrix against a user's connected accounts -> per-capability available/partial/unavailable + the sources that would unlock each, plus a coldStart flag (zero sources, distinct from connected-but-quiet). Excludes mock sources. Drives "connect X to unlock Y" transparency; UI affordances render in spec 08. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(policy,decision-engine): access-faithful gates — scope + hidden (spec 11, #485) Scope gate (policy-engine): requiredWriteScope/hasWriteScope/applyScopeGate. Wired into DecisionMaker.generateCandidates — when grantedScopes is supplied, un-granted write candidates (send/calendar) downgrade to a human-review "grant access" item. Fail-safe NOT granted (safety invariant #8). Visibility filter (decision-engine): isHidden/filterVisible — the single hide predicate the digest routes input through (briefing-generator wiring lands with spec 01). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(db,worker,decision-engine): locale & timezone faithfulness (spec 12, #486) Migration 063 adds users.language + users.timezone. userRepository.getLocale + resolveLanguage/resolveTimezone/isNonEnglish helpers (safe fallbacks: en / UTC with a logged-default flag). Briefing prose locale now reads the user profile instead of hardcoded 'en'. isNonEnglish is the LLM-vs-rule routing signal for the extractors (degraded-marker wiring is a follow-up on 02/03/06). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(db): launch demo fixture — opt-in, isolated, guarded (spec 09, #482) assertDemoSafe (3-gate invariant #0): explicit-only, prod hard-blocked + non-local needs override, identity isolation via is_demo (migration 064). Never wired into bin/skytwin-dev/auto-seed — can't run for a real or new user. 
demo-fixture.ts guards then upserts the reserved demo user + ingests a synthetic source-varied corpus (email/calendar/file/voice) through /api/events/ingest; --reset deletes is_demo rows only. `pnpm demo:fixture`. Guard fully unit-tested. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(decision-engine,policy-prompts): digest to-do/FYI split (spec 01, #474) buildDigest partitions items into action-required to-dos (urgency-ordered, capped) vs domain-clustered topics, with no overlap. Composes the epic: filters hidden content first (spec 11), clusters topics (spec 04), carries sourceType+deadline for the UI (spec 07/03). New briefing-prose v2 prompt emits the two-section structured payload (todos + topics). The structured_payload column + repo read + render land with spec 08 (UI). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(decision-engine): entity extraction + cross-signal resolution (spec 05, #478) extractEntities pulls people (emails) + orgs (suffix-tagged) from SignalText. resolveEntities links mentions to stable entityIds — exact email key for people (never fuzzy), token-overlap floor for orgs, conservative mint-on-doubt so a false merge can't corrupt the graph. linkEntitiesAcrossSignals aggregates "every signal touching X". Persistence reuses MemoryPort.recordEntity; the getSignalsForEntity port method is the remaining integration seam. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(web,api): digest UI — two-bucket, source-aware, cited (spec 08, #481) twin-briefing.js renders the structured digest: To-dos above Topics, each row with a source-type chip (email/calendar/file/voice) + citation chips that open the in-app signal detail (never an external URL — safety #8). Reuses the existing singleton-delegator + hash-gate + data-action conventions (new open-signal action). Falls back to prose when structured is null (back-compat). API /latest passes through structured (nullable, forward-compatible). 
CSS reuses card/badge tokens. Mobile BriefingScreen mirror is the remaining part of this spec. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(post-/review): address review findings across the epic Correctness: - deadline urgency: stale (past-relative-to-now) deadlines no longer read as critical; far-out deadlines no longer DOWNGRADE a type's default urgency (#1/#2) - security markers curated to specific phrases — kill false positives on shipping notices / "welcome back" / articles (#3); marker check also applied on the LLM path so escalate-only holds regardless of classifier (safety defense-in-depth) - digest emits signalRefs[] so citation chips actually render (#4) - scope gate now covers calendar RSVP/invite write actions (#5) - commitment extractor: clause-level negation (keep real commitments sharing a sentence with "if I…") (#6); "by <person>" no longer a deadline hint (#7) - entity resolver compares full normalized string, not the truncated slug (#10) Hardening/robustness: - demo-guard isLocalDbTarget: exact host match, not substring (#8) - provisionNewUser is genuinely best-effort (try/catch) — never 500s after the user row exists - briefing-generator pinned to prompt v1 until it consumes v2 structured output (avoids requesting+discarding todos/topics); v2 deterministic_fallback fixed - briefing test mock provides userRepository.getLocale so the LLM-prose path is actually exercised (#13) Regression tests added for each. Full suite green (70/70 tasks). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(web): collapse prose under a disclosure when the digest renders (design-review) Showing the structured two-bucket digest AND the full prose was the same briefing twice. When structured is present, the prose moves under a "Full briefing" <details> as the long-form view; falls back to inline prose when there's no structured payload. 
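The `isLocalDbTarget` hardening listed in the post-/review commit above (exact host match, not substring) might look like this sketch. The allow-list of local hosts is an assumption; the property that matters is that a host such as `localhost.evil.example` no longer passes just because it contains `localhost`.

```typescript
// Assumed allow-list; the real guard's host set is not stated in the PR.
const LOCAL_HOSTS = new Set(["localhost", "127.0.0.1", "::1"]);

function isLocalDbTarget(databaseUrl: string): boolean {
  let host: string;
  try {
    host = new URL(databaseUrl).hostname;
  } catch {
    return false; // unparseable URL: treat as NOT local (fail safe)
  }
  // URL.hostname keeps IPv6 brackets ("[::1]"); strip them before comparing.
  host = host.replace(/^\[(.*)\]$/, "$1").toLowerCase();
  // Exact match only: substrings like "localhost.evil.example" must not pass.
  return LOCAL_HOSTS.has(host);
}
```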
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(decision-engine,web): power view — inline technical depth (spec 14) One digest, two depths. Default stays the clean view (non-technical users unaffected); a discoverable header "Power view" toggle (persisted) + per-item "Details" expander reveal the depth SkyTwin already computes — provenance, confidence %, urgency reason, why-it-didn't-auto-run (scope/tier/policy), real source refs, and the explanation — plus a coverage panel ("what I can see, connect X to unlock Y"). Not buried in settings. buildDigestItemDetail is the pure view-model (raw codes -> human strings), unit tested. UI follows the singleton-delegator/hash-gate/data-action conventions. Digest payload carries optional per-item detail + coverage (generator populates). Verified rendering via a headless-browser screenshot. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(design): lock design system — calm command center, premium iris (DESIGN.md) Source of truth grounded in a full element-and-state inventory of the digest surfaces. Cool-neutral base (refines existing #0f1117 tokens; rejected the warm/brown direction), iris #7C72E8 as the SINGLE accent meaning "needs you / act", Fraunces voice + Geist + Geist Mono, action-vs-awareness hierarchy. Catalogs every element + EVERY state including the gaps never rendered before: cold-start, scope-blocked grant-access, loading, error, prose-fallback, distinct security treatment, provenance in default view. CLAUDE.md now points UI + /qa + /design-review at it. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(web): implement DESIGN.md in the digest — iris, two-zone, gap states (spec 15) Wires the locked design system into the real digest UI: - Load Fraunces (twin voice) + Geist + Geist Mono (index.html) - Iris #7C72E8 as the single accent = "needs you / act"; killed the CAPS source-chip soup -> one neutral source mark + a single "·N sources" citation; provenance as a dot (neutral, never accent) - Action zone (to-dos: checkbox + inline Draft/Snooze/Verify/Grant, hover-reveal, always-on for security + touch) vs awareness zone (topics: lighter, no edge) - Twin voice (Fraunces) + value line ("✓ N handled · M need you · K to catch up") - Power view detail panel + coverage panel restyled to the system - GAP STATES now designed: loading skeleton, empty-quiet, cold-start ("connect a source"), prose-fallback disclosure, distinct security treatment, scope-blocked "Grant access". Verified via headless-browser render of the real CSS. Row-action wiring (draft/snooze/verify) routes/acknowledges until the act layer lands. App-wide token adoption (vs digest-scoped iris) is a follow-up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(web): make DOMContentLoaded handler async — SPA-breaking syntax error (pre-existing) app.js:856 registered a non-async DOMContentLoaded handler, but the pairToken branch (line ~904) uses `await fetch(...)` → "Unexpected reserved word" at parse time, which aborts ALL app initialization. Every page rendered as an empty #page-content shell. Present on origin/main; web JS has no type-check or tests, so it shipped silently. One-word fix (() => → async () =>); verified by booting the seeded app and touring dashboard/decisions/approvals/settings. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(api,web): render the to-do/FYI digest live end-to-end (parity) The digest existed as tested modules but never rendered in the running app: the briefing generator produces no structured payload, so /latest returned null and the UI fell back to "No briefing content yet". This closes that seam so the AI-inbox parity (to-dos vs topics, multi-source) actually shows. - live-digest.ts: compute the structured digest from a user's recent decisions — read each decision's RawSignal through toSignalText (spec 07) for real, source-agnostic titles, partition via buildDigest (spec 01/04), attach power-view detail (spec 14) and coverage (spec 13). - twin-briefings /latest: when no structured_payload is stored, compute the digest live (best-effort; degrades to prose on error) and synthesize a briefing envelope so the page renders parity today. Forward-compatible: a stored payload still wins once the worker writes one. - dashboard: Home leads with a read-only digest hero (action zone first, DESIGN.md) linking to the full interactive /briefing; stop showing the "connect Google" nag once the twin has produced decisions. - index.html: first-class "Briefing" nav link. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(api,web,db): show "Needs you" for pending-approval decisions The decisions log mapped auto_executed=false to "You OK'd", which mislabels a decision still awaiting approval (notably an escalated security alert) as already approved. Surface the outcome's requires_approval through the API and add a distinct "Needs you" state so the log matches the Approvals page. - decision-repository.getOutcomesForDecisions: also select requires_approval. - decisions route: return requiresApproval per decision. - decisions.js: Auto / Needs you / You OK'd / Pending, in that order. 
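A sketch of the label precedence described above (Auto, Needs you, You OK'd, Pending, checked in that order). The `DecisionRow` field names are hypothetical camelCase mirrors of `auto_executed` and `requires_approval`, and treating `null` as "no outcome recorded yet" is an assumption.

```typescript
interface DecisionRow {
  autoExecuted: boolean | null;     // null: no outcome recorded yet (assumption)
  requiresApproval: boolean | null;
}

// Checks run in the order the commit lists: Auto, Needs you, You OK'd, Pending.
function decisionStatusLabel(d: DecisionRow): string {
  if (d.autoExecuted === true) return "Auto";
  // Checked before "You OK'd": auto_executed=false alone does not mean approved.
  if (d.requiresApproval === true) return "Needs you";
  if (d.autoExecuted === false) return "You OK'd";
  return "Pending";
}
```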
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(api): describePreference never renders "[object Object]" A structured preference value (e.g. a brand-preference object) fell through to String(value) and rendered as "[object Object]" in the dashboard "What I've learned" summaries. Render arrays/objects readably instead. Adds a regression test covering objects, nested objects, arrays, booleans, strings, numbers, and null. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(web): hide read controls on the live-computed briefing The live digest (no stored row) carries the sentinel id 'live'; its "Mark as read" button POSTed to /briefings/live/read and 400'd on the UUID check. Gate the New badge + Mark-as-read on a persisted briefing so the control only shows when there's a real row to mark. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(api,decision-engine): make power-view digest detail meaningful The power-view detail panel rendered noise: "URGENCY: Default for security", "REFS: email: 77538186" (an internal id slice), "WHY: Account notice" (just the title again), and no confidence at all. Feed it real technical depth instead: - confidence: pull decision_outcomes.confidence -> a real percentage. - source ref: the actual sender/organizer/file ("email: no-reply@accounts.example"), not an opaque decision-id slice. - urgencyReason: a real driver ("Security alert — always sent to you", "New invite — awaiting your RSVP", "Routine — no deadline detected") via a new optional urgencyReason override on buildDigestItemDetail, instead of the generic "Default for <domain>". - drop the redundant explanation (it duplicated the title). - honest whyNotAutoExecuted: use the engine's real escalation_reason, and only fall back to the trust-tier gate when the item genuinely required approval — no fabricated "trust_tier:observer" on escalate-only items. - normalizeUrgency: map the DB default 'normal' to 'medium', not 'low' (silent demotion). 
- name the recent-decisions window; drop the redundant maxTodos override. Adds a DB-mocked buildLiveDigest suite (cold start, to-do mapping + detail, malformed raw_event, provenance fail-safe, handledCount) plus normalizeUrgency and urgencyReasonFor helper tests. Fixes the sections-fold test's @skytwin/db mock to define query so the live-digest path resolves cleanly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(web): don't suppress connect heroes once the twin has data Gating the Connect-Google/Connect-Gmail heroes on `hasAnyData` hid the onboarding CTA for users who have decisions but haven't connected Gmail (the "Calendar connected, Gmail not yet" segment) — the heroes already self-suppress when actually connected, so the extra gate only hurt real users. Revert to gating on tourMode only. Also drop a dead `t.kind === 'security'` branch in the Home digest hero (buildDigest never sets kind). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(digest): show what each item says + the recommended next step The digest told you a title and a pile of system metadata (origin, confidence, "why it escalated") but not the two things that actually matter: what the item says and what to do about it. Surface both, sourced from data we already had: - body: the real content (email snippet, event description, file excerpt, transcript) via toSignalText, rendered as a one-line preview under each title — visible by default, not buried in the power view. - suggestedAction: the twin's recommended next step, taken from the pipeline's selected candidate action ("Accept this calendar invitation", "Review this security alert in the provider's official app — don't click links in the message"), with sensible fallbacks for escalate-only situations. UI: the to-do/topic rows now lead with title -> what it says; the power-view detail leads with the actionable "suggested" step, and the trust metadata (origin/confidence/refs) drops below it. 
The Home hero shows the content line plus an iris "→ next step" so it's actionable without opening anything. Carries body through DigestItem/DigestTodo/DigestTopicItem + buildDigest, and adds suggestedAction to DigestItemDetail. Tests cover body extraction, the pipeline-selected action, and the security/RSVP fallbacks. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(digest): clean, user-facing next step on every item Two gaps from the last pass: some suggestions were the rule-based engine's raw internal text ("Apply appropriate labels to this email", "Escalate to user: Decision needed regarding: transcript"), and the suggestion only showed in the power-view detail — so in the default view most items had no visible next step. - suggestedActionFor now maps the structured selected action TYPE to plain English ("Accept the invite, or decline / propose another time", "Nothing needed — I'll file it", "Take a look and tell me what to do"), with a security-specific instruction and situation fallbacks. Every item gets a clean, user-facing step — no engine internals leak through. - The "→ next step" now renders in the row itself for every item (to-do and topic), visible without the power view. The power-view detail drops back to the trust/technical metadata (origin, confidence, refs) it's meant for. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(digest): plain-language detail — drop the system vocabulary The detail panel was accurate but spoke the way the system names things, not the way a person asks. A non-technical user can't parse "ORIGIN: Inbound — untrusted", "REFS", "NOT AUTO-RUN", a bare "CONFIDENCE: 80%", or "From your twin" — and "untrusted" reads as a threat rather than "you didn't write this". Rephrase everything user-facing: - provenance: "Inbound — untrusted" -> "From someone else"; "From your twin" -> "From your assistant"; fail-safe stays "someone else". 
- block reasons: "trust level (observer) asks me to check" -> "You've asked me to check with you before I act"; "From untrusted content" -> "It came from someone else, so I want your OK first". No internal codes leak.
- detail labels: origin/confidence/urgency/not-auto-run/refs become "where it's from / written by / how sure I am / why now / why I'm asking you".
- source ref: a real sender or a friendly "your calendar"/"a voice note", not an id slice or a filename echo.
Default view was already plain; this brings the power view to the same bar so "advanced" doesn't mean "fluent in our nouns". Tests updated to assert the plain wording and that no jargon leaks.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(digest): every expand earns its rows; rename to "Your briefing"
Make the detail expansion uniformly useful and cut the filler:
- add "when" (relative time) — was missing entirely.
- "why now" is explanatory for FYI items too ("Not time-sensitive — just so you're aware") instead of the meaningless "Normal priority".
- confidence gets a word: "fairly sure (80%)", "very sure (100%)".
- drop the redundant "written by: someone else" (the sender already shows it); keep "written by: you" only when you authored it (genuinely notable).
- friendly source when there's no sender ("a voice note", "your files").
Also rename the page "Twin Briefing" → "Your briefing" with a plain subtitle, matching the Home hero — "twin" is our metaphor, not a word a first-timer maps.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(live-digest): align urgencyReasonFor assertion with new wording
The critical-urgency reason changed to "Urgent — needs your attention now"; update the assertion from /critical/i to /urgent/i. (Caught by the full test run after the per-file runs passed — the prior commit shipped this red.)
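The plain-language wording described in these digest commits can be sketched as two small lookup helpers, loosely modeled on the `urgencyReasonFor` helper the commits mention. This is an illustration, not the digest's code: the names `confidencePhrase` and `urgencyReason`, the `Urgency` union, the thresholds, and the copy strings are all assumptions. The only data points taken from the commits are that 80% reads as "fairly sure" and 100% as "very sure".

```typescript
// Hypothetical sketch of plain-language digest helpers. From the commit messages:
// 0.8 -> "fairly sure", 1.0 -> "very sure". Everything else here is assumed.
export type Urgency = 'critical' | 'high' | 'normal' | 'low';

export function confidencePhrase(confidence: number): string {
  // Clamp so a bad upstream value never renders as e.g. "130%".
  const c = Math.min(1, Math.max(0, confidence));
  const pct = Math.round(c * 100);
  let word: string;
  if (c >= 0.95) word = 'very sure';
  else if (c >= 0.7) word = 'fairly sure';
  else if (c >= 0.4) word = 'somewhat sure';
  else word = 'not very sure';
  return `${word} (${pct}%)`;
}

// Every urgency maps to an explanatory "why now", including FYI items,
// instead of a bare label like "Normal priority". Copy is illustrative.
const URGENCY_REASON: Record<Urgency, string> = {
  critical: 'Urgent: needs your attention now',
  high: 'Time-sensitive: worth a look today',
  normal: "Not time-sensitive: just so you're aware",
  low: "Not time-sensitive: just so you're aware",
};

export function urgencyReason(urgency: Urgency): string {
  return URGENCY_REASON[urgency];
}
```

Keeping the copy in one `Record` lets tests assert on wording (as the live-digest test does with `/urgent/i`) without reaching into branching logic.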
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(decision-engine): persist candidates before risk assessments
saveRiskAssessment runs `UPDATE candidate_actions ... WHERE id = ?`, but saveCandidates (the INSERT) ran AFTER it — so the UPDATE hit zero rows, the full RiskAssessment (overallTier/dimensions) was lost, and only the thin `{reasoning}` placeholder survived. At approve time the execute-preflight (getRiskAssessment → parseRiskAssessmentFromRow, which requires overallTier) then returned null → `risk_assessment_missing`, blocking the ENTIRE approve→execute path (no action could ever be executed).
Move saveCandidates ahead of the risk-assessment loop so the rows exist when the UPDATEs land. Adds a regression test asserting saveCandidates is invoked before the first saveRiskAssessment (via vi.fn invocationCallOrder).
Found via a safe end-to-end execution-stack test (mock adapter + isolated tokenless user + fake email); verified fixed: fresh fake email → approve → execution completed via the (mock) adapter, no risk_assessment_missing.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* test(execution): safe end-to-end execution-stack harness + OpenClaw test docs
bin/skytwin-test-execution-stack: a repeatable, no-real-side-effects test of the full execution path (ingest → decide → policy/spend/risk gate → approval → execution router → adapter → result). Two safety layers: an isolated TOKENLESS test user (Direct handlers throw at resolveAccessToken before any Google fetch) + USE_MOCK_IRONCLAW (simulated adapter). Spins up its own mock-mode API on a test port; re-runnable; asserts the stack executed and recorded a result.
docs/testing-openclaw.md: how to exercise the OpenClaw adapter safely against local Ollama via the openclaw-bridge (verified working: Ollama installed, bridge completes a fake action end-to-end, simulated, nothing real touched).
Notes the router trust-ranking caveat (direct outranks openclaw, so isolate it to see OpenClaw execute) and the OPENCLAW_API_URL config.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(web): setup — don't surface IronClaw credential-sync when it's unreachable
The Connect (#/setup) page showed "Not fully synced to IronClaw" + a "Sync to IronClaw" button even when no IronClaw is configured/reachable (the common case), so clicking it failed with a connection error. Gate the sync lookup on ironclawSync.reachable: when IronClaw isn't reachable (no IronClaw, the local mock, or a remote that's down) the sync affordance is hidden entirely — it's an advanced feature that only applies to a real, reachable IronClaw. The execution adapter row still shows its true state (Running / Registered-but-unreachable / Not detected) via renderAdapterStatus.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(web,api): Vault page loads under dev-auth bypass (was "API may be offline")
The Credential Vault page (#/credential-vault) showed "Unable to load vault status. The API may be offline." on every load: the route's getUserId read only req.user?.id (unset under the localhost dev-auth bypass), with none of the req.query['userId'] fallback every other route has — so /credential-vault/status 400'd with "userId is required". Add the standard session→query→body userId fallback (ownership still gated by requireOwnership when a real session exists), and pass userId on the web's init/rotate/lock/unlock POST bodies so those work under bypass too.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(web): setup — optional execution adapters read as optional, not failed
An optional, unconfigured execution adapter (IronClaw / OpenClaw) rendered as "Not detected" in the setup page's Live status — which reads like something is broken. For optional engines, that's not a failure: most users never run them (the always-available Direct adapter handles actions).
renderAdapterStatus now takes an `optional` flag; an optional adapter that isn't registered shows "Optional — not connected" (calm, muted) instead of "Not detected". Direct still shows "Not detected" if it ever went missing (a real problem). This is the proper fix — correct whether or not a mock IronClaw is running, so no demo crutch is needed.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* docs: add Codex agent instructions
* fix: address inbox intelligence review findings
* fix: require approval for missing-scope escalations
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
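The ordering bug in the fix(decision-engine) commit above is easy to reproduce with an in-memory stand-in. The sketch below is hypothetical: `Candidate`, `RiskAssessment`, `CandidateRepo`, `InMemoryCandidateRepo`, and `persistDecision` are simplified illustrations, not the decision engine's real types, and the calls are synchronous for brevity where the real repository is async and DB-backed. The stand-in mimics `UPDATE ... WHERE id = ?` by silently doing nothing when the row does not exist yet.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface Candidate {
  id: string;
  actionType: string;
}

interface RiskAssessment {
  candidateId: string;
  overallTier: 'low' | 'medium' | 'high';
}

interface CandidateRepo {
  saveCandidates(candidates: Candidate[]): void;
  saveRiskAssessment(assessment: RiskAssessment): void;
}

// Mimics `UPDATE candidate_actions ... WHERE id = ?`: updating a row that was
// never inserted matches zero rows and the assessment is silently lost.
export class InMemoryCandidateRepo implements CandidateRepo {
  readonly rows = new Map<string, { candidate: Candidate; risk?: RiskAssessment }>();
  readonly calls: string[] = [];

  saveCandidates(candidates: Candidate[]): void {
    this.calls.push('saveCandidates');
    for (const c of candidates) this.rows.set(c.id, { candidate: c });
  }

  saveRiskAssessment(assessment: RiskAssessment): void {
    this.calls.push('saveRiskAssessment');
    const row = this.rows.get(assessment.candidateId);
    if (row) row.risk = assessment; // no row yet: zero rows updated
  }
}

// Fixed order: INSERT the candidate rows first, then UPDATE each with its assessment.
export function persistDecision(
  repo: CandidateRepo,
  candidates: Candidate[],
  assessments: RiskAssessment[],
): void {
  repo.saveCandidates(candidates);
  for (const a of assessments) repo.saveRiskAssessment(a);
}
```

The `calls` array plays the role that `vi.fn` `invocationCallOrder` plays in the commit's regression test: it lets a test assert that the INSERT happened before the first UPDATE.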
Summary
Expands SkyTwin from a reactive decision pipeline into a proactive personal judgment layer. Seven new capabilities across 5 build phases, plus 9 safety fixes from pre-landing review.
Execution Router: New `@skytwin/execution-router` package with trust-ranked adapter selection (IronClaw > OpenClaw > Direct), risk modifiers for irreversible actions, fallback chains that only retry on thrown errors (not partial execution), and skill gap logging.
Twin Query API: `whatWouldIDo()` at POST /ask/:userId predicts what the twin would do without persisting state. Uses a no-op DecisionRepository to avoid polluting decision history with synthetic records. Rate limited per trust tier (60-600 req/hr).
Twin Export & Portability: GET /export/:userId returns the full profile (preferences, inferences, behavioral patterns, cross-domain traits, and temporal profile) in JSON or Markdown.
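The execution router's selection and fallback rules can be sketched roughly as follows. This is a hypothetical illustration: the adapter names and the IronClaw > OpenClaw > Direct order come from the summary, but `ExecutionAdapter`, `TRUST_RANK`, `RouteResult`, and `routeAndExecute` are invented for the example, and risk modifiers and skill-gap logging are left out. The commit notes also mention a caveat that Direct outranks OpenClaw, so treat the rank numbers as configuration rather than fact.

```typescript
// Hypothetical sketch of trust-ranked routing with a throw-only fallback chain.
type AdapterName = 'ironclaw' | 'openclaw' | 'direct';
type ExecStatus = 'completed' | 'partial' | 'failed';

interface ExecutionAdapter {
  name: AdapterName;
  canHandle(actionType: string): boolean;
  execute(actionType: string): ExecStatus; // synchronous for brevity
}

// Higher number = tried first (order as stated in the PR summary).
const TRUST_RANK: Record<AdapterName, number> = { ironclaw: 3, openclaw: 2, direct: 1 };

interface RouteResult {
  adapter: AdapterName | null;
  status: ExecStatus | 'no_adapter';
  errors: string[];
}

export function routeAndExecute(adapters: ExecutionAdapter[], actionType: string): RouteResult {
  const chain = adapters
    .filter((a) => a.canHandle(actionType))
    .sort((a, b) => TRUST_RANK[b.name] - TRUST_RANK[a.name]);
  // No capable adapter: this is where a skill gap would be logged.
  if (chain.length === 0) return { adapter: null, status: 'no_adapter', errors: [] };

  const errors: string[] = [];
  for (const adapter of chain) {
    try {
      // Any returned status ends the chain. A 'partial' or 'failed' result may
      // already have had side effects, so retrying elsewhere risks duplicates.
      return { adapter: adapter.name, status: adapter.execute(actionType), errors };
    } catch (err) {
      // A thrown error is treated as "nothing executed", so the next adapter is tried.
      errors.push(`${adapter.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  return { adapter: null, status: 'failed', errors };
}
```

Stopping on any returned status, including `partial`, is the conservative choice the summary describes: a duplicate email or payment is usually worse than a visible failure.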
Proactive Mode: `ProactiveEvaluator` scans signals, partitions them into auto-executable (HIGH confidence only) and approval-needed sets, and generates urgency-sorted morning briefings.
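The partition step can be sketched as a pure function over already-evaluated signals. This is an assumption-laden illustration, not `ProactiveEvaluator` itself: `EvaluatedSignal`, the `Confidence` and `Urgency` unions, `URGENCY_ORDER`, and `partitionForBriefing` are hypothetical, and any further gates the real evaluator may apply (policy, trust tier) are omitted. Only the rule that HIGH confidence alone qualifies for auto-execution comes from the summary.

```typescript
// Hypothetical sketch of the proactive partition step.
type Confidence = 'low' | 'medium' | 'high';
type Urgency = 'critical' | 'high' | 'normal' | 'low';

interface EvaluatedSignal {
  id: string;
  confidence: Confidence;
  urgency: Urgency;
}

// Lower number sorts first in the briefing.
const URGENCY_ORDER: Record<Urgency, number> = { critical: 0, high: 1, normal: 2, low: 3 };

export function partitionForBriefing(signals: EvaluatedSignal[]) {
  const byUrgency = (a: EvaluatedSignal, b: EvaluatedSignal) =>
    URGENCY_ORDER[a.urgency] - URGENCY_ORDER[b.urgency];
  return {
    // Only HIGH confidence is eligible to run without asking.
    autoExecutable: signals.filter((s) => s.confidence === 'high').sort(byUrgency),
    // Everything else is queued for approval, most urgent first.
    needsApproval: signals.filter((s) => s.confidence !== 'high').sort(byUrgency),
  };
}
```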
Preference Archaeology: Detects implicit behavioral patterns from 5+ consistent evidence signals and surfaces them as proposals for user confirmation, with scaled confidence.
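A minimal sketch of preference archaeology, assuming evidence arrives as simple key/value observations. The 5-signal threshold comes from the summary; `Evidence`, `PreferenceProposal`, `proposePreferences`, the 80% agreement requirement, the linear confidence curve, and the 0.9 cap are all hypothetical choices made for illustration.

```typescript
// Hypothetical sketch: group observations, require enough consistent support,
// and emit proposals (not applied preferences) with scaled confidence.
interface Evidence {
  key: string;   // e.g. 'meetings.time' (illustrative)
  value: string; // e.g. 'morning'
}

interface PreferenceProposal {
  key: string;
  value: string;
  support: number;
  confidence: number;
}

const MIN_CONSISTENT = 5;   // from the summary: 5+ consistent signals
const MIN_AGREEMENT = 0.8;  // assumption: at most 20% contradicting signals
const MAX_CONFIDENCE = 0.9; // assumption: leave room for the user to confirm

export function proposePreferences(evidence: Evidence[]): PreferenceProposal[] {
  const perKey = new Map<string, number>();
  const perPair = new Map<string, { key: string; value: string; n: number }>();
  for (const e of evidence) {
    perKey.set(e.key, (perKey.get(e.key) ?? 0) + 1);
    const id = JSON.stringify([e.key, e.value]);
    const entry = perPair.get(id) ?? { key: e.key, value: e.value, n: 0 };
    entry.n += 1;
    perPair.set(id, entry);
  }

  const proposals: PreferenceProposal[] = [];
  perPair.forEach(({ key, value, n }) => {
    const agreement = n / (perKey.get(key) ?? n);
    if (n < MIN_CONSISTENT || agreement < MIN_AGREEMENT) return;
    // Confidence rises with support, is capped, and is scaled down by disagreement.
    const base = Math.min(MAX_CONFIDENCE, 0.5 + 0.05 * (n - MIN_CONSISTENT));
    proposals.push({ key, value, support: n, confidence: base * agreement });
  });
  return proposals;
}
```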
Undo-with-Learning: Structured undo reasoning (`whatWentWrong`, `severity`, `whichStep`, `preferredAlternative`) is applied as a 2x-weighted correction. Severe undos trigger an additional confidence reduction.
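The undo weighting can be illustrated with a small confidence-update function. Only the four reasoning field names and the 2x weight come from the summary; the `UndoSeverity` values, `BASE_STEP`, `SEVERE_PENALTY`, and `confidenceAfterUndo` are hypothetical. The reasoning argument is optional, in line with the earlier fix that made `undoReasoning` optional for API compatibility.

```typescript
// Hypothetical sketch of undo-with-learning.
type UndoSeverity = 'minor' | 'moderate' | 'severe';

interface UndoReasoning {
  whatWentWrong: string;
  severity: UndoSeverity;
  whichStep?: string;
  preferredAlternative?: string;
}

const UNDO_WEIGHT = 2;      // from the summary: an undo counts double
const BASE_STEP = 0.05;     // assumption: confidence change per unit of feedback weight
const SEVERE_PENALTY = 0.1; // assumption: extra reduction for severe undos

export function confidenceAfterUndo(current: number, reasoning?: UndoReasoning): number {
  let next = current - BASE_STEP * UNDO_WEIGHT;
  if (reasoning?.severity === 'severe') next -= SEVERE_PENALTY;
  // Keep the result a valid confidence in [0, 1].
  return Math.min(1, Math.max(0, next));
}
```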
Cross-Domain Correlation: Four correlation rules covering calendar-email links, same-sender threading, calendar conflicts, and subscription-financial connections.
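One of the four rules, calendar conflicts, reduces to an interval-overlap check. The sketch below is hypothetical: `CalendarEvent`, `Correlation`, and `findCalendarConflicts` are invented for the example, and the other three rules are not shown. Events are treated as half-open intervals, so a meeting ending exactly when another starts is not a conflict.

```typescript
// Hypothetical sketch of the calendar-conflict correlation rule.
interface CalendarEvent {
  id: string;
  start: number; // epoch ms
  end: number;   // epoch ms, exclusive
}

interface Correlation {
  rule: 'calendar_conflict';
  ids: [string, string];
}

export function findCalendarConflicts(events: CalendarEvent[]): Correlation[] {
  // Sort by start time so each event only needs to look ahead.
  const sorted = [...events].sort((a, b) => a.start - b.start);
  const out: Correlation[] = [];
  for (let i = 0; i < sorted.length; i++) {
    const a = sorted[i];
    // Later events start no earlier than `a`; stop at the first one that starts at or after `a` ends.
    for (let j = i + 1; j < sorted.length && sorted[j].start < a.end; j++) {
      out.push({ rule: 'calendar_conflict', ids: [a.id, sorted[j].id] });
    }
  }
  return out;
}
```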
Infrastructure: 6 new CockroachDB tables, 4 column additions using the safe 3-step migration pattern, and the missing FK indexes.
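The PR names a "safe 3-step migration pattern" for NOT NULL DEFAULT columns but does not spell out the SQL, so the helper below shows one plausible shape under that assumption. `safeNotNullColumn`, and the `briefings.status` example used with it, are hypothetical. The idea is to avoid adding a NOT NULL constraint in the same statement that introduces the column: add it nullable with a default, backfill, then tighten.

```typescript
// Hypothetical helper: one plausible 3-step shape, not the PR's actual migration.
// Identifiers are interpolated directly, so only pass trusted, hard-coded names.
export function safeNotNullColumn(
  table: string,
  column: string,
  sqlType: string,
  defaultSql: string,
): string[] {
  return [
    // 1. Add the column as nullable, with a default so new rows get a value.
    `ALTER TABLE ${table} ADD COLUMN IF NOT EXISTS ${column} ${sqlType} DEFAULT ${defaultSql}`,
    // 2. Backfill any existing rows that are still NULL.
    `UPDATE ${table} SET ${column} = ${defaultSql} WHERE ${column} IS NULL`,
    // 3. Only once every row has a value, add the NOT NULL constraint.
    `ALTER TABLE ${table} ALTER COLUMN ${column} SET NOT NULL`,
  ];
}
```

Each statement would run on its own rather than inside one explicit transaction, which is the usual guidance for CockroachDB schema changes.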
Test Coverage
Tests: 164 → 260 (+96 new)
Coverage audit traced 185 code paths across all changed files.
Coverage gate: 62% (above the 60% minimum). DB repos and route handlers account for the remaining gaps.
Pre-Landing Review
9 issues found and fixed (from /review).
Scope Drift
Scope Check: CLEAN
Intent: Implement milestone 1.5 scope expansion (7 capabilities across 5 phases)
Delivered: 12/12 plan items implemented
Plan Completion
All 33 TODO items from the CEO review completed in v0.2.0.0:
TODOS
All 33 items marked complete in v0.2.0.0.
Test plan
🤖 Generated with Claude Code