feat: bootstrap SkyTwin monorepo (v0.1.0.0)#1
…pose Sets up the SkyTwin monorepo infrastructure: pnpm workspace config, Turborepo build pipeline, root TypeScript config, vitest, and CockroachDB via docker-compose for local development.
Defines all TypeScript interfaces (User, TwinProfile, DecisionObject, CandidateAction, RiskAssessment, ActionPolicy, ExplanationRecord, etc.) and enums (TrustTier, RiskTier, ConfidenceLevel, SituationType). Adds environment config loading with validation.
14-table SQL schema covering users, twin profiles, preferences, decisions, candidate actions, policies, approvals, execution plans, explanation records, and feedback events. Six repository modules with parameterized queries. Migration runner and seed data.
Twin model service with inference engine. Decision engine with situation interpreter and risk assessor. Policy engine with 5 built-in safety policies. Explanation generator. Signal connectors with mock email/calendar. IronClaw adapter with mock. Eval harness with scenarios. 48 tests across decision, policy, and twin packages.
Express API with routes for event ingestion, twin management, decisions, approvals, and feedback. End-to-end email triage workflow. Worker service for polling signal connectors. Web app placeholder for future admin/audit UI.
…tifacts 7 documentation files covering product vision, architecture, safety model, decision engine, IronClaw integration, CockroachDB architecture, and evals. 15 planning artifacts: 5 milestone docs (M0-M4) and 10 issue specs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Complete implementation of the SkyTwin digital twin system covering:

- OAuth flow: Google OAuth2 with DB-persisted tokens, auto-refresh via DbTokenStore adapter bridging connectors ↔ DB layers
- Approval pipeline: full CRUD repository, pending/history endpoints, approve/reject with feedback loop back into twin model
- Pattern persistence: DB-backed PatternRepositoryPort implementation for behavioral_patterns and cross_domain_traits tables
- Pattern-aware decisions: DecisionContext extended with patterns/traits/temporalProfile; DecisionMaker scores candidates using pattern boosts and cross-domain trait adjustments
- Real connectors: GmailConnector and GoogleCalendarConnector wired into worker via DbTokenStore (replaces mock stand-ins for OAuth users)
- Multi-user worker: discovers users from DB, per-user connector lifecycle, re-discovers every 10 poll cycles
- Web dashboard: onboarding wizard, approval cards, twin profile in plain English, confidence bars, per-domain stats, mobile responsive
- Evals: accuracy from real feedback, learning progress, confidence scoring
- IronClaw adapter: handler registry, email/calendar/generic handlers, real adapter with plan/execute/rollback
- 89 tests passing across 12 test files, 14/14 packages build clean

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
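The DbTokenStore auto-refresh in the OAuth bullet can be pictured as a check-then-refresh helper. This is a minimal sketch, not the package's code: `StoredToken`, `getValidToken`, the injected `refresh`/`save` callbacks, and the 60-second skew are all hypothetical.

```typescript
// Hypothetical token shape; the real DbTokenStore row type may differ.
interface StoredToken {
  accessToken: string;
  refreshToken: string;
  expiresAt: number; // epoch milliseconds
}

// Stand-in for a call to Google's token endpoint (assumption).
type RefreshFn = (refreshToken: string) => Promise<{ accessToken: string; expiresInSec: number }>;

// Illustrative skew: refresh a little before real expiry to avoid races.
const REFRESH_SKEW_MS = 60_000;

// Returns a usable token, refreshing and persisting it when it is expired
// or about to expire. The refresh token itself is reused as-is.
async function getValidToken(
  token: StoredToken,
  refresh: RefreshFn,
  save: (t: StoredToken) => Promise<void>,
  now: number = Date.now(),
): Promise<StoredToken> {
  if (token.expiresAt - now > REFRESH_SKEW_MS) {
    return token; // still comfortably valid
  }
  const fresh = await refresh(token.refreshToken);
  const updated: StoredToken = {
    ...token,
    accessToken: fresh.accessToken,
    expiresAt: now + fresh.expiresInSec * 1000,
  };
  await save(updated); // write back so other workers see the new token
  return updated;
}
```

Injecting `refresh` and `save` keeps the sketch testable without a database or network; a real store would also need to guard against concurrent refreshes for the same user.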
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Bootstraps the SkyTwin monorepo with core domain packages (twin model, decision/policy engines, connectors, IronClaw adapter), a CockroachDB-backed persistence layer, and initial API/web apps to run the end-to-end pipeline.
Changes:
- Adds workspace package scaffolding (TypeScript strict, build/test scripts) across core packages and apps.
- Introduces new behavior modeling & safety components (temporal analyzer, pattern detector, built-in policies, eval runner).
- Adds DB repositories/migrations plus API routes and a web dashboard for onboarding, decisions, approvals, OAuth.
Reviewed changes
Copilot reviewed 150 out of 173 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| packages/twin-model/src/index.ts | Exposes twin-model public entrypoints (TwinService/inference types). |
| packages/twin-model/src/analyzers/temporal-analyzer.ts | Adds temporal profiling from evidence (active hours, weekday patterns, response times). |
| packages/twin-model/src/analyzers/pattern-detector.ts | Adds habit/contextual pattern detection from evidence. |
| packages/twin-model/src/analyzers/tests/temporal-analyzer.test.ts | Adds unit tests for temporal analyzer. |
| packages/twin-model/src/analyzers/tests/pattern-detector.test.ts | Adds unit tests for pattern detector. |
| packages/twin-model/src/analyzers/tests/cross-domain-analyzer.test.ts | Adds unit tests for cross-domain analyzer behaviors. |
| packages/twin-model/package.json | Defines twin-model package metadata and scripts. |
| packages/shared-types/tsconfig.json | Adds TS project config for shared-types build output. |
| packages/shared-types/src/user.ts | Adds User + autonomy settings types. |
| packages/shared-types/src/twin.ts | Adds twin profile/evidence/feedback types. |
| packages/shared-types/src/policy.ts | Adds policy + approval request types. |
| packages/shared-types/src/patterns.ts | Adds pattern/temporal/trait shared types. |
| packages/shared-types/src/oauth.ts | Adds OAuth token and connector config types. |
| packages/shared-types/src/index.ts | Centralizes shared-types exports. |
| packages/shared-types/src/explanation.ts | Adds explanation record types. |
| packages/shared-types/src/execution.ts | Adds execution plan/handler types used by IronClaw adapter. |
| packages/shared-types/src/eval-types.ts | Adds eval history/metrics types. |
| packages/shared-types/src/enums.ts | Adds TrustTier/RiskTier/ConfidenceLevel/etc enums. |
| packages/shared-types/src/decision.ts | Adds decision context & outcome types. |
| packages/shared-types/package.json | Defines shared-types package metadata and scripts. |
| packages/policy-engine/tsconfig.json | Adds TS project config for policy-engine. |
| packages/policy-engine/src/index.ts | Exposes policy engine entrypoints and defaults. |
| packages/policy-engine/src/default-policies.ts | Adds built-in safety/trust-tier policies. |
| packages/policy-engine/package.json | Defines policy-engine package metadata and scripts. |
| packages/ironclaw-adapter/tsconfig.json | Adds TS project config for ironclaw-adapter. |
| packages/ironclaw-adapter/src/real-adapter.ts | Adds real adapter that dispatches to handlers + rollback. |
| packages/ironclaw-adapter/src/mock-ironclaw-adapter.ts | Adds basic mock adapter implementation. |
| packages/ironclaw-adapter/src/ironclaw-adapter.ts | Defines IronClawAdapter interface. |
| packages/ironclaw-adapter/src/index.ts | Exposes adapter mocks/real adapter/registry/handlers. |
| packages/ironclaw-adapter/src/handlers/generic-action-handler.ts | Adds catch-all handler for unknown action types. |
| packages/ironclaw-adapter/src/handlers/email-action-handler.ts | Adds Gmail-backed email action handler. |
| packages/ironclaw-adapter/src/handlers/calendar-action-handler.ts | Adds Google Calendar-backed calendar action handler. |
| packages/ironclaw-adapter/src/handler-registry.ts | Adds handler registry for routing action types. |
| packages/ironclaw-adapter/src/adapter-interface.ts | Adds extended executor interface (getStatus/boolean health). |
| packages/ironclaw-adapter/src/tests/real-adapter.test.ts | Adds tests for real adapter behavior. |
| packages/ironclaw-adapter/src/tests/handler-registry.test.ts | Adds tests for registry matching and fallback. |
| packages/ironclaw-adapter/package.json | Defines ironclaw-adapter package metadata and scripts. |
| packages/explanations/tsconfig.json | Adds TS project config for explanations. |
| packages/explanations/src/index.ts | Exposes explanations package entrypoints. |
| packages/explanations/package.json | Defines explanations package metadata and scripts. |
| packages/evals/tsconfig.json | Adds TS project config for evals. |
| packages/evals/src/scenarios/safety-regressions.ts | Adds safety regression scenario definitions. |
| packages/evals/src/scenario.ts | Defines eval scenario/result/report models. |
| packages/evals/src/runner.ts | Adds eval runner + discrepancy detection/reporting. |
| packages/evals/src/regression-detector.ts | Adds regression/improvement detection utility. |
| packages/evals/src/index.ts | Exposes eval suite entrypoints. |
| packages/evals/src/accuracy-tracker.ts | Adds feedback-based accuracy metric tracking. |
| packages/evals/src/tests/regression-detector.test.ts | Adds tests for regression detector logic. |
| packages/evals/src/tests/accuracy-tracker.test.ts | Adds tests for accuracy tracker calculations. |
| packages/evals/package.json | Defines evals package metadata, deps, and scripts. |
| packages/decision-engine/tsconfig.json | Adds TS project config for decision-engine. |
| packages/decision-engine/src/index.ts | Exposes decision-engine entrypoints. |
| packages/decision-engine/package.json | Defines decision-engine package metadata and deps. |
| packages/db/tsconfig.json | Adds TS project config for db package. |
| packages/db/src/schemas/index.ts | Exposes schema path and table name constants. |
| packages/db/src/repositories/user-repository.ts | Adds user CRUD repository. |
| packages/db/src/repositories/twin-repository.ts | Adds twin profile CRUD + version snapshotting. |
| packages/db/src/repositories/policy-repository.ts | Adds action policy repository CRUD. |
| packages/db/src/repositories/pattern-repository.ts | Adds persistence for patterns and cross-domain traits. |
| packages/db/src/repositories/oauth-repository.ts | Adds OAuth token persistence/retrieval. |
| packages/db/src/repositories/index.ts | Re-exports repositories/types for db package. |
| packages/db/src/repositories/feedback-repository.ts | Adds feedback persistence with pagination. |
| packages/db/src/repositories/explanation-repository.ts | Adds explanation persistence/querying. |
| packages/db/src/repositories/approval-repository.ts | Adds approval request persistence + response updates. |
| packages/db/src/migrations/004-eval-history.sql | Adds eval_runs + accuracy_metrics tables. |
| packages/db/src/migrations/003-behavioral-patterns.sql | Adds behavioral_patterns + cross_domain_traits + temporal_profile column. |
| packages/db/src/migrations/002-oauth-tokens.sql | Adds oauth_tokens + connector_configs tables. |
| packages/db/src/migrations/001-initial.ts | Adds initial migration runner (schema.sql executor). |
| packages/db/src/index.ts | Exposes db package surface (connection, repos, schema metadata). |
| packages/db/src/connection.ts | Adds CockroachDB pool utilities + healthcheck helpers. |
| packages/db/package.json | Defines db package metadata and deps (pg, config). |
| packages/core/tsconfig.json | Adds TS project config for core. |
| packages/core/src/index.ts | Adds core utilities (IDs, logger, tier ordering helpers). |
| packages/core/package.json | Defines core package metadata and deps. |
| packages/connectors/tsconfig.json | Adds TS project config for connectors. |
| packages/connectors/src/signal-connector.ts | Adds class-based signal connector abstraction. |
| packages/connectors/src/oauth/token-store.ts | Defines OAuth token store port. |
| packages/connectors/src/oauth/google-oauth.ts | Adds Google OAuth URL/code exchange/refresh/revoke helpers. |
| packages/connectors/src/oauth/db-token-store.ts | Adds DB-backed token store with refresh logic. |
| packages/connectors/src/index.ts | Exposes connectors package surface (mocks, real, oauth). |
| packages/connectors/src/google-calendar-connector.ts | Adds Google Calendar polling connector with syncToken. |
| packages/connectors/src/connector-interface.ts | Adds interface-based connector abstraction (RawSignal). |
| packages/connectors/src/tests/db-token-store.test.ts | Adds tests for token store mapping/refresh behavior. |
| packages/connectors/package.json | Defines connectors package metadata and deps. |
| packages/config/tsconfig.json | Adds TS project config for config package. |
| packages/config/src/index.ts | Adds config loading + validation utilities. |
| packages/config/package.json | Defines config package metadata and scripts. |
| package.json | Adds monorepo root scripts for turbo + db commands. |
| docker-compose.yml | Adds local CockroachDB + optional API service. |
| apps/worker/tsconfig.json | Adds worker TS project config. |
| apps/worker/package.json | Defines worker app metadata and deps. |
| apps/web/tsconfig.json | Adds web TS project config. |
| apps/web/src/index.ts | Adds Express static server + API proxy for SPA. |
| apps/web/public/js/pages/onboarding.js | Adds onboarding flow (identity + tier + OAuth link). |
| apps/web/public/js/pages/decisions.js | Adds decision history UI with explanation expansion. |
| apps/web/public/js/app.js | Adds hash router + onboarding gating + health/badge polling. |
| apps/web/public/js/api-client.js | Adds API client helpers for dashboard pages. |
| apps/web/public/index.html | Adds SPA shell markup + navigation. |
| apps/web/package.json | Defines web app package metadata and scripts. |
| apps/api/tsconfig.json | Adds API TS project config. |
| apps/api/src/workflows/email-triage.ts | Adds integrated email triage orchestration workflow. |
| apps/api/src/routes/users.ts | Adds user lookup + trust-tier update endpoints. |
| apps/api/src/routes/twin.ts | Adds twin profile + preference update endpoints. |
| apps/api/src/routes/oauth.ts | Adds Google OAuth authorize/callback/status/disconnect endpoints. |
| apps/api/src/routes/feedback.ts | Adds feedback ingestion endpoint + twin feedback loop. |
| apps/api/src/routes/events.ts | Adds event ingestion endpoint wiring full pipeline + execution/approval. |
| apps/api/src/routes/evals.ts | Adds eval monitoring endpoints (accuracy/learning/confidence). |
| apps/api/src/routes/decisions.ts | Adds decision listing + explanation fetching endpoints. |
| apps/api/src/routes/approvals.ts | Adds approval pending/history/respond endpoints. |
| apps/api/src/index.ts | Adds Express app wiring + error handling + health endpoint. |
| apps/api/package.json | Defines API app package metadata and deps. |
| VERSION | Adds initial version marker. |
| README.md | Adds repo overview + quickstart + architecture. |
| CLAUDE.md | Adds contributor/assistant instructions and invariants. |
| CHANGELOG.md | Adds initial changelog entry for 0.1.0.0. |
| .env.example | Adds example environment variables for local dev. |
```ts
async upsertPattern(userId: string, pattern: BehavioralPattern): Promise<BehavioralPattern> {
  const result = await query<BehavioralPatternRow>(
    `INSERT INTO behavioral_patterns (
      id, user_id, pattern_type, description, trigger_config,
      observed_action, frequency, confidence,
      first_observed_at, last_observed_at, metadata
    ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
    ON CONFLICT (id) DO UPDATE SET
      description = EXCLUDED.description,
      trigger_config = EXCLUDED.trigger_config,
      observed_action = EXCLUDED.observed_action,
      frequency = EXCLUDED.frequency,
      confidence = EXCLUDED.confidence,
      last_observed_at = EXCLUDED.last_observed_at,
      metadata = EXCLUDED.metadata
    RETURNING *`,
    [
      pattern.id,
      userId,
```
The behavioral_patterns migration defines id UUID PRIMARY KEY DEFAULT gen_random_uuid(), but this repository always supplies pattern.id as the inserted id. Pattern IDs generated in code (e.g., pat_<timestamp>...) are not valid UUIDs, causing inserts to fail. Fix by either (a) generating UUIDs (e.g., via crypto.randomUUID() / @skytwin/core generateId()), or (b) omitting id from the INSERT and letting the DB default generate it (then returning the generated id).
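Option (a) can be enforced where patterns are created rather than at insert time. A minimal sketch, assuming a hypothetical `createPattern` factory and a pared-down `PatternDraft` type: the id is minted once with Node's `crypto.randomUUID()`, and non-UUID ids are rejected instead of silently replaced.

```typescript
import { randomUUID } from "node:crypto";

// Canonical 8-4-4-4-12 hex UUID shape (any version).
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical, pared-down pattern shape for illustration.
interface PatternDraft {
  id?: string;
  description: string;
}

// Mint the id once, when the pattern object is created, so every later
// upsert binds the same UUID to $1 and ON CONFLICT (id) keeps matching.
function createPattern(draft: PatternDraft): PatternDraft & { id: string } {
  if (draft.id !== undefined && !UUID_RE.test(draft.id)) {
    // e.g. "pat_<timestamp>" would be rejected by the UUID column
    throw new Error(`Pattern id must be a UUID, got ${draft.id}`);
  }
  return { ...draft, id: draft.id ?? randomUUID() };
}
```

Replacing a bad id at insert time would also make the INSERT succeed, but each call would then create a new row instead of updating the existing one.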
```ts
async upsertTrait(userId: string, trait: CrossDomainTrait): Promise<CrossDomainTrait> {
  const result = await query<CrossDomainTraitRow>(
    `INSERT INTO cross_domain_traits (
      id, user_id, trait_name, confidence, supporting_domains,
      evidence_count, description
    ) VALUES ($1, $2, $3, $4, $5, $6, $7)
    ON CONFLICT (user_id, trait_name) DO UPDATE SET
      confidence = EXCLUDED.confidence,
      supporting_domains = EXCLUDED.supporting_domains,
      evidence_count = EXCLUDED.evidence_count,
      description = EXCLUDED.description,
      updated_at = now()
    RETURNING *`,
    [
      trait.id,
      userId,
```
Same issue as patterns: cross_domain_traits.id is defined as UUID PRIMARY KEY DEFAULT gen_random_uuid(), but trait.id is always provided. If trait.id isn't a UUID, inserts will fail. Prefer letting the DB generate the id or enforce UUID generation at the type/constructor level.
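Because the trait upsert already keys on `(user_id, trait_name)`, the other option works naturally here: leave `id` out and let `DEFAULT gen_random_uuid()` fill it. A sketch, assuming a hypothetical `TraitInput` shape and a `buildUpsertTraitQuery` helper that returns the text/values pair a `query()` call would take.

```typescript
// Hypothetical trait shape mirroring the columns in the reviewed INSERT.
interface TraitInput {
  traitName: string;
  confidence: number;
  supportingDomains: string[];
  evidenceCount: number;
  description: string;
}

// `id` is omitted so the column default generates it; RETURNING * hands the
// generated (or existing, on conflict) id back to the caller.
function buildUpsertTraitQuery(userId: string, trait: TraitInput): { text: string; values: unknown[] } {
  const text = `INSERT INTO cross_domain_traits (
    user_id, trait_name, confidence, supporting_domains, evidence_count, description
  ) VALUES ($1, $2, $3, $4, $5, $6)
  ON CONFLICT (user_id, trait_name) DO UPDATE SET
    confidence = EXCLUDED.confidence,
    supporting_domains = EXCLUDED.supporting_domains,
    evidence_count = EXCLUDED.evidence_count,
    description = EXCLUDED.description,
    updated_at = now()
  RETURNING *`;
  const values = [
    userId,
    trait.traitName,
    trait.confidence,
    trait.supportingDomains,
    trait.evidenceCount,
    trait.description,
  ];
  return { text, values };
}
```

The caller should then read the id from the returned row rather than from its own `trait` object.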
```ts
router.get('/google/authorize', (req, res) => {
  const scopes = [...GMAIL_SCOPES, ...CALENDAR_SCOPES];
  const state = req.query['userId'] as string | undefined;
  const url = generateAuthUrl(googleConfig, scopes, state);
  res.json({ url });
});
```
OAuth state is being set to a user-controlled userId, which defeats the CSRF protection purpose of state and makes account-linking vulnerable (an attacker can authorize with their Google account and bind tokens to an arbitrary userId). Use a cryptographically random nonce as state, persist it server-side (or sign/encrypt it with a server secret) and verify it in the callback before saving tokens; ideally require an authenticated session and bind the OAuth flow to that session's user.
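The "sign it with a server secret" option can be sketched with Node's `crypto` module: the state carries a random nonce, the session user's id, and an expiry, and is HMAC-SHA256 signed. This is an illustration only; `createOAuthState`/`verifyOAuthState` are hypothetical, `sessionUserId` assumes an authenticated session the reviewed route does not have yet, and single-use nonce tracking is left out.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// state = base64url(payload) + "." + base64url(HMAC-SHA256(payload)).
function createOAuthState(sessionUserId: string, secret: string, ttlMs = 10 * 60_000, now = Date.now()): string {
  const payload = JSON.stringify({
    uid: sessionUserId,
    nonce: randomBytes(16).toString("hex"), // makes every state unguessable
    exp: now + ttlMs,
  });
  const body = Buffer.from(payload).toString("base64url");
  const sig = createHmac("sha256", secret).update(body).digest("base64url");
  return `${body}.${sig}`;
}

// Returns the bound user id when the signature and expiry check out, else null.
// The callback must still compare it with the session user before saving tokens.
function verifyOAuthState(state: string, secret: string, now = Date.now()): string | null {
  const [body, sig] = state.split(".");
  if (!body || !sig) return null;
  const expected = createHmac("sha256", secret).update(body).digest();
  const given = Buffer.from(sig, "base64url");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as { uid: string; exp: number };
  return payload.exp > now ? payload.uid : null;
}
```

Signing avoids server-side storage; persisting the nonce instead would additionally make each state single-use.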
```ts
private async sendReply(accessToken: string, messageId: string, replyType: string): Promise<StepResult> {
  // Build a minimal reply message
  // In production this would construct a proper MIME message referencing the original
  const raw = Buffer.from(
    `Subject: Re: (auto-reply)\r\n` +
      `In-Reply-To: ${messageId}\r\n` +
      `Content-Type: text/plain; charset="UTF-8"\r\n\r\n` +
      `[SkyTwin auto-${replyType}] This is an automated response.`,
  ).toString('base64url');

  const url = `${GMAIL_API}/users/me/messages/send`;
  const response = await fetch(url, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${accessToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ raw, threadId: messageId }),
  });
```
This implementation is very likely to fail against Gmail API: the raw RFC 2822 message is missing required headers (at minimum To: and often From:; additionally the reply should include References:/Message-ID handling), and threadId is not the same as messageId. If this is intended as a placeholder, consider removing send_reply support from canHandle() for now and returning a clear 'not implemented' error, or implement a proper reply flow (fetch original message/threadId + construct valid MIME with correct headers).
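A proper reply flow needs data from the original message first. The sketch below assumes those fields were already fetched (for example via `users.messages.get`) into a hypothetical `OriginalMessage` shape, and builds the `{ raw, threadId }` body for `users.messages.send` with `To`, `From`, `In-Reply-To`, and `References` headers and a `Re:` subject. `buildReplyRequest` is illustrative, and non-ASCII subjects would additionally need RFC 2047 encoding.

```typescript
// Hypothetical fields taken from the original message.
interface OriginalMessage {
  threadId: string;        // Gmail thread id, distinct from the message id
  messageIdHeader: string; // RFC 2822 Message-ID header, e.g. "<abc@mail.gmail.com>"
  references?: string;     // existing References header, if any
  from: string;            // becomes the reply's To:
  subject: string;
}

// Builds a base64url-encoded RFC 2822 reply plus the original threadId.
function buildReplyRequest(original: OriginalMessage, fromAddress: string, text: string): { raw: string; threadId: string } {
  const subject = /^re:/i.test(original.subject) ? original.subject : `Re: ${original.subject}`;
  // References = previous chain + the message being replied to.
  const references = original.references
    ? `${original.references} ${original.messageIdHeader}`
    : original.messageIdHeader;
  const mime = [
    `From: ${fromAddress}`,
    `To: ${original.from}`,
    `Subject: ${subject}`,
    `In-Reply-To: ${original.messageIdHeader}`,
    `References: ${references}`,
    `Content-Type: text/plain; charset="UTF-8"`,
    "", // blank line separates headers from body
    text,
  ].join("\r\n");
  return { raw: Buffer.from(mime).toString("base64url"), threadId: original.threadId };
}
```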
```ts
private async modifyLabels(
  accessToken: string,
  messageId: string,
  addLabels: string[],
  removeLabels: string[],
): Promise<StepResult> {
  const url = `${GMAIL_API}/users/me/messages/${messageId}/modify`;
  const response = await fetch(url, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${accessToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      addLabelIds: addLabels,
      removeLabelIds: removeLabels,
    }),
  });
```
Gmail messages.modify expects addLabelIds/removeLabelIds to be Gmail label IDs, not human-readable label names (except for some system labels where the ID equals the name). If upstream passes label names like 'Finance', this will fail. Consider either (a) requiring callers to pass label IDs (rename parameter to labelIds), or (b) adding a lookup step to translate label names to IDs.
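For option (b), the lookup step can be a pure function over the `labels` array that `users.labels.list` returns (each entry has an `id` and a `name`). A sketch: `resolveLabelIds` is hypothetical, and the case-insensitive match is an assumption about how names should be compared. Unknown names throw rather than being passed through to `messages.modify`.

```typescript
// Subset of the fields in a Gmail users.labels.list entry.
interface GmailLabel {
  id: string;
  name: string;
}

// Translates human-readable label names into Gmail label IDs.
function resolveLabelIds(names: string[], labels: GmailLabel[]): string[] {
  const byName = new Map(labels.map((l) => [l.name.toLowerCase(), l.id] as const));
  return names.map((name) => {
    const id = byName.get(name.toLowerCase());
    if (!id) {
      throw new Error(`Unknown Gmail label: ${name}`);
    }
    return id;
  });
}
```

Fetching the label list once per run and caching it keeps the extra lookup from costing an API call per message.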
```ts
export class RealIronClawAdapter implements IronClawAdapter {
  private readonly executedPlans = new Map<string, ExecutionPlan>();
```
executedPlans grows without bound and is never pruned, which can cause unbounded memory growth in long-running processes that execute many plans. Consider deleting entries after successful rollback or after a retention window, adding a max-size LRU, and/or persisting execution history to the DB instead of keeping it all in memory.
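A max-size LRU is the lightest of the suggested fixes. The sketch below relies on `Map` preserving insertion order, so the first key is always the least recently used. `BoundedPlanStore` is a hypothetical name, and the size cap would need tuning against how long rollback must stay possible.

```typescript
// Minimal LRU keyed by plan id. get() and set() move a key to the end
// (most recently used); the first key in iteration order is the oldest.
class BoundedPlanStore<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxSize: number;

  constructor(maxSize: number) {
    if (maxSize < 1) throw new Error("maxSize must be at least 1");
    this.maxSize = maxSize;
  }

  set(planId: string, plan: V): void {
    this.entries.delete(planId); // refresh position if already present
    this.entries.set(planId, plan);
    while (this.entries.size > this.maxSize) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest); // evict least recently used
    }
  }

  get(planId: string): V | undefined {
    const plan = this.entries.get(planId);
    if (plan !== undefined) this.set(planId, plan); // mark as recently used
    return plan;
  }

  delete(planId: string): boolean {
    return this.entries.delete(planId); // e.g. after a successful rollback
  }

  get size(): number {
    return this.entries.size;
  }
}
```

An evicted plan can no longer be rolled back from memory, which is why persisting execution history (the review's other suggestion) is the more durable option.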
```ts
analyzeTemporalPatterns(evidence: TwinEvidence[]): TemporalProfile {
  if (evidence.length === 0) {
    return this.emptyProfile('');
  }
```
When evidence is empty, the returned TemporalProfile.userId is set to an empty string. This can propagate invalid user identity into downstream logic and persistence. Consider accepting userId as an explicit parameter, or returning a profile with a caller-provided userId (or altering the API to return TemporalProfile | null when no evidence exists).
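The explicit-parameter option could look like the sketch below. `Evidence` and `Profile` are hypothetical, heavily reduced stand-ins for `TwinEvidence` and `TemporalProfile`, and only "active hours" is computed; the point is that the caller's `userId` flows into every profile, including the empty one.

```typescript
// Hypothetical, pared-down shapes for illustration.
interface Evidence {
  timestamp: string; // ISO 8601
}
interface Profile {
  userId: string;
  activeHours: number[]; // distinct UTC hours seen in the evidence, ascending
}

// userId is supplied by the caller, so an empty evidence list yields an
// empty profile for the right user instead of one with userId "".
function analyzeTemporalPatterns(userId: string, evidence: Evidence[]): Profile {
  if (userId.trim() === "") {
    throw new Error("userId is required");
  }
  const hours = new Set<number>();
  for (const e of evidence) {
    hours.add(new Date(e.timestamp).getUTCHours());
  }
  return { userId, activeHours: [...hours].sort((a, b) => a - b) };
}
```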
```ts
// Inject OAuth token if available for real execution
const tokenRow = await oauthRepository.getToken(userId, 'google');
if (tokenRow) {
  outcome.selectedAction.parameters['accessToken'] = tokenRow.access_token;
}

// Auto-execute via IronClaw
const plan = await ironclawAdapter.buildPlan(outcome.selectedAction);
```
Avoid mutating outcome.selectedAction.parameters to inject the OAuth access token: it increases the risk of leaking credentials via logs, persistence, or error reporting (especially if outcomes/actions are stored). Prefer passing the token out-of-band to the adapter (e.g., adapter options/context), or clone the action into an execution-only object that is never persisted/returned.
Suggested change:

```diff
- // Inject OAuth token if available for real execution
- const tokenRow = await oauthRepository.getToken(userId, 'google');
- if (tokenRow) {
-   outcome.selectedAction.parameters['accessToken'] = tokenRow.access_token;
- }
- // Auto-execute via IronClaw
- const plan = await ironclawAdapter.buildPlan(outcome.selectedAction);
+ // Retrieve OAuth token if available for real execution, but avoid mutating outcome.selectedAction
+ const tokenRow = await oauthRepository.getToken(userId, 'google');
+ const executionAction = tokenRow
+   ? {
+       ...outcome.selectedAction,
+       parameters: {
+         ...(outcome.selectedAction.parameters ?? {}),
+         accessToken: tokenRow.access_token,
+       },
+     }
+   : outcome.selectedAction;
+ // Auto-execute via IronClaw
+ const plan = await ironclawAdapter.buildPlan(executionAction);
```
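The suggested clone still puts the token inside the action handed to `buildPlan`, so it can end up in the plan. The other option in the comment, passing the token out-of-band, is sketched below. `Action`, `Plan`, `ExecutionContext`, `buildPlan`, and `executePlan` are hypothetical simplifications of the adapter interface: planning never sees credentials, and only the executor turns the context token into an `Authorization` header.

```typescript
// Hypothetical, simplified types; the real adapter interfaces are richer.
interface Action {
  id: string;
  actionType: string;
  parameters: Record<string, unknown>;
}
interface Plan {
  actionId: string;
  steps: string[];
}
interface ExecutionContext {
  accessToken?: string; // travels beside the action, never inside it
}
type Sender = (step: string, headers: Record<string, string>) => Promise<boolean>;

// Planning works on the action alone, so a persisted plan cannot leak the token.
function buildPlan(action: Action): Plan {
  return { actionId: action.id, steps: [action.actionType] };
}

// Only execution reads credentials, and only to build request headers.
async function executePlan(plan: Plan, ctx: ExecutionContext, send: Sender): Promise<boolean> {
  const headers: Record<string, string> = ctx.accessToken
    ? { Authorization: `Bearer ${ctx.accessToken}` }
    : {};
  for (const step of plan.steps) {
    if (!(await send(step, headers))) return false; // stop on first failed step
  }
  return true;
}
```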
```ts
const updated = await userRepository.updateTrustTier(userId, body.trustTier);
res.json({ user: updated });
```
userRepository.updateTrustTier can return null when the user doesn't exist, but the route returns 200 with { user: null }. Return 404 (or 400) when updated is null to keep API semantics consistent with the GET endpoint.
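The status mapping can be kept framework-agnostic so it is easy to test. In this sketch, `RouteResult` and `trustTierUpdateResponse` are hypothetical; in the Express route the result would be sent with `res.status(result.status).json(result.body)`.

```typescript
// Hypothetical response shape for illustration.
interface RouteResult {
  status: number;
  body: unknown;
}

// A missing user becomes a 404, matching the GET endpoint,
// instead of a 200 with { user: null }.
function trustTierUpdateResponse<U>(updated: U | null): RouteResult {
  if (updated === null) {
    return { status: 404, body: { error: "User not found" } };
  }
  return { status: 200, body: { user: updated } };
}
```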
```ts
const profile = analyzer.analyzeTemporalPatterns(evidence);
expect(profile.peakResponseTimes['email']).toBeDefined();
// Median of [120000, 180000] = 120000 (floor of length/2 = index 1)
```
The inline comment contradicts the assertion. Either adjust the comment to match the implementation/expectation (index 1 → 180000), or change the median calculation/expectation if the intended median for even-length arrays is different (e.g., average of middle two).
Suggested change:

```diff
- // Median of [120000, 180000] = 120000 (floor of length/2 = index 1)
+ // Median of [120000, 180000] = 180000 (taking the upper middle value at index 1)
```
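The two median conventions at issue differ only for even-length input. The sketch below shows both for a sorted, non-empty array: `upperMedian` mirrors the index `floor(n/2)` rule described in the test comment, and `averagedMedian` is the textbook mean of the two middle values. Both function names are hypothetical.

```typescript
// Upper-middle element: index floor(n / 2).
function upperMedian(sorted: number[]): number {
  return sorted[Math.floor(sorted.length / 2)];
}

// Textbook median: mean of the two middle elements when n is even.
function averagedMedian(sorted: number[]): number {
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

console.log(upperMedian([120000, 180000]));    // 180000
console.log(averagedMedian([120000, 180000])); // 150000
```

Whichever convention the analyzer keeps, the test comment and the expected value should name the same one.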
…phase 1

Replace `as never` casts in events.ts and approvals.ts with proper adapter classes that bridge domain port interfaces to concrete DB repositories:

- TwinRepositoryAdapter: maps TwinRepositoryPort to twinRepository + preferences table
- PatternRepositoryAdapter: passthrough to patternRepository (same method names)
- DecisionRepositoryAdapter: maps DecisionRepositoryPort to decisionRepository
- ExplanationRepositoryAdapter: maps ExplanationRepositoryPort to explanationRepository
- PolicyRepositoryAdapter: maps PolicyRepositoryPort to policyRepository

Additional fixes:

- Add executionRepository for persisting execution plans and results
- Persist execution plan + result after auto-execute in events.ts
- Execute approved actions via IronClaw after user approves (approvals.ts)
- Fix approval_requests.respond() SQL referencing non-existent updated_at column
- Export PatternRepositoryPort from twin-model package

14/14 build, 89/89 tests passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
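The port-to-repository adapters follow a standard shape: a class that explicitly `implements` the domain port and maps DB rows to domain objects, so the compiler checks the mapping an `as never` cast would hide. A sketch with hypothetical interfaces; the real port and repository method names are not shown in this PR text.

```typescript
// Hypothetical DB row and domain shapes.
interface PolicyRow {
  id: string;
  user_id: string;
  is_enabled: boolean;
}
interface Policy {
  id: string;
  userId: string;
  enabled: boolean;
}

// What the engine depends on (port) vs. what the db package exposes.
interface PolicyRepositoryPort {
  getEnabledPolicies(userId: string): Promise<Policy[]>;
}
interface PolicyDbRepository {
  findByUser(userId: string): Promise<PolicyRow[]>;
}

class PolicyRepositoryAdapter implements PolicyRepositoryPort {
  private readonly repo: PolicyDbRepository;

  constructor(repo: PolicyDbRepository) {
    this.repo = repo;
  }

  async getEnabledPolicies(userId: string): Promise<Policy[]> {
    const rows = await this.repo.findByUser(userId);
    return rows
      .filter((r) => r.is_enabled)
      .map((r) => ({ id: r.id, userId: r.user_id, enabled: r.is_enabled })); // snake_case -> camelCase
  }
}
```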
…ion adapter Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…IronClawAdapter refactor Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…le nav menu Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ne on settings save Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…s open Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…secret, gitignore .gstack Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix IronClaw URLs (your-org → nearai/ironclaw), update adapter descriptions across README, CLAUDE.md, CHANGELOG, and technical spec to reflect the actual HTTP webhook integration (HMAC-SHA256 auth, retries, circuit breaker). Rewrite ironclaw-integration.md with accurate IronClaw architecture. Update test counts (89 → 119, 12 → 14 files). Add missing CHANGELOG entries for QA fixes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
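A circuit breaker like the one mentioned for the IronClaw webhook client usually has three states. The sketch below is generic, not the adapter's implementation: `CircuitBreaker`, its thresholds, and the injectable clock are hypothetical, and concurrent calls in the half-open state are not limited to a single trial.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// After `failureThreshold` consecutive failures the breaker opens and rejects
// calls until `cooldownMs` has passed. It then goes half-open: the next
// success closes it, the next failure re-opens it immediately.
class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  getState(): BreakerState {
    return this.state;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.cooldownMs) {
        throw new Error("circuit open"); // fail fast without calling fn
      }
      this.state = "half-open";
    }
    try {
      const result = await fn();
      this.state = "closed";
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```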
- Escape `justConnected` URL param in settings.js innerHTML (reflected XSS)
- Read trust tier from DB user record instead of caller-supplied request body
- Add column name allowlist in twin-repository to prevent SQL injection via Object.keys on untrusted input
- Add IronClaw HTTP client tests and response-handling refactor

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
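The column-allowlist fix addresses a common pattern: building `SET col = $n` from `Object.keys` of a request body. A sketch of the idea, with hypothetical column names and a hypothetical `buildUpdateSet` helper: only allowlisted names reach the SQL text, values always travel as parameters, and an unknown key is rejected outright rather than silently dropped.

```typescript
// Hypothetical allowlist; the real twin-repository columns may differ.
const TWIN_PROFILE_COLUMNS: ReadonlySet<string> = new Set(["version", "confidence_summary", "temporal_profile"]);

function buildUpdateSet(
  fields: Record<string, unknown>,
  allowed: ReadonlySet<string>,
  firstParam = 1, // lets callers reserve $1 for e.g. the WHERE id
): { setClause: string; values: unknown[] } {
  const parts: string[] = [];
  const values: unknown[] = [];
  for (const [column, value] of Object.entries(fields)) {
    if (!allowed.has(column)) {
      throw new Error(`Column not allowed: ${column}`);
    }
    values.push(value);
    parts.push(`${column} = $${firstParam + values.length - 1}`);
  }
  return { setClause: parts.join(", "), values };
}
```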
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove trust tier self-declaration from /ask request body (Safety Invariant #3)
- Replace mock policy evaluator with real PolicyEvaluator via DB adapter (#1)
- Use no-op DecisionRepository in /ask to prevent persisting synthetic predictions
- Return modifiedRiskAssessment in RoutingDecision so callers get bumped risk (#7)
- Change OpenClaw reversibilityGuarantee to 'none' since rollback always fails (#5)
- Split migration NOT NULL DEFAULT into safe 3-step pattern for CockroachDB
- Add missing FK indexes on skill_gap_log, twin_exports, briefings tables
- Make undoReasoning optional for undo feedback to preserve API compatibility
- Stop fallback chain on non-completed status to prevent duplicate execution
- Fix route ordering: /export/:userId now defined before /:userId in twin router

All 164 tests pass. Build clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
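The commit does not spell out the three migration steps. One common reading, assumed here rather than taken from the actual migration, is: add the column as nullable with a default, backfill any NULLs, then add the NOT NULL constraint in a separate statement. `notNullColumnSteps` is a hypothetical helper that emits those statements.

```typescript
const IDENT_RE = /^[a-z_][a-z0-9_]*$/;

// Emits the three statements in order. Identifiers are validated because they
// are interpolated into SQL; defaultSql is inserted verbatim and must come
// from trusted migration code, never from user input.
function notNullColumnSteps(table: string, column: string, type: string, defaultSql: string): string[] {
  for (const ident of [table, column]) {
    if (!IDENT_RE.test(ident)) throw new Error(`Bad identifier: ${ident}`);
  }
  return [
    // 1. nullable column with a default for new rows
    `ALTER TABLE ${table} ADD COLUMN IF NOT EXISTS ${column} ${type} DEFAULT ${defaultSql}`,
    // 2. backfill anything still NULL
    `UPDATE ${table} SET ${column} = ${defaultSql} WHERE ${column} IS NULL`,
    // 3. only now enforce the constraint
    `ALTER TABLE ${table} ALTER COLUMN ${column} SET NOT NULL`,
  ];
}
```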
* chore: add gstack skill routing rules to CLAUDE.md
* feat: implement milestone 1.5 scope expansion (5 phases)

  Adds execution router, whatWouldIDo query API, twin export, proactive evaluator, preference archaeology, cross-domain correlation, undo-with-learning, and golden path E2E test. 163 tests passing across 15 packages.

  - Phase 1: DB migrations (6 tables, 5 column adds), shared types (18 new interfaces)
  - Phase 2: Execution router package, whatWouldIDo + ProactiveEvaluator, twin export + archaeology
  - Phase 3: Briefings API, skill-gaps API, CrossDomainCorrelator, undo feedback flow
  - Phase 4: Golden path E2E integration test covering full decision pipeline

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve 9 safety invariant violations found in pre-landing review

  - Remove trust tier self-declaration from /ask request body (Safety Invariant #3)
  - Replace mock policy evaluator with real PolicyEvaluator via DB adapter (#1)
  - Use no-op DecisionRepository in /ask to prevent persisting synthetic predictions
  - Return modifiedRiskAssessment in RoutingDecision so callers get bumped risk (#7)
  - Change OpenClaw reversibilityGuarantee to 'none' since rollback always fails (#5)
  - Split migration NOT NULL DEFAULT into safe 3-step pattern for CockroachDB
  - Add missing FK indexes on skill_gap_log, twin_exports, briefings tables
  - Make undoReasoning optional for undo feedback to preserve API compatibility
  - Stop fallback chain on non-completed status to prevent duplicate execution
  - Fix route ordering: /export/:userId now defined before /:userId in twin router

  All 164 tests pass. Build clean.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: coverage for inference-engine, decision-maker, rate-limit, feedback-utils

  - 43 tests for InferenceEngine (calculateConfidence, detectContradictions, analyzeEvidence, mergeInference, valuesAreConsistent, updateInferencesFromFeedback)
  - 26 tests for DecisionMaker (generateCandidates per situation type, calculatePatternBoost, calculateTraitAdjustment, shouldAutoExecute per trust tier, zero candidates escalation)
  - 20 tests for feedback utils (mapFeedbackType, parseUndoReasoning)
  - 7 tests for rate limiting (checkRateLimit per trust tier, window reset)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: bump version and changelog (v0.2.0.0)

  Milestone 1.5 scope expansion: 7 new capabilities, 6 new DB tables, 96 new tests (260 total), 9 safety fixes from pre-landing review.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: sync README with v0.2.0.0 changes

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: add deploy configuration to CLAUDE.md

  Configured by /setup-deploy. Project is pre-deployment (no platform, no deploy workflows). /land-and-deploy will merge-only and skip deploy verification.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
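The rate-limit tests mention `checkRateLimit` per trust tier with a window reset. A fixed-window limiter is one way such a check can work; the class below, its tier keys, and its limits are hypothetical and not taken from the SkyTwin code.

```typescript
// Fixed-window limiter keyed by user; the allowed count per window depends
// on the user's tier. Unknown tiers get no budget (fail closed).
class FixedWindowRateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limits: Record<string, number>;
  private readonly windowMs: number;

  constructor(limits: Record<string, number>, windowMs: number) {
    this.limits = limits;
    this.windowMs = windowMs;
  }

  check(userId: string, tier: string, now: number): boolean {
    const limit = this.limits[tier] ?? 0;
    let w = this.windows.get(userId);
    if (!w || now - w.start >= this.windowMs) {
      w = { start: now, count: 0 }; // first call or window expired: reset
      this.windows.set(userId, w);
    }
    if (w.count >= limit) return false;
    w.count += 1;
    return true;
  }
}
```

A fixed window is simple but allows up to twice the limit across a window boundary; a sliding window avoids that at the cost of more state.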
Closes #75. Three additive guards on the safety kernel:

1. ExecutionRouter throws InvariantViolationError when called without a RiskAssessment or with a CandidateAction whose id does not match the assessment. This pins Safety Invariants #1 and #7 at the boundary, so a future caller that bypasses the decision pipeline cannot silently auto-execute. (+4 unit tests)
2. DecisionMaker.whatWouldIDo no longer leaks blocked candidates as alternativeActions when policy denies every candidate. It returns an empty alternatives array and surfaces the blocking reason via policyNotes, so the prediction reflects what the user could actually take. (+1 unit test pinning the no-leak contract)
3. POST /api/events/ingest emits a decision:blocked-by-policy SSE event when no action was selected and no approval was created, so users see the policy result instead of silent ingestion. (+1 unit test)

Production call sites (apps/api/src/routes/events.ts:230, apps/api/src/routes/approvals.ts:264) already build matching RiskAssessments; the guards are inert for them and active against new orphan callers.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
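Guard 1 amounts to a precondition check at the router boundary. A sketch: the error name comes from the commit, while `assertRoutable`, the minimal shapes, and the `actionId` field on the assessment are assumptions about how the id match might be expressed.

```typescript
// Hypothetical minimal shapes: only the fields the guard inspects.
interface CandidateActionLike {
  id: string;
}
interface RiskAssessmentLike {
  actionId: string; // assumed link from assessment to action
}

class InvariantViolationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "InvariantViolationError";
  }
}

// Refuses to route an action that lacks a matching risk assessment.
function assertRoutable(action: CandidateActionLike, risk: RiskAssessmentLike | undefined): RiskAssessmentLike {
  if (!risk) {
    throw new InvariantViolationError("ExecutionRouter called without a RiskAssessment");
  }
  if (risk.actionId !== action.id) {
    throw new InvariantViolationError(`RiskAssessment is for ${risk.actionId}, not ${action.id}`);
  }
  return risk;
}
```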
…r) (#421) * P1.9 #379: global kill switch (operator env + per-user toggle + banner) Closes #379. Three coordinated levers + a chrome banner give every install the panic button CLAUDE.md launch-criterion #8 requires. Lever 1 — operator env var: - SKYTWIN_AUTO_EXECUTE_DISABLED=true on the API/worker process. - Read ONCE at PolicyEvaluator construction (override via ctor option for tests so the env-var read isn't a hidden dependency). - New early-return at the top of `evaluate()` sits AHEAD of every other check (trust tier, injection guard, autonomy, quiet hours, policy rules) so no downstream allow path can bypass it. - Uses { allowed: true, requiresApproval: true } not deny — actions still land in the Approvals queue. Lever 2 — per-user toggle: - autonomy_settings.paused: true | false on the user row. - New PUT /api/users/:userId/autonomy-pause endpoint sets/clears the flag, writes pausedAt + optional pausedReason on transition. - AutonomySettings type gains paused, pausedAt, pausedReason fields in @skytwin/shared-types/user.ts. Lever 3 — chrome banner: - index.html ships a sticky #autonomy-banner above <main>. - app.js updateAutonomyBanner() fetches GET /autonomy-state, renders operator + user lines independently, and shows a Resume button only for the user-pause line (operator pause needs an env-var change). - Refreshed on every navigate() + every 30s; backed off when API known offline. - New GET /api/users/:userId/autonomy-state returns combined state. Settings page: - New "Pause auto-execution" card with confirmation modal on both transitions and optional reason prompt. - Hydrated from /autonomy-state on render; refreshes banner + on-page state together after a flip. - Coexists with the existing "Pause everything (demote to observer)" button — different lever, clearer label. Operator pause reason wins when both flags are set so the banner copy reflects who set the pause. 
Tests: - 5 new policy-engine tests cover the operator-paused, user-paused, both-paused (operator wins), neither-paused regression, and isGloballyPaused() reporting matrix. - 177 policy-engine tests, 713 API tests pass. Safety Invariant #1 preserved: the new check strengthens the single PolicyEvaluator.evaluate funnel rather than adding a parallel path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * P1.9 (post-Copilot): 5 fixes — preserve deny + guard-confirmation; CSS sidebar; copy Copilot review on PR #421 surfaced five issues: 1. CRITICAL — kill switch was overriding deny verdicts. Pre-fix, the early-return at the top of evaluate() turned every action — including spend-cap-exceeded, domain-blocked, policy-denied — into {allowed: true, requiresApproval: true}. Denies are strictly stronger than approvals and must never be relaxed. It also dropped the injection-guard `confirmationLevel` for extreme-severity actions. Fix: capture killSwitchActive + killSwitchReason at the top, but APPLY them at the very end alongside the existing requiresApproval merge. Denies still short-circuit first; the injection guard's confirmationLevel flows through unchanged. The quiet-hours early-return also propagates killSwitchActive so a user's pause can't be bypassed via quiet hours. New tests in policy-evaluator.test.ts: - "kill switch does NOT override a deny — domain blocklist still wins" 2. isGloballyPaused docstring claimed the autonomy-state endpoint used it, but the endpoint reads process.env directly. Updated the docstring to honestly describe both paths and note they agree by construction (snapshot vs live read of the same env var); a future refactor can collapse them to one path. 3. CSS banner pushed body content down but NOT the sidebar (which is position: fixed; top: 0), so the banner covered the sidebar header. Added `.sidebar { top: 2.5rem; height: calc(100vh - 2.5rem); }` under `body.has-autonomy-banner` so both layers move together.
Caveat documented inline: 2.5rem assumes a single-line banner; two-line wrap on narrow screens needs a future CSS var driven from banner.offsetHeight. 4. HTML comment said the banner is "inside <main>" but it's actually a fixed sibling of sidebar + main. Comment updated to describe the actual structure + the `has-autonomy-banner` body-class coupling. 5. Settings prompt copy claimed pausedReason is "saved with the audit log" but no audit table is written today — only the user's autonomy_settings JSONB. Copy changed to "stored on your user record" to match reality. (A proper audit-log row was spec'd in the issue but deferred — landing the engine + banner first; the audit-log row can be a follow-up against an existing trust_tier_audit-like table.) 178 policy-engine tests pass (+1). All other tests still green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
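The post-fix ordering in `PolicyEvaluator.evaluate()` (capture the kill-switch state first, apply it last so denies still short-circuit) can be sketched as below. This is an illustration, not the evaluator's code: `Verdict`, `PauseState`, `evaluateWithKillSwitch`, the `runChecks` callback standing in for the trust-tier/injection/quiet-hours/policy checks, and the reason strings are all hypothetical.

```typescript
// Hypothetical, simplified verdict; the real evaluator returns more fields.
interface Verdict {
  allowed: boolean;
  requiresApproval: boolean;
  reason?: string;
  confirmationLevel?: string;
}

interface PauseState {
  operatorPaused: boolean; // e.g. snapshot of SKYTWIN_AUTO_EXECUTE_DISABLED taken at construction
  userPaused: boolean; // autonomy_settings.paused on the user row
}

function evaluateWithKillSwitch(pause: PauseState, runChecks: () => Verdict): Verdict {
  // Capture the kill-switch state up front; the operator reason wins when both are set.
  const killSwitchReason: string | null = pause.operatorPaused
    ? 'auto-execution disabled by operator'
    : pause.userPaused
      ? 'auto-execution paused by user'
      : null;

  const verdict = runChecks();

  // A deny always short-circuits and is never relaxed into an approval.
  if (!verdict.allowed) return verdict;

  // Apply the kill switch last; spreading keeps fields like confirmationLevel intact.
  if (killSwitchReason !== null) {
    return { ...verdict, requiresApproval: true, reason: killSwitchReason };
  }
  return verdict;
}
```

Keeping the switch as "allowed, but requires approval" means paused actions still land in the Approvals queue, while an upstream deny is returned untouched.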
) * P1.9 #379: global kill switch (operator env + per-user toggle + banner) Closes #379. Three coordinated levers + a chrome banner give every install the panic button CLAUDE.md launch-criterion #8 requires. Lever 1 — operator env var: - SKYTWIN_AUTO_EXECUTE_DISABLED=true on the API/worker process. - Read ONCE at PolicyEvaluator construction (override via ctor option for tests so the env-var read isn't a hidden dependency). - New early-return at the top of `evaluate()` sits AHEAD of every other check (trust tier, injection guard, autonomy, quiet hours, policy rules) so no downstream allow path can bypass it. - Uses { allowed: true, requiresApproval: true } not deny — actions still land in the Approvals queue. Lever 2 — per-user toggle: - autonomy_settings.paused: true | false on the user row. - New PUT /api/users/:userId/autonomy-pause endpoint sets/clears the flag, writes pausedAt + optional pausedReason on transition. - AutonomySettings type gains paused, pausedAt, pausedReason fields in @skytwin/shared-types/user.ts. Lever 3 — chrome banner: - index.html ships a sticky #autonomy-banner above <main>. - app.js updateAutonomyBanner() fetches GET /autonomy-state, renders operator + user lines independently, and shows a Resume button only for the user-pause line (operator pause needs an env-var change). - Refreshed on every navigate() + every 30s; backed off when API known offline. - New GET /api/users/:userId/autonomy-state returns combined state. Settings page: - New "Pause auto-execution" card with confirmation modal on both transitions and optional reason prompt. - Hydrated from /autonomy-state on render; refreshes banner + on-page state together after a flip. - Coexists with the existing "Pause everything (demote to observer)" button — different lever, clearer label. Operator pause reason wins when both flags are set so the banner copy reflects who set the pause. 
Tests: - 5 new policy-engine tests cover the operator-paused, user-paused, both-paused (operator wins), neither-paused regression, and isGloballyPaused() reporting matrix. - 177 policy-engine tests, 713 API tests pass. Safety Invariant #1 preserved: the new check strengthens the single PolicyEvaluator.evaluate funnel rather than adding a parallel path. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * P1.7 #377: user-facing surface for OAuth re-auth Closes #377. Pre-fix silent-breakage trap: Google revokes a refresh token, worker correctly trips the per-user circuit breaker, dashboard keeps rendering "Listening" — user only notices days later when "did you get my email?" surfaces it. This PR adds the single piece of state the API can read to render a "Gmail disconnected — Reconnect" banner: DB (new): - migration 060-connector-health.sql adds connector_health(user_id, connector_name, status, error_code, last_success_at, last_failure_at, updated_at). PRIMARY KEY (user_id, connector_name); ON DELETE CASCADE on user. - packages/db/src/repositories/connector-health-repository.ts exposes upsert + findByUser. Re-exported from @skytwin/db. Worker: - apps/worker/src/index.ts:pollUserConnectors upserts status='needs_reauth' on the existing OAuthRefreshError.permanent branch (alongside the circuit-breaker force-trip). - Per-connector success upserts status='connected' so, for a multi-connector user with one bad connector, the working one isn't left stuck in a stale status. Keyed on thisConnectorFailed (not loop-wide hadFailure) so a working Calendar isn't blocked by a failing Gmail. - New extractErrorCode helper parses 'invalid_grant' / 'unauthorized_client' / etc. out of the OAuthRefreshError message so the banner can render conditional copy. - DB writes wrapped in try/catch — must not break the circuit-breaker logic; the log line is still the operator's audit trail.
API: - apps/api/src/routes/connectors.ts mounts /api/connectors with one endpoint: GET /:userId/status → { connectors, anyNeedsReauth }. - Wired into apps/api/src/index.ts with sessionAuth + bindUserIdParamValidator + bindUserIdParamOwnership. Frontend: - index.html adds #connectors-banner sticky under #autonomy-banner. - styles.css: amber (warning color, not red — re-auth is housekeeping, not panic); stacks below #autonomy-banner when both fire via body.has-autonomy-banner.has-connectors-banner. - app.js updateConnectorsBanner() polls /connectors/:userId/status on every navigate() + every 60s (matches worker poll cadence). Conditional copy on errorCode ('invalid_grant' → "access was revoked or expired"). - Single Reconnect CTA → #/connect-gmail (only connector this PR covers); future connectors should branch off the broken row. Build + tests: - All builds clean (db, worker, api). - 713 API tests, 301 db tests, full monorepo suite green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(p1.7): address post-merge Copilot review on PR #422 8 findings from copilot-pull-request-reviewer (2 critical + 6 substantive): 1. CRITICAL — Removed leftover merge-conflict markers in CHANGELOG.md (lines 17-23) and apps/web/public/js/pages/settings.js (lines 1052-1056). The rebase onto main left both files with HEAD/origin markers intact; I missed them on the previous push. 2. CRITICAL — extractErrorCode() regex didn't match the actual OAuthRefreshError format. The connector throws "Google OAuth token refresh failed (permanent|transient): <status> <body>" where <body> is the raw Google token-endpoint response (JSON like {"error":"invalid_grant","error_description":"..."}). The old /refresh failed:\s*([a-z_]+)/i would never have matched because of the (permanent|transient) tag before the colon. Switched to /"error"\s*:\s*"([a-z_]+)"/i, which parses the JSON field directly.
Extracted to apps/worker/src/oauth-error-code.ts so it can be tested without booting the worker's runtime side effects. 3. Added a CHECK constraint on connector_health.status. The TS row type pins it to a narrow union but the column was a plain STRING; the DB-level constraint enforces the invariant for direct SQL writes too. 4. Switched connectorHealthRepository.upsert to DB-side now() instead of new Date(). With the timestamp taken from the database, multi-node deployments can't write clock-skewed updated_at values, and this matches the repo convention. 5. Fixed doc-comment reference: apps/api/src/routes/connectors.ts now names "pollUser" (real symbol) instead of "pollUserConnectors". 6. Added apps/api/src/__tests__/connectors-routes.test.ts (4 tests): empty user, mixed connected/needs_reauth, all-connected, malformed userId rejection by the shared validator. 7. Added packages/db/src/__tests__/connector-health-repository.test.ts (7 tests): needs_reauth + connected upserts, DB-side now() lock-in (catches the convention regression Copilot flagged), COALESCE flap preservation, null-coalescence, findByUser ordering, empty user. 8. Added apps/worker/src/__tests__/oauth-error-code.test.ts (5 tests): real OAuthRefreshError message, unauthorized_client variant, whitespace-tolerant JSON, null fallback on garbage / transient 503s. CHANGELOG.md gains the missing CHECK / now() / message-format details under the existing #377 entry. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
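A sketch of the fixed helper from finding 2, assuming the message format quoted there. The regex is the one the commit names; the `?? null` fallback covers transient bodies that carry no JSON `error` field. This is an illustration of the approach, not a copy of `oauth-error-code.ts`.

```typescript
// Pulls the OAuth error code out of an OAuthRefreshError message such as:
//   Google OAuth token refresh failed (permanent): 400 {"error":"invalid_grant","error_description":"..."}
// Returns null when the body has no JSON "error" field (e.g. a transient 503 page).
function extractErrorCode(message: string): string | null {
  // `"error"` must be followed by optional whitespace and a colon, so the
  // `"error_description"` key never matches.
  const match = /"error"\s*:\s*"([a-z_]+)"/i.exec(message);
  return match?.[1] ?? null;
}
```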
Correctness: - deadline urgency: stale (past-relative-to-now) deadlines no longer read as critical; far-out deadlines no longer DOWNGRADE a type's default urgency (#1/#2) - security markers curated to specific phrases — kill false positives on shipping notices / "welcome back" / articles (#3); marker check also applied on the LLM path so escalate-only holds regardless of classifier (safety defense-in-depth) - digest emits signalRefs[] so citation chips actually render (#4) - scope gate now covers calendar RSVP/invite write actions (#5) - commitment extractor: clause-level negation (keep real commitments sharing a sentence with "if I…") (#6); "by <person>" no longer a deadline hint (#7) - entity resolver compares full normalized string, not the truncated slug (#10) Hardening/robustness: - demo-guard isLocalDbTarget: exact host match, not substring (#8) - provisionNewUser is genuinely best-effort (try/catch) — never 500s after the user row exists - briefing-generator pinned to prompt v1 until it consumes v2 structured output (avoids requesting+discarding todos/topics); v2 deterministic_fallback fixed - briefing test mock provides userRepository.getLocale so the LLM-prose path is actually exercised (#13) Regression tests added for each. Full suite green (70/70 tasks). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
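One way to express the deadline-urgency rule from fixes #1/#2: ignore stale deadlines, and let a deadline raise but never lower a type's default urgency. The `Urgency` scale, `urgencyFromDeadline`, and the 24h/72h thresholds are hypothetical placeholders, not the engine's values.

```typescript
type Urgency = 'low' | 'medium' | 'high' | 'critical';
const RANK: Record<Urgency, number> = { low: 0, medium: 1, high: 2, critical: 3 };

// Hypothetical thresholds, for illustration only.
function urgencyFromDeadline(deadline: Date, now: Date): Urgency | null {
  const hoursLeft = (deadline.getTime() - now.getTime()) / 3_600_000;
  if (hoursLeft < 0) return null; // stale: a past deadline says nothing about urgency
  if (hoursLeft <= 24) return 'critical';
  if (hoursLeft <= 72) return 'high';
  return 'low';
}

function assessUrgency(typeDefault: Urgency, deadline: Date | undefined, now: Date): Urgency {
  const fromDeadline = deadline ? urgencyFromDeadline(deadline, now) : null;
  if (fromDeadline === null) return typeDefault;
  // A far-out deadline may not downgrade the type's default; only upgrades apply.
  return RANK[fromDeadline] > RANK[typeDefault] ? fromDeadline : typeDefault;
}
```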
) (#488) * feat(decision-engine): SignalText multi-source accessor + capability matrix (spec 07, #480) Normalize any RawSignal (email/calendar/filesystem/voice) into a channel-agnostic SignalText so commitment/deadline/security/cluster/entity capabilities are source-agnostic. Extends AuthoringTier with authored_originated/received_shared; adds a tested capability×source coverage matrix. Foundation for #475/#476/#479. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(api,db): observer default + new-user provisioning + seedUpsert (spec 10, #483) - LOCKED: new users default to trust_tier 'observer' (users.ts) — matches DB default + CLAUDE.md; resolves the 3-way conflict that forced 'suggest'. - provisionNewUser: eager empty twin profile + conservative autonomy defaults (no spend caps, so the built-in NO_SPEND_WITHOUT_LIMIT gate blocks spend until the user sets a budget — safe by construction). - seedUpsert/buildUpsertSql: shared, tested idempotent upsert helper for re-runnable seeds (used by spec 09). Existing seed.ts already idempotent. Part C (promotion soak-floor hoursInCurrentTier + tier-ladder intro) still TODO. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(worker,db): enforce promotion soak-floor via hoursInCurrentTier (spec 10 Part C, #483) Daily promotion-eligibility job now populates hoursInCurrentTier (from last tier change or account creation), so the engine actually enforces minDurationInTierHours (24h observer->suggest, etc). Closes the documented gap where the floor was skipped in the auto path. Fail-safe 0 keeps a promotion blocked when time can't be derived. Tier-ladder intro UI folds into spec 08. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(decision-engine): deadline extraction feeds urgency (spec 03, #476) extractDeadline parses absolute/relative dates (chrono-node) from any text-bearing signal (SignalText-compatible) and returns the earliest credible FUTURE deadline. 
situation-interpreter.enrichDeadline stamps rawEvent.deadline when the connector didn't, so the existing assessUrgency consumer finally gets fed. Rejects past dates + no-match. v1 leaves per-user-timezone resolution to spec 12. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(decision-engine): commitment extraction from authored content (spec 02, #475) extractCommitments surfaces the user's own stated obligations ("I'll send the draft tomorrow" -> "Send the draft tomorrow") from authored SignalText. Gated to authoredByUser + the commitments source allowlist (safety invariant #8: never from inbound content). Rule extractor handles modal forms, excludes questions/past/third-party/hypotheticals, dedups, and emits a deadlineHint for spec 03. CommitmentStrategy seam left for an LLM path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(decision-engine): inbound security-alert classifier, escalate-only (spec 06, #479) Adds SituationType.SECURITY_ALERT (enums.ts). classifySituation matches inbound account-security markers FIRST (precedence over finance/email), urgency=high, domain=security. The candidate generator emits ONLY a human-review escalation that says "open the provider directly" with link-free parameters — never an auto-executable action, never a URL from the untrusted body (safety invariant #8). Provenance stays untrusted_external regardless of claimed sender. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(decision-engine): signal topic clustering for the digest (spec 04, #477) clusterSignals groups awareness signals into life-domain topic clusters for the Topics section. Anchors to known domains (beats the reference product's mis-filing), guarantees complete + non-overlapping partition, caps cluster count with overflow merged into "More updates" (logged via onMerge). Deterministic fallback ships; ClusterStrategy seam for an LLM path. 
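A deterministic sketch of the clustering contract described above: complete, non-overlapping partition; capped cluster count; overflow merged into "More updates" and reported through `onMerge`. It assumes each signal already carries an anchored `domain` (or `null`), and keeping the largest topics is an assumed policy, not necessarily what `clusterSignals` does.

```typescript
interface Signal {
  id: string;
  domain: string | null; // null = could not be anchored to a known life domain
}

interface Cluster {
  topic: string;
  signalIds: string[];
}

const OVERFLOW_TOPIC = 'More updates';

function clusterSignals(
  signals: Signal[],
  maxClusters: number,
  onMerge?: (topic: string, size: number) => void,
): Cluster[] {
  if (maxClusters < 1) throw new RangeError('maxClusters must be >= 1');

  // Every signal goes to exactly one place: its domain group or the unanchored list.
  const byDomain = new Map<string, string[]>();
  const unanchored: string[] = [];
  for (const s of signals) {
    if (s.domain === null) {
      unanchored.push(s.id);
      continue;
    }
    const ids = byDomain.get(s.domain) ?? [];
    ids.push(s.id);
    byDomain.set(s.domain, ids);
  }

  // Largest topics first; sort is stable, so ties keep first-seen order.
  const groups = Array.from(byDomain.entries()).sort((a, b) => b[1].length - a[1].length);

  // Reserve one slot for the overflow cluster only when it will be needed.
  const needsOverflow = unanchored.length > 0 || groups.length > maxClusters;
  const keep = needsOverflow ? maxClusters - 1 : groups.length;

  const clusters: Cluster[] = groups
    .slice(0, keep)
    .map(([topic, ids]) => ({ topic, signalIds: ids }));

  const overflow = [...unanchored];
  for (const [topic, ids] of groups.slice(keep)) {
    onMerge?.(topic, ids.length); // make every merge visible to logging
    overflow.push(...ids);
  }
  if (overflow.length > 0) clusters.push({ topic: OVERFLOW_TOPIC, signalIds: overflow });
  return clusters;
}
```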
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(decision-engine): source-coverage model for graceful degradation (spec 13, #487) computeCoverage evaluates the capability x source matrix against a user's connected accounts -> per-capability available/partial/unavailable + the sources that would unlock each, plus a coldStart flag (zero sources, distinct from connected-but-quiet). Excludes mock sources. Drives "connect X to unlock Y" transparency; UI affordances render in spec 08. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(policy,decision-engine): access-faithful gates — scope + hidden (spec 11, #485) Scope gate (policy-engine): requiredWriteScope/hasWriteScope/applyScopeGate. Wired into DecisionMaker.generateCandidates — when grantedScopes is supplied, un-granted write candidates (send/calendar) downgrade to a human-review "grant access" item. Fail-safe NOT granted (safety invariant #8). Visibility filter (decision-engine): isHidden/filterVisible — the single hide predicate the digest routes input through (briefing-generator wiring lands with spec 01). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(db,worker,decision-engine): locale & timezone faithfulness (spec 12, #486) Migration 063 adds users.language + users.timezone. userRepository.getLocale + resolveLanguage/resolveTimezone/isNonEnglish helpers (safe fallbacks: en / UTC with a logged-default flag). Briefing prose locale now reads the user profile instead of hardcoded 'en'. isNonEnglish is the LLM-vs-rule routing signal for the extractors (degraded-marker wiring is a follow-up on 02/03/06). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(db): launch demo fixture — opt-in, isolated, guarded (spec 09, #482) assertDemoSafe (3-gate invariant #0): explicit-only, prod hard-blocked + non-local needs override, identity isolation via is_demo (migration 064). Never wired into bin/skytwin-dev/auto-seed — can't run for a real or new user. 
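A sketch of the first two `assertDemoSafe` gates, including the exact-host `isLocalDbTarget` check from the later review fix (#8). The option names, the `allowNonLocal` override, and the local-host list are assumptions. The third gate (identity isolation via `is_demo`) lives in the schema and is not shown.

```typescript
interface DemoRunOptions {
  explicit: boolean; // hypothetical: true only when invoked via the dedicated demo command
  nodeEnv: string | undefined;
  databaseUrl: string;
  allowNonLocal: boolean; // hypothetical override flag for non-local targets
}

const LOCAL_HOSTS = new Set(['localhost', '127.0.0.1', '::1']);

// Exact host match: "localhost.evil.example" must not count as local.
function isLocalDbTarget(databaseUrl: string): boolean {
  try {
    // Strip IPv6 brackets, e.g. "[::1]" -> "::1".
    const host = new URL(databaseUrl).hostname.replace(/^\[|\]$/g, '');
    return LOCAL_HOSTS.has(host);
  } catch {
    return false; // unparseable URL: fail safe, treat as not local
  }
}

function assertDemoSafe(opts: DemoRunOptions): void {
  if (!opts.explicit) throw new Error('demo fixture must be invoked explicitly');
  if (opts.nodeEnv === 'production') throw new Error('demo fixture is blocked in production');
  if (!isLocalDbTarget(opts.databaseUrl) && !opts.allowNonLocal) {
    throw new Error('non-local database requires an explicit override');
  }
}
```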
demo-fixture.ts guards then upserts the reserved demo user + ingests a synthetic source-varied corpus (email/calendar/file/voice) through /api/events/ingest; --reset deletes is_demo rows only. `pnpm demo:fixture`. Guard fully unit-tested. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(decision-engine,policy-prompts): digest to-do/FYI split (spec 01, #474) buildDigest partitions items into action-required to-dos (urgency-ordered, capped) vs domain-clustered topics, with no overlap. Composes the epic: filters hidden content first (spec 11), clusters topics (spec 04), carries sourceType+deadline for the UI (spec 07/03). New briefing-prose v2 prompt emits the two-section structured payload (todos + topics). The structured_payload column + repo read + render land with spec 08 (UI). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(decision-engine): entity extraction + cross-signal resolution (spec 05, #478) extractEntities pulls people (emails) + orgs (suffix-tagged) from SignalText. resolveEntities links mentions to stable entityIds — exact email key for people (never fuzzy), token-overlap floor for orgs, conservative mint-on-doubt so a false merge can't corrupt the graph. linkEntitiesAcrossSignals aggregates "every signal touching X". Persistence reuses MemoryPort.recordEntity; the getSignalsForEntity port method is the remaining integration seam. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(web,api): digest UI — two-bucket, source-aware, cited (spec 08, #481) twin-briefing.js renders the structured digest: To-dos above Topics, each row with a source-type chip (email/calendar/file/voice) + citation chips that open the in-app signal detail (never an external URL — safety #8). Reuses the existing singleton-delegator + hash-gate + data-action conventions (new open-signal action). Falls back to prose when structured is null (back-compat). API /latest passes through structured (nullable, forward-compatible). 
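The spec 01 partition could look roughly like this: hide filter first, then action-required items as urgency-ordered, capped to-dos, and everything else grouped by domain, with no item in both buckets. `buildDigestSketch`, `DigestInput`, and `overflowTodoCount` are hypothetical; the real `buildDigest` also clusters topics via spec 04 and carries source and deadline fields.

```typescript
type Urgency = 'low' | 'medium' | 'high' | 'critical';

// Lower value sorts first.
const URGENCY_ORDER: Record<Urgency, number> = { critical: 0, high: 1, medium: 2, low: 3 };

// Hypothetical input shape; the real DigestItem carries more fields.
interface DigestInput {
  id: string;
  title: string;
  domain: string;
  urgency: Urgency;
  actionRequired: boolean;
  hidden: boolean;
}

interface DigestSketch {
  todos: DigestInput[];
  topics: Map<string, DigestInput[]>;
  overflowTodoCount: number; // hypothetical: action items that did not fit under the cap
}

function buildDigestSketch(items: DigestInput[], maxTodos: number): DigestSketch {
  // The hide predicate runs before anything is bucketed.
  const visible = items.filter((item) => !item.hidden);

  // Action-required items become to-dos, most urgent first (stable for ties).
  const actionable = visible
    .filter((item) => item.actionRequired)
    .sort((a, b) => URGENCY_ORDER[a.urgency] - URGENCY_ORDER[b.urgency]);

  // Everything else is awareness content grouped by domain; no item lands in both buckets.
  const topics = new Map<string, DigestInput[]>();
  for (const item of visible) {
    if (item.actionRequired) continue;
    const bucket = topics.get(item.domain) ?? [];
    bucket.push(item);
    topics.set(item.domain, bucket);
  }

  return {
    todos: actionable.slice(0, maxTodos),
    topics,
    overflowTodoCount: Math.max(0, actionable.length - maxTodos),
  };
}
```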
CSS reuses card/badge tokens. Mobile BriefingScreen mirror is the remaining part of this spec. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(post-/review): address review findings across the epic Correctness: - deadline urgency: stale (past-relative-to-now) deadlines no longer read as critical; far-out deadlines no longer DOWNGRADE a type's default urgency (#1/#2) - security markers curated to specific phrases — kill false positives on shipping notices / "welcome back" / articles (#3); marker check also applied on the LLM path so escalate-only holds regardless of classifier (safety defense-in-depth) - digest emits signalRefs[] so citation chips actually render (#4) - scope gate now covers calendar RSVP/invite write actions (#5) - commitment extractor: clause-level negation (keep real commitments sharing a sentence with "if I…") (#6); "by <person>" no longer a deadline hint (#7) - entity resolver compares full normalized string, not the truncated slug (#10) Hardening/robustness: - demo-guard isLocalDbTarget: exact host match, not substring (#8) - provisionNewUser is genuinely best-effort (try/catch) — never 500s after the user row exists - briefing-generator pinned to prompt v1 until it consumes v2 structured output (avoids requesting+discarding todos/topics); v2 deterministic_fallback fixed - briefing test mock provides userRepository.getLocale so the LLM-prose path is actually exercised (#13) Regression tests added for each. Full suite green (70/70 tasks). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(web): collapse prose under a disclosure when the digest renders (design-review) Showing the structured two-bucket digest AND the full prose was the same briefing twice. When structured is present, the prose moves under a "Full briefing" <details> as the long-form view; falls back to inline prose when there's no structured payload. 
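A sketch of the spec 11 scope gate with the #5 extension to calendar RSVP/invite writes. The action-type names, the mapping table, and the `request_access` downgrade shape are assumptions; the two scope URLs are Google's published Gmail send and Calendar events scopes. `hasWriteScope` fails safe (unknown grants count as not granted), and per the commit the engine only applies the gate when `grantedScopes` is supplied.

```typescript
// Hypothetical candidate shape for illustration.
interface Candidate {
  id: string;
  actionType: string;
  description: string;
  requiresHumanReview?: boolean;
}

const GMAIL_SEND = 'https://www.googleapis.com/auth/gmail.send';
const CALENDAR_EVENTS = 'https://www.googleapis.com/auth/calendar.events';

// Hypothetical action types; RSVP/invite writes are gated like event creation.
const WRITE_SCOPE_BY_ACTION = new Map<string, string>([
  ['send_email', GMAIL_SEND],
  ['reply_email', GMAIL_SEND],
  ['create_calendar_event', CALENDAR_EVENTS],
  ['rsvp_calendar_invite', CALENDAR_EVENTS],
]);

function requiredWriteScope(actionType: string): string | null {
  return WRITE_SCOPE_BY_ACTION.get(actionType) ?? null;
}

function hasWriteScope(granted: readonly string[] | undefined, scope: string): boolean {
  return granted?.includes(scope) ?? false; // unknown grants = not granted
}

// Un-granted write candidates are downgraded to a human-review "grant access" item.
function applyScopeGate(c: Candidate, granted: readonly string[] | undefined): Candidate {
  const scope = requiredWriteScope(c.actionType);
  if (scope === null || hasWriteScope(granted, scope)) return c;
  return {
    id: c.id,
    actionType: 'request_access',
    description: `Grant access so I can ${c.description}`,
    requiresHumanReview: true,
  };
}
```

A `Map` is used for the lookup so an action type like `toString` cannot resolve to an `Object.prototype` member.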
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(decision-engine,web): power view — inline technical depth (spec 14) One digest, two depths. Default stays the clean view (non-technical users unaffected); a discoverable header "Power view" toggle (persisted) + per-item "Details" expander reveal the depth SkyTwin already computes — provenance, confidence %, urgency reason, why-it-didn't-auto-run (scope/tier/policy), real source refs, and the explanation — plus a coverage panel ("what I can see, connect X to unlock Y"). Not buried in settings. buildDigestItemDetail is the pure view-model (raw codes -> human strings), unit tested. UI follows the singleton-delegator/hash-gate/data-action conventions. Digest payload carries optional per-item detail + coverage (generator populates). Verified rendering via a headless-browser screenshot. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(design): lock design system — calm command center, premium iris (DESIGN.md) Source of truth grounded in a full element-and-state inventory of the digest surfaces. Cool-neutral base (refines existing #0f1117 tokens; rejected the warm/brown direction), iris #7C72E8 as the SINGLE accent meaning "needs you / act", Fraunces voice + Geist + Geist Mono, action-vs-awareness hierarchy. Catalogs every element + EVERY state including the gaps never rendered before: cold-start, scope-blocked grant-access, loading, error, prose-fallback, distinct security treatment, provenance in default view. CLAUDE.md now points UI + /qa + /design-review at it. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(web): implement DESIGN.md in the digest — iris, two-zone, gap states (spec 15) Wires the locked design system into the real digest UI: - Load Fraunces (twin voice) + Geist + Geist Mono (index.html) - Iris #7C72E8 as the single accent = "needs you / act"; killed the CAPS source-chip soup -> one neutral source mark + a single "·N sources" citation; provenance as a dot (neutral, never accent) - Action zone (to-dos: checkbox + inline Draft/Snooze/Verify/Grant, hover-reveal, always-on for security + touch) vs awareness zone (topics: lighter, no edge) - Twin voice (Fraunces) + value line ("✓ N handled · M need you · K to catch up") - Power view detail panel + coverage panel restyled to the system - GAP STATES now designed: loading skeleton, empty-quiet, cold-start ("connect a source"), prose-fallback disclosure, distinct security treatment, scope-blocked "Grant access". Verified via headless-browser render of the real CSS. Row-action wiring (draft/snooze/verify) routes/acknowledges until the act layer lands. App-wide token adoption (vs digest-scoped iris) is a follow-up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(web): make DOMContentLoaded handler async — SPA-breaking syntax error (pre-existing) app.js:856 registered a non-async DOMContentLoaded handler, but the pairToken branch (line ~904) uses `await fetch(...)` → "Unexpected reserved word" at parse time, which aborts ALL app initialization. Every page rendered as an empty #page-content shell. Present on origin/main; web JS has no type-check or tests, so it shipped silently. One-word fix (() => → async () =>); verified by booting the seeded app and touring dashboard/decisions/approvals/settings. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(api,web): render the to-do/FYI digest live end-to-end (parity) The digest existed as tested modules but never rendered in the running app: the briefing generator produces no structured payload, so /latest returned null and the UI fell back to "No briefing content yet". This closes that seam so the AI-inbox parity (to-dos vs topics, multi-source) actually shows. - live-digest.ts: compute the structured digest from a user's recent decisions — read each decision's RawSignal through toSignalText (spec 07) for real, source-agnostic titles, partition via buildDigest (spec 01/04), attach power-view detail (spec 14) and coverage (spec 13). - twin-briefings /latest: when no structured_payload is stored, compute the digest live (best-effort; degrades to prose on error) and synthesize a briefing envelope so the page renders parity today. Forward-compatible: a stored payload still wins once the worker writes one. - dashboard: Home leads with a read-only digest hero (action zone first, DESIGN.md) linking to the full interactive /briefing; stop showing the "connect Google" nag once the twin has produced decisions. - index.html: first-class "Briefing" nav link. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(api,web,db): show "Needs you" for pending-approval decisions The decisions log mapped auto_executed=false to "You OK'd", which mislabels a decision still awaiting approval (notably an escalated security alert) as already approved. Surface the outcome's requires_approval through the API and add a distinct "Needs you" state so the log matches the Approvals page. - decision-repository.getOutcomesForDecisions: also select requires_approval. - decisions route: return requiresApproval per decision. - decisions.js: Auto / Needs you / You OK'd / Pending, in that order. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(api): describePreference never renders "[object Object]" A structured preference value (e.g. a brand-preference object) fell through to String(value) and rendered as "[object Object]" in the dashboard "What I've learned" summaries. Render arrays/objects readably instead. Adds a regression test covering objects, nested objects, arrays, booleans, strings, numbers, and null. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(web): hide read controls on the live-computed briefing The live digest (no stored row) carries the sentinel id 'live'; its "Mark as read" button POSTed to /briefings/live/read and 400'd on the UUID check. Gate the New badge + Mark-as-read on a persisted briefing so the control only shows when there's a real row to mark. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(api,decision-engine): make power-view digest detail meaningful The power-view detail panel rendered noise: "URGENCY: Default for security", "REFS: email: 77538186" (an internal id slice), "WHY: Account notice" (just the title again), and no confidence at all. Feed it real technical depth instead: - confidence: pull decision_outcomes.confidence -> a real percentage. - source ref: the actual sender/organizer/file ("email: no-reply@accounts.example"), not an opaque decision-id slice. - urgencyReason: a real driver ("Security alert — always sent to you", "New invite — awaiting your RSVP", "Routine — no deadline detected") via a new optional urgencyReason override on buildDigestItemDetail, instead of the generic "Default for <domain>". - drop the redundant explanation (it duplicated the title). - honest whyNotAutoExecuted: use the engine's real escalation_reason, and only fall back to the trust-tier gate when the item genuinely required approval — no fabricated "trust_tier:observer" on escalate-only items. - normalizeUrgency: map the DB default 'normal' to 'medium', not 'low' (silent demotion). 
- name the recent-decisions window; drop the redundant maxTodos override. Adds a DB-mocked buildLiveDigest suite (cold start, to-do mapping + detail, malformed raw_event, provenance fail-safe, handledCount) plus normalizeUrgency and urgencyReasonFor helper tests. Fixes the sections-fold test's @skytwin/db mock to define query so the live-digest path resolves cleanly. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(web): don't suppress connect heroes once the twin has data Gating the Connect-Google/Connect-Gmail heroes on `hasAnyData` hid the onboarding CTA for users who have decisions but haven't connected Gmail (the "Calendar connected, Gmail not yet" segment) — the heroes already self-suppress when actually connected, so the extra gate only hurt real users. Revert to gating on tourMode only. Also drop a dead `t.kind === 'security'` branch in the Home digest hero (buildDigest never sets kind). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(digest): show what each item says + the recommended next step The digest told you a title and a pile of system metadata (origin, confidence, "why it escalated") but not the two things that actually matter: what the item says and what to do about it. Surface both, sourced from data we already had: - body: the real content (email snippet, event description, file excerpt, transcript) via toSignalText, rendered as a one-line preview under each title — visible by default, not buried in the power view. - suggestedAction: the twin's recommended next step, taken from the pipeline's selected candidate action ("Accept this calendar invitation", "Review this security alert in the provider's official app — don't click links in the message"), with sensible fallbacks for escalate-only situations. UI: the to-do/topic rows now lead with title -> what it says; the power-view detail leads with the actionable "suggested" step, and the trust metadata (origin/confidence/refs) drops below it. 
The Home hero shows the content line plus an iris "→ next step" so it's actionable without opening anything. Carries body through DigestItem/DigestTodo/DigestTopicItem + buildDigest, and adds suggestedAction to DigestItemDetail. Tests cover body extraction, the pipeline-selected action, and the security/RSVP fallbacks. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(digest): clean, user-facing next step on every item Two gaps from the last pass: some suggestions were the rule-based engine's raw internal text ("Apply appropriate labels to this email", "Escalate to user: Decision needed regarding: transcript"), and the suggestion only showed in the power-view detail — so in the default view most items had no visible next step. - suggestedActionFor now maps the structured selected action TYPE to plain English ("Accept the invite, or decline / propose another time", "Nothing needed — I'll file it", "Take a look and tell me what to do"), with a security-specific instruction and situation fallbacks. Every item gets a clean, user-facing step — no engine internals leak through. - The "→ next step" now renders in the row itself for every item (to-do and topic), visible without the power view. The power-view detail drops back to the trust/technical metadata (origin, confidence, refs) it's meant for. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(digest): plain-language detail — drop the system vocabulary The detail panel was accurate but spoke the way the system names things, not the way a person asks. A non-technical user can't parse "ORIGIN: Inbound — untrusted", "REFS", "NOT AUTO-RUN", a bare "CONFIDENCE: 80%", or "From your twin" — and "untrusted" reads as a threat rather than "you didn't write this". Rephrase everything user-facing: - provenance: "Inbound — untrusted" -> "From someone else"; "From your twin" -> "From your assistant"; fail-safe stays "someone else". 
- block reasons: "trust level (observer) asks me to check" -> "You've asked me to check with you before I act"; "From untrusted content" -> "It came from someone else, so I want your OK first". No internal codes leak.
- detail labels: origin/confidence/urgency/not-auto-run/refs become "where it's from / written by / how sure I am / why now / why I'm asking you".
- source ref: a real sender or a friendly "your calendar"/"a voice note", not an id slice or a filename echo.

Default view was already plain; this brings the power view to the same bar so "advanced" doesn't mean "fluent in our nouns". Tests updated to assert the plain wording and that no jargon leaks.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(digest): every expand earns its rows; rename to "Your briefing"

Make the detail expansion uniformly useful and cut the filler:

- add "when" (relative time), which was missing entirely.
- "why now" is explanatory for FYI items too ("Not time-sensitive — just so you're aware") instead of the meaningless "Normal priority".
- confidence gets a word: "fairly sure (80%)", "very sure (100%)".
- drop the redundant "written by: someone else" (the sender already shows it); keep "written by: you" only when you authored it (genuinely notable).
- friendly source when there's no sender ("a voice note", "your files").

Also rename the page "Twin Briefing" → "Your briefing" with a plain subtitle, matching the Home hero; "twin" is our metaphor, not a word a first-timer maps.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(live-digest): align urgencyReasonFor assertion with new wording

The critical-urgency reason changed to "Urgent — needs your attention now"; update the assertion from `/critical/i` to `/urgent/i`. (Caught by the full test run after the per-file runs passed; the prior commit shipped this red.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(decision-engine): persist candidates before risk assessments

`saveRiskAssessment` runs `UPDATE candidate_actions ... WHERE id = ?`, but `saveCandidates` (the INSERT) ran AFTER it, so the UPDATE hit zero rows, the full `RiskAssessment` (`overallTier`/dimensions) was lost, and only the thin `{reasoning}` placeholder survived. At approve time the execute-preflight (`getRiskAssessment` → `parseRiskAssessmentFromRow`, which requires `overallTier`) then returned null → `risk_assessment_missing`, blocking the ENTIRE approve→execute path (no action could ever be executed).

Move `saveCandidates` ahead of the risk-assessment loop so the rows exist when the UPDATEs land. Adds a regression test asserting `saveCandidates` is invoked before the first `saveRiskAssessment` (via `vi.fn` `invocationCallOrder`).

Found via a safe end-to-end execution-stack test (mock adapter + isolated tokenless user + fake email); verified fixed: fresh fake email → approve → execution completed via the (mock) adapter, no `risk_assessment_missing`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(execution): safe end-to-end execution-stack harness + OpenClaw test docs

`bin/skytwin-test-execution-stack`: a repeatable, no-real-side-effects test of the full execution path (ingest → decide → policy/spend/risk gate → approval → execution router → adapter → result). Two safety layers: an isolated TOKENLESS test user (Direct handlers throw at `resolveAccessToken` before any Google fetch) + `USE_MOCK_IRONCLAW` (simulated adapter). Spins up its own mock-mode API on a test port; re-runnable; asserts the stack executed and recorded a result.

`docs/testing-openclaw.md`: how to exercise the OpenClaw adapter safely against local Ollama via the openclaw-bridge (verified working: Ollama installed, bridge completes a fake action end-to-end, simulated, nothing real touched).
Notes the router trust-ranking caveat (direct outranks openclaw, so isolate it to see OpenClaw execute) and the `OPENCLAW_API_URL` config.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(web): setup — don't surface IronClaw credential-sync when it's unreachable

The Connect (#/setup) page showed "Not fully synced to IronClaw" + a "Sync to IronClaw" button even when no IronClaw is configured/reachable (the common case), so clicking it failed with a connection error.

Gate the sync lookup on `ironclawSync.reachable`: when IronClaw isn't reachable (no IronClaw, the local mock, or a remote that's down) the sync affordance is hidden entirely, since it's an advanced feature that only applies to a real, reachable IronClaw. The execution adapter row still shows its true state (Running / Registered-but-unreachable / Not detected) via `renderAdapterStatus`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(web,api): Vault page loads under dev-auth bypass (was "API may be offline")

The Credential Vault page (#/credential-vault) showed "Unable to load vault status. The API may be offline." on every load: the route's `getUserId` read only `req.user?.id` (unset under the localhost dev-auth bypass), with none of the `req.query['userId']` fallback every other route has, so `/credential-vault/status` 400'd with "userId is required".

Add the standard session→query→body `userId` fallback (ownership still gated by `requireOwnership` when a real session exists), and pass `userId` on the web's init/rotate/lock/unlock POST bodies so those work under bypass too.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(web): setup — optional execution adapters read as optional, not failed

An optional, unconfigured execution adapter (IronClaw / OpenClaw) rendered as "Not detected" in the setup page's Live status, which reads like something is broken. For optional engines, that's not a failure: most users never run them (the always-available Direct adapter handles actions).
`renderAdapterStatus` now takes an `optional` flag; an optional adapter that isn't registered shows "Optional — not connected" (calm, muted) instead of "Not detected". Direct still shows "Not detected" if it ever went missing (a real problem). This is the proper fix: it is correct whether or not a mock IronClaw is running, so no demo crutch is needed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: add Codex agent instructions

* fix: address inbox intelligence review findings

* fix: require approval for missing-scope escalations

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
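The `fix(decision-engine)` ordering bug in the log above is easy to reproduce in isolation. Below is a minimal TypeScript sketch, assuming simplified, hypothetical `CandidateAction`/`RiskAssessment` shapes and an in-memory repository that mimics the SQL (an UPDATE against a row that doesn't exist yet touches zero rows). It records call order in a plain array as a stand-in for the `vi.fn` `invocationCallOrder` check the real regression test uses; the real `saveCandidates`/`saveRiskAssessment` signatures and tier names may differ.

```typescript
// Hypothetical, simplified shapes; the real types in @skytwin/shared-types differ.
interface CandidateAction {
  id: string;
  actionType: string;
}

interface RiskAssessment {
  candidateId: string;
  overallTier: 'low' | 'moderate' | 'high' | 'critical'; // tier names assumed
  reasoning: string;
}

interface DecisionRepository {
  saveCandidates(decisionId: string, candidates: CandidateAction[]): Promise<void>;
  // Like `UPDATE candidate_actions ... WHERE id = ?`: resolves to the number of rows updated.
  saveRiskAssessment(assessment: RiskAssessment): Promise<number>;
}

// In-memory stand-in that behaves like the SQL tables.
class InMemoryDecisionRepository implements DecisionRepository {
  readonly rows = new Map<string, { candidate: CandidateAction; risk?: RiskAssessment }>();
  readonly calls: string[] = [];

  async saveCandidates(_decisionId: string, candidates: CandidateAction[]): Promise<void> {
    this.calls.push('saveCandidates');
    for (const candidate of candidates) {
      this.rows.set(candidate.id, { candidate });
    }
  }

  async saveRiskAssessment(assessment: RiskAssessment): Promise<number> {
    this.calls.push('saveRiskAssessment');
    const row = this.rows.get(assessment.candidateId);
    if (!row) return 0; // no row yet: the UPDATE is a silent no-op (the original bug)
    row.risk = assessment;
    return 1;
  }
}

// Fixed ordering: INSERT every candidate first, then UPDATE each with its assessment.
async function persistDecision(
  repo: DecisionRepository,
  decisionId: string,
  candidates: CandidateAction[],
  assessments: RiskAssessment[],
): Promise<number> {
  await repo.saveCandidates(decisionId, candidates);
  let updated = 0;
  for (const assessment of assessments) {
    updated += await repo.saveRiskAssessment(assessment);
  }
  return updated;
}

async function demo(): Promise<void> {
  const repo = new InMemoryDecisionRepository();
  const updated = await persistDecision(
    repo,
    'dec-1',
    [{ id: 'cand-1', actionType: 'accept_invite' }],
    [{ candidateId: 'cand-1', overallTier: 'low', reasoning: 'routine RSVP' }],
  );
  console.log(updated); // 1
  console.log(repo.calls.join(' -> ')); // saveCandidates -> saveRiskAssessment
}

void demo();
```

Having the repository report rows touched makes a silent zero-row UPDATE observable, which is exactly the failure that hid the lost `overallTier` until execute-preflight.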
Summary
Greenfield bootstrap of the SkyTwin monorepo, a delegated judgment layer that sits above IronClaw. It builds a structured digital twin of the user's preferences and decision patterns, then acts on the user's behalf when confidence is high, or escalates with context when it isn't.
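The act-or-escalate behavior in the summary can be pictured as a small gate. This is a minimal TypeScript sketch, not SkyTwin's `DecisionMaker` or `PolicyEvaluator`: the tier names other than `observer`, the per-tier confidence thresholds, and the `actOrEscalate` helper are all hypothetical. It folds in two safety properties this PR describes: a policy block always escalates, and an unrecognized trust tier fails closed rather than open.

```typescript
// Sketch only: tier names other than 'observer' and all thresholds are hypothetical.
type Outcome = { kind: 'act' | 'escalate'; reason: string };

// Minimum confidence (0..1) at which each tier may auto-execute.
// 'observer' never auto-acts, so its threshold is unreachable.
const AUTO_ACT_THRESHOLD = new Map<string, number>([
  ['observer', Number.POSITIVE_INFINITY],
  ['suggest', 0.95],
  ['autonomous', 0.8],
]);

function actOrEscalate(trustTier: string, confidence: number, policyAllows: boolean): Outcome {
  const threshold = AUTO_ACT_THRESHOLD.get(trustTier);
  if (threshold === undefined) {
    // Fail closed: an unrecognized tier never unlocks autonomous action.
    return { kind: 'escalate', reason: `unrecognized trust tier "${trustTier}"` };
  }
  if (!policyAllows) {
    // A policy block (spend limit, irreversibility, etc.) always wins over confidence.
    return { kind: 'escalate', reason: 'blocked by policy' };
  }
  if (confidence >= threshold) {
    return { kind: 'act', reason: `confidence ${confidence} meets ${threshold}` };
  }
  return { kind: 'escalate', reason: `confidence ${confidence} below ${threshold}` };
}

console.log(actOrEscalate('autonomous', 0.9, true).kind); // act
console.log(actOrEscalate('observer', 0.99, true).kind); // escalate
console.log(actOrEscalate('unknown-tier', 1, true).kind); // escalate
```

Using a `Map` rather than a plain object for the lookup keeps inherited keys such as `toString` from masquerading as a valid tier.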
Infrastructure:
Core packages (11):
- shared-types: 20+ interfaces, 5 enums (TrustTier, ConfidenceLevel, RiskTier, etc.)
- twin-model: TwinService with inference engine, pattern detector, temporal analyzer, cross-domain analyzer
- decision-engine: SituationInterpreter (6 types), DecisionMaker with pattern-aware scoring, RiskAssessor (6 dimensions)
- policy-engine: PolicyEvaluator with 5 safety policies (spend limits, irreversibility, legal, privacy, trust tier)
- explanations: Human-readable and structured audit records
- ironclaw-adapter: HTTP client (HMAC-SHA256 auth, retries, circuit breaker) for IronClaw, DirectExecutionAdapter fallback, mock adapter
- connectors: Real Gmail + Google Calendar with OAuth auto-refresh (DbTokenStore), mock connectors
- evals: EvalRunner, accuracy tracker, regression detector, email triage + safety regression scenarios
- db, config, core: Infrastructure with repositories for approvals, patterns, decisions, OAuth

Apps:
Documentation: 7 docs (product spec, technical spec, safety model, decision engine, IronClaw integration, CockroachDB architecture, evals) + 15 planning artifacts
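The `ironclaw-adapter` entry in the package list mentions HMAC-SHA256 auth for its HTTP client. A minimal sketch of that style of request signing with Node's built-in `node:crypto` follows; the canonical string layout, the five-minute skew window, and the `signRequest`/`verifyRequest` names are assumptions for illustration, not the adapter's actual wire format.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';
import { Buffer } from 'node:buffer';

// Hypothetical canonical form: the real adapter's field order, separators and
// header names are not specified in this PR and may differ.
function canonicalString(method: string, path: string, timestampMs: number, body: string): string {
  return [method.toUpperCase(), path, String(timestampMs), body].join('\n');
}

// Hex-encoded HMAC-SHA256 over the canonical string, sent alongside the timestamp.
function signRequest(secret: string, method: string, path: string, timestampMs: number, body: string): string {
  return createHmac('sha256', secret)
    .update(canonicalString(method, path, timestampMs, body))
    .digest('hex');
}

// Receiver side: recompute, compare in constant time, and reject stale timestamps
// so a captured request cannot be replayed indefinitely.
function verifyRequest(
  secret: string,
  method: string,
  path: string,
  timestampMs: number,
  body: string,
  signatureHex: string,
  nowMs: number,
  maxSkewMs = 5 * 60_000,
): boolean {
  if (Math.abs(nowMs - timestampMs) > maxSkewMs) return false;
  const expected = Buffer.from(signRequest(secret, method, path, timestampMs, body), 'hex');
  const given = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}

const ts = Date.now();
const body = JSON.stringify({ action: 'archive_email', id: 'msg-123' });
const sig = signRequest('shared-secret', 'POST', '/v1/execute', ts, body);
console.log(verifyRequest('shared-secret', 'POST', '/v1/execute', ts, body, sig, ts)); // true
console.log(verifyRequest('shared-secret', 'POST', '/v1/execute', ts, body + ' ', sig, ts)); // false
```

Binding the timestamp into the signed string is what makes the skew check meaningful: an attacker cannot refresh the timestamp on a captured request without invalidating the signature.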
Test Coverage
119 tests across 14 test files in 7 packages:
Pre-Landing Review
3 issues found, all auto-fixed:
- `settings.js:33`: `justConnected` URL param escaped with `escapeHtml()` (reflected XSS via hash)
- `events.ts:89`: trust tier now read from DB user record, not caller-supplied body
- `twin-repository.ts:82`: column names validated against allowlist before interpolation into SQL

Prior review also fixed: trust tier default `'new'` → `'observer'`, policy evaluator failing closed on unrecognized tiers.

Adversarial Review
Known v0.1.0.0 architectural debt (documented, not blocking for initial bootstrap):
Test plan
🤖 Generated with Claude Code