Epic: SkyTwin v0.4 — Local-First Product Completion (HN Launch Ready)

## Context

SkyTwin v0.3 shipped a production-grade decision pipeline: signals → interpretation → twin query → candidates → risk → policy → routing → execution → learning. 432 tests, 50+ eval scenarios, 13 domains, 7 safety invariants enforced. The *engine* works.

What's missing is the *product*. A user can't install SkyTwin, launch it, connect their accounts, and use it daily without touching a terminal. There's no desktop app, no real-time notifications, no mobile access, no process supervision. The dashboard covers the happy path but can't audit, undo, or search. Connectors crash silently on transient errors. The twin doesn't respect quiet hours.

This epic tracks everything needed to go from "engineer's working prototype" to "download it, launch it, use it every day, post it on Hacker News."

## PR Workflow

Every child issue PR must pass \`/review\` before merge. No exceptions.

## Current State (verified 2026-04-04)

### What's working well (do not break)

| Component | State | Evidence |
|-----------|-------|----------|
| Decision pipeline (signal → outcome) | Production-ready | 432 tests, E2E coverage across 13 domains |
| Safety invariants (7/7) | Enforced | Integration test suite, regression detection |
| Trust tier progression | Automated | 10/20/50/100 thresholds, rolling-window regression |
| Spend tracking | Hard-gated | Rolling 24h window, per-action + daily aggregate |
| Domain autonomy | Enforced | Most-restrictive-wins logic |
| Gmail + Calendar connectors | Real APIs | OAuth token refresh, multi-user worker |
| Twin learning (preference evolution) | Full lifecycle | Feedback → preference update → attribution tracking |
| Eval suite + CI | Automated | GitHub Actions, safety regression gate |

### What's incomplete

| Gap | Severity | Current state |
|-----|----------|---------------|
| Desktop app | **Critical** | 427 LOC Electron shell, no real UI |
| Real-time notifications | **Critical** | 30-second badge poll only, no SSE/push |
| Process supervision | **Critical** | Crashed processes stay dead |
| Connector error handling | **High** | Gmail 429 → signal lost, no retry |
| First-run experience | **High** | Onboarding wizard ends at empty dashboard |
| Audit log | **High** | Data logged to DB, no viewer |
| Rollback UI | **High** | API exists, no button in dashboard |
| Decision search/filter | **High** | Domain dropdown only |
| Mobile access | **High** | No way to approve from phone |
| Mobile responsive CSS | **High** | Basic fixes only, not systematically tested |
| Quiet hours | **Medium** | Schema exists, not enforced |
| Policy CRUD | **Medium** | No API for action policy management |
| Type safety | **Medium** | as-never casts in ask.ts |
| Structured logging | **Medium** | console.log only |
| Temporal profile persistence | **Low** | In-memory Map, lost on restart |
| Rate limiter persistence | **Low** | In-memory Map, lost on restart |

## Child Issues

| # | Title | Priority | Claude Code Est. | Dependencies |
|---|-------|----------|-----------------|--------------|
| #11 | Local resilience: retry/backoff, process supervision, graceful degradation | Critical | ~2-3h | None |
| #13 | Desktop Electron app: native install, tray, process management | Critical | ~4-6h | None |
| #8 | Live notification layer: SSE, approval expiry cron, push alerts | Critical | ~1-2h | #11 |
| #9 | Guided first-run: connect → see signals → approve → watch twin learn | High | ~2-3h | #13 |
| #10 | Dashboard completeness: audit log, rollback UI, decision search | High | ~3-4h | None |
| #14 | Engine gaps: quiet hours, policy CRUD, type safety, structured logging | Medium | ~1-2h | None |
| #15 | Mobile local access: QR pairing, mDNS, responsive dashboard | High | ~3-4h | #13 |
| #16 | Cross-platform builds: Windows, Linux, native mobile apps | High | ~5-8h | #13, #15 |

**Total serial: ~21-32h Claude Code time**

### Parallel execution plan (Conductor)

```
Wave 1 (~4-6h):  #11 + #13 + #10 + #14  (all independent)
Wave 2 (~3-4h):  #8 + #9 + #15          (deps satisfied)
Wave 3 (~5-8h):  #16                     (needs #13 + #15)
```

**Wall time with parallelism: ~12-18h across 3 waves**

## Dependency Graph

```
#11 Resilience ──────────────> #8 Live Notifications

#13 Desktop App ──┬──────────> #9 First-Run Experience
                  ├──────────> #15 Mobile Access ──> #16 Cross-Platform
                  └─────────────────────────────────/

#10 Dashboard ────────────── (independent, start anytime)

#14 Engine Gaps ──────────── (independent, start anytime)
```

### Sequencing Rationale

- **#11 first**: retry/backoff and process supervision are prerequisites for everything else. If the API crashes during onboarding or connectors lose signals, nothing else matters.
- **#13 early**: the desktop app *is* the product for local-first. It manages processes, hosts the API, and provides the install experience. #9 (first-run) and #15 (mobile) depend on it.
- **#8 after #11**: SSE connections and expiry cron need a process that stays alive.
- **#9 after #13**: first-run experience happens inside the Electron app, not a browser tab.
- **#10 and #14 anytime**: dashboard completeness and engine gaps are independent. Good parallel work.
- **#15 after #13**: mobile access requires the desktop to be serving the API reliably.
- **#16 after #13 + #15**: cross-platform builds extend the macOS desktop app and mobile web to Windows/Linux installers and native iOS/Android apps.

## Acceptance Criteria

1. User downloads DMG (macOS), MSI (Windows), or AppImage (Linux), installs and launches SkyTwin — sees onboarding wizard within 10 seconds
2. User connects Google account, sees first signals arrive within 30 seconds
3. User approves 3 actions, sees trust tier progress indicator update
4. Approval notification appears as native OS notification within 5 seconds of creation
5. User opens dashboard on phone via QR scan, approves an action from mobile
6. Native iOS and Android apps connect to desktop via local network
7. Quiet hours prevent auto-execution and suppress non-urgent notifications
8. kill -9 on API process → app restarts it within 5 seconds
9. Gmail returns 429 → connector retries with backoff, signal is not lost
10. User can search decisions by date range, situation type, and text
11. User can undo an executed action from the dashboard with structured reasoning
12. Audit page shows trust tier changes, spend events, and preference evolution
13. All 432 existing tests continue passing
14. No safety invariant regressions in eval suite
15. Every child PR passes \`/review\` before merge

## Out of Scope

- Cloud deployment (future epic)
- Auto-update mechanism
- App store distribution (Apple App Store, Google Play, Microsoft Store)
- Multi-device sync beyond local network
- Off-network mobile access

> **Note on cloud epic**: A separate epic should be created for cloud deployment. This epic is strictly local-first: everything runs on the user's machine.

## Definition of Done

1. All 15 acceptance criteria pass
2. A non-technical person can install, onboard, and use SkyTwin daily without opening a terminal — on macOS, Windows, or Linux
3. The system recovers from transient failures (connector errors, process crashes) without user intervention
4. Phone approval workflow works on same WiFi network via native app or mobile browser
5. Eval suite green, no safety regressions
6. Every PR passed \`/review\`
7. Ready to post on Hacker News: install → connect → first decision in under 5 minutes

## Rollback Plan

Each child issue is a separate PR. If a child breaks something:
- Revert the PR
- Existing v0.3 functionality is unaffected (all changes are additive)
- Desktop app is a new entrypoint — doesn't modify existing API/worker/web behavior

## Issue Archaeology

Issues #2, #3, and #4 were audited against the v0.2/v0.3 CHANGELOG on 2026-04-04. All three are now closed:

- **#2** (Make SkyTwin usable by a non-technical person): 7/10 sections shipped. Remaining items (confidence dashboard, error handling, mobile responsive, pause twin) migrated to #10, #11, #13, #15.
- **#3** (Make SkyTwin actually work end-to-end): Phases 1-2 complete. Remaining items (notifications, quiet hours, SSE, policy CRUD, structured logging, type safety) migrated to #8, #14.
- **#4** (CEO Review build plan): All 5 build phases executed in v0.2/v0.3. Closed as completed planning artifact.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Epic: SkyTwin v0.4 — Local-First Product Completion (HN Launch Ready) #12

Context

PR Workflow

Current State (verified 2026-04-04)

What's working well (do not break)

What's incomplete

Child Issues

Parallel execution plan (Conductor)

Dependency Graph

Sequencing Rationale

Acceptance Criteria

Out of Scope

Definition of Done

Rollback Plan

Issue Archaeology

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Component	State	Evidence
Decision pipeline (signal → outcome)	Production-ready	432 tests, E2E coverage across 13 domains
Safety invariants (7/7)	Enforced	Integration test suite, regression detection
Trust tier progression	Automated	10/20/50/100 thresholds, rolling-window regression
Spend tracking	Hard-gated	Rolling 24h window, per-action + daily aggregate
Domain autonomy	Enforced	Most-restrictive-wins logic
Gmail + Calendar connectors	Real APIs	OAuth token refresh, multi-user worker
Twin learning (preference evolution)	Full lifecycle	Feedback → preference update → attribution tracking
Eval suite + CI	Automated	GitHub Actions, safety regression gate

Gap	Severity	Current state
Desktop app	Critical	427 LOC Electron shell, no real UI
Real-time notifications	Critical	30-second badge poll only, no SSE/push
Process supervision	Critical	Crashed processes stay dead
Connector error handling	High	Gmail 429 → signal lost, no retry
First-run experience	High	Onboarding wizard ends at empty dashboard
Audit log	High	Data logged to DB, no viewer
Rollback UI	High	API exists, no button in dashboard
Decision search/filter	High	Domain dropdown only
Mobile access	High	No way to approve from phone
Mobile responsive CSS	High	Basic fixes only, not systematically tested
Quiet hours	Medium	Schema exists, not enforced
Policy CRUD	Medium	No API for action policy management
Type safety	Medium	as-never casts in ask.ts
Structured logging	Medium	console.log only
Temporal profile persistence	Low	In-memory Map, lost on restart
Rate limiter persistence	Low	In-memory Map, lost on restart

#	Title	Priority	Claude Code Est.	Dependencies
#11	Local resilience: retry/backoff, process supervision, graceful degradation	Critical	~2-3h	None
#13	Desktop Electron app: native install, tray, process management	Critical	~4-6h	None
#8	Live notification layer: SSE, approval expiry cron, push alerts	Critical	~1-2h	#11
#9	Guided first-run: connect → see signals → approve → watch twin learn	High	~2-3h	#13
#10	Dashboard completeness: audit log, rollback UI, decision search	High	~3-4h	None
#14	Engine gaps: quiet hours, policy CRUD, type safety, structured logging	Medium	~1-2h	None
#15	Mobile local access: QR pairing, mDNS, responsive dashboard	High	~3-4h	#13
#16	Cross-platform builds: Windows, Linux, native mobile apps	High	~5-8h	#13, #15

Epic: SkyTwin v0.4 — Local-First Product Completion (HN Launch Ready) #12

Description

Context

PR Workflow

Current State (verified 2026-04-04)

What's working well (do not break)

What's incomplete

Child Issues

Parallel execution plan (Conductor)

Dependency Graph

Sequencing Rationale

Acceptance Criteria

Out of Scope

Definition of Done

Rollback Plan

Issue Archaeology

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions