Problem
The decision engine, inference system, and safety framework are solid, but the product has no human-facing layer. A non-technical person cannot onboard, understand what the twin is doing, approve decisions, or build trust over time. The core intelligence works, but it is invisible and inaccessible.
This issue tracks everything needed to go from "engineer's prototype" to "a regular person can sit down and use this."
1. Onboarding Flow
Current state: User lands on dashboard, sees "No decisions yet. Send events to the API to get started." No explanation of what a twin is.
Required:
2. Fix the Approvals Workflow (Critical)
Current state: The approvals page is always empty. There is no endpoint to list pending approvals. This is the single most important UI: the trust-building loop where users see what the twin wants to do and approve or reject it.
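Required:
- … (approval_requests table)
- GET /api/approvals/:userId/pending endpoint that queries un-responded approval requests
- GET /api/approvals/:userId/history endpoint for past decisions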
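As a rough illustration of what a "list pending approvals" query has to do, here is a minimal in-memory sketch. The ApprovalRequest shape, its field names, and listPendingApprovals are hypothetical; the real approval_requests schema is not shown in this issue, and a real endpoint would run the equivalent query against the database.

```typescript
// Hypothetical shape of an approval request row; field names are assumptions.
interface ApprovalRequest {
  id: string;
  userId: string;
  summary: string;            // natural-language description shown to the user
  createdAt: number;          // epoch milliseconds
  respondedAt: number | null; // null until the user approves or rejects
}

// Return one user's un-responded requests, oldest first.
function listPendingApprovals(
  requests: readonly ApprovalRequest[],
  userId: string,
): ApprovalRequest[] {
  return requests
    .filter((r) => r.userId === userId && r.respondedAt === null)
    .sort((a, b) => a.createdAt - b.createdAt);
}
```

Sorting oldest first is a design choice for the sketch, so that long-waiting requests are not buried; the product may prefer a different order.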
3. Implement OAuth Token Storage
Current state: Google OAuth flow generates tokens via exchangeCode() but there's no OAuthTokenStore implementation. Tokens aren't persisted. Real connectors (Gmail, Calendar) exist but can't run.
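Required:
- OAuthTokenStore backed by oauth_tokens table (migration 002 already exists)
- … (/api/oauth/google/callback) to persist tokens via the store
- … OAuthTokenStore into GmailConnector and GoogleCalendarConnector
- … EmailActionHandler and CalendarActionHandler (currently expect accessToken in step params but nothing provides it)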
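To make the missing piece concrete, the sketch below shows one possible shape for a token store. Everything here is an assumption: the OAuthTokens fields, the save/get method names, and the needsRefresh helper are illustrative, not the project's actual interface. The in-memory class is only a stand-in for tests; a store backed by the oauth_tokens table would most likely be asynchronous.

```typescript
// Hypothetical token shape; field names are assumptions, not the project's schema.
interface OAuthTokens {
  accessToken: string;
  refreshToken: string;
  expiresAt: number; // epoch milliseconds
}

// Assumed interface. A database-backed implementation would likely return promises.
interface OAuthTokenStore {
  save(userId: string, provider: string, tokens: OAuthTokens): void;
  get(userId: string, provider: string): OAuthTokens | null;
}

// In-memory stand-in, useful for tests; it does not persist anything.
class InMemoryOAuthTokenStore implements OAuthTokenStore {
  private readonly rows = new Map<string, OAuthTokens>();

  save(userId: string, provider: string, tokens: OAuthTokens): void {
    // Copy the object so later mutation by the caller does not leak in.
    this.rows.set(`${userId}:${provider}`, { ...tokens });
  }

  get(userId: string, provider: string): OAuthTokens | null {
    return this.rows.get(`${userId}:${provider}`) ?? null;
  }
}

// True when the access token has expired or will expire within skewMs.
function needsRefresh(tokens: OAuthTokens, now: number, skewMs = 60000): boolean {
  return tokens.expiresAt - now <= skewMs;
}
```

Keying by user and provider lets connectors and action handlers look up a token from the userId alone, instead of expecting an accessToken to arrive in step params.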
4. Rewrite Twin Profile UI in Human Language
Current state: Shows database columns: domain: email, key: auto_archive, value: true, confidence: HIGH, source: inferred. Meaningless to a non-technical user.
Required:
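One way to turn a raw row like the one above into a sentence is a small phrase table plus a confidence prefix. This is a sketch only: the Preference shape mirrors the columns listed in the current state, while the phrase table, the wording, and describePreference are hypothetical placeholders for real product copy.

```typescript
type Confidence = "HIGH" | "MODERATE" | "LOW";

// Mirrors the columns currently shown in the UI.
interface Preference {
  domain: string;
  key: string;
  value: boolean;
  confidence: Confidence;
  source: string;
}

// Hypothetical copy, keyed by "domain.key"; real wording would come from design.
const PHRASES: Partial<Record<string, { on: string; off: string }>> = {
  "email.auto_archive": {
    on: "you like emails archived automatically",
    off: "you prefer to archive emails yourself",
  },
};

const CONFIDENCE_TEXT: Record<Confidence, string> = {
  HIGH: "I'm fairly sure",
  MODERATE: "I think",
  LOW: "I'm still learning, but it looks like",
};

function describePreference(p: Preference): string {
  const phrase = PHRASES[`${p.domain}.${p.key}`];
  // Fall back to a generic clause for keys without dedicated copy.
  const clause = phrase
    ? (p.value ? phrase.on : phrase.off)
    : `your "${p.key.replace(/_/g, " ")}" setting for ${p.domain} is ${p.value ? "on" : "off"}`;
  return `${CONFIDENCE_TEXT[p.confidence]} ${clause}.`;
}
```

With the example row (email, auto_archive, true, HIGH, inferred), this returns "I'm fairly sure you like emails archived automatically."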
5. Confidence & Learning Dashboard
Current state: No visibility into how well the twin is performing or how much it has learned. Detected patterns (temporal, cross-domain) are stored but invisible.
Required:
- … AccuracyTracker and ContinuousEvalRunner to real data instead of stub responses in /api/evals
6. Wire Detected Patterns into Decision Scoring
Current state: PatternDetector, TemporalAnalyzer, and CrossDomainAnalyzer detect behavioral patterns but the DecisionMaker never uses them. The twin learns things it doesn't act on.
Required:
- … cautious_spender, increase scrutiny on purchase-related actions
- … ExplanationRecord
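The sketch below illustrates the general idea of letting a detected pattern influence scoring. The DetectedPattern and CandidateAction shapes, the adjustRisk function, and the 0.2 weight are all assumptions made for illustration; the real DecisionMaker and pattern types are not shown in this issue. The returned reasons are meant as the kind of text that could be attached to an explanation record.

```typescript
// Hypothetical shapes; the project's real types are not shown in this issue.
interface DetectedPattern {
  name: string;       // for example "cautious_spender"
  confidence: number; // 0..1
}

interface CandidateAction {
  kind: "purchase" | "email" | "calendar";
  baseRisk: number;   // 0..1, as produced by the existing risk assessment
}

interface AdjustedRisk {
  risk: number;
  reasons: string[];  // human-readable notes on why the score changed
}

function adjustRisk(
  action: CandidateAction,
  patterns: readonly DetectedPattern[],
): AdjustedRisk {
  let risk = action.baseRisk;
  const reasons: string[] = [];
  for (const p of patterns) {
    if (p.name === "cautious_spender" && action.kind === "purchase") {
      risk += 0.2 * p.confidence; // arbitrary illustrative weight
      reasons.push(`Pattern "${p.name}" increased scrutiny on a purchase`);
    }
  }
  // Clamp so downstream thresholds can keep assuming a 0..1 range.
  return { risk: Math.min(1, risk), reasons };
}
```

This version only ever raises risk, which keeps the change on the conservative side of the existing safety checks.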
7. Multi-User Worker Support
Current state: Worker hardcodes userId: 'default-user' and only uses mock connectors. Can't serve multiple real users.
Required:
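As a sketch of what replacing the hardcoded 'default-user' could look like, the loop below processes each connected user in turn and isolates failures. WorkerUser, CycleResult, and runWorkerCycle are hypothetical names, and the callback is synchronous only to keep the example short; real connectors would be asynchronous.

```typescript
// Hypothetical view of a user from the worker's perspective.
interface WorkerUser {
  userId: string;
  connected: boolean; // true once OAuth tokens exist for this user
}

interface CycleResult {
  processed: string[];
  skipped: string[];
  failed: string[];
}

function runWorkerCycle(
  users: readonly WorkerUser[],
  processUser: (userId: string) => void,
): CycleResult {
  const result: CycleResult = { processed: [], skipped: [], failed: [] };
  for (const user of users) {
    if (!user.connected) {
      // Nothing to fetch until the user has connected an account.
      result.skipped.push(user.userId);
      continue;
    }
    try {
      processUser(user.userId);
      result.processed.push(user.userId);
    } catch {
      // One user's failure must not stop the others.
      result.failed.push(user.userId);
    }
  }
  return result;
}
```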
8. Error Handling & User-Friendly Messaging
Current state: Failed API calls surface raw HTTP errors (for example, 502 Bad Gateway). The UI has no human-readable error states.
Required:
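A minimal sketch of translating HTTP status codes into user-facing text is shown below. The friendlyError name and all message wording are placeholders, not existing project code; the status code groupings follow standard HTTP semantics.

```typescript
// Map an HTTP status code to a message a non-technical user can act on.
// Wording is placeholder copy.
function friendlyError(status: number): string {
  if (status === 401 || status === 403) {
    return "You need to sign in again to continue.";
  }
  if (status === 404) {
    return "We couldn't find what you were looking for.";
  }
  if (status === 429) {
    return "Too many requests right now. Please wait a moment and try again.";
  }
  if (status >= 500) {
    // Covers 500, 502 Bad Gateway, 503, and similar server-side failures.
    return "Something went wrong on our side. Please try again in a few minutes.";
  }
  return "Something unexpected happened. Please try again.";
}
```

Centralizing the mapping in one function means each page can render the same wording without inspecting status codes itself.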
9. Settings in Plain English
Current state: Trust tier selector shows OBSERVER, LOW_AUTONOMY, MODERATE_AUTONOMY, HIGH_AUTONOMY, FULL_AUTONOMY. Jargon.
Required:
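One lightweight approach is a lookup table from internal tier names to display labels, so the selector never shows the enum values directly. The tier names below come from the current selector; the labels and the tierLabel helper are hypothetical, and the labels assume the tiers are ordered from least to most autonomous.

```typescript
type TrustTier =
  | "OBSERVER"
  | "LOW_AUTONOMY"
  | "MODERATE_AUTONOMY"
  | "HIGH_AUTONOMY"
  | "FULL_AUTONOMY";

// Placeholder labels; the exact meaning of each tier is an assumption here.
const TIER_LABELS: Record<TrustTier, string> = {
  OBSERVER: "Just watch and learn",
  LOW_AUTONOMY: "Ask me before almost everything",
  MODERATE_AUTONOMY: "Handle routine things, ask about the rest",
  HIGH_AUTONOMY: "Handle most things, ask about big ones",
  FULL_AUTONOMY: "Handle everything for me",
};

function tierLabel(tier: TrustTier): string {
  return TIER_LABELS[tier];
}
```

Typing the table as Record<TrustTier, string> makes the compiler flag any tier added later without a label.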
10. Mobile-Responsive UI
Current state: The sidebar layout is designed for desktop. The CSS has basic responsive rules, but they have not been tested or optimized on mobile.
Required:
Definition of Done
A non-technical person can:
- Visit the app and understand what it does within 30 seconds
- Connect their Google account through a guided flow
- Set their comfort level in plain language
- Watch the twin start learning from their real email and calendar
- See pending approvals explained in natural language and approve/reject
- Visit "Twin Profile" and understand what the twin has learned about them
- See accuracy and confidence metrics that make sense
- Correct the twin when it's wrong and see it learn from corrections
- Adjust settings without needing to know what "trust tier" or "domain" means
- Trust the system enough to gradually increase autonomy
Technical Notes
- The decision pipeline, inference engine, safety invariants, and IronClaw handlers are already implemented. This issue is about the human layer on top of working infrastructure.
- All changes must maintain the existing safety invariants from CLAUDE.md (policy checks, explanation records, trust tiers, spend limits, risk assessment, feedback loops).
- Existing test suite (28 tasks) must continue passing. Add tests for new endpoints and flows.