Skip to content

P1.6 No "delete my data" endpoint exists #376

@jayzalowitz

Description

@jayzalowitz

Parent: #361

Context

A user who connects their Gmail, lets the twin learn for a week, and then decides "this isn't for me" has no way to delete what the product has accumulated about them. Twin profile, decision history, memory pages, knowledge triples, episodic memories, preferences, OAuth tokens, spend records, encrypted credential vault meta, idle-scan results — all of it remains, indefinitely, with no user-facing way to remove it short of psql and a list of 30+ tables.

This is the "GDPR Article 17 / right to erasure" gap, and it is independently a recovery gap. A user who accidentally trains a bad twin (wrong preferences, garbage signals from a stale connector, a domain enabled by mistake) cannot start over without a developer's help. A user who can't delete their data won't trust the app with new data.

It is also the kind of issue every public download visitor checks for in a privacy policy before installing — and SkyTwin's docs/privacy.html cannot truthfully say "we can delete your data on request" until the endpoint exists.

Root Cause Analysis

Three structural reasons the delete path doesn't exist today:

  1. The users router never grew a DELETE. apps/api/src/routes/users.ts has create (POST /), list (GET /), get-by-id (GET /:userId), update-trust-tier (PUT /:userId/trust-tier), update-domains (PUT /:userId/domains), seed-preferences (POST /:userId/seed-preferences) — and no DELETE handler. The route file was last extended for seed-preferences (build(deps): bump pg from 8.20.0 to 8.21.0 #346 era); a DELETE was never added because the launch path didn't force it.

  2. Cascade FKs are inconsistent across the 36 migrations. Some tables added later (everything from migration 015 onward — ai_provider_settings, lifebooks, recovery_codes, model_downloads, brain_*, dxt_*, federation_*, oauth_pending_signin, oauth_pkce_pending, external_agent_tokens, promotion_offers, user_onboarding_state, user_risk_profiles, user_credential_vault_meta, draft_email_calls, draft_email_eval_runs) carry ON DELETE CASCADE on their user_id FK. Earlier tables (preferences, decisions, signals, oauth_tokens, twin_profiles, behavioral_patterns, cross_domain_traits, briefings, spend_records, trust_tier_audit, feedback_events, sessions, domain_autonomy_policies, escalation_triggers, preference_proposals, proactive_scans, twin_exports, skill_gap_log, eval_runs, accuracy_metrics, memory_wings, memory_drawers, memory_closets, memory_tunnels, knowledge_entities, knowledge_triples, episodic_memories, entity_codes, connected_accounts) reference users(id) with no ON DELETE clause — meaning a naive DELETE FROM users WHERE id = $1 would fail with a foreign-key violation. Tables further downstream of those (e.g. candidate_actions, decision_outcomes, explanation_records, execution_plans, execution_results) chain off decisions(id) and would orphan unless deleted in dependency order.

  3. No transactional contract. Because a per-user wipe has to touch ~30 tables in dependency order, doing it in DELETE statements outside a transaction means a partial failure leaves the user in a half-deleted state — twin profile gone but decisions remain, or memory pages gone but the OAuth token still poll-able. The right primitive is a single CRDB serializable transaction (withTransaction in @skytwin/db) that either deletes everything or rolls back.

The fix is one route, one transactional cascade query, and one Settings-page confirmation dialog — plus a privacy-policy update so the cascade list is publicly documented.

Verified Current State

Verified 2026-05-25 against jayzalowitz/launch-skytwin-demo @ 8bb5379.

File Lines What it does today
apps/api/src/routes/users.ts 19-282 createUsersRouter() exports POST /, GET /, GET /:userId, PUT /:userId/trust-tier, PUT /:userId/domains, POST /:userId/seed-preferences. No DELETE.
apps/api/src/routes/users.ts 24 router.use('/:userId', sessionAuth, requireOwnership) — any new DELETE would inherit auth + ownership middleware automatically.
packages/db/src/schemas/schema.sql 8-17 users table — primary key id UUID.
packages/db/src/connection.ts 217-237 healthCheck() proves the pattern for query('SELECT ...') calls. withTransaction lives in the same module and is the right primitive for the cascade.
docs/privacy.html n/a Privacy page exists. Does not list a cascade or right-to-erasure flow.

Per-user table audit (cascade gap by table)

Verified by scanning packages/db/src/migrations/*.sql + packages/db/src/schemas/schema.sql for REFERENCES users(id):

Table Migration FK clause today Action needed in DELETE
users schema.sql:8 (target) DELETE FROM users WHERE id = $1 — last
connected_accounts schema.sql:19 REFERENCES users(id) (no cascade) Explicit DELETE
twin_profiles schema.sql:34 REFERENCES users(id) UNIQUE (no cascade) Explicit DELETE
twin_profile_versions schema.sql:65 chains via twin_profile_id Explicit DELETE first
preferences schema.sql:80 REFERENCES users(id) (no cascade) Explicit DELETE
decisions schema.sql:99 REFERENCES users(id) (no cascade) Explicit DELETE
candidate_actions schema.sql:113 chains via decision_id Explicit DELETE first
decision_outcomes schema.sql:131 chains via decision_id Explicit DELETE first
action_policies schema.sql:151 REFERENCES users(id) (no cascade) Explicit DELETE
approval_requests schema.sql:167 REFERENCES users(id) (no cascade) Explicit DELETE
execution_plans schema.sql:185 chains via decision_id Explicit DELETE first
execution_results schema.sql:196 chains via execution_plan_id Explicit DELETE first
execution_events schema.sql:206 chains via decision_id Explicit DELETE first
explanation_records schema.sql:233 chains via decision_id Explicit DELETE first
feedback_events schema.sql:262 REFERENCES users(id) (no cascade) Explicit DELETE
oauth_tokens 002:5 REFERENCES users(id) (no cascade) Explicit DELETE
connector_configs 002:18 REFERENCES users(id) (no cascade) Explicit DELETE
behavioral_patterns 003:5 REFERENCES users(id) (no cascade) Explicit DELETE
cross_domain_traits 003:21 REFERENCES users(id) (no cascade) Explicit DELETE
eval_runs 004:6 REFERENCES users(id) (no cascade) Explicit DELETE
accuracy_metrics 004:21 REFERENCES users(id) (no cascade) Explicit DELETE
signals 005:12 REFERENCES users(id) (no cascade) Explicit DELETE
preference_proposals 005:26 REFERENCES users(id) (no cascade) Explicit DELETE
twin_exports 005:42 REFERENCES users(id) (no cascade) Explicit DELETE
skill_gap_log 005:53 REFERENCES users(id) (no cascade) Explicit DELETE
proactive_scans 005:64 REFERENCES users(id) (no cascade) Explicit DELETE
briefings 005:76 REFERENCES users(id) (no cascade) Explicit DELETE
trust_tier_audit 006:6 REFERENCES users(id) (no cascade) Explicit DELETE
spend_records 008:6 REFERENCES users(id) (no cascade) Explicit DELETE
domain_autonomy_policies 009:7 REFERENCES users(id) (no cascade) Explicit DELETE
escalation_triggers 009:20 REFERENCES users(id) (no cascade) Explicit DELETE
preference_history 010 per-user Explicit DELETE
sessions 011:4 REFERENCES users(id) (no cascade) Explicit DELETE
memory_wings 012:10 REFERENCES users(id) (no cascade) Explicit DELETE (cascades down to rooms/drawers/closets via FK)
memory_tunnels 012:72 REFERENCES users(id) (no cascade) Explicit DELETE
knowledge_entities 012:86 REFERENCES users(id) (no cascade) Explicit DELETE
knowledge_triples 012:101 REFERENCES users(id) (no cascade) Explicit DELETE
episodic_memories 012:119 REFERENCES users(id) (no cascade) Explicit DELETE
entity_codes 012:143 REFERENCES users(id) (no cascade) Explicit DELETE
ai_provider_settings 015:8 ON DELETE CASCADE Auto
forwarded_signals 022 per-user Explicit DELETE
connector_cursors 024 per-user Explicit DELETE
email_label_signals 025 per-user Explicit DELETE
assistant_threads 026 per-user Explicit DELETE (cascades to messages)
mcp_servers 027:15 ON DELETE CASCADE Auto (cascades to skills/changelogs/metrics/promotion_offers/dxt_exports)
app_suggestions 027:81 ON DELETE CASCADE Auto
capability_provenance_nodes 027:113 ON DELETE CASCADE Auto
fs_scan_roots 027:138 ON DELETE CASCADE Auto (cascades to fs_file_index)
twin_briefings 027:187 ON DELETE CASCADE Auto
dxt_exports 027:221 ON DELETE CASCADE Auto
user_risk_profiles 028:15 ON DELETE CASCADE Auto
user_onboarding_state 030:7 ON DELETE CASCADE Auto
external_agent_tokens 031:11 ON DELETE CASCADE Auto
user_credential_vault_meta 032:31 ON DELETE CASCADE Auto
dxt_imports 035:9 ON DELETE CASCADE Auto
lifebooks 036:16 ON DELETE CASCADE Auto
federation_peers 037:17 ON DELETE CASCADE Auto
federation_pairing_codes 037:48 ON DELETE CASCADE Auto
recovery_codes 038:19 ON DELETE CASCADE Auto
model_downloads 039:18 ON DELETE CASCADE Auto
brain_pages 040:36 ON DELETE CASCADE Auto
brain_entities 040:69 ON DELETE CASCADE Auto
brain_triples 040:86 ON DELETE CASCADE Auto
brain_episodes 040:110 ON DELETE CASCADE Auto
brain_signals 040:131 ON DELETE CASCADE Auto
brain_settings 040:145 ON DELETE CASCADE Auto
brain_embedding_jobs 040:165 ON DELETE CASCADE Auto
draft_email_calls 048:43 ON DELETE CASCADE Auto
promotion_offers 049:44 ON DELETE CASCADE Auto
draft_email_eval_runs 050:34 ON DELETE CASCADE Auto
oauth_pkce_pending 058 per-user Explicit DELETE
oauth_pending_signin 059:46 ON DELETE CASCADE Auto

Tally: ~30 tables need explicit DELETE before the final DELETE FROM users; ~25 cascade automatically. The implementation must walk them in dependency order, in a single serializable transaction.

What's Working Well (Do Not Touch)

  • The requireOwnership middleware (apps/api/src/middleware/require-ownership.ts) already gates /:userId routes correctly — a new DELETE inherits the right auth boundary with zero change.
  • The CASCADE FKs added in migrations 015+ are the right pattern for new tables. Don't retrofit them onto the old tables in this PR; do the explicit DELETE walk first, then file a follow-up to add ON DELETE CASCADE where it's still missing (less risky, lets us validate the explicit walk works first).
  • withTransaction in packages/db/src/connection.ts already gives serializable-isolation semantics. Use it. Do not re-implement transaction handling.

Proposed Change

New endpoint

DELETE /api/users/:userId — authenticated, ownership-gated, requires the request body to confirm intent.

Behavior:

  1. Open a serializable transaction.
  2. Walk the dependency graph in order: child tables first (decision_outcomes → candidate_actions → decisions; episodic_memories → knowledge_triples → memory_drawers → memory_rooms → memory_wings; etc.).
  3. DELETE FROM users WHERE id = $1 last.
  4. Commit. On any failure inside the transaction, ROLLBACK and return 500 with a generic message (do not leak the per-table failure).
  5. Audit log a single user_deleted event before COMMIT (in a separate gdpr_deletion_log table that does NOT carry the deleted user_id — only a timestamp, IP hash, and table-count manifest, so the audit survives the delete).

New Settings UI

In apps/web/public/js/pages/settings.js, add a "Danger zone" section with a "Delete everything" button that opens a confirmation modal:

  • Type the user's full email to confirm.
  • Read-aloud cascade summary: "This will delete N tables of data including your twin profile, X decisions, Y memory pages, Z signals."
  • "Permanently delete" button is disabled until the typed email matches.
  • On success: clear localStorage, route to #/ with a final toast "All your data has been deleted."

Privacy policy update

docs/privacy.html gains a "Right to erasure" subsection that lists every table the delete cascade touches and links to the endpoint.

Implementation Details

Endpoint contract

DELETE /api/users/:userId
Headers: Cookie: session=... (or dev auth bypass)
Body:
  {
    "confirmEmail": "user@example.com"   // must match users.email exactly
  }

Response 200:
  {
    "deleted": true,
    "tablesAffected": 30,
    "rowsAffected": {
      "decisions": 1247,
      "signals": 18394,
      "brain_pages": 4521,
      ...
    },
    "deletedAt": "2026-05-25T12:34:56Z"
  }

Response 400 if confirmEmail missing/mismatched:
  { "error": "confirm_email_required", "message": "Pass confirmEmail matching the user's email." }

Response 404 if user not found:
  { "error": "user_not_found" }

Response 500 on any DB failure:
  { "error": "deletion_failed", "message": "..." }
  (Transaction rolled back; no partial state.)

Cascade walk

apps/api/src/services/user-deletion.ts (new) exports deleteUserCascade(userId) that wraps the walk in withTransaction. The walk has three tiers:

  1. decision_id-chained tables firstexecution_results (WHERE execution_plan_id IN (SELECT id FROM execution_plans WHERE decision_id IN (SELECT id FROM decisions WHERE user_id = $1))), then execution_events, execution_plans, explanation_records, decision_outcomes, candidate_actions (all WHERE decision_id IN (SELECT id FROM decisions WHERE user_id = $1)).
  2. Direct per-user, no ON DELETE CASCADE — loop the TABLES_NEEDING_EXPLICIT_DELETE constant (~30 tables from the audit above), each WHERE user_id = $1.
  3. DELETE FROM users WHERE id = $1 last — the migration-015+ ON DELETE CASCADE tables follow automatically.

Each DELETE returns its row count, which accumulates into a manifest: Record<string, number> returned to the caller. Inside the transaction, write a row to gdpr_deletion_log containing (now(), hashIp(reqIp), Object.keys(manifest).length, JSON.stringify(manifest)) — see the new migration below. Any throw rolls back everything.

New migration

packages/db/src/migrations/060-gdpr-deletion-log.sql creates an append-only gdpr_deletion_log table: (id UUID PRIMARY KEY DEFAULT gen_random_uuid(), deleted_at TIMESTAMPTZ NOT NULL DEFAULT now(), ip_hash STRING NOT NULL, table_count INT NOT NULL, manifest_json JSONB NOT NULL) plus an index on deleted_at DESC. Schema deliberately omits the deleted userId — the audit survives the cascade.

Acceptance Criteria

  1. DELETE /api/users/:userId with body {confirmEmail: "user@example.com"} returns 200 with the documented manifest.
  2. After a successful DELETE, SELECT COUNT(*) FROM <every table> returns 0 for that userId across all ~55 per-user tables.
  3. Without confirmEmail, the request returns 400 — the user is NOT deleted.
  4. With wrong confirmEmail, the request returns 400 — the user is NOT deleted.
  5. A simulated mid-transaction failure (e.g. a DELETE FROM signals that throws) leaves the user fully intact — no rows deleted across any table.
  6. The gdpr_deletion_log table gains exactly one row per successful delete, containing the table-count manifest but NOT the deleted userId.
  7. Settings page renders a "Delete everything" button; clicking opens a type-to-confirm modal; the confirm button is disabled until the typed email matches exactly.
  8. docs/privacy.html lists every cascade-affected table by name and links to the endpoint.
  9. A user who deletes their account and re-onboards via the sample-profile path (P0.2 "Try with a sample profile" silently does nothing #363) gets a fresh user row with no carryover from the deleted account.

Testing Plan

Layer What to test Count
Unit deleteUserCascade builds the right manifest for a user with rows in all ~30 tables +1
Unit deleteUserCascade rolls back on a forced mid-transaction throw (every per-table query inside, then a thrown error) +1
Unit deleteUserCascade is a no-op for a userId that doesn't exist (returns 0-row manifest, NOT an error) +1
Integration DELETE /api/users/:userId with valid confirmEmail returns 200, manifest is non-zero +1
Integration DELETE /api/users/:userId without confirmEmail returns 400, user still exists +1
Integration DELETE /api/users/:userId with wrong confirmEmail returns 400, user still exists +1
Integration gdpr_deletion_log row gets written on success, NOT on failure +2
Integration Cross-table cleanup verified: seed 10 signals, 5 decisions, 3 brain_pages, 2 lifebooks, then delete; assert 0 rows in each +1
E2E (Playwright) Settings → "Delete everything" → type email → confirm → toast → reload shows onboarding modal again +1
Security Ownership middleware: User A cannot DELETE /api/users/<userB-id> even with correct confirmEmail +1

Total: +11 tests (3 unit, 6 integration, 1 E2E, 1 security).

Rollback Plan

This is a one-way operation by design. Rollback the code (revert PR) is trivial; rollback the data requires the user has a backup. The PR ships a one-line CHANGELOG warning that the new DELETE is unrecoverable.

If we discover after deploy that the cascade has a bug that silently leaves rows behind, the fix is to add the missing table to the walk and ship a backfill migration that re-runs the delete for any user IDs present in gdpr_deletion_log (cross-check against tables that should be empty).

Database migration 060-gdpr-deletion-log.sql is additive only — rollback is DROP TABLE gdpr_deletion_log if needed.

Effort Estimate

Component Hours
deleteUserCascade service + walk-list enumeration + tests 3h
DELETE /api/users/:userId route + auth + confirmEmail validation 1h
060-gdpr-deletion-log.sql migration 0.5h
Settings UI: Danger zone + type-to-confirm modal 2h
docs/privacy.html cascade-list documentation 0.5h
Code review + integration 1h

Total: ~8h (1 day for one engineer).

Files Reference

File Change
apps/api/src/routes/users.ts New DELETE /:userId handler
apps/api/src/services/user-deletion.ts (new) deleteUserCascade(userId) transactional walk
packages/db/src/migrations/060-gdpr-deletion-log.sql (new) Append-only audit log table
apps/web/public/js/pages/settings.js "Danger zone" section + type-to-confirm modal markup + handler
apps/web/public/styles.css .danger-zone + .confirm-modal styling
docs/privacy.html "Right to erasure" subsection enumerating tables
CHANGELOG.md Entry under Added (DELETE endpoint) + warning under Notes

Out of Scope

  • Adding ON DELETE CASCADE to the ~30 tables that lack it today. Separate follow-up — risky and orthogonal to shipping a working delete. The explicit walk works without it.
  • "Soft delete" mode (mark as deleted, retain rows for N days). The user requested erasure; honor it. If we later need a 30-day grace period, that's a UX option, not a default.
  • Cross-instance federation cleanup (federation_peers table). The federation-peer cascade is auto, but if the user has paired with another SkyTwin instance, that instance still has its own copy — out of scope here.
  • Server-side data backup before delete (the user is welcome to use skytwin-backup export first; see P2 backup issue).
  • IP-hash salt rotation in gdpr_deletion_log — single salt for v1 is fine.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    priority/P1Must-fix before launchprivacyPrivacy / data handling

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions