docs(safety-model): sync with shipped Tier-2 fixes #451

Merged
jayzalowitz merged 2 commits into main from jayzalowitz/safety-model-docs-sync on May 26, 2026

@jayzalowitz

Owner

Summary

`docs/safety-model.md` ended at Layer 6 (Approval Routing) and didn't document the three new layers that landed during Tier-2 polish. Adding them so a reader auditing the safety model post-launch can see the current defense-in-depth picture, not the pre-Tier-2 one.

Layers added

| Layer | Issue | Where |
| --- | --- | --- |
| 7. Global pause + per-user pause | #379 | env var + `autonomy_settings`, sits ahead of trust-tier gate |
| 8. Right to erasure | #376 | DELETE endpoint + `userPurgeRepository`, cascade via migration 061 |
| 9. Access audit log | #393 | `access_log` table + `DbTokenStore` instrumentation |

Each new entry follows the existing template (what / where / can/can't do / interaction with neighbouring layers).
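
As a rough illustration of the Layer 8 row, the sketch below shows one way a confirm-gated erasure handler could look. The confirmation value `delete-my-data` and the `purgeUser` name come from this PR's commit message; the `PurgeRepository` and `HttpResult` interfaces, the `handleDeleteUser` function, and the status codes are hypothetical and are not taken from the SkyTwin codebase.

```typescript
// Hypothetical shapes: the real route handler and repository types are not shown in this PR.
interface PurgeRepository {
  purgeUser(userId: string): Promise<void>; // assumed to run inside a single transaction
}

interface HttpResult {
  status: number;
  body: string;
}

// Taken from DELETE /api/users/:userId?confirm=delete-my-data
const CONFIRM_VALUE = "delete-my-data";

async function handleDeleteUser(
  userId: string,
  query: Record<string, string | undefined>,
  repo: PurgeRepository,
): Promise<HttpResult> {
  // Refuse unless the caller supplied the explicit confirmation value.
  if (query["confirm"] !== CONFIRM_VALUE) {
    return { status: 400, body: "confirm=delete-my-data is required" }; // status code is an assumption
  }
  // Delegate the actual erasure to the repository.
  await repo.purgeUser(userId);
  return { status: 204, body: "" }; // status code is an assumption
}
```

Requiring an exact query value makes an accidental or replayed bare `DELETE` a no-op, which is presumably why the endpoint carries the `confirm` parameter.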

Trust-tier section update

Added the time-in-tier floor (#373) as the third gate alongside `consecutiveApprovals` and `minApprovalRatio`: the engine waits 24h / 72h / 168h before lifting the tier. The section also references the shape-lock test that guards the thresholds against drift.

Documentation-only — zero code change. Built and verified locally.

Test plan

  • CI matrix (markdown lint)
  • (Manual) Render on GitHub, confirm new layer sections render correctly + the trust-tier bullets show the new criterion

🤖 Generated with Claude Code

docs/safety-model.md ended at Layer 6 (Approval Routing) and didn't
document the three new layers that landed during Tier-2 polish:

  Layer 7: Global pause + per-user pause (#379)
    - SKYTWIN_AUTO_EXECUTE_DISABLED operator env var
    - autonomy_settings.paused per-user toggle
    - Sits ahead of trust-tier gate + injection guard

  Layer 8: Right to erasure (#376)
    - DELETE /api/users/:userId?confirm=delete-my-data
    - userPurgeRepository.purgeUser in a single transaction
    - Cascade via migration 061 (#413) collapses 32 tables

  Layer 9: Access audit log (#393)
    - access_log table + accessLogRepository
    - decrypt_oauth_token rows from DbTokenStore
    - Fire-and-forget; never blocks legitimate decrypt

Each new entry follows the existing layer template (what it is,
what it gates, where the code lives, can/can't do, interaction
with layers above/below).
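
The "fire-and-forget; never blocks legitimate decrypt" property of Layer 9 can be sketched as follows. This is a minimal illustration under stated assumptions: `AccessLogEntry`, `AccessLogWriter`, and `decryptWithAudit` are hypothetical names, and only the `decrypt_oauth_token` action label comes from the commit message above.

```typescript
// Hypothetical shapes; the real accessLogRepository and DbTokenStore APIs are not shown in this PR.
interface AccessLogEntry {
  userId: string;
  action: string;
  at: Date;
}

type AccessLogWriter = (entry: AccessLogEntry) => Promise<void>;

function decryptWithAudit(
  userId: string,
  decrypt: () => string,
  writeLog: AccessLogWriter,
): string {
  const token = decrypt();

  // Fire-and-forget: the write is scheduled but never awaited, and any failure
  // (synchronous throw or rejected promise) is swallowed so that audit logging
  // can neither delay nor fail the decrypt itself.
  void Promise.resolve()
    .then(() => writeLog({ userId, action: "decrypt_oauth_token", at: new Date() }))
    .catch(() => {
      /* a real system would likely count or report this failure */
    });

  return token;
}
```

The trade-off of this design is that an audit row can be lost when the log write fails, in exchange for the guarantee that a logging outage never takes token decryption down with it.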

Trust-tier-progression section also added the time-in-tier floor
(#373) as the third gate alongside consecutiveApprovals and
minApprovalRatio — 24h / 72h / 168h before lifting the tier.

Pointers to the shape-lock test (promotion-thresholds-shape.test.ts)
and the cascade E2E test (cascade-cleanup.e2e.test.ts) so a future
reader can verify the doc matches the engine.

Documentation-only — no code change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 26, 2026 22:04

Copilot AI left a comment


Pull request overview

Updates SkyTwin’s safety-model documentation to reflect Tier-2 “defense in depth” layers added post-launch, and records the docs sync in the changelog.

Changes:

  • Extend docs/safety-model.md with Layers 7–9 (pause controls, right-to-erasure, access audit log).
  • Update the Trust Tier Progression section to include the time-in-tier floor as a documented promotion criterion.
  • Add an Unreleased changelog entry describing the docs sync.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `docs/safety-model.md` | Adds Layers 7–9 and updates trust-tier promotion criteria documentation. |
| `CHANGELOG.md` | Adds an Unreleased entry noting the safety-model docs sync. |

Comment thread docs/safety-model.md Outdated

### Layer 7: Global pause + per-user pause (#379)

Two coordinated panic-button levers that sit **ahead** of every other layer above. Both routes a candidate action that would otherwise have auto-executed to `requiresApproval: true` regardless of trust tier, autonomy settings, or policy verdicts:
Comment thread docs/safety-model.md Outdated
Comment on lines +142 to +147
Two coordinated panic-button levers that sit **ahead** of every other layer above. Both routes a candidate action that would otherwise have auto-executed to `requiresApproval: true` regardless of trust tier, autonomy settings, or policy verdicts:

- **Operator kill switch:** `SKYTWIN_AUTO_EXECUTE_DISABLED=true` on the API/worker process. Read once at `PolicyEvaluator` construction. Self-hosters / oncall use this to silence the system without redeploying. Can only be cleared by unsetting the env var.
- **Per-user pause:** `autonomy_settings.paused = true` (set via `PUT /api/users/:userId/autonomy-pause`). Same effect, scoped to one user, clearable from the chrome banner's "Resume" button.

The check sits ahead of the trust-tier gate and the injection guard so no downstream allow-path can bypass it. Actions still land in the Approvals queue — they just don't auto-execute. A sticky red banner reads from `GET /api/users/:userId/autonomy-state` on every navigation + every 30s so a user in pause-mode can't forget they're paused.
Comment thread docs/safety-model.md Outdated
Comment on lines +179 to +183
Promotion criteria below match `PROMOTION_THRESHOLDS` in `@skytwin/shared-types` (`packages/shared-types/src/policy.ts`) and are locked against drift by `packages/policy-engine/src/__tests__/promotion-thresholds-shape.test.ts`. The engine gates on **three** conditions, all of which must clear:

1. `consecutiveApprovals` (resets on any rejection)
2. `minApprovalRatio` (cumulative)
3. `minDurationInTierHours` — time-in-tier floor (#373). Twenty approvals in twenty minutes proves the user clicked through quickly, not that they calibrated the twin. The floor stops single-session ladder-climbing and closes a DB-tampering vector where an attacker bumping `consecutive_approvals` could leapfrog tiers without behavioural evidence.
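
The three conditions listed above can be sketched as a single eligibility check. The field names `consecutiveApprovals`, `minApprovalRatio`, `minDurationInTierHours`, and `hoursInCurrentTier` appear in this PR; the interface shapes, the `isEligibleForPromotion` function, and the choice to fail closed when `hoursInCurrentTier` is missing are assumptions of this sketch, not a description of the shipped engine.

```typescript
// Assumed shape of one tier's entry in PROMOTION_THRESHOLDS.
interface PromotionThreshold {
  consecutiveApprovals: number;
  minApprovalRatio: number;
  minDurationInTierHours: number;
}

// Assumed shape of ApprovalStats; only hoursInCurrentTier is named in this PR.
interface ApprovalStats {
  consecutiveApprovals: number; // resets on any rejection
  totalApprovals: number;
  totalDecisions: number;
  hoursInCurrentTier?: number; // optional: not every caller populates it
}

function isEligibleForPromotion(stats: ApprovalStats, t: PromotionThreshold): boolean {
  const ratio = stats.totalDecisions === 0 ? 0 : stats.totalApprovals / stats.totalDecisions;

  const streakOk = stats.consecutiveApprovals >= t.consecutiveApprovals;
  const ratioOk = ratio >= t.minApprovalRatio;
  // Assumption: a missing hoursInCurrentTier fails closed in this sketch.
  const durationOk =
    stats.hoursInCurrentTier !== undefined &&
    stats.hoursInCurrentTier >= t.minDurationInTierHours;

  return streakOk && ratioOk && durationOk;
}
```

With a hypothetical threshold of 20 approvals, a 0.9 ratio, and a 24-hour floor, twenty approvals in twenty minutes pass the first two gates and fail the third, which is the single-session ladder-climbing case the floor is meant to stop.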
Comment thread CHANGELOG.md
### Changed (docs — safety-model sync with shipped Tier-2 fixes)

- **`docs/safety-model.md` synced to what actually shipped.** The defense-layers section ended at Layer 6 (Approval Routing) and didn't document the three new layers that landed during Tier-2 polish: Layer 7 — Global pause + per-user pause (#379), Layer 8 — Right to erasure (#376), Layer 9 — Access audit log (#393). Each new entry follows the existing layer template: what the layer is, what it gates, where the code lives, what it can/can't do, and how it interacts with the layers above and below. The trust-tier-progression section also added the time-in-tier floor (#373) as the third gate alongside `consecutiveApprovals` and `minApprovalRatio` — twenty approvals in twenty minutes doesn't earn a promotion anymore, the engine waits 24h / 72h / 168h before lifting the tier. Pointers to the shape-lock test (`promotion-thresholds-shape.test.ts`) and to the cascade E2E test (`cascade-cleanup.e2e.test.ts`) so a future reader can verify the doc matches the engine. Documentation-only — no code change.

1. Grammar — "Both routes a candidate action…" was singular/plural
   mismatch. Fixed to "Both route a candidate action…".

2. Layer 7 pause-ordering claim was inaccurate. I'd written that
   the pause check "sits ahead of the trust-tier gate and the
   injection guard", but per the post-Copilot review on #421
   `PolicyEvaluator.evaluate()` captures pause state at the top
   and APPLIES it at the END so denies (domain blocklist, spend-
   limit, policy deny, injection-guard confirmationLevel) aren't
   overridden. Rewrote the paragraph to describe the actual
   semantic: pause escalates an otherwise-allowed action to
   manual approval; a denied action stays denied.

3. Trust-tier "all three must clear" claim was over-broad. The
   time-in-tier floor (#373) is engine + threshold only — the
   production callers that build ApprovalStats (the progress
   endpoint, the promotion-eligibility job) don't yet populate
   `hoursInCurrentTier`. Added an explicit "Enforcement caveat"
   bullet so the doc no longer over-promises against the engine.
   Reflects the same scope note from the original P1.3 PR.

4. CHANGELOG layout — the new "Changed (docs)" heading I added
   ended up grouping the prior #389 onboarding fix entry under
   it because I didn't re-emit the "Fixed (Epic A — onboarding,
   #389)" heading. Restored the heading so the #389 entry is back
   under its proper section.

Documentation-only — no code touched.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jayzalowitz

Owner Author

All 4 Copilot findings addressed in 5ef6e46:

  1. Grammar: `Both routes` → `Both route`.
  2. Layer 7 pause ordering: the doc claimed the check "sits ahead of the trust-tier gate and the injection guard", but per the post-Copilot review on #421 ("P1.9 #379: global kill switch (operator env + per-user toggle + banner)"), `PolicyEvaluator.evaluate()` captures pause at the top and APPLIES it at the END so denies (domain blocklist, spend-limit, policy deny, injection-guard) aren't overridden. Rewrote to describe the actual semantic: pause escalates an otherwise-allowed action; a denied action stays denied.
  3. Trust-tier "all three must clear": the time-in-tier floor (#373, "P1.3 Trust tier promotion has no temporal floor") is engine + threshold only; production callers don't yet populate `hoursInCurrentTier`. Added an explicit "Enforcement caveat" reflecting the original P1.3 scope note, so the doc no longer over-promises.
  4. CHANGELOG layout: my new "Changed (docs)" heading swallowed the prior #389 entry ("P2.9 "Computer" onboarding path is a no-op stub") because I didn't re-emit the "Fixed (Epic A — onboarding, #389)" heading. Restored it.
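
The pause semantic in item 2 (capture at the top, apply at the end) can be sketched as below. All types and the `evaluate` function here are hypothetical stand-ins for illustration; the real `PolicyEvaluator.evaluate()` signature is not shown in this thread.

```typescript
// Hypothetical shapes; not the real PolicyEvaluator types.
interface Decision {
  verdict: "allow" | "deny";
  requiresApproval: boolean;
  reason: string;
}

interface PauseState {
  globalDisabled: boolean; // e.g. SKYTWIN_AUTO_EXECUTE_DISABLED=true
  userPaused: boolean; // e.g. autonomy_settings.paused
}

// Stand-in for the other layers (domain blocklist, spend limit, policy, injection guard).
type DownstreamChecks = () => Decision;

function evaluate(pause: PauseState, runChecks: DownstreamChecks): Decision {
  // 1. Capture pause state at the top of the evaluation.
  const isPaused = pause.globalDisabled || pause.userPaused;

  // 2. Run every other layer exactly as if nothing were paused.
  const decision = runChecks();

  // 3. Apply pause at the end. A deny stays a deny...
  if (decision.verdict === "deny") {
    return decision;
  }
  // ...and an allow that would have auto-executed is escalated to manual approval.
  if (isPaused && !decision.requiresApproval) {
    return { ...decision, requiresApproval: true, reason: "paused: " + decision.reason };
  }
  return decision;
}
```

Applying pause last means it can only make an outcome stricter: it never turns a deny into a queued approval, and it leaves unpaused evaluations untouched.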

Documentation-only — no code touched.

@jayzalowitz merged commit 5bab9c6 into main on May 26, 2026
9 checks passed