docs(safety-model): sync with shipped Tier-2 fixes by jayzalowitz · Pull Request #451 · jayzalowitz/skytwin

jayzalowitz · 2026-05-26T22:04:43Z

Summary

`docs/safety-model.md` ended at Layer 6 (Approval Routing) and didn't document the three new layers that landed during Tier-2 polish. Adding them so a reader auditing the safety model post-launch can see the current defense-in-depth picture, not the pre-Tier-2 one.

Layers added

| Layer | Issue | Where |
| --- | --- | --- |
| 7. Global pause + per-user pause | #379 | env var + autonomy_settings, sits ahead of trust-tier gate |
| 8. Right to erasure | #376 | DELETE endpoint + userPurgeRepository, cascade via migration 061 |
| 9. Access audit log | #393 | access_log table + DbTokenStore instrumentation |

Each new entry follows the existing template (what / where / can/can't do / interaction with neighbouring layers).

Trust-tier section update

Added the time-in-tier floor (#373) as the third gate alongside `consecutiveApprovals` and `minApprovalRatio`. 24h / 72h / 168h before lifting the tier. References the shape-lock test that prevents drift.

Documentation-only — zero code change. Built and verified locally.

Test plan

- CI matrix (markdown lint)
- (Manual) Render on GitHub, confirm the new layer sections render correctly and the trust-tier bullets show the new criterion

🤖 Generated with Claude Code

docs/safety-model.md ended at Layer 6 (Approval Routing) and didn't document the three new layers that landed during Tier-2 polish:

Layer 7: Global pause + per-user pause (#379)
- SKYTWIN_AUTO_EXECUTE_DISABLED operator env var
- autonomy_settings.paused per-user toggle
- Sits ahead of trust-tier gate + injection guard

Layer 8: Right to erasure (#376)
- DELETE /api/users/:userId?confirm=delete-my-data
- userPurgeRepository.purgeUser in a single transaction
- Cascade via migration 061 (#413) collapses 32 tables

Layer 9: Access audit log (#393)
- access_log table + accessLogRepository
- decrypt_oauth_token rows from DbTokenStore
- Fire-and-forget; never blocks legitimate decrypt

Each new entry follows the existing layer template (what it is, what it gates, where the code lives, can/can't do, interaction with layers above/below).

Trust-tier-progression section also added the time-in-tier floor (#373) as the third gate alongside consecutiveApprovals and minApprovalRatio: 24h / 72h / 168h before lifting the tier.

Pointers to the shape-lock test (promotion-thresholds-shape.test.ts) and the cascade E2E test (cascade-cleanup.e2e.test.ts) so a future reader can verify the doc matches the engine.

Documentation-only; no code change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
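For readers unfamiliar with the "fire-and-forget; never blocks legitimate decrypt" pattern named under Layer 9, here is a minimal sketch of the idea. It is not the `DbTokenStore` implementation: `decryptWithAudit`, `AccessLogEntry`, `AccessLogWriter`, and the choice to log after the decrypt succeeds are all assumptions made for illustration. Only the `decrypt_oauth_token` action name comes from the commit message.

```typescript
// Hypothetical shapes; the real access_log row and repository may differ.
interface AccessLogEntry {
  userId: string;
  action: "decrypt_oauth_token";
  at: Date;
}

type AccessLogWriter = (entry: AccessLogEntry) => Promise<void>;

async function decryptWithAudit(
  userId: string,
  ciphertext: string,
  decrypt: (ciphertext: string) => Promise<string>,
  writeLog: AccessLogWriter,
): Promise<string> {
  const plaintext = await decrypt(ciphertext);

  // Fire-and-forget: the write is started but not awaited, and any failure
  // (synchronous throw or rejected promise) is swallowed so that a broken
  // audit table cannot turn a legitimate decrypt into an error.
  void Promise.resolve()
    .then(() => writeLog({ userId, action: "decrypt_oauth_token", at: new Date() }))
    .catch(() => {
      // Intentionally ignored in this sketch; a real system would likely
      // emit a metric or a warning here.
    });

  return plaintext;
}
```

The trade-off in this shape is that an audit row can be lost when the write fails, which is the price of never blocking the caller.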

Copilot

Pull request overview

Updates SkyTwin’s safety-model documentation to reflect Tier-2 “defense in depth” layers added post-launch, and records the docs sync in the changelog.

Changes:

- Extend docs/safety-model.md with Layers 7–9 (pause controls, right-to-erasure, access audit log).
- Update the Trust Tier Progression section to include the time-in-tier floor as a documented promotion criterion.
- Add an Unreleased changelog entry describing the docs sync.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| docs/safety-model.md | Adds Layers 7–9 and updates trust-tier promotion criteria documentation. |
| CHANGELOG.md | Adds an Unreleased entry noting the safety-model docs sync. |



+### Layer 7: Global pause + per-user pause (#379)
+
+Two coordinated panic-button levers that sit **ahead** of every other layer above. Both routes a candidate action that would otherwise have auto-executed to `requiresApproval: true` regardless of trust tier, autonomy settings, or policy verdicts:
+
+- **Operator kill switch:** `SKYTWIN_AUTO_EXECUTE_DISABLED=true` on the API/worker process. Read once at `PolicyEvaluator` construction. Self-hosters / oncall use this to silence the system without redeploying. Can only be cleared by unsetting the env var.
+- **Per-user pause:** `autonomy_settings.paused = true` (set via `PUT /api/users/:userId/autonomy-pause`). Same effect, scoped to one user, clearable from the chrome banner's "Resume" button.
+
+The check sits ahead of the trust-tier gate and the injection guard so no downstream allow-path can bypass it. Actions still land in the Approvals queue — they just don't auto-execute. A sticky red banner reads from `GET /api/users/:userId/autonomy-state` on every navigation + every 30s so a user in pause-mode can't forget they're paused.


+Promotion criteria below match `PROMOTION_THRESHOLDS` in `@skytwin/shared-types` (`packages/shared-types/src/policy.ts`) and are locked against drift by `packages/policy-engine/src/__tests__/promotion-thresholds-shape.test.ts`. The engine gates on **three** conditions, all of which must clear:
+
+1. `consecutiveApprovals` (resets on any rejection)
+2. `minApprovalRatio` (cumulative)
+3. `minDurationInTierHours` — time-in-tier floor (#373). Twenty approvals in twenty minutes proves the user clicked through quickly, not that they calibrated the twin. The floor stops single-session ladder-climbing and closes a DB-tampering vector where an attacker bumping `consecutive_approvals` could leapfrog tiers without behavioural evidence.
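The three-condition gate in the hunk above can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: `PromotionThreshold`, `ApprovalStats`, and `isEligibleForPromotion` are hypothetical names, the streak of 20 and ratio of 0.9 are made-up values, and only the 24-hour floor is taken from this PR. The sketch also assumes the caller supplies `hoursInCurrentTier`; the follow-up commit on this PR notes that production callers do not yet populate it.

```typescript
// Hypothetical shapes for illustration; the real definitions live in
// packages/shared-types/src/policy.ts and may differ.
interface PromotionThreshold {
  consecutiveApprovals: number;
  minApprovalRatio: number;
  minDurationInTierHours: number;
}

interface ApprovalStats {
  consecutiveApprovals: number; // resets to 0 on any rejection
  totalApprovals: number;
  totalDecisions: number;
  hoursInCurrentTier: number; // assumed to be supplied by the caller
}

function isEligibleForPromotion(
  stats: ApprovalStats,
  threshold: PromotionThreshold,
): boolean {
  const ratio =
    stats.totalDecisions === 0 ? 0 : stats.totalApprovals / stats.totalDecisions;
  const streakOk = stats.consecutiveApprovals >= threshold.consecutiveApprovals;
  const ratioOk = ratio >= threshold.minApprovalRatio;
  const floorOk = stats.hoursInCurrentTier >= threshold.minDurationInTierHours;
  // All three conditions must clear; none can compensate for another.
  return streakOk && ratioOk && floorOk;
}

// Only the 24h floor comes from the PR text; 20 and 0.9 are made-up values.
const exampleThreshold: PromotionThreshold = {
  consecutiveApprovals: 20,
  minApprovalRatio: 0.9,
  minDurationInTierHours: 24,
};

// Twenty approvals in twenty minutes: streak and ratio pass, the floor does not.
const rushed: ApprovalStats = {
  consecutiveApprovals: 20,
  totalApprovals: 20,
  totalDecisions: 20,
  hoursInCurrentTier: 20 / 60,
};

console.log(isEligibleForPromotion(rushed, exampleThreshold)); // false
```

Because the floor is an independent condition, raising `consecutive_approvals` alone (by clicking quickly or by editing the row) cannot satisfy it in this sketch.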


+### Changed (docs — safety-model sync with shipped Tier-2 fixes)
+
+- **`docs/safety-model.md` synced to what actually shipped.** The defense-layers section ended at Layer 6 (Approval Routing) and didn't document the three new layers that landed during Tier-2 polish: Layer 7 — Global pause + per-user pause (#379), Layer 8 — Right to erasure (#376), Layer 9 — Access audit log (#393). Each new entry follows the existing layer template: what the layer is, what it gates, where the code lives, what it can/can't do, and how it interacts with the layers above and below. The trust-tier-progression section also added the time-in-tier floor (#373) as the third gate alongside `consecutiveApprovals` and `minApprovalRatio` — twenty approvals in twenty minutes doesn't earn a promotion anymore, the engine waits 24h / 72h / 168h before lifting the tier. Pointers to the shape-lock test (`promotion-thresholds-shape.test.ts`) and to the cascade E2E test (`cascade-cleanup.e2e.test.ts`) so a future reader can verify the doc matches the engine. Documentation-only — no code change.



1. Grammar: "Both routes a candidate action…" was singular/plural mismatch. Fixed to "Both route a candidate action…".

2. Layer 7 pause-ordering claim was inaccurate. I'd written that the pause check "sits ahead of the trust-tier gate and the injection guard", but per the post-Copilot review on #421 `PolicyEvaluator.evaluate()` captures pause state at the top and APPLIES it at the END so denies (domain blocklist, spend-limit, policy deny, injection-guard confirmationLevel) aren't overridden. Rewrote the paragraph to describe the actual semantic: pause escalates an otherwise-allowed action to manual approval; a denied action stays denied.

3. Trust-tier "all three must clear" claim was over-broad. The time-in-tier floor (#373) is engine + threshold only: the production callers that build ApprovalStats (the progress endpoint, the promotion-eligibility job) don't yet populate `hoursInCurrentTier`. Added an explicit "Enforcement caveat" bullet so the doc no longer over-promises against the engine. Reflects the same scope note from the original P1.3 PR.

4. CHANGELOG layout: the new "Changed (docs)" heading I added ended up grouping the prior #389 onboarding fix entry under it because I didn't re-emit the "Fixed (Epic A — onboarding, #389)" heading. Restored the heading so the #389 entry is back under its proper section.

Documentation-only; no code touched.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

jayzalowitz · 2026-05-26T22:10:42Z

All 4 Copilot findings addressed in 5ef6e46:

- Grammar: "Both routes" → "Both route".
- Layer 7 pause ordering: the doc claimed the check "sits ahead of the trust-tier gate and the injection guard", but per the post-Copilot review on #421 ("P1.9 #379: global kill switch (operator env + per-user toggle + banner)"), PolicyEvaluator.evaluate() captures pause at the top and APPLIES it at the END so denies (domain blocklist, spend-limit, policy deny, injection-guard) aren't overridden. Rewrote to describe the actual semantic: pause escalates an otherwise-allowed action; a denied action stays denied.
- Trust-tier "all three must clear": the time-in-tier floor (#373, "P1.3 Trust tier promotion has no temporal floor") is engine + threshold only; production callers don't yet populate hoursInCurrentTier. Added an explicit "Enforcement caveat" reflecting the original P1.3 scope note, so the doc no longer over-promises.
- CHANGELOG layout: my new "Changed (docs)" heading swallowed the prior #389 entry ("P2.9 "Computer" onboarding path is a no-op stub") because I didn't re-emit the "Fixed (Epic A — onboarding, #389)" heading. Restored it.

Documentation-only — no code touched.
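The corrected pause semantic (capture at the top, apply at the end, never override a deny) can be sketched like this. It is a simplified illustration, not `PolicyEvaluator.evaluate()` itself: `Decision`, `PauseState`, `evaluateWithPause`, and the `runChecks` callback standing in for the blocklist, spend-limit, policy, and injection-guard checks are all hypothetical.

```typescript
// Hypothetical shapes; the real evaluator's types are not shown in this PR.
interface Decision {
  verdict: "allow" | "deny";
  requiresApproval: boolean;
  reason?: string;
}

interface PauseState {
  globalDisabled: boolean; // e.g. derived from SKYTWIN_AUTO_EXECUTE_DISABLED
  userPaused: boolean; // e.g. derived from autonomy_settings.paused
}

function evaluateWithPause(
  pause: PauseState,
  runChecks: () => Decision,
): Decision {
  // 1. Capture pause state at the top, before any other check runs.
  const paused = pause.globalDisabled || pause.userPaused;

  // 2. Run the remaining checks unchanged.
  const decision = runChecks();

  // 3. Apply pause at the end. A deny is returned untouched, so pausing
  //    can never soften a denied action into an approvable one.
  if (decision.verdict === "deny") {
    return decision;
  }
  if (paused) {
    return { ...decision, requiresApproval: true, reason: "paused" };
  }
  return decision;
}
```

With this ordering, pause only ever escalates: an action that would have auto-executed lands in the approvals queue, and an action that was denied stays denied.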

Copilot AI review requested due to automatic review settings May 26, 2026 22:04

jayzalowitz merged commit 5bab9c6 into main May 26, 2026
9 checks passed

jayzalowitz merged 2 commits into main from jayzalowitz/safety-model-docs-sync
