docs(safety-model): sync with shipped Tier-2 fixes #451
Merged
docs/safety-model.md ended at Layer 6 (Approval Routing) and didn't document the three new layers that landed during Tier-2 polish:

Layer 7: Global pause + per-user pause (#379)
- SKYTWIN_AUTO_EXECUTE_DISABLED operator env var
- autonomy_settings.paused per-user toggle
- Sits ahead of trust-tier gate + injection guard

Layer 8: Right to erasure (#376)
- DELETE /api/users/:userId?confirm=delete-my-data
- userPurgeRepository.purgeUser in a single transaction
- Cascade via migration 061 (#413) collapses 32 tables

Layer 9: Access audit log (#393)
- access_log table + accessLogRepository
- decrypt_oauth_token rows from DbTokenStore
- Fire-and-forget; never blocks legitimate decrypt

Each new entry follows the existing layer template (what it is, what it gates, where the code lives, can/can't do, interaction with layers above/below).

Trust-tier-progression section also added the time-in-tier floor (#373) as the third gate alongside consecutiveApprovals and minApprovalRatio: 24h / 72h / 168h before lifting the tier.

Pointers to the shape-lock test (promotion-thresholds-shape.test.ts) and the cascade E2E test (cascade-cleanup.e2e.test.ts) so a future reader can verify the doc matches the engine.

Documentation-only; no code change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
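As a rough illustration of the Layer 9 "fire-and-forget; never blocks legitimate decrypt" property, the sketch below starts an audit insert without awaiting it and swallows any failure. The `AccessLogRepository` interface, the row shape, and `decryptWithAudit` are hypothetical names invented for this example; the real `accessLogRepository` and `DbTokenStore` APIs are not shown in this PR and may differ.

```typescript
// Hypothetical row and repository shapes, for illustration only.
interface AccessLogRow {
  userId: string;
  action: "decrypt_oauth_token";
  at: Date;
}

interface AccessLogRepository {
  insert(row: AccessLogRow): Promise<void>;
}

// Stand-in for whatever decrypt routine the token store actually uses.
type Decrypt = (ciphertext: string) => string;

function decryptWithAudit(
  repo: AccessLogRepository,
  decrypt: Decrypt,
  userId: string,
  ciphertext: string,
): string {
  // Fire-and-forget: start the insert, do not await it, and swallow
  // rejections so an audit-log outage cannot block the decrypt.
  void repo
    .insert({ userId, action: "decrypt_oauth_token", at: new Date() })
    .catch(() => {
      // A real system would probably report this failure to metrics.
    });

  return decrypt(ciphertext);
}
```

The trade-off in this shape is that an audit row can be lost when the insert fails, which is the price of never blocking the decrypt path.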
Pull request overview
Updates SkyTwin’s safety-model documentation to reflect Tier-2 “defense in depth” layers added post-launch, and records the docs sync in the changelog.
Changes:
- Extend `docs/safety-model.md` with Layers 7–9 (pause controls, right-to-erasure, access audit log).
- Update the Trust Tier Progression section to include the time-in-tier floor as a documented promotion criterion.
- Add an Unreleased changelog entry describing the docs sync.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| docs/safety-model.md | Adds Layers 7–9 and updates trust-tier promotion criteria documentation. |
| CHANGELOG.md | Adds an Unreleased entry noting the safety-model docs sync. |
### Layer 7: Global pause + per-user pause (#379)

Comment on lines +142 to +147

Two coordinated panic-button levers that sit **ahead** of every other layer above. Both routes a candidate action that would otherwise have auto-executed to `requiresApproval: true` regardless of trust tier, autonomy settings, or policy verdicts:

- **Operator kill switch:** `SKYTWIN_AUTO_EXECUTE_DISABLED=true` on the API/worker process. Read once at `PolicyEvaluator` construction. Self-hosters / oncall use this to silence the system without redeploying. Can only be cleared by unsetting the env var.
- **Per-user pause:** `autonomy_settings.paused = true` (set via `PUT /api/users/:userId/autonomy-pause`). Same effect, scoped to one user, clearable from the chrome banner's "Resume" button.

The check sits ahead of the trust-tier gate and the injection guard so no downstream allow-path can bypass it. Actions still land in the Approvals queue; they just don't auto-execute. A sticky red banner reads from `GET /api/users/:userId/autonomy-state` on every navigation + every 30s so a user in pause-mode can't forget they're paused.
Comment on lines +179 to +183

Promotion criteria below match `PROMOTION_THRESHOLDS` in `@skytwin/shared-types` (`packages/shared-types/src/policy.ts`) and are locked against drift by `packages/policy-engine/src/__tests__/promotion-thresholds-shape.test.ts`. The engine gates on **three** conditions, all of which must clear:

1. `consecutiveApprovals` (resets on any rejection)
2. `minApprovalRatio` (cumulative)
3. `minDurationInTierHours`: time-in-tier floor (#373). Twenty approvals in twenty minutes proves the user clicked through quickly, not that they calibrated the twin. The floor stops single-session ladder-climbing and closes a DB-tampering vector where an attacker bumping `consecutive_approvals` could leapfrog tiers without behavioural evidence.
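The three gates listed above can be sketched as a single predicate. This is a minimal illustration, not the engine's code: the `PromotionThreshold` and `ApprovalStats` shapes and the `isEligibleForPromotion` name are assumptions, the threshold values in the usage note are invented, and the choice to fail closed when `hoursInCurrentTier` is missing belongs to this sketch alone. The real engine may treat a missing value differently.

```typescript
// Hypothetical shapes; only the three gate names come from the doc under review.
interface PromotionThreshold {
  consecutiveApprovals: number;
  minApprovalRatio: number;
  minDurationInTierHours: number;
}

interface ApprovalStats {
  consecutiveApprovals: number; // resets on any rejection
  approvalRatio: number; // cumulative, in the range 0..1
  hoursInCurrentTier?: number; // may be absent if the caller does not populate it
}

function isEligibleForPromotion(
  stats: ApprovalStats,
  threshold: PromotionThreshold,
): boolean {
  const streakOk = stats.consecutiveApprovals >= threshold.consecutiveApprovals;
  const ratioOk = stats.approvalRatio >= threshold.minApprovalRatio;
  // Sketch-only assumption: a missing time-in-tier value fails the gate.
  const timeOk =
    stats.hoursInCurrentTier !== undefined &&
    stats.hoursInCurrentTier >= threshold.minDurationInTierHours;
  return streakOk && ratioOk && timeOk;
}
```

With an illustrative threshold of `{ consecutiveApprovals: 20, minApprovalRatio: 0.9, minDurationInTierHours: 24 }`, a user with twenty approvals after twenty minutes in the tier is not eligible under this sketch, while the same stats after 24 hours are.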
### Changed (docs: safety-model sync with shipped Tier-2 fixes)

- **`docs/safety-model.md` synced to what actually shipped.** The defense-layers section ended at Layer 6 (Approval Routing) and didn't document the three new layers that landed during Tier-2 polish: Layer 7, Global pause + per-user pause (#379); Layer 8, Right to erasure (#376); Layer 9, Access audit log (#393). Each new entry follows the existing layer template: what the layer is, what it gates, where the code lives, what it can/can't do, and how it interacts with the layers above and below. The trust-tier-progression section also added the time-in-tier floor (#373) as the third gate alongside `consecutiveApprovals` and `minApprovalRatio`: twenty approvals in twenty minutes no longer earns a promotion, because the engine waits 24h / 72h / 168h before lifting the tier. The doc now points to the shape-lock test (`promotion-thresholds-shape.test.ts`) and to the cascade E2E test (`cascade-cleanup.e2e.test.ts`) so a future reader can verify the doc matches the engine. Documentation-only; no code change.
1. Grammar: "Both routes a candidate action…" was a singular/plural mismatch. Fixed to "Both route a candidate action…".

2. Layer 7 pause-ordering claim was inaccurate. I'd written that the pause check "sits ahead of the trust-tier gate and the injection guard", but per the post-Copilot review on #421, `PolicyEvaluator.evaluate()` captures pause state at the top and APPLIES it at the END so denies (domain blocklist, spend-limit, policy deny, injection-guard confirmationLevel) aren't overridden. Rewrote the paragraph to describe the actual semantic: pause escalates an otherwise-allowed action to manual approval; a denied action stays denied.

3. Trust-tier "all three must clear" claim was over-broad. The time-in-tier floor (#373) is engine + threshold only: the production callers that build ApprovalStats (the progress endpoint, the promotion-eligibility job) don't yet populate `hoursInCurrentTier`. Added an explicit "Enforcement caveat" bullet so the doc no longer over-promises against the engine. Reflects the same scope note from the original P1.3 PR.

4. CHANGELOG layout: the new "Changed (docs)" heading I added ended up grouping the prior #389 onboarding fix entry under it because I didn't re-emit the "Fixed (Epic A — onboarding, #389)" heading. Restored the heading so the #389 entry is back under its proper section.

Documentation-only; no code touched.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
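The corrected pause semantic (capture at the top, apply at the end, never override a deny) can be sketched as follows. This is an assumed shape, not the body of `PolicyEvaluator.evaluate()`: the `Decision` and `PauseState` types and the `evaluateWithPause` name are hypothetical, and the downstream checks are collapsed into a single callback.

```typescript
// Hypothetical types for illustration; the real evaluator's result shape may differ.
interface Decision {
  verdict: "allow" | "deny";
  requiresApproval: boolean;
}

interface PauseState {
  globalDisabled: boolean; // e.g. derived from SKYTWIN_AUTO_EXECUTE_DISABLED
  userPaused: boolean; // e.g. derived from autonomy_settings.paused
}

function evaluateWithPause(
  pause: PauseState,
  runDownstreamChecks: () => Decision,
): Decision {
  // 1. Capture pause state up front so later steps cannot change it.
  const paused = pause.globalDisabled || pause.userPaused;

  // 2. Run every other check (blocklist, spend limit, policy, injection guard).
  const decision = runDownstreamChecks();

  // 3. Apply pause last: a deny stays a deny, and an allow is escalated
  //    to manual approval instead of auto-executing.
  if (decision.verdict === "deny") {
    return decision;
  }
  return paused ? { ...decision, requiresApproval: true } : decision;
}
```

Applying the pause as the final step means it can only make an allowed action stricter; it never turns a denied action into one awaiting approval.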
All 4 Copilot findings addressed in 5ef6e46.
Documentation-only; no code touched.
Summary
`docs/safety-model.md` ended at Layer 6 (Approval Routing) and didn't document the three new layers that landed during Tier-2 polish. Adding them so a reader auditing the safety model post-launch can see the current defense-in-depth picture, not the pre-Tier-2 one.
Layers added
Each new entry follows the existing template (what / where / can/can't do / interaction with neighbouring layers).
Trust-tier section update
Added the time-in-tier floor (#373) as the third gate alongside `consecutiveApprovals` and `minApprovalRatio`: the engine waits 24h / 72h / 168h before lifting the tier. The section references the shape-lock test that prevents drift.
Documentation-only; zero code change. Built and verified locally.
Test plan
🤖 Generated with Claude Code