test(safety): lock policy-check-blocks-execution invariant#78
Conversation
Closes #75. Three additive guards on the safety kernel: 1. ExecutionRouter throws InvariantViolationError when called without a RiskAssessment or with a CandidateAction whose id does not match the assessment. Pins Safety Invariants #1 and #7 at the boundary, so a future caller that bypasses the decision pipeline cannot silently auto-execute. (+4 unit tests) 2. DecisionMaker.whatWouldIDo no longer leaks blocked candidates as alternativeActions when policy denies every candidate. Returns an empty alternatives array and surfaces the blocking reason via policyNotes so the prediction reflects what the user could actually take. (+1 unit test pinning the no-leak contract) 3. POST /api/events/ingest emits a decision:blocked-by-policy SSE event when no action was selected and no approval was created, so users see the policy result instead of silent ingestion. (+1 unit test) Production call sites (apps/api/src/routes/events.ts:230, apps/api/src/routes/approvals.ts:264) already build matching RiskAssessments — guards are inert for them, active against new orphan callers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Locks several safety invariants at key boundaries by adding runtime guards in the execution router, preventing policy-blocked action leakage in the prediction API, and improving observability for “no action taken” outcomes via SSE.
Changes:
- Add
InvariantViolationError+ runtime input validation toExecutionRouter(including streaming path) with new unit tests. - Prevent
DecisionMaker.whatWouldIDo()from returning blocked candidates viaalternativeActionswhen nothing is policy-allowed; surfacepolicyNotesinstead. - Emit a new
decision:blocked-by-policySSE event fromPOST /api/events/ingestwhen no action is selected, with route-level test coverage.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/execution-router/src/index.ts | Re-export InvariantViolationError for external consumers. |
| packages/execution-router/src/execution-router.ts | Add invariant guard + new error type; enforce action/assessment consistency before executing/streaming. |
| packages/execution-router/src/tests/execution-router.test.ts | Add unit tests asserting invariant guard behavior for sync + streaming entrypoints. |
| packages/decision-engine/src/decision-maker.ts | Prevent “alternativeActions” leakage when policy blocks all candidates; add policyNotes behavior. |
| packages/decision-engine/src/tests/decision-maker.test.ts | Add test pinning the “no-leak when all blocked” whatWouldIDo contract. |
| apps/api/src/routes/events.ts | Emit decision:blocked-by-policy SSE event when ingest results in no selected action and no approval/execution. |
| apps/api/src/tests/events-routes.test.ts | Add test asserting the new blocked-by-policy SSE emission and that execution is not triggered. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Surface "no action taken" outcomes (every candidate blocked, or none | ||
| // generated) so the user can see why nothing happened. Without this the | ||
| // event ingest is silent and the policy decision is invisible — Safety | ||
| // Invariant #1 (every auto-execute path went through a policy check) is | ||
| // structurally enforced upstream, but the *result* of that check needs | ||
| // to be observable. | ||
| if (!outcome.selectedAction && !approvalRequest && !executionResult) { | ||
| sseManager.emit(userId, 'decision:blocked-by-policy', { | ||
| decisionId: decision.id, | ||
| reason: outcome.reasoning, | ||
| domain: decision.domain, | ||
| situationType: decision.situationType, | ||
| urgency: decision.urgency, | ||
| }); | ||
| } |
There was a problem hiding this comment.
The SSE event name decision:blocked-by-policy is emitted whenever selectedAction is null, which also includes the "no candidates generated" path (DecisionMaker returns selectedAction: null with reasoning like "No candidate actions could be generated..."). That means this event can represent non-policy cases and may mislead clients that treat it as a policy denial. Consider distinguishing these outcomes (e.g., emit a separate event for allCandidates.length === 0, or include a reasonCode/blockedByPolicy field in the payload, or tighten the condition to only use this event when there were candidates but none were allowed).
| it('emits decision:blocked-by-policy when no action was selected (Safety Invariant #1)', async () => { | ||
| mockEvaluate.mockResolvedValue({ | ||
| autoExecute: false, | ||
| requiresApproval: false, | ||
| reasoning: 'All candidates blocked by policy "No travel auto-bookings".', | ||
| selectedAction: null, | ||
| allCandidates: [], | ||
| }); | ||
|
|
||
| const res = await request(buildApp(), 'POST', '/api/events/ingest', { | ||
| userId: 'user-1', | ||
| source: 'test', | ||
| type: 'travel_decision', | ||
| }); |
There was a problem hiding this comment.
This test describes a travel policy block ("No travel auto-bookings") and sends type: 'travel_decision', but mockInterpret in beforeEach still returns a calendar decision (domain: 'calendar', situationType: 'calendar_conflict'). The test passes because it doesn’t assert on domain/situationType, but the mismatch makes the scenario harder to reason about. Consider overriding mockInterpret in this test to return a travel decision, or adjust the test wording/request body to match the mocked decision.
) Two follow-ups from the post-merge review of #78: - Add the missing test for assertValidExecutionInputs(null, ...). The branch is unreachable via TypeScript, but defensive code on a safety boundary deserves a test so future refactors can't silently delete it. - Convert the seven shared-fixture describe blocks in explanation-generator.test.ts from module-level const to per-test beforeEach. Saved records no longer accumulate across it() calls in the same describe, so future "save called once" assertions won't inherit cross-contamination. Net new tests: +1. Existing tests still 30/30 explanations, now 67/67 execution-router. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…h work (#87) Captures eight PRs (#77, #78, #79, #81, #82, #83, #84, #85, #86) under an [Unreleased] section. No code changes — pure documentation. Bumping VERSION + cutting a tag is left as a separate decision so the release moment stays explicit. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Production call sites (`apps/api/src/routes/events.ts:230`, `apps/api/src/routes/approvals.ts:264`) already build matching `RiskAssessment` objects — the new guards are inert for them and active against future orphan callers.
Test plan
Closes #75
🤖 Generated with Claude Code