Skip to content

test(safety): lock policy-check-blocks-execution invariant#78

Merged
jayzalowitz merged 1 commit into
mainfrom
jayzalowitz/policy-blocks-execution
Apr 26, 2026
Merged

test(safety): lock policy-check-blocks-execution invariant#78
jayzalowitz merged 1 commit into
mainfrom
jayzalowitz/policy-blocks-execution

Conversation

@jayzalowitz

Copy link
Copy Markdown
Owner

Summary

  • `ExecutionRouter` now throws `InvariantViolationError` when called without a `RiskAssessment` or with a mismatched `actionId` — pins Safety Invariants feat: bootstrap SkyTwin monorepo (v0.1.0.0) #1 and feat: real OpenClaw execution, 6 new domains, expanded onboarding #7 at the boundary so a future caller that skips the decision pipeline cannot silently auto-execute (+4 unit tests)
  • `DecisionMaker.whatWouldIDo` no longer leaks blocked candidates via `alternativeActions` when policy denies every candidate; returns empty alternatives and surfaces the blocking reason via `policyNotes` (+1 unit test pinning the no-leak contract)
  • `POST /api/events/ingest` emits a `decision:blocked-by-policy` SSE event when no action was selected and no approval was created, so users can see the policy result instead of silent ingestion (+1 unit test)

Production call sites (`apps/api/src/routes/events.ts:230`, `apps/api/src/routes/approvals.ts:264`) already build matching `RiskAssessment` objects — the new guards are inert for them and active against future orphan callers.

Test plan

  • `pnpm --filter @skytwin/execution-router test` — 66/66 (was 62)
  • `pnpm --filter @skytwin/decision-engine test` — 79/79 (was 78)
  • `pnpm --filter @skytwin/api test` — 142 passed | 22 skipped (was 141)
  • `pnpm test` — 38/38 turbo tasks
  • `pnpm lint` — 30/30 turbo tasks

Closes #75

🤖 Generated with Claude Code

Closes #75. Three additive guards on the safety kernel:

1. ExecutionRouter throws InvariantViolationError when called without a
   RiskAssessment or with a CandidateAction whose id does not match the
   assessment. Pins Safety Invariants #1 and #7 at the boundary, so a
   future caller that bypasses the decision pipeline cannot silently
   auto-execute. (+4 unit tests)

2. DecisionMaker.whatWouldIDo no longer leaks blocked candidates as
   alternativeActions when policy denies every candidate. Returns an
   empty alternatives array and surfaces the blocking reason via
   policyNotes so the prediction reflects what the user could actually
   take. (+1 unit test pinning the no-leak contract)

3. POST /api/events/ingest emits a decision:blocked-by-policy SSE event
   when no action was selected and no approval was created, so users see
   the policy result instead of silent ingestion. (+1 unit test)

Production call sites (apps/api/src/routes/events.ts:230,
apps/api/src/routes/approvals.ts:264) already build matching
RiskAssessments — guards are inert for them, active against new
orphan callers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 26, 2026 20:27

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Locks several safety invariants at key boundaries by adding runtime guards in the execution router, preventing policy-blocked action leakage in the prediction API, and improving observability for “no action taken” outcomes via SSE.

Changes:

  • Add InvariantViolationError + runtime input validation to ExecutionRouter (including streaming path) with new unit tests.
  • Prevent DecisionMaker.whatWouldIDo() from returning blocked candidates via alternativeActions when nothing is policy-allowed; surface policyNotes instead.
  • Emit a new decision:blocked-by-policy SSE event from POST /api/events/ingest when no action is selected, with route-level test coverage.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
packages/execution-router/src/index.ts Re-export InvariantViolationError for external consumers.
packages/execution-router/src/execution-router.ts Add invariant guard + new error type; enforce action/assessment consistency before executing/streaming.
packages/execution-router/src/tests/execution-router.test.ts Add unit tests asserting invariant guard behavior for sync + streaming entrypoints.
packages/decision-engine/src/decision-maker.ts Prevent “alternativeActions” leakage when policy blocks all candidates; add policyNotes behavior.
packages/decision-engine/src/tests/decision-maker.test.ts Add test pinning the “no-leak when all blocked” whatWouldIDo contract.
apps/api/src/routes/events.ts Emit decision:blocked-by-policy SSE event when ingest results in no selected action and no approval/execution.
apps/api/src/tests/events-routes.test.ts Add test asserting the new blocked-by-policy SSE emission and that execution is not triggered.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +356 to +370
// Surface "no action taken" outcomes (every candidate blocked, or none
// generated) so the user can see why nothing happened. Without this the
// event ingest is silent and the policy decision is invisible — Safety
// Invariant #1 (every auto-execute path went through a policy check) is
// structurally enforced upstream, but the *result* of that check needs
// to be observable.
if (!outcome.selectedAction && !approvalRequest && !executionResult) {
sseManager.emit(userId, 'decision:blocked-by-policy', {
decisionId: decision.id,
reason: outcome.reasoning,
domain: decision.domain,
situationType: decision.situationType,
urgency: decision.urgency,
});
}

Copilot AI Apr 26, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SSE event name decision:blocked-by-policy is emitted whenever selectedAction is null, which also includes the "no candidates generated" path (DecisionMaker returns selectedAction: null with reasoning like "No candidate actions could be generated..."). That means this event can represent non-policy cases and may mislead clients that treat it as a policy denial. Consider distinguishing these outcomes (e.g., emit a separate event for allCandidates.length === 0, or include a reasonCode/blockedByPolicy field in the payload, or tighten the condition to only use this event when there were candidates but none were allowed).

Copilot uses AI. Check for mistakes.
Comment on lines +170 to +183
it('emits decision:blocked-by-policy when no action was selected (Safety Invariant #1)', async () => {
mockEvaluate.mockResolvedValue({
autoExecute: false,
requiresApproval: false,
reasoning: 'All candidates blocked by policy "No travel auto-bookings".',
selectedAction: null,
allCandidates: [],
});

const res = await request(buildApp(), 'POST', '/api/events/ingest', {
userId: 'user-1',
source: 'test',
type: 'travel_decision',
});

Copilot AI Apr 26, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test describes a travel policy block ("No travel auto-bookings") and sends type: 'travel_decision', but mockInterpret in beforeEach still returns a calendar decision (domain: 'calendar', situationType: 'calendar_conflict'). The test passes because it doesn’t assert on domain/situationType, but the mismatch makes the scenario harder to reason about. Consider overriding mockInterpret in this test to return a travel decision, or adjust the test wording/request body to match the mocked decision.

Copilot uses AI. Check for mistakes.
@jayzalowitz jayzalowitz merged commit 8f03ee7 into main Apr 26, 2026
12 checks passed
@jayzalowitz jayzalowitz deleted the jayzalowitz/policy-blocks-execution branch April 26, 2026 20:38
jayzalowitz added a commit that referenced this pull request Apr 26, 2026
)

Two follow-ups from the post-merge review of #78:

- Add the missing test for assertValidExecutionInputs(null, ...). The
  branch is unreachable via TypeScript, but defensive code on a safety
  boundary deserves a test so future refactors can't silently delete it.
- Convert the seven shared-fixture describe blocks in
  explanation-generator.test.ts from module-level const to per-test
  beforeEach. Saved records no longer accumulate across it() calls in
  the same describe, so future "save called once" assertions won't
  inherit cross-contamination.

Net new tests: +1. Existing tests still 30/30 explanations, now
67/67 execution-router.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
jayzalowitz added a commit that referenced this pull request Apr 27, 2026
…h work (#87)

Captures eight PRs (#77, #78, #79, #81, #82, #83, #84, #85, #86) under
an [Unreleased] section. No code changes — pure documentation.

Bumping VERSION + cutting a tag is left as a separate decision so the
release moment stays explicit.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Safety kernel test: prove policy denial blocks auto-execution end-to-end

2 participants