# Safety kernel test: prove policy denial blocks auto-execution end-to-end (#75)

## Context

Safety Invariant #1 (CLAUDE.md) is the non-negotiable rule of the decision pipeline: **"Never auto-execute without a policy check."** Every path that produces an `autoExecute: true` outcome must have passed `PolicyEvaluator.evaluate()`.

Today the invariant is enforced *structurally* — `DecisionMaker.evaluate()` runs every candidate through the policy evaluator, and `apps/api/src/routes/events.ts` branches on `outcome.autoExecute` before calling the execution router. A unit test in `decision-maker.test.ts:186` covers the blocked path with a mock evaluator.

What's missing is an **integration-level** test that proves the invariant at the handler boundary: when the policy layer denies all candidates, no execution adapter is invoked, no execution plan is persisted, and no `plan_completed`/`plan_failed` event is emitted. Without this, a future refactor that accidentally calls `executeWithRoutingStreaming` outside the `outcome.autoExecute` branch would still pass both the unit tests and type-checking.

This is the single highest-leverage safety test in the codebase. The whole trust model collapses if the policy check is ever bypassed.

**Claude Code estimate: ~2-3h**

## Current State (verified 2026-04-23)

### Policy enforcement path
`packages/decision-engine/src/decision-maker.ts:139-162` loops scored candidates, calls `policyEvaluator.evaluate(...)`, and only sets `selectedAction` when `policyDecision.allowed === true`. If every candidate is blocked, `selectedAction` stays `null` and `outcome.autoExecute` stays `false`.
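The gating loop can be sketched roughly as follows. This is a simplified illustration, not the actual `DecisionMaker` code: `ScoredCandidate`, `PolicyDecision`, `GatedOutcome`, and `selectAction` are hypothetical names, the evaluator signature is reduced to a single argument, and in the real pipeline `autoExecute` also depends on trust tier and risk, which this sketch deliberately omits.

```typescript
// Hypothetical, simplified shapes; the real types live in the decision-engine package.
interface CandidateAction {
  id: string;
  actionType: string;
}

interface ScoredCandidate {
  action: CandidateAction;
  score: number;
}

interface PolicyDecision {
  allowed: boolean;
  reason?: string;
}

interface PolicyEvaluator {
  evaluate(action: CandidateAction): PolicyDecision;
}

interface GatedOutcome {
  selectedAction: CandidateAction | null;
  autoExecute: boolean;
  reasoning: string;
}

// Walk candidates best-first; only a candidate the policy layer allows can be selected.
// (Trust tier and risk are ignored here; this models the policy gate only.)
function selectAction(candidates: ScoredCandidate[], evaluator: PolicyEvaluator): GatedOutcome {
  const ordered = [...candidates].sort((a, b) => b.score - a.score);
  const blockedReasons: string[] = [];

  for (const { action } of ordered) {
    const decision = evaluator.evaluate(action);
    if (decision.allowed === true) {
      return { selectedAction: action, autoExecute: true, reasoning: `selected ${action.id}` };
    }
    blockedReasons.push(`${action.id}: ${decision.reason ?? 'blocked by policy'}`);
  }

  // Every candidate was blocked: nothing is selected and autoExecute stays false.
  return {
    selectedAction: null,
    autoExecute: false,
    reasoning: `all candidates blocked (${blockedReasons.join('; ')})`,
  };
}
```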

`apps/api/src/routes/events.ts:205-218`, escalation branch: if `outcome.requiresApproval`, the handler creates an approval and emits `decision:pending-approval`. It does not call the execution router.

`apps/api/src/routes/events.ts:219-310`, execution branch: guarded by `outcome.autoExecute && outcome.selectedAction`, the handler calls `executionRouter.executeWithRoutingStreaming(...)`, persists an `execution_plans` row, and emits `decision:step` / `decision:executed`.
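The handler's two branches reduce to roughly the shape below. It is a sketch under assumptions: `HandlerDeps`, `routeOutcome`, and the trimmed-down `DecisionOutcome` are hypothetical, and the sketch treats the escalation and execution branches as mutually exclusive. What it illustrates is that the only path to the execution router is the `outcome.autoExecute && outcome.selectedAction` guard, so an integration test can assert that a spy standing in for the router is never called.

```typescript
// Hypothetical minimal shapes for illustration; the real handler in events.ts has more fields.
interface CandidateAction {
  id: string;
  actionType: string;
}

interface DecisionOutcome {
  decisionId: string;
  selectedAction: CandidateAction | null;
  autoExecute: boolean;
  requiresApproval: boolean;
}

interface HandlerDeps {
  createApproval(decisionId: string): void;
  execute(action: CandidateAction): void; // stands in for executeWithRoutingStreaming
  emit(event: string, payload: Record<string, unknown>): void;
}

// Escalation first; execution only under the autoExecute + selectedAction guard.
function routeOutcome(outcome: DecisionOutcome, deps: HandlerDeps): 'escalated' | 'executed' | 'none' {
  if (outcome.requiresApproval) {
    deps.createApproval(outcome.decisionId);
    deps.emit('decision:pending-approval', { decisionId: outcome.decisionId });
    return 'escalated';
  }
  if (outcome.autoExecute && outcome.selectedAction) {
    deps.execute(outcome.selectedAction);
    deps.emit('decision:executed', { decisionId: outcome.decisionId });
    return 'executed';
  }
  // Nothing selected or not auto-executable: no router call, no plan, no execution event.
  return 'none';
}
```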

### Unit coverage
`packages/decision-engine/src/__tests__/decision-maker.test.ts:186-212` — "should deny action when policy blocks it" — uses a mock `PolicyEvaluator` that returns `{ allowed: false }` and asserts `outcome.autoExecute === false` and `outcome.selectedAction === null`. Good, but the mock collapses the whole policy layer into one return value.

### Integration coverage gaps
- `apps/api/src/__tests__/e2e-api.test.ts` has approval-lifecycle and policy-CRUD tests but **none that exercise the "all candidates blocked" path** and assert that the execution router was never called.
- No test covers the `requiresApproval` escalation branch (`events.ts:205-218`) end-to-end against a real database with a user-scoped policy.
- No test covers the `whatWouldIDo` prediction path in `decision-maker.ts:237`: does the predict/query flow honor the same policy layer, or can it leak action recommendations that would actually be blocked at execution time?
- `packages/execution-router/src/__tests__/execution-router.test.ts` does not have a "never executes without upstream policy pass" check — the router accepts any `CandidateAction` + `RiskAssessment` pair handed to it, so the invariant lives entirely in the caller.

### Observability gaps
When a policy blocks, `outcome.reasoning` gets set (`decision-maker.ts:160`) but nothing is emitted to the audit log or SSE stream. A user has no way to see "SkyTwin declined to act because policy X blocked Y" unless they read the stored `DecisionOutcome`.

## Proposed Change

### 1. Add e2e test: policy denial blocks execution
New test in `apps/api/src/__tests__/e2e-api.test.ts` (new `describe('Policy safety kernel')` block):
- Create a user at `trustTier: 'confident'` (would normally auto-execute).
- Create a custom action policy via `POST /api/policies/:userId` that blocks `actionType: 'email-send'` unconditionally (e.g. `conditions: { block: true }`).
- Ingest an event that would generate an `email-send` candidate.
- Assert: response `outcome.autoExecute === false`, `outcome.selectedAction === null`.
- Assert: `GET /api/audit/:userId` shows zero execution events for this decision.
- Assert: no row in `execution_plans` with `decision_id` matching the ingested decision.
- Assert: the `decision:executed` SSE event was never emitted (tail SSE during the test window).
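For the SSE assertion, the test can capture the raw `text/event-stream` body for the test window and inspect event names afterwards. A minimal sketch, assuming the API names its events with the SSE `event:` field; if it instead carries the type inside the JSON `data` payload, the filter would need to parse `data`. `parseSse` and `forbiddenEvents` are hypothetical helpers.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Minimal parser for a captured text/event-stream body, following the SSE field rules:
// `event:` names the event, `data:` lines are joined with "\n", a blank line dispatches,
// lines starting with ":" are comments, and an unterminated trailing event is dropped.
function parseSse(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  let eventName = '';
  let dataLines: string[] = [];

  for (const line of raw.split(/\r\n|\r|\n/)) {
    if (line === '') {
      if (dataLines.length > 0) {
        events.push({ event: eventName || 'message', data: dataLines.join('\n') });
      }
      eventName = '';
      dataLines = [];
      continue;
    }
    if (line.startsWith(':')) continue; // comment / keep-alive
    const colon = line.indexOf(':');
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? '' : line.slice(colon + 1);
    if (value.startsWith(' ')) value = value.slice(1); // strip one leading space
    if (field === 'event') eventName = value;
    else if (field === 'data') dataLines.push(value);
  }
  return events;
}

// Event names the test must never observe when every candidate is blocked.
const FORBIDDEN_WHEN_BLOCKED = new Set(['decision:step', 'decision:executed']);

function forbiddenEvents(raw: string): SseEvent[] {
  return parseSse(raw).filter((e) => FORBIDDEN_WHEN_BLOCKED.has(e.event));
}
```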

### 2. Add e2e test: requiresApproval escalation path
Same block, second test:
- Create a user at `trustTier: 'observer'` (forces escalation).
- Ingest event.
- Assert: `outcome.requiresApproval === true`, `outcome.autoExecute === false`.
- Assert: approval row created with `status: 'pending'`.
- Assert: no execution plan exists before the request is approved.
- Approve via `POST /api/approvals/:id/respond`.
- Assert: execution plan created *after* approval.
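Both the "no plan before approval" and "plan appears after approval" checks are easier to keep non-flaky with a bounded poll than with a fixed sleep. A minimal sketch of such a helper; `waitFor` is hypothetical, and its 2000 ms default mirrors the 2s bound in the acceptance criteria.

```typescript
// Poll a (possibly async) check until it passes or the deadline expires.
// Resolves true on success and false on timeout, leaving the failure message to the caller.
async function waitFor(
  check: () => Promise<boolean> | boolean,
  timeoutMs = 2000,
  intervalMs = 50,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true;
    if (Date.now() >= deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In the e2e test this might look like `await waitFor(async () => (await countPlansForDecision(decisionId)) === 1)` right after `POST /api/approvals/:id/respond`, where `countPlansForDecision` is a hypothetical query on `execution_plans` by `decision_id`.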

### 3. Lock the invariant at the router boundary
Add a runtime guard in `packages/execution-router/src/execution-router.ts` on the entry to `executeWithRoutingStreaming`:
- If the caller has not supplied a `RiskAssessment` with `overallTier`, throw `InvariantViolationError`.
- Log a structured warning if called with an action whose `confidence === SPECULATIVE` and `overallTier === HIGH`/`CRITICAL` (not a block, just a tripwire).

Add unit test in `packages/execution-router/src/__tests__/execution-router.test.ts`:
- Invoking `executeWithRoutingStreaming(action, null as unknown as RiskAssessment, userId)` throws.
- Invoking with mismatched `actionId` between action and assessment throws.
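A sketch of the entry guard, written as a standalone function so it can be unit-tested without constructing a router. Assumptions: the `RiskTier` and `Confidence` unions are placeholders (only `SPECULATIVE`, `HIGH`, and `CRITICAL` come from this issue), `assertRoutingPreconditions` is a hypothetical name, and the real `CandidateAction`/`RiskAssessment` types carry more fields. `InvariantViolationError` here is a local stand-in for the class this issue proposes adding to `packages/core/src/errors.ts`.

```typescript
// Placeholder unions: only SPECULATIVE, HIGH and CRITICAL are taken from the issue.
type RiskTier = 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';
type Confidence = 'SPECULATIVE' | 'MODERATE' | 'CONFIDENT';

interface CandidateAction {
  id: string;
  confidence: Confidence;
}

interface RiskAssessment {
  actionId: string;
  overallTier?: RiskTier;
}

class InvariantViolationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'InvariantViolationError';
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof working on older targets
  }
}

// Intended to run first inside executeWithRoutingStreaming.
// Throws on hard violations; returns true if the soft tripwire fired.
function assertRoutingPreconditions(
  action: CandidateAction,
  assessment: RiskAssessment | null | undefined,
  warn: (msg: string, fields: Record<string, unknown>) => void = (msg, fields) => console.warn(msg, fields),
): boolean {
  if (assessment == null || assessment.overallTier === undefined) {
    throw new InvariantViolationError('executeWithRoutingStreaming called without RiskAssessment.overallTier');
  }
  if (action.id !== assessment.actionId) {
    throw new InvariantViolationError(
      `RiskAssessment.actionId ${assessment.actionId} does not match action.id ${action.id}`,
    );
  }
  const tripwire =
    action.confidence === 'SPECULATIVE' &&
    (assessment.overallTier === 'HIGH' || assessment.overallTier === 'CRITICAL');
  if (tripwire) {
    // Not a block: a structured warning so speculative high-risk routing is visible.
    warn('speculative action routed at high risk tier', { actionId: action.id, tier: assessment.overallTier });
  }
  return tripwire;
}
```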

### 4. Cover the predict path
Add unit test in `decision-maker.test.ts` for `whatWouldIDo`:
- Mock evaluator returns `allowed: false` for all candidates.
- Assert `whatWouldIDo` response does not recommend a blocked action — either returns the top *allowed* candidate or an explicit "no recommendation" result.
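A sketch of the predict-path behavior the two new tests would pin down: return the highest-scoring allowed candidate, or an explicit no-recommendation result. `Prediction`, `PolicyCheck`, and `predictAllowedAction` are hypothetical; how `whatWouldIDo` actually shapes its response is not specified here, and ideally it would call the same `PolicyEvaluator` the execution path uses rather than a separate check.

```typescript
interface CandidateAction {
  id: string;
  actionType: string;
}

interface ScoredCandidate {
  action: CandidateAction;
  score: number;
}

type PolicyCheck = (action: CandidateAction) => { allowed: boolean; reason?: string };

// Either the best candidate the policy layer allows, or an explicit "no recommendation".
type Prediction =
  | { kind: 'recommend'; action: CandidateAction }
  | { kind: 'none'; reason: string };

function predictAllowedAction(candidates: ScoredCandidate[], check: PolicyCheck): Prediction {
  const ordered = [...candidates].sort((a, b) => b.score - a.score);
  for (const { action } of ordered) {
    if (check(action).allowed) return { kind: 'recommend', action };
  }
  // Never fall back to a blocked candidate, even the top-scoring one.
  return { kind: 'none', reason: 'every candidate is blocked by policy' };
}
```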

### 5. Emit blocked-by-policy audit event
In `events.ts`, add a branch: if `!outcome.selectedAction && !outcome.requiresApproval`, emit `decision:blocked-by-policy` SSE event and write an `ExplanationRecord` with `escalationRationale: outcome.reasoning`. This makes the invariant observable.
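A sketch of the proposed branch. `BlockedDeps`, `handleBlockedByPolicy`, and the two-field `ExplanationRecord` are hypothetical stand-ins; only `escalationRationale`, the event name, and the `{ decisionId, reason }` payload come from this issue. Note that the `!selectedAction && !requiresApproval` condition also matches an outcome that had no candidates at all, so the reason is taken from `outcome.reasoning` rather than assumed, with a generic fallback that keeps `escalationRationale` non-empty.

```typescript
interface DecisionOutcome {
  decisionId: string;
  selectedAction: { id: string } | null;
  requiresApproval: boolean;
  reasoning: string;
}

// Hypothetical subset of ExplanationRecord; only escalationRationale comes from the issue.
interface ExplanationRecord {
  decisionId: string;
  escalationRationale: string;
}

interface BlockedDeps {
  saveExplanation(record: ExplanationRecord): void;
  emit(event: 'decision:blocked-by-policy', payload: { decisionId: string; reason: string }): void;
}

// Returns true if the outcome was handled as blocked-by-policy.
function handleBlockedByPolicy(outcome: DecisionOutcome, deps: BlockedDeps): boolean {
  if (outcome.selectedAction || outcome.requiresApproval) return false;
  // Fallback keeps escalationRationale non-empty if reasoning is blank.
  const reason = outcome.reasoning.trim() || 'no candidate action was selected';
  deps.saveExplanation({ decisionId: outcome.decisionId, escalationRationale: reason });
  deps.emit('decision:blocked-by-policy', { decisionId: outcome.decisionId, reason });
  return true;
}
```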

## Acceptance Criteria

1. E2E test creates user with `trustTier: 'confident'` + blocking policy for `email-send` → ingests matching event → response `outcome.autoExecute === false` and `outcome.selectedAction === null`.
2. Same test → `execution_plans` table queried by `decision_id` returns zero rows.
3. Same test → no `decision:step` or `decision:executed` SSE event emitted in the test window.
4. E2E test with `trustTier: 'observer'` + standard event → approval row created with status `pending` → `execution_plans` count unchanged before approval.
5. Same test → `POST /api/approvals/:id/respond` with action `approve` → within 2s, `execution_plans` row created with matching `decision_id`.
6. `ExecutionRouter.executeWithRoutingStreaming` throws `InvariantViolationError` when called with `null` or `undefined` `RiskAssessment`.
7. `ExecutionRouter.executeWithRoutingStreaming` throws when `action.id !== assessment.actionId`.
8. `whatWouldIDo` with all candidates blocked by a mock evaluator returns no recommended action (does not leak a blocked candidate).
9. When all candidates are blocked by policy, handler writes an `ExplanationRecord` with non-empty `escalationRationale` and emits `decision:blocked-by-policy` SSE event with shape `{ decisionId, reason }`.
10. `grep -rn "executeWithRoutingStreaming" apps/ packages/` — every call site is either inside an `if (outcome.autoExecute)` guard or behind an approval check. No orphan call sites.
11. All existing tests pass. New test count: +6 unit, +2 e2e.
12. PR passes `/review` before merge.
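The grep in criterion 10 is a one-time check. To keep orphan call sites from creeping back, a heuristic version could run in CI. This is a hypothetical sketch, not a proof: it flags member calls to `executeWithRoutingStreaming` that have no `outcome.autoExecute` or approval-related token within the preceding 15 lines, so flagged sites still need review and unflagged ones are not guaranteed safe. Which files to scan (for example, excluding `__tests__`) is left to the caller.

```typescript
// Tokens that count as "guarded" when they appear shortly before a call site.
// Deliberately loose: /approv/i matches approval, approve, approved, etc.
const GUARD_PATTERNS: RegExp[] = [/outcome\.autoExecute/, /approv/i];

// Returns 1-based line numbers of call sites with no guard token in the lookback window.
function unguardedCallSites(source: string, lookback = 15): number[] {
  const lines = source.split('\n');
  const flagged: number[] = [];
  lines.forEach((line, i) => {
    // Only member calls like `router.executeWithRoutingStreaming(`; skips the method definition.
    if (!/\.executeWithRoutingStreaming\s*\(/.test(line)) return;
    const context = lines.slice(Math.max(0, i - lookback), i).join('\n');
    if (!GUARD_PATTERNS.some((re) => re.test(context))) flagged.push(i + 1);
  });
  return flagged;
}
```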

## Testing Plan

| Layer | What | Count |
|-------|------|-------|
| Unit (execution-router) | Null/undefined `RiskAssessment` throws | +1 |
| Unit (execution-router) | Mismatched `actionId` throws | +1 |
| Unit (decision-maker) | `whatWouldIDo` honors policy blocks | +2 |
| Unit (explanations) | Blocked-by-policy ExplanationRecord shape | +1 |
| Unit (events handler) | Emits `decision:blocked-by-policy` branch | +1 |
| E2E | Policy blocks → no execution, no plan, no SSE | +1 |
| E2E | Approval gate → plan created only after approve | +1 |

## Effort Estimate

- E2E tests (#1, #2): ~1h (mostly wiring fixtures + SSE tailing)
- Router boundary guard + tests (#3): ~30min
- Predict-path coverage (#4): ~20min
- Blocked-by-policy audit event (#5): ~30min
- Grep sweep + fixing any orphan call sites (#10): ~15min

**Total: ~2-3h Claude Code time**

## Files Reference

| File | Change |
|------|--------|
| `apps/api/src/__tests__/e2e-api.test.ts` | Add `Policy safety kernel` describe block (+2 tests) |
| `packages/execution-router/src/execution-router.ts` | Add runtime invariant guard on entry |
| `packages/execution-router/src/__tests__/execution-router.test.ts` | Add guard tests (+2 tests) |
| `packages/decision-engine/src/__tests__/decision-maker.test.ts` | Add `whatWouldIDo` policy tests (+2 tests) |
| `apps/api/src/routes/events.ts` | Add `decision:blocked-by-policy` branch + audit emit |
| `packages/explanations/src/__tests__/explanation-generator.test.ts` | Add blocked-by-policy record test (see companion issue) |
| `packages/core/src/errors.ts` | Add `InvariantViolationError` class if not present |

## Non-Goals

- Not rewriting the policy engine or changing evaluation semantics.
- Not adding UI affordances for blocked decisions (covered separately by dashboard work).
- Not adding per-candidate reasoning traces — one `reasoning` string on `DecisionOutcome` is sufficient for this iteration.