Skip to content

explanations: add test coverage for generator, formatters, and persistence #76

@jayzalowitz

Description

@jayzalowitz

explanations: add test coverage for generator, formatters, and persistence

Context

Safety Invariant #2 (CLAUDE.md) requires every decision that results in an action (or a deliberate non-action) to produce an ExplanationRecord. The @skytwin/explanations package is the single point that produces these records — it's how SkyTwin meets its promise that the user can always understand why something happened.

The package has a working implementation (explanation-generator.ts, 361 LOC) but zero test files. There is no __tests__/ directory under packages/explanations/src/. Behavior across the three public methods (generate, formatForUser, formatForAudit) and six private helpers is not locked in. A rename, a branch rewrite, or a contract change on DecisionOutcome could silently break user-facing explanation output and nothing would fail CI.

This is not a stub-filling job — the code already produces meaningful strings. It's a coverage job: pin the current behavior so future changes are intentional.

Claude Code estimate: ~2-3h

Current State (verified 2026-04-23)

Package layout

packages/explanations/src/
  explanation-generator.ts   361 LOC, exports ExplanationGenerator, ExplanationRepositoryPort, AuditRecord
  index.ts                   5 LOC, barrel
  (no __tests__/)

No vitest config at the package level; relies on the workspace root.

Public surface

ExplanationGenerator (class):

  • generate(decision, outcome, context): Promise<ExplanationRecord> — builds the record, calls repository.save(record), returns it.
  • formatForUser(record): string — multi-line human-readable format.
  • formatForAudit(record): AuditRecord — structured object for compliance logs.

Branching that needs coverage

  • buildSummary (lines 176-210) — 4 branches: no selectedAction, autoExecute, requiresApproval, fall-through.
  • buildConfidenceReasoning (lines 256-304) — counts high/low confidence preferences, zero-preference path, single-vs-multi candidate path.
  • buildActionRationale (lines 306-321) — no-action path, autoExecute suffix, plain path.
  • buildCorrectionGuidance (lines 327-360) — autoExecute + reversible, autoExecute + non-reversible, approval path.
  • gatherEvidenceReferences (lines 212-241) — rawData.source present/absent, preferences with 0/1/many evidenceIds (slice(0,3) cap).
  • gatherPreferenceReferences (lines 243-254) — empty vs populated.

Integration risk

generate() calls repository.save(record) once and awaits before returning. No current test verifies that a repository save failure surfaces as a rejected promise vs. a swallowed error. Invariant #2 ("always log explanations") depends on this propagating.

Field shape

ExplanationRecord type lives in @skytwin/shared-types. Field naming (evidenceUsed, preferencesInvoked, confidenceReasoning, etc.) is consumed by apps/web and apps/mobile detail views — a rename ripples across apps. No test locks the contract.

Proposed Change

Add packages/explanations/src/__tests__/explanation-generator.test.ts with vitest, using an in-memory fake repository that records saves. Build a small fixtures module for DecisionObject, DecisionContext, DecisionOutcome, Preference, RiskAssessment. Cover every branch listed above, plus the two public formatters.

No implementation changes except:

  • If a test reveals a branch that's unreachable or genuinely broken, fix narrowly and note it in the PR.
  • Extract fixture builders into packages/explanations/src/__tests__/fixtures.ts to keep the test file readable.

Acceptance Criteria

  1. packages/explanations/src/__tests__/explanation-generator.test.ts exists with ≥20 passing tests covering all branches listed above.
  2. generate() — auto-executed outcome → returned record has non-empty summary, actionRationale, correctionGuidance, empty escalationRationale.
  3. generate()requiresApproval outcome → escalationRationale is non-empty and equals outcome.reasoning.
  4. generate() — no selectedActionsummary contains the word "escalated" (or equivalent), actionRationale starts with "No action was selected".
  5. generate() — calls repository.save exactly once with the built record; returns the same object reference that was saved.
  6. generate() — when repository.save rejects, the rejection propagates (not swallowed).
  7. buildSummary branches produce distinct strings for autoExecute, requiresApproval, no-action, and default fall-through.
  8. buildCorrectionGuidance — reversible auto-executed action includes "Undo this action"; non-reversible omits it and renumbers steps.
  9. gatherEvidenceReferences — caps preference evidence at 3 per preference (respects slice(0, 3)); drops rawData reference when rawData.source is missing.
  10. gatherPreferenceReferences — returns empty array when context.relevantPreferences is empty; maps 1:1 otherwise with howUsed containing the preference value.
  11. formatForUser — output contains --- Decision Explanation ---, the summary, Risk level:, and How to correct this: sections, in order.
  12. formatForUser — omits the "Why approval was needed" section when escalationRationale is undefined.
  13. formatForAuditautoExecuted is true iff escalationRationale is undefined (this test will catch the current implementation — verify it matches intent).
  14. formatForAuditevidenceCount and preferencesCount match the array lengths on the source record.
  15. Running pnpm --filter @skytwin/explanations test exits 0.
  16. Running pnpm test at the root exits 0 with no skipped or flaky tests added.
  17. PR passes /review before merge.

Testing Plan

Layer What Count
Unit generate() — three outcome shapes + record saved + save-rejects-propagates +5
Unit buildSummary — 4 branches +4
Unit buildConfidenceReasoning — zero prefs / mixed prefs / multi-candidate +3
Unit buildActionRationale — no action / auto / plain +3
Unit buildCorrectionGuidance — auto+reversible / auto+non-reversible / approval +3
Unit gatherEvidenceReferences — rawData absent / preferences evidence cap +2
Unit gatherPreferenceReferences — empty / populated +2
Unit formatForUser — contains required sections / omits escalation when absent +2
Unit formatForAuditautoExecuted derivation / count fields +2

Target: ~26 new tests, all passing.

Effort Estimate

  • Fixtures module: ~30min
  • Test file authorship: ~90min
  • Debugging any branches that don't match expected behavior: ~30min buffer

Total: ~2-3h Claude Code time

Files Reference

File Change
packages/explanations/src/__tests__/explanation-generator.test.ts New — full test suite
packages/explanations/src/__tests__/fixtures.ts New — builders for DecisionObject/Context/Outcome/Preference
packages/explanations/package.json Verify test script exists; add if missing
packages/explanations/vitest.config.ts Add only if package-local config is required

Non-Goals

  • Not rewriting explanation copy or changing user-facing strings. Pin current behavior; rewrites are a separate product task.
  • Not adding i18n, templating, or LLM-generated prose.
  • Not changing the ExplanationRecord schema in @skytwin/shared-types.
  • Not covering persistence backends — ExplanationRepositoryPort is stubbed for tests; the DB adapter has its own coverage in @skytwin/db.

Related

  • Companion issue: "Safety kernel test: prove policy denial blocks auto-execution end-to-end" — shares a fixture for blocked-by-policy ExplanationRecord shape.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions