Skip to content

feat(workflows): add execution-evidence contract with terminal-success gate#1601

Open
shhaider wants to merge 4 commits into
coleam00:devfrom
shhaider:archon/task-archon-p3-real-execution-proof
Open

feat(workflows): add execution-evidence contract with terminal-success gate#1601
shhaider wants to merge 4 commits into
coleam00:devfrom
shhaider:archon/task-archon-p3-real-execution-proof

Conversation

@shhaider

@shhaider shhaider commented May 6, 2026

Copy link
Copy Markdown

Summary

  • Problem: PR-producing workflows can finish with status='completed' based only on a self-report ("I created the PR"), with no enforced contract proving the work actually happened. Planning artifacts, fabricated commit SHAs, unpushed branches, or unreachable PR URLs all pass undetected.
  • Why it matters: Trust in autonomous workflow runs depends on a hard, machine-checkable boundary between "claimed completion" and "verified completion." Without one, downstream automation (auto-resume, run-history dashboards, governance gates) inherits unverifiable state.
  • What changed: Added a typed evidence contract (executionEvidenceSchema, evidencePolicySchema) and a three-layer validator (evidence-validator.ts) inside @archon/workflows. The DAG executor now deterministically blocks terminal completed status when the workflow declares evidence_policy.required: true and $ARTIFACTS_DIR/evidence.json is missing, malformed, or fake. On failure the run is downgraded to failed with structured issues persisted at metadata.evidence_validation.
  • What did not change (scope boundary): No new node type, no bundled YAML changes, no CLI subcommand, no DB migration, no web UI, no provider/adapter changes, no AI-driven judging, no retroactive validation of prior runs. Reality I/O (git ls-remote, gh pr view) is opt-in via verify: 'reality'; default is 'shape'.

UX Journey

Before

Workflow author        Archon executor              Operator
─────────────────      ───────────────              ────────
runs PR workflow ───▶  executes DAG nodes
                       last node completes
                       store.completeWorkflowRun()
                       status='completed' ─────────▶ sees ✅ completed
                                                     (no verification —
                                                      may be planning-only,
                                                      fake SHA, unpushed
                                                      branch, etc.)

After

Workflow author              Archon executor                          Operator
─────────────────            ────────────────                          ────────
sets evidence_policy:        executes DAG nodes
  required: true             last node completes
  verify: 'shape'            [+] reads $ARTIFACTS_DIR/evidence.json
writes evidence.json     ──▶ [+] validateEvidence(): shape → integrity → reality
                                  │
                                  ├─ ok ──▶ store.completeWorkflowRun()
                                  │        status='completed'  ──────▶ ✅ completed
                                  │
                                  └─ fail ─▶ store.failWorkflowRun(msg)
                                            updateWorkflowRun({
                                              metadata.evidence_validation: {issues}
                                            })
                                            status='failed'  ────────▶ ❌ failed +
                                                                       actionable issues

Architecture Diagram

Before

@archon/workflows
├── schemas/
│   ├── workflow.ts          (workflowBaseSchema)
│   ├── workflow-run.ts
│   └── index.ts
├── dag-executor.ts          ──▶ store.completeWorkflowRun()  [single terminal-success]
├── store.ts                 (IWorkflowStore)
└── validator.ts             (workflow-definition validator)

After

@archon/workflows
├── schemas/
│   ├── evidence.ts          [+]   executionEvidenceSchema (discriminated union on `kind`),
│   │                              evidencePolicySchema, evidenceValidationIssueSchema,
│   │                              evidenceValidationResultSchema
│   ├── workflow.ts          [~]   workflowBaseSchema gains optional `evidence_policy`
│   ├── workflow-run.ts
│   └── index.ts             [~]   re-exports new evidence schemas
├── evidence-validator.ts    [+]   validateEvidenceShape (Zod)
│                                  validateEvidenceIntegrity (cross-field)
│                                  verifyEvidenceReality (git/gh I/O, opt-in)
│                                  loadEvidenceFromArtifacts
│                                  validateEvidence (orchestrator)
├── dag-executor.ts          [~]   gates terminal-success on evidence_policy
│                                  === store.completeWorkflowRun()  (when valid)
│                                  === store.failWorkflowRun() + updateWorkflowRun(metadata)
│                                                                  (when invalid)
├── store.ts
└── validator.ts

Connection inventory:

From To Status Notes
dag-executor.ts evidence-validator.ts::validateEvidence new Called once at terminal-success boundary when evidence_policy?.required === true
evidence-validator.ts @archon/git (execFileAsync) new Reality layer only; opt-in via verify: 'reality'. Stubbed in tests via isolated mock.module() invocation
evidence-validator.ts gh CLI (subprocess) new Reality layer only; ENOENT/auth failures produce clean error issues, never throw
schemas/index.ts schemas/evidence.ts new Re-export executionEvidenceSchema, evidencePolicySchema, derived types
schemas/workflow.ts schemas/evidence.ts new workflowBaseSchema.evidence_policy: evidencePolicySchema.optional()
dag-executor.ts store.completeWorkflowRun unchanged Still the single terminal-success transition
dag-executor.ts store.failWorkflowRun unchanged Existing failure path — reused for evidence-validation failures
dag-executor.ts store.updateWorkflowRun unchanged Used to persist metadata.evidence_validation after failWorkflowRun

Label Snapshot

  • Risk: risk: low
  • Size: size: M
  • Scope: workflows
  • Module: workflows:executor, workflows:schemas

Change Metadata

  • Change type: feature
  • Primary scope: workflows

Linked Issue

  • Closes: n/a — implements ROADMAP P3-A (ROADMAP.md)
  • Related: n/a
  • Depends on: none
  • Supersedes: none

Validation Evidence (required)

bun run validate

All gates green on first invocation (no fixes required during validation):

  • check:bundledbundled-defaults.generated.ts is up to date (36 commands, 20 workflows)
  • check:bundled-skillbundled-skill.ts is up to date (21 files)
  • type-check — all 10 packages exit 0
  • lintbun x eslint . --cache --max-warnings 0 (0 errors, 0 warnings)
  • format:checkAll matched files use Prettier code style!
  • test — all per-package tests pass (sprint-relevant: 43 evidence-validator + 6 schema + 3 dag-executor integration = 52 new tests)

Test breakdown:

Test File Tests Coverage
packages/workflows/src/evidence-validator.test.ts (new) 43 All 5 groups: shape, integrity, reality, loader, orchestrator. Every Edge Case in plan covered.
packages/workflows/src/schemas.test.ts (extended) +6 executionEvidenceSchema parse (execution + planning), evidencePolicySchema defaults, workflowBaseSchema accepts policy
packages/workflows/src/dag-executor.test.ts (extended) +3 required + missing → failed + structured issues; required + valid → completed; no-policy → completed (regression guard)
  • Evidence provided: validation.md artifact (full per-package counts in /Users/syedhaider/.archon/workspaces/shhaider/Archon/artifacts/runs/4c517fe4db197edb948e3f3d4adf9927/validation.md)
  • Skipped commands: none

Security Impact (required)

  • New permissions/capabilities? No — validator only reads $ARTIFACTS_DIR/evidence.json (workflow-owned) and runs read-only git ls-remote / git cat-file -e / gh pr view when reality checks are explicitly opted in.
  • New external network calls? Yes (opt-in only)git ls-remote origin <branch> and gh pr view <url> run only when evidence_policy.verify === 'reality'. Default is 'shape' (no network). Tests stub execFileAsync via isolated mock.module().
  • Secrets/tokens handling changed? Nogh uses the operator's existing auth context; no token reading or logging in this PR.
  • File system access scope changed? No new scope — validator reads only inside $ARTIFACTS_DIR (already workflow-owned). evidence_policy.path is rejected if absolute or contains .. (path-traversal guard with explicit negative test).
  • Risk and mitigation: Reality layer's git/gh subprocesses are wrapped in try/catch; failures produce structured EvidenceValidationIssue objects (never throw). gh ENOENT / auth failure produces a clean error issue with operator hint.

Compatibility / Migration

  • Backward compatible? Yesevidence_policy is optional on workflowBaseSchema. Workflows that omit it have byte-identical behavior to before this PR (regression-tested in dag-executor.test.ts).
  • Config/env changes? No — no new environment variables, no .archon/config.yaml keys.
  • Database migration needed? No — evidence-validation issues are persisted via existing WorkflowRun.metadata (z.record(z.unknown())). No migrations/ files modified.
  • Upgrade steps: none. Existing workflows continue to work unchanged.

Human Verification (required)

What was personally validated beyond CI:

  • Verified scenarios:
    • bun run validate exits 0 from a clean checkout of this branch (no env tweaking).
    • evidence-validator.test.ts was confirmed to be isolated as its own bun test invocation in packages/workflows/package.json to prevent mock.module('@archon/git', ...) cross-pollution per CLAUDE.md mock-isolation rule.
    • bun --filter @archon/workflows type-check and full-monorepo bun run type-check both green.
    • The discoverWorkflowsWithConfig flow continues to load existing bundled workflows without evidence_policy (regression test).
  • Edge cases checked (all in evidence-validator.test.ts):
    • Empty-tree zero SHA ('0' * 40)
    • Short SHA ('abc1234')
    • Uppercase / non-hex SHA
    • Protected branches (main/master/dev/develop)
    • pushed_branch claimed but git ls-remote empty
    • pr_url claimed but gh pr view returns different headRefName or different number
    • gh ENOENT (binary missing) → clean error issue
    • evidence_policy.path absolute / .. traversal
    • evidence.json missing / malformed JSON / null / string / array / number
    • kind === 'planning' rejected by integrity layer
  • What was not verified:
    • End-to-end run of a real PR-producing workflow with evidence_policy.required: true enabled (no bundled YAML opts in yet — explicitly out of scope per plan; follow-up PR).
    • Reality layer against a live GitHub repo (covered by stubbed unit tests; live integration is a CI/staging concern beyond this PR).

Side Effects / Blast Radius (required)

  • Affected subsystems/workflows:
    • @archon/workflows (only) — schemas, validator, executor terminal-success transition.
    • No impact on @archon/core, @archon/cli, @archon/server, @archon/web, @archon/adapters, @archon/providers, @archon/git, @archon/isolation, @archon/paths (all type-check green; no exported surface touched).
    • Workflows that omit evidence_policy are unaffected (regression-guarded).
  • Potential unintended effects:
    • If a workflow author opts in to evidence_policy.required: true but does not write evidence.json, runs will fail terminally. This is the intended contract — explicitly documented in the schema and surfaced via metadata.evidence_validation.issues.
    • Tests with mock.module('@archon/git', ...) could pollute other tests if the new test file weren't isolated. Mitigated by adding bun test src/evidence-validator.test.ts as a separate invocation in package.json per CLAUDE.md.
  • Guardrails / monitoring for early detection:
    • metadata.evidence_validation is queryable from the API; failures surface as ordinary failed runs in run history.
    • Existing log events (workflow.run_failed) fire on validation failure, with structured issue payload.
    • Type-check + format + lint catch any author misuse of evidence_policy at workflow-load time.

Rollback Plan (required)

  • Fast rollback command/path: git revert 55e39055 && git push origin dev — single commit, additive only on the engine side. Reverting restores byte-identical pre-PR behavior because all bundled workflows still omit evidence_policy.
  • Feature flags or config toggles: evidence_policy.required is itself the flag. Setting it to false (or omitting the field entirely) restores prior behavior per workflow.
  • Observable failure symptoms: any spike in failed runs whose metadata.evidence_validation.issues is populated — unique signature distinguishing this gate's failures from other failure modes.

Risks and Mitigations

  • Risk: Reality I/O (git ls-remote, gh pr view) flakes in CI or sandboxed runners.
    • Mitigation: verify: 'shape' is the default. Reality is strictly opt-in. Validator tests stub execFileAsync via mock.module() in an isolated bun test invocation. ENOENT/auth/exit-non-zero produce structured error issues without throwing.
  • Risk: mock.module('@archon/git', ...) in the new test file pollutes other workflow tests in the same bun test process.
    • Mitigation: New test file is registered as its own bun test src/evidence-validator.test.ts invocation in packages/workflows/package.json. Verified by passing test run.
  • Risk: evidence_policy.path allows path traversal outside $ARTIFACTS_DIR.
    • Mitigation: Validator rejects absolute paths and .. segments before reading. Negative test in Group D (loader) covers both.
  • Risk: Plan called the terminal store method updateWorkflowRun(...) but the actual call site is completeWorkflowRun(...) / failWorkflowRun(...).
    • Mitigation: Pre-flagged in plan-confirmation.md. Implementation gates the actual completeWorkflowRun(...) call at dag-executor.ts:3134 (the single terminal-success transition); failure path uses the existing failWorkflowRun(id, error) plus a separate updateWorkflowRun(...) to persist metadata.evidence_validation. Plan intent — single-point gate, structured issues persisted to metadata.evidence_validation — fully preserved.

Plan: /Users/syedhaider/.archon/workspaces/shhaider/Archon/artifacts/runs/4c517fe4db197edb948e3f3d4adf9927/plan.md
Workflow ID: 4c517fe4db197edb948e3f3d4adf9927
ROADMAP entry: P3-A — Real-Execution Proof Validation for PR-Producing Workflows

Summary by CodeRabbit

  • New Features

    • Optional workflow-level evidence_policy that can block workflow completion until evidence passes shape/integrity and optional reality checks.
  • Tests

    • Comprehensive unit and integration tests covering evidence validation and execution gating.
  • Documentation

    • Roadmap and guides updated with evidence-gated workflows, schema examples, and failure behavior.
  • Chores

    • Validator and related utilities exposed via package exports.

shhaider added 2 commits May 6, 2026 22:53
…s gate

Add typed evidence schema (executionEvidenceSchema, evidencePolicySchema)
and a reusable validator (evidence-validator.ts) inside @archon/workflows
so PR-producing workflows can opt in to real-execution proof. The DAG
executor deterministically blocks terminal completed status when the
workflow declares evidence_policy.required: true and evidence is missing,
malformed, or fake.

- Three-layer validator: shape (Zod), integrity (cross-field, protected
  branches, zero-SHA), and reality (git/gh I/O, opt-in via verify: 'reality')
- Single-point gate at dag-executor terminal-success boundary; on failure
  downgrades run to failed and persists structured issues at
  metadata.evidence_validation
- 52 new tests (43 validator across 5 groups + 6 schema smoke tests +
  3 dag-executor integration tests including no-policy regression guard)
- Covers full Edge Cases checklist: every fake-completion shape (planning
  artifacts, short/uppercase/non-hex/zero-tree SHAs, protected branches,
  unpushed branches, http/non-GitHub PR URLs, gh mismatches, path traversal,
  ENOENT on gh)
- New evidence-validator.test.ts isolated as its own bun test invocation
  per CLAUDE.md mock-pollution rule (uses mock.module('@archon/git', ...))
- evidence_policy is workflow-level optional; omitting it preserves
  byte-identical behavior (regression-tested)

No bundled YAML changes, no DB migration, no CLI/web/provider surface
changes — engine surface only. Follow-up PR will turn the flag on for
bundled PR-producing workflows once stable.

Implements ROADMAP P3-A.
@coderabbitai

coderabbitai Bot commented May 6, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Adds an evidence-validation pipeline and an optional workflow-level evidence_policy that can block DAG completion: new Zod schemas, a three-layer validator (shape, integrity, reality), evidence loader/orchestrator, DAG executor gating, exports, tests, docs, and a roadmap entry. No breaking public API removals.

Changes

Real-Execution Evidence Validation

Layer / File(s) Summary
Schema & Type Definitions
packages/workflows/src/schemas/evidence.ts
New Zod schemas and types: executionEvidenceSchema (planning vs real-execution), evidencePolicySchema, evidenceValidationIssueSchema, evidenceValidationResultSchema, ExecutionEvidence, EvidencePolicy, EvidenceValidationIssue/Result, EXECUTION_EVIDENCE_REQUIRED_FIELDS.
Schema Integration & Export
packages/workflows/src/schemas/index.ts, packages/workflows/src/schemas/workflow.ts
Re-exports evidence schemas/types; extends workflowBaseSchema with optional evidence_policy field.
Evidence Validation Core
packages/workflows/src/evidence-validator.ts
New module implementing SHAPE (Zod parsing → field diagnostics), INTEGRITY (cross-field rules), REALITY (async git/gh checks), artifact loading, and validateEvidence orchestrator with typed results.
DAG Executor Integration
packages/workflows/src/dag-executor.ts
executeDagWorkflow now accepts optional evidence_policy; when evidence_policy.required is true it calls validateEvidence before final completion, persists metadata.evidence_validation (issues or { ok: true }), and fails the run on validation errors instead of completing.
Package Exports & Scripts
packages/workflows/package.json
Adds public exports: ./evidence-validator, ./utils/tool-formatter, ./test-utils. Updates test script to run the evidence-validator tests.
Tests
packages/workflows/src/evidence-validator.test.ts, packages/workflows/src/dag-executor.test.ts, packages/workflows/src/schemas.test.ts
Extensive unit tests for validator layers (shape/integrity/reality/load/orchestrator), DAG executor tests for evidence gating (missing/stale/valid evidence scenarios and regression guard), and schema tests for execution evidence and policy parsing/defaults.
Documentation & Roadmap
packages/docs-web/.../authoring-workflows.md, CLAUDE.md, ROADMAP.md
Docs guide added for "Evidence-Gated Workflows", CLAUDE note about evidence_policy, and new ROADMAP.md entry describing priorities and operating rules.

Sequence Diagram

sequenceDiagram
    participant WF as Workflow<br/>(DAG Executor)
    participant EV as Evidence Validator
    participant Load as Evidence Loader
    participant Shape as SHAPE Layer
    participant Integrity as INTEGRITY Layer
    participant Reality as REALITY Layer
    participant Meta as Metadata &<br/>Completion Gate

    WF->>EV: validateEvidence(artifactsDir, cwd, policy)
    
    EV->>Load: loadEvidenceFromArtifacts(path)
    Load-->>EV: evidence.json or load error
    
    EV->>Shape: validateEvidenceShape(raw)
    Shape-->>EV: validated evidence or shape issues
    
    alt Shape validation fails
        EV-->>Meta: persist issues -> metadata.evidence_validation
        Meta-->>WF: failWorkflowRun
    else Shape validation passes
        EV->>Integrity: validateEvidenceIntegrity(evidence)
        Integrity-->>EV: [] or integrity issues
        
        alt Integrity check fails
            EV-->>Meta: persist issues -> metadata.evidence_validation
            Meta-->>WF: failWorkflowRun
        else Integrity passes & policy.verify enabled
            EV->>Reality: verifyEvidenceReality(evidence, cwd)
            Reality-->>EV: [] or reality issues
            
            alt Reality check fails
                EV-->>Meta: persist issues -> metadata.evidence_validation
                Meta-->>WF: failWorkflowRun
            else All validations pass
                EV-->>Meta: persist { ok: true } -> metadata.evidence_validation
                Meta-->>WF: completeWorkflowRun
            end
        else Integrity passes & no verify
            EV-->>Meta: persist { ok: true } -> metadata.evidence_validation
            Meta-->>WF: completeWorkflowRun
        end
    end
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related PRs

  • coleam00/Archon#1403: Touches executeDagWorkflow behavior and tests; related to completion/failure gating semantics.

Suggested Reviewers

  • leex279

Poem

🐰 A rabbit reads the evidence by light,

Shape, Integrity, Reality take flight.
Git checks hum, schemas stand in line,
DAGs wait patient, metadata signs.
Roadmaps sprout — Archon hops toward the bright.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main change: adding an execution-evidence contract with a terminal-success gate to block workflow completion.
Description check ✅ Passed The description is comprehensive and follows the template structure with all required sections: Summary, UX Journey (Before/After), Architecture Diagram, Labels, Change Metadata, Linked Issues, Validation Evidence, Security Impact, Compatibility, Human Verification, Side Effects, Rollback Plan, and Risks/Mitigations.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (4)
packages/workflows/src/evidence-validator.ts (1)

304-307: ⚡ Quick win

Rename the new validation log events to the repo’s domain.action_state format.

evidence_validation_started|failed|completed does not match the required event naming convention, so these entries will be inconsistent with the rest of the workflow logs.

Proposed fix
-    'evidence_validation_started'
+    'workflow.evidence_validation_started'
@@
-    getLog().error({ issues }, 'evidence_validation_failed');
+    getLog().error({ issues }, 'workflow.evidence_validation_failed');
@@
-    getLog().error({ err, issues }, 'evidence_validation_failed');
+    getLog().error({ err, issues }, 'workflow.evidence_validation_failed');
@@
-  getLog().info({ kind: shapeResult.evidence.kind }, 'evidence_validation_completed');
+  getLog().info({ kind: shapeResult.evidence.kind }, 'workflow.evidence_validation_completed');

As per coding guidelines, "Use Pino logger factory with event naming format {domain}.{action}_{state} (e.g., workflow.step_started, isolation.create_failed); always pair _started with _completed or _failed; include context IDs and error details".

Also applies to: 319-330, 347-386

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/evidence-validator.ts` around lines 304 - 307, The log
events emitted from getLog() in evidence-validator.ts (e.g., the call that
currently logs 'evidence_validation_started') must be renamed to follow the repo
convention {domain}.{action}_{state} and include relevant context IDs and error
details; update all occurrences around the getLog().info call and the related
blocks referenced (the started/failed/completed groups around lines ~304,
~319-330, ~347-386) so names become something like evidence.validation_started
-> evidence.validation_started? (use domain "evidence" and action "validation")
— actually rename to evidence.validation_started, evidence.validation_completed
and evidence.validation_failed (pairing each started with completed/failed), add
context fields (artifactsDir, policy.path, policy.verify, and a policy or
request id) to the log payload, and when logging failures include the caught
error object in the process (or getLog().error) payload so the error details are
recorded; adjust the corresponding getLog().info/getLog().error calls and ensure
consistent naming across all three blocks.
packages/workflows/src/dag-executor.ts (1)

2493-2497: ⚡ Quick win

Inline workflow shape duplicates WorkflowBase; prefer a named interface or a Pick<> from the schema type.

The parameter type is now { name: string; nodes: readonly DagNode[]; evidence_policy?: EvidencePolicy } & WorkflowLevelOptions. As more optional fields from workflowBaseSchema get plumbed down (this PR adds evidence_policy; the next field will follow), this anonymous shape will keep drifting from the actual schema-derived type. Defining an interface DagExecutorWorkflow (or using Pick<WorkflowBase, 'name' | 'nodes' | 'evidence_policy' | …> & WorkflowLevelOptions) would keep the contract in one place and align with the project's preference for interface over inline type.

As per coding guidelines: "In TypeScript, prefer interface for defining object shapes rather than type".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.ts` around lines 2493 - 2497, The
anonymous workflow parameter type should be replaced with a named interface to
avoid drifting from the schema; define an interface (e.g. DagExecutorWorkflow)
that derives from the canonical schema type (either via "interface
DagExecutorWorkflow extends Pick<WorkflowBase, 'name' | 'nodes' |
'evidence_policy' | ...> & WorkflowLevelOptions" or by explicitly declaring the
interface and keeping it synced with workflowBaseSchema), then use that
interface in the function signature instead of the inline `{ name: string;
nodes: readonly DagNode[]; evidence_policy?: EvidencePolicy } &
WorkflowLevelOptions`; update references to the parameter type to use
DagExecutorWorkflow so future schema fields stay centralized.
packages/workflows/src/dag-executor.test.ts (2)

7032-7055: ⚡ Quick win

Use the real workflow run ID in the “valid evidence” fixture

Line 7037 hardcodes workflow_run_id: 'run-1' while the run under test is dag-evidence-run-2. Using the actual run ID makes this fixture semantically accurate and better aligned with future integrity checks.

🔧 Suggested fixture tweak
-  it('evidence_policy.required + valid evidence.json -> completeWorkflowRun', async () => {
+  it('evidence_policy.required + valid evidence.json -> completeWorkflowRun', async () => {
+    const workflowRun = makeWorkflowRun('dag-evidence-run-2');
     await writeFile(
       join(artifactsDir, 'evidence.json'),
       JSON.stringify({
         kind: 'execution',
-        workflow_run_id: 'run-1',
+        workflow_run_id: workflowRun.id,
         provider: 'claude',
         provider_run_ids: ['session-abc'],
         changed_files: ['src/foo.ts'],
         diff_command: 'git diff --stat origin/dev...HEAD',
         test_commands: ['bun test'],
         test_output_summary: '15 passed',
         commit_sha: 'a'.repeat(40),
         pushed_branch: 'feature/foo',
         pr_url: 'https://github.com/owner/repo/pull/42',
         pr_number: 42,
       })
     );
 
     const mockStore = createMockStore();
     const mockDeps = createMockDeps(mockStore);
     const platform = createMockPlatform();
-    const workflowRun = makeWorkflowRun('dag-evidence-run-2');
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.test.ts` around lines 7032 - 7055, The
evidence.json test fixture uses a hardcoded workflow_run_id ('run-1') which
doesn't match the run created for the test; update the fixture to use the actual
run ID from the test run (use the workflowRun created by
makeWorkflowRun('dag-evidence-run-2') — e.g., set workflow_run_id to
workflowRun.id or the literal 'dag-evidence-run-2') so the evidence.json aligns
with the run under test and any integrity checks that compare IDs will pass.

7016-7029: ⚡ Quick win

Assert evidence_validation.status to lock the metadata contract

These tests currently validate ok/issues but not the status field, so contract drift in metadata.evidence_validation.status can slip through undetected.

🧪 Suggested assertion additions
       expect(updates.metadata.evidence_validation.ok).toBe(false);
+      expect(updates.metadata.evidence_validation.status).toBe('failed');
       expect(Array.isArray(updates.metadata.evidence_validation.issues)).toBe(true);
       expect(updates.metadata.evidence_validation.ok).toBe(true);
+      expect(updates.metadata.evidence_validation.status).toBe('ok');

Also applies to: 7079-7091

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.test.ts` around lines 7016 - 7029, The
test currently checks evidence_validation.ok and issues but omits asserting
evidence_validation.status; update the assertions around the
mockStore.updateWorkflowRun inspection (the updateCalls/evidenceUpdate/updates
metadata block) to also assert updates.metadata.evidence_validation.status is
the expected status string (e.g., "failed") and that it's of type string, and
apply the same added assertion in the other analogous assertion block (the
similar check around the second occurrence referenced by lines 7079-7091) so the
metadata contract for evidence_validation.status is locked down.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/workflows/src/dag-executor.ts`:
- Around line 3138-3208: The evidence validation block calls validateEvidence
and then performs terminal writes (deps.store.updateWorkflowRun,
deps.store.failWorkflowRun, etc.) without re-checking run status; call the
existing skipIfStatusChanged guard again immediately after validateEvidence
returns (both when evidenceResult.valid is false and true) to abort if the run
status changed during the long reality checks. Specifically, invoke
skipIfStatusChanged('dag.skip_complete_status_changed') (or the same helper used
earlier) right after the await validateEvidence(...) line and again right after
entering each branch before any call to updateWorkflowRun, failWorkflowRun,
completeWorkflowRun, logWorkflowError, emitter.emit/unregisterRun, or
safeSendMessage to ensure you don't flip a cancelled/deleted run to a terminal
state.
- Around line 3159-3171: The current updateWorkflowRun call uses the stale
in-memory workflowRun.metadata snapshot and may clobber concurrent metadata
writes; before persisting evidence_validation, either fetch the latest metadata
from deps.store (e.g., readWorkflowRun or readWorkflowRunMetadata) and merge
evidence_validation into that fresh object, or add/use a partial-merge API on
deps.store (e.g., setWorkflowRunMetadata(workflowRun.id, { evidence_validation:
... })) so you only write the single field; update both occurrences (the block
around updateWorkflowRun at lines referencing updateWorkflowRun and the similar
block later) to use one of these approaches to avoid overwriting concurrent
metadata changes.
- Around line 3150-3153: Rename the bare logger event names to include the
domain prefix and add matching success/completed partners: change occurrences of
'evidence_validation_failed' and 'evidence_validation_metadata_persist_failed'
(the getLog().error calls that include { workflowRunId: workflowRun.id, issues:
evidenceResult.issues } and the metadata persist error call) to
'dag.evidence_validation_failed' and
'dag.evidence_validation_metadata_persist_failed', and add corresponding
success/info logs 'dag.evidence_validation_succeeded' (in the success path near
the evidence validation completion) and
'dag.evidence_validation_metadata_persist_succeeded' in the metadata persist
success path; apply the same renaming for the other occurrences noted (the
blocks around the other getLog() calls), keeping getLog().error for failures and
getLog().info for succeeded/completed events.

In `@packages/workflows/src/evidence-validator.ts`:
- Around line 149-151: The validator currently only checks that
evidence.commit_sha exists locally; update the validation in the evidence
verification flow (around the execFileAsync git checks) to bind commit_sha to
the remote branch/PR head for pushed_branch: fetch origin refs for the repo (use
execFileAsync('git', ['fetch', 'origin', pushed_branch]) or ensure
origin/pushed_branch is available), resolve the remote head ref for the claimed
pushed_branch/PR (e.g. origin/{pushed_branch} or the PR head ref), then verify
that evidence.commit_sha is either exactly that remote head or an ancestor of it
(use git rev-parse to get the remote head and git merge-base --is-ancestor or
git cat-file -e with remote_head^{commit} to test ancestry); update the checks
that currently call execFileAsync('git', ['cat-file', ...], { cwd }) (lines
around evidence.commit_sha and the block covering 166-218) to perform this
remote-bound validation and fail validation if the commit is not the remote head
or ancestor.

In `@packages/workflows/src/schemas/evidence.ts`:
- Around line 90-111: The evidencePolicySchema's path currently allows
empty/whitespace or directory-like values; update the path validator on
evidencePolicySchema to reject empty or all-whitespace strings and to reject '.'
, '..', any segment equal to '..', and absolute paths (leading '/' or Windows
drive/UNC forms) by adding a z.string().refine(...) with a clear error message;
reference the existing evidencePolicySchema and its path property and implement
the predicate to trim() and check length>0, disallow '.' or '..', disallow path
separators that would indicate a directory only or upward traversal, and
disallow absolute paths so invalid workflow definitions fail at parse time.

---

Nitpick comments:
In `@packages/workflows/src/dag-executor.test.ts`:
- Around line 7032-7055: The evidence.json test fixture uses a hardcoded
workflow_run_id ('run-1') which doesn't match the run created for the test;
update the fixture to use the actual run ID from the test run (use the
workflowRun created by makeWorkflowRun('dag-evidence-run-2') — e.g., set
workflow_run_id to workflowRun.id or the literal 'dag-evidence-run-2') so the
evidence.json aligns with the run under test and any integrity checks that
compare IDs will pass.
- Around line 7016-7029: The test currently checks evidence_validation.ok and
issues but omits asserting evidence_validation.status; update the assertions
around the mockStore.updateWorkflowRun inspection (the
updateCalls/evidenceUpdate/updates metadata block) to also assert
updates.metadata.evidence_validation.status is the expected status string (e.g.,
"failed") and that it's of type string, and apply the same added assertion in
the other analogous assertion block (the similar check around the second
occurrence referenced by lines 7079-7091) so the metadata contract for
evidence_validation.status is locked down.

In `@packages/workflows/src/dag-executor.ts`:
- Around line 2493-2497: The anonymous workflow parameter type should be
replaced with a named interface to avoid drifting from the schema; define an
interface (e.g. DagExecutorWorkflow) that derives from the canonical schema type
(either via "interface DagExecutorWorkflow extends Pick<WorkflowBase, 'name' |
'nodes' | 'evidence_policy' | ...> & WorkflowLevelOptions" or by explicitly
declaring the interface and keeping it synced with workflowBaseSchema), then use
that interface in the function signature instead of the inline `{ name: string;
nodes: readonly DagNode[]; evidence_policy?: EvidencePolicy } &
WorkflowLevelOptions`; update references to the parameter type to use
DagExecutorWorkflow so future schema fields stay centralized.

In `@packages/workflows/src/evidence-validator.ts`:
- Around line 304-307: The log events emitted from getLog() in
evidence-validator.ts (e.g., the call that currently logs
'evidence_validation_started') must be renamed to follow the repo convention
{domain}.{action}_{state} and include relevant context IDs and error details;
update all occurrences around the getLog().info call and the related blocks
referenced (the started/failed/completed groups around lines ~304, ~319-330,
~347-386) so names become something like evidence.validation_started ->
evidence.validation_started? (use domain "evidence" and action "validation") —
actually rename to evidence.validation_started, evidence.validation_completed
and evidence.validation_failed (pairing each started with completed/failed), add
context fields (artifactsDir, policy.path, policy.verify, and a policy or
request id) to the log payload, and when logging failures include the caught
error object in the process (or getLog().error) payload so the error details are
recorded; adjust the corresponding getLog().info/getLog().error calls and ensure
consistent naming across all three blocks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e22a2521-b78d-4816-b1ae-337feb88168c

📥 Commits

Reviewing files that changed from the base of the PR and between f4f2725 and 55e3905.

📒 Files selected for processing (10)
  • ROADMAP.md
  • packages/workflows/package.json
  • packages/workflows/src/dag-executor.test.ts
  • packages/workflows/src/dag-executor.ts
  • packages/workflows/src/evidence-validator.test.ts
  • packages/workflows/src/evidence-validator.ts
  • packages/workflows/src/schemas.test.ts
  • packages/workflows/src/schemas/evidence.ts
  • packages/workflows/src/schemas/index.ts
  • packages/workflows/src/schemas/workflow.ts

Comment on lines +3138 to +3208
// Real-execution proof gate (ROADMAP P3-A). When the workflow declares
// `evidence_policy.required: true`, refuse terminal `completed` unless
// $ARTIFACTS_DIR/<path> parses as ExecutionEvidence and (when verify ===
// 'reality') the claimed SHA/branch/PR are reachable. On failure, downgrade
// to `failed` with structured issues persisted at metadata.evidence_validation.
if (workflow.evidence_policy?.required === true) {
const evidenceResult = await validateEvidence({
artifactsDir,
cwd,
policy: workflow.evidence_policy,
});
if (!evidenceResult.valid) {
getLog().error(
{ workflowRunId: workflowRun.id, issues: evidenceResult.issues },
'evidence_validation_failed'
);
const issueLines = evidenceResult.issues
.map(i => ` - ${i.field} — ${i.message}${i.hint ? ` (hint: ${i.hint})` : ''}`)
.join('\n');
const failMsg = `Real-execution evidence validation failed.\n${issueLines}`;
// Persist the structured issues at metadata.evidence_validation, then mark failed.
await deps.store
.updateWorkflowRun(workflowRun.id, {
metadata: {
...workflowRun.metadata,
evidence_validation: { ok: false, issues: evidenceResult.issues },
},
})
.catch((dbErr: Error) => {
getLog().error(
{ err: dbErr, workflowRunId: workflowRun.id },
'evidence_validation_metadata_persist_failed'
);
});
await deps.store.failWorkflowRun(workflowRun.id, failMsg).catch((dbErr: Error) => {
getLog().error({ err: dbErr, workflowRunId: workflowRun.id }, 'dag_db_fail_failed');
});
await logWorkflowError(logDir, workflowRun.id, failMsg).catch((logErr: Error) => {
getLog().error(
{ err: logErr, workflowRunId: workflowRun.id },
'dag.workflow_error_log_write_failed'
);
});
const emitterForFail = getWorkflowEventEmitter();
emitterForFail.emit({
type: 'workflow_failed',
runId: workflowRun.id,
workflowName: workflow.name,
error: failMsg,
});
emitterForFail.unregisterRun(workflowRun.id);
await safeSendMessage(platform, conversationId, `❌ ${failMsg}`, {
workflowId: workflowRun.id,
});
return;
}
// Evidence valid — persist the success marker alongside completion.
await deps.store
.updateWorkflowRun(workflowRun.id, {
metadata: {
...workflowRun.metadata,
evidence_validation: { ok: true },
},
})
.catch((dbErr: Error) => {
getLog().error(
{ err: dbErr, workflowRunId: workflowRun.id },
'evidence_validation_metadata_persist_failed'
);
});
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Re-check run status after validateEvidence before terminal writes.

validateEvidence can issue git/gh subprocesses (when verify === 'reality') and may take seconds. The status gate at line 3136 (skipIfStatusChanged('dag.skip_complete_status_changed')) runs before this work, but neither the failure branch (3149-3193) nor the success branch re-checks status before calling failWorkflowRun / updateWorkflowRun / completeWorkflowRun. If the run is cancelled or deleted while reality checks are in flight, we will still flip the row to failed or completed, which contradicts the existing pattern of guarding terminal writes with skipIfStatusChanged.

Consider re-running skipIfStatusChanged immediately after validateEvidence returns, on both branches.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.ts` around lines 3138 - 3208, The
evidence validation block calls validateEvidence and then performs terminal
writes (deps.store.updateWorkflowRun, deps.store.failWorkflowRun, etc.) without
re-checking run status; call the existing skipIfStatusChanged guard again
immediately after validateEvidence returns (both when evidenceResult.valid is
false and true) to abort if the run status changed during the long reality
checks. Specifically, invoke
skipIfStatusChanged('dag.skip_complete_status_changed') (or the same helper used
earlier) right after the await validateEvidence(...) line and again right after
entering each branch before any call to updateWorkflowRun, failWorkflowRun,
completeWorkflowRun, logWorkflowError, emitter.emit/unregisterRun, or
safeSendMessage to ensure you don't flip a cancelled/deleted run to a terminal
state.

Comment on lines +3150 to +3153
getLog().error(
{ workflowRunId: workflowRun.id, issues: evidenceResult.issues },
'evidence_validation_failed'
);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Logger event names should follow the {domain}.{action}_{state} convention.

evidence_validation_failed and evidence_validation_metadata_persist_failed are missing a domain prefix and are inconsistent with sibling events in this file like dag.stop_detected_during_streaming, dag.node_budget_cap_exceeded, and workflow.event_persist_failed. Suggest renaming to e.g. dag.evidence_validation_failed and dag.evidence_validation_metadata_persist_failed (and pairing the success path with a matching dag.evidence_validation_succeeded info log on line ~3194 so _failed has its _completed/_succeeded partner).

As per coding guidelines: "Use Pino logger factory with event naming format {domain}.{action}_{state} (e.g., workflow.step_started, isolation.create_failed); always pair _started with _completed or _failed".

Also applies to: 3166-3170, 3202-3206

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.ts` around lines 3150 - 3153, Rename the
bare logger event names to include the domain prefix and add matching
success/completed partners: change occurrences of 'evidence_validation_failed'
and 'evidence_validation_metadata_persist_failed' (the getLog().error calls that
include { workflowRunId: workflowRun.id, issues: evidenceResult.issues } and the
metadata persist error call) to 'dag.evidence_validation_failed' and
'dag.evidence_validation_metadata_persist_failed', and add corresponding
success/info logs 'dag.evidence_validation_succeeded' (in the success path near
the evidence validation completion) and
'dag.evidence_validation_metadata_persist_succeeded' in the metadata persist
success path; apply the same renaming for the other occurrences noted (the
blocks around the other getLog() calls), keeping getLog().error for failures and
getLog().info for succeeded/completed events.

Comment on lines +3159 to +3171
await deps.store
.updateWorkflowRun(workflowRun.id, {
metadata: {
...workflowRun.metadata,
evidence_validation: { ok: false, issues: evidenceResult.issues },
},
})
.catch((dbErr: Error) => {
getLog().error(
{ err: dbErr, workflowRunId: workflowRun.id },
'evidence_validation_metadata_persist_failed'
);
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale workflowRun.metadata snapshot can clobber concurrent metadata writes.

Both updateWorkflowRun calls do metadata: { ...workflowRun.metadata, evidence_validation: ... }, but workflowRun.metadata is the in-memory snapshot captured at executor start. Anything that wrote to metadata during the run (heartbeats, approval/loop gate state cleanup, future writers) is silently overwritten when this spread is persisted. Today the gate only runs on the all-success / not-paused path so this is mostly latent, but it is a fragile contract — adding any future per-node metadata writer will start losing fields here.

Safer options:

  • Read the freshest WorkflowRun (or just its metadata) from deps.store immediately before this update, or
  • Have deps.store expose a partial-merge update (e.g., setWorkflowRunMetadata(id, { evidence_validation: ... })) so callers do not have to round-trip the whole object.

Also applies to: 3194-3207

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.ts` around lines 3159 - 3171, The current
updateWorkflowRun call uses the stale in-memory workflowRun.metadata snapshot
and may clobber concurrent metadata writes; before persisting
evidence_validation, either fetch the latest metadata from deps.store (e.g.,
readWorkflowRun or readWorkflowRunMetadata) and merge evidence_validation into
that fresh object, or add/use a partial-merge API on deps.store (e.g.,
setWorkflowRunMetadata(workflowRun.id, { evidence_validation: ... })) so you
only write the single field; update both occurrences (the block around
updateWorkflowRun at lines referencing updateWorkflowRun and the similar block
later) to use one of these approaches to avoid overwriting concurrent metadata
changes.

Comment thread packages/workflows/src/evidence-validator.ts
Comment thread packages/workflows/src/schemas/evidence.ts Outdated
@shhaider

shhaider commented May 6, 2026

Copy link
Copy Markdown
Author

🔍 Comprehensive PR Review

PR: #1601 — feat(workflows): add execution-evidence contract with terminal-success gate
Reviewed by: 5 specialized agents (code-review, error-handling, test-coverage, comment-quality, docs-impact)
Date: 2026-05-06


Summary

Three-layer evidence validator and terminal-success gate are well-designed and well-tested (49 new tests; "never throws" contract honored across all entry points; mock isolation correctly registered). Five reviewers converged on 0 CRITICAL issues and 3 HIGH items that warrant attention before this PR merges or — at the latest — before any follow-up PR flips evidence_policy.required: true on a bundled workflow.

Verdict: REQUEST_CHANGES

Severity Count
🔴 CRITICAL 0
🟠 HIGH 3
🟡 MEDIUM 6
🟢 LOW 12

🟠 High Issues

HIGH-1 — Reality layer doesn't bind commit_sha to PR head; fabricated evidence can pass

📍 packages/workflows/src/evidence-validator.ts:131-241

The reality layer validates SHA reachability, branch existence, and PR metadata independently. A fabricated evidence.json can pair any locally-reachable SHA (git rev-parse HEAD) with any real branch and any real PR URL and pass verify: 'reality'. (Also flagged by CodeRabbit.)

View fix
// Bind SHA to branch tip — ls-remote already returns the SHA, just use it:
const remoteSha = trimmed.split(/\s+/)[0];
if (remoteSha !== evidence.commit_sha) {
  issues.push({
    level: 'error',
    field: 'commit_sha',
    message: `commit_sha=${evidence.commit_sha} does not match origin/${evidence.pushed_branch} tip=${remoteSha}`,
    hint: 'evidence.json claims a commit that is not the tip of pushed_branch on origin',
  });
}

// Add headRefOid to gh pr view --json query and assert equality:
const { stdout } = await execFileAsync(
  'gh', ['pr', 'view', evidence.pr_url, '--json', 'headRefName,number,headRefOid'], { cwd },
);
if (parsedFull.headRefOid !== evidence.commit_sha) {
  issues.push({
    level: 'error',
    field: 'commit_sha',
    message: `gh pr view headRefOid='${String(parsedFull.headRefOid)}' does not match commit_sha='${evidence.commit_sha}'`,
    hint: 'pr_url points to a PR whose HEAD is a different commit',
  });
}

Also add reality-layer test cases for commit_sha != branch tip and commit_sha != PR headRefOid.

HIGH-2 — CLAUDE.md "Workflows" bullet list omits evidence_policy

📍 CLAUDE.md:709-721

CLAUDE.md is the in-repo authoritative summary of engine surface. Add one bullet to the Workflows section near the existing interactive: true bullet describing the new field.

HIGH-3 — authoring-workflows.md missing evidence_policy reference + evidence.json contract

📍 packages/docs-web/src/content/docs/guides/authoring-workflows.md

The user-facing guide lists every other workflow-level field with examples. The 11-required-field evidence.json shape, both verify modes, and the gh PATH/auth requirement are undocumented for users. Recommend adding a ## Evidence-Gated Workflows section with schema table + worked example.


🟡 Medium Issues (Auto-fixing in next step)

MED-1 — Log-event names violate {domain}.{action}_{state} convention

📍 dag-executor.ts:3152, 3169, 3205 + evidence-validator.ts:306, 319, 330, 347, 359, 366, 373, 381, 386

Events use evidence_validation_failed (no domain) and evidence_validation_metadata_persist_failed (3 segments — unparseable). Fix: prefix with workflow. and split the 3-segment ones to workflow.evidence_metadata_persist_failed. Pure mechanical rename. Tagged [must-fix] by code-review. Also flagged by CodeRabbit.

MED-2 — Stale workflowRun.metadata snapshot will clobber future concurrent writers

📍 dag-executor.ts:3159-3165, 3195-3201

Both terminal updateWorkflowRun calls do metadata: { ...workflowRun.metadata, ... } from an in-memory snapshot that's never re-fetched. No present-day bug, but a foot-gun the moment a heartbeat or hook handler is added. Recommend re-fetch via getWorkflowRun(id) before the spread.

MED-3 — gh pr view non-JSON output drops parser detail

📍 evidence-validator.ts:196-205

When gh succeeds (zero exit) but emits non-JSON (auth banner, paginator dump), the catch swallows parseErr.message and the actual stdout. Operators must reproduce externally to debug. Existing precedent at validator.ts:387 does include err.message.

MED-4 — Success-path metadata-persist failure is silently swallowed

📍 dag-executor.ts:3194-3207

The success-path evidence_validation: { ok: true } write is wrapped in .catch(log) and falls through to completeWorkflowRun. A transient DB error yields a completed run without the proof-of-validation marker — the platform's P3-A guarantee then depends on a log file. Recommend dropping the .catch on the success path; let DB errors propagate.

MED-5 — loadEvidenceFromArtifacts non-ENOENT path is untested

📍 evidence-validator.ts:272 (re-throw) + :336-349 (orchestrator catch)

EACCES, EISDIR, ENAMETOOLONG all hit an unexercised branch. One real-fs test (mkdir at evidence.json path → EISDIR) closes the gap with no new mocks.

MED-6 — git ls-remote rejection branch is untested

📍 evidence-validator.ts:179-186

Most likely real-world failure mode in CI/sandboxed environments (private repo without GH_TOKEN, network unreachable, deleted origin). Mirror the existing cat-file rejection test exactly.


🟢 Low Issues

View 12 low-priority suggestions
# Issue Location Recommendation
L1 evidence_policy.path accepts '', '.', dir-like strings schemas/evidence.ts:111 .trim().min(1).refine(...) — auto-fixable
L2 level: 'warning' enum has no producer (YAGNI) schemas/evidence.ts:121 Drop until producer exists
L3 EvidenceValidationIssue overlaps ValidationIssue (Rule of Three not yet hit) schemas/evidence.ts:120 Leave as-is
L4 Race window after slow verify: 'reality' I/O — no skipIfStatusChanged re-check dag-executor.ts:3135-3208 Required before bundled verify: reality
L5 e as Error & { stderr?: string } cast fragile to non-Error rejections evidence-validator.ts:152, 179, 221 instanceof Error narrowing
L6 gh non-ENOENT (auth-failure) branch untested evidence-validator.ts:230-237 One test
L7 Empty test_commands: [] doc-only-PR contract not asserted schemas/evidence.ts:57 One positive shape test
L8 DAG-executor .catch swallow paths untested dag-executor.ts:3158, 3168, 3171, 3197 One test
L9 verify: 'reality' integration not e2e-tested through dag-executor dag-executor.test.ts Inferred only
L10 evidence_policy.required: false no-op path untested dag-executor.ts:3140 One test
L11 Type-narrowing comment after early return is redundant evidence-validator.ts:91 Remove
L12 required JSDoc says "Defaults to false" but schema is optional(), no .default() schemas/evidence.ts:91-95 Tighten wording

✅ What's Good

  • Three-layer separation (shape → integrity → reality) is genuinely good design
  • "Never throws" contract uniformly enforced across all 5 entry points
  • Discriminated unions used correctly (kind, found, valid)
  • EXECUTION_EVIDENCE_REQUIRED_FIELDS derived from schema shape — cannot drift
  • Path-traversal defense covers absolute and .. vectors before any I/O
  • Mock isolation correctly registered as own bun test invocation
  • Regression guard test for the no-policy path
  • Plan deviations pre-flagged in implementation artifacts — no surprises
  • gh ENOENT vs auth-failure distinguished with operator-actionable hints
  • Comments are sparse and earn their place (almost all explain WHY)
  • Layered test architecture mirrors layered code (Group A/B/C/D/E ↔ entry points)
  • Schema conventions followed (camelCase, @hono/zod-openapi, z.infer, schemas/)
  • ROADMAP.md P3 entry well-scoped with explicit acceptance criteria

📋 Suggested Follow-up Issues

Issue Title Priority
Bind commit_sha to PR head and branch tip in verify: reality P1 — required before bundled flag flip
Document evidence_policy in user-facing authoring guide P1
Add mergeWorkflowRunMetadata to IWorkflowStore for transactional merges P2
Tighten log-event naming convention across evidence validator + gate P2
Test coverage gaps (ls-remote, EISDIR, gh auth, DB swallow, required:false) P3
Operator troubleshooting note for metadata.evidence_validation P3 (defer to bundled-flag PR)
End-to-end reality-mode integration test through dag-executor P3

Next Steps

  1. Auto-fix step will address 5 mechanical issues (MED-1, MED-3, MED-4, LOW-1, LOW-2)
  2. 📝 Review HIGH items above — HIGH-1 needs new test cases; HIGH-2/3 are scope decisions (this PR vs follow-up)
  3. 🚧 Before any bundled workflow flips evidence_policy.required: true, HIGH-1 and LOW-4 (race-window re-check) MUST land

Reviewed by Archon comprehensive-pr-review workflow
Artifacts: ~/.archon/workspaces/shhaider/Archon/artifacts/runs/4c517fe4db197edb948e3f3d4adf9927/review/

Fixes applied:
- HIGH-1: Bind commit_sha to PR head and branch tip in verify: 'reality'
  - ls-remote stdout SHA must equal evidence.commit_sha (was discarded)
  - gh pr view now queries headRefOid; assert it equals evidence.commit_sha
  - Closes the gap where any locally-reachable SHA could pair with any real PR
- HIGH-2: Add evidence_policy bullet to CLAUDE.md "Workflows" list
- HIGH-3: Add evidence_policy: to authoring-workflows.md YAML example
  and a new "## Evidence-Gated Workflows" section with schema, verify modes,
  worked example, and failure shape
- MED-1: Rename log events to {domain}.{action}_{state} convention
  (workflow.evidence_validation_started/_failed/_completed,
   workflow.evidence_metadata_persist_failed)
- MED-3: gh non-JSON output now includes parser err.message and a 200-char
  stdout preview hint (matches validator.ts:387 precedent)
- MED-4: Drop .catch on success-path metadata write — DB errors now
  propagate to the outer executor catch instead of silently completing a
  run without provable validation
- LOW-1: Tighten evidence_policy.path with .trim().min(1).refine(...)
  to reject '', '.', and trailing-slash dir-like values at parse time
- LOW-2: Drop 'warning' from level enum until a producer exists (YAGNI)

Tests added (evidence-validator.test.ts Group C):
- ls-remote tip SHA mismatches commit_sha → commit_sha error
- gh headRefOid mismatches commit_sha → commit_sha error
Existing reality tests updated to include headRefOid in gh fixtures.

Validation: type-check, lint, format, tests all pass (bun run validate).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shhaider

shhaider commented May 6, 2026

Copy link
Copy Markdown
Author

⚡ Auto-Fix Report

Status: COMPLETE
Pushed: ✅ Changes pushed to PR (commit 83dcb4b6)


Fixes Applied

Severity Fixed Skipped
🔴 CRITICAL 0 0 (none in review)
🟠 HIGH 3 0
🟡 MEDIUM (auto-fix scope) 3 1
⚪ LOW (auto-fix scope) 2

What Was Fixed

  • HIGH-1: Reality layer doesn't bind commit_sha to PR head (evidence-validator.ts:131-241) — ls-remote stdout SHA now compared to evidence.commit_sha; gh pr view query extended with headRefOid and asserted equal to commit_sha. Closes the fabricated-evidence gap.
  • HIGH-2: CLAUDE.md missing evidence_policy field (CLAUDE.md:715-716) — added engine-surface bullet.
  • HIGH-3: authoring-workflows.md missing evidence_policy section (packages/docs-web/.../authoring-workflows.md) — added evidence_policy: to YAML example AND a new ## Evidence-Gated Workflows section with schema table, verify modes, worked example, and failure shape.
  • MED-1: Log events violate {domain}.{action}_{state} — all 12 sites prefixed with workflow.; the 3-segment one split to workflow.evidence_metadata_persist_failed.
  • MED-3: gh pr view non-JSON detail dropped — catch now includes parseErr.message and 200-char stdout preview hint.
  • MED-4: Success-path metadata persist swallowed — dropped .catch on success branch; DB errors propagate to outer executor catch instead of completing without provable validation.
  • LOW-1: path schema accepts empty/dir-like values — tightened with .trim().min(1).refine(...).
  • LOW-2: level: 'warning' enum has no producer (YAGNI) — dropped to z.enum(['error']).

Tests Added

  • packages/workflows/src/evidence-validator.test.ts (Group C):
    • negative: ls-remote tip SHA mismatches commit_sha → commit_sha error
    • negative: gh headRefOid mismatches commit_sha → commit_sha error
  • 6 existing reality-layer tests updated to include headRefOid in gh JSON fixtures so they exercise the new SHA binding.

Test count for evidence-validator.test.ts: 43 → 45 (all passing).


❌ Not Fixed (Deferred per Review Recommendation)

  • MED-2 stale workflowRun.metadata snapshot (dag-executor.ts:3159-3201) — reviewer notes the pattern is wrong but no current sibling writer exists, so this is future-proofing. Consolidated review explicitly recommends "track Option B as a follow-up roadmap entry." Suggested follow-up issue captures the long-term answer (mergeWorkflowRunMetadata on IWorkflowStore).

  • MED-5, MED-6, LOW-3 through LOW-13 — test-coverage gaps and verify: 'reality' race-window items deferred to a focused test-debt issue and the bundled-flag-flip follow-up PR (per review note: "needed before any bundled workflow flips verify: 'reality'").


📋 Suggested Follow-up Issues

  1. "Add mergeWorkflowRunMetadata to IWorkflowStore for transactional metadata merges" (P2) — MED-2
  2. "Test coverage gaps in evidence-validator (ls-remote rejection, EISDIR/EACCES load, gh auth, DB swallow paths, required: false no-op, reality-mode integration via dag-executor)" (P3) — MED-5, MED-6, LOW-6/8/10
  3. "Reality-mode race-window re-check + operator troubleshooting note" (P1, required before bundled-flag flip) — LOW-4, LOW-13
  4. "Bind commit_sha to PR head and branch tip in verify: reality" — ✅ DONE in this commit

Validation

bun run check:bundled | ✅ check:bundled-skill | ✅ type-check (10 packages) | ✅ lint --max-warnings 0 | ✅ format:check | ✅ bun run test

bun run validate exits 0 (all 6 gates green).


Auto-fixed by Archon comprehensive-pr-review workflow
Fixes pushed to branch archon/task-archon-p3-real-execution-proof

@shhaider

shhaider commented May 6, 2026

Copy link
Copy Markdown
Author

🎯 Workflow Summary

Plan: ROADMAP P3-A — Real-Execution Proof Validation for PR-Producing Workflows
Workflow ID: 4c517fe4db197edb948e3f3d4adf9927
Status: ✅ Implementation complete, review fixes applied, PR ready for human review


Implementation vs Plan

Metric Planned Actual
Files created 3 3
Files updated 6 7 (incl. ROADMAP.md whitespace)
Tests added ≥30 negative cases 52 (43 validator + 6 schema + 3 executor) → +2 from review = 54
Deviations 2 (both pre-flagged, intent preserved)
📋 Deviations from Plan (2)

Deviation 1 — completeWorkflowRun, not updateWorkflowRun
Plan referenced the terminal-success transition by an approximate name. Actual store method is completeWorkflowRun at dag-executor.ts:3134; failure path is failWorkflowRun. Gate inserted at the single correct boundary; structured issues persisted at metadata.evidence_validation via a separate updateWorkflowRun call (since failWorkflowRun(id, error) only takes a string). Pre-flagged in plan-confirmation.md.

Deviation 2 — ROADMAP.md prettier reformat
File added by prior commit 661e72c9 failed prettier --check; that blocked bun run validate repo-wide. Ran prettier --write ROADMAP.md (whitespace-only — no content change). Pre-existing tech debt; not new content.


Review Summary

5 reviewer agents ran (code-review, error-handling, test-coverage, comment-quality, docs-impact). 24 raw findings → 21 distinct after de-dup.

Severity Found Fixed Remaining
CRITICAL 0 0 0
HIGH 3 3 0
MEDIUM 6 3 (auto-fix) 3 (deferred)
LOW 13 2 (auto-fix) 11 (deferred)

HIGH fixes landed in commit 83dcb4b6:

  • HIGH-1 (security): Reality layer now binds commit_sha to PR head (headRefOid) and ls-remote tip — fabricated evidence can no longer pass verify: 'reality'.
  • HIGH-2: CLAUDE.md workflow-engine bullet list now documents evidence_policy.
  • HIGH-3: authoring-workflows.md adds ## Evidence-Gated Workflows section with schema, modes, worked example, failure shape.

Auto-fixed MEDIUM/LOW: log-event naming convention (workflow.* prefix, ~12 sites), gh non-JSON detail enrichment, success-path metadata-persist no longer silently swallowed, evidence_policy.path schema tightened (.trim().min(1).refine(...)), unused 'warning' enum dropped.


📋 Suggested Follow-Up Issues

# Title Priority Source
1 Reality-mode race-window re-check + operator troubleshooting note for metadata.evidence_validation P1 LOW-4, LOW-13 — required before any bundled workflow flips verify: 'reality'
2 Flip evidence_policy.required: true on bundled workflows (archon-fix-github-issue, archon-idea-to-pr, archon-plan-to-pr) P1 NOT Building — explicitly deferred until engine surface is stable
3 Add mergeWorkflowRunMetadata to IWorkflowStore for transactional metadata merges P2 MED-2 — latent foot-gun once a sibling writer touches metadata
4 Test coverage: ls-remote rejection, EISDIR/EACCES load, gh auth-failure non-ENOENT, DB swallow paths, required: false no-op P3 MED-5, MED-6, LOW-6, LOW-8, LOW-10
5 Reality-mode integration test through dag-executor P3 LOW-9

Reply with: @archon create follow-up issues to file these (or pick a subset).


📝 Documentation Updates

Already shipped in 83dcb4b6 (HIGH-2, HIGH-3). No additional docs gaps blocking merge.

Operator-facing troubleshooting note for metadata.evidence_validation is intentionally deferred to the bundled-flag-flip PR — see follow-up issue #1 above.


ℹ️ Deferred Items (NOT Building — intentionally excluded)

These were excluded from this PR's scope by design. Each is its own follow-up surface; do not flag as bugs:

  • No new node type — validator runs once at terminal-success boundary; not encoded as evidence: node.
  • No bundled YAML changesarchon-fix-github-issue.yaml, archon-idea-to-pr.yaml, archon-plan-to-pr.yaml not modified. Engine surface ships first; flag flip lands separately (follow-up updated code to use locally hosted llama LLM, nomic-embed-text model. #2).
  • No CLI surface changes — no archon workflow validate-evidence subcommand; no --json output changes.
  • No DB schema migration — evidence persists via existing WorkflowRun.metadata (z.record(z.unknown())).
  • No web UI changesfailed runs with metadata.evidence_validation render like any other failure.
  • No provider/adapter changes — evidence is workflow-author-supplied; SDK harness does not auto-collect commit SHAs in this PR.
  • No AI-driven evidence "judging" — deterministic validation only (Zod + git/gh I/O).
  • No retroactive validation — existing completed runs are not re-checked.
  • No bundled-defaults.generated.ts regeneration — no bundled YAML changes.
  • Reality I/O is opt-in — default is verify: 'shape' (hermetic). Operators flip per workflow.
  • test_commands === [] warning-level flag — out of scope.

Validation Evidence

bun run validate exits 0 on 83dcb4b6:

Check Status
check:bundled ✅ up to date (36 commands, 20 workflows)
check:bundled-skill ✅ up to date (21 files)
type-check ✅ all 10 packages
lint --max-warnings 0 ✅ 0 errors / 0 warnings
format:check ✅ all matched files
test ✅ 522 tests passing across @archon/workflows (11 isolated batches), all packages green

Artifacts: ~/.archon/workspaces/shhaider/Archon/artifacts/runs/4c517fe4db197edb948e3f3d4adf9927/
Final commit: 83dcb4b6 fix(workflows): address review findings on execution-evidence gate

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
packages/workflows/src/dag-executor.ts (2)

3159-3165: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Avoid merging evidence metadata into the stale workflowRun.metadata snapshot.

Both writes spread the executor-start copy of workflowRun.metadata, so any metadata written during the run can be lost when evidence_validation is persisted. Re-read the latest metadata or use a partial/transactional merge API here.

Also applies to: 3199-3203

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.ts` around lines 3159 - 3165, The update
is merging into the stale executor-start snapshot by spreading
workflowRun.metadata when calling deps.store.updateWorkflowRun to set
metadata.evidence_validation; instead re-fetch the latest metadata (or use your
store's partial/transactional merge API) and merge evidence_validation into that
latest object before calling deps.store.updateWorkflowRun so concurrent metadata
writes aren't lost; apply the same change for the other occurrence that updates
metadata at the block around lines referencing deps.store.updateWorkflowRun and
evidence_validation (where evidenceResult.issues is used).

3144-3205: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Re-check run status after validateEvidence() returns.

verify: 'reality' can spend seconds in git/gh. If the run is cancelled or deleted during that window, this block can still flip it to failed or completed because the last skipIfStatusChanged() ran before validation started. Please guard again before either terminal write.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.ts` around lines 3144 - 3205, After
validateEvidence() returns, re-check the run status before doing any terminal
writes: call the existing skipIfStatusChanged(workflowRun.id) (or if
unavailable, re-fetch the run via deps.store.getWorkflowRun(workflowRun.id) and
ensure it is still running) and abort/return if the run was
cancelled/deleted/changed; only then proceed to call
deps.store.updateWorkflowRun(... evidence_validation ...) or
deps.store.failWorkflowRun(...) and to emit
events/safeSendMessage/logWorkflowError. Ensure this check is applied both on
the failure path (before persisting failure and sending notifications) and on
the success path (before persisting the success marker).
🧹 Nitpick comments (1)
packages/workflows/src/evidence-validator.test.ts (1)

567-632: ⚡ Quick win

Please cover the non-ENOENT loader branches here.

validateEvidence() has dedicated handling for read failures like EISDIR/EACCES, but this suite only exercises missing files and malformed JSON. A couple of focused regression tests for those branches would protect the evidence_policy.path diagnostics on a changed critical path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/evidence-validator.test.ts` around lines 567 - 632,
Add two tests to evidence-validator.test.ts that exercise the non-ENOENT
read-failure branches in validateEvidence: one that creates a directory at the
policy path (e.g., mkdir join(testDir, 'evidence.json')) and calls
validateEvidence(...) with policy.path set to that path to trigger an EISDIR
read error, and another that simulates an EACCES read error (either by creating
a file and chmod 0 on it or by stubbing fs.readFile to throw an { code: 'EACCES'
} error) and calling validateEvidence(...) with policy.path pointing to that
file; assert result.valid is false and that result.issues includes an issue with
field 'evidence_policy.path' for each case. Ensure you use the existing
helpers/testDir setup and reference validateEvidence and
loadEvidenceFromArtifacts in the new tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/docs-web/src/content/docs/guides/authoring-workflows.md`:
- Around line 796-815: The example under id "write-evidence" uses hardcoded
origin/dev and undocumented vars (${SESSION_ID}, ${PR_URL}, ${PR_NUMBER});
change it to use the documented/base branch variable and the outputs from the
open-pr step/job: replace origin/dev with $BASE_BRANCH (or
${BASE_BRANCH:-origin/main}) and replace ${SESSION_ID}, ${PR_URL}, ${PR_NUMBER}
with the explicit open-pr outputs you document and reference (e.g. the job/step
outputs such as $OPEN_PR_SESSION_ID, $OPEN_PR_PR_URL, $OPEN_PR_PR_NUMBER or the
appropriate GitHub Actions expression form like ${{
jobs.open-pr.outputs.session_id }}), and update the surrounding text to document
those variables so the snippet is copy-pastable and portable.

In `@packages/workflows/src/evidence-validator.ts`:
- Around line 315-322: The schema requires workflow_run_id to match the active
run but the ValidateEvidenceArgs and validateEvidence() never receive or check
it; update the ValidateEvidenceArgs interface to include a workflowRunId (or
workflow_run_id) field and then, at the start of validateEvidence(), compare the
incoming evidence.workflow_run_id to the provided
ValidateEvidenceArgs.workflowRunId (or workflow_run_id) and throw/reject on
mismatch before proceeding (use the existing error handling pattern); reference
ValidateEvidenceArgs and validateEvidence() when making these changes and ensure
the error message clearly states the workflow run id mismatch.

---

Duplicate comments:
In `@packages/workflows/src/dag-executor.ts`:
- Around line 3159-3165: The update is merging into the stale executor-start
snapshot by spreading workflowRun.metadata when calling
deps.store.updateWorkflowRun to set metadata.evidence_validation; instead
re-fetch the latest metadata (or use your store's partial/transactional merge
API) and merge evidence_validation into that latest object before calling
deps.store.updateWorkflowRun so concurrent metadata writes aren't lost; apply
the same change for the other occurrence that updates metadata at the block
around lines referencing deps.store.updateWorkflowRun and evidence_validation
(where evidenceResult.issues is used).
- Around line 3144-3205: After validateEvidence() returns, re-check the run
status before doing any terminal writes: call the existing
skipIfStatusChanged(workflowRun.id) (or if unavailable, re-fetch the run via
deps.store.getWorkflowRun(workflowRun.id) and ensure it is still running) and
abort/return if the run was cancelled/deleted/changed; only then proceed to call
deps.store.updateWorkflowRun(... evidence_validation ...) or
deps.store.failWorkflowRun(...) and to emit
events/safeSendMessage/logWorkflowError. Ensure this check is applied both on
the failure path (before persisting failure and sending notifications) and on
the success path (before persisting the success marker).

---

Nitpick comments:
In `@packages/workflows/src/evidence-validator.test.ts`:
- Around line 567-632: Add two tests to evidence-validator.test.ts that exercise
the non-ENOENT read-failure branches in validateEvidence: one that creates a
directory at the policy path (e.g., mkdir join(testDir, 'evidence.json')) and
calls validateEvidence(...) with policy.path set to that path to trigger an
EISDIR read error, and another that simulates an EACCES read error (either by
creating a file and chmod 0 on it or by stubbing fs.readFile to throw an { code:
'EACCES' } error) and calling validateEvidence(...) with policy.path pointing to
that file; assert result.valid is false and that result.issues includes an issue
with field 'evidence_policy.path' for each case. Ensure you use the existing
helpers/testDir setup and reference validateEvidence and
loadEvidenceFromArtifacts in the new tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6b1fdfae-ad23-415c-906e-999063d6e8bb

📥 Commits

Reviewing files that changed from the base of the PR and between 55e3905 and 83dcb4b.

📒 Files selected for processing (6)
  • CLAUDE.md
  • packages/docs-web/src/content/docs/guides/authoring-workflows.md
  • packages/workflows/src/dag-executor.ts
  • packages/workflows/src/evidence-validator.test.ts
  • packages/workflows/src/evidence-validator.ts
  • packages/workflows/src/schemas/evidence.ts
✅ Files skipped from review due to trivial changes (1)
  • CLAUDE.md

Comment on lines +796 to +815
- id: write-evidence
bash: |
cat > "$ARTIFACTS_DIR/evidence.json" <<EOF
{
"kind": "execution",
"workflow_run_id": "$WORKFLOW_ID",
"provider": "claude",
"provider_run_ids": ["${SESSION_ID}"],
"changed_files": $(git diff --name-only origin/dev...HEAD | jq -R -s -c 'split("\n") | map(select(. != ""))'),
"diff_command": "git diff --stat origin/dev...HEAD",
"test_commands": ["bun test"],
"test_output_summary": "all tests passed",
"commit_sha": "$(git rev-parse HEAD)",
"pushed_branch": "$(git rev-parse --abbrev-ref HEAD)",
"pr_url": "${PR_URL}",
"pr_number": ${PR_NUMBER}
}
EOF
depends_on: [open-pr]
```

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make this example use documented variables and a portable base branch.

Right now the snippet hardcodes origin/dev and relies on ${SESSION_ID}, ${PR_URL}, and ${PR_NUMBER} without introducing them anywhere in this guide. As written, it's easy to copy-paste into a workflow that cannot actually produce a valid evidence.json. Please switch the example to documented values like $BASE_BRANCH / $nodeId.output..., or document those variables alongside the snippet.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/docs-web/src/content/docs/guides/authoring-workflows.md` around
lines 796 - 815, The example under id "write-evidence" uses hardcoded origin/dev
and undocumented vars (${SESSION_ID}, ${PR_URL}, ${PR_NUMBER}); change it to use
the documented/base branch variable and the outputs from the open-pr step/job:
replace origin/dev with $BASE_BRANCH (or ${BASE_BRANCH:-origin/main}) and
replace ${SESSION_ID}, ${PR_URL}, ${PR_NUMBER} with the explicit open-pr outputs
you document and reference (e.g. the job/step outputs such as
$OPEN_PR_SESSION_ID, $OPEN_PR_PR_URL, $OPEN_PR_PR_NUMBER or the appropriate
GitHub Actions expression form like ${{ jobs.open-pr.outputs.session_id }}), and
update the surrounding text to document those variables so the snippet is
copy-pastable and portable.

Comment thread packages/workflows/src/evidence-validator.ts
@shhaider

shhaider commented May 6, 2026

Copy link
Copy Markdown
Author

Manual dogfood follow-up: pushed f6f71a8 to bind execution evidence to the active workflow run. validateEvidence now requires workflowRunId, rejects stale/copied evidence when evidence.workflow_run_id differs from the active run, and dag-executor passes workflowRun.id into the gate. Added validator and DAG executor regression tests. Verification after this patch: bun test packages/workflows/src/evidence-validator.test.ts, bun test packages/workflows/src/dag-executor.test.ts, bun --filter @archon/workflows test, and bun run validate all passed.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
packages/workflows/src/dag-executor.ts (2)

3160-3165: ⚠️ Potential issue | 🟡 Minor | ⚖️ Poor tradeoff

Stale workflowRun.metadata snapshot can silently drop concurrent writes.

Both metadata updates (lines 3162–3165 and 3201–3204) spread workflowRun.metadata, which is the in-memory snapshot captured at executor startup. Any metadata written during the run (e.g., approval/loop gate state) is silently overwritten when this spread is persisted.

Prefer fetching the freshest metadata from deps.store immediately before writing, or exposing a partial-merge API (e.g., setWorkflowRunMetadata(id, { evidence_validation: ... })) so the spread is unnecessary.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.ts` around lines 3160 - 3165, The update
currently spreads the stale in-memory snapshot workflowRun.metadata into
deps.store.updateWorkflowRun (when setting evidence_validation), which can
overwrite concurrent metadata changes; modify the write to first fetch the
latest metadata from deps.store (e.g.,
deps.store.getWorkflowRun/workflowRunById) and merge evidence_validation into
that fresh object before calling deps.store.updateWorkflowRun, or change to a
partial-merge API such as deps.store.setWorkflowRunMetadata(workflowRun.id, {
evidence_validation: ... }) so you avoid spreading workflowRun.metadata and
prevent lost concurrent writes.

3136-3193: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

TOCTOU: status is not re-checked after validateEvidence returns.

The guard at line 3136 (skipIfStatusChanged) runs before validateEvidence. When policy.verify === 'reality', that call can take several seconds (git/gh subprocesses). If the run is cancelled or deleted during that window, the failure branch at line 3173 will still call failWorkflowRun — overwriting a cancelled/deleted row with failed. The same window exists on the success path at line 3200.

The existing skipIfStatusChanged helper is already available; calling it again immediately after validateEvidence returns (on both branches) closes the window:

   const evidenceResult = await validateEvidence({ ... });
+  if (await skipIfStatusChanged('dag.skip_complete_status_changed')) return;
   if (!evidenceResult.valid) {
     ...
   }
   // success path
+  if (await skipIfStatusChanged('dag.skip_complete_status_changed')) return;
   await deps.store.updateWorkflowRun(...);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.ts` around lines 3136 - 3193, The
time-of-check/time-of-use race occurs because
skipIfStatusChanged('dag.skip_complete_status_changed') runs before
validateEvidence() but not after it; add a second check immediately after
validateEvidence() returns and before taking either branch (i.e., before
persisting evidence_validation via deps.store.updateWorkflowRun, before
deps.store.failWorkflowRun/logWorkflowError/emitterForFail.emit/unregisterRun/safeSendMessage
on the failure path, and likewise before the success path that marks the run
completed). Concretely, call the existing
skipIfStatusChanged('dag.skip_complete_status_changed') right after the
validateEvidence(...) call and return early if it signals status changed to
prevent overwriting cancelled/deleted runs.
🧹 Nitpick comments (3)
packages/workflows/src/dag-executor.test.ts (1)

7032-7091: ⚡ Quick win

Add a regression test for success-path metadata write failure propagation.

This suite validates the happy path, but it doesn’t pin the “do not silently complete if evidence metadata persistence fails” behavior. A focused test here would guard the reliability fix from regressions.

Suggested test addition
+  it('required evidence: metadata write failure on success path propagates (no silent completion)', async () => {
+    await writeFile(
+      join(artifactsDir, 'evidence.json'),
+      JSON.stringify({
+        kind: 'execution',
+        workflow_run_id: 'dag-evidence-run-db-fail',
+        provider: 'claude',
+        provider_run_ids: ['session-abc'],
+        changed_files: ['src/foo.ts'],
+        diff_command: 'git diff --stat origin/dev...HEAD',
+        test_commands: ['bun test'],
+        test_output_summary: '15 passed',
+        commit_sha: 'a'.repeat(40),
+        pushed_branch: 'feature/foo',
+        pr_url: 'https://github.com/owner/repo/pull/42',
+        pr_number: 42,
+      })
+    );
+
+    const mockStore = createMockStore();
+    (mockStore.updateWorkflowRun as ReturnType<typeof mock>).mockRejectedValueOnce(
+      new Error('db write failed')
+    );
+    const mockDeps = createMockDeps(mockStore);
+    const platform = createMockPlatform();
+    const workflowRun = makeWorkflowRun('dag-evidence-run-db-fail');
+
+    await expect(
+      executeDagWorkflow(
+        mockDeps,
+        platform,
+        'conv-evidence',
+        testDir,
+        {
+          name: 'evidence-required-valid-db-fail',
+          nodes: [{ id: 'work', bash: 'echo done' } as BashNode],
+          evidence_policy: { required: true, verify: 'shape', path: 'evidence.json' },
+        },
+        workflowRun,
+        'claude',
+        undefined,
+        artifactsDir,
+        join(testDir, 'logs'),
+        'main',
+        'docs/',
+        minimalConfig
+      )
+    ).rejects.toThrow('db write failed');
+
+    expect((mockStore.completeWorkflowRun as ReturnType<typeof mock>).mock.calls.length).toBe(0);
+  });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.test.ts` around lines 7032 - 7091, Add a
regression test that simulates updateWorkflowRun failing when writing evidence
metadata so the happy-path does not silently complete; in the existing test case
around executeDagWorkflow for evidence_policy (the 'evidence-required-valid'
scenario) mock mockStore.updateWorkflowRun to throw/reject when called for
metadata updates, then call executeDagWorkflow and assert that
mockStore.completeWorkflowRun was NOT called and mockStore.failWorkflowRun WAS
called (or that the error propagated), and also assert the updateWorkflowRun
mock was invoked for evidence metadata. Target the same symbols used in the
file: executeDagWorkflow, mockStore.updateWorkflowRun,
mockStore.completeWorkflowRun, mockStore.failWorkflowRun, and the
evidence_policy/path handling so the test specifically verifies metadata
persistence failure prevents completion.
packages/workflows/src/evidence-validator.test.ts (2)

432-454: ⚡ Quick win

Add an explicit git ls-remote rejection test branch.

You cover empty output and mismatch, but not the thrown ls-remote path. That’s a distinct failure mode and worth pinning directly.

Suggested test addition
+  it('negative: git ls-remote rejects -> pushed_branch issue', async () => {
+    execFileImpl = (cmd, args) => {
+      if (cmd === 'git' && args[0] === 'cat-file') {
+        return Promise.resolve({ stdout: '', stderr: '' });
+      }
+      if (cmd === 'git' && args[0] === 'ls-remote') {
+        return Promise.reject(Object.assign(new Error('ls-remote failed'), { stderr: 'fatal: network error' }));
+      }
+      if (cmd === 'gh') {
+        return Promise.resolve({
+          stdout: JSON.stringify({
+            headRefName: 'feature/foo',
+            number: 42,
+            headRefOid: 'a'.repeat(40),
+          }),
+          stderr: '',
+        });
+      }
+      return Promise.resolve({ stdout: '', stderr: '' });
+    };
+
+    const issues = await verifyEvidenceReality(REAL_EVIDENCE, '/repo');
+    expect(issues.some(i => i.field === 'pushed_branch')).toBe(true);
+  });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/evidence-validator.test.ts` around lines 432 - 454,
Add a new test case in packages/workflows/src/evidence-validator.test.ts that
specifically simulates a rejected git ls-remote call: in the new "it" (e.g.,
'negative: git ls-remote rejects → pushed_branch error') set execFileImpl so it
returns Promise.reject(new Error('ls-remote failed')) when cmd === 'git' &&
args[0] === 'ls-remote' (keep existing handling for 'git cat-file' and 'gh' as
in other tests), then call verifyEvidenceReality(REAL_EVIDENCE, '/repo') and
assert that the returned issues include an entry with field === 'pushed_branch';
reference the verifyEvidenceReality function, execFileImpl mock, and
REAL_EVIDENCE to locate where to add this test.

567-634: ⚡ Quick win

Add loader tests for non-ENOENT filesystem errors (e.g., EISDIR/EACCES).

loadEvidenceFromArtifacts currently checks missing/valid/malformed, but not other read failures. Pinning these keeps the “structured issues, no crash” contract safe.

Suggested deterministic branch test (EISDIR)
+  it('directory at evidence path is handled without throw (EISDIR path)', async () => {
+    await mkdir(join(testDir, 'evidence.json'), { recursive: true });
+    const result = await validateEvidence({
+      artifactsDir: testDir,
+      cwd: '/repo',
+      workflowRunId: 'run-1',
+      policy: { required: true, verify: 'shape', path: 'evidence.json' },
+    });
+    expect(result.valid).toBe(false);
+    if (!result.valid) {
+      expect(result.issues.some(i => i.field === 'evidence_policy.path')).toBe(true);
+    }
+  });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/evidence-validator.test.ts` around lines 567 - 634,
Add a deterministic test that simulates non-ENOENT fs errors by mocking
fs.promises.readFile (for the evidence.json path) to throw an error with code
'EISDIR' (and another case for 'EACCES'), then call loadEvidenceFromArtifacts
and assert it does not throw and returns a structured result (i.e., a handled
response rather than crashing — e.g., result.found truthy and result.raw is a
string or error-info), and also add a companion validateEvidence call (using
validateEvidence with a policy) to assert validation returns a non-valid result
with an appropriate issue reported; reference the existing test patterns and the
functions loadEvidenceFromArtifacts and validateEvidence and the artifact
filename 'evidence.json' when adding the new tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/workflows/src/evidence-validator.ts`:
- Around line 151-207: In verifyEvidenceReality, the three execFileAsync calls
(the git cat-file -e call, the git ls-remote --heads origin call, and the gh pr
view call) must include a timeout option to avoid indefinite hangs; update the
options object passed to execFileAsync for the git cat-file invocation to
include a short timeout (e.g. 30_000 ms) and for the networked git ls-remote and
gh pr view calls to include a longer timeout (e.g. 60_000 ms), preserving the
existing cwd property (i.e., change { cwd } to { cwd, timeout: <ms> }) so Node's
child_process.execFile will kill stalled subprocesses.

---

Duplicate comments:
In `@packages/workflows/src/dag-executor.ts`:
- Around line 3160-3165: The update currently spreads the stale in-memory
snapshot workflowRun.metadata into deps.store.updateWorkflowRun (when setting
evidence_validation), which can overwrite concurrent metadata changes; modify
the write to first fetch the latest metadata from deps.store (e.g.,
deps.store.getWorkflowRun/workflowRunById) and merge evidence_validation into
that fresh object before calling deps.store.updateWorkflowRun, or change to a
partial-merge API such as deps.store.setWorkflowRunMetadata(workflowRun.id, {
evidence_validation: ... }) so you avoid spreading workflowRun.metadata and
prevent lost concurrent writes.
- Around line 3136-3193: The time-of-check/time-of-use race occurs because
skipIfStatusChanged('dag.skip_complete_status_changed') runs before
validateEvidence() but not after it; add a second check immediately after
validateEvidence() returns and before taking either branch (i.e., before
persisting evidence_validation via deps.store.updateWorkflowRun, before
deps.store.failWorkflowRun/logWorkflowError/emitterForFail.emit/unregisterRun/safeSendMessage
on the failure path, and likewise before the success path that marks the run
completed). Concretely, call the existing
skipIfStatusChanged('dag.skip_complete_status_changed') right after the
validateEvidence(...) call and return early if it signals status changed to
prevent overwriting cancelled/deleted runs.

---

Nitpick comments:
In `@packages/workflows/src/dag-executor.test.ts`:
- Around line 7032-7091: Add a regression test that simulates updateWorkflowRun
failing when writing evidence metadata so the happy-path does not silently
complete; in the existing test case around executeDagWorkflow for
evidence_policy (the 'evidence-required-valid' scenario) mock
mockStore.updateWorkflowRun to throw/reject when called for metadata updates,
then call executeDagWorkflow and assert that mockStore.completeWorkflowRun was
NOT called and mockStore.failWorkflowRun WAS called (or that the error
propagated), and also assert the updateWorkflowRun mock was invoked for evidence
metadata. Target the same symbols used in the file: executeDagWorkflow,
mockStore.updateWorkflowRun, mockStore.completeWorkflowRun,
mockStore.failWorkflowRun, and the evidence_policy/path handling so the test
specifically verifies metadata persistence failure prevents completion.

In `@packages/workflows/src/evidence-validator.test.ts`:
- Around line 432-454: Add a new test case in
packages/workflows/src/evidence-validator.test.ts that specifically simulates a
rejected git ls-remote call: in the new "it" (e.g., 'negative: git ls-remote
rejects → pushed_branch error') set execFileImpl so it returns
Promise.reject(new Error('ls-remote failed')) when cmd === 'git' && args[0] ===
'ls-remote' (keep existing handling for 'git cat-file' and 'gh' as in other
tests), then call verifyEvidenceReality(REAL_EVIDENCE, '/repo') and assert that
the returned issues include an entry with field === 'pushed_branch'; reference
the verifyEvidenceReality function, execFileImpl mock, and REAL_EVIDENCE to
locate where to add this test.
- Around line 567-634: Add a deterministic test that simulates non-ENOENT fs
errors by mocking fs.promises.readFile (for the evidence.json path) to throw an
error with code 'EISDIR' (and another case for 'EACCES'), then call
loadEvidenceFromArtifacts and assert it does not throw and returns a structured
result (i.e., a handled response rather than crashing — e.g., result.found
truthy and result.raw is a string or error-info), and also add a companion
validateEvidence call (using validateEvidence with a policy) to assert
validation returns a non-valid result with an appropriate issue reported;
reference the existing test patterns and the functions loadEvidenceFromArtifacts
and validateEvidence and the artifact filename 'evidence.json' when adding the
new tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 79600dcf-d763-4d63-a960-886426bc50c8

📥 Commits

Reviewing files that changed from the base of the PR and between 83dcb4b and f6f71a8.

📒 Files selected for processing (4)
  • packages/workflows/src/dag-executor.test.ts
  • packages/workflows/src/dag-executor.ts
  • packages/workflows/src/evidence-validator.test.ts
  • packages/workflows/src/evidence-validator.ts

Comment on lines +151 to +207
await execFileAsync('git', ['cat-file', '-e', `${evidence.commit_sha}^{commit}`], { cwd });
} catch (err) {
const e = err as Error & { stderr?: string };
issues.push({
level: 'error',
field: 'commit_sha',
message: `git cat-file -e ${evidence.commit_sha}^{commit} failed: ${
e.stderr?.trim() || e.message
}`,
hint: 'commit_sha is not reachable in this checkout — fake SHA or wrong cwd',
});
}

// 2. pushed_branch present on origin AND its tip SHA matches commit_sha
try {
const { stdout } = await execFileAsync(
'git',
['ls-remote', '--heads', 'origin', evidence.pushed_branch],
{ cwd }
);
const trimmed = stdout.trim();
if (trimmed.length === 0) {
issues.push({
level: 'error',
field: 'pushed_branch',
message: `git ls-remote --heads origin ${evidence.pushed_branch} returned empty; branch is not on origin`,
hint: 'Did the workflow run `git push -u origin <branch>` after committing?',
});
} else {
// Bind commit_sha to the branch tip on origin. Without this check, any
// locally-reachable SHA can be paired with any real branch and pass.
const remoteSha = trimmed.split(/\s+/)[0];
if (remoteSha !== evidence.commit_sha) {
issues.push({
level: 'error',
field: 'commit_sha',
message: `commit_sha=${evidence.commit_sha} does not match origin/${evidence.pushed_branch} tip=${remoteSha}`,
hint: 'evidence.json claims a commit that is not the tip of pushed_branch on origin',
});
}
}
} catch (err) {
const e = err as Error & { stderr?: string };
issues.push({
level: 'error',
field: 'pushed_branch',
message: `git ls-remote failed: ${e.stderr?.trim() || e.message}`,
});
}

// 3. gh pr view confirms PR url maps to pushed_branch, pr_number, and commit_sha
try {
const { stdout } = await execFileAsync(
'gh',
['pr', 'view', evidence.pr_url, '--json', 'headRefName,number,headRefOid'],
{ cwd }
);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check if `@archon/git`'s execFileAsync sets a default timeout.
# Expected: if a default timeout exists, the concern is mitigated.
ast-grep --pattern 'function execFileAsync($$$) { $$$ }'

Repository: coleam00/Archon

Length of output: 41


🏁 Script executed:

#!/bin/bash
# Find the `@archon/git` package and locate execFileAsync definition
fd -t f -e ts -e tsx 'git' | grep -E '(git|src)' | head -20

Repository: coleam00/Archon

Length of output: 177


🏁 Script executed:

#!/bin/bash
# Search for execFileAsync definition more broadly
rg 'export.*execFileAsync|function execFileAsync' -t ts -t tsx -A 5

Repository: coleam00/Archon

Length of output: 87


🏁 Script executed:

#!/bin/bash
# Look for `@archon/git` package structure
find . -type d -name git -o -name '@archon' | head -10

Repository: coleam00/Archon

Length of output: 74


🏁 Script executed:

#!/bin/bash
# Search for execFileAsync in `@archon/git` package
rg 'execFileAsync' packages/git/src/ -t ts -A 10 -B 2

Repository: coleam00/Archon

Length of output: 40547


🏁 Script executed:

#!/bin/bash
# List files in `@archon/git` package
ls -la packages/git/src/

Repository: coleam00/Archon

Length of output: 592


🏁 Script executed:

#!/bin/bash
# Read the full exec.ts file to see if there are any default timeouts
cat packages/git/src/exec.ts

Repository: coleam00/Archon

Length of output: 871


🏁 Script executed:

#!/bin/bash
# Verify the specific lines in evidence-validator.ts to confirm they lack timeout
sed -n '151p;166,170p;203,207p' packages/workflows/src/evidence-validator.ts

Repository: coleam00/Archon

Length of output: 458


Add timeout to verifyEvidenceReality subprocess calls to prevent indefinite hangs.

All three execFileAsync calls lack a timeout option:

Line Call Risk
151 git cat-file -e Local; credential helper/lock contention
166–170 git ls-remote origin Network; unreachable → ~2 min TCP timeout
203–207 gh pr view Network; GitHub API flaky → ~2 min TCP timeout

Without timeout, the default is 0 (infinite). The execFileAsync wrapper in @archon/git does not apply a built-in default—it passes options directly to Node.js's child_process.execFile. Network stalls on git ls-remote or gh pr view will hang indefinitely, permanently blocking the DAG executor's completion path (line 3144 of dag-executor.ts), preventing both failWorkflowRun and completeWorkflowRun.

Other subprocess calls in this codebase (executeBashNode, executeScriptNode) and throughout the @archon/git module all explicitly set timeout (10–120 seconds depending on operation). Apply the same pattern here.

Proposed fix
+/** Max wall-clock time for each individual reality-check subprocess. */
+const REALITY_CHECK_TIMEOUT_MS = 30_000;
+
 // 1. commit_sha reachable in worktree
 try {
   await execFileAsync('git', ['cat-file', '-e', `${evidence.commit_sha}^{commit}`], {
-    cwd,
+    cwd,
+    timeout: REALITY_CHECK_TIMEOUT_MS,
   });
 } catch (err) {
   ...
 }

 // 2. pushed_branch present on origin AND its tip SHA matches commit_sha
 try {
   const { stdout } = await execFileAsync(
     'git',
     ['ls-remote', '--heads', 'origin', evidence.pushed_branch],
-    { cwd }
+    { cwd, timeout: REALITY_CHECK_TIMEOUT_MS }
   );
   ...
 }

 // 3. gh pr view ...
 try {
   const { stdout } = await execFileAsync(
     'gh',
     ['pr', 'view', evidence.pr_url, '--json', 'headRefName,number,headRefOid'],
-    { cwd }
+    { cwd, timeout: REALITY_CHECK_TIMEOUT_MS }
   );
   ...
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/evidence-validator.ts` around lines 151 - 207, In
verifyEvidenceReality, the three execFileAsync calls (the git cat-file -e call,
the git ls-remote --heads origin call, and the gh pr view call) must include a
timeout option to avoid indefinite hangs; update the options object passed to
execFileAsync for the git cat-file invocation to include a short timeout (e.g.
30_000 ms) and for the networked git ls-remote and gh pr view calls to include a
longer timeout (e.g. 60_000 ms), preserving the existing cwd property (i.e.,
change { cwd } to { cwd, timeout: <ms> }) so Node's child_process.execFile will
kill stalled subprocesses.

@Wirasm

Wirasm commented May 13, 2026

Copy link
Copy Markdown
Collaborator

Review Summary

Verdict: minor-fixes-needed

This PR implements a real-execution proof validation system for PR-producing workflows (ROADMAP P3-A). The three-layer validator, evidence schemas, and DAG executor integration are solid — 52 new tests, clean TypeScript, CLAUDE.md-compliant throughout. Two categories of work remain: documentation for the new public surface, and comment hygiene (ROADMAP references and redundant docstrings).

Blocking issues

  • Missing docs for new public surface (packages/docs-web/src/content/docs/guides/authoring-workflows.md): The evidence_policy workflow field, evidence.json file contract, and protected-branch rejection behavior are all new user-facing features with no documentation.
    • Add evidence_policy to the DAG schema section: all three sub-fields (required, verify, path) with examples.
    • Document the evidence.json discriminated union, required execution fields, GitHub-only pr_url constraint, 40-char hex commit_sha format, and non-empty array requirements.
    • Note that pushed_branch cannot be main, master, dev, or develop.

Suggested fixes

  • packages/workflows/src/dag-executor.ts (~lines 3161-3204): The failure path swallows DB errors (updateWorkflowRun(...).catch(...)) while the success path propagates them. If the DB write fails on the failure path (e.g., connection lost), the run may be left in a non-terminal state despite workflow_failed being emitted. Either propagate the error or do not emit workflow_failed unless the DB state is confirmed.

    try {
      await deps.store.updateWorkflowRun(workflowRun.id, { metadata: { ... } });
    } catch (dbErr) {
      getLog().error({ err: dbErr, workflowRunId: workflowRun.id }, 'workflow.evidence_metadata_persist_failed');
      throw dbErr; // let outer executor catch handle
    }
  • Comment rot — remove "ROADMAP P3-A" from module docstrings in evidence-validator.ts and evidence.ts. These references will become meaningless over time. Per CLAUDE.md comment policy: comments only where WHY is non-obvious. Schema field names are self-documenting — multi-paragraph bullets describing commit_sha and provider_run_ids restate what the code already expresses.

  • CHANGELOG entry missing under [Unreleased]### Added: Add a line for evidence_policy describing the field and linking to the authoring-workflows guide.

Minor / nice-to-have

  • evidence-validator.ts:~3177-3193: The three sequential await DB writes in the failure handler are fire-and-forget with silent .catch(). Add a comment: // DB writes and log best-effort; run state (failed) is already set above. to make the semantics explicit.
  • evidence.ts:148-151: Remove the instructional aside from the level field comment — comments should describe current behavior, not future plans.
  • Test for DB rejection on the success marker persist: spy on mockStore.updateWorkflowRun to simulate a rejection and assert the run still calls completeWorkflowRun. Confirms "log and continue" is intentional.
  • Test file cleanup (afterEach): add // test cleanup: non-fatal comment to the catch {} blocks in evidence-validator.test.ts and dag-executor.test.ts.

Compliments

  • The verifyEvidenceReality function documents a subtle security invariant explicitly: "Bind commit_sha to the branch tip on origin. Without this check, any locally-reachable SHA can be paired with any real branch and pass." This kind of WHY-captured reasoning is exactly what CLAUDE.md asks for — keep this pattern.
  • The PROTECTED_BRANCHES defensive guard with the "feature-branch heuristic" comment explains the intent clearly. The schema hint surfaces the constraint at runtime. Well done.
  • Test isolation: evidence-validator.test.ts correctly runs as its own bun test invocation (registered in package.json) to avoid mock pollution — exactly as CLAUDE.md requires for mock.module() usage.

Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage, comment-quality, docs-impact.

@Wirasm

Wirasm commented May 25, 2026

Copy link
Copy Markdown
Collaborator

Hi @shhaider — thanks for the substantial work here. We want to build in this direction (verifiable PR-evidence is a real maturity gap), but at 2137 LOC the proposed surface is more than we want to commit to before we've validated the pattern with users.

Could you slice this into an MVP and a follow-up?

MVP scope (target ~200 LOC):

  • The executor refuses to mark a run completed when evidence_policy.required: true is declared on the workflow AND $ARTIFACTS_DIR/evidence.json is missing
  • Run is downgraded to failed with a structured error in metadata.evidence_validation
  • No schema validation, no reality I/O (git ls-remote/gh pr view), no evidencePolicySchema/evidence-validator.ts for now

Out of MVP, future PR after we see the gate get used:

  • The typed executionEvidenceSchema + three-layer validator
  • The opt-in verify: 'reality' mode
  • The structured metadata.evidence_validation.issues array

This way we land the load-bearing executor gate first, see how workflow authors adopt it, and then expand the contract surface based on what they actually need rather than what we guess they'll need. If the slim version gets used and people start asking "can the contract enforce ?", we add <X> in a follow-up.

Happy to chat through the slice if any of the cut surface feels load-bearing to you.

@Wirasm

Wirasm commented May 27, 2026

Copy link
Copy Markdown
Collaborator

@$shhaider related to #1433 — overlapping area or partial fix.

@Wirasm

Wirasm commented May 27, 2026

Copy link
Copy Markdown
Collaborator

@$shhaider related to #1720 — overlapping area or partial fix.

@Wirasm

Wirasm commented May 27, 2026

Copy link
Copy Markdown
Collaborator

@$shhaider related to #1433 — overlapping area or partial fix.

@Wirasm

Wirasm commented May 27, 2026

Copy link
Copy Markdown
Collaborator

@$shhaider related to #1720 — overlapping area or partial fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants