feat(workflows): add execution-evidence contract with terminal-success gate by shhaider · Pull Request #1601 · coleam00/Archon

shhaider · 2026-05-06T20:36:49Z

Summary

Problem: PR-producing workflows can finish with status='completed' based only on a self-report ("I created the PR"), with no enforced contract proving the work actually happened. Planning artifacts, fabricated commit SHAs, unpushed branches, or unreachable PR URLs all pass undetected.
Why it matters: Trust in autonomous workflow runs depends on a hard, machine-checkable boundary between "claimed completion" and "verified completion." Without one, downstream automation (auto-resume, run-history dashboards, governance gates) inherits unverifiable state.
What changed: Added a typed evidence contract (executionEvidenceSchema, evidencePolicySchema) and a three-layer validator (evidence-validator.ts) inside @archon/workflows. The DAG executor now deterministically blocks terminal completed status when the workflow declares evidence_policy.required: true and $ARTIFACTS_DIR/evidence.json is missing, malformed, or fake. On failure the run is downgraded to failed with structured issues persisted at metadata.evidence_validation.
What did not change (scope boundary): No new node type, no bundled YAML changes, no CLI subcommand, no DB migration, no web UI, no provider/adapter changes, no AI-driven judging, no retroactive validation of prior runs. Reality I/O (git ls-remote, gh pr view) is opt-in via verify: 'reality'; default is 'shape'.

UX Journey

Before

Workflow author        Archon executor              Operator
─────────────────      ───────────────              ────────
runs PR workflow ───▶  executes DAG nodes
                       last node completes
                       store.completeWorkflowRun()
                       status='completed' ─────────▶ sees ✅ completed
                                                     (no verification —
                                                      may be planning-only,
                                                      fake SHA, unpushed
                                                      branch, etc.)

After

Workflow author              Archon executor                          Operator
─────────────────            ────────────────                          ────────
sets evidence_policy:        executes DAG nodes
  required: true             last node completes
  verify: 'shape'            [+] reads $ARTIFACTS_DIR/evidence.json
writes evidence.json     ──▶ [+] validateEvidence(): shape → integrity → reality
                                  │
                                  ├─ ok ──▶ store.completeWorkflowRun()
                                  │        status='completed'  ──────▶ ✅ completed
                                  │
                                  └─ fail ─▶ store.failWorkflowRun(msg)
                                            updateWorkflowRun({
                                              metadata.evidence_validation: {issues}
                                            })
                                            status='failed'  ────────▶ ❌ failed +
                                                                       actionable issues

Architecture Diagram

Before

@archon/workflows
├── schemas/
│   ├── workflow.ts          (workflowBaseSchema)
│   ├── workflow-run.ts
│   └── index.ts
├── dag-executor.ts          ──▶ store.completeWorkflowRun()  [single terminal-success]
├── store.ts                 (IWorkflowStore)
└── validator.ts             (workflow-definition validator)

After

@archon/workflows
├── schemas/
│   ├── evidence.ts          [+]   executionEvidenceSchema (discriminated union on `kind`),
│   │                              evidencePolicySchema, evidenceValidationIssueSchema,
│   │                              evidenceValidationResultSchema
│   ├── workflow.ts          [~]   workflowBaseSchema gains optional `evidence_policy`
│   ├── workflow-run.ts
│   └── index.ts             [~]   re-exports new evidence schemas
├── evidence-validator.ts    [+]   validateEvidenceShape (Zod)
│                                  validateEvidenceIntegrity (cross-field)
│                                  verifyEvidenceReality (git/gh I/O, opt-in)
│                                  loadEvidenceFromArtifacts
│                                  validateEvidence (orchestrator)
├── dag-executor.ts          [~]   gates terminal-success on evidence_policy
│                                  === store.completeWorkflowRun()  (when valid)
│                                  === store.failWorkflowRun() + updateWorkflowRun(metadata)
│                                                                  (when invalid)
├── store.ts
└── validator.ts

Connection inventory:

From	To	Status	Notes
`dag-executor.ts`	`evidence-validator.ts::validateEvidence`	new	Called once at terminal-success boundary when `evidence_policy?.required === true`
`evidence-validator.ts`	`@archon/git` (`execFileAsync`)	new	Reality layer only; opt-in via `verify: 'reality'`. Stubbed in tests via isolated `mock.module()` invocation
`evidence-validator.ts`	`gh` CLI (subprocess)	new	Reality layer only; ENOENT/auth failures produce clean error issues, never throw
`schemas/index.ts`	`schemas/evidence.ts`	new	Re-export `executionEvidenceSchema`, `evidencePolicySchema`, derived types
`schemas/workflow.ts`	`schemas/evidence.ts`	new	`workflowBaseSchema.evidence_policy: evidencePolicySchema.optional()`
`dag-executor.ts`	`store.completeWorkflowRun`	unchanged	Still the single terminal-success transition
`dag-executor.ts`	`store.failWorkflowRun`	unchanged	Existing failure path — reused for evidence-validation failures
`dag-executor.ts`	`store.updateWorkflowRun`	unchanged	Used to persist `metadata.evidence_validation` after `failWorkflowRun`

Label Snapshot

Risk: risk: low
Size: size: M
Scope: workflows
Module: workflows:executor, workflows:schemas

Change Metadata

Change type: feature
Primary scope: workflows

Linked Issue

Closes: n/a — implements ROADMAP P3-A (ROADMAP.md)
Related: n/a
Depends on: none
Supersedes: none

Validation Evidence (required)

bun run validate

All gates green on first invocation (no fixes required during validation):

check:bundled — bundled-defaults.generated.ts is up to date (36 commands, 20 workflows)
check:bundled-skill — bundled-skill.ts is up to date (21 files)
type-check — all 10 packages exit 0
lint — bun x eslint . --cache --max-warnings 0 (0 errors, 0 warnings)
format:check — All matched files use Prettier code style!
test — all per-package tests pass (sprint-relevant: 43 evidence-validator + 6 schema + 3 dag-executor integration = 52 new tests)

Test breakdown:

Test File	Tests	Coverage
`packages/workflows/src/evidence-validator.test.ts` (new)	43	All 5 groups: shape, integrity, reality, loader, orchestrator. Every Edge Case in plan covered.
`packages/workflows/src/schemas.test.ts` (extended)	+6	`executionEvidenceSchema` parse (execution + planning), `evidencePolicySchema` defaults, `workflowBaseSchema` accepts policy
`packages/workflows/src/dag-executor.test.ts` (extended)	+3	required + missing → `failed` + structured issues; required + valid → `completed`; no-policy → `completed` (regression guard)

Evidence provided: validation.md artifact (full per-package counts in /Users/syedhaider/.archon/workspaces/shhaider/Archon/artifacts/runs/4c517fe4db197edb948e3f3d4adf9927/validation.md)
Skipped commands: none

Security Impact (required)

New permissions/capabilities? No — validator only reads $ARTIFACTS_DIR/evidence.json (workflow-owned) and runs read-only git ls-remote / git cat-file -e / gh pr view when reality checks are explicitly opted in.
New external network calls? Yes (opt-in only) — git ls-remote origin <branch> and gh pr view <url> run only when evidence_policy.verify === 'reality'. Default is 'shape' (no network). Tests stub execFileAsync via isolated mock.module().
Secrets/tokens handling changed? No — gh uses the operator's existing auth context; no token reading or logging in this PR.
File system access scope changed? No new scope — validator reads only inside $ARTIFACTS_DIR (already workflow-owned). evidence_policy.path is rejected if absolute or contains .. (path-traversal guard with explicit negative test).
Risk and mitigation: Reality layer's git/gh subprocesses are wrapped in try/catch; failures produce structured EvidenceValidationIssue objects (never throw). gh ENOENT / auth failure produces a clean error issue with operator hint.

Compatibility / Migration

Backward compatible? Yes — evidence_policy is optional on workflowBaseSchema. Workflows that omit it have byte-identical behavior to before this PR (regression-tested in dag-executor.test.ts).
Config/env changes? No — no new environment variables, no .archon/config.yaml keys.
Database migration needed? No — evidence-validation issues are persisted via existing WorkflowRun.metadata (z.record(z.unknown())). No migrations/ files modified.
Upgrade steps: none. Existing workflows continue to work unchanged.

Human Verification (required)

What was personally validated beyond CI:

Verified scenarios:
- bun run validate exits 0 from a clean checkout of this branch (no env tweaking).
- evidence-validator.test.ts was confirmed to be isolated as its own bun test invocation in packages/workflows/package.json to prevent mock.module('@archon/git', ...) cross-pollution per CLAUDE.md mock-isolation rule.
- bun --filter @archon/workflows type-check and full-monorepo bun run type-check both green.
- The discoverWorkflowsWithConfig flow continues to load existing bundled workflows without evidence_policy (regression test).
Edge cases checked (all in evidence-validator.test.ts):
- Empty-tree zero SHA ('0' * 40)
- Short SHA ('abc1234')
- Uppercase / non-hex SHA
- Protected branches (main/master/dev/develop)
- pushed_branch claimed but git ls-remote empty
- pr_url claimed but gh pr view returns different headRefName or different number
- gh ENOENT (binary missing) → clean error issue
- evidence_policy.path absolute / .. traversal
- evidence.json missing / malformed JSON / null / string / array / number
- kind === 'planning' rejected by integrity layer
What was not verified:
- End-to-end run of a real PR-producing workflow with evidence_policy.required: true enabled (no bundled YAML opts in yet — explicitly out of scope per plan; follow-up PR).
- Reality layer against a live GitHub repo (covered by stubbed unit tests; live integration is a CI/staging concern beyond this PR).

Side Effects / Blast Radius (required)

Affected subsystems/workflows:
- @archon/workflows (only) — schemas, validator, executor terminal-success transition.
- No impact on @archon/core, @archon/cli, @archon/server, @archon/web, @archon/adapters, @archon/providers, @archon/git, @archon/isolation, @archon/paths (all type-check green; no exported surface touched).
- Workflows that omit evidence_policy are unaffected (regression-guarded).
Potential unintended effects:
- If a workflow author opts in to evidence_policy.required: true but does not write evidence.json, runs will fail terminally. This is the intended contract — explicitly documented in the schema and surfaced via metadata.evidence_validation.issues.
- Tests with mock.module('@archon/git', ...) could pollute other tests if the new test file weren't isolated. Mitigated by adding bun test src/evidence-validator.test.ts as a separate invocation in package.json per CLAUDE.md.
Guardrails / monitoring for early detection:
- metadata.evidence_validation is queryable from the API; failures surface as ordinary failed runs in run history.
- Existing log events (workflow.run_failed) fire on validation failure, with structured issue payload.
- Type-check + format + lint catch any author misuse of evidence_policy at workflow-load time.

Rollback Plan (required)

Fast rollback command/path: git revert 55e39055 && git push origin dev — single commit, additive only on the engine side. Reverting restores byte-identical pre-PR behavior because all bundled workflows still omit evidence_policy.
Feature flags or config toggles: evidence_policy.required is itself the flag. Setting it to false (or omitting the field entirely) restores prior behavior per workflow.
Observable failure symptoms: any spike in failed runs whose metadata.evidence_validation.issues is populated — unique signature distinguishing this gate's failures from other failure modes.

Risks and Mitigations

Risk: Reality I/O (git ls-remote, gh pr view) flakes in CI or sandboxed runners.
- Mitigation: verify: 'shape' is the default. Reality is strictly opt-in. Validator tests stub execFileAsync via mock.module() in an isolated bun test invocation. ENOENT/auth/exit-non-zero produce structured error issues without throwing.
Risk: mock.module('@archon/git', ...) in the new test file pollutes other workflow tests in the same bun test process.
- Mitigation: New test file is registered as its own bun test src/evidence-validator.test.ts invocation in packages/workflows/package.json. Verified by passing test run.
Risk: evidence_policy.path allows path traversal outside $ARTIFACTS_DIR.
- Mitigation: Validator rejects absolute paths and .. segments before reading. Negative test in Group D (loader) covers both.
Risk: Plan called the terminal store method updateWorkflowRun(...) but the actual call site is completeWorkflowRun(...) / failWorkflowRun(...).
- Mitigation: Pre-flagged in plan-confirmation.md. Implementation gates the actual completeWorkflowRun(...) call at dag-executor.ts:3134 (the single terminal-success transition); failure path uses the existing failWorkflowRun(id, error) plus a separate updateWorkflowRun(...) to persist metadata.evidence_validation. Plan intent — single-point gate, structured issues persisted to metadata.evidence_validation — fully preserved.

Plan: /Users/syedhaider/.archon/workspaces/shhaider/Archon/artifacts/runs/4c517fe4db197edb948e3f3d4adf9927/plan.md
Workflow ID: 4c517fe4db197edb948e3f3d4adf9927
ROADMAP entry: P3-A — Real-Execution Proof Validation for PR-Producing Workflows

Summary by CodeRabbit

New Features
- Optional workflow-level evidence_policy that can block workflow completion until evidence passes shape/integrity and optional reality checks.
Tests
- Comprehensive unit and integration tests covering evidence validation and execution gating.
Documentation
- Roadmap and guides updated with evidence-gated workflows, schema examples, and failure behavior.
Chores
- Validator and related utilities exposed via package exports.

…s gate Add typed evidence schema (executionEvidenceSchema, evidencePolicySchema) and a reusable validator (evidence-validator.ts) inside @archon/workflows so PR-producing workflows can opt in to real-execution proof. The DAG executor deterministically blocks terminal completed status when the workflow declares evidence_policy.required: true and evidence is missing, malformed, or fake. - Three-layer validator: shape (Zod), integrity (cross-field, protected branches, zero-SHA), and reality (git/gh I/O, opt-in via verify: 'reality') - Single-point gate at dag-executor terminal-success boundary; on failure downgrades run to failed and persists structured issues at metadata.evidence_validation - 52 new tests (43 validator across 5 groups + 6 schema smoke tests + 3 dag-executor integration tests including no-policy regression guard) - Covers full Edge Cases checklist: every fake-completion shape (planning artifacts, short/uppercase/non-hex/zero-tree SHAs, protected branches, unpushed branches, http/non-GitHub PR URLs, gh mismatches, path traversal, ENOENT on gh) - New evidence-validator.test.ts isolated as its own bun test invocation per CLAUDE.md mock-pollution rule (uses mock.module('@archon/git', ...)) - evidence_policy is workflow-level optional; omitting it preserves byte-identical behavior (regression-tested) No bundled YAML changes, no DB migration, no CLI/web/provider surface changes — engine surface only. Follow-up PR will turn the flag on for bundled PR-producing workflows once stable. Implements ROADMAP P3-A.

coderabbitai · 2026-05-06T20:37:03Z

📝 Walkthrough

Walkthrough

Adds an evidence-validation pipeline and an optional workflow-level evidence_policy that can block DAG completion: new Zod schemas, a three-layer validator (shape, integrity, reality), evidence loader/orchestrator, DAG executor gating, exports, tests, docs, and a roadmap entry. No breaking public API removals.

Changes

Real-Execution Evidence Validation

Layer / File(s)	Summary
Schema & Type Definitions `packages/workflows/src/schemas/evidence.ts`	New Zod schemas and types: `executionEvidenceSchema` (planning vs real-execution), `evidencePolicySchema`, `evidenceValidationIssueSchema`, `evidenceValidationResultSchema`, `ExecutionEvidence`, `EvidencePolicy`, `EvidenceValidationIssue/Result`, `EXECUTION_EVIDENCE_REQUIRED_FIELDS`.
Schema Integration & Export `packages/workflows/src/schemas/index.ts`, `packages/workflows/src/schemas/workflow.ts`	Re-exports evidence schemas/types; extends `workflowBaseSchema` with optional `evidence_policy` field.
Evidence Validation Core `packages/workflows/src/evidence-validator.ts`	New module implementing SHAPE (Zod parsing → field diagnostics), INTEGRITY (cross-field rules), REALITY (async git/gh checks), artifact loading, and `validateEvidence` orchestrator with typed results.
DAG Executor Integration `packages/workflows/src/dag-executor.ts`	`executeDagWorkflow` now accepts optional `evidence_policy`; when `evidence_policy.required` is true it calls `validateEvidence` before final completion, persists `metadata.evidence_validation` (issues or `{ ok: true }`), and fails the run on validation errors instead of completing.
Package Exports & Scripts `packages/workflows/package.json`	Adds public exports: `./evidence-validator`, `./utils/tool-formatter`, `./test-utils`. Updates test script to run the evidence-validator tests.
Tests `packages/workflows/src/evidence-validator.test.ts`, `packages/workflows/src/dag-executor.test.ts`, `packages/workflows/src/schemas.test.ts`	Extensive unit tests for validator layers (shape/integrity/reality/load/orchestrator), DAG executor tests for evidence gating (missing/stale/valid evidence scenarios and regression guard), and schema tests for execution evidence and policy parsing/defaults.
Documentation & Roadmap `packages/docs-web/.../authoring-workflows.md`, `CLAUDE.md`, `ROADMAP.md`	Docs guide added for "Evidence-Gated Workflows", CLAUDE note about `evidence_policy`, and new ROADMAP.md entry describing priorities and operating rules.

Sequence Diagram

sequenceDiagram
    participant WF as Workflow<br/>(DAG Executor)
    participant EV as Evidence Validator
    participant Load as Evidence Loader
    participant Shape as SHAPE Layer
    participant Integrity as INTEGRITY Layer
    participant Reality as REALITY Layer
    participant Meta as Metadata &<br/>Completion Gate

    WF->>EV: validateEvidence(artifactsDir, cwd, policy)
    
    EV->>Load: loadEvidenceFromArtifacts(path)
    Load-->>EV: evidence.json or load error
    
    EV->>Shape: validateEvidenceShape(raw)
    Shape-->>EV: validated evidence or shape issues
    
    alt Shape validation fails
        EV-->>Meta: persist issues -> metadata.evidence_validation
        Meta-->>WF: failWorkflowRun
    else Shape validation passes
        EV->>Integrity: validateEvidenceIntegrity(evidence)
        Integrity-->>EV: [] or integrity issues
        
        alt Integrity check fails
            EV-->>Meta: persist issues -> metadata.evidence_validation
            Meta-->>WF: failWorkflowRun
        else Integrity passes & policy.verify enabled
            EV->>Reality: verifyEvidenceReality(evidence, cwd)
            Reality-->>EV: [] or reality issues
            
            alt Reality check fails
                EV-->>Meta: persist issues -> metadata.evidence_validation
                Meta-->>WF: failWorkflowRun
            else All validations pass
                EV-->>Meta: persist { ok: true } -> metadata.evidence_validation
                Meta-->>WF: completeWorkflowRun
            end
        else Integrity passes & no verify
            EV-->>Meta: persist { ok: true } -> metadata.evidence_validation
            Meta-->>WF: completeWorkflowRun
        end
    end

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related PRs

coleam00/Archon#1403: Touches executeDagWorkflow behavior and tests; related to completion/failure gating semantics.

Suggested Reviewers

leex279

Poem

🐰 A rabbit reads the evidence by light,

Shape, Integrity, Reality take flight.
Git checks hum, schemas stand in line,
DAGs wait patient, metadata signs.
Roadmaps sprout — Archon hops toward the bright.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically summarizes the main change: adding an execution-evidence contract with a terminal-success gate to block workflow completion.
Description check	✅ Passed	The description is comprehensive and follows the template structure with all required sections: Summary, UX Journey (Before/After), Architecture Diagram, Labels, Change Metadata, Linked Issues, Validation Evidence, Security Impact, Compatibility, Human Verification, Side Effects, Rollback Plan, and Risks/Mitigations.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (4)

packages/workflows/src/evidence-validator.ts (1)

304-307: ⚡ Quick win

Rename the new validation log events to the repo’s domain.action_state format.

evidence_validation_started|failed|completed does not match the required event naming convention, so these entries will be inconsistent with the rest of the workflow logs.

Proposed fix

-    'evidence_validation_started'
+    'workflow.evidence_validation_started'
@@
-    getLog().error({ issues }, 'evidence_validation_failed');
+    getLog().error({ issues }, 'workflow.evidence_validation_failed');
@@
-    getLog().error({ err, issues }, 'evidence_validation_failed');
+    getLog().error({ err, issues }, 'workflow.evidence_validation_failed');
@@
-  getLog().info({ kind: shapeResult.evidence.kind }, 'evidence_validation_completed');
+  getLog().info({ kind: shapeResult.evidence.kind }, 'workflow.evidence_validation_completed');

As per coding guidelines, "Use Pino logger factory with event naming format {domain}.{action}_{state} (e.g., workflow.step_started, isolation.create_failed); always pair _started with _completed or _failed; include context IDs and error details".

Also applies to: 319-330, 347-386

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/evidence-validator.ts` around lines 304 - 307, The log
events emitted from getLog() in evidence-validator.ts (e.g., the call that
currently logs 'evidence_validation_started') must be renamed to follow the repo
convention {domain}.{action}_{state} and include relevant context IDs and error
details; update all occurrences around the getLog().info call and the related
blocks referenced (the started/failed/completed groups around lines ~304,
~319-330, ~347-386) so names become something like evidence.validation_started
-> evidence.validation_started? (use domain "evidence" and action "validation")
— actually rename to evidence.validation_started, evidence.validation_completed
and evidence.validation_failed (pairing each started with completed/failed), add
context fields (artifactsDir, policy.path, policy.verify, and a policy or
request id) to the log payload, and when logging failures include the caught
error object in the process (or getLog().error) payload so the error details are
recorded; adjust the corresponding getLog().info/getLog().error calls and ensure
consistent naming across all three blocks.

packages/workflows/src/dag-executor.ts (1)

2493-2497: ⚡ Quick win

Inline workflow shape duplicates WorkflowBase; prefer a named interface or a Pick<> from the schema type.

The parameter type is now { name: string; nodes: readonly DagNode[]; evidence_policy?: EvidencePolicy } & WorkflowLevelOptions. As more optional fields from workflowBaseSchema get plumbed down (this PR adds evidence_policy; the next field will follow), this anonymous shape will keep drifting from the actual schema-derived type. Defining an interface DagExecutorWorkflow (or using Pick<WorkflowBase, 'name' | 'nodes' | 'evidence_policy' | …> & WorkflowLevelOptions) would keep the contract in one place and align with the project's preference for interface over inline type.

As per coding guidelines: "In TypeScript, prefer interface for defining object shapes rather than type".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.ts` around lines 2493 - 2497, The
anonymous workflow parameter type should be replaced with a named interface to
avoid drifting from the schema; define an interface (e.g. DagExecutorWorkflow)
that derives from the canonical schema type (either via "interface
DagExecutorWorkflow extends Pick<WorkflowBase, 'name' | 'nodes' |
'evidence_policy' | ...> & WorkflowLevelOptions" or by explicitly declaring the
interface and keeping it synced with workflowBaseSchema), then use that
interface in the function signature instead of the inline `{ name: string;
nodes: readonly DagNode[]; evidence_policy?: EvidencePolicy } &
WorkflowLevelOptions`; update references to the parameter type to use
DagExecutorWorkflow so future schema fields stay centralized.

packages/workflows/src/dag-executor.test.ts (2)

7032-7055: ⚡ Quick win

Use the real workflow run ID in the “valid evidence” fixture

Line 7037 hardcodes workflow_run_id: 'run-1' while the run under test is dag-evidence-run-2. Using the actual run ID makes this fixture semantically accurate and better aligned with future integrity checks.

🔧 Suggested fixture tweak

-  it('evidence_policy.required + valid evidence.json -> completeWorkflowRun', async () => {
+  it('evidence_policy.required + valid evidence.json -> completeWorkflowRun', async () => {
+    const workflowRun = makeWorkflowRun('dag-evidence-run-2');
     await writeFile(
       join(artifactsDir, 'evidence.json'),
       JSON.stringify({
         kind: 'execution',
-        workflow_run_id: 'run-1',
+        workflow_run_id: workflowRun.id,
         provider: 'claude',
         provider_run_ids: ['session-abc'],
         changed_files: ['src/foo.ts'],
         diff_command: 'git diff --stat origin/dev...HEAD',
         test_commands: ['bun test'],
         test_output_summary: '15 passed',
         commit_sha: 'a'.repeat(40),
         pushed_branch: 'feature/foo',
         pr_url: 'https://github.com/owner/repo/pull/42',
         pr_number: 42,
       })
     );
 
     const mockStore = createMockStore();
     const mockDeps = createMockDeps(mockStore);
     const platform = createMockPlatform();
-    const workflowRun = makeWorkflowRun('dag-evidence-run-2');

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.test.ts` around lines 7032 - 7055, The
evidence.json test fixture uses a hardcoded workflow_run_id ('run-1') which
doesn't match the run created for the test; update the fixture to use the actual
run ID from the test run (use the workflowRun created by
makeWorkflowRun('dag-evidence-run-2') — e.g., set workflow_run_id to
workflowRun.id or the literal 'dag-evidence-run-2') so the evidence.json aligns
with the run under test and any integrity checks that compare IDs will pass.

7016-7029: ⚡ Quick win

Assert evidence_validation.status to lock the metadata contract

These tests currently validate ok/issues but not the status field, so contract drift in metadata.evidence_validation.status can slip through undetected.

🧪 Suggested assertion additions

       expect(updates.metadata.evidence_validation.ok).toBe(false);
+      expect(updates.metadata.evidence_validation.status).toBe('failed');
       expect(Array.isArray(updates.metadata.evidence_validation.issues)).toBe(true);

       expect(updates.metadata.evidence_validation.ok).toBe(true);
+      expect(updates.metadata.evidence_validation.status).toBe('ok');

Also applies to: 7079-7091

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.test.ts` around lines 7016 - 7029, The
test currently checks evidence_validation.ok and issues but omits asserting
evidence_validation.status; update the assertions around the
mockStore.updateWorkflowRun inspection (the updateCalls/evidenceUpdate/updates
metadata block) to also assert updates.metadata.evidence_validation.status is
the expected status string (e.g., "failed") and that it's of type string, and
apply the same added assertion in the other analogous assertion block (the
similar check around the second occurrence referenced by lines 7079-7091) so the
metadata contract for evidence_validation.status is locked down.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/workflows/src/dag-executor.ts`:
- Around line 3138-3208: The evidence validation block calls validateEvidence
and then performs terminal writes (deps.store.updateWorkflowRun,
deps.store.failWorkflowRun, etc.) without re-checking run status; call the
existing skipIfStatusChanged guard again immediately after validateEvidence
returns (both when evidenceResult.valid is false and true) to abort if the run
status changed during the long reality checks. Specifically, invoke
skipIfStatusChanged('dag.skip_complete_status_changed') (or the same helper used
earlier) right after the await validateEvidence(...) line and again right after
entering each branch before any call to updateWorkflowRun, failWorkflowRun,
completeWorkflowRun, logWorkflowError, emitter.emit/unregisterRun, or
safeSendMessage to ensure you don't flip a cancelled/deleted run to a terminal
state.
- Around line 3159-3171: The current updateWorkflowRun call uses the stale
in-memory workflowRun.metadata snapshot and may clobber concurrent metadata
writes; before persisting evidence_validation, either fetch the latest metadata
from deps.store (e.g., readWorkflowRun or readWorkflowRunMetadata) and merge
evidence_validation into that fresh object, or add/use a partial-merge API on
deps.store (e.g., setWorkflowRunMetadata(workflowRun.id, { evidence_validation:
... })) so you only write the single field; update both occurrences (the block
around updateWorkflowRun at lines referencing updateWorkflowRun and the similar
block later) to use one of these approaches to avoid overwriting concurrent
metadata changes.
- Around line 3150-3153: Rename the bare logger event names to include the
domain prefix and add matching success/completed partners: change occurrences of
'evidence_validation_failed' and 'evidence_validation_metadata_persist_failed'
(the getLog().error calls that include { workflowRunId: workflowRun.id, issues:
evidenceResult.issues } and the metadata persist error call) to
'dag.evidence_validation_failed' and
'dag.evidence_validation_metadata_persist_failed', and add corresponding
success/info logs 'dag.evidence_validation_succeeded' (in the success path near
the evidence validation completion) and
'dag.evidence_validation_metadata_persist_succeeded' in the metadata persist
success path; apply the same renaming for the other occurrences noted (the
blocks around the other getLog() calls), keeping getLog().error for failures and
getLog().info for succeeded/completed events.

In `@packages/workflows/src/evidence-validator.ts`:
- Around line 149-151: The validator currently only checks that
evidence.commit_sha exists locally; update the validation in the evidence
verification flow (around the execFileAsync git checks) to bind commit_sha to
the remote branch/PR head for pushed_branch: fetch origin refs for the repo (use
execFileAsync('git', ['fetch', 'origin', pushed_branch]) or ensure
origin/pushed_branch is available), resolve the remote head ref for the claimed
pushed_branch/PR (e.g. origin/{pushed_branch} or the PR head ref), then verify
that evidence.commit_sha is either exactly that remote head or an ancestor of it
(use git rev-parse to get the remote head and git merge-base --is-ancestor or
git cat-file -e with remote_head^{commit} to test ancestry); update the checks
that currently call execFileAsync('git', ['cat-file', ...], { cwd }) (lines
around evidence.commit_sha and the block covering 166-218) to perform this
remote-bound validation and fail validation if the commit is not the remote head
or ancestor.

In `@packages/workflows/src/schemas/evidence.ts`:
- Around line 90-111: The evidencePolicySchema's path currently allows
empty/whitespace or directory-like values; update the path validator on
evidencePolicySchema to reject empty or all-whitespace strings and to reject '.'
, '..', any segment equal to '..', and absolute paths (leading '/' or Windows
drive/UNC forms) by adding a z.string().refine(...) with a clear error message;
reference the existing evidencePolicySchema and its path property and implement
the predicate to trim() and check length>0, disallow '.' or '..', disallow path
separators that would indicate a directory only or upward traversal, and
disallow absolute paths so invalid workflow definitions fail at parse time.

---

Nitpick comments:
In `@packages/workflows/src/dag-executor.test.ts`:
- Around line 7032-7055: The evidence.json test fixture uses a hardcoded
workflow_run_id ('run-1') which doesn't match the run created for the test;
update the fixture to use the actual run ID from the test run (use the
workflowRun created by makeWorkflowRun('dag-evidence-run-2') — e.g., set
workflow_run_id to workflowRun.id or the literal 'dag-evidence-run-2') so the
evidence.json aligns with the run under test and any integrity checks that
compare IDs will pass.
- Around line 7016-7029: The test currently checks evidence_validation.ok and
issues but omits asserting evidence_validation.status; update the assertions
around the mockStore.updateWorkflowRun inspection (the
updateCalls/evidenceUpdate/updates metadata block) to also assert
updates.metadata.evidence_validation.status is the expected status string (e.g.,
"failed") and that it's of type string, and apply the same added assertion in
the other analogous assertion block (the similar check around the second
occurrence referenced by lines 7079-7091) so the metadata contract for
evidence_validation.status is locked down.

In `@packages/workflows/src/dag-executor.ts`:
- Around line 2493-2497: The anonymous workflow parameter type should be
replaced with a named interface to avoid drifting from the schema; define an
interface (e.g. DagExecutorWorkflow) that derives from the canonical schema type
(either via "interface DagExecutorWorkflow extends Pick<WorkflowBase, 'name' |
'nodes' | 'evidence_policy' | ...> & WorkflowLevelOptions" or by explicitly
declaring the interface and keeping it synced with workflowBaseSchema), then use
that interface in the function signature instead of the inline `{ name: string;
nodes: readonly DagNode[]; evidence_policy?: EvidencePolicy } &
WorkflowLevelOptions`; update references to the parameter type to use
DagExecutorWorkflow so future schema fields stay centralized.

In `@packages/workflows/src/evidence-validator.ts`:
- Around line 304-307: The log events emitted from getLog() in
evidence-validator.ts (e.g., the call that currently logs
'evidence_validation_started') must be renamed to follow the repo convention
{domain}.{action}_{state} and include relevant context IDs and error details;
update all occurrences around the getLog().info call and the related blocks
referenced (the started/failed/completed groups around lines ~304, ~319-330,
~347-386) so names become something like evidence.validation_started ->
evidence.validation_started? (use domain "evidence" and action "validation") —
actually rename to evidence.validation_started, evidence.validation_completed
and evidence.validation_failed (pairing each started with completed/failed), add
context fields (artifactsDir, policy.path, policy.verify, and a policy or
request id) to the log payload, and when logging failures include the caught
error object in the process (or getLog().error) payload so the error details are
recorded; adjust the corresponding getLog().info/getLog().error calls and ensure
consistent naming across all three blocks.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e22a2521-b78d-4816-b1ae-337feb88168c

📥 Commits

Reviewing files that changed from the base of the PR and between f4f2725 and 55e3905.

📒 Files selected for processing (10)

ROADMAP.md
packages/workflows/package.json
packages/workflows/src/dag-executor.test.ts
packages/workflows/src/dag-executor.ts
packages/workflows/src/evidence-validator.test.ts
packages/workflows/src/evidence-validator.ts
packages/workflows/src/schemas.test.ts
packages/workflows/src/schemas/evidence.ts
packages/workflows/src/schemas/index.ts
packages/workflows/src/schemas/workflow.ts

coderabbitai · 2026-05-06T20:43:32Z

+  // Real-execution proof gate (ROADMAP P3-A). When the workflow declares
+  // `evidence_policy.required: true`, refuse terminal `completed` unless
+  // $ARTIFACTS_DIR/<path> parses as ExecutionEvidence and (when verify ===
+  // 'reality') the claimed SHA/branch/PR are reachable. On failure, downgrade
+  // to `failed` with structured issues persisted at metadata.evidence_validation.
+  if (workflow.evidence_policy?.required === true) {
+    const evidenceResult = await validateEvidence({
+      artifactsDir,
+      cwd,
+      policy: workflow.evidence_policy,
+    });
+    if (!evidenceResult.valid) {
+      getLog().error(
+        { workflowRunId: workflowRun.id, issues: evidenceResult.issues },
+        'evidence_validation_failed'
+      );
+      const issueLines = evidenceResult.issues
+        .map(i => `  - ${i.field} — ${i.message}${i.hint ? ` (hint: ${i.hint})` : ''}`)
+        .join('\n');
+      const failMsg = `Real-execution evidence validation failed.\n${issueLines}`;
+      // Persist the structured issues at metadata.evidence_validation, then mark failed.
+      await deps.store
+        .updateWorkflowRun(workflowRun.id, {
+          metadata: {
+            ...workflowRun.metadata,
+            evidence_validation: { ok: false, issues: evidenceResult.issues },
+          },
+        })
+        .catch((dbErr: Error) => {
+          getLog().error(
+            { err: dbErr, workflowRunId: workflowRun.id },
+            'evidence_validation_metadata_persist_failed'
+          );
+        });
+      await deps.store.failWorkflowRun(workflowRun.id, failMsg).catch((dbErr: Error) => {
+        getLog().error({ err: dbErr, workflowRunId: workflowRun.id }, 'dag_db_fail_failed');
+      });
+      await logWorkflowError(logDir, workflowRun.id, failMsg).catch((logErr: Error) => {
+        getLog().error(
+          { err: logErr, workflowRunId: workflowRun.id },
+          'dag.workflow_error_log_write_failed'
+        );
+      });
+      const emitterForFail = getWorkflowEventEmitter();
+      emitterForFail.emit({
+        type: 'workflow_failed',
+        runId: workflowRun.id,
+        workflowName: workflow.name,
+        error: failMsg,
+      });
+      emitterForFail.unregisterRun(workflowRun.id);
+      await safeSendMessage(platform, conversationId, `❌ ${failMsg}`, {
+        workflowId: workflowRun.id,
+      });
+      return;
+    }
+    // Evidence valid — persist the success marker alongside completion.
+    await deps.store
+      .updateWorkflowRun(workflowRun.id, {
+        metadata: {
+          ...workflowRun.metadata,
+          evidence_validation: { ok: true },
+        },
+      })
+      .catch((dbErr: Error) => {
+        getLog().error(
+          { err: dbErr, workflowRunId: workflowRun.id },
+          'evidence_validation_metadata_persist_failed'
+        );
+      });
+  }


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Re-check run status after validateEvidence before terminal writes.

validateEvidence can issue git/gh subprocesses (when verify === 'reality') and may take seconds. The status gate at line 3136 (skipIfStatusChanged('dag.skip_complete_status_changed')) runs before this work, but neither the failure branch (3149-3193) nor the success branch re-checks status before calling failWorkflowRun / updateWorkflowRun / completeWorkflowRun. If the run is cancelled or deleted while reality checks are in flight, we will still flip the row to failed or completed, which contradicts the existing pattern of guarding terminal writes with skipIfStatusChanged.

Consider re-running skipIfStatusChanged immediately after validateEvidence returns, on both branches.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/dag-executor.ts` around lines 3138 - 3208, The evidence validation block calls validateEvidence and then performs terminal writes (deps.store.updateWorkflowRun, deps.store.failWorkflowRun, etc.) without re-checking run status; call the existing skipIfStatusChanged guard again immediately after validateEvidence returns (both when evidenceResult.valid is false and true) to abort if the run status changed during the long reality checks. Specifically, invoke skipIfStatusChanged('dag.skip_complete_status_changed') (or the same helper used earlier) right after the await validateEvidence(...) line and again right after entering each branch before any call to updateWorkflowRun, failWorkflowRun, completeWorkflowRun, logWorkflowError, emitter.emit/unregisterRun, or safeSendMessage to ensure you don't flip a cancelled/deleted run to a terminal state.

coderabbitai · 2026-05-06T20:43:32Z

+      getLog().error(
+        { workflowRunId: workflowRun.id, issues: evidenceResult.issues },
+        'evidence_validation_failed'
+      );


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Logger event names should follow the {domain}.{action}_{state} convention.

evidence_validation_failed and evidence_validation_metadata_persist_failed are missing a domain prefix and are inconsistent with sibling events in this file like dag.stop_detected_during_streaming, dag.node_budget_cap_exceeded, and workflow.event_persist_failed. Suggest renaming to e.g. dag.evidence_validation_failed and dag.evidence_validation_metadata_persist_failed (and pairing the success path with a matching dag.evidence_validation_succeeded info log on line ~3194 so _failed has its _completed/_succeeded partner).

As per coding guidelines: "Use Pino logger factory with event naming format {domain}.{action}_{state} (e.g., workflow.step_started, isolation.create_failed); always pair _started with _completed or _failed".

Also applies to: 3166-3170, 3202-3206

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/dag-executor.ts` around lines 3150 - 3153, Rename the bare logger event names to include the domain prefix and add matching success/completed partners: change occurrences of 'evidence_validation_failed' and 'evidence_validation_metadata_persist_failed' (the getLog().error calls that include { workflowRunId: workflowRun.id, issues: evidenceResult.issues } and the metadata persist error call) to 'dag.evidence_validation_failed' and 'dag.evidence_validation_metadata_persist_failed', and add corresponding success/info logs 'dag.evidence_validation_succeeded' (in the success path near the evidence validation completion) and 'dag.evidence_validation_metadata_persist_succeeded' in the metadata persist success path; apply the same renaming for the other occurrences noted (the blocks around the other getLog() calls), keeping getLog().error for failures and getLog().info for succeeded/completed events.

coderabbitai · 2026-05-06T20:43:32Z

+      await deps.store
+        .updateWorkflowRun(workflowRun.id, {
+          metadata: {
+            ...workflowRun.metadata,
+            evidence_validation: { ok: false, issues: evidenceResult.issues },
+          },
+        })
+        .catch((dbErr: Error) => {
+          getLog().error(
+            { err: dbErr, workflowRunId: workflowRun.id },
+            'evidence_validation_metadata_persist_failed'
+          );
+        });


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale workflowRun.metadata snapshot can clobber concurrent metadata writes.

Both updateWorkflowRun calls do metadata: { ...workflowRun.metadata, evidence_validation: ... }, but workflowRun.metadata is the in-memory snapshot captured at executor start. Anything that wrote to metadata during the run (heartbeats, approval/loop gate state cleanup, future writers) is silently overwritten when this spread is persisted. Today the gate only runs on the all-success / not-paused path so this is mostly latent, but it is a fragile contract — adding any future per-node metadata writer will start losing fields here.

Safer options:

Read the freshest WorkflowRun (or just its metadata) from deps.store immediately before this update, or

Have deps.store expose a partial-merge update (e.g., setWorkflowRunMetadata(id, { evidence_validation: ... })) so callers do not have to round-trip the whole object.

Also applies to: 3194-3207

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/dag-executor.ts` around lines 3159 - 3171, The current updateWorkflowRun call uses the stale in-memory workflowRun.metadata snapshot and may clobber concurrent metadata writes; before persisting evidence_validation, either fetch the latest metadata from deps.store (e.g., readWorkflowRun or readWorkflowRunMetadata) and merge evidence_validation into that fresh object, or add/use a partial-merge API on deps.store (e.g., setWorkflowRunMetadata(workflowRun.id, { evidence_validation: ... })) so you only write the single field; update both occurrences (the block around updateWorkflowRun at lines referencing updateWorkflowRun and the similar block later) to use one of these approaches to avoid overwriting concurrent metadata changes.

shhaider · 2026-05-06T20:52:26Z

🔍 Comprehensive PR Review

PR: #1601 — feat(workflows): add execution-evidence contract with terminal-success gate
Reviewed by: 5 specialized agents (code-review, error-handling, test-coverage, comment-quality, docs-impact)
Date: 2026-05-06

Summary

Three-layer evidence validator and terminal-success gate are well-designed and well-tested (49 new tests; "never throws" contract honored across all entry points; mock isolation correctly registered). Five reviewers converged on 0 CRITICAL issues and 3 HIGH items that warrant attention before this PR merges or — at the latest — before any follow-up PR flips evidence_policy.required: true on a bundled workflow.

Verdict: REQUEST_CHANGES

Severity	Count
🔴 CRITICAL	0
🟠 HIGH	3
🟡 MEDIUM	6
🟢 LOW	12

🟠 High Issues

HIGH-1 — Reality layer doesn't bind `commit_sha` to PR head; fabricated evidence can pass

📍 packages/workflows/src/evidence-validator.ts:131-241

The reality layer validates SHA reachability, branch existence, and PR metadata independently. A fabricated evidence.json can pair any locally-reachable SHA (git rev-parse HEAD) with any real branch and any real PR URL and pass verify: 'reality'. (Also flagged by CodeRabbit.)

View fix

// Bind SHA to branch tip — ls-remote already returns the SHA, just use it:
const remoteSha = trimmed.split(/\s+/)[0];
if (remoteSha !== evidence.commit_sha) {
  issues.push({
    level: 'error',
    field: 'commit_sha',
    message: `commit_sha=${evidence.commit_sha} does not match origin/${evidence.pushed_branch} tip=${remoteSha}`,
    hint: 'evidence.json claims a commit that is not the tip of pushed_branch on origin',
  });
}

// Add headRefOid to gh pr view --json query and assert equality:
const { stdout } = await execFileAsync(
  'gh', ['pr', 'view', evidence.pr_url, '--json', 'headRefName,number,headRefOid'], { cwd },
);
if (parsedFull.headRefOid !== evidence.commit_sha) {
  issues.push({
    level: 'error',
    field: 'commit_sha',
    message: `gh pr view headRefOid='${String(parsedFull.headRefOid)}' does not match commit_sha='${evidence.commit_sha}'`,
    hint: 'pr_url points to a PR whose HEAD is a different commit',
  });
}

Also add reality-layer test cases for commit_sha != branch tip and commit_sha != PR headRefOid.

HIGH-2 — CLAUDE.md "Workflows" bullet list omits `evidence_policy`

📍 CLAUDE.md:709-721

CLAUDE.md is the in-repo authoritative summary of engine surface. Add one bullet to the Workflows section near the existing interactive: true bullet describing the new field.

HIGH-3 — `authoring-workflows.md` missing `evidence_policy` reference + `evidence.json` contract

📍 packages/docs-web/src/content/docs/guides/authoring-workflows.md

The user-facing guide lists every other workflow-level field with examples. The 11-required-field evidence.json shape, both verify modes, and the gh PATH/auth requirement are undocumented for users. Recommend adding a ## Evidence-Gated Workflows section with schema table + worked example.

🟡 Medium Issues (Auto-fixing in next step)

MED-1 — Log-event names violate `{domain}.{action}_{state}` convention

📍 dag-executor.ts:3152, 3169, 3205 + evidence-validator.ts:306, 319, 330, 347, 359, 366, 373, 381, 386

Events use evidence_validation_failed (no domain) and evidence_validation_metadata_persist_failed (3 segments — unparseable). Fix: prefix with workflow. and split the 3-segment ones to workflow.evidence_metadata_persist_failed. Pure mechanical rename. Tagged [must-fix] by code-review. Also flagged by CodeRabbit.

MED-2 — Stale `workflowRun.metadata` snapshot will clobber future concurrent writers

📍 dag-executor.ts:3159-3165, 3195-3201

Both terminal updateWorkflowRun calls do metadata: { ...workflowRun.metadata, ... } from an in-memory snapshot that's never re-fetched. No present-day bug, but a foot-gun the moment a heartbeat or hook handler is added. Recommend re-fetch via getWorkflowRun(id) before the spread.

MED-3 — `gh pr view` non-JSON output drops parser detail

📍 evidence-validator.ts:196-205

When gh succeeds (zero exit) but emits non-JSON (auth banner, paginator dump), the catch swallows parseErr.message and the actual stdout. Operators must reproduce externally to debug. Existing precedent at validator.ts:387 does include err.message.

MED-4 — Success-path metadata-persist failure is silently swallowed

📍 dag-executor.ts:3194-3207

The success-path evidence_validation: { ok: true } write is wrapped in .catch(log) and falls through to completeWorkflowRun. A transient DB error yields a completed run without the proof-of-validation marker — the platform's P3-A guarantee then depends on a log file. Recommend dropping the .catch on the success path; let DB errors propagate.

MED-5 — `loadEvidenceFromArtifacts` non-ENOENT path is untested

📍 evidence-validator.ts:272 (re-throw) + :336-349 (orchestrator catch)

EACCES, EISDIR, ENAMETOOLONG all hit an unexercised branch. One real-fs test (mkdir at evidence.json path → EISDIR) closes the gap with no new mocks.

MED-6 — `git ls-remote` rejection branch is untested

📍 evidence-validator.ts:179-186

Most likely real-world failure mode in CI/sandboxed environments (private repo without GH_TOKEN, network unreachable, deleted origin). Mirror the existing cat-file rejection test exactly.

🟢 Low Issues

View 12 low-priority suggestions

#	Issue	Location	Recommendation
L1	`evidence_policy.path` accepts `''`, `'.'`, dir-like strings	`schemas/evidence.ts:111`	`.trim().min(1).refine(...)` — auto-fixable
L2	`level: 'warning'` enum has no producer (YAGNI)	`schemas/evidence.ts:121`	Drop until producer exists
L3	`EvidenceValidationIssue` overlaps `ValidationIssue` (Rule of Three not yet hit)	`schemas/evidence.ts:120`	Leave as-is
L4	Race window after slow `verify: 'reality'` I/O — no `skipIfStatusChanged` re-check	`dag-executor.ts:3135-3208`	Required before bundled `verify: reality`
L5	`e as Error & { stderr?: string }` cast fragile to non-Error rejections	`evidence-validator.ts:152, 179, 221`	`instanceof Error` narrowing
L6	`gh` non-ENOENT (auth-failure) branch untested	`evidence-validator.ts:230-237`	One test
L7	Empty `test_commands: []` doc-only-PR contract not asserted	`schemas/evidence.ts:57`	One positive shape test
L8	DAG-executor `.catch` swallow paths untested	`dag-executor.ts:3158, 3168, 3171, 3197`	One test
L9	`verify: 'reality'` integration not e2e-tested through dag-executor	`dag-executor.test.ts`	Inferred only
L10	`evidence_policy.required: false` no-op path untested	`dag-executor.ts:3140`	One test
L11	Type-narrowing comment after early return is redundant	`evidence-validator.ts:91`	Remove
L12	`required` JSDoc says "Defaults to `false`" but schema is `optional()`, no `.default()`	`schemas/evidence.ts:91-95`	Tighten wording

✅ What's Good

Three-layer separation (shape → integrity → reality) is genuinely good design
"Never throws" contract uniformly enforced across all 5 entry points
Discriminated unions used correctly (kind, found, valid)
EXECUTION_EVIDENCE_REQUIRED_FIELDS derived from schema shape — cannot drift
Path-traversal defense covers absolute and .. vectors before any I/O
Mock isolation correctly registered as own bun test invocation
Regression guard test for the no-policy path
Plan deviations pre-flagged in implementation artifacts — no surprises
gh ENOENT vs auth-failure distinguished with operator-actionable hints
Comments are sparse and earn their place (almost all explain WHY)
Layered test architecture mirrors layered code (Group A/B/C/D/E ↔ entry points)
Schema conventions followed (camelCase, @hono/zod-openapi, z.infer, schemas/)
ROADMAP.md P3 entry well-scoped with explicit acceptance criteria

📋 Suggested Follow-up Issues

Issue Title	Priority
Bind `commit_sha` to PR head and branch tip in `verify: reality`	P1 — required before bundled flag flip
Document `evidence_policy` in user-facing authoring guide	P1
Add `mergeWorkflowRunMetadata` to `IWorkflowStore` for transactional merges	P2
Tighten log-event naming convention across evidence validator + gate	P2
Test coverage gaps (ls-remote, EISDIR, gh auth, DB swallow, required:false)	P3
Operator troubleshooting note for `metadata.evidence_validation`	P3 (defer to bundled-flag PR)
End-to-end reality-mode integration test through dag-executor	P3

Next Steps

⚡ Auto-fix step will address 5 mechanical issues (MED-1, MED-3, MED-4, LOW-1, LOW-2)
📝 Review HIGH items above — HIGH-1 needs new test cases; HIGH-2/3 are scope decisions (this PR vs follow-up)
🚧 Before any bundled workflow flips evidence_policy.required: true, HIGH-1 and LOW-4 (race-window re-check) MUST land

Reviewed by Archon comprehensive-pr-review workflow
Artifacts: ~/.archon/workspaces/shhaider/Archon/artifacts/runs/4c517fe4db197edb948e3f3d4adf9927/review/

Fixes applied: - HIGH-1: Bind commit_sha to PR head and branch tip in verify: 'reality' - ls-remote stdout SHA must equal evidence.commit_sha (was discarded) - gh pr view now queries headRefOid; assert it equals evidence.commit_sha - Closes the gap where any locally-reachable SHA could pair with any real PR - HIGH-2: Add evidence_policy bullet to CLAUDE.md "Workflows" list - HIGH-3: Add evidence_policy: to authoring-workflows.md YAML example and a new "## Evidence-Gated Workflows" section with schema, verify modes, worked example, and failure shape - MED-1: Rename log events to {domain}.{action}_{state} convention (workflow.evidence_validation_started/_failed/_completed, workflow.evidence_metadata_persist_failed) - MED-3: gh non-JSON output now includes parser err.message and a 200-char stdout preview hint (matches validator.ts:387 precedent) - MED-4: Drop .catch on success-path metadata write — DB errors now propagate to the outer executor catch instead of silently completing a run without provable validation - LOW-1: Tighten evidence_policy.path with .trim().min(1).refine(...) to reject '', '.', and trailing-slash dir-like values at parse time - LOW-2: Drop 'warning' from level enum until a producer exists (YAGNI) Tests added (evidence-validator.test.ts Group C): - ls-remote tip SHA mismatches commit_sha → commit_sha error - gh headRefOid mismatches commit_sha → commit_sha error Existing reality tests updated to include headRefOid in gh fixtures. Validation: type-check, lint, format, tests all pass (bun run validate). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

shhaider · 2026-05-06T21:06:09Z

⚡ Auto-Fix Report

Status: COMPLETE
Pushed: ✅ Changes pushed to PR (commit 83dcb4b6)

Fixes Applied

Severity	Fixed	Skipped
🔴 CRITICAL	0	0 (none in review)
🟠 HIGH	3	0
🟡 MEDIUM (auto-fix scope)	3	1
⚪ LOW (auto-fix scope)	2	—

What Was Fixed

✅ HIGH-1: Reality layer doesn't bind commit_sha to PR head (evidence-validator.ts:131-241) — ls-remote stdout SHA now compared to evidence.commit_sha; gh pr view query extended with headRefOid and asserted equal to commit_sha. Closes the fabricated-evidence gap.
✅ HIGH-2: CLAUDE.md missing evidence_policy field (CLAUDE.md:715-716) — added engine-surface bullet.
✅ HIGH-3: authoring-workflows.md missing evidence_policy section (packages/docs-web/.../authoring-workflows.md) — added evidence_policy: to YAML example AND a new ## Evidence-Gated Workflows section with schema table, verify modes, worked example, and failure shape.
✅ MED-1: Log events violate {domain}.{action}_{state} — all 12 sites prefixed with workflow.; the 3-segment one split to workflow.evidence_metadata_persist_failed.
✅ MED-3: gh pr view non-JSON detail dropped — catch now includes parseErr.message and 200-char stdout preview hint.
✅ MED-4: Success-path metadata persist swallowed — dropped .catch on success branch; DB errors propagate to outer executor catch instead of completing without provable validation.
✅ LOW-1: path schema accepts empty/dir-like values — tightened with .trim().min(1).refine(...).
✅ LOW-2: level: 'warning' enum has no producer (YAGNI) — dropped to z.enum(['error']).

Tests Added

packages/workflows/src/evidence-validator.test.ts (Group C):
- negative: ls-remote tip SHA mismatches commit_sha → commit_sha error
- negative: gh headRefOid mismatches commit_sha → commit_sha error
6 existing reality-layer tests updated to include headRefOid in gh JSON fixtures so they exercise the new SHA binding.

Test count for evidence-validator.test.ts: 43 → 45 (all passing).

❌ Not Fixed (Deferred per Review Recommendation)

MED-2 stale workflowRun.metadata snapshot (dag-executor.ts:3159-3201) — reviewer notes the pattern is wrong but no current sibling writer exists, so this is future-proofing. Consolidated review explicitly recommends "track Option B as a follow-up roadmap entry." Suggested follow-up issue captures the long-term answer (mergeWorkflowRunMetadata on IWorkflowStore).
MED-5, MED-6, LOW-3 through LOW-13 — test-coverage gaps and verify: 'reality' race-window items deferred to a focused test-debt issue and the bundled-flag-flip follow-up PR (per review note: "needed before any bundled workflow flips verify: 'reality'").

📋 Suggested Follow-up Issues

"Add mergeWorkflowRunMetadata to IWorkflowStore for transactional metadata merges" (P2) — MED-2
"Test coverage gaps in evidence-validator (ls-remote rejection, EISDIR/EACCES load, gh auth, DB swallow paths, required: false no-op, reality-mode integration via dag-executor)" (P3) — MED-5, MED-6, LOW-6/8/10
"Reality-mode race-window re-check + operator troubleshooting note" (P1, required before bundled-flag flip) — LOW-4, LOW-13
"Bind commit_sha to PR head and branch tip in verify: reality" — ✅ DONE in this commit

Validation

bun run validate exits 0 (all 6 gates green).

Auto-fixed by Archon comprehensive-pr-review workflow
Fixes pushed to branch archon/task-archon-p3-real-execution-proof

shhaider · 2026-05-06T21:08:33Z

🎯 Workflow Summary

Plan: ROADMAP P3-A — Real-Execution Proof Validation for PR-Producing Workflows
Workflow ID: 4c517fe4db197edb948e3f3d4adf9927
Status: ✅ Implementation complete, review fixes applied, PR ready for human review

Implementation vs Plan

Metric	Planned	Actual
Files created	3	3
Files updated	6	7 (incl. `ROADMAP.md` whitespace)
Tests added	≥30 negative cases	52 (43 validator + 6 schema + 3 executor) → +2 from review = 54
Deviations	—	2 (both pre-flagged, intent preserved)

📋 Deviations from Plan (2)

Deviation 1 — completeWorkflowRun, not updateWorkflowRun
Plan referenced the terminal-success transition by an approximate name. Actual store method is completeWorkflowRun at dag-executor.ts:3134; failure path is failWorkflowRun. Gate inserted at the single correct boundary; structured issues persisted at metadata.evidence_validation via a separate updateWorkflowRun call (since failWorkflowRun(id, error) only takes a string). Pre-flagged in plan-confirmation.md.

Deviation 2 — ROADMAP.md prettier reformat
File added by prior commit 661e72c9 failed prettier --check; that blocked bun run validate repo-wide. Ran prettier --write ROADMAP.md (whitespace-only — no content change). Pre-existing tech debt; not new content.

Review Summary

5 reviewer agents ran (code-review, error-handling, test-coverage, comment-quality, docs-impact). 24 raw findings → 21 distinct after de-dup.

Severity	Found	Fixed	Remaining
CRITICAL	0	0	0
HIGH	3	3	0
MEDIUM	6	3 (auto-fix)	3 (deferred)
LOW	13	2 (auto-fix)	11 (deferred)

HIGH fixes landed in commit 83dcb4b6:

HIGH-1 (security): Reality layer now binds commit_sha to PR head (headRefOid) and ls-remote tip — fabricated evidence can no longer pass verify: 'reality'.
HIGH-2: CLAUDE.md workflow-engine bullet list now documents evidence_policy.
HIGH-3: authoring-workflows.md adds ## Evidence-Gated Workflows section with schema, modes, worked example, failure shape.

Auto-fixed MEDIUM/LOW: log-event naming convention (workflow.* prefix, ~12 sites), gh non-JSON detail enrichment, success-path metadata-persist no longer silently swallowed, evidence_policy.path schema tightened (.trim().min(1).refine(...)), unused 'warning' enum dropped.

📋 Suggested Follow-Up Issues

#	Title	Priority	Source
1	Reality-mode race-window re-check + operator troubleshooting note for `metadata.evidence_validation`	P1	LOW-4, LOW-13 — required before any bundled workflow flips `verify: 'reality'`
2	Flip `evidence_policy.required: true` on bundled workflows (`archon-fix-github-issue`, `archon-idea-to-pr`, `archon-plan-to-pr`)	P1	NOT Building — explicitly deferred until engine surface is stable
3	Add `mergeWorkflowRunMetadata` to `IWorkflowStore` for transactional metadata merges	P2	MED-2 — latent foot-gun once a sibling writer touches `metadata`
4	Test coverage: `ls-remote` rejection, EISDIR/EACCES load, `gh` auth-failure non-ENOENT, DB swallow paths, `required: false` no-op	P3	MED-5, MED-6, LOW-6, LOW-8, LOW-10
5	Reality-mode integration test through `dag-executor`	P3	LOW-9

Reply with: @archon create follow-up issues to file these (or pick a subset).

📝 Documentation Updates

Already shipped in 83dcb4b6 (HIGH-2, HIGH-3). No additional docs gaps blocking merge.

Operator-facing troubleshooting note for metadata.evidence_validation is intentionally deferred to the bundled-flag-flip PR — see follow-up issue #1 above.

ℹ️ Deferred Items (NOT Building — intentionally excluded)

These were excluded from this PR's scope by design. Each is its own follow-up surface; do not flag as bugs:

No new node type — validator runs once at terminal-success boundary; not encoded as evidence: node.
No bundled YAML changes — archon-fix-github-issue.yaml, archon-idea-to-pr.yaml, archon-plan-to-pr.yaml not modified. Engine surface ships first; flag flip lands separately (follow-up updated code to use locally hosted llama LLM, nomic-embed-text model. #2).
No CLI surface changes — no archon workflow validate-evidence subcommand; no --json output changes.
No DB schema migration — evidence persists via existing WorkflowRun.metadata (z.record(z.unknown())).
No web UI changes — failed runs with metadata.evidence_validation render like any other failure.
No provider/adapter changes — evidence is workflow-author-supplied; SDK harness does not auto-collect commit SHAs in this PR.
No AI-driven evidence "judging" — deterministic validation only (Zod + git/gh I/O).
No retroactive validation — existing completed runs are not re-checked.
No bundled-defaults.generated.ts regeneration — no bundled YAML changes.
Reality I/O is opt-in — default is verify: 'shape' (hermetic). Operators flip per workflow.
test_commands === [] warning-level flag — out of scope.

Validation Evidence

bun run validate exits 0 on 83dcb4b6:

Check	Status
`check:bundled`	✅ up to date (36 commands, 20 workflows)
`check:bundled-skill`	✅ up to date (21 files)
`type-check`	✅ all 10 packages
`lint --max-warnings 0`	✅ 0 errors / 0 warnings
`format:check`	✅ all matched files
`test`	✅ 522 tests passing across `@archon/workflows` (11 isolated batches), all packages green

Artifacts: ~/.archon/workspaces/shhaider/Archon/artifacts/runs/4c517fe4db197edb948e3f3d4adf9927/
Final commit: 83dcb4b6 fix(workflows): address review findings on execution-evidence gate

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (2)

packages/workflows/src/dag-executor.ts (2)
3159-3165: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Avoid merging evidence metadata into the stale workflowRun.metadata snapshot.

Both writes spread the executor-start copy of workflowRun.metadata, so any metadata written during the run can be lost when evidence_validation is persisted. Re-read the latest metadata or use a partial/transactional merge API here.

Also applies to: 3199-3203
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.ts` around lines 3159 - 3165, The update
is merging into the stale executor-start snapshot by spreading
workflowRun.metadata when calling deps.store.updateWorkflowRun to set
metadata.evidence_validation; instead re-fetch the latest metadata (or use your
store's partial/transactional merge API) and merge evidence_validation into that
latest object before calling deps.store.updateWorkflowRun so concurrent metadata
writes aren't lost; apply the same change for the other occurrence that updates
metadata at the block around lines referencing deps.store.updateWorkflowRun and
evidence_validation (where evidenceResult.issues is used).
3144-3205: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Re-check run status after validateEvidence() returns.

verify: 'reality' can spend seconds in git/gh. If the run is cancelled or deleted during that window, this block can still flip it to failed or completed because the last skipIfStatusChanged() ran before validation started. Please guard again before either terminal write.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.ts` around lines 3144 - 3205, After
validateEvidence() returns, re-check the run status before doing any terminal
writes: call the existing skipIfStatusChanged(workflowRun.id) (or if
unavailable, re-fetch the run via deps.store.getWorkflowRun(workflowRun.id) and
ensure it is still running) and abort/return if the run was
cancelled/deleted/changed; only then proceed to call
deps.store.updateWorkflowRun(... evidence_validation ...) or
deps.store.failWorkflowRun(...) and to emit
events/safeSendMessage/logWorkflowError. Ensure this check is applied both on
the failure path (before persisting failure and sending notifications) and on
the success path (before persisting the success marker).

🧹 Nitpick comments (1)

packages/workflows/src/evidence-validator.test.ts (1)

567-632: ⚡ Quick win

Please cover the non-ENOENT loader branches here.

validateEvidence() has dedicated handling for read failures like EISDIR/EACCES, but this suite only exercises missing files and malformed JSON. A couple of focused regression tests for those branches would protect the evidence_policy.path diagnostics on a changed critical path.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/evidence-validator.test.ts` around lines 567 - 632,
Add two tests to evidence-validator.test.ts that exercise the non-ENOENT
read-failure branches in validateEvidence: one that creates a directory at the
policy path (e.g., mkdir join(testDir, 'evidence.json')) and calls
validateEvidence(...) with policy.path set to that path to trigger an EISDIR
read error, and another that simulates an EACCES read error (either by creating
a file and chmod 0 on it or by stubbing fs.readFile to throw an { code: 'EACCES'
} error) and calling validateEvidence(...) with policy.path pointing to that
file; assert result.valid is false and that result.issues includes an issue with
field 'evidence_policy.path' for each case. Ensure you use the existing
helpers/testDir setup and reference validateEvidence and
loadEvidenceFromArtifacts in the new tests.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/docs-web/src/content/docs/guides/authoring-workflows.md`:
- Around line 796-815: The example under id "write-evidence" uses hardcoded
origin/dev and undocumented vars (${SESSION_ID}, ${PR_URL}, ${PR_NUMBER});
change it to use the documented/base branch variable and the outputs from the
open-pr step/job: replace origin/dev with $BASE_BRANCH (or
${BASE_BRANCH:-origin/main}) and replace ${SESSION_ID}, ${PR_URL}, ${PR_NUMBER}
with the explicit open-pr outputs you document and reference (e.g. the job/step
outputs such as $OPEN_PR_SESSION_ID, $OPEN_PR_PR_URL, $OPEN_PR_PR_NUMBER or the
appropriate GitHub Actions expression form like ${{
jobs.open-pr.outputs.session_id }}), and update the surrounding text to document
those variables so the snippet is copy-pastable and portable.

In `@packages/workflows/src/evidence-validator.ts`:
- Around line 315-322: The schema requires workflow_run_id to match the active
run but the ValidateEvidenceArgs and validateEvidence() never receive or check
it; update the ValidateEvidenceArgs interface to include a workflowRunId (or
workflow_run_id) field and then, at the start of validateEvidence(), compare the
incoming evidence.workflow_run_id to the provided
ValidateEvidenceArgs.workflowRunId (or workflow_run_id) and throw/reject on
mismatch before proceeding (use the existing error handling pattern); reference
ValidateEvidenceArgs and validateEvidence() when making these changes and ensure
the error message clearly states the workflow run id mismatch.

---

Duplicate comments:
In `@packages/workflows/src/dag-executor.ts`:
- Around line 3159-3165: The update is merging into the stale executor-start
snapshot by spreading workflowRun.metadata when calling
deps.store.updateWorkflowRun to set metadata.evidence_validation; instead
re-fetch the latest metadata (or use your store's partial/transactional merge
API) and merge evidence_validation into that latest object before calling
deps.store.updateWorkflowRun so concurrent metadata writes aren't lost; apply
the same change for the other occurrence that updates metadata at the block
around lines referencing deps.store.updateWorkflowRun and evidence_validation
(where evidenceResult.issues is used).
- Around line 3144-3205: After validateEvidence() returns, re-check the run
status before doing any terminal writes: call the existing
skipIfStatusChanged(workflowRun.id) (or if unavailable, re-fetch the run via
deps.store.getWorkflowRun(workflowRun.id) and ensure it is still running) and
abort/return if the run was cancelled/deleted/changed; only then proceed to call
deps.store.updateWorkflowRun(... evidence_validation ...) or
deps.store.failWorkflowRun(...) and to emit
events/safeSendMessage/logWorkflowError. Ensure this check is applied both on
the failure path (before persisting failure and sending notifications) and on
the success path (before persisting the success marker).

---

Nitpick comments:
In `@packages/workflows/src/evidence-validator.test.ts`:
- Around line 567-632: Add two tests to evidence-validator.test.ts that exercise
the non-ENOENT read-failure branches in validateEvidence: one that creates a
directory at the policy path (e.g., mkdir join(testDir, 'evidence.json')) and
calls validateEvidence(...) with policy.path set to that path to trigger an
EISDIR read error, and another that simulates an EACCES read error (either by
creating a file and chmod 0 on it or by stubbing fs.readFile to throw an { code:
'EACCES' } error) and calling validateEvidence(...) with policy.path pointing to
that file; assert result.valid is false and that result.issues includes an issue
with field 'evidence_policy.path' for each case. Ensure you use the existing
helpers/testDir setup and reference validateEvidence and
loadEvidenceFromArtifacts in the new tests.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6b1fdfae-ad23-415c-906e-999063d6e8bb

📥 Commits

Reviewing files that changed from the base of the PR and between 55e3905 and 83dcb4b.

📒 Files selected for processing (6)

CLAUDE.md
packages/docs-web/src/content/docs/guides/authoring-workflows.md
packages/workflows/src/dag-executor.ts
packages/workflows/src/evidence-validator.test.ts
packages/workflows/src/evidence-validator.ts
packages/workflows/src/schemas/evidence.ts

✅ Files skipped from review due to trivial changes (1)

CLAUDE.md

coderabbitai · 2026-05-06T21:12:42Z

+  - id: write-evidence
+    bash: |
+      cat > "$ARTIFACTS_DIR/evidence.json" <<EOF
+      {
+        "kind": "execution",
+        "workflow_run_id": "$WORKFLOW_ID",
+        "provider": "claude",
+        "provider_run_ids": ["${SESSION_ID}"],
+        "changed_files": $(git diff --name-only origin/dev...HEAD | jq -R -s -c 'split("\n") | map(select(. != ""))'),
+        "diff_command": "git diff --stat origin/dev...HEAD",
+        "test_commands": ["bun test"],
+        "test_output_summary": "all tests passed",
+        "commit_sha": "$(git rev-parse HEAD)",
+        "pushed_branch": "$(git rev-parse --abbrev-ref HEAD)",
+        "pr_url": "${PR_URL}",
+        "pr_number": ${PR_NUMBER}
+      }
+      EOF
+    depends_on: [open-pr]
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make this example use documented variables and a portable base branch.

Right now the snippet hardcodes origin/dev and relies on ${SESSION_ID}, ${PR_URL}, and ${PR_NUMBER} without introducing them anywhere in this guide. As written, it's easy to copy-paste into a workflow that cannot actually produce a valid evidence.json. Please switch the example to documented values like $BASE_BRANCH / $nodeId.output..., or document those variables alongside the snippet.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/docs-web/src/content/docs/guides/authoring-workflows.md` around lines 796 - 815, The example under id "write-evidence" uses hardcoded origin/dev and undocumented vars (${SESSION_ID}, ${PR_URL}, ${PR_NUMBER}); change it to use the documented/base branch variable and the outputs from the open-pr step/job: replace origin/dev with $BASE_BRANCH (or ${BASE_BRANCH:-origin/main}) and replace ${SESSION_ID}, ${PR_URL}, ${PR_NUMBER} with the explicit open-pr outputs you document and reference (e.g. the job/step outputs such as $OPEN_PR_SESSION_ID, $OPEN_PR_PR_URL, $OPEN_PR_PR_NUMBER or the appropriate GitHub Actions expression form like ${{ jobs.open-pr.outputs.session_id }}), and update the surrounding text to document those variables so the snippet is copy-pastable and portable.

shhaider · 2026-05-06T21:16:21Z

Manual dogfood follow-up: pushed f6f71a8 to bind execution evidence to the active workflow run. validateEvidence now requires workflowRunId, rejects stale/copied evidence when evidence.workflow_run_id differs from the active run, and dag-executor passes workflowRun.id into the gate. Added validator and DAG executor regression tests. Verification after this patch: bun test packages/workflows/src/evidence-validator.test.ts, bun test packages/workflows/src/dag-executor.test.ts, bun --filter @archon/workflows test, and bun run validate all passed.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

packages/workflows/src/dag-executor.ts (2)
3160-3165: ⚠️ Potential issue | 🟡 Minor | ⚖️ Poor tradeoff

Stale workflowRun.metadata snapshot can silently drop concurrent writes.

Both metadata updates (lines 3162–3165 and 3201–3204) spread workflowRun.metadata, which is the in-memory snapshot captured at executor startup. Any metadata written during the run (e.g., approval/loop gate state) is silently overwritten when this spread is persisted.

Prefer fetching the freshest metadata from deps.store immediately before writing, or exposing a partial-merge API (e.g., setWorkflowRunMetadata(id, { evidence_validation: ... })) so the spread is unnecessary.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.ts` around lines 3160 - 3165, The update
currently spreads the stale in-memory snapshot workflowRun.metadata into
deps.store.updateWorkflowRun (when setting evidence_validation), which can
overwrite concurrent metadata changes; modify the write to first fetch the
latest metadata from deps.store (e.g.,
deps.store.getWorkflowRun/workflowRunById) and merge evidence_validation into
that fresh object before calling deps.store.updateWorkflowRun, or change to a
partial-merge API such as deps.store.setWorkflowRunMetadata(workflowRun.id, {
evidence_validation: ... }) so you avoid spreading workflowRun.metadata and
prevent lost concurrent writes.
3136-3193: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

TOCTOU: status is not re-checked after validateEvidence returns.

The guard at line 3136 (skipIfStatusChanged) runs before validateEvidence. When policy.verify === 'reality', that call can take several seconds (git/gh subprocesses). If the run is cancelled or deleted during that window, the failure branch at line 3173 will still call failWorkflowRun — overwriting a cancelled/deleted row with failed. The same window exists on the success path at line 3200.

The existing skipIfStatusChanged helper is already available; calling it again immediately after validateEvidence returns (on both branches) closes the window:
   const evidenceResult = await validateEvidence({ ... });
+  if (await skipIfStatusChanged('dag.skip_complete_status_changed')) return;
   if (!evidenceResult.valid) {
     ...
   }
   // success path
+  if (await skipIfStatusChanged('dag.skip_complete_status_changed')) return;
   await deps.store.updateWorkflowRun(...);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.ts` around lines 3136 - 3193, The
time-of-check/time-of-use race occurs because
skipIfStatusChanged('dag.skip_complete_status_changed') runs before
validateEvidence() but not after it; add a second check immediately after
validateEvidence() returns and before taking either branch (i.e., before
persisting evidence_validation via deps.store.updateWorkflowRun, before
deps.store.failWorkflowRun/logWorkflowError/emitterForFail.emit/unregisterRun/safeSendMessage
on the failure path, and likewise before the success path that marks the run
completed). Concretely, call the existing
skipIfStatusChanged('dag.skip_complete_status_changed') right after the
validateEvidence(...) call and return early if it signals status changed to
prevent overwriting cancelled/deleted runs.

🧹 Nitpick comments (3)

packages/workflows/src/dag-executor.test.ts (1)

7032-7091: ⚡ Quick win

Add a regression test for success-path metadata write failure propagation.

This suite validates the happy path, but it doesn’t pin the “do not silently complete if evidence metadata persistence fails” behavior. A focused test here would guard the reliability fix from regressions.

Suggested test addition

+  it('required evidence: metadata write failure on success path propagates (no silent completion)', async () => {
+    await writeFile(
+      join(artifactsDir, 'evidence.json'),
+      JSON.stringify({
+        kind: 'execution',
+        workflow_run_id: 'dag-evidence-run-db-fail',
+        provider: 'claude',
+        provider_run_ids: ['session-abc'],
+        changed_files: ['src/foo.ts'],
+        diff_command: 'git diff --stat origin/dev...HEAD',
+        test_commands: ['bun test'],
+        test_output_summary: '15 passed',
+        commit_sha: 'a'.repeat(40),
+        pushed_branch: 'feature/foo',
+        pr_url: 'https://github.com/owner/repo/pull/42',
+        pr_number: 42,
+      })
+    );
+
+    const mockStore = createMockStore();
+    (mockStore.updateWorkflowRun as ReturnType<typeof mock>).mockRejectedValueOnce(
+      new Error('db write failed')
+    );
+    const mockDeps = createMockDeps(mockStore);
+    const platform = createMockPlatform();
+    const workflowRun = makeWorkflowRun('dag-evidence-run-db-fail');
+
+    await expect(
+      executeDagWorkflow(
+        mockDeps,
+        platform,
+        'conv-evidence',
+        testDir,
+        {
+          name: 'evidence-required-valid-db-fail',
+          nodes: [{ id: 'work', bash: 'echo done' } as BashNode],
+          evidence_policy: { required: true, verify: 'shape', path: 'evidence.json' },
+        },
+        workflowRun,
+        'claude',
+        undefined,
+        artifactsDir,
+        join(testDir, 'logs'),
+        'main',
+        'docs/',
+        minimalConfig
+      )
+    ).rejects.toThrow('db write failed');
+
+    expect((mockStore.completeWorkflowRun as ReturnType<typeof mock>).mock.calls.length).toBe(0);
+  });

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/dag-executor.test.ts` around lines 7032 - 7091, Add a
regression test that simulates updateWorkflowRun failing when writing evidence
metadata so the happy-path does not silently complete; in the existing test case
around executeDagWorkflow for evidence_policy (the 'evidence-required-valid'
scenario) mock mockStore.updateWorkflowRun to throw/reject when called for
metadata updates, then call executeDagWorkflow and assert that
mockStore.completeWorkflowRun was NOT called and mockStore.failWorkflowRun WAS
called (or that the error propagated), and also assert the updateWorkflowRun
mock was invoked for evidence metadata. Target the same symbols used in the
file: executeDagWorkflow, mockStore.updateWorkflowRun,
mockStore.completeWorkflowRun, mockStore.failWorkflowRun, and the
evidence_policy/path handling so the test specifically verifies metadata
persistence failure prevents completion.

packages/workflows/src/evidence-validator.test.ts (2)

432-454: ⚡ Quick win

Add an explicit git ls-remote rejection test branch.

You cover empty output and mismatch, but not the thrown ls-remote path. That’s a distinct failure mode and worth pinning directly.

Suggested test addition

+  it('negative: git ls-remote rejects -> pushed_branch issue', async () => {
+    execFileImpl = (cmd, args) => {
+      if (cmd === 'git' && args[0] === 'cat-file') {
+        return Promise.resolve({ stdout: '', stderr: '' });
+      }
+      if (cmd === 'git' && args[0] === 'ls-remote') {
+        return Promise.reject(Object.assign(new Error('ls-remote failed'), { stderr: 'fatal: network error' }));
+      }
+      if (cmd === 'gh') {
+        return Promise.resolve({
+          stdout: JSON.stringify({
+            headRefName: 'feature/foo',
+            number: 42,
+            headRefOid: 'a'.repeat(40),
+          }),
+          stderr: '',
+        });
+      }
+      return Promise.resolve({ stdout: '', stderr: '' });
+    };
+
+    const issues = await verifyEvidenceReality(REAL_EVIDENCE, '/repo');
+    expect(issues.some(i => i.field === 'pushed_branch')).toBe(true);
+  });

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/evidence-validator.test.ts` around lines 432 - 454,
Add a new test case in packages/workflows/src/evidence-validator.test.ts that
specifically simulates a rejected git ls-remote call: in the new "it" (e.g.,
'negative: git ls-remote rejects → pushed_branch error') set execFileImpl so it
returns Promise.reject(new Error('ls-remote failed')) when cmd === 'git' &&
args[0] === 'ls-remote' (keep existing handling for 'git cat-file' and 'gh' as
in other tests), then call verifyEvidenceReality(REAL_EVIDENCE, '/repo') and
assert that the returned issues include an entry with field === 'pushed_branch';
reference the verifyEvidenceReality function, execFileImpl mock, and
REAL_EVIDENCE to locate where to add this test.

567-634: ⚡ Quick win

Add loader tests for non-ENOENT filesystem errors (e.g., EISDIR/EACCES).

loadEvidenceFromArtifacts currently checks missing/valid/malformed, but not other read failures. Pinning these keeps the “structured issues, no crash” contract safe.

Suggested deterministic branch test (EISDIR)

+  it('directory at evidence path is handled without throw (EISDIR path)', async () => {
+    await mkdir(join(testDir, 'evidence.json'), { recursive: true });
+    const result = await validateEvidence({
+      artifactsDir: testDir,
+      cwd: '/repo',
+      workflowRunId: 'run-1',
+      policy: { required: true, verify: 'shape', path: 'evidence.json' },
+    });
+    expect(result.valid).toBe(false);
+    if (!result.valid) {
+      expect(result.issues.some(i => i.field === 'evidence_policy.path')).toBe(true);
+    }
+  });

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/workflows/src/evidence-validator.test.ts` around lines 567 - 634,
Add a deterministic test that simulates non-ENOENT fs errors by mocking
fs.promises.readFile (for the evidence.json path) to throw an error with code
'EISDIR' (and another case for 'EACCES'), then call loadEvidenceFromArtifacts
and assert it does not throw and returns a structured result (i.e., a handled
response rather than crashing — e.g., result.found truthy and result.raw is a
string or error-info), and also add a companion validateEvidence call (using
validateEvidence with a policy) to assert validation returns a non-valid result
with an appropriate issue reported; reference the existing test patterns and the
functions loadEvidenceFromArtifacts and validateEvidence and the artifact
filename 'evidence.json' when adding the new tests.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/workflows/src/evidence-validator.ts`:
- Around line 151-207: In verifyEvidenceReality, the three execFileAsync calls
(the git cat-file -e call, the git ls-remote --heads origin call, and the gh pr
view call) must include a timeout option to avoid indefinite hangs; update the
options object passed to execFileAsync for the git cat-file invocation to
include a short timeout (e.g. 30_000 ms) and for the networked git ls-remote and
gh pr view calls to include a longer timeout (e.g. 60_000 ms), preserving the
existing cwd property (i.e., change { cwd } to { cwd, timeout: <ms> }) so Node's
child_process.execFile will kill stalled subprocesses.

---

Duplicate comments:
In `@packages/workflows/src/dag-executor.ts`:
- Around line 3160-3165: The update currently spreads the stale in-memory
snapshot workflowRun.metadata into deps.store.updateWorkflowRun (when setting
evidence_validation), which can overwrite concurrent metadata changes; modify
the write to first fetch the latest metadata from deps.store (e.g.,
deps.store.getWorkflowRun/workflowRunById) and merge evidence_validation into
that fresh object before calling deps.store.updateWorkflowRun, or change to a
partial-merge API such as deps.store.setWorkflowRunMetadata(workflowRun.id, {
evidence_validation: ... }) so you avoid spreading workflowRun.metadata and
prevent lost concurrent writes.
- Around line 3136-3193: The time-of-check/time-of-use race occurs because
skipIfStatusChanged('dag.skip_complete_status_changed') runs before
validateEvidence() but not after it; add a second check immediately after
validateEvidence() returns and before taking either branch (i.e., before
persisting evidence_validation via deps.store.updateWorkflowRun, before
deps.store.failWorkflowRun/logWorkflowError/emitterForFail.emit/unregisterRun/safeSendMessage
on the failure path, and likewise before the success path that marks the run
completed). Concretely, call the existing
skipIfStatusChanged('dag.skip_complete_status_changed') right after the
validateEvidence(...) call and return early if it signals status changed to
prevent overwriting cancelled/deleted runs.

---

Nitpick comments:
In `@packages/workflows/src/dag-executor.test.ts`:
- Around line 7032-7091: Add a regression test that simulates updateWorkflowRun
failing when writing evidence metadata so the happy-path does not silently
complete; in the existing test case around executeDagWorkflow for
evidence_policy (the 'evidence-required-valid' scenario) mock
mockStore.updateWorkflowRun to throw/reject when called for metadata updates,
then call executeDagWorkflow and assert that mockStore.completeWorkflowRun was
NOT called and mockStore.failWorkflowRun WAS called (or that the error
propagated), and also assert the updateWorkflowRun mock was invoked for evidence
metadata. Target the same symbols used in the file: executeDagWorkflow,
mockStore.updateWorkflowRun, mockStore.completeWorkflowRun,
mockStore.failWorkflowRun, and the evidence_policy/path handling so the test
specifically verifies metadata persistence failure prevents completion.

In `@packages/workflows/src/evidence-validator.test.ts`:
- Around line 432-454: Add a new test case in
packages/workflows/src/evidence-validator.test.ts that specifically simulates a
rejected git ls-remote call: in the new "it" (e.g., 'negative: git ls-remote
rejects → pushed_branch error') set execFileImpl so it returns
Promise.reject(new Error('ls-remote failed')) when cmd === 'git' && args[0] ===
'ls-remote' (keep existing handling for 'git cat-file' and 'gh' as in other
tests), then call verifyEvidenceReality(REAL_EVIDENCE, '/repo') and assert that
the returned issues include an entry with field === 'pushed_branch'; reference
the verifyEvidenceReality function, execFileImpl mock, and REAL_EVIDENCE to
locate where to add this test.
- Around line 567-634: Add a deterministic test that simulates non-ENOENT fs
errors by mocking fs.promises.readFile (for the evidence.json path) to throw an
error with code 'EISDIR' (and another case for 'EACCES'), then call
loadEvidenceFromArtifacts and assert it does not throw and returns a structured
result (i.e., a handled response rather than crashing — e.g., result.found
truthy and result.raw is a string or error-info), and also add a companion
validateEvidence call (using validateEvidence with a policy) to assert
validation returns a non-valid result with an appropriate issue reported;
reference the existing test patterns and the functions loadEvidenceFromArtifacts
and validateEvidence and the artifact filename 'evidence.json' when adding the
new tests.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 79600dcf-d763-4d63-a960-886426bc50c8

📥 Commits

Reviewing files that changed from the base of the PR and between 83dcb4b and f6f71a8.

📒 Files selected for processing (4)

packages/workflows/src/dag-executor.test.ts
packages/workflows/src/dag-executor.ts
packages/workflows/src/evidence-validator.test.ts
packages/workflows/src/evidence-validator.ts

coderabbitai · 2026-05-06T21:33:27Z

+    await execFileAsync('git', ['cat-file', '-e', `${evidence.commit_sha}^{commit}`], { cwd });
+  } catch (err) {
+    const e = err as Error & { stderr?: string };
+    issues.push({
+      level: 'error',
+      field: 'commit_sha',
+      message: `git cat-file -e ${evidence.commit_sha}^{commit} failed: ${
+        e.stderr?.trim() || e.message
+      }`,
+      hint: 'commit_sha is not reachable in this checkout — fake SHA or wrong cwd',
+    });
+  }
+
+  // 2. pushed_branch present on origin AND its tip SHA matches commit_sha
+  try {
+    const { stdout } = await execFileAsync(
+      'git',
+      ['ls-remote', '--heads', 'origin', evidence.pushed_branch],
+      { cwd }
+    );
+    const trimmed = stdout.trim();
+    if (trimmed.length === 0) {
+      issues.push({
+        level: 'error',
+        field: 'pushed_branch',
+        message: `git ls-remote --heads origin ${evidence.pushed_branch} returned empty; branch is not on origin`,
+        hint: 'Did the workflow run `git push -u origin <branch>` after committing?',
+      });
+    } else {
+      // Bind commit_sha to the branch tip on origin. Without this check, any
+      // locally-reachable SHA can be paired with any real branch and pass.
+      const remoteSha = trimmed.split(/\s+/)[0];
+      if (remoteSha !== evidence.commit_sha) {
+        issues.push({
+          level: 'error',
+          field: 'commit_sha',
+          message: `commit_sha=${evidence.commit_sha} does not match origin/${evidence.pushed_branch} tip=${remoteSha}`,
+          hint: 'evidence.json claims a commit that is not the tip of pushed_branch on origin',
+        });
+      }
+    }
+  } catch (err) {
+    const e = err as Error & { stderr?: string };
+    issues.push({
+      level: 'error',
+      field: 'pushed_branch',
+      message: `git ls-remote failed: ${e.stderr?.trim() || e.message}`,
+    });
+  }
+
+  // 3. gh pr view confirms PR url maps to pushed_branch, pr_number, and commit_sha
+  try {
+    const { stdout } = await execFileAsync(
+      'gh',
+      ['pr', 'view', evidence.pr_url, '--json', 'headRefName,number,headRefOid'],
+      { cwd }
+    );


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Description: Check if `@archon/git`'s execFileAsync sets a default timeout. # Expected: if a default timeout exists, the concern is mitigated. ast-grep --pattern 'function execFileAsync($$$) { $$$ }'

Repository: coleam00/Archon

Length of output: 41

🏁 Script executed:

#!/bin/bash # Find the `@archon/git` package and locate execFileAsync definition fd -t f -e ts -e tsx 'git' | grep -E '(git|src)' | head -20

Repository: coleam00/Archon

Length of output: 177

🏁 Script executed:

#!/bin/bash # Search for execFileAsync definition more broadly rg 'export.*execFileAsync|function execFileAsync' -t ts -t tsx -A 5

Repository: coleam00/Archon

Length of output: 87

🏁 Script executed:

#!/bin/bash # Look for `@archon/git` package structure find . -type d -name git -o -name '@archon' | head -10

Repository: coleam00/Archon

Length of output: 74

🏁 Script executed:

#!/bin/bash # Search for execFileAsync in `@archon/git` package rg 'execFileAsync' packages/git/src/ -t ts -A 10 -B 2

Repository: coleam00/Archon

Length of output: 40547

🏁 Script executed:

#!/bin/bash # List files in `@archon/git` package ls -la packages/git/src/

Repository: coleam00/Archon

Length of output: 592

🏁 Script executed:

#!/bin/bash # Read the full exec.ts file to see if there are any default timeouts cat packages/git/src/exec.ts

Repository: coleam00/Archon

Length of output: 871

🏁 Script executed:

#!/bin/bash # Verify the specific lines in evidence-validator.ts to confirm they lack timeout sed -n '151p;166,170p;203,207p' packages/workflows/src/evidence-validator.ts

Repository: coleam00/Archon

Length of output: 458

Add timeout to verifyEvidenceReality subprocess calls to prevent indefinite hangs.

All three execFileAsync calls lack a timeout option:

Line Call Risk

151 git cat-file -e Local; credential helper/lock contention

166–170 git ls-remote origin Network; unreachable → ~2 min TCP timeout

203–207 gh pr view Network; GitHub API flaky → ~2 min TCP timeout

Without timeout, the default is 0 (infinite). The execFileAsync wrapper in @archon/git does not apply a built-in default—it passes options directly to Node.js's child_process.execFile. Network stalls on git ls-remote or gh pr view will hang indefinitely, permanently blocking the DAG executor's completion path (line 3144 of dag-executor.ts), preventing both failWorkflowRun and completeWorkflowRun.

Other subprocess calls in this codebase (executeBashNode, executeScriptNode) and throughout the @archon/git module all explicitly set timeout (10–120 seconds depending on operation). Apply the same pattern here.

Proposed fix

+/** Max wall-clock time for each individual reality-check subprocess. */ +const REALITY_CHECK_TIMEOUT_MS = 30_000; + // 1. commit_sha reachable in worktree try { await execFileAsync('git', ['cat-file', '-e', `${evidence.commit_sha}^{commit}`], { - cwd, + cwd, + timeout: REALITY_CHECK_TIMEOUT_MS, }); } catch (err) { ... } // 2. pushed_branch present on origin AND its tip SHA matches commit_sha try { const { stdout } = await execFileAsync( 'git', ['ls-remote', '--heads', 'origin', evidence.pushed_branch], - { cwd } + { cwd, timeout: REALITY_CHECK_TIMEOUT_MS } ); ... } // 3. gh pr view ... try { const { stdout } = await execFileAsync( 'gh', ['pr', 'view', evidence.pr_url, '--json', 'headRefName,number,headRefOid'], - { cwd } + { cwd, timeout: REALITY_CHECK_TIMEOUT_MS } ); ... }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/evidence-validator.ts` around lines 151 - 207, In verifyEvidenceReality, the three execFileAsync calls (the git cat-file -e call, the git ls-remote --heads origin call, and the gh pr view call) must include a timeout option to avoid indefinite hangs; update the options object passed to execFileAsync for the git cat-file invocation to include a short timeout (e.g. 30_000 ms) and for the networked git ls-remote and gh pr view calls to include a longer timeout (e.g. 60_000 ms), preserving the existing cwd property (i.e., change { cwd } to { cwd, timeout: <ms> }) so Node's child_process.execFile will kill stalled subprocesses.

Wirasm · 2026-05-13T14:14:07Z

Review Summary

Verdict: minor-fixes-needed

This PR implements a real-execution proof validation system for PR-producing workflows (ROADMAP P3-A). The three-layer validator, evidence schemas, and DAG executor integration are solid — 52 new tests, clean TypeScript, CLAUDE.md-compliant throughout. Two categories of work remain: documentation for the new public surface, and comment hygiene (ROADMAP references and redundant docstrings).

Blocking issues

Missing docs for new public surface (packages/docs-web/src/content/docs/guides/authoring-workflows.md): The evidence_policy workflow field, evidence.json file contract, and protected-branch rejection behavior are all new user-facing features with no documentation.
- Add evidence_policy to the DAG schema section: all three sub-fields (required, verify, path) with examples.
- Document the evidence.json discriminated union, required execution fields, GitHub-only pr_url constraint, 40-char hex commit_sha format, and non-empty array requirements.
- Note that pushed_branch cannot be main, master, dev, or develop.

Suggested fixes

packages/workflows/src/dag-executor.ts (~lines 3161-3204): The failure path swallows DB errors (updateWorkflowRun(...).catch(...)) while the success path propagates them. If the DB write fails on the failure path (e.g., connection lost), the run may be left in a non-terminal state despite workflow_failed being emitted. Either propagate the error or do not emit workflow_failed unless the DB state is confirmed.
```
try {
  await deps.store.updateWorkflowRun(workflowRun.id, { metadata: { ... } });
} catch (dbErr) {
  getLog().error({ err: dbErr, workflowRunId: workflowRun.id }, 'workflow.evidence_metadata_persist_failed');
  throw dbErr; // let outer executor catch handle
}
```
Comment rot — remove "ROADMAP P3-A" from module docstrings in evidence-validator.ts and evidence.ts. These references will become meaningless over time. Per CLAUDE.md comment policy: comments only where WHY is non-obvious. Schema field names are self-documenting — multi-paragraph bullets describing commit_sha and provider_run_ids restate what the code already expresses.
CHANGELOG entry missing under [Unreleased] → ### Added: Add a line for evidence_policy describing the field and linking to the authoring-workflows guide.

Minor / nice-to-have

evidence-validator.ts:~3177-3193: The three sequential await DB writes in the failure handler are fire-and-forget with silent .catch(). Add a comment: // DB writes and log best-effort; run state (failed) is already set above. to make the semantics explicit.
evidence.ts:148-151: Remove the instructional aside from the level field comment — comments should describe current behavior, not future plans.
Test for DB rejection on the success marker persist: spy on mockStore.updateWorkflowRun to simulate a rejection and assert the run still calls completeWorkflowRun. Confirms "log and continue" is intentional.
Test file cleanup (afterEach): add // test cleanup: non-fatal comment to the catch {} blocks in evidence-validator.test.ts and dag-executor.test.ts.

Compliments

The verifyEvidenceReality function documents a subtle security invariant explicitly: "Bind commit_sha to the branch tip on origin. Without this check, any locally-reachable SHA can be paired with any real branch and pass." This kind of WHY-captured reasoning is exactly what CLAUDE.md asks for — keep this pattern.
The PROTECTED_BRANCHES defensive guard with the "feature-branch heuristic" comment explains the intent clearly. The schema hint surfaces the constraint at runtime. Well done.
Test isolation: evidence-validator.test.ts correctly runs as its own bun test invocation (registered in package.json) to avoid mock pollution — exactly as CLAUDE.md requires for mock.module() usage.

Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage, comment-quality, docs-impact.

Wirasm · 2026-05-25T19:06:33Z

Hi @shhaider — thanks for the substantial work here. We want to build in this direction (verifiable PR-evidence is a real maturity gap), but at 2137 LOC the proposed surface is more than we want to commit to before we've validated the pattern with users.

Could you slice this into an MVP and a follow-up?

MVP scope (target ~200 LOC):

The executor refuses to mark a run completed when evidence_policy.required: true is declared on the workflow AND $ARTIFACTS_DIR/evidence.json is missing
Run is downgraded to failed with a structured error in metadata.evidence_validation
No schema validation, no reality I/O (git ls-remote/gh pr view), no evidencePolicySchema/evidence-validator.ts for now

Out of MVP, future PR after we see the gate get used:

The typed executionEvidenceSchema + three-layer validator
The opt-in verify: 'reality' mode
The structured metadata.evidence_validation.issues array

This way we land the load-bearing executor gate first, see how workflow authors adopt it, and then expand the contract surface based on what they actually need rather than what we guess they'll need. If the slim version gets used and people start asking "can the contract enforce ?", we add <X> in a follow-up.

Happy to chat through the slice if any of the cut surface feels load-bearing to you.

Wirasm · 2026-05-27T09:47:21Z

@$shhaider related to #1433 — overlapping area or partial fix.

Wirasm · 2026-05-27T09:47:26Z

@$shhaider related to #1720 — overlapping area or partial fix.

Wirasm · 2026-05-27T09:54:21Z

@$shhaider related to #1433 — overlapping area or partial fix.

Wirasm · 2026-05-27T09:54:26Z

@$shhaider related to #1720 — overlapping area or partial fix.

shhaider added 2 commits May 6, 2026 22:53

docs: add standalone Archon upgrade roadmap

661e72c

coderabbitai Bot reviewed May 6, 2026

View reviewed changes

fix(workflows): bind execution evidence to active run

f6f71a8

coderabbitai Bot reviewed May 6, 2026

View reviewed changes

Wirasm mentioned this pull request May 13, 2026

fix(providers,workflows): treat Claude SDK stop_sequence success as success (#1425) #1662

Merged

Wirasm mentioned this pull request May 27, 2026

feat(cli): per-run --provider/--model override for archon workflow run (and HTTP API parity) #1433

Open

Wirasm mentioned this pull request May 27, 2026

lms-commit-style claude.exe subagent silent no-op on fresh worktree — commit-node reports success in <15s, no commit lands, generator's work uncommitted in worktree #1720

Open

Line	Call	Risk
151	`git cat-file -e`	Local; credential helper/lock contention
166–170	`git ls-remote origin`	Network; unreachable → ~2 min TCP timeout
203–207	`gh pr view`	Network; GitHub API flaky → ~2 min TCP timeout

Conversation

shhaider commented May 6, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

UX Journey

Before

After

Architecture Diagram

Before

After

Label Snapshot

Change Metadata

Linked Issue

Validation Evidence (required)

Security Impact (required)

Compatibility / Migration

Human Verification (required)

Side Effects / Blast Radius (required)

Rollback Plan (required)

Risks and Mitigations

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated Code Review Effort

Possibly Related PRs

Suggested Reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

shhaider commented May 6, 2026

🔍 Comprehensive PR Review

Summary

🟠 High Issues

HIGH-1 — Reality layer doesn't bind commit_sha to PR head; fabricated evidence can pass

HIGH-2 — CLAUDE.md "Workflows" bullet list omits evidence_policy

HIGH-3 — authoring-workflows.md missing evidence_policy reference + evidence.json contract

🟡 Medium Issues (Auto-fixing in next step)

MED-1 — Log-event names violate {domain}.{action}_{state} convention

MED-2 — Stale workflowRun.metadata snapshot will clobber future concurrent writers

MED-3 — gh pr view non-JSON output drops parser detail

MED-4 — Success-path metadata-persist failure is silently swallowed

MED-5 — loadEvidenceFromArtifacts non-ENOENT path is untested

MED-6 — git ls-remote rejection branch is untested

🟢 Low Issues

✅ What's Good

📋 Suggested Follow-up Issues

Next Steps

Uh oh!

shhaider commented May 6, 2026

⚡ Auto-Fix Report

Fixes Applied

What Was Fixed

Tests Added

❌ Not Fixed (Deferred per Review Recommendation)

📋 Suggested Follow-up Issues

Validation

Uh oh!

shhaider commented May 6, 2026

🎯 Workflow Summary

Implementation vs Plan

Review Summary

📋 Suggested Follow-Up Issues

shhaider commented May 6, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 6, 2026 •

edited

Loading

HIGH-1 — Reality layer doesn't bind `commit_sha` to PR head; fabricated evidence can pass

HIGH-2 — CLAUDE.md "Workflows" bullet list omits `evidence_policy`

HIGH-3 — `authoring-workflows.md` missing `evidence_policy` reference + `evidence.json` contract

MED-1 — Log-event names violate `{domain}.{action}_{state}` convention

MED-2 — Stale `workflowRun.metadata` snapshot will clobber future concurrent writers

MED-3 — `gh pr view` non-JSON output drops parser detail

MED-5 — `loadEvidenceFromArtifacts` non-ENOENT path is untested

MED-6 — `git ls-remote` rejection branch is untested