feat(workflows): add execution-evidence contract with terminal-success gate#1601
feat(workflows): add execution-evidence contract with terminal-success gate#1601shhaider wants to merge 4 commits into
Conversation
…s gate
Add typed evidence schema (executionEvidenceSchema, evidencePolicySchema)
and a reusable validator (evidence-validator.ts) inside @archon/workflows
so PR-producing workflows can opt in to real-execution proof. The DAG
executor deterministically blocks terminal completed status when the
workflow declares evidence_policy.required: true and evidence is missing,
malformed, or fake.
- Three-layer validator: shape (Zod), integrity (cross-field, protected
branches, zero-SHA), and reality (git/gh I/O, opt-in via verify: 'reality')
- Single-point gate at dag-executor terminal-success boundary; on failure
downgrades run to failed and persists structured issues at
metadata.evidence_validation
- 52 new tests (43 validator across 5 groups + 6 schema smoke tests +
3 dag-executor integration tests including no-policy regression guard)
- Covers full Edge Cases checklist: every fake-completion shape (planning
artifacts, short/uppercase/non-hex/zero-tree SHAs, protected branches,
unpushed branches, http/non-GitHub PR URLs, gh mismatches, path traversal,
ENOENT on gh)
- New evidence-validator.test.ts isolated as its own bun test invocation
per CLAUDE.md mock-pollution rule (uses mock.module('@archon/git', ...))
- evidence_policy is workflow-level optional; omitting it preserves
byte-identical behavior (regression-tested)
No bundled YAML changes, no DB migration, no CLI/web/provider surface
changes — engine surface only. Follow-up PR will turn the flag on for
bundled PR-producing workflows once stable.
Implements ROADMAP P3-A.
📝 WalkthroughWalkthroughAdds an evidence-validation pipeline and an optional workflow-level evidence_policy that can block DAG completion: new Zod schemas, a three-layer validator (shape, integrity, reality), evidence loader/orchestrator, DAG executor gating, exports, tests, docs, and a roadmap entry. No breaking public API removals. ChangesReal-Execution Evidence Validation
Sequence DiagramsequenceDiagram
participant WF as Workflow<br/>(DAG Executor)
participant EV as Evidence Validator
participant Load as Evidence Loader
participant Shape as SHAPE Layer
participant Integrity as INTEGRITY Layer
participant Reality as REALITY Layer
participant Meta as Metadata &<br/>Completion Gate
WF->>EV: validateEvidence(artifactsDir, cwd, policy)
EV->>Load: loadEvidenceFromArtifacts(path)
Load-->>EV: evidence.json or load error
EV->>Shape: validateEvidenceShape(raw)
Shape-->>EV: validated evidence or shape issues
alt Shape validation fails
EV-->>Meta: persist issues -> metadata.evidence_validation
Meta-->>WF: failWorkflowRun
else Shape validation passes
EV->>Integrity: validateEvidenceIntegrity(evidence)
Integrity-->>EV: [] or integrity issues
alt Integrity check fails
EV-->>Meta: persist issues -> metadata.evidence_validation
Meta-->>WF: failWorkflowRun
else Integrity passes & policy.verify enabled
EV->>Reality: verifyEvidenceReality(evidence, cwd)
Reality-->>EV: [] or reality issues
alt Reality check fails
EV-->>Meta: persist issues -> metadata.evidence_validation
Meta-->>WF: failWorkflowRun
else All validations pass
EV-->>Meta: persist { ok: true } -> metadata.evidence_validation
Meta-->>WF: completeWorkflowRun
end
else Integrity passes & no verify
EV-->>Meta: persist { ok: true } -> metadata.evidence_validation
Meta-->>WF: completeWorkflowRun
end
end
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly Related PRs
Suggested Reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (4)
packages/workflows/src/evidence-validator.ts (1)
304-307: ⚡ Quick winRename the new validation log events to the repo’s
domain.action_stateformat.
evidence_validation_started|failed|completeddoes not match the required event naming convention, so these entries will be inconsistent with the rest of the workflow logs.Proposed fix
- 'evidence_validation_started' + 'workflow.evidence_validation_started' @@ - getLog().error({ issues }, 'evidence_validation_failed'); + getLog().error({ issues }, 'workflow.evidence_validation_failed'); @@ - getLog().error({ err, issues }, 'evidence_validation_failed'); + getLog().error({ err, issues }, 'workflow.evidence_validation_failed'); @@ - getLog().info({ kind: shapeResult.evidence.kind }, 'evidence_validation_completed'); + getLog().info({ kind: shapeResult.evidence.kind }, 'workflow.evidence_validation_completed');As per coding guidelines, "Use Pino logger factory with event naming format
{domain}.{action}_{state}(e.g.,workflow.step_started,isolation.create_failed); always pair_startedwith_completedor_failed; include context IDs and error details".Also applies to: 319-330, 347-386
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/evidence-validator.ts` around lines 304 - 307, The log events emitted from getLog() in evidence-validator.ts (e.g., the call that currently logs 'evidence_validation_started') must be renamed to follow the repo convention {domain}.{action}_{state} and include relevant context IDs and error details; update all occurrences around the getLog().info call and the related blocks referenced (the started/failed/completed groups around lines ~304, ~319-330, ~347-386) so names become something like evidence.validation_started -> evidence.validation_started? (use domain "evidence" and action "validation") — actually rename to evidence.validation_started, evidence.validation_completed and evidence.validation_failed (pairing each started with completed/failed), add context fields (artifactsDir, policy.path, policy.verify, and a policy or request id) to the log payload, and when logging failures include the caught error object in the process (or getLog().error) payload so the error details are recorded; adjust the corresponding getLog().info/getLog().error calls and ensure consistent naming across all three blocks.packages/workflows/src/dag-executor.ts (1)
2493-2497: ⚡ Quick winInline workflow shape duplicates
WorkflowBase; prefer a named interface or aPick<>from the schema type.The parameter type is now
{ name: string; nodes: readonly DagNode[]; evidence_policy?: EvidencePolicy } & WorkflowLevelOptions. As more optional fields fromworkflowBaseSchemaget plumbed down (this PR addsevidence_policy; the next field will follow), this anonymous shape will keep drifting from the actual schema-derived type. Defining aninterface DagExecutorWorkflow(or usingPick<WorkflowBase, 'name' | 'nodes' | 'evidence_policy' | …> & WorkflowLevelOptions) would keep the contract in one place and align with the project's preference forinterfaceover inlinetype.As per coding guidelines: "In TypeScript, prefer
interfacefor defining object shapes rather thantype".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/dag-executor.ts` around lines 2493 - 2497, The anonymous workflow parameter type should be replaced with a named interface to avoid drifting from the schema; define an interface (e.g. DagExecutorWorkflow) that derives from the canonical schema type (either via "interface DagExecutorWorkflow extends Pick<WorkflowBase, 'name' | 'nodes' | 'evidence_policy' | ...> & WorkflowLevelOptions" or by explicitly declaring the interface and keeping it synced with workflowBaseSchema), then use that interface in the function signature instead of the inline `{ name: string; nodes: readonly DagNode[]; evidence_policy?: EvidencePolicy } & WorkflowLevelOptions`; update references to the parameter type to use DagExecutorWorkflow so future schema fields stay centralized.packages/workflows/src/dag-executor.test.ts (2)
7032-7055: ⚡ Quick winUse the real workflow run ID in the “valid evidence” fixture
Line 7037 hardcodes
workflow_run_id: 'run-1'while the run under test isdag-evidence-run-2. Using the actual run ID makes this fixture semantically accurate and better aligned with future integrity checks.🔧 Suggested fixture tweak
- it('evidence_policy.required + valid evidence.json -> completeWorkflowRun', async () => { + it('evidence_policy.required + valid evidence.json -> completeWorkflowRun', async () => { + const workflowRun = makeWorkflowRun('dag-evidence-run-2'); await writeFile( join(artifactsDir, 'evidence.json'), JSON.stringify({ kind: 'execution', - workflow_run_id: 'run-1', + workflow_run_id: workflowRun.id, provider: 'claude', provider_run_ids: ['session-abc'], changed_files: ['src/foo.ts'], diff_command: 'git diff --stat origin/dev...HEAD', test_commands: ['bun test'], test_output_summary: '15 passed', commit_sha: 'a'.repeat(40), pushed_branch: 'feature/foo', pr_url: 'https://github.com/owner/repo/pull/42', pr_number: 42, }) ); const mockStore = createMockStore(); const mockDeps = createMockDeps(mockStore); const platform = createMockPlatform(); - const workflowRun = makeWorkflowRun('dag-evidence-run-2');🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/dag-executor.test.ts` around lines 7032 - 7055, The evidence.json test fixture uses a hardcoded workflow_run_id ('run-1') which doesn't match the run created for the test; update the fixture to use the actual run ID from the test run (use the workflowRun created by makeWorkflowRun('dag-evidence-run-2') — e.g., set workflow_run_id to workflowRun.id or the literal 'dag-evidence-run-2') so the evidence.json aligns with the run under test and any integrity checks that compare IDs will pass.
7016-7029: ⚡ Quick winAssert
evidence_validation.statusto lock the metadata contractThese tests currently validate
ok/issuesbut not thestatusfield, so contract drift inmetadata.evidence_validation.statuscan slip through undetected.🧪 Suggested assertion additions
expect(updates.metadata.evidence_validation.ok).toBe(false); + expect(updates.metadata.evidence_validation.status).toBe('failed'); expect(Array.isArray(updates.metadata.evidence_validation.issues)).toBe(true);expect(updates.metadata.evidence_validation.ok).toBe(true); + expect(updates.metadata.evidence_validation.status).toBe('ok');Also applies to: 7079-7091
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/dag-executor.test.ts` around lines 7016 - 7029, The test currently checks evidence_validation.ok and issues but omits asserting evidence_validation.status; update the assertions around the mockStore.updateWorkflowRun inspection (the updateCalls/evidenceUpdate/updates metadata block) to also assert updates.metadata.evidence_validation.status is the expected status string (e.g., "failed") and that it's of type string, and apply the same added assertion in the other analogous assertion block (the similar check around the second occurrence referenced by lines 7079-7091) so the metadata contract for evidence_validation.status is locked down.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/workflows/src/dag-executor.ts`:
- Around line 3138-3208: The evidence validation block calls validateEvidence
and then performs terminal writes (deps.store.updateWorkflowRun,
deps.store.failWorkflowRun, etc.) without re-checking run status; call the
existing skipIfStatusChanged guard again immediately after validateEvidence
returns (both when evidenceResult.valid is false and true) to abort if the run
status changed during the long reality checks. Specifically, invoke
skipIfStatusChanged('dag.skip_complete_status_changed') (or the same helper used
earlier) right after the await validateEvidence(...) line and again right after
entering each branch before any call to updateWorkflowRun, failWorkflowRun,
completeWorkflowRun, logWorkflowError, emitter.emit/unregisterRun, or
safeSendMessage to ensure you don't flip a cancelled/deleted run to a terminal
state.
- Around line 3159-3171: The current updateWorkflowRun call uses the stale
in-memory workflowRun.metadata snapshot and may clobber concurrent metadata
writes; before persisting evidence_validation, either fetch the latest metadata
from deps.store (e.g., readWorkflowRun or readWorkflowRunMetadata) and merge
evidence_validation into that fresh object, or add/use a partial-merge API on
deps.store (e.g., setWorkflowRunMetadata(workflowRun.id, { evidence_validation:
... })) so you only write the single field; update both occurrences (the block
around updateWorkflowRun at lines referencing updateWorkflowRun and the similar
block later) to use one of these approaches to avoid overwriting concurrent
metadata changes.
- Around line 3150-3153: Rename the bare logger event names to include the
domain prefix and add matching success/completed partners: change occurrences of
'evidence_validation_failed' and 'evidence_validation_metadata_persist_failed'
(the getLog().error calls that include { workflowRunId: workflowRun.id, issues:
evidenceResult.issues } and the metadata persist error call) to
'dag.evidence_validation_failed' and
'dag.evidence_validation_metadata_persist_failed', and add corresponding
success/info logs 'dag.evidence_validation_succeeded' (in the success path near
the evidence validation completion) and
'dag.evidence_validation_metadata_persist_succeeded' in the metadata persist
success path; apply the same renaming for the other occurrences noted (the
blocks around the other getLog() calls), keeping getLog().error for failures and
getLog().info for succeeded/completed events.
In `@packages/workflows/src/evidence-validator.ts`:
- Around line 149-151: The validator currently only checks that
evidence.commit_sha exists locally; update the validation in the evidence
verification flow (around the execFileAsync git checks) to bind commit_sha to
the remote branch/PR head for pushed_branch: fetch origin refs for the repo (use
execFileAsync('git', ['fetch', 'origin', pushed_branch]) or ensure
origin/pushed_branch is available), resolve the remote head ref for the claimed
pushed_branch/PR (e.g. origin/{pushed_branch} or the PR head ref), then verify
that evidence.commit_sha is either exactly that remote head or an ancestor of it
(use git rev-parse to get the remote head and git merge-base --is-ancestor or
git cat-file -e with remote_head^{commit} to test ancestry); update the checks
that currently call execFileAsync('git', ['cat-file', ...], { cwd }) (lines
around evidence.commit_sha and the block covering 166-218) to perform this
remote-bound validation and fail validation if the commit is not the remote head
or ancestor.
In `@packages/workflows/src/schemas/evidence.ts`:
- Around line 90-111: The evidencePolicySchema's path currently allows
empty/whitespace or directory-like values; update the path validator on
evidencePolicySchema to reject empty or all-whitespace strings and to reject '.'
, '..', any segment equal to '..', and absolute paths (leading '/' or Windows
drive/UNC forms) by adding a z.string().refine(...) with a clear error message;
reference the existing evidencePolicySchema and its path property and implement
the predicate to trim() and check length>0, disallow '.' or '..', disallow path
separators that would indicate a directory only or upward traversal, and
disallow absolute paths so invalid workflow definitions fail at parse time.
---
Nitpick comments:
In `@packages/workflows/src/dag-executor.test.ts`:
- Around line 7032-7055: The evidence.json test fixture uses a hardcoded
workflow_run_id ('run-1') which doesn't match the run created for the test;
update the fixture to use the actual run ID from the test run (use the
workflowRun created by makeWorkflowRun('dag-evidence-run-2') — e.g., set
workflow_run_id to workflowRun.id or the literal 'dag-evidence-run-2') so the
evidence.json aligns with the run under test and any integrity checks that
compare IDs will pass.
- Around line 7016-7029: The test currently checks evidence_validation.ok and
issues but omits asserting evidence_validation.status; update the assertions
around the mockStore.updateWorkflowRun inspection (the
updateCalls/evidenceUpdate/updates metadata block) to also assert
updates.metadata.evidence_validation.status is the expected status string (e.g.,
"failed") and that it's of type string, and apply the same added assertion in
the other analogous assertion block (the similar check around the second
occurrence referenced by lines 7079-7091) so the metadata contract for
evidence_validation.status is locked down.
In `@packages/workflows/src/dag-executor.ts`:
- Around line 2493-2497: The anonymous workflow parameter type should be
replaced with a named interface to avoid drifting from the schema; define an
interface (e.g. DagExecutorWorkflow) that derives from the canonical schema type
(either via "interface DagExecutorWorkflow extends Pick<WorkflowBase, 'name' |
'nodes' | 'evidence_policy' | ...> & WorkflowLevelOptions" or by explicitly
declaring the interface and keeping it synced with workflowBaseSchema), then use
that interface in the function signature instead of the inline `{ name: string;
nodes: readonly DagNode[]; evidence_policy?: EvidencePolicy } &
WorkflowLevelOptions`; update references to the parameter type to use
DagExecutorWorkflow so future schema fields stay centralized.
In `@packages/workflows/src/evidence-validator.ts`:
- Around line 304-307: The log events emitted from getLog() in
evidence-validator.ts (e.g., the call that currently logs
'evidence_validation_started') must be renamed to follow the repo convention
{domain}.{action}_{state} and include relevant context IDs and error details;
update all occurrences around the getLog().info call and the related blocks
referenced (the started/failed/completed groups around lines ~304, ~319-330,
~347-386) so names become something like evidence.validation_started ->
evidence.validation_started? (use domain "evidence" and action "validation") —
actually rename to evidence.validation_started, evidence.validation_completed
and evidence.validation_failed (pairing each started with completed/failed), add
context fields (artifactsDir, policy.path, policy.verify, and a policy or
request id) to the log payload, and when logging failures include the caught
error object in the process (or getLog().error) payload so the error details are
recorded; adjust the corresponding getLog().info/getLog().error calls and ensure
consistent naming across all three blocks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e22a2521-b78d-4816-b1ae-337feb88168c
📒 Files selected for processing (10)
ROADMAP.mdpackages/workflows/package.jsonpackages/workflows/src/dag-executor.test.tspackages/workflows/src/dag-executor.tspackages/workflows/src/evidence-validator.test.tspackages/workflows/src/evidence-validator.tspackages/workflows/src/schemas.test.tspackages/workflows/src/schemas/evidence.tspackages/workflows/src/schemas/index.tspackages/workflows/src/schemas/workflow.ts
| // Real-execution proof gate (ROADMAP P3-A). When the workflow declares | ||
| // `evidence_policy.required: true`, refuse terminal `completed` unless | ||
| // $ARTIFACTS_DIR/<path> parses as ExecutionEvidence and (when verify === | ||
| // 'reality') the claimed SHA/branch/PR are reachable. On failure, downgrade | ||
| // to `failed` with structured issues persisted at metadata.evidence_validation. | ||
| if (workflow.evidence_policy?.required === true) { | ||
| const evidenceResult = await validateEvidence({ | ||
| artifactsDir, | ||
| cwd, | ||
| policy: workflow.evidence_policy, | ||
| }); | ||
| if (!evidenceResult.valid) { | ||
| getLog().error( | ||
| { workflowRunId: workflowRun.id, issues: evidenceResult.issues }, | ||
| 'evidence_validation_failed' | ||
| ); | ||
| const issueLines = evidenceResult.issues | ||
| .map(i => ` - ${i.field} — ${i.message}${i.hint ? ` (hint: ${i.hint})` : ''}`) | ||
| .join('\n'); | ||
| const failMsg = `Real-execution evidence validation failed.\n${issueLines}`; | ||
| // Persist the structured issues at metadata.evidence_validation, then mark failed. | ||
| await deps.store | ||
| .updateWorkflowRun(workflowRun.id, { | ||
| metadata: { | ||
| ...workflowRun.metadata, | ||
| evidence_validation: { ok: false, issues: evidenceResult.issues }, | ||
| }, | ||
| }) | ||
| .catch((dbErr: Error) => { | ||
| getLog().error( | ||
| { err: dbErr, workflowRunId: workflowRun.id }, | ||
| 'evidence_validation_metadata_persist_failed' | ||
| ); | ||
| }); | ||
| await deps.store.failWorkflowRun(workflowRun.id, failMsg).catch((dbErr: Error) => { | ||
| getLog().error({ err: dbErr, workflowRunId: workflowRun.id }, 'dag_db_fail_failed'); | ||
| }); | ||
| await logWorkflowError(logDir, workflowRun.id, failMsg).catch((logErr: Error) => { | ||
| getLog().error( | ||
| { err: logErr, workflowRunId: workflowRun.id }, | ||
| 'dag.workflow_error_log_write_failed' | ||
| ); | ||
| }); | ||
| const emitterForFail = getWorkflowEventEmitter(); | ||
| emitterForFail.emit({ | ||
| type: 'workflow_failed', | ||
| runId: workflowRun.id, | ||
| workflowName: workflow.name, | ||
| error: failMsg, | ||
| }); | ||
| emitterForFail.unregisterRun(workflowRun.id); | ||
| await safeSendMessage(platform, conversationId, `❌ ${failMsg}`, { | ||
| workflowId: workflowRun.id, | ||
| }); | ||
| return; | ||
| } | ||
| // Evidence valid — persist the success marker alongside completion. | ||
| await deps.store | ||
| .updateWorkflowRun(workflowRun.id, { | ||
| metadata: { | ||
| ...workflowRun.metadata, | ||
| evidence_validation: { ok: true }, | ||
| }, | ||
| }) | ||
| .catch((dbErr: Error) => { | ||
| getLog().error( | ||
| { err: dbErr, workflowRunId: workflowRun.id }, | ||
| 'evidence_validation_metadata_persist_failed' | ||
| ); | ||
| }); | ||
| } |
There was a problem hiding this comment.
Re-check run status after validateEvidence before terminal writes.
validateEvidence can issue git/gh subprocesses (when verify === 'reality') and may take seconds. The status gate at line 3136 (skipIfStatusChanged('dag.skip_complete_status_changed')) runs before this work, but neither the failure branch (3149-3193) nor the success branch re-checks status before calling failWorkflowRun / updateWorkflowRun / completeWorkflowRun. If the run is cancelled or deleted while reality checks are in flight, we will still flip the row to failed or completed, which contradicts the existing pattern of guarding terminal writes with skipIfStatusChanged.
Consider re-running skipIfStatusChanged immediately after validateEvidence returns, on both branches.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/workflows/src/dag-executor.ts` around lines 3138 - 3208, The
evidence validation block calls validateEvidence and then performs terminal
writes (deps.store.updateWorkflowRun, deps.store.failWorkflowRun, etc.) without
re-checking run status; call the existing skipIfStatusChanged guard again
immediately after validateEvidence returns (both when evidenceResult.valid is
false and true) to abort if the run status changed during the long reality
checks. Specifically, invoke
skipIfStatusChanged('dag.skip_complete_status_changed') (or the same helper used
earlier) right after the await validateEvidence(...) line and again right after
entering each branch before any call to updateWorkflowRun, failWorkflowRun,
completeWorkflowRun, logWorkflowError, emitter.emit/unregisterRun, or
safeSendMessage to ensure you don't flip a cancelled/deleted run to a terminal
state.
| getLog().error( | ||
| { workflowRunId: workflowRun.id, issues: evidenceResult.issues }, | ||
| 'evidence_validation_failed' | ||
| ); |
There was a problem hiding this comment.
Logger event names should follow the {domain}.{action}_{state} convention.
evidence_validation_failed and evidence_validation_metadata_persist_failed are missing a domain prefix and are inconsistent with sibling events in this file like dag.stop_detected_during_streaming, dag.node_budget_cap_exceeded, and workflow.event_persist_failed. Suggest renaming to e.g. dag.evidence_validation_failed and dag.evidence_validation_metadata_persist_failed (and pairing the success path with a matching dag.evidence_validation_succeeded info log on line ~3194 so _failed has its _completed/_succeeded partner).
As per coding guidelines: "Use Pino logger factory with event naming format {domain}.{action}_{state} (e.g., workflow.step_started, isolation.create_failed); always pair _started with _completed or _failed".
Also applies to: 3166-3170, 3202-3206
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/workflows/src/dag-executor.ts` around lines 3150 - 3153, Rename the
bare logger event names to include the domain prefix and add matching
success/completed partners: change occurrences of 'evidence_validation_failed'
and 'evidence_validation_metadata_persist_failed' (the getLog().error calls that
include { workflowRunId: workflowRun.id, issues: evidenceResult.issues } and the
metadata persist error call) to 'dag.evidence_validation_failed' and
'dag.evidence_validation_metadata_persist_failed', and add corresponding
success/info logs 'dag.evidence_validation_succeeded' (in the success path near
the evidence validation completion) and
'dag.evidence_validation_metadata_persist_succeeded' in the metadata persist
success path; apply the same renaming for the other occurrences noted (the
blocks around the other getLog() calls), keeping getLog().error for failures and
getLog().info for succeeded/completed events.
| await deps.store | ||
| .updateWorkflowRun(workflowRun.id, { | ||
| metadata: { | ||
| ...workflowRun.metadata, | ||
| evidence_validation: { ok: false, issues: evidenceResult.issues }, | ||
| }, | ||
| }) | ||
| .catch((dbErr: Error) => { | ||
| getLog().error( | ||
| { err: dbErr, workflowRunId: workflowRun.id }, | ||
| 'evidence_validation_metadata_persist_failed' | ||
| ); | ||
| }); |
There was a problem hiding this comment.
Stale workflowRun.metadata snapshot can clobber concurrent metadata writes.
Both updateWorkflowRun calls do metadata: { ...workflowRun.metadata, evidence_validation: ... }, but workflowRun.metadata is the in-memory snapshot captured at executor start. Anything that wrote to metadata during the run (heartbeats, approval/loop gate state cleanup, future writers) is silently overwritten when this spread is persisted. Today the gate only runs on the all-success / not-paused path so this is mostly latent, but it is a fragile contract — adding any future per-node metadata writer will start losing fields here.
Safer options:
- Read the freshest
WorkflowRun(or just itsmetadata) fromdeps.storeimmediately before this update, or - Have
deps.storeexpose a partial-merge update (e.g.,setWorkflowRunMetadata(id, { evidence_validation: ... })) so callers do not have to round-trip the whole object.
Also applies to: 3194-3207
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/workflows/src/dag-executor.ts` around lines 3159 - 3171, The current
updateWorkflowRun call uses the stale in-memory workflowRun.metadata snapshot
and may clobber concurrent metadata writes; before persisting
evidence_validation, either fetch the latest metadata from deps.store (e.g.,
readWorkflowRun or readWorkflowRunMetadata) and merge evidence_validation into
that fresh object, or add/use a partial-merge API on deps.store (e.g.,
setWorkflowRunMetadata(workflowRun.id, { evidence_validation: ... })) so you
only write the single field; update both occurrences (the block around
updateWorkflowRun at lines referencing updateWorkflowRun and the similar block
later) to use one of these approaches to avoid overwriting concurrent metadata
changes.
🔍 Comprehensive PR ReviewPR: #1601 — feat(workflows): add execution-evidence contract with terminal-success gate SummaryThree-layer evidence validator and terminal-success gate are well-designed and well-tested (49 new tests; "never throws" contract honored across all entry points; mock isolation correctly registered). Five reviewers converged on 0 CRITICAL issues and 3 HIGH items that warrant attention before this PR merges or — at the latest — before any follow-up PR flips Verdict:
🟠 High IssuesHIGH-1 — Reality layer doesn't bind
|
| # | Issue | Location | Recommendation |
|---|---|---|---|
| L1 | evidence_policy.path accepts '', '.', dir-like strings |
schemas/evidence.ts:111 |
.trim().min(1).refine(...) — auto-fixable |
| L2 | level: 'warning' enum has no producer (YAGNI) |
schemas/evidence.ts:121 |
Drop until producer exists |
| L3 | EvidenceValidationIssue overlaps ValidationIssue (Rule of Three not yet hit) |
schemas/evidence.ts:120 |
Leave as-is |
| L4 | Race window after slow verify: 'reality' I/O — no skipIfStatusChanged re-check |
dag-executor.ts:3135-3208 |
Required before bundled verify: reality |
| L5 | e as Error & { stderr?: string } cast fragile to non-Error rejections |
evidence-validator.ts:152, 179, 221 |
instanceof Error narrowing |
| L6 | gh non-ENOENT (auth-failure) branch untested |
evidence-validator.ts:230-237 |
One test |
| L7 | Empty test_commands: [] doc-only-PR contract not asserted |
schemas/evidence.ts:57 |
One positive shape test |
| L8 | DAG-executor .catch swallow paths untested |
dag-executor.ts:3158, 3168, 3171, 3197 |
One test |
| L9 | verify: 'reality' integration not e2e-tested through dag-executor |
dag-executor.test.ts |
Inferred only |
| L10 | evidence_policy.required: false no-op path untested |
dag-executor.ts:3140 |
One test |
| L11 | Type-narrowing comment after early return is redundant | evidence-validator.ts:91 |
Remove |
| L12 | required JSDoc says "Defaults to false" but schema is optional(), no .default() |
schemas/evidence.ts:91-95 |
Tighten wording |
✅ What's Good
- Three-layer separation (shape → integrity → reality) is genuinely good design
- "Never throws" contract uniformly enforced across all 5 entry points
- Discriminated unions used correctly (
kind,found,valid) EXECUTION_EVIDENCE_REQUIRED_FIELDSderived from schema shape — cannot drift- Path-traversal defense covers absolute and
..vectors before any I/O - Mock isolation correctly registered as own
bun testinvocation - Regression guard test for the no-policy path
- Plan deviations pre-flagged in implementation artifacts — no surprises
ghENOENT vs auth-failure distinguished with operator-actionable hints- Comments are sparse and earn their place (almost all explain WHY)
- Layered test architecture mirrors layered code (Group A/B/C/D/E ↔ entry points)
- Schema conventions followed (camelCase,
@hono/zod-openapi,z.infer,schemas/) - ROADMAP.md P3 entry well-scoped with explicit acceptance criteria
📋 Suggested Follow-up Issues
| Issue Title | Priority |
|---|---|
Bind commit_sha to PR head and branch tip in verify: reality |
P1 — required before bundled flag flip |
Document evidence_policy in user-facing authoring guide |
P1 |
Add mergeWorkflowRunMetadata to IWorkflowStore for transactional merges |
P2 |
| Tighten log-event naming convention across evidence validator + gate | P2 |
| Test coverage gaps (ls-remote, EISDIR, gh auth, DB swallow, required:false) | P3 |
Operator troubleshooting note for metadata.evidence_validation |
P3 (defer to bundled-flag PR) |
| End-to-end reality-mode integration test through dag-executor | P3 |
Next Steps
- ⚡ Auto-fix step will address 5 mechanical issues (MED-1, MED-3, MED-4, LOW-1, LOW-2)
- 📝 Review HIGH items above — HIGH-1 needs new test cases; HIGH-2/3 are scope decisions (this PR vs follow-up)
- 🚧 Before any bundled workflow flips
evidence_policy.required: true, HIGH-1 and LOW-4 (race-window re-check) MUST land
Reviewed by Archon comprehensive-pr-review workflow
Artifacts: ~/.archon/workspaces/shhaider/Archon/artifacts/runs/4c517fe4db197edb948e3f3d4adf9927/review/
Fixes applied:
- HIGH-1: Bind commit_sha to PR head and branch tip in verify: 'reality'
- ls-remote stdout SHA must equal evidence.commit_sha (was discarded)
- gh pr view now queries headRefOid; assert it equals evidence.commit_sha
- Closes the gap where any locally-reachable SHA could pair with any real PR
- HIGH-2: Add evidence_policy bullet to CLAUDE.md "Workflows" list
- HIGH-3: Add evidence_policy: to authoring-workflows.md YAML example
and a new "## Evidence-Gated Workflows" section with schema, verify modes,
worked example, and failure shape
- MED-1: Rename log events to {domain}.{action}_{state} convention
(workflow.evidence_validation_started/_failed/_completed,
workflow.evidence_metadata_persist_failed)
- MED-3: gh non-JSON output now includes parser err.message and a 200-char
stdout preview hint (matches validator.ts:387 precedent)
- MED-4: Drop .catch on success-path metadata write — DB errors now
propagate to the outer executor catch instead of silently completing a
run without provable validation
- LOW-1: Tighten evidence_policy.path with .trim().min(1).refine(...)
to reject '', '.', and trailing-slash dir-like values at parse time
- LOW-2: Drop 'warning' from level enum until a producer exists (YAGNI)
Tests added (evidence-validator.test.ts Group C):
- ls-remote tip SHA mismatches commit_sha → commit_sha error
- gh headRefOid mismatches commit_sha → commit_sha error
Existing reality tests updated to include headRefOid in gh fixtures.
Validation: type-check, lint, format, tests all pass (bun run validate).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
⚡ Auto-Fix ReportStatus: COMPLETE Fixes Applied
What Was Fixed
Tests Added
Test count for ❌ Not Fixed (Deferred per Review Recommendation)
📋 Suggested Follow-up Issues
Validation✅
Auto-fixed by Archon comprehensive-pr-review workflow |
🎯 Workflow SummaryPlan: ROADMAP P3-A — Real-Execution Proof Validation for PR-Producing Workflows Implementation vs Plan
📋 Deviations from Plan (2)Deviation 1 — Deviation 2 — Review Summary5 reviewer agents ran (code-review, error-handling, test-coverage, comment-quality, docs-impact). 24 raw findings → 21 distinct after de-dup.
HIGH fixes landed in commit
Auto-fixed MEDIUM/LOW: log-event naming convention ( 📋 Suggested Follow-Up Issues
Reply with: 📝 Documentation UpdatesAlready shipped in Operator-facing troubleshooting note for ℹ️ Deferred Items (NOT Building — intentionally excluded)These were excluded from this PR's scope by design. Each is its own follow-up surface; do not flag as bugs:
Validation Evidence
Artifacts: |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
packages/workflows/src/dag-executor.ts (2)
3159-3165:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftAvoid merging evidence metadata into the stale
workflowRun.metadatasnapshot.Both writes spread the executor-start copy of
workflowRun.metadata, so any metadata written during the run can be lost whenevidence_validationis persisted. Re-read the latest metadata or use a partial/transactional merge API here.Also applies to: 3199-3203
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/dag-executor.ts` around lines 3159 - 3165, The update is merging into the stale executor-start snapshot by spreading workflowRun.metadata when calling deps.store.updateWorkflowRun to set metadata.evidence_validation; instead re-fetch the latest metadata (or use your store's partial/transactional merge API) and merge evidence_validation into that latest object before calling deps.store.updateWorkflowRun so concurrent metadata writes aren't lost; apply the same change for the other occurrence that updates metadata at the block around lines referencing deps.store.updateWorkflowRun and evidence_validation (where evidenceResult.issues is used).
3144-3205:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRe-check run status after
validateEvidence()returns.
verify: 'reality'can spend seconds ingit/gh. If the run is cancelled or deleted during that window, this block can still flip it tofailedorcompletedbecause the lastskipIfStatusChanged()ran before validation started. Please guard again before either terminal write.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/dag-executor.ts` around lines 3144 - 3205, After validateEvidence() returns, re-check the run status before doing any terminal writes: call the existing skipIfStatusChanged(workflowRun.id) (or if unavailable, re-fetch the run via deps.store.getWorkflowRun(workflowRun.id) and ensure it is still running) and abort/return if the run was cancelled/deleted/changed; only then proceed to call deps.store.updateWorkflowRun(... evidence_validation ...) or deps.store.failWorkflowRun(...) and to emit events/safeSendMessage/logWorkflowError. Ensure this check is applied both on the failure path (before persisting failure and sending notifications) and on the success path (before persisting the success marker).
🧹 Nitpick comments (1)
packages/workflows/src/evidence-validator.test.ts (1)
567-632: ⚡ Quick winPlease cover the non-ENOENT loader branches here.
validateEvidence()has dedicated handling for read failures likeEISDIR/EACCES, but this suite only exercises missing files and malformed JSON. A couple of focused regression tests for those branches would protect theevidence_policy.pathdiagnostics on a changed critical path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/evidence-validator.test.ts` around lines 567 - 632, Add two tests to evidence-validator.test.ts that exercise the non-ENOENT read-failure branches in validateEvidence: one that creates a directory at the policy path (e.g., mkdir join(testDir, 'evidence.json')) and calls validateEvidence(...) with policy.path set to that path to trigger an EISDIR read error, and another that simulates an EACCES read error (either by creating a file and chmod 0 on it or by stubbing fs.readFile to throw an { code: 'EACCES' } error) and calling validateEvidence(...) with policy.path pointing to that file; assert result.valid is false and that result.issues includes an issue with field 'evidence_policy.path' for each case. Ensure you use the existing helpers/testDir setup and reference validateEvidence and loadEvidenceFromArtifacts in the new tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/docs-web/src/content/docs/guides/authoring-workflows.md`:
- Around line 796-815: The example under id "write-evidence" uses hardcoded
origin/dev and undocumented vars (${SESSION_ID}, ${PR_URL}, ${PR_NUMBER});
change it to use the documented/base branch variable and the outputs from the
open-pr step/job: replace origin/dev with $BASE_BRANCH (or
${BASE_BRANCH:-origin/main}) and replace ${SESSION_ID}, ${PR_URL}, ${PR_NUMBER}
with the explicit open-pr outputs you document and reference (e.g. the job/step
outputs such as $OPEN_PR_SESSION_ID, $OPEN_PR_PR_URL, $OPEN_PR_PR_NUMBER or the
appropriate GitHub Actions expression form like ${{
jobs.open-pr.outputs.session_id }}), and update the surrounding text to document
those variables so the snippet is copy-pastable and portable.
In `@packages/workflows/src/evidence-validator.ts`:
- Around line 315-322: The schema requires workflow_run_id to match the active
run but the ValidateEvidenceArgs and validateEvidence() never receive or check
it; update the ValidateEvidenceArgs interface to include a workflowRunId (or
workflow_run_id) field and then, at the start of validateEvidence(), compare the
incoming evidence.workflow_run_id to the provided
ValidateEvidenceArgs.workflowRunId (or workflow_run_id) and throw/reject on
mismatch before proceeding (use the existing error handling pattern); reference
ValidateEvidenceArgs and validateEvidence() when making these changes and ensure
the error message clearly states the workflow run id mismatch.
---
Duplicate comments:
In `@packages/workflows/src/dag-executor.ts`:
- Around line 3159-3165: The update is merging into the stale executor-start
snapshot by spreading workflowRun.metadata when calling
deps.store.updateWorkflowRun to set metadata.evidence_validation; instead
re-fetch the latest metadata (or use your store's partial/transactional merge
API) and merge evidence_validation into that latest object before calling
deps.store.updateWorkflowRun so concurrent metadata writes aren't lost; apply
the same change for the other occurrence that updates metadata at the block
around lines referencing deps.store.updateWorkflowRun and evidence_validation
(where evidenceResult.issues is used).
- Around line 3144-3205: After validateEvidence() returns, re-check the run
status before doing any terminal writes: call the existing
skipIfStatusChanged(workflowRun.id) (or if unavailable, re-fetch the run via
deps.store.getWorkflowRun(workflowRun.id) and ensure it is still running) and
abort/return if the run was cancelled/deleted/changed; only then proceed to call
deps.store.updateWorkflowRun(... evidence_validation ...) or
deps.store.failWorkflowRun(...) and to emit
events/safeSendMessage/logWorkflowError. Ensure this check is applied both on
the failure path (before persisting failure and sending notifications) and on
the success path (before persisting the success marker).
---
Nitpick comments:
In `@packages/workflows/src/evidence-validator.test.ts`:
- Around line 567-632: Add two tests to evidence-validator.test.ts that exercise
the non-ENOENT read-failure branches in validateEvidence: one that creates a
directory at the policy path (e.g., mkdir join(testDir, 'evidence.json')) and
calls validateEvidence(...) with policy.path set to that path to trigger an
EISDIR read error, and another that simulates an EACCES read error (either by
creating a file and chmod 0 on it or by stubbing fs.readFile to throw an { code:
'EACCES' } error) and calling validateEvidence(...) with policy.path pointing to
that file; assert result.valid is false and that result.issues includes an issue
with field 'evidence_policy.path' for each case. Ensure you use the existing
helpers/testDir setup and reference validateEvidence and
loadEvidenceFromArtifacts in the new tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6b1fdfae-ad23-415c-906e-999063d6e8bb
📒 Files selected for processing (6)
CLAUDE.mdpackages/docs-web/src/content/docs/guides/authoring-workflows.mdpackages/workflows/src/dag-executor.tspackages/workflows/src/evidence-validator.test.tspackages/workflows/src/evidence-validator.tspackages/workflows/src/schemas/evidence.ts
✅ Files skipped from review due to trivial changes (1)
- CLAUDE.md
| - id: write-evidence | ||
| bash: | | ||
| cat > "$ARTIFACTS_DIR/evidence.json" <<EOF | ||
| { | ||
| "kind": "execution", | ||
| "workflow_run_id": "$WORKFLOW_ID", | ||
| "provider": "claude", | ||
| "provider_run_ids": ["${SESSION_ID}"], | ||
| "changed_files": $(git diff --name-only origin/dev...HEAD | jq -R -s -c 'split("\n") | map(select(. != ""))'), | ||
| "diff_command": "git diff --stat origin/dev...HEAD", | ||
| "test_commands": ["bun test"], | ||
| "test_output_summary": "all tests passed", | ||
| "commit_sha": "$(git rev-parse HEAD)", | ||
| "pushed_branch": "$(git rev-parse --abbrev-ref HEAD)", | ||
| "pr_url": "${PR_URL}", | ||
| "pr_number": ${PR_NUMBER} | ||
| } | ||
| EOF | ||
| depends_on: [open-pr] | ||
| ``` |
There was a problem hiding this comment.
Make this example use documented variables and a portable base branch.
Right now the snippet hardcodes origin/dev and relies on ${SESSION_ID}, ${PR_URL}, and ${PR_NUMBER} without introducing them anywhere in this guide. As written, it's easy to copy-paste into a workflow that cannot actually produce a valid evidence.json. Please switch the example to documented values like $BASE_BRANCH / $nodeId.output..., or document those variables alongside the snippet.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/docs-web/src/content/docs/guides/authoring-workflows.md` around
lines 796 - 815, The example under id "write-evidence" uses hardcoded origin/dev
and undocumented vars (${SESSION_ID}, ${PR_URL}, ${PR_NUMBER}); change it to use
the documented/base branch variable and the outputs from the open-pr step/job:
replace origin/dev with $BASE_BRANCH (or ${BASE_BRANCH:-origin/main}) and
replace ${SESSION_ID}, ${PR_URL}, ${PR_NUMBER} with the explicit open-pr outputs
you document and reference (e.g. the job/step outputs such as
$OPEN_PR_SESSION_ID, $OPEN_PR_PR_URL, $OPEN_PR_PR_NUMBER or the appropriate
GitHub Actions expression form like ${{ jobs.open-pr.outputs.session_id }}), and
update the surrounding text to document those variables so the snippet is
copy-pastable and portable.
|
Manual dogfood follow-up: pushed f6f71a8 to bind execution evidence to the active workflow run. validateEvidence now requires workflowRunId, rejects stale/copied evidence when evidence.workflow_run_id differs from the active run, and dag-executor passes workflowRun.id into the gate. Added validator and DAG executor regression tests. Verification after this patch: bun test packages/workflows/src/evidence-validator.test.ts, bun test packages/workflows/src/dag-executor.test.ts, bun --filter @archon/workflows test, and bun run validate all passed. |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
packages/workflows/src/dag-executor.ts (2)
3160-3165:⚠️ Potential issue | 🟡 Minor | ⚖️ Poor tradeoffStale
workflowRun.metadatasnapshot can silently drop concurrent writes.Both metadata updates (lines 3162–3165 and 3201–3204) spread
workflowRun.metadata, which is the in-memory snapshot captured at executor startup. Any metadata written during the run (e.g., approval/loop gate state) is silently overwritten when this spread is persisted.Prefer fetching the freshest metadata from
deps.storeimmediately before writing, or exposing a partial-merge API (e.g.,setWorkflowRunMetadata(id, { evidence_validation: ... })) so the spread is unnecessary.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/dag-executor.ts` around lines 3160 - 3165, The update currently spreads the stale in-memory snapshot workflowRun.metadata into deps.store.updateWorkflowRun (when setting evidence_validation), which can overwrite concurrent metadata changes; modify the write to first fetch the latest metadata from deps.store (e.g., deps.store.getWorkflowRun/workflowRunById) and merge evidence_validation into that fresh object before calling deps.store.updateWorkflowRun, or change to a partial-merge API such as deps.store.setWorkflowRunMetadata(workflowRun.id, { evidence_validation: ... }) so you avoid spreading workflowRun.metadata and prevent lost concurrent writes.
3136-3193:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winTOCTOU: status is not re-checked after
validateEvidencereturns.The guard at line 3136 (
skipIfStatusChanged) runs beforevalidateEvidence. Whenpolicy.verify === 'reality', that call can take several seconds (git/gh subprocesses). If the run is cancelled or deleted during that window, the failure branch at line 3173 will still callfailWorkflowRun— overwriting acancelled/deletedrow withfailed. The same window exists on the success path at line 3200.The existing
skipIfStatusChangedhelper is already available; calling it again immediately aftervalidateEvidencereturns (on both branches) closes the window:const evidenceResult = await validateEvidence({ ... }); + if (await skipIfStatusChanged('dag.skip_complete_status_changed')) return; if (!evidenceResult.valid) { ... } // success path + if (await skipIfStatusChanged('dag.skip_complete_status_changed')) return; await deps.store.updateWorkflowRun(...);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/dag-executor.ts` around lines 3136 - 3193, The time-of-check/time-of-use race occurs because skipIfStatusChanged('dag.skip_complete_status_changed') runs before validateEvidence() but not after it; add a second check immediately after validateEvidence() returns and before taking either branch (i.e., before persisting evidence_validation via deps.store.updateWorkflowRun, before deps.store.failWorkflowRun/logWorkflowError/emitterForFail.emit/unregisterRun/safeSendMessage on the failure path, and likewise before the success path that marks the run completed). Concretely, call the existing skipIfStatusChanged('dag.skip_complete_status_changed') right after the validateEvidence(...) call and return early if it signals status changed to prevent overwriting cancelled/deleted runs.
🧹 Nitpick comments (3)
packages/workflows/src/dag-executor.test.ts (1)
7032-7091: ⚡ Quick winAdd a regression test for success-path metadata write failure propagation.
This suite validates the happy path, but it doesn’t pin the “do not silently complete if evidence metadata persistence fails” behavior. A focused test here would guard the reliability fix from regressions.
Suggested test addition
+ it('required evidence: metadata write failure on success path propagates (no silent completion)', async () => { + await writeFile( + join(artifactsDir, 'evidence.json'), + JSON.stringify({ + kind: 'execution', + workflow_run_id: 'dag-evidence-run-db-fail', + provider: 'claude', + provider_run_ids: ['session-abc'], + changed_files: ['src/foo.ts'], + diff_command: 'git diff --stat origin/dev...HEAD', + test_commands: ['bun test'], + test_output_summary: '15 passed', + commit_sha: 'a'.repeat(40), + pushed_branch: 'feature/foo', + pr_url: 'https://github.com/owner/repo/pull/42', + pr_number: 42, + }) + ); + + const mockStore = createMockStore(); + (mockStore.updateWorkflowRun as ReturnType<typeof mock>).mockRejectedValueOnce( + new Error('db write failed') + ); + const mockDeps = createMockDeps(mockStore); + const platform = createMockPlatform(); + const workflowRun = makeWorkflowRun('dag-evidence-run-db-fail'); + + await expect( + executeDagWorkflow( + mockDeps, + platform, + 'conv-evidence', + testDir, + { + name: 'evidence-required-valid-db-fail', + nodes: [{ id: 'work', bash: 'echo done' } as BashNode], + evidence_policy: { required: true, verify: 'shape', path: 'evidence.json' }, + }, + workflowRun, + 'claude', + undefined, + artifactsDir, + join(testDir, 'logs'), + 'main', + 'docs/', + minimalConfig + ) + ).rejects.toThrow('db write failed'); + + expect((mockStore.completeWorkflowRun as ReturnType<typeof mock>).mock.calls.length).toBe(0); + });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/dag-executor.test.ts` around lines 7032 - 7091, Add a regression test that simulates updateWorkflowRun failing when writing evidence metadata so the happy-path does not silently complete; in the existing test case around executeDagWorkflow for evidence_policy (the 'evidence-required-valid' scenario) mock mockStore.updateWorkflowRun to throw/reject when called for metadata updates, then call executeDagWorkflow and assert that mockStore.completeWorkflowRun was NOT called and mockStore.failWorkflowRun WAS called (or that the error propagated), and also assert the updateWorkflowRun mock was invoked for evidence metadata. Target the same symbols used in the file: executeDagWorkflow, mockStore.updateWorkflowRun, mockStore.completeWorkflowRun, mockStore.failWorkflowRun, and the evidence_policy/path handling so the test specifically verifies metadata persistence failure prevents completion.packages/workflows/src/evidence-validator.test.ts (2)
432-454: ⚡ Quick winAdd an explicit
git ls-remoterejection test branch.You cover empty output and mismatch, but not the thrown
ls-remotepath. That’s a distinct failure mode and worth pinning directly.Suggested test addition
+ it('negative: git ls-remote rejects -> pushed_branch issue', async () => { + execFileImpl = (cmd, args) => { + if (cmd === 'git' && args[0] === 'cat-file') { + return Promise.resolve({ stdout: '', stderr: '' }); + } + if (cmd === 'git' && args[0] === 'ls-remote') { + return Promise.reject(Object.assign(new Error('ls-remote failed'), { stderr: 'fatal: network error' })); + } + if (cmd === 'gh') { + return Promise.resolve({ + stdout: JSON.stringify({ + headRefName: 'feature/foo', + number: 42, + headRefOid: 'a'.repeat(40), + }), + stderr: '', + }); + } + return Promise.resolve({ stdout: '', stderr: '' }); + }; + + const issues = await verifyEvidenceReality(REAL_EVIDENCE, '/repo'); + expect(issues.some(i => i.field === 'pushed_branch')).toBe(true); + });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/evidence-validator.test.ts` around lines 432 - 454, Add a new test case in packages/workflows/src/evidence-validator.test.ts that specifically simulates a rejected git ls-remote call: in the new "it" (e.g., 'negative: git ls-remote rejects → pushed_branch error') set execFileImpl so it returns Promise.reject(new Error('ls-remote failed')) when cmd === 'git' && args[0] === 'ls-remote' (keep existing handling for 'git cat-file' and 'gh' as in other tests), then call verifyEvidenceReality(REAL_EVIDENCE, '/repo') and assert that the returned issues include an entry with field === 'pushed_branch'; reference the verifyEvidenceReality function, execFileImpl mock, and REAL_EVIDENCE to locate where to add this test.
567-634: ⚡ Quick winAdd loader tests for non-ENOENT filesystem errors (e.g., EISDIR/EACCES).
loadEvidenceFromArtifactscurrently checks missing/valid/malformed, but not other read failures. Pinning these keeps the “structured issues, no crash” contract safe.Suggested deterministic branch test (EISDIR)
+ it('directory at evidence path is handled without throw (EISDIR path)', async () => { + await mkdir(join(testDir, 'evidence.json'), { recursive: true }); + const result = await validateEvidence({ + artifactsDir: testDir, + cwd: '/repo', + workflowRunId: 'run-1', + policy: { required: true, verify: 'shape', path: 'evidence.json' }, + }); + expect(result.valid).toBe(false); + if (!result.valid) { + expect(result.issues.some(i => i.field === 'evidence_policy.path')).toBe(true); + } + });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/workflows/src/evidence-validator.test.ts` around lines 567 - 634, Add a deterministic test that simulates non-ENOENT fs errors by mocking fs.promises.readFile (for the evidence.json path) to throw an error with code 'EISDIR' (and another case for 'EACCES'), then call loadEvidenceFromArtifacts and assert it does not throw and returns a structured result (i.e., a handled response rather than crashing — e.g., result.found truthy and result.raw is a string or error-info), and also add a companion validateEvidence call (using validateEvidence with a policy) to assert validation returns a non-valid result with an appropriate issue reported; reference the existing test patterns and the functions loadEvidenceFromArtifacts and validateEvidence and the artifact filename 'evidence.json' when adding the new tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/workflows/src/evidence-validator.ts`:
- Around line 151-207: In verifyEvidenceReality, the three execFileAsync calls
(the git cat-file -e call, the git ls-remote --heads origin call, and the gh pr
view call) must include a timeout option to avoid indefinite hangs; update the
options object passed to execFileAsync for the git cat-file invocation to
include a short timeout (e.g. 30_000 ms) and for the networked git ls-remote and
gh pr view calls to include a longer timeout (e.g. 60_000 ms), preserving the
existing cwd property (i.e., change { cwd } to { cwd, timeout: <ms> }) so Node's
child_process.execFile will kill stalled subprocesses.
---
Duplicate comments:
In `@packages/workflows/src/dag-executor.ts`:
- Around line 3160-3165: The update currently spreads the stale in-memory
snapshot workflowRun.metadata into deps.store.updateWorkflowRun (when setting
evidence_validation), which can overwrite concurrent metadata changes; modify
the write to first fetch the latest metadata from deps.store (e.g.,
deps.store.getWorkflowRun/workflowRunById) and merge evidence_validation into
that fresh object before calling deps.store.updateWorkflowRun, or change to a
partial-merge API such as deps.store.setWorkflowRunMetadata(workflowRun.id, {
evidence_validation: ... }) so you avoid spreading workflowRun.metadata and
prevent lost concurrent writes.
- Around line 3136-3193: The time-of-check/time-of-use race occurs because
skipIfStatusChanged('dag.skip_complete_status_changed') runs before
validateEvidence() but not after it; add a second check immediately after
validateEvidence() returns and before taking either branch (i.e., before
persisting evidence_validation via deps.store.updateWorkflowRun, before
deps.store.failWorkflowRun/logWorkflowError/emitterForFail.emit/unregisterRun/safeSendMessage
on the failure path, and likewise before the success path that marks the run
completed). Concretely, call the existing
skipIfStatusChanged('dag.skip_complete_status_changed') right after the
validateEvidence(...) call and return early if it signals status changed to
prevent overwriting cancelled/deleted runs.
---
Nitpick comments:
In `@packages/workflows/src/dag-executor.test.ts`:
- Around line 7032-7091: Add a regression test that simulates updateWorkflowRun
failing when writing evidence metadata so the happy-path does not silently
complete; in the existing test case around executeDagWorkflow for
evidence_policy (the 'evidence-required-valid' scenario) mock
mockStore.updateWorkflowRun to throw/reject when called for metadata updates,
then call executeDagWorkflow and assert that mockStore.completeWorkflowRun was
NOT called and mockStore.failWorkflowRun WAS called (or that the error
propagated), and also assert the updateWorkflowRun mock was invoked for evidence
metadata. Target the same symbols used in the file: executeDagWorkflow,
mockStore.updateWorkflowRun, mockStore.completeWorkflowRun,
mockStore.failWorkflowRun, and the evidence_policy/path handling so the test
specifically verifies metadata persistence failure prevents completion.
In `@packages/workflows/src/evidence-validator.test.ts`:
- Around line 432-454: Add a new test case in
packages/workflows/src/evidence-validator.test.ts that specifically simulates a
rejected git ls-remote call: in the new "it" (e.g., 'negative: git ls-remote
rejects → pushed_branch error') set execFileImpl so it returns
Promise.reject(new Error('ls-remote failed')) when cmd === 'git' && args[0] ===
'ls-remote' (keep existing handling for 'git cat-file' and 'gh' as in other
tests), then call verifyEvidenceReality(REAL_EVIDENCE, '/repo') and assert that
the returned issues include an entry with field === 'pushed_branch'; reference
the verifyEvidenceReality function, execFileImpl mock, and REAL_EVIDENCE to
locate where to add this test.
- Around line 567-634: Add a deterministic test that simulates non-ENOENT fs
errors by mocking fs.promises.readFile (for the evidence.json path) to throw an
error with code 'EISDIR' (and another case for 'EACCES'), then call
loadEvidenceFromArtifacts and assert it does not throw and returns a structured
result (i.e., a handled response rather than crashing — e.g., result.found
truthy and result.raw is a string or error-info), and also add a companion
validateEvidence call (using validateEvidence with a policy) to assert
validation returns a non-valid result with an appropriate issue reported;
reference the existing test patterns and the functions loadEvidenceFromArtifacts
and validateEvidence and the artifact filename 'evidence.json' when adding the
new tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 79600dcf-d763-4d63-a960-886426bc50c8
📒 Files selected for processing (4)
packages/workflows/src/dag-executor.test.tspackages/workflows/src/dag-executor.tspackages/workflows/src/evidence-validator.test.tspackages/workflows/src/evidence-validator.ts
| await execFileAsync('git', ['cat-file', '-e', `${evidence.commit_sha}^{commit}`], { cwd }); | ||
| } catch (err) { | ||
| const e = err as Error & { stderr?: string }; | ||
| issues.push({ | ||
| level: 'error', | ||
| field: 'commit_sha', | ||
| message: `git cat-file -e ${evidence.commit_sha}^{commit} failed: ${ | ||
| e.stderr?.trim() || e.message | ||
| }`, | ||
| hint: 'commit_sha is not reachable in this checkout — fake SHA or wrong cwd', | ||
| }); | ||
| } | ||
|
|
||
| // 2. pushed_branch present on origin AND its tip SHA matches commit_sha | ||
| try { | ||
| const { stdout } = await execFileAsync( | ||
| 'git', | ||
| ['ls-remote', '--heads', 'origin', evidence.pushed_branch], | ||
| { cwd } | ||
| ); | ||
| const trimmed = stdout.trim(); | ||
| if (trimmed.length === 0) { | ||
| issues.push({ | ||
| level: 'error', | ||
| field: 'pushed_branch', | ||
| message: `git ls-remote --heads origin ${evidence.pushed_branch} returned empty; branch is not on origin`, | ||
| hint: 'Did the workflow run `git push -u origin <branch>` after committing?', | ||
| }); | ||
| } else { | ||
| // Bind commit_sha to the branch tip on origin. Without this check, any | ||
| // locally-reachable SHA can be paired with any real branch and pass. | ||
| const remoteSha = trimmed.split(/\s+/)[0]; | ||
| if (remoteSha !== evidence.commit_sha) { | ||
| issues.push({ | ||
| level: 'error', | ||
| field: 'commit_sha', | ||
| message: `commit_sha=${evidence.commit_sha} does not match origin/${evidence.pushed_branch} tip=${remoteSha}`, | ||
| hint: 'evidence.json claims a commit that is not the tip of pushed_branch on origin', | ||
| }); | ||
| } | ||
| } | ||
| } catch (err) { | ||
| const e = err as Error & { stderr?: string }; | ||
| issues.push({ | ||
| level: 'error', | ||
| field: 'pushed_branch', | ||
| message: `git ls-remote failed: ${e.stderr?.trim() || e.message}`, | ||
| }); | ||
| } | ||
|
|
||
| // 3. gh pr view confirms PR url maps to pushed_branch, pr_number, and commit_sha | ||
| try { | ||
| const { stdout } = await execFileAsync( | ||
| 'gh', | ||
| ['pr', 'view', evidence.pr_url, '--json', 'headRefName,number,headRefOid'], | ||
| { cwd } | ||
| ); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Check if `@archon/git`'s execFileAsync sets a default timeout.
# Expected: if a default timeout exists, the concern is mitigated.
ast-grep --pattern 'function execFileAsync($$$) { $$$ }'Repository: coleam00/Archon
Length of output: 41
🏁 Script executed:
#!/bin/bash
# Find the `@archon/git` package and locate execFileAsync definition
fd -t f -e ts -e tsx 'git' | grep -E '(git|src)' | head -20Repository: coleam00/Archon
Length of output: 177
🏁 Script executed:
#!/bin/bash
# Search for execFileAsync definition more broadly
rg 'export.*execFileAsync|function execFileAsync' -t ts -t tsx -A 5Repository: coleam00/Archon
Length of output: 87
🏁 Script executed:
#!/bin/bash
# Look for `@archon/git` package structure
find . -type d -name git -o -name '@archon' | head -10Repository: coleam00/Archon
Length of output: 74
🏁 Script executed:
#!/bin/bash
# Search for execFileAsync in `@archon/git` package
rg 'execFileAsync' packages/git/src/ -t ts -A 10 -B 2Repository: coleam00/Archon
Length of output: 40547
🏁 Script executed:
#!/bin/bash
# List files in `@archon/git` package
ls -la packages/git/src/Repository: coleam00/Archon
Length of output: 592
🏁 Script executed:
#!/bin/bash
# Read the full exec.ts file to see if there are any default timeouts
cat packages/git/src/exec.tsRepository: coleam00/Archon
Length of output: 871
🏁 Script executed:
#!/bin/bash
# Verify the specific lines in evidence-validator.ts to confirm they lack timeout
sed -n '151p;166,170p;203,207p' packages/workflows/src/evidence-validator.tsRepository: coleam00/Archon
Length of output: 458
Add timeout to verifyEvidenceReality subprocess calls to prevent indefinite hangs.
All three execFileAsync calls lack a timeout option:
| Line | Call | Risk |
|---|---|---|
| 151 | git cat-file -e |
Local; credential helper/lock contention |
| 166–170 | git ls-remote origin |
Network; unreachable → ~2 min TCP timeout |
| 203–207 | gh pr view |
Network; GitHub API flaky → ~2 min TCP timeout |
Without timeout, the default is 0 (infinite). The execFileAsync wrapper in @archon/git does not apply a built-in default—it passes options directly to Node.js's child_process.execFile. Network stalls on git ls-remote or gh pr view will hang indefinitely, permanently blocking the DAG executor's completion path (line 3144 of dag-executor.ts), preventing both failWorkflowRun and completeWorkflowRun.
Other subprocess calls in this codebase (executeBashNode, executeScriptNode) and throughout the @archon/git module all explicitly set timeout (10–120 seconds depending on operation). Apply the same pattern here.
Proposed fix
+/** Max wall-clock time for each individual reality-check subprocess. */
+const REALITY_CHECK_TIMEOUT_MS = 30_000;
+
// 1. commit_sha reachable in worktree
try {
await execFileAsync('git', ['cat-file', '-e', `${evidence.commit_sha}^{commit}`], {
- cwd,
+ cwd,
+ timeout: REALITY_CHECK_TIMEOUT_MS,
});
} catch (err) {
...
}
// 2. pushed_branch present on origin AND its tip SHA matches commit_sha
try {
const { stdout } = await execFileAsync(
'git',
['ls-remote', '--heads', 'origin', evidence.pushed_branch],
- { cwd }
+ { cwd, timeout: REALITY_CHECK_TIMEOUT_MS }
);
...
}
// 3. gh pr view ...
try {
const { stdout } = await execFileAsync(
'gh',
['pr', 'view', evidence.pr_url, '--json', 'headRefName,number,headRefOid'],
- { cwd }
+ { cwd, timeout: REALITY_CHECK_TIMEOUT_MS }
);
...
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/workflows/src/evidence-validator.ts` around lines 151 - 207, In
verifyEvidenceReality, the three execFileAsync calls (the git cat-file -e call,
the git ls-remote --heads origin call, and the gh pr view call) must include a
timeout option to avoid indefinite hangs; update the options object passed to
execFileAsync for the git cat-file invocation to include a short timeout (e.g.
30_000 ms) and for the networked git ls-remote and gh pr view calls to include a
longer timeout (e.g. 60_000 ms), preserving the existing cwd property (i.e.,
change { cwd } to { cwd, timeout: <ms> }) so Node's child_process.execFile will
kill stalled subprocesses.
Review SummaryVerdict: minor-fixes-needed This PR implements a real-execution proof validation system for PR-producing workflows (ROADMAP P3-A). The three-layer validator, evidence schemas, and DAG executor integration are solid — 52 new tests, clean TypeScript, CLAUDE.md-compliant throughout. Two categories of work remain: documentation for the new public surface, and comment hygiene (ROADMAP references and redundant docstrings). Blocking issues
Suggested fixes
Minor / nice-to-have
Compliments
Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage, comment-quality, docs-impact. |
|
Hi @shhaider — thanks for the substantial work here. We want to build in this direction (verifiable PR-evidence is a real maturity gap), but at 2137 LOC the proposed surface is more than we want to commit to before we've validated the pattern with users. Could you slice this into an MVP and a follow-up? MVP scope (target ~200 LOC):
Out of MVP, future PR after we see the gate get used:
This way we land the load-bearing executor gate first, see how workflow authors adopt it, and then expand the contract surface based on what they actually need rather than what we guess they'll need. If the slim version gets used and people start asking "can the contract enforce ?", we add <X> in a follow-up. Happy to chat through the slice if any of the cut surface feels load-bearing to you. |
|
@$shhaider related to #1433 — overlapping area or partial fix. |
|
@$shhaider related to #1720 — overlapping area or partial fix. |
|
@$shhaider related to #1433 — overlapping area or partial fix. |
|
@$shhaider related to #1720 — overlapping area or partial fix. |
Summary
status='completed'based only on a self-report ("I created the PR"), with no enforced contract proving the work actually happened. Planning artifacts, fabricated commit SHAs, unpushed branches, or unreachable PR URLs all pass undetected.executionEvidenceSchema,evidencePolicySchema) and a three-layer validator (evidence-validator.ts) inside@archon/workflows. The DAG executor now deterministically blocks terminalcompletedstatus when the workflow declaresevidence_policy.required: trueand$ARTIFACTS_DIR/evidence.jsonis missing, malformed, or fake. On failure the run is downgraded tofailedwith structured issues persisted atmetadata.evidence_validation.git ls-remote,gh pr view) is opt-in viaverify: 'reality'; default is'shape'.UX Journey
Before
After
Architecture Diagram
Before
After
Connection inventory:
dag-executor.tsevidence-validator.ts::validateEvidenceevidence_policy?.required === trueevidence-validator.ts@archon/git(execFileAsync)verify: 'reality'. Stubbed in tests via isolatedmock.module()invocationevidence-validator.tsghCLI (subprocess)schemas/index.tsschemas/evidence.tsexecutionEvidenceSchema,evidencePolicySchema, derived typesschemas/workflow.tsschemas/evidence.tsworkflowBaseSchema.evidence_policy: evidencePolicySchema.optional()dag-executor.tsstore.completeWorkflowRundag-executor.tsstore.failWorkflowRundag-executor.tsstore.updateWorkflowRunmetadata.evidence_validationafterfailWorkflowRunLabel Snapshot
risk: lowsize: Mworkflowsworkflows:executor,workflows:schemasChange Metadata
featureworkflowsLinked Issue
ROADMAP.md)Validation Evidence (required)
All gates green on first invocation (no fixes required during validation):
check:bundled—bundled-defaults.generated.ts is up to date (36 commands, 20 workflows)check:bundled-skill—bundled-skill.ts is up to date (21 files)type-check— all 10 packages exit 0lint—bun x eslint . --cache --max-warnings 0(0 errors, 0 warnings)format:check—All matched files use Prettier code style!test— all per-package tests pass (sprint-relevant: 43 evidence-validator + 6 schema + 3 dag-executor integration = 52 new tests)Test breakdown:
packages/workflows/src/evidence-validator.test.ts(new)packages/workflows/src/schemas.test.ts(extended)executionEvidenceSchemaparse (execution + planning),evidencePolicySchemadefaults,workflowBaseSchemaaccepts policypackages/workflows/src/dag-executor.test.ts(extended)failed+ structured issues; required + valid →completed; no-policy →completed(regression guard)validation.mdartifact (full per-package counts in/Users/syedhaider/.archon/workspaces/shhaider/Archon/artifacts/runs/4c517fe4db197edb948e3f3d4adf9927/validation.md)Security Impact (required)
$ARTIFACTS_DIR/evidence.json(workflow-owned) and runs read-onlygit ls-remote/git cat-file -e/gh pr viewwhen reality checks are explicitly opted in.git ls-remote origin <branch>andgh pr view <url>run only whenevidence_policy.verify === 'reality'. Default is'shape'(no network). Tests stubexecFileAsyncvia isolatedmock.module().ghuses the operator's existing auth context; no token reading or logging in this PR.$ARTIFACTS_DIR(already workflow-owned).evidence_policy.pathis rejected if absolute or contains..(path-traversal guard with explicit negative test).try/catch; failures produce structuredEvidenceValidationIssueobjects (never throw).ghENOENT / auth failure produces a clean error issue with operator hint.Compatibility / Migration
evidence_policyis optional onworkflowBaseSchema. Workflows that omit it have byte-identical behavior to before this PR (regression-tested indag-executor.test.ts)..archon/config.yamlkeys.WorkflowRun.metadata(z.record(z.unknown())). Nomigrations/files modified.Human Verification (required)
What was personally validated beyond CI:
bun run validateexits 0 from a clean checkout of this branch (no env tweaking).evidence-validator.test.tswas confirmed to be isolated as its ownbun testinvocation inpackages/workflows/package.jsonto preventmock.module('@archon/git', ...)cross-pollution per CLAUDE.md mock-isolation rule.bun --filter @archon/workflows type-checkand full-monorepobun run type-checkboth green.discoverWorkflowsWithConfigflow continues to load existing bundled workflows withoutevidence_policy(regression test).evidence-validator.test.ts):'0' * 40)'abc1234')main/master/dev/develop)pushed_branchclaimed butgit ls-remoteemptypr_urlclaimed butgh pr viewreturns differentheadRefNameor differentnumberghENOENT (binary missing) → clean error issueevidence_policy.pathabsolute /..traversalevidence.jsonmissing / malformed JSON /null/ string / array / numberkind === 'planning'rejected by integrity layerevidence_policy.required: trueenabled (no bundled YAML opts in yet — explicitly out of scope per plan; follow-up PR).Side Effects / Blast Radius (required)
@archon/workflows(only) — schemas, validator, executor terminal-success transition.@archon/core,@archon/cli,@archon/server,@archon/web,@archon/adapters,@archon/providers,@archon/git,@archon/isolation,@archon/paths(all type-check green; no exported surface touched).evidence_policyare unaffected (regression-guarded).evidence_policy.required: truebut does not writeevidence.json, runs will fail terminally. This is the intended contract — explicitly documented in the schema and surfaced viametadata.evidence_validation.issues.mock.module('@archon/git', ...)could pollute other tests if the new test file weren't isolated. Mitigated by addingbun test src/evidence-validator.test.tsas a separate invocation inpackage.jsonper CLAUDE.md.metadata.evidence_validationis queryable from the API; failures surface as ordinaryfailedruns in run history.workflow.run_failed) fire on validation failure, with structured issue payload.evidence_policyat workflow-load time.Rollback Plan (required)
git revert 55e39055 && git push origin dev— single commit, additive only on the engine side. Reverting restores byte-identical pre-PR behavior because all bundled workflows still omitevidence_policy.evidence_policy.requiredis itself the flag. Setting it tofalse(or omitting the field entirely) restores prior behavior per workflow.failedruns whosemetadata.evidence_validation.issuesis populated — unique signature distinguishing this gate's failures from other failure modes.Risks and Mitigations
git ls-remote,gh pr view) flakes in CI or sandboxed runners.verify: 'shape'is the default. Reality is strictly opt-in. Validator tests stubexecFileAsyncviamock.module()in an isolatedbun testinvocation. ENOENT/auth/exit-non-zero produce structured error issues without throwing.mock.module('@archon/git', ...)in the new test file pollutes other workflow tests in the samebun testprocess.bun test src/evidence-validator.test.tsinvocation inpackages/workflows/package.json. Verified by passing test run.evidence_policy.pathallows path traversal outside$ARTIFACTS_DIR...segments before reading. Negative test in Group D (loader) covers both.updateWorkflowRun(...)but the actual call site iscompleteWorkflowRun(...)/failWorkflowRun(...).plan-confirmation.md. Implementation gates the actualcompleteWorkflowRun(...)call atdag-executor.ts:3134(the single terminal-success transition); failure path uses the existingfailWorkflowRun(id, error)plus a separateupdateWorkflowRun(...)to persistmetadata.evidence_validation. Plan intent — single-point gate, structured issues persisted tometadata.evidence_validation— fully preserved.Plan:
/Users/syedhaider/.archon/workspaces/shhaider/Archon/artifacts/runs/4c517fe4db197edb948e3f3d4adf9927/plan.mdWorkflow ID:
4c517fe4db197edb948e3f3d4adf9927ROADMAP entry: P3-A — Real-Execution Proof Validation for PR-Producing Workflows
Summary by CodeRabbit
New Features
Tests
Documentation
Chores