Workflow exits success when condition_json_parse_failed cascades to silent node skips

## Summary

- What broke: When a workflow node's output can't be parsed for a downstream `when:` condition (e.g. `$node.output.verdict`), the `condition-evaluator` logs `condition_json_parse_failed` and the dependent nodes are marked `Skipped (when_condition)`. The workflow then exits 0 — indistinguishable from a successful run where the gate model legitimately decided to skip those branches.
- When it started (if known): pre-existing; surfaces frequently with Pi/Minimax outputs in `maintainer-review-pr` (the gate node sometimes prepends reasoning prose or wraps JSON in a markdown fence, breaking field extraction).
- Severity: `major`

## Steps to Reproduce

1. Run `bun run cli workflow run maintainer-review-pr "<PR#>"` against a PR with Pi/Minimax as the provider.
2. Wait for the gate node to complete. ~10–20% of the time Pi emits prose like `Let me read the direction.md...` before the JSON, or wraps it in ` ```json `.
3. Workflow "completes successfully" — exit code 0. But no review is posted, no findings recorded, every aspect node shows `Skipped (when_condition)`.

## Expected vs Actual

- **Expected**: A `condition_json_parse_failed` against a `when:` clause should fail the workflow with a clear error pointing at the offending node and the unparseable output, so the user can retry or fix the prompt.
- **Actual**: The error is logged at `warn` level and the DAG silently continues by treating the missing field as falsy. Every dependent node is skipped. The CLI reports "Workflow completed successfully."

## User Flow

```
User                       Archon                       Pi
────                       ──────                       ──
runs maintainer-review-pr ▶ executes gate node ────────▶ returns "Let me think...\n```json\n{...}\n```"
                            condition_json_parse_failed (warn log only)
                            $gate.output.verdict == 'review' → falsy
                            [X] every downstream aspect node → Skipped (when_condition)
                            workflow finishes
sees "Workflow completed   ◀── exit 0, no review posted
successfully" but no
review exists on PR
```

## Environment

- Platform: CLI (also reproducible from any adapter)
- Database: SQLite
- Running in worktree? No (workflow has `worktree.enabled: false`)
- OS: macOS / Linux (provider-agnostic; issue is in `@archon/workflows` condition-evaluator)

## Logs

```
{"level":40,"module":"workflow.condition-evaluator","nodeId":"gate","field":"verdict","outputPreview":"\n\nLet me read the direction.md to verify my alignment assessment before writing artifacts.\n\n\nAll gat","msg":"condition_json_parse_failed"}
{"level":30,"module":"workflow.dag-executor","nodeId":"review-classify","when":"$gate.output.verdict == 'review'","msg":"dag_node_skipped_condition"}
[review-classify] Skipped (when_condition)
[approve-decline] Skipped (when_condition)
[approve-unclear] Skipped (when_condition)
[maintainer-review-code-review] Skipped (trigger_rule)
[maintainer-review-test-coverage] Skipped (trigger_rule)
... (all downstream nodes skipped)
{"module":"workflow.dag-executor","nodeCount":18,"anyCompleted":true,"anyFailed":false,"msg":"dag_workflow_finished"}
Workflow completed successfully.
```

## Impact

- Affected workflows/commands: any workflow with `when:` clauses referencing `$node.output.<field>` where the producing node is an LLM and the field isn't guaranteed parseable. In practice most frequent with Pi/Minimax (no structured-output SDK enforcement), but also possible with Claude/Codex if the prompt drifts.
- Reproduction rate: Intermittent on Pi — observed twice in ~12 `maintainer-review-pr` runs in one session (~17%).
- Workaround available: Re-run the workflow; usually succeeds on retry. No way to detect from the CLI exit code that the review didn't actually happen — must inspect logs.
- Data loss risk: No, but real cost — a 5-minute workflow "succeeded" without doing the work.

## Scope

- Package(s) likely involved: `workflows`
- Module: `workflows:condition-evaluator`, `workflows:dag-executor`

## Proposed direction

Treat `condition_json_parse_failed` against a `when:` reference as a node error, not a warning. The DAG should fail the dependent node (and its descendants via the existing failure-propagation path), and the CLI should exit non-zero with a message naming the unparseable node and surfacing the output preview. This aligns with the project's "Fail Fast + Explicit Errors" principle — silent fallback in agent runtimes can create unsafe or costly behavior, and `condition_json_parse_failed cascading to skipped nodes` is precisely that.

Open questions for the implementer:
- Should this be opt-out per-node (e.g. `condition_strict: false` for legacy workflows that depended on the silent fallback)? Probably not — the silent path is wrong everywhere.
- Should the field-extraction layer also attempt graceful repairs (strip ` ```json ` fences, find balanced braces) before declaring parse failure? Worth doing, but as a separate improvement — the failure surfacing is the root issue.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Workflow exits success when condition_json_parse_failed cascades to silent node skips #1673

Summary

Steps to Reproduce

Expected vs Actual

User Flow

Environment

Logs

Impact

Scope

Proposed direction

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Workflow exits success when condition_json_parse_failed cascades to silent node skips #1673

Description

Summary

Steps to Reproduce

Expected vs Actual

User Flow

Environment

Logs

Impact

Scope

Proposed direction

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions