Workflow node failure messages dump entire script body — make errors actionable for self-correcting agents

## Problem

When a `bash:` or `script:` node fails, the executor surfaces an error message that includes the **entire substituted script body** inline (often 1000+ chars), wrapping the actual diagnostic in noise. For an agent (or human) trying to figure out what went wrong, the signal is buried.

Example excerpt from a real failed `script:` node (truncated for brevity):

```
[format-changelog] Failed: Script node 'format-changelog' failed: Command failed: bun --no-env-file -e import { mkdirSync, writeFileSync } from \"node:fs\";
import { join } from \"node:path\";

const cl = JSON.parse(String.raw`{\"summary\":\"Binary path auto-detection for Claude/Codex...\",\"added\":[\"Automatic detection of canonical install paths for the Claude Code and Codex binaries — `claudeBinaryPath`/`codexBinaryPath` no longer need to be set manually in most environments (#1361)\"],...}`);
... [the entire 50-line script repeated three times — once in err.message, once in err.stack, once in err.cmd] ...
4 | r the Claude Code and Codex binaries — `claudeBinaryPath`/`codexBinaryPath` no longer need to be set manually in most
                                                                                                                                                                                                                                                    ^
error: Expected \")\" but found \"claudeBinaryPath\"
    at /Users/rasmus/Projects/cole/Archon/[eval]:4:241
```

The actually useful part — `Expected \")\" but found \"claudeBinaryPath\" at [eval]:4:241` — is one line at the end. The script body is reproduced ~3x in the JSON log entry (once in `err.message`, once in `err.stack`, once in `err.cmd`), and again in the human-readable `[node-id] Failed: ...` output line.

## Why this matters more for Archon than for typical CLIs

Archon's value prop is agents authoring and running workflows. When a workflow fails, the most common follow-up is the agent reading the error and editing the YAML. The current format trains agents to:

1. Skip past the noise looking for the real error
2. Sometimes miss the real error entirely (it's often after 2KB of script-body repetition)
3. Burn context-window tokens that could go toward the fix

A self-correcting agent loop (\"workflow failed → AI reads error → AI patches YAML → re-run\") is much more reliable when the error is concise and points at the failing line.

## Proposed direction

The principle: **error messages should be informative enough that an agent running Archon can self-correct the workflow without reading the script body**.

Concrete changes that would help (sketched, not prescriptive):

1. **Don't repeat the script body 3× in the log entry.** `err.cmd` contains it; `err.message` and `err.stack` should reference \"node 'format-changelog' script (see err.cmd)\" instead of inlining it again.
2. **Surface stderr first.** The actually-useful diagnostic from `bun`/the failed binary is always in stderr. Lead with stderr, demote the script body to a separate field or a debug-level log.
3. **For `script:` nodes specifically**, parse the stderr for `at [eval]:LINE:COL` and emit a structured `failedAt: { line, col, snippet }` field. Agents can then jump directly to the right line in the YAML.
4. **In the human-readable `[node-id] Failed: ...` output**, print just the diagnostic summary (\"Script syntax error at line 4 col 241: Expected ')' but found 'claudeBinaryPath'\") and keep the verbose stack trace in the JSON log.
5. **Optional**: include a hint when a recognizable pattern matches. For the case above, the failure was triggered by a backtick in the substituted `\$nodeId.output`. A heuristic check (\"substituted value contained backticks; this may have terminated the JS template literal\") would shortcut a real category of mistakes.

## Repro

- Author a `script:` node that consumes \`String.raw\\\`\$other-node.output\\\`\` where the upstream output is AI-generated content containing backticks (any markdown code spans).
- Run the workflow.
- Observe the failure log: the actual JS syntax error is buried in ~1KB of repeated script body.

## Found while

Building a custom \`archon-release\` workflow during a workshop prep session. The failing pattern was the canonical \`String.raw\\\`\$nodeId.output\\\`\` substitution shown in the docs example (\`.claude/skills/archon/examples/dag-workflow.yaml\`), which is safe for \`gh\`-CLI JSON output but breaks for AI prompt outputs.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Workflow node failure messages dump entire script body — make errors actionable for self-correcting agents #1389

Problem

Why this matters more for Archon than for typical CLIs

Proposed direction

Repro

Found while

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Workflow node failure messages dump entire script body — make errors actionable for self-correcting agents #1389

Description

Problem

Why this matters more for Archon than for typical CLIs

Proposed direction

Repro

Found while

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions