Skip to content

fix(workflows): concise failure messages for bash/script nodes (#1389)#1393

Merged
Wirasm merged 2 commits into
devfrom
fix/issue-1389-concise-node-failure-messages
Apr 29, 2026
Merged

fix(workflows): concise failure messages for bash/script nodes (#1389)#1393
Wirasm merged 2 commits into
devfrom
fix/issue-1389-concise-node-failure-messages

Conversation

@Wirasm

@Wirasm Wirasm commented Apr 24, 2026

Copy link
Copy Markdown
Collaborator

Summary

Fixes #1389.

When a bash: or script: node fails, the user-visible error message currently embeds the entire substituted script body. Node's promisify(execFile) puts the body into err.message, err.cmd, and the first line of err.stack; the old catch blocks interpolated err.message verbatim and logged the raw err, so Pino's default serializer ended up writing the body three times per failure and the CLI / web UI / DB event all carried the same noisy string. Useful diagnostics (e.g. Expected ")" but found "x" at [eval]:4:241) were buried at the end.

This change routes both catch blocks through a new formatSubprocessFailure(err, label) helper in executor-shared.ts that prefers err.stderr, strips the Command failed: <cmd> prefix line, and tail-caps at 2 KB. The helper also returns a controlled logFields subset so Pino never re-serializes message/stack/cmd.

Root Cause

Two lines, one in each catch block:

  • dag-executor.ts:1325 (bash) — errorMsg = \Bash node '${node.id}' failed: ${err.message}``
  • dag-executor.ts:1582 (script) — errorMsg = \Script node '${node.id}' failed: ${err.message}${stderrHint}``

err.message from Node's ExecFileException is "Command failed: <cmd> <args>\n<stderr>". For bash -c <body> / bun -e <body>, the first line contains the substituted script body. Compounded by getLog().error({ err, … }) which serializes err.message + err.stack + err.cmd — three copies of the body in one log line.

Changes

File Change
packages/workflows/src/executor-shared.ts Add formatSubprocessFailure(err, label) + RawSubprocessError type
packages/workflows/src/dag-executor.ts Bash + script catch blocks route default branch through the helper; log controlled logFields instead of raw err; drop script-node redundant stderrHint
packages/workflows/src/executor-shared.test.ts 7 unit tests for the helper (prefix stripping, stderr-first, 2 KB tail truncation, log-field safety, no-code fallback, empty-error fallback)
packages/workflows/src/dag-executor.test.ts 2 regression tests asserting the stored data.error on the node_failed event does not contain the Command failed: prefix and caps at a small size

Timeout / ENOENT / EACCES branches are preserved verbatim.

Before / After

Before (real output from the issue):

[format-changelog] Failed: Script node 'format-changelog' failed: Command failed: bun --no-env-file -e import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";
… [the 50-line script repeated 3× — once in err.message, once in err.stack, once in err.cmd] …
error: Expected ")" but found "claudeBinaryPath"
    at /Users/rasmus/Projects/cole/Archon/[eval]:4:241

After:

[format-changelog] Failed: Script node 'format-changelog' failed [exit 1]: error: Expected ")" but found "claudeBinaryPath"
    at /Users/rasmus/Projects/cole/Archon/[eval]:4:241

Location markers (at [eval]:L:C) are preserved so agents can still parse them for structured follow-up edits.

Test plan

  • bun run type-check — 10/10 packages
  • bun test packages/workflows/src/executor-shared.test.ts — 58/58 pass (7 new)
  • bun test packages/workflows/src/dag-executor.test.ts — 191/191 pass (2 new regression)
  • bun run validate — clean exit 0 (bundled check, type-check, lint, prettier, full test suite)
  • Manual repro: create a script: node with a body containing a syntax error, run the workflow, confirm the error is ~150 chars and preserves the at [eval]:L:C location

Scope

  • In scope: the two else branches in the bash/script node catch blocks and the new helper
  • Out of scope (deferred as separate issues if needed):
    • Structured failedAt: { line, col } field on NodeFailedEvent
    • Heuristic hint for substituted values containing backticks
    • Other getLog().error({ err, … }) callsites (AI-provider errors don't exhibit the same body-leak)
    • Changes to the @archon/git execFileAsync primitive

Summary by CodeRabbit

  • Bug Fixes
    • Sanitized subprocess failure messages for bash/script nodes to remove embedded script content, standardize wording, and cap diagnostic size.
  • Tests
    • Added regression tests to verify sanitized error messages, truncation behavior, and safe logging of failure details.
  • Documentation
    • Clarified script-node capture and failure reporting so stderr is shown in failures but script bodies are not returned.

@coderabbitai

coderabbitai Bot commented Apr 24, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Subprocess failure formatting was added and integrated into DAG executor error paths for bash: and script: nodes. A new formatSubprocessFailure utility sanitizes and truncates diagnostics (preferring stderr, stripping Node's "Command failed:" prefix, appending [exit N] when present) and returns limited logFields. Tests were added to validate behavior and storage of sanitized messages.

Changes

Cohort / File(s) Summary
Formatter utility
packages/workflows/src/executor-shared.ts
Adds exported formatSubprocessFailure(err, label) and RawSubprocessError plus SUBPROCESS_ERROR_MAX_CHARS. Produces a user-facing message (stderr-preferred, strips "Command failed:", tail-truncates, appends [exit N]) and limited logFields (exitCode, killed, stderrTail).
Formatter tests
packages/workflows/src/executor-shared.test.ts
New tests covering: stderr vs message preference, removal of "Command failed:", truncation behavior with [truncated] marker and size bounds, safe handling of empty errors, and ensuring logFields do not embed full message/stack/cmd.
Executor integration
packages/workflows/src/dag-executor.ts
Integrates formatSubprocessFailure() into executeBashNode and executeScriptNode catch paths; derives node-specific labels; uses formatted.userMessage for stored node_failed.data.error (except normalized timeout/ENOENT/EACCES cases); logs { ...formatted.logFields, nodeId, nodeType, isTimeout } instead of raw error.
Executor regression tests
packages/workflows/src/dag-executor.test.ts
Adds two regression tests asserting that bash/script node failures preserve diagnostic text and exit code, omit "Command failed:" and script body leakage, and keep stored error sizes within expected limits.
Docs & Changelog
packages/docs-web/src/content/docs/guides/script-nodes.md, CHANGELOG.md
Docs updated to clarify stderr surfacing for failed script nodes and state that script body is not returned; CHANGELOG notes the new sanitization behavior.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Executor as DAG Executor
  participant Subproc as Subprocess (bash/script)
  participant Formatter as formatSubprocessFailure
  participant Logger as Logger / Node Store

  Executor->>Subproc: spawn/run node subprocess
  Subproc-->>Executor: exit non‑zero / error (stderr, message, code, killed)
  Executor->>Formatter: formatSubprocessFailure(err, label)
  Formatter-->>Executor: { userMessage, logFields }
  Executor->>Logger: store node_failed.data.error = userMessage
  Executor->>Logger: emit dag_node_failed with { ...logFields, nodeId, nodeType, isTimeout }
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰
I nibble noise and keep the seed—
stderr sings the fix we need.
No script spam stuffed in the log,
Just [exit N] and a tidy cog.
Hop on, retry, the path's now freed. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: fixing failure messages for bash/script workflow nodes to be concise rather than dumping the entire script body.
Description check ✅ Passed The PR description is comprehensive, covering the problem, root cause, detailed changes across files, before/after examples, test results, and explicit scope boundaries. It maps directly to the template structure.
Linked Issues check ✅ Passed The PR fully implements the core objectives from #1389: stopping script-body repetition, surfacing stderr first, tail-capping at 2KB, and preserving location markers for agent parsing.
Out of Scope Changes check ✅ Passed All changes are tightly scoped to the formatSubprocessFailure helper and its integration into bash/script catch blocks. No unrelated modifications to error handling, logging infrastructure, or other subsystems are present.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/issue-1389-concise-node-failure-messages

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@Wirasm

Wirasm commented Apr 24, 2026

Copy link
Copy Markdown
Collaborator Author

Automated self-review

Verdict: PASS (0 critical, 0 important)

Strengths

  • All three body-leak paths are genuinely closed: err.message interpolation replaced by formatSubprocessFailure, raw err removed from the Pino payload, and the redundant stderrHint concatenation in the script-node catch block is deleted.
  • The fallback chain in the helper is safe: when hasCommandFailedPrefix is true and both stderr and bodyAfterPrefix are empty, the result is 'no diagnostic output'rawMessage is never used as a final fallback in that case.
  • Retry classification (classifyError) is unaffected — those matchers operate on the formatted errorMsg, and 'exited with code' never appeared in bash/script node error messages before or after this change (bash/script nodes classified UNKNOWN via this path, Claude/Codex SDK crash messages classified via a different path).
  • logFields excludes message, stack, and cmd by construction, so Pino's default error serializer has nothing to expand into the log line.
  • Regression tests exercise a live subprocess with a unique body marker; unit tests cover empty stderr, empty message, missing code, huge stderr, no Command failed: prefix, and mixed stderr+message content.

Suggestions (non-blocking)

  • executor-shared.ts:132 — the third operand rawMessage in the fallback chain is reachable only when hasCommandFailedPrefix is false AND both stderr and bodyAfterPrefix are falsy. In that case bodyAfterPrefix === rawMessage, so the branch resolves to 'unknown error' anyway. Slightly redundant but not a bug.
  • executor-shared.ts:140typeof err.code === 'string' accepts the empty string. In practice Node only produces numeric codes or undefined, but an explicit err.code !== '' guard would make intent clearer. Minor.

No security concerns.

Checklist

  • Fix addresses all three root causes from the investigation
  • Code follows codebase patterns (helper extraction mirrors classifyError)
  • Tests cover the change (unit + regression)
  • No obvious bugs introduced
  • Retry/classification behavior preserved

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
packages/workflows/src/dag-executor.ts (2)

1315-1336: Consider populating logFields in the timeout/ENOENT/EACCES branches for consistency.

In the non-formatSubprocessFailure branches, logFields stays {}, so the resulting dag_node_failed log entry loses potentially useful diagnostic fields (exitCode, killed) that are present on err. The user-facing message is already concise for these branches, but the structured log entry is now less informative than before this change (where the raw err was logged). A minimal adjustment that keeps the script body out while preserving structured diagnostics:

♻️ Proposed refactor
-    const err = error as Error & { killed?: boolean; code?: number | string; stderr?: string };
+    const err = error as Error & { killed?: boolean; code?: number | string; stderr?: string };
     const isTimeout = err.killed === true || (err.message ?? '').includes('timed out');
     const label = `Bash node '${node.id}'`;
     let errorMsg: string;
-    let logFields: Record<string, unknown> = {};
+    let logFields: Record<string, unknown> = {
+      exitCode: err.code,
+      killed: err.killed === true,
+    };
     if (isTimeout) {
       errorMsg = `${label} timed out after ${String(timeout)}ms`;
     } else if (err.message?.includes('ENOENT')) {
       errorMsg = `${label} failed: bash executable not found in PATH`;
     } else if (err.message?.includes('EACCES')) {
       errorMsg = `${label} failed: permission denied (check cwd permissions)`;
     } else {
       const formatted = formatSubprocessFailure(err, label);
       errorMsg = formatted.userMessage;
       logFields = formatted.logFields;
     }

Non-blocking — feel free to skip if you prefer to keep those branches minimal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/workflows/src/dag-executor.ts` around lines 1315 - 1336, The
structured log (getLog().error) drops diagnostic fields in the
timeout/ENOENT/EACCES branches because logFields remains empty; update those
branches in the catch block around node execution to populate logFields with
relevant properties from err (e.g., killed, code or exitCode, stderr) so they
match the data returned by formatSubprocessFailure; keep the existing concise
user-facing errorMsg for each branch but set logFields = { killed: err.killed,
exitCode: err.code ?? (err as any).exitCode, stderr: err.stderr } (or similar
keys used by formatSubprocessFailure) before calling getLog().error for
consistent structured diagnostics for node.id and timeout cases.

1315-1362: Near-identical catch blocks in executeBashNode and executeScriptNode — consider extracting.

The post-execFileAsync catch logic (error classification → errorMsg/logFields → structured log → logNodeError → event persist → emitter.emit → return failed NodeOutput) is duplicated between the two subprocess node executors, differing only in the label, the missing-executable name, and the event type: 'bash' | 'script'. A small helper (e.g., handleSubprocessFailure(…)) would keep both nodes in lockstep on future changes (idle-timeout handling, cancellation classification, etc.) and reduce the risk of the two paths drifting.

Not required for this PR — the current change is already a clear improvement; noting as a follow-up.

Also applies to: 1578-1625

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/workflows/src/dag-executor.ts` around lines 1315 - 1362, The catch
block logic in executeBashNode and executeScriptNode is duplicated (error
classification → errorMsg/logFields → structured logging → logNodeError → event
persist → emitter.emit → return failed NodeOutput); extract this into a helper
like handleSubprocessFailure(node, workflowRun, err, label, eventType, timeout,
logDir, deps, emitter) that: classifies timeouts/ENOENT/EACCES using
formatSubprocessFailure, builds errorMsg/logFields, calls getLog().error with
nodeId/isTimeout/errType, awaits logNodeError(logDir, workflowRun.id, node.id,
errorMsg), calls deps.store.createWorkflowEvent({... data: { error: errorMsg,
type: eventType }}).catch(...), emits the same emitter.emit payload, and returns
the standard { state: 'failed', output: '', error: errorMsg } so both
executeBashNode and executeScriptNode just call handleSubprocessFailure(...) in
their catch blocks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/workflows/src/dag-executor.ts`:
- Around line 1315-1336: The structured log (getLog().error) drops diagnostic
fields in the timeout/ENOENT/EACCES branches because logFields remains empty;
update those branches in the catch block around node execution to populate
logFields with relevant properties from err (e.g., killed, code or exitCode,
stderr) so they match the data returned by formatSubprocessFailure; keep the
existing concise user-facing errorMsg for each branch but set logFields = {
killed: err.killed, exitCode: err.code ?? (err as any).exitCode, stderr:
err.stderr } (or similar keys used by formatSubprocessFailure) before calling
getLog().error for consistent structured diagnostics for node.id and timeout
cases.
- Around line 1315-1362: The catch block logic in executeBashNode and
executeScriptNode is duplicated (error classification → errorMsg/logFields →
structured logging → logNodeError → event persist → emitter.emit → return failed
NodeOutput); extract this into a helper like handleSubprocessFailure(node,
workflowRun, err, label, eventType, timeout, logDir, deps, emitter) that:
classifies timeouts/ENOENT/EACCES using formatSubprocessFailure, builds
errorMsg/logFields, calls getLog().error with nodeId/isTimeout/errType, awaits
logNodeError(logDir, workflowRun.id, node.id, errorMsg), calls
deps.store.createWorkflowEvent({... data: { error: errorMsg, type: eventType
}}).catch(...), emits the same emitter.emit payload, and returns the standard {
state: 'failed', output: '', error: errorMsg } so both executeBashNode and
executeScriptNode just call handleSubprocessFailure(...) in their catch blocks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8da317c5-a401-4122-8dd8-a90c8eb05165

📥 Commits

Reviewing files that changed from the base of the PR and between 2c15439 and ba561bc.

📒 Files selected for processing (4)
  • packages/workflows/src/dag-executor.test.ts
  • packages/workflows/src/dag-executor.ts
  • packages/workflows/src/executor-shared.test.ts
  • packages/workflows/src/executor-shared.ts

@Wirasm

Wirasm commented Apr 27, 2026

Copy link
Copy Markdown
Collaborator Author

Hi @Wirasm — thanks for opening this PR.

This repository uses a PR template at .github/pull_request_template.md with several required sections. A few of them appear to be empty or placeholder here:

  • UX Journey
  • Architecture Diagram
  • Label Snapshot
  • Change Metadata
  • Linked Issue
  • Validation Evidence
  • Security Impact
  • Compatibility / Migration
  • Human Verification
  • Side Effects / Blast Radius
  • Rollback Plan
  • Risks and Mitigations

Could you fill those out (even briefly)? The template helps reviewers understand scope, risk, and rollback — it speeds up review significantly.

If a section genuinely doesn't apply, just write "N/A" in it rather than leaving it blank.

@Wirasm

Wirasm commented Apr 28, 2026

Copy link
Copy Markdown
Collaborator Author

PR Review Summary

Multi-agent review of PR 1393 — formatSubprocessFailure helper for bash/script node failures. CI green, mergeable clean. The fix is real and well-scoped; nothing critical, but a few worthwhile follow-ups.

Critical Issues

None.

Important Issues

Agent Issue Location
code-reviewer getLog().error() was moved out of each branch and now spreads logFields (initialized to {}) for all four branches. Timeout / ENOENT / EACCES previously logged the full err; now they only log { nodeId, isTimeout, errType: 'Error' }. The PR description says only the default branch is affected — the diff actually narrows logging on all branches. Fix: log err.message (or move the getLog().error back inside each branch) for the non-default cases. packages/workflows/src/dag-executor.ts (bash ~1339, script ~1596)
code-reviewer errType: err.constructor.name is 'Error' for every real execFileAsync throw and 'Object' for the synthetic test inputs. It never discriminates anything in production — drop it or replace with nodeType: 'bash' / 'script'. packages/workflows/src/dag-executor.ts (both catch blocks)
docs-impact script-nodes.md says "stderr is logged as a warning and posted to the conversation but does not fail the node" — true on success, but on failure stderr is now the primary content of the user-facing error, not a side-channel warning. Split the sentence into success-path and failure-path behavior. packages/docs-web/src/content/docs/guides/script-nodes.md:75-77
docs-impact No [Unreleased] / Fixed entry for this user-visible error-format change. CHANGELOG.md

Suggestions

Agent Suggestion Location
pr-test-analyzer The two integration regression tests assert on the persisted node_failed event but not on what reaches Pino. Spy on getLog().error and assert the first arg never contains the script body — closes the integration gap on the secondary goal. (criticality 6) packages/workflows/src/dag-executor.test.ts
pr-test-analyzer No integration test for ENOENT/EACCES branches in either bash or script — a future refactor could silently change those user-visible strings. (criticality 6) packages/workflows/src/dag-executor.test.ts
pr-test-analyzer Script regression test asserts errorMsg.length < 4000 but the cap is ~2048 chars — would not catch an accidental doubling of the constant. Use < 2100. (criticality 4) dag-executor.test.ts:5680
pr-test-analyzer Final regex /error|expected|unexpected|syntax/ is broad enough to match the empty-error fallback ("unknown error"). Tighten to .toContain('[eval]') or similar. (criticality 4) dag-executor.test.ts:5683
type-design-analyzer RawSubprocessError is exported but has no external consumer (catch blocks use an inline intersection, not the named type). Drop the export to shrink the module surface. executor-shared.ts:97
type-design-analyzer code?: number | string should be code?: number | string | null to mirror Node's ExecFileException. The runtime guard already handles null; the type just needs to say so. A one-line comment explaining the union (numeric exit code OR errno symbol like 'ENOENT') would prevent future "simplification" to number. executor-shared.ts:102
code-simplifier The diagnostic fallback chain mixes || and a ternary. The third branch is partially unreachable (bodyAfterPrefix === rawMessage when no prefix). An explicit if/else if makes each case independently verifiable. executor-shared.ts:129-134
code-simplifier typeof err.code === 'number' || typeof err.code === 'string' reduces to err.code != null given the declared type. executor-shared.ts:139-143
code-simplifier stderrTail || undefined is the empty-string-to-undefined idiom. stderrTail.length > 0 ? stderrTail : undefined makes the intent explicit. executor-shared.ts:152
code-simplifier let logFields: Record<string, unknown> = {} then conditional reassignment in two parallel catch blocks. Capture formatted from the else branch and derive logFields once with const. dag-executor.ts:1320-1330, 1583-1593
comment-analyzer (issue #1389) in describe-block names and // Regression for issue #1389 … headers belong in the PR description, not in code. The it(...) names already state the behavior. executor-shared.test.ts, dag-executor.test.ts (both regression blocks)
silent-failure-hunter Tail-truncation is correct for Bun/Node syntax errors and Python tracebacks (error class is at the end). For combined uv-resolution + Python traceback >2 KB the uv header (at the top) could be lost — low-severity edge case. executor-shared.ts:133
silent-failure-hunter err.signal (e.g. SIGKILL) is no longer in structured logs; only killed: true survives. Minor debugging regression for timeout correlation with OS logs. executor-shared.ts logFields

Strengths

  • Bug is real and reproducible (50-line script body repeated 3× in err.message + err.stack + err.cmd); fix is targeted and well-tested.
  • formatSubprocessFailure correctly strips the Command failed: <cmd> prefix line, prefers stderr, tail-caps at 2 KB.
  • Controlled logFields (exitCode, killed, stderrTail) prevents Pino's default error serializer from re-attaching the body via err.stack/err.cmd.
  • [exit N] suffix is always present when an exit code exists, so the failure is never fully silent even when stderr is empty.
  • Bash regression test uses a UNIQUE_CMDLINE_MARKER_1389 strategy that makes leak detection trivial; script test pads the body to ~3.2 KB so any future re-leak is caught by the size assertion.
  • Type union number | string for code is intentional and accurate (covers numeric exit codes AND errno symbols like 'ENOENT').
  • All seven documented behaviors of the helper have a named unit test.
  • Timeout / ENOENT / EACCES error string literals are preserved verbatim (PR claim is accurate at the user-facing message level — see Important Issue Model stucked at response stream text #1 above for the log-side caveat).

Documentation Issues

  • packages/docs-web/src/content/docs/guides/script-nodes.md:75-77 — stale stderr-behavior wording (success/failure paths now diverge).
  • CHANGELOG.md — missing [Unreleased] / Fixed entry.

Verdict

NEEDS FIXES — non-blocking but worth addressing before merge. The two code-reviewer log-narrowing findings are real bugs (the timeout/ENOENT/EACCES branches lost log information that the PR description claimed was preserved), and the two doc updates are user-facing.

Recommended Actions

  1. Fix the log-narrowing on the timeout/ENOENT/EACCES branches in both catch blocks (dag-executor.ts).
  2. Drop or replace the always-'Error' errType field.
  3. Update script-nodes.md and add a CHANGELOG.md entry.
  4. Apply the high-value simplifications (explicit if/else for diagnostic, const logFields, err.code != null).
  5. Optionally: tighten test assertions, add an ENOENT integration test, and a log spy in the regression tests.
  6. Optionally: drop RawSubprocessError export and add | null to code.

Generated by /prp-review-agents (7 specialist agents).

Wirasm added 2 commits April 29, 2026 11:54
When a `bash:` or `script:` node fails, the executor was embedding
`err.message` verbatim into the user-visible error. For inline scripts run
via `bash -c <body>` / `bun -e <body>`, Node's `promisify(execFile)` puts
the entire substituted script body into `err.message`, `err.cmd`, and the
first line of `err.stack` — and the Pino log serialized all three, repeating
the body ~3× in one structured log line. The actionable diagnostic was
buried under kilobytes of echoed source.

Route both catch-block default branches through a new
`formatSubprocessFailure(err, label)` helper in `executor-shared.ts` that:

- strips the `Command failed: <cmd>` prefix line (which carries the body)
- prefers `err.stderr` — Bun/bash emit the actionable diagnostic there
- tail-truncates at 2 KB with a `…[truncated]` marker
- returns a controlled `logFields` subset (`exitCode`, `killed`, `stderrTail`)
  so Pino never re-serializes `err.message` / `err.stack` / `err.cmd`

Also drops the script-node `stderrHint` concatenation — stderr is already
handled by the helper, so the previous code appended it twice.

Timeout / ENOENT / EACCES branches are preserved verbatim.

Fixes #1389
- Run formatSubprocessFailure unconditionally so timeout / ENOENT / EACCES
  branches also get sanitized log fields (the timeout message also embeds
  the `Command failed: bash -c <body>` line).
- Drop `errType: err.constructor.name` (always 'Error' in production) and
  replace with `nodeType: 'bash' | 'script'` for actual discriminating value.
- Replace chained `||` + ternary in diagnostic selection with explicit
  if/else for readability.
- Simplify exit suffix guard: `err.code != null` instead of double typeof.
- Make stderrTail emptiness check explicit.
- Drop `RawSubprocessError` export (no external consumer) and widen `code`
  to `number | string | null` to mirror Node's ExecFileException.
- Tighten test length bound to <2100 (was <4000 / <2200) so a doubling of
  SUBPROCESS_ERROR_MAX_CHARS would actually trip the assertion.
- Replace broad regex assertion with `.toContain('[eval]')` — the location
  marker is the strongest signal that diagnostic content survived.
- Strip issue-number citations from describe block and test comments.
- Update script-nodes.md to distinguish stderr behavior on success vs
  failure paths.
- Add CHANGELOG Unreleased / Fixed entry for the user-visible change.
@Wirasm Wirasm force-pushed the fix/issue-1389-concise-node-failure-messages branch from ba561bc to fe1deba Compare April 29, 2026 08:56

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/workflows/src/dag-executor.test.ts (1)

1141-1187: Add a logger assertion here too.

These regressions cover persisted node_failed.data.error, but they still leave the sanitized getLog().error(...) path untested. Since the leak fix also depends on passing only formatted.logFields in packages/workflows/src/dag-executor.ts, a future reintroduction of raw err logging would still pass this suite. A small mockLogger.error assertion that rejects Command failed: / body markers would close that gap.

Also applies to: 6177-6229

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/workflows/src/dag-executor.test.ts` around lines 1141 - 1187, Add an
assertion against the logger used by executeDagWorkflow to ensure the sanitized
error was logged (so regressions that only fix persisted data but still log raw
errors will fail): inspect mockDeps.logger.error (or whatever mock logger is
provided by createMockDeps) call(s) after executeDagWorkflow and assert the
logged message for the 'node_failed' for step 'fail-bash-1389' contains
"diagnostic from stderr" and "[exit 1]" and does NOT contain "Command failed:"
or "UNIQUE_CMDLINE_MARKER_1389" (similar to the checks on
failedEvent.data.error); place this immediately after you locate failedEvent so
it validates the getLog().error path for the same failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/docs-web/src/content/docs/guides/script-nodes.md`:
- Around line 76-80: The paragraph currently implies the full stderr is always
returned verbatim; update the wording so it matches formatSubprocessFailure():
state that the formatter prefers stderr but falls back to other output when
stderr is empty, and that diagnostics are tail-truncated to ~2 KB rather than
returned in full; keep the example failure pattern (Script node 'X' failed [exit
N]: <stderr>) but clarify that the shown <stderr> may be truncated or
substituted per formatSubprocessFailure() and that the script body is never
echoed back to users.

---

Nitpick comments:
In `@packages/workflows/src/dag-executor.test.ts`:
- Around line 1141-1187: Add an assertion against the logger used by
executeDagWorkflow to ensure the sanitized error was logged (so regressions that
only fix persisted data but still log raw errors will fail): inspect
mockDeps.logger.error (or whatever mock logger is provided by createMockDeps)
call(s) after executeDagWorkflow and assert the logged message for the
'node_failed' for step 'fail-bash-1389' contains "diagnostic from stderr" and
"[exit 1]" and does NOT contain "Command failed:" or
"UNIQUE_CMDLINE_MARKER_1389" (similar to the checks on failedEvent.data.error);
place this immediately after you locate failedEvent so it validates the
getLog().error path for the same failure.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7d44f3cc-e5be-411c-b2cf-43cf420ba8fd

📥 Commits

Reviewing files that changed from the base of the PR and between ba561bc and fe1deba.

📒 Files selected for processing (6)
  • CHANGELOG.md
  • packages/docs-web/src/content/docs/guides/script-nodes.md
  • packages/workflows/src/dag-executor.test.ts
  • packages/workflows/src/dag-executor.ts
  • packages/workflows/src/executor-shared.test.ts
  • packages/workflows/src/executor-shared.ts
✅ Files skipped from review due to trivial changes (2)
  • CHANGELOG.md
  • packages/workflows/src/executor-shared.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/workflows/src/executor-shared.test.ts
  • packages/workflows/src/dag-executor.ts

Comment on lines +76 to +80
`$nodeId.output`. On a successful run, `stderr` is logged as a warning and
posted to the conversation but does **not** fail the node. A non-zero exit
code fails the node; on failure, `stderr` is the diagnostic surfaced in the
error message (`Script node 'X' failed [exit N]: <stderr>`) — the script
body is never echoed back to users.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Soften the failure-format wording to match the formatter.

This reads like the full stderr is always returned verbatim, but formatSubprocessFailure() actually prefers stderr, falls back when it is empty, and tail-truncates diagnostics at 2 KB. A wording tweak here would avoid setting the wrong expectation for long or stderr-less failures.

📝 Suggested wording
-  `$nodeId.output`. On a successful run, `stderr` is logged as a warning and
-  posted to the conversation but does **not** fail the node. A non-zero exit
-  code fails the node; on failure, `stderr` is the diagnostic surfaced in the
-  error message (`Script node 'X' failed [exit N]: <stderr>`) — the script
-  body is never echoed back to users.
+  `$nodeId.output`. On a successful run, `stderr` is logged as a warning and
+  posted to the conversation but does **not** fail the node. A non-zero exit
+  code fails the node; on failure, the error message prefers `stderr`, strips
+  the leading `Command failed:` wrapper, and tail-caps very large diagnostics
+  (for example: `Script node 'X' failed [exit N]: <diagnostic>`). The script
+  body is never echoed back to users.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
`$nodeId.output`. On a successful run, `stderr` is logged as a warning and
posted to the conversation but does **not** fail the node. A non-zero exit
code fails the node; on failure, `stderr` is the diagnostic surfaced in the
error message (`Script node 'X' failed [exit N]: <stderr>`) — the script
body is never echoed back to users.
`$nodeId.output`. On a successful run, `stderr` is logged as a warning and
posted to the conversation but does **not** fail the node. A non-zero exit
code fails the node; on failure, the error message prefers `stderr`, strips
the leading `Command failed:` wrapper, and tail-caps very large diagnostics
(for example: `Script node 'X' failed [exit N]: <diagnostic>`). The script
body is never echoed back to users.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/docs-web/src/content/docs/guides/script-nodes.md` around lines 76 -
80, The paragraph currently implies the full stderr is always returned verbatim;
update the wording so it matches formatSubprocessFailure(): state that the
formatter prefers stderr but falls back to other output when stderr is empty,
and that diagnostics are tail-truncated to ~2 KB rather than returned in full;
keep the example failure pattern (Script node 'X' failed [exit N]: <stderr>) but
clarify that the shown <stderr> may be truncated or substituted per
formatSubprocessFailure() and that the script body is never echoed back to
users.

@Wirasm Wirasm merged commit fccfe42 into dev Apr 29, 2026
4 checks passed
@Wirasm Wirasm deleted the fix/issue-1389-concise-node-failure-messages branch April 29, 2026 09:34
@Wirasm Wirasm mentioned this pull request Apr 29, 2026
prospapledge88 added a commit to prospapledge88/Archon that referenced this pull request May 5, 2026
…h 7 (#12)

* fix(workflows): make archon-adversarial-dev sed replacement macOS-safe (coleam00#1155)

* fix(workflows): make adversarial init sed portable on macOS

* chore: regenerate bundled-defaults after adversarial-dev sed fix

Sync generated bundle with the new temp-file sed pattern in
archon-adversarial-dev.yaml so check:bundled passes and binary
distributions ship the macOS-safe version.

---------

Co-authored-by: laplace young <yangqk12@whu.edu.cn>
Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>

* fix(workflows): filter user-plugin MCP noise out of workflow warnings (coleam00#1327)

* fix(workflows): filter user-plugin MCP noise out of workflow warnings

Before this change, the dag-executor surfaced every entry in the Claude
SDK's "MCP server connection failed: …" system message to the user. That
message includes user-level plugin MCPs inherited from ~/.claude/ (e.g.
`telegram`) that fail to connect in the headless workflow subprocess —
they're non-actionable noise for the workflow author.

Fix:
- Pre-compute the set of workflow-configured MCP server names per node
  by parsing the `mcp:` config file once at the start of
  executeNodeInternal. No caller-facing API change; no duplication of
  the provider's env-var expansion logic (we only need the keys).
- Split the system-message handler: the `MCP server connection failed:`
  path now surfaces only the subset of failing names that match the
  node's configured set; user-plugin failures are debug-logged as
  `dag.mcp_plugin_connection_suppressed`. The `⚠️` branch is unchanged.

Supersedes coleam00#1134 (closed as stale — the Windows HOME fix in that PR was
already shipped via coleam00#1302, and the claude.ts enabledPlugins change
targeted a file that has since moved into @archon/providers).

Credits @MrFadiAi for identifying and reporting the underlying issue.

Co-authored-by: Fadi Ai <MrFadiAi@users.noreply.github.com>

* fix(workflows): preserve MCP failure status in filtered message + observability

Address review feedback on PR coleam00#1327:

- parseMcpFailureServerNames now returns {name, segment} entries so the
  forwarded "MCP server connection failed: ..." message preserves the
  per-server status detail (e.g. "(timeout)", "(disconnected)") that the
  bare-name reconstruction was dropping.
- loadConfiguredMcpServerNames now debug-logs read failures as
  dag.mcp_filter_config_read_failed instead of swallowing them silently.
  A transient EMFILE/EBUSY at filter time would otherwise silently
  reclassify a real workflow-MCP failure as plugin noise.
- Add 4 integration tests through executeDagWorkflow covering the mixed
  workflow/plugin split, all-plugin suppression, no-mcp:-config nodes,
  and the unchanged ⚠️ warning path.
- Drop a WHAT comment above configuredMcpNames and a temporal phrase
  ("before this filter landed") that would rot on merge.
- Document the filtering boundary in guides/mcp-servers.md and add a
  troubleshooting row for users debugging silently-suppressed plugin MCPs.

---------

Co-authored-by: Fadi Ai <MrFadiAi@users.noreply.github.com>

* test(workflows): add anyFailed status derivation coverage for DAG executor (coleam00#1403)

PIV Task 1: Adds three new tests in a dedicated describe block
'executeDagWorkflow -- final status derivation' covering the anyFailed
branch (dag-executor.ts ~line 2956) that previously had no direct test:
- one success + one independent failure calls failWorkflowRun (not completeWorkflowRun)
- multiple successes + one failure calls failWorkflowRun (not completeWorkflowRun)
- trigger_rule: none_failed skips dependent node but anyFailed still marks run failed

Fixes coleam00#1381.

* fix(workflow): migrate piv-loop plan handoff to $ARTIFACTS_DIR (coleam00#1398)

* fix(workflow): migrate piv-loop plan handoff to $ARTIFACTS_DIR (coleam00#1380)

The create-plan node used a relative path (.claude/archon/plans/{slug}.plan.md)
that the AI agent would sometimes write to a different location, breaking all
downstream nodes that glob for the plan file. Migrated all plan/progress file
references to $ARTIFACTS_DIR/plan.md and $ARTIFACTS_DIR/progress.txt, matching
the pattern used by archon-fix-github-issue and other workflows.

Changes:
- Replace slug-based plan path with $ARTIFACTS_DIR/plan.md in create-plan node
- Replace ls -t glob discovery with direct $ARTIFACTS_DIR/plan.md reads in
  refine-plan, code-review, and fix-feedback nodes
- Replace empty-string guard with file-existence check in implement-setup bash
- Migrate progress.txt references in implement loop to $ARTIFACTS_DIR/
- Add explicit plan/progress paths in finalize node
- Regenerated bundled-defaults.generated.ts

Fixes coleam00#1380

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(workflow): address review findings in archon-piv-loop

- Rename 'Step 2: Write the Plan' to 'Step 2: Plan File Location' to
  eliminate the duplicate heading that collided with Step 3's identical
  title in the create-plan node
- Guard implement-setup against a 0-task plan file: exit 1 with a
  clear error when no '### Task N:' sections are found, preventing a
  silent no-op implement loop
- Remove 2>/dev/null from code-review commit so pre-commit hook failures
  and other stderr are visible to the agent instead of silently swallowed
- Replace '|| true' on git push in finalize with an explicit WARNING echo
  so push failures (auth, upstream conflict, no remote) surface to the
  agent rather than being silently ignored
- Regenerate bundled-defaults.generated.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(workflows): regenerate bundled defaults to match opus[1m] alias

The bundle was stale relative to the YAML sources after coleam00#1395 merged —
check:bundled was failing CI. Regenerated; no YAML edits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore(workflows): switch default Opus pin to opus[1m] alias (coleam00#1395)

Anthropic's Opus 4.7 landed 2026-04-16; on the Anthropic API, opus /
opus[1m] now resolve to 4.7 with a 1M context window at standard
pricing. Using the alias instead of the hard-pinned claude-opus-4-6[1m]
lets bundled default workflows auto-track the recommended Opus version.

No explicit effort is set, so nodes inherit the per-model default
(xhigh on 4.7, high on 4.6).

* fix(workflows): approval gate bypass after reject-with-redraft on resume (coleam00#1435)

* fix(workflows): approval gate bypass after reject-with-redraft on resume

When an approval node was rejected with on_reject.prompt, the synthetic
PromptNode built to run the on_reject prompt reused the approval gate's
own node ID. executeNodeInternal then wrote a node_completed event with
that ID, causing getCompletedDagNodeOutputs to treat the gate as already
completed on the next resume — bypassing the human gate entirely.

Fix: give the synthetic node the ID `${node.id}:on_reject` so its
node_completed event has a distinct step_name that won't match the
approval gate slot in priorCompletedNodes.

Adds a regression test asserting no node_completed event with the
approval gate's ID is written during on_reject execution.

Fixes coleam00#1429

* test(workflows): add positive assertion and SSE side-effect comment for on_reject synthetic node

Add complementary positive assertion to the regression test to verify that
node_completed is written exactly once with step_name 'review:on_reject',
ensuring future refactors that suppress the event entirely would be caught.

Add inline comment in executeApprovalNode documenting the known SSE side-effect:
node_started/node_completed events with nodeId='review:on_reject' flow through
the SSE pipeline into the web UI, resulting in a transient phantom node in the
execution view. This is cosmetic-only — the human gate contract is preserved.

* simplify: reduce duplicate cast pattern in on_reject test assertions

* fix(workflows): concise failure messages for bash/script nodes (coleam00#1389) (coleam00#1393)

* fix(workflows): concise failure messages for bash/script nodes (coleam00#1389)

When a `bash:` or `script:` node fails, the executor was embedding
`err.message` verbatim into the user-visible error. For inline scripts run
via `bash -c <body>` / `bun -e <body>`, Node's `promisify(execFile)` puts
the entire substituted script body into `err.message`, `err.cmd`, and the
first line of `err.stack` — and the Pino log serialized all three, repeating
the body ~3× in one structured log line. The actionable diagnostic was
buried under kilobytes of echoed source.

Route both catch-block default branches through a new
`formatSubprocessFailure(err, label)` helper in `executor-shared.ts` that:

- strips the `Command failed: <cmd>` prefix line (which carries the body)
- prefers `err.stderr` — Bun/bash emit the actionable diagnostic there
- tail-truncates at 2 KB with a `…[truncated]` marker
- returns a controlled `logFields` subset (`exitCode`, `killed`, `stderrTail`)
  so Pino never re-serializes `err.message` / `err.stack` / `err.cmd`

Also drops the script-node `stderrHint` concatenation — stderr is already
handled by the helper, so the previous code appended it twice.

Timeout / ENOENT / EACCES branches are preserved verbatim.

Fixes coleam00#1389

* fix(workflows): address PR review feedback for coleam00#1389

- Run formatSubprocessFailure unconditionally so timeout / ENOENT / EACCES
  branches also get sanitized log fields (the timeout message also embeds
  the `Command failed: bash -c <body>` line).
- Drop `errType: err.constructor.name` (always 'Error' in production) and
  replace with `nodeType: 'bash' | 'script'` for actual discriminating value.
- Replace chained `||` + ternary in diagnostic selection with explicit
  if/else for readability.
- Simplify exit suffix guard: `err.code != null` instead of double typeof.
- Make stderrTail emptiness check explicit.
- Drop `RawSubprocessError` export (no external consumer) and widen `code`
  to `number | string | null` to mirror Node's ExecFileException.
- Tighten test length bound to <2100 (was <4000 / <2200) so a doubling of
  SUBPROCESS_ERROR_MAX_CHARS would actually trip the assertion.
- Replace broad regex assertion with `.toContain('[eval]')` — the location
  marker is the strongest signal that diagnostic content survived.
- Strip issue-number citations from describe block and test comments.
- Update script-nodes.md to distinguish stderr behavior on success vs
  failure paths.
- Add CHANGELOG Unreleased / Fixed entry for the user-visible change.

* chore(workflows): regenerate bundled defaults after Tier 6 picks

* docs: add CHANGELOG entry for Tier 6 workflow polish batch

---------

Co-authored-by: CauchYoung <2024302072042@whu.edu.cn>
Co-authored-by: laplace young <yangqk12@whu.edu.cn>
Co-authored-by: Rasmus Widing <rasmus.widing@gmail.com>
Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com>
Co-authored-by: Fadi Ai <MrFadiAi@users.noreply.github.com>
Co-authored-by: Cole Medin <cole@dynamous.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: cjnprospa <sirhcle.j23@gmail.com>
bluedevilcollectibles added a commit to bluedevilcollectibles/bdc-harness that referenced this pull request May 16, 2026
… [WO-160] (#86)

## Summary

Triage of pre-existing failure surfaced during MB's WO-153 testing
(unrelated to that PR; was already broken on dev).

Closes bluedevilcollectibles/bdc-xo#160

## What the test was testing

`packages/workflows/src/dag-executor.test.ts` -- `failure message strips
the "Command failed: bun -e <body>" prefix and stays small` (added in
fccfe42 / PR coleam00#1393 / fix coleam00#1389).

It guards subprocess-failure sanitization: when an inline `bun -e
<multi-KB-script>` fails with a parse error, the user-visible
`node_failed` event must (a) carry the `Script node '<id>' failed`
prefix, (b) NOT leak the raw `Command failed: bun -e <full body>`
message, (c) stay under 2.1 KB, and (d) still contain the `[eval]`
source marker. Real value -- this is the SUBPROCESS_ERROR_MAX_CHARS
guard.

## Why broken

Platform-fragile on Windows dev boxes, not a real production bug.

Root cause: on Windows, `npm`-installed bun resolves to `C:\...\bun.cmd`
(a shim), not `bun.exe`. Node's `execFile('bun', ['--no-env-file', '-e',
<3.2KB-script>])` then invokes the cmd shim. The shim mangles or
truncates the multi-KB `-e` argument, so Bun receives nothing
meaningful, exits 0 with empty stdout/stderr, and the dag-executor takes
the success branch -- it emits `node_completed` instead of
`node_failed`, and the test's `eventCalls.find(... 'node_failed' ...)`
returns undefined.

Verified directly:
- `bun --no-env-file -e '<padding>; const x = "marker"; this is not
valid javascript'` from PowerShell: exits 1, stderr ~422 bytes, contains
`[eval]` marker. Works.
- Same script via `execFile('bun', ...)` from Node on Windows: NO throw,
empty stdout, empty stderr, exit 0. Broken.

Production runtime is the Linux `archon-app` Docker container where
`bun` is a single binary executed directly -- the cmd-shim path doesn't
exist, the test passes there, and the production guard works.

## Action taken

`it` -> `it.skipIf(isWindows)` with a 9-line comment explaining the
cmd-shim mechanism + pointer to follow-up. Skipped only on Windows;
still runs on Linux CI where the assertion is meaningful.

Follow-up filed: **bluedevilcollectibles/bdc-xo#186** -- rewrite using a
named-script fixture (writeFile to `.archon/scripts/`) instead of inline
`-e`, which bypasses the shim and restores cross-platform coverage.

## Test plan

- [x] `cd packages/workflows && bun test src/dag-executor.test.ts` --
before: 222 pass, 1 fail (this test). After: 222 pass, 1 skip, 0 fail.
- [ ] Linux CI runs the test unskipped and it passes (existing behavior
pre-WO-160).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Workflow node failure messages dump entire script body — make errors actionable for self-correcting agents

1 participant