What broke: When a bash node references $nodeId.output from an upstream LLM node whose output is large (~40KB+), the substituted value reaching the bash subprocess is corrupted. The downstream consumer fails with a parse / shape error even though the consumer handles the same input correctly when invoked directly with the same stdin.
When it started (if known): pre-existing; surfaces consistently on the maintainer-standup workflow whose persist node consumes the synthesize node's brief + state JSON. The synth output grew past ~30-40KB as the project's PR/issue count grew, and the bash node started failing reliably.
Severity: major
Steps to Reproduce
1. Run `bun run cli workflow run maintainer-standup` on a repo with enough activity that the synth output is >30KB. (Today's run: 42KB synth output.)
2. The synthesize node completes successfully and emits a valid brief + state JSON between ARCHON_STATE_JSON_BEGIN/END markers.
3. The persist bash node fails with `Bash node 'persist' failed [exit 1]: ...` and the run is reported as failed.
4. Save the exact synth output the persist node should have received (extract from the run log between the # Maintainer Standup heading and the first ARCHON_STATE_JSON_END marker).
5. Run the persist script manually on that input: `cat /tmp/synth-output.txt | bun .archon/scripts/maintainer-standup-persist.ts`
It succeeds. Brief + state are written correctly. The persist script handled the input fine; the bash node never delivered the same input to it.
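The extraction in step 4 can be scripted instead of done by hand. A minimal sketch, assuming the run log has been saved as plain text; `extractSynthOutput` is a hypothetical helper, not part of Archon, and keeping both the heading and the END marker in the slice is an assumption about what the persist script wants to see:

```typescript
// Hypothetical helper (not part of Archon): slice the synth output out of a
// saved run log. The two marker strings are taken from the steps above.
const HEADING = "# Maintainer Standup";
const END_MARKER = "ARCHON_STATE_JSON_END";

function extractSynthOutput(log: string): string | null {
  const start = log.indexOf(HEADING);
  if (start === -1) return null;
  // Search for the first END marker that follows the heading.
  const end = log.indexOf(END_MARKER, start);
  if (end === -1) return null;
  // Assumption: the slice keeps the END marker so the BEGIN/END pair stays intact.
  return log.slice(start, end + END_MARKER.length);
}
```

The returned string can be written to `/tmp/synth-output.txt` and piped into the persist script as in step 5.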
Expected vs Actual
Expected: $synthesize.output substitution in a bash node delivers the upstream output as-is to the subprocess's environment / stdin / wherever the bash code routes it. A consumer that works on the same string read from stdin manually should work the same when invoked via the bash node.
Actual: For large outputs (~42KB confirmed; threshold unknown), the bash node's substitution silently corrupts the value en route to the subprocess. The downstream consumer sees malformed input.
Affected node
- id: persist
  depends_on: [synthesize]
  timeout: 30000
  bash: |
    set -uo pipefail
    RAW=$synthesize.output
    printf '%s' "$RAW" | bun .archon/scripts/maintainer-standup-persist.ts
The framework substitutes $synthesize.output into the bash code as a literal before bash interprets RAW=.... Because the text is spliced in before parsing, bash treats it as shell source rather than as data: without surrounding quotes (RAW="...") the assignment ends at the first whitespace, and the rest of the multi-line, multi-KB text is parsed as further words and commands, subject to globbing and expansion. With special characters (the synth output contains markdown, embedded JSON braces, parentheses, asterisks, $ signs in narrative, etc.) the assignment can fail or produce a truncated value.
Even if RAW="$synthesize.output" (with quotes) is the right pattern, the substitution layer needs to escape any embedded " in the upstream output (and, inside double quotes, any $, backtick, or backslash), or the string is closed early or expanded.
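To make the failure mode concrete, here is a small sketch of what literal splicing produces and what a shell-safe splice would look like. Both `naiveSubstitute` and `shellSingleQuote` are hypothetical stand-ins; this is not Archon's actual substitution code, and the sample output string is invented for illustration.

```typescript
// Hypothetical model of literal substitution; NOT Archon's real implementation.
// split/join is used so JavaScript's own "$&" / "$'" replacement patterns
// cannot alter the spliced value.
function naiveSubstitute(script: string, ref: string, value: string): string {
  return script.split(ref).join(value);
}

// POSIX single-quote escaping: close the quote, emit an escaped quote, reopen.
function shellSingleQuote(value: string): string {
  return "'" + value.replace(/'/g, `'\\''`) + "'";
}

const script = "RAW=$synthesize.output";
// Invented sample containing quotes, parens, a glob character, and a $ sign.
const output = 'PR #12: fix "parse" (it\'s *urgent*)\ncosts $5';

// Spliced as-is: bash would assign only "PR", treat the rest of line 1 as a
// comment, and try to interpret line 2 as a separate command.
const naive = naiveSubstitute(script, "$synthesize.output", output);

// Spliced as one single-quoted word: bash assigns the whole text verbatim.
const quoted = naiveSubstitute(script, "$synthesize.output", shellSingleQuote(output));

console.log(naive);
console.log(quoted);
```

Single quotes are used here rather than double quotes because nothing inside them is expanded; only the single quote itself needs escaping.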
User Flow
Workflow author           Archon                               bash subprocess
───────────────           ──────                               ───────────────
defines bash node ──────▶ schedule node
                          substitute $synthesize.output ─────▶ `RAW=<42KB of mixed text>`
                                                           ◀── bash parse error or
                                                               truncated/mangled $RAW
                          [X] persist consumer sees broken input → exit 1
sees workflow failure ◀── DAG node failed
Environment
Platform: CLI (also reproducible from web)
Database: SQLite
Running in worktree? No (worktree.enabled: false)
OS: macOS / Linux (shell behaviour is the issue, not OS-specific)
Logs
{"level":50,"module":"workflow.dag-executor","exitCode":1,"killed":false,
"stderrTail":"...truncated state JSON...ARCHON_STATE_JSON_END\n--- END raw output ---",
"nodeId":"persist","nodeType":"bash","isTimeout":false,"msg":"dag_node_failed"}
[persist] Failed: Bash node 'persist' failed [exit 1]: ...
The script's own PERSIST FAILED stderr is what's quoted in the log — meaning the persist script DID run, but received corrupted stdin (or no stdin at all, with the bash code itself crashing before the pipe). Either failure mode is the same root cause: the upstream substitution didn't deliver the value cleanly.
Impact
Affected workflows/commands: any workflow where a bash node consumes a multi-KB LLM node's output via $nodeId.output substitution. maintainer-standup is the most prominent — fails on every run as the synth output grows. Also affects any custom workflow that pipes a large prompt result into a script via bash.
Reproduction rate: Reliable once the source output exceeds the threshold (today: 42KB → fail; earlier runs at ~20KB → pass).
Workaround: Extract the synth output from the run log manually and pipe it directly into the persist script. Tedious but possible because the data IS in the log.
Data loss risk: No (raw run output is preserved on disk).
Proposed direction
Two possible fixes, either or both:
1. Pass $nodeId.output via subprocess env vars, not via bash code substitution. The existing shellSafe: true plumbing from fix(workflows): pass user-controlled variables via env vars in bash nodes #1651 already does this for user-controlled variables ($USER_MESSAGE, $ARGUMENTS, $LOOP_USER_INPUT, ...); extend it to cover $nodeId.output references when the consuming node is bash:. The bash code would then reference $SYNTHESIZE_OUTPUT from env, where Archon set it via the subprocess env: block. No shell-quoting issue, and no size limit beyond the OS limits on environment size (ARG_MAX; on Linux a single environment string is additionally capped at 128 KiB).
2. Quote the substitution as a single shell-safe argument. When substituting $nodeId.output into a bash node's code, wrap it as a single-quoted shell string with ' escaped, i.e. emit RAW=$(printf '%s' '$SYNTH_OUTPUT_PLACEHOLDER') where the placeholder is the substituted text with embedded ' escaped via '"'"'. Heavy-handed but correct for arbitrary text content, with one caveat: $(...) strips trailing newlines, so a plain RAW='...' assignment is needed if those must survive.
Option 1 aligns with the precedent already set by #1651 and is the cleaner long-term direction. Option 2 is a backstop that works without changing the substitution contract.
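Option 1 could look roughly like the sketch below. Everything in it is an assumption for illustration: the helper names (`outputEnvName`, `bindOutputsToEnv`), the naming rule that turns `synthesize` into `SYNTHESIZE_OUTPUT`, and the choice to rewrite `$nodeId.output` references automatically rather than have workflow authors write the env var themselves. It does not reflect how the #1651 plumbing is actually implemented.

```typescript
// Hypothetical naming rule: "synthesize" -> "SYNTHESIZE_OUTPUT".
function outputEnvName(nodeId: string): string {
  return nodeId.replace(/[^A-Za-z0-9]/g, "_").toUpperCase() + "_OUTPUT";
}

interface BoundScript {
  script: string;              // bash code with references rewritten
  env: Record<string, string>; // values to merge into the subprocess env
}

// Rewrite each $nodeId.output reference to an env expansion and collect the
// corresponding values. The upstream text never becomes part of the bash source.
function bindOutputsToEnv(script: string, outputs: Record<string, string>): BoundScript {
  const env: Record<string, string> = {};
  const rewritten = script.replace(
    /\$([A-Za-z_][A-Za-z0-9_-]*)\.output\b/g,
    (match: string, nodeId: string): string => {
      const value = Object.prototype.hasOwnProperty.call(outputs, nodeId)
        ? outputs[nodeId]
        : undefined;
      if (value === undefined) return match; // unknown node: leave untouched
      const name = outputEnvName(nodeId);
      env[name] = value;
      return "${" + name + "}";
    },
  );
  return { script: rewritten, env };
}
```

With this shape, `RAW=$synthesize.output` becomes `RAW=${SYNTHESIZE_OUTPUT}`, and bash does not word-split or glob the expansion in an assignment. References used elsewhere in a script would still need double quotes around them.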
A regression test should exercise a bash node consuming a $nodeId.output of at least 50KB of mixed-content text (markdown + embedded JSON + special chars: $, ", ', backticks, parens) and assert the bash subprocess receives the exact bytes.
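A fixture for such a test does not have to be a captured run. The sketch below generates deterministic mixed-content text of a requested minimum size and includes a small comparison helper for reporting where received bytes diverge. Both function names are hypothetical, and the step that actually runs the bash node and captures what the subprocess received is deliberately left out, since it depends on the executor's test harness.

```typescript
// Deterministic fixture containing markdown, embedded JSON, and shell-special
// characters. ASCII-only, so string length equals byte length.
function makeMixedFixture(minBytes: number): string {
  const chunk = [
    "# Maintainer Standup",
    "- PR (#123): fix *glob* handling, costs $5 & `backticks`",
    '{"state": {"quote": "it\'s \\"nested\\"", "path": "C:\\\\tmp"}}',
    "",
  ].join("\n");
  let out = "";
  while (out.length < minBytes) out += chunk;
  return out;
}

// Index of the first differing character, or -1 when the strings are identical.
function firstDifference(expected: string, received: string): number {
  const n = Math.min(expected.length, received.length);
  for (let i = 0; i < n; i++) {
    if (expected[i] !== received[i]) return i;
  }
  return expected.length === received.length ? -1 : n;
}
```

A test would pass `makeMixedFixture(50 * 1024)` as the upstream node output, capture the subprocess's stdin, and assert that `firstDifference` returns -1.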
Reproducer artifact
Today's exact failing synth output (42,695 bytes) is at /Users/rasmus/.claude/projects/-Users-rasmus-Projects-cole-Archon/c8052aa7-3a9a-46e1-9516-c8abd374f716/tool-results/bxccgh2b7.txt locally — happy to attach as a fixture if useful for the test suite.
Scope
workflows
workflows:dag-executor (bash node execution path), workflows:executor-shared (substituteWorkflowVariables / substituteNodeOutputRefs)