bug(workflows): bash nodes silently fail on Windows when bash resolves to WSL launcher (System32\bash.exe) — $VAR expansion broken in -c mode #1326

@atlas-architect

Description

Summary

On Windows, Archon bash nodes fail silently when bash resolves to C:\Windows\System32\bash.exe (the WSL launcher shipped with Windows). Any ${VAR} reference in the bash script expands to an empty string, causing downstream logic to fail. The failure surfaces as "dir does not exist" or empty-path errors, which is misleading: the actual root cause is that variable expansion doesn't work in bash.exe -c 'script' when bash is the WSL launcher.

Impact: every bash node in every workflow fails silently on affected Windows machines. Gather-context succeeds (it is an LLM node), but the first downstream bash node fails. This looks like a workflow YAML bug or user error, but it is actually a shell-resolution bug.

Repro

Minimal workflow YAML:

name: repro
nodes:
  - id: gather
    prompt: "Return JSON {\"name\": \"hello\"}"
    model: haiku
    output_format:
      type: object
      properties:
        name: { type: string }
      required: [name]
  - id: bash-test
    bash: |
      set -e
      X=$gather.output.name
      echo "X=[$X]"
      [ -n "$X" ] && echo "OK" || echo "FAIL"
    depends_on: [gather]

Fire on Windows via archon workflow run repro. On affected machines, bash-test fails with X=[] and exits non-zero. The same YAML runs fine on macOS/Linux and on Windows machines where bash in PATH resolves to Git Bash.

Root cause

Windows CreateProcess search order for bare command names (no absolute path):

  1. Application's directory
  2. Current directory
  3. System directory (C:\Windows\System32) ← bash.exe found here (WSL launcher)
  4. 16-bit system directory
  5. Windows directory
  6. PATH environment variable ← Git Bash's bash.exe is here (if installed)

Because System32 is searched before PATH, Bun's child_process.spawn('bash', [...]) resolves bash to C:\Windows\System32\bash.exe on any Windows install that has WSL enabled, even when Git Bash's C:\Program Files\Git\bin is prepended to PATH. Get-Command bash in PowerShell uses a different, PATH-respecting lookup, so users who check their PATH see Git Bash and assume their shell resolution is correct, but Bun does not use that resolution.
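To make the ordering concrete, here is a minimal sketch that emulates the search order listed above. It is an illustration only, not Bun's or Windows' actual implementation; SearchContext, searchDirs, and emulateLookup are hypothetical names, and the file-existence check is injected so the logic can be exercised without a Windows disk.

```typescript
import { win32 } from "node:path";

// Hypothetical description of the spawning process; not a Bun or Archon type.
interface SearchContext {
  appDir: string; // directory of the running executable
  cwd: string; // current working directory
  windowsDir: string; // e.g. C:\Windows
  pathValue: string; // raw PATH value, semicolon-separated
}

// Directories in the order listed above: application directory, current
// directory, System32, 16-bit system directory, Windows directory, then PATH.
function searchDirs(ctx: SearchContext): string[] {
  return [
    ctx.appDir,
    ctx.cwd,
    win32.join(ctx.windowsDir, "System32"),
    win32.join(ctx.windowsDir, "System"),
    ctx.windowsDir,
    ...ctx.pathValue.split(win32.delimiter).filter((dir) => dir !== ""),
  ];
}

// Return the first existing candidate, or undefined when nothing matches.
function emulateLookup(
  exe: string,
  ctx: SearchContext,
  fileExists: (candidate: string) => boolean,
): string | undefined {
  return searchDirs(ctx)
    .map((dir) => win32.join(dir, exe))
    .find((candidate) => fileExists(candidate));
}
```

With both the WSL launcher and Git Bash present, this emulation returns the System32 copy even when Git's bin directory is the first PATH entry, which matches the behavior described in this issue.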

Once bash.exe resolves to the WSL launcher:

C:\Windows\System32\bash.exe -c 'VAR=hello; echo "$VAR"'
→ []    (empty, $VAR not expanded)

The WSL launcher's -c argument handling strips $VAR references somewhere in the PowerShell→bash.exe→WSL argument-passing chain; this is a known Windows/WSL argument-passing quirk. As a result, every ${VAR} in Archon bash node scripts evaluates to empty.
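A functional probe can detect this condition without knowing which bash was resolved. The sketch below runs a one-line script through bash -c and checks whether a variable survives. It is an assumption about how such a check could look, not existing Archon code; expandsVariables, RunScript, and runWithSpawn are hypothetical names.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical runner signature: run `script` with `bashPath -c` and return stdout.
type RunScript = (bashPath: string, script: string) => string;

// Default runner. Returns trimmed stdout, or "" if bash could not be started
// or exited with a non-zero status.
const runWithSpawn: RunScript = (bashPath, script) => {
  const result = spawnSync(bashPath, ["-c", script], { encoding: "utf8" });
  return result.status === 0 ? result.stdout.trim() : "";
};

// True when a variable assigned inside the script is still visible to echo.
function expandsVariables(bashPath: string, run: RunScript = runWithSpawn): boolean {
  return run(bashPath, 'PROBE=hello; echo "[$PROBE]"') === "[hello]";
}
```

Calling expandsVariables("bash") from the daemon process would exercise the same resolution path that bash nodes use, so a false result points at the shell rather than at the workflow YAML.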

Additionally, even if variable expansion worked, WSL bash mounts C: at /mnt/c/ by default, not /c/, so path conventions like /c/Dev/hcr/hcr-els (the Git Bash / MSYS2 convention) don't resolve in WSL regardless of expansion.
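The mismatch between the two conventions can be shown with a small mapping function. This is only an illustration of why /c/ paths miss under WSL's default mount point; msysToWslPath is a hypothetical helper, not something Archon or WSL provides, and it treats any single-letter top-level directory as a drive letter, which is a simplification.

```typescript
// Map a Git Bash / MSYS2 style path (/c/Dev/...) to the default WSL mount
// layout (/mnt/c/Dev/...). Paths that do not start with a single-letter
// segment are returned unchanged.
function msysToWslPath(p: string): string {
  const match = /^\/([a-zA-Z])(\/.*)?$/.exec(p);
  if (match === null) return p;
  const [, drive = "", rest = ""] = match;
  return `/mnt/${drive.toLowerCase()}${rest}`;
}
```

For example, msysToWslPath("/c/Dev/hcr/hcr-els") yields "/mnt/c/Dev/hcr/hcr-els", the location WSL would actually need, while the unmapped path does not exist inside the WSL filesystem.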

Why it's intermittent / hard to diagnose

Users who once started their daemon from a context with Git Bash early in PATH (VS Code integrated terminal with Git Bash as the default shell, a pre-configured PowerShell profile, an admin session with a modified PATH, etc.) end up with a long-lived daemon that inherited that PATH. For those daemons, Bun's child_process.spawn does find Git Bash, for reasons I don't fully understand; Bun on Windows may use a different resolution than plain CreateProcess. Those daemons keep working indefinitely. When such a daemon is eventually killed and restarted from a default PowerShell session, the new daemon hits this bug.

In our case, fires #1-20 of a long-running workflow all succeeded. After a daemon restart at session end, fire #21 broke, and the failure has reproduced every time since, regardless of PATH modifications.

Expected behavior

Bash nodes execute their script with working ${VAR} expansion. The _DirExistenceCheck, wc -l < $TARGET/file, and similar simple bash idioms should behave identically to a macOS/Linux run.

Actual behavior

Every ${VAR} reference evaluates to empty. Paths built from variables are empty. Downstream ls, wc, etc. fail with "No such file or directory" errors. if [ ! -d "$EMPTY" ] evaluates to true, so defensive checks exit early.

Suggested fixes (ordered by complexity)

Option A — resolve bash through PATH lookup explicitly, not via CreateProcess default search. Before spawning bash for a bash node, walk PATH in code and use the first bash.exe found, passed as an absolute path. Would bypass the System32-first quirk.
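A minimal sketch of the PATH walk described in Option A, assuming Node-compatible APIs under Bun. findOnPath is a hypothetical helper name, and the existence check is injectable so the function can be tested on any platform.

```typescript
import { existsSync } from "node:fs";
import { win32 } from "node:path";

// Walk a Windows-style PATH value and return the absolute path of the first
// entry that contains `exeName`, or undefined if no entry does.
function findOnPath(
  exeName: string,
  pathValue: string,
  fileExists: (candidate: string) => boolean = existsSync,
): string | undefined {
  for (const rawDir of pathValue.split(win32.delimiter)) {
    // PATH entries are occasionally wrapped in double quotes; strip them.
    const dir = rawDir.trim().replace(/^"(.*)"$/, "$1");
    if (dir === "") continue;
    const candidate = win32.join(dir, exeName);
    if (fileExists(candidate)) return candidate;
  }
  return undefined;
}
```

The result would then be passed to spawn as an absolute path instead of the bare name bash. One caveat: System32 is normally on PATH as well, so this only helps when Git Bash's directory appears before it; an implementation might additionally skip the System32 entry.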

Option B — prefer C:\Program Files\Git\bin\bash.exe on Windows when it exists. Git Bash is the de-facto standard for Windows dev shells and is what Archon workflow scripts target by convention (Unix paths, /c/ style). Hard-coding a check for Git Bash first on Windows would make the intended behavior the actual behavior.
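Option B could look roughly like the following. This is a sketch under the assumption that the install location named in this issue is the one to prefer; pickBash and GIT_BASH_CANDIDATES are hypothetical names, and the candidate list would need extending for non-default installs.

```typescript
import { existsSync } from "node:fs";

// Assumption: the Git for Windows location mentioned in this issue.
// Other install locations are not covered here.
const GIT_BASH_CANDIDATES: readonly string[] = [
  "C:\\Program Files\\Git\\bin\\bash.exe",
];

// Choose the executable to spawn for bash nodes. On non-Windows platforms,
// or when no Git Bash candidate exists, fall back to the bare name "bash".
function pickBash(
  platform: string,
  fileExists: (candidate: string) => boolean = existsSync,
): string {
  if (platform !== "win32") return "bash";
  return GIT_BASH_CANDIDATES.find((candidate) => fileExists(candidate)) ?? "bash";
}
```

A caller would use pickBash(process.platform) as the first argument to spawn, leaving macOS and Linux behavior unchanged.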

Option C — document the requirement and provide a setup check. Least invasive: at daemon startup, detect the bash.exe that will be spawned (via CreateProcess search order emulation), verify it's not System32's WSL launcher, emit a loud warning + doc link to install Git Bash if the check fails. Users can then install Git Bash and be directed to fix their setup.
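For Option C, the startup check reduces to comparing the resolved path against the WSL launcher's location. The sketch below assumes the resolved path is already known (for example from a search-order emulation); isWslLauncher and bashStartupWarning are hypothetical names, and the warning text is a placeholder.

```typescript
import { win32 } from "node:path";

// True when `bashPath` points at <systemRoot>\System32\bash.exe.
// The comparison is case-insensitive and tolerates forward slashes.
function isWslLauncher(bashPath: string, systemRoot = "C:\\Windows"): boolean {
  const normalized = win32.normalize(bashPath).toLowerCase();
  const launcher = win32.join(systemRoot, "System32", "bash.exe").toLowerCase();
  return normalized === launcher;
}

// Return a warning to log at daemon startup, or undefined when the
// resolved bash looks usable.
function bashStartupWarning(bashPath: string | undefined): string | undefined {
  if (bashPath === undefined) {
    return "No bash.exe was found; bash nodes will fail. Install Git Bash.";
  }
  if (isWslLauncher(bashPath)) {
    return `bash resolves to the WSL launcher (${bashPath}); bash nodes will fail. Install Git Bash.`;
  }
  return undefined;
}
```

On a real machine the systemRoot argument would more robustly come from the SystemRoot environment variable than from the hard-coded default used here.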

Option A is the cleanest. Option B is a shortcut that works for the vast majority of real Windows setups. Option C leaves spawn behavior unchanged; it only detects the problem and points users to documentation.

Workaround we adopted (in our fork-equivalent use)

None of the user-facing workarounds work reliably:

  • $env:Path modification in PS parent session → Bun still resolves via CreateProcess, ignores PATH
  • Git Bash terminal running daemon → same, Bun uses CreateProcess
  • bun install / bun link refresh → unrelated, doesn't touch bash resolution

Planned workaround until a fix lands: place a symlink bash.exe → Git Bash inside Archon's own directory so that CreateProcess finds it (step 2, current directory, assuming the daemon runs with that directory as its working directory) before System32. Hacky, but it works locally.

Environment

  • Windows 11 Pro 10.0.26200
  • PowerShell 7.6.0 Core
  • Bun 1.3.12
  • Archon main (83c119a, tested also on d89bc76 — identical failure, commit-agnostic)
  • WSL present (Ubuntu distro), required for other tools (Codex CLI)
  • Git for Windows installed with C:\Program Files\Git\bin\bash.exe, but NOT on system PATH by default

Repro timeline

Full debug log is in our fleet journal ([SQI#348-R3, L155 when banked], 14-step scientific-method isolation that ruled out Archon code regression, YAML changes, my shell-tool choice, and Git Bash vs WSL daemon context — eventually narrowing to CreateProcess System32 priority). Happy to provide the full debug thread if useful for regression-test creation.

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

Metadata

Labels: P2 (Medium priority - Backlog, when time permits), area: workflows (Workflow engine), bug (Something is broken), effort/medium (Few files, one domain or module, some coordination needed)
