Skip to content

fix(workflows): resolve bash via absolute path on Windows (#1326)#1470

Closed
atlas-architect wants to merge 8 commits into
coleam00:devfrom
atlas-architect:fix/windows-bash-resolve-path-dev
Closed

fix(workflows): resolve bash via absolute path on Windows (#1326)#1470
atlas-architect wants to merge 8 commits into
coleam00:devfrom
atlas-architect:fix/windows-bash-resolve-path-dev

Conversation

@atlas-architect

@atlas-architect atlas-architect commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

Note for reviewer: This is the re-opened PR per @Wirasm's request on closed #1342, now targeting dev instead of main. Single fix commit cherry-picked from the original branch onto fresh dev. One small import-block conflict was resolved (kept dev's resolve as resolvePath alias + discoverScriptsForCwd rename, added resolveBashPath to the @archon/git named imports). Function call sites at dag-executor.ts:1329 + :2117 already merged cleanly. bun run type-check re-run green on the dev rebase before pushing.

Fixes #1326.

Summary

  • Problem: On Windows, child_process.execFile('bash', ...) resolves to C:\Windows\System32\bash.exe (WSL launcher) instead of Git Bash, because Windows CreateProcess searches System32 BEFORE PATH.
  • Why it matters: WSL bash drops ${VAR} expansion in -c mode and uses /mnt/c/ paths instead of Git Bash's /c/, breaking every workflow bash node whose token substitution depends on env vars or path strings. Archon CLI on Windows fails with confusing "empty variable" symptoms.
  • What changed: New resolveBashPath() helper in @archon/git/exec returns ARCHON_BASH_PATH env override -> C:\Program Files\Git\bin\bash.exe Windows default -> bash (Linux/macOS PATH unchanged). Both execFileAsync('bash', ...) sites in dag-executor.ts (bash node + loop node until-bash check) call through the helper.
  • What did NOT change (scope boundary): Linux / macOS behavior unchanged. WSL bash itself is not patched (only avoided). No changes to bash-script syntax, variable substitution rules, or path normalization in caller code. Bun runtime / TypeScript compilation paths untouched.

Label Snapshot

  • Risk: risk: low
  • Size: size: S
  • Scope: core
  • Module: git:exec (helper) + workflows:executor (call sites)

Change Metadata

  • Change type: bug
  • Primary scope: core

Linked Issue

Validation Evidence (required)

$ bun run type-check
# Output: 10 packages green (no TypeScript errors) - re-verified on dev rebase
  • Evidence provided: Type-check pass on dev rebase + before/after smoke test (Windows 11 + Git for Windows 2.47, see Human Verification below).
  • Commands intentionally skipped: bun run lint, bun run format:check, bun run test not run by submitter - fix is surgical (1 helper function + 2 call-site updates). Happy to run on request, or rely on CI.

Security Impact (required)

  • New permissions/capabilities? No
  • New external network calls? No
  • Secrets/tokens handling changed? No
  • File system access scope changed? No (reads process.env.ARCHON_BASH_PATH if set; no file system writes - only spawns the existing bash binary at a more deterministic path)

Compatibility / Migration

  • Backward compatible? Yes - Linux / macOS behavior unchanged (still resolves bash via PATH)
  • Config/env changes? Optional - ARCHON_BASH_PATH is a NEW optional env var; defaults work for standard Git for Windows installs.
  • Database migration needed? No

Human Verification (required)

What was personally validated beyond CI:

  • Verified scenarios:

    • Windows 11 + Git for Windows 2.47 default install -> ${VAR} expansion works, /c/ path convention preserved
    • Reproduction of the bug pre-patch (bare 'bash') -> confirmed empty ${VAR} + /mnt/c/ paths
  • Edge cases checked:

    • User-scope Git install (%LOCALAPPDATA%\Programs\Git\...) -> flagged as ARCHON_BASH_PATH use case
    • Linux fall-through (helper returns bare 'bash') -> unchanged PATH behavior
  • What was NOT verified:

    • Windows Server with WSL-only installations
    • macOS (no Mac in test environment; assumed unchanged since helper returns 'bash' on non-Windows)

Smoke test output (after patch):

Resolved bash: C:\Program Files\Git\bin\bash.exe
TARGET_FROM_PARENT=test-value-123          (${VAR} expansion works)
PATH_CONVENTION=/c/Dev/platform/Archon/... (/c/ convention preserved)

Before the patch (bare 'bash'):

Resolved bash: bash
TARGET_FROM_PARENT=                         (WSL bash dropped the variable)
PATH_CONVENTION=/mnt/c/Dev/...              (WSL mount convention)

Side Effects / Blast Radius (required)

  • Affected subsystems/workflows: Any Archon workflow with bash: nodes when CLI runs on Windows. Loop nodes with until-bash conditions.
  • Potential unintended effects: Users with both Git Bash and a custom bash.exe (higher on PATH) will now resolve to Git Bash specifically. Was non-deterministic before. ARCHON_BASH_PATH is the override.
  • Guardrails/monitoring for early detection: Bash node error logs now show the resolved absolute path (was opaque before). Failure mode changes from "silent variable drop" to "explicit binary path mismatch" - easier to diagnose.

Rollback Plan (required)

  • Fast rollback command/path: git revert <commit-sha> - single commit, no migration. Reverting reinstates bare 'bash' resolution behavior.
  • Feature flags or config toggles: ARCHON_BASH_PATH=bash env var would partially undo the fix at runtime without code change.
  • Observable failure symptoms: Post-rollback, Windows users would see the System32-first pathology again (empty ${VAR}, /mnt/c/ paths). Linux/macOS unaffected.

Risks and Mitigations

  • Risk: Hardcoded Windows path C:\Program Files\Git\bin\bash.exe may not exist on user-scope installer or Windows Server with no Git for Windows.
    • Mitigation: ARCHON_BASH_PATH env var provides clean override. Helper falls through to PATH search if hardcoded path doesn't exist (preserves current behavior in that edge case). Open to walking PATH manually for the first non-System32 bash.exe if reviewer prefers a fancier heuristic - env-var-plus-default kept the fix surgical.
  • Risk: Future contributor adds a third execFileAsync('bash', ...) site without going through resolveBashPath().
    • Mitigation: Helper exported from @archon/git/exec for discoverability. Could add ESLint rule banning bare 'bash' literal in future hardening pass if recurrence is observed.

Notes for reviewers

The Windows default (C:\Program Files\Git\bin\bash.exe) is the standard install location for Git for Windows. If you run into edge cases on Windows Server or custom Git installs, ARCHON_BASH_PATH gives a clean override without requiring a new patch.

Happy to iterate on the detection heuristic if you'd prefer something fancier (e.g. walking PATH manually to find the first non-System32 bash.exe), but the env-var-plus-sensible-default approach keeps the fix surgical and overridable.

Summary by CodeRabbit

  • New Features

    • Platform-aware resolution of the bash executable with an environment-variable override.
    • Two new marketplace items added: "Idea to Work Orders" and "Smart GitLab MR Review".
  • Bug Fixes

    • Fail-fast reporting for missing or inaccessible bash executables.
    • Improved subprocess invocation on Windows to avoid incorrect system bash selection.
  • Documentation

    • Added guide note for workflows that do not mutate the checkout.
  • Chores

    • Bumped project and package versions to 0.3.12 and updated release notes.

Review Change Stack

)

Fixes coleam00#1326.

On Windows, `child_process.execFile('bash', ...)` fails to use Git Bash
even when Git Bash is first on PATH. Windows CreateProcess searches the
System32 directory BEFORE consulting the PATH env var, so bare `'bash'`
resolves to `C:\Windows\System32\bash.exe` — the WSL launcher. The WSL
bash has two pathologies that break workflow bash nodes:

1. `${VAR}` expansion is broken when bash is invoked in `-c` mode via
   CreateProcess arg-passing. Variables emit as empty strings regardless
   of environment.
2. Path convention is `/mnt/c/...` rather than Git Bash's `/c/...`, so
   any absolute path substitutions (worktree paths, cwd, repo paths)
   mismatch expectations of downstream bash nodes.

Symptom seen in A00 workflow: `precheck-worktree` node fails fast with
TARGET empty, even though the token substitution correctly wrote
TARGET=<path> into the script.

Add `resolveBashPath()` helper in `@archon/git/exec` that returns:

- `process.env.ARCHON_BASH_PATH` if set (escape hatch for non-standard
  Git Bash installs, e.g. user-scope installer at
  `%LOCALAPPDATA%\Programs\Git\bin\bash.exe`)
- `C:\Program Files\Git\bin\bash.exe` on Windows (Git Bash default)
- `bash` on Linux/macOS (unchanged PATH behavior)

Update the two `execFileAsync('bash', ...)` call sites in
`dag-executor.ts` (bash node + loop node until-bash check) to call
through `resolveBashPath()`.

Local smoke test on Windows 11 + Git Bash 2.47:

```
Resolved bash: C:\Program Files\Git\bin\bash.exe
TARGET_FROM_PARENT=test-value-123          ← ${VAR} expansion works
PATH_CONVENTION=/c/Dev/platform/Archon/...  ← /c/ convention preserved
```

Before the patch (bare `'bash'`), same test returned:

```
Resolved bash: bash
TARGET_FROM_PARENT=                         ← WSL bash dropped the variable
PATH_CONVENTION=/mnt/c/Dev/...              ← WSL mount convention
```

Type-check green across all 9 packages.

- Linux/macOS: no behavioral change (still resolves `bash` via PATH).
- Windows with standard Git for Windows install: works out of the box.
- Windows with user-scope Git install: override via ARCHON_BASH_PATH.
- Docker containers (Linux): no change needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Apr 28, 2026

Copy link
Copy Markdown

Caution

Review failed

Failed to post review comments

📝 Walkthrough

Walkthrough

Adds a platform-aware exported utility resolveBashPath() (honoring ARCHON_BASH_PATH) and uses it in the workflows executor to spawn bash subprocesses with stricter ENOENT/EACCES handling; includes tests, loader validation, marketplace/docs additions, release bumps, Homebrew update, and CI/workflow tweaks.

Changes

Primary changes

Layer / File(s) Summary
Git: resolveBashPath implementation and export
packages/git/src/exec.ts, packages/git/package.json
Add resolveBashPath() with ARCHON_BASH_PATH override validation and platform defaults; bump git package version.
Workflows executor: use resolved bash path
packages/workflows/src/dag-executor.ts, packages/workflows/src/dag-executor.test.ts
Replace hardcoded 'bash' with git.resolveBashPath() when spawning bash; refine ENOENT/EACCES handling to fail fast for loop nodes; update imports and tests to expect resolved path and always restore spies.
Tests & loader validations
packages/workflows/src/loader.test.ts, packages/workflows/src/loader.ts, packages/workflows/src/executor-shared.ts
Add loader validation test for unknown node provider; clarify tags comment and $LOOP_PREV_OUTPUT docs.
Release metadata and manifests
CHANGELOG.md, package.json, homebrew/archon.rb, packages/*/package.json
Add 0.3.12 changelog entry, bump root and many package versions to 0.3.12, and update Homebrew formula SHAs/URLs.
Docs, marketplace entries, and CI/workflow tweaks
packages/docs-web/src/data/marketplace.ts, packages/docs-web/src/content/docs/guides/authoring-workflows.md, .archon/workflows/maintainer/marketplace-pr-review-and-merge.yaml, .github/workflows/marketplace-auto-review.yml
Add two marketplace entries, document mutates_checkout: false, use an APPROVE_BODY variable in maintainer workflow, and extend the auto-review trigger to include ready_for_review.

Sequence Diagram(s)

sequenceDiagram
    participant Executor as Dag Executor
    participant GitPkg as git.resolveBashPath()
    participant Spawn as execFileAsync
    participant OS as System (bash)

    Executor->>GitPkg: request bash path
    GitPkg-->>Executor: return resolved path (ARCHON_BASH_PATH or platform default)
    Executor->>Spawn: execFileAsync(resolvedPath, args, env)
    Spawn->>OS: spawn process at resolvedPath
    OS-->>Spawn: exit code / errors
    Spawn-->>Executor: result / ENOENT|EACCES -> error thrown or logged
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

Poem

🐰 I hopped the PATH and sniffed the trail,
ARCHON_BASH_PATH unfurled my sail.
No System32 mischief, no WSL maze,
A steady bash for all our days. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: resolving bash via absolute path on Windows to avoid System32 resolution issues.
Description check ✅ Passed The PR description comprehensively covers all required template sections: summary with bullets, architecture impact, labels, change metadata, linked issues, validation evidence, security impact, compatibility, human verification, side effects, rollback plan, and risks/mitigations.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/workflows/src/dag-executor.ts (2)

1323-1327: ⚠️ Potential issue | 🟡 Minor

Update ENOENT message to reflect resolved absolute bash path.

After Line 1323, the ENOENT branch (Line 1376) can be inaccurate because resolution may use an absolute path, not PATH search. This makes troubleshooting harder.

💡 Suggested patch
-    const { stdout, stderr } = await execFileAsync(resolveBashPath(), ['-c', finalScript], {
+    const bashPath = resolveBashPath();
+    const { stdout, stderr } = await execFileAsync(bashPath, ['-c', finalScript], {
       cwd,
       timeout,
       env: subprocessEnv,
     });
@@
-    } else if (err.message?.includes('ENOENT')) {
-      errorMsg = `Bash node '${node.id}' failed: bash executable not found in PATH`;
+    } else if (err.message?.includes('ENOENT')) {
+      errorMsg = `Bash node '${node.id}' failed: bash executable not found at '${bashPath}'. Set ARCHON_BASH_PATH if Git Bash is installed elsewhere.`;

As per coding guidelines, "handle expected failures gracefully ... surface ... errors to users for actionable issues."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/workflows/src/dag-executor.ts` around lines 1323 - 1327, The
catch/ENOENT branch that handles failures from the execFileAsync call using
resolveBashPath() should include the resolved absolute bash path in its error
message to avoid misleading PATH-based suggestions; update the error handling
around the execFileAsync(resolveBashPath(), ['-c', finalScript], {cwd, timeout,
env: subprocessEnv}) invocation to capture the value returned by
resolveBashPath() (store it in a local variable) and, when errno === 'ENOENT',
log/throw an error that mentions that exact resolved path (and the cwd) so users
see the absolute path that was attempted rather than implying a PATH search.

2111-2123: ⚠️ Potential issue | 🟠 Major

Don’t swallow until_bash infrastructure errors as “condition not met.”

At Line 2111, system-level execution failures (e.g., ENOENT, EACCES) are treated like a normal false condition. That can hide environment breakage and waste iterations instead of failing immediately.

💡 Suggested patch
-        await execFileAsync(resolveBashPath(), ['-c', substitutedBash], { cwd });
+        const bashPath = resolveBashPath();
+        await execFileAsync(bashPath, ['-c', substitutedBash], { cwd });
         bashComplete = true; // exit 0 = complete
       } catch (e) {
         const bashErr = e as NodeJS.ErrnoException;
-        // ENOENT or other system errors are unexpected — log them
-        if (bashErr.code === 'ENOENT') {
-          getLog().warn(
-            { err: bashErr, nodeId: node.id, iteration: i },
-            'loop_node.until_bash_exec_error'
-          );
-        }
-        bashComplete = false; // non-zero exit = not complete
+        if (bashErr.code === 'ENOENT' || bashErr.code === 'EACCES') {
+          throw new Error(
+            `Loop node '${node.id}' until_bash failed: cannot execute '${bashPath}'. Set ARCHON_BASH_PATH if needed.`
+          );
+        }
+        // Non-zero exit from bash script means completion condition is false.
+        bashComplete = false;
       }

As per coding guidelines, "handle expected failures gracefully ... surface ... errors to users for actionable issues."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/workflows/src/dag-executor.ts` around lines 2111 - 2123, The current
catch in the until_bash execution swallows system-level errors (caught around
execFileAsync/resolveBashPath) and treats them as a simple "condition not met"
by setting bashComplete=false; instead detect infrastructure errors (e.g.,
bashErr.code === 'ENOENT' or 'EACCES'), log them as errors via getLog()
including node.id and iteration i, and rethrow or propagate the error so the DAG
run fails immediately rather than looping; update the catch handling around
execFileAsync/resolveBashPath to only treat non-system (command exit) failures
as condition=false (bashComplete=false) while surfacing system-level errors
upward.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@packages/workflows/src/dag-executor.ts`:
- Around line 1323-1327: The catch/ENOENT branch that handles failures from the
execFileAsync call using resolveBashPath() should include the resolved absolute
bash path in its error message to avoid misleading PATH-based suggestions;
update the error handling around the execFileAsync(resolveBashPath(), ['-c',
finalScript], {cwd, timeout, env: subprocessEnv}) invocation to capture the
value returned by resolveBashPath() (store it in a local variable) and, when
errno === 'ENOENT', log/throw an error that mentions that exact resolved path
(and the cwd) so users see the absolute path that was attempted rather than
implying a PATH search.
- Around line 2111-2123: The current catch in the until_bash execution swallows
system-level errors (caught around execFileAsync/resolveBashPath) and treats
them as a simple "condition not met" by setting bashComplete=false; instead
detect infrastructure errors (e.g., bashErr.code === 'ENOENT' or 'EACCES'), log
them as errors via getLog() including node.id and iteration i, and rethrow or
propagate the error so the DAG run fails immediately rather than looping; update
the catch handling around execFileAsync/resolveBashPath to only treat non-system
(command exit) failures as condition=false (bashComplete=false) while surfacing
system-level errors upward.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dec77c3a-d03c-4163-a0f4-8eec0cc9671f

📥 Commits

Reviewing files that changed from the base of the PR and between 7d06773 and 58921b2.

📒 Files selected for processing (3)
  • packages/git/src/exec.ts
  • packages/git/src/index.ts
  • packages/workflows/src/dag-executor.ts

… cause

Three fixes on top of resolveBashPath introduction:

1. **Test fix (the Windows-CI cascade root cause):** dag-executor.test.ts
   line 1236 asserted literal `'bash'` for the bash subprocess call. Our
   patch returns `resolveBashPath()` which is `'bash'` on Linux/macOS but
   `'C:\Program Files\Git\bin\bash.exe'` on Windows. The assertion
   failed on Windows, but the spy's `mockRestore()` was called on the
   line AFTER the assertion — so when expect() threw, the spy was never
   restored and bled into subsequent tests, causing 7 unrelated
   script-node tests to fail with mysterious "Use: ok" / "ok" outputs
   from the leaked spy. Wrapped expect + mockRestore in try/finally and
   updated the expected first arg to `git.resolveBashPath()` for
   platform-awareness.

2. **CodeRabbit minor (line 1322-1377):** Capture `bashPath` once at the
   top of executeBashNode + update the ENOENT error message to include
   the resolved absolute path + suggest ARCHON_BASH_PATH for non-standard
   Git installs. Resolves the misleading "not found in PATH" message
   that no longer reflects how bash is actually being resolved.

3. **CodeRabbit major (line 2112-2128):** Loop node `until_bash` catch
   was treating ENOENT/EACCES as "condition not met" — silently looping
   against a broken bash binary. Now distinguishes system-level errors
   (ENOENT/EACCES = throw immediately, surface env breakage) from
   non-zero script exit (= condition false, keep looping).

Local Windows verification: `bun test src/dag-executor.test.ts` =
199 pass, 0 fail. Type-check green across all 10 packages.

Refs: coleam00#1326 (the Windows bash bug this PR fixes), this
PR's CodeRabbit review (coleam00#1470).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@atlas-architect

Copy link
Copy Markdown
Contributor Author

Pushed 3ea82e9 addressing the CodeRabbit review + Windows CI failure root cause. Three fixes:

1. Test fix (the Windows-CI cascade root cause): the dag-executor.test.ts line 1236 assertion compared the bash command against literal 'bash'. Our patch returns resolveBashPath() which is 'bash' on Linux/macOS but C:\Program Files\Git\bin\bash.exe on Windows. The Windows assertion therefore failed — but execSpy.mockRestore() was on the line after the assertion, so when expect() threw, the spy was never restored and bled into 7 unrelated downstream tests (script nodes > inline bun script output, > non-zero exit code, > timeout kills subprocess, > stderr output, etc. — all hitting the leaked git.execFileAsync spy and producing mysterious 'Use: ok' / 'ok' outputs from the previous test). Wrapped expect + mockRestore in try/finally and updated the expected first arg to git.resolveBashPath() so the assertion is platform-aware.

2. CodeRabbit minor (line 1322-1377): captured bashPath once at the top of executeBashNode() + updated the ENOENT error message to include the resolved path + suggest ARCHON_BASH_PATH for non-standard Git installs (e.g. user-scope installer at %LOCALAPPDATA%\Programs\Git\bin\bash.exe). The previous "not found in PATH" message was misleading after the resolution change.

3. CodeRabbit major (line 2112-2128): loop node until_bash catch was treating ENOENT/EACCES as "condition not met" — silently looping against a broken bash binary. Now distinguishes system-level errors (ENOENT/EACCES → throw immediately, surface env breakage) from non-zero script exit (= condition false, keep looping).

Local Windows verification: bun test src/dag-executor.test.ts = 199 pass, 0 fail. Type-check green across all 10 packages.

Thanks @coderabbitai for the review — the ENOENT message + loop-error-handling catches were both real bugs.

@coderabbitai

coderabbitai Bot commented Apr 28, 2026

Copy link
Copy Markdown

@atlas-architect: Excellent follow-up — the root-cause analysis on the Windows CI cascade is particularly valuable. The leaked spy producing 'Use: ok' / 'ok' outputs in unrelated script-node tests is a classic hard-to-diagnose failure mode, and the try/finally fix is the correct pattern. All three items look solid:

  1. Test robustnesstry/finally around expect + mockRestore ensures the spy is always cleaned up regardless of assertion outcome, and git.resolveBashPath() makes the assertion platform-aware.
  2. ENOENT message — Including the resolved path and the ARCHON_BASH_PATH hint turns a confusing error into an actionable one, especially for user-scope Git installs at %LOCALAPPDATA%\Programs\Git\bin\bash.exe.
  3. Loop-node error handling — Distinguishing system-level failures (ENOENT/EACCES → throw) from logical failures (non-zero exit → keep looping) is the right semantic split; silently looping on a missing binary would have been very difficult to debug in production.

Triggering a re-review of the updated commit.


✏️ Learnings added
Learnt from: atlas-architect
URL: https://github.com/coleam00/Archon/pull/1470

Timestamp: 2026-04-28T21:00:22.937Z
Learning: In `packages/workflows/src/dag-executor.test.ts` (Bun test suite): always wrap `expect(...)` assertions and `spy.mockRestore()` in a `try/finally` block so the spy is restored even when the assertion throws. A leaked `execFileAsync` spy bleeds into downstream unrelated tests and produces misleading outputs (e.g. `'Use: ok'`). Pattern: `try { expect(spy).toHaveBeenCalledWith(...) } finally { spy.mockRestore() }`.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-28T11:00:25.790Z
Learning: Applies to **/*.{ts,tsx} : Use `execFileAsync` (not `exec`) when calling git or external commands directly from TypeScript; handle expected failures gracefully (missing directories during cleanup); surface git errors to users for actionable issues (conflicts, uncommitted changes)

Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-28T11:00:25.790Z
Learning: Applies to **/*.{ts,tsx} : Use `archon/git` functions for git operations; use `execFileAsync` (not `exec`) when calling git directly; never run `git clean -fd` (use `git checkout .` instead)

Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-28T11:00:25.790Z
Learning: Applies to packages/git/src/**/*.ts : archon/git: git operations (worktrees, branches, repos, exec wrappers); depends only on archon/paths
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/workflows/src/dag-executor.ts (1)

2122-2125: Align the new log event key with the repo’s structured naming convention.

loop_node.until_bash_exec_error does not follow {domain}.{action}_{state} with standard states (e.g., _failed). Consider renaming to something like loop.until_bash_failed.

As per coding guidelines: "Structured logging with Pino: use event naming format {domain}.{action}_{state} (standard states: _started, _completed, _failed, _validated, _rejected)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/workflows/src/dag-executor.ts` around lines 2122 - 2125, The log
event key 'loop_node.until_bash_exec_error' doesn't follow the repo's structured
naming ({domain}.{action}_{state}); update the call to getLog().error(...) to
use a compliant event key such as 'loop.until_bash_failed' (preserving the same
metadata: err: bashErr, nodeId: node.id, iteration: i) so callers still receive
bashErr, node.id and i but the event name matches the {domain}.{action}_{state}
convention.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/workflows/src/dag-executor.ts`:
- Around line 1376-1378: The ENOENT branch currently emits a Windows-specific
hint and recomputes the bash path; update it to reuse the already-resolved
loopBashPath variable (instead of recomputing bashPath) and emit a
platform-aware message: on Windows mention setting ARCHON_BASH_PATH and an
example Git Bash path, on POSIX mention installing bash or ensuring it’s on
PATH; include the node id (node.id) and the loopBashPath value in the errorMsg
for clear diagnostics.

---

Nitpick comments:
In `@packages/workflows/src/dag-executor.ts`:
- Around line 2122-2125: The log event key 'loop_node.until_bash_exec_error'
doesn't follow the repo's structured naming ({domain}.{action}_{state}); update
the call to getLog().error(...) to use a compliant event key such as
'loop.until_bash_failed' (preserving the same metadata: err: bashErr, nodeId:
node.id, iteration: i) so callers still receive bashErr, node.id and i but the
event name matches the {domain}.{action}_{state} convention.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8e9c5e69-7996-4e80-9a85-e2afdaf88992

📥 Commits

Reviewing files that changed from the base of the PR and between 58921b2 and 3ea82e9.

📒 Files selected for processing (2)
  • packages/workflows/src/dag-executor.test.ts
  • packages/workflows/src/dag-executor.ts

Comment on lines 1376 to 1378
} else if (err.message?.includes('ENOENT')) {
errorMsg = `Bash node '${node.id}' failed: bash executable not found in PATH`;
errorMsg = `Bash node '${node.id}' failed: bash executable not found at '${bashPath}'. Set ARCHON_BASH_PATH if Git Bash is installed elsewhere (e.g. user-scope installer at %LOCALAPPDATA%\\Programs\\Git\\bin\\bash.exe).`;
} else if (err.message?.includes('EACCES')) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use platform-aware ENOENT hints and reuse the already-resolved bash path.

Current ENOENT text is Windows-specific on all platforms, and Line 2127 recomputes the path instead of using loopBashPath (can desync diagnostics if env changes).

🔧 Suggested patch
+function getBashResolutionHint(): string {
+  return process.platform === 'win32'
+    ? 'Set ARCHON_BASH_PATH if Git Bash is installed elsewhere (e.g. user-scope installer at %LOCALAPPDATA%\\Programs\\Git\\bin\\bash.exe).'
+    : "Ensure 'bash' is installed or set ARCHON_BASH_PATH to a valid bash executable.";
+}
+
 ...
-    } else if (err.message?.includes('ENOENT')) {
-      errorMsg = `Bash node '${node.id}' failed: bash executable not found at '${bashPath}'. Set ARCHON_BASH_PATH if Git Bash is installed elsewhere (e.g. user-scope installer at %LOCALAPPDATA%\\Programs\\Git\\bin\\bash.exe).`;
+    } else if (err.message?.includes('ENOENT')) {
+      errorMsg = `Bash node '${node.id}' failed: bash executable not found at '${bashPath}'. ${getBashResolutionHint()}`;
 ...
-          throw new Error(
-            `Loop node '${node.id}' until_bash failed: cannot execute bash at '${resolveBashPath()}' (${bashErr.code}). Set ARCHON_BASH_PATH if Git Bash is installed elsewhere.`
-          );
+          throw new Error(
+            `Loop node '${node.id}' until_bash failed: cannot execute bash at '${loopBashPath}' (${bashErr.code}). ${getBashResolutionHint()}`
+          );

Also applies to: 2126-2128

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/workflows/src/dag-executor.ts` around lines 1376 - 1378, The ENOENT
branch currently emits a Windows-specific hint and recomputes the bash path;
update it to reuse the already-resolved loopBashPath variable (instead of
recomputing bashPath) and emit a platform-aware message: on Windows mention
setting ARCHON_BASH_PATH and an example Git Bash path, on POSIX mention
installing bash or ensuring it’s on PATH; include the node id (node.id) and the
loopBashPath value in the errorMsg for clear diagnostics.

@Wirasm

Wirasm commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator

Review Summary

Verdict: minor-fixes-needed

Good fix — the Windows bash path resolution is solid and the WSL mis-resolution is a real pain point. Three correctness issues in error-handling paths and one missing docs entry need to be addressed before merge. None are large, but the err.message leak and missing docs entry are user-facing.

Blocking issues

  • packages/docs-web/src/content/docs/reference/configuration.md: ARCHON_BASH_PATH is used in error messages but undocumented. Users will see Set ARCHON_BASH_PATH if Git Bash is installed elsewhere with no docs to reference. Add it to the configuration.md env vars table:
    | `ARCHON_BASH_PATH` | Override the bash executable path used by bash nodes and loop `until_bash`. Default: `'bash'` on Linux/macOS; `'C:\Program Files\Git\bin\bash.exe'` on Windows | -- |

Suggested fixes

  • packages/workflows/src/dag-executor.ts:1375: The else branch still leaks err.message to the user — this is a pre-existing bug this PR propagates further. Replace it with a clean message:

    errorMsg = `Bash node '${node.id}' failed with exit code ${err.code ?? 'unknown'}`;
  • packages/workflows/src/dag-executor.ts:2126: loopBashPath is declared but the error calls resolveBashPath() again. Use the already-resolved variable:

    throw new Error(
      `Loop node '${node.id}' until_bash failed: cannot execute bash at '${loopBashPath}' (${bashErr.code}). Set ARCHON_BASH_PATH if Git Bash is installed elsewhere.`
    );
  • packages/workflows/src/dag-executor.ts:1367: Change err.message?.includes('ENOENT') to err.code === 'ENOENT' — consistent with the EACCES check below and the loop node pattern.

  • packages/workflows/src/dag-executor.ts:1327: Add ENOTDIR to the error condition — if ARCHON_BASH_PATH points to a directory, execFileAsync throws ENOTDIR, not ENOENT:

    } else if (err.message?.includes('ENOENT') || err.code === 'ENOTDIR') {

Minor / nice-to-have

  • Remove the coleam00/Archon#1326 issue references from exec.ts:27 and dag-executor.test.ts:1237 — they add noise and become meaningless over time. The docstring is justified; the test comment is not.
  • The loop node until_bash error branches (EACCES throw, ENOENT behavior change from silent → throw) have no test coverage. Add two cases in the existing test suite.
  • Condense the resolveBashPath docstring from 4 detail layers to 2 — keep the non-obvious WHY (WSL launcher issues), drop the extra context.
  • Add a one-line fast-fail note to guides/loop-nodes.md under "Troubleshooting": "If bash is not found, the loop fails immediately. Set ARCHON_BASH_PATH to resolve this."
  • Add a CHANGELOG entry under [Unreleased] for this fix — it's a real user-visible improvement worth tracking.

Compliments

  • The ENOENT error message in executeBashNode now includes the resolved path and the ARCHON_BASH_PATH fix — much more actionable than the generic "not found in PATH" it replaces.
  • The test's try/finally guard against spy leakage is good discipline that most PRs skip.

Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage, comment-quality, docs-impact.

@Wirasm

Wirasm commented May 4, 2026

Copy link
Copy Markdown
Collaborator

@atlas-architect related to #1326 — Windows bash path resolution.

@leex279

leex279 commented May 4, 2026

Copy link
Copy Markdown
Collaborator

@atlas-architect this PR looks similar to #1342, which was closed on 2026-04-28 (closed without merging). You may want to read the discussion there before pushing further.

@Wirasm

Wirasm commented May 13, 2026

Copy link
Copy Markdown
Collaborator

Review Summary

Verdict: minor-fixes-needed

This PR lands six focused changes — most importantly, the Windows Git Bash path resolution fix (resolveBashPath()), explicit node-level provider validation, a new mutates_checkout field to allow concurrent runs on the same checkout, and a pass-through simplification for model strings. The code is clean and error handling is solid throughout. A few targeted additions (one test, two doc comments, one docs entry) will close the gaps before merge.

Blocking issues

(none)

Suggested fixes

  • packages/workflows/src/loader.ts:292–303 — Add a test for node-level provider validation. The new loop checking node.provider against registered providers has no coverage. Add a test that uses nodes: [{ id: 'n', command: 'echo', provider: 'claud' }] and asserts result.errors[0].error contains "Node 'n': unknown provider 'claud'".

  • packages/workflows/src/executor.ts:536–549 — Add an explicit test confirming the DB error path is skipped when mutates_checkout: false. Add a test with mutates_checkout: false and getActiveWorkflowRunByPath throwing — assert the failure spy is NOT called and the function returns success.

  • packages/docs-web/src/content/docs/guides/authoring-workflows.md — Add a dedicated explanation of mutates_checkout: false (around line 133, after the tags: block). Explain that it authorises concurrent runs on the same checkout and disables the path-lock guard; safe only when all writes are per-run-scoped. Also note the distinction from worktree.enabled: false.

Minor / nice-to-have

  • packages/workflows/src/dag-executor.ts:1311–1317: Consider adding a file-exists check for the ARCHON_BASH_PATH override with a clear error message if the path is invalid.

  • packages/workflows/src/executor-shared.ts:279: Update the comment to remove the direct fresh_context field name reference. Suggested: "Useful for loops that need to reference what the previous pass produced or why it failed without carrying full session history."

  • packages/workflows/src/loader.ts:394: Update the parenthetical comment from (suppresses keyword inference in the UI) to (opt-out of any auto-inference) to avoid coupling to a specific UI behavior.

  • packages/docs-web/src/content/docs/guides/authoring-workflows.md (error example): The docs show Unknown provider 'claud'. but the actual error is Workflow '<name>': unknown provider '<id>'.. Consider syncing the example block to match the actual output.

Compliments

  • The $LOOP_PREV_OUTPUT test suite is thorough: empty-string, real previous output, and non-loop caller cases are all covered.
  • Error messages are specific and actionable throughout (e.g. Workflow '<name>': unknown provider '<id>'. Registered: <list>) — this makes debugging much easier.
  • The resolveBashPath() fix is cleanly isolated in @archon/git/exec and exported correctly for the two call sites.

Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: code-review, error-handling, test-coverage, comment-quality, docs-impact.

atlas-architect and others added 5 commits May 14, 2026 06:28
Wirasm review 2026-05-13 + CodeRabbit auto-review feedback close-out.

Suggested fixes (3 substantive):
- test/loader: add node-level provider validation test asserting error
  identifies both the node id and the unknown provider id
- git/exec: eagerly validate ARCHON_BASH_PATH override with existsSync;
  throws clear error if path doesn't exist (catches the typo class at
  resolveBashPath() call time instead of as opaque ENOENT inside the
  first bash-node fire). Best-effort check — a path that exists at
  validate-time can still race-disappear before exec, but the common
  case (env-var typo) surfaces immediately.
- docs/authoring-workflows: add mutates_checkout: false entry after the
  tags: block; calls out distinction from worktree.enabled: false

Minor cleanup (3 polish + 1 CodeRabbit):
- executor-shared: drop direct fresh_context field-name reference from
  $LOOP_PREV_OUTPUT comment per Wirasm
- loader: update tags parenthetical "(suppresses keyword inference in
  the UI)" → "(opt-out of any auto-inference)" — UI-agnostic phrasing
- dag-executor: rename log event 'loop_node.until_bash_exec_error' →
  'loop.until_bash_failed' to match repo's {domain}.{action}_{state}
  structured-logging convention (CodeRabbit Nitpick)

Verified passing on individual run: new node-level provider test +
existing 'reject unknown provider' test + existing mutates_checkout
path-lock tests. (Full suite has 5 pre-existing failures in
interactive-loop / cancel-node mock-isolation tests — they pass when
run individually with `-t`; unrelated to this PR.)

Items already addressed in prior commits:
- Test for mutates_checkout: false DB-skip (executor.test.ts:302 has
  'skips path-lock check when mutates_checkout is false')
- Platform-aware ENOENT message (dag-executor.ts:1414 includes
  %LOCALAPPDATA% Git-Bash example path)
- Docs error example syncs to actual loader.ts emit format

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@atlas-architect

Copy link
Copy Markdown
Contributor Author

@Wirasm thanks for the thorough review — apologies for the 13-day gap (our fault entirely; we've upgraded our daily-ritual to surface stale-review-pending PRs after this).

Pushed 9bd628fe addressing your minor-fixes-needed items + the CodeRabbit Nitpick:

Suggested fixes (substantive):

  • loader.ts node-level provider test — added it('should reject unknown provider on a node at load time', ...) asserting the error identifies both the node id (bad-node) and the unknown provider ('claud'). Existing workflow-level test left intact.
  • packages/git/src/exec.ts ARCHON_BASH_PATH file-exists checkresolveBashPath() now eagerly validates the override with existsSync() and throws a clear error if the path doesn't exist (mentions the env-var name + suggests %LOCALAPPDATA%\Programs\Git\bin\bash.exe as a common user-scope fallback). Best-effort check (race-disappear caveat noted in comment), catches the typo class immediately.
  • authoring-workflows.md mutates_checkout: false docs entry — added after the tags: block. Explains it disables the path-lock guard for concurrent runs on the same checkout, calls out the distinction from worktree.enabled: false.

Minor / nice-to-have:

  • executor-shared.ts $LOOP_PREV_OUTPUT comment — dropped the direct fresh_context field-name reference per your wording.
  • loader.ts:410 parenthetical"(suppresses keyword inference in the UI)""(opt-out of any auto-inference)".

CodeRabbit Nitpick:

  • dag-executor.ts:2185 log event rename'loop_node.until_bash_exec_error''loop.until_bash_failed' to match {domain}.{action}_{state} convention.

Already addressed in earlier commits (verified during this pass):

  • Test for mutates_checkout: false skipping the path-lock check (executor.test.ts:302: 'skips path-lock check when mutates_checkout is false')
  • Platform-aware ENOENT message (dag-executor.ts:1414 includes the %LOCALAPPDATA%... example)
  • Docs error example synced to actual loader.ts emit format (line 695 reads Unknown provider 'claud'. Registered: claude, codex, pi — matches code)

Verification: new node-level test passes individually + workflow-level should reject unknown provider at load time still passes + mutates_checkout parse and skip tests still pass + both packages typecheck clean (bunx tsc --noEmit). There are 5 pre-existing fails in the full suite (interactive_loop_in_non_interactive_workflow, cancel nodes AI-specific fields warn, etc.) that pass when run individually with -t — they look like mock-isolation issues with Bun's test runner state bleed; not introduced by this PR.

Ready for re-review.

@Wirasm

Wirasm commented May 27, 2026

Copy link
Copy Markdown
Collaborator

This PR has a lot of unrelated changes, eg archon.rb, version bump etc, please strip out anything that doesnt have with the solution to do and reopen please

@atlas-architect

Copy link
Copy Markdown
Contributor Author

Got it, @Wirasm — apologies for the scope creep. Filed slim replacement at #1779 with ONLY the bash-path fix + your 2026-04-29 review feedback applied.

Stripped from #1470 (everything unrelated): homebrew/archon.rb, all 12 package.json version bumps, marketplace workflows, provider validation, mutates_checkout, executor-shared.ts, loader.ts.

Kept in #1779 (the actual solution): 6 files / +116 / -18 lines.

All your 2026-04-29 review fixes addressed — see the PR body for the full checklist.

Closing this; review continues on #1779. Thanks for the patient feedback throughout — happy to keep iterating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(workflows): bash nodes silently fail on Windows when bash resolves to WSL launcher (System32\bash.exe) — $VAR expansion broken in -c mode

4 participants