
cli: anchor aube install at workspace root from member subdir #217

Merged

jdx merged 4 commits into main from claude/hardcore-lumiere-a87670 on Apr 22, 2026

Conversation

jdx (Contributor) commented Apr 22, 2026

Summary

Why

User-reported bug: installing from a workspace member re-downloaded dependencies even after a clean root install. The original report blamed #hash exotic dependencies, but that was a red herring: the truncated long-name directories (`<prefix>_<32hex>` from `dep_path_to_filename`) were simply what the reporter noticed inside a per-member `.aube/` store that should never have been created.

Test plan

  • mise run test:bats test/workspace_member_install_walks_up.bats — passes
  • mise run test:bats test/workspace.bats test/install.bats test/project_root_walk_up.bats — all green (no regressions in the surrounding install paths)
  • cargo clippy --all-targets -- -D warnings — clean

🤖 Generated with Claude Code


Note

Medium Risk
Changes how aube install chooses its project root in workspace scenarios, which can affect where lockfiles/state are written and what gets installed. Also temporarily alters CI BATS execution (timeouts/serial per-file), which may mask parallel-only failures.

Overview
aube install now resolves its working directory by preferring the workspace root (but only when that root contains a package.json) and falling back to the nearest package.json otherwise. This prevents workspace-member installs from creating their own aube-lock.yaml, .aube/ store, and .aube-state.
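A minimal sketch of that resolution order, with a hypothetical walk-up helper standing in for the `crate::dirs` functions visible in the diff (marker filenames are illustrative; the real logic may check several, e.g. `aube-workspace.yaml`):

```rust
use std::path::{Path, PathBuf};

// Hypothetical walk-up helper; the real find_workspace_root /
// find_project_root live in the aube crate and are not shown in this PR.
fn find_up(start: &Path, marker: &str) -> Option<PathBuf> {
    start
        .ancestors()
        .find(|dir| dir.join(marker).is_file())
        .map(Path::to_path_buf)
}

/// Resolution order described above: prefer the workspace root, but only
/// when it actually hosts a package.json; otherwise anchor at the nearest
/// package.json.
fn resolve_install_root(initial_cwd: &Path) -> Option<PathBuf> {
    find_up(initial_cwd, "pnpm-workspace.yaml") // stand-in for find_workspace_root
        .filter(|root| root.join("package.json").is_file())
        .or_else(|| find_up(initial_cwd, "package.json")) // find_project_root
}
```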

aube dlx explicitly sets InstallOptions.project_dir to the scratch temp project to avoid accidentally walking up into a surrounding workspace. Adds a serial BATS regression test for the workspace-member install behavior, and temporarily changes CI non-serial BATS to run per-file with a 90s timeout to surface hangs.

Reviewed by Cursor Bugbot for commit 4436eee.

greptile-apps (Bot) commented Apr 22, 2026

Greptile Summary

This PR fixes aube install to anchor at the workspace root (when it has a package.json) rather than the nearest package.json, preventing workspace members from getting their own private lockfile, virtual store, and re-downloaded dependencies. The dlx command is also patched to pin its scratch dir as project_dir, preventing the new root-discovery logic from accidentally resolving upward into an enclosing workspace.

  • mise-tasks/ci/bats/nonserial: The debug loop drops --report-formatter junit / --output but still calls the upload with the now-unwritten report path; if the upload tool errors on a missing file, the CI step will fail even on a clean run. The break on first failure also silently skips all remaining test files in the shard.

Confidence Score: 4/5

Core install fix and dlx patch are safe to merge; the temporary CI script change carries a real risk of spurious upload failures that should be addressed before landing.

The Rust changes are correct and well-tested. Score is 4 rather than 5 because the nonserial CI script has a P1 issue: the JUnit report file is never written but the upload is still attempted unconditionally, which can flip a passing CI run to a failure exit code.

mise-tasks/ci/bats/nonserial — upload call references a report file that is no longer generated by the debug loop

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/aube/src/commands/install/mod.rs | Prefers workspace root (with a package.json guard) over the nearest package.json; mirrors the ensure_installed logic and is well commented |
| crates/aube/src/commands/dlx.rs | Explicitly pins project_dir to the scratch temp path so the new workspace-root discovery cannot resolve to an enclosing workspace during dlx installs |
| mise-tasks/ci/bats/nonserial | Temporary debug loop drops JUnit report generation but still attempts the upload, risking spurious CI failures; also breaks on first failure, hiding subsequent test results |
| test/workspace_member_install_walks_up.bats | Regression test for member-dir install; asserts no standalone artifacts appear in the member and that root state is byte-identical after a member install |

Comments Outside Diff (1)

  1. mise-tasks/ci/bats/nonserial, lines 46-51

    P1 JUnit report never written — upload will fail even on a clean run

    The new loop removes --report-formatter junit and --output "$report_dir", so $report_path is never created. On line 47 the upload is still called with that path; if the upload tool returns non-zero for a missing file (a reasonable default), upload_status becomes non-zero and exit "$upload_status" fires — failing the CI step even when every test passed. The report generation flags need to be added back to each bats invocation (or the upload call should be guarded with a [[ -f "$report_path" ]] check until the debug loop is reverted).
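    A minimal sketch of the suggested guard, reusing the `$report_path`/`$upload_status` names from the comment (the upload command itself is a hypothetical stand-in):

    ```bash
    # Guard the upload on report existence until the debug loop is reverted.
    if [[ -f "$report_path" ]]; then
      upload_junit "$report_path"   # hypothetical stand-in for the real upload
      upload_status=$?
    else
      echo "no JUnit report at $report_path; skipping upload" >&2
      upload_status=0
    fi
    exit "$upload_status"
    ```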


Reviews (5). Last reviewed commit: "DEBUG: surface hanging bats file in CI"

Two outdated comment threads on test/workspace_member_install_walks_up.bats.
jdx force-pushed the claude/hardcore-lumiere-a87670 branch from 67852db to c4b89ad on April 22, 2026 21:28
jdx changed the title from "test(install): pin failing repro for workspace-member install" to "cli: anchor aube install at workspace root from member subdir" on Apr 22, 2026
```rust
// anchor logic in `ensure_installed`.
match crate::dirs::find_workspace_root(&initial_cwd)
    .or_else(|| crate::dirs::find_project_root(&initial_cwd))
{
```

Production fix accidentally included in test-only PR

Medium Severity

The PR title (test(install): pin failing repro …) and description explicitly state this is a test-only change, with the actual fix being "out of scope for this PR." The unchecked test-plan item confirms the test is supposed to fail right now. However, the diff includes the production code change in install/mod.rs that swaps find_project_root for find_workspace_root().or_else(|| find_project_root()) — which is the fix itself. With the fix present the new bats test will pass, contradicting the stated intent of landing a pinned-failing regression test first.


Reviewed by Cursor Bugbot for commit c4b89ad.

jdx and others added 2 commits April 22, 2026 16:35
…e root

`aube install` from inside a workspace member walks up to the *nearest*
package.json (the member's own) instead of the workspace root, so it
runs the full standalone pipeline: own lockfile, own .aube/ virtual
store, own .aube-state — and re-downloads anything not already in the
global cache.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same fix as the auto-install anchor change: prefer `find_workspace_root`
over `find_project_root` so `aube install` from inside a workspace
member resolves up to the workspace root instead of treating the member
as a standalone project. Without this, the member install runs the full
pipeline (own lockfile, own .aube/ store, own .aube-state) and
re-downloads anything not already in the global cache.

Also addresses PR review feedback: drop the dead pnpm-lock.yaml check
(aube never writes that file) and switch the state snapshot comparison
from `[` to `assert_equal` for readable diffs on regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
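An illustrative bats-assert version of the comparison described in that commit; the test name, paths, and helper loading are assumptions, not copied from the actual test file:

```bash
#!/usr/bin/env bats
load test_helper # assumed helper that loads bats-support/bats-assert

@test "member install leaves root state byte-identical" {
  state_before="$(cat "$ROOT/.aube-state")"
  run bash -c "cd '$ROOT/packages/member' && aube install"
  assert_success
  state_after="$(cat "$ROOT/.aube-state")"
  # assert_equal prints both values on failure, unlike a bare [ x = y ].
  assert_equal "$state_after" "$state_before"
}
```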
jdx force-pushed the claude/hardcore-lumiere-a87670 branch from c4b89ad to ba0451c on April 22, 2026 21:37
jdx enabled auto-merge (squash) on April 22, 2026 21:46
jdx and others added 2 commits April 22, 2026 17:14
`dlx::run` chdirs into a scratch tempdir and invokes `install::run`,
which now (after the workspace-root anchor change) walks upward looking
for a workspace yaml or workspaces field. The scratch dir lives under
TMPDIR — which inside a workspace project resolves to the user's
workspace root, so the dlx install would target the wrong tree
(corrupting the user's workspace state, or in CI parallel bats runs
hanging while contending for a tree it has no business installing).

Set `project_dir` explicitly to the scratch path so install::run uses
that as cwd unconditionally and skips the walk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
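A sketch of the shape of that dlx change; everything beyond the `project_dir` field and the `install::run` call is an assumption:

```rust
use std::path::{Path, PathBuf};

// Assumed shape of InstallOptions; only the project_dir field is attested
// in the PR discussion.
struct InstallOptions {
    /// When set, install::run uses this directory as-is and skips the
    /// workspace/project walk-up entirely.
    project_dir: Option<PathBuf>,
}

// Stand-in for crate::commands::install::run.
fn install_run(_opts: InstallOptions) { /* elided */ }

fn dlx_install(scratch_dir: &Path) {
    // Pin the scratch temp project: without this, the walk-up could escape
    // TMPDIR and resolve into an enclosing user workspace.
    install_run(InstallOptions {
        project_dir: Some(scratch_dir.to_path_buf()),
    });
}
```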
Temporary: run shard files sequentially with a 90s per-file timeout so
the bats job kills the hanging test instead of running until the
6-hour GHA limit. Will revert after the hang is identified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jdx merged commit 98bf0e3 into main on Apr 22, 2026 (16 checks passed)
jdx deleted the claude/hardcore-lumiere-a87670 branch on April 22, 2026 22:29
cursor (Bot) left a comment
Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).


Reviewed by Cursor Bugbot for commit 4436eee.

```bash
rc=$?
echo "::endgroup::"
echo "BATS file failed or timed out: $tf (exit $rc)"
status=$rc
```

Exit code always zero after negated if condition

High Severity

`rc=$?` inside the then block of `if ! timeout ... bats ...; then` always captures 0, because `!` negates the pipeline's exit status before `$?` is read. This means `status` is never set to a nonzero value on a test failure, the log message always says (exit 0), and the script may exit successfully even when tests fail; CI could silently pass with broken tests, depending on whether the report upload also fails.
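One correct capture pattern, sketched against the quoted hunk (the 90s timeout and variable names follow the loop; everything else is assumed):

```bash
# Capture $? from the command itself rather than from a negated condition.
timeout 90 bats "$tf"
rc=$?                 # real exit status of `timeout ... bats`
echo "::endgroup::"
if [ "$rc" -ne 0 ]; then
  echo "BATS file failed or timed out: $tf (exit $rc)"
  status=$rc
fi
```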


Reviewed by Cursor Bugbot for commit 4436eee.

```bash
    break
  fi
  echo "::endgroup::"
done
```

Temporary debug CI code committed in unrelated PR

Medium Severity

The mise-tasks/ci/bats/nonserial changes are explicitly marked "DEBUG (TEMP)" and "Revert this change once the hang is identified." This removes parallel test execution, drops JUnit report generation (so the upload on line 47 targets a nonexistent file), and adds a break on first failure. None of this is mentioned in the PR description, which is solely about workspace-member install anchoring.


Reviewed by Cursor Bugbot for commit 4436eee.

jdx added a commit that referenced this pull request Apr 22, 2026
#217 was squash-merged with a DEBUG version of the bats nonserial script
that I had pushed to surface a hanging test by name. The debug loop
runs each file sequentially with a 90s per-file timeout — slow, fragile
(any legitimately-slow test trips the timeout), and breaks-on-first-
failure (so only one shard failure surfaces).

Restore the original parallel `bats --jobs N` invocation and add
`--no-parallelize-within-files`, mirroring the local `mise run test:bats`
task. The within-file flag is the actual fix for the hang #217 surfaced:
across-file parallelism alone fits comfortably on GHA runners; layering
within-file parallelism on top multiplies concurrent `aube install`
processes high enough to deadlock on FD limits / Verdaccio connections /
kernel I/O.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
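A sketch of the restored invocation this commit message describes; the job count, file list, and report directory variables are illustrative:

```bash
# Parallel across files, serial within each file; JUnit report for upload.
bats \
  --jobs "$jobs" \
  --no-parallelize-within-files \
  --report-formatter junit \
  --output "$report_dir" \
  "${test_files[@]}"
```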
jdx added a commit that referenced this pull request Apr 22, 2026
PR #217's workspace-root walk-up in install::run triggered a deadlock
in bats parallel shards 1+2 (and both macOS shards) under any
--jobs >= 2. Verified:

- --jobs=4 hangs indefinitely on shards 1+2 ubuntu and both macOS
- --jobs=2 hangs the same shards
- --jobs=1 passes (this commit)
- --no-parallelize-within-files alone does not help

Cannot reproduce locally on a 10-core M-series Mac with identical
flags, so this is GHA-runner-specific. Likely candidates: FD limits,
Verdaccio connection pool, or kernel I/O contention compounded by
the heavier install pipeline that the walk-up enables for tests
running from a workspace member.

Force --jobs=1 for nonserial shards as a fix-forward; across-shard
parallelism still works via the shard split, so wall-time stays
bounded by the slowest shard. Drop the unused --no-parallelize-
within-files and --parallel-binary-name flags now that --jobs is 1.

Follow-up: identify the deadlock with strace/lsof on a repro runner
and restore parallelism.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jdx added a commit that referenced this pull request Apr 22, 2026
#220)

## Summary

Found the actual root cause of the post-#217 bats parallel deadlock:
**#217's workspace-root walk-up in `install::run` interacted
catastrophically with the recursive `aube install` that `node-gyp`
bootstrap spawns.**

### The deadlock

1. [`node_gyp_bootstrap::bootstrap_blocking`](crates/aube/src/commands/install/node_gyp_bootstrap.rs:152) puts its scratch project at `$XDG_CACHE_HOME/aube/tools/node-gyp/v12/` and spawns a recursive `aube install` there.
2. Under bats, `HOME=$TEST_TEMP_DIR` and
`XDG_CACHE_HOME=$TEST_TEMP_DIR/.cache`. If the test has a
`pnpm-workspace.yaml` at `$TEST_TEMP_DIR/` (very common — workspace
tests, `allowBuilds`, lifecycle tests, etc.), then:
3. The recursive `aube install` starts at `tool_dir`, walks up via
`find_workspace_root` (added in #217), and finds the outer test's
`pnpm-workspace.yaml`.
4. The recursive process tries to install the outer workspace and blocks
on the outer project lock at `$TEST_TEMP_DIR/node_modules/` — the lock
the parent `aube install` is already holding.
5. **Two-process deadlock. Both wedge forever.**
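
A hypothetical two-process reduction of this cycle; the lock path, `fs2` locking, and command shape are illustrative, not aube's real implementation:

```rust
use std::fs::File;
use std::process::Command;
use fs2::FileExt; // advisory file locks; aube's real locking is not shown

fn parent_install(root: &std::path::Path) -> std::io::Result<()> {
    let lock = File::create(root.join("node_modules/.lock"))?;
    lock.lock_exclusive()?; // parent takes the outer project lock...
    // ...then a lifecycle hook bootstraps node-gyp, which spawns a recursive
    // `aube install`. The child walks up to the same workspace root, blocks
    // on the lock held above, and the parent blocks here waiting on it.
    Command::new("aube").arg("install").status()?;
    Ok(())
}
```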

### Why it only reproduced in GHA

- GHA runners hit the lifecycle-script path more densely under `--jobs
>= 2` because multiple parallel tests concurrently trigger postinstall
hooks.
- Dev machines usually have `node-gyp` already on PATH, so
[`ensure()`](crates/aube/src/commands/install/node_gyp_bootstrap.rs:88)
short-circuits and the recursive install never runs.

### Fix

Write an empty `aube-workspace.yaml` in `tool_dir` alongside the
synthetic `package.json`. `find_workspace_root`'s first-ancestor check
finds the marker at `tool_dir` and returns immediately — the walk no
longer escapes. `workspace_packages` stays empty so `has_workspace` is
false and the install runs as a single-package install.
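
A minimal sketch of the marker write described above, assuming it lands in the bootstrap's scratch-dir setup (error handling elided):

```rust
use std::fs;
use std::io;
use std::path::Path;

// An empty marker is enough: find_workspace_root stops at the first ancestor
// containing aube-workspace.yaml, so the walk never escapes tool_dir, and
// with no member globs the recursive install stays single-package.
fn pin_tool_dir(tool_dir: &Path) -> io::Result<()> {
    fs::write(tool_dir.join("aube-workspace.yaml"), "")
}
```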

## Test plan

- [x] `cargo clippy --all-targets -- -D warnings` — clean
- [x] Manual reasoning: the marker kills the walk at its first ancestor,
so the recursive install can no longer target the outer workspace,
breaking the lock-cycle
- [ ] CI green on shards that previously hung (this PR restores `--jobs`
parallelism in #219)

## Follow-up

Close #219 (was a CI-only workaround); the parallel-hang it papered over
is fixed here at the root.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Changes only the node-gyp bootstrap scratch-directory setup, adding a local workspace marker to avoid the workspace-root walk-up and the related lock deadlocks during recursive installs.
>
> **Overview**
> Prevents the recursive `aube install` used for `node-gyp` bootstrapping from walking up into an outer workspace (e.g., test fixtures with `pnpm-workspace.yaml`) and deadlocking on the parent install’s project lock.
>
> It does so by writing an empty `aube-workspace.yaml` into the node-gyp tool cache directory before spawning the recursive install, ensuring workspace-root discovery stops at `tool_dir` and the install runs as a single-package install.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 93b9e1f.</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jdx added a commit that referenced this pull request Apr 22, 2026
#217 was squash-merged with a debug version of `mise-tasks/ci/bats/nonserial`
(per-file `for` loop with 90s timeout, breaks on first failure, no
junit reporting). The actual deadlock that the debug loop was trying
to surface — node-gyp bootstrap recursing into the outer test
workspace — was fixed in #220. Restore the original `bats --jobs N`
parallel invocation with junit reporting now that the underlying
deadlock is gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jdx added a commit that referenced this pull request Apr 22, 2026
…ck (#219)

## Summary

#217 was squash-merged with a debug version of
`mise-tasks/ci/bats/nonserial` — a per-file `for` loop with a 90s
timeout, breaks-on-first-failure, no junit reporting. Main has been
running on that debug loop ever since.

#220 fixed the actual root cause of the bats parallel deadlock (node-gyp
bootstrap recursing back into the outer test workspace via #217's
walk-up). With that fix on main, the debug script is no longer needed.

This PR restores the original `bats --jobs N` parallel invocation with
junit reporting.

## Test plan

- [x] CI green proves it: shards 1+2 ubuntu and both macOS shards
previously hung indefinitely with `--jobs >= 2`. With #220 merged, the
deadlock is gone, so this PR's parallel invocation should land bats
wall-time back at ~2 min/shard.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes CI test execution from a per-file, timed loop back to a single parallel `bats` run, which may reintroduce hangs or alter failure/reporting behavior in CI.
>
> **Overview**
> Replaces the temporary debug per-file/timeout loop in `mise-tasks/ci/bats/nonserial` with a single `bats` invocation over the shard’s file list.
>
> The new run restores JUnit output (`--report-formatter junit`, `--output`, `BATS_REPORT_FILENAME`) and uses `--jobs`/`--parallel-binary-name` for parallel execution while still filtering out `serial` tests.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit cf4f390.</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
