Skip to content

fix(config): wait for background installs on dispose#7

Closed
Astro-Han wants to merge 2 commits into
devfrom
fix-dispose-background-installs
Closed

fix(config): wait for background installs on dispose#7
Astro-Han wants to merge 2 commits into
devfrom
fix-dispose-background-installs

Conversation

@Astro-Han

Copy link
Copy Markdown
Owner

Summary

Cherry-pick of commit ff8ec43b from the now-closed PR #3, isolated to the one change that actually fixes Windows CI red.

Root cause: On Windows, background config dependency installs share the global flock key config-install:win32. The install fibers outlive their originating Instance context, keep holding the lock after dispose, and the next test times out waiting for the leaked lock.

Fix: add an Effect.addFinalizer so Instance.disposeAll() waits (via Promise.allSettled) for background installs before releasing scope.

Why only this commit

PR #3 bundled 12 commits. Analysis showed only this one had a clear hypothesis → fix relationship. The rest were either:

  • Speculative (flock handoff microtask race — fabricated via module-rewrite test hooks)
  • Defensive without evidence (rm EBUSY retries with synthetic mocks, no real CI log showing EBUSY)
  • Test-infra chasing test-infra (9 commits patching flake introduced by earlier test additions)
  • Revert/restore thrash on a typecheck error the author was ping-ponging on

PR #3 CI was red for 15+ hours across 12 commits. This single commit on dev should reproduce the original hypothesis cleanly.

Product change

packages/opencode/src/config/config.ts: +4/-4. Removed iife wrapper, added finalizer that awaits deps: Promise<void>[].

Test change

packages/opencode/test/config/config.test.ts: +81. Windows-scenario test Instance.disposeAll waits for background config dependency installs to finish — asserts the second install starts only after the first completes, not before dispose returns.

Verification

  • bun test --timeout 30000 test/config/config.test.ts — 77/77 pass locally (macOS)
  • bun run typecheck — clean
  • Windows CI is the real validation; this PR's goal is to turn that green

Target failing tests on dev (Windows)

  • installs dependencies in writable OPENCODE_CONFIG_DIR — timeout 30s
  • dedupes concurrent config dependency installs for the same dir — timeout 30s
  • tool.registry > waits for config-scoped dependencies before importing local tools with bare imports — timeout 30s

All three share the config-install:win32 flock-leak root cause this PR addresses. tracks deleted files correctly in snapshot.test.ts is a separate failure not addressed here.

The dispose finalizer added in the prior commit waited via Promise.allSettled
for background config dependency installs to settle. On Windows the installs
serialize on the global config-install:win32 flock, so dispose would block for
the duration of every real Npm.install — exceeding the 30s test timeout and
cascading failures into subsequent beforeEach/afterEach hooks.

Wire an AbortController through installDependencies so the finalizer aborts
first and then drains. Installs queued on the flock bail immediately; only
the one currently inside Npm.install has to complete naturally, which is
usually one rather than the whole chain.

The existing InstallInput.signal support is reused — Flock.acquire already
honors the signal and throws on abort.
@Astro-Han Astro-Han force-pushed the fix-dispose-background-installs branch from 9c91182 to a421f8f Compare April 17, 2026 06:52
@Astro-Han

Copy link
Copy Markdown
Owner Author

Closing. Multi-reviewer crosscheck confirmed the fix is net-negative.

Why close (not iterate)

The motivating bug is not a current-dev blocker. PR #6's early-exit in installDependencies (if (hasDep && ignore && installed.every(Boolean)) return) means most test scenarios no longer trigger real Npm.install. Dev CI run #38 green corroborates. Run #39 on same test tree failed 3 unrelated tests — runner-level flake, not lock leak.

The implementation breaks teardown on Windows. The added Effect.addFinalizer in loadInstanceState awaits Promise.allSettled(deps). The AbortController added in the second commit only propagates to Flock.acquire — it does NOT reach Npm.install(dir) at config.ts:196, which calls Arborist.reify() (@npmcli/arborist@9.4.0) with no AbortSignal plumbing (verified by reading arborist source). Any install already past the outer flock blocks the finalizer for the full reify duration — ~30s on Windows CI — causing the beforeEach/afterEach hook timed out cascade we observed in runs #36 (16 fails) and #40 (18 fails).

The design conflicts with an existing opt-in pattern. Config.waitForDependencies() already exists (config.ts:1583) and callers that need it already use it (tool/registry.ts:184, plugin/index.ts:172). Making dispose auto-wait forces every test/path to pay this cost.

Tests are not reliable evidence here

Local tests (86/86 pass) mock Npm.install with an instant-returning promise, so they never exercise the real arb.reify() path that's the root of the Windows hang. Any future fix in this area needs either Windows-environment validation or a mock that simulates realistic install duration.

Residual flake

Dev run #38/#39 delta (same tree, one docs deletion) shows Windows runner-level flake affecting tests like loop waits while shell runs at 3s timeout. The right fix is a separate CI-level retry mechanism (bun test --rerun-each N inside a new PR), not any code change here.

Branch

fix-dispose-background-installs deleted on close.

@Astro-Han Astro-Han closed this Apr 17, 2026
@Astro-Han Astro-Han deleted the fix-dispose-background-installs branch April 17, 2026 07:33
Astro-Han added a commit that referenced this pull request Apr 27, 2026
… and test isolation

Combined response to inline review on PR #260, kept as one commit because
the test changes are tightly coupled to the production refactor.

Production:
- Route the global Claude Code fallback through Global.Path.home so
  OPENCODE_TEST_HOME (the codebase's existing test isolation pattern,
  see packages/shared/src/global.ts:24) makes the fallback deterministic
  in tests; os.homedir() is locked at process start in Bun.
- Expand InstructionSource union with a 'considered' variant so the
  diagnostic can report candidates that were checked but skipped by
  priority or absent (issue #230 acceptance #7 says 'considered, loaded,
  ignored', not just two states).
- Rewrite sources() to walk project files, the full global priority
  chain, config.instructions URLs, and the explicitly ignored Claude
  Code fallback. Closes the gap where sources() under-reported what
  system() actually loads (URLs were missing).
- Surface OPENCODE_DISABLE_CLAUDE_CODE_PROMPT as a separate ignore
  reason so opencode CLI users with that flag set also get a clear
  diagnostic.

Tests:
- Replace the brittle os.homedir() guard in the ignored-Claude test with
  OPENCODE_TEST_HOME + a tmpdir, removing the CI-environment-dependent
  early return CodeRabbit flagged.
- Add OPENCODE_TEST_HOME to the env capture/restore block.
- Add a regression test for the non-PawWork branch of the gate so an
  accidental inversion or future Runtime.isPawWork() change is caught.
- Add tests covering the new considered (priority-skipped global) and
  URL diagnostic entries.
Astro-Han added a commit that referenced this pull request Apr 27, 2026
Adds Instruction.sources() returning structured InstructionSource entries
so debug surfaces can explain which files the system loaded, which were
considered but skipped or absent, and which were intentionally ignored
(issue #230, acceptance #7).

The diagnostic walks the same logical sources system() does so prompt
and trace stay in lockstep:

- Project AGENTS.md / CLAUDE.md / CONTEXT.md priority chain.
- Full global priority chain (PAWWORK_CONFIG_DIR / OPENCODE_CONFIG_DIR /
  Global.Path.config) with first-existing as loaded and subsequent
  existing entries as considered with a priority-skipped reason; absent
  entries are considered with reason 'absent'.
- Local file entries from config.instructions, glob-resolved exactly as
  systemPaths() does (lines 159-174), reported as loaded or considered
  when the spec resolves to no files.
- Remote URL entries from config.instructions, reported as loaded on
  successful fetch and considered with reason on failure.
- Explicitly ignored ~/.claude/CLAUDE.md, with reason set to either the
  PawWork product baseline (Runtime.isPawWork()) or
  OPENCODE_DISABLE_CLAUDE_CODE_PROMPT depending on which gate suppressed
  it. Skipped when the path is already in loadedPaths to avoid double
  reporting.

Tests cover loaded project AGENTS.md, the priority-skipped global
considered branch, both URL outcomes via OPENCODE_CONFIG_CONTENT, the
local file path branch via PAWWORK_CONFIG_DIR + OPENCODE_DISABLE_PROJECT_CONFIG,
and the ignored Claude Code fallback with reason. All home-directory
isolation runs through OPENCODE_TEST_HOME so Bun's locked os.homedir()
doesn't leak the user's real home into the assertions.
Astro-Han added a commit that referenced this pull request Apr 27, 2026
Adds Instruction.sources() returning structured InstructionSource entries
so debug surfaces can explain which files the system loaded, which were
considered but skipped or absent, and which were intentionally ignored
(issue #230, acceptance #7).

The diagnostic walks the same logical sources system() does so prompt
and trace stay in lockstep:

- Project AGENTS.md / CLAUDE.md / CONTEXT.md priority chain.
- Full global priority chain (PAWWORK_CONFIG_DIR / OPENCODE_CONFIG_DIR /
  Global.Path.config) with first-existing as loaded and subsequent
  existing entries as considered with a priority-skipped reason; absent
  entries are considered with reason 'absent'.
- Local file entries from config.instructions, glob-resolved exactly as
  systemPaths() does (lines 159-174), reported as loaded or considered
  when the spec resolves to no files.
- Remote URL entries from config.instructions, reported as loaded on
  successful fetch and considered with reason on failure.
- Explicitly ignored ~/.claude/CLAUDE.md, with reason set to either the
  PawWork product baseline (Runtime.isPawWork()) or
  OPENCODE_DISABLE_CLAUDE_CODE_PROMPT depending on which gate suppressed
  it. Skipped when the path is already in loadedPaths to avoid double
  reporting.

Tests cover loaded project AGENTS.md, the priority-skipped global
considered branch, both URL outcomes via OPENCODE_CONFIG_CONTENT, the
local file path branch via PAWWORK_CONFIG_DIR + OPENCODE_DISABLE_PROJECT_CONFIG,
and the ignored Claude Code fallback with reason. All home-directory
isolation runs through OPENCODE_TEST_HOME so Bun's locked os.homedir()
doesn't leak the user's real home into the assertions.
Astro-Han added a commit that referenced this pull request Apr 27, 2026
Adds Instruction.sources() returning structured InstructionSource entries
so debug surfaces can explain which files the system loaded, which were
considered but skipped or absent, and which were intentionally ignored
(issue #230, acceptance #7).

The diagnostic walks the same logical sources system() does so prompt
and trace stay in lockstep:

- Project AGENTS.md / CLAUDE.md / CONTEXT.md priority chain.
- Full global priority chain (PAWWORK_CONFIG_DIR / OPENCODE_CONFIG_DIR /
  Global.Path.config) with first-existing as loaded and subsequent
  existing entries as considered with a priority-skipped reason; absent
  entries are considered with reason 'absent'.
- Local file entries from config.instructions, glob-resolved exactly as
  systemPaths() does (lines 159-174), reported as loaded or considered
  when the spec resolves to no files.
- Remote URL entries from config.instructions, reported as loaded on
  successful fetch and considered with reason on failure.
- Explicitly ignored ~/.claude/CLAUDE.md, with reason set to either the
  PawWork product baseline (Runtime.isPawWork()) or
  OPENCODE_DISABLE_CLAUDE_CODE_PROMPT depending on which gate suppressed
  it. Skipped when the path is already in loadedPaths to avoid double
  reporting.

Tests cover loaded project AGENTS.md, the priority-skipped global
considered branch, both URL outcomes via OPENCODE_CONFIG_CONTENT, the
local file path branch via PAWWORK_CONFIG_DIR + OPENCODE_DISABLE_PROJECT_CONFIG,
and the ignored Claude Code fallback with reason. All home-directory
isolation runs through OPENCODE_TEST_HOME so Bun's locked os.homedir()
doesn't leak the user's real home into the assertions.
Astro-Han added a commit that referenced this pull request Apr 27, 2026
…nd ignored chains

Adds Instruction.sources() returning structured InstructionSource entries
so debug surfaces can explain which files the system loaded, which were
considered but skipped or absent, and which were intentionally ignored
(issue #230, acceptance #7). This commit covers the project priority
chain, the global priority chain, and the explicitly ignored
~/.claude/CLAUDE.md; config.instructions handling lands in a follow-up.

Marks a file as loaded only after read() returns non-empty content so
diagnostics agree with what system() actually reads into the prompt.
Files whose content reads back empty are downgraded to considered with
reason 'file is empty or unreadable'.

The ignore reason distinguishes Runtime.isPawWork() (PawWork product
baseline) from OPENCODE_DISABLE_CLAUDE_CODE_PROMPT (legacy opencode CLI
opt-out), and dedupes against loadedPaths so a file never appears as
both loaded and ignored.

Tests cover loaded project AGENTS.md, the priority-skipped global
considered branch, the project priority chain (AGENTS.md + CLAUDE.md),
the empty-file downgrade, and the ignored Claude Code fallback with
reason. All home-directory isolation runs through OPENCODE_TEST_HOME.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant