Skip to content

fix(shim): fix race condition when removing in make_shim, when multiple plugins provide the same shim#8947

Merged
jdx merged 1 commit intojdx:mainfrom
brander-john:fix/shim-concurrent-plugin-race
Apr 6, 2026
Merged

fix(shim): fix race condition when removing in make_shim, when multiple plugins provide the same shim#8947
jdx merged 1 commit intojdx:mainfrom
brander-john:fix/shim-concurrent-plugin-race

Conversation

@brander-john
Copy link
Copy Markdown
Contributor

Problem

When multiple tools are installed that resolve to different backends but provide overlapping shims (e.g. nodejs via the default registry and asdf:nodejs via an explicit asdf backend), reshim spawns concurrent async tasks — one per plugin — that each call make_shim for the same output path. Function make_shim unconditionally removes the existing shim before writing the new one. The tasks race; one removes the file successfully, and the other fails with No such file or directory on the same remove.

This is most commonly triggered when a monorepo or multi-repo setup mixes a project using default backends with one using explicit asdf: aliases, causing both the default plugin and the asdf plugin to be active concurrently and both walk their shims/ directories.

Reproduction

Repo A (~/projects/example-1/mise.toml):
[tools]
nodejs = '22.19.0'
python = '3.11.13'

Repo B (~/projects/example-2/mise.toml):
[tools]
'asdf:nodejs' = '22.19.0'
'asdf:python' = '3.11.13'

[settings]
disable_backends = ['aqua', 'cargo', 'go', 'pipx', 'ubi', 'vfox']
disable_default_registry = true
idiomatic_version_file = false

[alias]
nodejs = 'asdf:https://github.com/asdf-vm/asdf-nodejs.git#fd2c7f94c4c71416047cacf2af6e4fa819b364b7'
python = 'asdf:https://github.com/asdf-community/asdf-python.git#5e277e24ec2b6739728f458c1c25eeb2a8b8bb79'

With both repos' tools installed, run:
mise reshim (or mise install)

Before fix — intermittent failure (timing-dependent):
Error:
0: failed to rebuild shims
1: failed rm: ~/.local/share/mise/shims/pip
2: No such file or directory (os error 2)

Location:
src/file.rs:101

After fix — reshim completes successfully regardless of task scheduling order.

Fix

Replace remove_file_async (fails on NotFound) with remove_file_async_if_exists (treats NotFound as success) in make_shim. The concurrent tasks still race to remove the file, but the one that loses the race now silently succeeds rather than erroring. Also removes the now-unused remove_file_async function.

Caveat

There is still a potential race condition from make_executable_async if it tries to run between the time a separate process deletes the shim and recreates it again. From my testing this was much less likely to hit but is still possible. You might be able to combine writing the file and setting it executable. Another potential fix would be to refactor the way make_shim is called, and build a de-duplicated map of the shims to make but that seemed like a larger refactor.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces remove_file_async_if_exists in src/file.rs to handle file removal gracefully when the file is missing, and updates src/shims.rs to use this new function. Feedback highlights a remaining race condition between file writing and permission setting in concurrent tasks, suggesting that de-duplicating shim paths before processing would be a more robust solution.

Comment thread src/shims.rs
if shim.exists() {
file::remove_file_async(shim).await?;
}
file::remove_file_async_if_exists(shim).await?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While this change prevents the crash during the removal phase, a race condition still exists between write_async (line 556) and make_executable_async (line 569). If another task removes the shim (via line 555) after it has been written by this task but before permissions are set, make_executable_async will fail with a NotFound error.

As noted in the PR description's caveat, a more robust fix would be to de-duplicate the shims before spawning these concurrent tasks in reshim. This would not only eliminate the race condition but also improve efficiency by avoiding redundant filesystem operations for the same shim path. If implementing de-duplication with itertools::unique_by, remember that the closure must return an owned value to avoid lifetime errors.

References
  1. The key returned by the closure in itertools::unique_by must be an owned value. Returning a borrowed value (a reference) will cause lifetime errors because the key needs to outlive the closure call.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 6, 2026

Greptile Summary

This PR fixes a TOCTOU race in make_shim where concurrent async tasks (one per plugin, via JoinSet) both attempt to remove and recreate the same shim file — the task that loses the remove race errors with No such file or directory. The fix replaces the check-then-remove pattern (if shim.exists() { remove_file_async }) with an unconditional idempotent remove_file_async_if_exists call, and includes unit tests for both the file-present and file-absent cases.

Confidence Score: 5/5

Safe to merge — the primary crash is fixed and only a minor residual race window (P2) remains.

All findings are P2 or lower. The fix is minimal, correct, and well-tested. The residual race is acknowledged by the author and has no production-blocking impact.

No files require special attention.

Important Files Changed

Filename Overview
src/file.rs Renames remove_file_async to remove_file_async_if_exists, silently succeeding on NotFound; adds two unit tests covering the file-present and file-absent cases.
src/shims.rs Drops the TOCTOU if shim.exists() guard and calls remove_file_async_if_exists unconditionally, fixing the concurrent NotFound crash when multiple plugins race to recreate the same shim.

Sequence Diagram

sequenceDiagram
    participant A as Task A (plugin-1 shims/node)
    participant FS as Filesystem
    participant B as Task B (plugin-2 shims/node)

    Note over A,B: Before fix — TOCTOU race
    A->>FS: shim.exists()? → true
    B->>FS: shim.exists()? → true
    A->>FS: remove_file_async(shim) → OK
    B->>FS: remove_file_async(shim) → NotFound error ❌

    Note over A,B: After fix — idempotent removal
    A->>FS: remove_file_async_if_exists(shim) → OK
    B->>FS: remove_file_async_if_exists(shim) → OK (NotFound silently ignored ✓)
    A->>FS: write_async(shim, content) → OK
    B->>FS: write_async(shim, content) → OK (same content)
    A->>FS: make_executable_async(shim) → OK
    B->>FS: make_executable_async(shim) → OK
Loading

Comments Outside Diff (1)

  1. src/shims.rs, line 554-576 (link)

    P2 Residual interleave window between write_async and make_executable_async

    The removal is now idempotent, which fixes the primary crash. However, a narrower race window still exists: if Task B's remove_file_async_if_exists fires after Task A's write_async but before Task A's make_executable_async, Task A's chmod call will operate on a missing file. The end state on disk is correct (one valid shim), but a transient error is still possible. A follow-up improvement would be to write to a temp file and rename(2) into place (atomic on POSIX), or to deduplicate shim paths before spawning tasks so the same path is never processed concurrently.

Reviews (1): Last reviewed commit: "fix(shim): fix race condition when remov..." | Re-trigger Greptile

@jdx jdx merged commit 8880583 into jdx:main Apr 6, 2026
35 checks passed
mise-en-dev added a commit that referenced this pull request Apr 7, 2026
### 🚀 Features

- **(config)** Add Tera template support to miserc.toml by
@richardthe3rd in [#8867](#8867)

### 🐛 Bug Fixes

- **(env)** include tools-only redactions in `mise env --redacted` by
@jakedgy in [#8956](#8956)
- **(env)** pass dependency env to vfox backend plugin hooks by
@cprecioso in [#8952](#8952)
- **(shim)** fix race condition when removing in make_shim, when
multiple plugins provide the same shim by @brander-john in
[#8947](#8947)
- **(spm)** derive API URL from host for self-hosted instances by
@ThomasDutartre in [#8955](#8955)
- **(task)** resolve env vars in usage tera templates when flags are
provided by @jdx in [#8957](#8957)

### 📚 Documentation

- **(python)** clarify attestation settings must be under [settings] in
mise.toml by @fru1tworld in
[#8939](#8939)

### 📦 Registry

- added sing-box by @tony-sol in
[#8944](#8944)

### Chore

- **(ci)** remove auto-draft PR workflow by @jdx in
[#8945](#8945)

### New Contributors

- @ThomasDutartre made their first contribution in
[#8955](#8955)
- @jakedgy made their first contribution in
[#8956](#8956)
- @brander-john made their first contribution in
[#8947](#8947)
- @fru1tworld made their first contribution in
[#8939](#8939)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants