-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Guard against concurrent cache writes on Windows #11007
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d69d88b to
3548f59
Compare
|
I'll fix up the tests on approval -- they're just count offsets due to writing more files to the cache (sadly). |
|
Can you add the script you used to provoke the error and check that it's test? Otherwise we don't have a regression test should we ever have to revisit this code. |
| Connectivity::Offline => CacheControl::AllowStale, | ||
| }; | ||
|
|
||
| // Acquire an advisory lock, to guard against concurrent writes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you extend the comment explaining the [cfg(windows)]?
| /// On Windows, this uses the `junction` crate to create a junction point. The | ||
| /// operation is _not_ atomic, as we first delete the junction, then create a | ||
| /// junction at the same path. | ||
| /// | ||
| /// Note that because junctions are used, the source must be a directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do i read this comment correctly that there is no way for us to make this atomic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can try, but making it atomic doesn't solve the problem. We could still run into issues whereby we attempt to overwrite the file while it's open, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's good to know, i didn't think of that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's somewhat tragic. I think we should still try to make this atomic though.
Sorry, do you mean, add a test to the codebase itself? The script I used is in the PR summary. |
|
Sorry, totally overlooked that. Is it possible to port this to an integration test, or alternatively could we add that script to |
3548f59 to
8796e3f
Compare
8796e3f to
b749ee4
Compare
|
@konstin -- I'm just gonna leave it in here for now. I think this is ~as good as putting it in |
This MR contains the following updates: | Package | Update | Change | |---|---|---| | [astral-sh/uv](https://github.com/astral-sh/uv) | patch | `0.5.24` -> `0.5.25` | MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot). **Proposed changes to behavior should be submitted there as MRs.** --- ### Release Notes <details> <summary>astral-sh/uv (astral-sh/uv)</summary> ### [`v0.5.25`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0525) [Compare Source](astral-sh/uv@0.5.24...0.5.25) ##### Enhancements - Allow installation of manylinux wheels on loongarch64 ([#​10927](astral-sh/uv#10927)) - Allow optional `=` for editables in `requirements.txt` ([#​10954](astral-sh/uv#10954)) - Add Windows aarch64 to the release binaries ([#​10885](astral-sh/uv#10885)) ##### Bug fixes - Use spec-compliant (`128+n`) exit codes for `uv run` and `uv tool run` on Unix ([#​10781](astral-sh/uv#10781)) - Fix best-interpreter lookups when there is an invalid interpreter in the `PATH` ([#​11030](astral-sh/uv#11030)) - Guard against concurrent cache writes on Windows ([#​11007](astral-sh/uv#11007)) - Prioritize package preferences with greater package versions ([#​10963](astral-sh/uv#10963)) - Reject `--editable` flag on non-directory requirements ([#​10994](astral-sh/uv#10994)) - Respect `--no-sources` for `uv pip install` workspace discovery ([#​11003](astral-sh/uv#11003)) - Set `JEMALLOC_SYS_WITH_LG_PAGE=16` in ARM Docker builds ([#​10943](astral-sh/uv#10943)) - Update `riscv64` Python downloads to allow install on `riscv64gc` ([#​10937](astral-sh/uv#10937)) - Fix file persist retries on Windows ([#​11008](astral-sh/uv#11008)) - Fix incorrect error message when specifying `tool.uv.sources.(package).workspace` with other options ([#​11013](astral-sh/uv#11013)) - Improve SIGINT handling in `uv run` ([#​11009](astral-sh/uv#11009)) ##### Documentation - Add `SECURITY` policy ([#​11035](astral-sh/uv#11035)) - Add `Requires-Python` upper bound behavior to the docs ([#​10964](astral-sh/uv#10964)) - Add a troubleshooting section and reproducible example guide ([#​10947](astral-sh/uv#10947)) - Add documentation for `uv add -r` ([#​10926](astral-sh/uv#10926)) - Amend `requires-python` rules in resolver documentation ([#​10993](astral-sh/uv#10993)) - Reference workspaces in `--no-sources` documentation ([#​10995](astral-sh/uv#10995)) - Update documentation for activating virtual environments in different shell ([#​11000](astral-sh/uv#11000)) - Add Docker SHA pinning tip ([#​10955](astral-sh/uv#10955)) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this MR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box --- This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4xMzcuMiIsInVwZGF0ZWRJblZlciI6IjM5LjEzNy4yIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJSZW5vdmF0ZSBCb3QiXX0=-->
* main: (53 commits) Shorten "Using existing Python versions" nav item so it fits on one line (astral-sh#11077) docs: suggest copy linking for GitLab integration guide (astral-sh#11067) Refactor `uv tool run` hint into separate function (astral-sh#11069) Fix typo in no-deps docs/comments/cli description (astral-sh#11073) Allow `--no-dev --invert` in `uv tree` (astral-sh#11068) Add docs for signal handling (astral-sh#11041) Add a bit more context about SIGTERM and PID 1 (astral-sh#11036) Reflow CLI documentation comments (astral-sh#11040) doc typo: unnecessary backslashes to represent brackets in markdown (astral-sh#11059) Update Dependabot links (astral-sh#11054) Document `gather_credentials` (astral-sh#11024) Link to our MRE documentation in the issue template (astral-sh#11045) Avoid sharing state between universal and non-universal resolves (astral-sh#11051) Mark metadata as dynamic when reading from built wheel cache (astral-sh#11046) Fix formatting of `RUST_LOG` documentation (astral-sh#10053) Bump version to 0.5.25 (astral-sh#11042) Add CVE disclosure to security policy (astral-sh#11037) Guard against concurrent cache writes on Windows (astral-sh#11007) Add SECURITY policy (astral-sh#11035) Improve SIGINT handling in `uv run` (astral-sh#11009) ...
This commit addresses the remaining concurrency issues on Windows when multiple processes try to install the same package concurrently. The previous fix in PR astral-sh#11007 added locks to cache operations, but the issue persisted because the synchronized_copy function only locked the destination directory. When multiple processes try to install the same package, one process may be reading from the archive cache while another is still writing to it, causing 'file is being used by another process' errors on Windows. This fix adds source directory locking on Windows in the synchronized_copy function to prevent concurrent reads while another process is writing. Fixes astral-sh#11002
astral-sh#11007) This branch is based on commit 321f8cc, which is the parent commit of PR astral-sh#11007 that added cache locking improvements. Testing this historical baseline will prove that the concurrent cache access issue existed BEFORE the fix was implemented. Test configuration: - 40 concurrent jobs - 3 iterations with cache clearing between each - Fail-fast on first failure Expected result: Test should FAIL with concurrent cache access errors, proving the issue was real and needed the fix.
Summary
On Windows, we have a lot of issues with atomic replacement and such. There are a bunch of different failure modes, but they generally involve: trying to persist a fail to a path at which the file already exists, trying to replace or remove a file while someone else is reading it, etc.
This PR adds locks to all of the relevant database paths. We already use these advisory locks when building source distributions; now we use them when unzipping wheels, storing metadata, etc.
Closes #11002.
Test Plan
I ran the following script:
And ensured it succeeded in five straight invocations (whereas on
main, it consistently fails with a variety of different traces).