Handle the hard link limit gracefully instead of failing #17699
EliteTK merged 8 commits into astral-sh:main
|
Before making a change, we need a clearer description of the problem. Can you share instructions on how to reproduce the problem, as well as the error message you received and the logs? Why do we need an extra option for this? Can we handle this better internally? |
As explained in the description you can reproduce this by using EFS and installing a package in more than 177 python environments. That's probably the easiest way.
You would get an error like
I think we can detect the os error and handle it automatically, but I wasn't very fond of introducing an install bottleneck without the user being aware of it. Adding an explicit option seemed to ensure that the user was aware the bottleneck was there. But I'm happy to change to automatic handling of the error if you feel it's an approach more aligned to |
|
Rather than having the user figure out the hard link limit and specify it themselves, we should just handle Moreover, I believe we don't lock the cache for exclusive access in these cases, so there's a chance that two instances of uv end up copying the file at the same time. But the way this is written, this won't cause any corruption, just a bit of unnecessary work. |
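Detecting the condition automatically could look roughly like the following. This is a minimal sketch, not the code from this PR: the helper name `is_hard_link_limit` is hypothetical, and the numeric codes are assumptions (31 for `EMLINK` on Linux and macOS, 1142 for `ERROR_TOO_MANY_LINKS` on Windows).

```rust
use std::io;

/// Assumed errno for EMLINK ("too many links") on Linux and macOS.
const EMLINK: i32 = 31;
/// Assumed Windows error code for ERROR_TOO_MANY_LINKS.
const ERROR_TOO_MANY_LINKS: i32 = 1142;

/// Hypothetical helper: returns true when `err` looks like the
/// file system refusing to create one more hard link to a file.
pub fn is_hard_link_limit(err: &io::Error) -> bool {
    match err.raw_os_error() {
        // On Windows the raw code is a Win32 error code.
        Some(code) if cfg!(windows) => code == ERROR_TOO_MANY_LINKS,
        // Elsewhere the raw code is an errno value.
        Some(code) => code == EMLINK,
        // Errors not produced by the OS carry no raw code.
        None => false,
    }
}
```

A caller would check the error returned by `std::fs::hard_link` with this predicate and only fall back when it matches, so unrelated failures still surface to the user.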
Sounds good! I'll move the implementation in that direction and I'll create a script to reproduce the issue to verify it, as it's not as easy as before to reproduce the condition at that point. |
|
The test is flagged as passed in the CI, but I don't know if we have a way to see if it was skipped or not because the logging isn't shown. PS: On my system I know it's not skipped and works as expected; I can see that via the logging. |
|
@EliteTK anything I can do to help with this PR? I have to manually clean up links on one of our systems once in a while when the deploy fails, so I'd love to see this one go in |
|
@amol- Sorry, I just hadn't gotten around to re-reviewing it yet. But I should be able to do that next week. |
|
@EliteTK thanks for reviewing this! Any updates on when you think you will have some bandwidth to finish taking a look? |
EliteTK
left a comment
Are there any other places where we make a hardlink where we should instead be calling this?
|
@EliteTK I still plan to move this forward. I'm just stuck with limited internet connectivity until next week. Then I'll address your suggestions |
|
Moving this back to a draft until I finish addressing review comments |
d78580e to 8312db4
fb47ea2 to 9aa6873
- name: "Create minix filesystem (low hardlink limit)"
  run: |
    truncate -s 16M /tmp/minix.img
    mkfs.minix /tmp/minix.img
I went for minix because it has a limit of 250 hard links, which makes the test faster to run
|
@EliteTK I rebased against main, addressed your comments and made the test run faster and more predictably via a dedicated file system. It should now be ready for re-review |
|
This is looking pretty great actually! Thanks. I'll try to review this properly soonish (probably Monday at this rate) but just a quick thing I noticed. We should still set |
Added the setup for Windows too, the CI seems to confirm it worked -> https://github.com/astral-sh/uv/actions/runs/22497686685/job/65176453709?pr=17699#step:11:1597 + https://github.com/astral-sh/uv/actions/runs/22497686685/job/65176453709?pr=17699#step:11:31 |
EliteTK
left a comment
Looks good, there's a couple of small things. I can probably just nip them myself tomorrow, but feel free to address them yourself.
crates/uv-fs/src/link.rs
Outdated
    "Resetting cache entry due to too many hardlinks: {}",
    src.display()
);
let parent = src.parent().ok_or_else(|| {
Should probably also handle src.parent() returning an empty path (indicating it was passed a single component relative path).
I'm a bit confused by this one, I'd expect that the installed file is always in a package, like /home/user/.cache/uv/.../package-1.0/file.py so not clear when src.parent() could not exist.
But I guess it won't cause any harm to handle an empty parent.
c72de08 should handle both the case of invalid parents (like a parent for /) and the case of missing parents. Both will lead to "current directory"
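The two cases discussed here can be illustrated with a small sketch. This is not the code from c72de08; the helper name `parent_or_current_dir` is hypothetical. It relies on `Path::parent` returning `None` for a root such as `/` and an empty path for a single-component relative path such as `file.py`.

```rust
use std::path::Path;

/// Hypothetical helper: returns the directory that contains `src`,
/// falling back to the current directory when there is no usable parent.
pub fn parent_or_current_dir(src: &Path) -> &Path {
    match src.parent() {
        // A real, non-empty parent such as `pkg` for `pkg/file.py`.
        Some(parent) if !parent.as_os_str().is_empty() => parent,
        // Either `None` (e.g. `/`) or an empty parent (e.g. `file.py`).
        _ => Path::new("."),
    }
}
```

Treating both cases the same way keeps the caller free of a separate error path for inputs that, as noted above, are not expected from the cache anyway.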
Co-authored-by: Tomasz Kramkowski <tomasz@kramkow.ski>
|
@EliteTK it smells like the CI failures are transient errors, do you have a chance to rerun the CI? I don't have the option available |
|
my last commit yesterday restarted the tests, but they failed again with 401 on requests to setup build infrastructure, not sure if there is a way I can intervene |
|
Those are apparently because as an external contributor your CI runs don't get the creds required for those... So you can just ignore them, I guess once this hits main if there is an issue then it can be reverted. Other than that, PR looks good. I was wrong about needing to copy the permissions, thought the code wrote into the tempfile rather than overwriting with I didn't want to waste your time on these last couple of things so I pushed a commit. Will merge this once the rest of CI passes. |
|
@EliteTK thanks for all the support in getting this one through. I really appreciated your help! |
|
@amol- thanks for the contribution :) |
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [uv](https://github.com/astral-sh/uv) | patch | `0.10.7` → `0.10.9` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>astral-sh/uv (uv)</summary>

### [`v0.10.9`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0109)

[Compare Source](astral-sh/uv@0.10.8...0.10.9)

Released on 2026-03-06.

##### Enhancements

- Add `fbgemm-gpu`, `fbgemm-gpu-genai`, `torchrec`, and `torchtune` to the PyTorch list ([#18338](astral-sh/uv#18338))
- Add torchcodec to PyTorch List ([#18336](astral-sh/uv#18336))
- Log the duration we took before erroring ([#18231](astral-sh/uv#18231))
- Warn when using `uv_build` settings without `uv_build` ([#15750](astral-sh/uv#15750))
- Add fallback to `/usr/lib/os-release` on Linux system lookup failure ([#18349](astral-sh/uv#18349))
- Use `cargo auditable` to include SBOM in uv builds ([#18276](astral-sh/uv#18276))

##### Configuration

- Add an environment variable for `UV_VENV_RELOCATABLE` ([#18331](astral-sh/uv#18331))

##### Performance

- Avoid toml `Document` overhead ([#18306](astral-sh/uv#18306))
- Use a single global workspace cache ([#18307](astral-sh/uv#18307))

##### Bug fixes

- Continue on trampoline job assignment failures ([#18291](astral-sh/uv#18291))
- Handle the hard link limit gracefully instead of failing ([#17699](astral-sh/uv#17699))
- Respect build constraints for workspace members ([#18350](astral-sh/uv#18350))
- Revalidate editables and other dependencies in scripts ([#18328](astral-sh/uv#18328))
- Support Python 3.13+ on Android ([#18301](astral-sh/uv#18301))
- Support `cp3-none-any` ([#17064](astral-sh/uv#17064))
- Skip tool environments with broken links to Python on Windows ([#17176](astral-sh/uv#17176))

##### Documentation

- Add documentation for common marker values ([#18327](astral-sh/uv#18327))
- Improve documentation on virtual dependencies ([#18346](astral-sh/uv#18346))

### [`v0.10.8`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0108)

[Compare Source](astral-sh/uv@0.10.7...0.10.8)

Released on 2026-03-03.

##### Python

- Add CPython 3.10.20
- Add CPython 3.11.15
- Add CPython 3.12.13

##### Enhancements

- Add Docker images based on Docker Hardened Images ([#18247](astral-sh/uv#18247))
- Add resolver hint when `--exclude-newer` filters out all versions of a package ([#18217](astral-sh/uv#18217))
- Configure a real retry minimum delay of 1s ([#18201](astral-sh/uv#18201))
- Expand `uv_build` direct build compatibility ([#17902](astral-sh/uv#17902))
- Fetch CPython from an Astral mirror by default ([#18207](astral-sh/uv#18207))
- Download uv releases from an Astral mirror in installers by default ([#18191](astral-sh/uv#18191))
- Add SBOM attestations to Docker images ([#18252](astral-sh/uv#18252))
- Improve hint for installing meson-python when missing as build backend ([#15826](astral-sh/uv#15826))

##### Configuration

- Add `UV_INIT_BARE` environment variable for `uv init` ([#18210](astral-sh/uv#18210))

##### Bug fixes

- Prevent `uv tool upgrade` from installing excluded dependencies ([#18022](astral-sh/uv#18022))
- Promote authentication policy when saving tool receipts ([#18246](astral-sh/uv#18246))
- Respect exclusions in scripts ([#18269](astral-sh/uv#18269))
- Retain default-branch Git SHAs in `pylock.toml` files ([#18227](astral-sh/uv#18227))
- Skip installed Python check for URL dependencies ([#18211](astral-sh/uv#18211))
- Respect constraints during `--upgrade` ([#18226](astral-sh/uv#18226))
- Fix `uv tree` orphaned roots and premature deduplication ([#17212](astral-sh/uv#17212))

##### Documentation

- Mention cooldown and tweak inline script metadata in dependency bots documentation ([#18230](astral-sh/uv#18230))
- Move cache prune in GitLab to `after_script` ([#18206](astral-sh/uv#18206))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
Summary
Handle the case where too many hardlinks were created and thus installing packages fails.
There are cases where the file system has a hard link limit, and when it's hit uv fails.
For example, AWS EFS has a limit of 177 hard links ( https://docs.aws.amazon.com/efs/latest/ug/troubleshooting-efs-fileop-errors.html#hardlinkerror ).
This PR addresses this by resetting the hardlinks when the limit is reached (it does this by replacing the file in the cache, so new hardlinks can be made).
There can be race conditions over the limit, but those are ok, as the links are reset atomically so in case of race conditions the worst case will be that the limit is reset twice, but nothing will break.
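The reset described in this summary can be sketched as follows. This is an illustration under stated assumptions, not the implementation in `crates/uv-fs/src/link.rs`: the function names `reset_link_count` and `hard_link_or_reset`, the temporary file naming, and the errno value 31 for `EMLINK` (Linux and macOS) are all hypothetical choices made for the example.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;

/// Assumed errno for "too many links" (EMLINK) on Linux and macOS.
const TOO_MANY_LINKS: i32 = 31;

/// Hypothetical helper: replaces `src` with a fresh copy of itself so the
/// path points at a new file whose link count starts over.
pub fn reset_link_count(src: &Path) -> io::Result<()> {
    let name = src.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "source has no file name")
    })?;
    // The temporary file lives next to `src` so the rename stays on one
    // file system; the process id keeps concurrent processes apart.
    let mut tmp_name = OsString::from(".");
    tmp_name.push(name);
    tmp_name.push(format!(".{}.tmp", std::process::id()));
    let tmp = src.with_file_name(tmp_name);

    // Copy first, then rename over the original in a single step.
    let result = fs::copy(src, &tmp).and_then(|_| fs::rename(&tmp, src));
    if result.is_err() {
        // Best-effort cleanup; the original file is left untouched.
        let _ = fs::remove_file(&tmp);
    }
    result
}

/// Hypothetical helper: hard links `src` to `dst`, resetting `src` and
/// retrying once when the file system reports too many links.
pub fn hard_link_or_reset(src: &Path, dst: &Path) -> io::Result<()> {
    match fs::hard_link(src, dst) {
        Err(err) if err.raw_os_error() == Some(TOO_MANY_LINKS) => {
            reset_link_count(src)?;
            fs::hard_link(src, dst)
        }
        other => other,
    }
}
```

If two processes hit the limit at the same moment, each one copies to its own temporary file and renames it over `src`; the later rename wins and both copies have the same content, which is the "reset twice, but nothing will break" case described above.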
Test Plan
Add an install_hardlink_after_emlink function; it successfully reproduced the issue on my system. The issue is that it's expensive to run, as it has to generate a lot of hardlinks.
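The cost comes from having to create links until the file system refuses. A minimal sketch of that step, assuming a hypothetical helper named `exhaust_hard_links` (this is not the test added in the PR):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: creates up to `max` hard links to `src` inside
/// `dir` and returns how many succeeded, plus the first error if any.
pub fn exhaust_hard_links(src: &Path, dir: &Path, max: usize) -> (usize, Option<io::Error>) {
    for i in 0..max {
        let dst = dir.join(format!("link-{i}"));
        if let Err(err) = fs::hard_link(src, &dst) {
            // Stop at the first failure, e.g. when the limit is reached.
            return (i, Some(err));
        }
    }
    (max, None)
}
```

On a file system with a low limit the loop would stop early with an error, after which an install that links the same cached file should exercise the fallback; on file systems with a high limit, `max` bounds the work.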