fix(install): limit preparer concurrency to prevent file handle exhaustion #17633
denyszhak wants to merge 3 commits into astral-sh:main
Conversation
Not sure why the test job is failing; it seems unrelated to the change at first glance.
Yep, the failure is unrelated: #17637
Interesting, I thought we were already using limits early but it seems we aren't. CC @charliermarsh for the preparer code. |
crates/uv-dispatch/src/lib.rs
Outdated
    sources: NoSources,
    workspace_cache: WorkspaceCache,
    concurrency: Concurrency,
    source_distribution_limiter: Arc<Semaphore>,
We should really handle this better globally: #18156
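For context, the pattern in the snippet above is bounding FD-heavy work with a shared counting semaphore. Here is a minimal standalone sketch of that idea, using a hand-rolled Mutex+Condvar semaphore and threads rather than the tokio::sync::Semaphore and async tasks the actual code uses; the run_capped helper and its bookkeeping are purely illustrative:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Minimal counting semaphore (Mutex + Condvar). Illustrative only:
/// the PR itself uses tokio::sync::Semaphore in async code.
pub struct Semaphore {
    permits: Mutex<usize>,
    cv: Condvar,
}

impl Semaphore {
    pub fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), cv: Condvar::new() }
    }

    pub fn acquire(&self) {
        let mut n = self.permits.lock().unwrap();
        while *n == 0 {
            n = self.cv.wait(n).unwrap(); // block until a permit is released
        }
        *n -= 1;
    }

    pub fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}

/// Run `tasks` fake "preparer" jobs capped at `limit` in flight;
/// returns the maximum concurrency actually observed.
pub fn run_capped(limit: usize, tasks: usize) -> usize {
    let sem = Arc::new(Semaphore::new(limit));
    let gauge = Arc::new(Mutex::new((0usize, 0usize))); // (current, max seen)
    let handles: Vec<_> = (0..tasks)
        .map(|_| {
            let sem = Arc::clone(&sem);
            let gauge = Arc::clone(&gauge);
            thread::spawn(move || {
                sem.acquire(); // blocks once `limit` jobs hold permits
                {
                    let mut g = gauge.lock().unwrap();
                    g.0 += 1;
                    g.1 = g.1.max(g.0);
                }
                // ... FD-heavy work (open files, extract archives) goes here ...
                gauge.lock().unwrap().0 -= 1;
                sem.release();
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let max = gauge.lock().unwrap().1;
    max
}
```

Because every job must hold a permit before touching the FD-heavy section, at most `limit` jobs can have files open at once, regardless of how many jobs are spawned.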
It should be enough to rebase this on top of the other PR once it lands.
Let me wait then; I thought you planned to handle this there as well.
I haven't looked into this one. I believe we can handle them semi-independently: #18156 should only reduce the amount of plumbing here, while we still need to figure out the correct approach to the file locks in this PR. It's also fine by me if we land this one first and I rebase #18156 on top of it.
I would wait for yours to be merged first, so this one can stay focused on the exact thing it addresses.
Sorry for the delay, I wanted a bit more time to evaluate other ways to approach this.
Short summary of the current change so you can evaluate whether you like the direction.
I think this is the most local change that fixes the FD issue while preserving the existing lock-based source-build coordination.
I considered moving builds_semaphore earlier for source-distribution paths instead of introducing a separate semaphore, but that makes the PythonRunner interface uglier because it then needs two modes depending on whether the caller already holds a permit.
Since the advisory cache lock remains long-lived in the current design, the new semaphore needs to cover that same region to actually bound the lock-heavy section that was causing the FD spike. A cleaner long-term solution is probably a larger redesign with shorter cache-lock lifetimes and explicit in-flight dedupe (more explicit coordination for source builds, similar to the coordination used in the installer), but that seems like a bigger change than a fix for this issue warrants.
// Acquire the concurrency permit and advisory lock.
let _permit = self.acquire_concurrency_permit().await;
let _lock = lock_shard.lock().await.map_err(Error::CacheLock)?;
I moved the lock later on purpose. If I had just added the semaphore where the old lock was, it would also start limiting revision and download work on some paths. The tradeoff is that it could potentially duplicate a bit of earlier work, but we still re-check under the lock before building, so final cache correctness stays the same.
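The "re-check under the lock" pattern being described here can be sketched as follows. This is a minimal in-memory stand-in (the Cache type, its string keys, and the build closure are all hypothetical); the real code operates on on-disk cache shards with advisory file locks rather than a Mutex over a HashMap:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical in-memory stand-in for an on-disk cache shard,
/// keyed by revision id and holding built-wheel paths.
pub struct Cache {
    built: Mutex<HashMap<String, String>>,
}

impl Cache {
    pub fn new() -> Self {
        Cache { built: Mutex::new(HashMap::new()) }
    }

    /// Fast path: peek without the build lock. Racing another process
    /// here can only duplicate work, not produce a wrong result.
    pub fn peek(&self, revision: &str) -> Option<String> {
        self.built.lock().unwrap().get(revision).cloned()
    }

    /// Slow path: take the lock, re-check, and build only on a true miss,
    /// so a concurrent builder's result is reused rather than clobbered.
    pub fn get_or_build(&self, revision: &str, build: impl FnOnce() -> String) -> String {
        let mut built = self.built.lock().unwrap();
        if let Some(wheel) = built.get(revision) {
            return wheel.clone(); // someone else built it while we raced
        }
        let wheel = build();
        built.insert(revision.to_string(), wheel.clone());
        wheel
    }
}
```

The key property is that the second check happens while holding the lock, so at most one builder wins per revision even if several pass the lock-free fast path simultaneously.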
url_revision reads and writes cache_shard.entry(HTTP_REVISION). Doesn't that mean there can now be a race as two processes read or write this file? I don't think we're completely on the wrong path here; it's just that this code is tricky. We need to ensure that all operations two different processes can perform are either atomic replaces (e.g., building a directory and moving it into place) or protected by a lock, and that there can't be TOCTOU problems between processes (such as process A reading from the metadata that something fits, process B atomically replacing the directory, and then A reading the actual content of the replaced directory, which no longer fits).
There can be a race on HTTP_REVISION, but it's not a harmful one. It's not TOCTOU because we read url_revision once, keep it in memory, and then use revision.id() for the build. We do not re-read HTTP_REVISION later, so we don't "check one revision, then use another".
I agree this concern is valid so I’m just trying to avoid it here.
Let me actually do more testing on this specific concern and capture some state to confirm it is harmless, e.g. by running two separate uv sync processes so they share the same cache.
I'm thinking about the case of two uv processes specifically, which, without a lock, could read and write url_revision in parallel, if I read it correctly.
Yes, I think that read is correct. That part was intentional, the goal here was to push the semaphore/lock boundary as far down as possible so we don’t put the earlier revision/download/extract work behind semaphore as well.
I agree this is a race, but the behavior I expect from it is duplicate early work (multiple revision shards), not a later “checked one revision, built another”.
So I think the local tradeoff is:
1. Lock/semaphore later: lets more of the revision/download work run in parallel, but two processes can duplicate some work for the same source.
2. Lock earlier: avoids that race on the revision, but puts more of the earlier IO behind the new semaphore.
Option 2 is the safer choice. My hesitation was mainly the earlier concern about adding more serialization around the IO-heavy part of the work.
On the test side, I added an integration test that runs two separate uv processes against the same direct-URL dependency and shared cache, with a delayed local HTTP response to widen the race window. It verifies both succeed, then runs a third uv lock --offline against the same cache. That last step checks that the cache left behind by the concurrent race is still reusable afterward. In this test, the result looked like duplicate early work rather than an unusable or obviously corrupted cache.
denyszhak@807a93b is how it looks.
Ahh, my re-check "under the lock" is stupid
There are two constraints we need to watch for: for all the IO operations in the cache and in venvs, we need to ensure that either all writes are atomic (create a file or directory, then rename it to the target) or we hold a lock while we do them. Additionally, we need to make sure that this applies when the data is stretched across, e.g., a cache info file and a cache data file. What I'm trying to figure out is: can we statically, from the code, assert that that's the case here, or do we need the lock for the larger scope again?
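The "create, then rename into place" atomicity the first constraint refers to can be sketched like this. It is a standalone illustration using only the standard library (the write_atomic name and the .tmp suffix scheme are hypothetical; uv's cache code has its own helpers), relying on the fact that rename is an atomic replace on POSIX filesystems when source and target are on the same filesystem:

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Write `data` to `target` so that concurrent readers only ever see
/// either the old complete content or the new complete content,
/// never a partially written file.
pub fn write_atomic(target: &Path, data: &[u8]) -> std::io::Result<()> {
    // Stage the write in a sibling temp file on the same filesystem.
    let tmp = target.with_extension("tmp");
    {
        let mut f = fs::File::create(&tmp)?;
        f.write_all(data)?;
        f.sync_all()?; // flush to disk before the rename makes it visible
    }
    // Atomic replace: the target path flips from old to new in one step.
    fs::rename(&tmp, target)
}
```

Note this only covers a single file; the second constraint (data stretched across an info file plus a data file) still needs a lock or a single atomic directory swap, since two independent renames are not atomic as a pair.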
I think the first constraint is mostly satisfied here, but not the second. The second looks possible to satisfy too, but it would take a bit more restructuring. Let me get this version implemented first, and then we can decide whether we want to push it further.
Summary
Resolves #17512
Fixes an issue where uv could exhaust the operating system's file descriptor limit when installing projects with a large number of local dependencies, causing a "Too many open files (os error 24)" crash.
Test Plan
- ulimit -n 256 to set the number of allowed descriptors to a low value
- mkdir -p /tmp/airflow-test
- git clone --depth 1 https://github.com/apache/airflow /tmp/airflow-test
- cd /tmp/airflow-test
- /Users/denyszhak/personal-repos/uv/target/debug/uv sync

Bug: Hit Too many open files (os error 24) at path "/Users/denyszhak/.cache/uv/sdists-v9/editable/136c133a3434a0fa/.tmpZtqKQt"

Fix: /Users/denyszhak/personal-repos/uv/target/debug/uv sync succeeds:
Resolved 922 packages in 10.82s