
Unpack error when using uv to install a lot of big packages in Docker in WSL2 #2833

@potiuk

Description


Not sure if Docker has an influence here, since I am not a WSL2 user, but it seems that uv's fairly aggressive network usage has some negative side effects when you install a lot of big packages in WSL2.

This has been reported by quite a few Airflow contributors on WSL2 who use our breeze development environment and hit failures when trying to build our CI image. It seems that uv's default 300-second timeout was not enough, and increasing UV_HTTP_TIMEOUT to 900 resolved the situation.

A bit of context - why we sometimes need to install a large number of big packages

In Airflow we have a common development environment called breeze ("It's a breeze to contribute to Airflow" is the theme). It is basically a (rather sophisticated) Python sub-project: a wrapper around docker-compose and docker that makes it easy to get a common development environment for everyone, corresponding 1-1 to the CI environment we use.

Due to the number of dependencies (~700 including transitive ones) for the complete suite of Airflow + the 90 providers we have (they all live in one monorepo and we develop/hack on them together), the CI image is big, and we use that image as the ultimate common environment, on top of the local venvs people use, to avoid the "works for me" syndrome. To gain all the speed improvements, uv has been the default package installer for the CI image since almost the first day it was public (we still have an option to use pip, but uv is the default). This is a tremendous improvement in case you need, for whatever reason, to rebuild the image locally rather than use the remote docker cache (which has so far been the primary way we improved the experience of synchronising the image between contributors).

So when there are new dependencies, or when our cache is being refreshed, contributors can sometimes get into a situation where the image needs to be rebuilt, and what happens then is essentially:

  • download and unpack latest airflow repo:
curl -fsSL "https://github.com/apache/airflow/archive/main.tar.gz" | tar xz -C "${TEMP_AIRFLOW_DIR}" --strip 1
  • Install airflow in editable mode (which pulls in all ~700 dependencies):
uv pip install --editable "${TEMP_AIRFLOW_DIR}[devel-ci]"

We do it all with UV_NO_CACHE="true", simply because keeping the uv cache in this case increases the size of the image by 1 GB or so, and we found that Docker layer caching works quite a bit better here, especially for people who do not have a good network.
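Put together, the rebuild sequence described above might look roughly like the following standalone sketch (the real logic lives inside breeze, and the wrapper function and temp-dir handling here are illustrative, not the actual implementation):

```shell
#!/usr/bin/env bash
set -euo pipefail

rebuild_ci_deps() {
  local temp_airflow_dir
  temp_airflow_dir="$(mktemp -d)"

  # Skip uv's own cache: it would add ~1 GB to the image, and Docker
  # layer caching covers the same ground better on slow networks.
  export UV_NO_CACHE="true"

  # 1. Download and unpack the latest airflow repo.
  curl -fsSL "https://github.com/apache/airflow/archive/main.tar.gz" \
    | tar xz -C "${temp_airflow_dir}" --strip 1

  # 2. Install airflow in editable mode, pulling in the ~700 dependencies.
  uv pip install --editable "${temp_airflow_dir}[devel-ci]"
}
```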

Problem

The problem is that when running this uv pip install command in WSL2's docker, the command reliably and consistently fails (regardless of how good people's connectivity is) with:

107.8 + uv pip install --python /usr/local/bin/python --editable '/tmp/tmp.Pt7KEZ8PVo[devel-ci]'
115.4 Built 1 editable in 9.05s
245.0 error: Failed to download and build: pyspark==3.5.1
245.0   Caused by: Failed to extract archive
245.0   Caused by: failed to unpack `/tmp/.tmpsE5COZ/built-wheels-v2/.tmp8M9bFw/pyspark-3.5.1/deps/jars/hadoop-client-api-3.3.4.jar`
245.0   Caused by: failed to unpack `pyspark-3.5.1/deps/jars/hadoop-client-api-3.3.4.jar` into `/tmp/.tmpsE5COZ/built-wheels-v2/.tmp8M9bFw/pyspark-3.5.1/deps/jars/hadoop-client-api-3.3.4.jar`
245.0   Caused by: request or response body error: error reading a body from connection: unexpected end of file
245.0   Caused by: error reading a body from connection: unexpected end of file
245.0   Caused by: unexpected end of file

Usually it happens for pyspark.

After some back and forth we found that increasing UV_HTTP_TIMEOUT reliably works. Setting it to 900 seems to reliably solve the problem (we did not try 600; when I suspect a timeout is the problem, I usually increase it 3x).

For now I implemented a temporary workaround in breeze: detecting WSL2 and increasing the default timeout to 900 there (apache/airflow#38742). But I suspect (also looking at the error) that this is caused by some of uv's optimizations. I believe uv does not initially download packages in full, only the parts needed to retrieve missing metadata (or at least that's what a peek into uv's code suggested), and the error suggests the problem results from using those partially downloaded packages (an educated guess, at least). Not sure if it can be helped other than by increasing the timeout, but possibly there is a race or error condition that triggers it and could be avoided.
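The workaround might be sketched as follows (assumed shape; the real change is in apache/airflow#38742). The optional file argument on the helper exists purely to make it testable:

```shell
is_wsl2() {
  # WSL2 kernels report "microsoft" in /proc/version
  # (e.g. "Linux version 5.15.90.1-microsoft-standard-WSL2 ...").
  grep -qi "microsoft" "${1:-/proc/version}" 2>/dev/null
}

if is_wsl2; then
  # 900 seconds reliably avoided the truncated-download failures;
  # the default was not enough for big sdists like pyspark.
  export UV_HTTP_TIMEOUT=900
fi
```

On non-WSL2 systems the grep simply fails quietly and the timeout is left at uv's default.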

Metadata

Labels: bug (Something isn't working)
