Use uv for managing dependencies #923

Merged
osma merged 18 commits into main from issue919-uv-for-dependencies
Jan 15, 2026

@osma osma commented Dec 16, 2025

This PR switches to uv for dependency management instead of Poetry.

Included changes:

  • migrate pyproject.toml to work with uv; for example, declare the dev dependencies using standard [dependency-groups] section
  • update README.md to explain how to use uv commands for development installs
  • migrate GitHub Actions to use uv instead of Poetry
  • migrate Dockerfile to use uv instead of Poetry
  • configure flake8 to skip e.g. .venv directory (uv will create the virtual environment there by default)

Some notes:

  1. uv doesn't have a command for permanently activating the venv like the old poetry shell or the newer equivalent eval $(poetry env activate). It is recommended to use uv run every time, but I think that's cumbersome. However, the venv can simply be activated using source .venv/bin/activate so I recommended that in the README. (There is an open issue about this, but it seems that implementing this properly to handle all edge cases is a bit difficult, so the developers hesitate to do it.)
  2. I changed the Docker base image to one provided by Astral / uv: ghcr.io/astral-sh/uv:python3.12-bookworm-slim. An alternative would have been to keep the vanilla base image python:3.12-slim-bookworm and just install uv within that.
  3. I think that the GitHub Actions action astral-sh/setup-uv is handling all caching of dependencies, but I haven't verified that it works in a sane way...

Overall, uv seems a bit faster than Poetry. For example, the GitHub Actions run completes in less than 3 minutes, when it used to take a bit more than 3 minutes (though with a lot of variation).

In the future, using uv should enable better management of PyTorch variants when we add it as a dependency (for example for the EBM backend - see #914 ).

Closes #919

@osma osma self-assigned this Dec 16, 2025

codecov bot commented Dec 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.63%. Comparing base (2f7ff89) to head (436253a).
⚠️ Report is 19 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #923   +/-   ##
=======================================
  Coverage   99.63%   99.63%           
=======================================
  Files         103      103           
  Lines        8238     8238           
=======================================
  Hits         8208     8208           
  Misses         30       30           

@osma osma marked this pull request as ready for review December 17, 2025 15:04
@osma osma requested a review from juhoinkinen December 17, 2025 15:04
@osma osma changed the title [WIP] Use uv for managing dependencies Use uv for managing dependencies Dec 17, 2025

osma commented Dec 18, 2025

There's a potential problem when using uv together with custom spaCy models. python -m spacy download downloads a model and installs it as a Python package into the current venv using pip (that's why I added pip as a dependency in this PR; see also explosion/spaCy#13846). But the new model package is not registered in pyproject.toml, so uv doesn't know about it. Next time you run uv sync for any reason, it will delete this "extra" package, unless you give it the --inexact flag every time (there is no permanent setting for this).

This has been reported as astral-sh/uv#12481 but it was closed by uv developers as they are not planning to support this kind of use.

This has also been discussed in explosion/spaCy#13747 where it was suggested to call e.g. uv run spacy info en_core_web_sm --url to get the package URL and then pass it to uv add to add a real dependency tracked by uv. But we want to be flexible and allow users to install any spaCy models they need, so having to modify pyproject.toml (via uv add) seems a bit wrong.

OTOH, this only affects development installs (and currently Docker images that also use uv). A regular user installing Annif via pip install annif[spacy] would not be affected by this unless they specifically use uv pip.
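
The difference between the default (exact) sync and --inexact described above can be pictured with a toy model. This is only a sketch of the behaviour as described in this thread, not uv's actual implementation; the function `sync` and the package sets are hypothetical.

```python
def sync(installed: set[str], locked: set[str], inexact: bool = False) -> set[str]:
    """Toy model of an environment sync (not uv's real code).

    Locked packages that are missing are always installed. Packages that are
    installed but not locked are removed, unless inexact is True.
    """
    if inexact:
        return installed | locked
    return set(locked)


# Hypothetical environment: a spaCy model was added with pip, outside the lock.
locked = {"annif", "spacy", "pip"}
installed = locked | {"en-core-web-sm"}

if __name__ == "__main__":
    print(sorted(sync(installed, locked)))                # model is gone
    print(sorted(sync(installed, locked, inexact=True)))  # model is kept
```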

osma commented Dec 18, 2025

Note to self: The pip dependency could be moved to the spacy extra instead of having it as a top level dependency. Nothing but spaCy depends on it.

EDIT: Done in 65abb14

@osma osma mentioned this pull request Dec 18, 2025

@juhoinkinen juhoinkinen left a comment

I noticed that the Docker image size doubles. AI tools recommended the changes in the attached file.

There is a possibility to enable bytecode compilation for uv installs (apparently uv does not perform it by default?); we could add UV_COMPILE_BYTECODE=1 to the Dockerfile (although when I checked its effect with the plain annif command, I could not see a difference in startup time).
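
For readers unfamiliar with the term, precompiling bytecode means writing .pyc files ahead of time so that Python does not have to compile modules on first import. How uv does this internally is not shown in this thread; the sketch below only demonstrates the general idea with the standard library's compileall, and the helper `precompile` and the module name are hypothetical.

```python
import compileall
import importlib.util
import tempfile
from pathlib import Path


def precompile(source_dir: Path) -> list[Path]:
    """Compile every .py file under source_dir and return the .pyc paths."""
    compileall.compile_dir(str(source_dir), quiet=1)
    return sorted(
        Path(importlib.util.cache_from_source(str(py)))
        for py in source_dir.rglob("*.py")
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "example_module.py").write_text("VALUE = 42\n")
        for pyc in precompile(root):
            print(pyc.name, pyc.exists())
```

Precompiling mainly helps the first start of a fresh container, since later starts reuse any .pyc files that Python was able to write itself.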

@juhoinkinen

Here is the Dockerfile mentioned above, with a .txt extension so that it can be attached on GitHub.

Dockerfile.txt

osma commented Jan 8, 2026

@juhoinkinen Thanks for the review! I made the suggested changes. The Docker image size is now 2.36 GB, which is close to the current main image size of 2.3 GB.

But now SonarCloud complains about ownership of copied files. See full explanation.

The problem (if it is a problem indeed) is not new but SonarCloud only noticed it now that the COPY commands were adjusted instead of doing separate chown operations.

I wonder if file ownership for the annif user is really necessary within the image?

@juhoinkinen

I tried using COPY --chmod=644 instead of COPY --chown=annif_user:annif_user, but it does not give access permissions to subdirectories; this is what resulted when building an image:

ls -l /Annif/annif/
total 192
-rw-r--r-- 1 root root  3542 Dec 18 11:46 __init__.py
drw-r--r-- 2 root root  4096 Dec 18 11:55 analyzer
drw-r--r-- 2 root root  4096 Dec 18 11:55 backend
-rw-r--r-- 1 root root 30330 Dec 18 11:46 cli.py
-rw-r--r-- 1 root root 10176 Dec 18 11:46 cli_util.py
...

The subdirectories cannot be accessed (as annif_user) and Annif cannot start:

ls -l /Annif/annif/analyzer/
ls: cannot access '/Annif/annif/analyzer/spacy.py': Permission denied
ls: cannot access '/Annif/annif/analyzer/estnltk.py': Permission denied
ls: cannot access '/Annif/annif/analyzer/simplemma.py': Permission denied
ls: cannot access '/Annif/annif/analyzer/__init__.py': Permission denied
ls: cannot access '/Annif/annif/analyzer/snowball.py': Permission denied
ls: cannot access '/Annif/annif/analyzer/analyzer.py': Permission denied
ls: cannot access '/Annif/annif/analyzer/voikko.py': Permission denied
ls: cannot access '/Annif/annif/analyzer/simple.py': Permission denied
total 0
-????????? ? ? ? ?            ? __init__.py
-????????? ? ? ? ?            ? analyzer.py
-????????? ? ? ? ?            ? estnltk.py
-????????? ? ? ? ?            ? simple.py
-????????? ? ? ? ?            ? simplemma.py
-????????? ? ? ? ?            ? snowball.py
-????????? ? ? ? ?            ? spacy.py
-????????? ? ? ? ?            ? voikko.py

Previously the chmod command was chmod -R a+rX /Annif, the capital X giving the directory access.

I think that was good: everyone could read the files, but only root could write. To achieve that with uv without doubling the image size, maybe a multistage build should be utilized. 🤔
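
The difference between a flat 644 and a+rX can be sketched on the permission bits alone. With 644 a directory has no search (x) bit, which matches the "Permission denied" output above, while capital X adds that bit only for directories and for files that are already executable. The function name `apply_a_plus_rX` is hypothetical; this models the chmod semantics and does not touch any files.

```python
import stat

ALL_READ = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
ALL_EXEC = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def apply_a_plus_rX(mode: int, is_dir: bool) -> int:
    """Return the permission bits that `chmod a+rX` would produce.

    a+r adds read for user, group and other. a+X adds execute/search for all,
    but only for directories or files that already have an execute bit set.
    """
    new_mode = mode | ALL_READ
    if is_dir or (mode & ALL_EXEC):
        new_mode |= ALL_EXEC
    return new_mode


if __name__ == "__main__":
    print(oct(apply_a_plus_rX(0o644, is_dir=True)))   # directory -> 0o755
    print(oct(apply_a_plus_rX(0o644, is_dir=False)))  # plain file -> 0o644
    print(oct(apply_a_plus_rX(0o700, is_dir=False)))  # executable -> 0o755
```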

osma commented Jan 8, 2026

@juhoinkinen It turned out that the chown/chmod options were mostly useless. Apparently most of these were introduced by the AI tools used to suggest changes to the Dockerfile. I adjusted the Dockerfile so it's now much closer to the original (before this PR). Image size is still ~2.37GB.

There are still a couple of open questions:

  • I dropped the chmod -R a+rX /Annif command from Dockerfile because it didn't seem to do anything useful (the file permissions look OK to me even without it), but I may have missed something
  • I didn't try the bytecode compilation that you suggested

…Annif are readable/traversable regardless of their original permissions

sonarqubecloud bot commented Jan 8, 2026

osma commented Jan 8, 2026

I added the chmod command back because it is useful after all: if a developer has strict file permissions (not world-readable/traversable), the files will be copied to the container with those permissions and cannot be accessed by annif_user.

I had to adjust the command a little to avoid matching .venv, which would blow up the size of the container.
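
The adjusted command itself is not quoted in this thread. As a sketch of the idea only, the following Python applies the equivalent of a+rX recursively while never descending into .venv. It assumes a POSIX file system; the function name `make_world_readable` and the `skip_dirs` parameter are hypothetical.

```python
import os
import stat
from pathlib import Path

ALL_READ = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
ALL_EXEC = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def make_world_readable(
    root: Path, skip_dirs: frozenset[str] = frozenset({".venv"})
) -> list[Path]:
    """Apply the equivalent of `chmod -R a+rX` under root, skipping skip_dirs.

    Returns the paths whose mode was actually changed.
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for path in [Path(dirpath)] + [Path(dirpath, f) for f in filenames]:
            if path.is_symlink():
                continue  # leave symlink targets alone
            mode = stat.S_IMODE(path.stat().st_mode)
            new_mode = mode | ALL_READ
            if path.is_dir() or (mode & ALL_EXEC):
                new_mode |= ALL_EXEC
            if new_mode != mode:
                path.chmod(new_mode)
                changed.append(path)
    return changed
```

In a Dockerfile, rewriting the mode of every file in .venv would presumably duplicate those files in a new image layer, which is consistent with the size increase mentioned above.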

@osma osma added this to the 1.5 milestone Jan 15, 2026
@osma osma merged commit 27e4ac7 into main Jan 15, 2026
20 of 24 checks passed
@osma osma deleted the issue919-uv-for-dependencies branch January 15, 2026 10:22
osma commented Jan 15, 2026

Posted a heads up about the change on annif-users: https://groups.google.com/g/annif-users/c/J1tikQAQQAE

osma commented Jan 16, 2026

Based on what I learned here in this PR, I suggested a feature for uv to support a shorter syntax for uv sync extras: uv sync -E foo,bar
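
To show what the shorter syntax would save, the sketch below expands a comma-separated value into the repeated --extra flags that uv sync accepts today. The -E option is only a suggestion at this point, and both helper functions are hypothetical.

```python
def parse_short_extras(value: str) -> list[str]:
    """Split a comma-separated value such as "foo,bar" into extras."""
    return [part.strip() for part in value.split(",") if part.strip()]


def uv_sync_command(extras: list[str]) -> list[str]:
    """Build a `uv sync` argument list with one --extra flag per extra."""
    command = ["uv", "sync"]
    for extra in extras:
        command += ["--extra", extra]
    return command


if __name__ == "__main__":
    print(" ".join(uv_sync_command(parse_short_extras("foo,bar"))))
```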

Development

Successfully merging this pull request may close these issues.

Consider switching to uv for dependency management (esp. PyTorch)

2 participants