
TST: decoupling stable CI and incompatibilities discovery (let's use lock files) #17596

@neutrinoceros

Description


Disclaimer

As the length of this issue may suggest, I've put a lot of thought into this proposal, which comes as the culmination of a year spent serving as our first line of defense against environment incompatibilities (also known as dependency hell). Maybe this should be an APE?

TL;DR

The one-liner version of this proposal is: let's start using lock files.

Now for a slightly longer summary:
We already have a distinction between regular, required CI and jobs that are allowed to fail (allowed-failure). I propose to consolidate this distinction into a stable/unstable paradigm, using a lock file in stable jobs to get exact reproducibility of the testing environment and improve our trust that failures are always related to the patch under test.

Since lock files are currently foreign to our shared maintenance culture, and I was myself completely new to them a couple of months back, I'll take a long detour to explain the motivation as well as the solution. If you don't know what a lock file is, hang in there! I'll get to it.

Here are some external resources describing the core concepts and stakes for cargo, the Rust package manager, and uv, the Python package manager inspired by cargo:

Problem: required CI isn't stable

Our regular CI comprises jobs that are required for merging and some that are explicitly allowed to fail. The former are meant to reveal failures caused by the patch under test, while the latter aim to continuously reveal incoming incompatibilities with a selection of important dependencies (first and foremost, numpy).
Incoming incompatibilities indeed shouldn't block most patches, as they are usually caused by the outside world, so I think this distinction between required and allowed-failure jobs is a great thing to have. However, I see a problem with how it is currently implemented:

Even in required jobs, the Python environment that is installed isn't stable or easily reproducible: whatever the latest version of each of our dependencies happens to be is what gets installed and tested against. This regularly leads to situations where required CI fails on every single PR at once because of some upstream change we didn't happen to test against early (again, we only run such early tests against a selection of our direct dependencies, leaving lots of room for failure). Such a situation, because it is so inconvenient to the team as a whole, is often treated as an emergency, and rightfully so. But regardless of how much dopamine I get from serving as our first line (and I am grateful to be of help as often as I can), I need to point out that solutions found in crisis mode may not always be the best or most robust ones.

It's also not always obvious what (if anything) changed in the environment between two runs, and a common defensive reflex when one encounters a cryptic, seemingly unrelated error is to diff the outputs of pip freeze found in CI logs.
This is bad for two reasons:

  • it shows that we do not really trust required CI to reveal only issues with the patch under test
  • it is a somewhat tedious and error-prone task

Proposed solution (spoilers: it's lock files)

What are lock files

Simply put, a lock file is a record of a resolved environment, with metadata including the exact versions of all dependencies (direct, indirect, mandatory, and optional).
A requirements.txt file obtained via pip freeze is conceptually similar (albeit generally less portable1 and less secure2), in that it allows one to recreate an environment exactly.

Practically, lock files allow pinning all dependencies in a way that's not visible to end users.

Expected gains

Locking CI means that we never get surprise updates on weekends or from indirect dependencies we don't pay attention to. Updates happen on our schedule rather than being imposed by the outside world. This proposed strategy is actually not that foreign to us: we already use exact pins (paired with automated updates) for GitHub Actions workflows and for pre-commit hooks.

Having a version-controlled lock file also means that we can revisit previous states of the repo and run tests within the exact same environment.

The current state of standards

I want to point out that lock files are currently not standardized in the Python packaging ecosystem, which means that every workflow tool supporting some version of this concept implements its own flavor (poetry.lock, pdm.lock, uv.lock, ...), but there is a PEP under discussion.

Namely, PEP 751 is being discussed on Discourse. Note that stakeholders (poetry, pdm, uv, ...) are all included in the discussion, which is a good sign that this one might succeed3.

Having a standard file format for lock files would greatly improve interoperability between tools that consume them, and would allow us to migrate from one tool to another as we see fit without converting the lock file.

Proposed plan

I propose we start using a uv lock file (uv.lock) in stable CI, as well as for documentation builds (#17052) and wheel builds. I choose uv over the other options because:

  • it integrates nicely with our existing toolbox
  • astral-sh/uv has a very good track record of supporting newly accepted PEPs (e.g. PEP 723 and PEP 735, both supported within 2 weeks of acceptance), so I'm confident that they would support converting to a standardized lock file format if one materializes in the future.
  • I think I have a good understanding of how this tool works: I've been using uv more and more over the past year that it's been available.
  • I don't have experience with the other options (pdm, poetry, and pixi are the ones I know about), but I hear some of them have non-ideal records in terms of standards adoption:
    • pdm supports at least one PEP that was ultimately rejected
    • poetry has been very slow/reluctant to support standards, effectively locking its users in
  • it may be worth mentioning that pixi is already bound to uv in a sense (it uses uv internally for dependency resolution)

Deployment

There are 3 easy steps needed to adopt this testing strategy, which I believe can be performed in a single PR:

  • generate and commit the file itself (trivial)
  • use uv in CI (this can easily be achieved as a follow up to #16963)
  • add the uv-lock hook to pre-commit (this ensures uv.lock stays in sync when dependencies are added, updated, or removed in pyproject.toml)

For the record, I have been doing exactly this on smaller packages that I control.
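For illustration, the pre-commit piece of this plan could look like the excerpt below, using the uv-lock hook published by astral-sh/uv-pre-commit (the rev shown is a placeholder, not a recommendation; pin to a current tag in practice):

```yaml
# .pre-commit-config.yaml (excerpt, illustrative)
repos:
  - repo: https://github.com/astral-sh/uv-pre-commit
    # placeholder revision; pin to an actual release tag
    rev: 0.5.14
    hooks:
      # re-locks when pyproject.toml changes, keeping uv.lock in sync
      - id: uv-lock
```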

Long term maintenance

The core idea is to decouple environment updates, and the ensuing risk of surprise incompatibility, from regular testing. However, running regular CI in an up-to-date environment remains very much needed, so keeping the lock file up to date is crucial to this strategy.
There are two questions to discuss here:

  • how frequently should it be updated? Given how large astropy's testing stack is, I would recommend at least weekly updates, maybe even daily (5 days a week)?
  • what service (if any) can we use?

My personal recommendation for now would be to use renovate exclusively for this task (as opposed to replacing dependabot completely) until dependabot gains support for uv.lock or an equivalent, standardized format (which might take a while).
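As a sketch, and assuming Renovate's lock file maintenance feature with its text-based schedule syntax, a minimal renovate.json for weekly refreshes might look like this (scoping Renovate to *only* this task, so it doesn't overlap with dependabot, would need additional rules that I'm omitting here):

```json
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "lockFileMaintenance": {
    "enabled": true,
    "schedule": ["before 6am on monday"]
  }
}
```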

Expected drawbacks, side effects

In short: none that I'm aware of.

How does this affect end users and downstream packages?

It doesn't. Lock files are not meant to be distributed, and using them does not change how we specify our dependencies in pyproject.toml.

How does this affect individual maintainers?

This does not force anyone to use uv locally4. If you manage your own local dev environment(s) with pip, conda, pyenv (you name it) and you don't care for uv, nothing changes for you.

Maintainers who do want to use uv will benefit from having a centralized and version-controlled lock file.

In general, lock file upgrades should not invalidate CI jobs that already ran on pending PRs, with the possible exception of PRs that specifically work around dependency issues.

How does this affect release managers?

This removes the (admittedly small) risk of discovering incompatibilities with our dependency stack between the moment a release is tagged and when it's actually published (wheels are tested in a guaranteed-stable environment).
Lock file updates should probably be backported as much as possible.

Does this lock us into current major versions of our dependencies?

It's been pointed out by @eerovaher that cargo, the inspiration for uv, assumes that every package in the Rust ecosystem strictly follows semantic versioning, so cargo update (the command to upgrade Cargo.lock) will not perform major upgrades by default (one needs to pass --breaking to get them). To the best of my knowledge, and as of uv 0.5.14, uv lock --upgrade doesn't do this: it treats all upgrades (major, minor, and patch) the same.

How does this affect oldestdeps and devdeps jobs?

It doesn't. uv simply ignores an existing lock file if a different resolution strategy is requested, which is exactly what's done for oldestdeps (via --resolution=lowest-direct, see #16963) and devdeps (via --prerelease=allow, equivalent to pip's --pre flag).
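Concretely, and assuming uv's current CLI (the extras name [test] below is illustrative, not a statement about astropy's actual extras), the three flavors of jobs could install their environments along these lines:

```
# stable jobs: install exactly what uv.lock records, no re-resolution
uv sync --frozen

# oldestdeps: ignore the lock file and resolve to the lowest versions
# allowed by pyproject.toml's direct requirements
uv pip install --resolution lowest-direct -e ".[test]"

# devdeps: allow pre-releases, like pip's --pre
uv pip install --prerelease allow --upgrade -e ".[test]"
```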

Cultural impact

This proposal is a cultural shift in our testing strategy. While it is by no means required that packages downstream of astropy follow the same path, I believe it would still be helpful to encourage it: the problem I want to solve isn't specific to astropy. I would be happy to help coordinated and affiliated packages migrate to this new paradigm if it is adopted by astropy itself.

Footnotes

  1. Actual lock files are more portable: they can lock different versions of a package depending, e.g., on the target platform (Linux vs. Windows) or the Python version, depending on what's available. See https://github.com/astral-sh/uv/pull/9827, for instance.

  2. Lock files typically contain a lot more metadata than package names and versions: they also record sources (usually pypi.org) and sha256 hashes of the artifacts (wheels and source distributions) to be downloaded.

  3. as opposed to PEP 665, its rejected predecessor

  4. except in the context of running tox, but one doesn't need to know about uv to use uv-in-tox, just like one doesn't need to know about pip to run pip-in-tox (the current status quo).
