
TST: decoupling stable CI and incompatibilities discovery (let's use lock files) #17596

@neutrinoceros

Description


Disclaimer

As the length of this issue may suggest, I've put a lot of thought into this proposal, which comes as the culmination of a year spent serving as our first line of defense against environment incompatibilities (also known as dependency hell). Maybe this should be an APE?

TL;DR

The one-liner version of this proposal is: let's start using lock files.

Now for a slightly longer summary:
We already have a distinction between regular, required CI and jobs that are allowed to fail (allowed-failure). I propose to consolidate this distinction into a stable/unstable paradigm, using a lock file in stable jobs to get exact reproducibility of the testing environment and improve our trust that failures are always related to the patch under test.

Since lock files are currently foreign to our shared maintenance culture, and I was myself completely new to them a couple of months back, I'll take a long detour to explain the motivation as well as the solution. If you don't know what a lock file is, hang in there! I'll get to it.

Here are some external resources describing the core concepts and stakes for cargo, the Rust package manager, and uv, the Python package manager inspired by cargo:

Problem: required CI isn't stable

Our regular CI comprises jobs that are required for merging and some that are explicitly allowed to fail. The former are meant to reveal failures caused by the patch under test, while the latter aim to continuously reveal incoming incompatibilities with a selection of important dependencies (first and foremost, numpy).
Incoming incompatibilities indeed shouldn't block most patches, as they are usually caused by the outside world, so I think this distinction between required and allowed-failure jobs is a great thing to have. However, I see a problem with how it is currently implemented:

Even in required jobs, the Python environment that is installed isn't stable or easily reproducible: whatever the latest version of each of our dependencies happens to be is what gets installed and tested against. This regularly leads to situations where required CI fails on every single PR at once because of some upstream change we didn't happen to test against early (again, we only run such early tests against a selection of our direct dependencies, leaving lots of room for failure). Such a situation, because it is so inconvenient to the team as a whole, is often treated as an emergency, and rightfully so. But regardless of how much dopamine I get from serving as our first line (and I am grateful to be of help as often as I can), I need to point out that solutions found in crisis mode may not always be the best or most robust ones.

It's also not always obvious what (if anything) changed in the environment between two runs, and a common defensive reflex when one encounters a cryptic, seemingly unrelated error is to diff the outputs of pip freeze found in CI logs.
This is bad for two reasons:

  • it shows that we do not really trust required CI to reveal only issues with the patch under test
  • it is a somewhat tedious and error-prone task

Proposed solution (spoilers: it's lock files)

What are lock files

Simply put, a lock file is a record of a resolved environment, with metadata including the exact versions of all dependencies (direct, indirect, mandatory, and optional).
A requirements.txt file obtained via pip freeze is conceptually similar (albeit generally less portable1 and less secure2), in that it allows one to recreate an environment exactly.

Practically, lock files allow pinning all dependencies in a way that's not visible to end users.

Expected gains

Locking CI means that we never get surprise updates on weekends or from indirect dependencies we don't pay attention to. Updates happen on our schedule rather than being imposed by the outside world. This proposed strategy is actually not that foreign to us: we already use exact pins (paired with automated updates) for GitHub Actions workflows and for pre-commit hooks.

Having a version-controlled lock file also means that we can revisit previous states of the repo and run tests within the exact same environment.

The current state of standards

I want to point out that lock files are currently not standardized in the Python packaging ecosystem, which means that every workflow tool supporting some version of this concept implements its own flavor (poetry.lock, pdm.lock, uv.lock, ...), but there is a PEP under discussion.

Namely, PEP 751 is being discussed on Discourse. Note that stakeholders (poetry, pdm, uv, ...) are all included in the discussion, which is a good sign that this one might succeed3.

Having a standard file format for lock files would greatly improve interoperability between tools that consume them, and would allow us to migrate from one tool to another as we see fit without converting the lock file.

Proposed plan

I propose we start using a uv lock file (uv.lock) in stable CI, as well as for documentation builds (#17052) and wheel builds. I choose uv over the other options because:

  • it integrates nicely with our existing toolbox
  • astral-sh/uv has a very good track record of supporting newly accepted PEPs (e.g. PEP 723 and PEP 735, both supported within 2 weeks of acceptance), so I'm confident that they would support converting to a standardized lock file format if one materializes in the future.
  • I think I have a good understanding of how this tool works: I've been using uv more and more over the past year that it's been available.
  • I don't have experience with the other options (pdm, poetry, and pixi are the ones I know about), but I hear some of them have non-ideal records in terms of standards adoption:
    • pdm supports at least one PEP that was ultimately rejected
    • poetry has been very slow/reluctant to support standards, effectively locking its users in
  • it may be worth mentioning that pixi is already bound to uv in a sense (it uses uv internally for dependency resolution)

Deployment

There are 3 easy steps needed to adopt this testing strategy, which I believe can be performed in a single PR:

  • generate and commit the file itself (trivial)
  • use uv in CI (this can easily be achieved as a follow up to #16963)
  • add the uv-lock hook to pre-commit (this ensures uv.lock stays in sync when dependencies are added, updated, or removed in pyproject.toml)

For the record, I have been doing exactly this on smaller packages that I control.
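For illustration, the pre-commit piece of this plan could look like the excerpt below, using the uv-lock hook published by astral-sh/uv-pre-commit (the rev shown is a placeholder, not a recommendation; pin to a current tag in practice):

```yaml
# .pre-commit-config.yaml (excerpt, illustrative)
repos:
  - repo: https://github.com/astral-sh/uv-pre-commit
    # placeholder revision; pin to an actual release tag
    rev: 0.5.14
    hooks:
      # re-locks when pyproject.toml changes, keeping uv.lock in sync
      - id: uv-lock
```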

Long term maintenance

The core idea is to decouple environment updates, and the ensuing risk of surprise incompatibility, from regular testing. However, running regular CI in an up-to-date environment remains very much needed, so keeping the lock file up to date is crucial to this strategy.
There are two questions to discuss here:

  • how frequently should it be updated? Given how large astropy's testing stack is, I would recommend at least weekly updates, maybe even daily (5 days a week)?
  • what service (if any) can we use?

My personal recommendation for now would be to use renovate exclusively for this task (as opposed to replacing dependabot completely) until dependabot gains support for uv.lock or an equivalent, standardized format (which might take a while).
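As a sketch, and assuming Renovate's lock file maintenance feature with its text-based schedule syntax, a minimal renovate.json for weekly refreshes might look like this (scoping Renovate to *only* this task, so it doesn't overlap with dependabot, would need additional rules that I'm omitting here):

```json
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "lockFileMaintenance": {
    "enabled": true,
    "schedule": ["before 6am on monday"]
  }
}
```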

Expected drawbacks, side effects

In short: none that I'm aware of.

How does this affect end users and downstream packages?

It doesn't. Lock files are not meant to be distributed, and using them does not change how we specify our dependencies in pyproject.toml.

How does this affect individual maintainers?

This does not force anyone to use uv locally4. If you manage your own local dev environment(s) with pip, conda, pyenv (you name it) and you don't care for uv, nothing changes for you.

Maintainers who do want to use uv will benefit from having a centralized and version-controlled lock file.

In general, lock file upgrades should not invalidate CI jobs that already ran on pending PRs, with the possible exception of PRs that specifically work around dependency issues.

How does this affect release managers?

This removes the (admittedly small) risk of discovering incompatibilities with our dependency stack between the moment a release is tagged and when it's actually published (wheels are tested in a guaranteed-stable environment).
Lock file updates should probably be backported as much as possible.

Does this lock us into current major versions of our dependencies?

It's been pointed out by @eerovaher that cargo, the inspiration for uv, assumes that every package in the Rust ecosystem strictly follows semantic versioning, so cargo update (the command to upgrade Cargo.lock) will not perform major upgrades by default (one needs to pass --breaking to get them). To the best of my knowledge, and as of uv 0.5.14, uv lock --upgrade doesn't do this: it treats all upgrades (major, minor, and patch) the same.

How does this affect oldestdeps and devdeps jobs?

It doesn't. uv simply ignores an existing lock file if a different resolution strategy is requested, which is exactly what's done for oldestdeps (via --resolution=lowest-direct, see #16963) and devdeps (via --prerelease=allow, equivalent to pip's --pre flag).
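Concretely, and assuming uv's current CLI (the extras name [test] below is illustrative, not a statement about astropy's actual extras), the three flavors of jobs could install their environments along these lines:

```
# stable jobs: install exactly what uv.lock records, no re-resolution
uv sync --frozen

# oldestdeps: ignore the lock file and resolve to the lowest versions
# allowed by pyproject.toml's direct requirements
uv pip install --resolution lowest-direct -e ".[test]"

# devdeps: allow pre-releases, like pip's --pre
uv pip install --prerelease allow --upgrade -e ".[test]"
```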

Cultural impact

This proposal is a cultural shift in our testing strategy. While it is by no means required that packages downstream of astropy follow the same path, I believe it would still be helpful to encourage it: the problem I want to solve isn't specific to astropy. I would be happy to help coordinated and affiliated packages migrate to this new paradigm if it is adopted by astropy itself.

Footnotes

  1. Actual lock files are more portable: they can lock different versions of a package depending, e.g., on the target platform (Linux vs. Windows) or the Python version, depending on what's available. See https://github.com/astral-sh/uv/pull/9827, for instance.

  2. Lock files typically contain a lot more metadata than package names and versions: they also record sources (usually pypi.org) and sha256 hashes of the artifacts (wheels and source distributions) to be downloaded.

  3. as opposed to PEP 665, its rejected predecessor

  4. except in the context of running tox, but one doesn't need to know about uv to use uv-in-tox, just like one doesn't need to know about pip to run pip-in-tox (the current status quo).
