A strategy for evolving the pkgs/by-name CI checks #256788
Description
Context
The pkgs/by-name CI check gets the nixpkgs-check-by-name tooling, which lives in Nixpkgs itself, from the latest NixOS channel. This makes CI very fast and predictable for all PRs, because it's able to re-use the pre-built tooling from Hydra.
Problem
However, we have a problem if we want to change the tool:
Say we increase the strictness of the tool with a PR, while fixing all the problems it newly detects in the same PR. We then have to wait, perhaps for days, until the NixOS channel updates before the new tool is used in CI. In that timespan, new problems could be introduced without being detected.
Proposed solution
To solve this I propose to temporarily adjust CI for every strictness increase in the tool as follows:
- In addition to the latest NixOS channel version of the tool, also use a version that is pinned to the NixOS channel revision that was the latest at the time of the tool change.
- The two versions of the tool are then used to determine whether the check should succeed or not, as follows:
  - If the latest tool succeeds on the base branch of the PR, it must also succeed for the PR.
    This indicates that the pinned tooling isn't necessary anymore, so a message is traced that the pin can be removed.
  - Otherwise, if the pinned tool succeeds on the base branch, it must also succeed for the PR.
    This indicates that the base branch needs to be fixed for the new tooling. The logs will contain the failures of the latest tool.
  - Otherwise, either the pinned or the latest tool must succeed for the PR.
    This indicates that the base branch is broken, either because the checks of a PR were ignored, or because the PR was merged after the checks had changed.
    In this state we don't know whether the base branch already succeeded with the latest tool, so a PR can pass if it fixes the breakage using either version.
- Once the channel updates for the first time after the tooling update is merged, another PR can be made to fix any remaining problems. Repeat this for some time, until no new problems are introduced by PRs that were merged without running CI again.
  This could also benefit from some automation to rerun PR checks once they are, say, 1 week old. That period could then serve as the time window after which we can be sure that no PR still has old checks.
And in a final PR, once the base branch is definitely working with the new check, the temporary pinned version can be disabled again, only using the latest NixOS channel version of the tool once more.
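The three-way rule above can be sketched as a small decision function. This is only an illustration of the logic as described here, not the actual CI implementation: the names `ToolResults` and `decide` are hypothetical, and the four booleans are assumed to come from running each tool version on the base branch and on the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolResults:
    """Hypothetical container: whether each tool version succeeded where."""
    latest_on_base: bool
    pinned_on_base: bool
    latest_on_pr: bool
    pinned_on_pr: bool


def decide(r: ToolResults) -> tuple[bool, str]:
    """Return (check passes, explanatory note) following the three cases."""
    if r.latest_on_base:
        # Case 1: the base branch already satisfies the latest tool,
        # so the PR must satisfy it as well.
        return r.latest_on_pr, "pinned tool no longer needed; the pin can be removed"
    if r.pinned_on_base:
        # Case 2: the base branch only satisfies the pinned tool,
        # so the PR is held to the pinned tool.
        return r.pinned_on_pr, "base branch needs fixing for the latest tool; see its failures in the logs"
    # Case 3: the base branch fails both, so the PR passes if it
    # succeeds with either version.
    return (r.latest_on_pr or r.pinned_on_pr), "base branch is broken; PR may pass with either tool version"
```

For example, `decide(ToolResults(False, True, False, True))` passes, because the base branch only succeeds with the pinned tool and the PR does too.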
Having thought through this, I think it's about as smooth as it can get, and it sounds generally useful for all CI changes.
I'd love to hear if there are other proposals to handle this though. In the end I think we need something like this for RFC 140, because we'll have a lot of PRs affected.