A strategy for evolving the pkgs/by-name CI checks #256788

@infinisil

Context

The pkgs/by-name CI check gets the nixpkgs-check-by-name tooling, which lives in Nixpkgs itself, from the latest NixOS channel. This makes CI very fast and predictable for all PRs, because it's able to re-use the pre-built tooling from Hydra.

Problem

However, this causes a problem when we want to change the tool:
Say we increase the strictness of the tool with a PR, while fixing all the problems it newly detects in the same PR. We then have to wait, perhaps for days, for the NixOS channel to update before the new tool is used in CI. In that timespan, new problems could be introduced without being detected.

Proposed solution

To solve this, I propose to temporarily adjust CI for every strictness increase in the tool as follows:

  • In addition to the latest NixOS channel version of the tool, also use a version that is pinned to the then-latest NixOS channel revision at the time of the tool change.
  • The two versions of the tool are then used to determine whether the check should succeed or not as follows:
    • If the latest tool succeeds on the base branch of the PR, it must also succeed for the PR.

      This indicates that the pinned tooling isn't necessary anymore, so a message is traced saying that the pin can be removed.

    • Otherwise, if the pinned tool succeeds on the base branch, it must also succeed for the PR.

      This indicates that the base branch needs to be fixed for the new tooling. The logs will contain the failures of the latest tool.

    • Otherwise, either the pinned or the latest tool must succeed for the PR.

      This indicates that the base branch is broken, either due to checks of a PR being ignored, or the PR being merged after the checks have changed.

      In this state we don't know whether the base branch already succeeded with the latest tool, so a PR can pass if it fixes the breakage using either version.
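
The three cases above can be sketched as a small decision function. This is only an illustration of the proposed logic, not the actual CI implementation: the names `Verdict` and `decide` and the four boolean inputs are hypothetical, each boolean standing for "this tool version succeeded on this tree".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    passed: bool  # whether the PR's check should succeed
    note: str     # what the outcome says about the base branch


def decide(latest_on_base: bool, pinned_on_base: bool,
           latest_on_pr: bool, pinned_on_pr: bool) -> Verdict:
    """Hypothetical sketch of the proposed two-version check."""
    if latest_on_base:
        # Base branch already passes the latest tool, so the PR must too.
        return Verdict(latest_on_pr, "pin no longer needed and can be removed")
    if pinned_on_base:
        # Base branch only passes the pinned tool, so the PR must pass that one.
        return Verdict(pinned_on_pr, "base branch needs fixing for the latest tool")
    # Base branch fails both versions: accept a fix under either one.
    return Verdict(latest_on_pr or pinned_on_pr,
                   "base branch is broken; PR may fix it with either version")
```

For example, `decide(False, True, False, True)` passes: the base branch has not been fixed for the latest tool yet, and the PR does not regress under the pinned one.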

Once the channel next updates after the tooling update is merged, another PR can be made to fix any remaining problems. Repeat this for some time, until no new problems are introduced by PRs that were merged without running CI again.

This could also benefit from some automation that reruns PR checks once they are older than, say, 1 week. That age limit could then serve as the time window after which we can be sure that no PRs still have old checks.
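
As a rough sketch of that staleness rule, assuming the automation knows when a PR's checks last ran (the name `needs_rerun` and the 1-week default are illustrative only, not an existing tool or setting):

```python
from datetime import datetime, timedelta, timezone

# Assumed time window: checks older than this are rerun.
MAX_CHECK_AGE = timedelta(weeks=1)


def needs_rerun(last_checked: datetime, now: datetime,
                max_age: timedelta = MAX_CHECK_AGE) -> bool:
    """Return True if the PR's checks are at least max_age old."""
    return now - last_checked >= max_age


# Usage: a check from 8 days ago is stale, one from 2 days ago is not.
now = datetime(2023, 10, 1, tzinfo=timezone.utc)
stale = needs_rerun(now - timedelta(days=8), now)
fresh = needs_rerun(now - timedelta(days=2), now)
```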

And in a final PR, once the base branch is definitely working with the new check, the temporary pinned version can be disabled again, only using the latest NixOS channel version of the tool once more.


Having thought this through, I think it's about as smooth as it can get, and it sounds generally useful for all CI changes.

I'd love to hear if there are other proposals to handle this though. In the end I think we need something like this for RFC 140, because we'll have a lot of PRs affected.

Labels: 6.topic: continuous integration (Affects continuous integration (CI) in Nixpkgs, including Ofborg and GitHub Actions)