Discussion of the consistency checks

# Background

Currently, `astropy` maintains three primary branches:

1. The `main` (development) branch.
2. The current bugfix branch (currently `5.3.x`).
3. The LTS branch (currently `5.0.x`); however, the acceptance of [APE21](https://github.com/astropy/astropy-APEs/pull/82) will end this soon.

For the vast majority of changes, they are PRs made against the `main` branch unless they are specific to an issue only present in the other two. We then mark these PRs in the following ways:

- If those changes are determined to be relevant to the other branches, we attempt to use the `meekseeks` bot to "backport" them to those branches (and then manually backport them if the bot fails). Those backports are controlled via setting specific labels on the PR in question, which then tell `meeseeks` if/where to backport the PR. It does this by opening a PR against the backport branches.
- Somewhat separately, we also try (and sometimes fail) to set milestones on the PRs. Those milestones are supposed to correspond to the oldest (lowest) version of `astropy` we expect the change to appear in. For example
	- if it is an LTS bugfix, it would currently get `5.0.7`.
	- if it was only going to the bugfix branch it would currently get `5.3.1`.
	- If it was a new feature and not a bugfixe it would currently get `6.0.0`.

  Essentially, these milestones are used to track where/when a particular change will or has appeared in `astropy`.

Clearly, these to processes are related e.g.

- If something has a `backport-v5.0.x` label, we would currently expect it to have a `5.0.7` milestone and vice versa.
- If something only has a `backport-v5.3.x` label, we would expect it to have a `5.3.1` milestone and vice versa.
- If something does not have a backport label, we would expect it to have a `6.0.0` milestone and vice versa.

However, they are currently two separate unlinked steps we expect (hope) our maintainers to perform sometime before merging a pull request. This tends to introduce human errors, which tend to undermine consistency the checks usefulness (assuming they are working).

# Consistency Checks

It is my understanding that the consistency checks are intended to ensure that all the changes we intend to first appear in a given `astropy` version actually appear in that version. By this I mean:

1. For "backport" versions, all the backport pull requests opened by the bot (or manually created when it fails) are merged.
2. The backport bot has actually run on all PRs intended to be backported to the backport version in question. This is because the bot can sometimes be "down" and not get triggered; meaning, it does not create the backport PR nor does it mark the PR in question for manual backport.
3. For all versions, the PRs marked with a given milestone have been merged. Note that the backport bot will set the milestone on its PR to be the same as the origin PR's.

I understand that the code supporting the consistency checks is both a bit flaky (sometimes it is just broken) and can be tricked by PRs not having the correct labels/milestones set. Both these issues tend to create "problems" for determining if a branch is ready to have its release made.

# Possible PR Level Changes to Alleviate Consistency Check Issues

Most of my ideas revolve around preventing the "human" errors that will make it difficult for any consistency check solution to be completely accurate. That is making sure the backport labels and milestones are in some sort of agreement.

I propose that we add a required CI check to our current PR CI, which does two things:

1. Requires a "backport" related label to be set. For the case that the change will not be backported, I propose a `no-backport` label be created to fill this purpose.
2. Uses the label added by the maintainers to "compute" (determine) what the milestone for the PR should be.
3. If a milestone is already present, check that the computed and listed milestones match (fail if they do not). If no milestone is present set the milestone on the PR. (Note in the failure case, in addition to updating the milestone to be correct, the maintainer will need to manually re-trigger the CI job).

This accomplishes the goal of having the right milestone set on PRs which is consistent with its backport status, and since it would be required this will "remind" maintainers to make and execute these decisions prior to merging the PR.  

## Known issues with this process

### Human caused issues:
1. GitHub currently does not have an event which can trigger CI if a milestone is set/changed. This is why I based all the CI on adding/modifying the backport label; however, this CI does not protect against the case were after the CI is run, someone updates only the milestone to something inconsistent with the currently specified label.
2. It also doesn't protect against the case where we "change our minds" after a PR has been merged, as the CI will no-longer be triggered by any changes to labels (or milestones for that matter). However, one can manually trigger the backport bot using a comment on a PR without touching its current milestone; however, the backport PR generated will have whatever milestone was set at the time of the trigger. Again this can cause milestone-backport inconsistencies.

The only totally general solution I have for these issues is that we make it clear in the maintainer's merging guide/workflow that they do not touch the milestone prior to merging the PR. In theory this should lessen the burden on maintainers because

1. They have only one (enforced) action they need to take in order to fit into the consistency process.
2. They no longer have to figure out what the current milestones are active/relevant, the CI will be able to do this.

Hopefully this lessoning of burdens will convince the maintainers to adopt the proposed alteration to their workflow. However, it could still lead to some cases of human error that we cannot catch (due to the problem that CI cannot be triggered by a milestone change).

Many of the human caused inconsistencies from these two issues can be solved by:

1. Ensuring the proposed CI is run on by the backport branches (the backport bot copies the labels and milestones of the originating PR).
2. Adding a third step to the CI on backport branches which specifically checks the milestone computed from the label(s) matches what we would expect from computing that label based on the current backport branch instead. This part should cover most of the cases where one can run around the CI checks in the original PR, but it does not cover the `main` branch from this and it does not cover the backport PRs from the issues described above.

Note that if GitHub ever enables event triggers from updating milestones I can cover off all of the possible human caused inconsistencies.

### Bot caused issues:

The other inconsistency which can appear is the case were the backport bot fails to even be triggered by merging a PR.

I believe it maybe possible to detect this by examining the PRs opened/merged on `main` with a given label and looking for a matching PR on the given backport branch. This would require that backport PRs have some identification linking them to the origin PR, such as having the PR number in the title or a commit message (both of which are done by the backport bot). Either of these facts can again be enforced on backport branches via CI (though it requires any human backports to mention the correct number).

## Some Specific Details

Note that I have examples of getting/checking/setting the milestone from a CI job; however, it does require the `pull_request_target` trigger (so it has enough permissions to do so). This does have some security implications, but I can do it without any externally maintained actions.

The main detail will be "computing" the milestone from the label. This will require the branch names, label titles, and milestone titles to be entirely consistent with one-another and in formulation in order to do this. I believe we already do follow specific rules for this, and we will need to be extra vigilant about continuing to do this. In addition, to this consistency we will also need to very careful to close milestones when they are completed and close backport labels (add the :zzz: to the beginning) as needed.

Note it is possible to handle the current three branch/milestone case even in the face of multiple backport labels (just have "order" the labels by the semantic version given and choose the first corresponding milestone). However, this should not be an issue once we begin following APE 21 (drop the LTS, meaning one backport branch and the main branch).

To actually compute the milestone from the label note:

1. The labels in question are either like `backport-v5.3.x` or `no-backport`. This means we know the milstone is `v5.3.something` or the latest (in semantic version terms) milestone.
2. If we are vigilant and close all completed milestones and don't add unrelated milestones which follow the PR milestone pattern (such as for issues or something), then we will only ever have one milestone beginning with `v5.3` for example and so that one can be selected.

This strict adherence to milestones will make cases where we need to push changes into the "next" milestone, because they cannot make it into the current one a bit awkward. However, I believe we can add a `future-va.b.c` milestone and manually update those PRs to that milestone then once the "old" milestone is completed (an `astropy` release occurs), we can simply drop the `future-` prefix. Note that the fact that the `future-va.b.c` will not match the currently computed milestones will actually protect the branches from merging those PRs (if they are updated) because the milestone will not match the computed one, causing a failure in a required CI job (similar protection can be ensured using a label such as `future`, which will always fail the CI job).

# The Actual Consistency Checks

The described PR level checks should (for the most part) ensure that we can reliably check the status of `astropy` with regards to the next release. Assuming, the backport bot has never had any issues, the constancy check itself is just pulling the milestone information and checking that all PRs with that milestone have been closed.

The issue with completely relying on this is the case when backports fail, this can happen in two ways:

1. The backport bot cannot automatically generate the backport PR, in which case a `Still Needs Manual Backport` label is added to the origin PR.
2. The backport bot fails to be triggered at all (possible but a rare occurrence). 

Since we can "rely" on the milestones set on the `main` branch PRs and we are enforcing a title/commit pattern on the backport branches we can detect both cases, and provide a list of PRs that need attention. This can be done for a given milestone by looking at all the closed PRs against `main` that have the given milestone and then checking if a PR with following the enforced title/commit pattern has been opened on the given backport branch. If that is the case, the PR can be flagged/listed as being inconsistent.

When APE 21 is enforced, we can even set/remove the `Still Needs Manual Backport` label on merged PRs to `main` during this process. This can be done by removing the label if the backport PR is detected (a manual backport was create) or adding the label if it does not already exist on the PR (the backport bot failed to run entirely).

# Conclusions

I think the above changes are all possible, I have (or had) CI proof of concepts implementing the majority of the required CI jobs. Though I have not put all the pieces together.

I also believe that this check "system" should detect virtually all the cases where a PR has not been merged to the correct branch prior to a release on that branch. The situations where it might be confused or fail would either need to be deliberately constructed or be a very complex and strange situation which the release managers should be made aware of.

Note that I may have missed key functions of the consistency checks as I have not actually used the current ones. The above conception evolved out of the consistency issues I explored automatically resolving in order to make managing some of the other projects I am involved with simpler (though I have not had time to fully implement it any where as of yet). Thus I maybe overlooking important `astropy` related details. Please comment below with anything I may have overlooked.

In any case, I would like to open discussions on how to tackle these issues.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Discussion of the consistency checks #14897

Background

Consistency Checks

Possible PR Level Changes to Alleviate Consistency Check Issues

Known issues with this process

Human caused issues:

Bot caused issues:

Some Specific Details

The Actual Consistency Checks

Conclusions

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Discussion of the consistency checks #14897

Description

Background

Consistency Checks

Possible PR Level Changes to Alleviate Consistency Check Issues

Known issues with this process

Human caused issues:

Bot caused issues:

Some Specific Details

The Actual Consistency Checks

Conclusions

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions