Skip to content

CI Add workflow to update lock files#27622

Merged
glemaitre merged 22 commits intoscikit-learn:mainfrom
lesteve:update-lock-files-bot
Oct 27, 2023
Merged

CI Add workflow to update lock files#27622
glemaitre merged 22 commits intoscikit-learn:mainfrom
lesteve:update-lock-files-bot

Conversation

@lesteve
Copy link
Copy Markdown
Member

@lesteve lesteve commented Oct 19, 2023

Reference Issues/PRs

Towards #22425

What does this implement/fix? Explain your changes.

This adds a github workflow to manually update the lock files. Right now this will open three PRs:

  • one that update all the lock files for the main CIs. This is expected to pass most of the times (or at least not fail very often)
  • one for scipy-dev. This may fail a bit more often, most of the time not because of the lock file update, but because of a change in our dev dependencies.
  • one for CirrusCI arm tests
  • one for PyPy. I can remove this one since the PyPy build is not in a good state right now.

It adds only a manual trigger (workflow_dispatch) for now but eventually it will be a cron job that updates lock files every one or two weeks.

Another thing that can be done when we are more confident about the bot, is that PRs can be auto-merged: https://github.com/peter-evans/create-pull-request#auto-merge.

This is the kind of PR that is hard to test without it getting merged first.

I did some basic testing in my fork, you can see some PRs created by the bot:
https://github.com/lesteve/scikit-learn/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-desc

I drew some inspiration from Scitools/iris since they have a similar setup #22425 (comment)

@lesteve lesteve changed the title CI Add CI Add workflow to update lock files Oct 19, 2023
@github-actions
Copy link
Copy Markdown

github-actions bot commented Oct 19, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 2d8d362. Link to the linter CI: here

Copy link
Copy Markdown
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy with this PR. It is exactly what I was dreaming of ;)

So now there is 2 things that we can think of:

  • Having a weekly CRON job for main. IMO, it could straight in a new PR and merge (we can keep it out of this PR if someone wants to try the action first).
  • Adding an auto-merge of the PR if the CI does not fail. I think that we can wait for some time to see how the manual step is going but we can trigger the option quite soon.

@glemaitre
Copy link
Copy Markdown
Member

@lesteve Could solve the conflicts.

Do you think that we could already have the auto-merge but commented with a comment?

@lesteve
Copy link
Copy Markdown
Member Author

lesteve commented Oct 23, 2023

Do you think that we could already have the auto-merge but commented with a comment?

I would leave auto-merge for later for now and do it manually one or two times to make sure everything works as expected.

Adding auto-merge would be something like this:

- name: Enable Pull Request Automerge
      run: gh pr merge --merge --auto "1"
      env:
        GH_TOKEN: ${{ secrets.PAT }}

For this to work we will need to create a bot that has admin privileges so that it can merge PRs. There probably needs to be an auto-approval too, so that required checks are met (we require one approval right now).

@ogrisel
Copy link
Copy Markdown
Member

ogrisel commented Oct 23, 2023

There probably needs to be an auto-approval too, so that required checks are met (we require one approval right now).

I don't think we can auto-approve our own PRs, but it's possible for a maintainer to override the review checks.

Copy link
Copy Markdown
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. I just have a few questions:

  • what would be the github account set as author of the automatically opened PR (not the committer). On your fork it's @lesteve. Will this be scikit-learn (the org) on the main repo? I am fine with that but it would be implicit.

Some inline questions and comments below:

author: "Lock file bot <noreply@github.com>"
delete-branch: true
branch: auto-update-lock-files-${{ matrix.name }}
title: ":lock: :robot: CI Update lock files for ${{ matrix.name }} CI build(s) :lock: :robot:"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens at that step if the diff is empty? For instance where the lock files are already up to date?

  • does it create an empty commit PR?
  • does the workflow fail?

Ideally, I would like neither option but a message that says that there are no changes to commit and therefore no pull request has been created.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No PR is created if there are no changes and the workflow does not fail.

The peter-evans/create-pull-request hash some log that you can look at with all the details:

  • build log existing PR which needs to be updated you see at one point
Pushing pull request branch to 'origin/auto-update-lock-files-scipy-dev'
  • build log no change needed, no PR created you don't see the Pushing pull request branch step

each
for each in filtered_conda_build_metadata_list
if not re.search(skip_build, each["build_name"])
]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filtered_pip_build_metadata_list should be filtered the same way below.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@lesteve
Copy link
Copy Markdown
Member Author

lesteve commented Oct 24, 2023

what would be the github account set as author of the automatically opened PR (not the committer).

I have used put the scikit-learn-bot token (the one that automatically opens an issue when the CI fails) so that will be the PR author.

@lesteve
Copy link
Copy Markdown
Member Author

lesteve commented Oct 24, 2023

I don't think we can auto-approve our own PRs, but it's possible for a maintainer to override the review checks.

Yeah I thought about that, this needs to be looked into in more details, there is a chance that a pull_request_target workflow with the relevant rights may work for the auto-approval + auto-merge.

@lesteve
Copy link
Copy Markdown
Member Author

lesteve commented Oct 24, 2023

So by testing a bit more in my fork, I realised that there were some issues when the PR is created by another user. Right now, I see two main solutions:

  • give scikit-learn-bot Admin rights i.e. move it to the privileged-bots team. This will push to a branch in the scikit-learn main repo and do a PR from it.
  • do a PR from scikit-learn-bot/scikit-learn as a normal user using push-to-fork, but then scikit-learn-bot needs to fork scikit-learn/scikit-learn so that scikit-learn-bot/scikit-learn exists, not sure who created this account maybe @thomasjpfan? I have this working in my fork https://github.com/lesteve/scikit-learn/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-desc where PRs are created from test-lesteve/scikit-learn.

There may be other work-arounds of course.

@lesteve
Copy link
Copy Markdown
Member Author

lesteve commented Oct 27, 2023

but then scikit-learn-bot needs to fork scikit-learn/scikit-learn so that scikit-learn-bot/scikit-learn exists, not sure who created this account maybe @thomasjpfan?

I found a work-around to create scikit-learn-bot/scikit-learn with the GITHUB_BOT_TOKEN from the CI so I think all is good for this PR to be merged and tested for real!

Copy link
Copy Markdown
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so let's merge and make a live-test.

@glemaitre glemaitre merged commit 5fc67ae into scikit-learn:main Oct 27, 2023
@glemaitre
Copy link
Copy Markdown
Member

I launched the workflow here: https://github.com/scikit-learn/scikit-learn/actions/runs/6666949772

@lesteve lesteve deleted the update-lock-files-bot branch October 27, 2023 13:31
@lesteve
Copy link
Copy Markdown
Member Author

lesteve commented Oct 27, 2023

So the workflow does not work because on the push to scikit-learn-bot/scikit-learn it sees there is a new worfklow update-lock-files.yml compared to what is already on the fork. The push is rejected because the token does not have the workflow scope (it is very likely to have only public_repo):

Pushing pull request branch to 'fork/auto-update-lock-files-cirrus-arm'
  /usr/bin/git push --force-with-lease fork auto-update-lock-files-cirrus-arm:refs/heads/auto-update-lock-files-cirrus-arm
  To https://github.com/test-lesteve/scikit-learn
   ! [remote rejected]     auto-update-lock-files-cirrus-arm -> auto-update-lock-files-cirrus-arm (refusing to allow a Personal Access Token to create or update workflow `.github/workflows/test-workflow.yml` without `workflow` scope)

There is more explanation there

The easiest fix is to add the workflow scope to BOT_GITHUB_TOKEN but I think @thomasjpfan is the only one who can do this.

@ogrisel
Copy link
Copy Markdown
Member

ogrisel commented Oct 30, 2023

Rather than granting elevated token privileges to the third-party managed action (peter-evans/create-pull-request in this case), I would rather grant it to a step that uses the GitHub maintained gh CLI: gh pr create in this case.

The gh command should be installed by default in github actions. The only contraint is that it can only be used in workflow that has an actions/checkout step but this is already the case here.

EDIT: or alternatively use the github API manually via our own Python script, similarly to what is done for maint_tools/update_tracking_issue.py.

@lesteve
Copy link
Copy Markdown
Member Author

lesteve commented Oct 30, 2023

I am pretty sure irrespective of the method you use, you will still need the workflow scope if the bots pushes into its fork branch and there is a new workflow that the fork has not seen before. This is the case for this error since update-lock-file.yml is seen as a new workflow and the push is rejected.

Updating scikit-learn-bot/scikit-learn to main could make the bot work for a while until a new workflow is added in scikit-learn/scikit-learn main branch.

I don't think this is likely to be a security issue as the workflow scope is granted on the scikit-learn-bot account and does not allow to trigger workflow on the scikit-learn/scikit-learn repo.

RUrlus pushed a commit to RUrlus/scikit-learn that referenced this pull request Oct 30, 2023
@thomasjpfan
Copy link
Copy Markdown
Member

The scikit-learn-bot user account does not have any elevated permissions to the scikit-learn repo so I think it's safe to give peter-evans/create-pull-request the workflow scope.

I gave the current token the workflow permission and synced the scikit-learn-bot/scikit-learn fork with upstream.

@lesteve
Copy link
Copy Markdown
Member Author

lesteve commented Oct 30, 2023

Nice, thanks a lot!

I ran the workflow manually, looks like the first actions have run fine and some pull requests created 🎉

@glemaitre
Copy link
Copy Markdown
Member

Nice. Thanks @thomasjpfan and @lesteve

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Oct 31, 2023
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants