
Add timeout-minutes to the lint PR format job #9499

Closed
ahukkanen wants to merge 9 commits into decidim:develop from mainio:fix/lint-pr-format-timeout

Conversation

@ahukkanen
Contributor

🎩 What? Why?

I noticed the lint PR format job needs to be restarted quite often because the job queue is full and the job gives up waiting. Unlike the other jobs, we are currently not defining the timeout-minutes configuration for this job.

According to the GitHub documentation, the default timeout should be 360 minutes, but this job actually seems to be cancelled after 10-15 minutes.

This should fix that issue.
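The change itself is a one-line addition to the workflow file. A sketch of what the job looks like with the setting in place — the job name, step, and timeout value below are illustrative, not copied from the actual Decidim workflow:

```yaml
# .github/workflows/lint_pr_format.yml (illustrative excerpt)
name: "[CI] Lint PR format"

on:
  pull_request:
    types: [opened, reopened, synchronize, edited]

jobs:
  lint_pr_format:
    runs-on: ubuntu-latest
    # Without this setting, GitHub's documented default of 360 minutes
    # should apply; the value here is an illustrative explicit limit.
    timeout-minutes: 60
    steps:
      - uses: actions/checkout@v2
```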

Testing

See that CI is green.

📋 Checklist

  • CONSIDER adding a unit test if your PR resolves an issue.
  • ✔️ DO check open PRs to avoid duplicates.
  • ✔️ DO keep pull requests small so they can be easily reviewed.
  • ✔️ DO build locally before pushing.
  • ✔️ DO make sure tests pass.
  • ✔️ DO make sure any new changes are documented in docs/.
  • ✔️ DO add and modify seeds if necessary.
  • ✔️ DO add CHANGELOG upgrade notes if required.
  • ✔️ DO add to GraphQL API if there are new public fields.
  • ✔️ DO add link to MetaDecidim if it's a new feature.
  • AVOID breaking the continuous integration build.
  • AVOID making significant changes to the overall architecture.

@ahukkanen ahukkanen added target: developer-experience type: internal PRs that aren't necessary to add to the CHANGELOG for implementers labels Jun 28, 2022
@ahukkanen ahukkanen marked this pull request as draft June 28, 2022 14:22
@ahukkanen
Contributor Author

It still seems to be cancelled for this PR, so I need to investigate further.

@ahukkanen
Contributor Author

It seems this is not related to the timeout minutes; it just keeps happening randomly. It can even happen when the job queue is not particularly full, but most of the time it happens while this action has been waiting in the queue for a few minutes (although sometimes it happened after only a few tens of seconds).

I tried several debugging approaches but could not reach a conclusion. The things I tried:

  • Playing with the timeout-minutes for the workflow's only job, but this does not seem to have any effect because the job is cancelled before it has even started.
  • Adding the DEBUG flag to get debug logs from rokroskar/workflow-run-cleanup-action, but this does not work because the workflow never even starts; it is cancelled before it reaches this step.
  • Removing the synchronize type from the event types, but then the job status would not be available when new pushes happen on the PR. I also noticed the job is often cancelled right when the PR is opened. The assumption was that the cancellation might somehow be related to the fact that push events do not trigger this action (the only difference compared with the other actions in this regard), so I wanted to test whether GitHub decides to cancel the job because pushes should not trigger it.
  • Looking through the workflow runs in the GitHub API to see whether any extra information was available there, but there was nothing relevant. The cancelled runs just have "status": "completed" and "conclusion": "cancelled" in the API.
  • Searching for other people experiencing this particular issue, but the only answer I eventually found was "it must have been a temporary issue with GitHub". Some people running their own workers mentioned cases where a workflow run is cancelled when it either fails to find a worker for the job or the worker creation fails for one reason or another. But there is not much information available about these for us (and no logs about the cancelled runs), so I guess we just need to keep guessing. It would also seem odd that this does not happen with any of the other actions.
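The API check above can be reproduced with a small script. A sketch, assuming a response shaped like the documented `GET /repos/{owner}/{repo}/actions/runs` payload — the sample data below is illustrative, not taken from real runs:

```python
import json

# Illustrative sample of the fields GitHub's "list workflow runs" API returns
# (GET /repos/{owner}/{repo}/actions/runs); real responses carry many more keys.
sample = json.loads("""
{
  "workflow_runs": [
    {"id": 1, "name": "Lint PR format", "status": "completed", "conclusion": "cancelled"},
    {"id": 2, "name": "Lint PR format", "status": "completed", "conclusion": "success"}
  ]
}
""")

def cancelled_runs(payload):
    """Return the runs that GitHub reports as completed with a cancelled conclusion."""
    return [run for run in payload["workflow_runs"]
            if run["status"] == "completed" and run["conclusion"] == "cancelled"]

print([run["id"] for run in cancelled_runs(sample)])  # → [1]
```

As the comment notes, cancelled runs are indistinguishable from normal ones in the API apart from the "conclusion" field, which is why this route gave no further clues.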

Therefore, I will close this one with the following notes:

  • This is likely not related to the action's configuration
  • This can be some internal problem with GitHub but we have no information available about that
  • If someone wants to continue from this in the future, I guess the next step would be to ask GitHub support for more information (and link the particular workflow runs that were automatically cancelled; I won't link any here because they will disappear at some point)

