Add timeout to pytest command #1082

Merged
ajschmidt8 merged 5 commits into rapidsai:branch-23.02 from ajschmidt8:add-python-test-timeout
Jan 13, 2023

@ajschmidt8
Member

@ajschmidt8 ajschmidt8 commented Jan 13, 2023

There were two instances recently (below) where some Python test errors caused the `conda-python-tests` job to run/hang for ~4 hours.

- rapidsai#981 (comment)
- rapidsai#1081 (comment)

To prevent this from happening again, I've added a timeout of 30 minutes to the `pytest` command.

The job usually takes ~25 minutes to complete in its entirety, so 30 minutes just for `pytest` should be plenty.

This timeout will help prevent jobs from hanging and thus help preserve our finite GPU capacity for CI (particularly for `arm` nodes).
@ajschmidt8 ajschmidt8 added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Jan 13, 2023
@ajschmidt8 ajschmidt8 requested a review from a team as a code owner January 13, 2023 19:34
@ajschmidt8
Member Author

The `timeout-minutes` key doesn't seem to be supported by reusable workflows.

It throws the error here: https://github.com/rapidsai/dask-cuda/actions/runs/3914358886

That's very unfortunate.

@pentschev
Member

I thought there always was a timeout, wasn't that the case before?

apparently `timeout-minutes` isn't supported on reusable workflows
@github-actions github-actions bot added the gpuCI (gpuCI issue) label Jan 13, 2023
@ajschmidt8
Member Author

> I thought there always was a timeout, wasn't that the case before?

There is a default timeout of 6 hours for GH Action workflows.

Generally you can override this at the job level, but it doesn't seem to be supported for reusable workflows (e.g. these reusable workflows).

I could add a timeout to the reusable workflow itself, but that wouldn't be ideal since all of our repositories use these reusable workflows and their runtimes may be different. I was hoping to be able to set it at the repo level.

I think an adequate workaround is to just prefix `pytest` with `timeout 45m`.
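
For reference, a minimal sketch of how that prefix behaves, assuming GNU coreutils `timeout`. Here `sleep 5` is a hypothetical stand-in for a hung test run, and the real `pytest` arguments from the CI script are not reproduced.

```shell
#!/bin/bash
# Sketch only: `sleep 5` stands in for a test command that hangs.
# In the CI script the prefix would look like:
#   timeout 30m pytest <args>    # <args> elided; see the repo's CI script

status=0
# GNU `timeout` terminates the command once the duration elapses
# and exits with status 124 when it had to do so.
timeout 1s sleep 5 || status=$?

if [ "${status}" -eq 124 ]; then
  echo "command timed out"
fi
```

Because the timed-out command returns a non-zero status, a CI step that fails on non-zero exit codes should be marked as failed instead of hanging until the workflow-level limit.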

this should still be plenty of time.

a successful job takes ~25 minutes in its entirety, so `30m` for just the `pytest` command seems sufficient

[skip ci]
@ajschmidt8 ajschmidt8 changed the title Add timeout to conda-python-tests job Add timeout to pytest command Jan 13, 2023
@ajschmidt8
Member Author

admin merging w/ permission from @charlesbluca while they investigate the underlying issue of the tests hanging.

@ajschmidt8 ajschmidt8 merged commit 1149257 into rapidsai:branch-23.02 Jan 13, 2023
@ajschmidt8 ajschmidt8 deleted the add-python-test-timeout branch January 13, 2023 22:19