Add timeout to pytest command #1082
There were two instances recently (below) where some Python test errors caused the `conda-python-tests` job to run/hang for ~4 hours.

- rapidsai#981 (comment)
- rapidsai#1081 (comment)

To prevent this from happening again in the future, I've added a reasonable timeout of 45 minutes to that particular job. The job usually takes ~25 minutes to complete, so 45 minutes should be plenty.

This timeout will help prevent jobs from hanging and thus help preserve our finite GPU capacity for CI (particularly for `arm` nodes).
It throws the error here: https://github.com/rapidsai/dask-cuda/actions/runs/3914358886

That's very unfortunate.
I thought there was always a timeout. Wasn't that the case before?
apparently `timeout-minutes` isn't supported on reusable workflows
There is a default timeout of 6 hours for GitHub Actions workflows. Generally you can override this at the job level, but it doesn't seem to be supported for reusable workflows (e.g. these reusable workflows).

I could add a timeout to the reusable workflow itself, but that wouldn't be ideal since all of our repositories use these reusable workflows and their runtimes may be different. I was hoping to be able to set it at the repo level.

I think an adequate workaround is to just prefix
this should still be plenty of time. a successful job takes ~25 minutes in its entirety, so `30m` for just the `pytest` command seems sufficient [skip ci]
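For readers unfamiliar with the approach, here is a minimal sketch of capping a command's runtime from the shell, assuming GNU coreutils `timeout` is available on the CI image. The `run_with_limit` helper is hypothetical and only for illustration; it is not the script changed in this PR. A short `sleep` stands in for a hung test run so the example finishes quickly.

```shell
#!/usr/bin/env bash
# Sketch only: `run_with_limit` is a hypothetical helper, not part of this PR.
# It assumes GNU coreutils `timeout`, which exits with status 124 when the
# time limit is reached and the command is terminated.
run_with_limit() {
  local limit="$1"
  shift
  timeout "$limit" "$@"
}

# In CI the call would look something like: run_with_limit 30m pytest <args>
# Here `sleep 5` plays the role of a hanging test run, cut off after 1 second.
status=0
run_with_limit 1s sleep 5 || status=$?

if [ "$status" -eq 124 ]; then
  echo "command timed out"
fi
```

Capturing the status with `|| status=$?` keeps the script from aborting under `set -e`, so the job can report a timeout explicitly instead of hanging until the workflow-level limit.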
admin merging w/ permission from @charlesbluca while they investigate the underlying issue of the tests hanging.
There were two instances recently (below) where some Python test errors caused the `conda-python-tests` job to run/hang for ~4 hours.

- rapidsai#981 (comment)
- rapidsai#1081 (comment)

To prevent this from happening again in the future, I've added a reasonable timeout of ~~45 minutes to that particular job~~ 30 minutes to the `pytest` command.

The job usually takes ~25 minutes to complete entirely, so 30 minutes just for `pytest` should be plenty.

This timeout will help prevent jobs from hanging and thus help preserve our finite GPU capacity for CI (particularly for `arm` nodes).