
[ROCm] Increase binary build timeout to 5 hours (300 minutes)#163776

Closed
jithunnair-amd wants to merge 2 commits into pytorch:main from ROCm:increase_rocm_binary_build_timeout

@jithunnair-amd (Collaborator) commented Sep 24, 2025

Despite narrowing down the FBGEMM_GENAI build to gfx942 (#162648), the nightly builds still timed out because they did not have enough time to finish the post-PyTorch-build steps.

This PR increases the timeout for ROCm binary builds of both libtorch and manywheel, because both currently take close to 4 hours.
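To make the timeout arithmetic concrete: a CI job is cancelled once its total runtime, including the post-PyTorch-build steps, exceeds its limit. The minimal Python sketch below assumes the previous limit was 4 hours (240 minutes), which the PR implies but does not state, and uses made-up durations rather than numbers from the linked runs. The helpers `finishes_in_time` and `headroom` are hypothetical and not part of PyTorch CI.

```python
# Hypothetical sketch: a CI job is cancelled once its total runtime exceeds its
# timeout. All durations here are illustrative assumptions, not measurements
# from the PyTorch CI runs linked in this PR.

OLD_TIMEOUT_MIN = 4 * 60  # assumed previous limit: 4 hours = 240 minutes
NEW_TIMEOUT_MIN = 5 * 60  # new limit from this PR: 5 hours = 300 minutes


def finishes_in_time(build_min: int, post_build_min: int, timeout_min: int) -> bool:
    """Return True if the build plus the post-PyTorch-build steps fit in the timeout."""
    return build_min + post_build_min <= timeout_min


def headroom(build_min: int, post_build_min: int, timeout_min: int) -> int:
    """Minutes left before the timeout (negative means the job would be cancelled)."""
    return timeout_min - (build_min + post_build_min)


if __name__ == "__main__":
    # Hypothetical job: the PyTorch build itself takes 225 minutes,
    # followed by 30 minutes of post-build steps (255 minutes total).
    build, post = 225, 30
    for label, limit in (("old", OLD_TIMEOUT_MIN), ("new", NEW_TIMEOUT_MIN)):
        print(label, limit, finishes_in_time(build, post, limit), headroom(build, post, limit))
```

With these illustrative numbers the job overruns the 240-minute limit by 15 minutes but finishes with 45 minutes to spare under the 300-minute limit. The total includes the post-build steps, which is the part the PR description says ran out of time.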

This PR is a more ROCm-targeted version of #162880 (which targets the release/2.9 branch).

cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

pytorch-bot (bot) commented Sep 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163776

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 5f89995 with merge base 4c2c401:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot added the labels ciflow/rocm (Trigger "default" config CI on ROCm), module: rocm (AMD GPU support for PyTorch), and topic: not user facing (topic category) on Sep 24, 2025
jeffdaily marked this pull request as ready for review on September 24, 2025 18:22
jeffdaily requested a review from a team as a code owner on September 24, 2025 18:22
jeffdaily added the ciflow/binaries_libtorch label (Trigger binary build and upload jobs for libtorch on the PR) on Sep 24, 2025
@jeffdaily (Collaborator)

@pytorchbot merge -f "just a workflow timeout increase for rocm, no further testing needed, no more timeouts"

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

jainapurva pushed a commit that referenced this pull request Sep 29, 2025
Despite narrowing down the [FBGEMM_GENAI build to gfx942](#162648), the nightly builds still timed out because they [didn't get enough time to finish the post-PyTorch-build steps](https://github.com/pytorch/pytorch/actions/runs/17969771026/job/51109432897).

This PR increases the timeout for ROCm builds for both [libtorch](https://github.com/pytorch/pytorch/actions/runs/17969771026) and [manywheel](https://github.com/pytorch/pytorch/actions/runs/17969771041), because both of those are close to the 4hr mark currently.

This PR is a more ROCm-targeted version of #162880 (which targets the release/2.9 branch).

Pull Request resolved: #163776
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
@Camyll (Contributor) commented Oct 6, 2025

@pytorchbot cherry-pick --onto release/2.9 --c critical

pytorchbot pushed a commit that referenced this pull request Oct 6, 2025
(cherry picked from commit 0ec946a)
@pytorchbot (Collaborator)

Cherry picking #163776

The cherry pick PR is at #164770 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:


atalman pushed a commit that referenced this pull request Oct 6, 2025
[ROCm] Increase binary build timeout to 5 hours (300 minutes) (#163776)

(cherry picked from commit 0ec946a)

Co-authored-by: Jithun Nair <jithun.nair@amd.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>

Labels

ciflow/binaries_libtorch, ciflow/rocm, Merged, module: rocm, open source, topic: not user facing


5 participants