CI: rocm (default, 1, 3, linux.rocm.gpu) is very slow #110181
Closed
Labels
module: ci (Related to continuous integration)
module: devx (Related to PyTorch contribution experience (HUD, pytorchbot))
module: rocm (AMD GPU support for Pytorch)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
Description
Current Status
ongoing
Issue looks like
Taking ~2.5 hours on #110167

Taking 3.5+ hours on #109976

User impact
Slower merging
Root cause
Seems like it may have been introduced in #109817 @malfet

Mitigation
Not sure
Prevention/followups
Investigate the cause of the slow running time, or split the tests into smaller test jobs. Try to bring the run time in line with the CUDA tests (~1.5 hours).
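As a rough illustration of the "split the tests into smaller test jobs" idea, here is a minimal sketch of greedy longest-first sharding by test duration. It is not the sharding logic PyTorch CI actually uses; the function name `shard_tests`, the test file names, and the durations are all hypothetical and exist only for this example.

```python
import heapq


def shard_tests(durations, num_shards):
    """Assign test files to shards, longest test first (hypothetical helper).

    durations: dict mapping test name -> runtime in seconds (made-up inputs).
    Returns (shards, totals): the test names per shard and each shard's
    total runtime in seconds.
    """
    # Min-heap of (accumulated_seconds, shard_index): the least-loaded
    # shard is always popped first.
    heap = [(0.0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_shards)]

    # Place the slowest tests first; ties are broken by name for determinism.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (total + secs, idx))

    totals = [0.0] * num_shards
    for total, idx in heap:
        totals[idx] = total
    return shards, totals


# Hypothetical durations in seconds; real numbers would come from CI logs.
example_durations = {
    "test_a": 3600,
    "test_b": 2400,
    "test_c": 1800,
    "test_d": 1200,
    "test_e": 600,
}
example_shards, example_totals = shard_tests(example_durations, 3)
```

With an assignment like this, the wall-clock time of the job is bounded by the slowest shard rather than by the sum of all tests, which is why adding shards can bring a multi-hour job closer to the ~1.5 hour target, provided enough runners are available.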
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @seemethere @malfet @pytorch/pytorch-dev-infra @ZainRizvi @kit1980 @huydhn @clee2000