UNSTABLE trunk / linux-jammy-rocm-py3.10 / test (distributed) #177301
Closed
Labels
- bot-triaged: This is a label only to be used by the auto triage bot
- module: ci: Related to continuous integration
- module: rocm: AMD GPU support for Pytorch
- module: tests: Issues related to tests (not the torch.testing module)
- triaged: This issue has been looked at a team member, and triaged and prioritized into an appropriate module
- unstable
A lot of timeouts are being observed on the trunk distributed jobs for ROCm: https://hud.pytorch.org/hud/pytorch/pytorch/6775069391cb18f988ad9f5b0676b398071b1fb8/1?per_page=50&name_filter=trunk.*rocm.*distributed&useRegexFilter=true&mergeEphemeralLF=true
Marking the job as unstable until we get it back under control.
cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang @seemethere @malfet @pytorch/pytorch-dev-infra @mruberry
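
For anyone checking which jobs the HUD link above selects, here is a rough sketch that pulls the `name_filter` regex out of that URL and applies it to the job name from this issue's title. How the HUD itself evaluates the filter (substring search vs. full match, case sensitivity) is an assumption here, and `job_matches` is a hypothetical helper, not part of any PyTorch tooling.

```python
import re
from urllib.parse import urlparse, parse_qs

# HUD link copied from the issue body.
HUD_URL = (
    "https://hud.pytorch.org/hud/pytorch/pytorch/"
    "6775069391cb18f988ad9f5b0676b398071b1fb8/1"
    "?per_page=50&name_filter=trunk.*rocm.*distributed"
    "&useRegexFilter=true&mergeEphemeralLF=true"
)

# Extract the filter parameters from the query string.
query = parse_qs(urlparse(HUD_URL).query)
name_filter = query["name_filter"][0]
use_regex = query["useRegexFilter"][0] == "true"


def job_matches(job_name: str) -> bool:
    """Hypothetical helper: assumes the filter is an unanchored regex search."""
    if use_regex:
        return re.search(name_filter, job_name) is not None
    return name_filter in job_name


# Job name taken from the issue title.
job = "trunk / linux-jammy-rocm-py3.10 / test (distributed)"
print(name_filter, job_matches(job))
```

Under that assumption the filter picks up the job named in the title, while a non-distributed shard of the same workflow would not match.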