Conversation
CUDA 12.0 is also released. Will we see a build for CUDA 12?

@ptrblck a change is required here: https://github.com/pytorch/pytorch/blob/master/.github/workflows/docker-builds.yml#L36 in order to add these configs to the workflow
More changes are required for 11.8: we need to add a CUDA 11.8 entry, similar to the 11.7 one, here: https://github.com/pytorch/pytorch/blob/master/.github/workflows/trunk.yml#L71
I think we'll need a follow-up PR for this once the docker one is merged.
Please note the CUDA 11.8 failure was addressed here: #92264
@pytorchbot merge -f "Failures for 11.8 were resolved"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
test-matrix: |
  { include: [
    { config: "default", shard: 1, num_shards: 3, runner: "windows.g5.4xlarge.nvidia.gpu" },
    { config: "default", shard: 2, num_shards: 3, runner: "windows.g5.4xlarge.nvidia.gpu" },
    { config: "default", shard: 3, num_shards: 3, runner: "windows.g5.4xlarge.nvidia.gpu" },
    { config: "force_on_cpu", shard: 1, num_shards: 1, runner: "windows.4xlarge" },
  ]}
All three Windows test shards are failing, with errors like the following. Can you please take a look? CUDA doesn't seem to be installed correctly on the Windows boxes.
I think we'll need to disable these tests for now, since they're failing consistently on periodic.
2023-01-24T03:25:03.2580503Z ERROR (0.006s)
2023-01-24T03:25:03.2580979Z test_multihead_attention_dtype_batch_first_cuda_float16 (__main__.TestMultiheadAttentionNNDeviceTypeCUDA) ... test_multihead_attention_dtype_batch_first_cuda_float16 errored - num_retries_left: 2
2023-01-24T03:25:03.2581378Z Traceback (most recent call last):
2023-01-24T03:25:03.2581890Z File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 414, in instantiated_test
2023-01-24T03:25:03.2582234Z raise rte
2023-01-24T03:25:03.2582586Z File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 401, in instantiated_test
2023-01-24T03:25:03.2582948Z result = test(self, **param_kwargs)
2023-01-24T03:25:03.2583378Z File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 1010, in only_fn
2023-01-24T03:25:03.2583719Z return fn(slf, *args, **kwargs)
2023-01-24T03:25:03.2584077Z File "C:\actions-runner\_work\pytorch\pytorch\test\nn\test_multihead_attention.py", line 640, in test_multihead_attention_dtype_batch_first
2023-01-24T03:25:03.2584496Z model = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True).cuda().to(dtype)
2023-01-24T03:25:03.2584961Z File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\nn\modules\module.py", line 1132, in to
2023-01-24T03:25:03.2585289Z return self._apply(convert)
2023-01-24T03:25:03.2585661Z File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\nn\modules\module.py", line 784, in _apply
2023-01-24T03:25:03.2586057Z module._apply(fn)
2023-01-24T03:25:03.2586468Z File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\nn\modules\module.py", line 807, in _apply
2023-01-24T03:25:03.2586789Z param_applied = fn(param)
2023-01-24T03:25:03.2587196Z File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\nn\modules\module.py", line 1130, in convert
2023-01-24T03:25:03.2587602Z return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
2023-01-24T03:25:03.2587932Z RuntimeError: CUDA error: no kernel image is available for execution on the device
2023-01-24T03:25:03.2588367Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2023-01-24T03:25:03.2588716Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2023-01-24T03:25:03.2589013Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
https://hud.pytorch.org/pytorch/pytorch/commit/d8aa68c683bdf31f237bffb734b6038bc4f63898
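For context on the error above: "no kernel image is available for execution on the device" usually means the installed binary was not compiled for the GPU's compute capability. A minimal sketch of that check, with a hypothetical helper (in practice the compiled arch list comes from torch.cuda.get_arch_list() and the device capability from torch.cuda.get_device_capability()):

```python
def arch_supported(arch_list, capability):
    """Return True if the build's compiled SM archs include the device's
    compute capability, e.g. ["sm_50", "sm_86"] covers capability (8, 6).

    Note: "compute_XX" (PTX) entries in a build can also be JIT-compiled
    for newer devices; this sketch only checks for an exact SM match.
    """
    major, minor = capability
    target = major * 10 + minor
    compiled = [int(a.split("_")[1]) for a in arch_list if a.startswith("sm_")]
    return target in compiled

# Example: the windows.g5.4xlarge runners use an NVIDIA A10G (capability 8.6),
# so a build whose arch list lacks sm_86 cannot run native kernels on them.
print(arch_supported(["sm_50", "sm_70", "sm_86"], (8, 6)))  # True
print(arch_supported(["sm_50", "sm_70"], (8, 6)))           # False
```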
@ZainRizvi I'm looking into this issue. Was there an AMI update recently? I don't see any failures on https://github.com/pytorch/pytorch/actions/runs/3961152467/jobs/6788616724
These periodic tests were introduced in #92137. They've been consistently failing on trunk, so we're disabling them until they're fixed. Sample failures: https://hud.pytorch.org/pytorch/pytorch/commit/d8aa68c683bdf31f237bffb734b6038bc4f63898 Pull Request resolved: #92902 Approved by: https://github.com/malfet
Fixes #92090
CC @atalman