🐛 Bug
After running a unit test (either a test suite or a single test), the terminal hangs. Ctrl-C doesn't help either.
The only way out is to press Ctrl-Z and manually kill the process running the test.
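As a possible stopgap while the hang is investigated, the manual kill can be scripted. This is only a sketch, not part of torch_xla: `run_with_timeout` is a hypothetical helper, the 60-second limit is an arbitrary choice, and it assumes a POSIX system. It starts the test in its own process group and sends SIGKILL to the whole group if the command does not exit in time.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(cmd, timeout_s, env=None):
    """Run cmd in its own process group; kill the whole group on timeout.

    Returns the exit code, or None if the command had to be killed.
    (Hypothetical helper, not a torch_xla API.)
    """
    # start_new_session=True (POSIX only) makes the child the leader of a new
    # process group, so any processes it spawns can be signalled together.
    proc = subprocess.Popen(cmd, env=env, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # SIGKILL cannot be caught or ignored, unlike the SIGINT sent by Ctrl-C.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the killed child
        return None


# Example usage with the command from this report (timeout value is arbitrary):
# env = dict(os.environ, GPU_NUM_DEVICES="4", PJRT_DEVICE="CUDA")
# run_with_timeout(
#     [sys.executable, "pytorch/xla/test/pjrt/test_runtime_gpu.py",
#      "TestExperimentalPjrtGpu.test_multi_gpu_devices"],
#     timeout_s=60,
#     env=env,
# )
```

A return value of `None` means the run had to be killed. This only works around the symptom; it does not explain why the process stops responding.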
To Reproduce
On HEAD (01/18), running the following command reproduces the issue:
root@xiowei-gpu:/ansible# GPU_NUM_DEVICES=4 PJRT_DEVICE=CUDA python pytorch/xla/test/pjrt/test_runtime_gpu.py TestExperimentalPjrtGpu.test_multi_gpu_devices
Expected behavior
It shouldn't hang. Previously, the command finished and returned to the prompt:
root@xiowei-gpu:/ansible# GPU_NUM_DEVICES=4 PJRT_DEVICE=CUDA python pytorch/xla/test/pjrt/test_runtime_gpu.py TestExperimentalPjrtGpu.test_multi_gpu_devices
Running tests under Python 3.8.18: /usr/local/bin/python
[ RUN ] TestExperimentalPjrtGpu.test_multi_gpu_devices
[ OK ] TestExperimentalPjrtGpu.test_multi_gpu_devices
----------------------------------------------------------------------
Ran 1 test in 4.433s
OK
root@xiowei-gpu:/ansible#
Environment
- Reproducible on XLA backend [CPU/TPU]: GPU
- torch_xla version: nightly
Additional context