https://app.circleci.com/pipelines/github/pytorch/pytorch/262044/workflows/b59d8f66-4081-46a9-83c6-ccba47867226/jobs/10274257/steps
https://app.circleci.com/pipelines/github/pytorch/pytorch/262004/workflows/be1286ee-9dfd-4428-a358-9cf6d72cf677/jobs/10274202/steps
https://app.circleci.com/pipelines/github/pytorch/pytorch/261991/workflows/8ee717cb-99ab-4470-b49f-2da2f3e8dc68/jobs/10274240/steps
https://app.circleci.com/pipelines/github/pytorch/pytorch/261958/workflows/ea603c98-20ab-471d-8647-d92f350354ac/jobs/10272278/steps
Jan 20 21:35:02 ======================================================================
Jan 20 21:35:02 ERROR [100.125s]: test_custom_stream_multi (__main__.TensorPipeTensorPipeAgentRpcTestWithSpawn)
Jan 20 21:35:02 ----------------------------------------------------------------------
Jan 20 21:35:02 Traceback (most recent call last):
Jan 20 21:35:02 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 282, in wrapper
Jan 20 21:35:02 self._join_processes(fn)
Jan 20 21:35:02 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 399, in _join_processes
Jan 20 21:35:02 self._check_return_codes(elapsed_time)
Jan 20 21:35:02 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 440, in _check_return_codes
Jan 20 21:35:02 raise RuntimeError('Process {} terminated or timed out after {} seconds'.format(i, elapsed_time))
Jan 20 21:35:02 RuntimeError: Process 0 terminated or timed out after 100.07110738754272 seconds
Jan 20 21:28:49 test_custom_stream_multi (__main__.TensorPipeTensorPipeAgentRpcTestWithSpawn) ... [E thread_pool.cpp:112] Exception in thread pool task: CUDA error: invalid device ordinal
Jan 20 21:28:49 Exception raised from exchangeDevice at /var/lib/jenkins/workspace/c10/cuda/impl/CUDAGuardImpl.h:31 (most recent call first):
Jan 20 21:28:49 frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f2d42e7d50b in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jan 20 21:28:49 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xce (0x7f2d42e6de8e in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jan 20 21:28:49 frame #2: <unknown function> + 0x547d21 (0x7f2d5c7d7d21 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
Jan 20 21:28:49 frame #3: std::_Function_handler<void (), at::cuda::CUDAFuture::wrapCallback(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) + 0x3c1 (0x7f2d5cd0aed1 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
Jan 20 21:28:49 frame #4: <unknown function> + 0xa6a661 (0x7f2d5ccfa661 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
Jan 20 21:28:49 frame #5: c10::ThreadPool::main_loop(unsigned long) + 0x2b3 (0x7f2d42e5a593 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jan 20 21:28:49 frame #6: <unknown function> + 0xc819d (0x7f2d6016219d in /opt/conda/lib/libstdc++.so.6)
Jan 20 21:28:49 frame #7: <unknown function> + 0x76ba (0x7f2d962f06ba in /lib/x86_64-linux-gnu/libpthread.so.0)
Jan 20 21:28:49 frame #8: clone + 0x6d (0x7f2d960264dd in /lib/x86_64-linux-gnu/libc.so.6)
Jan 20 21:28:49
Jan 20 21:30:08 Timing out after 100 seconds and killing subprocesses.
Jan 20 21:30:08 ERROR (100.125s)
cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @rohan-varma @jjlilley @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd