
ugh #65477

Closed

suo wants to merge 1 commit into gh/suo/451/base from gh/suo/451/head

Conversation


@suo suo commented Sep 22, 2021

Stack from ghstack:

Differential Revision: D31115936

[ghstack-poisoned]
@pytorch-probot

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/4088b94bede60565c6ea67df4c608a8eeddc1eee/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

| Workflow | Labels | Status |
| --- | --- | --- |
| linux-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla | ✅ triggered |
| linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/win | ✅ triggered |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/win | ✅ triggered |
| libtorch-linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| libtorch-linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | 🚫 skipped |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | 🚫 skipped |
| parallelnative-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-win-vs2019-cuda11.1-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |
| puretorch-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| win-vs2019-cuda10.2-py3 | ciflow/all, ciflow/cuda, ciflow/win | 🚫 skipped |

You can add a comment to the PR and tag @pytorchbot with the following commands:

```
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow
```

For more information, please take a look at the CI Flow Wiki.


facebook-github-bot commented Sep 22, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 4088b94 (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-cuda11.3-py3.6-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu) (1/2)

Step: "Unknown" (full log | diagnosis details | 🔁 rerun)

```
2021-09-22T19:27:04.6369047Z     return x.cpu() + y.cuda()
2021-09-22T19:27:04.6371320Z RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
2021-09-22T19:27:04.6372732Z
2021-09-22T19:27:04.6451065Z On WorkerInfo(id=2, name=worker2):
2021-09-22T19:27:04.6453665Z RuntimeError('Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!',)
2021-09-22T19:27:04.6455571Z Traceback (most recent call last):
2021-09-22T19:27:04.6457957Z   File "/opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py", line 204, in _run_function
2021-09-22T19:27:04.6460157Z     result = python_udf.func(*python_udf.args, **python_udf.kwargs)
2021-09-22T19:27:04.6462934Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 5879, in _gpu_add_wrong_gpus
2021-09-22T19:27:04.6464960Z     return x.cpu() + y.cuda()
2021-09-22T19:27:04.6466591Z RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
2021-09-22T19:27:04.6468069Z
2021-09-22T19:27:05.2340430Z ok (6.227s)
2021-09-22T19:27:06.8521462Z   test_devices_option_mismatch (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (1.618s)
2021-09-22T19:27:08.4699636Z   test_devices_option_mismatch_reverse (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (1.618s)
2021-09-22T19:27:15.2960548Z   test_owner_rref_forward_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (6.826s)
2021-09-22T19:27:24.6265551Z   test_owner_rref_forward_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (9.330s)
2021-09-22T19:27:33.6568417Z   test_owner_rref_forward_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (9.030s)
2021-09-22T19:27:40.5831242Z   test_owner_rref_forward_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (6.926s)
2021-09-22T19:27:56.2235467Z   test_rref_as_arg_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (15.640s)
2021-09-22T19:28:15.5734266Z   test_rref_as_arg_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... ok (19.350s)
```
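The traceback above fails because `x.cpu() + y.cuda()` mixes a CPU tensor with a CUDA tensor, which PyTorch rejects. A minimal sketch of the usual remedy, moving both operands onto one explicit device before the op (the helper name is hypothetical, not from this PR; it falls back to CPU when CUDA is unavailable):

```python
import torch

def add_on_common_device(x, y):
    # Adding a CPU tensor to a CUDA tensor raises
    # "Expected all tensors to be on the same device"; moving both
    # operands to a single explicit device first avoids the mismatch.
    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
    return x.to(device) + y.to(device)

result = add_on_common_device(torch.ones(2), torch.ones(2))
```

`Tensor.to` is a no-op when the tensor is already on the target device, so the helper costs nothing in the all-CPU case.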

See GitHub Actions build win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (2/2)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

```
2021-09-22T19:38:49.0624799Z
2021-09-22T19:38:49.0625078Z FAILED (errors=1, skipped=7)
2021-09-22T19:38:49.0625303Z
2021-09-22T19:38:49.0625602Z Generating XML reports...
2021-09-22T19:38:49.0626454Z Generated XML report: test-reports\dist-gloo\test_cpp_extensions_jit\TEST-TestCppExtensionJIT-20210922193615.xml
2021-09-22T19:38:49.1768015Z Traceback (most recent call last):
2021-09-22T19:38:49.1768856Z   File "run_test.py", line 1030, in <module>
2021-09-22T19:38:49.1769189Z     main()
2021-09-22T19:38:49.1769597Z   File "run_test.py", line 1008, in main
2021-09-22T19:38:49.1770039Z     raise RuntimeError(err_message)
2021-09-22T19:38:49.1770489Z RuntimeError: test_cpp_extensions_jit failed!
2021-09-22T19:38:49.4005698Z
2021-09-22T19:38:49.4006357Z (base) C:\actions-runner\_work\pytorch\pytorch\pytorch-1262768294\test>popd
2021-09-22T19:38:49.4010890Z
2021-09-22T19:38:49.4011426Z (base) C:\actions-runner\_work\pytorch\pytorch\pytorch-1262768294>if ERRORLEVEL 1 exit /b 1
2021-09-22T19:38:49.4034854Z + cleanup
2021-09-22T19:38:49.4035176Z + retcode=1
2021-09-22T19:38:49.4035445Z + set +x
2021-09-22T19:38:49.4066952Z ##[error]Process completed with exit code 1.
2021-09-22T19:38:49.4218437Z ##[group]Run # -ir => recursive include all files in pattern
```

This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

suo added a commit that referenced this pull request Sep 22, 2021
ghstack-source-id: b97ffa0
Pull Request resolved: #65477

suo commented Sep 22, 2021

@suo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@suo merged this pull request in b3ec88f.

@facebook-github-bot facebook-github-bot deleted the gh/suo/451/head branch September 26, 2021 14:17