Add a description of GradBucket Python class #53596

Closed
wayi1 wants to merge 3 commits into gh/SciPioneer/77/base from gh/SciPioneer/77/head

wayi1 commented Mar 9, 2021

Stack from ghstack:

This description will be used in ddp_comm_hook docstrings.

Differential Revision: [D26908160](https://our.internmc.facebook.com/intern/diff/D26908160/)

[ghstack-poisoned]
facebook-github-bot added the cla signed and oncall: distributed labels Mar 9, 2021

facebook-github-bot commented Mar 9, 2021

💊 CI failures summary and remediations

As of commit ae1aaca (more details on the Dr. CI page):


  • 15/15 failures possibly* introduced in this PR
    • 2/15 non-scanned failure(s)

🕵️ 13 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test1 (1/13)

Step: "Run tests"

Mar 10 22:12:51 RuntimeError: test_autograd failed!
Mar 10 22:12:51   File "test_autograd.py", line 42, in <module>
Mar 10 22:12:51     from torch.testing._internal.common_methods_invocations import (method_tests,
Mar 10 22:12:51   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_methods_invocations.py", line 2451, in <module>
Mar 10 22:12:51     OpInfo('eig',
Mar 10 22:12:51 TypeError: __init__() got an unexpected keyword argument 'supports_tensor_out'
Mar 10 22:12:51 Traceback (most recent call last):
Mar 10 22:12:51   File "test/run_test.py", line 1074, in <module>
Mar 10 22:12:51     main()
Mar 10 22:12:51   File "test/run_test.py", line 1053, in main
Mar 10 22:12:51     raise RuntimeError(err_message)
Mar 10 22:12:51 RuntimeError: test_autograd failed!
Mar 10 22:12:52 
Mar 10 22:12:52 real	0m6.388s
Mar 10 22:12:52 user	0m4.309s
Mar 10 22:12:52 sys	0m0.832s
Mar 10 22:12:52 + cleanup
Mar 10 22:12:52 + retcode=1
Mar 10 22:12:52 + set +x
Mar 10 22:12:52 =================== sccache compilation log ===================
Mar 10 22:12:52 =========== If your build fails, please take a look at the log above for possible reasons ===========
Mar 10 22:12:52 Compile requests                      28

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test1 (2/13)

Step: "Run tests"

Mar 10 22:21:41 RuntimeError: test_autograd failed!
Mar 10 22:21:41   File "test_autograd.py", line 42, in <module>
Mar 10 22:21:41     from torch.testing._internal.common_methods_invocations import (method_tests,
Mar 10 22:21:41   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_methods_invocations.py", line 2460, in <module>
Mar 10 22:21:41     skipCUDAIfRocm
Mar 10 22:21:41 TypeError: __init__() got an unexpected keyword argument 'supports_tensor_out'
Mar 10 22:21:41 Traceback (most recent call last):
Mar 10 22:21:41   File "test/run_test.py", line 1074, in <module>
Mar 10 22:21:41     main()
Mar 10 22:21:41   File "test/run_test.py", line 1053, in main
Mar 10 22:21:41     raise RuntimeError(err_message)
Mar 10 22:21:41 RuntimeError: test_autograd failed!
Mar 10 22:21:42 =================== sccache compilation log ===================
Mar 10 22:21:42 + cleanup
Mar 10 22:21:42 + retcode=1
Mar 10 22:21:42 + set +x
Mar 10 22:21:42 =========== If your build fails, please take a look at the log above for possible reasons ===========
Mar 10 22:21:42 Compile requests                      28
Mar 10 22:21:42 Compile requests executed             26
Mar 10 22:21:42 Cache hits                             2
Mar 10 22:21:42 Cache hits (C/C++)                     2
Mar 10 22:21:42 Cache misses                          24

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test2 (3/13)

Step: "Run tests"

Mar 10 22:12:21 RuntimeError: test_ops failed!
Mar 10 22:12:21   File "test_ops.py", line 10, in <module>
Mar 10 22:12:21     from torch.testing._internal.common_methods_invocations import \
Mar 10 22:12:21   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_methods_invocations.py", line 2451, in <module>
Mar 10 22:12:21     OpInfo('eig',
Mar 10 22:12:21 TypeError: __init__() got an unexpected keyword argument 'supports_tensor_out'
Mar 10 22:12:21 Traceback (most recent call last):
Mar 10 22:12:21   File "test/run_test.py", line 1074, in <module>
Mar 10 22:12:21     main()
Mar 10 22:12:21   File "test/run_test.py", line 1053, in main
Mar 10 22:12:21     raise RuntimeError(err_message)
Mar 10 22:12:21 RuntimeError: test_ops failed!
Mar 10 22:12:22 
Mar 10 22:12:22 real	0m5.617s
Mar 10 22:12:22 user	0m3.794s
Mar 10 22:12:22 sys	0m0.792s
Mar 10 22:12:22 + cleanup
Mar 10 22:12:22 + retcode=1
Mar 10 22:12:22 + set +x
Mar 10 22:12:22 =================== sccache compilation log ===================
Mar 10 22:12:22 =========== If your build fails, please take a look at the log above for possible reasons ===========
Mar 10 22:12:22 Compile requests                      28

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_jit_legacy_test (4/13)

Step: "Run tests"

Mar 10 22:07:29 RuntimeError: test_jit_legacy failed!
Mar 10 22:07:29   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/jit_metaprogramming_utils.py", line 5, in <module>
Mar 10 22:07:29     from torch.testing._internal.common_methods_invocations import non_differentiable, create_input, \
Mar 10 22:07:29   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_methods_invocations.py", line 2460, in <module>
Mar 10 22:07:29     skipCUDAIfRocm
Mar 10 22:07:29 TypeError: __init__() got an unexpected keyword argument 'supports_tensor_out'
Mar 10 22:07:29 Traceback (most recent call last):
Mar 10 22:07:29   File "test/run_test.py", line 1074, in <module>
Mar 10 22:07:29     main()
Mar 10 22:07:29   File "test/run_test.py", line 1053, in main
Mar 10 22:07:29     raise RuntimeError(err_message)
Mar 10 22:07:29 RuntimeError: test_jit_legacy failed!
Mar 10 22:07:30 + cleanup
Mar 10 22:07:30 + retcode=1
Mar 10 22:07:30 + set +x
Mar 10 22:07:30 =================== sccache compilation log ===================
Mar 10 22:07:30 =========== If your build fails, please take a look at the log above for possible reasons ===========
Mar 10 22:07:30 Compile requests                      0
Mar 10 22:07:30 Compile requests executed             0
Mar 10 22:07:30 Cache hits                            0
Mar 10 22:07:30 Cache misses                          0
Mar 10 22:07:30 Cache timeouts                        0

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test2 (5/13)

Step: "Run tests"

Mar 10 22:23:16 RuntimeError: test_ops failed!
Mar 10 22:23:16   File "test_ops.py", line 10, in <module>
Mar 10 22:23:16     from torch.testing._internal.common_methods_invocations import \
Mar 10 22:23:16   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_methods_invocations.py", line 2460, in <module>
Mar 10 22:23:16     skipCUDAIfRocm
Mar 10 22:23:16 TypeError: __init__() got an unexpected keyword argument 'supports_tensor_out'
Mar 10 22:23:16 Traceback (most recent call last):
Mar 10 22:23:16   File "test/run_test.py", line 1074, in <module>
Mar 10 22:23:16     main()
Mar 10 22:23:16   File "test/run_test.py", line 1053, in main
Mar 10 22:23:16     raise RuntimeError(err_message)
Mar 10 22:23:16 RuntimeError: test_ops failed!
Mar 10 22:23:17 =================== sccache compilation log ===================
Mar 10 22:23:17 + cleanup
Mar 10 22:23:17 + retcode=1
Mar 10 22:23:17 + set +x
Mar 10 22:23:17 =========== If your build fails, please take a look at the log above for possible reasons ===========
Mar 10 22:23:17 Compile requests                      28
Mar 10 22:23:17 Compile requests executed             26
Mar 10 22:23:17 Cache hits                             2
Mar 10 22:23:17 Cache hits (C/C++)                     2
Mar 10 22:23:17 Cache misses                          24

See CircleCI build pytorch_macos_10_13_py3_test (6/13)

Step: "Test"

Mar 10 22:04:28 RuntimeError: test_autograd failed!
Mar 10 22:04:27   File "test_autograd.py", line 42, in <module>
Mar 10 22:04:27     from torch.testing._internal.common_methods_invocations import (method_tests,
Mar 10 22:04:27   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_methods_invocations.py", line 2460, in <module>
Mar 10 22:04:27     skipCUDAIfRocm
Mar 10 22:04:27 TypeError: __init__() got an unexpected keyword argument 'supports_tensor_out'
Mar 10 22:04:28 Traceback (most recent call last):
Mar 10 22:04:28   File "test/run_test.py", line 1074, in <module>
Mar 10 22:04:28     main()
Mar 10 22:04:28   File "test/run_test.py", line 1053, in main
Mar 10 22:04:28     raise RuntimeError(err_message)
Mar 10 22:04:28 RuntimeError: test_autograd failed!
Mar 10 22:04:28 + cleanup
Mar 10 22:04:28 + retcode=1
Mar 10 22:04:28 + set +x


Exited with code exit status 1

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1 (7/13)

Step: "Run tests"

Mar 10 23:03:48 RuntimeError: test_autograd failed!
Mar 10 23:03:48   File "test_autograd.py", line 42, in <module>
Mar 10 23:03:48     from torch.testing._internal.common_methods_invocations import (method_tests,
Mar 10 23:03:48   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_methods_invocations.py", line 2460, in <module>
Mar 10 23:03:48     skipCUDAIfRocm
Mar 10 23:03:48 TypeError: __init__() got an unexpected keyword argument 'supports_tensor_out'
Mar 10 23:03:48 Traceback (most recent call last):
Mar 10 23:03:48   File "test/run_test.py", line 1074, in <module>
Mar 10 23:03:48     main()
Mar 10 23:03:48   File "test/run_test.py", line 1053, in main
Mar 10 23:03:48     raise RuntimeError(err_message)
Mar 10 23:03:48 RuntimeError: test_autograd failed!
Mar 10 23:03:49 + cleanup
Mar 10 23:03:49 + retcode=1
Mar 10 23:03:49 + set +x
Mar 10 23:03:49 =================== sccache compilation log ===================
Mar 10 23:03:49 =========== If your build fails, please take a look at the log above for possible reasons ===========
Mar 10 23:03:49 Compile requests                     64
Mar 10 23:03:49 Compile requests executed            38
Mar 10 23:03:49 Cache hits                            2
Mar 10 23:03:49 Cache hits (C/C++)                    2
Mar 10 23:03:49 Cache misses                         36

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_jit_legacy_test (8/13)

Step: "Run tests"

Mar 10 22:59:14 RuntimeError: test_jit_legacy failed!
Mar 10 22:59:14   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/jit_metaprogramming_utils.py", line 5, in <module>
Mar 10 22:59:14     from torch.testing._internal.common_methods_invocations import non_differentiable, create_input, \
Mar 10 22:59:14   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_methods_invocations.py", line 2460, in <module>
Mar 10 22:59:14     skipCUDAIfRocm
Mar 10 22:59:14 TypeError: __init__() got an unexpected keyword argument 'supports_tensor_out'
Mar 10 22:59:14 Traceback (most recent call last):
Mar 10 22:59:14   File "test/run_test.py", line 1074, in <module>
Mar 10 22:59:14     main()
Mar 10 22:59:14   File "test/run_test.py", line 1053, in main
Mar 10 22:59:14     raise RuntimeError(err_message)
Mar 10 22:59:14 RuntimeError: test_jit_legacy failed!
Mar 10 22:59:15 + cleanup
Mar 10 22:59:15 + retcode=1
Mar 10 22:59:15 + set +x
Mar 10 22:59:15 =================== sccache compilation log ===================
Mar 10 22:59:15 =========== If your build fails, please take a look at the log above for possible reasons ===========
Mar 10 22:59:15 Compile requests                      0
Mar 10 22:59:15 Compile requests executed             0
Mar 10 22:59:15 Cache hits                            0
Mar 10 22:59:15 Cache misses                          0
Mar 10 22:59:15 Cache timeouts                        0

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test1 (9/13)

Step: "Test"

RuntimeError: test_autograd failed!
  File "test_autograd.py", line 42, in <module>
    from torch.testing._internal.common_methods_invocations import (method_tests,
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_methods_invocations.py", line 2460, in <module>
    skipCUDAIfRocm
TypeError: __init__() got an unexpected keyword argument 'supports_tensor_out'
Traceback (most recent call last):
  File "run_test.py", line 1074, in <module>
    main()
  File "run_test.py", line 1053, in main
    raise RuntimeError(err_message)
RuntimeError: test_autograd failed!

(base) C:\Users\circleci\project\test>if ERRORLEVEL 1 exit /b 1 
+ cleanup
+ retcode=1
+ set +x


Exited with code exit status 1

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test2 (10/13)

Step: "Test"

RuntimeError: test_ops failed!
  File "test_ops.py", line 10, in <module>
    from torch.testing._internal.common_methods_invocations import \
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_methods_invocations.py", line 2460, in <module>
    skipCUDAIfRocm
TypeError: __init__() got an unexpected keyword argument 'supports_tensor_out'
Traceback (most recent call last):
  File "run_test.py", line 1074, in <module>
    main()
  File "run_test.py", line 1053, in main
    raise RuntimeError(err_message)
RuntimeError: test_ops failed!

(base) C:\Users\circleci\project\test>if ERRORLEVEL 1 exit /b 1 
+ cleanup
+ retcode=1
+ set +x


Exited with code exit status 1

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (11/13)

Step: "Run tests"

Mar 10 23:04:34 RuntimeError: test_ops failed!
Mar 10 23:04:33   File "test_ops.py", line 10, in <module>
Mar 10 23:04:33     from torch.testing._internal.common_methods_invocations import \
Mar 10 23:04:33   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_methods_invocations.py", line 2460, in <module>
Mar 10 23:04:33     skipCUDAIfRocm
Mar 10 23:04:33 TypeError: __init__() got an unexpected keyword argument 'supports_tensor_out'
Mar 10 23:04:34 Traceback (most recent call last):
Mar 10 23:04:34   File "test/run_test.py", line 1074, in <module>
Mar 10 23:04:34     main()
Mar 10 23:04:34   File "test/run_test.py", line 1053, in main
Mar 10 23:04:34     raise RuntimeError(err_message)
Mar 10 23:04:34 RuntimeError: test_ops failed!
Mar 10 23:04:34 + cleanup
Mar 10 23:04:34 + retcode=1
Mar 10 23:04:34 + set +x
Mar 10 23:04:34 =================== sccache compilation log ===================
Mar 10 23:04:34 =========== If your build fails, please take a look at the log above for possible reasons ===========
Mar 10 23:04:34 Compile requests                     64
Mar 10 23:04:34 Compile requests executed            38
Mar 10 23:04:34 Cache hits                            2
Mar 10 23:04:34 Cache hits (C/C++)                    2
Mar 10 23:04:34 Cache misses                         36

See CircleCI build pytorch_linux_bionic_py3_6_clang9_test (12/13)

Step: "Run tests"

Mar 10 22:13:43 RuntimeError: test_autograd failed!
Mar 10 22:13:43   File "test_autograd.py", line 42, in <module>
Mar 10 22:13:43     from torch.testing._internal.common_methods_invocations import (method_tests,
Mar 10 22:13:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_methods_invocations.py", line 2460, in <module>
Mar 10 22:13:43     skipCUDAIfRocm
Mar 10 22:13:43 TypeError: __init__() got an unexpected keyword argument 'supports_tensor_out'
Mar 10 22:13:43 Traceback (most recent call last):
Mar 10 22:13:43   File "test/run_test.py", line 1074, in <module>
Mar 10 22:13:43     main()
Mar 10 22:13:43   File "test/run_test.py", line 1053, in main
Mar 10 22:13:43     raise RuntimeError(err_message)
Mar 10 22:13:43 RuntimeError: test_autograd failed!
Mar 10 22:13:44 
Mar 10 22:13:44 real	1m20.399s
Mar 10 22:13:44 user	1m15.164s
Mar 10 22:13:44 sys	0m2.371s
Mar 10 22:13:44 + cleanup
Mar 10 22:13:44 + retcode=1
Mar 10 22:13:44 + set +x
Mar 10 22:13:44 =================== sccache compilation log ===================
Mar 10 22:13:44 =========== If your build fails, please take a look at the log above for possible reasons ===========
Mar 10 22:13:44 Compile requests                      28

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (13/13)

Step: "Run tests"

Mar 10 22:10:39 RuntimeError: test_autograd failed!
Mar 10 22:10:39   File "test_autograd.py", line 42, in <module>
Mar 10 22:10:39     from torch.testing._internal.common_methods_invocations import (method_tests,
Mar 10 22:10:39   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_methods_invocations.py", line 2460, in <module>
Mar 10 22:10:39     skipCUDAIfRocm
Mar 10 22:10:39 TypeError: __init__() got an unexpected keyword argument 'supports_tensor_out'
Mar 10 22:10:39 Traceback (most recent call last):
Mar 10 22:10:39   File "test/run_test.py", line 1074, in <module>
Mar 10 22:10:39     main()
Mar 10 22:10:39   File "test/run_test.py", line 1053, in main
Mar 10 22:10:39     raise RuntimeError(err_message)
Mar 10 22:10:39 RuntimeError: test_autograd failed!
Mar 10 22:10:39 + cleanup
Mar 10 22:10:39 + retcode=1
Mar 10 22:10:39 + set +x
Mar 10 22:10:39 =================== sccache compilation log ===================
Mar 10 22:10:39 =========== If your build fails, please take a look at the log above for possible reasons ===========
Mar 10 22:10:39 Compile requests                     28
Mar 10 22:10:39 Compile requests executed            26
Mar 10 22:10:39 Cache hits                            2
Mar 10 22:10:39 Cache hits (C/C++)                    2
Mar 10 22:10:39 Cache misses                         24

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.
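
All 13 recognized failures above end in the same `TypeError: __init__() got an unexpected keyword argument 'supports_tensor_out'`, raised while importing `common_methods_invocations.py`. That is the error Python raises when a constructor is called with a keyword its signature does not declare. A minimal sketch of that failure mode follows; `OpInfoSketch`, its `supports_out` parameter, and `try_construct` are hypothetical stand-ins for illustration, not PyTorch's actual `OpInfo` definition, and the snippet makes no claim about which change caused the mismatch in this CI run.

```python
class OpInfoSketch:
    """Hypothetical stand-in for a test-metadata class (not PyTorch's OpInfo)."""

    def __init__(self, name, *, supports_out=True):
        # Only `supports_out` is declared here (an assumed name); any other
        # keyword passed to the constructor is rejected by Python itself.
        self.name = name
        self.supports_out = supports_out


def try_construct(**kwargs):
    """Return the TypeError message if construction fails, else None."""
    try:
        OpInfoSketch("eig", **kwargs)
    except TypeError as exc:
        return str(exc)
    return None


# A keyword the signature declares is accepted.
ok = try_construct(supports_out=False)

# A keyword the signature does not declare fails at call time, which is
# why a module-level constructor call aborts the import of the whole file.
err = try_construct(supports_tensor_out=False)

print(ok)
print(err)
```

Because the failing call sits at module level in the logs, every test file that imports the module fails before running a single test, which would be consistent with `test_autograd`, `test_ops`, and `test_jit_legacy` all reporting the same message.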

wayi1 mentioned this pull request Mar 9, 2021
wayi1 pushed a commit that referenced this pull request Mar 9, 2021
This description will be used in ddp_comm_hook docstrings.

Differential Revision: [D26908160](https://our.internmc.facebook.com/intern/diff/D26908160/)

ghstack-source-id: 123414709
Pull Request resolved: #53596
Comment thread torch/csrc/distributed/c10d/init.cpp
wayi1 pushed a commit that referenced this pull request Mar 10, 2021
Pull Request resolved: #53596

This description will be used in ddp_comm_hook docstrings.
ghstack-source-id: 123522560

Differential Revision: [D26908160](https://our.internmc.facebook.com/intern/diff/D26908160/)
wayi1 pushed a commit that referenced this pull request Mar 10, 2021
Pull Request resolved: #53596

This description will be used in ddp_comm_hook docstrings.
ghstack-source-id: 123590360

Differential Revision: [D26908160](https://our.internmc.facebook.com/intern/diff/D26908160/)
@facebook-github-bot

This pull request has been merged in c988b78.

@facebook-github-bot facebook-github-bot deleted the gh/SciPioneer/77/head branch March 14, 2021 14:14
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
Summary:
Pull Request resolved: pytorch#53596

This description will be used in ddp_comm_hook docstrings.
ghstack-source-id: 123590360

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D26908160

fbshipit-source-id: 824dea9203ca583676bddf0161c9edca52c9d20e
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Pull Request resolved: pytorch#53596

This description will be used in ddp_comm_hook docstrings.
ghstack-source-id: 123590360

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D26908160

fbshipit-source-id: 824dea9203ca583676bddf0161c9edca52c9d20e

Labels

cla signed, Merged, oncall: distributed

3 participants