
[WIP] Support broadcasting via slow path #52448

Closed
izdeby wants to merge 18 commits into gh/izdeby/87/base from gh/izdeby/87/head

Conversation

@izdeby
Contributor

@izdeby izdeby commented Feb 18, 2021

Stack from ghstack:

Differential Revision: D26520922

@facebook-github-bot
Contributor

facebook-github-bot commented Feb 18, 2021

💊 CI failures summary and remediations

As of commit d370e27 (more details on the Dr. CI page):


  • 11/11 failures possibly* introduced in this PR
    • 2/11 non-scanned failure(s)

🕵️ 7 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/7)

Step: "Test"

Apr 01 19:35:45 RuntimeError: test_foreach failed!
Apr 01 19:35:43 Executing ['/Users/distiller/workspace/miniconda3/bin/python', 'test_foreach.py', '-v'] ... [2021-04-01 19:35:43.765230]
Apr 01 19:35:45 Traceback (most recent call last):
Apr 01 19:35:45   File "test_foreach.py", line 8, in <module>
Apr 01 19:35:45     from torch.testing._internal.common_methods_invocations import \
Apr 01 19:35:45 ImportError: cannot import name 'foreach_binary_op_tensor_list_db' from 'torch.testing._internal.common_methods_invocations' (/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_methods_invocations.py)
Apr 01 19:35:45 Traceback (most recent call last):
Apr 01 19:35:45   File "test/run_test.py", line 1094, in <module>
Apr 01 19:35:45     main()
Apr 01 19:35:45   File "test/run_test.py", line 1073, in main
Apr 01 19:35:45     raise RuntimeError(err_message)
Apr 01 19:35:45 RuntimeError: test_foreach failed!
Apr 01 19:35:45 + cleanup
Apr 01 19:35:45 + retcode=1
Apr 01 19:35:45 + set +x


Exited with code exit status 1

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test1 (2/7)

Step: "Run tests"

Apr 01 19:43:26 [E request_callback_no_python.cpp:656] Received error while processing request type 256: The following operation failed in the TorchScript interpreter.
Apr 01 19:43:26 
Apr 01 19:43:26 [E request_callback_no_python.cpp:656] Received error while processing request type 256: The following operation failed in the TorchScript interpreter.
Apr 01 19:43:26 Traceback of TorchScript (most recent call last):
Apr 01 19:43:26   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 334, in raise_func_script
Apr 01 19:43:26 @torch.jit.script
Apr 01 19:43:26 def raise_func_script(expected_err: str) -> torch.Tensor:
Apr 01 19:43:26     raise ValueError(expected_err)
Apr 01 19:43:26     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
Apr 01 19:43:26 RuntimeError: Expected error
Apr 01 19:43:26 
Apr 01 19:43:26 [E request_callback_no_python.cpp:656] Received error while processing request type 256: The following operation failed in the TorchScript interpreter.
Apr 01 19:43:26 Traceback of TorchScript (most recent call last):
Apr 01 19:43:26   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 334, in raise_func_script
Apr 01 19:43:26 @torch.jit.script
Apr 01 19:43:26 def raise_func_script(expected_err: str) -> torch.Tensor:
Apr 01 19:43:26     raise ValueError(expected_err)
Apr 01 19:43:26     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
Apr 01 19:43:26 RuntimeError: Expected error
Apr 01 19:43:26 
Apr 01 19:43:26 ok (2.756s)
Apr 01 19:43:28   test_wait_all_multiple_call (__main__.ProcessGroupRpcTestWithSpawn) ... RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend.

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test2 (3/7)

Step: "Run tests"

Apr 01 20:37:06 RuntimeError: test_foreach failed!
Apr 01 20:37:03 Executing ['/opt/conda/bin/python', 'test_foreach.py', '-v'] ... [2021-04-01 20:37:03.797448]
Apr 01 20:37:06 Traceback (most recent call last):
Apr 01 20:37:06   File "test_foreach.py", line 8, in <module>
Apr 01 20:37:06     from torch.testing._internal.common_methods_invocations import \
Apr 01 20:37:06 ImportError: cannot import name 'foreach_binary_op_tensor_list_db'
Apr 01 20:37:06 Traceback (most recent call last):
Apr 01 20:37:06   File "test/run_test.py", line 1094, in <module>
Apr 01 20:37:06     main()
Apr 01 20:37:06   File "test/run_test.py", line 1073, in main
Apr 01 20:37:06     raise RuntimeError(err_message)
Apr 01 20:37:06 RuntimeError: test_foreach failed!
Apr 01 20:37:07 =================== sccache compilation log ===================
Apr 01 20:37:07 + cleanup
Apr 01 20:37:07 + retcode=1
Apr 01 20:37:07 + set +x
Apr 01 20:37:07 =========== If your build fails, please take a look at the log above for possible reasons ===========
Apr 01 20:37:07 Compile requests                      4
Apr 01 20:37:07 Compile requests executed             2
Apr 01 20:37:07 Cache hits                            2
Apr 01 20:37:07 Cache hits (C/C++)                    2
Apr 01 20:37:07 Cache misses                          0

See CircleCI build pytorch_linux_bionic_py3_6_clang9_noarch_test (4/7)

Step: "Run tests"

Apr 01 18:56:44 RuntimeError: test_foreach failed!
Apr 01 18:56:42 Executing ['/opt/conda/bin/python', 'test_foreach.py', '-v'] ... [2021-04-01 18:56:42.763283]
Apr 01 18:56:43 Traceback (most recent call last):
Apr 01 18:56:43   File "test_foreach.py", line 8, in <module>
Apr 01 18:56:43     from torch.testing._internal.common_methods_invocations import \
Apr 01 18:56:43 ImportError: cannot import name 'foreach_binary_op_tensor_list_db'
Apr 01 18:56:44 Traceback (most recent call last):
Apr 01 18:56:44   File "test/run_test.py", line 1094, in <module>
Apr 01 18:56:44     main()
Apr 01 18:56:44   File "test/run_test.py", line 1073, in main
Apr 01 18:56:44     raise RuntimeError(err_message)
Apr 01 18:56:44 RuntimeError: test_foreach failed!
Apr 01 18:56:44 
Apr 01 18:56:44 real	21m47.759s
Apr 01 18:56:44 user	27m43.002s
Apr 01 18:56:44 sys	4m25.188s
Apr 01 18:56:44 + cleanup
Apr 01 18:56:44 + retcode=1
Apr 01 18:56:44 + set +x
Apr 01 18:56:44 =================== sccache compilation log ===================
Apr 01 18:56:44 ERROR 2021-04-01T18:45:30Z: sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp: In function ‘int main()’:\n/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp:2:23: error: expected ‘;’ before ‘}’ token\n int main() { return 0 }\n                       ^\n" }
Apr 01 18:56:44 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (5/7)

Step: "Run tests"

Apr 01 18:47:54 RuntimeError: test_foreach failed!
Apr 01 18:47:53 Executing ['/opt/conda/bin/python', 'test_foreach.py', '-v'] ... [2021-04-01 18:47:53.142649]
Apr 01 18:47:54 Traceback (most recent call last):
Apr 01 18:47:54   File "test_foreach.py", line 8, in <module>
Apr 01 18:47:54     from torch.testing._internal.common_methods_invocations import \
Apr 01 18:47:54 ImportError: cannot import name 'foreach_binary_op_tensor_list_db'
Apr 01 18:47:54 Traceback (most recent call last):
Apr 01 18:47:54   File "test/run_test.py", line 1094, in <module>
Apr 01 18:47:54     main()
Apr 01 18:47:54   File "test/run_test.py", line 1073, in main
Apr 01 18:47:54     raise RuntimeError(err_message)
Apr 01 18:47:54 RuntimeError: test_foreach failed!
Apr 01 18:47:54 + cleanup
Apr 01 18:47:54 + retcode=1
Apr 01 18:47:54 + set +x
Apr 01 18:47:54 =================== sccache compilation log ===================
Apr 01 18:47:54 ERROR 2021-04-01T18:37:15Z: sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp: In function ‘int main()’:\n/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp:2:23: error: expected ‘;’ before ‘}’ token\n int main() { return 0 }\n                       ^\n" }
Apr 01 18:47:54 
Apr 01 18:47:54 =========== If your build fails, please take a look at the log above for possible reasons ===========
Apr 01 18:47:54 Compile requests                     83
Apr 01 18:47:54 Compile requests executed            54
Apr 01 18:47:54 Cache hits                           29

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test1 (6/7)

Step: "Test"

AssertionError: RuntimeError not raised
  test_scatter_gpu_neg_dim (__main__.TestCudaComm) ... skip (0.000s)
  test_scatter_gpu_sizes (__main__.TestCudaComm) ... skip (0.000s)
  test_scatter_namedtuple (__main__.TestCudaComm) ... skip (0.003s)

======================================================================
FAIL [0.019s]: test_grad_scaling_unscale (__main__.TestCuda)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_cuda.py", line 1917, in test_grad_scaling_unscale
    inv_scale)
AssertionError: RuntimeError not raised

----------------------------------------------------------------------
Ran 158 tests in 111.507s

FAILED (failures=1, skipped=65)

Generating XML reports...
Generated XML report: test-reports\dist-gloo\test_cuda\TEST-TestCuda-20210401194456.xml
Generated XML report: test-reports\dist-gloo\test_cuda\TEST-TestCudaComm-20210401194456.xml
Traceback (most recent call last):

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (7/7)

Step: "Run tests"

Apr 01 20:41:21 AssertionError: RuntimeError not raised
Apr 01 20:41:21 ======================================================================
Apr 01 20:41:21 FAIL [0.031s]: test_grad_scaling_unscale (__main__.TestCuda)
Apr 01 20:41:21 ----------------------------------------------------------------------
Apr 01 20:41:21 Traceback (most recent call last):
Apr 01 20:41:21   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 979, in wrapper
Apr 01 20:41:21     method(*args, **kwargs)
Apr 01 20:41:21   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 979, in wrapper
Apr 01 20:41:21     method(*args, **kwargs)
Apr 01 20:41:21   File "test_cuda.py", line 1917, in test_grad_scaling_unscale
Apr 01 20:41:21     inv_scale)
Apr 01 20:41:21 AssertionError: RuntimeError not raised
Apr 01 20:41:21 
Apr 01 20:41:21 ----------------------------------------------------------------------
Apr 01 20:41:21 Ran 158 tests in 166.602s
Apr 01 20:41:21 
Apr 01 20:41:21 FAILED (failures=1, skipped=12)
Apr 01 20:41:21 
Apr 01 20:41:21 Generating XML reports...
Apr 01 20:41:21 Generated XML report: test-reports/dist-gloo/test_cuda/TEST-TestCuda-20210401203834.xml
Apr 01 20:41:21 Generated XML report: test-reports/dist-gloo/test_cuda/TEST-TestCudaComm-20210401203834.xml
Apr 01 20:41:21 Traceback (most recent call last):

2 failures not recognized by patterns:

| Job | Step | Action |
|---|---|---|
| GitHub Actions flake8-py3 | Fail if there were any warnings | 🔁 rerun |
| GitHub Actions quick-checks | Ensure no trailing spaces | 🔁 rerun |

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Iurii Zdebskyi and others added 12 commits February 18, 2021 12:16
@izdeby izdeby changed the title from "Support broadcasting via slow path" to "[WIP] Support broadcasting via slow path" on Mar 16, 2021
@facebook-github-bot
Contributor

Hi @izdeby!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

facebook-github-bot pushed a commit that referenced this pull request May 25, 2021
Summary:
This is based on  #48224.

To make `foreach` more flexible, this PR pushes unsupported cases to the slow path.
It also adds tests to verify that
- `foreach` functions work with tensors of different dtypes and/or memory layouts (7bd4b2c)
- `foreach` functions work when the tensors within a list are on different devices, provided that tensors at the same index across lists are on the same device (def4b9b)

Future plans:
1. Improve the coverage of unittests using `ops` decorator & updating `foreach_unary_op_db` and creating `foreach_(binary|pointwise|minmax)_db`.
2. Support broadcasting in slow path. Ref:  #52448
3. Support type promotion in fast path. Ref #52449

CC: ngimel mcarilli  ptrblck

Pull Request resolved: #56993

Reviewed By: zou3519

Differential Revision: D28630580

Pulled By: ngimel

fbshipit-source-id: e26ee74a39a591025e18c1ead48948cb7ec53c19
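
For readers unfamiliar with the fast path / slow path split that the summary above refers to, here is a minimal pure-Python sketch of the idea. It is not the PyTorch implementation: `TensorMeta`, `broadcast_shape`, and `choose_path` are hypothetical names invented for illustration, and the fast-path condition (identical dtype, device, and per-index shape) is a simplification of whatever checks the real kernels perform. It also does not model the errors a real per-tensor op would raise for mismatched devices.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class TensorMeta:
    """Hypothetical stand-in for the metadata of one tensor."""
    shape: Tuple[int, ...]
    dtype: str
    device: str


def broadcast_shape(a: Tuple[int, ...], b: Tuple[int, ...]) -> Tuple[int, ...]:
    """Return the broadcast of two shapes, or raise ValueError."""
    result = []
    # Walk dimensions from the trailing end; missing leading dims count as 1.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        # A size-1 dimension stretches to match the other operand.
        result.append(db if da == 1 else da)
    return tuple(reversed(result))


def choose_path(lhs: Sequence[TensorMeta], rhs: Sequence[TensorMeta]) -> str:
    """Pick "fast" or "slow" for a binary op over two tensor lists.

    Assumption for this sketch: the fast path needs every tensor to share
    one dtype and device, and each pair to have identical shapes. Anything
    else falls back to a per-pair loop, where ordinary broadcasting applies.
    """
    if len(lhs) != len(rhs):
        raise ValueError("tensor lists must have the same length")
    path = "fast"
    for a, b in zip(lhs, rhs):
        # Raises if even the slow path could not combine this pair.
        broadcast_shape(a.shape, b.shape)
        uniform = (a.dtype, a.device) == (b.dtype, b.device) == (lhs[0].dtype, lhs[0].device)
        if not uniform or a.shape != b.shape:
            path = "slow"
    return path
```

Under these assumptions, two lists of identical `float32` CPU tensors would take the fast path, while a pair such as shapes `(2, 3)` and `(3,)` would be routed to the slow path, where the per-pair op broadcasts as usual.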
deniskokarev pushed a commit to deniskokarev/pytorch that referenced this pull request Jun 9, 2021
@pytorchbot
Collaborator

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
Stale pull requests will automatically be closed 30 days after being marked Stale

@github-actions github-actions bot closed this May 12, 2022
@facebook-github-bot facebook-github-bot deleted the gh/izdeby/87/head branch June 11, 2022 14:19

3 participants