[wip] Pushed unsupported scenarios to slow path#48224

Closed
izdeby wants to merge 56 commits intogh/izdeby/68/basefrom
gh/izdeby/68/head
Conversation

@izdeby
Contributor

@izdeby izdeby commented Nov 19, 2020

Stack from ghstack:

Differential Revision: D25074764

Coming next for this stack:

  1. Docs PR: add more code documentation
  2. Swap optimizers from torch.optim to torch.optim.multi_tensor

Motivation
_foreach APIs should produce exactly the same results as a for-loop implementation of the same ops, while offering significant performance gains in particular scenarios. This PR achieves that.
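To make that contract concrete, here is a minimal pure-Python sketch (scalars stand in for tensors; `foreach_add` is a hypothetical name for illustration, not the actual `torch._foreach_add` implementation):

```python
def foreach_add_reference(xs, ys):
    """Slow-path reference: apply the op pairwise in a plain for-loop."""
    return [x + y for x, y in zip(xs, ys)]

def foreach_add(xs, ys):
    """Stand-in for a fused multi-tensor kernel. The contract is that its
    results must match the for-loop reference exactly; here it simply
    delegates to it."""
    return foreach_add_reference(xs, ys)

# The fast path may only be an optimization, never a behavior change.
assert foreach_add([1.0, 2.0], [3.0, 4.0]) == [4.0, 6.0]
```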

What was done

  • Stopped throwing an error when the previous API restrictions were not met. Instead, execution falls back to the slow path where possible.
  • Updated the code logic to account for the division op. This is a temporary fix until MTA supports type promotion: with integer inputs, division must go via the slow path, as MTA cannot return a dtype different from its inputs.
  • Significantly extended type support for binary/unary/pointwise ops. With this PR, _foreach APIs support exactly the same list of types as their analogs in torch.
  • Completely rewrote test/test_foreach.py; see the Testing section.
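The division special case above can be sketched as a small dispatch check (an assumption for illustration, not the actual ATen dispatch code): integer true division promotes to a floating dtype, so the fused kernels cannot handle it and the slow path is taken.

```python
# Hypothetical fast-path eligibility check for _foreach ops.
INTEGRAL_DTYPES = {"bool", "uint8", "int8", "int16", "int32", "int64"}

def can_use_fast_path(op, dtypes):
    """Return True if the fused multi-tensor (MTA) kernel may be used.

    MTA kernels cannot return a dtype different from their inputs, but
    true division of integer tensors produces a floating result, so that
    combination must fall back to the for-loop slow path.
    """
    if op == "div" and any(dt in INTEGRAL_DTYPES for dt in dtypes):
        return False
    return True

assert can_use_fast_path("add", ["int32", "int32"])       # fused is fine
assert not can_use_fast_path("div", ["int64", "int64"])   # must go slow
assert can_use_fast_path("div", ["float32", "float32"])   # no promotion
```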

Testing
This is a debatable change; I'm open to reconsidering.

  • Refactoring
    I've rewritten the whole test suite to have an individual test per operator rather than one master test with many if/else conditions. In my opinion, the tests became much more readable and easier to maintain.

  • Coverage and correctness
    Each _foreach API has its own dedicated test in which all supported dtypes and dtype combinations are exercised and compared against their counterparts in torch.
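The per-operator test layout can be sketched as follows (names are illustrative, not the actual test_foreach.py code; plain Python types stand in for torch dtypes, and the looped reference stands in for both the _foreach call and its torch counterpart):

```python
import itertools
import operator

OPS = {"add": operator.add, "mul": operator.mul}
DTYPES = [int, float]  # stand-ins for the torch dtype list

def looped(op, xs, ys):
    """For-loop reference, standing in for the torch.<op> counterpart."""
    return [op(x, y) for x, y in zip(xs, ys)]

def run_op_test(name, op):
    """One dedicated test per operator, over all dtype combinations."""
    for dt1, dt2 in itertools.product(DTYPES, repeat=2):
        xs = [dt1(i) for i in range(1, 4)]
        ys = [dt2(i) for i in range(1, 4)]
        got = looped(op, xs, ys)       # would be the _foreach_<op> call
        expected = looped(op, xs, ys)  # torch.<op> applied in a loop
        assert got == expected, (name, dt1, dt2)

for name, op in OPS.items():
    run_op_test(name, op)
```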

izdeby pushed a commit that referenced this pull request Nov 19, 2020
ghstack-source-id: 97d1ba4
Pull Request resolved: #48224
@izdeby izdeby changed the title Pushed unsupported scenarios to slow path [WIP] Pushed unsupported scenarios to slow path Nov 19, 2020
@dr-ci

dr-ci bot commented Nov 19, 2020

💊 CI failures summary and remediations

As of commit 156288b (more details on the Dr. CI page):


  • 11/11 failures possibly* introduced in this PR
    • 2/11 non-scanned failure(s)

🕵️ 7 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Apr 01 19:37:00 RuntimeError: test_foreach failed!
Apr 01 19:36:58 Executing ['/Users/distiller/workspace/miniconda3/bin/python', 'test_foreach.py', '-v'] ... [2021-04-01 19:36:58.666983]
Apr 01 19:37:00 Traceback (most recent call last):
Apr 01 19:37:00   File "test_foreach.py", line 8, in <module>
Apr 01 19:37:00     from torch.testing._internal.common_methods_invocations import \
Apr 01 19:37:00 ImportError: cannot import name 'foreach_binary_op_tensor_list_db' from 'torch.testing._internal.common_methods_invocations' (/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_methods_invocations.py)
Apr 01 19:37:00 Traceback (most recent call last):
Apr 01 19:37:00   File "test/run_test.py", line 1094, in <module>
Apr 01 19:37:00     main()
Apr 01 19:37:00   File "test/run_test.py", line 1073, in main
Apr 01 19:37:00     raise RuntimeError(err_message)
Apr 01 19:37:00 RuntimeError: test_foreach failed!
Apr 01 19:37:00 + cleanup
Apr 01 19:37:00 + retcode=1
Apr 01 19:37:00 + set +x


Exited with code exit status 1

See CircleCI build pytorch_linux_bionic_py3_6_clang9_noarch_test (2/7)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Apr 01 18:54:01 RuntimeError: test_foreach failed!
Apr 01 18:54:00 Executing ['/opt/conda/bin/python', 'test_foreach.py', '-v'] ... [2021-04-01 18:54:00.028210]
Apr 01 18:54:01 Traceback (most recent call last):
Apr 01 18:54:01   File "test_foreach.py", line 8, in <module>
Apr 01 18:54:01     from torch.testing._internal.common_methods_invocations import \
Apr 01 18:54:01 ImportError: cannot import name 'foreach_binary_op_tensor_list_db'
Apr 01 18:54:01 Traceback (most recent call last):
Apr 01 18:54:01   File "test/run_test.py", line 1094, in <module>
Apr 01 18:54:01     main()
Apr 01 18:54:01   File "test/run_test.py", line 1073, in main
Apr 01 18:54:01     raise RuntimeError(err_message)
Apr 01 18:54:01 RuntimeError: test_foreach failed!
Apr 01 18:54:01 
Apr 01 18:54:01 real	21m59.854s
Apr 01 18:54:01 user	28m2.140s
Apr 01 18:54:01 sys	4m24.639s
Apr 01 18:54:01 + cleanup
Apr 01 18:54:01 + retcode=1
Apr 01 18:54:01 + set +x
Apr 01 18:54:01 =================== sccache compilation log ===================
Apr 01 18:54:01 ERROR 2021-04-01T18:42:40Z: sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp: In function ‘int main()’:\n/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp:2:23: error: expected ‘;’ before ‘}’ token\n int main() { return 0 }\n                       ^\n" }
Apr 01 18:54:01 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (3/7)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Apr 01 18:52:55 RuntimeError: test_foreach failed!
Apr 01 18:52:54 Executing ['/opt/conda/bin/python', 'test_foreach.py', '-v'] ... [2021-04-01 18:52:54.063992]
Apr 01 18:52:55 Traceback (most recent call last):
Apr 01 18:52:55   File "test_foreach.py", line 8, in <module>
Apr 01 18:52:55     from torch.testing._internal.common_methods_invocations import \
Apr 01 18:52:55 ImportError: cannot import name 'foreach_binary_op_tensor_list_db'
Apr 01 18:52:55 Traceback (most recent call last):
Apr 01 18:52:55   File "test/run_test.py", line 1094, in <module>
Apr 01 18:52:55     main()
Apr 01 18:52:55   File "test/run_test.py", line 1073, in main
Apr 01 18:52:55     raise RuntimeError(err_message)
Apr 01 18:52:55 RuntimeError: test_foreach failed!
Apr 01 18:52:55 + cleanup
Apr 01 18:52:55 + retcode=1
Apr 01 18:52:55 + set +x
Apr 01 18:52:55 =================== sccache compilation log ===================
Apr 01 18:52:55 ERROR 2021-04-01T18:41:33Z: sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp: In function ‘int main()’:\n/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp:2:23: error: expected ‘;’ before ‘}’ token\n int main() { return 0 }\n                       ^\n" }
Apr 01 18:52:55 
Apr 01 18:52:55 =========== If your build fails, please take a look at the log above for possible reasons ===========
Apr 01 18:52:55 Compile requests                      83
Apr 01 18:52:55 Compile requests executed             54
Apr 01 18:52:55 Cache hits                            29

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test2 (4/7)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Apr 01 20:26:50 RuntimeError: test_foreach failed!
Apr 01 20:26:47 Executing ['/opt/conda/bin/python', 'test_foreach.py', '-v'] ... [2021-04-01 20:26:47.691985]
Apr 01 20:26:50 Traceback (most recent call last):
Apr 01 20:26:50   File "test_foreach.py", line 8, in <module>
Apr 01 20:26:50     from torch.testing._internal.common_methods_invocations import \
Apr 01 20:26:50 ImportError: cannot import name 'foreach_binary_op_tensor_list_db'
Apr 01 20:26:50 Traceback (most recent call last):
Apr 01 20:26:50   File "test/run_test.py", line 1094, in <module>
Apr 01 20:26:50     main()
Apr 01 20:26:50   File "test/run_test.py", line 1073, in main
Apr 01 20:26:50     raise RuntimeError(err_message)
Apr 01 20:26:50 RuntimeError: test_foreach failed!
Apr 01 20:26:51 + cleanup
Apr 01 20:26:51 + retcode=1
Apr 01 20:26:51 + set +x
Apr 01 20:26:51 =================== sccache compilation log ===================
Apr 01 20:26:51 =========== If your build fails, please take a look at the log above for possible reasons ===========
Apr 01 20:26:51 Compile requests                      4
Apr 01 20:26:51 Compile requests executed             2
Apr 01 20:26:51 Cache hits                            2
Apr 01 20:26:51 Cache hits (C/C++)                    2
Apr 01 20:26:51 Cache misses                          0

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test1 (5/7)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Apr 01 19:46:24 [E request_callback_no_python.cpp:656] Received error while processing request type 256: The following operation failed in the TorchScript interpreter.
Apr 01 19:46:24 
Apr 01 19:46:24 [E request_callback_no_python.cpp:656] Received error while processing request type 256: The following operation failed in the TorchScript interpreter.
Apr 01 19:46:24 Traceback of TorchScript (most recent call last):
Apr 01 19:46:24   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 334, in raise_func_script
Apr 01 19:46:24 @torch.jit.script
Apr 01 19:46:24 def raise_func_script(expected_err: str) -> torch.Tensor:
Apr 01 19:46:24     raise ValueError(expected_err)
Apr 01 19:46:24     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
Apr 01 19:46:24 RuntimeError: Expected error
Apr 01 19:46:24 
Apr 01 19:46:24 [E request_callback_no_python.cpp:656] Received error while processing request type 256: The following operation failed in the TorchScript interpreter.
Apr 01 19:46:24 Traceback of TorchScript (most recent call last):
Apr 01 19:46:24   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 334, in raise_func_script
Apr 01 19:46:24 @torch.jit.script
Apr 01 19:46:24 def raise_func_script(expected_err: str) -> torch.Tensor:
Apr 01 19:46:24     raise ValueError(expected_err)
Apr 01 19:46:24     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
Apr 01 19:46:24 RuntimeError: Expected error
Apr 01 19:46:24 
Apr 01 19:46:25 ok (2.854s)
Apr 01 19:46:27   test_wait_all_multiple_call (__main__.ProcessGroupRpcTestWithSpawn) ... RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend.

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (6/7)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Apr 01 20:40:33 AssertionError: RuntimeError not raised
Apr 01 20:40:33 ======================================================================
Apr 01 20:40:33 FAIL [0.033s]: test_grad_scaling_unscale (__main__.TestCuda)
Apr 01 20:40:33 ----------------------------------------------------------------------
Apr 01 20:40:33 Traceback (most recent call last):
Apr 01 20:40:33   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 979, in wrapper
Apr 01 20:40:33     method(*args, **kwargs)
Apr 01 20:40:33   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 979, in wrapper
Apr 01 20:40:33     method(*args, **kwargs)
Apr 01 20:40:33   File "test_cuda.py", line 1917, in test_grad_scaling_unscale
Apr 01 20:40:33     inv_scale)
Apr 01 20:40:33 AssertionError: RuntimeError not raised
Apr 01 20:40:33 
Apr 01 20:40:33 ----------------------------------------------------------------------
Apr 01 20:40:33 Ran 158 tests in 168.317s
Apr 01 20:40:33 
Apr 01 20:40:33 FAILED (failures=1, skipped=12)
Apr 01 20:40:33 
Apr 01 20:40:33 Generating XML reports...
Apr 01 20:40:33 Generated XML report: test-reports/dist-gloo/test_cuda/TEST-TestCuda-20210401203744.xml
Apr 01 20:40:33 Generated XML report: test-reports/dist-gloo/test_cuda/TEST-TestCudaComm-20210401203744.xml
Apr 01 20:40:33 Traceback (most recent call last):

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test1 (7/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

AssertionError: RuntimeError not raised
  test_scatter_gpu_neg_dim (__main__.TestCudaComm) ... skip (0.000s)
  test_scatter_gpu_sizes (__main__.TestCudaComm) ... skip (0.006s)
  test_scatter_namedtuple (__main__.TestCudaComm) ... skip (0.000s)

======================================================================
FAIL [0.018s]: test_grad_scaling_unscale (__main__.TestCuda)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_cuda.py", line 1917, in test_grad_scaling_unscale
    inv_scale)
AssertionError: RuntimeError not raised

----------------------------------------------------------------------
Ran 158 tests in 110.448s

FAILED (failures=1, skipped=65)

Generating XML reports...
Generated XML report: test-reports\dist-gloo\test_cuda\TEST-TestCuda-20210401193805.xml
Generated XML report: test-reports\dist-gloo\test_cuda\TEST-TestCudaComm-20210401193805.xml
Traceback (most recent call last):

2 failures not recognized by patterns:

Job Step Action
GitHub Actions quick-checks Ensure no trailing spaces 🔁 rerun
GitHub Actions flake8-py3 Fail if there were any warnings 🔁 rerun

ci.pytorch.org: 1 failed



izdeby pushed a commit that referenced this pull request Dec 2, 2020
ghstack-source-id: 16c7e05
Pull Request resolved: #48224
izdeby pushed a commit that referenced this pull request Dec 3, 2020
ghstack-source-id: 5ae2b4f
Pull Request resolved: #48224
izdeby pushed a commit that referenced this pull request Dec 3, 2020
ghstack-source-id: c90cef6
Pull Request resolved: #48224
@izdeby izdeby changed the title [WIP] Pushed unsupported scenarios to slow path Pushed unsupported scenarios to slow path Dec 4, 2020
izdeby pushed a commit that referenced this pull request Dec 8, 2020
ghstack-source-id: c9f7082
Pull Request resolved: #48224
Iurii Zdebskyi added 4 commits December 8, 2020 14:51
Iurii Zdebskyi and others added 16 commits February 18, 2021 12:37
Differential Revision: [D25074764](https://our.internmc.facebook.com/intern/diff/D25074764)

Coming next for this stack:
1. Docs PR. Add more code documentation
2. Swap optimizers from torch.optim and torch.optim.multi_tensor
 
--------------------------------
**Motivation**
_foreach APIs should produce the exact same result as if they were implemented via for-loop implementation but have significant performance gains in particular scenarios. This PR achieves this. 

**What was done**
- Stopped throwing an error in case when previous API restrictions were not met. Instead, execution will go via slow path if possible.
- Updated code logic to acknowledge division op. This is a temp fix until MTA wont support type promotion. In case of integer input and division, we should go via slow path as MTA doesn't support return of a different dtype.
- Significantly extended type support for binary/unary/pointwise ops. with this PR, _foreach APIs support exactly same list of types as their analogs in torch.
- Completely rewrote test/test_foreach.py, see testing section

**Testing**
This is an arguable change. Im open to reconsidering. 

- Refactoring
I've rewritten the whole test suite to have individual test per operator rather than having one master test with many else/if conditions. In my opinion, the tests became much more readable and easier to maintain. 

- Coverage and correctness 
Each _foreach API has its dedicated test where all possible dtypes/ combination of dtypes are being tested and compared to their counter parts in torch.  

[ghstack-poisoned]
Differential Revision: [D25074764](https://our.internmc.facebook.com/intern/diff/D25074764)

Coming next for this stack:
1. Docs PR. Add more code documentation
2. Swap optimizers from torch.optim and torch.optim.multi_tensor
 
--------------------------------
**Motivation**
_foreach APIs should produce the exact same result as if they were implemented via for-loop implementation but have significant performance gains in particular scenarios. This PR achieves this. 

**What was done**
- Stopped throwing an error in case when previous API restrictions were not met. Instead, execution will go via slow path if possible.
- Updated code logic to acknowledge division op. This is a temp fix until MTA wont support type promotion. In case of integer input and division, we should go via slow path as MTA doesn't support return of a different dtype.
- Significantly extended type support for binary/unary/pointwise ops. with this PR, _foreach APIs support exactly same list of types as their analogs in torch.
- Completely rewrote test/test_foreach.py, see testing section

**Testing**
This is an arguable change. Im open to reconsidering. 

- Refactoring
I've rewritten the whole test suite to have individual test per operator rather than having one master test with many else/if conditions. In my opinion, the tests became much more readable and easier to maintain. 

- Coverage and correctness 
Each _foreach API has its dedicated test where all possible dtypes/ combination of dtypes are being tested and compared to their counter parts in torch.  

[ghstack-poisoned]
Differential Revision: [D25074764](https://our.internmc.facebook.com/intern/diff/D25074764)

Coming next for this stack:
1. Docs PR. Add more code documentation
2. Swap optimizers from torch.optim and torch.optim.multi_tensor
 
--------------------------------
**Motivation**
_foreach APIs should produce the exact same result as if they were implemented via for-loop implementation but have significant performance gains in particular scenarios. This PR achieves this. 

**What was done**
- Stopped throwing an error in case when previous API restrictions were not met. Instead, execution will go via slow path if possible.
- Updated code logic to acknowledge division op. This is a temp fix until MTA wont support type promotion. In case of integer input and division, we should go via slow path as MTA doesn't support return of a different dtype.
- Significantly extended type support for binary/unary/pointwise ops. with this PR, _foreach APIs support exactly same list of types as their analogs in torch.
- Completely rewrote test/test_foreach.py, see testing section

**Testing**
This is an arguable change. Im open to reconsidering. 

- Refactoring
I've rewritten the whole test suite to have individual test per operator rather than having one master test with many else/if conditions. In my opinion, the tests became much more readable and easier to maintain. 

- Coverage and correctness 
Each _foreach API has its dedicated test where all possible dtypes/ combination of dtypes are being tested and compared to their counter parts in torch.  

[ghstack-poisoned]
Differential Revision: [D25074764](https://our.internmc.facebook.com/intern/diff/D25074764)

Coming next for this stack:
1. Docs PR. Add more code documentation
2. Swap optimizers from torch.optim and torch.optim.multi_tensor
 
--------------------------------
**Motivation**
_foreach APIs should produce the exact same result as if they were implemented via for-loop implementation but have significant performance gains in particular scenarios. This PR achieves this. 

**What was done**
- Stopped throwing an error in case when previous API restrictions were not met. Instead, execution will go via slow path if possible.
- Updated code logic to acknowledge division op. This is a temp fix until MTA wont support type promotion. In case of integer input and division, we should go via slow path as MTA doesn't support return of a different dtype.
- Significantly extended type support for binary/unary/pointwise ops. with this PR, _foreach APIs support exactly same list of types as their analogs in torch.
- Completely rewrote test/test_foreach.py, see testing section

**Testing**
This is an arguable change. Im open to reconsidering. 

- Refactoring
I've rewritten the whole test suite to have individual test per operator rather than having one master test with many else/if conditions. In my opinion, the tests became much more readable and easier to maintain. 

- Coverage and correctness 
Each _foreach API has its dedicated test where all possible dtypes/ combination of dtypes are being tested and compared to their counter parts in torch.  

@facebook-github-bot
Contributor

Hi @izdeby!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

crcrpar added a commit to crcrpar/pytorch that referenced this pull request Apr 22, 2021
crcrpar added a commit to crcrpar/pytorch that referenced this pull request Apr 22, 2021
crcrpar added a commit to crcrpar/pytorch that referenced this pull request Apr 27, 2021
crcrpar added a commit to crcrpar/pytorch that referenced this pull request May 2, 2021
Ref: pytorch#48224

apply int/bool check to all tensors
facebook-github-bot pushed a commit that referenced this pull request May 25, 2021
Summary:
This is based on  #48224.

To make `foreach` more flexible, this PR pushes unsupported cases to slow path.
Also, this adds some tests to verify that
- `foreach` functions work with tensors of different dtypes and/or memory layouts in 7bd4b2c
- `foreach` functions work with tensors on different devices in a list, but are on the same device if the indices are the same: def4b9b

Future plans:
1. Improve the coverage of unittests using `ops` decorator & updating `foreach_unary_op_db` and creating `foreach_(binary|pointwise|minmax)_db`.
2. Support broadcasting in slow path. Ref:  #52448
3. Support type promotion in fast path. Ref #52449

CC: ngimel mcarilli  ptrblck

Pull Request resolved: #56993

Reviewed By: zou3519

Differential Revision: D28630580

Pulled By: ngimel

fbshipit-source-id: e26ee74a39a591025e18c1ead48948cb7ec53c19
deniskokarev pushed a commit to deniskokarev/pytorch that referenced this pull request Jun 9, 2021
@zou3519 zou3519 removed their request for review June 28, 2021 13:31
@pytorchbot
Collaborator

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
Stale pull requests will automatically be closed 30 days after being marked Stale

@github-actions github-actions bot closed this May 12, 2022
@facebook-github-bot facebook-github-bot deleted the gh/izdeby/68/head branch June 11, 2022 14:18