
[PyTorch] Reduce errors of foreach functions#56993

Closed
crcrpar wants to merge 21 commits into pytorch:master from crcrpar:fe/rich-slowpath

Conversation

@crcrpar (Collaborator) commented Apr 27, 2021

This is based on #48224.

To make foreach more flexible, this PR pushes unsupported cases to the slow path.
Also, this adds some tests to verify that

  • foreach functions work with tensors of different dtypes and/or memory layouts in 7bd4b2c
  • foreach functions work with tensors on different devices in a list, but are on the same device if the indices are the same: def4b9b

Future plans:

  1. Improve the coverage of unittests using ops decorator & updating foreach_unary_op_db and creating foreach_(binary|pointwise|minmax)_db.
  2. Support broadcasting in slow path. Ref: [WIP] Support broadcasting via slow path #52448
  3. Support type promotion in fast path. Ref [WIP] Add type promotion to a fast path for foreach APIs #52449

CC: @ngimel @mcarilli @ptrblck

@facebook-github-bot (Contributor) commented Apr 27, 2021

💊 CI failures summary and remediations

As of commit 4cc29a5 (more details on the Dr. CI page):


  • 3/3 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/2)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/simple/binary_smoketest.py
Auto-merging .circleci/cimodel/data/simple/binary_smoketest.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/simple/android_definitions.py
Auto-merging .circleci/cimodel/data/simple/android_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py
Auto-merging .circleci/cimodel/data/pytorch_build_data.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/dimensions.py
Auto-merging .circleci/cimodel/data/dimensions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/binary_build_definitions.py
Auto-merging .circleci/cimodel/data/binary_build_definitions.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (2/2)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed with the same add/add conflicts in the .circleci/cimodel files as in build (1/2); fix conflicts and then commit the result.

Exited with code exit status 1


1 failure not recognized by patterns:

Job: GitHub Actions clang-format / clang-format — Step: Run clang-format

This comment was automatically generated by Dr. CI.

@H-Huang H-Huang requested review from ngimel and zou3519 April 27, 2021 21:31
@H-Huang H-Huang added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Apr 27, 2021
@crcrpar crcrpar force-pushed the fe/rich-slowpath branch from d243573 to 27ee28b Compare May 5, 2021 13:01
if (check_dtype) {
  const auto expected_dtype = tensors[0].dtype();
  for (const auto& t : tensors) {
    TORCH_CHECK(t.dtype() == expected_dtype, "All tensors in the tensor list must have the same dtype.");
Collaborator

nit: you can use std::all_of here

check_foreach_api_restrictions(input, tensors1, tensors2); \
\
if (!can_use_fast_route({input, tensors1, tensors2}, scalar)) { \
if (!can_use_fast_route({input, tensors1, tensors2}, scalar) || has_int_or_bool_tensor(input)) { \
Collaborator

out of curiosity, why are we getting rid of int tensors? Because of type promotion?

Collaborator Author

Yes, type promotion.

for (const auto& t : tensors) {
if (at::isComplexType(t.scalar_type())) {
has_complex = true;
has_complex_or_integer = true;
Collaborator

this is misleading - how did integer get in the name? Also, std::any_of.

Collaborator Author

ah, sorry, forgot to fix the name

crcrpar added 4 commits May 20, 2021 14:39
also, use `std::any_of`
Stop providing a special path for AMP kernels.
Upon the removal of dtype checks from `check_foreach_api_restrictions`,
non_finite_check_and_unscale_cuda_ internally checks each tensor's dtype
and pushes it to the single-tensor TensorIterator kernel.

Also, update the comment about `can_use_fast_route`.
_amp_foreach_non_finite_checks_and_unscale_ is expected to
- handle tensors of various dtypes
- raise a RuntimeError if devices are different
@crcrpar (Collaborator Author) commented May 20, 2021

The check_foreach_api_restrictions functions dropped the option of dtype checks, and _amp_foreach_non_finite_check_and_unscale_cuda_ is now able to handle tensor lists of mixed dtypes (though this case is not supposed to happen).

@ngimel (Collaborator) left a comment

This looks good, I've left minor comments.


bool will_promote_tensor(const Tensor& tensor, const Scalar& scalar, bool does_op_promote_integer_inputs_to_float = false) {
// complex scalar + float/int/bool tensor will result in complex tensor
if (scalar.isComplex() && (at::isIntegralType(tensor.scalar_type(), /* includeBool= */true) || at::isFloatingType(tensor.scalar_type())) ) {
Collaborator

It would be better to call result_type here

ScalarType result_type(const Tensor &tensor, const Scalar& other) {
, that way, we are not duplicating logic and don't risk divergence

FOREACH_POINTWISE_OP_SCALARLIST(addcdiv, std::divides);


bool has_bool_tensor(TensorList tensors) {
Collaborator

this has to live next to has_int_or_bool_tensor, or, better, be a single function with an argument indicating whether to include bool: has_integral_tensor(TensorList tensors, bool include_bool = false)

Collaborator Author

That makes sense, I'll rename has_int_or_bool_tensor to has_integral_tensor and add an includeBool argument.
I will also place has_bool_tensor next to has_integral_tensor, but I want to keep them as separate functions because the caller of has_bool_tensor can handle int tensors.
Or, is it better to write std::any_of(tensors.begin(), tensors.end(), [](const auto& t) -> bool { return t.scalar_type() == at::kBool; }); inside the caller?

std::vector<Tensor> foreach_tensor_##NAME##_cuda(TensorList tensors1, TensorList tensors2) { \
check_foreach_api_restrictions(tensors1, tensors2); \
if (!can_use_fast_route({tensors1, tensors2})) { \
if (!can_use_fast_route({tensors1, tensors2}) || has_bool_tensor(tensors1)) { \
Collaborator

can you please leave a comment here why bool tensors can't be handled.

else:
self.assertEqual(tensors1, expected)

@skipMeta
Collaborator

out of curiosity, how is it working now, without skipMeta?

@crcrpar (Collaborator Author) commented May 21, 2021

I don't remember the details, but without this I saw a bunch of messages that looked like a ton of failures, though they were somehow treated as warnings or something like that.
The reason I added this decorator is that those messages said something like "this op does not support meta tensors", and I thought repeating that message from every single foreach function test was meaningless since this decorator says it all.

Collaborator

If that's the case (the tests are actually passing), please don't skip them, so that if coverage is extended to meta tensors they don't have to be explicitly re-enabled. For your local runs you could still filter to only CPU or only CUDA devices (not both together, I'm afraid).

Contributor

Yes, please don't do this. If you want to reduce the spamminess, edit the code in torch/testing/_internal/common_utils.py

from contextlib import contextmanager
import unittest

@contextmanager
def skip_exception_type(exc_type):
    try:
        yield
    except exc_type as e:
        raise unittest.SkipTest(f"not implemented: {e}") from e

you could make it stop printing the entire exception text (though I've personally found it useful for collecting information from XML logs)

Collaborator Author

Sure, thank you @ezyang for the explanation

@skipMeta
@dtypes(*torch.testing.get_all_dtypes())
def test_add_list_slow_path(self, device, dtype):
# 0-strides
Collaborator

Nit: it's "implicit broadcast", not 0-strides

@crcrpar (Collaborator Author) commented May 21, 2021

Sorry, I might just be wrong, but is that still the case even if I deliberately call expand_as(tensor1)?
If sizes don't match, the current foreach functions should raise.

@crcrpar (Collaborator Author) commented May 21, 2021

Side note: I want to support broadcasting via slowpath after unittest refactoring

Rel: #52448

Collaborator

If you deliberately expand, then it's explicit broadcast, but either way, it's just a comment, so doesn't matter

tensor2 = torch.randn(5, 2, 1, 3 * 7, device=device).to(dtype)[:, :, :, ::7]
res = torch._foreach_add([tensor1], [tensor2])
torch._foreach_add_([tensor1], [tensor2])
self.assertEqual(res, [tensor1])
Collaborator

Note for the future: this test is not actually testing correctness, it's testing in-place vs out-of-place. But that can wait till future refactor

# `tensors2`: ['cuda', 'cpu']
_cuda_tensors = self._get_test_data(device, dtype, 2)
_cpu_tensors = self._get_test_data('cpu', dtype, 2)
tensors1, tensors2 = [[_cuda_tensors[i], _cpu_tensors[i]] for i in range(2)]
Collaborator

nit: tensors1, tensors2 = [list(t) for t in zip(_cuda_tensors, _cpu_tensors)] is slightly more pythonic

Collaborator Author

Oh, I like this, thank you.

_cpu_tensors = self._get_test_data('cpu', dtype, 2)
tensors1, tensors2 = [[_cuda_tensors[i], _cpu_tensors[i]] for i in range(2)]

if dtype == torch.bool and native_op == torch.sub:
Collaborator

why do you need this special case? It should be handled by the following try-except block (both foreach and native should raise the same error)?

Collaborator Author

You are absolutely right, and that is what I expected, but it unfortunately fails.
If I merge this special handling into the following try-except block then, in my understanding, because the error message includes `^`, one of the regex special characters, I get the following:

Traceback (most recent call last):
  File "/workspace/test/test_foreach.py", line 890, in test_binary_op_tensors_on_different_devices
    actual = foreach_op(tensors1, tensors2)
RuntimeError: Subtraction, the `-` operator, with two bool tensors is not supported. Use the `^` or `logical_xor()` operator instead.

During handling of the above exception, another exception occurred:

RuntimeError: Subtraction, the `-` operator, with two bool tensors is not supported. Use the `^` or `logical_xor()` operator instead.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/test/test_foreach.py", line 893, in test_binary_op_tensors_on_different_devices
    [native_op(t1, t2) for t1, t2 in zip(tensors1, tensors2)]
  File "/opt/conda/lib/python3.8/unittest/case.py", line 240, in __exit__
    self._raiseFailure('"{}" does not match "{}"'.format(
  File "/opt/conda/lib/python3.8/unittest/case.py", line 164, in _raiseFailure
    raise self.test_case.failureException(msg)
AssertionError: "Subtraction, the `-` operator, with two bool tensors is not supported. Use the `^` or `logical_xor()` operator instead." does not match "Subtraction, the `-` operator, with two bool tensors is not supported. Use the `^` or `logical_xor()` operator instead."

@ngimel (Collaborator) commented May 21, 2021

Can you check whether wrapping in re.escape solves this? e.g.

with self.assertRaisesRegex(type(e), re.escape(str(e))):

re.search seems to be happy:

In [10]: aa="pytorch `^`"

In [11]: re.search(re.escape(aa), bb)
Out[11]: <re.Match object; span=(0, 11), match='pytorch `^`'>

In [12]: re.search(aa, bb)

Collaborator Author

It works!

tensors1, tensors2, tensors3 = [[_cuda_tensors[i], _cpu_tensors[i]] for i in range(3)]

if native_op == torch.addcdiv:
if dtype in torch.testing.get_all_int_dtypes() + [torch.bool]:
Collaborator

can you please leave a comment here why this special case is necessary?

@crcrpar (Collaborator Author) commented May 21, 2021

Okay, I think the reason is similar to the sub-with-bool case: the message has one or more regex special characters.

Traceback (most recent call last):
  File "/workspace/test/test_foreach.py", line 931, in test_pointwise_op_tensors_on_different_devices
    actual = foreach_op(tensors1, tensors2, tensors3)
RuntimeError: Integer division with addcdiv is no longer supported, and in a future  release addcdiv will perform a true division of tensor1 and tensor2. The historic addcdiv behavior can be implemented as (input + value * torch.trunc(tensor1 / tensor2)).to(input.dtype) for integer inputs and as (input + value * tensor1 / tensor2) for float inputs. The future addcdiv behavior is just the latter implementation: (input + value * tensor1 / tensor2), for all dtypes.

During handling of the above exception, another exception occurred:

RuntimeError: Integer division with addcdiv is no longer supported, and in a future  release addcdiv will perform a true division of tensor1 and tensor2. The historic addcdiv behavior can be implemented as (input + value * torch.trunc(tensor1 / tensor2)).to(input.dtype) for integer inputs and as (input + value * tensor1 / tensor2) for float inputs. The future addcdiv behavior is just the latter implementation: (input + value * tensor1 / tensor2), for all dtypes.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/test/test_foreach.py", line 934, in test_pointwise_op_tensors_on_different_devices
    expected = [native_op(t1, t2, t3) for t1, t2, t3 in zip(tensors1, tensors2, tensors3)]
  File "/opt/conda/lib/python3.8/unittest/case.py", line 240, in __exit__
    self._raiseFailure('"{}" does not match "{}"'.format(
  File "/opt/conda/lib/python3.8/unittest/case.py", line 164, in _raiseFailure
    raise self.test_case.failureException(msg)
AssertionError: "Integer division with addcdiv is no longer supported, and in a future  release addcdiv will perform a true division of tensor1 and tensor2. The historic addcdiv behavior can be implemented as (input + value * torch.trunc(tensor1 / tensor2)).to(input.dtype) for integer inputs and as (input + value * tensor1 / tensor2) for float inputs. The future addcdiv behavior is just the latter implementation: (input + value * tensor1 / tensor2), for all dtypes." does not match "Integer division with addcdiv is no longer supported, and in a future  release addcdiv will perform a true division of tensor1 and tensor2. The historic addcdiv behavior can be implemented as (input + value * torch.trunc(tensor1 / tensor2)).to(input.dtype) for integer inputs and as (input + value * tensor1 / tensor2) for float inputs. The future addcdiv behavior is just the latter implementation: (input + value * tensor1 / tensor2), for all dtypes."

crcrpar added 6 commits May 21, 2021 17:47
note: `has_int_or_bool_tensor` is renamed to `has_integral_tensor`
and a new argument `includeBool` is added.
Binary foreach functions cannot be compiled for the inputs of
one tensorlist and one scalarlist if the scalarlist consists of complex128 values.
leave a comment to the test which requires the decorator
This obviates the special case for `_foreach_sub` and `_foreach_addcdiv`
@ngimel (Collaborator) left a comment

Looks good!

@facebook-github-bot (Contributor) commented

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented

@ngimel merged this pull request in 7eade66.

@crcrpar crcrpar deleted the fe/rich-slowpath branch May 28, 2021 02:51
deniskokarev pushed a commit to deniskokarev/pytorch that referenced this pull request Jun 9, 2021
Summary: same as the PR description above.
Pull Request resolved: pytorch#56993

Reviewed By: zou3519

Differential Revision: D28630580

Pulled By: ngimel

fbshipit-source-id: e26ee74a39a591025e18c1ead48948cb7ec53c19

Labels: cla signed, Merged, open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects: None yet

6 participants