[Functionalization] Manually redispatch convolution_backward to functionalize pass #4681
alanwaketan merged 8 commits into functionalization
Conversation
], test_fn)

def test_conv2d_backward(self):
    # Somehow eager cpu produces different results than us, and
It is not that uncommon for us to give slightly different results; is it off by a lot? It is possible that the result differs because we run some optimization pass that makes the code run faster but less accurately.
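(As an aside, a minimal sketch of a tolerance-based comparison in libtorch; torch::allclose is a real API, but the function and tensor names here are hypothetical:)

#include <torch/torch.h>

// Compare a backend result against the eager CPU reference with explicit
// tolerances instead of exact equality, so small numeric drift from
// faster-but-less-accurate optimization passes stays within bounds.
bool results_match(const torch::Tensor& backend_result,
                   const torch::Tensor& cpu_result) {
  return torch::allclose(backend_result.cpu(), cpu_result,
                         /*rtol=*/1e-4, /*atol=*/1e-5);
}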
Force-pushed from 3fba351 to 829d578.
::std::tuple<at::Tensor, at::Tensor, at::Tensor>
XLANativeFunctions::convolution_backward(
    const at::Tensor& grad_output, const at::Tensor& input,
@bdhirsh I hit a problem here. It looks like we do need to redispatch convolution_backward and _convolution to the functionalize pass in order to make view ops in their decompositions get replaced with view_copy ops. However, somehow convolution_backward is not calling into at::native::convolution_backward, and thus our own kernel, convolution_backward_overrideable, is not called.
Here are the dispatcher calls:
convolution:
[call] op=[aten::conv_transpose3d.input], key=[AutogradXLA]
[call] op=[aten::convolution], key=[AutogradXLA]
[redispatch] op=[aten::convolution], key=[Functionalize]
[callBoxed] op=[aten::convolution], key=[XLA]
[call] op=[aten::_convolution], key=[XLA]
[redispatchBoxed] op=[aten::_convolution], key=[Meta]
[call] op=[aten::convolution_overrideable], key=[Functionalize]
[callBoxed] op=[aten::convolution_overrideable], key=[XLA]
[call] op=[aten::ones_like], key=[Functionalize]
convolution_backward:
[call] op=[aten::convolution_backward], key=[AutogradXLA]
[redispatch] op=[aten::convolution_backward], key=[Functionalize]
[callBoxed] op=[aten::convolution_backward], key=[XLA]
[redispatchBoxed] op=[aten::convolution_backward], key=[Meta]
[call] op=[aten::new_empty], key=[Functionalize]
Let me know if you need more information.
Is that trace coming from a run on this PR, with these ops removed from XLA, or from before this PR?
In the convolution_backward() trace above, I see aten::convolution_backward called with the Meta key. My guess is that we're calling the meta implementation of convolution_backward for shape inference, either directly in pytorch/xla or in the functionalization kernel (the fact that the call right before it used the XLA key makes me think it's coming from the XLA kernel). This will just run shape compute for convolution_backward, so it won't end up dispatching to XLA's implementation; it will run shape compute from core. Is the problem that this is erroring somehow?
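For illustration, a minimal sketch of Meta-key shape inference, assuming the operands are recreated as meta tensors (this is not the actual pytorch/xla code; the helper name is ours):

#include <ATen/ATen.h>

// Recreate the operands on the meta device, then call the op. Dispatching on
// meta tensors selects the Meta kernel, which computes output shapes and
// dtypes without touching any real data or any backend (XLA) kernel.
std::tuple<at::Tensor, at::Tensor, at::Tensor> shape_only_convolution_backward(
    const at::Tensor& grad_output, const at::Tensor& input,
    const at::Tensor& weight, at::OptionalIntArrayRef bias_sizes,
    at::IntArrayRef stride, at::IntArrayRef padding, at::IntArrayRef dilation,
    bool transposed, at::IntArrayRef output_padding, int64_t groups,
    std::array<bool, 3> output_mask) {
  auto to_meta = [](const at::Tensor& t) {
    return at::empty(t.sizes(), t.options().device(at::kMeta));
  };
  return at::convolution_backward(
      to_meta(grad_output), to_meta(input), to_meta(weight), bias_sizes,
      stride, padding, dilation, transposed, output_padding, groups,
      output_mask);
}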
Before this PR. What the convolution_backward kernel in XLA does is call

at::functionalization::functionalize_aten_op<ATEN_OP(
    convolution_backward)>::call(grad_output, input, weight, bias_sizes,
                                 stride, padding, dilation, transposed,
                                 output_padding, groups, output_mask);

We did the same thing for _convolution. However, the Meta kernel of _convolution ends up calling convolution_overrideable, but we are not seeing the same behavior for convolution_backward.
@bdhirsh I applied a hack to make things work.
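For context, a minimal sketch of what such a manual redispatch to the Functionalize key could look like, using the at::redispatch API; the function name is ours and the PR's actual code may differ:

#include <ATen/ATen.h>
#include <ATen/RedispatchFunctions.h>

// Instead of going through at::functionalization::functionalize_aten_op,
// redispatch the op ourselves with the Functionalize key, so the
// decomposition runs under functionalization and its view ops are
// replaced with view_copy ops.
::std::tuple<at::Tensor, at::Tensor, at::Tensor>
manually_functionalized_conv_backward(
    const at::Tensor& grad_output, const at::Tensor& input,
    const at::Tensor& weight, at::OptionalIntArrayRef bias_sizes,
    at::IntArrayRef stride, at::IntArrayRef padding, at::IntArrayRef dilation,
    bool transposed, at::IntArrayRef output_padding, int64_t groups,
    ::std::array<bool, 3> output_mask) {
  return at::redispatch::convolution_backward(
      c10::DispatchKeySet(c10::DispatchKey::Functionalize), grad_output,
      input, weight, bias_sizes, stride, padding, dilation, transposed,
      output_padding, groups, output_mask);
}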
Force-pushed from 5bd4743 to d527999.
Summary: I somehow wrongly lowered _convolution/convolution_backward at an early stage, which made conv.backward disappear from the graph. Therefore, this undoes that change and adds a test case for it. Test Plan: PJRT_DEVICE=TPU python test/test_operations.py -v -k test_conv2d_backward
Force-pushed from 829d578 to 9b30217.
I think this PR is ready for review.
Thanks Jack for approving the change.
Summary:
For any CompositeExplicitAutograd ops, we are supposed to explicitly re-enable functionalization so that any decomposed ops within those ops get functionalized as well.
However, when directly calling into at::functionalization::functionalize_aten_op, convolution_backward somehow omits convolution_backward_overrideable, which is our own kernel for computing the convolution gradients. Thus, no grads are produced.
To work around the issue, we manually redispatch convolution_backward to the functionalize pass.
Test Plan:
PJRT_DEVICE=TPU python test/test_operations.py -v -k test_conv2d_backward