
fix maxpool2d for XLA dynamo tracing #4276

Merged
JackCaoG merged 11 commits into master from dynamo_maxpool_fix on Dec 15, 2022

Conversation

@bdhirsh (Contributor) commented Dec 5, 2022

waiting for CI before getting review

bdhirsh pushed a commit to pytorch/pytorch that referenced this pull request Dec 5, 2022
Today when XLA registers an autograd.Function to the `AutogradXLA` key, the "add to the autograd graph" step and the "run the forward kernel" step happen all in one go. That's wrong, and prevents other dispatcher code from executing in the middle.

While trying to fix this, I noticed a bug in the codegen: we register kernels for both the XLA and AutogradXLA dispatch keys to the same class. This prevents XLA from registering separate kernels to the XLA and AutogradXLA keys, which is what this PR attempts to address.

Companion patch to fix XLA's max_pool2d registration here, which was blocking the dynamo integration: pytorch/xla#4276

After this PR, XLA should generate two separate header files: `XLANativeFunctions.h` and `AutogradXLANativeFunctions.h`. Before, all of the kernels (including autograd kernels) were thrown into `XLANativeFunctions.h`.

cc JackCaoG

[ghstack-poisoned]
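
To illustrate the split described above: with separate keys, a backend can register one kernel at its autograd key (which adds the op to the autograd graph and then redispatches) and another at the backend key (which runs the actual forward lowering). A minimal C++ sketch of that shape using the dispatcher's public registration API - the kernel bodies here are hypothetical placeholders, not XLA's actual generated code:

#include <ATen/ATen.h>
#include <torch/library.h>

// Hypothetical stand-in for the generated XLA forward kernel.
at::Tensor xla_max_pool2d(const at::Tensor& self, at::IntArrayRef kernel_size,
                          at::IntArrayRef stride, at::IntArrayRef padding,
                          at::IntArrayRef dilation, bool ceil_mode) {
  // ... lower max_pool2d to XLA and run the forward computation ...
  return self;  // placeholder
}

// Hypothetical stand-in for the autograd wrapper: records the backward
// node, then redispatches so other dispatcher code (and eventually the
// XLA kernel below) can run in between.
at::Tensor autograd_xla_max_pool2d(const at::Tensor& self, at::IntArrayRef kernel_size,
                                   at::IntArrayRef stride, at::IntArrayRef padding,
                                   at::IntArrayRef dilation, bool ceil_mode) {
  // ... set up the autograd node, then redispatch ...
  return self;  // placeholder
}

TORCH_LIBRARY_IMPL(aten, AutogradXLA, m) {
  m.impl("max_pool2d", autograd_xla_max_pool2d);  // runs first, at the autograd key
}

TORCH_LIBRARY_IMPL(aten, XLA, m) {
  m.impl("max_pool2d", xla_max_pool2d);  // runs later, at the backend key
}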
@bdhirsh (Contributor, Author) commented Dec 6, 2022

@JackCaoG do you know why the option to re-run is greyed out on the CI failure? Not sure if it's a permissions thing or something else. It looks like CI didn't pick up my torch pin or something; trying to kick it off again:
[screenshot: the greyed-out re-run option on the CI failure]

@JackCaoG (Collaborator) commented Dec 6, 2022

Hmm, I was able to restart the CI. I thought you were an admin with all permissions; let me double-check.

@JackCaoG (Collaborator) commented Dec 6, 2022

Hmm, I think the torch pin has taken effect. From the full log:

+ TORCH_PIN=/tmp/pytorch/xla/scripts/../torch_patches/.torch_pin
+ '[' -f /tmp/pytorch/xla/scripts/../torch_patches/.torch_pin ']'
++ cat /tmp/pytorch/xla/scripts/../torch_patches/.torch_pin
+ CID='#90226'
+ [[ #90226 = \#* ]]
+ PRNUM=90226
+ set +x
Fetching PyTorch PR #90226
/tmp/pytorch /tmp/pytorch
From https://github.com/pytorch/pytorch
 * [new ref]               refs/pull/90226/head -> 90226
Switched to branch '90226'
M	third_party/ideep
M	third_party/kineto
Submodule path 'third_party/ideep': checked out 'ececd0a4f53c39f2d91caaddee0de1cd214f5b99'
Submodule path 'third_party/kineto': checked out '0703c78999061b8329dfab7ec5046fc5764a5573'


bdhirsh force-pushed the dynamo_maxpool_fix branch from 69f54bb to c4d9828 on December 6, 2022
@JackCaoG (Collaborator) commented Dec 7, 2022

======================================================================
ERROR: test_pooling_shape_xla (__main__.TestPoolingNNDeviceTypeXLA)
Test the output shape calculation for pooling functions
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 391, in instantiated_test
    raise rte
  File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 378, in instantiated_test
    result = test(self, **param_kwargs)
  File "/tmp/pytorch/xla/test/../../test/nn/test_pooling.py", line 593, in test_pooling_shape
    check((1, 1, 3, 3, 4), (1, 1, 5, 6, 7), kernel_size=1, stride=2, padding=0, ceil_mode=True)
  File "/tmp/pytorch/xla/test/../../test/nn/test_pooling.py", line 591, in check
    self.assertEqual(op(t, *args, **kwargs).shape, expected_out_shape[:i + 2])
  File "/opt/conda/lib/python3.7/site-packages/torch/_jit_internal.py", line 485, in fn
    return if_false(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 782, in _max_pool2d
    return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: 0 INTERNAL ASSERT FAILED at "/tmp/pytorch/aten/src/ATen/core/boxing/KernelFunction.cpp":19, please report a bug to PyTorch. fallthrough_kernel was executed but it should have been short-circuited by the dispatcher. This could occur if you registered a fallthrough kernel as a override for a specific operator (as opposed to a backend fallback); this is NOT currently supported, and we do not intend to add support for it in the near future.  If you do find yourself in need of this, let us know in the bug tracker.

Seems like it has something to do with a fallthrough kernel? The test is about pooling, so I think this is a real failure.
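
For context on that assert: the dispatcher supports fallthrough kernels as backend-wide fallbacks, but not as the registered kernel for one specific operator, which is exactly what the message says. A minimal C++ sketch of the two cases (illustrative only; not code from this PR):

#include <torch/library.h>

// Supported: a backend-wide fallback telling the dispatcher to skip
// this dispatch key entirely and fall through to the next one.
TORCH_LIBRARY_IMPL(_, AutogradXLA, m) {
  m.fallback(torch::CppFunction::makeFallthrough());
}

// NOT supported: a fallthrough registered as the kernel for a single op.
// This registers, but the dispatcher cannot short-circuit it, so it trips
// the INTERNAL ASSERT seen in the traceback above.
TORCH_LIBRARY_IMPL(aten, AutogradXLA, m) {
  m.impl("max_pool2d", torch::CppFunction::makeFallthrough());
}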

@bdhirsh (Contributor, Author) commented Dec 8, 2022

@JackCaoG I'm trying to rebuild XLA locally on my new devserver (I'm anticipating issues), but I just stared at the code for a while and I think I know what the problem is. Just pushed it, so I'll see what the latest round of CI yields.

@shunting314 I'll give you a shout when this PR looks ready to test - when it is, can you try re-running your dynamo-XLA integration with max_pool2d (both fw and bw) and confirm if there are issues?

@shunting314 (Collaborator) commented:

@bdhirsh sure, I'd be glad to do the tests

@bdhirsh (Contributor, Author) commented Dec 8, 2022

Hey @shunting314, it looks like the max_pool2d unit tests are passing. I do see a failure in the XLA-dynamo tests, but it doesn't seem related to this change (?). Can you try running E2E tests again?

ERROR: test_simple_model (__main__.DynamoBasicTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/pytorch/xla/test/dynamo/test_dynamo.py", line 41, in test_simple_model
    res_xla_dynamo_2 = self.fn_simple_dynamo(xla_x, xla_y)
  File "/opt/conda/lib/python3.7/site-packages/torch/_dynamo/eval_frame.py", line 209, in _fn
    return fn(*args, **kwargs)
  File "/tmp/pytorch/xla/test/dynamo/test_dynamo.py", line 21, in fn_simple_dynamo
    @dynamo.optimize('torchxla_trace_once')
  File "/opt/conda/lib/python3.7/site-packages/torch/_dynamo/eval_frame.py", line 209, in _fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/_dynamo/optimizations/backends.py", line 755, in fwd
    model = subgraph.model
NameError: free variable 'subgraph' referenced before assignment in enclosing scope

@JackCaoG (Collaborator) commented Dec 8, 2022

A rebase should fix the issue.

bdhirsh force-pushed the dynamo_maxpool_fix branch from 443f718 to 7bf44ba on December 8, 2022
@JackCaoG (Collaborator) commented Dec 8, 2022

@bdhirsh you might also need to rebase your pytorch PR.

bdhirsh pushed a commit to pytorch/pytorch that referenced this pull request Dec 8, 2022 (same commit message as above)
@bdhirsh (Contributor, Author) commented Dec 8, 2022

yep - done

@shunting314 (Collaborator) commented:

@bdhirsh is this the only PR I need to patch? (I previously saw you had 2 related PRs?)

@shunting314 (Collaborator) commented:

Do I need to patch pytorch/pytorch#90226 as well?

@bdhirsh (Contributor, Author) commented Dec 8, 2022

Yes, sorry - you'll need both :)

@shunting314 (Collaborator) commented:

@bdhirsh I see the following errors when building torch_xla:

# BUILD_CPP_TESTS=0 python setup.py develop
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Building torch_xla version: 1.14
XLA Commit ID: c6acdf5417e736429e353e7abf2a26cde1d96122
PyTorch Commit ID: 9f04175b952abbe841e52e65fc95c0bd61a2f2cd
Traceback (most recent call last):
  File "/pytorch/xla/scripts/gen_lazy_tensor.py", line 120, in <module>
    get_device_fn="torch_xla::bridge::GetXlaDevice")
  File "/pytorch/torchgen/gen_lazy_tensor.py", line 367, in run_gen_lazy_tensor
    source_yaml, grouped_native_functions, backend_indices
  File "/pytorch/torchgen/gen_backend_stubs.py", line 246, in parse_backend_yaml
    {forward_kernels[0].kernel} is listed under "supported", but {backward_kernels[0].kernel} is listed under "autograd".'
AssertionError: Currently, all variants of an op must either be registered to a backend key, or to a backend's autograd key. They cannot be mix and matched. If this is something you need, feel free to create an issue! max_pool2d is listed under "supported", but max_pool2d is listed under "autograd".
Failed to generate lazy files: ['python', '/pytorch/xla/scripts/gen_lazy_tensor.py']

@shunting314 (Collaborator) commented:

Oh, nvm - let me patch the pytorch-side PR as well.

@shunting314 (Collaborator) commented:

@bdhirsh I still see the issue after patching this PR and the corresponding PR on the pytorch side. I've created a standalone test that doesn't require patching my PR. You can repro using this simple script in your environment: pytorch/torchdynamo#1837 (comment)

Comment thread on torch_xla/csrc/aten_autograd_ops.cpp (outdated):
c10::DispatchKey::Conjugate,
c10::DispatchKey::Negative,
c10::DispatchKey::ZeroTensor,
c10::DispatchKey::ADInplaceOrView,
@bdhirsh (Contributor, Author) commented:

Oh, we should take ADInplaceOrView out of here.
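
For context, a key set like the one quoted above is typically used with an RAII exclude guard so that a redispatch from inside the autograd kernel skips those functionality keys. A rough sketch of the pattern (illustrative; not the exact code in this file):

#include <c10/core/DispatchKeySet.h>
#include <c10/core/impl/LocalDispatchKeySet.h>

void redispatch_without_wrapper_keys() {
  // Keys to bypass on the nested dispatch. Per the comment above,
  // ADInplaceOrView should not be in this set.
  c10::DispatchKeySet skip_keys{
      c10::DispatchKey::Conjugate,
      c10::DispatchKey::Negative,
      c10::DispatchKey::ZeroTensor,
  };
  // While the guard is alive, dispatches on this thread exclude skip_keys.
  c10::impl::ExcludeDispatchKeyGuard guard(skip_keys);
  // ... call back into the op here; the excluded keys are skipped ...
}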

bdhirsh force-pushed the dynamo_maxpool_fix branch 2 times, most recently from 98d1059 to 83876d5, on December 13, 2022
@bdhirsh (Contributor, Author) commented Dec 14, 2022

Hey @JackCaoG - do you mind finishing up the landing? @shunting314 confirmed that this fixes the E2E tests for max_pool2d.

@JackCaoG (Collaborator) left a review comment:

Thanks!

JackCaoG merged commit 4013c57 into master on Dec 15, 2022