add batching rule for block_diag, kill DECOMPOSE_FUNCTIONAL#814
Conversation
```cpp
  }
  auto result = at::cat(batched_outputs);
  return physical_views[0].getPhysicalToLogicalMap().apply(result);
}
```
I'm not convinced that I actually implemented this correctly (particularly in the case where you have multiple layers of vmap), but I wasn't sure what the right APIs to use were. I got OpInfo tests to pass, though.
I also wasn't sure how to actually make this batching rule fast, so I implemented it as a dummy for loop. It's probably still more efficient than the DECOMPOSE_FUNCTIONAL version, though, since that functionalizes every intermediate copy_(), which probably resulted in a bunch of large temporary tensors.
> I'm not convinced that I actually implemented this correctly (particularly in the case where you have multiple layers of vmap), but I wasn't sure what the right APIs to use were. I got OpInfo tests to pass, though.
This works with multiple layers of vmap. Each Interpreter in the DynamicLayer stack handles one vmap, so as long as we're calling pytorch composite operations it all works out :)
> I also wasn't sure how to actually make this batching rule fast, so I implemented it as a dummy for loop. It's probably still more efficient than the DECOMPOSE_FUNCTIONAL version, though, since that functionalizes every intermediate copy_(), which probably resulted in a bunch of large temporary tensors.
Yeah, there isn't a more efficient way to implement this. If we want this to go faster, we would need a "batched block_diag" operator in pytorch/pytorch that implements the behavior and is easier to write a batching rule for. We should leave a comment here about that for whoever comes along in the future.
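The "dummy for loop" strategy discussed above can be sketched in plain Python. This is a toy model on list-of-lists matrices, not the real ATen/vmap machinery; `block_diag` and `block_diag_batching_rule` here are illustrative stand-ins. The idea: with every input's batch dim at the front, run `block_diag` once per batch slice, then stack the per-slice results back along the batch dim.

```python
def block_diag(mats):
    """Plain block_diag on 2-D matrices given as lists of lists."""
    rows = sum(len(m) for m in mats)
    cols = sum(len(m[0]) for m in mats)
    out = [[0] * cols for _ in range(rows)]
    r0 = c0 = 0
    for m in mats:
        # Copy each input onto the diagonal block starting at (r0, c0).
        for i, row in enumerate(m):
            for j, v in enumerate(row):
                out[r0 + i][c0 + j] = v
        r0 += len(m)
        c0 += len(m[0])
    return out

def block_diag_batching_rule(batched_mats):
    """Loop over the shared leading batch dim; one block_diag per slice."""
    batch_size = len(batched_mats[0])
    return [block_diag([m[b] for m in batched_mats]) for b in range(batch_size)]
```

The loop over `batch_size` is exactly the per-slice iteration the comment calls a "dummy for loop"; a faster version would need a fused batched kernel.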
The code for `block_diag` isn't composite compliant today - functorch deals with that by registering a special "conditional functionalization" kernel, but I want to kill that here: pytorch/functorch#814. We can't efficiently make `block_diag` composite compliant, though, because it performs `O(num_inputs)` mutations, and converting them all into out-of-place calls would be very inefficient. Instead, I ended up making `block_diag` `CompositeExplicitAutograd` and writing a derivative formula for it. That also ended up fixing some OpInfo tests. [ghstack-poisoned]
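The derivative formula mentioned above is cheap because the backward of a block-diagonal concatenation just carves the output gradient back into one block per input. A hypothetical pure-Python sketch (list-of-lists matrices, not the actual autograd formula; `block_diag_backward` is an illustrative name):

```python
def block_diag_backward(grad, shapes):
    """Slice the gradient of block_diag's output back into per-input blocks.

    grad:   the (sum of rows) x (sum of cols) gradient w.r.t. the output.
    shapes: the (rows, cols) of each original input, in order.
    """
    grads = []
    r0 = c0 = 0
    for r, c in shapes:
        # The gradient of input k is the k-th diagonal block of `grad`.
        grads.append([row[c0:c0 + c] for row in grad[r0:r0 + r]])
        r0 += r
        c0 += c
    return grads
```

Each input's gradient is a single slice, so the backward needs no mutation and no `O(num_inputs)` temporaries.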
```cpp
  return physical_views[0].getPhysicalToLogicalMap().apply(result);
}

Tensor block_diag_batching_rule(TensorList tensors) {
```
(no action required) For some more context... BatchingRegistrations.cpp is for all the legacy batching rules and is "deprecated", but since we haven't actually gotten TensorList inputs to work with the new batching rule API, this is indeed our only option for now. The downside is that the legacy API is a bit difficult to work with and all the documentation for it is misleading, as you've probably discovered here
zou3519 left a comment:
LGTM. We should probably wait for the pytorch-side change to get merged.
```cpp
// Implementing this as a dummy for loop for now, since I'm not sure how to do it any better.
// I'm probably not accounting for potentially multiple batched dimensions?
```
When writing a batching rule it's safe to assume there is only a single layer of vmap. DynamicLayerStack handles the case where there are multiple layers of vmap. This is contrary to all the documentation in VmapTransforms.h, which was written back when DynamicLayerStack didn't exist.
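The point above can be illustrated with a toy model (this is not the real DynamicLayer machinery, just a sketch of why nesting composes): if each `vmap` layer simply loops over the leading dim, then nested `vmap`s peel one batch dim at a time, so any individual batching rule only ever sees a single layer.

```python
def vmap(fn):
    """Toy vmap: map fn over the leading (batch) dim of each input."""
    def batched(*args):
        return [fn(*slices) for slices in zip(*args)]
    return batched

def add(x, y):
    return x + y

# One layer of vmap handles one batch dim; nesting handles the rest.
single = vmap(add)([1, 2], [10, 20])                # one batch dim
nested = vmap(vmap(add))([[1], [2]], [[10], [20]])  # two batch dims
```

In functorch, each Interpreter in the DynamicLayer stack plays the role of one of these `vmap` wrappers.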
…backends" Need this to get functionalize to work with backends (LTC/XLA). Now that we can kill the `DECOMPOSE_FUNCTIONAL` code in functorch (see pytorch/functorch#814), this should be ok to land once that PR merges. [ghstack-poisoned]
Block diag tests seem to be failing: https://app.circleci.com/pipelines/github/pytorch/functorch/2886/workflows/bfc11336-cb2d-4206-a6ca-7132a4e2f204/jobs/19966/tests. But also, it looks like this PR has unrelated commits in it.
Hmm, I may have mucked up a rebase the last time. Taking a look.
Force-pushed from a5e5800 to 53836fb
Had to make a quick fix locally (I had tested
Failures so far look unrelated (looks like they're coming from the
Wait sorry, I'm still seeing a block_diag failure: https://app.circleci.com/pipelines/github/pytorch/functorch/2895/workflows/d67e1bce-1557-410c-8a10-b696f109e0e9/jobs/20105. With that being said, sorry about the addr failures. There should be 3 of them on cpu (6 on cuda), and you're definitely right that they're unrelated. Currently trying to decide if it's worth it to xfail those or update the testing infra for nan inputs.
Welp, thanks. Weird, those tests all pass for me locally... I removed a bunch of existing xfails this morning - I'll try adding them back and re-running the CI.
Might be from this 😭 😭 I eagerly look forward to dropping this into pytorch/pytorch so we aren't making people deal with this CI junk.. |
Staring at the log output, I think I see 3 remaining failing block_diag tests. What's weird is that when I pull this PR locally and build it against a fresh copy of master, I don't see the same errors. The only failures are unexpected successes (which I could remove the xfails for, but they don't seem to be passing on CI). I'm not really sure what's causing the discrepancy in CI :( Would it be reasonable to merge the PR in a way that passes locally, and see what the final version of CI looks like on main afterwards?
Hmm, I can reproduce this locally. Will debug a bit and get back to you. How urgent is it to get this merged? IIRC this is blocking your move of the functionalize dispatch key?
Thanks!
Not super urgent - it's blocking my moving the functionalize key, which blocks me landing the LTC <> functionalize integration. But there are still a bunch of other CI failures in that integration that I'm working through (although having less stuff in the stack makes the failures easier to reason about).
@bdhirsh after applying your PR to my local build with PyTorch master, I am seeing the same thing (there are a bunch of unexpected successes, but that is expected). After applying your PR to my local build with PyTorch nightly, I am seeing the same failures reported in our CI (the block_diag failures). functorch CI runs on the nightlies, so some of the changes in master that were necessary to get everything working (or into the unexpected-success state) haven't made it to the nightlies yet.

If you're in a rush, I think we can merge your dispatch key move PR in pytorch/pytorch (I believe that won't break the functorch build in fbcode; if it does someone will revert it). It will break these tests (but hey, these tests are already broken, and no one has complained yet :D)

If you're not in a rush, we should wait until the next business day (for the nightlies to update), rebase this PR & fix the unexpected successes, and the CI should be green.
Oof, somehow forgot about the [local master] vs [CI nightly] discrepancy again. Thanks for checking. Waiting until tomorrow and rebasing sounds fine to me, I'll go ahead and do that |
Build+test passed locally for me (minus some xfails I will be adding soon), and the code LGTM as well, so let's merge.
…AL (pytorch/functorch#814)
* add batching rule for block_diag, kill DECOMPOSE_FUNCTIONAL
* add batching rule for block_diag, kill DECOMPOSE_FUNCTIONAL
Companion core PR: pytorch/pytorch#77716
The above PR makes `block_diag` composite compliant, and this PR adds a batching rule for it. Those two changes together should let us fully remove the `DECOMPOSE_FUNCTIONAL` macro, which was preventing me from moving the `Functionalize` dispatch key below `FuncTorchBatched` (which I want to do as part of XX, in order to properly get functionalization working with LTC/XLA).