
functionalization: introduce a "zero()" aten op#75913

Closed
bdhirsh wants to merge 12 commits into gh/bdhirsh/208/base from gh/bdhirsh/208/head

Conversation

@bdhirsh
Collaborator

@bdhirsh bdhirsh commented Apr 15, 2022

Fixes pytorch/functorch#705

This adds support for zero_() in the functionalization pass by introducing a new at::zero().

It's identical to at::zeros_like(t), but adding it directly into native_functions.yaml allows the functionalization pass to automatically figure out how to undo a mutation from zero_().

We probably don't want users to actually use the operator, so I didn't give it a tensor method or a python binding.

From conversation with @ezyang, we should probably just do the same with at::_copy() (even though at::copy() will be a pretty unintuitive op).

This also fixes one of the torch dynamo integration issues mentioned in pytorch/torchdynamo#88
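The core idea, sketched as a torch-free toy (hypothetical helper names, not PyTorch's actual implementation): functionalization replaces each in-place mutation like `zero_()` with its out-of-place counterpart `zero()` and rebinds the variable, so the resulting trace contains no mutations.

```python
# Toy sketch of functionalization (hypothetical, not PyTorch internals):
# an in-place op like zero_() mutates its input, so a purely functional
# trace instead records the out-of-place zero() and rebinds the variable.

def zero(t):
    # Functional counterpart of zero_(): returns a fresh all-zeros value
    # the same size as the input, leaving the input untouched.
    return [0.0] * len(t)

def zero_(t):
    # In-place version: mutates its argument and returns it.
    t[:] = [0.0] * len(t)
    return t

x = [1.0, 2.0, 3.0]
y = x                      # y aliases x

x_f = [1.0, 2.0, 3.0]
x_f = zero(x_f)            # functionalized: no mutation, x_f is rebound

zero_(x)                   # original: mutates x (and the alias y) in place

assert x_f == [0.0, 0.0, 0.0]
assert x == y == [0.0, 0.0, 0.0]  # aliasing is why undoing mutations is subtle
```

Having `zero()` registered as a real op is what lets the pass emit the functional form mechanically instead of special-casing `zero_()`.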

Stack from ghstack:

Differential Revision: D35705378

@facebook-github-bot
Contributor

facebook-github-bot commented Apr 15, 2022


💊 CI failures summary and remediations

As of commit 5add523 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@bdhirsh bdhirsh changed the title functionalization: add support for zero_() functionalization: introduce a "zero()" aten op Apr 15, 2022
```yaml
- func: zero(Tensor self) -> Tensor
  variants: function
  dispatch:
    CompositeExplicitAutograd: zero
```
Collaborator Author


I don't want this to be CompositeImplicitAutograd because I want you to be able to trace out a zero call.

@ezyang this should probably go under the new CompositeNonFunctional alias key once we have one (name tbd), so functional backends don't actually decompose it.

Contributor


TBH I am confused, what is wrong with this getting traced as zeros_like?

@bdhirsh bdhirsh requested a review from ezyang April 15, 2022 22:03
@bdhirsh
Collaborator Author

bdhirsh commented Apr 17, 2022

@bdhirsh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Contributor

ezyang commented Apr 17, 2022

I know I suggested you do this, but seeing the PR brings another thing to mind: you don't want this to be exposed as a user visible concept so you've suppressed Python bindings. But this is still exposed to the user in another way: as an operator that can show up in a trace. This seems counterproductive: now you have to know to define BOTH aten::zero and aten::zeros_like but actually they're the same thing. You only need the completion for functionalization mapping zero <-> zero_, but it seems to me that you probably want it to evaporate after the mapping into zeros_like to avoid adding a duped operator to the set of operators that need to be created.

@bdhirsh
Collaborator Author

bdhirsh commented Apr 18, 2022

it seems to me that you probably want it to evaporate after the mapping into zeros_like to avoid adding a duped operator to the set of operators that need to be created.

Ah yep, this makes sense to me.

One problem with that today is that at::zeros_like is a CompositeImplicitAutograd op, which means that it will actually decompose further in the trace (it calls empty_like() and zero_(), so mutations show up in the trace again!).

Somehow... we want zeros_like() not to decompose before hitting the Python key, when functionalization is involved.

I have a question: would it be too heavy-handed to require that all functional-op CompositeImplicitAutograd kernels that decompose into mutations get their own autograd formulas? (And eventually get switched over to some `DecomposeWithMutations` alias key that runs underneath autograd.)

If we have a derivative formula for zeros_like, then we can:

  • keep zero() as CompositeImplicitAutograd
  • mark zeros_like() as CompositeExplicitAutograd so it shows up in traces
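The problem described above can be sketched with a toy trace recorder (hypothetical names, not PyTorch's real dispatcher): if `zeros_like` is allowed to decompose while tracing, its decomposition puts a mutation back into the trace that functionalization was trying to remove.

```python
# Toy sketch: a CompositeImplicitAutograd-style decomposition of zeros_like
# calls the in-place zero_, so mutations reappear in the recorded trace,
# whereas a non-decomposing zeros_like shows up as a single functional op.

TRACE = []

def record(op):
    TRACE.append(op)

def zero_(t):
    # In-place op: mutates its argument.
    record("zero_")
    for i in range(len(t)):
        t[i] = 0.0
    return t

def empty_like(t):
    record("empty_like")
    return [None] * len(t)

def zeros_like_decomposed(t):
    # Decomposing version: calls a mutating op under the hood.
    return zero_(empty_like(t))

def zeros_like_opaque(t):
    # Non-decomposing version: a single functional op in the trace.
    record("zeros_like")
    return [0.0] * len(t)

zeros_like_decomposed([1.0, 2.0])
assert TRACE == ["empty_like", "zero_"]   # mutation re-enters the trace

TRACE.clear()
zeros_like_opaque([1.0, 2.0])
assert TRACE == ["zeros_like"]            # clean functional trace
```

Giving `zeros_like` its own derivative formula is what would let it stop decomposing (the opaque version above) without losing autograd support.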

@ezyang
Contributor

ezyang commented Apr 18, 2022

Weren't you going to distinguish between mutating and non-mutating composites? It seems like that would help.

@bdhirsh
Collaborator Author

bdhirsh commented Apr 18, 2022

Weren't you going to distinguish between mutating and non-mutating composites? It seems like that would help

Yep - that doesn't help with CompositeImplicitAutograd ops though, because we automatically lose autograd support if we decide not to decompose. That's why I'm thinking we'd need to give zeros_like an autograd formula as part of this PR.

At first I thought that we'd need to do the same for all other CompositeImplicitAutograd ops that call into mutation ops, but I guess that's not true - we only need to worry about CompositeImplicitAutograd ops that get called underneath the functionalization pass, like zeros_like.

@ezyang
Contributor

ezyang commented Apr 19, 2022

I have a question: would it be too heavy-handed to require that all functional-op CompositeImplicitAutograd kernels that decompose into mutations get their own autograd formulas? (And eventually get switched over to some `DecomposeWithMutations` alias key that runs underneath autograd.)

It's a bit finely balanced, but this plan seems reasonable to me.



```yaml
- name: zeros_like(Tensor self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor
  self: zeros_like(grad)
  result: auto_linear
```
Collaborator Author


cc @albanD / @soulitzer, I'm giving a derivative formula to zeros_like (see the comment above)

bdhirsh added 2 commits April 21, 2022 16:36
```python
# Fails due to a limitation of gradgradcheck
# https://github.com/pytorch/pytorch/issues/59137
DecorateInfo(unittest.expectedFailure, 'TestGradients', 'test_fn_gradgrad'),
DecorateInfo(unittest.expectedFailure, 'TestGradients', 'test_inplace_gradgrad'),
```
Collaborator Author


cc @soulitzer, these tests started passing after I added a derivative formula for zeros_like. Just double-checking: does that sound reasonable? (If so, I can close the linked issue.)

@bdhirsh bdhirsh requested review from mruberry and ngimel as code owners April 22, 2022 13:57
@bdhirsh
Collaborator Author

bdhirsh commented Apr 25, 2022

@pytorchbot merge this please

@datumbox
Contributor

@bdhirsh We have started seeing runtime errors related to torch.zeros_like in TorchVision after this PR landed. It's not clear whether this PR is the cause, but could you please take a look at pytorch/vision#5881?

@vfdev-5
Contributor

vfdev-5 commented Apr 26, 2022

Repro code:

```python
import torch        # '1.12.0.dev20220426+cu113' and '1.12.0a0+gitb17b2b1'
import torchvision  # 0.13.0a0+01b0a00

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None, weights_backbone=None)
model.eval()

smodel = torch.jit.script(model)
smodel.eval()
smodel([torch.rand(3, 224, 224)])
```

which fails with:

```
# Based on Detectron2 implementation, just manually call nms() on each class independently
    keep_mask = torch.zeros_like(scores, dtype=torch.bool)
                ~~~~~~~~~~~~~~~~ <--- HERE
    for class_id in torch.unique(idxs):
        curr_indices = torch.where(idxs == class_id)[0]
RuntimeError: isDifferentiableType(variable.scalar_type()) INTERNAL ASSERT FAILED at "../torch/csrc/autograd/functions/utils.h":65, please report a bug to PyTorch.
```

bdhirsh added a commit that referenced this pull request Apr 26, 2022
Reverting #75913 as it broke torchvision (see comment at #75913 (comment))

Diagnosis: the following code now fails but used to work:
```
import torch

def foo(a):
    b = torch.zeros_like(a, dtype=torch.bool)
    return b

a = torch.ones(2, requires_grad=True)
sfoo = torch.jit.script(foo)
sfoo(a)
```

Why? The reverted PR added an autograd formula for `zeros_like()`, which used to be a `CompositeImplicitAutograd` op.

Unfortunately, that changed the behavior of zeros_like as follows.  The `*_like` ops "work" with autograd, but they sever the autograd graph.
```
>>> a = torch.ones(2, requires_grad=True)
>>> b = torch.zeros_like(a)
>>> b.requires_grad
False
>>> b.is_leaf
True
```

That makes code like `torch.zeros_like(a, dtype=torch.bool)` valid even if `a` requires_grad: if the requires_grad-ness were propagated, autograd would throw an error that you can't use autograd with `bool` tensors.


This reverts commit 7d44b36.

[ghstack-poisoned]
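The graph-severing behavior the revert message describes can be checked directly (a sketch assuming a stock PyTorch build, i.e. with `zeros_like` back to its pre-PR `CompositeImplicitAutograd` behavior):

```python
import torch

# *_like factory ops participate in autograd dispatch but sever the graph:
# the result is a fresh leaf tensor that does not require grad. That is what
# makes changing the dtype to bool legal even when the input requires grad;
# propagating requires_grad-ness would make autograd reject the bool tensor.
a = torch.ones(2, requires_grad=True)
b = torch.zeros_like(a, dtype=torch.bool)

assert b.dtype == torch.bool
assert not b.requires_grad
assert b.is_leaf
```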
facebook-github-bot pushed a commit that referenced this pull request Apr 26, 2022
Summary:
Pull Request resolved: #75913

Approved by: https://github.com/albanD

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/7d44b3675bafdfbd59e6c81734ca3febd771dd7b

Reviewed By: albanD

Differential Revision: D35705378

Pulled By: bdhirsh

fbshipit-source-id: 7aebc1bbe8fdc7aca461920a2ac1f3b4a1afbe28
@facebook-github-bot facebook-github-bot deleted the gh/bdhirsh/208/head branch April 29, 2022 14:17