Add context manager for conditional rewrites of torch.* to torch._refs.* calls#81764

Closed
IvanYashchuk wants to merge 32 commits into pytorch:master from IvanYashchuk:nvfuser-context

Conversation

@IvanYashchuk
Collaborator

Adds a new context manager TorchRefsNvfuserCapabilityMode for conditional rewrites of torch.* calls to torch._refs.* based on whether the decomposition, consisting of prims, supports nvFuser execution.

A new optional argument is added to TorchRefsMode: should_fallback_fn, a callable that decides whether the original torch.foo or the replacement torch._refs.foo should be used.
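The should_fallback_fn contract can be sketched without PyTorch internals. Everything below (the make_refs_dispatcher helper and the toy foo/refs_foo pair) is a hypothetical stand-in for the real __torch_function__-based TorchRefsMode, shown only to illustrate how the callable chooses between the original call and its _refs replacement:

```python
# Hypothetical stand-in for TorchRefsMode's fallback logic. The real code
# hooks __torch_function__; here we intercept plain callables instead.

def make_refs_dispatcher(torch_to_refs, should_fallback_fn=None):
    """Rewrite func to its refs equivalent unless
    should_fallback_fn(func, args, kwargs) asks for the original."""
    def dispatch(func, *args, **kwargs):
        ref = torch_to_refs.get(func)
        if ref is None:
            return func(*args, **kwargs)      # no known ref: run original
        if should_fallback_fn is not None and should_fallback_fn(func, args, kwargs):
            return func(*args, **kwargs)      # fallback requested
        return ref(*args, **kwargs)           # rewrite to the ref
    return dispatch

def foo(x):                                   # toy "torch.foo"
    return ("torch", x)

def refs_foo(x):                              # toy "torch._refs.foo"
    return ("refs", x)

dispatch = make_refs_dispatcher({foo: refs_foo})
print(dispatch(foo, 1))                       # ('refs', 1)

# A predicate mimicking "the prim decomposition isn't nvFuser-executable":
dispatch_fb = make_refs_dispatcher({foo: refs_foo},
                                   should_fallback_fn=lambda f, a, k: True)
print(dispatch_fb(foo, 1))                    # ('torch', 1)
```

The real mode makes this decision per call site, so unsupported decompositions silently stay on the eager torch.* path.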

@facebook-github-bot
Contributor

facebook-github-bot commented Jul 20, 2022

🔗 Helpful links

✅ No Failures (1 Pending)

As of commit 6c3e9e7 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


# make_fx doesn't support kwargs, so we need to do this flattening
# and then unflatten the args before calling func
nargs = len(args)
flat_kwargs = list(kwargs.values())
Contributor

Consider using pytree flatten/unflatten instead?

Collaborator Author

Done.
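For context, the pytree suggestion boils down to flattening (args, kwargs) into a flat leaf list plus a spec, then unflattening before calling func. The real code would use torch.utils._pytree.tree_flatten/tree_unflatten; the round trip can be sketched in plain Python (all helper names below are made up for illustration):

```python
# Minimal sketch of the suggested pytree round trip: flatten (args, kwargs)
# into leaves + spec, then rebuild both before invoking func.
def flatten_args_kwargs(args, kwargs):
    keys = sorted(kwargs)                      # spec records arity + kwarg names
    leaves = list(args) + [kwargs[k] for k in keys]
    spec = (len(args), keys)
    return leaves, spec

def unflatten_args_kwargs(leaves, spec):
    nargs, keys = spec
    args = tuple(leaves[:nargs])
    kwargs = dict(zip(keys, leaves[nargs:]))
    return args, kwargs

def call_with_flat(func, leaves, spec):
    args, kwargs = unflatten_args_kwargs(leaves, spec)
    return func(*args, **kwargs)

leaves, spec = flatten_args_kwargs((1, 2), {"b": 4, "a": 3})
print(leaves)  # [1, 2, 3, 4]
print(call_with_flat(lambda x, y, a, b: x + y + a + b, leaves, spec))  # 10
```

Unlike the manual nargs/flat_kwargs bookkeeping, the spec travels with the leaves, so nested containers are handled uniformly by the real pytree utilities.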

@@ -87,6 +91,9 @@ def __torch_function__(
mapping = torch_to_refs_map()
Collaborator

@jjsjann123 jjsjann123 Jul 21, 2022

This function seems to be only targeting torch.xxx.

In our dynamo workload, I think we are expecting an input GraphModule with aten.ops. Might want to expand this.

Collaborator Author

Right, it's only torch.xxx for now. One option is to use dynamo under this context manager, and another is to expand the context manager with an "aten_to_refs_map"; let's handle the extension in a separate PR.

return next(proxy_tensors, None)


def get_isolated_graphmodule(func, args, kwargs):
Collaborator

Would we want some special decomposition here to short-cut things like nvfuser.var_mean? With a proper impl_nvfuser this short-cut would work out well 🎉

try:
for arg in all_args:
if isinstance(arg, ProxyTensor):
arg.proxy.tracer = new_tracer
Contributor

Instead of mutating the argument, couldn't you pull out the inner elem and create a fresh ProxyTensor from that? Then you wouldn't need to worry about resetting the proxy afterwards.

Collaborator Author

Yes I can and I posted this example of creating a fresh ProxyTensor on Slack (https://pytorch.slack.com/archives/C03DP57R27M/p1658256451021309?thread_ts=1658256439.210389&cid=C03DP57R27M)

It requires creating a fresh torch.fx.proxy.Proxy, which in turn requires creating a fresh Node (because nodes do not work with arbitrary graphs; the graph has to match). Just resetting the tracer seemed cleaner. Would you like me to change that and create a fresh ProxyTensor instead?

Collaborator Author

I changed it now to create a fresh ProxyTensor instead of mutating.

@ezyang ezyang requested a review from zou3519 July 21, 2022 15:15
if isinstance(arg, ProxyTensor):
arg.proxy.tracer = new_tracer

gm = make_fx(wrapped)(all_args)
Contributor

This doesn't feel like it is enough. If there is an ambient proxy tensor mode active, it was never disabled and will still be interposing on invocations. But your test is passing. Do you have an explanation for why this is working as is?

Contributor

@ezyang ezyang Jul 21, 2022

It looks like the answer is: you're in a context where the proxy tensor mode is disabled by the time you called this... no, this isn't right.

Collaborator Author

It works because all the calls are recorded on a different tracer, which is created in this function.

Contributor

I see... this feels so bad haha

Collaborator Author

Do you have a different understanding of the expected behavior? When we have ProxyTensors as inputs to the function, they are all expected to have the same tracer object attached (there's an assert for this somewhere). This tracer object is where the information about the calls gets stored, so when we swap the tracer, that information is stored elsewhere.

Contributor

One reason it feels bad is because the ProxyTensorMode itself contains a tracer, and that tracer is now inconsistent with the tracers on the proxy objects. In fact, this means that if you have factory functions inside the conditional rewrite, they will go to the wrong graph. So this is definitely wrong!

Collaborator Author

Yes, this is definitely wrong.
I added a test case with factory functions and two nested get_isolated_graphmodule calls. The test passes with:

mode = torch._C._get_torch_dispatch_mode()
with enable_torch_dispatch_mode(mode.inner, replace=mode):
    make_fx(...)

Then I added a test case with two nested make_fx calls and one get_isolated_graphmodule with factory functions. It also now passes with an ExitStack of enable_torch_dispatch_mode(mode.inner, replace=mode) contexts.

@IvanYashchuk
Collaborator Author

  1. Write a function that walks up the current mode stack, and looks for ProxyTensorMode

Done with a combination of maybe_disable_proxy_tensor_mode() and while torch._C._get_torch_dispatch_mode() is not None.

  1. Make every ProxyTensor hold a reference to ProxyTensorMode (you can look at FakeTensor to see an example of how this is done)

Done in #82549.

6. Make isolated graph disable all proxy tensor modes on the mode stack (using (1) to find the modes)

Done:

with contextlib.ExitStack() as stack:
    while torch._C._get_torch_dispatch_mode() is not None:
        stack.enter_context(maybe_disable_proxy_tensor_mode())

then runs make_fx with the arguments as is

make_fx is now run with the unwrapped elem of the proxy tensors. I think steps (3), (4), and (5) are not needed in this case?

In its current state, get_isolated_graphmodule is a very simple function that unwraps the given proxy tensors and runs make_fx with all outer ProxyTensorModes disabled.

Could you please provide a concrete test case you think the current state of the pull request is not covering? As I mentioned, I couldn't come up with a test case for

What if a inner tensor from the isolated graph mode escapes (e.g. by mutation)

@ezyang
Contributor

ezyang commented Jul 31, 2022

Could you please provide a concrete test case you think the current state of the pull request is not covering? As I mentioned, I couldn't come up with a test case for

The reason unwrapping doesn't unconditionally work is that the proxy tensor may itself be embedded within another data structure that isn't tree-mappable. The most common situation is a tensor subclass. For example, consider this patch:

diff --git a/test/test_proxy_tensor.py b/test/test_proxy_tensor.py
index 9ac4e0a470..9d1635ecfa 100644
--- a/test/test_proxy_tensor.py
+++ b/test/test_proxy_tensor.py
@@ -169,7 +169,10 @@ class TestProxyTensor(TestCase):
             self.assertTrue(is_any_sum(gm))
             return torch.sigmoid(x)
 
+        from torch.testing._internal.logging_tensor import LoggingTensor
+
         def f2(x):
+            x = LoggingTensor(x)
             gm = get_isolated_graphmodule(f1, (x,), {})
             self.assertFalse(is_any_sum(gm))
             self.assertTrue(is_any_sigmoid(gm))

The LoggingTensor prevents the unwrapping from happening on the inside, and so sigmoid shows up in the inner graph.

If you really don't want to handle this case, I suppose that is reasonable, because inside AOTAutograd the expectation is that all tensor subclasses have already been erased, so we should only be passing plain tensors through and this should never happen. But in that case, there ought to be asserts about the assumed preconditions; and I also don't think it is that complicated to make it work for this general case as well.

@ezyang
Contributor

ezyang commented Jul 31, 2022

You also have another problem: maybe_disable_proxy_tensor_mode cannot "see" a proxy mode that is behind another, more recently pushed mode. When I patch with:

diff --git a/test/test_proxy_tensor.py b/test/test_proxy_tensor.py
index 9ac4e0a470..4cd100f18f 100644
--- a/test/test_proxy_tensor.py
+++ b/test/test_proxy_tensor.py
@@ -169,8 +169,11 @@ class TestProxyTensor(TestCase):
             self.assertTrue(is_any_sum(gm))
             return torch.sigmoid(x)
 
+        from torch.testing._internal.logging_tensor import LoggingTensorMode
+
         def f2(x):
-            gm = get_isolated_graphmodule(f1, (x,), {})
+            with LoggingTensorMode():
+                gm = get_isolated_graphmodule(f1, (x,), {})
             self.assertFalse(is_any_sum(gm))
             self.assertTrue(is_any_sigmoid(gm))
             return torch.digamma(x)

the script seems to infinite loop.

@IvanYashchuk
Collaborator Author

Thanks a lot for the help here!
I've fixed the infinite loop problem: currently, all dispatch modes are disabled first and then pushed again, skipping proxy tensor modes.

I've added an assert that unwrapped Tensor arguments should not wrap other Tensors for now. I'd like to make it work later in a separate PR.

inside AOTAutograd the expectation is that all tensor subclasses have already been erased, so we should only be passing plain tensors through and this should never happen.

Is this erasure done by TorchDynamo?

@ezyang
Contributor

ezyang commented Aug 1, 2022

Is this erasure done by TorchDynamo?

It's done by AOTAutograd / proxy tensor tracing

assert all(
    getattr(a, "elem", None) is None
    for a in unwrapped_all_args
    if isinstance(a, torch.Tensor)
), "ProxyTensor is wrapped with another Tensor subclass"
Contributor

I don't feel like this assert actually works haha

Collaborator Author

Maybe... 🤔 But the test passes!

The assumption is that a tensor subclass is actually a subclass of torch.Tensor and not a generic object created with torch.Tensor._make_wrapper_subclass. Is there a better way to test for subclasses?

for mode in reversed([m for m in modes if not isinstance(m, ProxyTorchDispatchMode)]):
# mode.restore() doesn't work because mode.inner might be ProxyTorchDispatchMode
# mode.push() is restricted to modes that don't take any arguments
stack.enter_context(mode.push())
Contributor

This still seems super error prone. As you say here, this only works for modes that don't take any arguments. It would be much better if there was just a boolean toggle on the proxy dispatch mode you can use to turn it off without actually removing it from the stack.
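The boolean-toggle idea can be sketched as follows; ToyProxyMode and disable are made-up names illustrating a mode that stays on the stack but checks an enabled flag in its dispatch, with the flag treated as a dynamically scoped variable:

```python
import contextlib

# Toy version of the boolean-toggle idea: instead of removing the mode
# from the stack, flip an `enabled` flag that its dispatch consults.
class ToyProxyMode:
    def __init__(self):
        self.enabled = True

    def dispatch(self, func, *args):
        if not self.enabled:
            return func(*args)        # behave as if the mode weren't there
        return ("traced", func(*args))

@contextlib.contextmanager
def disable(mode):
    old, mode.enabled = mode.enabled, False
    try:
        yield
    finally:
        mode.enabled = old            # dynamically scoped restore

mode = ToyProxyMode()
assert mode.dispatch(abs, -3) == ("traced", 3)
with disable(mode):
    assert mode.dispatch(abs, -3) == 3   # still on the stack, but inert
assert mode.dispatch(abs, -3) == ("traced", 3)
print("ok")
```

Because the stack itself is never restructured, this sidesteps the mode.push() restriction on modes that take constructor arguments.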

Collaborator Author

I just pushed a change to use mode.restore() with modified mode.inner and mode.ancestors, is this too hacky?

Contributor

I'm against anything that modifies mode.inner; we don't really know how to reason about a dynamically changing mode stack structure, it's really weird and hard to think about.

Collaborator Author

It's done on a copy of the mode locally inside the exit stack context and I checked that it doesn't mutate the modes outside the exit stack context.

Collaborator Author

Instead of the current approach, you would like to see the same thing as is done with in_kernel_invocation_manager and FakeTensorMode.in_kernel_invocation = True/False? But that also mutates an attribute locally; yes, it's not .inner but a different attribute, still similar.

Contributor

Yes, that is my preference. The mutation in this case can be treated like a dynamically scoped variable with limited impact (as opposed to changing of inner, which totally changes the semantics of subsequent calls in the stack.)

Collaborator Author

Okay, do you think changing of inner should be disallowed programmatically?

Collaborator Author

In this situation, the semantics of modes is quite clear, and it is asserted that the modes are not changed outside the "with" block. Isn't it a good thing to exercise various usages of the .inner attribute?
Is there an issue for

# - We need a better user-facing api for torch._C._DisableTorchDispatch that
# is able to selectively disable __torch_dispatch__ of a particular class.

Contributor

Okay, do you think changing of inner should be disallowed programmatically?

Maybe. Python makes it hard to prevent people from doing this sort of thing though lol.

@IvanYashchuk
Collaborator Author

IvanYashchuk commented Aug 1, 2022

Now this PR is safer: asserts were added to prevent nested tensor subclasses, and other asserts verify that the tensor modes before and after the context that disables proxy tensor modes are the same and not modified.

The last thing to do for this PR to be accepted is to change the approach of disabling proxy tensor modes: #81764 (comment) (partially blocked on #82549 being merged first).

assert torch._C._get_torch_dispatch_mode() is None

# Enable all torch dispatch modes except ProxyTorchDispatchMode
for mode in reversed([m for m in modes if not isinstance(m, ProxyTorchDispatchMode)]):
Collaborator Author

@samdow, what do you think of the approach here of modifying .inner and .ancestors to rebuild the torch dispatch context while skipping ProxyTorchDispatchMode instances?

Contributor

I think I mostly agree with @ezyang that I don't love this as an idea, but I get that copying has some limitations that make this difficult (ideally we would want to copy while passing a new argument to the constructor...). So what about this idea:
(1) delete the inner and ancestor attributes for the copy of every mode in the stack (note: they cannot just be set to None but must be deleted in order to get the mode code to work)
(2) as we're doing this, one by one push the modes back onto the stack

This is still imperfect, since we are altering the inner and ancestors elements of the stack, but at least the mode mechanism is still in charge of the mode ordering instead of having this repeated code.
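A toy version of ideas (1) and (2), with a made-up Mode chain standing in for the real dispatch mode stack: walk the stack, drop the unwanted class of modes, and let the push machinery rebuild the inner links on fresh copies so the originals are never mutated:

```python
# Toy illustration of rebuilding a linked mode stack while skipping one
# class of modes. Modes form a chain via `inner`, top to bottom.
class Mode:
    def __init__(self, name):
        self.name = name
        self.inner = None

def push(stack_top, mode):
    mode.inner = stack_top          # the push machinery sets inner
    return mode

def rebuild_without(stack_top, skip_name):
    kept = []
    m = stack_top
    while m is not None:            # walk top -> bottom collecting survivors
        if m.name != skip_name:
            kept.append(m)
        m = m.inner
    top = None
    for mode in reversed(kept):     # re-push bottom-up so inner is rebuilt
        top = push(top, Mode(mode.name))  # fresh copies; originals untouched
    return top

top = None
for name in ["fake", "proxy", "logging"]:
    top = push(top, Mode(name))

new_top = rebuild_without(top, "proxy")
names = []
m = new_top
while m:
    names.append(m.name)
    m = m.inner
print(names)  # ['logging', 'fake'] -- same order, proxy mode gone
```

Using fresh copies keeps the original stack intact, which matches the concern about dynamically changing .inner on live modes.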

Contributor

Also open to thoughts from either of you on this idea

Contributor

I am still advocating for just setting dynamically scoped variables on the context objects themselves. This saves you from (1) having to copy or (2) mutating the inner pointers resulting in weird behavior

Contributor

I'm happy with that. And to Ed's other point, I would be happy for us to restrict users from updating inner, but I'm not sure there's a clean way to.

Contributor

@ezyang ezyang left a comment

I'll probably be editing this in the near future

@IvanYashchuk
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

@pytorchbot successfully started a merge job. Check the current status here

@github-actions
Contributor

github-actions bot commented Aug 2, 2022

Hey @IvanYashchuk.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot pushed a commit that referenced this pull request Aug 3, 2022
…s.* calls (#81764) (#81764)

Summary:
Adds a new context manager `TorchRefsNvfuserCapabilityMode` for conditional rewrite of `torch.*` calls to `torch._refs.*` based on whether the decomposition consisting of prims supports nvFuser execution or not.

A new optional argument for `TorchRefsMode` is added - `should_fallback_fn`, a callable that returns whether the original `torch.foo` or the replacement `torch._refs.foo` should be used.

Pull Request resolved: #81764
Approved by: https://github.com/ezyang

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/900e93d351bf9b0eae89efddabc7ba0c9339396a

Reviewed By: kit1980

Differential Revision: D38359506

fbshipit-source-id: c66ba2c8ee54bf27ae5ab689a8d6237139c56930
pytorchmergebot pushed a commit that referenced this pull request Aug 3, 2022
…ng torch dispatch mode stack inner attributes (#82643)

### Description
This PR removes fiddling with the mode stack using copies and ExitStack in favor of a simpler and more straightforward approach.

### Issue
#81764 (comment)

### Testing
No new tests are needed.

Pull Request resolved: #82643
Approved by: https://github.com/ezyang
facebook-github-bot pushed a commit that referenced this pull request Aug 4, 2022
…ng torch dispatch mode stack inner attributes (#82643) (#82643)

Summary:
### Description
This PR removes fiddling with the mode stack using copies and ExitStack in favor of a simpler and more straightforward approach.

### Issue
#81764 (comment)

### Testing
No new tests are needed.

Pull Request resolved: #82643
Approved by: https://github.com/ezyang

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/8092cf60c6d5985f88ab2c4ceceac75b83c428ff

Reviewed By: kit1980

Differential Revision: D38395117

fbshipit-source-id: 0d0dcc9fb7c663181b82fed6bcb048bbd0ffc88c

Labels

cla signed Merged module: fx module: nvfuser module: primTorch open source triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants