Immediately compile backwards graph in AOTAutograd if dynamic shapes by ezyang · Pull Request #104971 · pytorch/pytorch

ezyang · 2023-07-11T14:26:56Z

Stack from ghstack (oldest at bottom):

-> Immediately compile backwards graph in AOTAutograd if dynamic shapes #104971

Previously, we made backwards graph compilation lazy to avoid paying
for compilation if the user didn't actually end up using the backwards
graph. This was useful in the old days when a lot of things in Inductor
didn't work and we could bypass errors this way.

However, this has a bad implication for dynamic shapes: the backwards
graph compilation can trigger extra guards, which are too late to
install in the Dynamo context if we wait until backwards is being run.
So in this PR I move us back to compiling backwards graph immediately
if we capture any SymInts for backwards.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov

Previously, we made backwards graph compilation lazy to avoid paying for compilation if the user didn't actually end up using the backwards graph. This was useful in the old days when a lot of things in Inductor didn't work and we could bypass errors this way. However, this has a bad implication for dynamic shapes: the backwards graph compilation can trigger extra guards, which are too late to install in the Dynamo context if we wait until backwards is being run. So in this PR I move us back to compiling backwards graph immediately. This should also make it easier to predict when compilation occurs, since compilation now all happens up front during forwards. Signed-off-by: Edward Z. Yang <ezyang@meta.com> [ghstack-poisoned]

pytorch-bot · 2023-07-11T14:27:00Z


✅ 3 Unrelated Failures

As of commit ada9501:

BROKEN TRUNK - The following job failed but was already failing on the merge base 8c479d3:

👉 Rebase onto the `viable/strict` branch to avoid these failures

cuda11.8-py3.10-gcc7-sm86 / test (inductor_torchbench_dynamic, 1, 1, linux.g5.4xlarge.nvidia.gpu) (gh)

UNSTABLE - The following jobs failed, but the failures were likely due to flakiness present on trunk and the jobs have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


bdhirsh · 2023-07-11T14:54:19Z

torch/_functorch/aot_autograd.py

+        with track_graph_compiling(aot_config, "backward"):
+            placeholder_list = fx_placeholder_vals(bw_module)
+
+            compiled_bw_func = aot_config.bw_compiler(


fwiw - one thing that subclass support will (eventually, not immediately) need is backwards guards: when we generate the joint, we might incorrectly assume that the grad_outputs are/are_not subclasses, which would require us to re-trace and recompile the backward later (I'm writing a doc on subclass requirements, more details will be in the doc).

Compiling the backward eagerly is probably not optimal if we end up having to recompile the backward later, although maybe we're okay with this (since invalidating bw guards is hopefully rare).

@bdhirsh Your thing is going to need true two level cache. But IMO you should just force your users to use compiled backwards in that case, which no longer has this problem. (BTW, @jansel's compiled autograd is what convinced me to do this "simpler" fix; basically if you have any complicated situation where we don't know ahead of time what the gradients will be, you instead use compiled autograd to be able to compile given full info.)

ok, telling users to use compiled autograd when this happens sounds fair! (still need to eventually figure out how to teach compiled autograd how to add extra bw guards)


ezyang · 2023-07-12T03:42:10Z

If there are no dynamic shapes, I restore the old behavior of lazy compilation to shut up some TorchScript failures. If eager backwards compilation fails, I suppress it, which gets us through some failures in our test suite where our backwards dynamic codegen doesn't actually work.
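The policy described in this comment can be illustrated with a small sketch. It assumes the check amounts to looking for any symbolic integer among the backward graph's placeholder values; `FakeSymInt` and `should_compile_backward_eagerly` are hypothetical names used only for illustration and are not the code in `aot_autograd.py`.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass(frozen=True)
class FakeSymInt:
    """Hypothetical stand-in for a symbolic integer (e.g. torch.SymInt)."""
    name: str


def should_compile_backward_eagerly(placeholder_vals: Sequence[Any]) -> bool:
    # Assumption: "dynamic shapes" means at least one backward input is symbolic.
    # With only concrete values, compilation can stay lazy as before.
    return any(isinstance(v, FakeSymInt) for v in placeholder_vals)


static_inputs = [16, 32]                    # fully static shapes: stay lazy
dynamic_inputs = [FakeSymInt("s0"), 32]     # one symbolic size: compile eagerly

print(should_compile_backward_eagerly(static_inputs))
print(should_compile_backward_eagerly(dynamic_inputs))
```

The point of the eager branch is that any guards added while compiling backwards are known while the forward is still being compiled, so they can still be installed.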

voznesenskym · 2023-07-13T06:34:13Z

torch/_functorch/aot_autograd.py

+        # NB: It's important to compile backwards ahead of time, as this may
+        # add extra guards which we need to apply to the Dynamo cache at
+        # forwards


voznesenskym · 2023-07-13T06:39:04Z

torch/_inductor/utils.py


 def run_and_get_triton_code(fn, *args, **kwargs):
    _, source_codes = run_and_get_code(fn, *args, **kwargs)
+    # Can have two outputs if backwards was eagerly compiled


Can we store a flag to drive whether this should be exactly 1, or one of (1, 2)?

It's very awkward, because the current implementation will attempt to compile backwards, and if backwards failed to compile, suppress the error and return anyway. So there isn't really a clear delineation.
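For illustration only, the flag suggested above could look roughly like the following. `first_source_code` and its `backward_may_be_compiled` parameter are hypothetical and are not part of `torch/_inductor/utils.py`; as the reply notes, the caller cannot always know which case applies, because a failed eager backward compile is suppressed.

```python
from typing import Sequence


def first_source_code(
    source_codes: Sequence[str], *, backward_may_be_compiled: bool
) -> str:
    # Hypothetical helper: one output for forward only, two if the backward
    # graph was also compiled eagerly.
    allowed = (1, 2) if backward_may_be_compiled else (1,)
    if len(source_codes) not in allowed:
        raise AssertionError(
            f"expected {allowed} code outputs, got {len(source_codes)}"
        )
    # The forward code is assumed to come first.
    return source_codes[0]


print(first_source_code(["fwd_kernel"], backward_may_be_compiled=False))
print(first_source_code(["fwd_kernel", "bwd_kernel"], backward_may_be_compiled=True))
```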

voznesenskym · 2023-07-13T06:39:15Z

test/inductor/test_torchinductor.py

        )
        m.eval()
-        self.common(m, (torch.randn([16, 32]),), check_lowp=False)
+        with torch.no_grad():


Why this change?

Layer norm's backward compilation doesn't work, so the `no_grad` forces us not to attempt to compile it.

voznesenskym · 2023-07-13T06:39:42Z

torch/_functorch/aot_autograd.py

+            # saved activations can have different stride to eager if
+            # the compiler does layout optimization. We should restride the
+            # tensor passed in for compiling the backward graph using the
+            # saved tensor's stride.


I would stamp but I don't know the nuances of strides well enough.


ezyang · 2023-07-14T00:28:07Z

This is ready to go, just waiting for review.

eellison · 2023-07-14T00:29:14Z

torch/_functorch/aot_autograd.py

+                    try:
+                        compiled_bw_func = aot_config.bw_compiler(
+                            bw_module, placeholder_list
+                        )
+                    except Exception:
+                        log.warning(
+                            "failed to eagerly compile backwards for dynamic, suppressing in case backwards not needed",
+                            exc_info=True
+                        )


What necessitates this ? It would be nice to not land with a try-catch.

Per operator backwards dynamic codegen is still buggy af. Sample run: https://hud.pytorch.org/pytorch/pytorch/pull/104971?sha=b0bb7782a9eb8b85acbcb9ffe1835c07c6dc9f80

I'm not actually suppressing anything. If you actually try to run backwards, we will try to compile again and THEN fail. If this suppression works out, it just means you didn't actually need the backwards graph at all.
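A minimal sketch of the "warn now, fail later" behavior described in this reply, assuming a single slot that caches the compiled backward. `BackwardCompileSlot` and its methods are hypothetical names for illustration and do not mirror the actual AOTAutograd classes.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger(__name__)


class BackwardCompileSlot:
    """Hypothetical holder for a backward function that may be compiled eagerly."""

    def __init__(self, compile_fn: Callable[[], Callable[[float], float]]):
        self._compile_fn = compile_fn
        self._compiled: Optional[Callable[[float], float]] = None
        self.attempts = 0

    def _compile(self) -> Callable[[float], float]:
        self.attempts += 1
        return self._compile_fn()

    def try_eager(self) -> None:
        # Forward time: attempt compilation, but only warn on failure,
        # because the backward may never be needed.
        try:
            self._compiled = self._compile()
        except Exception:
            log.warning("eager backward compile failed; deferring", exc_info=True)

    def run_backward(self, grad: float) -> float:
        # Backward time: if the eager attempt failed, compile again.
        # This time any error propagates to the caller.
        if self._compiled is None:
            self._compiled = self._compile()
        return self._compiled(grad)


def broken_compiler():
    raise RuntimeError("backwards dynamic codegen bug")


slot = BackwardCompileSlot(broken_compiler)
slot.try_eager()            # logs a warning, does not raise
try:
    slot.run_backward(1.0)  # second attempt, error surfaces here
except RuntimeError as exc:
    print("backward failed:", exc)
```

If backward is never run, the only trace of the failure is the logged warning, which matches the claim that nothing is really being suppressed.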

Chillee

iiuc, this is mostly just code movement?

Chillee · 2023-07-17T07:53:32Z

torch/_functorch/aot_autograd.py

+            # the compiler does layout optimization. We should restride the
+            # tensor passed in for compiling the backward graph using the
+            # saved tensor's stride.
+            for i in range(len(placeholder_list)):


I'm assuming this is all code movement?

ezyang · 2023-07-17T13:00:47Z

@pytorchbot merge -r

pytorchmergebot · 2023-07-17T13:02:38Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here


pytorchmergebot · 2023-07-17T13:02:51Z

Successfully rebased gh/ezyang/2220/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/104971)


pytorchmergebot · 2023-07-17T13:03:57Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


ezyang requested a review from Chillee as a code owner July 11, 2023 14:26

pytorch-bot bot added the release notes: AO frontend label Jul 11, 2023

github-actions bot requested review from SherlockNoMad, albanD, antoniojkim, bdhirsh, jbschlosser, miladm, voznesenskym and wconstab July 11, 2023 14:27

eellison reviewed Jul 11, 2023


bdhirsh reviewed Jul 11, 2023


ezyang requested a review from shunting314 July 11, 2023 15:04

albanD removed their request for review July 11, 2023 18:08

ezyang mentioned this pull request Jul 11, 2023

Move more stuff into ViewAndMutationMeta #105009

Closed

ezyang mentioned this pull request Jul 11, 2023

Read out real strides from compilation result, rather than real args #105010

Closed

ezyang changed the title ~~Immediately compile backwards graph in AOTAutograd~~ Immediately compile backwards graph in AOTAutograd if dynamic shapes Jul 11, 2023

github-actions bot added module: inductor ciflow/inductor labels Jul 12, 2023

ezyang added the ciflow/trunk and topic: not user facing labels Jul 12, 2023

voznesenskym reviewed Jul 13, 2023


eellison reviewed Jul 14, 2023


ezyang mentioned this pull request Jul 17, 2023

vision_maskrcnn: AssertionError: expected size 368==368, stride 156==28 at dim=0 #104653

Closed

Chillee approved these changes Jul 17, 2023


pytorchmergebot added the merging label Jul 17, 2023

pytorchmergebot added Merged and removed merging labels Jul 17, 2023

pytorchmergebot closed this in 2fa7d11 Jul 17, 2023

facebook-github-bot deleted the gh/ezyang/2220/head branch July 21, 2023 14:17

imzhuhl mentioned this pull request Mar 6, 2024

[Inductor][AOTI] Fix item() + unsqeeze #115544

Closed
