[Prototype] Use dynamo fake tensor mode in aot_autograd, move aot_autograd compilation to lowering time #89392
voznesenskym wants to merge 27 commits into master
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89392
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 5321f59. This comment was automatically generated by Dr. CI and updates every 15 minutes.
    if not isinstance(x, torch.Tensor):
        return x
    if isinstance(x, torch._subclasses.fake_tensor.FakeTensor):
        return x
This looks questionable. If an argument is already a fake tensor, it's unlikely to be consistent with the freshly allocated fake mode. Which means you'd probably get an error if you tried to actually use it. Better to not support this case.
Fwiw this is copied from the old functionality, but let's see if we can make things better than we found them.
This isn't copied. The old code doesn't explicitly test for FakeTensor.
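The stricter behavior suggested in this review thread can be sketched in plain Python. The stand-in `FakeTensor`, `FakeTensorMode`, and `RealTensor` classes below are illustrative simplifications, not the real `torch._subclasses` types: the point is that an already-fake input likely belongs to a different mode than the freshly allocated one, so failing loudly beats passing it through.

```python
class FakeTensorMode:
    """Stand-in for torch._subclasses.fake_tensor.FakeTensorMode (illustrative)."""
    def from_real(self, t):
        return FakeTensor(mode=self, shape=t.shape)

class FakeTensor:
    """Stand-in fake tensor: carries the mode that created it."""
    def __init__(self, mode, shape=None):
        self.mode = mode
        self.shape = shape

class RealTensor:
    """Stand-in for a real torch.Tensor."""
    def __init__(self, shape):
        self.shape = shape

def fakify(x, fake_mode):
    # Non-tensor values pass through unchanged.
    if not isinstance(x, (RealTensor, FakeTensor)):
        return x
    if isinstance(x, FakeTensor):
        # An already-fake input is probably tied to a *different* mode;
        # using it under fake_mode would fail later, so fail loudly now.
        raise AssertionError("fakify() expects real tensors only")
    return fake_mode.from_real(x)
```

This is the "better to not support this case" option: rejecting fake inputs up front instead of silently returning them.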
functorch/_src/aot_autograd.py
Outdated
    def fakify_params_and_buffers(flat_args):
        nonlocal fake_mode
        if config.use_fake_tensor:
            flat_inputs, _ = pytree.tree_flatten(inputs)
A comment here saying what this is doing would help other readers.
functorch/_src/aot_autograd.py
Outdated
    def aot_module_simplified(mod: nn.Module, *top_args, **top_kwargs) -> nn.Module:
    def aot_module_simplified(
        mod: nn.Module,
        inputs,
A comment saying what the acceptable inputs are would be good here.
…sors" Strategy taken from voz's #89392, but my implementation is a bit different. If a fake tensor is provided, we use its FakeTensorMode (and more importantly, its ShapeEnv--this is what is tested in the new unit test). Only one tensor needs to be fake; if nothing is fake we just make a fresh mode as before. Signed-off-by: Edward Z. Yang <ezyang@fb.com> [ghstack-poisoned]
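The "reuse an existing mode" strategy described in this commit message can be sketched as follows. The classes and the `detect_fake_mode` helper are hypothetical plain-Python stand-ins (not the torch API): scan the flattened inputs for a fake tensor and, if one is found, adopt its mode (and hence its ShapeEnv); otherwise allocate a fresh mode, as before.

```python
class ShapeEnv:
    """Stand-in for the symbolic-shapes environment."""

class FakeTensorMode:
    """Stand-in mode: owns a ShapeEnv."""
    def __init__(self):
        self.shape_env = ShapeEnv()

class FakeTensor:
    """Stand-in fake tensor: remembers its creating mode."""
    def __init__(self, mode):
        self.fake_mode = mode

def detect_fake_mode(flat_inputs):
    # Only one input needs to be fake for its mode (and ShapeEnv) to win.
    for x in flat_inputs:
        if isinstance(x, FakeTensor):
            return x.fake_mode
    # Nothing fake: allocate a fresh mode, matching the old behavior.
    return FakeTensorMode()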
…ograd, move aot_autograd compilation to lowering time" After all of the preparatory commits, this is a subset of the changes in #89392 that actually change us to propagating fake tensors to backends. Signed-off-by: Edward Z. Yang <ezyangfb.com> cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire [ghstack-poisoned]
Strategy taken from voz's #89392 but my implementation strategy is a bit different. If a fake tensor is provided, we use its FakeTensorMode (and more importantly, its ShapeEnv--this is what is tested in the new unit test). Only one tensor needs to be fake; if nothing is fake we just make a fresh mode as before. Signed-off-by: Edward Z. Yang <ezyangfb.com> [ghstack-poisoned]
Taken from voz's #89392 Signed-off-by: Edward Z. Yang <ezyangfb.com> cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire [ghstack-poisoned]
…arguments directly" This is extracted from voz's #89392 Previously, the implementation did some half-assed caching where it returned a callable, that when invoked for the first time, actually performed the compilation. Delaying the compilation like this... seems totally unnecessary? To make matters worse, this has cost (we have to check if we hit the cache) and unsound (because the compiled function may not be valid for other arguments.) So instead, we ask user to provide arguments, and compile everything immediately. Signed-off-by: Edward Z. Yang <ezyangfb.com> cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire [ghstack-poisoned]
This is extracted from voz's #89392 Previously, the implementation did some half-assed caching where it returned a callable, that when invoked for the first time, actually performed the compilation. Delaying the compilation like this... seems totally unnecessary? To make matters worse, this has cost (we have to check if we hit the cache) and unsound (because the compiled function may not be valid for other arguments.) So instead, we ask user to provide arguments, and compile everything immediately. Signed-off-by: Edward Z. Yang <ezyangfb.com> ghstack-source-id: 49f39c5 Pull Request resolved: #89669
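The caching problem described in this commit message can be illustrated with a toy sketch. `compile_fn`, `old_delayed`, and `new_eager` below are hypothetical names, not the aot_autograd API: the old scheme returned a callable that compiled on first invocation and cached the result, paying a cache check per call and silently reusing a compilation that may not be valid for different arguments; the new scheme asks the caller for example arguments and compiles immediately.

```python
def compile_fn(fn, example_args):
    # Stand-in "compiler": specializes on the number of arguments seen.
    arity = len(example_args)
    def compiled(*args):
        assert len(args) == arity, "compiled function invalid for these arguments"
        return fn(*args)
    return compiled

def old_delayed(fn):
    # Old scheme: compile lazily on first call, then cache.
    cache = {}
    def wrapper(*args):
        if "compiled" not in cache:          # per-call cache check (cost)
            cache["compiled"] = compile_fn(fn, args)
        return cache["compiled"](*args)      # may be stale for new args (unsound)
    return wrapper

def new_eager(fn, example_args):
    # New scheme: caller supplies example arguments; compile right away.
    return compile_fn(fn, example_args)
```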
There is only one call site for compiler_fn, so we can safely delay wrapping verify correctness to here. This will help later when we change the backend compiler calling convention to pass fake tensors (but I need to pass real tensors here). This is adapted from voz's changes at #89392 but with fewer changes to the substantive logic. I only moved the relevant inner implementation; there are no changes otherwise. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: #89662 Approved by: https://github.com/voznesenskym
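Wrapping verify-correctness at the single call site can be sketched as below. `wrap_verify_correctness` and its signature are illustrative, not the dynamo API: the wrapper runs the compiled function and the original on the same example inputs and compares results before handing the compiled function back. As the commit message notes, this check needs real tensors, since fake tensors carry no data to compare.

```python
def wrap_verify_correctness(compiler_fn):
    """Wrap a backend compiler with a result-equality check (illustrative)."""
    def wrapped(gm, example_inputs):
        compiled = compiler_fn(gm, example_inputs)
        # Run both on the same *real* example inputs and compare.
        if compiled(*example_inputs) != gm(*example_inputs):
            raise RuntimeError("backend produced different output than eager")
        return compiled
    return wrapped
```

Because there is only one call site, applying the wrapper there (rather than baking it into every backend) keeps the check out of the backend calling convention, which is what makes the later switch to fake-tensor inputs possible.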
Taken from voz's #89392 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: #89656 Approved by: https://github.com/voznesenskym
cc @mlazos @soumith @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire