Nested tensor subclass support #127431
tugsbayasgalan wants to merge 20 commits into gh/tugsbayasgalan/220/base
Conversation
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/127431
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (6 Unrelated Failures) As of commit 4b0160f with merge base 78e40b2:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk.
BROKEN TRUNK - The following jobs failed but were also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
sub = t.type.__tensor_unflatten__(
    transformed_tensors_dict, t.ctx, outer_size, outer_stride
)
todo = plain_meta_tensors
```
Can you just do it recursively? I don't think you'll stack overflow and I think it will be a lot easier to understand
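A minimal sketch (not the PR's or the reviewer's actual code; the helper name and callback are made up) of the recursive shape being suggested, using the traceable-wrapper-subclass protocol:

```python
from torch.utils._python_dispatch import is_traceable_wrapper_subclass

def fakeify_recursively(t, wrap_plain_tensor):
    # Plain tensors are the leaves: wrap them (e.g. as FakeTensors) via the
    # supplied callback.
    if not is_traceable_wrapper_subclass(t):
        return wrap_plain_tensor(t)
    # Subclasses: recurse into each inner tensor, then rebuild this wrapper
    # on the way back up via __tensor_unflatten__.
    attrs, ctx = t.__tensor_flatten__()
    inner = {a: fakeify_recursively(getattr(t, a), wrap_plain_tensor) for a in attrs}
    return type(t).__tensor_unflatten__(inner, ctx, t.size(), t.stride())
```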
When we have nested tensor subclasses, we need to recurse down to access the underlying real tensors, wrap them in FakeTensor, and recursively build the nested tensor subclasses back up. I am not sure if I am passing around the SymbolicContext correctly? cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang [ghstack-poisoned]
```python
return inner_t

attr_fqn = prefix + "." + attr if prefix != "" else attr
attr_list = attr_fqn.split(".")
```
If all you're going to do to the attr_fqn is split it, why not just pass around a list
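A tiny hypothetical illustration of the suggestion (the function and arguments are made up, not the PR's code): carry the attribute path around as a list from the start instead of joining it into a dotted FQN only to split it again at the use site:

```python
def collect_attr_paths(nested_attrs, path=()):
    # nested_attrs: dict mapping attr name -> dict of its own inner attrs
    # (empty dict for leaves). Returns every path as a list of components,
    # with no string joining/splitting involved.
    paths = []
    for attr, inner in nested_attrs.items():
        new_path = list(path) + [attr]
        paths.append(new_path)
        paths.extend(collect_attr_paths(inner, new_path))
    return paths

# collect_attr_paths({"a": {"x": {}, "y": {}}, "b": {}})
# -> [['a'], ['a', 'x'], ['a', 'y'], ['b']]
```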
```python
current_context = symbolic_context.inner_contexts[attr]

current_source = AttrSource(source, attr)
new_empty_tensor = _empty_create_subclass(
```
You don't have to fix it here, but there's a somewhat prevalent antipattern in this file of doing small recursions on helper functions rather than calling all the way back to the very top level. I think it should be OK to recurse to the very top call function, and that makes things more general, since you can handle compositions of things that the small helpers don't cover. Just calling attention to this.
```diff
- sub = t.type.__tensor_unflatten__(
-     transformed_tensors_dict, t.ctx, outer_size, outer_stride
+ sub = _empty_create_subclass(
+     t, outer_size, outer_stride, symbolic_context, callback, source
```
ACKing this part, I'll let Brian do the rest
```python
# TODO: figure out how to refactor the backward properly
# so I can use aot_dispatch_subclass_wrapper() here.
if CompiledFunction.maybe_subclass_metadata is not None:
    tangents = all_args[tangents_start_idx:tangents_end_idx]
```
This seems like kind of a weird way of detecting wrong tangents, but I guess this is the best we can do?
```python
curr_start_idx = self.flat_tensor_start_idx
for attr, creation_meta in self.attrs.items():
    if creation_meta is None:
        subclass = all_args[curr_start_idx]
```
nit: the variable name `subclass` here seems misleading, since it may or may not actually be a subclass (my understanding is that if creation_meta is None, this is guaranteed to be a plain tensor).
Maybe `inner_tensor`?
```python
@@ -171,47 +171,56 @@ class SubclassCreationMeta:
    flat_tensor_start_idx: int
    # The number of tensors that live in this subclass wrapper
    arg_count: int
```
After reading the code, some invariants that I think are worth explicitly mentioning in the comments:
`arg_count` is inclusive of the arg_counts of any inner tensor subclasses: if I have a TwoTensor and both of its inner elements are TwoTensors, then the `arg_count` of the outer-most subclass will be 4.
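A toy illustration of that invariant (not the PR's SubclassCreationMeta): a wrapper's count includes every plain tensor reachable through nested wrappers:

```python
def arg_count(node):
    # node is either the string "plain" (a plain tensor slot) or a tuple of
    # child nodes (a subclass wrapping them).
    if node == "plain":
        return 1
    return sum(arg_count(child) for child in node)

# A TwoTensor whose two inner elements are themselves TwoTensors -> 4 plain tensors.
two_of_twos = (("plain", "plain"), ("plain", "plain"))
assert arg_count(two_of_twos) == 4
```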
```python
curr_start_idx += creation_meta.arg_count
inner_tensors[attr] = subclass

rebuilt = type(self.original_subclass).__tensor_unflatten__(
```
All of the indices in this reconstruction are definitely non-trivial. It would be great if we had some runtime debug-asserts we could run that would tell us if we messed up the indexing somewhere, so we get a less cryptic error if we get this wrong 🤔. I can't think of a great way to do this, though, unless we do something like save all of the shapes of the inner tensors at trace time and assert that our reconstructed inner tensors have the same shapes at runtime.
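A hedged sketch of that debug-assert idea (all names here are made up, not the PR's code): record the inner tensors' shapes at trace time and compare them against the reconstructed inner tensors at runtime, right before calling `__tensor_unflatten__`:

```python
def assert_inner_shapes_match(expected_shapes, inner_tensors):
    # expected_shapes: dict of attr name -> shape recorded at trace time.
    # inner_tensors: dict of attr name -> reconstructed tensor at runtime.
    for attr, expected in expected_shapes.items():
        actual = tuple(inner_tensors[attr].shape)
        if actual != tuple(expected):
            raise AssertionError(
                f"subclass reconstruction mismatch for {attr!r}: "
                f"expected shape {tuple(expected)}, got {actual}"
            )
```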
```python
z = x.clone().detach().requires_grad_()
z_compile = z.clone().detach().requires_grad_()

out_eager = f(x_nested, y_nested, z)
```
hmm... more out of paranoia than anything else, I'm worried about more complicated sets of inputs. The inputs to this test are something like Two(Two(plain, plain), Two(plain, plain)), plain, plain.
Some more testing ideas:
(1) Add a fourth argument that is an unbalanced TwoTensor, e.g. Two(plain, Two(plain, plain)); see the sketch after this list.
(2) Add different subclass types into the test: e.g. make one input ConstantMetadataTensor(plain), and another a TwoTensor(plain, ConstantMetadataTensor(plain)).
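A rough sketch of idea (1), assuming PyTorch's internal TwoTensor test subclass (the import path is from the test utilities and may differ; idea (2) would additionally mix in a second subclass type, which is not shown here):

```python
import torch
from torch.testing._internal.two_tensor import TwoTensor  # internal test helper

def make_unbalanced(shape):
    a, b, c = (torch.randn(shape) for _ in range(3))
    # Unbalanced nesting: Two(plain, Two(plain, plain))
    return TwoTensor(a, TwoTensor(b, c))

unbalanced_input = make_unbalanced((3, 4))
```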
bdhirsh left a comment:
Left some more nits; more tests would be great. Pre-emptively stamping!
Curious, are we landing this PR soon? It's helpful in addressing IMA issues when compiling DTensor(local=fp8). Super valuable work! Sharing my 2 cents on perf: for CPU overhead and GPU time, computing fp8 amax in eager is still faster than torch.compile (#129457).
When we have nested tensor subclasses, we need to recursively flatten/unflatten in fake tensor creation and AOTAutograd. Most of the PR is a mechanical change that converts today's single-level flatten logic to be recursive. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang Differential Revision: [D58533224](https://our.internmc.facebook.com/intern/diff/D58533224) [ghstack-poisoned]
This pull request was exported from Phabricator. Differential Revision: D58533224
Will try to land today :)
Pull Request resolved: #127431
When we have nested tensor subclasses, we need to recursively flatten/unflatten in fake tensor creation and AOTAutograd. Most of the PR is a mechanical change that converts today's single-level flatten logic to be recursive.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang
@imported-using-ghimport
Differential Revision: [D58533224](https://our.internmc.facebook.com/intern/diff/D58533224/)
ghstack-source-id: 21cebdb
@tugsbayasgalan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge -f 'Landed internally' (Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Summary: `unwrap_tensor_subclass` is incorporated into the export stack natively after pytorch/pytorch#127431, so we can remove this workaround now.
Test Plan:
python test/quantization/test_quant_api.py
python test/integration/test_integration.py
Reviewers: Subscribers: Tasks: Tags:
Stack from ghstack (oldest at bottom):
When we have nested tensor subclasses, we need to recursively flatten/unflatten in fake tensor creation and AOTAutograd. Most of the PR is a mechanical change that converts today's single-level flatten logic to be recursive.
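For example, this is the kind of nested-subclass input the change is meant to support end to end (a rough sketch, assuming PyTorch's internal TwoTensor test subclass and the aot_eager backend; not the PR's own test code):

```python
import torch
from torch.testing._internal.two_tensor import TwoTensor  # internal test helper

def f(x, y):
    return x.sin() + y

a, b, c, d = (torch.randn(3) for _ in range(4))
nested = TwoTensor(TwoTensor(a, b), TwoTensor(c, d))  # subclass holding subclasses

out = torch.compile(f, backend="aot_eager")(nested, torch.randn(3))
```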
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang
Differential Revision: D58533224