[dynamo] Fix tracing partially initialized tensor subclass during dispatch #175397
azahed98 wants to merge 3 commits into gh/azahed98/5/base
Conversation
…patch [ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/175397
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (3 Unrelated Failures) As of commit 545bf4c with merge base f72a552:
- FLAKY: the following job failed, but was likely due to flakiness present on trunk.
- BROKEN TRUNK: the following job failed, but was also failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
- UNSTABLE: the following job is marked as unstable, possibly due to flakiness on trunk.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
anijain2305 left a comment:
Can you compare it with other tensor subclasses like DTensor etc? Most of the hesitation stems from not understanding how tensor subclass init is handled in Dynamo. My understanding was we just graph break, but I might be wrong.
@anijain2305 Did some digging, and what I found is that DTensor graph breaks on … I also realized that this issue is specific to compiling … In that case, I'm thinking our best options are …

Edit 1: …
@anijain2305 I tried adding a skip for root-frame subclass … This resolves the issue by skipping the frame instead, which should be fine, since we realistically will only encounter this issue if …
Yes, this makes sense. At the time, we might not have had this …
@azahed98 any ETA on landing this? 👀
… during dispatch" Fixes an edge case found with Diffusers+TorchAo+Dynamo where a Tensor Subclass can have its `__init__` traced, which calls `__tensor_flatten__` prior to init. In this case, attributes used in `__tensor_flatten__` result in an error since they are not yet initialized. This PR instead adds an escape hatch to skip faking at the start of `VariableBuilder.wrap_tensor` in this case. **Test Plan:** The original error can be reproduced with [this script](https://gist.github.com/sayakpaul/929678132809874c5dbf9c5215460d33). I was unable to reproduce the error end-to-end without the Diffusers or TorchAO dependency, so I instead added a unit test that checks that the escape hatch is taken with mocks. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx kadeng chauhang amjames Lucaskabela jataylo [ghstack-poisoned]
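To make the failure mode above concrete, here is a minimal sketch in plain Python (no torch dependency; the class and attribute names are hypothetical, not the real subclass from the bug report). It mimics what happens when a tracer calls `__tensor_flatten__` on a partially initialized instance, i.e. while `__init__` is still running and before the metadata attributes it reads have been assigned:

```python
# Hypothetical sketch of tracing __tensor_flatten__ mid-__init__.
class FakeQuantTensor:
    def __init__(self, data, scale):
        # Simulate a tracer flattening the partially initialized instance
        # *before* the attributes below are assigned: self.scale does not
        # exist yet, so the lookup raises AttributeError.
        try:
            self.__tensor_flatten__()
            flattened_early = True
        except AttributeError:
            flattened_early = False
        self.flattened_early = flattened_early
        self.data = data
        self.scale = scale  # metadata that __tensor_flatten__ relies on

    def __tensor_flatten__(self):
        # Mirrors the shape of the real protocol: return the names of the
        # inner tensors plus a metadata dict built from instance attributes.
        return ["data"], {"scale": self.scale}


t = FakeQuantTensor([1.0, 2.0], scale=0.5)
print(t.flattened_early)         # flattening before init completed fails
print(t.__tensor_flatten__())    # after init it succeeds
```

The escape hatch described in the PR avoids this by not faking the tensor at the start of `VariableBuilder.wrap_tensor` for this partially initialized case.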
@sayakpaul I'll start the merge of the stack today -- I just need to fix some CI failures from changes to the linter.
…patch ghstack-source-id: 6d0b87a Pull Request resolved: pytorch/pytorch#175397
OK, looks like I resolved the ghstack dupe issue. Re-requesting review to unblock the merge.
Starting merge as part of PR stack under #175660
Fixes an error with the TENSOR_SUBCLASS_METADATA_MATCH guard when the tensor subclass has a SymInt in its metadata. In this scenario, `deepcopy` of the metadata propagates through the SymInt down to the ShapeEnv, FakeMode, and then FakeTensors, causing an error due to no data pointer. This PR replaces SymInts in the metadata with an `_AnyCompare` object that always returns `True` for equals checks. This assumes dynamic shapes checks will handle correctness. **Test Plan:** The original error can be reproduced with [this script](https://gist.github.com/sayakpaul/929678132809874c5dbf9c5215460d33) (if run on the previous commit from this stack). This PR adds a regression test with a manually injected SymInt into the metadata, then compiles with `fullgraph=True` and checks for no recompiles. Pull Request resolved: #175596 Approved by: https://github.com/anijain2305 ghstack dependencies: #175397
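The `_AnyCompare` idea above can be sketched in a few lines of plain Python (the `sanitize_metadata` helper and the `is_symint` predicate are illustrative assumptions, not the actual guard code): a sentinel that compares equal to everything is substituted for the symbolic values before the metadata is stored, so the metadata-equality guard never has to deepcopy a SymInt, and dynamic-shape guards remain responsible for correctness.

```python
class _AnyCompare:
    """Sentinel that compares equal to any value (sketch of the PR's idea)."""

    def __eq__(self, other):
        return True

    def __ne__(self, other):
        return False

    def __hash__(self):
        return 0


def sanitize_metadata(metadata, is_symint):
    # Replace symbolic values with the always-equal sentinel; leave
    # concrete metadata untouched so it is still guarded exactly.
    return {k: _AnyCompare() if is_symint(v) else v for k, v in metadata.items()}


# Stand-in for real SymInts: here we just tag symbolic values as strings.
saved = sanitize_metadata(
    {"block_size": "SymInt(s0)", "bits": 8},
    is_symint=lambda v: isinstance(v, str),
)
print(saved["block_size"] == "SymInt(s1)")  # sentinel matches anything
print(saved["bits"] == 8)                   # concrete value still exact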
…sure refcycle (#175660) Fixes a potential reference cycle that can block `swap_tensors` during or after compile. This reference cycle comes from a closure of a `MetaConverter` object within the `_empty_create_subclass` defined in `MetaConverter.empty_create_subclass`. This PR moves `_empty_create_subclass` to be a method of `MetaConverter` instead, adding additional arguments and moving imports as needed. **Test Plan:** The original error can be reproduced with [this script](https://gist.github.com/sayakpaul/929678132809874c5dbf9c5215460d33) (if run on the previous commit from this stack). This PR adds a unit test that checks that weakrefs created by `MetaConverter` are cleaned up when it is manually deleted even if garbage collection is disabled. Pull Request resolved: #175660 Approved by: https://github.com/anijain2305 ghstack dependencies: #175397, #175596
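The cycle described above is the generic closure-captures-`self` pattern. A minimal sketch (hypothetical class names, not the real `MetaConverter`) shows why moving the inner function to a method helps: a closure defined inside a method and stored on the instance captures `self` in a cell, creating a `self -> closure -> self` cycle that only the cycle collector can break, so with `gc` disabled the object, and any weakrefs to it, linger after `del`; a plain method avoids the stored closure and the object dies on refcount alone:

```python
import gc
import weakref


class ConverterWithClosure:
    def __init__(self):
        def _create():          # closure cell captures self
            return self
        self._create = _create  # self -> closure -> self: a cycle


class ConverterWithMethod:
    def _create(self):          # bound method created on demand:
        return self             # nothing stored, no cycle


gc.disable()
try:
    c = ConverterWithClosure()
    r1 = weakref.ref(c)
    del c
    cyclic_survives = r1() is not None   # cycle keeps it alive without gc

    m = ConverterWithMethod()
    r2 = weakref.ref(m)
    del m
    plain_collected = r2() is None       # refcount alone frees it
finally:
    gc.enable()

print(cyclic_survives, plain_collected)
```

This mirrors the PR's unit test: with garbage collection disabled, weakrefs to the method-based object are cleaned up on manual deletion, while the closure-based variant leaks until a `gc.collect()` runs.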
Thanks for landing it. I will try this out next week and get back!
@azahed98 I bring bad news, I am afraid. I tried it out, but it seems like it's contingent on pytorch/ao#4088.
huggingface/diffusers#13276 -- opened a PR. Hopefully this gets resolved. @lordaarush, do you want to test it as well?
…patch (pytorch#175397) Fixes an edge case found with Diffusers+TorchAo+Dynamo where a Tensor Subclass can have its `__init__` traced, which calls `__tensor_flatten__` prior to init. In this case, attributes used in `__tensor_flatten__` result in an error since they are not yet initialized. This PR instead adds an escape hatch to skip faking at the start of `VariableBuilder.wrap_tensor` in this case. **Test Plan:** The original error can be reproduced with [this script](https://gist.github.com/sayakpaul/929678132809874c5dbf9c5215460d33). I was unable to reproduce the error end-to-end without the Diffusers or TorchAO dependency, so I instead added a unit test that checks that the escape hatch is taken with mocks. Pull Request resolved: pytorch#175397 Approved by: https://github.com/anijain2305, https://github.com/williamwen42
…75596) Fixes an error with the TENSOR_SUBCLASS_METADATA_MATCH guard when the tensor subclass has a SymInt in its metadata. In this scenario, `deepcopy` of the metadata propagates through the SymInt down to the ShapeEnv, FakeMode, and then FakeTensors, causing an error due to no data pointer. This PR replaces SymInts in the metadata with an `_AnyCompare` object that always returns `True` for equals checks. This assumes dynamic shapes checks will handle correctness. **Test Plan:** The original error can be reproduced with [this script](https://gist.github.com/sayakpaul/929678132809874c5dbf9c5215460d33) (if run on the previous commit from this stack). This PR adds a regression test with a manually injected SymInt into the metadata, then compiles with `fullgraph=True` and checks for no recompiles. Pull Request resolved: pytorch#175596 Approved by: https://github.com/anijain2305 ghstack dependencies: pytorch#175397
…sure refcycle (pytorch#175660) Fixes a potential reference cycle that can block `swap_tensors` during or after compile. This reference cycle comes from a closure of a `MetaConverter` object within the `_empty_create_subclass` defined in `MetaConverter.empty_create_subclass`. This PR moves `_empty_create_subclass` to be a method of `MetaConverter` instead, adding additional arguments and moving imports as needed. **Test Plan:** The original error can be reproduced with [this script](https://gist.github.com/sayakpaul/929678132809874c5dbf9c5215460d33) (if run on the previous commit from this stack). This PR adds a unit test that checks that weakrefs created by `MetaConverter` are cleaned up when it is manually deleted even if garbage collection is disabled. Pull Request resolved: pytorch#175660 Approved by: https://github.com/anijain2305 ghstack dependencies: pytorch#175397, pytorch#175596
huggingface/diffusers#13276 was merged and this is working really well now! Thanks @azahed98 |
Stack from ghstack (oldest at bottom):
Fixes an edge case found with Diffusers+TorchAo+Dynamo where a Tensor Subclass can have its `__init__` traced, which calls `__tensor_flatten__` prior to init. In this case, attributes used in `__tensor_flatten__` result in an error since they are not yet initialized.

This PR instead adds an escape hatch to skip faking at the start of `VariableBuilder.wrap_tensor` in this case.

Test Plan: The original error can be reproduced with this script. I was unable to reproduce the error end-to-end without the Diffusers or TorchAO dependency, so I instead added a unit test that checks that the escape hatch is taken with mocks.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @chauhang @amjames @Lucaskabela @jataylo