Conversation
[ghstack-poisoned]
❌ 1 new failure as of commit 791b39f (more details on the Dr. CI page).
🕵️ 1 new failure recognized by patterns; the following CI failures do not appear to be due to upstream breakages.
torch/_subclasses/fake_tensor.py (outdated diff):

```python
        return common_device


class FakeTensorMode(FakeTensor):
    context = no_dispatch
```
Can we do a modern style mode instead pretty please :)
You didn't actually use context, AFAICT?
What is a modern style mode?
inherit from TorchDispatchMode
will do. Mind linking the differences between the two, i.e. why one should prefer the modern style over the existing one?
the main difference is that you can store instance variables on the mode, since it is an actual object, not a class
nice... i probably don't need setup_mode then
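For reference, the difference being discussed can be sketched without any real torch APIs. Below, `DispatchMode` is a hypothetical stand-in for `TorchDispatchMode` (none of these names are the actual PyTorch API); the point is just that a "modern style" mode is an *instance*, so per-mode state like a counter can live on it:

```python
# Illustrative mock only: `DispatchMode` stands in for the real
# TorchDispatchMode base class; `CountingFakeMode` and `dispatch`
# are hypothetical names, not PyTorch API.
class DispatchMode:
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False


class CountingFakeMode(DispatchMode):
    """A 'modern style' mode: an object, so it can carry state."""

    def __init__(self):
        # instance variable -- not possible on a class-style mode,
        # where the mode is the class itself
        self.ops_seen = 0

    def dispatch(self, op_name):
        self.ops_seen += 1
        return f"faked {op_name}"


with CountingFakeMode() as mode:
    mode.dispatch("aten.add")
    mode.dispatch("aten.mul")
```

With a class-style mode there is no instance to hang `ops_seen` on, which is the motivation given above for inheriting from `TorchDispatchMode`.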
```python
# TODO: no real reason to restrict multiple outputs
return (
    len(schema.returns) == 1 and schema.returns[0].type is torch._C.TensorType.get()
)
```
```python
    func, args=args, kwargs=kwargs, normalize_to_only_use_kwargs=True
)
# cpu is default device if none is specified
out_device = new_kwargs.pop("device", torch.device("cpu"))
```
technically it's torch.get_default_tensor_type() lol but ok
which is immutable 😛
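The device-inference step under discussion reduces to a pop-with-default, sketched here with plain strings (`constructor_out_device` and `DEFAULT_DEVICE` are hypothetical names; the diff hardcodes `"cpu"` rather than consulting the default tensor type, which is the nit above):

```python
# Stand-in for the process-wide default the comment refers to
# (torch.get_default_tensor_type()); fixed to "cpu" as in the diff.
DEFAULT_DEVICE = "cpu"


def constructor_out_device(new_kwargs):
    # pop, so `device` is not forwarded to the underlying meta call;
    # fall back to the default when the caller gave no device
    return new_kwargs.pop("device", DEFAULT_DEVICE)
```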
This adds a mode which will intercept calls to `__torch_dispatch__` even if the inputs are not already `FakeTensor`s. This mimics the convenient [prior existing usage](https://pytorch.org/torchdistx/latest/fake_tensor.html). It does so by wrapping input tensors in fake tensors and then continuing to run the operators.

Not yet implemented: I still need to memoize conversion of non-fake tensors to fake tensors (and internally, to `meta` devices), following along with the [class here](https://github.com/pytorch/pytorch/blob/master/test/test_meta.py#L70).

One open question is what the lifetime of the `FakeTensorConverter` should be. IMO, it would make sense and be convenient for it to live for the duration of `FakeTensorMode`. Since we shouldn't be allocating any new tensors with actual data (just tensors on `meta` devices), it is probably fine for those tensors to live for the duration of `FakeTensorMode`. If that is not sufficient, we could try using a `weakref.WeakKeyDictionary` mapping tensors to their fake equivalents. I looked into this a bit and there are at least a few incompatibilities that need to be dealt with.
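The `weakref.WeakKeyDictionary` idea can be sketched with plain objects (real tensors have the weakref incompatibilities alluded to above, so `RealStandIn`/`FakeStandIn`/`Converter` are hypothetical mocks): the memo keeps repeated conversions stable without keeping the real object alive.

```python
import weakref


class RealStandIn:
    """Mock for a real tensor; only carries a shape."""

    def __init__(self, shape):
        self.shape = shape


class FakeStandIn:
    """Mock for a fake/meta tensor produced by conversion."""

    def __init__(self, shape):
        self.shape = shape


class Converter:
    """Memoizes real -> fake so converting the same input twice yields
    the same fake, while entries die with the real object."""

    def __init__(self):
        # weak keys: holding a fake does not pin the real tensor
        self._memo = weakref.WeakKeyDictionary()

    def __call__(self, real):
        fake = self._memo.get(real)
        if fake is None:
            fake = FakeStandIn(real.shape)  # stand-in for meta conversion
            self._memo[real] = fake
        return fake
```

Scoping the converter to the mode, as proposed, just means the `Converter` instance lives on the mode object.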
@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Differential Revision: [D36618464](https://our.internmc.facebook.com/intern/diff/D36618464)
I actually think the current behavior makes sense. The easiest mental model, IMO, is that everything is simulated and no real tensors will be affected. I think allowing in-place ops, views, and other things to actually affect the input tensors would be a mistake. Additionally, the behavior above with
We discussed this in person and decided that, for torchdynamo, the easiest thing will be to just wrap all the tensors as fake tensors before running the computation. That means we don't have to support in-place modification of non-fake tensors.
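The "wrap everything up front" decision can be sketched with mocks (`Real`/`Fake`/`wrap_all` are hypothetical names, not the converter API): once all inputs are converted before the computation runs, in-place ops mutate only the fakes and the originals are provably untouched.

```python
class Real:
    """Mock for a real input tensor."""

    def __init__(self, value):
        self.value = value


class Fake:
    """Mock for the fake wrapper of a real tensor."""

    def __init__(self, value):
        self.value = value


def wrap_all(args, convert):
    # convert every tensor-like input up front, so the computation
    # below only ever sees (and mutates) fakes
    return [convert(a) if isinstance(a, Real) else a for a in args]


def add_(dst, src):
    # an "in-place" op operating on the wrapped inputs
    dst.value += src.value
    return dst


real = Real(1)
fakes = wrap_all([real, Real(2)], lambda r: Fake(r.value))
add_(fakes[0], fakes[1])  # mutates the fake, never `real`
```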
@pytorchbot merge this please
Hey @eellison.
Pull Request resolved: #77972
Approved by: https://github.com/ezyang
ghstack-source-id: de497bb

Summary: Pull Request resolved: #77972. Approved by: https://github.com/ezyang
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/cea7dd1646ab147edac8f0e22f0aa85cf3136fef
Reviewed By: seemethere
Differential Revision: D36784784
Pulled By: seemethere
fbshipit-source-id: 55175d158483e4b388402a4ddcc273b69ef403c7
```python
def _is_tensor_constructor(func: OpOverload):
    assert isinstance(func, OpOverload)
    schema = func._schema
    if any(contains_tensor_types(arg.type) for arg in schema.arguments):
```
those got added later
I mean, `_is_tensor_constructor` would still return false for them, but it should return true, right?
ya, `_no_tensor_arg_constructor` is a better name
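The rename under discussion describes exactly what the predicate checks: that a constructor takes no tensor arguments. A sketch with mock schema types (not the real `torch._C` objects; `no_tensor_arg_constructor` is the suggested, hypothetical name):

```python
# Hypothetical mocks for schema inspection; the real types come
# from torch._C and func._schema.
from dataclasses import dataclass
from typing import List


class TensorType:
    pass


@dataclass
class Argument:
    type: object


@dataclass
class Schema:
    arguments: List[Argument]


def contains_tensor_types(t):
    # simplified: the real helper also recurses into optional/list types
    return isinstance(t, TensorType)


def no_tensor_arg_constructor(schema):
    # returns False for ops like `new_empty` that do take a tensor
    # argument -- which is why the reviewers felt the original name
    # `_is_tensor_constructor` was misleading
    return not any(contains_tensor_types(a.type) for a in schema.arguments)
```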
Stack from ghstack (oldest at bottom):

- `_make_subclass` #77970