teach ivalue about List[Optional[Tensor]], fix fallbacks #75716
bdhirsh wants to merge 7 commits into gh/bdhirsh/203/base
Conversation
💊 CI failures summary (Dr. CI): as of commit 6107ac8, no failures.
IValues treat the `List[Optional[Tensor]]` type a bit weirdly:
- For plain `c10::optional<Tensor>` arguments, the optional-ness gets folded away into an undefined tensor, so boxed fallbacks don't need to handle the `c10::optional<Tensor>` case directly ([ivalue code](https://github.com/pytorch/pytorch/blob/c2d5f6a5a43ae3fd8ce78c88964c1357b94537b3/aten/src/ATen/core/ivalue_inl.h#L2020)).
- For `c10::List<c10::optional<Tensor>>`, though, boxed fallbacks need to handle it directly.

I added APIs for checking / retrieving an `OptionalTensorList` from an `IValue`, and also updated the functionalization + lazy_tensor_cpu boxed fallbacks to handle them correctly. I also added a test for `index_put_` (which used to fail, because one of its arguments is a list of optional tensors).
```
auto eager_ivalue = c10::IValue(c10::List<c10::optional<at::Tensor>>(
    to_eager(ivalue.toOptionalTensorVector(), device_type)));
(*stack)[arguments_begin + idx] = std::move(eager_ivalue);
opt_tensorlist_args.push_back(ivalue.toOptionalTensorList());
```
cc @wconstab, I think this should fix index_put_ for LTC
I forgot (if I knew) what the problem with index_put_ was for LTC. Is this fixing something that is broken when index_put falls back? Your explanation in the PR makes sense, and the change lgtm. thanks!
```
TORCH_API bool isFunctionalTensor(const c10::ArrayRef<Tensor> t_list);

TORCH_API Tensor to_functional_tensor(const Tensor& tensor);
TORCH_API c10::optional<Tensor> to_functional_tensor(const c10::optional<Tensor>& tensor);
```
How does C++ pick which overload to use if you pass in `to_functional_tensor(tensor)`? (This doesn't really matter, since the code compiles, but I'm curious.)
```
TORCH_API Tensor to_functional_tensor(const Tensor& tensor);
TORCH_API c10::optional<Tensor> to_functional_tensor(const c10::optional<Tensor>& tensor);
TORCH_API c10::List<Tensor> to_functional_tensor(const c10::List<Tensor>& t_list);
TORCH_API c10::List<c10::optional<Tensor>> to_functional_tensor(const c10::List<c10::optional<Tensor>>& t_list);
```
(no action required) We really need something like a c++ tree_map for all dispatcher types
```
} else if (ivalue.isOptionalTensorList()) {
  any_tensor_inputs = true;
  auto opt_tensors = ivalue.toOptionalTensorList();
  if (at::functionalization::impl::isFunctionalTensor(opt_tensors)) {
```
Is there an invariant that either all or none of these must be functional tensors, or can there be some hybrid mix?
```
// We can't just call _to_eager() on the entire list of Tensors because it
// will break on undefined tensors. Separate out undefined tensors first.
```
I'm confused -- `std::vector<c10::optional<at::Tensor>>` probably came from an IValue that is `List[Optional[Tensor]]`, but it can contain undefined tensors?
it's just copy pasted from the impl above lol
I think it should be possible to assume that inside optional they are never undefined, and things should work
```
  return false;
}
const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType;
return ty == c10::getTypePtr<c10::optional<at::Tensor>>();
```
How come `isListOf` can't be used here? Oh, because it's a compound type.
ezyang
left a comment
okey dokey. Check Richard's comments
@bdhirsh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@pytorchbot merge this (Initiating merge automatically since Phabricator Diff has merged)

Merge failed due to a failed command. Raised by https://github.com/pytorch/pytorch/actions/runs/2185815282
Summary: Pull Request resolved: #75716 (description above). Test Plan: Imported from OSS. Reviewed By: zhxchen17. Differential Revision: D35705376. Pulled By: bdhirsh. fbshipit-source-id: 217df0c059af01794bcc034642b8faff43539878
Pull Request resolved: #75716 Approved by: https://github.com/ezyang (cherry picked from commit 204df13)
This should fix `at::index.Tensor` for functionalization and address pytorch/torchdynamo#88.

### The bug

We have a bunch of code that expects all of our `c10::Type` objects to have a unique singleton instance, but it turns out that this isn't true for container types (like `c10::optional`). The `IValue::isOptionalTensorList` function I added earlier in #75716 doesn't always work:

```
const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType;
return ty == c10::getTypePtr<c10::optional<at::Tensor>>();
```

That equality check calls [this](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type_base.h#L586) code:

```
template <typename T, typename U>
bool operator==(const SingletonOrSharedTypePtr<T>& x, const SingletonOrSharedTypePtr<U>& y) {
  return (void*)x.get() == (void*)y.get();
}
```

Every `c10::Type` can be compared, but each also has its own singleton instance to make equality checks cheaper (just check that the two singleton instances are the same, instead of comparing the full type objects). You can call `c10::getTypePtr<T>()` and get back the singleton instance of the pointer to that type. For `optional<T>`, that singleton instance lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871). When I was debugging, I noticed that `isOptionalTensorList` was returning false because the two pointers being compared were different, even though the actual type objects were equal. I was able to repro this with `functionalize()`, but I couldn't repro it directly with a test in core.

Changing to this code worked (compare pointers first, and if that fails, compare the actual type objects):

```
const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType;
const auto& expected_ty = c10::getTypePtr<c10::optional<at::Tensor>>();
// compare pointers, but if that fails compare the actual type objects
return expected_ty == ty || *expected_ty == *ty;
```

So why do we have more than one "static singleton" instance of the same type object? The singleton instance for `c10::optional` lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871) and is defined in a header file (it has to be, because it's a template). The problem is that function-local statics are duplicated across DSOs. We have a similar comment about the dispatcher singleton and why it needs to live in a .cpp file [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/dispatch/Dispatcher.h#L95). Basically, since functorch and pytorch core live in two separate `.so` files, we end up with a new static singleton instance for each library. We can't just move the singleton into a .cpp file though, since the function is templated - we want one singleton instance *per `optional<T>` type*. I ended up converting each `T` into a `TypePtr` object and keeping a mapping, in the .cpp file, from `TypePtr` objects of the inner type to the static singleton instances.

### Testing?

I couldn't figure out how to repro this failure in core, since I think the `functionalize()` failure came from the fact that we're invoking the `c10::getTypePtr` call from multiple loaded libraries (`libtorch_cpu.so` and `functorch/_C.so`).

I confirmed that with this patch, this code runs successfully (it would break before):

```
import torch
from functorch import make_fx
from functorch.experimental import functionalize

def f(x, y):
    return x[y]

t1 = make_fx(functionalize(f))(torch.arange(3), torch.ones(2, dtype=torch.long))
print("Functionalized:\n", t1.graph)
```

### Generalizing this fix to other container types?

This bug probably affects the other container `c10::Type`s, like List/Dict. I put this up as a PoC first, but if this seems like a reasonable fix then I can use the same fix for the other container types too.
This should fix `at::index.Tensor` for functionalization and address pytorch/torchdynamo#88. ### The bug we have a bunch of code that expects all of our `c10::Type` objects to have a unique singleton instance, but it turns out that this isn't true for container types (like `c10::optional`). It turns out that the `IValue::isOptionalTensorList` function I added earlier in #75716 doesn't always work: ``` const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType; return ty == c10::getTypePtr<c10::optional<at::Tensor>>(); ``` That equality check calls [this](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type_base.h#L586) code: ``` template <typename T, typename U> bool operator==(const SingletonOrSharedTypePtr<T>& x, const SingletonOrSharedTypePtr<U>& y) { return (void*)x.get() == (void*)y.get(); } ``` Every `c10::Type` can be compared, but it also has its own singleton instance to make equality checks cheaper (just check that the two singleton instances are the same, instead of comparing the full type objects). You can call `c10::getTypePtr<T>()`, and get back the singleton instance of the pointer to that type. For `optional<T>`, that singleton instance lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871). When I was debugging, I noticed that `isOptionalTensorList` was returning false because the two pointers being compared were different, but the actual type objects were equal. I was able to repro this with `functionalize()`, but I couldn't repro it directly with test in core. 
Changing to this code worked: ``` const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType; const auto& expected_ty == c10::getTypePtr<c10::optional<at::Tensor>>(); // compare pointers, but if that fails compare the actual type objects return expected_ty == ty || *expected_ty == *ty; ``` So why do we have more than one "static singleton" instance of the same type object? The singleton instance for `c10::optional` lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871), and is defined in a header file (it has to be because it's a template). I think that's because "function local statics are duplicated across DSO's". We have a similar comment about the dispatcher singleton and why it needs to live in a .cpp file [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/dispatch/Dispatcher.h#L95). Basically, since functorch and pytorch core live in two separate `.so` file, we'll end up with a new static singleton instance for each library. We can't just move the singleton into a cpp though, since the function is templated - we want one singleton instance *per `optional<T>` type*. I ended up doing it by converting each `T` into a `TypePtr` object, and keeping a mapping from `TypePtr` objects of the inner type to the static singleton instances in the .cpp file. ### Testing? I couldn't figure out how to repro this failure in core, since I think the `functionalize()` failure came from the fact that we're loading multiple libraries that we're invoking the `c10::getTypePtr` call from (`libtorch_cpu.so` and `functorch/_C.so`). 
I confirmed that with this patch, this code runs successfully (it would break before) ``` import torch from functorch import make_fx from functorch.experimental import functionalize def f(x, y): return x[y] t1 = make_fx(functionalize(f))(torch.arange(3), torch.ones(2, dtype=torch.long)) print("Functionalized:\n", t1.graph) ``` ### Generalizing this fix to other container types? This bug probably affects the other container `c10::Type`s, like List/Dict. I put this up as a PoC first, but if this seems like a reasonable fix then I can use the same fix for the other container types too. [ghstack-poisoned]
This should fix `at::index.Tensor` for functionalization and address pytorch/torchdynamo#88. ### The bug we have a bunch of code that expects all of our `c10::Type` objects to have a unique singleton instance, but it turns out that this isn't true for container types (like `c10::optional`). It turns out that the `IValue::isOptionalTensorList` function I added earlier in #75716 doesn't always work: ``` const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType; return ty == c10::getTypePtr<c10::optional<at::Tensor>>(); ``` That equality check calls [this](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type_base.h#L586) code: ``` template <typename T, typename U> bool operator==(const SingletonOrSharedTypePtr<T>& x, const SingletonOrSharedTypePtr<U>& y) { return (void*)x.get() == (void*)y.get(); } ``` Every `c10::Type` can be compared, but it also has its own singleton instance to make equality checks cheaper (just check that the two singleton instances are the same, instead of comparing the full type objects). You can call `c10::getTypePtr<T>()`, and get back the singleton instance of the pointer to that type. For `optional<T>`, that singleton instance lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871). When I was debugging, I noticed that `isOptionalTensorList` was returning false because the two pointers being compared were different, but the actual type objects were equal. I was able to repro this with `functionalize()`, but I couldn't repro it directly with test in core. 
Changing to this code worked: ``` const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType; const auto& expected_ty == c10::getTypePtr<c10::optional<at::Tensor>>(); // compare pointers, but if that fails compare the actual type objects return expected_ty == ty || *expected_ty == *ty; ``` So why do we have more than one "static singleton" instance of the same type object? The singleton instance for `c10::optional` lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871), and is defined in a header file (it has to be because it's a template). I think that's because "function local statics are duplicated across DSO's". We have a similar comment about the dispatcher singleton and why it needs to live in a .cpp file [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/dispatch/Dispatcher.h#L95). Basically, since functorch and pytorch core live in two separate `.so` file, we'll end up with a new static singleton instance for each library. We can't just move the singleton into a cpp though, since the function is templated - we want one singleton instance *per `optional<T>` type*. I ended up doing it by converting each `T` into a `TypePtr` object, and keeping a mapping from `TypePtr` objects of the inner type to the static singleton instances in the .cpp file. ### Testing? I couldn't figure out how to repro this failure in core, since I think the `functionalize()` failure came from the fact that we're loading multiple libraries that we're invoking the `c10::getTypePtr` call from (`libtorch_cpu.so` and `functorch/_C.so`). 
I confirmed that with this patch, this code runs successfully (it would break before) ``` import torch from functorch import make_fx from functorch.experimental import functionalize def f(x, y): return x[y] t1 = make_fx(functionalize(f))(torch.arange(3), torch.ones(2, dtype=torch.long)) print("Functionalized:\n", t1.graph) ``` ### Generalizing this fix to other container types? This bug probably affects the other container `c10::Type`s, like List/Dict. I put this up as a PoC first, but if this seems like a reasonable fix then I can use the same fix for the other container types too. [ghstack-poisoned]
This should fix `at::index.Tensor` for functionalization and address pytorch/torchdynamo#88. ### The bug we have a bunch of code that expects all of our `c10::Type` objects to have a unique singleton instance, but it turns out that this isn't true for container types (like `c10::optional`). It turns out that the `IValue::isOptionalTensorList` function I added earlier in #75716 doesn't always work: ``` const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType; return ty == c10::getTypePtr<c10::optional<at::Tensor>>(); ``` That equality check calls [this](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type_base.h#L586) code: ``` template <typename T, typename U> bool operator==(const SingletonOrSharedTypePtr<T>& x, const SingletonOrSharedTypePtr<U>& y) { return (void*)x.get() == (void*)y.get(); } ``` Every `c10::Type` can be compared, but it also has its own singleton instance to make equality checks cheaper (just check that the two singleton instances are the same, instead of comparing the full type objects). You can call `c10::getTypePtr<T>()`, and get back the singleton instance of the pointer to that type. For `optional<T>`, that singleton instance lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871). When I was debugging, I noticed that `isOptionalTensorList` was returning false because the two pointers being compared were different, but the actual type objects were equal. I was able to repro this with `functionalize()`, but I couldn't repro it directly with test in core. 
Changing to this code worked: ``` const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType; const auto& expected_ty == c10::getTypePtr<c10::optional<at::Tensor>>(); // compare pointers, but if that fails compare the actual type objects return expected_ty == ty || *expected_ty == *ty; ``` So why do we have more than one "static singleton" instance of the same type object? The singleton instance for `c10::optional` lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871), and is defined in a header file (it has to be because it's a template). I think that's because "function local statics are duplicated across DSO's". We have a similar comment about the dispatcher singleton and why it needs to live in a .cpp file [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/dispatch/Dispatcher.h#L95). Basically, since functorch and pytorch core live in two separate `.so` file, we'll end up with a new static singleton instance for each library. We can't just move the singleton into a cpp though, since the function is templated - we want one singleton instance *per `optional<T>` type*. I ended up doing it by converting each `T` into a `TypePtr` object, and keeping a mapping from `TypePtr` objects of the inner type to the static singleton instances in the .cpp file. ### Testing? I couldn't figure out how to repro this failure in core, since I think the `functionalize()` failure came from the fact that we're loading multiple libraries that we're invoking the `c10::getTypePtr` call from (`libtorch_cpu.so` and `functorch/_C.so`). 
I confirmed that with this patch, this code runs successfully (it would break before) ``` import torch from functorch import make_fx from functorch.experimental import functionalize def f(x, y): return x[y] t1 = make_fx(functionalize(f))(torch.arange(3), torch.ones(2, dtype=torch.long)) print("Functionalized:\n", t1.graph) ``` ### Generalizing this fix to other container types? This bug probably affects the other container `c10::Type`s, like List/Dict. I put this up as a PoC first, but if this seems like a reasonable fix then I can use the same fix for the other container types too. [ghstack-poisoned]
This should fix `at::index.Tensor` for functionalization and address pytorch/torchdynamo#88. ### The bug we have a bunch of code that expects all of our `c10::Type` objects to have a unique singleton instance, but it turns out that this isn't true for container types (like `c10::optional`). It turns out that the `IValue::isOptionalTensorList` function I added earlier in #75716 doesn't always work: ``` const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType; return ty == c10::getTypePtr<c10::optional<at::Tensor>>(); ``` That equality check calls [this](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type_base.h#L586) code: ``` template <typename T, typename U> bool operator==(const SingletonOrSharedTypePtr<T>& x, const SingletonOrSharedTypePtr<U>& y) { return (void*)x.get() == (void*)y.get(); } ``` Every `c10::Type` can be compared, but it also has its own singleton instance to make equality checks cheaper (just check that the two singleton instances are the same, instead of comparing the full type objects). You can call `c10::getTypePtr<T>()`, and get back the singleton instance of the pointer to that type. For `optional<T>`, that singleton instance lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871). When I was debugging, I noticed that `isOptionalTensorList` was returning false because the two pointers being compared were different, but the actual type objects were equal. I was able to repro this with `functionalize()`, but I couldn't repro it directly with test in core. 
Changing to this code worked: ``` const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType; const auto& expected_ty == c10::getTypePtr<c10::optional<at::Tensor>>(); // compare pointers, but if that fails compare the actual type objects return expected_ty == ty || *expected_ty == *ty; ``` So why do we have more than one "static singleton" instance of the same type object? The singleton instance for `c10::optional` lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871), and is defined in a header file (it has to be because it's a template). I think that's because "function local statics are duplicated across DSO's". We have a similar comment about the dispatcher singleton and why it needs to live in a .cpp file [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/dispatch/Dispatcher.h#L95). Basically, since functorch and pytorch core live in two separate `.so` file, we'll end up with a new static singleton instance for each library. We can't just move the singleton into a cpp though, since the function is templated - we want one singleton instance *per `optional<T>` type*. I ended up doing it by converting each `T` into a `TypePtr` object, and keeping a mapping from `TypePtr` objects of the inner type to the static singleton instances in the .cpp file. ### Testing? I couldn't figure out how to repro this failure in core, since I think the `functionalize()` failure came from the fact that we're loading multiple libraries that we're invoking the `c10::getTypePtr` call from (`libtorch_cpu.so` and `functorch/_C.so`). 
I confirmed that with this patch, this code runs successfully (it would break before) ``` import torch from functorch import make_fx from functorch.experimental import functionalize def f(x, y): return x[y] t1 = make_fx(functionalize(f))(torch.arange(3), torch.ones(2, dtype=torch.long)) print("Functionalized:\n", t1.graph) ``` ### Generalizing this fix to other container types? This bug probably affects the other container `c10::Type`s, like List/Dict. I put this up as a PoC first, but if this seems like a reasonable fix then I can use the same fix for the other container types too. [ghstack-poisoned]
This should fix `at::index.Tensor` for functionalization and address pytorch/torchdynamo#88. ### The bug we have a bunch of code that expects all of our `c10::Type` objects to have a unique singleton instance, but it turns out that this isn't true for container types (like `c10::optional`). It turns out that the `IValue::isOptionalTensorList` function I added earlier in #75716 doesn't always work: ``` const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType; return ty == c10::getTypePtr<c10::optional<at::Tensor>>(); ``` That equality check calls [this](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type_base.h#L586) code: ``` template <typename T, typename U> bool operator==(const SingletonOrSharedTypePtr<T>& x, const SingletonOrSharedTypePtr<U>& y) { return (void*)x.get() == (void*)y.get(); } ``` Every `c10::Type` can be compared, but it also has its own singleton instance to make equality checks cheaper (just check that the two singleton instances are the same, instead of comparing the full type objects). You can call `c10::getTypePtr<T>()`, and get back the singleton instance of the pointer to that type. For `optional<T>`, that singleton instance lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871). When I was debugging, I noticed that `isOptionalTensorList` was returning false because the two pointers being compared were different, but the actual type objects were equal. I was able to repro this with `functionalize()`, but I couldn't repro it directly with test in core. 
Changing to this code worked: ``` const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType; const auto& expected_ty == c10::getTypePtr<c10::optional<at::Tensor>>(); // compare pointers, but if that fails compare the actual type objects return expected_ty == ty || *expected_ty == *ty; ``` So why do we have more than one "static singleton" instance of the same type object? The singleton instance for `c10::optional` lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871), and is defined in a header file (it has to be because it's a template). I think that's because "function local statics are duplicated across DSO's". We have a similar comment about the dispatcher singleton and why it needs to live in a .cpp file [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/dispatch/Dispatcher.h#L95). Basically, since functorch and pytorch core live in two separate `.so` file, we'll end up with a new static singleton instance for each library. We can't just move the singleton into a cpp though, since the function is templated - we want one singleton instance *per `optional<T>` type*. I ended up doing it by converting each `T` into a `TypePtr` object, and keeping a mapping from `TypePtr` objects of the inner type to the static singleton instances in the .cpp file. ### Testing? I couldn't figure out how to repro this failure in core, since I think the `functionalize()` failure came from the fact that we're loading multiple libraries that we're invoking the `c10::getTypePtr` call from (`libtorch_cpu.so` and `functorch/_C.so`). 
I confirmed that with this patch, this code runs successfully (it would break before) ``` import torch from functorch import make_fx from functorch.experimental import functionalize def f(x, y): return x[y] t1 = make_fx(functionalize(f))(torch.arange(3), torch.ones(2, dtype=torch.long)) print("Functionalized:\n", t1.graph) ``` ### Generalizing this fix to other container types? This bug probably affects the other container `c10::Type`s, like List/Dict. I put this up as a PoC first, but if this seems like a reasonable fix then I can use the same fix for the other container types too. [ghstack-poisoned]
This should fix `at::index.Tensor` for functionalization and address pytorch/torchdynamo#88. ### The bug we have a bunch of code that expects all of our `c10::Type` objects to have a unique singleton instance, but it turns out that this isn't true for container types (like `c10::optional`). It turns out that the `IValue::isOptionalTensorList` function I added earlier in #75716 doesn't always work: ``` const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType; return ty == c10::getTypePtr<c10::optional<at::Tensor>>(); ``` That equality check calls [this](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type_base.h#L586) code: ``` template <typename T, typename U> bool operator==(const SingletonOrSharedTypePtr<T>& x, const SingletonOrSharedTypePtr<U>& y) { return (void*)x.get() == (void*)y.get(); } ``` Every `c10::Type` can be compared, but it also has its own singleton instance to make equality checks cheaper (just check that the two singleton instances are the same, instead of comparing the full type objects). You can call `c10::getTypePtr<T>()`, and get back the singleton instance of the pointer to that type. For `optional<T>`, that singleton instance lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871). When I was debugging, I noticed that `isOptionalTensorList` was returning false because the two pointers being compared were different, but the actual type objects were equal. I was able to repro this with `functionalize()`, but I couldn't repro it directly with test in core. 
Changing to this code worked:
```
const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType;
const auto& expected_ty = c10::getTypePtr<c10::optional<at::Tensor>>();
// compare pointers, but if that fails compare the actual type objects
return expected_ty == ty || *expected_ty == *ty;
```
So why do we have more than one "static singleton" instance of the same type object? The singleton instance for `c10::optional` lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871), and is defined in a header file (it has to be, because it's a template). I think that's because function-local statics are duplicated across DSOs. We have a similar comment about the dispatcher singleton and why it needs to live in a .cpp file [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/dispatch/Dispatcher.h#L95). Basically, since functorch and pytorch core live in two separate `.so` files, we end up with a new static singleton instance for each library.

We can't just move the singleton into a .cpp file, though, since the function is templated - we want one singleton instance *per `optional<T>` type*. I ended up doing it by converting each `T` into a `TypePtr` object, and keeping a mapping from `TypePtr` objects of the inner type to the static singleton instances in the .cpp file.

### Testing?

I couldn't figure out how to repro this failure in core, since I think the `functionalize()` failure comes from the fact that we invoke the `c10::getTypePtr` call from multiple libraries (`libtorch_cpu.so` and `functorch/_C.so`).
I confirmed that with this patch, this code runs successfully (it would break before):
```
import torch
from functorch import make_fx
from functorch.experimental import functionalize

def f(x, y):
    return x[y]

t1 = make_fx(functionalize(f))(torch.arange(3), torch.ones(2, dtype=torch.long))
print("Functionalized:\n", t1.graph)
```
### Generalizing this fix to other container types?

This bug probably affects the other container `c10::Type`s, like List/Dict. I put this up as a PoC first, but if this seems like a reasonable fix then I can use the same fix for the other container types too.

[ghstack-poisoned]
IValues treat the `List[Optional[Tensor]]` type a bit weirdly:

- For normal `c10::optional<Tensor>` types, the optional-ness gets folded away into an undefined tensor, so boxed fallbacks don't need to handle the `c10::optional<Tensor>` case directly ([ivalue code](https://github.com/pytorch/pytorch/blob/c2d5f6a5a43ae3fd8ce78c88964c1357b94537b3/aten/src/ATen/core/ivalue_inl.h#L2020)).
- For `c10::List<c10::optional<Tensor>>`, though, boxed fallbacks need to handle the optional-ness directly.

I added APIs for checking / retrieving `OptionalTensorList` from an `IValue`, and also updated the functionalization + lazy_tensor_cpu boxed fallbacks to correctly handle them.

I also added a test for `index_put_` (which used to fail, because one of its arguments is a list of optional tensors).

Stack from ghstack:
Differential Revision: D35705376