fix static init issue with JIT container types #76085
bdhirsh wants to merge 9 commits into gh/bdhirsh/211/base
Conversation
💊 CI failures summary (Dr. CI): as of commit c8b962f, there are no failures yet. This comment was automatically generated by Dr. CI; please report bugs/suggestions to the (internal) Dr. CI Users group.
I tagged @ezyang for this fix, but I'm not sure if there's someone from JIT who would also want to look at this PR.
This should fix `at::index.Tensor` for functionalization and address pytorch/torchdynamo#88.

### The bug

We have a bunch of code that expects all of our `c10::Type` objects to have a unique singleton instance, but it turns out that this isn't true for container types (like `c10::optional`). The `IValue::isOptionalTensorList` function I added earlier in #75716 doesn't always work:

```
const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType;
return ty == c10::getTypePtr<c10::optional<at::Tensor>>();
```

That equality check calls [this](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type_base.h#L586) code:

```
template <typename T, typename U>
bool operator==(const SingletonOrSharedTypePtr<T>& x, const SingletonOrSharedTypePtr<U>& y) {
  return (void*)x.get() == (void*)y.get();
}
```

Every `c10::Type` can be compared, but it also has its own singleton instance to make equality checks cheaper (just check that the two singleton instances are the same, instead of comparing the full type objects). You can call `c10::getTypePtr<T>()` and get back the singleton instance of the pointer to that type. For `optional<T>`, that singleton instance lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871). When I was debugging, I noticed that `isOptionalTensorList` was returning false because the two pointers being compared were different, even though the actual type objects were equal. I was able to repro this with `functionalize()`, but I couldn't repro it directly with a test in core.

Changing to this code worked:

```
const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType;
const auto& expected_ty = c10::getTypePtr<c10::optional<at::Tensor>>();
// compare pointers, but if that fails compare the actual type objects
return expected_ty == ty || *expected_ty == *ty;
```

So why do we have more than one "static singleton" instance of the same type object? The singleton instance for `c10::optional` lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871), and is defined in a header file (it has to be, because it's a template). I think the problem is that function-local statics are duplicated across DSOs. We have a similar comment about the dispatcher singleton and why it needs to live in a .cpp file [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/dispatch/Dispatcher.h#L95). Basically, since functorch and pytorch core live in two separate `.so` files, we end up with a new static singleton instance for each library. We can't just move the singleton into a .cpp file though, since the function is templated - we want one singleton instance *per `optional<T>` type*. I ended up doing it by converting each `T` into a `TypePtr` object, and keeping a mapping from `TypePtr` objects of the inner type to the static singleton instances in the .cpp file.

### Testing?

I couldn't figure out how to repro this failure in core, since I think the `functionalize()` failure comes from the fact that we're invoking the `c10::getTypePtr` call from multiple loaded libraries (`libtorch_cpu.so` and `functorch/_C.so`).

I confirmed that with this patch, this code runs successfully (it would break before):

```
import torch
from functorch import make_fx
from functorch.experimental import functionalize

def f(x, y):
    return x[y]

t1 = make_fx(functionalize(f))(torch.arange(3), torch.ones(2, dtype=torch.long))
print("Functionalized:\n", t1.graph)
```

### Generalizing this fix to other container types?

This bug probably affects the other container `c10::Type`s, like List/Dict. I put this up as a PoC first, but if this seems like a reasonable fix then I can use the same fix for the other container types too.
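As a rough sketch of the registry idea described above (this is NOT PyTorch's actual implementation — `Type`, `TypePtr`, and `getContainerType` here are hypothetical stand-ins): the point is that the cache lives in exactly one translation unit instead of a templated header function, so every shared library sees the same map.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-ins for c10::Type / TypePtr, just to illustrate the pattern.
struct Type {
  std::string name;
};
using TypePtr = std::shared_ptr<Type>;

// This function is defined in exactly ONE .cpp file, so every shared library
// that links against it shares the same registry -- unlike a function-local
// static inside a templated header function, which each DSO instantiates
// separately.
TypePtr getContainerType(const std::string& identifier, const TypePtr& inner) {
  static std::mutex mu;
  static std::map<std::pair<std::string, std::string>, TypePtr> registry;
  std::lock_guard<std::mutex> guard(mu);
  auto key = std::make_pair(identifier, inner->name);
  auto it = registry.find(key);
  if (it == registry.end()) {
    it = registry
             .emplace(std::move(key),
                      std::make_shared<Type>(Type{identifier + "<" + inner->name + ">"}))
             .first;
  }
  return it->second;  // same pointer for every caller, from every library
}
```

The key is (container identifier, inner type), which mirrors the PR's "one singleton instance per `optional<T>` type" requirement.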
```cpp
if (optionalTypePtrs.find(inner) == optionalTypePtrs.end()) {
  TypePtr t = TypeFactory::create<OptionalType>(inner);
  optionalTypePtrs.emplace(inner, t);
}
```
yep 😬 will add (and it shouldn't matter if it's slow since it's behind a static singleton)
update - locking fixed the bug described above lol. With the lock, this now works:
```cpp
static const auto& call() {
  static auto inner_type = getTypePtr_<T>::call();
  static auto type = OptionalType::get(inner_type);
  return type;
}
```
I'm going to try to update the other container types to do something similar.
same as above -- move & be careful not to use after move in the return statement
@eellison do you think you'd be able to help review this?
OK, I'm A+ for this approach then. Here's the approve.
@ezyang hey, sorry - do you still need me to take a look?
I think a high level "this makes sense for JIT C++ types" nod would be a help. PR is not long. |
```cpp
// the type List<T>.
// The extra "identifier" argument is needed because we have multiple container types
// that all re-use this function (List<T>, array<T, N>, etc.)
static TypePtr get(std::string identifier, TypePtr inner);
```
Can we change the `std::string` identifier to an enum?
We can't make it an enum because of `array<T, N>` - we basically need a different output type for every different container template (`std::string` maybe gives too much freedom, but I figured it was simple).
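To make the `array<T, N>` point concrete, here is a minimal illustrative sketch (the helper name `arrayIdentifier` is made up, not from the PR): the identifier has to fold the length `N` into the key, which a fixed enum of container kinds couldn't express.

```cpp
#include <cstddef>
#include <string>

// Each std::array<T, N> needs its own cached TypePtr, so the string
// identifier encodes N -- "array2", "array3", ... An enum with a fixed
// set of values couldn't distinguish these. (Name/shape illustrative only.)
template <typename T, std::size_t N>
std::string arrayIdentifier() {
  return std::string("array") + std::to_string(N);
}
```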
```cpp
// otherwise we'll end up with one singleton instance per shared library.
// (Concatenating the length onto the end of the string because we want a unique
// type_ptr created for every std::array<T, N> type).
static auto type = ListType::get(std::string("array") + std::to_string(N), inner_type);
```
it is definitely possible to produce the string here at compile time, but maybe it doesn't matter enough.
```cpp
// This hashing is all hidden behind a static initializer so it
// doesn't have to be optimal
```
great, might be a good idea to use hash_combine then
Just saw that we have an at::hash_combine - I'll use that, thanks!
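For reference, a boost-style `hash_combine` (the same spirit as `at::hash_combine`; this exact definition is written from the common boost recipe and is an assumption, not PyTorch's code) looks like:

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Boost-style hash_combine: mixes a new hash value into an existing seed.
// (at::hash_combine may differ in detail; this is the classic recipe.)
inline std::size_t hash_combine(std::size_t seed, std::size_t value) {
  return seed ^ (value + 0x9e3779b9 + (seed << 6) + (seed >> 2));
}

// Example use for this PR's map: key on (container identifier, inner type hash).
std::size_t containerKeyHash(const std::string& identifier, std::size_t innerHash) {
  return hash_combine(std::hash<std::string>{}(identifier), innerHash);
}
```

Since the map is only consulted behind the static initializer, this doesn't need to be a particularly strong or fast hash.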
aten/src/ATen/core/type.cpp (outdated)
```cpp
TypePtr t = TypeFactory::create<OptionalType>(inner);
containerTypePtrs.emplace(inner, t);
```
good: `std::move(t)`
better: just eliminate `t` and create it directly
also, `std::move(inner)` in the emplace call
```cpp
if (containerTypePtrs.find(key) == containerTypePtrs.end()) {
  TypePtr t = ListType::create(inner);
  containerTypePtrs.emplace(key, t);
}
return containerTypePtrs[key];
```
same as above, but needs some extra care to avoid use-after-move, and this still does some extra refcount bumps we could potentially avoid
```cpp
auto it = containerTypePtrs.find(key);
if (it == containerTypePtrs.end()) {
  it = containerTypePtrs.emplace(std::move(key), ListType::create(std::move(inner))).first;
}
return it->second;
```
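To make the use-after-move hazard concrete, here is a compilable sketch of that pattern (the `Type`/`createListType` names are stand-ins, not PyTorch's): `emplace` returns a `pair<iterator, bool>`, so you need `.first`, and `key`/`inner` must not be touched after being moved from.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Type { std::string name; };
using TypePtr = std::shared_ptr<Type>;

static std::map<std::string, TypePtr> containerTypePtrs;

// Stand-in for ListType::create: takes the inner TypePtr by value so the
// caller can move into it.
TypePtr createListType(TypePtr inner) {
  return std::make_shared<Type>(Type{"List[" + inner->name + "]"});
}

const TypePtr& getOrCreate(std::string key, TypePtr inner) {
  auto it = containerTypePtrs.find(key);
  if (it == containerTypePtrs.end()) {
    // emplace returns pair<iterator, bool>; take .first to get the iterator.
    // After these moves, `key` and `inner` are moved-from and must not be read.
    it = containerTypePtrs.emplace(std::move(key), createListType(std::move(inner))).first;
  }
  // Safe: refers to the map entry, not the moved-from locals.
  return it->second;
}
```

Compared to the original `find` + `emplace` + `operator[]` version, this does a single lookup on the hit path and avoids the extra refcount bump from copying the `TypePtr` out of `operator[]`.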
This should fix `at::index.Tensor` for functionalization and address pytorch/torchdynamo#88. ### The bug we have a bunch of code that expects all of our `c10::Type` objects to have a unique singleton instance, but it turns out that this isn't true for container types (like `c10::optional`). It turns out that the `IValue::isOptionalTensorList` function I added earlier in #75716 doesn't always work: ``` const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType; return ty == c10::getTypePtr<c10::optional<at::Tensor>>(); ``` That equality check calls [this](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type_base.h#L586) code: ``` template <typename T, typename U> bool operator==(const SingletonOrSharedTypePtr<T>& x, const SingletonOrSharedTypePtr<U>& y) { return (void*)x.get() == (void*)y.get(); } ``` Every `c10::Type` can be compared, but it also has its own singleton instance to make equality checks cheaper (just check that the two singleton instances are the same, instead of comparing the full type objects). You can call `c10::getTypePtr<T>()`, and get back the singleton instance of the pointer to that type. For `optional<T>`, that singleton instance lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871). When I was debugging, I noticed that `isOptionalTensorList` was returning false because the two pointers being compared were different, but the actual type objects were equal. I was able to repro this with `functionalize()`, but I couldn't repro it directly with test in core. 
Changing to this code worked: ``` const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType; const auto& expected_ty == c10::getTypePtr<c10::optional<at::Tensor>>(); // compare pointers, but if that fails compare the actual type objects return expected_ty == ty || *expected_ty == *ty; ``` So why do we have more than one "static singleton" instance of the same type object? The singleton instance for `c10::optional` lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871), and is defined in a header file (it has to be because it's a template). I think that's because "function local statics are duplicated across DSO's". We have a similar comment about the dispatcher singleton and why it needs to live in a .cpp file [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/dispatch/Dispatcher.h#L95). Basically, since functorch and pytorch core live in two separate `.so` file, we'll end up with a new static singleton instance for each library. We can't just move the singleton into a cpp though, since the function is templated - we want one singleton instance *per `optional<T>` type*. I ended up doing it by converting each `T` into a `TypePtr` object, and keeping a mapping from `TypePtr` objects of the inner type to the static singleton instances in the .cpp file. ### Testing? I couldn't figure out how to repro this failure in core, since I think the `functionalize()` failure came from the fact that we're loading multiple libraries that we're invoking the `c10::getTypePtr` call from (`libtorch_cpu.so` and `functorch/_C.so`). 
I confirmed that with this patch, this code runs successfully (it would break before) ``` import torch from functorch import make_fx from functorch.experimental import functionalize def f(x, y): return x[y] t1 = make_fx(functionalize(f))(torch.arange(3), torch.ones(2, dtype=torch.long)) print("Functionalized:\n", t1.graph) ``` ### Generalizing this fix to other container types? This bug probably affects the other container `c10::Type`s, like List/Dict. I put this up as a PoC first, but if this seems like a reasonable fix then I can use the same fix for the other container types too. [ghstack-poisoned]
This should fix `at::index.Tensor` for functionalization and address pytorch/torchdynamo#88. ### The bug we have a bunch of code that expects all of our `c10::Type` objects to have a unique singleton instance, but it turns out that this isn't true for container types (like `c10::optional`). It turns out that the `IValue::isOptionalTensorList` function I added earlier in #75716 doesn't always work: ``` const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType; return ty == c10::getTypePtr<c10::optional<at::Tensor>>(); ``` That equality check calls [this](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type_base.h#L586) code: ``` template <typename T, typename U> bool operator==(const SingletonOrSharedTypePtr<T>& x, const SingletonOrSharedTypePtr<U>& y) { return (void*)x.get() == (void*)y.get(); } ``` Every `c10::Type` can be compared, but it also has its own singleton instance to make equality checks cheaper (just check that the two singleton instances are the same, instead of comparing the full type objects). You can call `c10::getTypePtr<T>()`, and get back the singleton instance of the pointer to that type. For `optional<T>`, that singleton instance lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871). When I was debugging, I noticed that `isOptionalTensorList` was returning false because the two pointers being compared were different, but the actual type objects were equal. I was able to repro this with `functionalize()`, but I couldn't repro it directly with test in core. 
Changing to this code worked: ``` const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType; const auto& expected_ty == c10::getTypePtr<c10::optional<at::Tensor>>(); // compare pointers, but if that fails compare the actual type objects return expected_ty == ty || *expected_ty == *ty; ``` So why do we have more than one "static singleton" instance of the same type object? The singleton instance for `c10::optional` lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871), and is defined in a header file (it has to be because it's a template). I think that's because "function local statics are duplicated across DSO's". We have a similar comment about the dispatcher singleton and why it needs to live in a .cpp file [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/dispatch/Dispatcher.h#L95). Basically, since functorch and pytorch core live in two separate `.so` file, we'll end up with a new static singleton instance for each library. We can't just move the singleton into a cpp though, since the function is templated - we want one singleton instance *per `optional<T>` type*. I ended up doing it by converting each `T` into a `TypePtr` object, and keeping a mapping from `TypePtr` objects of the inner type to the static singleton instances in the .cpp file. ### Testing? I couldn't figure out how to repro this failure in core, since I think the `functionalize()` failure came from the fact that we're loading multiple libraries that we're invoking the `c10::getTypePtr` call from (`libtorch_cpu.so` and `functorch/_C.so`). 
I confirmed that with this patch, this code runs successfully (it would break before) ``` import torch from functorch import make_fx from functorch.experimental import functionalize def f(x, y): return x[y] t1 = make_fx(functionalize(f))(torch.arange(3), torch.ones(2, dtype=torch.long)) print("Functionalized:\n", t1.graph) ``` ### Generalizing this fix to other container types? This bug probably affects the other container `c10::Type`s, like List/Dict. I put this up as a PoC first, but if this seems like a reasonable fix then I can use the same fix for the other container types too. [ghstack-poisoned]
@pytorchbot merge this please
Hey @bdhirsh. |
Summary: Pull Request resolved: #76085
Approved by: https://github.com/ezyang
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/40f3e85005b60dab26b89c6b09e3fb7cf686d2eb
Reviewed By: osalpekar
Differential Revision: D35938182
Pulled By: bdhirsh
fbshipit-source-id: f21f52255d49daac558c1c595dee2894405d14fd
…ton bug"

This should fix the remaining dynamo <> functionalization integration issue. I had a fix earlier for JIT containers not correctly having singleton instances [here](#76085), which fixed a bug in this code for functionalization:
```
def f(a, b):
    return a[b]

functionalize(f)(torch.arange(3), torch.ones(2, dtype=torch.long))
```
But apparently the following code is still broken:
```
def f(a, b):
    return torch.ops.aten.index(a, b)

functionalize(f)(torch.arange(3), torch.ones(2, dtype=torch.long))
```
Why? We have separate schema-parsing logic for the ops in `torch.ops.aten`, and that logic circumvented my fix, creating its own singleton instance for `Optional[Tensor]`. We can't test this in core for the same reason as the last fix, so companion functorch tests are here: pytorch/functorch#820