fix static init issue with JIT container types #76085

Closed

bdhirsh wants to merge 9 commits into gh/bdhirsh/211/base from gh/bdhirsh/211/head

Conversation

bdhirsh (Collaborator) commented Apr 20, 2022

This should fix at::index.Tensor for functionalization and address pytorch/torchdynamo#88.

The bug

We have a bunch of code that expects all of our c10::Type objects to have a unique singleton instance, but it turns out that this isn't true for container types (like c10::optional).

It turns out that the IValue::isOptionalTensorList function I added earlier in #75716 doesn't always work:

```
  const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType;
  return ty == c10::getTypePtr<c10::optional<at::Tensor>>();
```

That equality check calls [this](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type_base.h#L586) code:

```
template <typename T, typename U>
bool operator==(const SingletonOrSharedTypePtr<T>& x, const SingletonOrSharedTypePtr<U>& y) {
  return (void*)x.get() == (void*)y.get();
}
```

Every c10::Type can be compared structurally, but each type also has its own singleton instance to make equality checks cheaper: just check that the two pointers refer to the same singleton, instead of comparing the full type objects.

You can call c10::getTypePtr<T>() and get back a pointer to the singleton instance of that type. For optional<T>, that singleton instance lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871).

When I was debugging, I noticed that isOptionalTensorList was returning false because the two pointers being compared were different, even though the actual type objects were equal. I was able to repro this with functionalize(), but I couldn't repro it directly with a test in core. Changing to this code worked:

```
  const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType;
  const auto& expected_ty = c10::getTypePtr<c10::optional<at::Tensor>>();
  // compare pointers, but if that fails compare the actual type objects
  return expected_ty == ty || *expected_ty == *ty;
```

So why do we have more than one "static singleton" instance of the same type object? The singleton instance for c10::optional lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871), and is defined in a header file (it has to be, because it's a template).

I think that's because function-local statics are duplicated across DSOs. We have a similar comment about why the dispatcher singleton needs to live in a .cpp file [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/dispatch/Dispatcher.h#L95). Basically, since functorch and pytorch core live in two separate .so files, we end up with a new static singleton instance for each library.

We can't just move the singleton into a .cpp file, though, since the function is templated - we want one singleton instance per optional<T> type.

I ended up fixing it by converting each T into a TypePtr object, and keeping a mapping in the .cpp file from the TypePtr of the inner type to the static singleton instance of the container type.

Testing?

I couldn't figure out how to repro this failure in core, since I think the functionalize() failure comes from the fact that c10::getTypePtr is invoked from multiple loaded libraries (libtorch_cpu.so and functorch/_C.so).

I confirmed that with this patch, this code runs successfully (it would break before):

```
import torch
from functorch import make_fx
from functorch.experimental import functionalize

def f(x, y):
    return x[y]

t1 = make_fx(functionalize(f))(torch.arange(3), torch.ones(2, dtype=torch.long))
print("Functionalized:\n", t1.graph)
```

Generalizing this fix to other container types?

This bug probably affects the other container c10::Types, like List/Dict. I put this up as a PoC first, but if this seems like a reasonable fix then I can use the same fix for the other container types too.


facebook-github-bot (Contributor) commented Apr 20, 2022

CI failures summary (Dr. CI): as of commit c8b962f, no failures.

@bdhirsh bdhirsh requested a review from ezyang April 20, 2022 14:11
bdhirsh (Collaborator, Author) commented Apr 20, 2022

I tagged @ezyang for this fix, but I'm not sure if there's someone from JIT who would also want to look at this PR.

```
if (optionalTypePtrs.find(inner) == optionalTypePtrs.end()) {
  TypePtr t = TypeFactory::create<OptionalType>(inner);
  optionalTypePtrs.emplace(inner, t);
}
```
Contributor:

do you need locking here?

bdhirsh (Collaborator, Author):

yep 😬 will add (and it shouldn't matter if it's slow since it's behind a static singleton)

bdhirsh (Collaborator, Author):

update - locking fixed the bug described above lol. With the lock, this now works:

```
static const auto& call() {
    static auto inner_type = getTypePtr_<T>::call();
    static auto type = OptionalType::get(inner_type);
    return type;
}
```

I'm going to try to update the other container types to do something similar.

@ezyang ezyang requested a review from eellison April 21, 2022 02:00
ezyang (Contributor) commented Apr 21, 2022

@eellison do you think you'd be able to help review this?

ezyang (Contributor) commented Apr 21, 2022

OK, I'm A+ for this approach then. Here's the approve.

eellison (Contributor):

@ezyang hey sorry, do you still need me to take a look?

@bdhirsh bdhirsh changed the title [POC] fix static init issue with JIT container types fix static init issue with JIT container types Apr 21, 2022
ezyang (Contributor) commented Apr 21, 2022

I think a high level "this makes sense for JIT C++ types" nod would be a help. PR is not long.

eellison left a comment:

Took a look... I don't know if I'm the best person to review this, as this is a little deep in the weeds of C++ for me. Maybe @swolchok, who has also done a good amount of work optimizing TypePtrs?

```
// the type List<T>.
// The extra "identifier" argument is needed because we have multiple container types
// that all re-use this function (List<T>, array<T, N>, etc.)
static TypePtr get(std::string identifier, TypePtr inner);
```
Contributor:

Can we change the std::string identifier to an enum?

bdhirsh (Collaborator, Author):

We can't make it an enum because of array<T, N> - we basically need a different output type for every different container template (and std::string gives maybe too much freedom, but I figured it was simple)

@eellison eellison requested a review from swolchok April 22, 2022 00:36
bdhirsh added 2 commits April 22, 2022 05:45
```
// otherwise we'll end up with one singleton instance per shared library.
// (Concatenating the length onto the end of the string because we want a unique
// type_ptr created for every std::array<T, N> type).
static auto type = ListType::get(std::string("array") + std::to_string(N), inner_type);
```
Contributor:

it is definitely possible to produce the string here at compile time, but maybe it doesn't matter enough.

Comment on lines +18 to +19
```
// This hashing is all hidden behind a static initializer so it
// doesn't have to be optimal
```
Contributor:

great, might be a good idea to use hash_combine then

bdhirsh (Collaborator, Author):

Just saw that we have an at::hash_combine - I'll use that, thanks!

Comment on lines +267 to +268
```
TypePtr t = TypeFactory::create<OptionalType>(inner);
containerTypePtrs.emplace(inner, t);
```
Contributor:

good: std::move(t)
better: just eliminate t and create it directly

also, std::move(inner) in the emplace call

Comment on lines +280 to +284
```
if (containerTypePtrs.find(key) == containerTypePtrs.end()) {
  TypePtr t = ListType::create(inner);
  containerTypePtrs.emplace(key, t);
}
return containerTypePtrs[key];
```
Contributor:

same as above, but needs some extra care to avoid use-after-move, and this still does some extra refcount bumps we could potentially avoid

```
auto it = containerTypePtrs.find(key);
if (it == containerTypePtrs.end()) {
  it = containerTypePtrs.emplace(std::move(key), ListType::create(std::move(inner))).first;
}
return it->second;
```

```
if (optionalTypePtrs.find(inner) == optionalTypePtrs.end()) {
  TypePtr t = TypeFactory::create<OptionalType>(inner);
  optionalTypePtrs.emplace(inner, t);
}
```
Contributor:

same as above -- move & be careful not to use after move in the return statement

This should fix `at::index.Tensor` for functionalization and address pytorch/torchdynamo#88.

### The bug
we have a bunch of code that expects all of our `c10::Type` objects to have a unique singleton instance, but it turns out that this isn't true for container types (like `c10::optional`).

It turns out that the `IValue::isOptionalTensorList` function I added earlier in #75716 doesn't always work:
```
  const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType;
  return ty == c10::getTypePtr<c10::optional<at::Tensor>>();
```

That equality check calls [this](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type_base.h#L586) code:
```
template <typename T, typename U>
bool operator==(const SingletonOrSharedTypePtr<T>& x, const SingletonOrSharedTypePtr<U>& y) {
  return (void*)x.get() == (void*)y.get();
}
``` 

Every `c10::Type` can be compared, but it also has its own singleton instance to make equality checks cheaper (just check that the two singleton instances are the same, instead of comparing the full type objects).

You can call `c10::getTypePtr<T>()`, and get back the singleton instance of the pointer to that type. For `optional<T>`, that singleton instance lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871).

When I was debugging, I noticed that `isOptionalTensorList` was returning false because the two pointers being compared were different, but the actual type objects were equal. I was able to repro this with `functionalize()`, but I couldn't repro it directly with test in core. Changing to this code worked:
```
  const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType;
  const auto& expected_ty = c10::getTypePtr<c10::optional<at::Tensor>>();
  // Compare pointers first; if that fails, compare the actual type objects.
  return expected_ty == ty || *expected_ty == *ty;
```

So why do we have more than one "static singleton" instance of the same type object? The singleton instance for `c10::optional` lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871), and is defined in a header file (it has to be because it's a template).

I think that's because function-local statics are duplicated across DSOs. We have a similar comment about the dispatcher singleton and why it needs to live in a .cpp file [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/dispatch/Dispatcher.h#L95). Basically, since functorch and PyTorch core live in two separate `.so` files, we end up with a new static singleton instance for each library.

We can't just move the singleton into a .cpp file though, since the function is templated - we want one singleton instance *per `optional<T>` type*.

I ended up doing it by converting each `T` into a `TypePtr` object, and keeping a mapping from `TypePtr` objects of the inner type to the static singleton instances in the .cpp file.

### Testing?

I couldn't figure out how to repro this failure in core, since I think the `functionalize()` failure comes from the fact that we invoke `c10::getTypePtr` from multiple loaded libraries (`libtorch_cpu.so` and `functorch/_C.so`).

I confirmed that with this patch, this code runs successfully (it would have broken before):

```
import torch
from functorch import make_fx
from functorch.experimental import functionalize

def f(x, y):
    return x[y]

t1 = make_fx(functionalize(f))(torch.arange(3), torch.ones(2, dtype=torch.long))
print("Functionalized:\n", t1.graph)
```


### Generalizing this fix to other container types?

This bug probably affects the other container `c10::Type`s, like `List`/`Dict`. I put this up as a PoC first, but if this seems like a reasonable fix then I can apply the same fix to the other container types too.




[ghstack-poisoned]
@bdhirsh
Copy link
Collaborator Author

bdhirsh commented Apr 25, 2022

@pytorchbot merge this please

@github-actions
Copy link
Contributor

Hey @bdhirsh.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot pushed a commit that referenced this pull request Apr 26, 2022
Summary:
Pull Request resolved: #76085

Approved by: https://github.com/ezyang

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/40f3e85005b60dab26b89c6b09e3fb7cf686d2eb

Reviewed By: osalpekar

Differential Revision: D35938182

Pulled By: bdhirsh

fbshipit-source-id: f21f52255d49daac558c1c595dee2894405d14fd
@facebook-github-bot facebook-github-bot deleted the gh/bdhirsh/211/head branch April 29, 2022 14:17
bdhirsh added a commit that referenced this pull request May 19, 2022
…ton bug"

This should fix the remaining dynamo <> functionalization integration issue.


I had a fix earlier for JIT containers not correctly having singleton instances [here](#76085), which fixed a bug in this code for functionalization:
```
def f(a, b):
    return a[b]

functionalize(f)(torch.arange(3), torch.ones(2, dtype=torch.long))
```

But apparently the following code is still broken:
```
def f(a, b):
    return torch.ops.aten.index(a, b)

functionalize(f)(torch.arange(3), torch.ones(2, dtype=torch.long))
```

Why? We have separate schema parsing logic for the ops in `torch.ops.aten`, and that logic circumvented my fix, creating its own singleton instance for `Optional[Tensor]`.


We can't test this in core for the same reason as the last fix, so companion functorch tests here: pytorch/functorch#820




[ghstack-poisoned]