
teach ivalue about List[Optional[Tensor]], fix fallbacks#75716

Closed
bdhirsh wants to merge 7 commits into gh/bdhirsh/203/base from gh/bdhirsh/203/head

Conversation


@bdhirsh bdhirsh commented Apr 13, 2022

IValues treat the `[Optional[Tensor]]` type a bit weirdly:

- For a plain `c10::optional<Tensor>`, the optional-ness gets folded away into an undefined tensor, so boxed fallbacks don't need to handle the `c10::optional<Tensor>` case directly ([ivalue code](https://github.com/pytorch/pytorch/blob/c2d5f6a5a43ae3fd8ce78c88964c1357b94537b3/aten/src/ATen/core/ivalue_inl.h#L2020)).
- For `c10::List<c10::optional<Tensor>>`, though, boxed fallbacks need to handle it directly.

I added APIs for checking / retrieving an `OptionalTensorList` from an `IValue`, and updated the functionalization + lazy_tensor_cpu boxed fallbacks to handle them correctly.

I also added a test for `index_put_` (which used to fail, because one of its arguments is a list of optional tensors).

Stack from ghstack:

Differential Revision: D35705376


facebook-github-bot commented Apr 13, 2022


💊 CI failures summary and remediations

As of commit 6107ac8 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@bdhirsh bdhirsh requested review from ezyang and zou3519 April 14, 2022 20:44
```
auto eager_ivalue = c10::IValue(c10::List<c10::optional<at::Tensor>>(
    to_eager(ivalue.toOptionalTensorVector(), device_type)));
(*stack)[arguments_begin + idx] = std::move(eager_ivalue);
opt_tensorlist_args.push_back(ivalue.toOptionalTensorList());
```
@bdhirsh (Collaborator, Author) commented:

cc @wconstab, I think this should fix index_put_ for LTC

@wconstab wconstab Apr 18, 2022


I forgot (if I knew) what the problem with index_put_ was for LTC. Is this fixing something that is broken when index_put falls back? Your explanation in the PR makes sense, and the change lgtm. thanks!

```
TORCH_API bool isFunctionalTensor(const c10::ArrayRef<Tensor> t_list);

TORCH_API Tensor to_functional_tensor(const Tensor& tensor);
TORCH_API c10::optional<Tensor> to_functional_tensor(const c10::optional<Tensor>& tensor);
```
A Contributor commented:

How does C++ pick which overload to use if you pass in to_functional_tensor(tensor)? (this doesn't really matter because the code compiles but I'm curious)

```
TORCH_API Tensor to_functional_tensor(const Tensor& tensor);
TORCH_API c10::optional<Tensor> to_functional_tensor(const c10::optional<Tensor>& tensor);
TORCH_API c10::List<Tensor> to_functional_tensor(const c10::List<Tensor>& t_list);
TORCH_API c10::List<c10::optional<Tensor>> to_functional_tensor(const c10::List<c10::optional<Tensor>>& t_list);
```
A Contributor commented:

(no action required) We really need something like a c++ tree_map for all dispatcher types

```
} else if (ivalue.isOptionalTensorList()) {
  any_tensor_inputs = true;
  auto opt_tensors = ivalue.toOptionalTensorList();
  if (at::functionalization::impl::isFunctionalTensor(opt_tensors)) {
```
A Contributor commented:

Is there an invariant that either all or none of these must be functional tensors, or can there be some hybrid mix?

Comment on lines +75 to +76
```
// We can't just call _to_eager() on the entire list of Tensors because it
// will break on undefined tensors. Separate out undefined tensors first.
```
A Contributor commented:

I'm confused -- `std::vector<c10::optional<at::Tensor>>` probably came from an IValue that is `List[Optional[Tensor]]`, but it can contain undefined tensors?

A Contributor replied:

it's just copy pasted from the impl above lol

A Contributor replied:

I think it should be possible to assume that inside optional they are never undefined, and things should work

@ezyang ezyang requested a review from ysiraichi April 15, 2022 01:09
```
  return false;
}
const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType;
return ty == c10::getTypePtr<c10::optional<at::Tensor>>();
```
@ezyang (Contributor) commented Apr 15, 2022:

How come isListOf can't be used here? Oh, because it's a compound type.

@ezyang (Contributor) left a review:

Okey dokey. Check Richard's comments.

bdhirsh added 2 commits April 15, 2022 15:26

bdhirsh commented Apr 17, 2022

@bdhirsh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented:

@pytorchbot merge this

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot (Collaborator) commented:

Merge failed due to `git -C /home/runner/work/pytorch/pytorch push origin master` returning non-zero exit code 1:

```
To https://github.com/pytorch/pytorch
 ! [remote rejected] master -> master (cannot lock ref 'refs/heads/master': is at 4c7b4b5 but expected 2602a5e)
error: failed to push some refs to 'https://github.com/pytorch/pytorch'
```

Raised by https://github.com/pytorch/pytorch/actions/runs/2185815282

facebook-github-bot pushed a commit that referenced this pull request Apr 18, 2022
Summary:
Pull Request resolved: #75716

Test Plan: Imported from OSS

Reviewed By: zhxchen17

Differential Revision: D35705376

Pulled By: bdhirsh

fbshipit-source-id: 217df0c059af01794bcc034642b8faff43539878
malfet pushed a commit that referenced this pull request Apr 20, 2022
Pull Request resolved: #75716

Approved by: https://github.com/ezyang

(cherry picked from commit 204df13)
bdhirsh added a commit that referenced this pull request Apr 20, 2022
This should fix `at::index.Tensor` for functionalization and address pytorch/torchdynamo#88.

### The bug
We have a bunch of code that expects all of our `c10::Type` objects to have a unique singleton instance, but it turns out this isn't true for container types (like `c10::optional`).

It turns out that the `IValue::isOptionalTensorList` function I added earlier in #75716 doesn't always work:
```
  const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType;
  return ty == c10::getTypePtr<c10::optional<at::Tensor>>();
```

That equality check calls [this](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type_base.h#L586) code:
```
template <typename T, typename U>
bool operator==(const SingletonOrSharedTypePtr<T>& x, const SingletonOrSharedTypePtr<U>& y) {
  return (void*)x.get() == (void*)y.get();
}
``` 

Every `c10::Type` can be compared, but it also has its own singleton instance to make equality checks cheaper (just check that the two singleton instances are the same, instead of comparing the full type objects).

You can call `c10::getTypePtr<T>()`, and get back the singleton instance of the pointer to that type. For `optional<T>`, that singleton instance lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871).

When I was debugging, I noticed that `isOptionalTensorList` was returning false because the two pointers being compared were different, even though the actual type objects were equal. I was able to repro this with `functionalize()`, but I couldn't repro it directly with a test in core. Changing to this code worked:
```
  const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType;
  const auto& expected_ty = c10::getTypePtr<c10::optional<at::Tensor>>();
  // compare pointers first; if that fails, compare the actual type objects
  return expected_ty == ty || *expected_ty == *ty;
```

So why do we have more than one "static singleton" instance of the same type object? The singleton instance for `c10::optional` lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871), and is defined in a header file (it has to be because it's a template).

I think that's because function-local statics are duplicated across DSOs. We have a similar comment about the dispatcher singleton and why it needs to live in a .cpp file [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/dispatch/Dispatcher.h#L95). Basically, since functorch and pytorch core live in two separate `.so` files, we end up with a new static singleton instance in each library.

We can't just move the singleton into a .cpp file, though, since the function is templated - we want one singleton instance *per `optional<T>` type*.

I ended up converting each `T` into a `TypePtr` object, and keeping a mapping in the .cpp file from the inner type's `TypePtr` to the static singleton instance.

### Testing?

I couldn't figure out how to repro this failure in core; I think the `functionalize()` failure comes from invoking `c10::getTypePtr` from two different libraries (`libtorch_cpu.so` and `functorch/_C.so`).

I confirmed that with this patch, this code runs successfully (it would break before):

```
import torch
from functorch import make_fx
from functorch.experimental import functionalize

def f(x, y):
    return x[y]

t1 = make_fx(functionalize(f))(torch.arange(3), torch.ones(2, dtype=torch.long))
print("Functionalized:\n", t1.graph)
```


### Generalizing this fix to other container types?

This bug probably affects the other container `c10::Type`s, like `List`/`Dict`. I put this up as a PoC first; if this seems like a reasonable fix, I can apply it to the other container types too.




[ghstack-poisoned]
bdhirsh added a commit that referenced this pull request Apr 21, 2022
bdhirsh added a commit that referenced this pull request Apr 21, 2022
bdhirsh added a commit that referenced this pull request Apr 22, 2022
bdhirsh added a commit that referenced this pull request Apr 22, 2022
@facebook-github-bot facebook-github-bot deleted the gh/bdhirsh/203/head branch April 22, 2022 14:17
bdhirsh added a commit that referenced this pull request Apr 25, 2022
This should fix `at::index.Tensor` for functionalization and address pytorch/torchdynamo#88.

### The bug
we have a bunch of code that expects all of our `c10::Type` objects to have a unique singleton instance, but it turns out that this isn't true for container types (like `c10::optional`).

It turns out that the `IValue::isOptionalTensorList` function I added earlier in #75716 doesn't always work:
```
  const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType;
  return ty == c10::getTypePtr<c10::optional<at::Tensor>>();
```

That equality check calls [this](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type_base.h#L586) code:
```cpp
template <typename T, typename U>
bool operator==(const SingletonOrSharedTypePtr<T>& x, const SingletonOrSharedTypePtr<U>& y) {
  return (void*)x.get() == (void*)y.get();
}
``` 

Every `c10::Type` can be compared structurally, but each type also has its own singleton instance so that equality checks can be cheap (just check that the two singleton pointers are the same, instead of comparing the full type objects).
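The "identity check first, structural comparison as fallback" pattern can be illustrated generically (a hedged Python sketch with invented names, not PyTorch's actual code):

```python
# Sketch of the comparison pattern: try the cheap pointer/identity check
# first; only fall back to structural equality when the fast path fails.
# When singletons are truly unique, the fast path is all you ever need;
# duplicated singletons (e.g. one per DSO) make the fallback necessary.
def types_equal(a, b):
    if a is b:          # fast path: same singleton instance
        return True
    return a == b       # slow path: compare the actual type objects

class Ty:
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        return isinstance(other, Ty) and self.name == other.name
    def __hash__(self):
        return hash(self.name)
```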

You can call `c10::getTypePtr<T>()`, and get back the singleton instance of the pointer to that type. For `optional<T>`, that singleton instance lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871).

When I was debugging, I noticed that `isOptionalTensorList` was returning false because the two pointers being compared were different, but the actual type objects were equal. I was able to repro this with `functionalize()`, but I couldn't repro it directly with a test in core. Changing to this code worked:
```cpp
  const auto& ty = static_cast<detail::ListImpl*>(payload.u.as_intrusive_ptr)->elementType;
  const auto& expected_ty = c10::getTypePtr<c10::optional<at::Tensor>>();
  // compare pointers, but if that fails compare the actual type objects
  return expected_ty == ty || *expected_ty == *ty;
```

So why do we have more than one "static singleton" instance of the same type object? The singleton instance for `c10::optional` lives [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/jit_type.h#L1871), and is defined in a header file (it has to be because it's a template).

I think that's because "function-local statics are duplicated across DSOs". We have a similar comment about the dispatcher singleton and why it needs to live in a .cpp file [here](https://github.com/pytorch/pytorch/blob/e20793b05426284d973ac25e563046c037b4e4b2/aten/src/ATen/core/dispatch/Dispatcher.h#L95). Basically, since functorch and pytorch core live in two separate `.so` files, we end up with a new static singleton instance for each library.

We can't just move the singleton into a .cpp file though, since the function is templated: we want one singleton instance *per `optional<T>` type*.

I ended up doing it by converting each `T` into a `TypePtr` object, and keeping a mapping from `TypePtr` objects of the inner type to the static singleton instances in the .cpp file.

### Testing?

I couldn't figure out how to repro this failure in core, since I think the `functionalize()` failure came from the fact that we invoke `c10::getTypePtr` from multiple loaded libraries (`libtorch_cpu.so` and `functorch/_C.so`).

I confirmed that with this patch, the following code runs successfully (it would break before):

```python
import torch
from functorch import make_fx
from functorch.experimental import functionalize

def f(x, y):
    return x[y]

t1 = make_fx(functionalize(f))(torch.arange(3), torch.ones(2, dtype=torch.long))
print("Functionalized:\n", t1.graph)
```


### Generalizing this fix to other container types?

This bug probably affects the other container `c10::Type`s, like List/Dict. I put this up as a PoC first, but if this seems like a reasonable fix then I can use the same fix for the other container types too.




[ghstack-poisoned]
bdhirsh added a commit that referenced this pull request Apr 25, 2022