
Restore storage on meta tensors; increase meta coverage#53973

Closed
ezyang wants to merge 16 commits into gh/ezyang/945/base from gh/ezyang/945/head

Conversation

@ezyang
Contributor

@ezyang ezyang commented Mar 14, 2021

Stack from ghstack:

Two parts to this PR; I had to put them together because adding support for X causes more test code to be exercised, which in turn may require a fix for Y.

The first part is restoring the concept of storage to meta tensors. Previously, meta tensors had a nullptr storage (e.g., `meta_tensor.storage()` is an error.) As I was increasing the coverage of meta tensors, I started running into test cases (specifically memory overlap tests) that were failing because not having storage meant I couldn't check for memory overlap. After some discussion, we decided that it would make sense for meta tensors to model this as well (we already model strides, so getting accurate view information also seems useful). This PR does that by:

  • Rewrite all of the factory functions in MetaTensor.cpp to use the generic versions (which are very carefully written to not actually poke at the data pointer, so everything works out). The key idea here is we give meta tensors a special allocator, MetaAllocator, which always returns a nullptr even if you ask for a nonzero number of bytes. resize_ is also made generic; the normal variant can be used directly rather than having to instruct it to avoid resizing storage
  • Turn on memory overlap checking in TensorIterator even for meta tensors
  • Although meta tensors now have storage, the concept of meta storage is NOT exposed to Python land (as it would imply I would have to codegen MetaFloatStorage, MetaDoubleStorage, etc. classes). So `x.storage()` still raises an error and I have a kludge in `__deepcopy__` to break storage sharing upon deep copy (this is wrong, but no tests exercise this at the moment).
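The MetaAllocator idea above can be sketched in a few lines of Python. This is a toy model for illustration only (the real implementation is C++ in ATen); the class and attribute names here just mirror the PR's terminology:

```python
class Storage:
    """Toy storage: tracks a byte size, may or may not have real memory."""
    def __init__(self, data_ptr, nbytes):
        self.data_ptr = data_ptr
        self.nbytes = nbytes

class MetaAllocator:
    """Toy model of the PR's MetaAllocator: it hands out storages whose
    data pointer is always null, even for a nonzero number of bytes, so
    generic factory code can run without touching real memory."""
    def allocate(self, nbytes):
        return Storage(data_ptr=None, nbytes=nbytes)

storage = MetaAllocator().allocate(1024)
assert storage.data_ptr is None   # no real memory is ever backed...
assert storage.nbytes == 1024     # ...but sizes (and hence overlap) are modeled
```

Because the size is still tracked, anything that only inspects sizes, strides, and storage identity (like the memory overlap checks mentioned above) works unchanged.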

The second part is adding more support for the most used functions in the test suite.

  • Inplace operations have very simple meta functions. I added `fill_`, `zero_`, `random_`, `uniform_` and `normal_`. In the case of random, I take advantage of pbelevich's templates for defining random kernels, so that I can reuse the common scaffolding, and then just register a noop stub that actually does the RNG. (Look, another structured kernels tiny variant!)
  • `copy_` is now implemented. Copying into a meta tensor is always OK, but copying out of a meta tensor raises an error (as we don't know what the "correct" data to copy out is in this case)
  • `empty_strided` usage from structured kernels now is implemented (TBH, this could have been done as soon as `empty_strided` was added)
  • Meta was missing in a few places in TensorOptions/DispatchKey utility functions, so I added them
  • Autograd engine now correctly homes meta tensors with CPU tensors (they have -1 device index so CUDA queues wouldn't work anyway)
  • `apply_`, `map_` and `map2_` are special cased to no-op on meta tensor self. These count as inplace operations too but they are implemented a little differently.
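The `copy_` rule above (into meta is fine, out of meta is an error) can be modeled in a few lines of toy Python. This is an illustrative sketch, not the ATen code; the `Tensor`/`copy_` names here are stand-ins:

```python
class Tensor:
    """Minimal stand-in for a tensor that only knows its device."""
    def __init__(self, device):
        self.device = device
    def is_meta(self):
        return self.device == "meta"

def copy_(dst, src):
    # Copying *into* a meta tensor is a no-op: there is no data to write,
    # so we just return the destination (similar to other inplace meta ops).
    if dst.is_meta():
        return dst
    # Copying *out of* a meta tensor is an error: there is no "correct"
    # data to materialize on the destination.
    if src.is_meta():
        raise RuntimeError("Cannot copy out of meta tensor")
    # ...a real implementation would move bytes here...
    return dst
```

Note the asymmetry: the destination check has to come first, since a meta-to-meta copy should succeed as a no-op rather than raise.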

Getting more meta function support triggers a number of bugs in the test suite, which I then fix:

  • Linear algebra functions sometimes don't report NotImplementedError because they get swallowed by catch-all try blocks. This is tracked in #53739
  • dlpack obviously doesn't work with meta tensors, I just disabled the test

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D27036572

@facebook-github-bot
Contributor

facebook-github-bot commented Mar 14, 2021

💊 CI failures summary and remediations

As of commit 8b44643 (more details on the Dr. CI page):


  • 7/7 failures possibly* introduced in this PR
    • 1/7 non-scanned failure(s)

6 failures not recognized by patterns:

Job Step Action
GitHub Actions quick-checks Unknown 🔁 rerun
GitHub Actions clang-tidy Unknown 🔁 rerun
GitHub Actions flake8-py3 Unknown 🔁 rerun
GitHub Actions mypy Unknown 🔁 rerun
GitHub Actions clang-format Unknown 🔁 rerun
GitHub Actions test Unknown 🔁 rerun

This comment was automatically generated by Dr. CI. Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

ezyang added a commit that referenced this pull request Mar 14, 2021
ghstack-source-id: 80c5a44
Pull Request resolved: #53973
ezyang added a commit that referenced this pull request Mar 14, 2021
ghstack-source-id: fe0c1c4
Pull Request resolved: #53973
ezyang added a commit that referenced this pull request Mar 14, 2021
ghstack-source-id: 21d9f7b
Pull Request resolved: #53973
ezyang added a commit that referenced this pull request Mar 15, 2021
ghstack-source-id: a705a13
Pull Request resolved: #53973
ezyang added a commit that referenced this pull request Mar 17, 2021
ghstack-source-id: c99f210
Pull Request resolved: #53973
@ezyang ezyang requested review from bdhirsh, bhosmer and pbelevich March 17, 2021 15:11
@ezyang
Contributor Author

ezyang commented Mar 17, 2021

After this PR, here is the composition of reasons why tests with "meta" in their name skip:

{'CUDA not available': 4,
 'CUDA not found': 12,
 "Doesn't run on meta": 1985,
 'Only runs on cpu': 417,
 'Only runs on cuda': 127,
 'Requires SciPy': 4,
 'SciPy not found': 19,
 'Scipy not found': 3,
 'Scipy not found or older than 1.4.1': 1,
 'Scipy required for the test.': 82,
 'Scipy v1.0 and/or numpy not found': 5,
 'See https://github.com/pytorch/pytorch/pull/32720': 2,
 'Skipped!': 317,
 'Skipped! Inplace gradcheck marked to skip.': 121,
 'Skipped! Inplace gradgradcheck marked to skip.': 121,
 "Skipped! Op doesn't support out= kwarg.": 24,
 'Skipped! See https://github.com/pytorch/pytorch/issues/44489': 6,
 'Skipped! Unsupported dtypes for Sparse': 5,
 'Skipped! autograd not supported': 20,
 'Skipped! complex grad tests marked to skip.': 4,
 'This op does not handle extremal values': 13,
 'This op does not handle large values': 27,
 'Unknown device type': 5,
 'aten::_cat': 14,
 'aten::_cdist_forward': 3,
 'aten::_compute_linear_combination': 2,
 'aten::_copy_from': 3941,
 'aten::_cummax_helper': 1,
 'aten::_cumprod': 1,
 'aten::_cumsum': 1,
 'aten::_foreach_add.List': 9,
 'aten::_foreach_add.Scalar': 36,
 'aten::_foreach_add.ScalarList': 45,
 'aten::_linalg_qr_helper': 25,
 'aten::_log_softmax': 2,
 'aten::_logcumsumexp': 1,
 'aten::_lu_with_info': 5,
 'aten::_pdist_forward': 2,
 'aten::_svd_helper': 144,
 'aten::_symeig_helper': 1,
 'aten::abs.out': 37,
 'aten::acos': 10,
 'aten::acosh': 30,
 'aten::addcdiv.out': 2,
 'aten::addcmul.out': 16,
 'aten::addmm': 2,
 'aten::amax.out': 6,
 'aten::amin.out': 7,
 'aten::angle': 24,
 'aten::arange.start_out': 25,
 'aten::as_strided': 1314,
 'aten::asin': 36,
 'aten::asinh': 30,
 'aten::atan': 30,
 'aten::atan2': 1,
 'aten::atanh': 30,
 'aten::bitwise_not.out': 24,
 'aten::clamp': 11,
 'aten::clamp_min.out': 7,
 'aten::clone': 527,
 'aten::col2im': 1,
 'aten::copysign.Tensor': 36,
 'aten::cos': 24,
 'aten::diag.out': 7,
 'aten::div.Tensor': 10,
 'aten::div.Tensor_mode': 16,
 'aten::div_.Tensor': 1,
 'aten::elu': 1,
 'aten::eq.Scalar': 25,
 'aten::eq.Tensor': 16,
 'aten::exp': 24,
 'aten::exp2': 24,
 'aten::expm1': 24,
 'aten::exponential_': 1,
 'aten::eye.m_out': 53,
 'aten::fill_.Scalar': 233,
 'aten::flip': 18,
 'aten::ge.Scalar': 1,
 'aten::geqrf': 1,
 'aten::glu': 1,
 'aten::gt.Scalar': 26,
 'aten::im2col': 1,
 'aten::index.Tensor': 1,
 'aten::isnan': 3,
 'aten::kthvalue.values': 1,
 'aten::le.Scalar': 1,
 'aten::lerp.Tensor': 1,
 'aten::lgamma': 1,
 'aten::linalg_slogdet': 4,
 'aten::linspace.out': 32,
 'aten::log': 24,
 'aten::log10': 24,
 'aten::log2': 24,
 'aten::logical_not.out': 14,
 'aten::logspace.out': 18,
 'aten::lt.Scalar': 2,
 'aten::max': 7,
 'aten::max.dim': 6,
 'aten::max_pool3d_with_indices': 1,
 'aten::maximum': 13,
 'aten::mean': 2,
 'aten::median': 2,
 'aten::min': 11,
 'aten::min.dim': 6,
 'aten::minimum': 6,
 'aten::mm': 3,
 'aten::mul.Tensor': 5006,
 'aten::mul.out': 48,
 'aten::mul_.Tensor': 12,
 'aten::neg': 25,
 'aten::nll_loss_forward': 1,
 'aten::nonzero': 11,
 'aten::nonzero.out': 1,
 'aten::norm.ScalarOpt_dim': 4,
 'aten::pow.Scalar': 1,
 'aten::put_': 1,
 'aten::randperm.generator_out': 21,
 'aten::range.out': 2,
 'aten::reciprocal': 24,
 'aten::remainder.Tensor': 1,
 'aten::replication_pad1d': 1,
 'aten::replication_pad2d': 1,
 'aten::rsqrt': 24,
 'aten::sigmoid': 3,
 'aten::signbit.out': 20,
 'aten::sinc': 24,
 'aten::sort': 11,
 'aten::sqrt': 34,
 'aten::std_mean': 1,
 'aten::std_mean.dim': 2,
 'aten::sub.Tensor': 209,
 'aten::sum': 1,
 'aten::take': 1,
 'aten::tan': 24,
 'aten::tanh': 24,
 'aten::threshold': 1,
 'aten::trace': 1,
 'aten::triu.out': 2,
 'aten::unfold': 1,
 'aten::var.dim': 1,
 'aten::var_mean': 1,
 'aten::var_mean.dim': 2,
 'aten::view': 236,
 'aten::view_as_real': 2,
 'aten::xlogy.Scalar_Other': 1,
 'aten::xlogy.Tensor': 1,
 'fewer than 2 devices detected': 13,
 'not_implemented: {exc_value}': 102,
 'real and imag not implemented for complex': 1,
 'sparse': 1,
 "test doesn't work with meta tensors": 22,
 'test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test': 66,
 'test require SciPy, but SciPy not found': 30,
 'torch.storage': 833,
 'unconditional skip': 1}
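A tally like the one above can be produced with `collections.Counter` over the skip reasons collected while running the suite (a hypothetical sketch; the actual collection mechanism in the test harness may differ, and the reasons below are just a sample):

```python
from collections import Counter
from pprint import pprint

# Hypothetical flat list of skip-reason strings gathered during a test run.
skip_reasons = [
    "aten::mul.Tensor",
    "aten::mul.Tensor",
    "torch.storage",
    "Doesn't run on meta",
]

# Counter collapses repeats into reason -> count; pprint sorts keys,
# giving output in the same shape as the dict above.
pprint(dict(Counter(skip_reasons)))
```

Note that near-duplicate keys like 'SciPy not found' vs. 'Scipy not found' in the real tally come from inconsistent skip messages in the test suite itself, not from the counting.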

Comment thread on aten/src/ATen/native/Distributions.cpp (outdated)
ezyang added a commit that referenced this pull request Mar 25, 2021
ghstack-source-id: e158edb
Pull Request resolved: #53973
@ezyang ezyang changed the title Add some trivial meta functions for commonly used inplace operations Restore storage on meta tensors; increase meta coverage Mar 25, 2021
@ezyang
Contributor Author

ezyang commented Mar 25, 2021

@bdhirsh I added some more "heft" to this PR, so you might want to take a second look, at least at the updated description.

if (!config.check_mem_overlap_) {
  return;
}
if (is_meta_) {
Collaborator

@bdhirsh bdhirsh Mar 25, 2021

Just curious, do you want to take this out for more semantic reasons or perf reasons?

Semantic: We no longer have to special case meta tensors for memory overlap, it "just works".

Perf: We get one less branch in happy-path TensorIterator code, although the benefit is probably minimal. Conversely, if we start to care about meta tensor performance we might want to add this back in to exit early.

Contributor Author

It's semantic reasons!
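The "just works" point here is that once meta tensors carry a real storage with sizes, strides, and offsets, the generic overlap check needs only metadata, never data. A much-simplified Python model of that idea for contiguous views (the real TensorIterator logic handles arbitrary strides and is more involved; the dict-based "tensor" here is purely illustrative):

```python
def contiguous_span(offset, numel, itemsize):
    """Byte range [start, end) covered by a contiguous view."""
    start = offset * itemsize
    return start, start + numel * itemsize

def may_overlap(a, b):
    """Conservative overlap test for two contiguous views.
    a, b: dicts with 'storage', 'offset', 'numel', 'itemsize'.
    Only metadata is consulted -- no data pointer is ever dereferenced,
    which is why the same check can run on meta tensors."""
    if a["storage"] is not b["storage"]:
        return False  # distinct storages cannot alias
    a0, a1 = contiguous_span(a["offset"], a["numel"], a["itemsize"])
    b0, b1 = contiguous_span(b["offset"], b["numel"], b["itemsize"])
    return a0 < b1 and b0 < a1  # half-open interval intersection

storage = object()  # stands in for a meta storage: identity matters, data doesn't
x = {"storage": storage, "offset": 0, "numel": 8, "itemsize": 4}
y = {"storage": storage, "offset": 4, "numel": 8, "itemsize": 4}
assert may_overlap(x, y)  # byte ranges [0, 32) and [16, 48) intersect
```

Storage identity plus byte ranges is exactly the information meta tensors lacked before this PR, which is why the overlap tests used to fail on them.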

// Copies into meta self are OK and just ignored (similar to inplace)
if (self.is_meta()) {
  // TODO: need to see if there is extra error checking needed
  return self;
Collaborator

I see some specialization stuff going on above this code for FBGEMM. Do we need to worry about meta tensors being used with FBGEMM enabled? It might be worth moving this check further up in case people do funky early-returns closer to the top of the function 😛

Contributor Author

It's safe because FBGEMM only applies when it's CPU. But even better would be to actually get dispatch keys working correctly on this function.

Comment thread on aten/src/ATen/native/MetaTensor.cpp (outdated)
// therefore meta does not track it (this is not a forced choice, but it's
// the choice we made)

check_size_nonnegative(size);
Collaborator

nit: I think you can kill this, since you end up calling empty_meta -> empty_generic -> check_size_nonnegative further down

Comment thread on torch/csrc/utils/tensor_apply.cpp (outdated)
Tensor & map2_(Tensor & self, const Tensor & x_, const Tensor & y_, PyObject* fn) {
  if (self.is_meta()) {
    return self; // Just skip
  }
Collaborator

Do we care about meta performing accurate error reporting in the below cases, like when x_ and y_ have different dtypes? These tensor_apply methods seem like special snowflakes, so maybe we don't care as much about total meta/impl parity.

Contributor Author

Oh, I didn't even check this. Yes, you're right, ideally we would have matching error checking.

@bdhirsh
Collaborator

bdhirsh commented Mar 25, 2021

Thanks! Just left some minor comments but nothing worth blocking the PR over.

// CPU ready queue is per GraphTask, but CUDA device ready queues are shared across all graph tasks
auto Engine::ready_queue(std::shared_ptr<ReadyQueue> cpu_ready_queue, at::Device device) -> std::shared_ptr<ReadyQueue> {
-  if (device.type() == at::kCPU) {
+  if (device.type() == at::kCPU || device.type() == at::DeviceType::Meta) {
Collaborator

Wait you send meta Tensors through the autograd????

Contributor Author

There are some device-generic autograd tests so... yes :)

Collaborator

So that means that you can use that to run a full forward/backward without doing any actual computation?

Contributor Author

in theory! (In practice I don't think we have enough operators implemented yet)
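The routing change under discussion can be modeled in toy Python (the actual code is C++ in torch/csrc/autograd/engine.cpp; the names below are illustrative stand-ins):

```python
def ready_queue_for(device, cpu_queue, device_queues):
    """Pick the autograd ready queue for a device.
    device: (type, index) tuple. Meta tensors carry device index -1, so a
    per-device CUDA queue lookup would fail on them; like CPU tensors,
    they are routed to the per-GraphTask CPU ready queue instead."""
    dev_type, index = device
    if dev_type in ("cpu", "meta"):
        return cpu_queue
    # CUDA (and other accelerator) queues are shared across graph tasks,
    # indexed by device ordinal.
    return device_queues[index]

assert ready_queue_for(("meta", -1), "CPU_QUEUE", {}) == "CPU_QUEUE"
assert ready_queue_for(("cuda", 0), "CPU_QUEUE", {0: "CUDA_Q0"}) == "CUDA_Q0"
```

This is the whole reason the `|| device.type() == at::DeviceType::Meta` condition is added: without it, a meta tensor's -1 index would be used to index the CUDA queue table.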

Collaborator

Sounds exciting!

ezyang added a commit that referenced this pull request Mar 26, 2021
ghstack-source-id: f42c5bc
Pull Request resolved: #53973
ezyang added a commit that referenced this pull request Mar 28, 2021
ghstack-source-id: 9d9f1a4
Pull Request resolved: #53973
@facebook-github-bot
Contributor

@ezyang merged this pull request in 1f36ce6.

@facebook-github-bot facebook-github-bot deleted the gh/ezyang/945/head branch April 2, 2021 14:17
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Pull Request resolved: pytorch#53973

Differential Revision: D27036572

Test Plan: Imported from OSS

Reviewed By: agolynski, bdhirsh

Pulled By: ezyang

fbshipit-source-id: 7005ecf4feb92a643c37389fdfbd852dbf00ac78