
Unify TensorOptions signatures#39611

Closed
smessmer wants to merge 37 commits into gh/smessmer/225/base from gh/smessmer/225/head

Conversation

@smessmer
Contributor

@smessmer smessmer commented Jun 5, 2020

Stack from ghstack:

A few ops have been taking non-optional ScalarType, Device and Layout. That isn't supported by the hacky wrapper that makes those kernels work with the c10 operator library. This PR unifies the signatures and makes those ops c10-full.

This should be backwards compatible because non-optional things are a subtype of optional things in the binary representation on the stack, i.e. if you read an optional from the stack but there was a non-optional, that's fine, so unboxing is BC. Boxing is not BC because it writes to the stack and that's contravariant, but for loading mobile models only the BC of unboxing is important.

This PR will break forward compatibility but only in the sense that newly written models might use the new feature of calling the ops with optional values and those new models will not be runnable on old PyTorch versions. Any model existing today will not be FC broken and can be serialized with a new PyTorch to run on older PyTorch versions.

Detailed reasoning:

This doesn't break BC because an IValue holding an optional<ScalarType> that is defined looks exactly the same as an IValue holding a non-optional ScalarType. So any old model that got serialized and expects to call the non-optional operators will be able to call the optional operators, and PyTorch won't be able to tell the difference.

This also doesn't immediately break FC, since any model written today will only call those ops with defined values. So even if a pre-existing model gets re-serialized after my change, the values will be serialized as defined values, and an old PyTorch will be able to call the non-optional version of the operator with them.

However, after this change, users will be able to write new models that call these operators with None values. And if somebody writes such a model, then it will not be loadable on older PyTorch. So in a sense, it is FC breaking.
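The subtype argument above can be sketched with a hypothetical mini-model of the IValue stack (illustration only — the function names and the value `6` are made up; this is not the real c10 implementation):

```python
from typing import Optional

# A non-optional value and a *defined* optional value look identical on
# the stack, so a reader that expects an optional can consume output
# from a writer that only knew about non-optionals.
stack = []

def push_scalar_type(v: int) -> None:
    """Old-style writer: pushes a plain (non-optional) value."""
    stack.append(v)

def pop_optional_scalar_type() -> Optional[int]:
    """New-style reader: accepts a defined value or None."""
    return stack.pop()

# An old serialized model wrote a defined value -> the new optional
# reader is unaffected: unboxing is backwards compatible.
push_scalar_type(6)  # hypothetical numeric code for some ScalarType
old_model_read = pop_optional_scalar_type()
assert old_model_read == 6

# Only models written after this change can push None; an old reader
# expecting a plain value would fail on it -> forward-compat break.
stack.append(None)
new_model_read = pop_optional_scalar_type()
assert new_model_read is None
```

Reading is covariant (a defined value satisfies an optional reader), while writing is contravariant (None breaks a non-optional reader) — which is why only unboxing stays BC.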

Differential Revision: D21915788

smessmer added a commit that referenced this pull request Jun 5, 2020
ghstack-source-id: 105381798
Pull Request resolved: #39611
@smessmer smessmer requested review from bhosmer and ezyang June 5, 2020 23:42
@dr-ci

dr-ci bot commented Jun 5, 2020

💊 CI failures summary and remediations

As of commit fb61874 (more details on the Dr. CI page):


  • 3/3 failures possibly* introduced in this PR
    • 2/3 non-CircleCI failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jul 08 21:49:32 ERROR [0.109s]: test_allreduce_coalesced_checks (__main__.ProcessGroupGlooTest)
Jul 08 21:49:16   test_common_errors (__main__.RendezvousTCPTest) ... ok (0.001s) 
Jul 08 21:49:16   test_nominal (__main__.RendezvousTCPTest) ... ok (0.001s) 
Jul 08 21:49:26   test_tcp_store_timeout_set (__main__.RendezvousTCPTest) ... ok (10.002s) 
Jul 08 21:49:26   test_unknown_handler (__main__.RendezvousTest) ... ok (0.001s) 
Jul 08 21:49:26   test_address_already_in_use (__main__.TCPStoreTest) ... ok (0.002s) 
Jul 08 21:49:26   test_set_get (__main__.TCPStoreTest) ... ok (0.003s) 
Jul 08 21:49:29   test_default_store_timeout_gloo (__main__.TimeoutTest) ... ok (3.020s) 
Jul 08 21:49:32   test_default_store_timeout_nccl (__main__.TimeoutTest) ... ok (3.003s) 
Jul 08 21:49:32  
Jul 08 21:49:32 ====================================================================== 
Jul 08 21:49:32 ERROR [0.109s]: test_allreduce_coalesced_checks (__main__.ProcessGroupGlooTest) 
Jul 08 21:49:32 ---------------------------------------------------------------------- 
Jul 08 21:49:32 Traceback (most recent call last): 
Jul 08 21:49:32   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 204, in wrapper 
Jul 08 21:49:32     self._join_processes(fn) 
Jul 08 21:49:32   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 311, in _join_processes 
Jul 08 21:49:32     self._check_return_codes(elapsed_time) 
Jul 08 21:49:32   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 344, in _check_return_codes 
Jul 08 21:49:32     raise RuntimeError(error) 
Jul 08 21:49:32 RuntimeError: Processes 3 exited with error code 10 
Jul 08 21:49:32  

ci.pytorch.org: 2 failed



@ezyang
Copy link
Contributor

ezyang commented Jun 8, 2020

The last time we attempted this was #34250. I think there is a reason you cannot conveniently do this. @bhosmer might remember more clearly.

smessmer added a commit that referenced this pull request Jun 8, 2020
Pull Request resolved: #39611
ghstack-source-id: 105490308

smessmer added a commit that referenced this pull request Jun 9, 2020
ghstack-source-id: 105560051

@smessmer smessmer mentioned this pull request Jun 10, 2020
smessmer added a commit that referenced this pull request Jun 10, 2020
ghstack-source-id: 105656331

smessmer added a commit that referenced this pull request Jun 11, 2020
ghstack-source-id: 105733177

smessmer added a commit that referenced this pull request Jun 23, 2020
ghstack-source-id: 106426628

smessmer added a commit that referenced this pull request Jun 23, 2020
ghstack-source-id: 106453931

smessmer added a commit that referenced this pull request Jun 23, 2020
ghstack-source-id: 106459644

smessmer added a commit that referenced this pull request Jun 26, 2020
ghstack-source-id: 106709972
@ezyang
Contributor

ezyang commented Jun 29, 2020

NB: this breaks forward compatibility

smessmer added 4 commits July 8, 2020 01:02
@facebook-github-bot
Contributor

This pull request has been merged in b8d2ccf.

@smessmer smessmer mentioned this pull request Jul 9, 2020
smessmer added a commit that referenced this pull request Jul 9, 2020
#39611 unified signatures of some ops taking TensorOptions arguments by making them optional. That has FC implications, but only for models written with a PyTorch version after that change (see the explanation in the description of that PR).

However, it also changed the default from `pin_memory=False` to `pin_memory=None`, which actually breaks FC for pre-existing models too if they're re-exported with a newer PyTorch, because we materialize default values when exporting. This is bad.

This PR reverts that particular part of #39611 to undo the FC breakage.

Differential Revision: [D22461661](https://our.internmc.facebook.com/intern/diff/D22461661/)

[ghstack-poisoned]
smessmer added a commit that referenced this pull request Jul 9, 2020
ghstack-source-id: 107456654
Pull Request resolved: #41198
@facebook-github-bot facebook-github-bot deleted the gh/smessmer/225/head branch July 12, 2020 14:18
facebook-github-bot pushed a commit that referenced this pull request Jul 14, 2020
Summary:
Pull Request resolved: #41198

#39611 unified signatures of some ops taking TensorOptions arguments by making them optional. That has FC implications, but only for models written with a PyTorch version after that change (see the explanation in the description of that PR).

However, it also changed the default from `pin_memory=False` to `pin_memory=None`, which actually breaks FC for pre-existing models too if they're re-exported with a newer PyTorch, because we materialize default values when exporting. This is bad.

This PR reverts that particular part of #39611 to undo the FC breakage.
ghstack-source-id: 107475024

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D22461661

fbshipit-source-id: ba2776267c3bba97439df66ecb50be7c1971d20d
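The default-materialization problem behind this revert can be sketched in Python (hypothetical function and dict names — this is not the real exporter, just the shape of the argument flow):

```python
# Why changing a default from False to None is FC-breaking even for
# models that never mention the argument: the exporter writes the
# *resolved* argument value into the serialized call, not a marker
# saying "use the callee's default".
def serialize_empty_call(pin_memory=None):  # new default after #39611
    return {"op": "aten::empty", "pin_memory": pin_memory}

# A pre-existing model re-exported with the new PyTorch now carries an
# explicit None for pin_memory...
call = serialize_empty_call()
assert call["pin_memory"] is None

# ...but an old runtime unboxes pin_memory as a plain bool and cannot
# accept None. Keeping the old default means re-exported models still
# carry a value the old runtime understands:
call_old_default = serialize_empty_call(pin_memory=False)
assert call_old_default["pin_memory"] is False
```

This is why the revert targets only the `pin_memory` default: the other optional arguments did not change an already-materialized default value.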