
Unify TensorOptions signatures#39611

Closed
smessmer wants to merge 37 commits into gh/smessmer/225/base from gh/smessmer/225/head

Conversation

@smessmer
Contributor

@smessmer smessmer commented Jun 5, 2020

Stack from ghstack:

A few ops have been taking non-optional ScalarType, Device and Layout. That isn't supported by the hacky wrapper that makes those kernels work with the c10 operator library. This PR unifies the signatures and makes those ops c10-full.

This should be backwards compatible because non-optional things are a subtype of optional things in the binary representation on the stack, i.e. if you read an optional from the stack but there was a non-optional, that's fine, so unboxing is BC. Boxing is not BC because it writes to the stack and that's contravariant, but for loading mobile models only the BC of unboxing is important.

This PR will break forward compatibility but only in the sense that newly written models might use the new feature of calling the ops with optional values and those new models will not be runnable on old PyTorch versions. Any model existing today will not be FC broken and can be serialized with a new PyTorch to run on older PyTorch versions.

Detailed reasoning:

This doesn't break BC because an IValue holding an optional<ScalarType> that is defined looks exactly the same as an IValue holding a non-optional ScalarType. So any old model that got serialized and expects to call the non-optional operators will be able to call the optional operators, and PyTorch won't be able to tell the difference.

This also doesn't immediately break FC, since any model written today will only call those ops with defined values. So even if a pre-existing model gets re-serialized after my change, the values will be serialized as defined values, and an old PyTorch will be able to call the non-optional version of the operator with them.

However, after this change, users will be able to write new models that call these operators with None values. And if somebody writes such a model, then it will not be loadable on older PyTorch. So in a sense, it is FC breaking.
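The subtype argument above can be sketched with a hypothetical mini-model of the IValue stack (illustration only — the function names and the value `6` are made up; this is not the real c10 implementation):

```python
from typing import Optional

# A non-optional value and a *defined* optional value look identical on
# the stack, so a reader that expects an optional can consume output
# from a writer that only knew about non-optionals.
stack = []

def push_scalar_type(v: int) -> None:
    """Old-style writer: pushes a plain (non-optional) value."""
    stack.append(v)

def pop_optional_scalar_type() -> Optional[int]:
    """New-style reader: accepts a defined value or None."""
    return stack.pop()

# An old serialized model wrote a defined value -> the new optional
# reader is unaffected: unboxing is backwards compatible.
push_scalar_type(6)  # hypothetical numeric code for some ScalarType
old_model_read = pop_optional_scalar_type()
assert old_model_read == 6

# Only models written after this change can push None; an old reader
# expecting a plain value would fail on it -> forward-compat break.
stack.append(None)
new_model_read = pop_optional_scalar_type()
assert new_model_read is None
```

Reading is covariant (a defined value satisfies an optional reader), while writing is contravariant (None breaks a non-optional reader) — which is why only unboxing stays BC.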

Differential Revision: D21915788

smessmer added a commit that referenced this pull request Jun 5, 2020
ghstack-source-id: 105381798
Pull Request resolved: #39611
@smessmer smessmer requested review from bhosmer and ezyang June 5, 2020 23:42
@dr-ci

dr-ci bot commented Jun 5, 2020

💊 CI failures summary and remediations

As of commit fb61874 (more details on the Dr. CI page):


  • 3/3 failures possibly* introduced in this PR
    • 2/3 non-CircleCI failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jul 08 21:49:32 ERROR [0.109s]: test_allreduce_coalesced_checks (__main__.ProcessGroupGlooTest)
Jul 08 21:49:16   test_common_errors (__main__.RendezvousTCPTest) ... ok (0.001s) 
Jul 08 21:49:16   test_nominal (__main__.RendezvousTCPTest) ... ok (0.001s) 
Jul 08 21:49:26   test_tcp_store_timeout_set (__main__.RendezvousTCPTest) ... ok (10.002s) 
Jul 08 21:49:26   test_unknown_handler (__main__.RendezvousTest) ... ok (0.001s) 
Jul 08 21:49:26   test_address_already_in_use (__main__.TCPStoreTest) ... ok (0.002s) 
Jul 08 21:49:26   test_set_get (__main__.TCPStoreTest) ... ok (0.003s) 
Jul 08 21:49:29   test_default_store_timeout_gloo (__main__.TimeoutTest) ... ok (3.020s) 
Jul 08 21:49:32   test_default_store_timeout_nccl (__main__.TimeoutTest) ... ok (3.003s) 
Jul 08 21:49:32  
Jul 08 21:49:32 ====================================================================== 
Jul 08 21:49:32 ERROR [0.109s]: test_allreduce_coalesced_checks (__main__.ProcessGroupGlooTest) 
Jul 08 21:49:32 ---------------------------------------------------------------------- 
Jul 08 21:49:32 Traceback (most recent call last): 
Jul 08 21:49:32   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 204, in wrapper 
Jul 08 21:49:32     self._join_processes(fn) 
Jul 08 21:49:32   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 311, in _join_processes 
Jul 08 21:49:32     self._check_return_codes(elapsed_time) 
Jul 08 21:49:32   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 344, in _check_return_codes 
Jul 08 21:49:32     raise RuntimeError(error) 
Jul 08 21:49:32 RuntimeError: Processes 3 exited with error code 10 
Jul 08 21:49:32  

ci.pytorch.org: 2 failed



@ezyang
Copy link
Contributor

ezyang commented Jun 8, 2020

The last time we attempted this was #34250. I think there is a reason you cannot conveniently do this. @bhosmer might remember more clearly.

smessmer added a commit that referenced this pull request Jun 8, 2020
Pull Request resolved: #39611
ghstack-source-id: 105490308

smessmer added a commit that referenced this pull request Jun 9, 2020
ghstack-source-id: 105560051

@smessmer smessmer mentioned this pull request Jun 10, 2020
smessmer added a commit that referenced this pull request Jun 10, 2020
ghstack-source-id: 105656331

smessmer added a commit that referenced this pull request Jun 11, 2020
ghstack-source-id: 105733177

smessmer added a commit that referenced this pull request Jun 23, 2020
ghstack-source-id: 106426628

smessmer added a commit that referenced this pull request Jun 23, 2020
ghstack-source-id: 106453931

smessmer added a commit that referenced this pull request Jun 23, 2020
ghstack-source-id: 106459644

smessmer added a commit that referenced this pull request Jun 26, 2020
ghstack-source-id: 106709972
@ezyang
Contributor

ezyang commented Jun 29, 2020

NB: this breaks forward compatibility

smessmer added 4 commits July 8, 2020 01:02
@facebook-github-bot
Contributor

This pull request has been merged in b8d2ccf.

@smessmer smessmer mentioned this pull request Jul 9, 2020
smessmer added a commit that referenced this pull request Jul 9, 2020
#39611 unified signatures of some ops taking TensorOptions arguments by making them optional. That has FC implications, but only for models written with a PyTorch version after that change (see the explanation in the description of that PR).

However, it also changed the default from `pin_memory=False` to `pin_memory=None`, which actually breaks FC for pre-existing models too if they're re-exported with a newer PyTorch, because we materialize default values when exporting. This is bad.

This PR reverts that particular part of #39611 to undo the FC breakage.

Differential Revision: [D22461661](https://our.internmc.facebook.com/intern/diff/D22461661/)

[ghstack-poisoned]
smessmer added a commit that referenced this pull request Jul 9, 2020
ghstack-source-id: 107456654
Pull Request resolved: #41198
@facebook-github-bot facebook-github-bot deleted the gh/smessmer/225/head branch July 12, 2020 14:18
facebook-github-bot pushed a commit that referenced this pull request Jul 14, 2020
Summary:
Pull Request resolved: #41198

#39611 unified signatures of some ops taking TensorOptions arguments by making them optional. That has FC implications, but only for models written with a PyTorch version after that change (see the explanation in the description of that PR).

However, it also changed the default from `pin_memory=False` to `pin_memory=None`, which actually breaks FC for pre-existing models too if they're re-exported with a newer PyTorch, because we materialize default values when exporting. This is bad.

This PR reverts that particular part of #39611 to undo the FC breakage.
ghstack-source-id: 107475024

Test Plan: waitforsandcastle

Reviewed By: bhosmer

Differential Revision: D22461661

fbshipit-source-id: ba2776267c3bba97439df66ecb50be7c1971d20d
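The default-materialization problem behind this revert can be sketched in Python (hypothetical function and dict names — this is not the real exporter, just the shape of the argument flow):

```python
# Why changing a default from False to None is FC-breaking even for
# models that never mention the argument: the exporter writes the
# *resolved* argument value into the serialized call, not a marker
# saying "use the callee's default".
def serialize_empty_call(pin_memory=None):  # new default after #39611
    return {"op": "aten::empty", "pin_memory": pin_memory}

# A pre-existing model re-exported with the new PyTorch now carries an
# explicit None for pin_memory...
call = serialize_empty_call()
assert call["pin_memory"] is None

# ...but an old runtime unboxes pin_memory as a plain bool and cannot
# accept None. Keeping the old default means re-exported models still
# carry a value the old runtime understands:
call_old_default = serialize_empty_call(pin_memory=False)
assert call_old_default["pin_memory"] is False
```

This is why the revert targets only the `pin_memory` default: the other optional arguments did not change an already-materialized default value.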