
Change return type of Tensor::dtype() from ScalarType to TypeMeta #12766

Closed
li-roy wants to merge 12 commits into master from export-D10232118

Conversation

@li-roy
Contributor

@li-roy li-roy commented Oct 17, 2018

Stack:
    :black_circle:  #12766 Change return type of Tensor::dtype() from ScalarType to TypeMeta  💛
    :white_circle:  #12767 reduce Device to 32bits  💛
    :white_circle:  #13172 use TypeMeta instead of ScalarType in TensorOptions  💛

In preparation for using TypeMeta in TensorOptions.
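For context, a minimal sketch of what the change means at a call site (the code below is illustrative, not taken from the patch):

```cpp
#include <ATen/ATen.h>

void dtype_example() {
  at::Tensor t = at::ones({2, 2});

  // After this PR, dtype() returns the tensor's caffe2::TypeMeta...
  const caffe2::TypeMeta& meta = t.dtype();

  // ...and call sites that need the old enum use scalar_type() instead.
  at::ScalarType st = t.scalar_type();

  (void)meta;
  (void)st;
}
```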

Differential Revision: D10232118

Differential Revision: D10232118
Differential Version: 60878911
@ezyang
Contributor

ezyang commented Oct 17, 2018

I see a bunch of call sites changed to use tensor.scalar_type() instead of tensor.dtype(), but this seems like a pessimization to me. Ideally, at::kDouble and friends would be TypeMetas, actually. A lot of the sites I see here could be "fixed" if we added something like operator==(const TypeMeta&, ScalarType) (and the other direction), prior to actually switching the constants to be DataType themselves (which I agree shouldn't be done in this patch).

BTW: I'm not sure about this. Happy to talk about it some more. We can also get feedback from other people, like @gchanan and @goldsborough.
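A rough sketch of what those comparison operators could look like, assuming a ScalarType-to-TypeMeta conversion helper named scalarTypeToTypeMeta (the helper's exact name and location are an assumption here):

```cpp
#include <ATen/ATen.h>

// Sketch only: these overloads would let existing call sites keep writing
// `tensor.dtype() == at::kDouble` once dtype() returns TypeMeta.
// scalarTypeToTypeMeta is an assumed conversion helper.
inline bool operator==(const caffe2::TypeMeta& m, at::ScalarType s) {
  return m == scalarTypeToTypeMeta(s);
}
inline bool operator==(at::ScalarType s, const caffe2::TypeMeta& m) {
  return m == s;
}
inline bool operator!=(const caffe2::TypeMeta& m, at::ScalarType s) {
  return !(m == s);
}
inline bool operator!=(at::ScalarType s, const caffe2::TypeMeta& m) {
  return !(m == s);
}
```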

@goldsborough
Contributor

I agree with @ezyang: operator== is your friend.

@goldsborough
Contributor

@Roy-Li keep in mind that every line you change in the C++ frontend equals one user who gets a confusing build error when they update, and costs us GitHub issues and friction with the community.

Roy Li added 2 commits October 17, 2018 13:53
Differential Revision: D10232118
Differential Version: 60941370
Differential Revision: D10232118
Differential Version: 61134801
Review thread on aten/src/ATen/core/Tensor.h (outdated):

```diff
-/// Returns a `Tensor`'s dtype (`ScalarType`). Defined in Type.h
-ScalarType dtype() const noexcept;
+/// Returns a `Tensor`'s dtype (`TypeMeta`). Defined in TensorMethods.h
+const caffe2::TypeMeta& dtype() const noexcept;
```


ezyang previously approved these changes Oct 19, 2018

Approved but with comments.

@ezyang ezyang dismissed their stale review October 19, 2018 19:58

actually, some things look weird

Differential Revision: D10232118
Differential Version: 61149897
```diff
 at::Tensor a;
 pop(stack, a);
-push(stack, static_cast<int64_t>(a.dtype()));
+push(stack, static_cast<int64_t>(a.scalar_type()));
```


```diff
 auto new_data = data.to(
     device.value_or(data.device()),
-    dtype.value_or(data.dtype()),
+    dtype.value_or(data.scalar_type()),
```


Roy Li added 8 commits October 22, 2018 15:47
Differential Revision: D10232118
Differential Version: 61337003
Differential Revision: D10232118
Differential Version: 61345559
Differential Revision: D10232118
Differential Version: 61438578
Differential Revision: D10232118
Differential Version: 61590769
Differential Revision: D10232118
Differential Version: 61603922
Differential Revision: D10232118
Differential Version: 61616763
Differential Revision: D10232118
Differential Version: 61685407
Differential Revision: D10232118
Differential Version: 61759287
zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 26, 2018
…2766)

Summary:
Pull Request resolved: pytorch/pytorch#12766

In preparation for using TypeMeta in TensorOptions.

Reviewed By: ezyang

Differential Revision: D10232118

fbshipit-source-id: 5c69a524fa38e50aa555fb9feb87540bc3575a63
facebook-github-bot pushed a commit that referenced this pull request Oct 26, 2018
Summary:
Fixes colliding changes in #12766 and #12368
Pull Request resolved: #13171

Differential Revision: D12109430

Pulled By: li-roy

fbshipit-source-id: f068c7df227d920aa3840762e892ce6e9c109237
@alsrgv

alsrgv commented Oct 29, 2018

@li-roy, @goldsborough,

This change broke the following code: https://github.com/uber/horovod/blob/master/horovod/torch/adapter_v2.cc#L40. Is there a plan to make it backwards compatible?

@ezyang
Contributor

ezyang commented Oct 29, 2018

@alsrgv The terminal state for DataType is that it will be an open data type, so yeah, that particular use is not going to work. If you're tracking master, use the scalar_type() method instead. (This will fail if you get any dtype which is not in the ScalarType set.)
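For a downstream code base like Horovod's, the migration ezyang describes looks roughly like this (a hypothetical helper modeled on the linked adapter code, not a verbatim copy; the return values are made up):

```cpp
#include <ATen/ATen.h>
#include <stdexcept>

// Hypothetical dtype-dispatch helper: switch on scalar_type() (still an
// enum) rather than dtype(), which now returns caffe2::TypeMeta. Note that
// scalar_type() fails for any dtype outside the ScalarType set, per the
// caveat above.
int ToFrameworkDType(const at::Tensor& tensor) {
  switch (tensor.scalar_type()) {  // was: tensor.dtype()
    case at::ScalarType::Half:   return 0;
    case at::ScalarType::Float:  return 1;
    case at::ScalarType::Double: return 2;
    default:
      throw std::logic_error("Unsupported tensor scalar type.");
  }
}
```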

@ezyang
Contributor

ezyang commented Oct 29, 2018

(Let us know if we can add anything that will help out your use case.)

alsrgv added a commit to horovod/horovod that referenced this pull request Oct 29, 2018
@alsrgv

alsrgv commented Oct 29, 2018

@ezyang, it would be great to stabilize the interface or to make those breaking changes in bursts. This is the fourth break since we switched to the C++ API on Aug 30th.

alsrgv added a commit to horovod/horovod that referenced this pull request Oct 29, 2018
@goldsborough
Contributor

@alsrgv Unfortunately I don't think we can promise a lot of stability for the C++ interface in the next few months. We are undertaking large, really important changes to unify ATen with Caffe2 and improve its performance and portability to new platforms (like AMD or custom accelerators) over the long term, and guaranteeing a stable C++ interface would stand in the way of this effort. For this reason we never actually released a stable version of the C++ API, and we currently only release the C++ API in "beta" with the right to make frequent and breaking changes. This is also documented hopefully quite loudly in our docs at https://pytorch.org/cppdocs. In a sense, this should have been clear when you decided to depend on this C++ API (which, just to highlight this, is really the PyTorch backend).

[screenshot of the beta notice from the C++ API docs, 2018-11-07]

I think for your case, it would make the most sense if you pinned particular nightlies of PyTorch and only updated on a monthly-or-so basis. In that case it would be more clear what has changed over the last month, and we could also work together to update your code (which is fortunately not that much). Of course, we do also try our best not to break BC if possible, since many users do depend on the C++ API, and we want to avoid friction unless that friction stands in the way of our progress. Let me know if you like the idea of doing less frequent upgrades of PyTorch.

@alsrgv

alsrgv commented Nov 8, 2018

@goldsborough, thanks for the comprehensive write-up! Unfortunately, pinning doesn't work for me because I don't bundle PyTorch, and Horovod is supposed to work with PyTorch that users got from you "today."

We migrated to the C++ API primarily because the C API was being broken as well, and we weren't allowed to make changes to it, e.g., to add Half support.

I suggest we work together on maintaining the compatibility. I'm not strongly against having a couple of #ifdef's here and there in the code for the sake of supporting PyTorch 0.4.1, 1.0, 1.1, etc., as long as they're used sparingly.
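A minimal sketch of such a guard (TORCH_DTYPE_IS_TYPEMETA is a hypothetical flag the plugin's build system would define after detecting the installed PyTorch version):

```cpp
#include <ATen/ATen.h>

// Hypothetical compatibility shim: one guarded helper instead of scattering
// version checks across the code base.
at::ScalarType GetScalarType(const at::Tensor& tensor) {
#if defined(TORCH_DTYPE_IS_TYPEMETA)
  return tensor.scalar_type();  // dtype() now returns caffe2::TypeMeta
#else
  return tensor.dtype();        // older releases: dtype() was the enum
#endif
}
```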

Would you be willing to add HOROVOD_WITH_PYTORCH=1 pip install git+https://github.com/uber/horovod to your "C++ API tests" to avoid unintended breaks?

@goldsborough
Contributor

@alsrgv I think the tricky thing is that we don't actually have a release of the C++ API yet, so there is no version guarding we could do. We're also a bit concerned about the idea that Horovod is supposed to work with PyTorch of "today". If I understand correctly, you are guaranteeing that Horovod works with PyTorch master? But then, isn't any project's master branch supposed to be the land of bold, unconstrained breaking changes? I don't think we can ever guarantee any stability for the C++ API on master, because that's fundamentally really not what master is intended for.

We discussed adding the Horovod build to our CI, but we don't like the idea of particularly favoring one partner repository to run in our CI. If other folks see that we added some kind of higher-class support for one project, I don't see it taking long until lots of other projects expect us to run their builds in our CI and ensure backwards compatibility, and that will just slow us down a great deal.

I still think you should just pin PyTorch to some particular nightly and have users depend on that version. I feel like depending on released (or in this case at least fixed) versions of projects is a well-established practice for ensuring stability, and depending on master is always dangerous and comes with a responsibility for the depending project to keep up to date with the changes of upstream.

@alsrgv

alsrgv commented Nov 10, 2018

@goldsborough, the reality is that you keep adding great new features to PyTorch, and subsequently users keep upgrading to the latest version of PyTorch, which may be incompatible with plugins like Horovod. Horovod supporting only a certain pin of PyTorch master won't work, because users wouldn't be able to use those great new features (and they're typically very impatient!).

If making a formal release of the C++ API is not an option at this point, the ask is to make breaking changes in bursts rather than trickling them through. This way, plugin authors would have to make "catch up" releases less frequently.

@soumith
Collaborator

soumith commented Nov 10, 2018

@alsrgv between 0.4.1 and 1.0 stable there is a big gap, close to 5 months. We also urged users to try the 1.0 preview. This was a one-time situation as we transitioned to our first stable release. Usually our release cycle is 3 months, and we always urge users to be on stable.
For everyone in this conversation: after December 3rd we will be in a happy, stable shape, with downstream packages using only stable releases again.

@alsrgv

alsrgv commented Nov 10, 2018

@soumith, great! It'd still be great if breaking changes could be batched - even before 0.4.1 I knew a lot of researchers who were sitting on the master branch. :-)

@ezyang ezyang added the merged label Jun 25, 2019
jeffdaily pushed a commit to ROCm/horovod that referenced this pull request Nov 27, 2019