
Change return type of Tensor::dtype() from ScalarType to TypeMeta #12766

Closed
li-roy wants to merge 12 commits into master from export-D10232118

Conversation

@li-roy
Contributor

@li-roy li-roy commented Oct 17, 2018

Stack:
    :black_circle:  #12766 Change return type of Tensor::dtype() from ScalarType to TypeMeta  💛
    :white_circle:  #12767 reduce Device to 32bits  💛
    :white_circle:  #13172 use TypeMeta instead of ScalarType in TensorOptions  💛

In preparation for using TypeMeta in TensorOptions.
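For context, a minimal sketch of what the change means at a call site (the code below is illustrative, not taken from the patch):

```cpp
#include <ATen/ATen.h>

void dtype_example() {
  at::Tensor t = at::ones({2, 2});

  // After this PR, dtype() returns the tensor's caffe2::TypeMeta...
  const caffe2::TypeMeta& meta = t.dtype();

  // ...and call sites that need the old enum use scalar_type() instead.
  at::ScalarType st = t.scalar_type();

  (void)meta;
  (void)st;
}
```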

Differential Revision: D10232118

Differential Revision: D10232118
Differential Version: 60878911
@ezyang
Contributor

ezyang commented Oct 17, 2018

I see a bunch of call sites changed to use tensor.scalar_type() instead of tensor.dtype(), but this seems like a pessimization to me. Ideally, at::kDouble and friends would be TypeMetas, actually. A lot of the sites I see here could be "fixed" if we added something like operator==(const TypeMeta&, ScalarType) (and the other direction), prior to actually switching the constants to be DataType themselves (which I agree shouldn't be done in this patch).

BTW: I'm not sure about this. Happy to talk about it some more. We can also get feedback from other people, like @gchanan and @goldsborough.
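A rough sketch of what those comparison operators could look like, assuming a ScalarType-to-TypeMeta conversion helper named scalarTypeToTypeMeta (the helper's exact name and location are an assumption here):

```cpp
#include <ATen/ATen.h>

// Sketch only: these overloads would let existing call sites keep writing
// `tensor.dtype() == at::kDouble` once dtype() returns TypeMeta.
// scalarTypeToTypeMeta is an assumed conversion helper.
inline bool operator==(const caffe2::TypeMeta& m, at::ScalarType s) {
  return m == scalarTypeToTypeMeta(s);
}
inline bool operator==(at::ScalarType s, const caffe2::TypeMeta& m) {
  return m == s;
}
inline bool operator!=(const caffe2::TypeMeta& m, at::ScalarType s) {
  return !(m == s);
}
inline bool operator!=(at::ScalarType s, const caffe2::TypeMeta& m) {
  return !(m == s);
}
```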

@goldsborough
Contributor

I agree with @ezyang: operator== is your friend.

@goldsborough
Contributor

@Roy-Li keep in mind that every line you change in the C++ frontend equals one user who gets a confusing build error when they update, and costs us GitHub issues and friction with the community.

Roy Li added 2 commits October 17, 2018 13:53
Differential Revision: D10232118
Differential Version: 60941370
Differential Revision: D10232118
Differential Version: 61134801
Review thread on aten/src/ATen/core/Tensor.h (outdated):

```diff
-/// Returns a `Tensor`'s dtype (`ScalarType`). Defined in Type.h
-ScalarType dtype() const noexcept;
+/// Returns a `Tensor`'s dtype (`TypeMeta`). Defined in TensorMethods.h
+const caffe2::TypeMeta& dtype() const noexcept;
```


ezyang previously approved these changes Oct 19, 2018

Approved but with comments.

@ezyang ezyang dismissed their stale review October 19, 2018 19:58

actually, some things look weird

Differential Revision: D10232118
Differential Version: 61149897
```diff
 at::Tensor a;
 pop(stack, a);
-push(stack, static_cast<int64_t>(a.dtype()));
+push(stack, static_cast<int64_t>(a.scalar_type()));
```


```diff
 auto new_data = data.to(
     device.value_or(data.device()),
-    dtype.value_or(data.dtype()),
+    dtype.value_or(data.scalar_type()),
```


Roy Li added 8 commits October 22, 2018 15:47
Differential Revision: D10232118
Differential Version: 61337003
Differential Revision: D10232118
Differential Version: 61345559
Differential Revision: D10232118
Differential Version: 61438578
Differential Revision: D10232118
Differential Version: 61590769
Differential Revision: D10232118
Differential Version: 61603922
Differential Revision: D10232118
Differential Version: 61616763
Differential Revision: D10232118
Differential Version: 61685407
Differential Revision: D10232118
Differential Version: 61759287
zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 26, 2018
…2766)

Summary:
Pull Request resolved: pytorch/pytorch#12766

In preparation for using TypeMeta in TensorOptions.

Reviewed By: ezyang

Differential Revision: D10232118

fbshipit-source-id: 5c69a524fa38e50aa555fb9feb87540bc3575a63
facebook-github-bot pushed a commit that referenced this pull request Oct 26, 2018
Summary:
Fixes colliding changes in #12766 and #12368
Pull Request resolved: #13171

Differential Revision: D12109430

Pulled By: li-roy

fbshipit-source-id: f068c7df227d920aa3840762e892ce6e9c109237
@alsrgv

alsrgv commented Oct 29, 2018

@li-roy, @goldsborough,

This change broke the following code: https://github.com/uber/horovod/blob/master/horovod/torch/adapter_v2.cc#L40. Is there a plan to make it backwards compatible?

@ezyang
Contributor

ezyang commented Oct 29, 2018

@alsrgv The terminal state for DataType is that it will be an open data type, so yeah, that particular use is not going to work. If you're tracking master, use the scalar_type() method instead. (This will fail if you get any dtype which is not in the ScalarType set.)
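For a downstream code base like Horovod's, the migration ezyang describes looks roughly like this (a hypothetical helper modeled on the linked adapter code, not a verbatim copy; the return values are made up):

```cpp
#include <ATen/ATen.h>
#include <stdexcept>

// Hypothetical dtype-dispatch helper: switch on scalar_type() (still an
// enum) rather than dtype(), which now returns caffe2::TypeMeta. Note that
// scalar_type() fails for any dtype outside the ScalarType set, per the
// caveat above.
int ToFrameworkDType(const at::Tensor& tensor) {
  switch (tensor.scalar_type()) {  // was: tensor.dtype()
    case at::ScalarType::Half:   return 0;
    case at::ScalarType::Float:  return 1;
    case at::ScalarType::Double: return 2;
    default:
      throw std::logic_error("Unsupported tensor scalar type.");
  }
}
```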

@ezyang
Contributor

ezyang commented Oct 29, 2018

(Let us know if we can add anything that will help out your use case.)

alsrgv added a commit to horovod/horovod that referenced this pull request Oct 29, 2018
@alsrgv

alsrgv commented Oct 29, 2018

@ezyang, it would be great to stabilize the interface or to make those breaking changes in bursts. This is the fourth break since we switched to the C++ API on Aug 30th.

alsrgv added a commit to horovod/horovod that referenced this pull request Oct 29, 2018
@goldsborough
Contributor

@alsrgv Unfortunately I don't think we can promise a lot of stability for the C++ interface in the next few months. We are undertaking large, really important changes to unify ATen with Caffe2 and improve its performance and portability to new platforms (like AMD or custom accelerators) over the long term, and guaranteeing a stable C++ interface would stand in the way of this effort. For this reason we never actually released a stable version of the C++ API, and we currently only release the C++ API in "beta" with the right to make frequent and breaking changes. This is also documented hopefully quite loudly in our docs at https://pytorch.org/cppdocs. In a sense, this should have been clear when you decided to depend on this C++ API (which, just to highlight this, is really the PyTorch backend).

[screenshot of the beta notice from the C++ API docs, 2018-11-07]

I think for your case, it would make the most sense if you pinned particular nightlies of PyTorch and only updated on a monthly-or-so basis. In that case it would be more clear what has changed over the last month, and we could also work together to update your code (which is fortunately not that much). Of course, we do also try our best not to break BC if possible, since many users do depend on the C++ API, and we want to avoid friction unless that friction stands in the way of our progress. Let me know if you like the idea of doing less frequent upgrades of PyTorch.

@alsrgv

alsrgv commented Nov 8, 2018

@goldsborough, thanks for the comprehensive write-up! Unfortunately, pinning doesn't work for me because I don't bundle PyTorch, and Horovod is supposed to work with PyTorch that users got from you "today."

We migrated to the C++ API primarily because the C API was being broken as well, and we weren't allowed to make changes to it, e.g., to add Half support.

I suggest we work together on maintaining the compatibility. I'm not strongly against having a couple of #ifdef's here and there in the code for the sake of supporting PyTorch 0.4.1, 1.0, 1.1, etc., as long as they're used sparingly.
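A minimal sketch of such a guard (TORCH_DTYPE_IS_TYPEMETA is a hypothetical flag the plugin's build system would define after detecting the installed PyTorch version):

```cpp
#include <ATen/ATen.h>

// Hypothetical compatibility shim: one guarded helper instead of scattering
// version checks across the code base.
at::ScalarType GetScalarType(const at::Tensor& tensor) {
#if defined(TORCH_DTYPE_IS_TYPEMETA)
  return tensor.scalar_type();  // dtype() now returns caffe2::TypeMeta
#else
  return tensor.dtype();        // older releases: dtype() was the enum
#endif
}
```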

Would you be willing to add HOROVOD_WITH_PYTORCH=1 pip install git+https://github.com/uber/horovod to your "C++ API tests" to avoid unintended breaks?

@goldsborough
Contributor

@alsrgv I think the tricky thing is that we don't actually have a release of the C++ API yet, so there is no version guarding we could do. We're also a bit concerned about the idea that Horovod is supposed to work with PyTorch of "today". If I understand correctly, you are guaranteeing that Horovod works with PyTorch master? But then, isn't any project's master branch supposed to be the land of bold, unconstrained breaking changes? I don't think we can ever guarantee any stability for the C++ API on master, because that's fundamentally really not what master is intended for.

We discussed adding the Horovod build to our CI, but we don't like the idea of particularly favoring one partner repository to run in our CI. If other folks see that we added some kind of higher-class support for one project, I don't see it taking long until lots of other projects expect us to run their builds in our CI and ensure backwards compatibility, and that will just slow us down a great deal.

I still think you should just pin PyTorch to some particular nightly and have users depend on that version. I feel like depending on released (or in this case at least fixed) versions of projects is a well-established practice for ensuring stability, and depending on master is always dangerous and comes with a responsibility for the depending project to keep up to date with the changes of upstream.

@alsrgv

alsrgv commented Nov 10, 2018

@goldsborough, the reality is that you keep adding great new features to PyTorch, and subsequently users keep upgrading to the latest version of PyTorch, which may be incompatible with plugins like Horovod. Horovod supporting only a certain pin of PyTorch master won't work, because users wouldn't be able to use those great new features (and they're typically very impatient!).

If making a formal release of the C++ API is not an option at this point, the ask is to make breaking changes in bursts rather than trickling them through. This way, plugin authors would have to make "catch up" releases less frequently.

@soumith
Collaborator

soumith commented Nov 10, 2018

@alsrgv between 0.4.1 and 1.0 stable there is a big gap, close to 5 months. We also urged users to try the 1.0 preview. This was a one-time situation as we transitioned to our first stable release. Usually our release cycle is 3 months, and we always urge users to be on stable.
For everyone in this conversation: after December 3rd we will be in a happy, stable shape, with downstream packages using only stable releases again.

@alsrgv

alsrgv commented Nov 10, 2018

@soumith, great! It'd still be great if breaking changes could be batched - even before 0.4.1 I knew a lot of researchers who were sitting on the master branch. :-)

@ezyang ezyang added the merged label Jun 25, 2019
jeffdaily pushed a commit to ROCm/horovod that referenced this pull request Nov 27, 2019