Skip to content

fix aliasing bug in pixel shuffle/unshuffle#86608

Closed
bdhirsh wants to merge 10 commits intogh/bdhirsh/322/basefrom
gh/bdhirsh/322/head
Closed

fix aliasing bug in pixel shuffle/unshuffle#86608
bdhirsh wants to merge 10 commits intogh/bdhirsh/322/basefrom
gh/bdhirsh/322/head

Conversation

@bdhirsh
Copy link
Collaborator

@bdhirsh bdhirsh commented Oct 10, 2022

Fixes #82235

cc @albanD - at::pixel_shuffle and at::pixel_unshuffle advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias.

I happened to notice this because a bunch of tests in test/test_ops.py failed when I ran locally with a DEBUG=1 build.

(P.S.: when are we finally gonna get a debug build test in CI? 😃)

I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the reshape() already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is pixel_shuffle/unshuffle commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case.

Alternatively, it seems like it would be nice if reshape() accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making reshape() copy-on-write

Stack from ghstack (oldest at bottom):

@pytorch-bot
Copy link

pytorch-bot bot commented Oct 10, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86608

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures, 3 Pending

As of commit d51903b:

The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@albanD
Copy link
Collaborator

albanD commented Oct 10, 2022

(P.S.: when are we finally gonna get a debug build test in CI? smiley)

These should be some:
periodic / linux-bionic-cuda11.6-py3.7-gcc7-debug
periodic / linux-bionic-cuda11.7-py3.7-gcc7-debug

Not sure why this is isn't caught... cc @malfet any know issues left with the debug build?


return input_permuted.reshape(final_shape);
// pixel_shuffle expects to *never* return an alias of the input.
return input_permuted.reshape(final_shape).clone();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just do: input_permuted.clone().view(final_shape) to avoid the extra copy ;)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh you're totally right haha

Fixes #82235

cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias.

I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build.

(P.S.: when are we finally gonna get a debug build test in CI? 😃)


I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case.

Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write




[ghstack-poisoned]
Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 11, 2022
Fixes #82235

cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias.

I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build.

(P.S.: when are we finally gonna get a debug build test in CI? 😃)


I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case.

Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write




[ghstack-poisoned]
Copy link
Contributor

@eellison eellison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes #82235

cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias.

I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build.

(P.S.: when are we finally gonna get a debug build test in CI? 😃)


I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case.

Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write




[ghstack-poisoned]
Fixes #82235

cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias.

I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build.

(P.S.: when are we finally gonna get a debug build test in CI? 😃)


I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case.

Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write




[ghstack-poisoned]
Fixes #82235

cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias.

I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build.

(P.S.: when are we finally gonna get a debug build test in CI? 😃)


I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case.

Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write




[ghstack-poisoned]
Fixes #82235

cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias.

I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build.

(P.S.: when are we finally gonna get a debug build test in CI? 😃)


I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case.

Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write




[ghstack-poisoned]
Fixes #82235

cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias.

I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build.

(P.S.: when are we finally gonna get a debug build test in CI? 😃)


I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case.

Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write




[ghstack-poisoned]
Fixes #82235

cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias.

I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build.

(P.S.: when are we finally gonna get a debug build test in CI? 😃)


I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case.

Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write




[ghstack-poisoned]
Fixes #82235

cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias.

I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build.

(P.S.: when are we finally gonna get a debug build test in CI? 😃)


I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case.

Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write




[ghstack-poisoned]
@vadimkantorov
Copy link
Contributor

about extra copy:
there's a common theme that sometimes non-strict "aliasing is possible" (in reshape-style) semantics is preferable to elide copies in cases where it's known that no inplace will happen. e.g. one in-the-wild examples is torch.cat wrapper in detectron2:
https://github.com/facebookresearch/detectron2/blob/48b598b4f61fbb24182a69b521b2a0ba3252b842/detectron2/layers/wrappers.py#L39

that shortcuts if the number of passed tensors is 1

(similar cases may be multiplying by 1 / adding 0, but they might be less realistic...)

so not sure how to control for this - if deemed important aspect. maybe allow copy=True/copy=None arguments to such ops with copy=True by default

@vadimkantorov
Copy link
Contributor

Somehow, one way could be to allow express "no inplace ops are allowed on this tensor or its views" or at least "assume no inplace ops will happen on this tensor or its views", then the executor engine can optimize out the unneeded copies easily @albanD

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants