fix aliasing bug in pixel shuffle/unshuffle#86608
fix aliasing bug in pixel shuffle/unshuffle#86608bdhirsh wants to merge 10 commits intogh/bdhirsh/322/basefrom
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86608
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 Failures, 3 PendingAs of commit d51903b: This comment was automatically generated by Dr. CI and updates every 15 minutes. |
These should be some: Not sure why this is isn't caught... cc @malfet any know issues left with the debug build? |
|
|
||
| return input_permuted.reshape(final_shape); | ||
| // pixel_shuffle expects to *never* return an alias of the input. | ||
| return input_permuted.reshape(final_shape).clone(); |
There was a problem hiding this comment.
Just do: input_permuted.clone().view(final_shape) to avoid the extra copy ;)
There was a problem hiding this comment.
oh you're totally right haha
Fixes #82235 cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias. I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build. (P.S.: when are we finally gonna get a debug build test in CI? 😃) I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case. Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write [ghstack-poisoned]
Fixes #82235 cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias. I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build. (P.S.: when are we finally gonna get a debug build test in CI? 😃) I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case. Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write [ghstack-poisoned]
eellison
left a comment
There was a problem hiding this comment.
Cool! Would you remove the skips here ? https://github.com/pytorch/pytorch/blob/master/test/test_ops.py#L1778
Fixes #82235 cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias. I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build. (P.S.: when are we finally gonna get a debug build test in CI? 😃) I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case. Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write [ghstack-poisoned]
Fixes #82235 cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias. I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build. (P.S.: when are we finally gonna get a debug build test in CI? 😃) I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case. Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write [ghstack-poisoned]
Fixes #82235 cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias. I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build. (P.S.: when are we finally gonna get a debug build test in CI? 😃) I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case. Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write [ghstack-poisoned]
Fixes #82235 cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias. I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build. (P.S.: when are we finally gonna get a debug build test in CI? 😃) I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case. Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write [ghstack-poisoned]
Fixes #82235 cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias. I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build. (P.S.: when are we finally gonna get a debug build test in CI? 😃) I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case. Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write [ghstack-poisoned]
Fixes #82235 cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias. I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build. (P.S.: when are we finally gonna get a debug build test in CI? 😃) I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case. Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write [ghstack-poisoned]
Fixes #82235 cc albanD - `at::pixel_shuffle` and `at::pixel_unshuffle` advertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias. I happened to notice this because a bunch of tests in `test/test_ops.py` failed when I ran locally with a `DEBUG=1` build. (P.S.: when are we finally gonna get a debug build test in CI? 😃) I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the `reshape()` already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Is `pixel_shuffle/unshuffle` commonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case. Alternatively, it seems like it would be nice if `reshape()` accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about making `reshape()` copy-on-write [ghstack-poisoned]
|
about extra copy: that shortcuts if the number of passed tensors is 1 (similar cases may be multiplying by 1 / adding 0, but they might be less realistic...) so not sure how to control for this - if deemed important aspect. maybe allow copy=True/copy=None arguments to such ops with copy=True by default |
|
Somehow, one way could be to allow express "no inplace ops are allowed on this tensor or its views" or at least "assume no inplace ops will happen on this tensor or its views", then the executor engine can optimize out the unneeded copies easily @albanD |
Fixes #82235
cc @albanD -
at::pixel_shuffleandat::pixel_unshuffleadvertise as being non-aliasing, but they have a C++ decomposition that internally uses reshape(), which means that it might return an alias.I happened to notice this because a bunch of tests in
test/test_ops.pyfailed when I ran locally with aDEBUG=1build.(P.S.: when are we finally gonna get a debug build test in CI? 😃)
I fixed by adding an extra clone, which... is going to be an unnecessary perf hit in the case where the
reshape()already properly cloned the input. My hope is that this is fine, because this only impacts the composite kernel- we already have a "fast" CPU kernel that does the right thing. Ispixel_shuffle/unshufflecommonly used with cuda? Maybe we should just add a fast cuda kernel for it if that's the case.Alternatively, it seems like it would be nice if
reshape()accepted an optional argument to unconditionally return a copy. That seems like a rabbit hole that isn't worth going down for now though - I remember a discussion a while ago about makingreshape()copy-on-writeStack from ghstack (oldest at bottom):