Port remainder from TH to ATen (CPU and CUDA) #34136
kurtamohler wants to merge 16 commits into pytorch:master
Conversation
Please add performance benchmarks for CPU and CUDA, single-threaded and multi-threaded.
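A minimal sketch of how such numbers could be collected with plain timeit (the tensor sizes, iteration counts, and thread handling are illustrative choices, not the benchmark that was actually posted):

import timeit
import torch

a = torch.randn(1000, 1000)
b = torch.randn(1000, 1000).abs() + 0.5  # keep divisors away from zero

# Single-thread CPU timing
default_threads = torch.get_num_threads()
torch.set_num_threads(1)
print("CPU, 1 thread:", timeit.timeit(lambda: torch.remainder(a, b), number=100))

# Multi-thread CPU timing
torch.set_num_threads(default_threads)
print("CPU, %d threads:" % default_threads,
      timeit.timeit(lambda: torch.remainder(a, b), number=100))

# CUDA timing; synchronize so kernel execution time is actually measured
if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()
    print("CUDA:", timeit.timeit(
        lambda: (torch.remainder(a_gpu, b_gpu), torch.cuda.synchronize()),
        number=100))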
💊 CircleCI build failures summary and remediations. As of commit 80c15c0 (more details on the Dr. CI page):
🕵️ 3 new failures recognized by patterns. The following build failures do not appear to be due to upstream breakages (reran 2 jobs to discount flakiness):
CPU performance:
GPU performance:
As you can see, CPU performance decreases with my changes. My comment in the CPU issue explains why I think that happens and why I currently think it should be fixed in a separate issue: #24753 (comment). But if someone thinks differently, my mind is open to being changed!
The test failure from the Dr. CI comment above is caused by incorrect handling of dtypes in my implementation when the divisor argument is a scalar. It seems the scalar needs to be converted to match the dtype of the tensor, which is what the old implementation did and mine currently does not, so I will figure out how to fix that.
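A hypothetical illustration of that scalar-divisor case (the dtypes and values below are my own, not the failing test):

import torch

# Integral tensor with a Python float scalar divisor. Per the comment above, the old
# TH path converted the scalar to the tensor's dtype before computing the modulo, so
# the result dtype did not change; the ported ATen kernel needs the same conversion.
a = torch.arange(1, 6, dtype=torch.int64)
print(torch.remainder(a, 1.5))  # float scalar divisor
print(torch.remainder(a, 2))    # integral scalar divisor; result should stay int64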
Looks like I'm getting errors for the XLA device tests. I'm not sure how to fix that.
Is it the case that currently non-broadcastable but "same number of elements" tensors work?
i.e. (2,3) x (3,2)
No, that didn't work in TH and it doesn't work in my implementation either. They both fail with the same error:
>>> torch.ones([2,3]).remainder(torch.ones([3,2]))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1
I'm a bit confused about what is going on here.
In my nightly version of PyTorch, the following happens when I have a tensor with remainder 0:
On CPU, I get Floating point exception (core dumped)
On CUDA, I get a tensor filled with 4294967295.
Is that consistent with what you are seeing? Why the change to make (at least) CUDA an error?
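A small repro sketch of the behavior described above (integral dtypes chosen deliberately; the CPU call is commented out because it has crashed the process):

import torch

a = torch.tensor([5, 7], dtype=torch.int32)
b = torch.tensor([2, 0], dtype=torch.int32)

# On CPU, the integer division by zero has killed the interpreter:
# print(torch.remainder(a, b))  # Floating point exception (core dumped)

# On CUDA, the same call has silently produced values like 4294967295:
if torch.cuda.is_available():
    print(torch.remainder(a.cuda(), b.cuda()))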
Yeah I do see that behavior. Shouldn't we change that though? It seems better to have a catchable exception rather than a core dump. And if the CPU version raises an exception, shouldn't the CUDA version as well?
Changing the crash into an exception for div was considered high pri: #327
I suspect that changing this behaviour for remainder (and also fmod) is also desirable.
ailzhang left a comment:
Hi thanks for the PR!
I took a look at the XLA test failure and I think it's real.
For example:
In [2]: a = torch.tensor(6)
In [3]: torch.remainder(a, -3)
Out[3]: tensor(0)
But it returns -3 after this patch. CPU and CUDA tests passed since they were changed together in this PR.
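For context, torch.remainder is documented to follow Python's % convention: the result takes the sign of the divisor, and an exact multiple gives 0, never the divisor itself (a quick illustration, not the failing XLA test):

import torch

print(6 % -3)                                  # 0
print(torch.remainder(torch.tensor(6), -3))    # should also be tensor(0)
print(torch.remainder(torch.tensor(7), -3))    # tensor(-2); sign follows the divisor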
@ailzhang, do you have any suggestions for fixing the XLA support? I don't know anything about how operations are dispatched to that device.
@kurtamohler The XLA failure is not specific to XLA. I put an example in the comment above showing that this PR changed the returned result on CPU and CUDA. If you fix that example, XLA will pass as well.
@ailzhang, oh I see! Sorry, I misunderstood. I'll go ahead and fix that.
@kurtamohler No worries! Please also add a test case for the example after your fix! ;) We definitely should have had test cases for those edge cases.
@ailzhang, yeah there seem to be quite a few missing tests for various cases. I will add them today.
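A rough sketch of the kind of edge-case check that could be added (the test name is illustrative, not the actual test added in this PR):

import torch

def test_remainder_negative_divisor_edge_cases():
    # an exact multiple of a negative divisor must give 0, not the divisor itself
    assert torch.remainder(torch.tensor(6), -3).item() == 0
    # the result sign follows the divisor, matching Python's % operator
    assert torch.remainder(torch.tensor(7), -3).item() == 7 % -3
    assert torch.remainder(torch.tensor(-7), 3).item() == -7 % 3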
Force-pushed from 360d69f to fd41b34
Force-pushed from 57c49fe to 5985e03
@ailzhang, the XLA build is working now. Thanks for the help!
facebook-github-bot left a comment:
@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@ezyang, I just pushed the commit to remove the CUDA device-side assert. I think it came in right after you approved. Will Phabricator pick it up?
Yeah, we got it.
I think this is marked as BC-breaking because torch.remainder(a, b) now has different behavior on CUDA. EDIT: edited for correctness
Summary: Pull Request resolved: #36760. If you look at #34136, you will notice a commit (80c15c0) that didn't get merged. This is to address that, to avoid crashing on remainder when the rhs is 0.
Test Plan: Imported from OSS
Differential Revision: D21078776
Pulled By: gchanan
fbshipit-source-id: 0ac138cbafac28cf8d696a2a413d3c542138cff9
Summary:
CPU issue pytorch#24753
CUDA issue pytorch#24615
Pull Request resolved: pytorch#34136
Differential Revision: D20375458
Pulled By: ezyang
fbshipit-source-id: 1a9fb39a7e2d17a0d31bd14b211eaacea060e834
CPU issue #24753
CUDA issue #24615