Fix precision issues in CPU remainder#38293
peterbell10 wants to merge 4 commits into pytorch:master
Conversation
Force-pushed from 5d99903 to 7cd3b4a
💊 CI failures summary and remediations, as of commit 91fa644 (more details on the Dr. CI page):
ci.pytorch.org: 2 failed
```cpp
std::integral_constant<bool,
    std::is_floating_point<T>::value ||
    std::is_same<T, at::Half>::value> {
};
```
Hmmmm... are you sure we don't already have a concept for this elsewhere in the codebase?
Not as far as I could see: grepping for `is_floating_point` didn't show anything, and I had a look in Half.h.
```cpp
static_assert(std::is_same<float_t_abs, T>::value, "float_t_abs must be T");
// Specifically deal with floating-point because the generic code above won't
// handle -0.0 (which should result in 0.0) properly.
return map(std::abs);
```
There is no `std::abs(at::Half)` overload, so this allows implicit conversions to float.
It would be great if we could have a little before-and-after perf comparison. The new implementation is going to be slower, but it would be good to have a little napkin figure saying how much slower.
@xwang233 do you think you could help review this?
cc @andreaskoepf perhaps, too
My benchmark shows the new version is ~50-60% slower for both.
xwang233 left a comment
Thanks for the PR! Since the remainder tests pass, this LGTM. I like the use of XOR instead of `!=` in the mask.
```cpp
Vec256<BFloat16> expm1() const {
  return map(Sleef_expm1f8_u10);
}
Vec256<BFloat16> fmod(const Vec256<BFloat16>& q) const {
```
Is the BFloat16 fmod specialization used somewhere? I only saw dispatch code in BinaryOpsKernel.cpp for kHalf, not for kBFloat16.
It's not currently used, but since it's a floating point type it should have that operation available. I can also enable BFloat16 dispatch if that's desirable.
```diff
- Vec256<scalar_t> r = a - b * (a / b).floor();
- return r;
+ auto mod = a.fmod(b);
+ const auto zero = Vec256<scalar_t>(0);
```
I noticed that other functions in BinaryOpsKernel.cpp declare constants outside the lambda, e.g. before the cpu_kernel_vec() call, but I doubt this has a perf impact...
The perf loss is regrettable, but correctness first!
facebook-github-bot left a comment
@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Together with pytorch#37758, this fixes pytorch#37743 and fixes pytorch#24861. This follows the CUDA fix in pytorch#37758, vectorised using a `blendv` to replace the if conditionals. Most of the complication is from `remainder` supporting `at::Half` where `fmod` doesn't. I've now got `fmod` working on `Vec256<at::Half>` as well as enabling half dispatch for `fmod` so it matches `remainder`. I also added `fmod` support to `Vec256<at::BFloat16>` before realising that `remainder` doesn't support `BFloat16` anyway. I could also enable `BFloat16` if that's desirable. If not, I don't think `Vec256<BFloat16>` should be missing `fmod` anyway.

Pull Request resolved: pytorch#38293
Differential Revision: D21539801
Pulled By: ezyang
fbshipit-source-id: abac6a3ed2076932adc459174cd3d8d510f3e1d5
Together with #37758, this fixes #37743 and fixes #24861.

This follows the CUDA fix in #37758, vectorised using a `blendv` to replace the if conditionals.

Most of the complication is from `remainder` supporting `at::Half` where `fmod` doesn't. I've now got `fmod` working on `Vec256<at::Half>` as well as enabling half dispatch for `fmod` so it matches `remainder`.

I also added `fmod` support to `Vec256<at::BFloat16>` before realising that `remainder` doesn't support `BFloat16` anyway. I could also enable `BFloat16` if that's desirable. If not, I don't think `Vec256<BFloat16>` should be missing `fmod` anyway.