
Fix precision issues in CPU remainder #38293

Closed
peterbell10 wants to merge 4 commits into pytorch:master from peterbell10:remainder-fix

Conversation

@peterbell10
Collaborator

@peterbell10 peterbell10 commented May 12, 2020

Together with #37758, this fixes #37743 and fixes #24861.

This follows the CUDA fix in #37758, vectorised using a blendv to replace the if conditionals.

Most of the complication is from remainder supporting at::Half where fmod doesn't. I've now got fmod working on Vec256<at::Half> as well as enabling half dispatch for fmod so it matches remainder.

I also added fmod support to Vec256<at::BFloat16> before realising that remainder doesn't support BFloat16 anyway. I could also enable BFloat16 if that's desirable. If not, I don't think Vec256<BFloat16> should be missing fmod anyway.
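For anyone skimming, here is a rough scalar sketch of the corrected logic (illustrative only, and the function name is mine; the actual kernel works on Vec256 lanes and expresses the branch as a mask fed to blendv):

```cpp
#include <cmath>

// Scalar sketch of the fix: start from fmod, then fold the result into the
// divisor's sign range. The old a - b * (a / b).floor() form loses precision
// and can overflow when a / b is large.
template <typename T>
T remainder_sketch(T a, T b) {
  T mod = std::fmod(a, b);
  // Add the divisor back when mod is nonzero and its sign differs from b's;
  // the vectorised kernel performs this selection per lane with blendv.
  if (mod != T(0) && ((b < T(0)) != (mod < T(0)))) {
    mod += b;
  }
  return mod;
}
```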

@dr-ci

dr-ci Bot commented May 12, 2020

💊 CI failures summary and remediations

As of commit 91fa644 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-CircleCI failure(s)

ci.pytorch.org: 2 failed



std::integral_constant<bool,
std::is_floating_point<T>::value ||
std::is_same<T, at::Half>::value> {
};
Contributor

Hmmmm... are you sure we don't already have a concept for this elsewhere in the codebase?

Collaborator Author

Not as far as I could see. Grepping for is_floating_point didn't show anything, and I also had a look in Half.h.
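For reference, here is a self-contained sketch of such a trait (the name `is_floating_point_or_half` is illustrative, not necessarily what the PR uses):

```cpp
#include <type_traits>
#include <c10/util/Half.h>  // at::Half

// Combines the standard floating-point check with at::Half, which
// std::is_floating_point does not cover.
template <typename T>
struct is_floating_point_or_half
    : std::integral_constant<bool,
          std::is_floating_point<T>::value ||
          std::is_same<T, at::Half>::value> {};
```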

static_assert(std::is_same<float_t_abs, T>::value, "float_t_abs must be T");
// Specifically deal with floating-point because the generic code above won't handle -0.0 (which should result in
// 0.0) properly.
return map(std::abs);
Contributor

huh, what's going on here?

Collaborator Author

There is no std::abs(at::Half) so this allows implicit conversions to float.
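A minimal standalone illustration of that conversion (not the PR's code):

```cpp
#include <cmath>
#include <c10/util/Half.h>

// at::Half converts implicitly to float, so a float-taking expression can be
// applied to Half values even though std::abs has no at::Half overload.
float abs_via_float(at::Half h) {
  float f = h;         // implicit at::Half -> float conversion
  return std::abs(f);  // resolves to std::abs(float)
}
```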

@ezyang
Contributor

ezyang commented May 12, 2020

It would be great if we could have a little before and after perf comparison. The new implementation is going to be slower but it would be good to have a little napkin saying how much slower.

@ezyang ezyang requested a review from xwang233 May 12, 2020 15:32
@ezyang
Contributor

ezyang commented May 12, 2020

@xwang233 do you think you could help review this?

@ezyang
Contributor

ezyang commented May 12, 2020

cc @andreaskoepf perhaps, too

@ailzhang ailzhang added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) May 12, 2020
Comment thread aten/src/ATen/native/cpu/BinaryOpsKernel.cpp Outdated
@peterbell10
Collaborator Author

My benchmark shows the new version is ~50-60% slower for both float and double tensors.
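Roughly this kind of setup can reproduce the comparison (a sketch using the ATen C++ API; not necessarily the benchmark used here, and the tensor size and iteration count are arbitrary):

```cpp
#include <ATen/ATen.h>
#include <chrono>
#include <cstdio>

// Times at::remainder on a large float tensor; run against builds with and
// without this PR to compare throughput.
int main() {
  auto a = at::randn({1 << 24});
  auto b = at::randn({1 << 24}).abs() + 1.0;  // keep divisors away from zero
  auto r = at::remainder(a, b);               // warm-up
  const int iters = 100;
  auto t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) {
    r = at::remainder(a, b);
  }
  auto t1 = std::chrono::steady_clock::now();
  std::printf("%.3f ms/iter\n",
      std::chrono::duration<double, std::milli>(t1 - t0).count() / iters);
  return 0;
}
```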

Collaborator

@xwang233 xwang233 left a comment

Thanks for the PR! Since the remainder tests pass, this LGTM. I like the use of XOR instead of != in the mask.
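For readers unfamiliar with the trick: Vec256 comparisons produce per-lane all-ones/all-zeros masks, so XOR-ing two comparison results flags the lanes where exactly one condition holds. Roughly (Vec256-style pseudocode; operator availability is assumed and this is not the exact kernel code):

```cpp
// Lanes where mod is nonzero and its sign differs from b's get mod + b;
// all other lanes keep mod. blendv selects per lane based on the mask.
const auto zero = Vec256<scalar_t>(0);
auto mod = a.fmod(b);
auto mask = (mod != zero) & ((b < zero) ^ (mod < zero));
mod = Vec256<scalar_t>::blendv(mod, mod + b, mask);
```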

Comment thread aten/src/ATen/native/cpu/BinaryOpsKernel.cpp
Vec256<BFloat16> expm1() const {
  return map(Sleef_expm1f8_u10);
}
Vec256<BFloat16> fmod(const Vec256<BFloat16>& q) const {
Collaborator

Is the BFloat16 fmod specialization used somewhere? I saw only the dispatch code in BinaryOpsKernel.cpp for kHalf but not for kBFloat16?

Collaborator Author

It's not currently used, but since it's a floating point type it should have that operation available. I can also enable BFloat16 dispatch if that's desirable.
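If that were wanted, the dispatch side could look roughly like this (a sketch mirroring how other CPU kernels are wired up; the function name, macro choice, and iterator usage are assumptions, not code from this PR):

```cpp
#include <cmath>
#include <ATen/Dispatch.h>
#include <ATen/native/TensorIterator.h>
#include <ATen/native/cpu/Loops.h>

namespace at { namespace native {

// Hypothetical sketch: enable BFloat16 (alongside Half) for fmod's CPU dispatch.
void fmod_kernel_sketch(TensorIterator& iter) {
  AT_DISPATCH_FLOATING_TYPES_AND2(kHalf, kBFloat16, iter.dtype(), "fmod_cpu", [&]() {
    cpu_kernel(iter, [](scalar_t a, scalar_t b) -> scalar_t {
      return std::fmod(a, b);  // Half/BFloat16 convert implicitly to float here
    });
  });
}

}} // namespace at::native
```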

- Vec256<scalar_t> r = a - b * (a / b).floor();
- return r;
+ auto mod = a.fmod(b);
+ const auto zero = Vec256<scalar_t>(0);
Collaborator

I noticed that other functions in BinaryOpsKernel.cpp declare constants outside the lambda, e.g. before the cpu_kernel_vec() call, but I doubt this has a perf impact...

@ezyang
Contributor

ezyang commented May 13, 2020

The perf loss is regrettable, but correctness first!

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ezyang merged this pull request in 0a159b0.

laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Together with pytorch#37758, this fixes pytorch#37743 and fixes pytorch#24861.

This follows the CUDA fix in pytorch#37758, vectorised using a `blendv` to replace the if conditionals.

Most of the complication is from `remainder` supporting `at::Half` where `fmod` doesn't. I've now got `fmod` working on `Vec256<at::Half>` as well as enabling half dispatch for `fmod` so it matches `remainder`.

I also added `fmod` support to `Vec256<at::BFloat16>` before realising that `remainder` doesn't support `BFloat16` anyway. I could also enable `BFloat16` if that's desirable. If not, I don't think `Vec256<BFloat16>` should be missing `fmod` anyway.
Pull Request resolved: pytorch#38293

Differential Revision: D21539801

Pulled By: ezyang

fbshipit-source-id: abac6a3ed2076932adc459174cd3d8d510f3e1d5

Labels

Merged · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

torch.remainder gives a remainder larger than the divisor
Float overflow in torch.remainder

7 participants