Add division overload with rounding_mode selection #50280

peterbell10 wants to merge 26 commits into gh/peterbell10/36/base
Conversation
As mentioned in gh-43874, this adds a `rounding_mode={'true', 'trunc', 'floor'}` argument so `torch.div` can be used as a replacement for `floor_divide` during the transitional period.

I've included dedicated kernels for truncated and floor division which aren't strictly necessary for float, but do perform significantly better (~2x) than doing true division followed by a separate rounding kernel.

Note: I introduce new overloads for `aten::div` instead of just adding a default `rounding_mode` because various JIT passes rely on the exact operator schema.
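To make the proposed semantics concrete, here is a minimal usage sketch (my illustration, not part of the PR; it exercises the `'trunc'` and `'floor'` strings, with plain `torch.div` standing in for the `'true'` default):

```python
import torch

a = torch.tensor([7., -7.])
b = torch.tensor([2., 2.])

print(torch.div(a, b))                         # true division: tensor([ 3.5000, -3.5000])
print(torch.div(a, b, rounding_mode='trunc'))  # round toward zero: tensor([ 3., -3.])
print(torch.div(a, b, rounding_mode='floor'))  # round toward -inf: tensor([ 3., -4.])
```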
💊 CI failures summary and remediations

As of commit 71b0cfe (more details on the Dr. CI page):

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:
```cpp
BINARY_POINTWISE(mul);
BINARY_POINTWISE(div);
{
  using Binop = Tensor (*)(const Tensor&, const Tensor&, std::string);
```
It was, thanks @peterbell10. Darn autocomplete!
cc @zou3519
```cpp
} else if (isIntegralType(dtype, /*includeBool*/ false)) {
  // There's no SIMD integer division, so don't try to vectorize it.
  // TODO: if the divisor is a scalar, rewrite as multiplication by a constant.
  AT_DISPATCH_INTEGRAL_TYPES(iter.common_dtype(), "div_floor_cpu", [&]() {
```
This is inconsistent between using `dtype` and `iter.common_dtype()`.
I have removed all uses of `iter.dtype()`. If instead you meant the variable `dtype`, then I would note that it's assigned from `iter.common_dtype()` above. Just a bit less to type.
I realize the value is the same; just for readability the code might want to stick to either `dtype` or `iter.common_dtype()`. No big deal either way.
```cpp
  });
});
} else {
  AT_DISPATCH_FLOATING_TYPES_AND2(kBFloat16, kHalf, iter.common_dtype(), "div_floor_cpu", [&]() {
```
Same `dtype` vs `iter.common_dtype()` here, too.
Hi @mruberry, I think I can get the pt/xla PR ready this week. I will ping you when it is ready.

@mruberry PTAL when you can. I have addressed your comments and rebased.
| * ``"true"`` - default behavior. Performs no rounding and, if both :attr:`input` and | ||
| :attr:`other` are integer types, promotes the inputs to the default scalar type. | ||
| Equivalent to true division in Python (the ``/`` operator) and NumPy's ``np.true_divide``. | ||
| * ``"trunc"`` - rounds the results of the division down. |
The descriptions for trunc and floor are identical: "rounds the results of the division down".

For trunc I think we can say it rounds towards zero?
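To make the distinction concrete, here is a small plain-Python sketch (my illustration, not the PR's kernel code) deriving floor division from truncated division; the two only diverge when the quotient is negative and inexact:

```python
def trunc_div(a: int, b: int) -> int:
    # Truncated division rounds toward zero, like C's integer '/'.
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def floor_div(a: int, b: int) -> int:
    # Floor division rounds toward -inf: step the truncated quotient
    # down by one when there is a remainder and the signs differ.
    q = trunc_div(a, b)
    if q * b != a and (a < 0) != (b < 0):
        q -= 1
    return q

assert trunc_div(-7, 2) == -3                    # toward zero
assert floor_div(-7, 2) == -4                    # toward -inf, matches Python's -7 // 2
assert trunc_div(7, 2) == floor_div(7, 2) == 3   # identical when the quotient is positive
```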
```python
torch.set_default_dtype(dtype)
yield
torch.set_default_dtype(saved_dtype)
try:
```
Thank you for fixing this.
One of the test failures is real: test_div_rounding_numpy_cuda_bfloat16. We can skip the test for simplicity to unblock this landing. @JackCaoG's PR is ready to go, so I'd like to land this during PST business hours on Tuesday, February 2nd.
In hindsight, the BFloat16 comparison with random test data is unlikely to work perfectly. It's sensitive to exact rounding, and since NumPy is rounding to float32 precision, it will occasionally get different answers.

We can disable the test for now. In the future we could use a fixture to ensure there are no rounding issues.
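For intuition, here is a hand-constructed sketch of that failure mode (my example, assuming a torch build with bfloat16 support): when the float32 quotient lies just below an integer, rounding it to bfloat16 precision first can carry it across the integer boundary, so the two floors disagree.

```python
import torch

# Both values are exactly representable in bfloat16.
a = torch.tensor(3.015625)
b = torch.tensor(1.0078125)

q = a / b  # 2.9922481... at float32 precision

print(torch.floor(q))             # tensor(2.) - floor of the float32 quotient
print(torch.floor(q.bfloat16()))  # tensor(3.) - quotient rounds up to 3.0 in bfloat16 first
```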
```cpp
  }
  return floordiv;
},
[](Vec256<scalar_t> a, Vec256<scalar_t> b) -> Vec256<scalar_t> {
```
This is triggering some internal build issues. Adding a vectorized function can be a little tricky because we often have to stub them out on some platforms, like Android.
Since we're so close to the branch cut, I propose removing the copysign implementation and this vectorized implementation. We can file an issue and add them back in a later PR where we can take our time and focus on that issue.
@mruberry this should be good now. Removed all Vec256 changes and unvectorized floor_divide.
FYI there's another internal issue that I'm reviewing now. I'll keep this updated.

Update: still hacking through internal infra issues.

Update: the blocking infra team confirms it has fixed its issue. This should land today.

Landed. Some changes had to be made internally:
Summary:

Pull Request resolved: pytorch#51706
Pull Request resolved: pytorch#50280

As mentioned in pytorch gh-43874, this adds a `rounding_mode={'true', 'trunc', 'floor'}` argument so `torch.div` can be used as a replacement for `floor_divide` during the transitional period. I've included dedicated kernels for truncated and floor division which aren't strictly necessary for float, but do perform significantly better (~2x) than doing true division followed by a separate rounding kernel. Note: I introduce new overloads for `aten::div` instead of just adding a default `rounding_mode` because various JIT passes rely on the exact operator schema.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D26123271

Pulled By: mruberry

fbshipit-source-id: 51a83717602114597ec9c4d946e35a392eb01d46