[quant][core][gpu][improvement] Enabled broadcasting multiplication support for requantize_multiplier_tensor in quantized cudnn add, linear, and conv2d ops#76518
Conversation
support for requantize_multiplier_tensor in quantized cudnn add, linear, and conv2d ops

Summary: Previously, requantize_multiplier_tensor was set to the same size as quantized_output, because broadcasting multiplication was not supported in cuDNN. This support was added in cuDNN 8.3.3. requantize_multiplier_tensor still has to be a tensor, but it can now be a scalar tensor with the same number of dimensions as the tensor it is multiplied with.

Test Plan:
```
python test/test_quantization.py -k test_qconv2d_cudnn
```

[ghstack-poisoned]
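The requirement described above can be illustrated with a NumPy analogue: NumPy follows the same broadcasting rule that cuDNN 8.3.3 now supports for pointwise multiplication. This is an illustrative sketch of the before/after multiplier shapes, not the actual cuDNN call; the variable names are hypothetical.

```python
import numpy as np

# Simulated quantized op output in NCHW layout (N=1, C=2, H=3, W=3).
quantized_output = np.arange(18, dtype=np.float32).reshape(1, 2, 3, 3)
requant_multiplier = 0.5

# Old approach: materialize a multiplier tensor the same size as the output.
full_multiplier = np.full(quantized_output.shape, requant_multiplier, dtype=np.float32)

# New approach: a "scalar tensor" with the same number of dimensions as the
# output (shape (1, 1, 1, 1)); broadcasting expands it during multiplication.
scalar_multiplier = np.full((1,) * quantized_output.ndim, requant_multiplier,
                            dtype=np.float32)

assert scalar_multiplier.ndim == quantized_output.ndim
# Both multiplier shapes produce identical requantized results.
assert np.array_equal(quantized_output * full_multiplier,
                      quantized_output * scalar_multiplier)
```

The scalar-tensor form avoids allocating and filling a full-size multiplier buffer on every call, which is the point of the change.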
✅ No Failures (0 Pending) as of commit 378abf5 (more details on the Dr. CI page). Looks good so far! There are no failures yet. This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the (internal) Dr. CI Users group.
@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Enabled broadcasting multiplication support for requantize_multiplier_tensor in quantized cudnn add, linear, and conv2d ops

Summary: Previously, requantize_multiplier_tensor was set to the same size as quantized_output, because broadcasting multiplication was not supported in cuDNN. This support was added in cuDNN 8.3.3. requantize_multiplier_tensor still has to be a tensor, but it can now be a scalar tensor with the same number of dimensions as the tensor it is multiplied with.

Test Plan:
```
python test/test_quantization.py -k test_qconv2d_cudnn
```

Differential Revision: [D35993580](https://our.internmc.facebook.com/intern/diff/D35993580)

[ghstack-poisoned]
output_scale, output_zero_point, memory_format);
// TODO: When cudnn enables support for broadcasting, we can remove this tensor
at::Tensor requantize_multiplier_tensor = at::empty(quantized_output.sizes(), at::device(at::kCUDA).dtype(at::kFloat), memory_format);
// We will employ broadcasting scalar multiplication in cudnn in the requant_op below. For this to work, cuDNN requires
do we want to have a utility function for this?
for creating a scalar tensor for an arbitrary number of dimensions? I don't think so
@jerryzh168 for conv, the size of requantize_multiplier_tensor is known at compile time, so I use std::array; for linear and add, the size is not known at compile time, so I have to use at::SmallVector. If I make a utility function that all 3 ops can use, conv's requantize_multiplier_tensor would have to use at::SmallVector instead of std::array, which probably carries a small performance hit.
yeah I think using SmallVector for conv sounds good, it shouldn't matter much for perf
a utility function to create this requantize_multipler_tensor
@jerryzh168 done. can you reapprove so I can land the stack?
jerryzh168 left a comment:
maybe write a utility function that returns a requantize_multiplier_tensor?
Enabled broadcasting multiplication support for requantize_multiplier_tensor in quantized cudnn add, linear, and conv2d ops

Summary: Previously, requantize_multiplier_tensor was set to the same size as quantized_output, because broadcasting multiplication was not supported in cuDNN. This support was added in cuDNN 8.3.3. requantize_multiplier_tensor still has to be a tensor, but it can now be a scalar tensor with the same number of dimensions as the tensor it is multiplied with.

Test Plan:
```
python test/test_quantization.py -k test_qconv2d_cudnn
python test/test_quantization.py -k test_qadd_relu_cudnn
python test/test_quantization.py -k test_qlinear_cudnn
```

Differential Revision: [D35993580](https://our.internmc.facebook.com/intern/diff/D35993580)

[ghstack-poisoned]
// pointwise multiplication operations. the only reason why we need this right now is
// we use broadcasting scalar multiplication in conv, linear, and add ops, and cuDNN requires
// the scalar to be a scalar tensor with the same number of dimensions (num_dim) as the tensor we're multiplying with
at::Tensor get_requant_multiplier_tensor(double requant_multiplier, uint8_t num_dim) {
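A Python/NumPy analogue of this utility, mirroring the C++ signature above, shows what the helper produces. This is a hypothetical sketch for illustration, not the ATen implementation.

```python
import numpy as np

def get_requant_multiplier_tensor(requant_multiplier: float, num_dim: int) -> np.ndarray:
    """Return a scalar tensor with num_dim dimensions, each of size 1.

    Such a tensor broadcasts against any tensor with the same number of
    dimensions under standard broadcasting rules, so one small allocation
    replaces a full-size multiplier buffer.
    """
    return np.full((1,) * num_dim, requant_multiplier, dtype=np.float32)

# conv2d output is 4-dimensional (NCHW), so num_dim = 4 there;
# linear and add determine num_dim from their inputs at runtime.
t = get_requant_multiplier_tensor(0.25, 4)
assert t.shape == (1, 1, 1, 1)
assert t.ndim == 4
```

Sharing one helper across conv, linear, and add is why the review settled on a runtime-sized container (at::SmallVector) rather than conv's compile-time std::array.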
nit: maybe align the naming; it looks like we should be using getRequantMultiplierTensor to match the other names in this file
jerryzh168 left a comment:
lg, thanks, had a nit comment
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Enabled broadcasting multiplication support for requantize_multiplier_tensor in quantized cudnn add, linear, and conv2d ops (#76518)

Summary: Pull Request resolved: #76518

Previously, requantize_multiplier_tensor was set to the same size as quantized_output, because broadcasting multiplication was not supported in cuDNN. This support was added in cuDNN 8.3.3. requantize_multiplier_tensor still has to be a tensor, but it can now be a scalar tensor with the same number of dimensions as the tensor it is multiplied with.

Test Plan:
```
python test/test_quantization.py -k test_qconv2d_cudnn
python test/test_quantization.py -k test_qadd_relu_cudnn
python test/test_quantization.py -k test_qlinear_cudnn
```

Differential Revision: D35993580

Reviewed By: jerryzh168

Pulled By: dzdang

fbshipit-source-id: 1f72a3b32d2036733e2e04afeed7bc4d7b3e3a77
Stack from ghstack:
Summary:
Previously, requantize_multiplier_tensor was set to the same size as
quantized_output, because broadcasting multiplication was not supported in
cudnn. This support has been added in cudnn as of 8.3.3. The requirement
is that requantize_multiplier_tensor still has to be a tensor, but it
can be a scalar tensor with the same number of dimensions as the tensor
it is multiplied with.
Test Plan:
python test/test_quantization.py -k test_qconv2d_cudnn
python test/test_quantization.py -k test_qadd_relu_cudnn
python test/test_quantization.py -k test_qlinear_cudnn
Differential Revision: D35993580