[quant][core][gpu][improvement] Enabled broadcasting multiplication support for requantize_multiplier_tensor in quantized cudnn add, linear, and conv2d ops#76518
Conversation
support for requantize_multiplier_tensor in quantized cudnn add, linear, and conv2d ops

Summary: Previously, requantize_multiplier_tensor was set to the same size as quantized_output, because broadcasting multiplication was not supported in cuDNN. This support was added in cuDNN 8.3.3. requantize_multiplier_tensor still has to be a tensor, but it can now be a scalar tensor with the same number of dimensions as the tensor it is multiplied with.

Test Plan:
```
python test/test_quantization.py -k test_qconv2d_cudnn
```

[ghstack-poisoned]
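The requirement described above can be illustrated with a NumPy analogue: NumPy follows the same broadcasting rule that cuDNN 8.3.3 now supports for pointwise multiplication. This is an illustrative sketch of the before/after multiplier shapes, not the actual cuDNN call; the variable names are hypothetical.

```python
import numpy as np

# Simulated quantized op output in NCHW layout (N=1, C=2, H=3, W=3).
quantized_output = np.arange(18, dtype=np.float32).reshape(1, 2, 3, 3)
requant_multiplier = 0.5

# Old approach: materialize a multiplier tensor the same size as the output.
full_multiplier = np.full(quantized_output.shape, requant_multiplier, dtype=np.float32)

# New approach: a "scalar tensor" with the same number of dimensions as the
# output (shape (1, 1, 1, 1)); broadcasting expands it during multiplication.
scalar_multiplier = np.full((1,) * quantized_output.ndim, requant_multiplier,
                            dtype=np.float32)

assert scalar_multiplier.ndim == quantized_output.ndim
# Both multiplier shapes produce identical requantized results.
assert np.array_equal(quantized_output * full_multiplier,
                      quantized_output * scalar_multiplier)
```

The scalar-tensor form avoids allocating and filling a full-size multiplier buffer on every call, which is the point of the change.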
✅ No Failures (0 Pending) as of commit 378abf5 (more details on the Dr. CI page). Looks good so far! There are no failures yet. This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the (internal) Dr. CI Users group.
@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Enabled broadcasting multiplication support for requantize_multiplier_tensor in quantized cudnn add, linear, and conv2d ops

Summary: Previously, requantize_multiplier_tensor was set to the same size as quantized_output, because broadcasting multiplication was not supported in cuDNN. This support was added in cuDNN 8.3.3. requantize_multiplier_tensor still has to be a tensor, but it can now be a scalar tensor with the same number of dimensions as the tensor it is multiplied with.

Test Plan:
```
python test/test_quantization.py -k test_qconv2d_cudnn
```

Differential Revision: [D35993580](https://our.internmc.facebook.com/intern/diff/D35993580)

[ghstack-poisoned]
output_scale, output_zero_point, memory_format);
// TODO: When cudnn enables support for broadcasting, we can remove this tensor
at::Tensor requantize_multiplier_tensor = at::empty(quantized_output.sizes(), at::device(at::kCUDA).dtype(at::kFloat), memory_format);
// We will employ broadcasting scalar multiplication in cudnn in the requant_op below. For this to work, cuDNN requires
do we want to have a utility function for this?
for creating a scalar tensor for an arbitrary number of dimensions? I don't think so
@jerryzh168 for conv, the size of requantize_multiplier_tensor is known at compile time, so I use std::array; for linear and add, the size is not known at compile time, so I have to use at::SmallVector. If I make a utility function that all 3 ops can use, conv's requantize_multiplier_tensor would have to use at::SmallVector instead of std::array, which probably carries a small performance hit.
yeah I think using SmallVector for conv sounds good, it shouldn't matter much for perf
a utility function to create this requantize_multipler_tensor
@jerryzh168 done. can you reapprove so I can land the stack?
jerryzh168 left a comment:
maybe write a utility function that returns a requantize_multiplier_tensor?
Enabled broadcasting multiplication support for requantize_multiplier_tensor in quantized cudnn add, linear, and conv2d ops

Summary: Previously, requantize_multiplier_tensor was set to the same size as quantized_output, because broadcasting multiplication was not supported in cuDNN. This support was added in cuDNN 8.3.3. requantize_multiplier_tensor still has to be a tensor, but it can now be a scalar tensor with the same number of dimensions as the tensor it is multiplied with.

Test Plan:
```
python test/test_quantization.py -k test_qconv2d_cudnn
python test/test_quantization.py -k test_qadd_relu_cudnn
python test/test_quantization.py -k test_qlinear_cudnn
```

Differential Revision: [D35993580](https://our.internmc.facebook.com/intern/diff/D35993580)

[ghstack-poisoned]
// pointwise multiplication operations. the only reason why we need this right now is
// we use broadcasting scalar multiplication in conv, linear, and add ops, and cuDNN requires
// the scalar to be a scalar tensor with the same number of dimensions (num_dim) as the tensor we're multiplying with
at::Tensor get_requant_multiplier_tensor(double requant_multiplier, uint8_t num_dim) {
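A Python/NumPy analogue of this utility, mirroring the C++ signature above, shows what the helper produces. This is a hypothetical sketch for illustration, not the ATen implementation.

```python
import numpy as np

def get_requant_multiplier_tensor(requant_multiplier: float, num_dim: int) -> np.ndarray:
    """Return a scalar tensor with num_dim dimensions, each of size 1.

    Such a tensor broadcasts against any tensor with the same number of
    dimensions under standard broadcasting rules, so one small allocation
    replaces a full-size multiplier buffer.
    """
    return np.full((1,) * num_dim, requant_multiplier, dtype=np.float32)

# conv2d output is 4-dimensional (NCHW), so num_dim = 4 there;
# linear and add determine num_dim from their inputs at runtime.
t = get_requant_multiplier_tensor(0.25, 4)
assert t.shape == (1, 1, 1, 1)
assert t.ndim == 4
```

Sharing one helper across conv, linear, and add is why the review settled on a runtime-sized container (at::SmallVector) rather than conv's compile-time std::array.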
nit: maybe align the naming; it looks like we should be using getRequantMultiplierTensor to match the other names in this file
jerryzh168 left a comment:
lg, thanks, had a nit comment
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Enabled broadcasting multiplication support for requantize_multiplier_tensor in quantized cudnn add, linear, and conv2d ops (#76518)

Summary: Pull Request resolved: #76518

Previously, requantize_multiplier_tensor was set to the same size as quantized_output, because broadcasting multiplication was not supported in cuDNN. This support was added in cuDNN 8.3.3. requantize_multiplier_tensor still has to be a tensor, but it can now be a scalar tensor with the same number of dimensions as the tensor it is multiplied with.

Test Plan:
```
python test/test_quantization.py -k test_qconv2d_cudnn
python test/test_quantization.py -k test_qadd_relu_cudnn
python test/test_quantization.py -k test_qlinear_cudnn
```

Differential Revision: D35993580

Reviewed By: jerryzh168

Pulled By: dzdang

fbshipit-source-id: 1f72a3b32d2036733e2e04afeed7bc4d7b3e3a77
Stack from ghstack:
Summary:
Previously, requantize_multiplier_tensor was set to the same size as
quantized_output, because broadcasting multiplication was not supported in
cudnn. This support has been added in cudnn as of 8.3.3. The requirement
is that requantize_multiplier_tensor still has to be a tensor, but it
can be a scalar tensor with the same number of dimensions as the tensor
it is multiplied with.
Test Plan:
python test/test_quantization.py -k test_qconv2d_cudnn
python test/test_quantization.py -k test_qadd_relu_cudnn
python test/test_quantization.py -k test_qlinear_cudnn
Differential Revision: D35993580