
[quant][core][gpu][improvement] Enabled broadcasting multiplication support for requantize_multiplier_tensor in quantized cudnn add, linear, and conv2d ops#76518

Closed
dzdang wants to merge 9 commits into gh/dzdang/101/base from gh/dzdang/101/head

Conversation


@dzdang dzdang commented Apr 28, 2022

Stack from ghstack:

Summary:
Previously, requantize_multiplier_tensor was created with the same size as
quantized_output, because broadcasting by multiplication was not supported in
cuDNN. That support was added as of cuDNN 8.3.3. The requirement is that
requantize_multiplier_tensor must still be a tensor, but it can now be a
scalar (one-element) tensor with the same number of dimensions as the tensor
it is multiplied with.

Test Plan:

python test/test_quantization.py -k test_qconv2d_cudnn
python test/test_quantization.py -k test_qadd_relu_cudnn
python test/test_quantization.py -k test_qlinear_cudnn

Differential Revision: D35993580
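For context on what this multiplier does: in quantized ops the int32 accumulator is rescaled into the output's quantized domain by act_scale * weight_scale / output_scale. A minimal pure-Python sketch of that requantization step (the helper name is hypothetical; this is not the ATen implementation):

```python
def requantize(int32_acc, act_scale, weight_scale, output_scale, output_zero_point):
    # The single value below is what requantize_multiplier_tensor carries;
    # with cuDNN >= 8.3.3 it can live in a one-element tensor that cuDNN
    # broadcasts, instead of a tensor sized like quantized_output.
    multiplier = act_scale * weight_scale / output_scale
    val = round(int32_acc * multiplier) + output_zero_point
    return max(0, min(255, val))  # clamp to the quint8 range
```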

dzdang added a commit that referenced this pull request Apr 28, 2022
support for requantize_multiplier_tensor in quantized cudnn add, linear,
and conv2d ops

ghstack-source-id: 3af0410
Pull Request resolved: #76518

facebook-github-bot commented Apr 28, 2022

🔗 Helpful links

✅ No Failures (0 Pending)

As of commit 378abf5 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

This comment was automatically generated by Dr. CI.


dzdang commented Apr 28, 2022

@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@dzdang dzdang added the release notes: quantization, topic: improvements, and topic: performance labels Apr 28, 2022

dzdang commented Apr 28, 2022

@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@dzdang dzdang requested a review from jerryzh168 April 28, 2022 15:03
@dzdang dzdang changed the title from "[quant][core][gpu][improvement] Enabled broadcasting multiplication" to "[quant][core][gpu][improvement] Enabled broadcasting multiplication support for requantize_multiplier_tensor in quantized cudnn add, linear, and conv2d ops" Apr 28, 2022

dzdang commented Apr 28, 2022

@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

dzdang added a commit that referenced this pull request Apr 28, 2022
…upport for requantize_multiplier_tensor in quantized cudnn add, linear, and conv2d ops

ghstack-source-id: 975294c
Pull Request resolved: #76518

dzdang commented Apr 28, 2022

@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


dzdang commented May 4, 2022

@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

output_scale, output_zero_point, memory_format);
// TODO: When cudnn enables support for broadcasting, we can remove this tensor
at::Tensor requantize_multiplier_tensor = at::empty(quantized_output.sizes(), at::device(at::kCUDA).dtype(at::kFloat), memory_format);
// We will employ broadcasting scalar multiplication in cuDNN in the requant_op below. For this to work, cuDNN requires
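To illustrate the hunk above: before cuDNN 8.3.3 the multiplier tensor had to hold one element per element of quantized_output; with broadcast multiplication it needs only a single element regardless of the output shape. A hedged sketch of the element counts (the function name is hypothetical):

```python
from math import prod

def multiplier_numel(output_shape, cudnn_broadcast_mul):
    # With broadcast multiplication (cuDNN >= 8.3.3) a scalar tensor of
    # shape [1] * ndim suffices; otherwise the tensor must be full-size.
    if cudnn_broadcast_mul:
        return prod([1] * len(output_shape))  # always 1
    return prod(output_shape)
```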
Contributor:
do we want to have a utility function for this?

Contributor Author:

for creating a scalar tensor for an arbitrary number of dimensions? I don't think so

dzdang (May 11, 2022):

@jerryzh168 for conv, requantize_multiplier_tensor size is known at compile time, so I use std::array, but for linear and add, size is not known at compile time, so I have to use at::SmallVector. if I make a utility function that all 3 ops can use, I would make conv's requantize_multiplier_tensor use at::SmallVector instead of std::array -- probably a small hit on performance associated with this

jerryzh168 (May 11, 2022):

yeah I think using SmallVector for conv sounds good, it shouldn't matter much for perf

Contributor:

a utility function to create this requantize_multiplier_tensor

Contributor Author:

@jerryzh168 done. can you reapprove so I can land the stack?

jerryzh168 left a comment:

maybe write a utility function that returns a requantize_multiplier_tensor?


dzdang commented May 11, 2022

@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@dzdang dzdang requested a review from jerryzh168 May 11, 2022 20:04

dzdang commented May 16, 2022

@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

// pointwise multiplication operations. the only reason why we need this right now is
// we use broadcasting scalar multiplication in conv, linear, and add ops, and cuDNN requires
// the scalar to be a scalar tensor with the same number of dimensions (num_dim) as the tensor we're multiplying to
at::Tensor get_requant_multiplier_tensor(double requant_multiplier, uint8_t num_dim) {
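The essence of this utility is that cuDNN accepts a broadcast multiplicand only as a tensor whose number of dimensions matches the other operand, with every dimension of size 1. A hedged Python sketch of just the shape logic (not the actual ATen code):

```python
def requant_multiplier_shape(num_dim):
    # cuDNN broadcasts a one-element tensor only if its ndim matches the
    # tensor it multiplies, e.g. 1x1x1x1 against an NCHW conv2d output.
    return [1] * num_dim
```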
Contributor:
nit: maybe align the naming; looks like we should be using getRequantMultiplierTensor, based on the other names in this file

jerryzh168 left a comment:

lg, thanks, had a nit comment

dzdang added 2 commits May 24, 2022 16:32

dzdang commented May 24, 2022

@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot:

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

facebook-github-bot pushed a commit that referenced this pull request May 25, 2022
…upport for requantize_multiplier_tensor in quantized cudnn add, linear, and conv2d ops (#76518)

Summary:
Pull Request resolved: #76518


Differential Revision: D35993580

Reviewed By: jerryzh168

Pulled By: dzdang

fbshipit-source-id: 1f72a3b32d2036733e2e04afeed7bc4d7b3e3a77
@facebook-github-bot facebook-github-bot deleted the gh/dzdang/101/head branch May 28, 2022 14:16