[GSoC] dnn: Blockwise quantization support #25644

Merged: asmorkalov merged 17 commits into opencv:4.x from DaniAffCH:blockwise-quantization on Jul 30, 2024

Contributor

@DaniAffCH DaniAffCH commented May 25, 2024

This PR introduces blockwise quantization in DNN, allowing ONNX models quantized in blockwise style to be parsed. In particular, it modifies the Quantize and Dequantize operations. The related PR opencv/opencv_extra#1181 contains the test data.

Additional notes:

  • The original quantization issue has been fixed. Previously, for 1D scale and zero-point, the operation applied was $y = int8(x/s - z)$ instead of $y = int8(x/s + z)$. The operation was already implemented correctly when the scale and zero-point were scalars. The previous implementation failed the ONNX test cases; all of them now pass. Reference: https://github.com/onnx/onnx/blob/main/docs/Operators.md#QuantizeLinear
  • The function block_repeat broadcasts the scale and zero-point to the input shape by repeating every element along a given axis n times. It generalizes repeat from the core module, which is defined only for two axes and assumes that Mat has two dimensions. If appropriate and useful, you might consider moving block_repeat to the core module.
  • The scale and zero-point can now be taken as layer inputs. This increases ONNX layer coverage and lets us run the ONNX test cases (previously disabled) in full compliance with the ONNX standard. Since they are now supported, I have enabled the following test cases on the CPU backend only: test_dequantizelinear, test_dequantizelinear_axis, test_dequantizelinear_blocked, test_quantizelinear, test_quantizelinear_axis, test_quantizelinear_blocked. All of them pass.
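To make the first two notes concrete, here is a minimal standalone sketch of the idea, not the code from this PR. `blockRepeat` and `quantizeInt8` are hypothetical names, the data lives in flat row-major `std::vector`s rather than `cv::Mat`, the zero-point is kept as `float` for brevity, and the axis length is assumed to be a multiple of the block size. Rounding to nearest even and saturation follow the linked ONNX QuantizeLinear description.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for block_repeat: repeat every element along `axis`
// n times. `shape` describes `src`, which is stored in row-major order.
std::vector<float> blockRepeat(const std::vector<float>& src,
                               const std::vector<std::size_t>& shape,
                               std::size_t axis, std::size_t n)
{
    std::size_t outer = 1, inner = 1;
    for (std::size_t d = 0; d < axis; ++d) outer *= shape[d];
    for (std::size_t d = axis + 1; d < shape.size(); ++d) inner *= shape[d];
    const std::size_t len = shape[axis];
    assert(src.size() == outer * len * inner);

    std::vector<float> dst;
    dst.reserve(src.size() * n);
    for (std::size_t o = 0; o < outer; ++o)
        for (std::size_t a = 0; a < len; ++a)
            for (std::size_t r = 0; r < n; ++r)        // n copies of the same slice
                for (std::size_t i = 0; i < inner; ++i)
                    dst.push_back(src[(o * len + a) * inner + i]);
    return dst;
}

// y = saturate_int8(round(x / s) + z). Note the plus sign on the zero-point.
// `scale` and `zeroPoint` must already be broadcast to the shape of `x`.
std::vector<std::int8_t> quantizeInt8(const std::vector<float>& x,
                                      const std::vector<float>& scale,
                                      const std::vector<float>& zeroPoint)
{
    assert(scale.size() == x.size() && zeroPoint.size() == x.size());
    std::vector<std::int8_t> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        // nearbyint rounds to nearest even under the default rounding mode.
        float q = std::nearbyint(x[i] / scale[i]) + zeroPoint[i];
        q = std::min(127.0f, std::max(-128.0f, q));     // saturate to int8 range
        y[i] = static_cast<std::int8_t>(q);
    }
    return y;
}
```

For example, with x = {1, 2, -3, 4}, a block size of 2, per-block scales {0.5, 1} and zero-points {1, -2}, broadcasting gives scales {0.5, 0.5, 1, 1} and the quantized result is {3, 5, -5, 2}. With the old minus sign the same input would have produced {1, 3, -1, 6}.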

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@DaniAffCH DaniAffCH changed the title [GSoC] Blockwise quantization support [GSoC] dnn: Blockwise quantization support May 25, 2024
@fengyuentau fengyuentau added the GSoC, category: dnn, and category: dnn (onnx) labels May 27, 2024
@fengyuentau fengyuentau self-requested a review May 27, 2024 12:05
@fengyuentau fengyuentau self-assigned this May 27, 2024
@DaniAffCH
Contributor Author

@fengyuentau feel free to review

@DaniAffCH DaniAffCH marked this pull request as ready for review May 29, 2024 08:52
@fengyuentau fengyuentau added the category:dnn_timvx label May 31, 2024
@DaniAffCH
Contributor Author

All previous comments have been addressed. When the scale and zero-point are provided as layer inputs, they are now cached and reused in subsequent executions, so repeated inferences do not recompute them.
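As a rough illustration of that caching idea, and not the PR's actual code: `BroadcastCache` below is a hypothetical helper that expands a per-block scale once and hands back the stored result on later calls. It assumes the scale does not change between runs; how the real layer detects or handles a changed input is not described in this thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical lazy cache for a broadcast (block-repeated) 1D scale.
class BroadcastCache {
public:
    // Expands `scale` on the first call only; later calls return the cached copy.
    const std::vector<float>& get(const std::vector<float>& scale,
                                  std::size_t blockSize)
    {
        if (!ready_) {
            cached_.clear();
            for (float s : scale)
                cached_.insert(cached_.end(), blockSize, s);  // blockSize copies of s
            ready_ = true;
            ++expansions_;
        }
        return cached_;
    }

    // Number of times the expansion was actually performed.
    int expansions() const { return expansions_; }

private:
    std::vector<float> cached_;
    bool ready_ = false;
    int expansions_ = 0;
};
```

Calling `get` twice with the same scale performs the expansion only once, which is the saving the comment above refers to.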

@fengyuentau fengyuentau added this to the 4.11.0 milestone Jul 3, 2024
Member

@fengyuentau fengyuentau left a comment

Actually, these comments were made a week ago, but I forgot to submit them... Anyway, your PR looks good to me.


copyVecToMat(tmpMat,data);

block_repeat(tmpMat, axis, block_size, mat);
Member

Can we avoid creating a new Mat here? Repeating amounts to reusing the same piece of memory multiple times. We could rewrite the forward implementation as a loop that advances by the correct step on each iteration, which would save the time and memory spent creating a new Mat.


Also, if this is implemented in the end, could you put the core functions, e.g. quantize(...), in modules/dnn/src/layers/cpu_kernels? I think that at the importing stage we will need to call the quantize or dequantize functions from modules/dnn/src/onnx/onnx_graph_simplifier.cpp.
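One possible shape for the loop-based variant suggested above, as a sketch only: `dequantizeBlockwise` is a hypothetical function working on flat row-major `std::vector`s instead of `cv::Mat`. It assumes the scale and zero-point have the input's shape except along `axis`, where they hold ceil(shape[axis] / blockSize) entries, and it computes the matching parameter index on the fly instead of materializing a repeated tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// x = (q - z) * s, looking up the per-block scale/zero-point by index
// arithmetic rather than broadcasting them into a temporary buffer.
std::vector<float> dequantizeBlockwise(const std::vector<std::int8_t>& q,
                                       const std::vector<std::size_t>& shape,
                                       std::size_t axis, std::size_t blockSize,
                                       const std::vector<float>& scale,
                                       const std::vector<std::int8_t>& zeroPoint)
{
    std::size_t outer = 1, inner = 1;
    for (std::size_t d = 0; d < axis; ++d) outer *= shape[d];
    for (std::size_t d = axis + 1; d < shape.size(); ++d) inner *= shape[d];
    const std::size_t len = shape[axis];
    const std::size_t blocks = (len + blockSize - 1) / blockSize;  // ceil division
    assert(q.size() == outer * len * inner);
    assert(scale.size() == outer * blocks * inner);
    assert(zeroPoint.size() == scale.size());

    std::vector<float> x(q.size());
    for (std::size_t o = 0; o < outer; ++o)
        for (std::size_t a = 0; a < len; ++a)
            for (std::size_t i = 0; i < inner; ++i) {
                const std::size_t src = (o * len + a) * inner + i;
                // Every blockSize consecutive positions on `axis` share one entry.
                const std::size_t par = (o * blocks + a / blockSize) * inner + i;
                x[src] = (static_cast<float>(q[src]) -
                          static_cast<float>(zeroPoint[par])) * scale[par];
            }
    return x;
}
```

The integer division `a / blockSize` plays the role of the repeat, so no intermediate buffer of the input's size is allocated. A trailing partial block also works here because the number of blocks is rounded up.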

Member

This can be done later when we need to fuse qdq.

fengyuentau

This comment was marked as resolved.

@fengyuentau
Member

@vpisarev @asmorkalov @dkurt Feel free to review this PR. I propose to merge it because it is needed in the second stage of the GSoC project.

asmorkalov pushed a commit to opencv/opencv_extra that referenced this pull request Jul 30, 2024
Support Blockwise Quantization #1181

This PR contains the test data necessary to verify the correctness of blockwise quantization introduced in opencv/opencv#25644
@asmorkalov asmorkalov merged commit 2a333a6 into opencv:4.x Jul 30, 2024
@DaniAffCH DaniAffCH deleted the blockwise-quantization branch July 30, 2024 12:55
@asmorkalov asmorkalov mentioned this pull request Aug 6, 2024
fengyuentau pushed a commit to fengyuentau/opencv that referenced this pull request Aug 15, 2024
[GSoC] dnn: Blockwise quantization support opencv#25644

This PR introduces blockwise quantization in DNN, allowing ONNX models quantized in blockwise style to be parsed. In particular, it modifies the `Quantize` and `Dequantize` operations. The related PR opencv/opencv_extra#1181 contains the test data.

Additional notes:
- The original quantization issue has been fixed. Previously, for 1D scale and zero-point, the operation applied was $y = int8(x/s - z)$ instead of $y = int8(x/s + z)$. The operation was already implemented correctly when the scale and zero-point were scalars. The previous implementation failed the ONNX test cases; all of them now pass. [Reference](https://github.com/onnx/onnx/blob/main/docs/Operators.md#QuantizeLinear)
- The function `block_repeat` broadcasts the scale and zero-point to the input shape by repeating every element along a given axis n times. It generalizes `repeat` from the core module, which is defined only for two axes and assumes that `Mat` has two dimensions. If appropriate and useful, you might consider moving `block_repeat` to the core module.
- The scale and zero-point can now be taken as layer inputs. This increases ONNX layer coverage and lets us run the ONNX test cases (previously disabled) in full compliance with the ONNX standard. Since they are now supported, I have enabled the following test cases on the CPU backend only: `test_dequantizelinear`, `test_dequantizelinear_axis`, `test_dequantizelinear_blocked`, `test_quantizelinear`, `test_quantizelinear_axis`, `test_quantizelinear_blocked`. All of them pass.
   
### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to CobbsLab/OPENCV that referenced this pull request Feb 13, 2025
[GSoC] dnn: Blockwise quantization support opencv#25644


Labels

category: dnn (onnx), category:dnn_timvx, category: dnn, GSoC

4 participants