
Resolve uncovered CUDA dnn layer #24080

Merged
asmorkalov merged 7 commits into opencv:4.x from dkurt:dnn_cuda_layers on Aug 3, 2023

Conversation

@dkurt
Member

@dkurt dkurt commented Jul 31, 2023

Pull Request Readiness Checklist

  • Gelu activation layer on CUDA
  • Try to relax GEMM from ONNX

resolves #24064

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@dkurt dkurt force-pushed the dnn_cuda_layers branch from d4f4c74 to 2f61118 Compare August 1, 2023 10:26
@dkurt
Member Author

dkurt commented Aug 1, 2023

Seems like the LayerNorm changes became bigger and wider in scope. It makes sense to move them to a separate PR.

@dkurt dkurt marked this pull request as ready for review August 1, 2023 13:19
@dkurt dkurt requested a review from WanliZhong August 1, 2023 17:14
@dkurt
Member Author

dkurt commented Aug 1, 2023

@WanliZhong, @fengyuentau, please review the part about GEMM. I have switched is_matmul to false when the weights are just 2D.

@dkurt dkurt requested a review from fengyuentau August 1, 2023 17:15
@asmorkalov
Contributor

@dkurt Thanks a lot for the patch. I experimented with the code. The matmul change is covered by accuracy tests and works well, but it is not covered by performance tests. Could you add a performance test to check for performance regressions before merge?

@asmorkalov asmorkalov added the pr: needs test New functionality requires minimal tests set label Aug 2, 2023
@asmorkalov asmorkalov self-requested a review August 2, 2023 10:59
Contributor

@asmorkalov asmorkalov left a comment


👍

@dkurt
Member Author

dkurt commented Aug 2, 2023

@asmorkalov, added a performance test. To compare, replace lp.set("is_matmul", weights.dims > 2); with lp.set("is_matmul", true); on the same branch. This is the only change made in the ONNX importer.

Geometric mean (ms)

Name of Test                                                         always  matmul    multidim vs always
                                                                     matmul  multidim  matmul (x-factor)
fc::Layer_FullyConnected::([5, 16, 512, 128], 256, false, OCV/CPU)  17.941  16.875     1.06   
fc::Layer_FullyConnected::([5, 16, 512, 128], 256, true, OCV/CPU)   19.657  19.652     1.00   
fc::Layer_FullyConnected::([5, 16, 512, 128], 512, false, OCV/CPU)  35.543  33.532     1.06   
fc::Layer_FullyConnected::([5, 16, 512, 128], 512, true, OCV/CPU)   39.381  39.283     1.00   
fc::Layer_FullyConnected::([5, 16, 512, 128], 1024, false, OCV/CPU) 71.357  68.120     1.05   
fc::Layer_FullyConnected::([5, 16, 512, 128], 1024, true, OCV/CPU)  80.729  81.594     0.99   
fc::Layer_FullyConnected::([5, 512, 384, 0], 256, false, OCV/CPU)   3.217   3.152      1.02   
fc::Layer_FullyConnected::([5, 512, 384, 0], 256, true, OCV/CPU)    3.326   3.301      1.01   
fc::Layer_FullyConnected::([5, 512, 384, 0], 512, false, OCV/CPU)   6.435   6.414      1.00   
fc::Layer_FullyConnected::([5, 512, 384, 0], 512, true, OCV/CPU)    6.718   6.741      1.00   
fc::Layer_FullyConnected::([5, 512, 384, 0], 1024, false, OCV/CPU)  17.074  17.059     1.00   
fc::Layer_FullyConnected::([5, 512, 384, 0], 1024, true, OCV/CPU)   17.356  17.388     1.00
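The importer-side change being benchmarked above can be sketched as a tiny predicate (a hedged illustration; needsMatMulBroadcast is a hypothetical name, not an OpenCV function):

```cpp
#include <cassert>

// Hypothetical helper mirroring the importer change discussed above:
// an ONNX Gemm/MatMul node only needs the matmul-broadcast path when its
// weight tensor has more than two dimensions; plain 2D weights can take
// the ordinary fully-connected (inner product) path.
static bool needsMatMulBroadcast(int weightsDims)
{
    return weightsDims > 2; // mirrors lp.set("is_matmul", weights.dims > 2)
}
```

With this heuristic, the common 2D-weights case avoids the per-matrix GEMM loop entirely, which is where the measured speedup for the "multidim" column comes from.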

Member

@fengyuentau fengyuentau left a comment


The Gemm part looks good to me. IIRC, is_matmul was added by Wanli and is used to deal with some optimization concerns. @WanliZhong Could you add some details here?

@dkurt
Member Author

dkurt commented Aug 3, 2023

@fengyuentau, is_matmul is not about optimization but about enabling matmul broadcast. It reuses the same implementation: #22828

if (isMatMul)
{
    // Broadcasted matrix multiplication: run one GEMM per matrix in the
    // batch, cycling through the stacked weight matrices.
    int matNum = input[0].total(0, inp1Dim - 2);
    int rowMatMul = oriMat.size[oriMat.dims - 2];
    Mat srcMatTmp = input[0].reshape(1, matNum);
    Mat dstMatTmp = output[0].reshape(1, matNum);
    int outerSize = input[0].size[inp1Dim - 2];
    int rowStart = -rowMatMul;
    for (int n = 0; n < matNum; ++n)
    {
        Mat srcMat = srcMatTmp.row(n).reshape(1, outerSize);
        Mat dstMat = dstMatTmp.row(n).reshape(1, outerSize);
        // Advance to the next stacked weight matrix; the modulo broadcasts
        // the weights when there are fewer weight matrices than inputs.
        rowStart = (rowStart + rowMatMul) % weightsMat.rows;
        Mat weiMat = weightsMat.rowRange(rowStart, rowStart + rowMatMul);
        const int nstripes = getNumThreads();
        FullyConnected::run(srcMat, weiMat, biasMat, dstMat, activ.get(), nstripes);
    }
}
else
{
    // Plain inner product: flatten everything after `axis` and run a single
    // GEMM against the whole weight matrix.
    int axisCan = normalize_axis(axis, inp1Dim);
    int outerSize = input[0].total(0, axisCan);
    for (size_t i = 0; i < input.size(); i++)
    {
        Mat srcMat = input[i].reshape(1, outerSize);
        Mat dstMat = output[i].reshape(1, outerSize);
        const int nstripes = getNumThreads();
        FullyConnected::run(srcMat, weightsMat, biasMat, dstMat, activ.get(), nstripes);
    }
}

Member

@WanliZhong WanliZhong left a comment


When the matrix is 2D, the implementation is the same whether or not is_matmul is true. Someday, matrix multiplication should be separated from the inner product implementation. :-)
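This 2D equivalence can be checked with the row-cycling arithmetic from the isMatMul branch quoted earlier (a minimal sketch; weightRowStart is a hypothetical helper, assuming the iterative rowStart update is equivalent to this closed form):

```cpp
#include <cassert>

// Closed form of the iterative update in the isMatMul branch: rowStart
// begins at -rowMatMul and advances by rowMatMul modulo the total number
// of weight rows, so batch element n reads weight rows
// [n * rowMatMul % weightRows, n * rowMatMul % weightRows + rowMatMul).
static int weightRowStart(int n, int rowMatMul, int weightRows)
{
    return (n * rowMatMul) % weightRows;
}
```

With 2D weights there is a single weight matrix (rowMatMul == weightRows), so weightRowStart is 0 for every n and each batch element multiplies the full weight matrix, which is exactly what the non-matmul branch does.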

Contributor

@asmorkalov asmorkalov left a comment


👍

@asmorkalov asmorkalov added category: gpu/cuda (contrib) OpenCV 4.0+: moved to opencv_contrib and removed pr: needs test New functionality requires minimal tests set labels Aug 3, 2023
@asmorkalov asmorkalov self-assigned this Aug 3, 2023
@asmorkalov asmorkalov merged commit 96f23e3 into opencv:4.x Aug 3, 2023
@asmorkalov asmorkalov added this to the 4.9.0 milestone Aug 3, 2023
@asmorkalov asmorkalov mentioned this pull request Aug 7, 2023
thewoz pushed a commit to thewoz/opencv that referenced this pull request Jan 4, 2024
Resolve uncovered CUDA dnn layer opencv#24080
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
Resolve uncovered CUDA dnn layer opencv#24080


Development

Successfully merging this pull request may close these issues.

Doesnt use cuda for specific layer

4 participants