Conversation
modules/dnn/src/layers/convolution_arm/conv_2d_3x3s1_winograd.hpp
Outdated
Show resolved
Hide resolved
4204f89 to
e85f3d5
Compare
3281b3c to
67d4dc2
Compare
This comment was marked as resolved.
This comment was marked as resolved.
|
|
||
| #include "opencv2/core/hal/intrin.hpp" | ||
|
|
||
| #ifndef FAST_CONV_PRAM |
There was a problem hiding this comment.
since you've added separate fast_convolution.simd.hpp, why not move all the new convolution kernels there? Please, move conv_block and depthWiseBlock there. At once, please, normalize the naming. If you use lowercase names with underscores for conv_block, use the same style for depthwise_block. Or make everything mixed case: convBlock, depthWiseBlock
| } | ||
| } | ||
|
|
||
| void runFastConv2d(InputArray _input, OutputArray _output, |
There was a problem hiding this comment.
you have 2 functions with very similar names: doConvolution and runFastConv2d. What's the difference between 'do' and 'run'? Can you modify the name of one of the functions to make it more clear?
| int Kstripes = Kg_nblocks*stripes_per_sample; | ||
| int nsubtasks = N*ngroups*Kstripes; | ||
|
|
||
| float* inpbuf_all = (float *)fastMalloc(inputbufsize * sizeof(float )); |
There was a problem hiding this comment.
In Ficus engine I had to use C, because this is the native output language for Ficus compiler. In C++ code, please, never use plain "malloc" or its alternatives. Use std::vector<> instead.
There was a problem hiding this comment.
Thanks for code reviewing, I will fix these issues in the next update.
|
Hi @vpisarev, the Code has been updated. And some comments are left. |
| using namespace cv::dnn::cuda4dnn; | ||
| #endif | ||
|
|
||
| #include "./fast_convolution/fast_convolution.hpp" |
There was a problem hiding this comment.
./
Relative prefix should not be used.
| public: | ||
| enum { VEC_ALIGN = 8, DFT_TYPE = CV_32F }; | ||
| Mat weightsMat; | ||
| Mat weightsMat, fastWeights; |
There was a problem hiding this comment.
It is better to put this near fastConv2dImpl
Also it makes sense to add documetation/information about its layout and the difference from weighstMat.
| /* | ||
| This file is a part of ficus language project. | ||
| See ficus/LICENSE for the licensing terms | ||
| */ | ||
| // This file is modified from the ficus (https://github.com/vpisarev/ficus/blob/master/lib/NN/OpConv.fx) |
There was a problem hiding this comment.
@vpisarev Please verify as this integration contradicts to 3rdparty original files and/or 3rdparty adopted files.
There was a problem hiding this comment.
Thanks for code reviewing. Any advice on this? I don't know how to modify it.
There was a problem hiding this comment.
@alalek, could you please explain your comment? What's the contradiction? I can confirm that the code has been borrowed from Ficus, licensed under Apache 2 license.
There was a problem hiding this comment.
I expect header similar to modules/dnn/src/layers/fast_convolution/winograd_3x3s1_f63.cpp
where we have OpenCV header on top and then the original license header of the adapted code.
| CV_LOG_WARNING(NULL, "Runing at unoptimized code. The combination of FAST_CONV_MR and/or FAST_CONV_NR " | ||
| "is not supported in SIMD128 branch."); |
There was a problem hiding this comment.
ISA-targeted/SIMD code should not emit warnings (or in general call any other non-optimized functions).
There was a problem hiding this comment.
@zihaomu, it should be compile-time error. If user changes FAST_CONV_MR/FAST_CONV_NR, he/she should also modify the optimized loop or explicitly disable it and switch to C implementation.
There was a problem hiding this comment.
Ok, I will update it later.
| using namespace cv::dnn::cuda4dnn; | ||
| #endif | ||
|
|
||
| #include "fast_convolution/fast_convolution.hpp" |
There was a problem hiding this comment.
Need to take a look on:
- test failures in Linux Debug configuration
- test failures in Linux AVX2 configuration (
-DCPU_BASELINE=AVX2) - looks like unconditional doubling of weights storage requires more memory and several Win32 tests started to fail with OOM message. @vpisarev
There was a problem hiding this comment.
Thanks for code reviewing. The failure in Linux Debug and Linux AVX2 only occurs in the quantized model. Since parameters of int8 layers rely on the output of fp32 models, we can modify the threshold to solve it in a short time.
For Win32, I'm looking for a way around it.
There was a problem hiding this comment.
For Win32, I have changed the memory limitation of FasterRCNN_vgg16 from 1GB to 2GB. FasterRCNN_vgg16 has a large memory requirement(412 MB needed on one FC layer). The new patch needs to pack the weightMats in advance in the initialization phase. So for Conv, we need twice as much memory to store the weights.
modules/dnn/test/test_backends.cpp
Outdated
| Mat inp = blobFromImage(img, 1.0, Size(320, 240), Scalar(103.939, 116.779, 123.68), false, false); | ||
| // Output image has values in range [-143.526, 148.539]. | ||
| float l1 = 4e-5, lInf = 2e-3; | ||
| float l1 = 5e-5, lInf = 2e-3; |
There was a problem hiding this comment.
Please use x2-x5 values for test tolerance checks.
4e-5 => 1e-4 instead of 5e-5
Below:
1e-5 => 1e-4 instead of 1.01e-5
|
After merging this PR, Android build fails: https://github.com/opencv/ci-gha-workflow/runs/7157789387?check_suite_focus=true#step:8:1639 |
Thanks for that, i'm looking for a solution, it's estimated to take a day or two to resolve. |
DNN: Accelerating convolution * Fast Conv of ARM, X86 and universal intrinsics. * improve code style. * error fixed. * improve the License * optimize memory allocated and Adjust the threshold. * change FasterRCNN_vgg16 to 2GB memory.
The goal of this proposal is to speed up the convolution layer, thereby speeding up the running speed of the dnn module.
Performance Test on ARM (Appel M1 Chip, 8 threads)
On ARM platform, it can achieve 2.5X speed up on ResNet50, 1.7X speed up on MobileNetV2.
Performance Test on X86 (AMD 5600X, 12 threads)
It can achieve 15% faster on X86 platform.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.
WIP