Classify and extend convolution and depthwise performance tests #24547
asmorkalov merged 10 commits into opencv:4.x
fengyuentau left a comment:
Could you give a summary of what has been removed and added?
@fengyuentau In total, 284 new cases are added and 279 old cases are removed. You can check each case in the file diff.txt.
UPDATE: Test results are attached. I have finished running the convolution performance tests against each OpenCV release version. The results show that depthwise may not be the biggest problem. Many performance issues have been fixed in 4.8.1.
Great job! It looks like we perform poorly in some cases that have a small output shape or a small number of output channels. With these performance tests, we can control compute branches in more detail. On the x86 platform, these problems are more serious.
@WanliZhong, we need to split this test into several cases: 1x1 convolution, 3x3s1d1 (Winograd and im2row-based), depthwise, and generic (the remaining cases). Each convolution case should test FP32 and FP16.
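The split requested above can be sketched as a small classifier. This is only an illustration: the `ConvCase` struct, its field names, the returned labels, and the rule that "depthwise" means one group per input channel are assumptions, not code from the PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of one convolution layer. The field names are
// illustrative and are not taken from the OpenCV test code.
struct ConvCase {
    int kernelH, kernelW;
    int strideH, strideW;
    int dilationH, dilationW;
    int groups;
    int inputChannels;
};

// Returns the test group a case would fall into under the proposed split.
// Assumption: "depthwise" means one group per input channel, with more than
// one channel. Depthwise is checked first, so a depthwise 3x3s1d1 layer is
// not counted as a Winograd candidate.
inline std::string classifyConv(const ConvCase& c) {
    assert(c.groups >= 1 && c.inputChannels >= 1);
    const bool unitStride = c.strideH == 1 && c.strideW == 1;
    const bool unitDilation = c.dilationH == 1 && c.dilationW == 1;

    if (c.groups > 1 && c.groups == c.inputChannels)
        return "DEPTHWISE";
    if (c.kernelH == 1 && c.kernelW == 1)
        return "CONV_1x1";
    if (c.kernelH == 3 && c.kernelW == 3 && unitStride && unitDilation)
        return "CONV_3x3_S1_D1";
    return "CONV";  // generic: everything that is left
}
```

With this ordering, each case lands in exactly one group, and FP32 and FP16 runs can then be added per group rather than per case.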
c38bfd4 to 5bb32e1
Target targetId = get<1>(get<2>(GetParam()));
bool winograd = get<1>(GetParam());
Net net = build_net(params, backendId, targetId);
net.enableWinograd(winograd);
There is a "warmup" stage in the original test code. If we change settings, then we should run it again.

The warm-up happens in the build_net() function. I didn't change this part of the code.

If you change a network configuration setting (like net.enableWinograd(winograd); on line 932), then you should do the "warmup" again.

Thanks! That's right. I will do it soon.
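The reviewer's point can be illustrated with a stand-in model. `StandInNet` below is hypothetical and is not `cv::dnn::Net`; it only assumes that changing a setting discards state prepared by an earlier forward pass, which is the situation the review comment guards against.

```cpp
#include <cassert>

// Stand-in for a network object; NOT cv::dnn::Net. It only models the idea
// that changing a setting discards state prepared by an earlier forward pass.
struct StandInNet {
    bool winograd = true;
    bool prepared = false;  // true once a forward pass ran with current settings
    int coldRuns = 0;       // forward passes that had to do one-time preparation

    void enableWinograd(bool useWinograd) {
        if (useWinograd != winograd)
            prepared = false;  // assumed: a changed setting invalidates the warm-up
        winograd = useWinograd;
    }

    void forward() {
        if (!prepared) {
            ++coldRuns;
            prepared = true;
        }
    }
};

// Runs `iterations` measured passes and returns how many of them were cold.
inline int measuredColdRuns(StandInNet& net, bool winograd,
                            bool warmupAfterConfig, int iterations) {
    net.forward();                 // warm-up done while the net is built
    net.enableWinograd(winograd);  // configuration change after that warm-up
    if (warmupAfterConfig)
        net.forward();             // second warm-up, as requested in the review
    const int before = net.coldRuns;
    for (int i = 0; i < iterations; ++i)
        net.forward();             // the part a performance test would time
    return net.coldRuns - before;
}
```

In this model, skipping the second warm-up lets one cold pass leak into the measured loop whenever the setting actually changes, which would distort the reported timing.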
@WanliZhong, Winograd is only valid for 3x3s1d1; for other tests it does not make any sense. Could you please adjust your tests, otherwise we will have many useless cases for 1x1, depthwise, generic etc.
return net;
}

typedef tuple<ConvParam_t, bool, tuple<Backend, Target> > ConvTestParam_t;
Use the following definitions instead, so that the Winograd parameter is added only to 3x3S1D1:
typedef tuple<ConvParam_t, tuple<Backend, Target> > ConvTestParam_t;
typedef tuple<ConvParam_t, tuple<Backend, Target>, bool> Conv3x3S1D1TestParam_t;
typedef TestBaseWithParam<ConvTestParam_t> Conv;
typedef TestBaseWithParam<ConvTestParam_t> Conv_1x1;
typedef TestBaseWithParam<Conv3x3S1D1TestParam_t> Conv_3x3S1D1;
typedef TestBaseWithParam<ConvTestParam_t> Conv_Depthwise;
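A minimal sketch of how parameters could be unpacked with the suggested layouts. `Backend`, `Target`, and `ConvParam_t` below are stand-in types, and the helper names (`backendOf`, `targetOf`, `winogradOf`, `demoUnpack`) are hypothetical. Because the Winograd flag moves to the end of the tuple, the backend/target pair sits at index 1 in both layouts, so the same accessors serve every test group.

```cpp
#include <cassert>
#include <tuple>

// Stand-ins for the real test types; only the tuple layout matters here.
enum Backend { BACKEND_A, BACKEND_B };
enum Target { TARGET_A, TARGET_B };
struct ConvParam_t { int id; };

typedef std::tuple<ConvParam_t, std::tuple<Backend, Target> > ConvTestParam_t;
typedef std::tuple<ConvParam_t, std::tuple<Backend, Target>, bool> Conv3x3S1D1TestParam_t;

// The backend/target pair is element 1 in both layouts.
template <typename Param>
Backend backendOf(const Param& p) { return std::get<0>(std::get<1>(p)); }

template <typename Param>
Target targetOf(const Param& p) { return std::get<1>(std::get<1>(p)); }

// Only the 3x3S1D1 layout carries the Winograd flag (element 2).
inline bool winogradOf(const Conv3x3S1D1TestParam_t& p) { return std::get<2>(p); }

// Small usage demo: the same accessors work for both parameter types.
inline void demoUnpack() {
    ConvTestParam_t generic(ConvParam_t{1}, std::make_tuple(BACKEND_B, TARGET_A));
    Conv3x3S1D1TestParam_t conv3x3(ConvParam_t{2}, std::make_tuple(BACKEND_A, TARGET_B), true);
    assert(backendOf(generic) == BACKEND_B);
    assert(targetOf(conv3x3) == TARGET_B);
    assert(winogradOf(conv3x3));
}
```

Calling `winogradOf` on a `ConvTestParam_t` does not compile, which matches the intent that only the 3x3S1D1 tests vary the Winograd setting.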
Classify and extend convolution and depthwise performance tests opencv#24547

This PR aims to:

1. Extend the test cases from models: `YOLOv5`, `YOLOv8`, `EfficientNet`, `YOLOX`, `YuNet`, `SFace`, `MPPalm`, `MPHand`, `MPPose`, `ViTTrack`, `PPOCRv3`, `CRNN`, `PPHumanSeg`. (371 new test cases are added)
2. Classify the existing convolution performance tests into the cases below:
   - CONV_1x1
   - CONV_3x3_S1_D1 (winograd)
   - CONV
   - DEPTHWISE
3. Reduce unnecessary test cases by following 3 rules (366 test cases are pruned):
   (i). When all parameters are the same except for pad- and bias-related parameters, only one case is kept.
   (ii). When the only difference is the channel count of the input shape and the other parameters are the same, only one case is kept in each range `[1, 3], [4, 7], [8, 15], [16, 31], [32, 63], [64, 127], [128, 255], [256, 511], [512, 1023], [1024, 2047], [2048, 4095]`
   (iii). When the only difference is the width and height of the input shape and the other parameters are the same, only one case is kept in each range `[1, 31], [32, 63], [64, 95]... `

> **Reproduced**:
> 1. Follow the steps in alalek@dnn_dump_conv_kernels to dump all convolution cases from the new models. (The declared FLOPs may not be right and need to be checked manually.)
> 2 and 3. Use the Python script [classify conv.txt](https://github.com/opencv/opencv/files/13522228/classify.conv.txt)

**Performance test result on Apple M2**

**Test result details**: [M2.md](https://github.com/opencv/opencv/files/13379189/M2.md)
**Additional test result details with FP16**: [m2_results_with_fp16.zip](https://github.com/opencv/opencv/files/13491070/m2_results_with_fp16.zip)

**Brief summary for 4.8.1 vs 4.7.0 or 4.6.0**:
1. `CONV_1x1_S1_D1` dropped significantly with small or large input shapes.
2. `DEPTHWISE_5x5 ` dropped a little compared with 4.7.0.
---

**Performance test result on [Intel Core i7-12700K](https://www.intel.com/content/www/us/en/products/sku/134594/intel-core-i712700k-processor-25m-cache-up-to-5-00-ghz/specifications.html)**: 8 Performance-cores (3.60 GHz, turbo up to 4.90 GHz), 4 Efficient-cores (2.70 GHz, turbo up to 3.80 GHz), 20 threads.

**Test result details**: [INTEL.md](https://github.com/opencv/opencv/files/13374093/INTEL.md)

**Brief summary for 4.8.1 vs 4.5.5**:
1. `CONV_5x5_S1_D1` dropped significantly.
2. `CONV_1x1_S1_D1`, `CONV_3x3_S1_D1`, `DEPTHWISE_3x3_S1_D1`, `DEPTHWISE_3x3_S2_D1` dropped with small input shapes.

---

TODO:
- [x] Perform tests on ARM with each OpenCV version
- [x] Perform tests on x86 with each OpenCV version
- [x] Split each test classification with a single test config
- [x] Test with FP16 enabled
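The range-based pruning in rules (ii) and (iii) of the description can be sketched as two bucket functions. This is a minimal sketch, not the linked classification script: the function names are hypothetical, and it assumes the spatial ranges continue in steps of 32 beyond `[64, 95]` as the trailing `...` suggests.

```cpp
#include <cassert>

// Rule (ii): index of the power-of-two range that contains `channels`.
// [1, 3] -> 0, [4, 7] -> 1, [8, 15] -> 2, ..., [2048, 4095] -> 10.
inline int channelBucket(int channels) {
    assert(channels >= 1);
    int bucket = 0;
    // `upper` is the first value of the next range; long long avoids overflow.
    for (long long upper = 4; channels >= upper; upper *= 2)
        ++bucket;
    return bucket;
}

// Rule (iii): index of the 32-wide range that contains a width or height.
// [1, 31] -> 0, [32, 63] -> 1, [64, 95] -> 2, ...
// Assumption: the ranges keep a step of 32 past the ones listed.
inline int spatialBucket(int size) {
    assert(size >= 1);
    return size / 32;
}

// Two cases that differ only in input channels are treated as redundant
// when both channel counts fall into the same range.
inline bool sameChannelRange(int channelsA, int channelsB) {
    return channelBucket(channelsA) == channelBucket(channelsB);
}
```

For example, two otherwise identical cases with 16 and 24 input channels share the range `[16, 31]`, so only one of them would be kept, while 31 and 32 fall into different ranges and both survive.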