Classify and extend convolution and depthwise performance tests #24547

Merged
asmorkalov merged 10 commits into opencv:4.x from WanliZhong:refactor_conv_perf_test
Dec 11, 2023

WanliZhong (Member) commented Nov 15, 2023

This PR aims to:

  1. Extend the test cases with convolutions taken from these models: YOLOv5, YOLOv8, EfficientNet, YOLOX, YuNet, SFace, MPPalm, MPHand, MPPose, ViTTrack, PPOCRv3, CRNN, PPHumanSeg. (371 new test cases are added.)

  2. Classify the existing convolution performance tests into the following categories:

    • CONV_1x1
    • CONV_3x3_S1_D1 (winograd)
    • CONV
    • DEPTHWISE
  3. Remove redundant test cases by following three rules (366 test cases are pruned):
    (i) Test cases that differ only in pad- or bias-related parameters are treated as duplicates; only one of them is kept.
    (ii) When test cases differ only in the number of input channels, only one case is kept per range: [1, 3], [4, 7], [8, 15], [16, 31], [32, 63], [64, 127], [128, 255], [256, 511], [512, 1023], [1024, 2047], [2048, 4095].
    (iii) When test cases differ only in the input width and height, only one case is kept per range: [1, 31], [32, 63], [64, 95], ...
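
As a rough illustration of the three pruning rules, the sketch below buckets the input channel count and the input height and width into the ranges listed above, and leaves pad and bias out of the comparison key. It is not the script attached to this PR. The `ConvCase` record, its field names, bucketing height and width separately, and folding all three rules into a single key (instead of applying each rule only when it is the sole difference) are assumptions made for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ConvCase:
    """Hypothetical record for one dumped convolution; field names are illustrative."""
    kernel: tuple
    in_channels: int
    out_channels: int
    groups: int
    stride: tuple
    dilation: tuple
    pad: tuple
    has_bias: bool
    height: int
    width: int


def channel_bucket(c):
    """Rule (ii): index of the range [1, 3], [4, 7], [8, 15], ... that contains c."""
    if c <= 3:
        return 0
    return c.bit_length() - 2  # 4..7 -> 1, 8..15 -> 2, 16..31 -> 3, ...


def spatial_bucket(n):
    """Rule (iii): index of the range [1, 31], [32, 63], [64, 95], ... that contains n."""
    return n // 32


def prune_key(case):
    """Cases that share a key are treated as duplicates (assumed merge of rules i-iii)."""
    return (
        case.kernel, case.out_channels, case.groups, case.stride, case.dilation,
        channel_bucket(case.in_channels),  # rule (ii)
        spatial_bucket(case.height),       # rule (iii)
        spatial_bucket(case.width),
        # pad and has_bias are deliberately left out of the key: rule (i)
    )


def prune(cases):
    """Keep the first case seen for each key, preserving input order."""
    kept = {}
    for case in cases:
        kept.setdefault(prune_key(case), case)
    return list(kept.values())


base = ConvCase((3, 3), 64, 128, 1, (1, 1), (1, 1), (1, 1), True, 56, 56)
cases = [
    base,
    replace(base, pad=(0, 0), has_bias=False),  # differs only in pad/bias: pruned
    replace(base, in_channels=100),             # same channel range [64, 127]: pruned
    replace(base, in_channels=128),             # next channel range: kept
    replace(base, height=80, width=80),         # spatial range [64, 95]: kept
]
print(len(prune(cases)))  # prints 3
```

Keeping the first case per key is an arbitrary choice here; a real script might prefer, for example, the case with the largest declared FLOPs.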

To reproduce: for step 1, follow the steps in alalek@dnn_dump_conv_kernels to dump all convolution cases from the new models (the declared FLOPs may not be right and need to be checked manually). For steps 2 and 3, use the Python script classify conv.txt (https://github.com/opencv/opencv/files/13522228/classify.conv.txt).
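
For readers who do not open the attached script, the classification in step 2 could look roughly like the following. This is a minimal sketch and not the attached code: the function name, its parameters, the exact conditions (for example, treating a convolution as depthwise only when `groups == in_channels == out_channels`), and the order of the checks are all assumptions.

```python
def classify(kernel, stride, dilation, groups, in_channels, out_channels):
    """Return one of the four category names used in this PR.

    The conditions and their order are assumptions made for illustration.
    """
    if groups > 1 and groups == in_channels == out_channels:
        return "DEPTHWISE"
    if kernel == (1, 1):
        return "CONV_1x1"
    if kernel == (3, 3) and stride == (1, 1) and dilation == (1, 1) and groups == 1:
        return "CONV_3x3_S1_D1"
    return "CONV"  # generic: everything that is left


print(classify((3, 3), (1, 1), (1, 1), 1, 64, 128))  # prints CONV_3x3_S1_D1
print(classify((5, 5), (1, 1), (1, 1), 32, 32, 32))  # prints DEPTHWISE
```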

Performance test result on Apple M2

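Test result details: M2.md (https://github.com/opencv/opencv/files/13379189/M2.md)

Additional test result details with FP16: m2_results_with_fp16.zip (https://github.com/opencv/opencv/files/13491070/m2_results_with_fp16.zip)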

Brief summary for 4.8.1 vs 4.7.0 or 4.6.0:

  1. CONV_1x1_S1_D1 performance dropped significantly for small or large input shapes.
  2. DEPTHWISE_5x5 performance dropped slightly compared with 4.7.0.

Performance test result on Intel Core i7-12700K: 8 Performance-cores (3.60 GHz, turbo up to 4.90 GHz), 4 Efficient-cores (2.70 GHz, turbo up to 3.80 GHz), 20 threads.

Test result details: INTEL.md (https://github.com/opencv/opencv/files/13374093/INTEL.md)

Brief summary for 4.8.1 vs 4.5.5:

  1. CONV_5x5_S1_D1 performance dropped significantly.
  2. CONV_1x1_S1_D1, CONV_3x3_S1_D1, DEPTHWISE_3x3_S1_D1, and DEPTHWISE_3x3_S2_D1 performance dropped for small input shapes.

TODO (all items completed):

  • Perform tests on ARM with each OpenCV version
  • Perform tests on x86 with each OpenCV version
  • Split each test classification into a single test config
  • Test with FP16 enabled

fengyuentau (Member) left a comment

Could you give a summary of what has been removed and added?

WanliZhong (Member, Author) commented Nov 16, 2023

@fengyuentau In total, 284 new cases are added and 279 old cases are removed. You can check each case in the file diff.txt.

WanliZhong (Member, Author) commented

UPDATE: Test results are attached. I have finished the convolution performance tests with each OpenCV release version. The results show that depthwise may not be the biggest problem. Many performance issues have been fixed in 4.8.1.

zihaomu (Member) commented Nov 17, 2023

Great job! It looks like we do not perform well in some cases that have a small output shape or a small number of output channels. With these performance tests, we can control the compute branches in more detail.

On the x86 platform, these problems are more serious.

vpisarev (Contributor) commented

@WanliZhong, we need to split this test into several cases: 1x1 convolution, 3x3s1d1 (Winograd and im2row-based), depthwise, and generic (the remaining cases). Each convolution case should test FP32 and FP16.

WanliZhong force-pushed the refactor_conv_perf_test branch from c38bfd4 to 5bb32e1, November 28, 2023 14:33

WanliZhong (Member, Author) commented Nov 28, 2023

UPDATE

  1. Split the test into 4 types.
  2. Uploaded an additional test results source file with FP16: m2_results_with_fp16.zip
  3. Not sure why the ABI check fails.

WanliZhong (Member, Author) commented Nov 30, 2023

UPDATE

  1. By default, each test case is now run in four configurations: FP32 with Winograd, FP32 without Winograd, FP16 with Winograd, and FP16 without Winograd.
  2. For each of the 4 test types, only the top 20 cases are run by default to save CI time.
  3. Does anyone have other good rules for pruning test cases?

asmorkalov (Contributor) left a comment

👍

Target targetId = get<1>(get<2>(GetParam()));
bool winograd = get<1>(GetParam());
Net net = build_net(params, backendId, targetId);
net.enableWinograd(winograd);

Contributor

There is a "warmup" stage in the original test code.

If we change the settings, we should run the warmup again.

Member, Author

The warmup happens in the build_net() function. I didn't change this part of the code.

Contributor

If you change a network configuration setting (like net.enableWinograd(winograd); on line 932), then you should do the "warmup" again.

Member, Author

Thanks! You're right. I will do it soon.

vpisarev (Contributor) commented Dec 3, 2023

@WanliZhong, Winograd is only valid for 3x3s1d1; for the other tests it does not make any sense. Could you please adjust your tests? Otherwise we will have many useless cases for 1x1, depthwise, generic, etc.
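
To make the size of the saving concrete, here is a small sketch of the configuration space when the Winograd flag is toggled only for the 3x3s1d1 category. The category and precision names come from this thread; the `iter_configs` helper and the use of `None` for "not applicable" are assumptions for illustration, not part of the test suite.

```python
from itertools import product

CATEGORIES = ["CONV_1x1", "CONV_3x3_S1_D1", "CONV", "DEPTHWISE"]
PRECISIONS = ["FP32", "FP16"]


def iter_configs():
    """Yield (category, precision, winograd) triples.

    winograd is True or False for CONV_3x3_S1_D1 and None (not applicable) elsewhere.
    """
    for category, precision in product(CATEGORIES, PRECISIONS):
        if category == "CONV_3x3_S1_D1":
            for winograd in (True, False):
                yield category, precision, winograd
        else:
            yield category, precision, None


configs = list(iter_configs())
print(len(configs))  # prints 10, versus 4 * 2 * 2 = 16 if every category toggled Winograd
```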

return net;
}

typedef tuple<ConvParam_t, bool, tuple<Backend, Target> > ConvTestParam_t;

vpisarev (Contributor) commented Dec 6, 2023

Use the following definitions instead, so that the Winograd parameter is added only to 3x3S1D1:

typedef tuple<ConvParam_t, tuple<Backend, Target> > ConvTestParam_t;
typedef tuple<ConvParam_t, tuple<Backend, Target>, bool> Conv3x3S1D1TestParam_t;
typedef TestBaseWithParam<ConvTestParam_t> Conv;
typedef TestBaseWithParam<ConvTestParam_t> Conv_1x1;
typedef TestBaseWithParam<Conv3x3S1D1TestParam_t> Conv_3x3S1D1;
typedef TestBaseWithParam<ConvTestParam_t> Conv_Depthwise;

vpisarev self-requested a review December 11, 2023 18:31
asmorkalov merged commit 6ee71fe into opencv:4.x Dec 11, 2023
asmorkalov mentioned this pull request Jan 19, 2024
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
6 participants