DNN: ARMv7 compatible fastConv by zihaomu · Pull Request #22183 · opencv/opencv

zihaomu · 2022-07-03T04:33:16Z

This PR makes fastConv and winogradConv compatible with ARMv7.
The previous PR #21910 only supported AArch64 (ARMv8), and it had bugs on ARMv7,
as @asenyaev reported.

closes #22188

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

zihaomu · 2022-07-04T03:28:28Z

modules/dnn/test/test_model.cpp

    double confThreshold = 0.24;
    double nmsThreshold = (target == DNN_TARGET_MYRIAD) ? 0.397 : 0.4;
-    double scoreDiff = 8e-5, iouDiff = 1e-5;
+    double scoreDiff = 1e-4, iouDiff = 1e-5;


The ARM intrinsic vfmaq_laneq_f32 is only supported on ARMv8 (AArch64), so I use vmlaq_n_f32 instead. I adjusted the thresholds here and below because I found that vmlaq_n_f32 produces very slightly different results from vfmaq_laneq_f32.

On my M1 chip, vmlaq_n_f32 compiles to:

    fmul.4s v24, v22, v20[0]
    fadd.4s v3, v3, v24

And vfmaq_laneq_f32 compiles to:

    fmla.4s v16, v2, v14[3]

I'm not sure whether the multiply-accumulate intrinsic (vmlaq_n_f32) compiles to a single ARM instruction (vmla.f32 q8, q5, d0[1]) on ARMv7. If it does, I believe we can avoid adjusting the thresholds.
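For reference, here is a minimal portable sketch of the arithmetic behind the three intrinsics discussed in this thread. It is a model, not NEON code: the std::array stand-ins and the model_* function names are hypothetical, and it assumes the ACLE semantics, where vmlaq_n_f32 and vmlaq_lane_f32 are non-fused (the product is rounded to float before the add) while vfmaq_laneq_f32 is fused (a single rounding). Under that assumption, results can differ in the last bit even if vmla.f32 is emitted as one instruction.

```cpp
#include <array>
#include <cmath>

// Portable stand-ins for NEON vector types (modelling only, hypothetical).
using f32x4 = std::array<float, 4>;  // plays the role of float32x4_t
using f32x2 = std::array<float, 2>;  // plays the role of float32x2_t

// Non-fused multiply-accumulate: the product is rounded to float first.
static float mul_then_add(float acc, float x, float y) {
    volatile float p = x * y;  // volatile store keeps the compiler from fusing
    return acc + p;
}

// Models vmlaq_n_f32(a, b, c): a[i] + b[i] * c, non-fused.
f32x4 model_vmlaq_n_f32(f32x4 a, const f32x4& b, float c) {
    for (int i = 0; i < 4; ++i) a[i] = mul_then_add(a[i], b[i], c);
    return a;
}

// Models vmlaq_lane_f32(a, b, v, lane): same arithmetic as above, but the
// scalar comes from a 64-bit float32x2_t, so lane must be 0 or 1.
f32x4 model_vmlaq_lane_f32(const f32x4& a, const f32x4& b, const f32x2& v, int lane) {
    return model_vmlaq_n_f32(a, b, v[lane]);
}

// Models vfmaq_laneq_f32(a, b, v, lane) (AArch64 only): fused, one rounding,
// scalar taken from a 128-bit float32x4_t (lane 0..3).
f32x4 model_vfmaq_laneq_f32(f32x4 a, const f32x4& b, const f32x4& v, int lane) {
    for (int i = 0; i < 4; ++i) a[i] = std::fma(b[i], v[lane], a[i]);
    return a;
}
```

For example, with b[i] = v[lane] = 1 + 2^-12 and a[i] = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24. The non-fused models round it (ties-to-even) to 1 + 2^-11 and return 0, while the fused model returns 2^-24. Small per-element differences like this can accumulate over a convolution.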

vmlaq_lane_f32 is available on ARMv7; you can use it as a replacement for vfmaq_laneq_f32.

Hi @nihui, thanks for your suggestion. I tested vmlaq_lane_f32 and vmlaq_n_f32 on my M1 Mac and found that both compile to the following assembly code:

    fmul.4s v24, v22, v20[0]
    fadd.4s v3, v3, v24

Do you mean that vmlaq_lane_f32 compiles to a single ARM instruction (like vmla.f32 q8, q5, d0[1]) on ARMv7?

Hi @nihui, I have updated the code with vmlaq_lane_f32 and everything works fine. Big thanks!

zihaomu · 2022-07-04T09:02:17Z

Hi @alalek, the ARMv7 CI is failing, and I can't tell from the CI log how to fix it. Any advice?

nihui · 2022-07-04T09:02:46Z

modules/dnn/test/test_model.cpp

    double confThreshold = 0.24;
    double nmsThreshold = (target == DNN_TARGET_MYRIAD) ? 0.397 : 0.4;
-    double scoreDiff = 8e-5, iouDiff = 1e-5;
+    double scoreDiff = 1e-4, iouDiff = 1e-5;


vmlaq_lane_f32 is available on ARMv7; you can use it as a replacement for vfmaq_laneq_f32.

alalek · 2022-07-04T09:25:53Z

@zihaomu Please ignore it (the ARMv7 configuration is not working on BuildBot).

vpisarev · 2022-07-05T14:08:36Z

modules/dnn/src/layers/fast_convolution/winograd_3x3s1_f63.cpp

-                        float32x4_t r04 = r00, r05 = r00, r06 = r00, r07 = r00;
-                        float32x4_t r08 = r00, r09 = r00, r10 = r00, r11 = r00;
-                        float32x4_t r12 = r00, r13 = r00, r14 = r00, r15 = r00;
+                        float32x2_t q00 = vdup_n_f32(0.0f), q01 = q00, q02 = q00, q03 = q00,


Could you please explain why you use halves of NEON registers on ARMv7? ARMv7 still has 128-bit NEON registers, so I don't see why you wouldn't use all of them.

Thanks for the code review. As @nihui commented, vmlaq_lane_f32 is the best substitute for vfmaq_laneq_f32 on ARMv7. Its scalar operand is a 64-bit float32x2_t (on ARMv7 the by-scalar form takes the scalar from a D register), which is why the half-width registers are used.
Another option is vmlaq_n_f32, which on my M1 compiles to two instructions, fmul.4s v24, v22, v20[0] and fadd.4s v3, v3, v24, whereas vmlaq_lane_f32 compiles to a single instruction, vmla.f32 q8, q5, d0[1], on ARMv7.
Meanwhile, two consecutive half-length (64-bit) loads are combined into a single 128-bit load when the data is loaded into registers, so the load time is the same.
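The half-register pattern can be illustrated with a small sketch. Assumptions: rank1_update_4x4 is a hypothetical helper, not the actual fastConv or Winograd kernel; it accumulates w[i] * x[j] into four 128-bit accumulators. Because vmlaq_lane_f32 takes its scalar from a float32x2_t, the four x values are loaded with one vld1q_f32 and split with vget_low_f32/vget_high_f32, which on ARMv7 typically cost nothing since a Q register is a pair of D registers. A scalar fallback with the same non-fused arithmetic is included so the sketch also builds on non-ARM targets.

```cpp
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper (not OpenCV code): acc[j][i] += w[i] * x[j]
// for i, j in 0..3, i.e. a 4x4 rank-1 update.
void rank1_update_4x4(float acc[4][4], const float w[4], const float x[4]) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t vw = vld1q_f32(w);
    // One 128-bit load, then view it as two 64-bit halves, because the
    // scalar operand of vmlaq_lane_f32 must be a float32x2_t.
    float32x4_t vx = vld1q_f32(x);
    float32x2_t x01 = vget_low_f32(vx);   // x[0], x[1]
    float32x2_t x23 = vget_high_f32(vx);  // x[2], x[3]

    float32x4_t a0 = vld1q_f32(acc[0]), a1 = vld1q_f32(acc[1]);
    float32x4_t a2 = vld1q_f32(acc[2]), a3 = vld1q_f32(acc[3]);
    // Each accumulator stays a full 128-bit vector; only the scalar
    // source is a half register.
    a0 = vmlaq_lane_f32(a0, vw, x01, 0);
    a1 = vmlaq_lane_f32(a1, vw, x01, 1);
    a2 = vmlaq_lane_f32(a2, vw, x23, 0);
    a3 = vmlaq_lane_f32(a3, vw, x23, 1);
    vst1q_f32(acc[0], a0);
    vst1q_f32(acc[1], a1);
    vst1q_f32(acc[2], a2);
    vst1q_f32(acc[3], a3);
#else
    // Portable fallback: same non-fused arithmetic (multiply, round, add).
    for (int j = 0; j < 4; ++j) {
        for (int i = 0; i < 4; ++i) {
            volatile float p = w[i] * x[j];  // round the product to float
            acc[j][i] += p;
        }
    }
#endif
}
```

Loading x as two separate vld1_f32 halves, as described in the reply, should be equivalent; the single-load form is shown only because it makes the D/Q register aliasing explicit.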

vpisarev · 2022-07-07T01:51:34Z

👍

vpisarev · 2022-07-07T01:53:37Z

@nihui, btw, let me use this opportunity to thank you for ncnn, which we took some code from. ncnn is a real masterpiece 👍

DNN: ARMv7 compatible fastConv
* support armv7 on fastConv
* remove whitespace.

zihaomu added the category: dnn label Jul 3, 2022

zihaomu requested review from alalek and vpisarev July 3, 2022 04:34

zihaomu commented Jul 4, 2022


asmorkalov mentioned this pull request Jul 4, 2022

Compilation problems on Pi Gen 2 with Distribution Bullseye Full (32 bit) #22188

Closed

4 tasks

nihui suggested changes Jul 4, 2022


zihaomu force-pushed the fastConv_ARMv7_compatible branch from a542fa7 to 057a32e July 4, 2022 11:55

support armv7 on fastConv

7bfc1fe

zihaomu force-pushed the fastConv_ARMv7_compatible branch from 057a32e to 7bfc1fe July 4, 2022 14:33

remove whitespace.

f3e5f1c

vpisarev reviewed Jul 5, 2022


vpisarev self-assigned this Jul 7, 2022

vpisarev self-requested a review July 7, 2022 01:51

vpisarev approved these changes Jul 7, 2022


alalek merged commit 139c443 into opencv:4.x Jul 7, 2022

alalek mentioned this pull request Aug 21, 2022

(5.x) Merge 4.x #22408

Merged

a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023

Merge pull request opencv#22183 from zihaomu:fastConv_ARMv7_compatible

88d1dbc

DNN: ARMv7 compatible fastConv
* support armv7 on fastConv
* remove whitespace.

alalek merged 2 commits into opencv:4.x from zihaomu:fastConv_ARMv7_compatible. The PR description (zihaomu, Jul 3, 2022) was edited by alalek.


4 participants
