dnn: optimize activations with v_exp #25881

Merged
asmorkalov merged 15 commits into opencv:4.x from fengyuentau:dnn/cpu/optimize_activations_with_v_exp
Jul 19, 2024

@fengyuentau
Member

@fengyuentau fengyuentau commented Jul 8, 2024

Merge with opencv/opencv_extra#1191.

This PR optimizes the following activations:

  • Swish
  • Mish
  • Elu
  • Celu
  • Selu
  • HardSwish
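
For reference, the six activations listed above can be written as scalar functions. This is a minimal sketch of the commonly used textbook definitions, not the vectorized code from this PR: the function names, the default `alpha`/`gamma` values, and the use of `std::exp` in place of the `v_exp` intrinsic are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Scalar reference definitions (hypothetical helper names, not OpenCV API).
// A SIMD version would evaluate the exp() call on a whole vector at once.

// Swish: x * sigmoid(x)
inline float swish(float x) { return x / (1.0f + std::exp(-x)); }

// Mish: x * tanh(softplus(x)), with softplus(x) = ln(1 + exp(x))
inline float mish(float x) { return x * std::tanh(std::log1p(std::exp(x))); }

// ELU: x for x > 0, alpha * (exp(x) - 1) otherwise
inline float elu(float x, float alpha = 1.0f) {
    return x > 0.0f ? x : alpha * (std::exp(x) - 1.0f);
}

// CELU: max(0, x) + min(0, alpha * (exp(x / alpha) - 1))
inline float celu(float x, float alpha = 1.0f) {
    assert(alpha != 0.0f);  // x / alpha is undefined for alpha == 0
    return std::max(0.0f, x) + std::min(0.0f, alpha * (std::exp(x / alpha) - 1.0f));
}

// SELU: gamma * ELU(x, alpha); defaults are the usual self-normalizing constants
inline float selu(float x, float alpha = 1.6732632f, float gamma = 1.0507010f) {
    return gamma * (x > 0.0f ? x : alpha * (std::exp(x) - 1.0f));
}

// HardSwish: x * clamp(x / 6 + 0.5, 0, 1); note that it needs no exp() at all
inline float hardswish(float x) {
    return x * std::max(0.0f, std::min(1.0f, x / 6.0f + 0.5f));
}
```

Every function except `hardswish` has an exponential on its hot path, which is why a vectorized exponential is the natural thing to optimize for them.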

Performance (Updated on 2024-07-18)

AmLogic A311D2 (ARM Cortex A73 + A53)

Geometric mean (ms)

Name of Test                            activations  activations.patch  x-factor (activations.patch vs activations)
Celu::Layer_Elementwise::OCV/CPU            115.859             27.930  4.15
Elu::Layer_Elementwise::OCV/CPU              27.846             27.003  1.03
Gelu::Layer_Elementwise::OCV/CPU              0.657              0.602  1.09
HardSwish::Layer_Elementwise::OCV/CPU        31.885              6.781  4.70
Mish::Layer_Elementwise::OCV/CPU             35.729             32.089  1.11
Selu::Layer_Elementwise::OCV/CPU             61.955             27.850  2.22
Swish::Layer_Elementwise::OCV/CPU            30.819             26.688  1.15
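
The numbers in these tables can be read as follows. This is a small sketch under two assumptions: that the x-factor is the baseline time divided by the patched time (which matches the rows, e.g. 115.859 / 27.930 ≈ 4.15), and that each timing is a geometric mean over per-run samples. The helper names `geometricMean` and `xFactor` are hypothetical and are not part of OpenCV's perf tooling.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of strictly positive timing samples (ms),
// computed in log space to avoid overflow in the running product.
inline double geometricMean(const std::vector<double>& samplesMs) {
    assert(!samplesMs.empty());
    double logSum = 0.0;
    for (double s : samplesMs) {
        assert(s > 0.0);
        logSum += std::log(s);
    }
    return std::exp(logSum / static_cast<double>(samplesMs.size()));
}

// Speedup of the patched build relative to the baseline:
// > 1 means the patch is faster, < 1 means it is slower.
inline double xFactor(double baselineMs, double patchedMs) {
    assert(patchedMs > 0.0);
    return baselineMs / patchedMs;
}
```

Under this reading, an x-factor below 1.00 marks a row where the patched build measured slower than the baseline.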

Apple M1

Geometric mean (ms)

Name of Test                                 activations  activations.patch  x-factor (activations.patch vs activations)
Celu::Layer_Elementwise::OCV/CPU                  16.184              2.118  7.64
Celu::Layer_Elementwise::OCV/CPU_FP16             16.280              2.123  7.67
Elu::Layer_Elementwise::OCV/CPU                    9.123              1.878  4.86
Elu::Layer_Elementwise::OCV/CPU_FP16               9.085              1.897  4.79
Gelu::Layer_Elementwise::OCV/CPU                   0.089              0.081  1.11
Gelu::Layer_Elementwise::OCV/CPU_FP16              0.086              0.074  1.17
HardSwish::Layer_Elementwise::OCV/CPU              1.560              1.555  1.00
HardSwish::Layer_Elementwise::OCV/CPU_FP16         1.536              1.523  1.01
Mish::Layer_Elementwise::OCV/CPU                   6.077              2.476  2.45
Mish::Layer_Elementwise::OCV/CPU_FP16              5.990              2.496  2.40
Selu::Layer_Elementwise::OCV/CPU                  11.351              1.976  5.74
Selu::Layer_Elementwise::OCV/CPU_FP16             11.533              1.985  5.81
Swish::Layer_Elementwise::OCV/CPU                  4.687              1.890  2.48
Swish::Layer_Elementwise::OCV/CPU_FP16             4.715              1.873  2.52

Intel i7-12700K

Geometric mean (ms)

Name of Test                            activations  activations.patch  x-factor (activations.patch vs activations)
Celu::Layer_Elementwise::OCV/CPU             17.106              3.560  4.81
Elu::Layer_Elementwise::OCV/CPU               5.064              3.478  1.46
Gelu::Layer_Elementwise::OCV/CPU              0.036              0.035  1.04
HardSwish::Layer_Elementwise::OCV/CPU         2.914              2.893  1.01
Mish::Layer_Elementwise::OCV/CPU              3.820              3.529  1.08
Selu::Layer_Elementwise::OCV/CPU             10.799              3.593  3.01
Swish::Layer_Elementwise::OCV/CPU             3.651              3.473  1.05

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is an accuracy test, a performance test, and test data in the opencv_extra repository, if applicable
    The patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@fengyuentau fengyuentau force-pushed the dnn/cpu/optimize_activations_with_v_exp branch 2 times, most recently from fd391ff to 4f4e94d Compare July 11, 2024 08:25
@fengyuentau

This comment was marked as resolved.

@fengyuentau fengyuentau marked this pull request as ready for review July 11, 2024 10:10
@fengyuentau fengyuentau requested a review from vpisarev July 11, 2024 10:10
@fengyuentau fengyuentau added this to the 4.11.0 milestone Jul 11, 2024
@asmorkalov
Contributor

Perf report for Jetson-tk1 (armv7 with NEON):

Name of Test                            activations  activations.patch  x-factor
Celu::Layer_Elementwise::OCV/CPU            757.488            160.034  4.73
Elu::Layer_Elementwise::OCV/CPU              27.301            145.437  0.19
Gelu::Layer_Elementwise::OCV/CPU              2.595              2.599  1.00
HardSwish::Layer_Elementwise::OCV/CPU        85.895             23.835  3.60
Mish::Layer_Elementwise::OCV/CPU            551.165            208.658  2.64
Selu::Layer_Elementwise::OCV/CPU             29.957            153.643  0.19
Swish::Layer_Elementwise::OCV/CPU           462.444            175.172  2.64

@fengyuentau fengyuentau force-pushed the dnn/cpu/optimize_activations_with_v_exp branch from f9a653f to 43419d9 Compare July 15, 2024 03:23
@fengyuentau
Member Author

The performance tests are fixed now, and the performance results have been updated.

@asmorkalov
Contributor

Jetson Orin:

Name of Test                            activations  activations.patch  x-factor
Celu::Layer_Elementwise::OCV/CPU             21.493              3.722  5.78
Elu::Layer_Elementwise::OCV/CPU               6.687              3.565  1.88
Gelu::Layer_Elementwise::OCV/CPU              0.084              0.085  0.98
HardSwish::Layer_Elementwise::OCV/CPU         9.806              1.683  5.83
Mish::Layer_Elementwise::OCV/CPU              6.352              4.233  1.50
Selu::Layer_Elementwise::OCV/CPU             12.252              3.825  3.20
Swish::Layer_Elementwise::OCV/CPU             4.992              3.522  1.42

Contributor

@asmorkalov asmorkalov left a comment

👍

@asmorkalov
Contributor

Jetson-tk1 (armv7 + neon):

Name of Test                            activations  activations.patch  x-factor
Celu::Layer_Elementwise::OCV/CPU            889.066            151.524  5.87
Elu::Layer_Elementwise::OCV/CPU             317.781            135.762  2.34
Gelu::Layer_Elementwise::OCV/CPU              2.591              2.491  1.04
HardSwish::Layer_Elementwise::OCV/CPU       155.399             23.333  6.66
Mish::Layer_Elementwise::OCV/CPU            511.301            211.649  2.42
Selu::Layer_Elementwise::OCV/CPU            472.572            150.138  3.15
Swish::Layer_Elementwise::OCV/CPU           458.643            181.607  2.53

asmorkalov pushed a commit to opencv/opencv_extra that referenced this pull request Jul 19, 2024
…s_with_v_exp

Add some activation conformance tests #1191

Merge with opencv/opencv#25881
@asmorkalov asmorkalov merged commit 23b244d into opencv:4.x Jul 19, 2024
@asmorkalov asmorkalov mentioned this pull request Jul 25, 2024
@fengyuentau fengyuentau deleted the dnn/cpu/optimize_activations_with_v_exp branch July 30, 2024 15:05
fengyuentau added a commit to fengyuentau/opencv that referenced this pull request Aug 15, 2024
…ivations_with_v_exp

dnn: optimize activations with v_exp opencv#25881
