
cuda4dnn: optimizations for swish, mish, sigmoid, region, resize based ops, transpose, identity-conv fusion#17200

Merged: alalek merged 2 commits into opencv:master from YashasSamaga:cuda4dnn-general-opt1 on May 9, 2020

@YashasSamaga YashasSamaga commented May 1, 2020

Pull Request Readiness Checklist

Mish (+ swish and sigmoid):

This activation is used in YOLOv4. Surprisingly, the mish kernels are compute-bound instead of being bandwidth-bound. The first convolution in YOLOv4 takes less time than the fused bias mish activation step.

Mish is defined as x * tanh(log(1 + exp(x))), a composition of several transcendental functions. It is therefore very likely that a good approximation or simplification exists: tanh can itself be written in terms of e^x, so the log cancels and is not needed. This PR introduces a fast, numerically stable implementation of the mish activation.

Details of the approximation can be found here.
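
To make the simplification concrete, here is a minimal host-side C++ sketch of the algebra; it is not the kernel from this PR. Writing e = exp(x) and n = e*e + 2*e, the identity tanh(log(1 + e)) = n / (n + 2) holds, so a single exponential is enough. The function names `mish_reference` and `mish_fast` and the cutoff of 20 are assumptions made for illustration.

```cpp
#include <cmath>

// Reference definition: mish(x) = x * tanh(log(1 + exp(x))).
inline float mish_reference(float x) {
    return x * std::tanh(std::log1p(std::exp(x)));
}

// Single-exponential form (illustrative sketch, not the PR's kernel).
// With e = exp(x) and n = e*e + 2*e:
//   tanh(log(1 + e)) = ((1 + e)^2 - 1) / ((1 + e)^2 + 1) = n / (n + 2)
inline float mish_fast(float x) {
    // Assumed cutoff: for large x the tanh factor rounds to 1 in float,
    // and returning x directly avoids overflow in e * e.
    if (x > 20.0f) return x;
    const float e = std::exp(x);
    const float n = e * e + 2.0f * e;
    return x * (n / (n + 2.0f));
}
```

In a CUDA kernel the `std::exp` call and the division would be the natural places to use fast intrinsics such as `__expf` and `__fdividef`; whether the PR does exactly that is not stated here.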

| Operation | without this PR | with this PR |
| --- | --- | --- |
| biasN_mish (YOLOv4 conv_1) | 1.35ms | 980us |
| biasN_mish (YOLOv4 conv_2) | 742us | 496us |

Based on similar reasoning, good fast approximations to swish and sigmoid were also added.
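
The text above does not say which approximations were chosen for swish and sigmoid. As a hedged sketch of the same idea, both functions can be written with one exponential and one division; the names `sigmoid_single_exp` and `swish_single_exp` are hypothetical.

```cpp
#include <cmath>

// sigmoid(x) = 1 / (1 + exp(-x)): one exponential, one division.
// For very negative x, exp(-x) overflows to +inf and the result is 0.
inline float sigmoid_single_exp(float x) {
    return 1.0f / (1.0f + std::exp(-x));
}

// swish(x) = x * sigmoid(x), folded into a single division.
inline float swish_single_exp(float x) {
    return x / (1.0f + std::exp(-x));
}
```

On a GPU, the exponential and the division are the two operations that fast-math intrinsics would replace, which is presumably where the speedup comes from.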

Identity is fused with convolution layer:

Activations would not be fused with convolution if an identity layer is present between them. This happens in the case of Mask RCNN: the final sigmoid operation is not fused with the preceding convolution operation.

| Operation | without fusion | with fusion |
| --- | --- | --- |
| biasN_sigmoid (Mask RCNN) | 368us | 206us |

This kernel is compute-bound if the sigmoid approximation mentioned above is not used. The timings reported here are with the approximation.
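
The fusion rule described above can be sketched as a lookahead that skips identity layers when searching for an activation to fuse with a convolution. This is a simplified illustration, not the code in OpenCV's DNN module: the `LayerKind` enum and `find_fusable_activation` are hypothetical, and a real graph pass would also have to check consumers and layer outputs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layer tags for a linear chain of layers.
enum class LayerKind { Convolution, Identity, Sigmoid, Other };

// Returns the index of the activation that could be fused with the
// convolution at conv_idx, or -1 if there is none. Identity layers
// between the two are skipped because they do not change the data.
inline int find_fusable_activation(const std::vector<LayerKind>& chain,
                                   std::size_t conv_idx) {
    if (conv_idx >= chain.size() ||
        chain[conv_idx] != LayerKind::Convolution) {
        return -1;
    }
    std::size_t i = conv_idx + 1;
    while (i < chain.size() && chain[i] == LayerKind::Identity) {
        ++i;  // skip no-op layers
    }
    if (i < chain.size() && chain[i] == LayerKind::Sigmoid) {
        return static_cast<int>(i);
    }
    return -1;
}
```

With a chain of convolution, identity, sigmoid (the Mask RCNN situation described above), the lookahead reaches the sigmoid; without the skip, the identity layer would block the fusion.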

Others:

| Operation | without this PR | with this PR |
| --- | --- | --- |
| transpose (YOLOv4 permute1) | 195us | 154us |
| region_box | 62us | 56us |
| region_sigmoid_class_score | 100.4us | 72us |
| crop_and_resize | 2.98ms | 2.72ms |
| resize_bilinear | 1.77ms | 1.65ms |

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under OpenCV (BSD) License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=Custom
buildworker:Custom=linux-4
build_image:Custom=ubuntu-cuda:18.04

@YashasSamaga YashasSamaga changed the title cuda4dnn: optimizations for swish, mish, sigmoid, region, crop_and_resize, roi pooling, resize, transpose, identity-conv fusion cuda4dnn: optimizations for swish, mish, sigmoid, region, resize based ops, transpose, identity-conv fusion May 1, 2020
@YashasSamaga YashasSamaga changed the title cuda4dnn: optimizations for swish, mish, sigmoid, region, resize based ops, transpose, identity-conv fusion [WIP] cuda4dnn: optimizations for swish, mish, sigmoid, region, resize based ops, transpose, identity-conv fusion May 2, 2020
@YashasSamaga YashasSamaga changed the title [WIP] cuda4dnn: optimizations for swish, mish, sigmoid, region, resize based ops, transpose, identity-conv fusion cuda4dnn: optimizations for swish, mish, sigmoid, region, resize based ops, transpose, identity-conv fusion May 2, 2020

@alalek alalek left a comment


Thank you!

@alalek alalek merged commit d981d04 into opencv:master May 9, 2020
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
cuda4dnn: optimizations for swish, mish, sigmoid, region, resize based ops, transpose, identity-conv fusion

* bunch of optimizations

* more accurate implementation for mish