
Optimize Mish for CPU backend#17624

Merged
opencv-pushbot merged 1 commit into opencv:3.4 from dkurt:dnn_optimize_mish
Jun 23, 2020

Conversation

@dkurt (Member) commented Jun 22, 2020

Use the approximation for the Mish layer introduced by @YashasSamaga in #17200 for the CPU backend.

| Model | 3.4 | PR |
| --- | --- | --- |
| YOLOv4 @ 416x416 | 475.93 ms | 246.38 ms (x1.93 speedup) |

I also tried the following nGraph implementation:

```cpp
// Same approximation as an nGraph subgraph:
// n = (e^min(x, 20) + 2) * e^min(x, 20);  mish(x) = x * n / (n + 2)
float two = 2.f, thresh = 20.f;
auto two_node = std::make_shared<ngraph::op::Constant>(ngraph::element::f32, ngraph::Shape{1}, &two);
auto thresh_node = std::make_shared<ngraph::op::Constant>(ngraph::element::f32, ngraph::Shape{1}, &thresh);
auto x = std::make_shared<ngraph::op::v1::Minimum>(node, thresh_node);
auto exp_node = std::make_shared<ngraph::op::v0::Exp>(x);
auto add = std::make_shared<ngraph::op::v1::Add>(exp_node, two_node);  // e^x + 2
auto mul = std::make_shared<ngraph::op::v1::Multiply>(add, exp_node);  // n = (e^x + 2) * e^x
auto add2 = std::make_shared<ngraph::op::v1::Add>(mul, two_node);      // n + 2
auto mul2 = std::make_shared<ngraph::op::v1::Multiply>(node, mul);     // x * n
return std::make_shared<ngraph::op::v1::Divide>(mul2, add2);
```

but it made performance worse.

```
force_builders=Custom,Custom Win,Custom Mac
build_image:Custom=ubuntu-openvino-2020.3.0:16.04
build_image:Custom Win=openvino-2020.3.0
build_image:Custom Mac=openvino-2020.3.0

test_modules:Custom=dnn,python2,python3,java
test_modules:Custom Win=dnn,python2,python3,java
test_modules:Custom Mac=dnn,python2,python3,java

buildworker:Custom=linux-1
# disabled due to high memory usage: test_opencl:Custom=ON
test_opencl:Custom=OFF
test_bigdata:Custom=1
test_filter:Custom=*
```

@dkurt force-pushed the dnn_optimize_mish branch from 3c33eec to 1491934 on June 22, 2020 at 20:28
@alalek (Member) left a comment


Thank you!

@YashasSamaga (Contributor) commented Jul 17, 2020

I think the threshold of 20.0f in std::min can be tweaked to improve accuracy. It mostly won't make any difference, but for the record, here are graphs:

The graphs compare three implementations: x * tanh(log1p(exp(x))), x * tanh(log(1 + exp(x))), and the one whose source is given above each graph.

Current:

```cpp
float eX = exp(std::min(x, 20.f));
float n = (eX + 2) * eX;
dstptr[i] = (x * n) / (n + 2);
```

*(graph: errors of the current implementation)*

New:

```cpp
if (x > 8.688687f)
    return x;
else
{
    float eX = std::exp(x);
    float n = (eX + 2) * eX;
    return (x * n) / (n + 2);
}
```

If we assume that the distribution of activation inputs is normally distributed and centred at zero, we can drop a hint to the compiler that x > 8.688687f path is rare.

*(graph: errors of the new implementation)*

@dkurt (Member, Author) commented Jul 17, 2020

@YashasSamaga, thanks! So someone needs to open a new PR :) Feel free to propose a patch or, if you don't mind, we can ask one of our local students to reproduce this experiment and make their first open source contribution 😃

@YashasSamaga (Contributor) commented Jul 17, 2020

> we can ask one of our local students to reproduce this experiment and make their first open source contribution

This sounds better.

And here is the code I used to generate data and plots if it helps:

```cpp
#include <cmath>
#include <cstdio>
#include <algorithm>

float mish2(float x)
{
    // Above this cutoff, mish(x) equals x to within float precision.
    if (x > 8.688687f)
        return x;
    else
    {
        float eX = std::exp(std::min(x, 20.f));
        float n = (eX + 2) * eX;
        return (x * n) / (n + 2);
    }
}

int main()
{
    // Sweep the input range and print (x, mish(x)) pairs for plotting.
    for (float x = -30; x < 30; x += 0.0001)
    {
        //float activation = x * std::tanh(std::log(1 + std::exp(x)));
        //float activation = x * std::tanh(std::log1p(std::exp(x)));
        float activation = mish2(x);
        std::printf("%.7f %.7e\n", x, activation);
    }
    return 0;
}
```

Plot code: https://github.com/YashasSamaga/ConvolutionBuildingBlocks/blob/master/mish/plot.py

Reference values are computed directly from the formula using 128-bit floats.

@dkurt dkurt mentioned this pull request Jul 22, 2020