Optimize Mish for CPU backend #17624
Conversation
I think the threshold can be tuned. The graphs compare three implementations, among them the current one and the new one. If we assume that the distribution of activation inputs is normally distributed and centred at zero, we can drop a hint to the compiler that the above-threshold branch is unlikely to be taken.
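A minimal sketch of what the approximation plus branch hint could look like. Assumptions not stated in the thread: the threshold value 8 is illustrative (the exact value discussed was lost from this text), and GCC/Clang's `__builtin_expect` is used as one concrete way to express the "unlikely" hint. The algebra uses `n = e^x * (e^x + 2)`, so that `tanh(ln(1 + e^x)) = n / (n + 2)`:

```cpp
#include <cassert>
#include <cmath>

// Branch hint: tell the compiler the condition is rarely true.
// __builtin_expect is a GCC/Clang builtin; fall back to a no-op elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#define UNLIKELY(cond) (cond)
#endif

// Reference Mish: x * tanh(softplus(x)).
float mish_ref(float x)
{
    return x * std::tanh(std::log1p(std::exp(x)));
}

// Fast approximation: with n = e^x * (e^x + 2),
// tanh(ln(1 + e^x)) = ((1 + e^x)^2 - 1) / ((1 + e^x)^2 + 1) = n / (n + 2).
// For large x, mish(x) ~= x, so the branch returns x directly.
// The threshold 8 is illustrative, not confirmed by the discussion above.
float mish_fast(float x)
{
    if (UNLIKELY(x >= 8.f))
        return x;
    float eX = std::exp(x);
    float n = eX * (eX + 2.f);
    return x * n / (n + 2.f);
}
```

If most inputs really are centred near zero, the hint keeps the hot path free of the rarely taken early return.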
@YashasSamaga, thanks! So this could be a PR for somebody new :) Feel free to propose a patch or, if you don't mind, we can ask one of our local students to reproduce this experiment and make their first open source contribution 😃
This sounds better. And here is the code I used to generate the data and plots, in case it helps: https://github.com/YashasSamaga/ConvolutionBuildingBlocks/blob/master/mish/plot.py. The reference values are computed directly from the formula with 128-bit floats.
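To quantify the approximation error the way the comment above describes (a reference computed directly from the formula at higher precision), one could sweep a grid of inputs and track the worst absolute deviation. A sketch under two assumptions: `long double` stands in for the 128-bit floats mentioned above, and the threshold 8 is again illustrative:

```cpp
#include <cassert>
#include <cmath>

// Reference Mish evaluated at extended precision; long double stands in
// for the 128-bit floats mentioned above (an assumption of this sketch).
long double mish_exact(long double x)
{
    return x * std::tanh(std::log1p(std::exp(x)));
}

// Float fast approximation under test (threshold 8 is illustrative).
float mish_fast(float x)
{
    if (x >= 8.f)
        return x;
    float eX = std::exp(x);
    float n = eX * (eX + 2.f);
    return x * n / (n + 2.f);
}

// Worst absolute error of the approximation over [lo, hi] at the given step.
long double max_abs_error(float lo, float hi, float step)
{
    long double worst = 0.0L;
    for (float x = lo; x <= hi; x += step)
    {
        long double err = std::fabs((long double)mish_fast(x) - mish_exact(x));
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

The same sweep, printed per input, would reproduce the kind of error plot linked above.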


Use the Mish approximation introduced by @YashasSamaga in #17200 for the CPU backend.
Also tried an alternative nGraph implementation, but it made performance worse.
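For a CPU backend, the scalar approximation is applied elementwise over the layer's data. A hypothetical free-standing loop sketching that shape (this is not OpenCV's actual layer interface, and the threshold 8 is again illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Scalar fast Mish (threshold 8 is illustrative, not confirmed above).
static float mish_fast(float x)
{
    if (x >= 8.f)
        return x;
    float eX = std::exp(x);
    float n = eX * (eX + 2.f);
    return x * n / (n + 2.f);
}

// Elementwise forward pass over a blob of activations; a hypothetical
// helper, not OpenCV's ElementWiseLayer API.
void mish_forward(const std::vector<float>& src, std::vector<float>& dst)
{
    dst.resize(src.size());
    for (size_t i = 0; i < src.size(); ++i)
        dst[i] = mish_fast(src[i]);
}
```

Because the body is branch-light for typical inputs, a loop like this also auto-vectorizes more readily than one calling `tanh` directly.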