SSE optimizations for 5x5 convolution #241
Conversation
Does anyone have any opinions on this? We also have a fast-path for 3x3 convolutions.
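Since the PR only adds specialized 3x3 and 5x5 paths, the generic routine has to stay as the fallback for every other size. A minimal sketch of how such size-based dispatch might look; the function names, signatures, and dispatcher are hypothetical illustrations, not Torch's actual API (here the fast paths are not wired in, so every size falls back to the scalar reference):

```c
#include <stddef.h>

/* Generic scalar "valid" 2-D convolution (cross-correlation),
   used as the reference / fallback path. */
static void conv2d_generic(float *out, const float *in, int inW,
                           const float *k, int kW, int kH,
                           int outW, int outH)
{
  for (int y = 0; y < outH; y++)
    for (int x = 0; x < outW; x++) {
      float s = 0.f;
      for (int ky = 0; ky < kH; ky++)
        for (int kx = 0; kx < kW; kx++)
          s += k[ky * kW + kx] * in[(y + ky) * inW + (x + kx)];
      out[y * outW + x] = s;
    }
}

typedef void (*conv_fn)(float *, const float *, int, const float *,
                        int, int, int, int);

/* Hypothetical dispatcher: a real implementation would return
   conv3x3_sse / conv5x5_sse for the specialized sizes; here we
   fall back to the generic routine for every size. */
static conv_fn pick_conv(int kW, int kH)
{
  (void)kW; (void)kH;  /* no SSE fast paths wired in this sketch */
  return conv2d_generic;
}
```

The point of the indirection is that callers never need to know which kernel sizes have a fast path; adding a new specialized size touches only the dispatcher.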
no opinion. it looks good!
Great, let's add the 3x3 kernels in then.
More or less the same comment as Soumith's: looks great, would love to have more like those. The simd directory is indeed a good idea. The only question is: do we want this in the core, or as a package (like simd ;))?
I think it's great too! Some tests would be good. And can we hook this up to Lua somehow? I don't understand why the code is in the "generic" folder though if it's only for float.
@dominikgrewe it looks like both Float and Double.
Or extendable to be so, since the SSE instructions are macro-templated as well.
The macros are Float only actually, that's right. I guess we put them in generic because they're loaded by THTensorConv, and we probably want to have a Double version at some point. Also, we only did 3x3 and 5x5 because those are the only two kernel sizes we use for everything.
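For readers unfamiliar with the approach being discussed: below is a hedged sketch of what a vectorized float 5x5 row kernel along these lines might look like, using plain SSE1 intrinsics. The function name and layout are illustrative only; the actual Torch code is macro-templated (so it could later be instantiated for Double) and lives in the simd directory:

```c
#include <xmmintrin.h>  /* SSE1 intrinsics, float only */

/* Illustrative sketch (not Torch's actual code): compute one output
   row of a "valid" 5x5 convolution, 4 float outputs per iteration.
   `in` points at the top-left input element for this output row;
   a valid convolution implies inW >= outW + 4. */
static void conv5x5_row_sse(float *out, const float *in, int inW,
                            const float *kernel, int outW)
{
  int x;
  for (x = 0; x <= outW - 4; x += 4) {
    __m128 acc = _mm_setzero_ps();
    for (int ky = 0; ky < 5; ky++)
      for (int kx = 0; kx < 5; kx++) {
        /* broadcast one kernel weight, multiply by 4 shifted inputs */
        __m128 k = _mm_set1_ps(kernel[ky * 5 + kx]);
        __m128 v = _mm_loadu_ps(in + ky * inW + x + kx);
        acc = _mm_add_ps(acc, _mm_mul_ps(k, v));
      }
    _mm_storeu_ps(out + x, acc);
  }
  /* scalar tail for output widths not divisible by 4 */
  for (; x < outW; x++) {
    float s = 0.f;
    for (int ky = 0; ky < 5; ky++)
      for (int kx = 0; kx < 5; kx++)
        s += kernel[ky * 5 + kx] * in[ky * inW + x + kx];
    out[x] = s;
  }
}
```

The macro-templating discussed above would wrap a loop body like this in a `#define` parameterized on the kernel size, which is why extending it to 7x7 (or a Double version with SSE2) is mostly mechanical.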
I'll give the final merge call on this to @andresy , this will establish our directory and file structure for pushing SIMD optimizations going forward, so needs a bit of thought. |
These aren't hooked into the Lua side yet, it seems like... coming in a later PR?
This is a pretty specific optimization for Twitter's usage of 5x5, but it could be extended to support more sizes in the future.