Conversation
sivanov-work
left a comment
There was a problem hiding this comment.
looks similar with prevs PR, LGTM
| cv::Size sz; | ||
| MatType type = -1; | ||
| int dtype = -1; | ||
| double scale = 1.0; |
There was a problem hiding this comment.
scale is not configurable param?
There was a problem hiding this comment.
While bug "Fluid: Add scaling support to MulC kernel." hasn't fixed yet, the MulC kernel doesn't support scaling, so it makes no sense to configure it via test parameters.
| float* sc = scratch.OutLine<float>(); | ||
|
|
||
| for (int i = 0; i < scratch.length(); ++i) | ||
| sc[i] = static_cast<float>(_scalar[i % chan]); |
There was a problem hiding this comment.
looks similar... can all the same similar places aggregated into single inline function like as load_scalar_from_scratch or something more meaningful
|
|
||
| // OpenCV code /////////////////////////////////////////////////////////// | ||
| cv::multiply(in_mat1, sc, out_mat_ocv, 1, dtype); | ||
| cv::multiply(in_mat1, sc, out_mat_ocv, scale, dtype); |
There was a problem hiding this comment.
scale is double, but in implementation functions it is float: is there any compile wraning about it?
There was a problem hiding this comment.
If we leave the scale of the double type, then we will have to convert all data to the double type. Only 2 elements will fit in a 128-bit vector, so we will have to do 2 times more iterations, which include load / store and many other high-latency operations. We will significantly reduce performance.
There was a problem hiding this comment.
SIMDs for the Mul and the Div kernels also were implemented (by me and before by Evgeny Latkin) with this approach.
There was a problem hiding this comment.
sorry, if i confused: i meant interface part not implementation: does cv::multiply accept double (otherwise warning must happen) and how where it convert into float in gapi kernels?
If yes, then there is a some confusion between interface and it's implementation. But from other hand this situation doesn't affect scale so much, because its probably expected x2,x4,x8 etc
|
|
||
| #define MULC_SIMD(SRC, DST) \ | ||
| int mulc_simd(const SRC in[], const float scalar[], DST out[], \ | ||
| const int length, const int chan, const float scale) \ |
There was a problem hiding this comment.
float here, but UT has double - but maybe it is minor question...
There was a problem hiding this comment.
If we leave the scale of the double type, then we will have to convert all data to the double type. Only 2 elements will fit in a 128-bit vector, so we will have to do 2 times more iterations, which include load / store and many other high-latency operations. We will significantly reduce performance.
| case 2: \ | ||
| case 4: \ | ||
| { \ | ||
| if (std::fabs(scale - 1.0f) <= FLT_EPSILON) \ |
There was a problem hiding this comment.
could you put comment please about what happened here?
if scale ~ 1, then we use scalar version?
There was a problem hiding this comment.
If scale = 1.0, we go to the branch without scaling, i.e. a*scalar only.
309a221 to
0cc6ca0
Compare
0cc6ca0 to
fcc8a67
Compare
|
There are several GPU tests failed: "Custom Win" by default doesn't run OpenCL testing. |
I've configured my windows the same as mentioned "Custom Win" CI check and run the test, but unfortunately I didn't manage to reproduce this failures. I can delivery fix for this failure in separate PR. Could you please run this test on new PR? |
* GAPI Fluid: SIMD for MulC kernel. * Changes for MulDouble kernel.
SIMD for GAPI Fluid MulC kernel.
Performance report:
FluidMulCSIMD.xlsx