Univ Intrinsics implementation of Add, Sub, Absdiff kernels#18338
Univ Intrinsics implementation of Add, Sub, Absdiff kernels#18338alalek merged 3 commits intoopencv:masterfrom
Conversation
f023760 to
ed3e1bb
Compare
|
@anna-khakimova Welcome back! Please take a look on CI failures: https://pullrequest.opencv.org/buildbot/builders/precommit_linux64/builds/27868. |
4640203 to
de8eab5
Compare
|
@OrestChura , please test this patch on KMB. |
4c2a6e4 to
dcb6425
Compare
9ea8588 to
2050b86
Compare
17747d1 to
931caaf
Compare
| { return v_float32x4(_mm_castsi128_ps(a.val)); } | ||
| inline v_float32x4 v_reinterpret_as_f32(const v_int64x2& a) | ||
| { return v_float32x4(_mm_castsi128_ps(a.val)); } | ||
|
|
There was a problem hiding this comment.
please remove unrelated changes from the patch
| { return v_float32x8(_mm256_cvtepi32_ps(a.val)); } | ||
|
|
||
| inline v_float32x8 v_cvt_f32(const v_uint32x8& a) | ||
| { return v_float32x8(_mm256_cvtepi32_ps(a.val)); } |
There was a problem hiding this comment.
_mm256_cvtepi32_ps() documentation has this statement:
Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- This doesn't work with unsigned values.
- Testing doesn't test "unsigned" case.
- If this works with your code, then you probably don't really need to add this intrinsic.
| { | ||
| v_reg<float, n> c; | ||
| for (int i = 0; i < n; i++) | ||
| c.s[i] = (float)a.s[i]; |
There was a problem hiding this comment.
let left this "as is" (this file prefers C style casts)
| return *this; | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Git hook must be installed before development to avoid that.
Refer to "How to contribute" Wiki page.
| v_int16 b1 = vx_load(reinterpret_cast<const short*>(&in2[x])); | ||
| v_int16 b2 = vx_load(reinterpret_cast<const short*>(&in2[x + nlanes / 2])); | ||
|
|
||
| vx_store(reinterpret_cast<uchar*>(&out[x]), v_pack_u(a1 + b1, a2 + b2)); |
There was a problem hiding this comment.
+ is overloaded operation with saturation.
| #if CV_SIMD | ||
| absdiff_simd(in1, in2, out, length, x); | ||
| #endif | ||
| for (; x < length; ++x) | ||
| out[x] = absdiff<DST>(in1[x], in2[x]); | ||
| } |
|
|
||
| #if CV_SIMD | ||
| template<typename T, typename VT> | ||
| static inline void absdiff_impl(const T in1[], const T in2[], T out[], int length, int& x) |
There was a problem hiding this comment.
void ... int& x
It is better to "return x" instead (avoid code which may block compiler optimizations).
588f8e4 to
ff1078c
Compare
|
@alalek please take a look one more |
|
Remove unnecessary changes from the patch and make required builds green. |
| //! @endcond | ||
|
|
||
| } // cv:: No newline at end of file | ||
| } // cv:: |
There was a problem hiding this comment.
Unfortunately I can't unroll this changes. Visual Studio insert shift to next line automatically. However as I know file should be ended by empty line.
There was a problem hiding this comment.
I think it is configurable, can you please find this setting?
There was a problem hiding this comment.
BTW, git can unroll any file. We don't need editor for that.
git checkout upstream/master -- modules/core/include/opencv2/core/hal/intrin_forward.hpp
|
@alalek Sorry for inconvenient. Builds are passed now. Please look one more. |
|
Test in AVX512 build (custom) is crashed. |
|
This is not true.
you should configure your local environment properly. |
|
GDB log of this crash is below. Details |
|
Now test log from Custom build: |
It's mean that now Custom build tests fail only because of test infrastructure. |
There is no /cc @dmatveev Please help how to check build logs Currently test app just hangs: |
I've already run tests on AVX512 machine and show the log mentioned in previous comment. |
| } | ||
|
|
||
| if (x < length) | ||
| x = length - nlanes; |
There was a problem hiding this comment.
Such tail processing requires that in != out. (However that's ok if such a check is performed somewhere higher by call stack)
There was a problem hiding this comment.
Ok. Add check for cases when input and output types are the same.
There was a problem hiding this comment.
@dmatveev Do we really need these checks in G-API code? Does G-API support inplace processing? If no, then it makes sense to add CV_DbgAssert() instead.
@anna-khakimova There are 7 similar loops in this patch. Commit contains 3 updates only. Why?
There was a problem hiding this comment.
@alalek
Answer for first question: This checks necessary to process last several elements of input array (tail) via univ intrinsics since their number is less than nlanes .
Answer for second question: As I've already mentioned in previous comment, for inplace implementation it's necessary that input and output array types should be the same. So, this check is needed only in 3 cases (in 3 functions which calls when input and output types are the same).
Note: Please read my comments more attentively.
There was a problem hiding this comment.
Does G-API support inplace processing?
Yes, we have to support inplace processing at least to avoid copy when calling cv::gapi::wip::draw::render().
Not sure if Fluid have ever been tested with such type of inplace execution, but the Fluid itself never forces input and output buffer to be the same.
There was a problem hiding this comment.
@alalek
There are no such cases in GAPI.
For AbsDiff there are cases such as:
- uchar inputs and uchar output
- short int inputs and short int output.
- ushort inputs and ushort output.
- float inputs and float outputs
For all cases mentioned above there is check to detect inplace.
For Add and Sub there are cases such as:
- uchar inputs and uchar output <--- For this case there are checks for inplace.
- short int inputs and uchar output <---- There is no sense to check for inplace.
- float inputs and uchar output <------ There is no sense to check for inplace.
- short int inputs and short int output <------- For this case there are check for inplace.
- float inputs and float output <-------- For this case there are checks for inplace.
- uchar inputs and float output <------- There is no sense to check for inplace.
- short int inputs and float output <------- There is no sense to check for inplace.
As you can see, there are no cases with int at all and with int and float in particular.
These GAPI kernels don't support int type at all.
Inplace implementation is absent for these kernels. And case when user gives one the same int matrix for inputs, then cast this matrix to float and pass it to output is difficult to imagine in reality.
What's the point in adding checks for all occasions?
There was a problem hiding this comment.
... for now.
Assertions verify assumptions which are required by related code below.
It is a really powerful tool.
This helps with investigations in the future through emitting error messages. This reduces annoying debugging process of related problems.
There was a problem hiding this comment.
... for now.
Assertions verify assumptions which are required by related code below.
It is a really powerful tool.
This helps with investigations in the future through emitting error messages. This reduces annoying debugging process of related problems.
@alalek Ok. Done.
There was a problem hiding this comment.
Anyway, I would suggest to put
CV_DbgAssert()for other cases.when input and output types are the same
to catch cases when different types are inplaced (e.g. both
floatandintare 32-bit)
I've already put checks for inplace when input and output types are the same.
There was a problem hiding this comment.
I've already put checks for inplace when input and output types are the same.
Please open an issue internally to start checking for inplace at the Fluid backend level.
Then we could remove such checks from the kernels (where it may be costly given that kernels are called for every line of an image)
|
@alalek please review one more. All checks are passed. |
| { | ||
| VT a = vx_load(&in1[x]); | ||
| VT b = vx_load(&in2[x]); | ||
| absdiff_store(out, a, b, x); |
There was a problem hiding this comment.
Is it the only place where absdiff_store is used, or did I miss something ? If so - that is the point of separate function for this , why not inline it's body here ?
There was a problem hiding this comment.
@anton-potapov
I forced to add these overloads because I realized that new universal intrinsic absdiffs() (added by me for v_uint8, v_uint16, v_float32 types earlier) work the same as already exist absdiff() for argument types mentioned above. So now I have to use only one absdiffs() for v_int16 type only. For the rest types (v_uint8, v_uint16, v_float32) I use absdiff(). And so these overloads are need here.
dmatveev
left a comment
There was a problem hiding this comment.
The code can be really simplified using templates, but that's another story
Performance report:
AbsDiff_Add_Sub_perf_report.xlsx
SIMD optimization via universal intrinsics for Add, Sub and AbsDiff fluid kernels.
Published for review 24th of September.
@smirnov-alexey , @anton-potapov , @OrestChura, @rgarnov please take a look.