core: update convertFp16 using CV_CPU_CALL_FP16 by tomoaki0705 · Pull Request #8838 · opencv/opencv

tomoaki0705 · 2017-06-01T09:41:08Z

This pullrequest changes

follow the new CPU optimization mode for FP16

alalek

Looks great! Thank you!
Please see my comments below.

alalek · 2017-06-01T11:29:28Z

modules/core/src/convert_opt_fp16.cpp

@@ -0,0 +1,178 @@
+/*M///////////////////////////////////////////////////////////////////////////////////////


Please rename this file to "convert.fp16.cpp"

You can check effect via CMake output (FP16 (0 files)):

-- Dispatched code generation: SSE4_1 FP16 AVX AVX2 -- requested: SSE4_1 AVX FP16 AVX2 -- SSE4_1 (0 files): + SSE4_1 -- FP16 (0 files): + SSE4_1 POPCNT SSE4_2 FP16 AVX -- AVX (1 files): + SSE4_1 POPCNT SSE4_2 AVX -- AVX2 (1 files): + SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2

Right!
I'll do this

alalek · 2017-06-01T11:42:24Z

modules/core/src/convert_opt_fp16.cpp

+void cvtScaleHalf_SIMD32f16f( const float* src, size_t sstep, short* dst, size_t dstep, cv::Size size )
+{
+#if !defined CV_CPU_BASELINE_COMPILE_AVX
+    _mm256_zeroupper();


We should call this at the end of AVX function. Calling this in the beginning of function is optional (and has less effect).
https://stackoverflow.com/a/28356924

BTW, I wrapped this into CV_INSTRUMENT_REGION() (via this macro)

That's a great macro.
I'll follow this comment

alalek · 2017-06-01T11:48:32Z

modules/core/src/convert_opt_fp16.cpp

+
+        for ( ; x < size.width; x++ )
+        {
+            dst[x] = convertFp16SW(src[x]);


Calling of "non optimized code" is not recommended.
It is better to put implementations of these functions into header file under "anonymous namespace" (to avoid calling of SSE instructions without vzeroupper).

alalek · 2017-06-01T11:49:34Z

modules/core/src/convert_opt_fp16.cpp

+void cvtScaleHalf_SIMD32f16f( const float* src, size_t sstep, short* dst, size_t dstep, cv::Size size )
+{
+    return;
+}


Probably we don't need these stubs.

Right. I'll remove these stubs.

tomoaki0705 · 2017-06-02T10:06:52Z

@alalek , thank you very much for your very kind and nice comment.
I pushed a commit which applies your suggestion 1, 2 and 4.

About 3, I'm still lost of how to go.

First of all, I implemented the optimized code using CV_CPU_CALL_FP16 which is either

#  define CV_CPU_CALL_FP16(fn, args) return (opt_FP16::fn args)

or

#  define CV_CPU_CALL_FP16(fn, args) if (CV_CPU_HAS_SUPPORT_FP16) return (opt_FP16::fn args)

Since both define have return inside, if the optimized function is called, the code following the function call will NOT be executed.

On the other hand, this is vectorized optimization, which means that we always have to consider that the line is not a multiple of the vector width. In another word, non-optimized code for few elements.

From these two points, I came to a conclusion that in this case, I've no choice but to implement as following.

in convert.cpp

cvtScaleHalf_<float, short>( const float* src, size_t sstep, short* dst, size_t dstep, Size size )
{
   // optimized version
    CV_CPU_CALL_FP16(cvtScaleHalf_SIMD32f16f, (src, sstep, dst, dstep, size));
    // call to the non-optimized version (software version)
    dst[x] = convertFp16SW(src[x]);
}

in convert.hpp

// declaration of SW version
short convertFp16SW(float f);
float convertFp16SW(short s);

// optimized version
void cvtScaleHalf_SIMD32f16f(src, sstep, dst, dstep, size);
void cvtScaleHalf_SIMD16f32f(src, sstep, dst, dstep, size);

in convert.fp16.cpp

        for ( ; x <= size.width - cVectorWidth ; x += cVectorWidth )
        {
            __m128i v_src = _mm_loadu_si128((__m128i*)(src + x));

            __m256 v_dst = _mm256_cvtph_ps(v_src);

            _mm256_storeu_ps(dst + x, v_dst);
        }

        for ( ; x < size.width; x++ )
        {
            dst[x] = convertFp16SW(src[x]);
        }

So, from my point of view, I'm lost about where to put the SW non-optimized version.
Could you give me bit more suggestion about what to write in which file, please ?

I basically agree about your suggestion, but I'm not sure how to follow it.
Best,

alalek · 2017-06-02T12:20:02Z

modules/core/src/convert.hpp

+namespace cv
+{
+float convertFp16SW(short fp16);
+short convertFp16SW(float fp32);


I mean, try to move implementations of these functions into header file under anonymous namespace.
So compilation unit with SSE* generated code will have one version of these functions, and AVX*(FP16) compilation unit will have own version too.
Anonymous namespace is needed to eliminate linker error about duplicate identifiers or unwanted code "replacement".

namespace { #if !CV_FP16_TYPE // const numbers for floating points format const unsigned int kShiftSignificand = 13; const unsigned int kMaskFp16Significand = 0x3ff; const unsigned int kBiasFp16Exponent = 15; const unsigned int kBiasFp32Exponent = 127; #endif #if CV_FP16_TYPE float convertFp16SW(short fp16) { // Fp16 -> Fp32 Cv16suf a; a.i = fp16; return (float)a.h; } #else float convertFp16SW(short fp16) { ... } #endif #if CV_FP16_TYPE short convertFp16SW(float fp32) { // Fp32 -> Fp16 Cv16suf a; a.h = (__fp16)fp32; return a.i; } #else short convertFp16SW(float fp32) { ... } #endif } // namespace

Loops for tails should be left "as is" in optimized version too.

tomoaki0705 · 2017-06-05T10:00:47Z

@alalek , I appreciate your comment.
Based on your feedback, I moved the implementations to the header file.
I'm getting many errors on the build test, so I'm checking why it's happening.

Meanwhile, I'll highly appreciate if you could give an feedback on this implementation.
Are we on the same page ?

alalek · 2017-06-05T13:56:56Z

@tomoaki0705 Looks good to me! Thank you!

Looks like build failures are related to changes from out.cpp file.

BTW, getConvertFuncFp16() can be local "static" (can be removed from precomp.hpp).

tomoaki0705 · 2017-06-06T09:51:05Z

@alalek, thank you very much for your feedback.
Indeed, I forgot to revert the local modification about out.cpp
I also changed getConvertFuncFp16() as local static.

Thank you very much, again.

alalek · 2017-06-06T12:59:22Z

modules/core/src/convert.fp16.cpp

+{
+namespace opt_FP16
+{
+#if defined(__x86_64__) || defined(_M_X64)


This "#if" makes file this file empty for x86 32-bit builds.
Please use this pattern:

#if !defined(CV_NEON) || !CV_NEON ... x86 code ... #elif CV_NEON ... ARM code ... #else #error "Unsupported build configuration" #endif

Thank you! sorry for such a stupid mistake.

* avoid link error (move the implementation of software version to header) * make getConvertFuncFp16 local (move from precomp.hpp to convert.hpp) * fix error on 32bit x86

alalek · 2017-06-06T15:31:51Z

Well done! Thank you for contribution!
👍

alalek reviewed Jun 1, 2017

View reviewed changes

alalek reviewed Jun 2, 2017

View reviewed changes

tomoaki0705 force-pushed the dispatchFp16 branch from 7e10377 to 652f012 Compare June 5, 2017 23:58

alalek reviewed Jun 6, 2017

View reviewed changes

update convertFp16 using CV_CPU_CALL_FP16

e269ef9

* avoid link error (move the implementation of software version to header) * make getConvertFuncFp16 local (move from precomp.hpp to convert.hpp) * fix error on 32bit x86

tomoaki0705 force-pushed the dispatchFp16 branch from 652f012 to e269ef9 Compare June 6, 2017 13:27

opencv-pushbot merged commit e269ef9 into opencv:master Jun 6, 2017

opencv-pushbot pushed a commit that referenced this pull request Jun 6, 2017

Merge pull request #8838 from tomoaki0705:dispatchFp16

125abe2

tomoaki0705 deleted the dispatchFp16 branch June 6, 2017 22:18

		@@ -0,0 +1,178 @@
		/*M///////////////////////////////////////////////////////////////////////////////////////

Uh oh!

Conversation

tomoaki0705 commented Jun 1, 2017

This pullrequest changes

Uh oh!

alalek left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tomoaki0705 commented Jun 2, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tomoaki0705 commented Jun 5, 2017

Uh oh!

alalek commented Jun 5, 2017

Uh oh!

tomoaki0705 commented Jun 6, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alalek commented Jun 6, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants