Skip to content

Added MSA implementations for mips platforms.#15422

Merged
alalek merged 8 commits intoopencv:3.4from
mipsopen-fwu:msa-dev
Sep 20, 2019
Merged

Added MSA implementations for mips platforms.#15422
alalek merged 8 commits intoopencv:3.4from
mipsopen-fwu:msa-dev

Conversation

@mipsopen-fwu
Copy link
Copy Markdown
Contributor

@mipsopen-fwu mipsopen-fwu commented Aug 30, 2019

Signed-off-by: Fei Wu fwu@wavecomp.com

  1. Implemented toolchain configurations for the build of mips32 and mips64 platforms under platform/linux/ directory. The mips toolchain can be foud from the link: https://www.mips.com/develop/tools/codescape-mips-sdk/ .
  2. Implemented MIPS SIMD (MSA) intrinsics under modules/core/include/opencv2/core/ directory.
  3. Made some optimizations with MSA functions in dnn/imgproc/video/videoio modules.

Materials:

force_builders=linux,docs,Custom
buildworker:Custom=linux-1
build_image:Custom=mips64el

@mipsopen-fwu
Copy link
Copy Markdown
Contributor Author

mipsopen-fwu commented Aug 30, 2019

I have added MSA implementations for MIPS platforms. The changes include:

  1. The implementation of toolchain configurations for the build of mips32 and mips64 platforms under platform/linux/ directory. The mips toolchain can be foud from the link: https://www.mips.com/develop/tools/codescape-mips-sdk/ .
  2. The implementation of MIPS SIMD (MSA) intrinsics under modules/core/include/opencv2/core/ directory.
  3. Some optimizations with MSA functions in dnn/imgproc/video/videoio modules.

Performance and functionality testsuites have been run locally for verification. According to the opencv_perf tests results, the MSA optimized version is 39% faster than the No-MSA version.

Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job 👍

General notes:

  • please split PR: SIMD backend + build scripts should be reviewed and merged first. Other parts please move into separate PRs. Please check related comment below about RAW MSA code.
  • as an "optimization" activity please re-target this PR onto 3.4 branch

set(PNG_MIPS_MSA "on" CACHE STRING "Enable MIPS_MSA optimizations:
off: disable the optimizations")
set_property(CACHE PNG_MIPS_MSA PROPERTY STRINGS
${PNG_MIPS_MSA_POSSIBLE_VALUES})
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-...
+set(PNG_MIPS_MSA ON CACHE BOOL ...)

-if(${PNG_MIPS_MSA} STREQUAL "on")
+if(PNG_MIPS_MSA)

instead of strings manipulation.

Also it is better to add this pre-check:

if(";${CPU_BASELINE_FINAL};" MATCHES ";MSA;")

ocv_optimization_process_obsolete_option(ENABLE_VFPV3 VFPV3 OFF)
ocv_optimization_process_obsolete_option(ENABLE_NEON NEON OFF)

ocv_optimization_process_obsolete_option(ENABLE_MSA MSA OFF)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't introduce obsolete options.
CPU_BASELINE=MSA should be used instead (it is defaulted to this values below on the line 350)

#define CV_CPU_AVX512_CEL 261
#define CV_CPU_AVX512_ICL 262

#define CV_CPU_MSA 300
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use value "150"

// of this distribution and at http://opencv.org/license.html.

#ifndef OPENCV_CORE_HAL_MSA_MACROS_H
#define OPENCV_CORE_HAL_MSA_MACROS_H
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this file into core/hal directory.


if(USE_MSA)
message(WARNING "You use obsolete variable USE_MSA to enable MSA instruction set. Use -DENABLE_MSA=ON instead." )
set(ENABLE_MSA TRUE)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove obsolete options.

//! @name Check SIMD support
//! @{
//! @brief Check CPU capability of SIMD operation
static inline bool hasSIMD128()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this call. It is gone (handled by compile time checks).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub code reference is buggy again - correct line is 1748 about hasSIMD128().

Copy link
Copy Markdown
Member

@alalek alalek Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove static inline bool hasSIMD128() - it is not used anymore in OpenCV
(wrong code is linked due GitHub bug, but file is correct)

}
#endif

////// FP16 suport ///////
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

#define SLDI_B2_0(RTYPE, in0, in1, out0, out1, slide_val) \
{ \
out0 = (RTYPE) __msa_sldi_b((v16i8) __msa_fill_b(0), (v16i8) in0, slide_val); \
out1 = (RTYPE) __msa_sldi_b((v16i8) __msa_fill_b(0), (v16i8) in1, slide_val); \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use source from libpng 1.6.37
Add separate patch file if required (to avoid fixes lost after 3rdparty upgrade in the future)

vs12 = v_fma(w1, r2, vs12);
vs13 = v_fma(w1, r3, vs13);
#else
vs00 += w0*r0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this can be replaced to v_fma() unconditionally for all platforms.

{
v8u16 r0 = msa_ld1q_u16((const ushort*)bayer);
v8u16 r1 = msa_ld1q_u16((const ushort*)(bayer + bayer_step));
v8u16 r2 = msa_ld1q_u16((const ushort*)(bayer + bayer_step * 2));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main principle of "universal intrinsics" (SIMD) approach is "code once, run anywhere".

You need to write code once (on some platform), debug it with some backend (there is even C++ backend for SIMD for reference) and then it should run on other backends with reasonable speed.

So we are strictly against huge code blocks using native intrinsic if this code can be replaced by universal SIMD code without significant lost of speed.
Universal SIMD code is regularly tested via x86/ARM backends, native RAW MIPS has very low testing coverage.

See discussion here: #9763 (comment)

Could you please move these changes into a separate PR?
Lets merge first addition of SIMD backend and related build scripts (so we can add cross-platforms builds for that into our infrastructure (but no tests)).

@mipsopen-fwu
Copy link
Copy Markdown
Contributor Author

mipsopen-fwu commented Aug 31, 2019 via email

@mipsopen-fwu mipsopen-fwu changed the base branch from master to 3.4 September 2, 2019 06:45
@mipsopen-fwu
Copy link
Copy Markdown
Contributor Author

@alalek Since I need to separate the PR into two, should I close this pull request first?
Next Steps I'm planning to do:

  1. Create development branch from 3.4 and commit the changes for SIMD backend+build scripts.
  2. Submit pull request for the commits in Step 1.
  3. After the PR in Step 2 is merged, commit the optimizations for other modules and create another PR.

@alalek
Copy link
Copy Markdown
Member

alalek commented Sep 2, 2019

Lets keep basic part of MSA integration in this PR (at least to track conversation above).

You can "force push" commits in your branch to update PR ("mipsopen-fwu:msa-dev").

@mipsopen-fwu
Copy link
Copy Markdown
Contributor Author

@alalek
I'm trying to rebase this pull request from 3.4 by following the steps specified in:
https://github.com/opencv/opencv/wiki/Branches#rebase-pull-request-from-master-to-34
Conflicts are reported when I changed base branch to '3.4'.
Need I resolve these conflicts first before I do the following steps?
rebase your commits from master onto 3.4 branch
git checkout
(optional) create backup branch: git branch -backup
git remote update upstream (assuming upstream is pointing to opencv/opencv GitHub repository)
git rebase -i --onto upstream/3.4 upstream/master
editor will be opened, check list of commits - there should be only your commits, save and exit
git push --force origin (assuming origin is pointing to /opencv GitHub repository)

@alalek
Copy link
Copy Markdown
Member

alalek commented Sep 3, 2019

Need I resolve these conflicts

Yes. You can resolve conflicts during rebase.

Before that you may enable diff3 conflict style for more information about changes: https://stackoverflow.com/questions/27417656/should-diff3-be-default-conflictstyle-on-git
Edit conflicts. Add file into index (git add -i). Then continue rebase for next commit (git rebase --continue).

To abort rebase process use git rebase --abort

@alalek
Copy link
Copy Markdown
Member

alalek commented Sep 3, 2019

I rebased PR onto 3.4 branch - fetch changes from your fork.
Looks like GitHub conflicts are appear after switching the branch (they are not related to your patch - just between our 3.4/master branches)

Old saved commits are here: https://github.com/alalek/opencv/commits/pr15422_0903

@mipsopen-fwu
Copy link
Copy Markdown
Contributor Author

@alalek Thanks for rebasing! So what I need to do for the next step is to separate SIMD backend & build scripts from my previous commits, then commit and 'force push' to my branch (msa-dev) to update PR, right?

…build scripts for MIPS platforms are added.

Signed-off-by: Fei Wu <fwu@wavecomp.com>
@mipsopen-fwu
Copy link
Copy Markdown
Contributor Author

@alalek I have separated MSA backend and build scripts from the old commit in this PR which has been rebased to 3.4 branch.

@alalek
Copy link
Copy Markdown
Member

alalek commented Sep 4, 2019

I have added builder with cross-compilation from Ubuntu 18:04 x64:

  • see here "Custom" column
  • looks like toolchain is not able to detect properly cross-compiler (perhaps we can't use this one - we need to fix it or create new one)
  • list of installed packages and corresponding scripts are here

Update:

  • adjusted some build options, but we still need to remove strange code block to unblock compiler detection

@mipsopen-fwu
Copy link
Copy Markdown
Contributor Author

mipsopen-fwu commented Sep 5, 2019

@alalek Can you build the project for mips platform successfully now?
Looking from the cmake errors, it seems all the mips compiling options were not recognized. I think maybe you need to intall the mips linux gnu toolchain. The build-image name should be mips64r6el or mips32r5el.
Here is the link for the toolchain installer:
Online installer: https://codescape.mips.com/sdk/v2.0.0k/mipssdk.v2.0.0k.linux-x64.x64.webinstall.run
Offline installer: https://codescape.mips.com/sdk/v2.0.0k/mipssdk.v2.0.0k.linux-x64.x64.offline.run
Usually we use cmake-gui to configure and generate the buid scripts by selecting /platform/linux/mips64r6el-gnu.toolchain.cmake .
I will remove the unused code block you refered and commit it.

Signed-off-by: Fei Wu <fwu@wavecomp.com>
@mipsopen-fwu
Copy link
Copy Markdown
Contributor Author

@alalek
I reviewed the build result. There are one failure and some warnings.

  1. https://pullrequest.opencv.org/buildbot/builders/precommit_linux64/builds/22854/steps/Generate%20ABI%20dump/logs/log
    The GCC parameters:
    gcc -fdump-translation-unit -fkeep-inline-functions -c -x c++ -fpermissive -w -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-delete-non-virtual-dtor -Wno-comment -Wno-missing-field-initializers -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG -DOPENCV_ABI_CHECK=1 "/tmp/iJM73d75Px/dump1.h" -I/build/precommit_linux64/install/include -I/build/precommit_linux64/install/include/opencv -I/usr/include/vtk-6.0 -I/build/precommit_linux64/build

In file included from /tmp/iJM73d75Px/dump1.h:55:0:
/build/precommit_linux64/install/include/opencv2/core/hal/msa_macros.h:8:17: fatal error: msa.h: No such file or directory
#include "msa.h"
^
compilation terminated.
-- There is no -mmsa compiling option in the gcc parameters, instead -msse is used.

  1. Warnings: https://pullrequest.opencv.org/buildbot/builders/precommit_custom_linux/builds/2596/steps/compile%20release/logs/warnings%20%2815%29
    /build/precommit_custom_linux/3.4/opencv/3rdparty/libpng/mips/filter_msa_intrinsics.c:315:11: warning: declaration of 'zero_m' shadows a previous local [-Wshadow]
    /build/precommit_custom_linux/3.4/opencv/3rdparty/libpng/mips/filter_msa_intrinsics.c:315:11: warning: declaration of 'zero_m' shadows a previous local [-Wshadow]
    /build/precommit_custom_linux/3.4/opencv/3rdparty/libpng/mips/filter_msa_intrinsics.c:315:11: warning: declaration of 'zero_m' shadows a previous local [-Wshadow]
    /build/precommit_custom_linux/3.4/opencv/3rdparty/libpng/mips/filter_msa_intrinsics.c:338:11: warning: declaration of 'zero' shadows a previous local [-Wshadow]
    /build/precommit_custom_linux/3.4/opencv/3rdparty/libpng/mips/filter_msa_intrinsics.c:338:11: warning: declaration of 'zero' shadows a previous local [-Wshadow]
    /build/precommit_custom_linux/3.4/opencv/3rdparty/libpng/mips/filter_msa_intrinsics.c:338:11: warning: declaration of 'zero' shadows a previous local [-Wshadow]
    /build/precommit_custom_linux/3.4/opencv/3rdparty/libpng/mips/filter_msa_intrinsics.c:338:11: warning: declaration of 'zero' shadows a previous local [-Wshadow]
    /build/precommit_custom_linux/3.4/opencv/3rdparty/libpng/mips/filter_msa_intrinsics.c:315:11: warning: declaration of 'zero_m' shadows a previous local [-Wshadow]
    /build/precommit_custom_linux/3.4/opencv/3rdparty/libpng/mips/filter_msa_intrinsics.c:315:11: warning: declaration of 'zero_m' shadows a previous local [-Wshadow]
    /build/precommit_custom_linux/3.4/opencv/3rdparty/libpng/mips/filter_msa_intrinsics.c:315:11: warning: declaration of 'zero_m' shadows a previous local [-Wshadow]
    /build/precommit_custom_linux/3.4/opencv/3rdparty/libpng/mips/filter_msa_intrinsics.c:338:11: warning: declaration of 'zero' shadows a previous local [-Wshadow]
    /build/precommit_custom_linux/3.4/opencv/3rdparty/libpng/mips/filter_msa_intrinsics.c:338:11: warning: declaration of 'zero' shadows a previous local [-Wshadow]
    /build/precommit_custom_linux/3.4/opencv/3rdparty/libpng/mips/filter_msa_intrinsics.c:338:11: warning: declaration of 'zero' shadows a previous local [-Wshadow]
    /build/precommit_custom_linux/3.4/opencv/3rdparty/libpng/mips/filter_msa_intrinsics.c:338:11: warning: declaration of 'zero' shadows a previous local [-Wshadow]

--These warnings are introduced by the original filter_msa_intrinsics.c file from libpng-1.6.37. We didn't make any changes on it. If needed, I can make a fix.

cc1plus: warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)' forming offset [25, 32] is out of the bounds [0, 24] of object 'scharr' with type 'const int [6]' [-Warray-bounds]

--This warning is not introduced by my commit.

@alalek
Copy link
Copy Markdown
Member

alalek commented Sep 6, 2019

Thank you for recommended build instructions!
I believe it make sense to add them into PR description and/or add comment in r6 toolchain file with SDK link + this PR link.

Currently on CI we prefer using cross-compilation mode through builtin Ubuntu compiler kits. We believe it is enough for basic validation builds (no tests). It is easy enough to integrate and upgrade dependencies from our side.

  1. It is Linux x86-64 build. No -mmsa option is expected here.
    Each header is included directly into ABI checker. Need to add guard in hal/msa_macros.h:
#ifdef __mips_msa
...
#endif
  1. In 3rdparty/libpng/CMakeLists.txt near add_definitions(-DPNG_MIPS_MSA_OPT=2) add this line:
ocv_warnings_disable(CMAKE_C_FLAGS -Wshadow)
  1. OK, I will try to check this with GCC-8 on other archs and find out how to workaround that.

…g warnings for libpng.

Signed-off-by: Fei Wu <fwu@wavecomp.com>
@mipsopen-fwu
Copy link
Copy Markdown
Contributor Author

mipsopen-fwu commented Sep 8, 2019

@alalek I checked the failed build log for 'Android pack' : https://pullrequest.opencv.org/buildbot/builders/precommit_pack_android/builds/11090/steps/build%20sdk/logs/stdio

This error Error ‘ opcode not supported on this processor: mips32 (mips32) `pause' ’ is not introduced by my commit.

I think maybe this error can be fixed by adding __mips_isa_rev check in modules/core/src/parallel_impl.cpp:
elif defined GNUC && defined mips
define CV_PAUSE(v) do { for (int __delay = (v); __delay > 0; --__delay) { asm volatile("pause" ::: "memory"); } } while (0)
------>
elif defined GNUC && defined mips && __mips_isa_rev >= 3
define CV_PAUSE(v) do { for (int __delay = (v); __delay > 0; --__delay) { asm volatile("pause" ::: "memory"); } } while (0)

… is less than 2.

Signed-off-by: Fei Wu <fwu@wavecomp.com>
@mipsopen-fwu
Copy link
Copy Markdown
Contributor Author

@alalek I have committed the fix for compiling error in 'Android pack' build

@mipsopen-fwu
Copy link
Copy Markdown
Contributor Author

mipsopen-fwu commented Sep 10, 2019 via email

Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added 'patch' files for '3rdparty' code.

Please take a look on comments below.

ocv_update(CPU_MSA_TEST_FILE "${OpenCV_SOURCE_DIR}/cmake/checks/cpu_msa.cpp")
ocv_update(CPU_KNOWN_OPTIMIZATIONS "MSA")
ocv_update(CPU_MSA_FLAGS_ON "")
ocv_update(CPU_FP16_IMPLIES "MSA")
Copy link
Copy Markdown
Member

@alalek alalek Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FP16 is not enabled in current builds.
What did you mean here?

CPU_AVX512_CEL = 261, //!< Cascade Lake with AVX-512F/CD/BW/DQ/VL/IFMA/VBMI/VNNI
CPU_AVX512_ICL = 262, //!< Ice Lake with AVX-512F/CD/BW/DQ/VL/IFMA/VBMI/VNNI/VBMI2/BITALG/VPOPCNTDQ

CPU_MSA = 300,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

150

//! @name Check SIMD support
//! @{
//! @brief Check CPU capability of SIMD operation
static inline bool hasSIMD128()
Copy link
Copy Markdown
Member

@alalek alalek Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove static inline bool hasSIMD128() - it is not used anymore in OpenCV
(wrong code is linked due GitHub bug, but file is correct)

…ptimizations.cmake.

2. Use CV_CPU_COMPILE_MSA instead of __mips_msa for MSA feature check in cv_cpu_dispatch.h.
3. Removed hasSIMD128() in intrin_msa.hpp.
4. Define CPU_MSA as 150.
Signed-off-by: Fei Wu <fwu@wavecomp.com>
@mipsopen-fwu
Copy link
Copy Markdown
Contributor Author

@alalek

  1. Thanks a lot for committing patches for the '3rdparth' code.
  2. FP16 related option seems to be some legacy code. Since it's not supported and mips also doesn't support 16 bit wide floating point register, I removed this line.
  3. I replaced __mips_msa with CV_CPU_COMPILE_MSA for MSA feature checking in cv_cpu_dispatch.h.
  4. CPU_MSA has been defined as 150 now.
  5. hasSIMD128() has been removed.

OPENCV_HAL_IMPL_MSA_EXPAND(v_uint32x4, v_uint64x2, uint, u32, s32, v4u32, v4i32)
OPENCV_HAL_IMPL_MSA_EXPAND(v_int32x4, v_int64x2, int, s32, s32, v4i32, v4i32)

#if 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this version of implementation is disabled? Is it slow, inaccurate or something else?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The '#if 0' version is slower according to the test results. The reason is that MSA doesn't actually provide data type expanding instructions for vectors. The 'movl' related conversion are done by C.
So it would take more steps to use 'movl' interface than doing converion directly.

typedef bin_loader<OP, T1, Tvec> ldr;
enum {wide_step = Tvec::nlanes};
#if !CV_NEON && CV_SIMD_WIDTH == 16
#if !CV_NEON && !CV_MSA && CV_SIMD_WIDTH == 16
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary? Do manual loop unrolling affect performance?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case data width is less than wide_step * 2, the performance would be better with this change. Otherwise, the performance shouldn't be better. Considering 'width' should be larger in most cases, this change would not be necessary. I will remove it.

i += blockSize;
}
}
#elif CV_MSA
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do performance of this implementation differs essentially from generic universal intrinsics version? I suppose generic UI version should work quite fast due to availability of native intrinsic for v_dotrpod. In case generic version provide similar performance I prefer to avoid platform specific code wherever possible.
BTW This code is anyway disabled since CV_SIMD is available and is checked prior to this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove CV_MSA related code block dotProd_8u(), it's not necessary.

@mipsopen-fwu
Copy link
Copy Markdown
Contributor Author

@terfendail
Sorry for late response.

  1. Yes, you are right. CV_SIMD128_64F guarding is not needed since it's unconditionally supported in this implementation. I will removed them form intrin_msa.hpp, only the define is kept.
  2. The '#if 0' version is slower according to the test results. The reason is that MSA doesn't actually provide data type expanding instructions for vectors. The 'movl' related conversion are done by C.
    So it would take more steps to use 'movl' interface than doing converion directly.
  3. In case data width is less than wide_step * 2, the performance would be better with this change. Otherwise, the performance wouldn't be better. Considering 'width' should be larger in most cases, this change is not necessary. I will remove it.
  4. I will remove CV_MSA related code block dotProd_8u(), it's not necessary.
    Thanks.

2. Removed unnecessary CV_MSA related code block in dotProd_8u().

Signed-off-by: Fei Wu <fwu@wavecomp.com>
Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 👍

i += blockSize;
}
}
#elif CV_MSA
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#if CV_SIMD blocks this code execution here too (and compilation). The same note is about #elif CV_NEON code block (dead code too)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mipsopen-fwu Have you compared performance of this MSA specific implementation against generic UI implementation?

Copy link
Copy Markdown
Contributor Author

@mipsopen-fwu mipsopen-fwu Sep 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@terfendail
This CV_MSA related code block was added by mistake when I merged msa implementations from 4.0.1 branch to 4.1.1 and 3.4 branch. On our pervious 4.0.1 development branch, this MSA specific implementation doesn't exist. It's correct to remove them from the pull request.

Even though, in order to make clear the performance difference, I did MatType_Length_dot test on mips32r5 HW by setting testparam=(8UC1, 1024) and samples=15. From the result, we can see that the performance of generic UI implementation(using CV_SIMD) is better against CV_MSA specific implementation which was added by mistake in this code block.
The following is the test result:

  1. The total running time for CV_MSA implementation is:
    [----------] 18 tests from MatType_Length_dot (2177 ms total)
  2. The total running time for CV_SIMD implementations is:
    [----------] 18 tests from MatType_Length_dot (618 ms total)

elseif(MIPS)
ocv_update(CPU_MSA_TEST_FILE "${OpenCV_SOURCE_DIR}/cmake/checks/cpu_msa.cpp")
ocv_update(CPU_KNOWN_OPTIMIZATIONS "MSA")
ocv_update(CPU_MSA_FLAGS_ON "")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it empty? It should contain "-mmsa" or something similar.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option seems not be used elsewhere, but it should be better to be defined as '-mmsa' for future use. I will modify it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option is used, but it not obvious:

  • there is a list of CPU_KNOWN_OPTIMIZATIONS which contain MSA
  • cmake first tries to compile file CPU_${OPT}_TEST_FILE with option CPU_${OPT}_FLAGS_ON, where OPT=MSA
  • if compilation succeeded and MSA is requested by user (via CPU_BASELINE), then these options will be added to compiler

Is the -mmsa option the only one required by implemented intrinsics? What about -fhard-float or others? We should move these options from toolchain file to CompilerOptimization.cmake.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. -mmsa option is enough. When '-mmsa' is included, '-mfp64' and '-mhard-float' will be enabled by default. '-mmsa' has the same effect as '-mmsa -mfp64 -mhard-float'.

inline bool v_check_any(const v_float32x4& a)
{ return v_check_any(v_reinterpret_as_u32(a)); }

#if CV_SIMD128_64F
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v_check_all(const v_int64x2&) shouldn't be guarded by CV_SIMD128_64F as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove CV_SIMD128_64F guarding form intrin_msa.hpp, only the define is kept.

2. Removed CV_SIMD128_64F guardings in intrin_msa.hpp.

Signed-off-by: Fei Wu <fwu@wavecomp.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants