Skip to content

[GSoC 2022] nasm/simd support for libjpeg-turbo#22372

Merged
asmorkalov merged 2 commits intoopencv:4.xfrom
ocpalo:libjpegturbo_nasm
Sep 14, 2022
Merged

[GSoC 2022] nasm/simd support for libjpeg-turbo#22372
asmorkalov merged 2 commits intoopencv:4.xfrom
ocpalo:libjpegturbo_nasm

Conversation

@ocpalo
Copy link
Copy Markdown
Collaborator

@ocpalo ocpalo commented Aug 12, 2022

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

Related with #22344

@sturkmen72
Copy link
Copy Markdown
Contributor

LGTM ( i tested only on Windows )

you can see build logs with nasm is available and nasm is unavailable

result output of test code is like below

SIMD Support: YES

Frames captured (VideoCapture) :   100    Average FPS:     170.8    Average time per frame:      5.85 ms
Frames captured (imread)       :   100    Average FPS:     109.0    Average time per frame:      9.17 ms

SIMD Support: NO

Frames captured (VideoCapture) :   100    Average FPS:     171.1    Average time per frame:      5.84 ms
Frames captured (imread)       :   100    Average FPS:      82.9    Average time per frame:     12.06 ms

@asmorkalov
Copy link
Copy Markdown
Contributor

FYI: perf tests can report status and performance statistics with to xml with flag --gtest_output=xml:report_file_name.xml. The xml reports could be compared with summary.py script: https://github.com/opencv/opencv/blob/4.x/modules/ts/misc/summary.py

@ocpalo
Copy link
Copy Markdown
Collaborator Author

ocpalo commented Aug 16, 2022

I have changed some parts of the CMakeLists take into account your comments. But i am thinking that is not what you want exactly. Can you give me more info about that and if possible, could you look at last workflow errors? In the macOS ARM build, 3 tests are failed. I did not understand why this is the case because these pass in my computer. @asmorkalov

@ocpalo ocpalo force-pushed the libjpegturbo_nasm branch from 0e25aa1 to 427e392 Compare August 17, 2022 21:03
@sturkmen72
Copy link
Copy Markdown
Contributor

image
I guess the simd related codes do not appear in the relevant place in Solution Explorer. something needs to be done to solve it

@ocpalo
Copy link
Copy Markdown
Collaborator Author

ocpalo commented Aug 18, 2022

Geometric mean (ms)

Name of Test libjpeg libjpeg-turbo libjpeg-turbo vs previous (x-factor)
Decode::JPEG 62.426 ms 38.651 ms 1.62
Encode::JPEG 58.559 ms 33.405 ms 1.75

@ocpalo ocpalo force-pushed the libjpegturbo_nasm branch from 7e69a7b to e7b457e Compare August 23, 2022 20:24
@vpisarev vpisarev changed the title nasm/simd support for libjpeg-turbo [GSoC 2022] nasm/simd support for libjpeg-turbo Aug 26, 2022
@ocpalo ocpalo changed the title [GSoC 2022] nasm/simd support for libjpeg-turbo [GSoC 2022][WIP] nasm/simd support for libjpeg-turbo Aug 29, 2022
@vpisarev
Copy link
Copy Markdown
Contributor

@ocpalo, we really need to finalize this PR by the end of GSoC. Please, be more active

@ocpalo
Copy link
Copy Markdown
Collaborator Author

ocpalo commented Sep 2, 2022

So far, we have verified in the following OS and architecture.

  • macOS x64(Intel CPU)
  • macOS ARM64
  • Ubuntu ARM64
  • Ubuntu x64
  • Windows x64

Performance comparision between SIMD enabled vs not. Ubuntu x64

Geometric mean (ms)

Name of Test withoutsimd.txt simd.txt simd.txt vs withoutsimd.txt (x-factor)
Decode::JPEG 67.759 39.150 1.73
Encode::JPEG 119.672 33.812 3.54

@ocpalo
Copy link
Copy Markdown
Collaborator Author

ocpalo commented Sep 5, 2022

@asmorkalov Hi,

Vadim informed me in the last meeting about CPU type and optimization baseline. I have a few comments about this. Let me know what you think so I can do the changes according to your comments. In the 3rdparty/libjpeg-turbo/CMakeLists.txt you have commented before that i need to reuse OpenCV variables as x86_64,x86 etc. Then alalek wrote a comment that about we should do minimal changes if code is borrowed from upstream(which is true). Should I still use the openCV variables or do not use to keep align with the upstream.

My second question is, as you said before some CPU's does not support SIMD instructions. Since we provide this SIMD extension as optional(enabled by default), the user can disable this extension. Should I worry about this? I think it will make CMakeLists complicated.

@ocpalo ocpalo changed the title [GSoC 2022][WIP] nasm/simd support for libjpeg-turbo [GSoC 2022] nasm/simd support for libjpeg-turbo Sep 8, 2022
@ocpalo ocpalo requested a review from asmorkalov September 9, 2022 12:59
@asmorkalov
Copy link
Copy Markdown
Contributor

Hello @ocpalo Thanks for the great job! I reviewed CMakeLists in libjpeg-turbo repos and tested previous version of the patch. Findings:

  • Libjpeg-turbo cmake checks NEON/altivec/DSPr2 support with C compiler checks in CMake. So we are in save, if the code is build, for example, for amrv6. NEON part is not included and CMake reports warning about this. I tested it with old Raspberry Pi and it passes tests correctly. For the builds where NEON support is enabled on compiler side with -mfpu=neon NEON is enabled with libjpeg turbo too. I tested it with my Jetson TK1 board (armv7 with NEON). So everything works well with C compiler indirectly. I do not have Power and MIPS boards to check, but the CMake approach is similar and I do not see any reasons for suspicious.
  • Libjpeg-turbo always has AVX2 assembly included into build for Intel systems, but the tests do not fail on old system without AVX2 support. Most probably it's dispatched in runtime.

So What I propose to do:

  • Presume original CMake as much as possible. Do not wary about duplicate system checks and other specifics.
  • Make WITH_SIMD cache variable with value (NOT CV_DISABLE_OPTIMIZATION) to have precise control for testing.
  • Allow libjpeg-turbo for other Unixes like BSD, Solaris, SunOS, etc. It's already supported by original CMake and I do not see any reason to drop them, if all tools hve been found.

@ocpalo
Copy link
Copy Markdown
Collaborator Author

ocpalo commented Sep 11, 2022

@asmorkalov Thanks for the feedback!

  • I have created a another flag to enable SIMD which is WITH_LIBJPEG_TURBO_SIMD and it is value is NOT CV_DISABLE_OPTIMIZATION. WITH_SIMD flag is depends on that flag. I think this satisfy the second item in your propose list.
  • I did not drop other Unixes support on purpose. If you are talking about this line, it is related with TurboJPEG api and in OpenCV it is not enabled anyway. Other than that, I updated the compiler flags to enable maximum optimization in SunOS|Pro and clang.

I am trying to solve how to catch if old systems does not support AVX2 assembly support at compile time.


# No SIMD
set(JPEG_SOURCES ${JPEG_SOURCES} jsimd_none.c)
if(MSVC)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MSVC
MSVC_IDE

Looks like there is some mess in using of these variables.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I borrowed these lines from the upstream. Same as MSVC_IDE parts.

@ocpalo
Copy link
Copy Markdown
Collaborator Author

ocpalo commented Sep 11, 2022

CMake emits warning when libjpeg-turbo SIMD Extension is enabled but could not activated due to no support. Hence this seems like workflow fails. Should I change it WARNING to the STATUS?

Also, I was planning to add the following lines to the CMake to see if SIMD extension is supported on compile time. But could not be sure if this is right or not.

if(X86_64)
  if(NOT "${CPU_BASELINE_FINAL}" MATCHES "SSE2|AVX2" OR NOT "${CPU_DISPATCH}" MATCHES "SSE2|AVX2")
    set(WITH_SIMD 0)
  endif()
endif()
if(X86)
  if(NOT "${CPU_BASELINE_FINAL}" MATCHES "MMX|AVX2|SSE2" OR NOT "${CPU_DISPATCH}" MATCHES "MMX|AVX2|SSE2")
    set(WITH_SIMD 0)
  endif()
endif()

If someone confirm, i will push it.

@ocpalo ocpalo requested review from alalek and removed request for asmorkalov September 11, 2022 21:27
@alalek
Copy link
Copy Markdown
Member

alalek commented Sep 11, 2022

Only "CPU_BASELINE_FINAL" should be used. Also take a look on other 3rdparty file hows to do this right.

CPU_DISPATCH is for dispatching in "runtime" - probably can't be used here due to missing checks (and missing compiler flags).

I believe NOT CV_DISABLE_OPTIMIZATION is enough as suggested by @asmorkalov .

@ocpalo
Copy link
Copy Markdown
Collaborator Author

ocpalo commented Sep 12, 2022

If NOT CV_DISABLE_OPTIMIZATION is enough, then I think pr is ready.

ENABLE_LIBJPEG_TURBO_SIMD "Include SIMD extensions for libjpeg-turbo, if available for this platform" (NOT CV_DISABLE_OPTIMIZATION)

SIMD extension is controlled via ENABLE_LIBJPEG_TURBO_SIMD flag. And this flags value is NOT CV_DISABLE_OPTIMIZATION. If there are no magic behind this value, we are good.

cc @alalek @asmorkalov

@ocpalo ocpalo requested review from asmorkalov and removed request for alalek September 12, 2022 06:56
Copy link
Copy Markdown
Contributor

@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me except debug messages in CMake.

endif()

if(WITH_SIMD)
message(STATUS "SIMD extensions: ${CPU_TYPE} (WITH_SIMD = ${WITH_SIMD})")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this message is redundant. SIMD status is reported form simd/CmakeLists.txt. In particular on armv6 I see:

SIMD extensions: arm (WITH_SIMD = 1)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this line reports this. There is no message reported on simd/CMakeLists.txt except failures and it reports NEON or partial NEON support. I believe it is good have for other type CPU's.

@asmorkalov asmorkalov self-requested a review September 13, 2022 19:51
Copy link
Copy Markdown
Contributor

@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Well Done!

@ocpalo
Copy link
Copy Markdown
Collaborator Author

ocpalo commented Sep 13, 2022

Thanks, I appreciate for the support and the help!

@asmorkalov asmorkalov merged commit 364b3f1 into opencv:4.x Sep 14, 2022
@asmorkalov asmorkalov added this to the 4.7.0 milestone Sep 14, 2022
@ocpalo ocpalo deleted the libjpegturbo_nasm branch September 16, 2022 10:35
@alalek alalek mentioned this pull request Jan 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants