[GSoC 2019] Improve the performance of JavaScript version of OpenCV (OpenCV.js)#15371
alalek merged 4 commits into opencv:3.4
50aee2c to 1d12a88
@Wenzhao-Xiang, please use
@huningxin Thanks! I will fix the trailing whitespace issues and squash the two commits into one, to serve as my GSoC final commit.
1097238 to 82e98fa
alalek left a comment
Thank you for the contribution! Great job 👍
@huningxin @terfendail I just found, almost all the implementation of
Update the performance analysis #15371 (comment)
Could you please specify what they are? I think it will help the decision. Thanks.
    #endif

    #if defined(EMSCRIPTEN)
    # define CV_WASM_SIMD 1
Usually this macro is used in OpenCV for the check:
defined(__EMSCRIPTEN__)
Is there any difference?
How can the SIMD feature be disabled (via CMake/.py script parameters)? (It is useful for debugging purposes.)
Thanks for the review!
According to Detecting Emscripten in preprocessor, the correct define to use is __EMSCRIPTEN__.
emscripten-core/emscripten#4665 introduced a strict build mode and removed the EMSCRIPTEN define. Therefore it is not recommended to use EMSCRIPTEN even though it still works in non-strict build mode.
I'll fix that then.
As for how to disable the SIMD feature: it is controlled by the .py build script flag --simd. If you build with this flag, CV_ENABLE_INTRINSICS is turned on and the SIMD feature is then detected. Without the flag, only the scalar version is built.
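The gating discussed in this thread can be sketched in a few lines of C++. This is only a minimal illustration and not the code from the PR: `DEMO_ENABLE_INTRINSICS`, `DEMO_WASM_SIMD` and the two helper functions are hypothetical names, and `__EMSCRIPTEN__` is the only real, compiler-provided macro used here.

```cpp
#include <cstring>

// __EMSCRIPTEN__ is the macro the Emscripten compiler defines.
// DEMO_ENABLE_INTRINSICS is a hypothetical switch standing in for whatever
// the build script's --simd flag would turn on; it is NOT an OpenCV macro.
#if defined(__EMSCRIPTEN__) && defined(DEMO_ENABLE_INTRINSICS)
#  define DEMO_WASM_SIMD 1
#else
#  define DEMO_WASM_SIMD 0
#endif

// Reports which code path this translation unit was compiled with.
inline const char* demo_backend_name() {
#if DEMO_WASM_SIMD
    return "wasm-simd";
#else
    return "scalar";
#endif
}

// Convenience check for the scalar (SIMD disabled) configuration.
inline bool demo_is_scalar_build() {
    return std::strcmp(demo_backend_name(), "scalar") == 0;
}
```

Compiled natively, or with Emscripten but without the hypothetical switch, `demo_backend_name()` returns `"scalar"`, which corresponds to the debugging configuration asked about in the review.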
@huningxin They are almost for
@alalek updated it. Are there any issues?
+1. Thanks for the information.
I suppose that retaining a few more fallback functions shouldn't significantly affect the size of the library. So let's keep them.
I agree! Thanks!
Any updates here? @alalek @terfendail @huningxin
Thanks! @alalek
ecdc729 to c15f138
Rebased this branch to resolve the conflicts.
@Wenzhao-Xiang thanks for the rebase. @terfendail @alalek, is it OK to merge now? Otherwise we need Wenzhao to keep rebasing this PR.
I rebased PR onto 3.4 branch: https://github.com/alalek/opencv/commits/pr15371_r Please pull these changes into
Improve the performance of JavaScript version of OpenCV (OpenCV.js):
1. Create the base of OpenCV.js performance test:
This perf test is based on benchmark.js (https://benchmarkjs.com). It first adds `cvtColor`, `Resize`, and `Threshold`.
2. Optimize the OpenCV.js performance by WASM threads:
This optimization is based on the Web Worker API and SharedArrayBuffer, so it can only be used in the browser.
3. Optimize the OpenCV.js performance by WASM SIMD:
Add a WASM SIMD backend for OpenCV Universal Intrinsics. It is experimental, as WASM SIMD is still in development.

1. use short license header
2. fix documentation node issue
3. remove the unused `hasSIMD128()` api

1. fix emscripten define
2. use fallback function for f16

Fix rebase issue
7186dbb to b6467d0
@alalek
Awesome! Thanks @alalek @terfendail @Wenzhao-Xiang.
Guys, begging you to release. This will be so dope!
Overview
This pull request changes:
1. Create the base of OpenCV.js performance test:
This perf test is based on benchmark.js. We first add `cvtColor`, `Resize`, and `Threshold` to it. Both the browser and Node.js versions are supported for testing.
2. Optimize the OpenCV.js performance by WASM threads:
This optimization is based on the Web Worker API and SharedArrayBuffer, so it can only be used in the browser. We expose two new APIs, `cv.parallel_pthreads_set_threads_num(number)` and `cv.parallel_pthreads_get_threads_num()`: the former sets the number of threads dynamically and the latter returns the current number of threads. The default number of threads is the number of logical cores of the device.
3. Optimize the OpenCV.js performance by WASM SIMD:
Add a WASM SIMD backend for OpenCV Universal Intrinsics. It is experimental, as WASM SIMD is still in development. The SIMD version of OpenCV.js built with the latest LLVM upstream may not work with stable browsers or old versions of Node.js. Please use the latest unstable browser or Node.js to get the new features, such as `Chrome Dev`.
The Test
Test Environment:
Results
`Threshold` kernel with parameter `(1920x1080, CV_8UC1, THRESH_BINARY)` as example:
Performance Analysis
Kernel performance (ms)
Test Environment:
OS: Ubuntu 16.04
Emscripten: 1.38.42, LLVM upstream backend
Browser: Chrome, Version 78.0.3880.4 (Official Build) dev (64-bit)
Hardware: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz with 8 logical cores
Analysis
With the current optimization, the threads optimization works as we expected. However, WASM SIMD still has some issues. As we can see in the `Kernel performance` result, `resize` currently gets only a 1.34x speedup over the scalar version, and `cvtColor` is even 2-3x slower than the scalar version, which still leaves a big gap compared with native SIMD optimization.
Thanks @huningxin for the investigation, here are some analysis results:
- [...] `shift` to simulate `integer widening` instructions in `v_dotprod`. We have opened an emscripten issue, and we can continue to optimize the `resize` kernel after this new feature is enabled.
- [...] `pshufb` with memory operands are generated by V8 for the current implementation. One solution is to refer to the SSE implementation that uses `punpcklbw` and `punpckhqdq`. We tried, but it still fails due to an emscripten issue that causes V8 to fail to generate those instructions. Let's see the response from the emscripten community.
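The idea of using shifts to simulate integer widening can be illustrated in scalar C++. The sketch below is an assumption about the general technique and not the PR's actual `v_dotprod` code: each 32-bit lane is taken to hold two int16 values, the low half is sign-extended by shifting left and then right, the high half by a single right shift, and the two products are added. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Pack two int16 values into one 32-bit lane (low half first).
inline uint32_t pack_pair(int16_t lo, int16_t hi) {
    return static_cast<uint32_t>(static_cast<uint16_t>(lo)) |
           (static_cast<uint32_t>(static_cast<uint16_t>(hi)) << 16);
}

// Sign-extend the low 16 bits: shift left, then arithmetic shift right.
// Assumes >> on a negative int32_t is arithmetic (guaranteed since C++20,
// and the behavior of mainstream compilers before that).
inline int32_t widen_low16(uint32_t lane) {
    return static_cast<int32_t>(lane << 16) >> 16;
}

// Sign-extend the high 16 bits: a single arithmetic shift right.
inline int32_t widen_high16(uint32_t lane) {
    return static_cast<int32_t>(lane) >> 16;
}

// Dot product of the two int16 pairs stored in lanes a and b.
// The sum is formed in unsigned arithmetic so it wraps like a SIMD lane.
inline int32_t dot_pair(uint32_t a, uint32_t b) {
    const uint32_t lo = static_cast<uint32_t>(widen_low16(a) * widen_low16(b));
    const uint32_t hi = static_cast<uint32_t>(widen_high16(a) * widen_high16(b));
    return static_cast<int32_t>(lo + hi);
}

// Small self-check: (3 * 5) + (-4 * 6) == -9.
inline void dot_pair_self_check() {
    assert(dot_pair(pack_pair(3, -4), pack_pair(5, 6)) == -9);
}
```

A vector version would apply the same shifts to all lanes at once. The extra shift instructions per operand are the overhead that a dedicated widening instruction would remove.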