
[GSoC] OpenCV.js: WASM SIMD optimization 2.0 #18068

Merged
alalek merged 11 commits into opencv:3.4 from lionkunonly:gsoc_2020_simd on Oct 18, 2020

@lionkunonly
Contributor

@lionkunonly lionkunonly commented Aug 11, 2020

Overview

This pull request makes the following changes:

  1. Update the supported Emscripten version to 1.39.16 and modify the JS files of the tests and perf tests accordingly. Add the required Emscripten version to the OpenCV.js build instructions.
  2. Implement the intrinsics for 64-bit types and add perf tests for some kernels.
  3. Refactor the perf tests: remove redundant code, move functions that play similar roles into perf_helpfunc.js, and restructure the existing perf tests.
  4. Add perf tests for more kernels. Supported perf tests: cvtColor, resize, threshold, Sobel, filter2D, Scharr, gaussianBlur, blur, medianBlur, erode, dilate, remap, warpAffine, warpPerspective, pyrDown.
  5. Implement a loader that detects the features supported by the browser and automatically loads the corresponding version of OpenCV.js. It relies on WebAssembly Feature Detection.

The Test

Test Environment:

OS: Ubuntu Linux 18.04.4
Emscripten: 1.39.16, LLVM upstream backend
Browser: Chrome, Version 85.0.4183.26 dev (64-bit)
Hardware: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz with 12 logical cores
  • OpenCV.js tests: all passed
  • Universal Intrinsics WASM backend test: all passed
  • All perf tests: all run successfully

Results

  • Perf test for 64-bit intrinsics
| Function | Before 64-bit implementation (ms) | After 64-bit implementation (ms) | Speedup |
| --- | --- | --- | --- |
| countNonZero() | 0.5386 | 0.5221 | 1.032x |
| Mat::dot() | 0.8079 | 0.7960 | 1.015x |
| split() | 2.1213 | 2.0483 | 1.036x |
| merge() | 2.2383 | 2.2264 | 1.005x |

The performance of these kernels after the 64-bit implementation is essentially the same as before (speedups between 1.005x and 1.036x).
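Each Speedup value in the table is the mean time before the change divided by the mean time after it, so values just above 1x indicate only a marginal gain. A short JavaScript check that recomputes the column from the table's own numbers (the `rows` array and the `speedup` helper are illustrative names, not part of the PR's perf tests):

```javascript
// Mean times in milliseconds, copied from the 64-bit intrinsics table.
const rows = [
  { name: "countNonZero()", before: 0.5386, after: 0.5221 },
  { name: "Mat::dot()", before: 0.8079, after: 0.796 },
  { name: "split()", before: 2.1213, after: 2.0483 },
  { name: "merge()", before: 2.2383, after: 2.2264 },
];

// Speedup = time before / time after; a value above 1 means the new code is faster.
function speedup(before, after) {
  return before / after;
}

for (const row of rows) {
  console.log(`${row.name}: ${speedup(row.before, row.after).toFixed(3)}x`);
}
```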

  • Performance of the resize kernel with widen instructions
| Parameters | Scalar mean time (ms) | SIMD with shift mean time (ms) | Speedup | SIMD with widen mean time (ms) | Speedup |
| --- | --- | --- | --- | --- | --- |
| (CV_8UC4, 1280x720, 640x480) | 5.113 | 2.888 | 1.77x | 4.031 | 1.27x |
| (CV_8UC1, 1280x720, 640x480) | 1.870 | 1.128 | 1.66x | 1.249 | 1.50x |

The implementation with widen instructions brings no improvement: it is faster than the scalar code but slower than the existing shift-based SIMD implementation. One reason is that the widen instructions have to be used together with wasm_v8x16_shuffle. In the end, the original implementation was kept.

  • Perf tests
| Supported kernels now (lines) | Supported kernels before (lines) |
| --- | --- |
| cvtColor (421) | cvtColor (572) |
| resize (165) | resize (262) |
| threshold (158) | threshold (217) |
| Sobel (170) | - |
| filter2D (127) | - |
| Scharr (156) | - |
| gaussianBlur (126) | - |
| blur (130) | - |
| medianBlur (118) | - |
| erode (117) | - |
| dilate (117) | - |
| remap (182) | - |
| warpAffine (130) | - |
| warpPerspective (143) | - |
| pyrDown (116) | - |

The collected performance data is stored in Google Drive: Performance data

Performance Analysis

Because more than ten kernels are tested, and some of them are tested with parameters of different data types and channel counts, the table of performance data is large. So I put the performance data in my personal Google Drive.

Performance data

Analysis
Based on the collected performance data, the SIMD optimization works as expected in most situations and achieves performance similar to the SSE2 optimization in native OpenCV. For example, for the blur kernel with parameters (1280x720, CV_8UC1, BORDER_REPLICATE) and ksize=3, the SIMD version has a 1.357x speedup and SSE2 has a 1.415x speedup. Sometimes the SIMD optimization is better: for the pyrDown kernel with parameters (1920x1080, CV_32FC4), the SIMD version has a 3.094x speedup while SSE2 has 1.83x. Data that achieves a similar or better speedup is tagged green.

However, there are still some bad cases. For example, for the blur kernel with (1280x720, CV_32FC1, BORDER_REPLICATE) and ksize=3, the SIMD version has only a 0.519x speedup, about 2x slower than the SSE optimization in native OpenCV. Such data is tagged yellow.

Data tagged red is abnormal. In my view, some of the yellow data is caused by the red abnormal data, for example the medianBlur kernel with parameters (1280x720, CV_16SC1, 5).

I hope the collected data helps people better understand the gap between the SIMD optimization in OpenCV.js and the optimization in native OpenCV.

force_builders_only=linux,docs,Custom
buildworker:Custom=linux-4
build_image:Docs=docs-js
build_image:Custom=javascript-simd:1.39.16

fix the trailing whitespace.
@lionkunonly
Contributor Author

Hi, I have implemented the OpenCV.js loader and modified js_setup.markdown to explain how to use it. But I am not sure my explanation is clear enough. Could you give some suggestions?

let threadsPath = "";
let mtSIMDPath = "";


Contributor

Please remove the redundant blank line (just need one).



this.judgeWASM = function() {
try{
Contributor

Please insert a space between try and {.

Contributor

actually, you can check WebAssembly by

return !(typeof WebAssembly === 'undefined')
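A minimal sketch of that suggestion, using the checkWasm name proposed later in this review. No try/catch is needed, because `typeof` does not throw for an undeclared global:

```javascript
// Returns true when the WebAssembly JavaScript API exists in this environment.
function checkWasm() {
  return typeof WebAssembly !== "undefined";
}

console.log(checkWasm() ? "WebAssembly is supported" : "WebAssembly is not supported");
```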

let mtSIMDPath = "";


this.judgeWASM = function() {
Contributor

judge does not sound like a good name. How about checkWasm?

}
}

this.judgeSIMD = function() {
Contributor

ditto.

}

this.judgeSIMD = function() {
return WebAssembly.validate(new Uint8Array([0,97,115,109,1,0,0,0,1,4,1,96,0,0,3,2,1,0,10,9,1,7,0,65,0,253,15,26,11]));
Contributor

Could you please document the content of the Uint8Array?

Contributor

and you need to call checkWasm first.
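For reference, the array is a tiny WebAssembly binary whose only function executes one SIMD instruction, so `WebAssembly.validate()` accepts it only in engines that support SIMD. The commented sketch below is our decoding of the same bytes against the WebAssembly binary format (in the finalized SIMD proposal, prefix 0xFD with opcode 15 is `i8x16.splat`); it also checks for the WebAssembly API first, as requested:

```javascript
// Returns true when the WebAssembly JavaScript API exists.
function checkWasm() {
  return typeof WebAssembly !== "undefined";
}

// Same bytes as in the PR, written in hex with comments.
const SIMD_TEST_MODULE = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d,             // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00,             // binary format version 1
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00, // type section: 1 type, func () -> ()
  0x03, 0x02, 0x01, 0x00,             // function section: 1 function of type 0
  0x0a, 0x09, 0x01,                   // code section (9 bytes): 1 function body
  0x07, 0x00,                         // body size 7, no locals
  0x41, 0x00,                         // i32.const 0
  0xfd, 0x0f,                         // i8x16.splat (SIMD prefix 0xFD, opcode 15)
  0x1a,                               // drop
  0x0b,                               // end
]);

function checkSimd() {
  // validate() needs the WebAssembly API, so check for it first.
  if (!checkWasm()) return false;
  return WebAssembly.validate(SIMD_TEST_MODULE);
}

console.log("SIMD supported:", checkSimd());
```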

Contributor Author

@lionkunonly lionkunonly Aug 27, 2020

After reading the docs of wasm-feature-detect, I think my implementation for detecting the threads and SIMD features works for now. However, browsers will keep updating, which may break my implementation. So I have decided to use wasm-feature-detect to solve this problem. Do you agree?

Contributor

It sounds good to me.
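A sketch of how the loader could consume the library, assuming its async `simd()` and `threads()` exports, each resolving to a boolean. `detectFeatures` is a hypothetical helper, and the detectors are passed in as arguments so the function can be tried without the library:

```javascript
// detectors: { simd, threads }, e.g. the functions exported by wasm-feature-detect.
// Each detector is assumed to return a Promise<boolean>.
async function detectFeatures(detectors) {
  const wasm = typeof WebAssembly !== "undefined";
  if (!wasm) {
    // Without WebAssembly, neither SIMD nor threads can be used.
    return { wasm: false, simd: false, threads: false };
  }
  const [simd, threads] = await Promise.all([detectors.simd(), detectors.threads()]);
  return { wasm, simd, threads };
}
```

In a page that loads the library as an ES module, the call would be `detectFeatures({ simd, threads })` after `import { simd, threads } from "wasm-feature-detect";`.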

try {
let test1 = (new MessageChannel).port1.postMessage(new SharedArrayBuffer(1));
let result = WebAssembly.validate(new Uint8Array([0,97,115,109,1,0,0,0,1,4,1,96,0,0,3,2,1,0,5,4,1,3,1,1,10,11,1,9,0,65,0,254,16,2,0,26,11]));
return result;
Contributor

return true?

let result = WebAssembly.validate(new Uint8Array([0,97,115,109,1,0,0,0,1,4,1,96,0,0,3,2,1,0,5,4,1,3,1,1,10,11,1,9,0,65,0,254,16,2,0,26,11]));
return result;
} catch(e) {
return !1
Contributor

return false?
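With both review comments applied (an explicit `false` in the error path instead of `!1`), the threads probe could look like the sketch below. It keeps the two checks from the PR: posting a SharedArrayBuffer over a MessageChannel, and validating a module that declares a shared memory and performs an atomic load. The byte comments are our decoding, and closing the port is our addition so that no channel is left open:

```javascript
// Same bytes as in the PR: a module with a shared memory and one atomic load.
const THREADS_TEST_MODULE = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm", version 1
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,             // type section: func () -> ()
  0x03, 0x02, 0x01, 0x00,                         // function section: 1 function of type 0
  0x05, 0x04, 0x01, 0x03, 0x01, 0x01,             // memory section: flags 0x03 (shared, has max), min 1, max 1
  0x0a, 0x0b, 0x01, 0x09, 0x00,                   // code section: 1 body of 9 bytes, no locals
  0x41, 0x00,                                     // i32.const 0
  0xfe, 0x10, 0x02, 0x00,                         // i32.atomic.load, align 2^2 = 4 bytes, offset 0
  0x1a, 0x0b,                                     // drop, end
]);

function checkThreads() {
  if (typeof WebAssembly === "undefined") return false;
  try {
    // Throws if MessageChannel or SharedArrayBuffer is missing, or if the
    // buffer cannot be posted (e.g. a DataCloneError in a non-isolated page).
    const channel = new MessageChannel();
    channel.port1.postMessage(new SharedArrayBuffer(1));
    channel.port1.close();
    return WebAssembly.validate(THREADS_TEST_MODULE);
  } catch (e) {
    return false;
  }
}

console.log("Threads supported:", checkThreads());
```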

}
}

this.setPaths = function(paths) {
Contributor

Please remove this method and merge its implementation into the constructor.

}
}

this.loadOpenCV = function (onloadCallback) {
Contributor

I think you can encapsulate all functions into this method. Do you really need a loader object?

Contributor

function loadOpenCV(paths, onloadCallback) {...}

It would have simpler usage.

Contributor Author

The loader object is not necessary. I think I can try to encapsulate all functions in loadOpenCV.

if (simdSupported && threadsSupported && OPENCV_URL == "" && self.mtSIMDPath != "") {
OPENCV_URL = self.mtSIMDPath;
} else if (simdSupported && threadsSupported && OPENCV_URL == "") {
throw new Error("The browser supports simd and threads, but the path of OpenCV.js with simd and threads optimization is empty");
Contributor

I think in this case, if the developer only provides the threadsPath or simdPath, it should also work.
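One way to get that behaviour is to rank the builds and return the first one that the browser supports and the developer actually provided. A sketch with hypothetical names (`chooseOpenCvUrl` and the path keys `wasm`, `simd`, `threads`, `threadsSimd`); returning an empty string when WebAssembly is missing is also our assumption:

```javascript
// features: { wasm, simd, threads } booleans from feature detection.
// paths: { wasm, simd, threads, threadsSimd } URLs; a missing or "" entry means "not provided".
// Returns the URL of the most optimized usable build, or "" if none fits.
function chooseOpenCvUrl(features, paths) {
  if (!features.wasm) return "";
  const candidates = [
    { ok: features.simd && features.threads, url: paths.threadsSimd },
    { ok: features.simd, url: paths.simd },
    { ok: features.threads, url: paths.threads },
    { ok: true, url: paths.wasm }, // plain WASM build as the last resort
  ];
  for (const { ok, url } of candidates) {
    if (ok && url) return url;
  }
  return "";
}
```

With this ordering, a browser that supports both SIMD and threads falls back to the SIMD build when no threads+SIMD path is given, and a threads-only browser gets the threads build.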

log.info("=====")
builder.build_opencvjs()

Contributor

Please remove trailing whitespaces

@lionkunonly
Contributor Author

@huningxin @terfendail Hi, I have modified the implementation based on your code review comments. The most important change is that the implementation now uses WebAssembly Feature Detection. I found that the original implementation may fail when browsers update in the future. With WebAssembly Feature Detection, detection should keep working after browser updates, because the library would be updated at that time as well.

Contributor

@huningxin huningxin left a comment

@lionkunonly , could you please share the tests and the results you did for your loader?

OPENCV_URL = simdPath;
console.log("The OpenCV.js with simd optimization is loaded now.");
} else if (threadsSupported && threadsPath != "") {
if (simdSupported && threadsSupported) {
Contributor

if (simdSupported && threadsSimdPath === "")?

console.log("The browser supports wasm, but the path of OpenCV.js for wasm is empty");
}

if(OPENCV_URL == "") {
Contributor

Put a space between if and (

throw new Error("The browser supports simd, but the path of OpenCV.js with simd optimization is empty");
OPENCV_URL = threadsPath;
console.log("The OpenCV.js with threads optimization is loaded now");
} else if (threadsPath) {
Contributor

else if (threadsSupported)?

Contributor Author

Sorry, it is my fault. This choice should be deleted.

loader = new Loader();

//Set paths configuration
pathsConfig = {
Contributor

let pathsConfig?

// Create an instance
loader = new Loader();

//Set paths configuration
Contributor

Add a space after //


//Load OpenCV.js and use main function as the param
loader.loadOpenCV(main);
//Load OpenCV.js and use the pathsConfiguration and main function as the params.
Contributor

ditto.

@lionkunonly
Contributor Author

@lionkunonly , could you please share the tests and the results you did for your loader?

The Scene1: SIMD and threads are enabled.

The path configuration of the example

Config1

The log for loading status and html elements

result1
result2

In this scene, the application loads opencv.js with SIMD and threads optimization successfully.

The Scene2: SIMD and threads are enabled, but the path to threads+simd version is invalid.

The path configuration in js file

Config2

Results

result3
result4

In this scene, the path to opencv.js with threads and SIMD optimization is empty, so the loader automatically loads opencv.js with SIMD optimization instead.

The Scene3: Threads are enabled, SIMD is disabled.

The feature configuration in browser

Config3

Results

result5
result6

In this scene, the loader automatically loads opencv.js with threads optimization, without any failure log like the one in scene 2.

Finally, I am designing a demo to show the effect of the loader and will deploy it on github.io.

@lionkunonly
Contributor Author

lionkunonly commented Aug 29, 2020

@lionkunonly , could you please share the tests and the results you did for your loader?

You can try the example with the Demo -- Perf tests and loader demo

@huningxin @terfendail I think I have completed all the tasks in GSoC 2020 now. Should I remove the [WIP] from this PR title?

@lionkunonly lionkunonly changed the title from "WIP: [GSoC] OpenCV.js: WASM SIMD optimization 2.0." to "[GSoC] OpenCV.js: WASM SIMD optimization 2.0" on Aug 29, 2020
@lionkunonly
Contributor Author

@terfendail The final report is here. You can leave your comments in the Google Drive version.

@huningxin
Contributor

You can try the example with the Demo -- Perf tests and loader demo

The loader demo works great according to my tests, even when WASM is disabled in Chrome. Thanks for that!

Another suggestion: use the gaussianBlur kernel, which has a good threads speedup (4.98x) and SIMD speedup (3.36x), so users can see the difference easily.

The final report is Here. You can leave your comment in the Google drive version .

Please add the summary and analysis of your performance data into the final report. Both of them are valuable. Thanks!

@huningxin
Contributor

Should I remove the [WIP] in this PR title?

It seems all tasks are done. Please remove that so the OpenCV maintainers can review.

@lionkunonly
Contributor Author

lionkunonly commented Sep 4, 2020

@terfendail @huningxin Hi, Vitaly and Ningxin. I have updated the video report; please check it here.

@alalek alalek merged commit c824176 into opencv:3.4 Oct 18, 2020
@alalek alalek mentioned this pull request Oct 21, 2020