fix race condition in oclCvtColorBGR2HSV (color_hsv.dispatch.cpp) by using a mutex by kallaballa · Pull Request #24444 · opencv/opencv

kallaballa · 2023-10-24T11:52:00Z

Pertaining issue: #24443

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

opencv-alalek · 2023-10-24T12:17:05Z

modules/imgproc/src/color_hsv.dispatch.cpp

+        static thread_local UMat sdiv_data;
+        static thread_local UMat hdiv_data180;
+        static thread_local UMat hdiv_data256;
+        static thread_local int sdiv_table[256];


Lets try to apply thread_local for UMat variables only. No need to touch other tables (there is no problem with them).

Use dedicated initializedUMat for that.

initializedUMat

I am not sure what you mean by the last part

Split initialization on 2 parts:

regular CPU buffers (like int sdiv_table[256];) - they don't need thread-local at all.

OpenCL related stuff which causes problems and needs thread-local variables (also put it under condition that this data is really needed - CPU-only processing doesn't need that at all).

probably, even better solution would be to merge all 3 tables together (of course, 3 pointers can be initialized to point to different parts of the joint table) and then we can convert it to UMat on-fly:

int rgb2hsv_tab[256*3]; volatile bool rgb2hsv_initialized = false; if (!rgb2hsv_initialized) { ... } if (use_opencl) { UMat rgb2hsv_utab = Mat(1, 256*3, CV_32S, rgb2hsv_tab).getUMat(); ... // run opencl kernel with rgb2hsv_utab as one of the parameters }

yes, it means extra copy, but compared to the processed data it should be a tiny overhead, and much easier to handle than static thread-local UMat's.

I'm not experienced enough to say the roundtrip-lateny will be more than we'd like, but it think so. But I like the safety and simplicity of your approach, so I guess I'll profile it.

I neither like unecessary round-trips nor messing with the memory model, so i came up with this:

if(_src.depth() == CV_8U) { static std::mutex mtx; { std::unique_lock<std::mutex> lock(mtx); static UMat sdiv_data; static UMat hdiv_data180; static UMat hdiv_data256; static int combined_table[256]; static volatile bool initialized180 = false, initialized256 = false; static volatile bool initialized = hrange == 180 ? initialized180 : initialized256; if (!initialized) { int * const hdiv_table = hrange == 180 ? combined_table : &combined_table[128], hsv_shift = 12; UMat & hdiv_data = hrange == 180 ? hdiv_data180 : hdiv_data256; combined_table[0] = 0; int v = 255 << hsv_shift; if (!initialized180 && !initialized256) { for(int i = 1; i < 256; i++ ) combined_table[i] = saturate_cast<int>(v/(1.*i)); Mat(1, 256, CV_32SC1, combined_table).copyTo(sdiv_data); } v = hrange << hsv_shift; for (int i = 1; i < 256; i++ ) hdiv_table[i] = saturate_cast<int>(v/(6.*i)); Mat(1, 256, CV_32SC1, hdiv_table).copyTo(hdiv_data); initialized = true; } h.setArg(ocl::KernelArg::PtrReadOnly(sdiv_data)); h.setArg(hrange == 256 ? ocl::KernelArg::PtrReadOnly(hdiv_data256) : ocl::KernelArg::PtrReadOnly(hdiv_data180)); } }

Proposed code locks mutex on each call. That should be properly avoided.

@kallaballa, let's try the proposed by me approach without use of static UMat's at all.

@opencv-alalek double checked pattern and lock_guard 0f290bc
@vpisarev I'll give it a try and provide numbers

Test failed e.g. on MacOS: OCL_TEST_P(CvtColor8u32f, RGB2HSV_FULL) { performTest(3, 3, CVTCODE(RGB2HSV_FULL), IPP_EPS); }

I don't have a MacOS setup, but it stands to reason that it is a concurrency issue. So i checked on the memory model guarantees of volatile and there are none. So I guessed that is the problem. Let's see the test.

@vpisarev implementation without static still pending. want both version working.

asmorkalov · 2023-11-02T07:33:20Z

@kallaballa Friendly reminder.

kallaballa · 2023-11-08T05:49:32Z

Was locked out of my account. I am back on it :)

…ovide guarantess about atomicity or ordering of operations.

kallaballa · 2023-11-20T17:21:41Z

this PR depends on #24326 to work correctly in mutli-threaded environments.

I implemented a compromise in matters of static.

kallaballa · 2023-11-20T17:24:16Z

Waiting for the tests. Apart from that i think the PR is ready.

opencv-alalek · 2023-11-21T15:55:50Z

modules/imgproc/src/color_hsv.dispatch.cpp

-        static int hdiv_table256[256];
-        static volatile bool initialized180 = false, initialized256 = false;
-        volatile bool & initialized = hrange == 180 ? initialized180 : initialized256;
+        static std::mutex mtx;


This breaks bare metal build configurations without any threading support (see OPENCV_DISABLE_THREAD_SUPPORT)

Use OpenCV wrappers instead.

Reuse cv::getInitializationMutex() instead of creation of new one.

Oh i understand. thx for the info. is there a list of features i should usually test for? i started to test for WITH_OPENCL=OFF and i will keep in mind that there are targets without thread support... are there more configurations i should keep in mind?

Done 031abea. Tested locally by running opencv_test_improc with following build configuration. Btw. i found cv::utils::lock_guard but it doesn't seem necessary since there is C++11 support.

cmake -DINSTALL_BIN_EXAMPLES=OFF -DBUILD_SHARED_LIBS=ON -DWITH_OPENCL=ON -DBUILD_opencv_java=OFF -DBUILD_opencv_js=OFF -DBUILD_opencv_python2=OFF -DBUILD_opencv_python3=OFF -DBUILD_EXAMPLES=OFF -DBUILD_PACKAGE=OFF -DBUILD_TESTS=ON -DBUILD_PERF_TESTS=ON -DBUILD_DOCS=OFF -DBUILD_opencv_videoio=ON -DBUILD_opencv_highgui=ON -DBUILD_opencv_ts=ON -DBUILD_opencv_imgcodecs=ON -DBUILD_opencv_plot=OFF -DBUILD_opencv_tracking=OFF -DBUILD_opencv_video=OFF -DBUILD_opencv_world=ON -DCMAKE_C_FLAGS=-DOPENCV_DISABLE_THREAD_SUPPORT -DCMAKE_CXX_FLAGS=-DOPENCV_DISABLE_THREAD_SUPPORT ../opencv

std::lock_guard => cv::AutoLock

Did you measure performance changes due to added of .copyTo() calls?

done: 8cbf9c7

std::lock_guard => cv::AutoLock

Did you measure performance changes due to added of .copyTo() calls?

I compared the current HEAD of 4.x with the branch of the PR (which i updated to the least 4.x HEAD) .
I used follwing build configuration for both branches:

cmake -DINSTALL_BIN_EXAMPLES=OFF -DBUILD_SHARED_LIBS=ON -DWITH_OPENCL=ON -DBUILD_opencv_java=OFF -DBUILD_opencv_js=OFF -DBUILD_opencv_python2=OFF -DBUILD_opencv_python3=OFF -DBUILD_EXAMPLES=OFF -DBUILD_PACKAGE=OFF -DBUILD_TESTS=ON -DBUILD_PERF_TESTS=ON -DBUILD_DOCS=OFF -DBUILD_opencv_videoio=ON -DBUILD_opencv_highgui=ON -DBUILD_opencv_ts=ON -DBUILD_opencv_imgcodecs=ON -DBUILD_opencv_plot=OFF -DBUILD_opencv_tracking=OFF -DBUILD_opencv_video=OFF -DBUILD_opencv_world=ON -DENABLE_PROFILING=ON ../opencv

And the following test program: https://github.com/kallaballa/OpenCV-Issues/blob/main/src/oclCvtColorBGR2HSV-race/perf-test.cpp

Without patch it takes: ~14.8s
With patch it takes: ~15.2s

So there is a slow-down. I'll optimize a bit and measure again.

When profiling i had suspicious results and so i increased the iterations by 10x.
This gave me

154.483s for patched without copy.

156.108s for patch with copy.

153.803s without patch.

I think the differences are still quite influenced by variance. anyway it is not much. should i continue?

I meassured also the variant no-static/no-thread_local/no-locking: 155.512s

Please always reuse OpenCV performance tests or add them if something is missing.

Avoid using of hand-written tests cases

suspicious results

Provided measurement code has bug due to async execution of cvtColor(UMat) (code should schedule OpenCL kernels, but do not wait the final result).

P.S. Mat to UMat coping is synchronous call in current API - forces wait of OpenCL execution queue.

…e \-DOPENCV_DISABLE_THREAD_SUPPORT builds fail.

asmorkalov · 2023-12-13T09:47:49Z

@opencv-alalek @vpisarev please review.

kallaballa · 2023-12-14T06:17:45Z

@opencv-alalek @vpisarev please review.

I would prefer the always-copy variant till diverging cl_context issues have been resolved. because the other variants need special handling in multi-context/multi-thread situations. and the performance difference is neglect-able.

Which one should i commit before you review?

kallaballa · 2023-12-14T06:55:12Z

@opencv-alalek @vpisarev please review.

I would prefer the always-copy variant till diverging cl_context issues have been resolved. because the other variants need special handling in multi-context/multi-thread situations. and the performance difference is neglect-able.

Which one should i commit before you review?

Also i have changes to the current state of the PR. should i add them?

asmorkalov · 2023-12-20T11:59:40Z

@opencv-alalek @kallaballa What is the PR status? Is it ready for merge?

kallaballa · 2025-01-03T20:36:13Z

I just checked. It isn't ready. Picking it back up.

asmorkalov · 2025-04-24T13:16:59Z

@kallaballa What is the PR status? Is it ready for review?

opencv#24443 fix race condition by using thread_local

51a5c98

opencv-alalek added bug category: imgproc category: ocl labels Oct 24, 2023

opencv-alalek added this to the 4.9.0 milestone Oct 24, 2023

opencv-alalek reviewed Oct 24, 2023

View reviewed changes

kallaballa changed the title ~~fix race condition in oclCvtColorBGR2HSV (color_hsv.dispatch.cpp) by using thread_local~~ fix race condition in oclCvtColorBGR2HSV (color_hsv.dispatch.cpp) by using <stike>thread_local</strike> a mutex Nov 10, 2023

kallaballa changed the title ~~fix race condition in oclCvtColorBGR2HSV (color_hsv.dispatch.cpp) by using <stike>thread_local</strike> a mutex~~ fix race condition in oclCvtColorBGR2HSV (color_hsv.dispatch.cpp) by using a mutex Nov 10, 2023

kallaballa added 3 commits November 10, 2023 12:54

fix the data race with a mutex

d1a9341

use the double checked pattern and lock_guard

0f290bc

replace the volatile bool with std::atomic<bool>. volatile doesn't pr…

a292965

…ovide guarantess about atomicity or ordering of operations.

asmorkalov requested review from opencv-alalek and vpisarev November 13, 2023 06:38

don't use static UMats, they cause problems with diverging contexts

8c2e869

opencv-alalek reviewed Nov 21, 2023

View reviewed changes

kallaballa and others added 3 commits November 25, 2023 04:18

use cv::Mutex abd cv::initializedMutex() instead of std::mutex becaus…

031abea

…e \-DOPENCV_DISABLE_THREAD_SUPPORT builds fail.

use AutoLock instead if std::lock_guard

8cbf9c7

Merge branch 'opencv:4.x' into color_hsv_dispatch_thread_local

5f739c9

opencv-alalek modified the milestones: 4.9.0, 4.10.0 Dec 21, 2023

asmorkalov modified the milestones: 4.10.0, 4.11.0 May 16, 2024

asmorkalov modified the milestones: 4.11.0, 4.12.0 Dec 20, 2024

Merge branch 'opencv:4.x' into color_hsv_dispatch_thread_local

6d335fd

asmorkalov mentioned this pull request May 5, 2025

Reworked HSV color conversion tables initialization for OpenCL branch #27287

Merged

6 tasks

asmorkalov closed this May 5, 2025

Uh oh!

Conversation

kallaballa commented Oct 24, 2023

Pull Request Readiness Checklist

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vpisarev Nov 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

asmorkalov commented Nov 2, 2023

Uh oh!

kallaballa commented Nov 8, 2023

Uh oh!

kallaballa commented Nov 20, 2023

Uh oh!

kallaballa commented Nov 20, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kallaballa Nov 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kallaballa Dec 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

asmorkalov commented Dec 13, 2023

Uh oh!

kallaballa commented Dec 14, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kallaballa commented Dec 14, 2023

Uh oh!

asmorkalov commented Dec 20, 2023

Uh oh!

kallaballa commented Jan 3, 2025

Uh oh!

asmorkalov commented Apr 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

vpisarev Nov 8, 2023 •

edited

Loading

kallaballa Nov 22, 2023 •

edited

Loading

kallaballa Dec 3, 2023 •

edited

Loading

kallaballa commented Dec 14, 2023 •

edited

Loading