Parallelize implementation of HDR MergeMertens. #22959
Conversation
Force-pushed from bccb598 to 026aca5.
Please don't do that for already existing files. This clang-format is a recommendation for new files only. BTW, you may format added/modified lines.
Thanks @alalek, I removed the clang-format changes for
modules/photo/perf/perf_hdr.cpp
Outdated
static vector<float> DEFAULT_VECTOR;
void loadExposureSeq(const std::string& list_filename, vector<Mat>& images,
                     vector<float>& times = DEFAULT_VECTOR)
Looks like tests affect each other through the global variable.
It is better to drop the times parameter if it is not used.
I removed the static vector and return a helper struct now.
modules/photo/src/merge.cpp
Outdated
Laplacian(gray, contrast, CV_32F);
contrast = abs(contrast);
images[i].convertTo(images[i], CV_32F, 1.0f/255.0f);
Could we keep using img instead of multiple images[i]?
By declaring it this way:
Mat& img = images[i];
modules/photo/src/merge.cpp
Outdated
}
int maxlevel = static_cast<int>(logf(static_cast<float>(min(size.width, size.height))) / logf(2.0f));
std::vector<Mat> res_pyr(maxlevel + 1);
std::vector<std::vector<Mat>> img_pyrs(images.size(), std::vector<Mat>(maxlevel + 1));
What is the memory consumption "x-factor"?
The memory consumption increases by the number of images times the pyramid size. That should be fine for most use cases, where only a couple of images are used to create an HDR image, but in the worst case it can of course explode. I rewrote the code to use a mutex/condition_variable instead, which showed only a very small overhead (~3ms slower) in my local tests. This way we are back to the original memory consumption.
feuerste
left a comment
Thanks a lot for your review! I addressed all your comments.
@alalek I think I addressed all your comments. Can you please take another look?
modules/photo/src/merge.cpp
Outdated
std::unique_lock<std::mutex> lock(weight_sync.mutex);
weight_sync.cond.wait(lock, [&]{ return weight_sync.index == i; });
weight_sync.sum += weights[i];
weight_sync.index++;
weight_sync.cond.notify_all();
weight_sync.index == i
Do we really need to preserve summation operations order?
If no, then please use cv::AutoLock and cv::Mutex instead (to keep the OPENCV_DISABLE_THREAD_SUPPORT mode working).
Also, manual usage of condition variables is not exception-safe.
Do we really need to preserve summation operations order?
I wasn't sure what the policy for OpenCV is, so my first attempt was to preserve deterministic results.
If no, then please use cv::AutoLock and cv::Mutex instead (to keep the OPENCV_DISABLE_THREAD_SUPPORT mode working)
If that is fine with the policy, then we can of course remove the condition variable and use OpenCV's Mutex/AutoLock. I changed it accordingly, PTAL!
I adjusted the Python test_umat_merge_mertens to use only one thread for testing, though, as we want the summation to be deterministic there.
@alalek Is there anything else I can do to support your review? Thanks a lot!
alalek
left a comment
LGTM 👍 Thank you for the contribution!
Thanks a lot @alalek! Unfortunately I'm not permitted to merge; only authorized users are. Whom can I contact for the merge?
Parallelize implementation of HDR MergeMertens.

* Parallelize MergeMertens.
* Added performance tests for HDR.
* Ran clang-format.
* Optimizations.
* Fix data path for Windows.
* Remove compilation warning on Windows.
* Remove clang-format for existing file.
* Addressing reviewer comments.
* Ensure correct summation order.
* Add test for determinism.
* Move result pyramid into sync struct.
* Reuse sync for first loop as well.
* Use OpenCV's threading primitives.
* Remove cout.
With this PR, the process function for MergeMertens is parallelized. Performance tests for all HDR methods are added, as well as a determinism test for the parallel version of MergeMertens.
Running the photo performance tests locally on my workstation before this PR:
Running with the parallelized code of this PR:
Also all new files have been clang-formatted with https://github.com/opencv/opencv/wiki/.clang_format and the fix proposed in #22960.
Best to be reviewed with Hide whitespace because of the new indentations caused by the surrounding parallel_for_ loops.

Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.