Skip to content

Parallelize implementation of HDR MergeMertens.#22959

Merged
alalek merged 14 commits intoopencv:4.xfrom
feuerste:parallel_mertens
Dec 21, 2022
Merged

Parallelize implementation of HDR MergeMertens.#22959
alalek merged 14 commits intoopencv:4.xfrom
feuerste:parallel_mertens

Conversation

@feuerste
Copy link
Copy Markdown
Contributor

@feuerste feuerste commented Dec 14, 2022

With this PR the process function for MergeMertens is parallelized. Also performance tests for all HDR methods are added as well as a determinism test for the parallel version of MergeMertens.

Running the photo performance tests locally on my workstation before this PR:

[----------] 3 tests from HDR
[ RUN      ] HDR.Mertens
[ PERFSTAT ]    (samples=13   mean=216.34   median=216.96   min=204.84   stddev=6.12 (2.8%))
[       OK ] HDR.Mertens (2949 ms)
[ RUN      ] HDR.Debevec
[ PERFSTAT ]    (samples=13   mean=69.54   median=69.15   min=67.06   stddev=1.77 (2.5%))
[       OK ] HDR.Debevec (1016 ms)
[ RUN      ] HDR.Robertson
[ PERFSTAT ]    (samples=13   mean=50.11   median=50.38   min=47.92   stddev=1.28 (2.6%))
[       OK ] HDR.Robertson (763 ms)
[----------] 3 tests from HDR (4728 ms total)

Running with the parallized code of this PR:

[ RUN      ] HDR.Mertens
[ PERFSTAT ]    (samples=100   mean=48.72   median=47.81   min=40.61   stddev=3.24 (6.6%))
[       OK ] HDR.Mertens (5144 ms)

Also all new files have been clang-formatted with https://github.com/opencv/opencv/wiki/.clang_format and the fix proposed in #22960.

Best to be reviewed with Hide whitespace because of the new indentations caused by the surrounding parallel_for_ loops.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@alalek
Copy link
Copy Markdown
Member

alalek commented Dec 14, 2022

Also all edited files have been clang-formatted

Please don't do that for already existed files. This clang-format is recommendation for new files only.
It complicates review due to massive bunch of unnecessary changes (and hiding whitespace changes doesn't really help completely)
It breaks merges between branches.
Some discussions were in this thread: #12588

BTW, you may format added/modified lines.

@feuerste
Copy link
Copy Markdown
Contributor Author

Also all edited files have been clang-formatted

Please don't do that for already existed files. This clang-format is recommendation for new files only. It complicates review due to massive bunch of unnecessary changes (and hiding whitespace changes doesn't really help completely) It breaks merges between branches. Some discussions were in this thread: #12588

Thanks @alalek, I removed the clang formats for merge.cpp.

Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update!

Comment on lines +7 to +9
static vector<float> DEFAULT_VECTOR;
void loadExposureSeq(const std::string& list_filename, vector<Mat>& images,
vector<float>& times = DEFAULT_VECTOR)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like tests affects each other through the global variable.

It is better to drop times parameter if it is not used.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the static vector and return a helper struct now.


Laplacian(gray, contrast, CV_32F);
contrast = abs(contrast);
images[i].convertTo(images[i], CV_32F, 1.0f/255.0f);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we keep using img instead of multiple images[i]?

With declaring it in this way:

Mat& img = images[i];

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done.

}
int maxlevel = static_cast<int>(logf(static_cast<float>(min(size.width, size.height))) / logf(2.0f));
std::vector<Mat> res_pyr(maxlevel + 1);
std::vector<std::vector<Mat>> img_pyrs(images.size(), std::vector<Mat>(maxlevel + 1));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the memory consumption "x-factor"?

Copy link
Copy Markdown
Contributor Author

@feuerste feuerste Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The memory consumption increases by the number of images * pyramid size, which should be ok for most use cases, where only a couple of images are used to create an HDR image, but in the worst case of course can explode. I rewrote the code to use a mutex/condition_variable instead, which showed only a very small overhead (~3ms slower) in my local tests. This way we are back to the original memory consumption.

Copy link
Copy Markdown
Contributor Author

@feuerste feuerste left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your review! I addressed all your comments.

Comment on lines +7 to +9
static vector<float> DEFAULT_VECTOR;
void loadExposureSeq(const std::string& list_filename, vector<Mat>& images,
vector<float>& times = DEFAULT_VECTOR)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the static vector and return a helper struct now.


Laplacian(gray, contrast, CV_32F);
contrast = abs(contrast);
images[i].convertTo(images[i], CV_32F, 1.0f/255.0f);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done.

}
int maxlevel = static_cast<int>(logf(static_cast<float>(min(size.width, size.height))) / logf(2.0f));
std::vector<Mat> res_pyr(maxlevel + 1);
std::vector<std::vector<Mat>> img_pyrs(images.size(), std::vector<Mat>(maxlevel + 1));
Copy link
Copy Markdown
Contributor Author

@feuerste feuerste Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The memory consumption increases by the number of images * pyramid size, which should be ok for most use cases, where only a couple of images are used to create an HDR image, but in the worst case of course can explode. I rewrote the code to use a mutex/condition_variable instead, which showed only a very small overhead (~3ms slower) in my local tests. This way we are back to the original memory consumption.

@feuerste
Copy link
Copy Markdown
Contributor Author

@alalek I think I addressed all your comments. Can you please take another look?

Comment on lines +236 to +240
std::unique_lock<std::mutex> lock(weight_sync.mutex);
weight_sync.cond.wait(lock, [&]{ return weight_sync.index == i; });
weight_sync.sum += weights[i];
weight_sync.index++;
weight_sync.cond.notify_all();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weight_sync.index == i

Do we really need to preserve summation operations order?

If no, then please use cv::AutoLock and cv::Mutex instead (to keep working OPENCV_DISABLE_THREAD_SUPPORT mode)

Also manual usage of cond variables is not exception-safe.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to preserve summation operations order?

I wasn't sure what the policy for OpenCV is, so my first attempt was to preserve deterministic results.

If no, then please use cv::AutoLock and cv::Mutex instead (to keep working OPENCV_DISABLE_THREAD_SUPPORT mode)

If fine with the policy, then we can of course also remove the condition variable and use OpenCV's Mutex/AutoLock. I changed it accordingly, PTAL!
I adjusted the python test_umat_merge_mertens though to only use one thread for testing, as we want to have determinism for the summation here.

@feuerste
Copy link
Copy Markdown
Contributor Author

@alalek Is there anything else I can do to support your review? Thanks a lot for your support!

Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 Thank you for the contribution!

@feuerste
Copy link
Copy Markdown
Contributor Author

LGTM +1 Thank you for the contribution!

Thanks a lot @alalek ! Unfortunately I'm not permitted to merge, only authorized users are. Whom can I contact for the merge?

@alalek alalek merged commit bc8d494 into opencv:4.x Dec 21, 2022
@alalek alalek mentioned this pull request Jan 8, 2023
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
Parallelize implementation of HDR MergeMertens.

* Parallelize MergeMertens.

* Added performance tests for HDR.

* Ran clang-format.

* Optimizations.

* Fix data path for Windows.

* Remove compiiation warning on Windows.

* Remove clang-format for existing file.

* Addressing reviewer comments.

* Ensure correct summation order.

* Add test for determinism.

* Move result pyramid into sync struct.

* Reuse sync for first loop as well.

* Use OpenCV's threading primitives.

* Remove cout.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants