Mcc: add perf tests, improve performance #3699
modules/mcc/src/utils.hpp
const int num_elements = (int)src.total()*channel;
const double *psrc = (double*)src.data;
double *pdst = (double*)dst.data;
const int batch = 128;
This "batch" optimization improves performance on Windows.
What are typical values of num_elements? We can make batch depend on the number of threads:
const int batch = num_elements / max(1, getNumThreads());
or
const int batch = num_elements / (getNumThreads() > 1 ? getNumThreads() * 4 : 1);
Instead of 4 you may choose another constant to get batch=128 in your configuration.
With your second sample I got the same performance (47 ms) using a constant of 1024:
const int batch = std::max(1, getNumThreads() > 1 ? num_elements / (1024*getNumThreads()) : num_elements);
// if getNumThreads() == 1 -> batch = num_elements
Your first sample causes a performance regression (from 47 ms to 57 ms):
const int batch = num_elements / max(1, getNumThreads());
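To compare the batch formulas from this thread side by side, here is a minimal standalone sketch. It does not call OpenCV: numThreads is a plain parameter standing in for getNumThreads(), and the function names are hypothetical, introduced only for illustration.

```cpp
#include <algorithm>

// numThreads stands in for getNumThreads(); it is passed explicitly
// so the sketch compiles without OpenCV. All names here are hypothetical.

// First suggestion: one chunk per thread.
constexpr int batchPerThread(int numElements, int numThreads)
{
    return numElements / std::max(1, numThreads);
}

// Second suggestion: `factor` chunks per thread when more than one thread
// is available (factor = 4 in the review comment).
constexpr int batchWithFactor(int numElements, int numThreads, int factor)
{
    return numElements / (numThreads > 1 ? numThreads * factor : 1);
}

// Variant from the reply: same idea with factor = 1024, clamped to at least 1
// so integer division on small inputs never yields a batch of 0.
constexpr int batchClamped(int numElements, int numThreads, int factor)
{
    return std::max(1, numThreads > 1 ? numElements / (factor * numThreads)
                                      : numElements);
}
```

For an assumed input of 1,000,000 elements and 32 threads, these give 31250, 7812 (factor 4), and 30 (factor 1024) respectively; with a single thread all three return the full element count. Note that the unclamped second formula can return 0 when num_elements is smaller than the divisor, which the std::max(1, ...) in the reply guards against.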
I would suggest using a fixed batch of 128, but your second sample would also work.
Here, batch is the minimum number of consecutive array elements that a thread processes at one time.
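As an illustration of that definition, the sketch below splits an index range into runs of at most batch consecutive elements. It is not the code from this PR: splitIntoBatches and processBatch are hypothetical helpers, the doubling operation is a placeholder, and the ranges are only computed here, whereas in the real code each run would presumably be handed to a worker via parallel_for_.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper: splits [0, numElements) into half-open ranges
// [begin, end) of at most `batch` consecutive indices each.
std::vector<std::pair<int, int>> splitIntoBatches(int numElements, int batch)
{
    std::vector<std::pair<int, int>> ranges;
    batch = std::max(1, batch); // guard against a zero or negative batch
    for (int begin = 0; begin < numElements; begin += batch)
        ranges.emplace_back(begin, std::min(begin + batch, numElements));
    return ranges;
}

// Applies a placeholder element-wise operation (doubling) to one batch,
// writing directly into dst so no intermediate copy is made.
void processBatch(const double* src, double* dst, std::pair<int, int> range)
{
    for (int i = range.first; i < range.second; ++i)
        dst[i] = src[i] * 2.0;
}
```

With 300 elements and a batch of 128, this yields the ranges [0, 128), [128, 256) and [256, 300), so only the last run is shorter than the batch.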
b77f40d to 8ca90eb
8ca90eb to 5b829da
Added perf tests to mcc module.
Also these optimizations have been added:
- parallel_for_ to performThreshold()
- toL/fromL and added dst to avoid copying data
- parallel_for_ to elementWise() (the "batch" optimization improves performance of the Windows version; Linux is unchanged).
Configuration:
Ryzen 5950X, 2x16 GB 3000 MHz DDR4
OS: Windows 10, Ubuntu 20.04.5 LTS
Performance results in milliseconds:
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.