Fix slower CV_PAUSE on SkyLake and above.#22966
Conversation
|
This is fixing opencv#22852
b36fdf3 to
b7b08fa
Compare
| # define CV_PAUSE(v) do { for (int __delay = (v); __delay > 0; --__delay) { _mm_pause(); } } while (0) | ||
| // 5 * v is meants for backward compatibility: with pre-Skylake CPUs, _mm_pause took 4 or 5 cycles. | ||
| // With post-Skylake CPUs, _mm_pause takes 140 cycles. | ||
| # define CV_PAUSE(v) do { const uint64_t __delay = 5 * v; uint64_t __init = __rdtsc(); do { _mm_pause(); } while ((__rdtsc() - __init) < __delay); } while (0) |
There was a problem hiding this comment.
RDTSC latency is about 25-35 cycles.
@vrabaud Alternative is ignoring of v parameter and run _mm_pause() once. If it works for you, then we could do that.
BTW, TBB code doesn't suggest anything new (still uses euristic constants): https://github.com/search?q=repo%3Aoneapi-src%2FoneTBB%20machine_pause&type=code
There was a problem hiding this comment.
We could indeed use _mm_pause once, that would save some cycles (I thought RDTSC was 18 cycles) BUT that would not be compatible with older CPUs where _mm_pause is only 4/5 cycles.
Do we have a way to check for SkyLake and above CPUs ? I'll gladly make a patch for that.
There was a problem hiding this comment.
Skylake is here for 7+ years: https://en.wikipedia.org/wiki/Skylake_(microarchitecture)
There is OPENCV_THREAD_POOL_ACTIVE_WAIT_PAUSE_LIMIT runtime parameter which could control total number of pause operations for fine tuning.
There was a problem hiding this comment.
Indeed, so what do you suggest: that I make a new pull request that only does one _mm_pause where defined ?
And for older CPUs, OPENCV_THREAD_POOL_ACTIVE_WAIT_PAUSE_LIMIT should be used ? I am totally fine with that.
This is fixing #22852
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request