higher CV_PAUSE cost on skylake #22852
Description
System Information
OpenCV version: 4.6.0
Operating System / Platform: Custom Linux
Compiler & compiler version: Custom clang
Detailed description
On Intel architectures, CV_PAUSE is implemented with _mm_pause:
opencv/modules/core/src/parallel_impl.cpp
Line 47 in 6ca205a
# define CV_PAUSE(v) do { for (int __delay = (v); __delay > 0; --__delay) { _mm_pause(); } } while (0)
But it is called with the same loop count regardless of the architecture:
opencv/modules/core/src/parallel_impl.cpp
Line 393 in 6ca205a
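CV_PAUSE(16);
According to Intel, the latency of the PAUSE instruction behind _mm_pause went from about 10 cycles on pre-Skylake microarchitectures such as Haswell to as many as 140 cycles on Skylake. With the same loop count, each CV_PAUSE(16) therefore grows from roughly 160 cycles to roughly 2,240 cycles, and the thread pool consumes more CPU on Skylake.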
This is documented (as well as a workaround) here: https://www.intel.com/content/www/us/en/developer/articles/technical/a-common-construct-to-avoid-the-contention-of-threads-architecture-agnostic-spin-wait-loops.html
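One way to make the spin independent of the per-instruction cost is to bound it by elapsed timestamp-counter ticks rather than by a fixed number of pause instructions. The following is a minimal sketch of that idea, not OpenCV code and not necessarily the exact construct from the linked article: `spin_wait_ticks` and its `budget` parameter are hypothetical names, and the non-x86 fallback (which treats the budget as nanoseconds) is an assumption added only to keep the sketch portable.

```cpp
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#  ifdef _MSC_VER
#    include <intrin.h>      // __rdtsc, _mm_pause on MSVC
#  else
#    include <x86intrin.h>   // __rdtsc, _mm_pause on GCC/Clang
#  endif
#  define SKETCH_HAS_TSC 1
#else
#  define SKETCH_HAS_TSC 0
#endif

// Hypothetical helper (not an OpenCV API): spin until roughly `budget`
// ticks have elapsed, so the wait does not depend on how long a single
// pause instruction takes on a given microarchitecture.
// Returns the number of loop iterations executed (always at least 1).
inline unsigned spin_wait_ticks(std::uint64_t budget)
{
    unsigned iterations = 0;
#if SKETCH_HAS_TSC
    const std::uint64_t start = __rdtsc();
    do {
        _mm_pause();          // hint to the CPU that this is a spin-wait loop
        ++iterations;
    } while (__rdtsc() - start < budget);
#else
    // Assumed fallback for non-x86 targets: interpret budget as nanoseconds.
    const auto start = std::chrono::steady_clock::now();
    const auto limit = std::chrono::nanoseconds(static_cast<long long>(budget));
    do {
        ++iterations;
    } while (std::chrono::steady_clock::now() - start < limit);
#endif
    return iterations;
}
```

A caller could then replace a fixed CV_PAUSE(16) with something like spin_wait_ticks(2000). The value 2000 is purely illustrative and would need tuning against real profiles on both architectures.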
Steps to reproduce
Profile any multi-threaded OpenCV code on Haswell and then on Skylake, and compare the CPU time spent in the thread pool.
Issue submission checklist
- I report the issue, it's not a question
- I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found any solution
- I updated to the latest OpenCV version and the issue is still there
- There is reproducer code and related data files (videos, images, onnx, etc)