add thread-safe startup of fastMalloc and fastFree#19029
add thread-safe startup of fastMalloc and fastFree#19029alalek merged 3 commits intoopencv:masterfrom diablodale:fix19004-memthreadstart
Conversation
|
I ran 100,000 times the test app in the related issue #15691 |
alalek
left a comment
There was a problem hiding this comment.
There was no crash with the code of this PR.
BTW, without this PR mentioned sample doesn't crash too.
(there is one thread until main() entry)
You can explore generated code for both cases here: https://godbolt.org/z/n1nhe6
| @@ -100,25 +100,24 @@ static bool readMemoryAlignmentParameter() | |||
| // TODO add checks for valgrind, ASAN if value == false | |||
There was a problem hiding this comment.
// should not call fastMalloc() internally
Main assumption is here to avoid recursive deadlock.
There was a problem hiding this comment.
This also holds for this PRs code. C++11 standard declares that reentrancy on the same thread that is in a static local initialization results in undefined behavior.
There was a problem hiding this comment.
Actually I didn't use proposed static bool useMemalign = readMemoryAlignmentParameter(); form initially even under #if CV_CXX11 on 3.4 branch.
OpenCV 3.4 supports C++98 but it allows to be compiled in C++11 mode (GCC 7.x+ uses C++14 mode by default).
In OpenCV 3.x there is own cv::String which uses fastMalloc() internally (OpenCV 4.x uses typedef on std::string).
getConfigurationParameterBool() uses cv::String which uses fastAlloc().
Adding scope-based fastMalloc()/fastFree() here causes hang with this PR (on master):
void* p = fastMalloc(1024);
fastFree(p);
(triggers undefined behavior of dynamic initialization)
So this comment about assumption was not correct initially...
- for OpenCV 3.4 now
- there are plans to implement configuration manager (but probably such manager is not usable in this place if it will use
fastMalloc()).
modules/core/src/alloc.cpp
Outdated
| #if defined __GNUC__ | ||
| __attribute__((unused)) | ||
| #endif |
There was a problem hiding this comment.
OK. I'll make the change and spin up tests Monday.
Sure. 👍 I'm highlighting there is no regression. That this PR's code also prevents the crashes that appeared in the sample code from that dev. I would like to have that code as a test case itself and run on every build. Is there some way for the overall test harness to run an arbitrary program 10000 times? I need it to be a separate program, not something merged together like gtest currently does with the test cases in opencv, or how it puts the iterations inside the program instead of outside...I need repeating program startup 🤔 |
|
There is no support for such test cases (also there is no goal to add this feature). |
|
Sidebar joke 😆 the windows builds are all the same. This PR has no effect on Windows. All the variations we see on Windows are unrelated fluctuations... the Why? Because the current OpenCV codebase...on Windows...uses It is the non-Windows test scenarios that are currently interesting. The reason I needed to setup my Ubuntu 18.04 test harness this weekend. |
alalek
left a comment
There was a problem hiding this comment.
Is there easy (and cross-platform) way to detect multiple threads during initialization?
| typedef perf::TestBaseWithParam<MatType> MatDepth_tb; | ||
|
|
||
| PERF_TEST_P(MatDepth_tb, Allocation_Aligned, | ||
| testing::Values(CV_8UC1, CV_16SC1, CV_8UC3, CV_8UC4)) |
There was a problem hiding this comment.
Change is harmless from performance perspective: https://godbolt.org/z/n1nhe6
"Fast code path" is the same (with/without patch) so there is no any measurable performance changes expected.
(some difference may appear only on the measurement of the first sample of the first iteration)
Please remove this perf test from test suite (you can put its code into PR's description under <details> ... </details> block).
There was a problem hiding this comment.
I am not aware of a way to "detect multiple threads during initialization". Because what does "during initialization" mean? Initialization is done by the C++ runtime, not user code. So the user code is not aware of where it is in the overall progress of initialization.
I'm uncomfortable in removing the perf test. Here's my thinking:
- A custom memory allocator like fastMalloc() is a bedrock critical function...a core feature on top of which other code components are used. I see it used 19 times in the OpenCV 4.5 codebase. And it it used in themost critical places like
class Allocator,class Mat, andclass UMat. Therefore, I believe it important to always run performance tests on core components and do it isolated. - This specific performance test helps evaluate forthcoming updates. I have pending the native Windows aligned memory PR. I will need this exact same test in the code. And I need to run the same test cases yet this time to alter the envvar
OPENCV_ENABLE_MEMALIGN. - Performance testing is not a one-time thing. Performance testing is an ongoing process https://www.google.com/search?q=performance+testing+is+an+ongoing+process
There was a problem hiding this comment.
Perhaps leave the perf test cpp, and disable it. Then the dev/testers for the Windows PR, or a fix for the GNULIBC issue is made, they can re-enable it for their testing.
| // do not use variable directly, details: https://github.com/opencv/opencv/issues/15691 | ||
| static const bool g_force_initialization_memalign_flag | ||
| #if defined __GNUC__ | ||
| __attribute__((unused)) | ||
| #endif | ||
| = isAlignedAllocationEnabled(); | ||
|
|
There was a problem hiding this comment.
Please restore this (to avoid issues like #17617 - it is unlikely, but I prefer to keep it)
There was a problem hiding this comment.
I will revert and force push the branch again.
I when changes are fixed, I'll also squash so it is all in one commit. It is my understand that is the preferred process.
There was a problem hiding this comment.
Looks like wrong commit is reverted (about __attribute__((unused))).
This request is to restore global static initializer g_force_initialization_memalign_flag.
| @@ -100,25 +100,24 @@ static bool readMemoryAlignmentParameter() | |||
| // TODO add checks for valgrind, ASAN if value == false | |||
There was a problem hiding this comment.
Actually I didn't use proposed static bool useMemalign = readMemoryAlignmentParameter(); form initially even under #if CV_CXX11 on 3.4 branch.
OpenCV 3.4 supports C++98 but it allows to be compiled in C++11 mode (GCC 7.x+ uses C++14 mode by default).
In OpenCV 3.x there is own cv::String which uses fastMalloc() internally (OpenCV 4.x uses typedef on std::string).
getConfigurationParameterBool() uses cv::String which uses fastAlloc().
Adding scope-based fastMalloc()/fastFree() here causes hang with this PR (on master):
void* p = fastMalloc(1024);
fastFree(p);
(triggers undefined behavior of dynamic initialization)
So this comment about assumption was not correct initially...
- for OpenCV 3.4 now
- there are plans to implement configuration manager (but probably such manager is not usable in this place if it will use
fastMalloc()).
|
I'm unclear something I can read above. I see a section where @alalek writes "Actually I didn't use proposed". |
|
This was just a FYI, why this is not added initially (as part of #15544 to 3.4 branch, under "#if CV_CXX11" macro). |
alalek
left a comment
There was a problem hiding this comment.
I believe we can merge that approach.
Please take a look on comments below.
| @@ -0,0 +1,44 @@ | |||
| #include <array> | |||
| #include "perf_precomp.hpp" | |||
There was a problem hiding this comment.
perf_precomp.hpp
Please put perf_precomp.hpp first.
There was a problem hiding this comment.
I'll reverse and test.
| @@ -0,0 +1,44 @@ | |||
| #include <array> | |||
There was a problem hiding this comment.
Please add short license header: https://github.com/opencv/opencv/wiki/Coding_Style_Guide
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
| typedef perf::TestBaseWithParam<MatType> MatDepth_tb; | ||
|
|
||
| PERF_TEST_P(MatDepth_tb, Allocation_Aligned, | ||
| testing::Values(CV_8UC1, CV_16SC1, CV_8UC3, CV_8UC4)) |
| ::perf::sz5MP, ::perf::sz2K, ::perf::szSmall128, ::perf::szODD, ::perf::szQVGA, \ | ||
| ::perf::szVGA, ::perf::szSVGA, ::perf::sz720p, ::perf::sz1080p, ::perf::sz2160p, \ | ||
| ::perf::sz4320p, ::perf::sz3MP, ::perf::szXGA, ::perf::szSXGA, ::perf::szWQHD, \ | ||
| ::perf::sznHD, ::perf::szqHD |
There was a problem hiding this comment.
BTW, It makes sense to make size as a test parameter and remove "types" (CV_8UC1, CV_16SC1, CV_8UC3, CV_8UC4).
There was a problem hiding this comment.
I don't think I can make this change.
I do want to test all sizes and all types. Unfortunately, the testing harness requires I pass one or more test parameters. And that parameter has a restricted list of possible c++ types. So I chose the CV_Types.
There was a problem hiding this comment.
Oooh. You want that global static variable g_force_initialization_memalign_flag? That doesn't make sense. There is no logic to have it. Its confusing to have it when it does nothing and provides nothing.
If you force me to add it, then I request that I put a comment there that I disagree with it but that you specifically asked for it.
There was a problem hiding this comment.
And there is no "force" or negative from you, that's my poor choice of words. Better is, "if you require this as part of the PR, then I request...".
Here's my suggestion of the comment // need for this static const is disputed; retaining as it doesn't cause any harm
| // do not use variable directly, details: https://github.com/opencv/opencv/issues/15691 | ||
| static const bool g_force_initialization_memalign_flag | ||
| #if defined __GNUC__ | ||
| __attribute__((unused)) | ||
| #endif | ||
| = isAlignedAllocationEnabled(); | ||
|
|
There was a problem hiding this comment.
Looks like wrong commit is reverted (about __attribute__((unused))).
This request is to restore global static initializer g_force_initialization_memalign_flag.
add thread-safe startup of fastMalloc and fastFree * add perf test core memory allocation * fix threading in isAlignedAllocationEnabled() * tweaks requested by maintainer
Fixes program startup unsafe thread behavior of the core memory allocator
fastMalloc()anddeallocator
fastFree()from issue #19004 and adds performance test case.Summary: The code works as expected on below tested platforms. I do not see any
consistent performance gains or losses.
🚨Please note: this PR is only for OpenCV 4 and newer. This fix requires
a C++11 compliant compiler. Do not apply this exact code to the 3.x codebase since
that code might be compiled on an older C++98 compiler. Only C++11 or newer
promises thread-safe initialization of static local variables
https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables.
Such a variable is low cost, will only initialize once, handles exceptions, and
threads block if they try to pass through the same declaration if another thread
is initializing.
Unit functionality testing
It is difficult with the current test harness to unit test this fix in isolation
since the easiest failure would only appear at startup and other failures would
occur randomly seconds, minutes, or hours later. Instead, most OpenCV test cases
will indirectly test this since most allocate memory. If random or widespread
memory violations appear after this PR, it suggests there is an issue that needs
investigation. If anyone has an idea how to reliably unit test this, please contact
me and I'll write the test.
Meanwhile, I have run the core unit tests
opencv_test_core. Test was compiledwith code that includes this fix and the performance test case. All unit tests
passed.
Performance testing
Code changes of this type have very small changes; often in the low micro or innanoseconds.
In an effort to see any patterns of poor performance, 100000 iterations are run and multiple
samples collected. Code was compiled on two separate computers and the test run afterwards with
run.py . -t core --configuration Release --gtest_filter=*Allocation* --perf_min_samples=50Absolute results are not useful as they will different on all machines/vm. Relative results will
slightly vary due to variations of resource availability on the test machine/vm at the time of
the test.
This Ubuntu is running in a Docker within WSL2 running on host Win10.
Code
masteris commit e726ff3 + the performance test case.Code
fixis above code + this fix.Summary is generated with
summary.py *.xmlWindows build
4 tests with the fix, 4 tests with master
Ubuntu build
3 tests with fix, 3 tests with master
Pull Request Readiness Checklist
Patch to opencv_extra has the same branch name.