Skip to content

add windows native aligned malloc + unit test case#19195

Merged
opencv-pushbot merged 1 commit intoopencv:masterfrom
diablodale:win32AlignAlloc
Dec 23, 2020
Merged

add windows native aligned malloc + unit test case#19195
opencv-pushbot merged 1 commit intoopencv:masterfrom
diablodale:win32AlignAlloc

Conversation

@diablodale
Copy link
Copy Markdown
Contributor

Implements #19147 by adding native Win32 memory alignment to the opencv core memory allocator and test case. Test data collected prior to PR suggests allocation might be 2% slower. Core maintainer prefers this code be considered for merge. Refer to issue for much detail on accuracy and performance data.

Pull Request Readiness Checklist

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done! Thank you for contribution 👍

TEST(Core_Allocation, alignedAllocation)
{
// iterate from size=1 to approximate byte size of 8K 32bpp image buffer
for (int i = 0; i < 200; i++) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, some external container outside of for loop was critical in GLIBC case to catch memory fragmentation problem.

std::vector<std::string> external_container;
for (...)
{
    ...
    external_container.emplace_back("test");
    cv::fastFree(buf);
}

(or with swapped emplace_back()/fastFree() lines - both patterns can be used in applications)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I separately tested for fragmentation. Please see the issue for detailed analysis.

@alalek
Copy link
Copy Markdown
Member

alalek commented Dec 22, 2020

This patch should go into 3.4 branch first.
We will merge changes from 3.4 into master regularly (weekly/bi-weekly).

Please:

  • change "base" branch of this PR: master => 3.4 (use "Edit" button near PR title)
  • rebase your commits from master onto 3.4 branch. For example:
    git rebase -i --onto upstream/3.4 upstream/master
    (check list of your commits, save and quit (Esc + "wq" + Enter)
    where upstream is configured by following this GitHub guide and fetched (git fetch upstream).
  • push rebased commits into source branch of your fork (with --force option)

Note: no needs to re-open PR, apply changes "inplace".

@diablodale
Copy link
Copy Markdown
Contributor Author

This patch should go into 3.4 branch first.
We will merge changes from 3.4 into master regularly (weekly/bi-weekly).

Please:

  • change "base" branch of this PR: master => 3.4 (use "Edit" button near PR title)
  • rebase your commits from master onto 3.4 branch. For example:
    git rebase -i --onto upstream/3.4 upstream/master
    (check list of your commits, save and quit (Esc + "wq" + Enter)
    where upstream is configured by following this GitHub guide and fetched (git fetch upstream).
  • push rebased commits into source branch of your fork (with --force option)

Note: no needs to re-open PR, apply changes "inplace".

Won't be able to do that. Several issues:

  1. This PR is 4.5 and greater only. It will not work on 3.x. It requires the earlier PR to the 4.x branch that was merged that fixes the startup behavior in this translation unit. It absolutely 100% will not work in 3.x
  2. All the testing, performance, and fragmentation analysis I have done was using the 4.x commit that is listed in the issue. Those test results and analysis can not be used to make decisions about 3.x.
  3. OpenCV 3.x is bugfix only. That is everywhere in official documentation. This PR is a new feature and therefore for the 4.x branch.

Usually, I will not be submitting PRs for the legacy 3.x branch. All the things I do (code, test, analysis, etc.) will be on the current 4.x branch. I do not have the bandwidth to do all the work with a broken 22 year old standard in a branch declared as bugfix-only. I understand you are a maintainer of this project. And I'm a contributor. We have to have a middle-ground where contributions work for us both. 🤝

@alalek
Copy link
Copy Markdown
Member

alalek commented Dec 22, 2020

OK, I will backport this separately after merge.

* implements #19147
* CAUTION: this PR will only functions safely in the
  4+ branches that already include PR 19029
* CAUTION: this PR requires thread-safe startup of the alloc.cpp
  translation unit as implemented in PR 19029
@diablodale
Copy link
Copy Markdown
Contributor Author

You wrote

OK, I will backport this separately after merge.

I discourage and caution you not to do this. Why? It is unsafe and you risk Windows crashes.

The native Windows aligned APIs used in this PR are unsafe until the thread-safe startup bugs of the core opencv allocator are fixed. One collection of these bugs are #19004. It is a complex bug to fix correctly if you don't use C++11. You would need to create critical sections and/or mutex to protect in needed areas, and put cascaded try/catch handlers throughout the codepath. And all that has to be done very carefully. It will likely reduce performance.

Only after those multiple unsafe (thread and exception unsafe) allocator bugs are fixed using C++98, can this PR's code then be applied. If you don't fix those unsafe bugs, then you risk a hard crash of Windows applications. Windows will not allow a mix of deallocators free and _aligned_free. Linux is different, it only uses free.

We have a saying in English, "Let sleeping dogs lie." 🐶 It means: if there isn't a current problem...and a change is challenging...it is wise to not make a change. I believe that applies with opencv 3.x.

The existing opencv 3.x codebase on windows does not have the startup bug 19004. This is because that 3.x codebase uses ifdef on Windows to remove all the buggy code, uses fixed branches, and uses the handmade opencv alignment function. All of this works. And doesn't have any startup issues. I believe the complexity and challenge to fix 19004 in C++98 just to get this PR is not wise.

@opencv-pushbot opencv-pushbot merged commit cd68cc1 into opencv:master Dec 23, 2020
@alalek alalek mentioned this pull request Apr 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport is needed Label for maintainers. Authors of PR can ignore this category: core feature platform: win32

Projects

None yet

Development

Successfully merging this pull request may close these issues.

proposal to add native Windows aligned memory apis --> little benefit

3 participants