Skip to content

core fastMalloc(), fastFree() memory allocators not thread-safe on startup #19004

@diablodale

Description

@diablodale

The core memory allocation routines fastMalloc() and fastFree() are not thread-safe on startup due to a bug in a dependent function isAlignedAllocationEnabled() called by both.

I have a PR to fix this. Earlier attempts to fix memory allocator startup behavior like issue #15691 and its PR are not themselves thread-safe. I request the team review my analysis below. When we agree the code is errant and undesired behavior, I will submit my fix PR. 👍

I found this issue when I implemented native aligned memory on Windows. The fix for this issue is distinct from the native Windows aligned memory feature update. However, there is benefit both in reliability and speed to deploy this fix before native Windows aligned memory.

This is a fail of the "static initialization order problem" https://isocpp.org/wiki/faq/ctors#static-init-order and can be fixed with the "use construct on first use idiom" https://isocpp.org/wiki/faq/ctors#static-init-order-on-first-use

System information (version)

  • OpenCV => 4.5.0 or newer like master e726ff3
  • Operating System / Platform => Windows 10 64-Bit
  • Compiler => Visual Studio Community 2019 v16.8.2

Steps to reproduce

It is more reliable to reproduce through code review. This is due to thread race conditions and that the errant code has few bad side-affects on Linux. On Windows it will crash (but you would need my native aligned memory update to expose this issue).

Below I provide two scenarios and walk readers through them. Important things to remember...

  • C++ compilation units (.cpp files) are initialized at runtime in an undetermined order. There is no promise by the C++ runtime for order.
  • The thread names I give below do not imply any order. Thread A does not necessarily start before Thread B.
  • These threads are running in a multi-threaded OS on a CPU that has multiple cores which allows for simultaneous thread activity. This is normal for all modern OS and CPUs.
  • The rows do imply sequences. Top to bottom is the passage of time. Activities on the same row are happening at exactly the same time. This is possible because of multiple CPU cores on a multi-threaded OS.
  • The errant code is in the file modules\core\src\alloc.cpp. Refer to that file to follow the scenarios below.
  • For these scenarios, I declare that the runtime did not chose alloc.cpp compilation unit to initialize first. This means that static const bool g_force_initialization_memalign_flag = isAlignedAllocationEnabled(); does not run in these scenarios.

Scenario 1

Time thread A thread B
1 fastMalloc()
2   isAlignedAllocationEnabled()
3 fastMalloc()  static bool initialized = false
4 isAlignedAllocationEnabled() static bool useMemalign = true
5 if (!initialized) if (!initialized)
6 initialized = true initialized = true
7 useMemalign = readMemoryAlignmentParameter(); useMemalign = readMemoryAlignmentParameter();
8

You can see that there are two threads that are both in the initialization block of code. This is itself undesired. Next, both threads are calling readMemoryAlignmentParameter() and its many dependent functions. It is technically possible for readMemoryAlignmentParameter() and its dependencies to return different results. And therefore, it is possible useMemalign has different values in memory. Even the timing of the write to the memory location of useMemalign could lead to one thread overwriting the other thread's value.

This is a coding error that probably has few bad side-affects. readMemoryAlignmentParameter() probably has no problems being called at the same time with the same parameters, and probably returns the same value. It is unlikely the environment variable it is reading is changed by a third thread while threads A and B are reading it. So in this scenario there is duplicate work but likely no critically bad side-affects.

Scenario 2

time thread C thread D
1 fastMalloc()
2   isAlignedAllocationEnabled()
3   static bool initialized = false
4   static bool useMemalign = true
5 fastMalloc() if (!initialized)
6 isAlignedAllocationEnabled() initialized = true
7 if (!initialized) thread delayed
8 return useMemalign true thread delayed
9 … memory is allocated using the aligned codepath thread delayed
10 useMemalign = readMemoryAlignmentParameter() false due to conditions or envvar
11 return useMemalign false
12 … memory is allocated using the un-aligned codepath
13 fastFree() fastFree()
14 if (isAlignedAllocationEnabled()) this is false if (isAlignedAllocationEnabled()) this is false
15 … 💥unalignedfree(aligned memory pointer) … unalignedfree(un-aligned memory pointer)

This scenario has bad side-affects and potentially program crashes. The current codebase often hides this errant code because free() is used both for native-aligned and native-unaligned memory. Also, the timing (like scenario 1) may be such that the fallback aligning index[-1] approach isn't allocating/free incompatible blocks of memory while initialization is occuring. But this more likely to fail on Windows because Windows has distinct functions free() and _aligned_free().

Follow this...

  • time 8: thread C gets true from isAlignedAllocationEnabled() because thread D set useMemalign=true and has not yet completed the initialization block that will eventually set it to false
  • time 9: thread C gets memory allocated using the aligned codepath
  • time 11: thread D gets false from isAlignedAllocationEnabled()
  • time 12: thread D gets memory allocated using un-aligned codepath
  • time 14: both threads get false from isAlignedAllocationEnabled()
  • time 15: both threads use the un-aligned memory deallocator

The fault occurs at time 15.
Thread C uses the un-aligned memory deallocator on aligned memory. ☹
Thread D uses the un-aligned memory deallocator on un-aligned memory.

Fix

The fix is to use the classic "Construct On First Use Idiom" as documented by our ISO friends at https://isocpp.org/wiki/faq/ctors#static-init-order-on-first-use. The specific approach in my PR might also be unmeasurably faster as it has fewer if() tests and fewer variables.

Issue submission checklist

  • I report the issue, it's not a question
  • I checked the problem with documentation, FAQ, open issues,
    answers.opencv.org, Stack Overflow, etc and have not found solution
  • I updated to latest OpenCV version and the issue is still there
  • There is reproducer code and related data files: videos, images, onnx, etc

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions