G-API: oneVPL (simplification) added CPU, DX11(fallback CPU) accels & surface pool#20727
Conversation
for (auto& pair : pool_table) {
    pair.second.clear();
    // do not free key here: last surface will release it
}
pool_table.clear();
Doesn't this happen automatically?
It does; I wanted to make sure that all pointers are released right before printing "destroyed".
But later another situation emerged: some data consumer kept a referenced surface in a MediaFrame. So clearing pool_table here doesn't mean that all pointers are released until the last consumer releases its MediaFrame.
So "destroyed" doesn't mean freed, and this line might be removed.
So clearing pool_table here doesn't mean that all pointers are released until the last consumer releases its MediaFrame.
Sorry but all this object lifetime thing is still not very clear to me. Can you please explain in detail how these different objects relate?
Yes, sure:
The accelerator creates a shared_ptr<void> for the preallocated memory and uses its raw pointer as a unique key for a pool in the pools table. Let's assume that a "pool" here is a simple std::vector of surfaces (actually shared_ptr<Surface>). So neither the pool nor the accelerator owns the preallocated memory.
As oneVPL notes, it's important for CPU that all surfaces share one contiguous memory allocation (at different offsets). Following the ownership idea, this contiguous memory area must exist until the last surface is destroyed. This contiguous area is the preallocated memory, which is passed to every surface as a shared_ptr argument during surface construction. So when the last surface dies, it releases the last shared_ptr to the preallocated memory.
Surfaces, as shared_ptr<Surface>, are remembered in a pool (std::vector or CachedPool) in the pools table in the accelerator.
Additionally, a surface as a shared_ptr is used in MediaFrame construction, which also shares its ownership. So the accelerator has shared ownership of a surface through the pool, and a MediaFrame shares surface ownership as aggregation in the class.
If the MediaFrames (usually in another thread) are destroyed before the accelerator, the final shared_ptr<Surface> release happens in the accelerator's destructor. After the last surface shared_ptr is released, it releases the last instance of the shared_ptr to the preallocated memory (normal shutdown).
If the accelerator dies first, right before a MediaFrame (living in another thread), then the accelerator just decreases the surface reference count during pool destruction. The last MediaFrame keeps the last shared_ptr<Surface> in its own thread. When it dies, the last surface reference count reaches 0 and the surface destroys the preallocated memory too (delayed shutdown).
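The ownership chain described here can be sketched with plain shared_ptr. This is a minimal, hypothetical model: `Surface` and `remaining_owners_after_pool_destruction` are illustrative stand-ins, not the actual G-API types. It shows why clearing the pool (the accelerator side) does not free the workspace while a MediaFrame still holds a surface:

```cpp
#include <cstdlib>
#include <memory>
#include <vector>

// Hypothetical stand-in for the real surface type: each surface shares
// ownership of the one contiguous workspace allocation and uses an offset.
struct Surface {
    std::shared_ptr<void> workspace; // keeps the preallocated memory alive
    size_t offset;                   // this surface's slice of the workspace
};

// Returns the workspace use_count seen by the last surviving "MediaFrame"
// after the accelerator side (pool + its own handle) has been destroyed.
long remaining_owners_after_pool_destruction() {
    auto workspace = std::shared_ptr<void>(std::malloc(1024), std::free);

    std::vector<std::shared_ptr<Surface>> pool; // owned by the accelerator
    for (size_t i = 0; i < 4; ++i)
        pool.push_back(std::make_shared<Surface>(Surface{workspace, i * 256}));

    // A consumer (MediaFrame) in another thread takes shared ownership.
    std::shared_ptr<Surface> media_frame_surface = pool[1];

    workspace.reset(); // the accelerator's own handle is released...
    pool.clear();      // ...and the pool is "destroyed" -- but nothing is freed yet

    // Only the MediaFrame's surface still pins the workspace ("delayed shutdown").
    return media_frame_surface->workspace.use_count();
}
```

When the last MediaFrame dies, its surface releases the final workspace reference and the contiguous allocation is actually freed.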
P.S. This problem is more acute for the DX11 pool, because a surface needs a living accelerator object for its destruction... I'll check it more precisely in the DX11 PRs.
P.P.S. Additionally, a "pool reallocation" problem is exposed here:
it's important for CPU that all surfaces share one contiguous memory allocation (at different offsets).
I cannot simply reallocate the preallocated memory with a bigger size at runtime, because the shared_ptr<Surface>s (which own this preallocated memory) may or may not be assigned to MediaFrames in different threads and still be in use. To solve this problem I would need to COPY all dependent surfaces involved in VPL decoding (and not assigned to MediaFrames) from the previous step (because they may be depended on as I/B frames) and reallocate a new pool with a new key. So a greedy pool is unavailable for now and may require additional investigation (orphaned MediaFrames).
#endif

if (!preallocated_pool_memory_ptr) {
    throw std::runtime_error("VPLCPUAccelerationPolicy::create_surface_pool - failed: not enough memory."
Would it throw this now on Linux by default?
Good question... I aimed to write it for Windows only, as discussed in the feature request. But most of the other code was written without Windows-specific details, and I assume it would work on Linux too. But it hasn't been tested.
So there are two possible ways:
- spend time now to test and fix the Linux incompatibilities,
- or emit a Windows-only message on source creation.
I suggest the second one. But it may make sense to ask another team member to try setting it up under Linux (removing such incompatibilities) to keep sharing code expertise. @dmatveev, what do you think?
+1 to the second one, and I am definitely sure we'll have to keep Linux in mind - maybe in the backlog
The issue was added to the backlog.
GAPI_LOG_INFO(nullptr, "Released workspace memory: " << ptr);
ptr = nullptr;
#else
abort(); //not implemented
abort() is a wrong way to indicate error for a library I believe - but it seems it can't be an exception here (due to destructor context) - right?
Agree. I suggest putting up something like:
#else
GAPI_Assert(false && "Not implemented for systems other than \"_WIN32\". "
            "Please feel free to set it up under the OpenCV contribution policy");
#endif
try {
    for (; i < pool_size; i++) {
        size_t preallocated_mem_offset = static_cast<size_t>(i) * surface_size_bytes;

        surface_ptr_t surf_ptr = creator(workspace_mem_owner,
                                         preallocated_mem_offset,
                                         preallocated_raw_bytes);
        pool.push_back(std::move(surf_ptr));
    }
} catch (const std::exception& ex) {
    throw std::runtime_error(std::string("VPLCPUAccelerationPolicy::create_surface_pool - ") +
                             "cannot construct surface index: " + std::to_string(i) + ", error: " +
                             ex.what() +
                             "Requested surface count: " + std::to_string(pool_size) +
                             ", surface bytes: " + std::to_string(surface_size_bytes));
}
Do we really need this handling here? What throws here in particular: our code or the external one?
Any exception handling within our module complicates future debugging (or am I missing something?).
From my perspective, this method create_surface_pool has the same semantics as operator new, which normally throws std::bad_alloc so that the caller can make a decision: decrease the requested memory size or crash.
Later we will introduce external parameters for a configurable pool size for optimization purposes (a preallocated frame count tied to the SafeQueue size, to avoid a decoder stall while waiting for the drawing thread or another consumer to release MediaFrames), and customization might be required.

Any exception handling within our module complicates future debugging

Not sure:
- in the next PRs this part would be handled on the caller side in initialize_session
https://github.com/sivanov-work/opencv/pull/4/files#diff-4abf50e2ded39bdb37c29822533abed9cd6395cc8b9826c667e3cd10af59c9b0R204
and then by SourcePriv here
https://github.com/sivanov-work/opencv/pull/5/files#diff-d1526c45e93418aa18b64502a4e01cc25151be376b891015a7d9049ec876fcddR216
which would take care of graceful deinitialization: session close, MFX unload and so on, with GAPI_LOG_WARNING printing;
- go on with customization (previous paragraph);
- it is unit-testable for multiple scenarios.
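The "operator new"-like contract being argued for here can be sketched as follows. This is a hypothetical illustration: `create_surface_pool` below is a stand-in that fails above an arbitrary budget, and `create_pool_with_fallback` shows the kind of caller-side decision (shrink the request instead of crashing) the exception makes possible:

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for the real allocation: pretend anything above
// a fixed 1 MiB budget is "not enough memory", mimicking operator new.
size_t create_surface_pool(size_t pool_size, size_t surface_bytes) {
    if (pool_size * surface_bytes > 1024 * 1024)
        throw std::runtime_error("not enough memory");
    return pool_size; // allocation succeeded with this many surfaces
}

// Caller-side policy enabled by the exception: halve the requested pool
// size until the allocation fits, instead of aborting outright.
size_t create_pool_with_fallback(size_t requested, size_t surface_bytes) {
    for (size_t size = requested; size > 0; size /= 2) {
        try {
            return create_surface_pool(size, surface_bytes);
        } catch (const std::runtime_error&) {
            // graceful degradation: retry with half the surfaces
        }
    }
    throw std::runtime_error("cannot allocate even a single surface");
}
```

With 4 KiB surfaces and a request of 1024, the sketch degrades to the largest pool that fits the pretend budget.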
ok let's see in the future PRs
modules/gapi/src/streaming/onevpl/accelerators/accel_policy_dx11.hpp (outdated, resolved)
modules/gapi/src/streaming/onevpl/accelerators/surface/surface_pool.cpp (outdated, resolved)
    next_free_it = surfaces.begin();
}

CachedPool::surface_ptr_t CachedPool::find_free() {
Is this pool supposed to be thread safe or the protection is done outside?
The pool is considered single-threaded. It contains a constant, predefined surface count and finds the first "free" one.
Each pool is consumed (by key in the accelerator table) by only one MFX session, which requests a "free" surface for the next decoded frame portion.
Each MFX session makes an async request to VPL to decode a frame and provides a "free" surface from its exclusive pool object (by key). It looks like only a single thread can drive the VPL decoder (the CONSUMER thread).
"Free" means LockedCounter == 0 in the inner MFX surface structure; it also means that this surface is not being used by the VPL decoder, so we can use it for another decode or read/write it from application code. So the pool returns a "free" surface if:
- it is not currently used by VPL decoding,
and...
A surface, as a data provider, must be aggregated by a MediaFrame to allow the application to read or write it after MediaFrame::access, especially from other threads. So when we create a MediaFrame from a fully decoded surface, the surface itself may or may not have LockedCounter == 0 (non-zero for I and B frames), but we additionally increment an application-side atomic lock counter: ApplicationCounter. This happens in the CONSUMER thread after the frame is decoded.
The created MediaFrame (LockedCounter == 0, or LockedCounter != 0 AND ApplicationCounter == 1) is then returned as data to another thread or threads.
So "free" finally means LockedCounter == 0 AND ApplicationCounter == 0, and the pool returns a "free" surface if:
- ApplicationCounter == 0 too. Both counters must be 0.
This guarantees that the pool doesn't return a surface for decoding (to the CONSUMER thread) while it is used in a MediaFrame in other threads.
Right after another thread destroys a MediaFrame, ApplicationCounter drops to 0, which means the surface is fully "free" and the pool may return it to the CONSUMER thread.
The classic solution for a pool is to keep a queue and pop/insert under a critical section. In the current case we just use a lightweight atomic load for the next surface in the list to make sure it is free for the application, but with a single-consumer limitation (which is applicable here for the decode model).
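The two-counter "free" check described above can be sketched like this. It is a minimal, hypothetical model (the `Surface` struct and `find_free` here are illustrative, not the PR's actual classes): the single CONSUMER thread scans the pool with atomic loads only, while other threads merely decrement `application_counter` when a MediaFrame dies, so no critical section is needed:

```cpp
#include <atomic>
#include <memory>
#include <vector>

// Hypothetical surface model with the two counters from the discussion:
// locked_counter mirrors the decoder-side lock, application_counter is
// bumped when a MediaFrame wraps the surface for application access.
struct Surface {
    std::atomic<int> locked_counter{0};      // owned by the VPL decoder
    std::atomic<int> application_counter{0}; // live MediaFrame references

    bool is_free() const {
        // "free" = not used by the decoder AND not wrapped by any MediaFrame
        return locked_counter.load() == 0 && application_counter.load() == 0;
    }
};

using surface_ptr_t = std::shared_ptr<Surface>;

// Called only from the single CONSUMER thread; other threads only ever
// decrement the counters, so a lock-free scan is safe here.
surface_ptr_t find_free(const std::vector<surface_ptr_t>& pool) {
    for (const auto& s : pool)
        if (s->is_free())
            return s;
    return nullptr; // all busy: the decoder must wait for a MediaFrame release
}
```

A MediaFrame destructor in any thread would decrement `application_counter`, after which the next CONSUMER-thread scan sees the surface as free again.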
MediaFrames may be destroyed concurrently, is it ok for this implementation?
Yes, it is OK. It just decreases the surface shared count (as shared_ptr<Surface>) and the ApplicationCounter inside the surface (which helps the surface become free in the pool).
modules/gapi/src/streaming/onevpl/accelerators/surface/surface_pool.hpp (outdated, resolved)
[](const surface_ptr_t& val) {
    GAPI_DbgAssert(val && "Pool contains empty surface");
    return !val->get_locks_count();
});
circular buffer imitation
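The "circular buffer imitation" could be sketched as below. This is an illustrative model of the cached-iterator idea (hypothetical `Slot` type and `find_free_circular` name, not the PR's exact code): the search resumes from the remembered `next_free_it` and wraps to the beginning if nothing free lies past it, so successive calls walk the pool round-robin:

```cpp
#include <algorithm>
#include <iterator>
#include <list>

struct Slot { int locks = 0; }; // hypothetical stand-in for a pooled surface

using pool_t = std::list<Slot>;

// Search [next_free_it, end), then wrap to [begin, next_free_it) --
// imitating a circular buffer over a plain std::list.
pool_t::iterator find_free_circular(pool_t& surfaces, pool_t::iterator& next_free_it) {
    auto is_free = [](const Slot& s) { return s.locks == 0; };
    auto it = std::find_if(next_free_it, surfaces.end(), is_free);
    if (it == surfaces.end())
        it = std::find_if(surfaces.begin(), next_free_it, is_free); // wrap around
    if (it != surfaces.end() && is_free(*it)) {
        next_free_it = std::next(it); // remember where to resume next time
        if (next_free_it == surfaces.end())
            next_free_it = surfaces.begin();
        return it;
    }
    return surfaces.end(); // nothing free anywhere in the pool
}
```

Advancing the cached iterator past the returned slot is what makes repeated calls cycle through the pool instead of hammering the first free entry.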
modules/gapi/src/streaming/onevpl/accelerators/surface/surface_pool.hpp (outdated, resolved)
dmatveev
left a comment
ok to me, stuff can be fixed with the next set of PRs
//TODO Remove the next MACRO right after DX11 implementation
#define CPU_ACCEL_ADAPTER
ok, let's avoid this macro-defined behavior in the future (if I get what's being addressed there)
next_free_it = it;
++next_free_it;

return *it;
Formatting may look strange
@alalek or @mshabunin could you please merge it?
G-API: oneVPL (simplification) added CPU, DX11 (fallback CPU) accels & surface pool
* Add CPU, DX11 (fallback CPU) accels & surface pool
* Fix build for surface_pool
* Apply some comments
* Fix indentation
This PR is one in the series of #20469.
Added the VPL CPU accelerator, the DX11 (fallback CPU) accelerator, and the surface pool.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.
Build configuration