G-API: oneVPL (simplification) added CPU, DX11(fallback CPU) accels & surface pool by sivanov-work · Pull Request #20727 · opencv/opencv

sivanov-work · 2021-09-21T07:23:27Z

this PR is the single one in series of #20469
Added VPL CPU accelerator, DX11 ( fallback CPU) accelerator & surface pool

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
The PR is proposed to proper branch
There is reference to original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

Build configuration

force_builders=Custom,Custom Win,Custom Mac
build_gapi_standalone:Linux x64=ade-0.1.1f
build_gapi_standalone:Win64=ade-0.1.1f
build_gapi_standalone:Mac=ade-0.1.1f
build_gapi_standalone:Linux x64 Debug=ade-0.1.1f

Xbuild_image:Custom=centos:7
Xbuildworker:Custom=linux-1
build_gapi_standalone:Custom=ade-0.1.1f

build_image:Custom=ubuntu-openvino-2021.3.0:20.04
build_image:Custom Win=openvino-2021.2.0
build_image:Custom Mac=openvino-2021.2.0

test_modules:Custom=gapi,python2,python3,java
test_modules:Custom Win=gapi,python2,python3,java
test_modules:Custom Mac=gapi,python2,python3,java

buildworker:Custom=linux-1
test_opencl:Custom=OFF
test_bigdata:Custom=1
test_filter:Custom=*

modules/gapi/CMakeLists.txt

dmatveev · 2021-09-21T11:23:59Z

modules/gapi/src/streaming/onevpl/accelerators/accel_policy_cpu.cpp

+    for (auto& pair : pool_table) {
+        pair.second.clear();
+        // do not free key here: last surface will release it
+    }
+    pool_table.clear();


Doesn't this happen automatically?

it does, I wanted to make sure that all pointers are released straight before print "destroyed".

But later the next situation was emerged when some data consumer was keeping referenced surface in MediaFrame. So clear for pool_table here doesn't means that all pointers released until last consumer takes MediaFrame.

So "destoyed" doesn't mean free and this row might be removed

So clear for pool_table here doesn't means that all pointers released until last consumer takes MediaFrame.

Sorry but all this object lifetime thing is still not very clear to me. Can you please explain in detail how these different objects relate?

yes, sure:

Accelerator created shared_ptr<void> for preallocated_memory and used its raw ptr as an unique key for a pool in pools table. Let's assume that "pool" here is a simple std::vector of "surface" ( actually shared_ptr<surface>). So
nor pool owns preallocated memory nor 'Accelerator.
as oneVPL mentioned - it's important for CPU that all surfaces shares continuous memory allocation (with different offset). According to ownership idea, this continuous memory area must exist until the last surface destroyed. This continuous area is preallocated memory which passed to every surface as shared_ptr argument during surface constructions. So when last surface dies it will release last shared_ptr for preallocated memory
Surfaces as shared_ptr<surface> remembered in pool ( std::vector or CachedPool) in pools table in Accelerator.
Additionally, surface as shared_ptr uses in MediaFrame construction which also shared its ownership. So Accellerator has shared ownership for surface through pool and MediaFrame shares surface ownership as aggregation in class.

If MediaFrames (usually in another thread) was destroyed before Accelerator then final shared_ptr<surface> release would happen in Accelerator in its dtor. After last surface shared_ptr release it will release last instance of shared_ptr for preallocated_memory (Normal shutdown)

If Accelerator dies first right before MediaFrame (living in another thread) then Accelerator just decrease surface reference counting during pool destruction. The last MediaFrame would keep last shared_ptr<surface > in its own thread. When it dies then last surface reference counter will reach 0 and surface destroyes preallocated_memory too (Delayed shutdown).

P.S. This problem is more acute for DX11 pool, because surface need living Accelerator object for destruction.... I'll check it in the DX11 PRs more precisely

P.P.S. Additionally, problem for "pool reallocation" is exposed here

it's important for CPU that all surfaces shares continuous memory allocation (with different offset).

I cannot simply reallocate preallocated_memory for bigger size in runtime, because shared_ptr<surfaces> (which owns this preallocated_memory) may or may not be assigned to MediaFrames in different threads and still be in-use. to solve this problem I need make COPY for all depended surfaces involved in VPL decode (and not assinged as MediaFrames) at prev step (because they may be dependendt as I/B frames) and realloc new pool with new key. So gready-pool is unavailable at now, and may require additional investigation (orphant MediaFrames)

dmatveev · 2021-09-21T11:24:51Z

modules/gapi/src/streaming/onevpl/accelerators/accel_policy_cpu.cpp

+#endif
+
+    if (!preallocated_pool_memory_ptr) {
+        throw std::runtime_error("VPLCPUAccelerationPolicy::create_surface_pool - failed: not enough memory."


Would it throw this now on Linux by default?

good question... I aimed to write it for WIN-only as discussed in feature request. But most part of other code was written without WIN-specific details and I assume it would work on Linux too. But it hasn't been tested

So there are two possible way:
-Spent time to test/fix Linux incompatible now
-or put message for Windows-only on source creation.

I suggest the second one. But! it maybe make sense to ask another team member for trying to set it up under Linux (with removing such incompatibles) to keep sharing code expertise. @dmatveev, what do you think?

+1 to the second one, and I am definitely sure we'll have to keep Linux in mind - maybe in the backlog

Issue was added in backlog

dmatveev · 2021-09-21T11:25:38Z

modules/gapi/src/streaming/onevpl/accelerators/accel_policy_cpu.cpp

+        GAPI_LOG_INFO(nullptr, "Released workspace memory: " << ptr);
+        ptr = nullptr;
+#else
+        abort(); //not implemented


abort() is a wrong way to indicate error for a library I believe - but it seems it can't be an exception here (due to destructor context) - right?

agree, i suggest to put something up like

#else GAPI_Assert(false && "Not implemented for systems differ than \"_WIN32\". " "Please feel free to set it up under OPENCV contribution policy"); #endif

dmatveev · 2021-09-21T13:30:24Z

modules/gapi/src/streaming/onevpl/accelerators/accel_policy_cpu.cpp

+    try {
+        for (; i < pool_size; i++) {
+            size_t preallocated_mem_offset = static_cast<size_t>(i) * surface_size_bytes;
+
+            surface_ptr_t surf_ptr = creator(workspace_mem_owner,
+                                             preallocated_mem_offset,
+                                             preallocated_raw_bytes);
+            pool.push_back(std::move(surf_ptr));
+        }
+    } catch (const std::exception& ex) {
+        throw std::runtime_error(std::string("VPLCPUAccelerationPolicy::create_surface_pool - ") +
+                                 "cannot construct surface index: " + std::to_string(i) + ", error: " +
+                                 ex.what() +
+                                 "Requested surface count: " + std::to_string(pool_size) +
+                                 ", surface bytes: " + std::to_string(surface_size_bytes));
+    }


Do we really need this handling here? What throws here in particular - is it our code or the external one?

Any exception handling within our module complicates the future debug (or I miss something)

From my perspective his method creates_surface_pool has the same semantics as operator new which normally throws exception std::bad_alloc to make for caller to take decision: decrease requested memory size or crash.
later we will introduce external params for configured pool size in optimization purposes (preallocated frames count due to SafeQueue size to avoid decoder stall when waiting till drawing thread or other consumer releases MediaFrames) and customization might required

Any exception handling within our module complicates the future debug

Not sure:

in the next PRs this part would be handled by caller side in initialize_session
https://github.com/sivanov-work/opencv/pull/4/files#diff-4abf50e2ded39bdb37c29822533abed9cd6395cc8b9826c667e3cd10af59c9b0R204
then
it would be handled by SourcePriv here
https://github.com/sivanov-work/opencv/pull/5/files#diff-d1526c45e93418aa18b64502a4e01cc25151be376b891015a7d9049ec876fcddR216
which would made all part for gracefull deinitialization: session close, mxf unload and so on with GAPI_LOG_WARNING printing

go on with customization (prev paragrraph)

"unit-testsable" for multiple scenarious

ok let's see in the future PRs

modules/gapi/src/streaming/onevpl/accelerators/accel_policy_dx11.hpp

modules/gapi/src/streaming/onevpl/accelerators/surface/surface_pool.cpp

dmatveev · 2021-09-21T13:38:53Z

modules/gapi/src/streaming/onevpl/accelerators/surface/surface_pool.cpp

+    next_free_it = surfaces.begin();
+}
+
+CachedPool::surface_ptr_t CachedPool::find_free() {


Is this pool supposed to be thread safe or the protection is done outside?

Pool considered as single-threaded. It contains constantly defined surface count and finds a first "free" one.
each pool is consumed (by key in accelerator table) by only one MFX session by requesting a "free" surface for the next decode frame portion.
Each MFX session makes async request to VPL for decode frame and provides "free" session from exclusive pool object (by key) . Looks like we can use only single thread to use up VPL decoder (CONSUMER thread).

"free" means LockedCounter == 0 in MFX surface inner structure it also means that this surface is not using by VPL decoder and we can use it for another decode or read-write it from application code. So Pool returns "free" surface if

it is not currently used by VPL decode
and...

Surface, as a data provider must be aggregated, by MediaFrame to allow read or write to it by application after MediaFrame::access, especially in another threads. So, when we create MediaFrame from fully decoded surface - the surface itself may or may not get LockedCounter ==0 (not 0 for I and B frames), but we additionally increment application atomic locked counter - ApplicationCounter. It happens in the CONSUMER thread after frame is decoded.
So created MediaFrame (LockedCounter ==0 or LockedCounter != 0 AND ApplicationCounter==1) just returned as data for another thread/threads.
So "free" means LockedCounter ==0 AND ApplicationCounter==0 in result

...and Pool returns "free" surface if

and ApplicationCounter==0 too. Both counters must be 0

This ensure guarantees that pool doesn't return surface to decode (for CONSUMER thread) if it is used in MediaFrame in another threads.
Right after another thread destroy MediaFrame then ApplicationCounter decreased to 0 which means full "free" surface and pool may returns it for CONSUMER thread.

Classic solution for pool was to keep a queue and make pop in and insert back by critical section. In current case we just use lightweight atomic.load() for the next surface in list to make sure that it is free by application. But with single consume limitation (which is applicable here for decode model)

MediaFrames may be destroyed concurrently, is it ok for this implementation?

yes, it is ok. It just decreased surface shared counter as shared_ptr<surface> and ApplicationCounter inside surface ( which helps surfrace became free in pool)

modules/gapi/src/streaming/onevpl/accelerators/surface/surface_pool.hpp

modules/gapi/CMakeLists.txt

sivanov-work · 2021-09-22T05:50:19Z

modules/gapi/src/streaming/onevpl/accelerators/accel_policy_cpu.cpp

+    for (auto& pair : pool_table) {
+        pair.second.clear();
+        // do not free key here: last surface will release it
+    }
+    pool_table.clear();


it does, I wanted to make sure that all pointers are released straight before print "destroyed".

But later the next situation was emerged when some data consumer was keeping referenced surface in MediaFrame. So clear for pool_table here doesn't means that all pointers released until last consumer takes MediaFrame.

So "destoyed" doesn't mean free and this row might be removed

sivanov-work · 2021-09-22T06:01:08Z

modules/gapi/src/streaming/onevpl/accelerators/accel_policy_cpu.cpp

+#endif
+
+    if (!preallocated_pool_memory_ptr) {
+        throw std::runtime_error("VPLCPUAccelerationPolicy::create_surface_pool - failed: not enough memory."


good question... I aimed to write it for WIN-only as discussed in feature request. But most part of other code was written without WIN-specific details and I assume it would work on Linux too. But it hasn't been tested

So there are two possible way:
-Spent time to test/fix Linux incompatible now
-or put message for Windows-only on source creation.

I suggest the second one. But! it maybe make sense to ask another team member for trying to set it up under Linux (with removing such incompatibles) to keep sharing code expertise. @dmatveev, what do you think?

sivanov-work · 2021-09-22T06:05:28Z

modules/gapi/src/streaming/onevpl/accelerators/accel_policy_cpu.cpp

+        GAPI_LOG_INFO(nullptr, "Released workspace memory: " << ptr);
+        ptr = nullptr;
+#else
+        abort(); //not implemented


agree, i suggest to put something up like

#else GAPI_Assert(false && "Not implemented for systems differ than \"_WIN32\". " "Please feel free to set it up under OPENCV contribution policy"); #endif

sivanov-work · 2021-09-22T06:23:59Z

modules/gapi/src/streaming/onevpl/accelerators/accel_policy_cpu.cpp

+    try {
+        for (; i < pool_size; i++) {
+            size_t preallocated_mem_offset = static_cast<size_t>(i) * surface_size_bytes;
+
+            surface_ptr_t surf_ptr = creator(workspace_mem_owner,
+                                             preallocated_mem_offset,
+                                             preallocated_raw_bytes);
+            pool.push_back(std::move(surf_ptr));
+        }
+    } catch (const std::exception& ex) {
+        throw std::runtime_error(std::string("VPLCPUAccelerationPolicy::create_surface_pool - ") +
+                                 "cannot construct surface index: " + std::to_string(i) + ", error: " +
+                                 ex.what() +
+                                 "Requested surface count: " + std::to_string(pool_size) +
+                                 ", surface bytes: " + std::to_string(surface_size_bytes));
+    }


From my perspective his method creates_surface_pool has the same semantics as operator new which normally throws exception std::bad_alloc to make for caller to take decision: decrease requested memory size or crash.
later we will introduce external params for configured pool size in optimization purposes (preallocated frames count due to SafeQueue size to avoid decoder stall when waiting till drawing thread or other consumer releases MediaFrames) and customization might required

Any exception handling within our module complicates the future debug

Not sure:

in the next PRs this part would be handled by caller side in initialize_session
https://github.com/sivanov-work/opencv/pull/4/files#diff-4abf50e2ded39bdb37c29822533abed9cd6395cc8b9826c667e3cd10af59c9b0R204
then
it would be handled by SourcePriv here
https://github.com/sivanov-work/opencv/pull/5/files#diff-d1526c45e93418aa18b64502a4e01cc25151be376b891015a7d9049ec876fcddR216
which would made all part for gracefull deinitialization: session close, mxf unload and so on with GAPI_LOG_WARNING printing

go on with customization (prev paragrraph)

"unit-testsable" for multiple scenarious

sivanov-work · 2021-09-22T07:51:19Z

modules/gapi/src/streaming/onevpl/accelerators/surface/surface_pool.cpp

+    next_free_it = surfaces.begin();
+}
+
+CachedPool::surface_ptr_t CachedPool::find_free() {


Pool considered as single-threaded. It contains constantly defined surface count and finds a first "free" one.
each pool is consumed (by key in accelerator table) by only one MFX session by requesting a "free" surface for the next decode frame portion.
Each MFX session makes async request to VPL for decode frame and provides "free" session from exclusive pool object (by key) . Looks like we can use only single thread to use up VPL decoder (CONSUMER thread).

"free" means LockedCounter == 0 in MFX surface inner structure it also means that this surface is not using by VPL decoder and we can use it for another decode or read-write it from application code. So Pool returns "free" surface if

it is not currently used by VPL decode
and...

Surface, as a data provider must be aggregated, by MediaFrame to allow read or write to it by application after MediaFrame::access, especially in another threads. So, when we create MediaFrame from fully decoded surface - the surface itself may or may not get LockedCounter ==0 (not 0 for I and B frames), but we additionally increment application atomic locked counter - ApplicationCounter. It happens in the CONSUMER thread after frame is decoded.
So created MediaFrame (LockedCounter ==0 or LockedCounter != 0 AND ApplicationCounter==1) just returned as data for another thread/threads.
So "free" means LockedCounter ==0 AND ApplicationCounter==0 in result

...and Pool returns "free" surface if

and ApplicationCounter==0 too. Both counters must be 0

This ensure guarantees that pool doesn't return surface to decode (for CONSUMER thread) if it is used in MediaFrame in another threads.
Right after another thread destroy MediaFrame then ApplicationCounter decreased to 0 which means full "free" surface and pool may returns it for CONSUMER thread.

Classic solution for pool was to keep a queue and make pop in and insert back by critical section. In current case we just use lightweight atomic.load() for the next surface in list to make sure that it is free by application. But with single consume limitation (which is applicable here for decode model)

modules/gapi/src/streaming/onevpl/accelerators/surface/surface_pool.cpp

sivanov-work · 2021-09-22T08:00:28Z

modules/gapi/src/streaming/onevpl/accelerators/surface/surface_pool.cpp

+                            [](const surface_ptr_t& val) {
+                    GAPI_DbgAssert(val && "Pool contains empty surface");
+                    return !val->get_locks_count();
+                });


circular buffer imitation

modules/gapi/src/streaming/onevpl/accelerators/surface/surface_pool.cpp

modules/gapi/src/streaming/onevpl/accelerators/surface/surface_pool.hpp

dmatveev

ok to me, stuff can be fixed with the next set of PRs

dmatveev · 2021-09-22T14:11:58Z

modules/gapi/src/streaming/onevpl/accelerators/accel_policy_dx11.hpp

+//TODO Remove the next MACRO right after DX11 implementation
+#define  CPU_ACCEL_ADAPTER


ok, let's avoid this macro-defined behavior in the future (if I get what's being addressed there)

dmatveev · 2021-09-22T14:12:50Z

modules/gapi/src/streaming/onevpl/accelerators/surface/surface_pool.cpp

+        next_free_it = it;
+        ++next_free_it;
+
+    return *it;


Formatting may look strange

sivanov-work · 2021-09-23T09:13:15Z

@alalek or @mshabunin could you please merge it?
There is an one unaddressed comment about DX11 linkage: https://github.com/opencv/opencv/pull/20727/files#r713648649
Is there any best known practices from your side?

G-API: oneVPL (simplification) added CPU, DX11(fallback CPU) accels & surface pool * Add CPU, DX11(fallback CPU) accels & surface pool * Fix build for surface_pool * Apply some comments * Fix indentation

Add CPU, DX11(fallback CPU) accels & surface pool

ae1fb3e

sivanov-work added category: g-api / gapi feature labels Sep 21, 2021

sivanov-work requested review from AsyaPronina, TolyaTalamanov, dmatveev, rgarnov and smirnov-alexey September 21, 2021 07:24

sivanov-work mentioned this pull request Sep 21, 2021

G-API: oneVPL (simplification) Implement CPU accel policies(DX11 decorated) sivanov-work/opencv#3

Closed

6 tasks

Fix build for surface_pool

fb09535

dmatveev reviewed Sep 21, 2021

View reviewed changes

sivanov-work commented Sep 22, 2021

View reviewed changes

Apply some comments

0ed0828

dmatveev approved these changes Sep 22, 2021

View reviewed changes

Fix indentation

31e533c

alalek assigned dmatveev Sep 23, 2021

alalek merged commit 54386c8 into opencv:master Sep 23, 2021

alalek mentioned this pull request Oct 15, 2021

(5.x) Merge 4.x #20886

Merged

sivanov-work deleted the merge_vpl_accel_impl branch March 5, 2022 05:49

		//TODO Remove the next MACRO right after DX11 implementation
		#define CPU_ACCEL_ADAPTER

Uh oh!

Conversation

sivanov-work commented Sep 21, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Readiness Checklist

Build configuration

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sivanov-work Sep 22, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sivanov-work Sep 22, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

dmatveev left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sivanov-work commented Sep 23, 2021

Uh oh!

Reviewers

Assignees

Labels

sivanov-work commented Sep 21, 2021 •

edited

Loading

sivanov-work Sep 22, 2021 •

edited

Loading

sivanov-work Sep 22, 2021 •

edited

Loading