G-API: oneVPL merge DX11 acceleration#21049
Conversation
766f801 to
f3eb0a3
Compare
mpashchenkov
left a comment
There was a problem hiding this comment.
First wave (stupid comments).
| @@ -157,19 +154,26 @@ GAPI_OCV_KERNEL_ST(RenderFrameOCVImpl, cv::gapi::wip::draw::GRenderFrame, Render | |||
| * | |||
| */ | |||
There was a problem hiding this comment.
Is it necessary to align comment with other code in {}?
| switch (FourCC) { | ||
| case MFX_FOURCC_I420: | ||
| case MFX_FOURCC_NV12: | ||
| nbytes = width * height + 2 * half_width * half_height; |
There was a problem hiding this comment.
| nbytes = width * height + 2 * half_width * half_height; | |
| nbytes = width * height + 2 * half_width * half_height; |
| } | ||
|
|
||
| surface_ptr_t create_surface_RGB4_(mfxFrameInfo frameInfo, | ||
| std::shared_ptr<void> out_buf_ptr, |
There was a problem hiding this comment.
One more space for parameters.
| } | ||
|
|
||
| surface_ptr_t create_surface_other_(mfxFrameInfo frameInfo, | ||
| std::shared_ptr<void> out_buf_ptr, |
|
|
||
| void VPLDX11AccelerationPolicy::init(session_t session) { | ||
| mfxStatus sts = MFXVideoCORE_GetHandle(session, MFX_HANDLE_D3D11_DEVICE, reinterpret_cast<mfxHDL*>(&hw_handle)); | ||
| mfxStatus sts = MFXVideoCORE_SetHandle(session, MFX_HANDLE_D3D11_DEVICE, (mfxHDL) hw_handle); |
There was a problem hiding this comment.
(mfxHDL)
Why is it not some_cast<>()?
There was a problem hiding this comment.
some kind of copy-paste from public VPL examples. I'll fix it later
| auto read_work = [&lock, &shared_value](size_t count) { | ||
|
|
||
| auto old_shared_value = shared_value; | ||
| for (size_t i = 0; i < count; i ++) { |
There was a problem hiding this comment.
| for (size_t i = 0; i < count; i ++) { | |
| for (size_t i = 0; i < count; i++) { |
|
|
||
| // TEST invariant: last_visitor_id MUST be one from any LATTER worker visitor_id | ||
| bool one_of_available_ids_matched = false; | ||
| for (size_t id = 0; id < max_worker_count; id ++) { |
There was a problem hiding this comment.
| for (size_t id = 0; id < max_worker_count; id ++) { | |
| for (size_t id = 0; id < max_worker_count; id++) { |
| if (adaptee.owns()) { | ||
| w_lock_counter ++; | ||
| } else { | ||
| r_lock_counter ++; |
There was a problem hiding this comment.
Remove extra space before increment ++ 648 and 650
| if (adaptee.owns()) { | ||
| w_unlock_counter ++; | ||
| } else { | ||
| r_unlock_counter ++; |
There was a problem hiding this comment.
Remove extra space before increment ++ 657 and 659
|
|
||
| mfxFrameData data; | ||
| const int exec_count = 123; | ||
| for (int i = 0; i < exec_count; i ++) { |
There was a problem hiding this comment.
| for (int i = 0; i < exec_count; i ++) { | |
| for (int i = 0; i < exec_count; i++) { |
| throw std::runtime_error("VPLDX11AccelerationPolicy::create_surface_pool() is not implemented"); | ||
| // allocate textures by explicit request | ||
| mfxFrameAllocResponse mfxResponse; | ||
| //TODO |
There was a problem hiding this comment.
i think it must be removed
| desc.MiscFlags = 0; | ||
| std::vector<ID3D11Texture2D*> staging_textures; | ||
| staging_textures.reserve(request->NumFrameSuggested); | ||
| for (int i = 0; i < request->NumFrameSuggested; i ++ ) { |
There was a problem hiding this comment.
| for (int i = 0; i < request->NumFrameSuggested; i ++ ) { | |
| for (int i = 0; i < request->NumFrameSuggested; i++ ) { |
| pitch, | ||
| pitch / 2, | ||
| pitch / 2, 0u | ||
| }; |
| auto work = [&lock, &shared_value](size_t count) { | ||
| for (size_t i = 0; i < count; i ++) { | ||
| lock.lock(); | ||
| shared_value ++; |
There was a problem hiding this comment.
| shared_value ++; | |
| shared_value++; |
| }; | ||
|
|
||
| std::thread worker_thread([&barrier, sync, work] () { | ||
|
|
|
|
||
| std::function<mfxStatus(mfxMemId, mfxFrameData *)> lock = | ||
| [&lock_counter] (mfxMemId, mfxFrameData *) { | ||
| lock_counter ++; |
There was a problem hiding this comment.
| lock_counter ++; | |
| lock_counter++; |
|
|
||
| mfxFrameData data; | ||
| const int exec_count = 123; | ||
| for (int i = 0; i < exec_count; i ++) { |
There was a problem hiding this comment.
| for (int i = 0; i < exec_count; i ++) { | |
| for (int i = 0; i < exec_count; i++) { |
| bool pred_excl = exclusive_lock.load(); | ||
| do { | ||
| if (!pred_excl) { | ||
| // if no exclusive lock than start shared lock transaction |
modules/gapi/src/streaming/onevpl/accelerators/accel_policy_cpu.cpp
Outdated
Show resolved
Hide resolved
| VPLDX11AccelerationPolicy(device_selector_ptr_t selector); | ||
| ~VPLDX11AccelerationPolicy(); | ||
|
|
||
| using pool_t = CachedPool; |
There was a problem hiding this comment.
Can this using be a part of the parent class, or it is specific for cpu & dx11 only?
There was a problem hiding this comment.
got it,
when new policies would be intorduced i suggest to make light refactoring here
modules/gapi/src/streaming/onevpl/accelerators/accel_policy_dx11.cpp
Outdated
Show resolved
Hide resolved
modules/gapi/src/streaming/onevpl/accelerators/accel_policy_dx11.cpp
Outdated
Show resolved
Hide resolved
modules/gapi/src/streaming/onevpl/accelerators/accel_policy_dx11.cpp
Outdated
Show resolved
Hide resolved
modules/gapi/src/streaming/onevpl/accelerators/accel_policy_dx11.cpp
Outdated
Show resolved
Hide resolved
|
Looks good for me. Great work! |
|
Please configure builders with enabled oneVPL support (see previous oneVPL-related PRs) |
modules/gapi/src/streaming/onevpl/accelerators/accel_policy_dx11.cpp
Outdated
Show resolved
Hide resolved
modules/gapi/src/streaming/onevpl/accelerators/accel_policy_dx11.cpp
Outdated
Show resolved
Hide resolved
modules/gapi/src/streaming/onevpl/accelerators/dx11_alloc_resource.cpp
Outdated
Show resolved
Hide resolved
modules/gapi/src/streaming/onevpl/accelerators/dx11_alloc_resource.hpp
Outdated
Show resolved
Hide resolved
| return MFX_ERR_NONE; | ||
| } | ||
|
|
||
| DX11AllocationRecord::DX11AllocationRecord() = default; |
There was a problem hiding this comment.
can this be declared in .hpp?
modules/gapi/src/streaming/onevpl/accelerators/utils/shared_lock.cpp
Outdated
Show resolved
Hide resolved
modules/gapi/src/streaming/onevpl/accelerators/utils/shared_lock.cpp
Outdated
Show resolved
Hide resolved
| std::this_thread::yield(); | ||
| } | ||
| prev_shared = shared_counter.load(); | ||
| } while (prev_shared != 0 || !in_progress); |
There was a problem hiding this comment.
Did I understand correctly that when in_progress == true (and in the end of cycle body shared_counter == 0, execution should go out of cycle?
Then why it is named in_progress? I would expect that if some process is in progress, it means it should remain cycling.
More likely this variable should have been called like complete, because it means that the current action is complete and execution should move forward.
The same is for the method shared_lock above
There was a problem hiding this comment.
it's locking with rollback in failed case:
When i'm write thread and want to make exclusive lock, then when i get current shared_counter and make sure no other competitors is here (shared_counter == 0) (---1----) then i will make exclusive_lock (----2----) and flag operation as in progress. Starting from this moment no one reader thread cannot take shared access because bool pred_excl = exclusive_lock.load(); will fail for them.
But let's imaging situation when after check (---1---) and right before (---2----) one of the competitors reader thread would be succeeded in obtaining shared lock... So when i finish operation (---2---) it is more readers thread obtained lock already... So i need to rolled back my (---2---) operation in this case and wait until all readers complete their works before continue...
So in_progress is using as flag that i must rollback extra exclusive lock when the second condition fails. And must proceed when both condition true: is_progress == true && `shared_lock ==0)
There was a problem hiding this comment.
That's what I meant - in the end, in_progress == true, but you must proceed further. The comment is regarding naming. "In progress" itself means you are in the middle of some process and you cannot move further. It doesn't match with the meaning of this variable
If to suggest, this variable means that write access is gained, so it can be named like gained, succeeded; or, better, meaningful locked, because exclusive_lock was set true, but in the case you've described it should be rolled back. But it is locked that time.
There was a problem hiding this comment.
this variable cannot be named as gained, succeeded of locked because resource is not locked until we finished final while. It actually means: we are in the middle of progress something and we must not make some assumption about succeed or not until we check the second condition shared_lock ==0
So in_progress means: I increased exclusive lock counter but i'm not sure that i'm allowed to continue with exclusive access until i check the second condition.
This is sync with using two independent atomic states: and we cannot proceed it both of them wouldn't reach the right state
| mfx_param_0.Data.U32 = provider_ptr->get_mfx_codec_id(); | ||
| EXPECT_EQ(MFXSetConfigFilterProperty(cfg_inst_0,(mfxU8 *)"mfxImplDescription.mfxDecoderDescription.decoder.CodecID", | ||
| EXPECT_EQ(MFXSetConfigFilterProperty(cfg_inst_0,(mfxU8 *)CfgParam::decoder_id_name(), | ||
| mfx_param_0), MFX_ERR_NONE); |
There was a problem hiding this comment.
Please, remove the C-style cast
| auto args = cv::compile_args(use_only{pkg}); | ||
| if (cap) { | ||
| args += cv::compile_args(streaming::queue_capacity{cap.value()}); | ||
| args += cv::compile_args(cv::gapi::streaming::queue_capacity{cap.value()}); |
There was a problem hiding this comment.
Then, maybe using namespace cv::gapi; should be removed 3 lines upper
modules/gapi/src/streaming/onevpl/cfg_param_device_selector.cpp
Outdated
Show resolved
Hide resolved
modules/gapi/src/streaming/onevpl/cfg_param_device_selector.cpp
Outdated
Show resolved
Hide resolved
| decRequest.NumFrameSuggested = preallocated_frames_count; | ||
| if (static_cast<size_t>(std::numeric_limits<mfxU16>::max()) < preallocated_frames_count) { | ||
| GAPI_LOG_WARNING(nullptr, "Cannot proceed with CfgParam \"" << CfgParam::frames_pool_size_name() << "\": " << | ||
| preallocated_frames_count << ". It must not be equal than " << |
There was a problem hiding this comment.
must not be greater I believe
| ocv_target_link_libraries(opencv_test_gapi PRIVATE ${VPL_IMPORTED_TARGETS}) | ||
| if(MSVC) | ||
| target_compile_options(opencv_test_gapi PUBLIC "/wd4201") | ||
| endif() |
There was a problem hiding this comment.
You've rewritten includes via pragma in tests, too - is it still necessary?
OrestChura
left a comment
There was a problem hiding this comment.
Thank you for all the efforts!
mpashchenkov
left a comment
There was a problem hiding this comment.
I didn't find places with possible nulls and going out of memory. I'm give up.
| static_cast<DX11AllocationItem::subresource_id_t>( | ||
| reinterpret_cast<uint64_t>( | ||
| reinterpret_cast<DX11AllocationItem::subresource_id_t *>( | ||
| handle.second)))}};//, |
There was a problem hiding this comment.
| handle.second)))}};//, | |
| handle.second)))}}; |
| #if defined(_MSC_VER) | ||
| #pragma warning(push) | ||
| #pragma warning(disable : 4201) | ||
| #pragma warning(disable : 4302) | ||
| #pragma warning(disable : 4311) | ||
| #pragma warning(disable : 4312) | ||
| #endif // defined(_MSC_VER) |
There was a problem hiding this comment.
Why is not:
if(MSVC)
if(HAVE_GAPI_ONEVPL) <-- I'm not sure about the flag
ocv_warnings_disable(CMAKE_CXX_FLAGS /wd4201 /wd4302 /wd4311 /wd4312)
endif()
in CMakeLists.txt?
One place of disabling can reduce problems in the future.
There was a problem hiding this comment.
Because we should not disable warnings anywhere.
Problems with warnings should be suppressed locally.
|
if no other comments are coming then I assume to apply them in the next PRs:
|
| ocv_target_compile_definitions(opencv_test_gapi PRIVATE -DHAVE_ONEVPL) | ||
| ocv_target_link_libraries(opencv_test_gapi PRIVATE ${VPL_IMPORTED_TARGETS}) | ||
| if(MSVC) | ||
| target_compile_options(opencv_test_gapi PUBLIC "/wd4201") |
There was a problem hiding this comment.
Shouldn't this be PRIVATE?
| }; | ||
|
|
||
| if (!type.empty()) { | ||
| cfg_params.push_back(CfgParam::create_decoder_id(type.c_str())); |
There was a problem hiding this comment.
BTW this looks odd from the user perspective - why do you ask users to provide you with c_str() while there's std::string ? Is that char* thing copied or should stay alive - for how long?
There was a problem hiding this comment.
to retain public interface consistent (maybe for future): because STL classes are not ABI compatible in general
| namespace wip { | ||
| namespace onevpl { | ||
| namespace utils { | ||
| mfxU32 GetSurfaceSize_(mfxU32 FourCC, mfxU32 width, mfxU32 height) { |
There was a problem hiding this comment.
alignment is still not counted here right?
note sometimes planes are aligned by up to 4096
There was a problem hiding this comment.
it was get from VPL examples, may be we could add it when faces with the issue later
| surface_ptr_t create_surface_RGB4_(mfxFrameInfo frameInfo, | ||
| std::shared_ptr<void> out_buf_ptr, | ||
| size_t out_buf_ptr_offset, | ||
| size_t out_buf_size) |
There was a problem hiding this comment.
It may be a tiny detail but how this function is supposed to be used?
I see it creates something and returns, but that's the out_buf_ptr then?
There was a problem hiding this comment.
it is shared across surfaces: each surface memory is sliced from out_buf_ptr
So all surfaces have reference to that buf and mapped on that buffer memory by offset
Referencing is required to keep out_buf_ptr alive when alive at least one surface
| // TODO more intelligent check | ||
| if (out_buf_size <= | ||
| out_buf_ptr_offset + (surfW * surfH) + ((surfW / 2) * (surfH / 2))) { | ||
| GAPI_LOG_WARNING(nullptr, "Not enough buffer, ptr: " << out_buf_ptr << |
There was a problem hiding this comment.
-> Insufficient buffer size?
There was a problem hiding this comment.
TODO in next PR
| if (mode == MediaFrame::Access::R) { | ||
| alloc_data->unlock_read(mid, data); | ||
| } else { | ||
| alloc_data->unlock_write(mid, data); |
There was a problem hiding this comment.
so can this be a plain abstract rwlock (e.g. like one from boost or pthreads)? Or something differs in the access at the VPL level?
There was a problem hiding this comment.
adapter is quite different, but shared_mutex inside it should be reused from official sources further
P.S. is_own_locked can be put on upper level to do not customize official shared_mutex
P.P.S. official c++17 shared_mutex is mutex-based with wait-lock, but tbb has imple,entation with rw_spinlock to make busy wait
| namespace onevpl { | ||
|
|
||
| template<typename Impl> | ||
| class elastic_barrier { |
There was a problem hiding this comment.
Is the idea of this structure described anywhere?
There was a problem hiding this comment.
(I'd expect to see it here)
There was a problem hiding this comment.
May i add in the next PR such description?
In short overview:
pass only single one thread (the first one) to make some initialization: resource acquisition - another will waiting for the first thread job completion.
And make final deinitialization: resource release - only for last one( if no other threads hold resource or intendent to acquire it)
Some kind of double check lock patter in Gstreamer MediaResource Adapter - but support more scenarios( and more efficient?)
| } | ||
| bool SharedLock::owns() const { | ||
| return exclusive_lock.load(); | ||
| } |
There was a problem hiding this comment.
why do we need our own shared lock here?
There was a problem hiding this comment.
yes it is, but this can be easily emulated on upper level in LockAdapter
| namespace wip { | ||
| namespace onevpl { | ||
|
|
||
| class GAPI_EXPORTS SharedLock { |
There was a problem hiding this comment.
is it modelled after the C++14's shared_lock?
There was a problem hiding this comment.
yes, homegrown solution...
| return ret; | ||
| } | ||
|
|
||
| size_t strtoull_or_throw(const char* str) { |
There was a problem hiding this comment.
to satisty c-string in interface functions (to make interface ABI consistent)
|
@alalek Could you merge it please? Non addressed comment would be addressed at the next PR |
G-API: oneVPL merge DX11 acceleration * Merge DX11 initial * Fold conditions row in MACRO in utils * Inject DeviceSelector * Turn on DeviceSelector in DX11 * Change sharedLock logic & Move FMT checking in FrameAdapter c-tor * Move out NumSuggestFrame to configure params * Drain file source fix * Fix compilation * Force zero initializetion of SharedLock * Fix some compiler warnings * Fix integer comparison warnings * Fix integers in sample * Integrate Demux * Fix compilation * Add predefined names for some CfgParam * Trigger CI * Fix MultithreadCtx bug, Add Dx11 GetBlobParam(), Get rif of ATL CComPtr * Fix UT: remove unit test with deprecated video from opencv_extra * Add creators for most usable CfgParam * Eliminate some warnings * Fix warning in GAPI_Assert * Apply comments * Add VPL wrapped header with MSVC pragma to get rid of global warning masking
Add DX11 allocation resources: surfaces
Implement actual DX11 acceleration with MFX allocator
Reuse IDeviceSelector interface for selecting device in DX11 accelerator policy
Add
shared_lock&elastics_barrierto work out R-W pattern & provide lock-free access to MediaFrame data.Add unit tests for lock-free primitives
Improve decode pipeline: add yield on queue decode operations for used up busy surfaces
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.
Build Configuration