Skip to content

TBB executor for GAPI#17170

Closed
anton-potapov wants to merge 8 commits intoopencv:masterfrom
anton-potapov:gapi_tbb_executor
Closed

TBB executor for GAPI#17170
anton-potapov wants to merge 8 commits intoopencv:masterfrom
anton-potapov:gapi_tbb_executor

Conversation

@anton-potapov
Copy link
Copy Markdown
Contributor

@anton-potapov anton-potapov commented Apr 28, 2020

This Merge Request adds and opt in TBB based executor to GAPI

  • opt in for now (default off)
  • only non-streaming mode (for now)
  • The executor uses priority queue based logic for ready to execution tasks

TBB executor description

TBB executor engine operate with graph of tasks represented via cv::gimpl::parallel::tile_node instances.

Dependencies

For every task (tile_node ) there might be a number of input dependencies (i.e. tasks that should be executed before). This number is represented twice in the tile_node structure. They are dependencies and dependency_count. First one describes total number (i.e. input edges in the graph before execution) and does not change during execution, while second one describes current number of unresolved input dependencies. Once dependency_count drops to zero, task is ready for execution as all it's dependencies are satisfied

Dependents

Every task that directly depends on the current one is listed in the tile_node::dependees member. The engine decrees dependency_count of these tasks right after the completion of the current task.

Sync Vs. Async payload

Task's body in enclosed in either task_body or async_task_body members depending on task's nature (and async flag). While in both cases the engine calls passed in function there are two differences:

  • async one accept the callback (from the engine), which should be called when asynchronous activity started by this function completes
  • While for synchronous payloads engine treat the task completed after the function returns, for asynchronous ones task treated completed once the callback is called.

Execution and Priorities

Once all input dependencies for the task are satisfied (i.e. it's dependency_count drops to zero) it is treated as ready for execution and is put into the "ready" queue. The queue (tbb::concurrent_priority_queue) sorts all task according to their priorities.

Mapping to TBB

TBB uses tbb::task derivatives to describe a piece of work. They can be linked together into a tree describing tasks dependencies. Every TBB task has a refcount field describing unsatisfied dependencies. Once it drops to zero task is ready for execution (i.e. can be spawned). Dependency tasks are called child tasks (i.e. parent depends on it's childs). Every child task on creation increases it's parent's refcount. After child task is executed TBB task scheduler automatically decrease parent's refcount. Root of tasks tree is used only to call it's tbb::task::wait_for_all method to wait for the CPU work to complete. wait_for_all waits for root refcount value to become 1 as a sign that all work is done.

The executor engine creates a root task (derived from tbb::task) - a root of the task's tree. Once new tile_node item is pushed to the ready queue, new root's child task is created. Body of every child task does the following

  1. Pops ready for execution tile_node instance from the priority queue
  2. Execute it's body
  3. Decrease dependency_count for every dependent tile_node instance (via tile_node::dependees member)
  4. Per every tile_node with dependency_count dropped to zero, extra (tbb) child task is created.

Asynchronicity work and TBB active waiting

While waiting for the root task, TBB does not put to sleep calling thread. Instead it keep searching for spawned (or enqueued) TBB tasks to do, assuming it somehow related with the root task.

How ether, in case of async work (like GPU or non TBB CPU work) there may be no such. In this case calling thread constantly spinning, wasting CPU time. To overcome this more elaborated scheme then plain calling to tbb::task::wait_for_all is needed.

tile_node graph creation and re-usage

  1. dependency_count is reset to dependencies value for easier reuse of tile_node

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under OpenCV (BSD) License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=ARMv7,Custom
build_image:Custom=centos:7
buildworker:Custom=linux-1
build_gapi_standalone:Custom=ade-0.1.1f

build_gapi_standalone:Linux x64=ade-0.1.1f
build_gapi_standalone:Win64=ade-0.1.1f
build_gapi_standalone:Mac=ade-0.1.1f
build_gapi_standalone:Linux x64 Debug=ade-0.1.1f

@anton-potapov anton-potapov force-pushed the gapi_tbb_executor branch 2 times, most recently from a7220f4 to 2e9cb58 Compare May 6, 2020 15:05
@anton-potapov anton-potapov requested review from dmatveev and rgarnov May 13, 2020 23:15
@anton-potapov anton-potapov added this to the 4.x milestone May 13, 2020
@asmorkalov
Copy link
Copy Markdown
Contributor

@anton-potapov Friendly reminder. Please take a look on CI failures.

- opt in for now (default off)
- enable it by default to otest on CI
- fix warnings
- fixed some typos in comments
- try to fix warnings
- use GAPI/OCV debugging facilities
- wait on condition variable only when there active async tasks
- use current arena instead of explicit one
- fix compilation error on TBB 4.4. U1
- Added custom GAPI_EXPECT_THROW and machinery to workaround missing
support for exact exception propogation in custom build tbb binaries
#define ITT_AUTO_TRACE_GUARD_IMPL(LINE, h) ITT_AUTO_TRACE_GUARD_IMPL_(LINE, h)
#define ITT_AUTO_TRACE_GUARD(h) ITT_AUTO_TRACE_GUARD_IMPL(__LINE__ , h)
}} //gimpl::parallel
} //namespace cv
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what happens if ITT is not available? It is not clear from all these #ifdefs what these declarations unfold to. Can you please somehow restructure it?

}
o<<"]";
return o;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It reminds me something...


# Executor
src/executor/gexecutor.cpp
src/executor/gapi_tbb_executor.cpp
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if gapi_ needs to be here

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe gtbbexecutor?

target_link_libraries(opencv_test_gapi PRIVATE ade)
endif()

if(USE_GAPI_TBB_EXECUTOR)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it shouldn't be CMake option. Let it compile always if there is TBB available.

To use or not to use it could be decided via our compile_args




#endif /* OPENCV_GAPI_GAPI_ITT_HPP */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also I'd recommend to drop gapi_ thing from here as well. Everything in modules/gapi is about G-API, but we don't have this prefix everywhere

//place in totally ordered queue of tasks to execute. Inverse to priority, i.e. lower index means higher priority
size_t total_order_index = 0;
std::function<void()> task_body;
std::function<void(std::function<void()> callback, size_t total_order_index)> async_task_body;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd give these types names like we do in the rest of our code.

run_op(op, m_res);
}

#endif
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of modifying the existing class, can we just put a new one on side? Then there's no changes, just additions to.

cv::GComputation comp(cv::GIn(in), cv::GOut(out));

EXPECT_THROW(comp.apply(in_mat, out_mat, cv::compile_args(pkg)), std::logic_error);
GAPI_EXPECT_THROW(comp.apply(in_mat, out_mat, cv::compile_args(pkg)), std::logic_error);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why?

EXPECT_THROW(expr, expected_exception); \
} else { \
EXPECT_ANY_THROW(expr); \
} \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be in precomp, I believe

// );

LOG_INFO(NULL, "Done. Executed " <<executed<<" tasks");
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

200-LOC functions shouldn't exist.

@asmorkalov
Copy link
Copy Markdown
Contributor

@anton-potapov Friendly reminder. How does the PR correlates with #17851 ?

@dmatveev
Copy link
Copy Markdown
Contributor

@anton-potapov probably we should close this one?

@asmorkalov
Copy link
Copy Markdown
Contributor

@anton-potapov Friendly reminder.

1 similar comment
@asmorkalov
Copy link
Copy Markdown
Contributor

@anton-potapov Friendly reminder.

@anton-potapov
Copy link
Copy Markdown
Contributor Author

@anton-potapov Friendly reminder. How does the PR correlates with #17851 ?

This patch was split into parts to speed up integration. #17851 is the first part - the executor itself. The actual usage it the next part.

So this pull request can be closed.

@asmorkalov asmorkalov closed this Aug 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants