Fix performance bug in MultiThreadedExecutor (hopefully)#1547
…g, this helped a lot with ros2#1452 Signed-off-by: Michael Tandy <git@mjt.me.uk>
From issue triage meeting: I put this on the agenda for the next Client Library Working Group meeting.
We had to use 0.01 instead of 0 for this fix to work. @wjwwood Not sure what the underlying issue is, but the CPU performance boost is incredible.
@LSchi-nexat Interesting, glad it's worked for you! For me, the sleep of zero worked OK. Can you tell me how many topics you're subscribed to, the message rates of the various topics, and whether you're doing any CPU-intensive processing of any of the messages? Using 0.01 (i.e. 10 milliseconds) of course means any messages arriving faster than 100 Hz might get dropped. So if you've got 1000 Hz topics, some of your CPU saving might be coming from just not processing a bunch of messages.
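To put rough numbers on the 100 Hz point, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the loop handles at most one message per iteration and that each iteration sleeps for the full period; the function names are hypothetical and this is not rclpy code.

```python
def max_wakeups_per_second(sleep_s: float) -> float:
    """Upper bound on loop iterations per second when every iteration sleeps sleep_s."""
    if sleep_s <= 0:
        return float("inf")  # a zero sleep imposes no rate limit of its own
    return 1.0 / sleep_s


def unhandled_per_second(topic_hz: float, sleep_s: float) -> float:
    """Messages per second that cannot be handled under the one-per-iteration assumption."""
    return max(0.0, topic_hz - max_wakeups_per_second(sleep_s))


if __name__ == "__main__":
    print(max_wakeups_per_second(0.01))        # ceiling with a 10 ms sleep
    print(unhandled_per_second(1000.0, 0.01))  # shortfall for a 1000 Hz topic
    print(unhandled_per_second(50.0, 0.01))    # a 50 Hz topic fits under the ceiling
```

Whether the excess messages are really dropped depends on the subscription's queue depth and QoS settings, which this sketch does not model.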
Discussion from Client Library Working Group. Based on research during the meeting, we are fairly sure that @michaeltandy's analysis is correct. Since this PR was opened, we have also landed some updates to the logic in the Executor base class here (#1469), which could potentially have an impact on the behavior. It may be that this thread yield is still needed, but before we land it, we are going to check whether #1469 has any impact. Actions:
I've tested, and @mjcarroll is right to say the current rolling branch has resolved the yield-the-same-task-repeatedly bug in 9695271e, providing a big boost in performance. However, there's still a further (modest) performance improvement available with the addition of the sleep(0).
As of 10.0.2
Mostly fixed by 9695271e
And slightly better again with the addition of the sleep(0)
Still an improvement, but not the three-orders-of-magnitude I claimed in the pull request.
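For readers unfamiliar with the idiom, here is a minimal, standalone sketch of a busy loop that optionally calls `time.sleep(0)` on every iteration. In CPython a zero-length sleep gives waiting threads a chance to acquire the GIL. The function name and structure are hypothetical; where exactly the real change sits in the executor is not shown here.

```python
import threading
import time
from typing import Tuple


def spin_until_done(yield_each_iteration: bool, timeout_s: float = 5.0) -> Tuple[bool, int]:
    """Busy-wait until a worker thread signals completion.

    Returns (worker_finished, loop_iterations). The iteration count varies
    from run to run, so it is reported only for comparison by eye.
    """
    done = threading.Event()
    worker = threading.Thread(target=done.set)  # the "work" is just setting the flag
    iterations = 0
    deadline = time.monotonic() + timeout_s
    worker.start()
    while not done.is_set() and time.monotonic() < deadline:
        iterations += 1
        if yield_each_iteration:
            time.sleep(0)  # hand the GIL to any thread that is waiting for it
    worker.join()
    return done.is_set(), iterations


if __name__ == "__main__":
    print("without yield:", spin_until_done(False))
    print("with yield:   ", spin_until_done(True))
```

Both variants finish, because CPython also switches threads on a timer. The yield only shortens how long the worker may have to wait, which is consistent with the modest gain reported above.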
fujitatomoya
left a comment
@michaeltandy thanks a lot for sharing the information. i believe your analysis is correct, and this PR fixes that problematic situation in the MultiThreadedExecutor.
Signed-off-by: Michael Tandy <git@mjt.me.uk>
Pulls: #1547
Signed-off-by: Michael Tandy <git@mjt.me.uk> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>
) * Fix warnings from gcc. (#1501) Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Update type_support to use new abcs Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Cleanup old test cases to use new automatic inference Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Add content-filtered-topic interfaces (#1506) Signed-off-by: Barry Xu <Barry.Xu@sony.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Added lock to protect futures for multithreaded executor (#1477) Signed-off-by: brennanmk <brennanmk2200@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * EventsExecutor: Handle async callbacks for services and subscriptions (#1478) Closes #1473 Signed-off-by: Brad Martin <bmartin@fatlxception.org> Co-authored-by: Brad Martin <bmartin@fatlxception.org> Co-authored-by: Alejandro Hernandez Cordero <ahcorde@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * add spinning state for the Executor classes. (#1510) Signed-off-by: Tomoya.Fujita <tomoya.fujita825@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Fixes Action.*_async futures never complete (#1308) Per rclpy:1123 If two seperate client server actions are running in seperate executors the future given to the ActionClient will never complete due to a race condition This fixes the calls to rcl handles potentially leading to deadlock scenarios by adding locks to there references Co-authored-by: Aditya Agarwal <aditya.kgp25@gmail.com> Co-authored-by: Jonathan Blixt <jmblixt3@gmail.com> Signed-off-by: Jonathan Blixt <jmblixt3@gmail.com> Co-authored-by: Alejandro Hernandez Cordero <ahcorde@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * remove unused 'param_type' (#1524) 'param_type' is set but never used Signed-off-by: Christian Rauch <Christian.Rauch@unileoben.ac.at> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Changelog Signed-off-by: Alejandro Hernandez Cordero <ahcorde@gmail.com> Signed-off-by: Michael Carlstrom 
<rmc@carlstrom.com> * 10.0.1 Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Remove duplicate future handling from send_goal_async (#1532) A recent change intended to move this logic into a lock context, but actually ended up duplicating it instead. This fixes that by removing the duplicated logic outside of the lock. It also preserves the explicit typing annotation on the future. Signed-off-by: Nathan Wiebe Neufeldt <wn.nathan@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * fix(test_events_executor): destroy all nodes before shutdown (#1538) Signed-off-by: yuanyuyuan <az6980522@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * add BaseImpl Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Add ImplT Support Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * fix changelong Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Remove accidental tuple (#1542) Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Allow action servers without execute callback (#1219) Signed-off-by: Tim Clephas <tim.clephas@nobleo.nl> * add : get clients, servers info (#1307) Signed-off-by: Minju, Lee <dlalswn531@naver.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * 10.0.2 Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * update tests Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * ParameterEventHandler support ContentFiltering (#1531) * ParameterEventHandler support ContentFiltering Signed-off-by: Barry Xu <barry.xu@sony.com> * Address review comments Signed-off-by: Barry Xu <barry.xu@sony.com> --------- Signed-off-by: Barry Xu <barry.xu@sony.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Fix issues with resuming async tasks awaiting a future (#1469) Signed-off-by: Błażej Sowa <bsowa123@gmail.com> Signed-off-by: Nadav Elkabets <elnadav12@gmail.com> Co-authored-by: Nadav Elkabets <32939935+nadavelkabets@users.noreply.github.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * 10.0.3 
Signed-off-by: Michael Carroll <mjcarroll@intrinsic.ai> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Increase clock accuracy (#1564) Signed-off-by: Florian Vahl <git@flova.de> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Use unconditional wait when possible. (#1563) Previously the max() value of the steady time was used as the default deadline. In some environments this results in overflows in the underlying pthread_cond_timedwait call, which waits for the conditional variable in the events queue implementation. Consequently, this lead to freezes in the executor. Reducing the deadline significantly helped, but using `cv.wait` instead of `cv_.wait_until` seems to be the cleaner solution. Signed-off-by: Florian Vahl <florian.vahl@dlr.de> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Remove default from switch with enum, so that compiler warns. (#1566) Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Fix parameter parsing for unspecified target nodes (#1552) Signed-off-by: Barry Xu <barry.xu@sony.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Improve the compatibility of processing YAML parameter files (#1548) Signed-off-by: Barry Xu <barry.xu@sony.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Improve wildcard parsing and optimize the logic for parsing YAML para… (#1571) Signed-off-by: Barry Xu <barry.xu@sony.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Expose action graph functions as Node class methods. (#1574) * Expose action graph functions as Node class methods. Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com> * address review comments to keep the warning consistent. 
Signed-off-by: Tomoya.Fujita <Tomoya.Fujita@sony.com> --------- Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com> Signed-off-by: Tomoya.Fujita <Tomoya.Fujita@sony.com> * Fix performance bug in MultiThreadedExecutor (hopefully) (#1547) Signed-off-by: Michael Tandy <git@mjt.me.uk> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Changelog Signed-off-by: Alejandro Hernandez Cordero <ahcorde@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * 10.0.4 Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * use Msg over BaseMessage Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Use Srv over BaseService Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Use Action over BaseAction Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * lint Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Update rclpy/rclpy/type_support.py Co-authored-by: Christophe Bedard <bedard.christophe@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> --------- Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> Signed-off-by: Barry Xu <Barry.Xu@sony.com> Signed-off-by: brennanmk <brennanmk2200@gmail.com> Signed-off-by: Brad Martin <bmartin@fatlxception.org> Signed-off-by: Tomoya.Fujita <tomoya.fujita825@gmail.com> Signed-off-by: Christian Rauch <Christian.Rauch@unileoben.ac.at> Signed-off-by: Alejandro Hernandez Cordero <ahcorde@gmail.com> Signed-off-by: Nathan Wiebe Neufeldt <wn.nathan@gmail.com> Signed-off-by: yuanyuyuan <az6980522@gmail.com> Signed-off-by: Tim Clephas <tim.clephas@nobleo.nl> Signed-off-by: Minju, Lee <dlalswn531@naver.com> Signed-off-by: Barry Xu <barry.xu@sony.com> Signed-off-by: Błażej Sowa <bsowa123@gmail.com> Signed-off-by: Nadav Elkabets <elnadav12@gmail.com> Signed-off-by: Michael Carroll <mjcarroll@intrinsic.ai> Signed-off-by: Florian Vahl <git@flova.de> Signed-off-by: Florian Vahl <florian.vahl@dlr.de> Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com> Signed-off-by: Tomoya.Fujita <Tomoya.Fujita@sony.com> 
Signed-off-by: Michael Tandy <git@mjt.me.uk> Co-authored-by: Chris Lalancette <clalancette@gmail.com> Co-authored-by: Barry Xu <barry.xu@sony.com> Co-authored-by: Brennan Miller-Klugman <55165406+brennanmk@users.noreply.github.com> Co-authored-by: Brad Martin <52003535+bmartin427@users.noreply.github.com> Co-authored-by: Brad Martin <bmartin@fatlxception.org> Co-authored-by: Alejandro Hernandez Cordero <ahcorde@gmail.com> Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com> Co-authored-by: Jonathan <jmblixt3@gmail.com> Co-authored-by: Christian Rauch <Christian.Rauch@unileoben.ac.at> Co-authored-by: Nathan Wiebe Neufeldt <wn.nathan@gmail.com> Co-authored-by: Yuyuan Yuan <az6980522@gmail.com> Co-authored-by: Tim Clephas <tim.clephas@nobleo.nl> Co-authored-by: Minju, Lee <70446214+leeminju531@users.noreply.github.com> Co-authored-by: Błażej Sowa <bsowa123@gmail.com> Co-authored-by: Nadav Elkabets <32939935+nadavelkabets@users.noreply.github.com> Co-authored-by: Michael Carroll <mjcarroll@intrinsic.ai> Co-authored-by: Florian Vahl <git@flova.de> Co-authored-by: Michael Tandy <git@mjt.me.uk> Co-authored-by: Christophe Bedard <bedard.christophe@gmail.com>
Description
Fixes #1452
This one-line change should improve the performance of MultiThreadedExecutor.
- `_wait_for_ready_callbacks` yields, among other things, tasks that are in progress, `if (not task.executing() and not task.done()`
- In `SingleThreadedExecutor` this logic works fine - because yielded tasks are always completed before the next call to `_wait_for_ready_callbacks`
- In `MultiThreadedExecutor` yielded tasks are merely submitted to a threadpool before the next call to `_wait_for_ready_callbacks` - there's no guarantee they'll have started executing. A task that's waiting in the threadpool queue can be returned and enqueued again.
- In that situation `_wait_for_ready_callbacks` returns quickly, rather than waiting.
- `task.py`'s `__call__` method includes a check - `if (not self._pending() or self._executing): return` - so the task doesn't actually run beyond that check repeatedly, despite being enqueued repeatedly.
- While the `spin()` thread holds the GIL, the thread pool executor's workers can't even begin to make progress.

Running Zachary's example from #1452 yields the following difference:
Before
22.503s/30.127s = 74.7% of a CPU core used, on average.
After
0.842s/30.127s = 2.8% of a CPU core used, on average.
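The re-enqueue behaviour described in the analysis above can be modelled with a small single-threaded toy. This is a sketch under stated assumptions, not rclpy's implementation: `ToyTask`, `ready_tasks` and `simulate` are hypothetical names, and the "worker" is simulated by draining a queue after the spin loop has run a given number of times.

```python
from collections import deque


class ToyTask:
    """Hypothetical stand-in for an executor task; not rclpy's Task class."""

    def __init__(self):
        self.executing = False
        self.done = False
        self.calls = 0      # how often a worker invoked the task
        self.real_runs = 0  # how often the body actually ran

    def __call__(self):
        self.calls += 1
        # Guard modelled on the check quoted above: skip if finished or already running.
        if self.done or self.executing:
            return
        self.executing = True
        self.real_runs += 1
        self.executing = False
        self.done = True


def ready_tasks(tasks):
    """Return tasks that are neither executing nor done (queued tasks still qualify)."""
    return [t for t in tasks if not t.executing and not t.done]


def simulate(spin_iterations_before_worker_runs: int):
    """Enqueue ready tasks on each spin iteration, then let one worker drain the queue."""
    task = ToyTask()
    queue = deque()
    for _ in range(spin_iterations_before_worker_runs):
        queue.extend(ready_tasks([task]))  # the queued task is offered again each time
    while queue:
        queue.popleft()()
    return task.calls, task.real_runs


if __name__ == "__main__":
    print(simulate(1))     # (1, 1): one enqueue, one real run
    print(simulate(1000))  # (1000, 1): 999 wasted enqueues and guard checks
```

In this model the task body runs once no matter how often it is enqueued. The wasted work is the enqueueing and the guard checks, which matches the kind of CPU burn measured in the Before figure.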
Is this user-facing behavior change?
No change.
Did you use Generative AI?
No generative AI was used.
Additional Information
Diff if you want the logging that revealed this issue to me:
As a test I ran the basic publisher and subscriber from the documentation but with the publisher using `timer_period = 0.02` and the subscriber using a `MultiThreadedExecutor`:
Here's an excerpt of the logs, before the fix is applied: