Fix issue with MultiThreadedExecutor by adding lock to protect _futures#1477
Fix issue with MultiThreadedExecutor by adding lock to protect _futures#1477ahcorde merged 1 commit intoros2:rollingfrom
Conversation
fujitatomoya
left a comment
There was a problem hiding this comment.
the change looks good to me to exclude the racy condition.
IMO, the right thing to do here is to add the spinning status for the Executor to check if it is already spinning just like rclcpp::Executor. looks like Executor of rclpy does not have this status at all.
|
i would like to have 2nd review for this before starting CI. |
|
@brennanmk Is there a minimum reproducible example for this? |
|
Ah, had some trouble but just had to be more persistent. Looks good to me! |
|
Actually, since I removed the creation of a shallow copy in the loop I think it makes sense to also lock the self._futures append call on line 1005. Otherwise it is possible that one thread appends to self._futures while another is in the loop. I do not think this would cause any observable errors, but it is still probably best practice. Alternatively I could add the creation of a shallow copy back, but this seems unnecessary. It might make sense to do what @fujitatomoya discussed above and instead create a "spinning" variable like in rclcpp. Although this would not be quite as clean as in cpp because we would still need to add a lock to make the spinning variable atomic. It might make sense if we want to raise an exception (or warning) if _spin_once_impl is called multiple times. Happy to make this change if it would be beneficial. |
|
Pulls: #1477 |
|
@brennanmk thanks for the effort and explanation. for adding the spinning check, probably it would be better to have consensus to implement that. besides, this PR does make sense. i say this is good to go with 2nd approval from other maintainers. |
|
Hey there, What's needed to have this change merged? |
|
Hey @fujitatomoya |
|
@Mergifyio rebase |
Signed-off-by: brennanmk <brennanmk2200@gmail.com>
✅ Branch has been successfully rebased |
34e4063 to
699452f
Compare
|
@szobov waiting for the another approval from maintainers. |
|
@mjcarroll friendly ping. |
|
Pulls: #1477 |
|
Pulls: #1477 |
Signed-off-by: brennanmk <brennanmk2200@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>
Signed-off-by: brennanmk <brennanmk2200@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>
) * Fix warnings from gcc. (#1501) Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Update type_support to use new abcs Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Cleanup old test cases to use new automatic inference Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Add content-filtered-topic interfaces (#1506) Signed-off-by: Barry Xu <Barry.Xu@sony.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Added lock to protect futures for multithreaded executor (#1477) Signed-off-by: brennanmk <brennanmk2200@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * EventsExecutor: Handle async callbacks for services and subscriptions (#1478) Closes #1473 Signed-off-by: Brad Martin <bmartin@fatlxception.org> Co-authored-by: Brad Martin <bmartin@fatlxception.org> Co-authored-by: Alejandro Hernandez Cordero <ahcorde@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * add spinning state for the Executor classes. (#1510) Signed-off-by: Tomoya.Fujita <tomoya.fujita825@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Fixes Action.*_async futures never complete (#1308) Per rclpy:1123 If two seperate client server actions are running in seperate executors the future given to the ActionClient will never complete due to a race condition This fixes the calls to rcl handles potentially leading to deadlock scenarios by adding locks to there references Co-authored-by: Aditya Agarwal <aditya.kgp25@gmail.com> Co-authored-by: Jonathan Blixt <jmblixt3@gmail.com> Signed-off-by: Jonathan Blixt <jmblixt3@gmail.com> Co-authored-by: Alejandro Hernandez Cordero <ahcorde@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * remove unused 'param_type' (#1524) 'param_type' is set but never used Signed-off-by: Christian Rauch <Christian.Rauch@unileoben.ac.at> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Changelog Signed-off-by: Alejandro Hernandez Cordero <ahcorde@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * 10.0.1 Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Remove duplicate future handling from send_goal_async (#1532) A recent change intended to move this logic into a lock context, but actually ended up duplicating it instead. This fixes that by removing the duplicated logic outside of the lock. It also preserves the explicit typing annotation on the future. Signed-off-by: Nathan Wiebe Neufeldt <wn.nathan@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * fix(test_events_executor): destroy all nodes before shutdown (#1538) Signed-off-by: yuanyuyuan <az6980522@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * add BaseImpl Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Add ImplT Support Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * fix changelong Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Remove accidental tuple (#1542) Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Allow action servers without execute callback (#1219) Signed-off-by: Tim Clephas <tim.clephas@nobleo.nl> * add : get clients, servers info (#1307) Signed-off-by: Minju, Lee <dlalswn531@naver.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * 10.0.2 Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * update tests Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * ParameterEventHandler support ContentFiltering (#1531) * ParameterEventHandler support ContentFiltering Signed-off-by: Barry Xu <barry.xu@sony.com> * Address review comments Signed-off-by: Barry Xu <barry.xu@sony.com> --------- Signed-off-by: Barry Xu <barry.xu@sony.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Fix issues with resuming async tasks awaiting a future (#1469) Signed-off-by: Błażej Sowa <bsowa123@gmail.com> Signed-off-by: Nadav Elkabets <elnadav12@gmail.com> Co-authored-by: Nadav Elkabets <32939935+nadavelkabets@users.noreply.github.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * 10.0.3 Signed-off-by: Michael Carroll <mjcarroll@intrinsic.ai> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Increase clock accuracy (#1564) Signed-off-by: Florian Vahl <git@flova.de> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Use unconditional wait when possible. (#1563) Previously the max() value of the steady time was used as the default deadline. In some environments this results in overflows in the underlying pthread_cond_timedwait call, which waits for the conditional variable in the events queue implementation. Consequently, this lead to freezes in the executor. Reducing the deadline significantly helped, but using `cv.wait` instead of `cv_.wait_until` seems to be the cleaner solution. Signed-off-by: Florian Vahl <florian.vahl@dlr.de> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Remove default from switch with enum, so that compiler warns. (#1566) Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Fix parameter parsing for unspecified target nodes (#1552) Signed-off-by: Barry Xu <barry.xu@sony.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Improve the compatibility of processing YAML parameter files (#1548) Signed-off-by: Barry Xu <barry.xu@sony.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Improve wildcard parsing and optimize the logic for parsing YAML para… (#1571) Signed-off-by: Barry Xu <barry.xu@sony.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Expose action graph functions as Node class methods. (#1574) * Expose action graph functions as Node class methods. Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com> * address review comments to keep the warning consistent. Signed-off-by: Tomoya.Fujita <Tomoya.Fujita@sony.com> --------- Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com> Signed-off-by: Tomoya.Fujita <Tomoya.Fujita@sony.com> * Fix performance bug in MultiThreadedExecutor (hopefully) (#1547) Signed-off-by: Michael Tandy <git@mjt.me.uk> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Changelog Signed-off-by: Alejandro Hernandez Cordero <ahcorde@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * 10.0.4 Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * use Msg over BaseMessage Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Use Srv over BaseService Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Use Action over BaseAction Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * lint Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> * Update rclpy/rclpy/type_support.py Co-authored-by: Christophe Bedard <bedard.christophe@gmail.com> Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> --------- Signed-off-by: Michael Carlstrom <rmc@carlstrom.com> Signed-off-by: Barry Xu <Barry.Xu@sony.com> Signed-off-by: brennanmk <brennanmk2200@gmail.com> Signed-off-by: Brad Martin <bmartin@fatlxception.org> Signed-off-by: Tomoya.Fujita <tomoya.fujita825@gmail.com> Signed-off-by: Christian Rauch <Christian.Rauch@unileoben.ac.at> Signed-off-by: Alejandro Hernandez Cordero <ahcorde@gmail.com> Signed-off-by: Nathan Wiebe Neufeldt <wn.nathan@gmail.com> Signed-off-by: yuanyuyuan <az6980522@gmail.com> Signed-off-by: Tim Clephas <tim.clephas@nobleo.nl> Signed-off-by: Minju, Lee <dlalswn531@naver.com> Signed-off-by: Barry Xu <barry.xu@sony.com> Signed-off-by: Błażej Sowa <bsowa123@gmail.com> Signed-off-by: Nadav Elkabets <elnadav12@gmail.com> Signed-off-by: Michael Carroll <mjcarroll@intrinsic.ai> Signed-off-by: Florian Vahl <git@flova.de> Signed-off-by: Florian Vahl <florian.vahl@dlr.de> Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com> Signed-off-by: Tomoya.Fujita <Tomoya.Fujita@sony.com> Signed-off-by: Michael Tandy <git@mjt.me.uk> Co-authored-by: Chris Lalancette <clalancette@gmail.com> Co-authored-by: Barry Xu <barry.xu@sony.com> Co-authored-by: Brennan Miller-Klugman <55165406+brennanmk@users.noreply.github.com> Co-authored-by: Brad Martin <52003535+bmartin427@users.noreply.github.com> Co-authored-by: Brad Martin <bmartin@fatlxception.org> Co-authored-by: Alejandro Hernandez Cordero <ahcorde@gmail.com> Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com> Co-authored-by: Jonathan <jmblixt3@gmail.com> Co-authored-by: Christian Rauch <Christian.Rauch@unileoben.ac.at> Co-authored-by: Nathan Wiebe Neufeldt <wn.nathan@gmail.com> Co-authored-by: Yuyuan Yuan <az6980522@gmail.com> Co-authored-by: Tim Clephas <tim.clephas@nobleo.nl> Co-authored-by: Minju, Lee <70446214+leeminju531@users.noreply.github.com> Co-authored-by: Błażej Sowa <bsowa123@gmail.com> Co-authored-by: Nadav Elkabets <32939935+nadavelkabets@users.noreply.github.com> Co-authored-by: Michael Carroll <mjcarroll@intrinsic.ai> Co-authored-by: Florian Vahl <git@flova.de> Co-authored-by: Michael Tandy <git@mjt.me.uk> Co-authored-by: Christophe Bedard <bedard.christophe@gmail.com>
Description
The MultiThreadedExecutor contains a possible race condition. If _spin_once_impl is called more than once, it is possible that multiple threads may end up passing the condition on line 1008 and try to remove the same future on line 1009.
This solution proposes locking self._futures. This also has the benefit of removing the need to make a shallow copy of self._futures (#1129).
Replaces #1129
Fixes #1393
Is this user-facing behavior change?
No
Did you use Generative AI?
No
Additional Information
An alternative solution might involve placing a lock to prevent multiple threads from entering _spin_once_impl.