Skip to content

Fix issue with MultiThreadedExecutor by adding lock to protect _futures#1477

Merged
ahcorde merged 1 commit intoros2:rollingfrom
brennanmk:rolling
Oct 2, 2025
Merged

Fix issue with MultiThreadedExecutor by adding lock to protect _futures#1477
ahcorde merged 1 commit intoros2:rollingfrom
brennanmk:rolling

Conversation

@brennanmk
Copy link
Copy Markdown
Contributor

Description

The MultiThreadedExecutor contains a possible race condition. If _spin_once_impl is called more than once, it is possible that multiple threads may end up passing the condition on line 1008 and try to remove the same future on line 1009.

This solution proposes locking self._futures. This also has the benefit of removing the need to make a shallow copy of self._futures (#1129).

Replaces #1129
Fixes #1393

Is this user-facing behavior change?

No

Did you use Generative AI?

No

Additional Information

An alternative solution might involve placing a lock to prevent multiple threads from entering _spin_once_impl.

Copy link
Copy Markdown
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the change looks good to me to exclude the racy condition.

IMO, the right thing to do here is to add the spinning status for the Executor to check if it is already spinning just like rclcpp::Executor. looks like Executor of rclpy does not have this status at all.

@fujitatomoya
Copy link
Copy Markdown
Collaborator

i would like to have 2nd review for this before starting CI.

@cst0
Copy link
Copy Markdown

cst0 commented Jun 25, 2025

@brennanmk Is there a minimum reproducible example for this?

@brennanmk
Copy link
Copy Markdown
Contributor Author

brennanmk commented Jun 26, 2025

@cst0 it is relatively straight forward to reproduce the issue. If you call spin_once on a MultiThreadedExecutor a bunch of times in rapid succession (in different threads) eventually it will show up.

#1393 shows what the issue looks like when it appears.

@cst0
Copy link
Copy Markdown

cst0 commented Jun 26, 2025

Ah, had some trouble but just had to be more persistent. Looks good to me!

@brennanmk
Copy link
Copy Markdown
Contributor Author

Actually, since I removed the creation of a shallow copy in the loop I think it makes sense to also lock the self._futures append call on line 1005. Otherwise it is possible that one thread appends to self._futures while another is in the loop. I do not think this would cause any observable errors, but it is still probably best practice. Alternatively I could add the creation of a shallow copy back, but this seems unnecessary.

It might make sense to do what @fujitatomoya discussed above and instead create a "spinning" variable like in rclcpp. Although this would not be quite as clean as in cpp because we would still need to add a lock to make the spinning variable atomic. It might make sense if we want to raise an exception (or warning) if _spin_once_impl is called multiple times. Happy to make this change if it would be beneficial.

@fujitatomoya
Copy link
Copy Markdown
Collaborator

Pulls: #1477
Gist: https://gist.githubusercontent.com/fujitatomoya/36b8e8c5c643de0b9572cd3eb716871a/raw/5452a7922c32aed9ca24965b37db679c3e299622/ros2.repos
BUILD args: --packages-above-and-dependencies rclpy
TEST args: --packages-above rclpy
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/16321

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@fujitatomoya
Copy link
Copy Markdown
Collaborator

@brennanmk thanks for the effort and explanation.

for adding the spinning check, probably it would be better to have consensus to implement that. besides, this PR does make sense. i say this is good to go with 2nd approval from other maintainers.

@szobov
Copy link
Copy Markdown

szobov commented Jul 16, 2025

Hey there,
Thanks a lot for the effort.

What's needed to have this change merged?
Please let me know if I can assist with the review.

@szobov
Copy link
Copy Markdown

szobov commented Sep 18, 2025

Hey @fujitatomoya
What's needed to make it merged?
We face this issue quite often in our system.
Let me know if I can help bring it into upstream in any way.

@fujitatomoya
Copy link
Copy Markdown
Collaborator

@Mergifyio rebase

Signed-off-by: brennanmk <brennanmk2200@gmail.com>
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Sep 18, 2025

rebase

✅ Branch has been successfully rebased

@fujitatomoya
Copy link
Copy Markdown
Collaborator

@szobov waiting for the another approval from maintainers.

@fujitatomoya
Copy link
Copy Markdown
Collaborator

@mjcarroll friendly ping.

@fujitatomoya
Copy link
Copy Markdown
Collaborator

Pulls: #1477
Gist: https://gist.githubusercontent.com/fujitatomoya/41f61146dde3a7c0029ae055a946b18b/raw/5452a7922c32aed9ca24965b37db679c3e299622/ros2.repos
BUILD args: --packages-above-and-dependencies rclpy
TEST args: --packages-above rclpy
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/17022

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@ahcorde
Copy link
Copy Markdown
Contributor

ahcorde commented Oct 1, 2025

Pulls: #1477
Gist: https://gist.githubusercontent.com/ahcorde/2dd35e155d331a60943382b0048bebd2/raw/5452a7922c32aed9ca24965b37db679c3e299622/ros2.repos
BUILD args: --packages-above-and-dependencies rclpy
TEST args: --packages-above rclpy
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/17109

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@ahcorde ahcorde merged commit bcbd489 into ros2:rolling Oct 2, 2025
3 checks passed
InvincibleRMC pushed a commit to InvincibleRMC/rclpy that referenced this pull request Nov 20, 2025
Signed-off-by: brennanmk <brennanmk2200@gmail.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>
InvincibleRMC pushed a commit to InvincibleRMC/rclpy that referenced this pull request Jan 12, 2026
Signed-off-by: brennanmk <brennanmk2200@gmail.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>
InvincibleRMC added a commit that referenced this pull request Jan 21, 2026
)

* Fix warnings from gcc. (#1501)

Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Update type_support to use new abcs

Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Cleanup old test cases to use new automatic inference

Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Add content-filtered-topic interfaces (#1506)

Signed-off-by: Barry Xu <Barry.Xu@sony.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Added lock to protect futures for multithreaded executor (#1477)

Signed-off-by: brennanmk <brennanmk2200@gmail.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* EventsExecutor: Handle async callbacks for services and subscriptions (#1478)

Closes #1473

Signed-off-by: Brad Martin <bmartin@fatlxception.org>
Co-authored-by: Brad Martin <bmartin@fatlxception.org>
Co-authored-by: Alejandro Hernandez Cordero <ahcorde@gmail.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* add spinning state for the Executor classes. (#1510)

Signed-off-by: Tomoya.Fujita <tomoya.fujita825@gmail.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Fixes Action.*_async futures never complete (#1308)

Per rclpy:1123 If two seperate client server actions are running in seperate executors the future given to the ActionClient will never complete due to a race condition
This fixes  the calls to rcl handles potentially leading to deadlock scenarios by adding locks to there references
Co-authored-by: Aditya Agarwal <aditya.kgp25@gmail.com>
Co-authored-by: Jonathan Blixt <jmblixt3@gmail.com>
Signed-off-by: Jonathan Blixt <jmblixt3@gmail.com>

Co-authored-by: Alejandro Hernandez Cordero <ahcorde@gmail.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* remove unused 'param_type' (#1524)

'param_type' is set but never used

Signed-off-by: Christian Rauch <Christian.Rauch@unileoben.ac.at>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Changelog

Signed-off-by: Alejandro Hernandez Cordero <ahcorde@gmail.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* 10.0.1

Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Remove duplicate future handling from send_goal_async (#1532)

A recent change intended to move this logic into a lock context, but
actually ended up duplicating it instead. This fixes that by removing
the duplicated logic outside of the lock. It also preserves the explicit
typing annotation on the future.

Signed-off-by: Nathan Wiebe Neufeldt <wn.nathan@gmail.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* fix(test_events_executor): destroy all nodes before shutdown (#1538)

Signed-off-by: yuanyuyuan <az6980522@gmail.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* add BaseImpl

Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Add ImplT Support

Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* fix changelong

Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Remove accidental tuple (#1542)

Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Allow action servers without execute callback (#1219)

Signed-off-by: Tim Clephas <tim.clephas@nobleo.nl>

* add : get clients, servers info (#1307)

Signed-off-by: Minju, Lee <dlalswn531@naver.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* 10.0.2

Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* update tests

Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* ParameterEventHandler support ContentFiltering (#1531)

* ParameterEventHandler support ContentFiltering

Signed-off-by: Barry Xu <barry.xu@sony.com>

* Address review comments

Signed-off-by: Barry Xu <barry.xu@sony.com>

---------

Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Fix issues with resuming async tasks awaiting a future (#1469)

Signed-off-by: Błażej Sowa <bsowa123@gmail.com>
Signed-off-by: Nadav Elkabets <elnadav12@gmail.com>
Co-authored-by: Nadav Elkabets <32939935+nadavelkabets@users.noreply.github.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* 10.0.3

Signed-off-by: Michael Carroll <mjcarroll@intrinsic.ai>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Increase clock accuracy (#1564)

Signed-off-by: Florian Vahl <git@flova.de>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Use unconditional wait when possible. (#1563)

Previously the max() value of the steady time was used as the default deadline. In some environments this results in overflows in the underlying pthread_cond_timedwait call, which waits for the conditional variable in the events queue implementation. Consequently, this lead to freezes in the executor. Reducing the deadline significantly helped, but using `cv.wait` instead of `cv_.wait_until` seems to be the cleaner solution.

Signed-off-by: Florian Vahl <florian.vahl@dlr.de>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Remove default from switch with enum, so that compiler warns. (#1566)

Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Fix parameter parsing for unspecified target nodes (#1552)

Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Improve the compatibility of processing YAML parameter files (#1548)

Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Improve wildcard parsing and optimize the logic for parsing YAML para… (#1571)

Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Expose action graph functions as Node class methods. (#1574)

* Expose action graph functions as Node class methods.

Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>

* address review comments to keep the warning consistent.

Signed-off-by: Tomoya.Fujita <Tomoya.Fujita@sony.com>

---------

Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
Signed-off-by: Tomoya.Fujita <Tomoya.Fujita@sony.com>

* Fix performance bug in MultiThreadedExecutor (hopefully) (#1547)

Signed-off-by: Michael Tandy <git@mjt.me.uk>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Changelog

Signed-off-by: Alejandro Hernandez Cordero <ahcorde@gmail.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* 10.0.4

Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* use Msg over BaseMessage

Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Use Srv over BaseService

Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Use Action over BaseAction

Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* lint

Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

* Update rclpy/rclpy/type_support.py

Co-authored-by: Christophe Bedard <bedard.christophe@gmail.com>
Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>

---------

Signed-off-by: Michael Carlstrom <rmc@carlstrom.com>
Signed-off-by: Barry Xu <Barry.Xu@sony.com>
Signed-off-by: brennanmk <brennanmk2200@gmail.com>
Signed-off-by: Brad Martin <bmartin@fatlxception.org>
Signed-off-by: Tomoya.Fujita <tomoya.fujita825@gmail.com>
Signed-off-by: Christian Rauch <Christian.Rauch@unileoben.ac.at>
Signed-off-by: Alejandro Hernandez Cordero <ahcorde@gmail.com>
Signed-off-by: Nathan Wiebe Neufeldt <wn.nathan@gmail.com>
Signed-off-by: yuanyuyuan <az6980522@gmail.com>
Signed-off-by: Tim Clephas <tim.clephas@nobleo.nl>
Signed-off-by: Minju, Lee <dlalswn531@naver.com>
Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Błażej Sowa <bsowa123@gmail.com>
Signed-off-by: Nadav Elkabets <elnadav12@gmail.com>
Signed-off-by: Michael Carroll <mjcarroll@intrinsic.ai>
Signed-off-by: Florian Vahl <git@flova.de>
Signed-off-by: Florian Vahl <florian.vahl@dlr.de>
Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
Signed-off-by: Tomoya.Fujita <Tomoya.Fujita@sony.com>
Signed-off-by: Michael Tandy <git@mjt.me.uk>
Co-authored-by: Chris Lalancette <clalancette@gmail.com>
Co-authored-by: Barry Xu <barry.xu@sony.com>
Co-authored-by: Brennan Miller-Klugman <55165406+brennanmk@users.noreply.github.com>
Co-authored-by: Brad Martin <52003535+bmartin427@users.noreply.github.com>
Co-authored-by: Brad Martin <bmartin@fatlxception.org>
Co-authored-by: Alejandro Hernandez Cordero <ahcorde@gmail.com>
Co-authored-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
Co-authored-by: Jonathan <jmblixt3@gmail.com>
Co-authored-by: Christian Rauch <Christian.Rauch@unileoben.ac.at>
Co-authored-by: Nathan Wiebe Neufeldt <wn.nathan@gmail.com>
Co-authored-by: Yuyuan Yuan <az6980522@gmail.com>
Co-authored-by: Tim Clephas <tim.clephas@nobleo.nl>
Co-authored-by: Minju, Lee <70446214+leeminju531@users.noreply.github.com>
Co-authored-by: Błażej Sowa <bsowa123@gmail.com>
Co-authored-by: Nadav Elkabets <32939935+nadavelkabets@users.noreply.github.com>
Co-authored-by: Michael Carroll <mjcarroll@intrinsic.ai>
Co-authored-by: Florian Vahl <git@flova.de>
Co-authored-by: Michael Tandy <git@mjt.me.uk>
Co-authored-by: Christophe Bedard <bedard.christophe@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Race condition in MultiThreadedExecutor

6 participants