Skip to content

Fix race condition in SynchronizedStringParameter#1039

Closed
corycrean wants to merge 7 commits intomoveit:mainfrom
corycrean:ssp_race_condition
Closed

Fix race condition in SynchronizedStringParameter#1039
corycrean wants to merge 7 commits intomoveit:mainfrom
corycrean:ssp_race_condition

Conversation

@corycrean
Copy link
Contributor

  • Fix race condition in SynchronizedStringParameter.
  • Remove redundant callbacks.

Description

This PR addresses the issue in moveit/moveit2_tutorials#262 where the robot model can't be loaded. Currently, the subscription created in SynchronizedStringParameter::waitForMessage is added to two WaitSet objects (one belonging to the executor, and one belonging to the SynchronizedStringParameter), and it's unpredictable which WaitSet will end up processing the messages received by that subscription. This PR fixes the problem by removing the WaitSet from SynchronizedStringParameter.

Checklist

  • Required by CI: Code is auto formatted using clang-format
  • Extend the tutorials / documentation reference
  • Document API changes relevant to the user in the MIGRATION.md notes
  • Create tests, which fail without this PR reference
  • Include a screenshot if changing a GUI
  • While waiting for someone to review your request, please help review another open pull request to support the maintainers


std::string content_;
boost::lockfree::spsc_queue<std::string> queue_;
std::mutex queue_mutex_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need for a mutex? I thought lock-free mean no mutex needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for the lock is that pushing into a spsc queue is not re-entrant safe. Because there are many threads that can be run by a ROS executor and there is the possibility of the callback being called on two threads at once you need to protect pushing into the spsc_queue.

rclcpp::Publisher<std_msgs::msg::String>::SharedPtr string_publisher_;

std::string content_;
boost::lockfree::spsc_queue<std::string> queue_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would vote for a solution that doesn't require Boost since we're planning to get rid of it as a dependency

Copy link
Member

@tylerjw tylerjw Jan 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no stl replacement for the spsc queue. The plan to remove boost is to remove the parts of boost that have been migrated into the stl. The spsc_queue is not one of those things.

auto start = node->now();

urdf_string_ =
urdf_ssp_.loadInitialValue(node, ros_name, std::bind(&RDFLoader::urdfUpdateCallback, this, std::placeholders::_1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing the context a little, but why did you remove the callbacks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looked to me like the callback was responsible for:

  1. Saving the urdf_string_
  2. Calling loadFromStrings

but those things are already done in the constructor, so I thought the callbacks could be safely removed. I'm noticing now that the callbacks also call new_model_cb_() -- should this be added to the constructor as well?

Also, calling loadFromStrings from urdfUpdateCallback, before srdf_string_ was loaded, produced an error message https://github.com/ros-planning/moveit2/blob/2ee83c31e107ff71f4469fe0d048eb18606c8e93/moveit_ros/planning/rdf_loader/src/rdf_loader.cpp#L106 which was harmless but a little confusing, and removing the callbacks got rid of that error message as well.

@corycrean
Copy link
Contributor Author

It looks like my implementation here fails when node_ isn't spinning when the RDFLoader is created. (The original implementation worked fine in that case, but had a race condition when node_ was spinning.) I think there's a simpler way to handle both cases, which I implemented in #1050, so I'm closing this PR.

@corycrean corycrean closed this Jan 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants