Skip to content

Fix potential resource deadlock of NodeSate while destorying timeSource#1865

Closed
Barry-Xu-2018 wants to merge 1 commit intoros2:masterfrom
Barry-Xu-2018:topic-fix-resource-deadlock
Closed

Fix potential resource deadlock of NodeSate while destorying timeSource#1865
Barry-Xu-2018 wants to merge 1 commit intoros2:masterfrom
Barry-Xu-2018:topic-fix-resource-deadlock

Conversation

@Barry-Xu-2018
Copy link
Copy Markdown
Collaborator

Address issue #1861

TimeSource::~TimeSource()
{
// Make sure clock subscriber thread is terminated at this time
node_state_->detachNode();
Copy link
Copy Markdown
Contributor

@ralph-lange ralph-lange Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
node_state_->detachNode();
node_state_.reset();

For consistency with TimeSource::detachNode(), I propose to rely on NodeState's dtor, which will call NodeState::detachNode() and will be called since there is only this one shared pointer to it, or? But even this line is redundant since node_state_ will be reset automatically in ~TimeSource().

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my testing, using node_state_->detachNode() fixes the problem, while node_state_.reset() does not. I think it has to do with the lifetime of objects, specifically when passing the pointers into lambda captures.

That said, you are right, this shouldn't be necessary at all. In point of fact, looking at this code it is unnecessarily complicated. We really don't need to use shared and weak pointers for most of these things; if we manage the ClocksState as part of the NodeState, then we can just use normal object lifetimes. I made a set of changes that do this on this branch: https://github.com/clalancette/rclcpp/tree/clalancette/time-source-cleanups

It's a bigger change than this, but it cleans up the code pretty nicely and it solves the problem for me. @Barry-Xu-2018 can you take a look and let me know what you think?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ralph-lange

Thanks for your comments.
I find a bug which is relevant to NodeState's dtor. (Refer to #1861 (comment)). At this case, after TimeSource is destroyed (Normally, internal NodeState should be destroyed too), NodeState is alive. So I add node_state_->detachNode() to make sure clock subscriber thread must be terminated at this time.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clalancette

We really don't need to use shared and weak pointers for most of these things; if we manage the ClocksState as part of the NodeState, then we can just use normal object lifetimes. I made a set of changes that do this on this branch: https://github.com/clalancette/rclcpp/tree/clalancette/time-source-cleanups

Completely agree. I also think my solution isn't good solution.

It's a bigger change than this, but it cleans up the code pretty nicely and it solves the problem for me.
@Barry-Xu-2018 can you take a look and let me know what you think?

I quickly read your code.
I have one comments in detachNode().
https://github.com/clalancette/rclcpp/blob/a68d7519147f9ae38416d8a61dffc8e39ead0c1a/rclcpp/src/rclcpp/time_source.cpp#L282-L294
I think clocks_state_.disable_ros_time(); should be moved behind destroy_clock_sub();.
It is because we not sure whether clock_cb() is executed at disabling ros time.

  void clock_cb(std::shared_ptr<const rosgraph_msgs::msg::Clock> msg)
  {
    if (!clocks_state_.is_ros_time_active() && SET_TRUE == this->parameter_state_) {
      clocks_state_.enable_ros_time();
    }
    // Cache the last message in case a new clock is attached.
    clocks_state_.cache_last_msg(msg);
   ...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Barry-Xu-2018, @clalancette Thank you very much for the additional explanation.
@clalancette I've reviewed your bigger change and like it very much. The structure (object lifetimes, pointers) is much cleaner now. First, I had some doubts whether passing the this pointer of NodeState (instead of a shared_ptr) to the lambda expression for the subscription may cause new problems because of the risk that the clock_executor_ still accesses the the object while it is already being cleaned up. But of course this cannot happen if destroy_clock_sub() is called as first step of clean up. I propose to add a prominent note in detachNode or ~NodeState to ensure that this is not changed accidentally in the future.

@clalancette
Copy link
Copy Markdown
Contributor

Closing in favor of #1867.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants