Fix potential resource deadlock of NodeSate while destorying timeSource#1865
Fix potential resource deadlock of NodeSate while destorying timeSource#1865Barry-Xu-2018 wants to merge 1 commit intoros2:masterfrom
Conversation
Signed-off-by: Barry Xu <barry.xu@sony.com>
| TimeSource::~TimeSource() | ||
| { | ||
| // Make sure clock subscriber thread is terminated at this time | ||
| node_state_->detachNode(); |
There was a problem hiding this comment.
| node_state_->detachNode(); | |
| node_state_.reset(); |
For consistency with TimeSource::detachNode(), I propose to rely on NodeState's dtor, which will call NodeState::detachNode() and will be called since there is only this one shared pointer to it, or? But even this line is redundant since node_state_ will be reset automatically in ~TimeSource().
There was a problem hiding this comment.
In my testing, using node_state_->detachNode() fixes the problem, while node_state_.reset() does not. I think it has to do with the lifetime of objects, specifically when passing the pointers into lambda captures.
That said, you are right, this shouldn't be necessary at all. In point of fact, looking at this code it is unnecessarily complicated. We really don't need to use shared and weak pointers for most of these things; if we manage the ClocksState as part of the NodeState, then we can just use normal object lifetimes. I made a set of changes that do this on this branch: https://github.com/clalancette/rclcpp/tree/clalancette/time-source-cleanups
It's a bigger change than this, but it cleans up the code pretty nicely and it solves the problem for me. @Barry-Xu-2018 can you take a look and let me know what you think?
There was a problem hiding this comment.
Thanks for your comments.
I find a bug which is relevant to NodeState's dtor. (Refer to #1861 (comment)). At this case, after TimeSource is destroyed (Normally, internal NodeState should be destroyed too), NodeState is alive. So I add node_state_->detachNode() to make sure clock subscriber thread must be terminated at this time.
There was a problem hiding this comment.
We really don't need to use shared and weak pointers for most of these things; if we manage the ClocksState as part of the NodeState, then we can just use normal object lifetimes. I made a set of changes that do this on this branch: https://github.com/clalancette/rclcpp/tree/clalancette/time-source-cleanups
Completely agree. I also think my solution isn't good solution.
It's a bigger change than this, but it cleans up the code pretty nicely and it solves the problem for me.
@Barry-Xu-2018 can you take a look and let me know what you think?
I quickly read your code.
I have one comments in detachNode().
https://github.com/clalancette/rclcpp/blob/a68d7519147f9ae38416d8a61dffc8e39ead0c1a/rclcpp/src/rclcpp/time_source.cpp#L282-L294
I think clocks_state_.disable_ros_time(); should be moved behind destroy_clock_sub();.
It is because we not sure whether clock_cb() is executed at disabling ros time.
void clock_cb(std::shared_ptr<const rosgraph_msgs::msg::Clock> msg)
{
if (!clocks_state_.is_ros_time_active() && SET_TRUE == this->parameter_state_) {
clocks_state_.enable_ros_time();
}
// Cache the last message in case a new clock is attached.
clocks_state_.cache_last_msg(msg);
...There was a problem hiding this comment.
@Barry-Xu-2018, @clalancette Thank you very much for the additional explanation.
@clalancette I've reviewed your bigger change and like it very much. The structure (object lifetimes, pointers) is much cleaner now. First, I had some doubts whether passing the this pointer of NodeState (instead of a shared_ptr) to the lambda expression for the subscription may cause new problems because of the risk that the clock_executor_ still accesses the the object while it is already being cleaned up. But of course this cannot happen if destroy_clock_sub() is called as first step of clean up. I propose to add a prominent note in detachNode or ~NodeState to ensure that this is not changed accidentally in the future.
|
Closing in favor of #1867. |
Address issue #1861