-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
Bug report
Required Info:
- Operating System:
- all
- ROS2 Version:
- all
- Version or commit hash:
- newest
- DDS implementation:
Steps to reproduce issue
Expected behavior
expected behavior what the degisner wish should be like
costmap_ros_->deactivate() works during planner/controller->on_deactivate(), as well as costmap_ros_->cleanup() works during planner/controller->on_cleanup()
.../nav2_planner/src/planner_server.cpp#L224 and L257, and similar code in nav2_controller
Actual behavior
As an individual LifecycleNode, costmap_ros_ may react to rcl_preshutdown randomly, and always be closed much earlier than planner/controller->deactivate()
it would lead to many unexpected crashes like bug mentioned in #3958 :
during shutdown_period, costmap_ros_ may react to rcl_preshutdown earlier, so that planner/controller may try to access the costmap_ros_ though it has been a nullptr. As a result it would lead to a crash during shutdown period.
such crashes may lead to bad situation that :
program shut down in an unplanned manner and result in residual processes.
Additional information
costmap is set as a Nav2::util LifecycleNode : ../nav2_costmap_2d/include/nav2_costmap_2d/costmap_2d_ros.hpp#L 73
all LifecycleNode would react to Ctrl+C ../nav2_util/src/lifecycle_node.cpp#L 90
all LifecycleNode would react to rcl_preshutdown randomly, and similar issue has been discussed before : ../ros2/rclcpp/issues/2096
this explanation should have been clear enough: ../nav2_planner/src/planner_server.cpp #L 214-L225
the bug happening in ISSUE: #3958 is caused by this.
because during shutdown_period, costmap_ros_ may react rcl_preshutdown earlier, so that planner/controller may try to access the costmap_ros_ though it has been a nullptr.
our provided solution
it seems that costmap is only used as a child-thread of planner/controller, ../nav2_planner/src/planner_server.cpp#L 89
so can we make it do nothing when reacting to Ctrl+C ?
doing so, its lifecycle would be completely bond to planner/controller's lifecycle, and also going to death in order.
//in file `costmap_2d_ros.hpp`
/**
* @brief override on_rcl_preshutdown() as empty
* [reason] costmap only could be created by its parents like planner/controller_server
* [reason] costmap may react to ctrl+C earlier than its parents to cause nullptr-accessed
* so the costmap's reaction must be later than its parent to avoid nullptr-accessed
*
* thus, it's a more perfect way to override on_rcl_preshutdown() as empty function
* and its deactivate must be joint in planner/controller_server->on_deactivate()
* and its cleanup must be joint in planner/controller_server->on_cleanup()
*
* !!! a mention !!!
* if any other place use this class (Costmap2DROS)
* it's necessary to let costmap->deactivate() joint in parent->deactivate()
* and let costmap->cleanup() joint in parent->cleanup()
*/
void on_rcl_preshutdown() override{
//do nothing
return ;
}because the costmp_ros_ is just a thread created by planner/controller,
whether program shutdown correctly or not, or even dead unexceptionably, the child-thread could be closed all the time since parent-LifecycleNode sub-process dead.
Thus there's no need to worry whether it would be still alive after whole program exit finally.
in addition
if our solution being adopted, many checks could be removed :
these double checks are also designed to face to bugs in situation that costmap_ros_ may be closed earlier than planner/controller. After our code modification, these checks are useless anymore.
what's more, our solution performs perfectly during our local tests, much more than our past solutions mentioned in #3958
Further Suggestion
We specially name nodes like costmap_ros_ as child-LifecycleNode, which is born from their parent-LifecycleNode like planner_server and controller_server.
our suggestion is:
Could we set up a mechanism specially for such situations like child-LifecycleNode, to make sure they're completely controlled by their parent?
these child-LifecycleNode should have some basic specialist like:
- they must be created by a parent-LifecycleNode
- they must be created as a thread but not a sub-process
such mechanism could be like:
child's->activateshould be joint into its their parent'son_configurechild->deactivate()must be joint into its parent'son_deactivatechild->cleanup()must be joint into its parent'son_cleanup- and so on
Perhaps such mechanism could be used for any program in ROS2 ?