Skip to content

fix bug mentioned in #3958 && a futhuer suggestion for child-LifecycleNode mechanism design #3971

@GoesM

Description

@GoesM

Bug report

Required Info:

  • Operating System:
    • all
  • ROS2 Version:
    • all
  • Version or commit hash:
    • newest
  • DDS implementation:

Steps to reproduce issue


Expected behavior

expected behavior what the degisner wish should be like

costmap_ros_->deactivate() works during planner/controller->on_deactivate(), as well as costmap_ros_->cleanup() works during planner/controller->on_cleanup()

.../nav2_planner/src/planner_server.cpp#L224 and L257, and similar code in nav2_controller

Actual behavior

As an individual LifecycleNode, costmap_ros_ may react to rcl_preshutdown randomly, and always be closed much earlier than planner/controller->deactivate()

it would lead to many unexpected crashes like bug mentioned in #3958 :

during shutdown_period, costmap_ros_ may react to rcl_preshutdown earlier, so that planner/controller may try to access the costmap_ros_ though it has been a nullptr. As a result it would lead to a crash during shutdown period.

such crashes may lead to bad situation that :

program shut down in an unplanned manner and result in residual processes.


Additional information

costmap is set as a Nav2::util LifecycleNode : ../nav2_costmap_2d/include/nav2_costmap_2d/costmap_2d_ros.hpp#L 73

all LifecycleNode would react to Ctrl+C ../nav2_util/src/lifecycle_node.cpp#L 90

all LifecycleNode would react to rcl_preshutdown randomly, and similar issue has been discussed before : ../ros2/rclcpp/issues/2096

this explanation should have been clear enough: ../nav2_planner/src/planner_server.cpp #L 214-L225

the bug happening in ISSUE: #3958 is caused by this.

because during shutdown_period, costmap_ros_ may react rcl_preshutdown earlier, so that planner/controller may try to access the costmap_ros_ though it has been a nullptr.


our provided solution

it seems that costmap is only used as a child-thread of planner/controller, ../nav2_planner/src/planner_server.cpp#L 89

so can we make it do nothing when reacting to Ctrl+C ?

doing so, its lifecycle would be completely bond to planner/controller's lifecycle, and also going to death in order.

//in file `costmap_2d_ros.hpp`
  /**
   * @brief override on_rcl_preshutdown() as empty
   * [reason] costmap only could be created by its parents like planner/controller_server
   * [reason] costmap may react to ctrl+C earlier than its parents to cause nullptr-accessed
   * so the costmap's reaction must be later than its parent to avoid nullptr-accessed
   * 
   * thus, it's a more perfect way to override on_rcl_preshutdown() as empty function
   * and its deactivate must be joint in planner/controller_server->on_deactivate()
   * and its cleanup must be joint in planner/controller_server->on_cleanup()
   * 
   * !!! a mention !!!
   * if any other place use this class (Costmap2DROS)
   * it's necessary to let costmap->deactivate() joint in parent->deactivate()
   * and let costmap->cleanup() joint in parent->cleanup() 
   */
  void on_rcl_preshutdown() override{
    //do nothing
    return ;
  }

because the costmp_ros_ is just a thread created by planner/controller,

whether program shutdown correctly or not, or even dead unexceptionably, the child-thread could be closed all the time since parent-LifecycleNode sub-process dead.

Thus there's no need to worry whether it would be still alive after whole program exit finally.

in addition

if our solution being adopted, many checks could be removed :

these double checks are also designed to face to bugs in situation that costmap_ros_ may be closed earlier than planner/controller. After our code modification, these checks are useless anymore.

what's more, our solution performs perfectly during our local tests, much more than our past solutions mentioned in #3958



Further Suggestion

We specially name nodes like costmap_ros_ as child-LifecycleNode, which is born from their parent-LifecycleNode like planner_server and controller_server.

our suggestion is:
Could we set up a mechanism specially for such situations like child-LifecycleNode, to make sure they're completely controlled by their parent?

these child-LifecycleNode should have some basic specialist like:

  • they must be created by a parent-LifecycleNode
  • they must be created as a thread but not a sub-process

such mechanism could be like:

  • child's->activate should be joint into its their parent's on_configure
  • child->deactivate() must be joint into its parent's on_deactivate
  • child->cleanup() must be joint into its parent's on_cleanup
  • and so on

Perhaps such mechanism could be used for any program in ROS2 ?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions