Possible regression in rcl preshutdown callbacks - context invalid? #2547

@SteveMacenski

Bug report

Required Info:

  • Operating System:
    • Ubuntu 24.04
  • Installation type:
    • ROS 2 Jazzy, Rolling
  • Version or commit hash:
    • 28.2.0-1noble.20240430.174609 (from docker)
  • DDS implementation:
    • N/A
  • Client library (if applicable):
    • rclcpp

Description

During the Nav2 migration, I've noticed in both unit tests and now at the system level what appears to be a regression in the handling of the context and the pre-shutdown callbacks. It starts with the move to 24.04 and shows up in both Jazzy and current Rolling. There has been some extended discussion of the unit tests in the comments:

But I'm also now seeing it on shutdown of Nav2 at the system level, now that we have Gazebo Harmonic working with Nav2. We run our rcl preshutdown callbacks to transition any nodes that were not properly transitioned to shutdown before Control+C was hit. This executes as expected, with logs:

[component_container_isolated-8] [INFO] [1717089793.781486696] [controller_server]: Running Nav2 LifecycleNode rcl preshutdown (controller_server)
[component_container_isolated-8] [INFO] [1717089793.781516996] [controller_server]: Destroying bond (controller_server) to lifecycle manager.
[component_container_isolated-8] [INFO] [1717089793.781521867] [smoother_server]: Running Nav2 LifecycleNode rcl preshutdown (smoother_server)
[component_container_isolated-8] [INFO] [1717089793.781526590] [smoother_server]: Destroying bond (smoother_server) to lifecycle manager.
...

But the logs then also contain traces from rcl that seem related to publishers failing because the context is already invalid at the pre-shutdown stage. We see only publisher failures when the system has been properly cycled down; if the Nav2 lifecycle nodes are still active, so that services are called to transition them, we see the equivalent service failures as well.

[component_container_isolated-8] [INFO] [1717089793.874960887] [controller_server]: Destroying
[component_container_isolated-8] [ERROR] [1717089793.875015316] [controller_server]: Unable to start transition 5 from current state shuttingdown: Could not publish transition: publisher's context is invalid, at ./src/rcl/publisher.c:423, at ./src/rcl_lifecycle.c:368
[component_container_isolated-8] [WARN] [1717089793.875026497] [rclcpp_lifecycle]: Shutdown error in destruction of LifecycleNode: final state(unconfigured)
...

I clipped the logs to show one server in particular, but this repeats for all instances.
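To make the suspected ordering problem concrete, here is a minimal toy model in plain C++. It is not rcl or rclcpp code and not a diagnosis: `ToyContext`, `ToyPublisher`, and `simulate_shutdown` are hypothetical names made up for illustration. The only thing it shows is that a callback which publishes will succeed if it runs before the context is invalidated and fail if it runs after.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a context. NOT the rcl/rclcpp implementation.
class ToyContext {
public:
  void add_pre_shutdown_callback(std::function<void()> cb) {
    callbacks_.push_back(std::move(cb));
  }
  bool is_valid() const { return valid_; }

  // Order we would expect: callbacks run while the context is still valid.
  void shutdown_callbacks_first() {
    run_callbacks();
    valid_ = false;
  }

  // Order that would match the symptoms (an assumption, not a confirmed cause):
  // the context is invalidated before the callbacks get to run.
  void shutdown_invalidate_first() {
    valid_ = false;
    run_callbacks();
  }

private:
  void run_callbacks() {
    for (auto & cb : callbacks_) {
      cb();
    }
  }
  std::vector<std::function<void()>> callbacks_;
  bool valid_ = true;
};

// Hypothetical publisher that refuses to publish once its context is invalid.
struct ToyPublisher {
  const ToyContext & context;
  std::string publish(const std::string & msg) const {
    return context.is_valid() ? "published: " + msg : "publisher's context is invalid";
  }
};

// Runs one shutdown in which a pre-shutdown callback publishes a transition,
// and returns what the publisher reported.
std::string simulate_shutdown(bool invalidate_first) {
  ToyContext context;
  ToyPublisher transition_pub{context};
  std::string result = "callback never ran";
  context.add_pre_shutdown_callback([&]() {
    result = transition_pub.publish("shutdown transition");
  });
  if (invalidate_first) {
    context.shutdown_invalidate_first();
  } else {
    context.shutdown_callbacks_first();
  }
  return result;
}
```

Calling `simulate_shutdown(false)` models the clean shutdown we had previously, and `simulate_shutdown(true)` models a failure with the same message text as in the logs above. Whether rcl actually invalidates the context at that point is exactly what this issue is asking.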

This can be reproduced in Nav2 using 24.04/Rolling plus the simulator in ros-navigation/navigation2#3634: drive around for a moment, stop, and hit Control+C. We previously had a clean shutdown without a crash. It can also be reproduced more minimally with the TEST(WPTest, test_dynamic_parameters) changes in the PR https://github.com/ros-navigation/navigation2/pull/4298/files, where we manually shut things down and reset the object inside the test so that the rcl pre-shutdown callback has no work to do, thereby bypassing it.
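The test-side workaround can be sketched the same way. This is a toy model under stated assumptions, not the code from the PR: `ToyNode` and `teardown` are hypothetical names, and the context is assumed to be already invalid by the time the pre-shutdown callback runs. It illustrates why shutting the node down and resetting it by hand leaves the callback with nothing to do.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical node that must "publish" a transition when it shuts down.
// Illustrative only; not Nav2's or rclcpp_lifecycle's LifecycleNode.
class ToyNode {
public:
  explicit ToyNode(const bool & context_valid) : context_valid_(context_valid) {}

  // Returns true if the shutdown transition succeeded (or was already done).
  bool shutdown() {
    if (done_) {
      return true;  // already shut down, nothing to publish
    }
    if (!context_valid_) {
      return false;  // stands in for "publisher's context is invalid"
    }
    done_ = true;
    return true;
  }

private:
  const bool & context_valid_;
  bool done_ = false;
};

// Returns the number of failed shutdown transitions seen during teardown.
int teardown(bool manual_shutdown_first) {
  bool context_valid = true;
  int failures = 0;
  auto node = std::make_shared<ToyNode>(context_valid);
  std::weak_ptr<ToyNode> weak = node;

  // Stand-in for the pre-shutdown callback registered on behalf of the node.
  std::function<void()> pre_shutdown = [&]() {
    if (auto n = weak.lock()) {
      if (!n->shutdown()) {
        ++failures;
      }
    }
  };

  if (manual_shutdown_first) {
    if (!node->shutdown()) {  // context is still valid at this point
      ++failures;
    }
    node.reset();  // the callback will find no node left to transition
  }

  context_valid = false;  // assumption: context is invalid when the callback runs
  pre_shutdown();
  return failures;
}
```

In this model `teardown(true)` reports no failures and `teardown(false)` reports one, which is why the manual shutdown and reset hide the problem in the unit test rather than fixing it.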
