Skip to content

rclcpp_action goal handle timeout logic clearing long-running goals #3765

@AlexeyMerzlyakov

Description

@AlexeyMerzlyakov

This issue describes the flaky issues appeared with test_waypoint_follower system test

Bug report

Required Info:

  • Operating System:
    • Ubuntu 22.04.3
  • ROS2 Version:
    • ROS2 Rolling (built from sources)
  • Version or commit hash:
  • DDS implementation:
    • Cyclone DDS

Steps to reproduce issue

  1. Enable visualization for Waypoint Follower system test by following patch:
diff --git a/nav2_system_tests/src/waypoint_follower/test_case_py.launch b/nav2_system_tests/src/waypoint_follower/test_case_py.launch
index adeb839c..c7194fe9 100755
--- a/nav2_system_tests/src/waypoint_follower/test_case_py.launch
+++ b/nav2_system_tests/src/waypoint_follower/test_case_py.launch
@@ -40,6 +40,8 @@ def generate_launch_description():
     bringup_dir = get_package_share_directory('nav2_bringup')
     params_file = os.path.join(bringup_dir, 'params/nav2_params.yaml')
 
+    rviz_config_file = os.path.join(bringup_dir, 'rviz', 'nav2_default_view.rviz')
+
     # Replace the `use_astar` setting on the params file
     configured_params = RewrittenYaml(
         source_file=params_file,
@@ -59,6 +61,13 @@ def generate_launch_description():
                  '--minimal_comms', world],
             output='screen'),
 
+        # Launch rviz
+        Node(
+            package='rviz2',
+            executable='rviz2',
+            arguments=['-d', rviz_config_file],
+            output='screen'),
+
         # TODO(orduno) Launch the robot state publisher instead
         #              using a local copy of TB3 urdf file
         Node(
  1. Run test_waypoint_follower manually, or using the attached below stress-test script:
    run_stress.sh.txt

Expected behavior

Test passes 30 times in a row.

Actual behavior

The following problems were found during the test:

Problem1: No action

When the robot is moving near the obstacle, it could stuck. Sometimes, this causes recovery behaviors that being triggered.
If this recovery was triggered (moreover, it was detected that only "Received request to clear entirely the local_costmap" event is enough for that), and robot successfully reached its goal, but the NavigateToPoseNavigator action returns UNKNOWN status instead of SUCCEEDED, which causes main WaypointFollower cycle to hang:

2023-08-15.15-33-02.no_action_failure_case_only.mp4
Problem2: Incorrect out-of-map status

Sometimes, when running set waypoint outside of map test scenario, missed_waypoint contains INVALID_PATH=103 status instead of expected ComputePathToPose.Goal().GOAL_OUTSIDE_MAP=204 which gives the following error messages:

[bt_navigator-12] [INFO] [1692161747.546793612] [bt_navigator]: Begin navigating from current location to (100.00, 100.00)
[planner_server-10] [WARN] [1692161747.548072894] [planner_server]: GridBased plugin failed to plan from (0.70, 0.19) to (100.00, 100.00): "Goal Coordinates of(100.000000, 100.000000) was outside bounds"
[planner_server-10] [WARN] [1692161747.548158462] [planner_server]: [compute_path_to_pose] [ActionServer] Aborting handle.
[bt_navigator-12] [ERROR] [1692161747.577081372] [bt_navigator]: Goal failed
[bt_navigator-12] [WARN] [1692161747.577271398] [bt_navigator]: [navigate_to_pose] [ActionServer] Aborting handle.
[tester_node-16] [INFO] [1692161747.949221718] [nav2_waypoint_tester]: Goal failed to process all waypoints, missed 1 wps.
[tester_node-16] Traceback (most recent call last):
[tester_node-16]   File "/home/amerzlyakov/nav2_ws/src/navigation2/nav2_system_tests/src/waypoint_follower/tester.py", line 271, in <module>
[tester_node-16]     main()
[tester_node-16]   File "/home/amerzlyakov/nav2_ws/src/navigation2/nav2_system_tests/src/waypoint_follower/tester.py", line 233, in main
[tester_node-16]     assert test.action_result.missed_waypoints[0].error_code == \
[tester_node-16] AssertionError
Problem3: Robot is getting lost after setting initial pose second time

In the stop on failure test with bogous waypoint test scenario there is test.setInitialPose(starting_pose) called second time while robot is not placed on its initial pose. This causes robot to lose its position. In most cases, robot still operating and reaching its goal; but sometimes since its position is getting lost, robot continues its uncontrolled movement between pillars causing testcase to fail by timeout:
Screenshot_2023-08-16_09-41-43_cr
Screenshot_2023-08-16_09-41-50_cr

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions