rosbag2_transport crashes due to unsafe threading #329

@zmichaels11

Description

Original issue: ros2/rmw_cyclonedds#118
The end-to-end (E2E) tests for rosbag2_transport appear to crash because of thread-unsafe use of std::async.
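
As a rough illustration of the kind of problem suspected here, the sketch below launches several std::async tasks that all write to one piece of shared state. It is not taken from rosbag2_transport or rcl: PublisherRegistry and register_nodes_concurrently are hypothetical names, and the map only stands in for any process-wide structure that is not thread-safe on its own. The mutex is what makes the concurrent calls safe; without it, the calls to add() would race.

```cpp
#include <cstddef>
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for shared state that is not thread-safe by itself.
class PublisherRegistry
{
public:
  // Returns true if the name was added, false if it was already present.
  bool add(const std::string & node_name)
  {
    // Without this lock, concurrent calls to add() would be a data race on map_.
    std::lock_guard<std::mutex> lock(mutex_);
    return map_.emplace(node_name, 0).second;
  }

  std::size_t size() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return map_.size();
  }

private:
  mutable std::mutex mutex_;
  std::map<std::string, int> map_;
};

// Registers `count` hypothetical nodes from concurrently running std::async
// tasks and returns how many registrations succeeded.
std::size_t register_nodes_concurrently(PublisherRegistry & registry, int count)
{
  std::vector<std::future<bool>> results;
  for (int i = 0; i < count; ++i) {
    results.push_back(
      std::async(
        std::launch::async, [&registry, i]() {
          return registry.add("node_" + std::to_string(i));
        }));
  }

  std::size_t added = 0;
  for (auto & result : results) {
    // get() blocks until the task has finished and rethrows any exception.
    if (result.get()) {
      ++added;
    }
  }
  return added;
}
```

An alternative to locking is to perform the shared initialization sequentially before any std::async task starts, so that the tasks never touch the shared state at the same time.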

Expected Behavior

colcon test --packages-select rosbag2_transport should pass when using rmw_cyclonedds

Actual Behavior

The test case "RecordIntegrationTestFixture.published_messages_from_multiple_topics_are_recorded" fails intermittently.

Sometimes the output looks like this:

colcon test-result
/opt/ros/master/build/rosbag2_transport/Testing/20200324-2148/Test.xml: 15 tests, 0 errors, 1 failure, 0 skipped
- test_record
  <<< failure message
    -- run_test.py: invoking following command in '/opt/ros/master/build/rosbag2_transport':
     - /opt/ros/master/build/rosbag2_transport/test_record --gtest_output=xml:/opt/ros/master/clion/test_results/rosbag2_transport/test_record.gtest.xml
    Running main() from gmock_main.cc
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from RecordIntegrationTestFixture
    [ RUN      ] RecordIntegrationTestFixture.published_messages_from_multiple_topics_are_recorded
    
    >>> [rcutils|error_handling.c:108] rcutils_set_error_state()
    This error state is being overwritten:
    
      'array_list is already initialized, at /opt/ros/master/src/ros2/rcutils/src/array_list.c:61, at /opt/ros/master/src/ros2/rcl/rcl/src/rcl/logging_rosout.c:188'
    
    with this new error message:
    
      'Failed to add publisher to map., at /opt/ros/master/src/ros2/rcl/rcl/src/rcl/logging_rosout.c:190'
    
    rcutils_reset_error() should be called after error handling to avoid this.
    <<<
    [INFO] [1585086552.363023089] [rosbag2_transport]: Listening for topics...
    unknown file: Failure
    C++ exception with description "failed to initialize rcl node: Failed to add publisher to map., at /opt/ros/master/src/ros2/rcl/rcl/src/rcl/logging_rosout.c:190" thrown in the test body.
  >>>

Summary: 15 tests, 0 errors, 1 failure, 0 skipped

Other times the test process crashes outright (return code -11, i.e. it was terminated by SIGSEGV) and the output looks like this:

colcon test-result
/opt/ros/master/build/rosbag2_transport/Testing/20200324-2153/Test.xml: 15 tests, 0 errors, 1 failure, 0 skipped
- test_record
  <<< failure message
    -- run_test.py: invoking following command in '/opt/ros/master/build/rosbag2_transport':
     - /opt/ros/master/build/rosbag2_transport/test_record --gtest_output=xml:/opt/ros/master/clion/test_results/rosbag2_transport/test_record.gtest.xml
    Running main() from gmock_main.cc
    [==========] Running 1 test from 1 test case.
    [----------] Global test environment set-up.
    [----------] 1 test from RecordIntegrationTestFixture
    [ RUN      ] RecordIntegrationTestFixture.published_messages_from_multiple_topics_are_recorded
    -- run_test.py: return code -11
    -- run_test.py: generate result file '/opt/ros/master/clion/test_results/rosbag2_transport/test_record.gtest.xml' with failed test
    -- run_test.py: verify result file '/opt/ros/master/clion/test_results/rosbag2_transport/test_record.gtest.xml'
  >>>

Summary: 15 tests, 0 errors, 1 failure, 0 skipped

To Reproduce

  1. colcon test --packages-select rosbag2_transport --retest-until-fail 10 (--retest-until-fail is needed because the failure is intermittent and a single run may pass)
  2. colcon test-result

System

  • OS: Ubuntu Focal
  • ROS 2 Distro: Foxy
  • Version: Master

Additional context

This can also be readily observed in CycloneDDS Nightly Linux CI: http://build.ros2.org/view/Fci/job/Fci__nightly-cyclonedds_ubuntu_focal_amd64/

Labels: bug
