lock-order-inversion (potential deadlock) in rclcpp::Client::wait_for_service, detected by ThreadSanitizer #1121

@yuriy-universal-ivanov

Description

Bug report

Required Info:

  • Operating System:
    • Ubuntu 18.04
  • Installation type:
    • Binaries, with apt-get
  • Version or commit hash:
    • eloquent, from deb http://packages.ros.org/ros2/ubuntu bionic main
  • DDS implementation:
    • Default (Fast-RTPS)
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

Reproduced by running rclcpp::executors::MultiThreadedExecutor::spin() in one thread while an rclcpp::Client waits for a service in our own worker thread.

Expected behavior

rclcpp locks its mutexes in a consistent order, so no deadlock is possible.

Actual behavior

Two mutexes are acquired in opposite orders on two code paths:

  1. In rclcpp::graph_listener::GraphListener::run_loop() -> rclcpp::node_interfaces::NodeGraph::notify_graph_change()
  2. In rclcpp::node_interfaces::NodeGraph::get_graph_event() -> a call TSan could not symbolize: <null> <null> (librclcpp.so+0x9fb1c)

Additional information

In the worker thread:

    template <typename ResponseMessageType>
    ResponseMessageType FutureResponse<ResponseMessageType>::Impl::request(std::shared_ptr<RclcppClient> rclcpp_client,
                                                                           std::vector<uint8_t> request_message)
    {
      constexpr std::chrono::milliseconds single_spin_limit{100};

      // Poll until the service appears, we are asked to stop, or rclcpp shuts down.
      while (!stop && rclcpp::ok() && !rclcpp_client->wait_for_service(single_spin_limit))
      {
      }
      // ... (remainder of request() elided)
    }

Here rclcpp_client is a std::shared_ptr<rclcpp::Client<our service type>> and stop is a std::atomic<bool>.

ThreadSanitizer detects a lock-order inversion, which is a genuine deadlock risk:

WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=9849)
  Cycle in lock order graph: M470198622835870888 (0x000000000000) => M471324660181503496 (0x000000000000) => M470198622835870888

  Mutex M471324660181503496 acquired here while holding mutex M470198622835870888 in thread T159:
    #0 pthread_mutex_lock <null> (libtsan.so.0+0x3fadb)
    #1 <null> <null> (librclcpp.so+0x9fb1c)
    #2 our_project::FutureResponse<our_project::BinarySerializable<void> >::Impl::request(std::shared_ptr<rclcpp::Client<component_wrapper_ros2_interfaces::srv::BinaryBuffers> >, std::vector<unsigned char, std::allocator<unsigned char> >) /our_project/components/component-wrapper/include/./component-wrapper/future_response.h:124 (OurProjectComponentWrapper_test+0x5d82e)
    #3 our_project::FutureResponse<our_project::BinarySerializable<void> >::Impl::Impl(std::shared_ptr<rclcpp::Client<component_wrapper_ros2_interfaces::srv::BinaryBuffers> >, std::vector<unsigned char, std::allocator<unsigned char> >)::{lambda()#1}::operator()() const /our_project/components/component-wrapper/include/./component-wrapper/future_response.h:106 (OurProjectComponentWrapper_test+0x5b31c)
<...>

  Mutex M470198622835870888 previously acquired by the same thread here:
    #0 pthread_mutex_lock <null> (libtsan.so.0+0x3fadb)
    #1 rclcpp::node_interfaces::NodeGraph::get_graph_event() <null> (librclcpp.so+0xc1d38)
    #2 our_project::FutureResponse<our_project::BinarySerializable<void> >::Impl::request(std::shared_ptr<rclcpp::Client<component_wrapper_ros2_interfaces::srv::BinaryBuffers> >, std::vector<unsigned char, std::allocator<unsigned char> >) /our_project/components/component-wrapper/include/./component-wrapper/future_response.h:124 (OurProjectComponentWrapper_test+0x5d82e)
<...>

  Mutex M470198622835870888 acquired here while holding mutex M471324660181503496 in thread T161:
    #0 pthread_mutex_lock <null> (libtsan.so.0+0x3fadb)
    #1 rclcpp::node_interfaces::NodeGraph::notify_graph_change() <null> (librclcpp.so+0xc067a)

  Mutex M471324660181503496 previously acquired by the same thread here:
    #0 pthread_mutex_lock <null> (libtsan.so.0+0x3fadb)
    #1 rclcpp::graph_listener::GraphListener::run_loop() <null> (librclcpp.so+0x9ffa3)

future_response.h:124 contains !rclcpp_client->wait_for_service(single_spin_limit)
