Skip to content

Multiple wakes caused by wait_mutex in MultiThreadedExecutor #1148

@peterpena

Description

@peterpena

Bug report

Required Info:

  • Operating System:
    • Ubuntu 18.04
  • Installation type:
    • From source (Foxy)
  • Version or commit hash:
    • Master branch
  • DDS implementation:
    • FastRTPS

Steps to reproduce issue

#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <string>
#include <thread>


#include "rclcpp/rclcpp.hpp"

using namespace std::chrono_literals;

class MultiTimer final : public rclcpp::Node
{
public:
  MultiTimer() : Node("multi_timer")
  {
    timer_ = this->create_wall_timer(
      2s, std::bind(&MultiTimer::timer_callback, this));
  }

private:
  void timer_callback()
  {
    RCLCPP_WARN(this->get_logger(), "Timer executed!");
  }

  rclcpp::TimerBase::SharedPtr timer_;
};

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  rclcpp::executors::MultiThreadedExecutor executor;
  auto multi_timer = std::make_shared<MultiTimer>();
  executor.add_node(multi_timer);
  executor.spin();
  rclcpp::shutdown();
  return 0;
}

Expected behavior

When the guard condition is triggered, the thread that has the TimerBase::SharedPtr should be able to remove it from the scheduled_timers_ set; allowing the other thread to insert the TimerBase::SharedPtr into the scheduled_timers_ and execute it.

Actual behavior

The thread with the TimerBase is locked by the wait_mutex while the other thread wakes up from rcl_wait, finds a timer handle, but the scheduled_timer_ already has the timer base; so it loops again and goes to sleep repeating the process until the thread with the TimerBase::SharedPtr can get lock and remove it from scheduled_timers_. This issue is related to issue #877.

Additional information

[INFO] [1590794464.500770014] [rclcpp]: We got a timer !!!!!! --> Thread 1 finds a timer
[WARN] [1590794464.500999632] [multi_timer]: Timer executed! --> Thread 1 executes timer
[INFO] [1590794464.501258844] [rclcpp]: trigger guard condition!!!!!! --> Thread 1 triggers guard condition to wake up thread 2
[WARN] [1590794464.501313510] [rclcpp]: Waiting for lock to release **************************************************************** --> Thread 1 is waiting for Thread 2 to release lock
[INFO] [1590794466.663264141] []: Negative timeout .... --> Thread 2 wakes up
[INFO] [1590794466.663281221] []: Ready timer... --> Thread 2 finds a timer is ready
[INFO] [1590794466.663309650] [rclcpp]: wait_for_work completed --> Thread 2 leaves `wait_for_work`
[INFO] [1590794466.663343131] [rclcpp]: We got a timer !!!!!! --> `Thread 2 finds a timer executable in `get_next_executable`
[INFO] [1590794466.663352218] [rclcpp]: There is double timer!!!!!! --> Thread 2 tries to insert the `TimerBase::SharedPtr` in the `scheduled_timers_` but finds it is already there
[INFO] [1590794466.663387961] []: Negative timeout .... --> Thread 2 repeats the process and goes into `wait_for_work`
[INFO] [1590794466.663413380] []: Ready timer...
[INFO] [1590794466.663443670] [rclcpp]: wait_for_work completed
[INFO] [1590794466.663477824] [rclcpp]: We got a timer !!!!!!
[INFO] [1590794466.663487038] [rclcpp]: There is double timer!!!!!!
[INFO] [1590794466.663520572] []: Negative timeout ....
[INFO] [1590794466.663538625] []: Ready timer...
[INFO] [1590794466.663568124] [rclcpp]: wait_for_work completed
[INFO] [1590794466.663603595] [rclcpp]: We got a timer !!!!!!
[INFO] [1590794466.663612871] [rclcpp]: There is double timer!!!!!!
[INFO] [1590794466.663646848] []: Negative timeout ....
[INFO] [1590794466.663664237] []: Ready timer...
[INFO] [1590794466.663693468] [rclcpp]: wait_for_work completed
[INFO] [1590794466.663727852] [rclcpp]: We got a timer !!!!!!
[INFO] [1590794466.663737011] [rclcpp]: There is double timer!!!!!!
[INFO] [1590794466.663772649] []: Negative timeout ....
[INFO] [1590794466.663815189] [rclcpp]: wait_for_work completed
[INFO] [1590794466.663856006] [rclcpp]: We got a timer !!!!!!
[INFO] [1590794466.663865147] [rclcpp]: There is double timer!!!!!!
 ... This repeats many times ....
[WARN] [1590794466.663903965] [rclcpp]: timer lock release ****************************************************** --> Finally, thread 1 can remove the `TimerBase::SharedPtr` from `scheduled_timers_`
[INFO] [1590794466.663965196] []: Negative timeout ....
[INFO] [1590794466.663994174] []: Ready timer...
[INFO] [1590794466.664025785] [rclcpp]: wait_for_work completed
[INFO] [1590794466.664053201] [rclcpp]: We got a timer !!!!!!
[WARN] [1590794466.664075867] [multi_timer]: Timer executed! --> Thread 2 can now execute the timer

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions