
Independent mutex to check and remove timer from scheduled timer #1168

Open

peterpena wants to merge 6 commits into rolling from peterpena/timer-mutex

Conversation

@peterpena
Contributor

This PR addresses issue #1148. An independent mutex is added to protect the scheduled_timers_ set. This lets one thread remove a timer from scheduled_timers_ while another thread asynchronously looks for the next executable. It removes the problem where one thread cannot remove the timer while another thread keeps waking up and retrieving the timer, but cannot execute it because the first thread still holds the mutex. I ran tests to verify that single-threaded and multi-threaded executors can run the timer callbacks without any issues:

-- run_test.py: extra environment variables to append:
 - LD_LIBRARY_PATH+=/home/peter/ros2_foxy/build/rclcpp
-- run_test.py: invoking following command in '/home/peter/ros2_foxy/build/rclcpp':
 - /home/peter/ros2_foxy/build/rclcpp/test_wait_mutex --gtest_output=xml:/home/peter/ros2_foxy/build/rclcpp/test_results/rclcpp/test_wait_mutex.gtest.xml

Running main() from /home/peter/ros2_foxy/install/gtest_vendor/src/gtest_vendor/src/gtest_main.cc
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from TestWaitMutex
[ RUN      ] TestWaitMutex.set_timer_wait_mutex_multi_threaded
[       OK ] TestWaitMutex.set_timer_wait_mutex_multi_threaded (12031 ms)
[ RUN      ] TestWaitMutex.set_timer_wait_mutex_single_threaded
[       OK ] TestWaitMutex.set_timer_wait_mutex_single_threaded (12024 ms)
[----------] 2 tests from TestWaitMutex (24055 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (24062 ms total)
[  PASSED  ] 2 tests.


-- run_test.py: return code 0
-- run_test.py: inject classname prefix into gtest result file '/home/peter/ros2_foxy/build/rclcpp/test_results/rclcpp/test_wait_mutex.gtest.xml'
-- run_test.py: verify result file '/home/peter/ros2_foxy/build/rclcpp/test_results/rclcpp/test_wait_mutex.gtest.xml'

@peterpena requested a review from wjwwood on June 11, 2020 20:14
@peterpena force-pushed the peterpena/timer-mutex branch from 8743230 to 40fdb8e on June 11, 2020 20:15
Collaborator

@fujitatomoya left a comment


There are some minor comments.


using namespace std::chrono_literals;

class TestWaitMutex : public ::testing::Test
Collaborator


I think this test checks whether the wall_timer user callback is issued the expected number of times? If so, it would be better to change the file name and class name, like test_wall_timer_count.cpp or something?

Contributor Author


I changed it to test_time_call_count.cpp

@fujitatomoya
Collaborator

@peterpena

BTW, I confirmed the following difference between before and after:

  • before
[INFO] [1591939878.346141939] [multi_timer]: Timer executed!
[DEBUG] [1591939878.346165295] [rcl]: Timeout calculated based on next scheduled timer: false
[DEBUG] [1591939878.346255275] [rcl]: Guard condition in wait set is ready
[DEBUG] [1591939878.346336561] [rcl]: Waiting with timeout: 1s + 997190395ns
[DEBUG] [1591939878.346362808] [rcl]: Timeout calculated based on next scheduled timer: true
[DEBUG] [1591939880.343696436] [rcl]: Timer in wait set is ready
[DEBUG] [1591939880.343978198] [rcl]: Waiting with timeout: 0s + 0ns
[DEBUG] [1591939880.344031401] [rcl]: Timeout calculated based on next scheduled timer: true
[DEBUG] [1591939880.344079956] [rcl]: Timer in wait set is ready
[DEBUG] [1591939880.344231231] [rcl]: Waiting with timeout: 0s + 0ns
[DEBUG] [1591939880.344254341] [rcl]: Timeout calculated based on next scheduled timer: true
[DEBUG] [1591939880.344274941] [rcl]: Timer in wait set is ready
[DEBUG] [1591939880.344339097] [rcl]: Waiting with timeout: 0s + 0ns
[DEBUG] [1591939880.344353732] [rcl]: Timeout calculated based on next scheduled timer: true
[DEBUG] [1591939880.344371500] [rcl]: Timer in wait set is ready
[DEBUG] [1591939880.344482527] [rcl]: Waiting with timeout: 0s + 0ns
[DEBUG] [1591939880.344898532] [rcl]: Timeout calculated based on next scheduled timer: true
[DEBUG] [1591939880.344935506] [rcl]: Timer in wait set is ready
[DEBUG] [1591939880.345049010] [rcl]: Waiting with timeout: 0s + 0ns
[DEBUG] [1591939880.345086660] [rcl]: Timeout calculated based on next scheduled timer: true
[DEBUG] [1591939880.345131275] [rcl]: Timer in wait set is ready
[DEBUG] [1591939880.345275321] [rcl]: Waiting with timeout: 0s + 0ns
[DEBUG] [1591939880.345309318] [rcl]: Timeout calculated based on next scheduled timer: true
[DEBUG] [1591939880.345347118] [rcl]: Timer in wait set is ready
[DEBUG] [1591939880.345414981] [rcl]: Calling timer
[INFO] [1591939880.345447669] [multi_timer]: Timer executed!
  • after
[INFO] [1591941505.293213375] [multi_timer]: Timer executed!
[DEBUG] [1591941505.293298531] [rcl]: Waiting without timeout
[DEBUG] [1591941505.293502878] [rcl]: Timeout calculated based on next scheduled timer: false
[DEBUG] [1591941505.293542102] [rcl]: Guard condition in wait set is ready
[DEBUG] [1591941505.293691008] [rcl]: Waiting with timeout: 1s + 999020401ns
[DEBUG] [1591941505.293723050] [rcl]: Timeout calculated based on next scheduled timer: true
[DEBUG] [1591941507.292893558] [rcl]: Timer in wait set is ready
[DEBUG] [1591941507.293109779] [rcl]: Calling timer
[INFO] [1591941507.293163348] [multi_timer]: Timer executed!

Collaborator

@fujitatomoya left a comment


lgtm, but there seems to be a conflict.

@peterpena force-pushed the peterpena/timer-mutex branch 2 times, most recently from 61e37df to f0a59be on June 15, 2020 22:56
Collaborator

@fujitatomoya left a comment


lgtm, I'd like someone else to check.

@ivanpauno
Member

This removes the issue where one thread cannot remove the timer while the other thread keeps waking up and tries to get the timer but cannot execute it because the other thread is locked by the mutex

This was only a performance problem, right?
At some point the first thread unlocks the mutex and the others can start executing.

@ros2 deleted a comment from fujitatomoya, Jun 16, 2020
Member

@ivanpauno left a comment


Using two mutexes LGTM!

@ivanpauno
Member

@ivanpauno deleted a comment from fujitatomoya 2 minutes ago

I was deleting a couple of comments I made when reviewing (they were wrong), and by accident I deleted your comment @fujitatomoya 😁. I don't know how to restore a comment; I hope you remember what it was about.

@peterpena
Contributor Author

This removes the issue where one thread cannot remove the timer while the other thread keeps waking up and tries to get the timer but cannot execute it because the other thread is locked by the mutex

This was only a performance problem, right?
At some point the first thread unlocked the mutex and the others can start executing.

Yes, that is correct. After some number of iterations, the first thread is able to remove the timer from the scheduled_timers_.

Member

@ivanpauno left a comment


LGTM


executor.add_node(node);
executor.spin();
} No newline at end of file
Member


nit: missing newline before EOF

@ivanpauno
Member

Yes, that is correct. After some number of iterations, the first thread is able to remove the timer from the scheduled_timers_.

Thanks for clarifying!!

@peterpena peterpena force-pushed the peterpena/timer-mutex branch 2 times, most recently from f0a59be to fedf242 Compare June 19, 2020 22:20
@ivanpauno
Member

ivanpauno commented Jun 29, 2020

@peterpena is this ready?

(edit) there seems to be a bunch of linter failures

@peterpena
Contributor Author

@ivanpauno I still have to check why I am getting errors that were not there before.

@peterpena force-pushed the peterpena/timer-mutex branch 5 times, most recently from 0a40fcf to 4345721 on July 6, 2020 20:59
peterpena added 5 commits July 6, 2020 17:05
Signed-off-by: Pedro Pena <peter.a.pena@gmail.com>
Signed-off-by: Pedro Pena <peter.a.pena@gmail.com>
Signed-off-by: Pedro Pena <peter.a.pena@gmail.com>
Signed-off-by: Pedro Pena <peter.a.pena@gmail.com>
Signed-off-by: Pedro Pena <peter.a.pena@gmail.com>
Signed-off-by: Pedro Pena <peter.a.pena@gmail.com>
@peterpena force-pushed the peterpena/timer-mutex branch from 4345721 to a4fe5a5 on July 6, 2020 21:06
@peterpena
Contributor Author

It's good to go now. I believe @wjwwood wanted to take a look before I merge it, though.

Member

@wjwwood left a comment


lgtm! sorry for the delay in reviewing.

@peterpena
Contributor Author

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status


Collaborator

@fujitatomoya left a comment


I think get_next_executable and scheduled_timers_ should be protected by the same mutex; otherwise there is a possibility that the same timer is executed multiple times.

}
continue;
}

Collaborator


  1. context switch here, with the timer already scheduled by Thread-1. (Thread-2)


if (any_exec.timer) {
std::lock_guard<std::mutex> wait_lock(wait_mutex_);
std::lock_guard<std::mutex> wait_lock(scheduled_timers_mutex_);
Collaborator


  2. execute the timer and remove it from scheduled_timers_. (Thread-1)

}

if (any_exec.timer) {
std::lock_guard<std::mutex> wait_lock(scheduled_timers_mutex_);
Collaborator


  3. the same timer will be scheduled again and executed. (Thread-2)
