Add a create_timer method to Node and LifecycleNode classes#1975
Conversation
|
Make sure to also sign-off your commits to satisfy the DCO check. |
b852aa2 to
a828be8
Compare
a828be8 to
cf4d644
Compare
jacobperron
left a comment
There was a problem hiding this comment.
Looks good to me. Please add a test exercising the new method, then I'll trigger CI.
Done. Please take a look at the unit tests, and particularly the Unit tests pass locally: 83: [==========] Running 16 tests from 1 test suite.
83: [----------] Global test environment set-up.
83: [----------] 16 tests from PerTimerType/TestTimer
83: [ RUN ] PerTimerType/TestTimer.test_simple_cancel/wall_timer
83: [ OK ] PerTimerType/TestTimer.test_simple_cancel/wall_timer (17 ms)
83: [ RUN ] PerTimerType/TestTimer.test_simple_cancel/generic_timer
83: [ OK ] PerTimerType/TestTimer.test_simple_cancel/generic_timer (5 ms)
83: [ RUN ] PerTimerType/TestTimer.test_is_canceled_reset/wall_timer
83: [ OK ] PerTimerType/TestTimer.test_is_canceled_reset/wall_timer (5 ms)
83: [ RUN ] PerTimerType/TestTimer.test_is_canceled_reset/generic_timer
83: [ OK ] PerTimerType/TestTimer.test_is_canceled_reset/generic_timer (5 ms)
83: [ RUN ] PerTimerType/TestTimer.test_run_cancel_executor/wall_timer
83: [ OK ] PerTimerType/TestTimer.test_run_cancel_executor/wall_timer (105 ms)
83: [ RUN ] PerTimerType/TestTimer.test_run_cancel_executor/generic_timer
83: [ OK ] PerTimerType/TestTimer.test_run_cancel_executor/generic_timer (105 ms)
83: [ RUN ] PerTimerType/TestTimer.test_run_cancel_timer/wall_timer
83: [ OK ] PerTimerType/TestTimer.test_run_cancel_timer/wall_timer (105 ms)
83: [ RUN ] PerTimerType/TestTimer.test_run_cancel_timer/generic_timer
83: [ OK ] PerTimerType/TestTimer.test_run_cancel_timer/generic_timer (105 ms)
83: [ RUN ] PerTimerType/TestTimer.test_bad_arguments/wall_timer
83: [ERROR] [1658525415.784172078] []: Failed to fini rcl clock.
83: [ OK ] PerTimerType/TestTimer.test_bad_arguments/wall_timer (5 ms)
83: [ RUN ] PerTimerType/TestTimer.test_bad_arguments/generic_timer
83: [ERROR] [1658525415.788818007] []: Failed to fini rcl clock.
83: [ OK ] PerTimerType/TestTimer.test_bad_arguments/generic_timer (5 ms)
83: [ RUN ] PerTimerType/TestTimer.callback_with_timer/wall_timer
83: [ OK ] PerTimerType/TestTimer.callback_with_timer/wall_timer (7 ms)
83: [ RUN ] PerTimerType/TestTimer.callback_with_timer/generic_timer
83: [ OK ] PerTimerType/TestTimer.callback_with_timer/generic_timer (6 ms)
83: [ RUN ] PerTimerType/TestTimer.callback_with_period_zero/wall_timer
83: [ OK ] PerTimerType/TestTimer.callback_with_period_zero/wall_timer (5 ms)
83: [ RUN ] PerTimerType/TestTimer.callback_with_period_zero/generic_timer
83: [ OK ] PerTimerType/TestTimer.callback_with_period_zero/generic_timer (4 ms)
83: [ RUN ] PerTimerType/TestTimer.test_failures_with_exceptions/wall_timer
83: [ERROR] [1658525415.815430083] [rclcpp]: Failed to clean up rcl timer handle: error not set
83: [ OK ] PerTimerType/TestTimer.test_failures_with_exceptions/wall_timer (4 ms)
83: [ RUN ] PerTimerType/TestTimer.test_failures_with_exceptions/generic_timer
83: [ERROR] [1658525415.820125821] [rclcpp]: Failed to clean up rcl timer handle: error not set
83: [ OK ] PerTimerType/TestTimer.test_failures_with_exceptions/generic_timer (5 ms)
83: [----------] 16 tests from PerTimerType/TestTimer (493 ms total)
83:
83: [----------] Global test environment tear-down
83: [==========] 16 tests from 1 test suite ran. (493 ms total)
83: [ PASSED ] 16 tests.
83: -- run_test.py: return code 0
83: -- run_test.py: inject classname prefix into gtest result file
The following tests passed:
test_timer
...
100% tests passed, 0 tests failed out of 1
Label Time Summary:
gtest = 0.68 sec*proc (1 test)
Total Test time (real) = 0.68 sec
Finished <<< rclcpp [0.94s]
Summary: 1 package finished [1.76s] |
Correct, if we are using a ROS clock (ie. potentially relying on the /clock topic) then we can't assume time will be be monotonically increasing. I don't think this violates any assumptions of the timer class, but I can double-check. |
fujitatomoya
left a comment
There was a problem hiding this comment.
lgtm with one question.
create_timer method to Node classcreate_timer method to Node and LifecycleNode classes
|
@ros-pull-request-builder retest this please |
There was a problem hiding this comment.
We should consider reverting this, even though the reversion should be a no-op. I get that this timer sets up the periodic publishing of /clock to drive peer nodes' perception of time (which we call "ROS time"). For this reason a wall timer makes more sense. However, my understanding is that since the parameter use_sim_time is not set on this node, create_timer should simply fall back to a wall timer. Thus, both create_timer and create_wall_timer are equivalent here.
There was a problem hiding this comment.
If I understand the tests correctly, we want this to be a wall timer so I would just revert it to be safe.
There was a problem hiding this comment.
yes, using create_wall_timer() here makes more sense I think
There was a problem hiding this comment.
Can you double check this is what you want!
We should consider reverting this, even though the reversion should be a no-op. I get that this timer sets up the periodic publishing of /clock to drive peer nodes' perception of time (which we call "ROS time"). For this reason a wall timer makes more sense. However, my understanding is that since the parameter use_sim_time is not set on this node, create_timer should simply fall back to a wall timer. Thus, both create_timer and create_wall_timer are equivalent here.
The above message I sent was wrong, as I incorrectly thought this change was to the SimClockPublisherNode code, which it is not. If you look at how this code is used in the unit test , we have a SimClockPublisherNode that publishes time on /clock and a ClockThreadTestingNode that consumes this time.
I'm fairly sure that we want a create_timer here to prove that a /clock driven sense of time has a callback within which you can make subsequent calls to this->now(). If we really wanted to test a wall timer, there would be no need to spin up a SimClockPublisherNode instance.
There was a problem hiding this comment.
Yes, that seems to be the case.
I'm not sure what was the test doing before as create_wall_timer() never used sim time.
There was a problem hiding this comment.
The comments also say that this test should be running with use_clock_thread=true, and that's not the case.
When I modify the test to do what's supposed to do, it's super flaky.
IMO, it's fair to comment out the test and create a new issue for it.
@jacobperron any opinion?
There was a problem hiding this comment.
I commented out the test as I think the problem is the test itself, see 0093121
There was a problem hiding this comment.
I thought the test was checking that the wall timer and simulation time updates were not interfering with each other. But, I'm not sure. I'm okay with disabling the flaky test. Maybe we can create a GitHub ticket to look into it (maybe it is just not a well-defined test and could be removed).
There was a problem hiding this comment.
Based on the comments, the test name, and the code; it seems that the test is supposed to be checking that the clock is updated when use_clock_thread=True even if you're blocked in a callback.
The part that's missing is that use_clock_thread was never actually enabled.
Enabling that doesn't solve the issue, as it seems something is not working well.
@anaelle-sw I think you added this feature, could you take a look at the issue?
|
@clalancette and @jacobperron -- I suspect that the nondeterminism in this PR has to do with discovery of the |
Update: the problem is completely fixed locally when I set |
ivanpauno
left a comment
There was a problem hiding this comment.
I haven't looked at the tests yet, but code looks good.
I only have a minor comment.
Could you link what the test that was causing issues is? |
The last five commits are all empty, and in theory they should all pass. However, passing happens on only some CI workers (look at the CI build history). Locally, things pass every time. I can occasionally replicate the issue with test |
Okay, that's strange. |
|
I'm taking a look to all the failures in https://build.ros2.org/job/Rpr__rclcpp__ubuntu_jammy_amd64/227/#showFailuresLink and https://build.ros2.org/job/Rpr__rclcpp__ubuntu_jammy_amd64/228/#showFailuresLink, and they don't seem to be related to your changes at all. It seems that for some reason they're happening in the Rpr job but not in the nightlies. About |
I initially tried |
329ec52 to
3de6312
Compare
|
@jacobperron and @ivanpauno - Please take a look at the last two commits and let me know if you are happy. I think I addressed all review comments, even though this one was quite tricky. I am of the opinion that this PR is now ready to merge. |
|
@ros-pull-request-builder retest this please |
1 similar comment
|
@ros-pull-request-builder retest this please |
|
Something I noticed while reading the tests, this line should say |
Fixed in c7acee7 |
Signed-off-by: Andrew Symington <andrew.c.symington@gmail.com>
… sequence Signed-off-by: Andrew Symington <andrew.c.symington@gmail.com>
Signed-off-by: Andrew Symington <andrew.c.symington@gmail.com>
Signed-off-by: Andrew Symington <andrew.c.symington@gmail.com>
c7acee7 to
43adf6e
Compare
ivanpauno
left a comment
There was a problem hiding this comment.
LGTM!
I will wait for @jacobperron approval as I made some minor changes.
Thanks @asymingt !!
…ble call sequence" This reverts commit 3de6312. Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
43adf6e to
1c16100
Compare
Signed-off-by: Ivan Santiago Paunovic <ivanpauno@ekumenlabs.com>
Thanks for getting this one over the line, Ivan 👍 |
jacobperron
left a comment
There was a problem hiding this comment.
LGTM, pending CI. One tiny nitpick.
There was a problem hiding this comment.
I thought the test was checking that the wall timer and simulation time updates were not interfering with each other. But, I'm not sure. I'm okay with disabling the flaky test. Maybe we can create a GitHub ticket to look into it (maybe it is just not a well-defined test and could be removed).
7e2e277 to
7e24c8d
Compare
|
I'm seeing a build failure in building tf2_ros that might be related to this change: https://ci.ros2.org/job/ci_linux-aarch64/11736/console#console-section-12 |
|
We probably should have run with test args |
|
Checking if it's this PR build: WIthout this PR repos file |
Problem
Right now the
NodeandLifecycleNodeclasses only provide a method for creating a wall timer, even though the underlying functions for creating a generic timer (which respects ROS time via/clock) are available. This is a problem because those that write node implementations -- and especially those that come from a ROS1 background -- default to using{Node, LifecycleNode}::create_wall_timernot understanding the implications this decision has on timer behavior in replay, and especially whenros2 bag play --rateis used. The existing method of creating the correct timer type is opaque: it requires knowledge of free-functions inrclcpp.Relates to issues:
Relates to pull requests:
Fix
This PR adds the piping to allow a timer that respects
/clockwhenuse_sim_time: Truenode parameter is set. While it is possible to do this manually withrclcpp::create_timerit's much less convenient. An analogous function already exists inrclpy.Progress
create_timermethod toNodeclass.chrono::durationcasts.create_timer,timer_sourceandtimer.TestTimeSource.check_sim_time_updated_in_callback_if_use_clock_thread.LifecycleNodeto also supportcreate_timer.create_timerto drop raw pointers and have common error checks.