
Sudden NTP sync can break timer callback periodicity #1080

@hsgwa

Description

Bug report

  • Operating System:
    • Ubuntu 20.04
  • Installation type:
    • from source
  • Version or commit hash:
  • DDS implementation:
    • Fast-RTPS
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

Run the demo talker node:

$ ros2 run demo_nodes_cpp talker
[INFO] [1610531479.711126895] [talker]: Publishing: 'Hello World: 1'
[INFO] [1610531480.711165784] [talker]: Publishing: 'Hello World: 2'
[INFO] [1610531481.711246655] [talker]: Publishing: 'Hello World: 3'
...

In another terminal, disable NTP synchronization and manually set the system time back by 5 seconds:

$ sudo timedatectl set-ntp no
$ sudo date -s "@$(( $(date +%s) - 5 ))"  # set the clock back by 5 s
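While reproducing the issue, it can help to confirm exactly when the wall clock was stepped. The following is a hypothetical standalone helper, not part of ROS 2: it samples `std::chrono::system_clock` (the wall clock) and `std::chrono::steady_clock` (monotonic on Linux) at a fixed interval and reports any difference in how far the two advanced. The names `wall_clock_skew` and `watch_for_steps` and the thresholds are illustrative assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <thread>

using WallClock = std::chrono::system_clock;  // CLOCK_REALTIME on Linux
using MonoClock = std::chrono::steady_clock;  // monotonic clock
using std::chrono::milliseconds;

// How much further the wall clock moved than the monotonic clock between two
// samples; a negative value means the wall clock was stepped backward.
milliseconds wall_clock_skew(WallClock::duration wall_delta, MonoClock::duration mono_delta) {
  return std::chrono::duration_cast<milliseconds>(wall_delta - mono_delta);
}

// Samples both clocks `iterations` times, `interval` apart, and prints every
// step larger than `threshold`. Returns the number of steps detected.
int watch_for_steps(int iterations, milliseconds interval, milliseconds threshold) {
  assert(iterations >= 0);
  int steps = 0;
  auto wall_prev = WallClock::now();
  auto mono_prev = MonoClock::now();
  for (int i = 0; i < iterations; ++i) {
    std::this_thread::sleep_for(interval);
    const auto wall_now = WallClock::now();
    const auto mono_now = MonoClock::now();
    const milliseconds skew = wall_clock_skew(wall_now - wall_prev, mono_now - mono_prev);
    if (skew > threshold || skew < -threshold) {
      std::printf("wall clock stepped by %lld ms\n", static_cast<long long>(skew.count()));
      ++steps;
    }
    wall_prev = wall_now;
    mono_prev = mono_now;
  }
  return steps;
}
```

For example, calling `watch_for_steps(60, milliseconds(1000), milliseconds(500))` from a small `main` while running the `date -s` command should report a step of roughly -5000 ms.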

Expected behavior

Most developers would expect the timer callback to keep executing periodically, on a best-effort basis, even when the system time is adjusted.

[INFO] [1610531479.711126895] [talker]: Publishing: 'Hello World: 33' <- 1s later
[INFO] [1610531480.711165784] [talker]: Publishing: 'Hello World: 34' <- 1s later # timer adjust here
[INFO] [1610531481.711246655] [talker]: Publishing: 'Hello World: 35' <- 1s later
[INFO] [1610531481.711246655] [talker]: Publishing: 'Hello World: 36' <- 1s later
...

Actual behavior

When the system clock is adjusted backward, the next timer callback is delayed by about the size of the adjustment (5 seconds here).

[INFO] [1610531479.711126895] [talker]: Publishing: 'Hello World: 33' <- 1s later
[INFO] [1610531480.711165784] [talker]: Publishing: 'Hello World: 34' <- 1s later # timer adjust here
[INFO] [1610531481.711246655] [talker]: Publishing: 'Hello World: 35' <- 5s later
[INFO] [1610531481.711246655] [talker]: Publishing: 'Hello World: 36' <- 1s later
...

Additional information

In the rcl layer, the time at which a timer callback is due is obtained from steady_time (CLOCK_MONOTONIC_RAW), so timer callbacks are evidently meant to run periodically regardless of wall-clock changes (rclcpp/timer.hpp).
However, the wait itself goes through condition_variable->wait_for, which eventually issues a futex system call whose timeout is based on the system clock, so it is affected by adjustments of the system time (rmw_fastrtps/rmw_wait.cpp).
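The distinction above can be illustrated at the POSIX level. A condition variable's timed wait uses `CLOCK_REALTIME` by default, but `pthread_condattr_setclock` can bind it to `CLOCK_MONOTONIC`, which `date -s` and NTP steps do not move. This is a minimal sketch of that mechanism only, not the rmw_fastrtps implementation; `MonotonicEvent` is a hypothetical name. Whether `std::condition_variable::wait_for` already waits on a monotonic clock internally depends on the C++ standard library version.

```cpp
#include <cassert>
#include <cerrno>
#include <ctime>
#include <pthread.h>

// A one-shot event whose timed wait is measured on CLOCK_MONOTONIC.
class MonotonicEvent {
 public:
  MonotonicEvent() {
    pthread_mutex_init(&mutex_, nullptr);
    pthread_condattr_t attr;
    pthread_condattr_init(&attr);
    // Use CLOCK_MONOTONIC for timeouts instead of the default CLOCK_REALTIME,
    // so stepping the wall clock neither shortens nor extends a wait.
    pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    pthread_cond_init(&cond_, &attr);
    pthread_condattr_destroy(&attr);
  }
  ~MonotonicEvent() {
    pthread_cond_destroy(&cond_);
    pthread_mutex_destroy(&mutex_);
  }
  MonotonicEvent(const MonotonicEvent &) = delete;
  MonotonicEvent & operator=(const MonotonicEvent &) = delete;

  // Marks the event as ready and wakes all waiters.
  void set() {
    pthread_mutex_lock(&mutex_);
    ready_ = true;
    pthread_cond_broadcast(&cond_);
    pthread_mutex_unlock(&mutex_);
  }

  // Waits until set() is called or `timeout_ns` of monotonic time passes.
  // Returns true if the event was set, false on timeout.
  bool wait_for_ns(long long timeout_ns) {
    assert(timeout_ns >= 0);
    timespec deadline;
    clock_gettime(CLOCK_MONOTONIC, &deadline);  // deadline on the same clock
    const long long nsec = deadline.tv_nsec + timeout_ns;
    deadline.tv_sec += nsec / 1000000000LL;
    deadline.tv_nsec = nsec % 1000000000LL;

    pthread_mutex_lock(&mutex_);
    int rc = 0;
    // 0 means woken (possibly spuriously): re-check; anything else ends the wait.
    while (!ready_ && rc == 0) {
      rc = pthread_cond_timedwait(&cond_, &mutex_, &deadline);
    }
    const bool result = ready_;
    pthread_mutex_unlock(&mutex_);
    return result;
  }

 private:
  pthread_mutex_t mutex_;
  pthread_cond_t cond_;
  bool ready_ = false;
};
```

For instance, `MonotonicEvent ev; ev.wait_for_ns(1000000000LL);` returns false after one second of monotonic time, even if the wall clock is moved during the wait.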

Related issue:
ros2/rcutils#43

For systems that use NTP, this issue only occurs when the hardware clock runs fast, because NTP then has to move the system time backward.
If the hardware clock runs slow, the adjustment is forward and the timer callback still executes periodically.
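The direction dependence can be checked with a toy model, which is an assumption and not ROS 2 code: a wait of `period_ms` has its deadline stored as a wall-clock timestamp, and `step_at_ms` into the wait the wall clock is stepped by `step_ms` (negative means backward). The hypothetical function `realtime_wait_returns_at` gives the monotonic time at which the wait actually returns.

```cpp
#include <algorithm>
#include <cassert>

// Toy model: wall-clock deadline = start + period_ms. After a step of step_ms
// at monotonic time step_at_ms, the wall clock reads start + t + step_ms, so
// it reaches the deadline at t = period_ms - step_ms. If that moment is
// already past (large forward step), the wait returns at the step itself.
long long realtime_wait_returns_at(long long period_ms, long long step_at_ms, long long step_ms) {
  assert(0 <= step_at_ms && step_at_ms < period_ms);
  return std::max(step_at_ms, period_ms - step_ms);
}
```

With the talker's 1 s period and the 5 s backward step from the reproduction, `realtime_wait_returns_at(1000, 500, -5000)` returns 6000, so the callback is late by the size of the step. A forward step only makes the wait return early (`realtime_wait_returns_at(1000, 500, 5000)` returns 500); assuming the executor then re-checks the timer against steady time and waits again, the callback still fires on schedule.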

However, a system may not behave as expected under the following conditions:

  1. Hardware whose clock runs fast is used.
  2. NTP is inaccessible, or the computer has been shut down for a long time.
    (After startup, the system clock is initialized from the hardware clock.)
  3. An application is started before NTP synchronization completes, or NTP synchronization is suddenly executed during operation.
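For condition 3, one application-side option is to hold back time-critical nodes until the clock has been synchronized. This is a minimal Linux-only sketch, assuming the kernel's estimated maximum clock error (`timex::maxerror`, in microseconds, read with a no-op `ntp_adjtime` call from `<sys/timex.h>`) is an adequate signal: the kernel lets this estimate grow to a 16 s cap while no NTP daemon is disciplining the clock. The helper names are hypothetical, and how promptly the estimate is updated depends on the NTP daemon in use.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <sys/timex.h>

enum class ClockSync { kSynchronized, kUnsynchronized, kUnknown };

// Kernel cap for the estimated maximum error, in microseconds (16 s).
// Assumption: an estimate at this cap means no NTP daemon has synced the clock.
constexpr long kMaxErrorCapUs = 16000000L;

// Read-only query: modes == 0 changes nothing and needs no privileges.
ClockSync query_clock_sync() {
  timex tx{};
  tx.modes = 0;
  if (ntp_adjtime(&tx) == -1) {
    return ClockSync::kUnknown;  // e.g. the call is blocked by a sandbox
  }
  return tx.maxerror < kMaxErrorCapUs ? ClockSync::kSynchronized
                                      : ClockSync::kUnsynchronized;
}

// Hypothetical startup gate: poll up to `max_tries` times, `poll` apart,
// and report whether the clock became synchronized.
bool wait_for_clock_sync(int max_tries, std::chrono::milliseconds poll) {
  assert(max_tries >= 0);
  for (int i = 0; i < max_tries; ++i) {
    if (query_clock_sync() == ClockSync::kSynchronized) {
      return true;
    }
    std::this_thread::sleep_for(poll);
  }
  return false;
}
```

A launcher could, for example, call `wait_for_clock_sync(120, std::chrono::milliseconds(1000))` before starting its nodes and decide how to proceed if it returns false.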

As an extreme example, if an automated car that has not been used for several days suddenly adjusts its system time while driving, timer-driven sensor processing and steering control could stall for a few seconds.
Incidentally, in my environment, the Raspberry Pi's hardware clock lagged by 2.8 seconds per day.

This issue may not be a problem for most applications.
Rather than a specific source code change in ROS packages, I think it can be adequately addressed in the application or system implementation.
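As one application-level approach, periodic work can be driven by absolute deadlines on `CLOCK_MONOTONIC` using `clock_nanosleep` with `TIMER_ABSTIME`, so wall-clock steps neither stretch nor shorten a period, and absolute deadlines keep drift from accumulating. This sketch runs outside the ROS 2 executor (for example, in a dedicated thread); `run_periodic` and `add_ns` are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <ctime>
#include <functional>

// Advances `ts` by `ns` (non-negative) nanoseconds, keeping tv_nsec normalized.
void add_ns(timespec & ts, long long ns) {
  assert(ns >= 0);
  const long long total = ts.tv_nsec + ns;
  ts.tv_sec += total / 1000000000LL;
  ts.tv_nsec = total % 1000000000LL;
}

// Calls `tick(i)` `count` times, once every `period_ns` of monotonic time.
void run_periodic(long long period_ns, int count, const std::function<void(int)> & tick) {
  assert(period_ns > 0);
  timespec next;
  clock_gettime(CLOCK_MONOTONIC, &next);
  for (int i = 0; i < count; ++i) {
    add_ns(next, period_ns);  // absolute deadline: no drift accumulates
    // clock_nanosleep returns the error number directly (not via errno);
    // sleep again on the same absolute deadline if a signal interrupts it.
    while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, nullptr) == EINTR) {
    }
    tick(i);
  }
}
```

On the system side, an NTP daemon can often be configured to slew rather than step the clock once the system is running; for example, chrony's `makestep` directive restricts steps to the first few clock updates.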

I considered posting this to the tutorials, ros2 design, etc., but I didn't know the appropriate place, so I posted it as an issue.

Labels: bug
