Delay systemd-timesyncd start after network is deemed online #2068

Merged
agners merged 1 commit into home-assistant:dev from agners:delay-systemd-timesyncd-start
Aug 17, 2022

@agners
Member

@agners agners commented Aug 17, 2022

With commit 2d3119e ("Delay Supervisor start until time has been
sychronized (#1360)") systemd-time-wait-sync.service got enabled, which
waits until systemd-timesyncd synchronizes time with an NTP server.

By default, systemd-timesyncd.service and systemd-time-wait-sync.service
are pulled in by sysinit.target. This starts the services before full
network connectivity is established. The first synchronization fails and
systemd-timesyncd only retries after a rate-limit mechanism times out.
This causes a delay of 30s during startup. While systemd-timesyncd has
a mechanism to (re)try time synchronization when the network comes
online, it seems to work properly only when systemd-networkd
is used, see also systemd/systemd#24298.

Simply reordering systemd-timesyncd.service after network-online.target
does not work as it causes circular dependencies (NetworkManager itself
depends ultimately on the sysinit.target).
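
The circular dependency can be illustrated with a small sketch. The graph below is a simplified assumption about the ordering edges involved, not a dump of the real unit files, and `find_cycle` and `with_naive_reordering` are hypothetical helpers. Each entry means "the key must be started before the listed units".

```python
from typing import Dict, List, Optional

# Simplified, assumed ordering edges: the key must start before each value.
ORDERING: Dict[str, List[str]] = {
    "systemd-timesyncd.service": ["sysinit.target"],
    "sysinit.target": ["NetworkManager.service"],
    "NetworkManager.service": ["network-online.target"],
    "network-online.target": [],
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one ordering cycle as a list of units, or None if acyclic."""
    visiting: List[str] = []  # units on the current depth-first path
    done = set()              # units fully explored without finding a cycle

    def visit(node: str) -> Optional[List[str]]:
        if node in done:
            return None
        if node in visiting:
            # Reached a unit that is already on the path: that is a cycle.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in graph.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for unit in graph:
        cycle = visit(unit)
        if cycle:
            return cycle
    return None


def with_naive_reordering(graph: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Copy the graph and order timesyncd after network-online.target."""
    patched = {unit: list(deps) for unit, deps in graph.items()}
    patched["network-online.target"].append("systemd-timesyncd.service")
    return patched
```

In this model, `find_cycle(ORDERING)` finds nothing, while `find_cycle(with_naive_reordering(ORDERING))` returns a path that starts and ends at systemd-timesyncd.service, which is the kind of loop systemd refuses to resolve cleanly.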

With this change, the services are only pulled in by time-sync.target.
That allows ordering the services after network-online.target, so
the first synchronization succeeds.

This mechanism also works when an NTP server is provided through DHCP.
In that case, the systemd-timesyncd service is started by the dispatcher
script /usr/lib/NetworkManager/dispatcher.d/10-ntp before systemd
even considers starting the service. Tests show that the default
fallback NTP server is not contacted, only the DHCP-provided server.

@jens-maus
Member

Sorry to ask here, but have you ever thought about using chrony rather than systemd's own timesync daemon? IMHO the latter is more limited and error-prone than chrony, which many Linux distributions use as the default NTP daemon/client.

@jens-maus
Member

... in addition, IMHO using chrony would not only introduce a real NTP daemon, which should be used in a server-like environment (which we have in the case of Home Assistant OS) to ensure precise network timing, but would also allow config changes so that users could modify the default NTP servers used, unlike what is AFAIK currently the case.

@agners
Member Author

agners commented Aug 17, 2022

Well, systemd-timesyncd (which is a separate daemon that does the NTP thingy) also has a config file, and with the current OS version we also modify it using a NetworkManager script in case DHCP provides an NTP server. We could add an NTP option to the frontend, and edit that file through OS-Agent.

However, I am actually eyeing this change, which adds D-Bus support to edit the NTP servers. That will be much easier to use from Supervisor.

I haven't looked into chrony, to be honest. In general, I like the systemd stuff quite well, as it is usually quite solidly written and the pieces integrate well with each other... It's just when you use non-systemd components that some more integration work needs to be done (like in this case).

@agners agners merged commit 7a693be into home-assistant:dev Aug 17, 2022
@jens-maus
Member

Well, all I can say is that we are using chronyd rather than systemd's timesyncd for good reasons in our server-based environments at work. I don't really know all the NTP internals of timesyncd, but chronyd is the de facto successor of ntpd and provides a full-fledged NTP daemon, and we experienced way better NTP performance and compatibility with it than with timesyncd. IMHO timesyncd is more suited for desktop environments, while chronyd is more an NTP daemon for the needs of servers, i.e. 24/7 environments.

@agners
Member Author

agners commented Aug 17, 2022

The question is whether our use case is really closer to Server or Desktop. Yeah, we run 24h, but we run in people's homes. It's more of an embedded use case. Also, we use D-Bus, which is typically found in Desktop environments, to talk to system services.

For now, I don't intend to change, but I'll keep it in mind as an option in case we encounter problems with systemd-timesyncd.

@nagyrobi

nagyrobi commented Sep 4, 2022

Home Assistant is closer to a server because it relies on time precision much more heavily than a desktop does. A lot of the automation stuff in HA relies on time and its precision, and this is important when the system runs continuously, without interruption.

timesyncd is not too precise. When run periodically (or at boot), it adjusts the system clock if there's a difference. Each time it does that, small jumps in time occur. This can be, in certain edge cases, a severe source of problems.

ntpd/chronyd, in contrast, runs constantly and smoothly stretches the time so that no jumps in time ever happen. Services like that better, and precision is higher.

Migration to ntpd/chronyd would be more than welcome and would in fact be a very professional addition to the Home Assistant server.

@jens-maus
Member

Migration to ntpd/chronyd would be more than welcome and would in fact be a very professional addition to the Home Assistant server.

I can only fully agree with that observation and would also welcome the introduction of chrony as the central NTP daemon used within HomeAssistantOS.

@nagyrobi

nagyrobi commented Sep 4, 2022

And, as a bonus, it could be used as the internal NTP server for the multitude of IoT devices in the local network, offloading the public NTP infrastructure.

(I do maintain a public NTP server)

@agners
Member Author

agners commented Sep 5, 2022

timesyncd is not too precise. When run periodically (or at boot), it adjusts the system clock if there's a difference. Each time it does that, small jumps in time occur. This can be, in certain edge cases, a severe source of problems.

Afaik, that is not entirely true: It does big jumps if the clock is way off, but not for smaller deltas (quote from the man page "slowly adjust it for smaller deltas"). It seems to track the RTC drift as well:

# timedatectl timesync-status
...
    Precision: 1us (-25)
Root distance: 11.176ms (max: 5s)
       Offset: -304us
        Delay: 18.318ms
       Jitter: 1.219ms
 Packet count: 89
    Frequency: -11.250ppm
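
To make the numbers above easier to reason about, here is a minimal sketch that parses this kind of output and converts the frequency value into drift per day. `parse_status` and `drift_seconds_per_day` are hypothetical helpers, and `SAMPLE` is simply the output quoted above; the exact output format of `timedatectl timesync-status` is assumed to be stable "Key: value" lines.

```python
import re
from typing import Dict

# The status output quoted above, used as sample input.
SAMPLE = """\
    Precision: 1us (-25)
Root distance: 11.176ms (max: 5s)
       Offset: -304us
        Delay: 18.318ms
       Jitter: 1.219ms
 Packet count: 89
    Frequency: -11.250ppm
"""


def parse_status(text: str) -> Dict[str, str]:
    """Split 'Key: value' lines into a dict, ignoring anything else."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def drift_seconds_per_day(frequency: str) -> float:
    """Convert a value like '-11.250ppm' into seconds of drift per day."""
    match = re.fullmatch(r"([+-]?\d+(?:\.\d+)?)ppm", frequency)
    if match is None:
        raise ValueError(f"unexpected frequency value: {frequency!r}")
    # 1 ppm is one microsecond per second; a day has 86,400 seconds.
    return float(match.group(1)) * 1e-6 * 86_400
```

For the sample above, 11.250 ppm works out to a magnitude of roughly 0.97 seconds per day, which is the drift the daemon is compensating for on this particular machine.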

From what I can tell, it uses the very same kernel API as chronyd uses (clock_adjtime). See also: https://github.com/systemd/systemd/blob/a7614fbe85da0e108f85443c06f28c07d065857b/src/timesync/timesyncd-manager.c#L250

According to its man page it seems to implement SNTP only, which seems to be inferior to full NTP. But I wonder, in terms of time syncing, what is really the advantage of using chronyd, and in which exact use-case would that matter?

The thing is, ~half of the installations are on Raspberry Pi, which has no RTC 🙈. For those users, time is lost on power off. We do a full sync at startup, before HA is started. That of course only works if there is Internet connectivity. But it's the best thing we can do. From that point on, systemd-timesyncd should adjust time only slowly.
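
The "big jump at boot, slow adjustment afterwards" behaviour can be sketched as a simple decision on the measured offset. This is only an illustration: `plan_adjustment` is a hypothetical helper, and the 0.4 s cutoff is an assumed value chosen for the example, not something stated in this thread.

```python
# Assumed cutoff for illustration only; not taken from this discussion.
STEP_THRESHOLD_S = 0.4


def plan_adjustment(offset_s: float, threshold_s: float = STEP_THRESHOLD_S) -> str:
    """Return 'step' for large offsets (jump the clock) or 'slew' for small ones."""
    return "step" if abs(offset_s) >= threshold_s else "slew"
```

With the offset of -304us from the status output above, `plan_adjustment(-304e-6)` gives "slew", while a board without an RTC that boots with a clock hours off would get "step" once, at the initial sync.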

And, as a bonus, it could be used as the internal NTP server for the multitude of IoT devices in the local network, offloading the public NTP infrastructure.

I don't think we should make HAOS an NTP server by default. That seems to ask for a lot of trouble, especially since people run it on Raspberry Pis without an RTC.

Sure, it might make sense for a more advanced setup with proper RTC, etc. But it should be an add-on in that case, so people can install and configure it according to their needs.

Furthermore, the systemd-timesyncd D-Bus API has been extended, and it is a very convenient way to communicate with the NTP service from Supervisor. We intend to make the NTP server configurable through these new D-Bus APIs.

Afaict, chronyd has no D-Bus API. It seems to have a custom API via Unix domain socket /var/run/chrony/chronyd.sock, which we could access from Supervisor (it would need an OS change, but switching to chronyd would need that anyway).

All that said, I am not against chronyd. I just don't see the value in putting the effort into switching to it.

@agners agners deleted the delay-systemd-timesyncd-start branch September 5, 2022 08:53