Skip to content

Conversation

@kmeaw
Copy link
Contributor

@kmeaw kmeaw commented Mar 30, 2025

When trying to calculate the next firing of 'hourly', we'd lose the tm_isdst value on the next iteration.

On most systems in Europe/Dublin it would cause a 100% cpu hang due to timers restarting.

This happens in Europe/Dublin because Ireland defines the Irish Standard Time as UTC+1, so winter time is encoded in tzdata as negative 1 hour of daylight saving.

Before this patch:

$ env TZ=Europe/Dublin systemd-analyze calendar --base-time='Sat 2025-03-29 22:00:00 UTC' --iterations=5 'hourly'
  Original form: hourly
Normalized form: *-*-* *:00:00
    Next elapse: Sat 2025-03-29 23:00:00 GMT
       (in UTC): Sat 2025-03-29 23:00:00 UTC
       From now: 13h ago
   Iteration #2: Sun 2025-03-30 00:00:00 GMT
       (in UTC): Sun 2025-03-30 00:00:00 UTC
       From now: 12h ago
   Iteration #3: Sun 2025-03-30 00:00:00 GMT  <-- note every next iteration having the same firing time
       (in UTC): Sun 2025-03-30 00:00:00 UTC
       From now: 12h ago
...

With this patch:

$ env TZ=Europe/Dublin systemd-analyze calendar --base-time='Sat 2025-03-29 22:00:00 UTC' --iterations=5 'hourly'
  Original form: hourly
Normalized form: *-*-* *:00:00
    Next elapse: Sat 2025-03-29 23:00:00 GMT
       (in UTC): Sat 2025-03-29 23:00:00 UTC
       From now: 13h ago
   Iteration #2: Sun 2025-03-30 00:00:00 GMT
       (in UTC): Sun 2025-03-30 00:00:00 UTC
       From now: 12h ago
   Iteration #3: Sun 2025-03-30 02:00:00 IST  <-- the expected 1 hour jump
       (in UTC): Sun 2025-03-30 01:00:00 UTC
       From now: 11h ago
...

Fixes #32039.

@github-actions github-actions bot added util-lib tests please-review PR is ready for (re-)review by a maintainer labels Mar 30, 2025
@bluca bluca added ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR and removed please-review PR is ready for (re-)review by a maintainer labels Mar 30, 2025
@github-actions github-actions bot added please-review PR is ready for (re-)review by a maintainer and removed ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR labels Mar 31, 2025
@poettering poettering added reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks and removed please-review PR is ready for (re-)review by a maintainer labels Mar 31, 2025
@github-actions github-actions bot added please-review PR is ready for (re-)review by a maintainer and removed reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks labels Mar 31, 2025
(void) mktime_or_timegm_usec(&c, spec->utc, /* ret= */ NULL);
c.tm_isdst = spec->dst;
if (dst != -2)
c.tm_isdst = dst;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm we should unset the flag here again, no? i.e. invalidate_dst = false; here, to make sure we only revalidate the thing once after each check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't think I understand your idea - if we turn dst into a bool then we have two sources of tm_isdst - spec->dst and c.tm_isdst (which is set by mktime on each iteration before being overwritten). The invalidate_dst flag would determine which source would we use - first spec->dst is used but if we are stuck, we switch to mktime's tm_isdst.
Once the flag is flipped, we cannot update the spec's dst as the caller expects us to get a const pointer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure i follow? you only need the -2 thing locally, so you can replace that with a local variable no? that simply controls which source to use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "make sure we only revalidate the thing once after each check" confused me.
I have replaced -2 with a local variable as you suggested - if true then keep the value of c.tm_isdst that was stored by mktime, otherwise overwrite (as the function did before) it with the value of spec->dst.

@poettering poettering added reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks and removed please-review PR is ready for (re-)review by a maintainer labels Mar 31, 2025
@github-actions github-actions bot added please-review PR is ready for (re-)review by a maintainer and removed reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks labels Mar 31, 2025

r = tm_compare(tm, &c);
if (r == 0)
r = CMP(*usec - 1, (usec_t) tm_usec);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, can't this underflow? i.e. if *usec is zero?

I think it would make more sense to add 1 to tm_usec instead, since we know that that's not going to be UINT64_MAX since we treat that as USEC_INFINITY, which we don't allow.

(maybe add an assert() here checking for this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for catching that, adding 1 is sure a safer way. tm_usec isn't expected to go over 1e6, so (1e6)+1 won't overflow.
I don't feel comfortable doing an assert() in pid1. Do you think it would still be a good idea?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asserts to enforce against programming errors are common and used all over the place so it's ok

@poettering
Copy link
Member

looks good, just one more tweak

@poettering poettering added reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks and removed please-review PR is ready for (re-)review by a maintainer labels Apr 3, 2025
@github-actions github-actions bot added please-review PR is ready for (re-)review by a maintainer and removed reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks labels Apr 4, 2025
@bluca bluca removed the please-review PR is ready for (re-)review by a maintainer label Apr 9, 2025
When trying to calculate the next firing of 'hourly', we'd lose the
tm_isdst value on the next iteration.

On most systems in Europe/Dublin it would cause a 100% cpu hang due to
timers restarting.

This happens in Europe/Dublin because Ireland defines the Irish Standard Time
as UTC+1, so winter time is encoded in tzdata as negative 1 hour of daylight
saving.

Before this patch:
$ env TZ=IST-1GMT-0,M10.5.0/1,M3.5.0/1 systemd-analyze calendar --base-time='Sat 2025-03-29 22:00:00 UTC' --iterations=5 'hourly'
  Original form: hourly
Normalized form: *-*-* *:00:00
    Next elapse: Sat 2025-03-29 23:00:00 GMT
       (in UTC): Sat 2025-03-29 23:00:00 UTC
       From now: 13h ago
   Iteration #2: Sun 2025-03-30 00:00:00 GMT
       (in UTC): Sun 2025-03-30 00:00:00 UTC
       From now: 12h ago
   Iteration #3: Sun 2025-03-30 00:00:00 GMT  <-- note every next iteration having the same firing time
       (in UTC): Sun 2025-03-30 00:00:00 UTC
       From now: 12h ago
...

With this patch:
$ env TZ=IST-1GMT-0,M10.5.0/1,M3.5.0/1 systemd-analyze calendar --base-time='Sat 2025-03-29 22:00:00 UTC' --iterations=5 'hourly'
  Original form: hourly
Normalized form: *-*-* *:00:00
    Next elapse: Sat 2025-03-29 23:00:00 GMT
       (in UTC): Sat 2025-03-29 23:00:00 UTC
       From now: 13h ago
   Iteration #2: Sun 2025-03-30 00:00:00 GMT
       (in UTC): Sun 2025-03-30 00:00:00 UTC
       From now: 12h ago
   Iteration #3: Sun 2025-03-30 02:00:00 IST  <-- the expected 1 hour jump
       (in UTC): Sun 2025-03-30 01:00:00 UTC
       From now: 11h ago
...

This bug isn't reproduced on Debian and Ubuntu because they mitigate it by
using the rearguard version of tzdata. ArchLinux and NixOS don't, so it would
cause pid1 to spin during DST transition.

This is how the affected tzdata looks like:
$ zdump -V -c 2024,2025 Europe/Dublin
Europe/Dublin  Sun Mar 31 00:59:59 2024 UT = Sun Mar 31 00:59:59 2024 GMT isdst=1 gmtoff=0
Europe/Dublin  Sun Mar 31 01:00:00 2024 UT = Sun Mar 31 02:00:00 2024 IST isdst=0 gmtoff=3600
Europe/Dublin  Sun Oct 27 00:59:59 2024 UT = Sun Oct 27 01:59:59 2024 IST isdst=0 gmtoff=3600
Europe/Dublin  Sun Oct 27 01:00:00 2024 UT = Sun Oct 27 01:00:00 2024 GMT isdst=1 gmtoff=0

Compare it to Europe/London:
$ zdump -V -c 2024,2025 Europe/London
Europe/London  Sun Mar 31 00:59:59 2024 UT = Sun Mar 31 00:59:59 2024 GMT isdst=0 gmtoff=0
Europe/London  Sun Mar 31 01:00:00 2024 UT = Sun Mar 31 02:00:00 2024 BST isdst=1 gmtoff=3600
Europe/London  Sun Oct 27 00:59:59 2024 UT = Sun Oct 27 01:59:59 2024 BST isdst=1 gmtoff=3600
Europe/London  Sun Oct 27 01:00:00 2024 UT = Sun Oct 27 01:00:00 2024 GMT isdst=0 gmtoff=0

Fixes #32039.
@bluca bluca added good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed and removed good-to-merge/with-minor-suggestions labels Apr 9, 2025
Copy link
Member

@yuwata yuwata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@yuwata yuwata merged commit e4bb033 into systemd:main Apr 9, 2025
37 of 45 checks passed
@github-actions github-actions bot removed the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Apr 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Timers started looping service at 2024-03-31 00:00:00 UTC

4 participants