Skip to content

Fix Issue #604 in django-celery-beat: Celery Beat Crashing at the End of Daylight Savings #7901

Open
polarmt wants to merge 15 commits intocelery:mainfrom
polarmt:master
Open

Fix Issue #604 in django-celery-beat: Celery Beat Crashing at the End of Daylight Savings #7901
polarmt wants to merge 15 commits intocelery:mainfrom
polarmt:master

Conversation

@polarmt
Copy link

@polarmt polarmt commented Nov 11, 2022

Dependency

celery/django-celery-beat#605

Note: The two PRs are codependent and need to be merged together.

Description

These changes are necessary to fix celery/django-celery-beat#604 and is a dependency for celery/django-celery-beat#605. Both issues have more detailed information about the problem and how testing was performed for the changes.

@polarmt polarmt changed the title Remove logic to convert one timestamp to be the same as other Fix Issue #604 in django-celery-beat: Celery Beat Crashing at the End of Daylight Savings Nov 11, 2022
@auvipy
Copy link
Member

auvipy commented Nov 15, 2022

can you rebase on top top of master again please?

@polarmt
Copy link
Author

polarmt commented Nov 15, 2022

can you rebase on top top of master again please?

Done rebasing.

@auvipy
Copy link
Member

auvipy commented Nov 17, 2022

I am not sure why builds are failing, can you check please? are the tests passing locally?

@polarmt polarmt force-pushed the master branch 2 times, most recently from 36c6fc3 to 405ddfd Compare November 17, 2022 19:46
@polarmt
Copy link
Author

polarmt commented Nov 17, 2022

I am a little confused as to what the behavior should be for Celery beat. Currently, the only unit test that is failing is from https://github.com/celery/celery/blob/master/t/unit/app/test_schedules.py#L441. This unit test was added to deal with #1604.

This is where the confusion lies. During the end of DST, my understanding is that we do want the tasks to run twice. For example, a Celery task runs at 1:30 PDT and 1:30 PST. From our business perspective, this makes the most sense. However, the problem in #1604 seems to be that they do not want to the tasks to run twice.

What should be the expected behavior? Should we add a configuration parameter to control this? If so, what is the default parameter?

@auvipy auvipy requested a review from a team November 18, 2022 05:11
@auvipy
Copy link
Member

auvipy commented Nov 29, 2022

I will come back to this soon

@auvipy
Copy link
Member

auvipy commented Dec 14, 2022

can you please elaborate why a task should run twice?

@Nusnus Nusnus removed this from the 5.3 milestone Feb 19, 2023
@auvipy
Copy link
Member

auvipy commented Mar 2, 2023

I am a little confused as to what the behavior should be for Celery beat. Currently, the only unit test that is failing is from https://github.com/celery/celery/blob/master/t/unit/app/test_schedules.py#L441. This unit test was added to deal with #1604.

This is where the confusion lies. During the end of DST, my understanding is that we do want the tasks to run twice. For example, a Celery task runs at 1:30 PDT and 1:30 PST. From our business perspective, this makes the most sense. However, the problem in #1604 seems to be that they do not want to the tasks to run twice.

What should be the expected behavior? Should we add a configuration parameter to control this? If so, what is the default parameter?

can you please revisit this? I want to know more of your thoughts

@polarmt
Copy link
Author

polarmt commented Jun 30, 2023

For instance, let us assume that we have a task that we want to run every hour. An example of such important task is sending a timely email every hour at the 30 minute mark. Then, we would want this task to run both at 1:30 AM PDT AND 1:30 AM PST.

In our business, we send such timely emails for many of our features hourly at different minute marks and having one hour of downtime is detrimental.

@auvipy
Copy link
Member

auvipy commented Jul 2, 2023

For instance, let us assume that we have a task that we want to run every hour. An example of such important task is sending a timely email every hour at the 30 minute mark. Then, we would want this task to run both at 1:30 AM PDT AND 1:30 AM PST.

In our business, we send such timely emails for many of our features hourly at different minute marks and having one hour of downtime is detrimental.

I think running a task multiple times in different timezone should be allowed.

@polarmt
Copy link
Author

polarmt commented Jul 3, 2023

Thanks, @auvipy ! In that case, keep us posted on the progress of the review.

@auvipy
Copy link
Member

auvipy commented Jul 6, 2023

can't review it as the builds are failing and there are merge conflicts. we need those two resolved before moving further.

@polarmt
Copy link
Author

polarmt commented Jul 6, 2023

Can you run the builds again?

@auvipy
Copy link
Member

auvipy commented Jul 8, 2023

re started and builds are failing

@auvipy
Copy link
Member

auvipy commented Nov 6, 2023

can you restart this?

@polarmt
Copy link
Author

polarmt commented Nov 7, 2023

Yes. Yesterday, we discovered a potential bug with this change that was exposed by the latest DST transition. I will summarize it and fix it when I have time.

@polarmt
Copy link
Author

polarmt commented Nov 7, 2023

Root Cause of Bug

In the existing logic of the Celery source code, it will convert the start timezone to the current timezone (PDT to PST) and then compute the difference between start and now. If we compare 1:15 PM PST and 1:15 PM PDT, the difference should be one hour apart. However, with the unnecessary conversion, the remaining function would return 0 minutes. To fix this issue, we skipped the conversion of start from PDT to PST before computing the expected end time.

This works for periodic tasks that run every X minutes, but the logic seems to be incorrect if the beat task uses a crontab. time.remaining takes in three main parameters: now, ends_in, and start. The ends_in represents when we expect the task to run next. This value can either be a timedelta object or an ffwd object.

An ffwd object does not represent the time difference between two times. Instead, "adding" an ffwd object will replace the current datetime object with the times specified in the ffwd object. For instance, if we add ffwd(hour=7, minute=0, weekday=6) to 1:15 AM on Saturday, the resulting time would become 7:00 AM on the following Sunday.

If the Saturday in the example was prior to the transition and the Sunday was afterwards, then we expect the start (1:15 AM) to be in PDT and end_date (7:00 AM) to be PST. Because we skip the conversion of start in the new logic, the end_date will be incorrectly in PDT even though we have transitioned to PST. For ffwd, the changes will not work.

Symptoms of Bug

The first time that the Celery beat task will run will be an hour earlier than the scheduled time. Then, the Celery beat task will run again at the expected time. This causes the Celery beat task to run an additional time unnecessarily.

Proposed Fix

We can check if ends_in is an ffwd object or not before deciding whether to convert ends_in or not:

   if str(start.tzinfo) == str(now.tzinfo) and now.utcoffset() != start.utcoffset():
        if now.utcoffset() > start.utcoffset() or isinstance(ends_in, ffwd):
            # DST started
            start = start.replace(tzinfo=now.tzinfo)

@polarmt
Copy link
Author

polarmt commented Nov 8, 2023

I think I fixed the issue. We can see whether the CI passes.

Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following test fails as per CI report. we need to fix it to make everything green again

=================================== FAILURES ===================================
________________________________ test_remaining ________________________________

  def test_remaining():
      # Relative
      remaining(datetime.utcnow(), timedelta(hours=1), relative=True)
  
      """
      The upcoming cases check whether the next run is calculated correctly
      """
      eastern_tz = ZoneInfo("US/Eastern")
      tokyo_tz = ZoneInfo("Asia/Tokyo")
      eastern_tz_pytz = pytz.timezone("US/Eastern")
      tokyo_tz_pytz = pytz.timezone("Asia/Tokyo")
  
      # Case 1: `start` in UTC and `now` in other timezone
      start = datetime.now(ZoneInfo("UTC"))
      now = datetime.now(eastern_tz)
      delta = timedelta(hours=1)
      assert str(start.tzinfo) == str(ZoneInfo("UTC"))
      assert str(now.tzinfo) == str(eastern_tz)
      rem_secs = remaining(start, delta, now).total_seconds()
      # assert remaining time is approximately equal to delta
      assert rem_secs == pytest.approx(delta.total_seconds(), abs=1)
  
      # Case 2: `start` and `now` in different timezones (other than UTC)
      start = datetime.now(eastern_tz)
      now = datetime.now(tokyo_tz)
      delta = timedelta(hours=1)
      assert str(start.tzinfo) == str(eastern_tz)
      assert str(now.tzinfo) == str(tokyo_tz)
      rem_secs = remaining(start, delta, now).total_seconds()
      assert rem_secs == pytest.approx(delta.total_seconds(), abs=1)
  
      """
      Case 3: DST check
      Suppose start (which is last_run_time) is in EST while next_run is in EDT,
      then check whether the `next_run` is actually the time specified in the
      start (i.e. there is not an hour diff due to DST).
      In 2019, DST starts on March 10
      """
      start = datetime(day=9, month=3, year=2019, hour=10, minute=0, tzinfo=eastern_tz)         # EST
      now = datetime(day=11, month=3, year=2019, hour=1, minute=0, tzinfo=eastern_tz)           # EDT
      delta = ffwd(hour=10, year=2019, microsecond=0, minute=0, second=0, day=11, weeks=0, month=3)
      # `next_actual_time` is the next time to run (derived from delta)
      next_actual_time = datetime(
          day=11, month=3, year=2019, hour=10, minute=0, tzinfo=eastern_tz)  # EDT
      assert start.tzname() == "EST"
      assert now.tzname() == "EDT"
      assert next_actual_time.tzname() == "EDT"
      rem_time = remaining(start, delta, now)
      next_run = now + rem_time
      assert next_run == next_actual_time
  
      """
      Case 4: DST check (ZoneInfo, timedelta)
      Suppose start (which is last_run_time) is in PDT while next_run is in PST,
      Check whether there is an hour added to the time between now and the `next_run`
      In 2022, DST ends on Nov 6
      """
      start = datetime(day=6, month=11, year=2022, hour=1, minute=15, tzinfo=eastern_tz, fold=0)
      now = datetime(day=6, month=11, year=2022, hour=1, minute=34, tzinfo=eastern_tz, fold=1)
      ends_in = timedelta(minutes=80)
      next_actual_time = datetime(day=6, month=11, year=2022, hour=1, minute=35, tzinfo=eastern_tz, fold=1)
      assert start.tzname() == "EDT"
      assert now.tzname() == "EST"
      assert next_actual_time.tzname() == "EST"
      rem_time = remaining(start, ends_in, now)
      print(start + ends_in - now)
      next_run = now + rem_time
  assert next_run == next_actual_time

E AssertionError: assert datetime.datetime(2022, 11, 6, 2, 35, tzinfo=ZoneInfo(key='US/Eastern')) == datetime.datetime(2022, 11, 6, 1, 35, tzinfo=ZoneInfo(key='US/Eastern'), fold=1)

t/unit/utils/test_time.py:193: AssertionError

@codecov
Copy link

codecov bot commented Nov 9, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.31%. Comparing base (4e2face) to head (1cfe14d).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7901   +/-   ##
=======================================
  Coverage   78.31%   78.31%           
=======================================
  Files         153      153           
  Lines       19116    19120    +4     
  Branches     2531     2533    +2     
=======================================
+ Hits        14970    14974    +4     
  Misses       3859     3859           
  Partials      287      287           
Flag Coverage Δ
unittests 78.29% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@auvipy
Copy link
Member

auvipy commented Nov 9, 2023

It seems unit tests are now passing. I restarted the failing pypy build as it seems to be a network issue

Copy link
Member

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to check/fix the lint issues as well. thanks for all your efforts here!! I will come back to this next morning to check if it is merge able with 5.3.5 release

@auvipy
Copy link
Member

auvipy commented Nov 13, 2023

@Nusnus can we consider this for a patch release?

@Nusnus
Copy link
Member

Nusnus commented Nov 21, 2023

@auvipy

@Nusnus can we consider this for a patch release?

I haven't fully reviewed the entire PR, but from the original description:

celery/django-celery-beat#605

Note: The two PRs are codependent and need to be merged together.

It looks like the other PR isn't ready, besides this PR having failures in CI.

That being said, if this work can be baked enough, I support merging on, but we'll have it for v5.4 if it will be ready by then.

@auvipy
Copy link
Member

auvipy commented Nov 21, 2023

Yeah it's better off for 5.4

@Nusnus Nusnus modified the milestones: 5.4, 5.5 Jun 25, 2024
@auvipy auvipy removed the request for review from a team February 3, 2025 05:24
@auvipy auvipy modified the milestones: 5.5.0, 5.6.0 Feb 3, 2025
@auvipy auvipy self-requested a review February 3, 2025 06:01
"""
eastern_tz = ZoneInfo("US/Eastern")
tokyo_tz = ZoneInfo("Asia/Tokyo")
eastern_tz_pytz = pytz.timezone("US/Eastern")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of pytz, shouldn't we use ZoneInfo?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@auvipy auvipy modified the milestones: 5.6.0, 5.7.0 May 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Celery Beat Crashing at the End of Daylight Savings

4 participants