Skip to content

Conversation

@oranagra
Copy link
Member

@oranagra oranagra commented Apr 12, 2021

The code used to decide on the next time to wake on a timer with
microsecond accuracy, but when deciding to go to sleep it used
milliseconds accuracy (with truncation), this means that it would wake
up too early, see that there's no timer to process, and go to sleep
again for 0ms again and again until the right microsecond arrived.

i.e. a timer for 100ms, would sleep for 99ms, but then do a busy loop
through the kernel in the last millisecond, triggering many calls to
beforeSleep.

The fix is to change all the logic in ae.c to work with microseconds,
which is good since most of the ae backends support micro (or even nano)
seconds. however the epoll backend, doesn't support micro, so to avoid
this problem it needs to round upwards, rather than truncate.

Issue created by the monotonic timer PR #7644 (redis 6.2)
Before that, all the timers in ae.c were in milliseconds (using
mstime), so when it requested the backend to sleep till the next timer
event, it would have worked ok.

The code used to decide on the next time to wake on a timer with
microsecond accuracy, but when deciding to go to sleep it used
milliseconds accuracy (with truncation), this means that it would wake
up too early, see that there's no timer to process, and go to sleep
again for 0ms again and again until the right microsecond arrived.

i.e. a timer for 100ms, would sleep for 99ms, but then do a busy loop
though the kernel in the last millisecond.

The fix is to change all the logic in ae.c to work with microseconds,
which is good since most of the ae backends support micro (or even nano)
seconds. however the epoll backend, doesn't support micro, so to avoid
this problem it needs to round upwards, rather than truncate.

Issue created by the monotonic timer PR #7644 (redis 6.2)
Before that, all the timers in ae.c were in milliseconds (using
mstime), so when it requested the backend to sleep till the next timer
event, it would have worked ok.
@oranagra oranagra added the release-notes indication that this issue needs to be mentioned in the release notes label Apr 12, 2021
@oranagra oranagra requested review from JimB123 and yossigo April 12, 2021 19:59
@oranagra oranagra changed the title Fix busy stime loop in ae.c Fix busy loop in ae.c when timer event is about to fire Apr 12, 2021
Copy link
Contributor

@JimB123 JimB123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@oranagra oranagra merged commit 175a9e3 into redis:unstable Apr 13, 2021
@oranagra oranagra mentioned this pull request Apr 19, 2021
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 8, 2021
The code used to decide on the next time to wake on a timer with
microsecond accuracy, but when deciding to go to sleep it used
milliseconds accuracy (with truncation), this means that it would wake
up too early, see that there's no timer to process, and go to sleep
again for 0ms again and again until the right microsecond arrived.

i.e. a timer for 100ms, would sleep for 99ms, but then do a busy loop
through the kernel in the last millisecond, triggering many calls to
beforeSleep.

The fix is to change all the logic in ae.c to work with microseconds,
which is good since most of the ae backends support micro (or even nano)
seconds. however the epoll backend, doesn't support micro, so to avoid
this problem it needs to round upwards, rather than truncate.

Issue created by the monotonic timer PR redis#7644 (redis 6.2)
Before that, all the timers in ae.c were in milliseconds (using
mstime), so when it requested the backend to sleep till the next timer
event, it would have worked ok.
panjf2000 added a commit to panjf2000/redis that referenced this pull request Jan 29, 2024
`epoll_pwait2()` was introduced in Linux kernel 5.11, it extends
`epoll_wait()`/`epoll_pwait()` by allowing the `timeout` to be specified with a
higher resolution, as a `timespec` type, providing nanosecond precision.

By using `epoll_pwait2()`, we can resolve the problem in redis#8764 faultlessly,
once and for all. Furthermore, it would also offer the possibility for us to
handle timers in a higher resolution in the future: from microseconds to
nanoseconds (if necessary).
@oranagra oranagra deleted the ae_timer_busyloop branch February 7, 2024 14:23
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 25, 2025
The code used to decide on the next time to wake on a timer with
microsecond accuracy, but when deciding to go to sleep it used
milliseconds accuracy (with truncation), this means that it would wake
up too early, see that there's no timer to process, and go to sleep
again for 0ms again and again until the right microsecond arrived.

i.e. a timer for 100ms, would sleep for 99ms, but then do a busy loop
through the kernel in the last millisecond, triggering many calls to
beforeSleep.

The fix is to change all the logic in ae.c to work with microseconds,
which is good since most of the ae backends support micro (or even nano)
seconds. however the epoll backend, doesn't support micro, so to avoid
this problem it needs to round upwards, rather than truncate.

Issue created by the monotonic timer PR redis#7644 (redis 6.2)
Before that, all the timers in ae.c were in milliseconds (using
mstime), so when it requested the backend to sleep till the next timer
event, it would have worked ok.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants