
@JimB123
Contributor

@JimB123 JimB123 commented Aug 10, 2020

Background

Redis needs an internal API for using a monotonic clock. Currently, most of Redis uses gettimeofday, which is a problem because the time of day can shift when the system time changes, whether manually or via NTP. There is actually code in ae.c which checks whether the clock has moved backwards. Even so, most of Redis uses gettimeofday for timing and simply assumes that it behaves like a monotonic clock.

This update provides the following

  1. An interface for retrieving a monotonic clock. getMonotonicUs returns a uint64_t (aka monotime) with the number of microseconds from an arbitrary point. No more messing with tv_sec/tv_usec. Simple routines are provided for measuring elapsed milliseconds or elapsed microseconds (the most common use case for a monotonic timer). No worries about time moving backwards. (A brief usage sketch follows this list.)
  2. High-speed assembler implementation for x86 and ARM. The standard method for retrieving the monotonic clock is POSIX.1b (1993): clock_gettime(CLOCK_MONOTONIC, timespec*). However, most modern processors provide a constant speed instruction clock which can be retrieved in a fraction of the time that it takes to call clock_gettime. For x86, this is provided by the RDTSC instruction. For ARM, this is provided by the CNTVCT_EL0 instruction. As a compile-time option, these high-speed timers can be chosen. (Default is POSIX clock_gettime.)
  3. Refactor of event loop timers. The timer processing in ae.c has been refactored to use the new monotonic clock interface. This results in simpler/cleaner logic and improved performance.
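
For illustration, a minimal usage sketch (assuming the monotonic.h header and monotonicInit() entry point added by this PR; not a verbatim excerpt from the code):

```c
#include <stdint.h>
#include <stdio.h>
#include "monotonic.h"   /* assumed: the header introduced by this PR */

int main(void) {
    monotonicInit();                     /* assumed init call that selects the clock source */
    monotime start = getMonotonicUs();   /* microseconds from an arbitrary starting point */

    /* ... do some timed work here ... */

    uint64_t elapsed_us = getMonotonicUs() - start;   /* never negative: time can't go backwards */
    printf("elapsed: %llu us\n", (unsigned long long)elapsed_us);
    return 0;
}
```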

Performance implications

  1. When using the processor's instruction clock, timer processing in the event loop is dramatically improved. In a test program, overall timer event processing has been measured to run at nearly 6x performance. (Test details below.)
  2. When using the POSIX clock_gettime, timer processing is improved about 18% due to streamlined logic in ae.c.

Test details

Simple test program creating an event loop and 3 timers - each firing with 0ms delay. Linux host, with processor: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz. The events each incremented a counter. A beforeSleep proc checked elapsed time (using the monotonic clock) until 5 seconds had passed. Compared average count of timers fired.
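
For reference, a harness along these lines could look like the sketch below. This is a reconstruction rather than the actual test program; it assumes the standard ae.h event-loop API and the new monotonic helpers:

```c
#include <stdio.h>
#include <stdint.h>
#include "ae.h"
#include "monotonic.h"

static long long fired = 0;
static monotime test_start;

/* Timer callback: count the firing and re-arm immediately (0 ms delay). */
static int timerProc(struct aeEventLoop *el, long long id, void *data) {
    (void)el; (void)id; (void)data;
    fired++;
    return 0;                              /* fire again with 0 ms delay */
}

/* Stop the loop once 5 seconds (measured monotonically) have passed. */
static void beforeSleep(struct aeEventLoop *el) {
    if (getMonotonicUs() - test_start >= 5 * 1000000ULL) aeStop(el);
}

int main(void) {
    monotonicInit();
    aeEventLoop *loop = aeCreateEventLoop(64);
    aeSetBeforeSleepProc(loop, beforeSleep);
    for (int i = 0; i < 3; i++) aeCreateTimeEvent(loop, 0, timerProc, NULL, NULL);
    test_start = getMonotonicUs();
    aeMain(loop);
    printf("timer events fired in 5s: %lld\n", fired);
    return 0;
}
```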

Note: on my test machine, clock_gettime is measured to run about 5% slower than gettimeofday. However, it is expected that this varies from system to system and that the two are nominally similar in performance.

Future updates

Introduction of this API provides the opportunity to begin replacing ustime and mstime calls throughout the code. With the large performance advantage of the processor clock, this will benefit any feature which relies on time-based code.

For those unfamiliar with the x86 TSC

The x86 maintains a counter which is based on a constant clock rate at the nominal clocking of the processor. Even though individual cores are running at various and changing clock rates, they all share the same constant-rate clock. On multi-CPU systems, all CPUs are sync'd to a common clock source.

Though you can still find articles about older CPUs that did not support a constant-rate TSC, all reasonably modern Intel and AMD processors support a constant-rate TSC. On Linux, you can check for the constant_tsc flag in /proc/cpuinfo.

Note also that, though the TSC runs at a constant rate tied to the nominal speed of the processor, the processor itself doesn't actually know its speed. The OS reports the speed in /proc/cpuinfo (Linux); on Windows, the QueryPerformanceFrequency function may be used. The ticks-per-second value is needed to convert the TSC to common units of time (milliseconds/microseconds).
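
As a rough illustration of that conversion (not Redis code; __rdtsc() is the GCC/Clang intrinsic from x86intrin.h, and the ticks-per-microsecond value shown is just an example that would normally be discovered at startup, e.g. from /proc/cpuinfo):

```c
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

/* Example only: a nominal 2.5 GHz clock gives 2500 TSC ticks per microsecond. */
static double ticks_per_usec = 2500.0;

/* Convert the raw cycle counter into microseconds from an arbitrary point. */
static uint64_t tsc_monotonic_us(void) {
    return (uint64_t)(__rdtsc() / ticks_per_usec);
}

int main(void) {
    uint64_t start = tsc_monotonic_us();
    /* ... work to be timed ... */
    uint64_t elapsed = tsc_monotonic_us() - start;
    printf("elapsed: %llu us\n", (unsigned long long)elapsed);
    return 0;
}
```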

The TSC is not guaranteed to be wall-clock precise - it may suffer from clock-drift and is not corrected by NTP. It's fine for timing short intervals, but (for example) system uptime should be measured with a different mechanism (like gettimeofday).

@itamarhaber
Member

@JimB123 wow - looks extremely promising.

@guybe7 following up on #6861, WDUT?

@guybe7
Collaborator

guybe7 commented Aug 11, 2020

looks good indeed.
@JimB123 please CMIIW - usage of getMonotonicUs helps us only in case we need to measure "deltas" from some point. it does not return a number from which we can deduce the wall-clock time, right? if so, it doesn't really help us with #6861 where we need the actual timestamp (it can be retrieved by XINFO)

TBH what we really need in order to reduce calls to mstime is some server-struct uint64 that represents the "timestamp for the current command" - it can be calculated the first time a command calls mstime and stored in the server struct. we zero it when a command is done.
that way, even if a command calls mstime multiple times, gettimeofday is actually called just once.
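
A rough sketch of that caching idea (the struct, field, and function names here are hypothetical, not existing Redis identifiers):

```c
#include <stdint.h>
#include <sys/time.h>

struct server_state {
    uint64_t cmd_start_ms;   /* 0 means "not sampled yet for this command" */
};

static struct server_state server;

/* Return the cached wall-clock timestamp for the current command,
 * sampling gettimeofday at most once per command. */
static uint64_t cached_command_mstime(void) {
    if (server.cmd_start_ms == 0) {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        server.cmd_start_ms = (uint64_t)tv.tv_sec * 1000 + tv.tv_usec / 1000;
    }
    return server.cmd_start_ms;
}

/* Clear the cache when the command finishes. */
static void command_done(void) {
    server.cmd_start_ms = 0;
}
```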

@JimB123
Contributor Author

JimB123 commented Aug 11, 2020

@guybe7 You are correct. The monotonic clock should never be used in an attempt to "deduce the wall-clock time". It is only useful for determining elapsed time (aka deltas) - which is a common need.

Re: reduction of calls to mstime... this might be worthwhile if many use cases can be satisfied by (for example) the time at the start of command processing. Without study, it's unclear to me if the monotonic time should be cached OR wall-clock time OR both. In existing code, mstime is used generically as both a wall clock time and also for the purpose of computing elapsed time.

@guybe7
Collaborator

guybe7 commented Aug 11, 2020

@JimB123 thanks
@oranagra @yossigo please have a look as well

@madolson
Contributor

What a great idea Jim! Disclosure: this is an AWS contribution, so we already run a similar variant internally and have for a while. Although I think this is specifically beneficial here, I also agree that it is probably useful to replace some of the other usages of mstime/gettimeofday within Redis when precise time isn't necessary and we can get a little performance boost. I am good with the current implementation as is, but this is a change worthy of consensus.

The specific concerns I had that would be great to validate by others:

  1. General support of clock_gettime() since gettimeofday is being removed. clock_gettime was poorly supported on MacOS, which I think is why salvatore never used it earlier. It has now been widely available on MacOS for about 4 years, but someone running a very old version of MacOS might find that this fails to build. We might also run into other systems that don't support it, but I'm inclined to say we can patch that when it comes.
  2. The ifdef causing indentation (I know, whitespace). The IDE I use, VSCode, can be configured to hide ifdef'd out code, so to me it looks like all this code is incorrectly indented. Generally #directives don't interfere with indentation anywhere else in Redis, and I agree with that, but I also see there is a reason for readability if you are just viewing this in a plain text editor.
  3. Monotonic and ae as standalone libraries. I know some people use redis as a source of clean code, so I thought it would be nice to not require an extra dependency on monotonic if someone didn't want to use it. It might also be nice as a header-only library.

Member

@oranagra oranagra left a comment


@JimB123 thanks a lot for this contribution.
I agree with everyone that it's an important step forward, and we can merge it as is and later worry about using it in more places.

I also agree with Guy and others that in some (many) places we should just cache the mstime/ustime of the current command rather than sample it repeatedly, but that's certainly another topic.

Regarding styling:
I personally think ifdefs should not induce code indentation. Instead i like the #else and #endif to have a comment indicating to which #if they belong.
I usually prefer statements that change execution flow (goto, return, continue, break) to be on a line of their own (not with the if), except in repeated error-handling code where combining them would add too many lines and hide the actual logic.
But these are not a "one size fits all" (specifically since this is a new file), it's up to you to decide if you wanna change it.
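
For example, the commented #else/#endif style mentioned above would look roughly like this (macro name used only for illustration):

```c
#if defined(USE_PROCESSOR_CLOCK)
/* TSC / CNTVCT based implementation ... */
#else /* USE_PROCESSOR_CLOCK */
/* clock_gettime(CLOCK_MONOTONIC) based implementation ... */
#endif /* USE_PROCESSOR_CLOCK */
```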

I saw the PR description says the default is to use clock_gettime, but correct me if i'm wrong, you mean that it's the "fallback", right? i.e. by default it will use the faster HW specific methods and only fallback to POSIX if they're missing at compile time.

@JimB123
Contributor Author

JimB123 commented Aug 12, 2020

I saw the PR description says the default is to use clock_gettime, but correct me if i'm wrong, you mean that it's the "fallback", right? i.e. by default it will use the faster HW specific methods and only fallback to POSIX if they're missing at compile time.

@oranagra Originally, I coded this to use clock_gettime as a fallback. However, after talking with Maddy, we thought that the processor clock would be better as an "opt-in" functionality. In order to use it, you need to explicitly define "USE_PROCESSOR_CLOCK" (which is not defined). So, as published, clock_gettime is a default, not a fallback.
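
Roughly speaking, the compile-time selection behaves like the following sketch (illustrative only, not the actual monotonic.c source):

```c
/* Unless the builder explicitly defines USE_PROCESSOR_CLOCK, the POSIX
 * clock_gettime(CLOCK_MONOTONIC) path is selected as the default. */
#if defined(USE_PROCESSOR_CLOCK) && defined(__x86_64__)
    #define MONOTONIC_IMPL_TSC 1       /* RDTSC based */
#elif defined(USE_PROCESSOR_CLOCK) && defined(__aarch64__)
    #define MONOTONIC_IMPL_CNTVCT 1    /* CNTVCT_EL0 based */
#else
    #define MONOTONIC_IMPL_POSIX 1     /* default: clock_gettime(CLOCK_MONOTONIC) */
#endif
```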

@oranagra
Member

@oranagra Originally, I coded this to use clock_gettime as a fallback. However, after talking with Maddy, we thought that the processor clock would be better as an "opt-in" functionality. In order to use it, you need to explicitly define "USE_PROCESSOR_CLOCK" (which is not defined). So, as published, clock_gettime is a default, not a fallback.

I fail to see what the risks are of enabling it by default (@madolson can you elaborate?). many will not be aware of this flag and will just remain with a less efficient Redis.
I guess there are two (quite slim) risks:

  1. Either the assembly / intrinsic code will fail to compile, in which case we'll learn about it and add some additional ifdef.
  2. Or the parsing of cpuinfo will fail at runtime, in which case we need to add a fallback mechanism that will revert to use clock_gettime at runtime.

@yossigo what do you think?

The second bullet brings me back to the assertions. again, assert is more suitable for checking a condition that should logically never happen. in this case it checks external conditions (parsing a file); on failure it should print a message to stderr and call exit(1) (not abort(), and certainly not segfault).
But if we now consider the runtime fallback (let's wait for a decision), then these should obviously not even exit; we can refactor the code so that this function returns NULL, but it is wrapped inside another function that calls the clock_gettime init and sets up some flag so that it's used at runtime.

@yossigo
Collaborator

yossigo commented Aug 13, 2020

@JimB123 This is a great contribution, thanks!
@oranagra I don't know enough to judge the risks of using it by default. But I do agree, if USE_PROCESSOR_CLOCK is defined but fails at runtime we can just fall back and emit a warning to the log. This is a common practice in other cases, like kernel configurations that are not optimal and may have a negative impact on Redis.

@oranagra
Member

@yossigo so just to be sure, you support making it enabled in a default build, and having it simply print an error and exit if there is a problem (i.e. no fallback at runtime is needed).
right?

@yossigo
Collaborator

yossigo commented Aug 13, 2020

@oranagra no.

I don't have a strong opinion about being built by default, I know it can have undesired side effects in some cases (i.e. depending on CPU / firmware / hypervisor in use, in case a CPU is hotswapped, etc.).

But if we do compile support for it, I don't see a reason to exit if we fail to initialize. We can just produce a warning and fall back to clock_gettime().

@oranagra
Member

ohh, missed one word in your previous post.
So you're not worried that enabling it on default build would fail to build or fail to execute (since we have a runtime fallback), you're worried that it'll run and be buggy.

Personally i don't really like an important optimization to be opt-in. i think there's a high chance that many will not enjoy it.
Maybe we can:

  • Gather some info on the subject somehow?
  • Add some runtime sanity check at startup with a warning and a fallback?
  • Or conclude that even when dysfunctional, the implications are not very severe (no crashes or data loss), and users will be able to disable it and report back to us?

@JimB123
Contributor Author

JimB123 commented Aug 13, 2020

The second bullet brings me back to the assertions. again assert is more suitable to check a condition that should logically never happen. in this case it checks external conditions (parsing a file),

@oranagra I submit that this is acceptable use of assert. These conditions should never happen unless there is a fundamental error in the code or compilation options. I agree that this is an "external condition" in that it's checking the expected functionality of the operating system (at startup). But, this is not "parsing a file" as there is no "file". Linux "files" under /proc/ are core functionality of the OS and while they are represented as part of the filesystem, they are not files.

on failure it should print a message to stderr and call exit(1), (not abort() and certainly not segfault).

I don't see much difference between assert which (according to documentation) "prints an error message to standard error and terminates the program by calling abort(3)" vs. explicitly printing an error message to standard error and terminating the program by calling exit(3).

While I think use of assert in this situation is appropriate, simple, readable, & maintainable, I can create a local function that exits with a message to stderr if that's what's needed to pass review.

@JimB123
Contributor Author

JimB123 commented Aug 13, 2020

But if we do compile support for it, I don't see a reason to exit if we fail to initialize. We can just produce a warning and fall back to clock_gettime().

Note that supporting run-time fallback would probably offset the value of providing inline implementations of the functions. With run-time fallback, I think I would choose to provide only a function pointer in the .h and initialize it in the .c file.
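
A hedged sketch of what that function-pointer approach could look like (illustrative shape only, not the PR's code):

```c
#include <stdint.h>
#include <time.h>

typedef uint64_t monotime;

/* In the header: only a function pointer would be exposed. */
extern monotime (*getMonotonicUs)(void);

/* In the .c file: the POSIX implementation used as the runtime fallback. */
static monotime getMonotonicUs_posix(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000 + (uint64_t)ts.tv_nsec / 1000;
}

monotime (*getMonotonicUs)(void) = NULL;

void monotonicInit(void) {
    /* Try to initialize the hardware clock here (omitted); if that fails,
     * print a warning and fall back to clock_gettime at runtime. */
    getMonotonicUs = getMonotonicUs_posix;
}
```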

@madolson
Contributor

@oranagra My main concern was to make sure Redis kept building out of the box and didn't exhibit weird clock behavior. There seem to be a lot of weird compatibility issues around which timers are supported on different OS's, though it looks like it's gotten much more consistent in the last couple of years. If someone is running an old enough x86 processor, it might not actually work correctly since the TSC isn't synced between cores. So I thought letting people slowly test it out with an opt-in flag for a while might work, and then at some point we build it by default.

Looking back at the internal review, I was also concerned it might not initialize correctly. So the alternative approach is to make sure at startup that everything works as expected and otherwise fall back to clock_gettime(). I was also okay with that approach, but it's slightly more complex.

@JimB123
Contributor Author

JimB123 commented Aug 13, 2020

Thank you all for the positive reception of this feature. It's nice to see that there are no issues/concerns with the functional interface or the changes to ae.

@madolson, as primary reviewer for this, please reach consensus (or make an executive decision) regarding the following open questions:

  1. Should the CPU instruction clock be enabled by default?
  • YES: the clock source will be chosen automatically based on existing system #defines
  • NO: the builder will need to explicitly opt in by defining USE_PROCESSOR_CLOCK at build time
  • (I lean towards YES, but am comfortable with NO)
  2. If the selected clock cannot be initialized properly, should a fallback be performed or should the system abort initialization? NOTE: a failure to initialize indicates a coding error (probably a coding error related to the use of the #defines on a given system).
  • ABORT: Redis generates an error message and terminates at startup
  • FALL-BACK: Redis prints (to stderr) an error message, and then chooses clock_gettime. Note that this adds 2 complications.
    • It's possible (though unlikely) that the POSIX.1b (1993) function clock_gettime might not exist or that the specific clock (CLOCK_MONOTONIC) might not exist. If we fall back to clock_gettime(CLOCK_MONOTONIC) and that fails to initialize, I strongly suggest that aborting is better than continuing to fall back to a non-monotonic source (like gettimeofday).
    • By adding fallback, it makes the implementation of getMonotonicUs more complex and I would recommend NOT inlining it in this case. Instead, a function-pointer approach would be cleaner. This would slow down assembler implementations of getMonotonicUs as the function itself is just a few assembler instructions. However it would still be orders of magnitude faster than calling clock_gettime.
  • (I prefer ABORT, but would be comfortable with FALL-BACK as long as we abort if clock_gettime(CLOCK_MONOTONIC) fails.)
  3. In cases where the code aborts during initialization, is assert acceptable?
  • YES: Code is simple and clean. A message is printed to stderr and abort(3) is called.
  • NO: A printf would be performed to stderr and exit(3) would be called. Note that this functionality is below the level of Redis logging so serverLog, serverAssert, etc. should not be considered.
  • (I prefer YES. I would be OK with NO, as long as it's just printf/exit and nothing fancier.)

@madolson
Contributor

Hello, I guess I'll post the update:

  1. No, it should not be enabled default. The current code is fine.
  2. Yes, it should fall back if it's not able to initialize, when possible. If initialization of clock_gettime fails, exiting is still fine.
  3. I don't think there was really a preference here and that wasn't really a sticking point in our conversation.

@oranagra
Member

oranagra commented Aug 18, 2020

we also need to document it somewhere (maybe in the README), specifying how to enable it and what the risks are (possibly linking to some external info).
maybe this is a good link: http://oliveryang.net/2015/09/pitfalls-of-TSC-usage/

@oranagra added the release-notes (indication that this issue needs to be mentioned in the release notes), state-doc-needed, state:major-decision (requires core team consensus) and state:needs-code-doc (the PR requires documentation inside the redis repository) labels, and removed the state-doc-needed label on Aug 18, 2020
oranagra added a commit that referenced this pull request Apr 13, 2021
The code used to decide on the next time to wake up for a timer with
microsecond accuracy, but when deciding to go to sleep it used
millisecond accuracy (with truncation). This means that it would wake
up too early, see that there's no timer to process, and go to sleep
again for 0ms, over and over, until the right microsecond arrived.

i.e. a timer for 100ms, would sleep for 99ms, but then do a busy loop
through the kernel in the last millisecond, triggering many calls to
beforeSleep.

The fix is to change all the logic in ae.c to work with microseconds,
which is good since most of the ae backends support micro (or even nano)
seconds. However, the epoll backend doesn't support microseconds, so to
avoid this problem it needs to round upwards rather than truncate.

Issue created by the monotonic timer PR #7644 (redis 6.2)
Before that, all the timers in ae.c were in milliseconds (using
mstime), so when it requested the backend to sleep till the next timer
event, it would have worked ok.
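
The rounding described above amounts to something like this sketch (illustrative, not the actual ae_epoll.c change):

```c
#include <stdint.h>

/* epoll_wait() takes a timeout in milliseconds, so a microsecond deadline must
 * be rounded UP; truncating would wake the loop early and make it busy-spin
 * through the kernel until the exact microsecond arrives. */
static int us_to_epoll_timeout_ms(int64_t us) {
    if (us < 0) return -1;                 /* no timer: block indefinitely */
    return (int)((us + 999) / 1000);       /* round up, never truncate */
}
```
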
@jhmilan

jhmilan commented Aug 5, 2022

Sorry for asking a maybe stupid question... Does it mean that as of Redis 6.2 Redis is using a monotonic clock? Or is it a config option... Does it depend on the platform?

Thanks in advance

@JimB123
Contributor Author

JimB123 commented Aug 5, 2022

@jhmilan there is an internal API which provides a monotonic clock for some (not all) Redis functions. Eventually most of Redis should use that API. Currently most of Redis uses mstime which provides wall-clock time.

Behind the API is a clock implementation which is determined at compile time. There are options to directly use the X86 TSC, the ARM equivalent, or a POSIX implementation.

Note that, other than slight performance variations across platforms, this has no impact on the Redis client experience.

@jhmilan

jhmilan commented Aug 5, 2022

@jhmilan there is an internal API which provides a monotonic clock for some (not all) Redis functions. Eventually most of Redis should use that API. Currently most of Redis uses mstime which provides wall-clock time.

Behind the API is a clock implementation which is determined at compile time. There are options to directly use the X86 TSC, the ARM equivalent, or a POSIX implementation.

Note that, other than slight performance variations across platforms, this has no impact on the Redis client experience.

Thanks for your quick reply @JimB123
My main concern is about the fact that TTLs could be invalidated due to date shifts (NTP ...) and how that relates to the known issue around Redlock. I would like to understand if, as of some given Redis version, this issue is solved and, thus, the Redlock pitfall is gone.

@oranagra
Member

oranagra commented Aug 7, 2022

@jhmilan the expiration mechanism always uses wall-clock (not monotonic), and i'm not sure it should change.
can you provide a reference to the problem you're referring to?

@jhmilan

jhmilan commented Aug 7, 2022

@oranagra sure. This a very interesting read where Martin Kleppman talks about this topic: https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html

Note that Redis uses gettimeofday, not a monotonic clock, to determine the expiry of keys. The man page for gettimeofday explicitly says that the time it returns is subject to discontinuous jumps in system time – that is, it might suddenly jump forwards by a few minutes, or even jump back in time (e.g. if the clock is stepped by NTP because it differs from a NTP server by too much, or if the clock is manually adjusted by an administrator). Thus, if the system clock is doing weird things, it could easily happen that the expiry of a key in Redis is much faster or much slower than expected.

From this I understand that it is possible that a given key expires before its expected TTL. If that is the case, then, as explained in the article, Redlock could violate the safety property for lock acquisition.

@oranagra
Member

oranagra commented Aug 7, 2022

that's all true (except for the clock jumping backwards, since a proper configuration should use slewing), but the question is if / how this breaks redlock (which i don't know much about).

@jhmilan

jhmilan commented Aug 7, 2022

Say you have 5 Redis nodes for your Redlock implementation. Then 2 of them are down. You acquire a lock for the resource "foo" for 30 seconds (it basically means setting the key "foo" in a majority of nodes with a 30-second TTL).
The operation would succeed since you managed to set the key in at least 3/5 nodes.

Now imagine that the 2 nodes that were down come back to life again, and suppose there is a clock shift in node B which makes the TTL expire. Basically it would mean that 2/5 nodes (a minority) would hold the lock, but someone trying to acquire "foo" would succeed since 3/5 nodes would set that key.

In this situation 2 clients could have acquired the same lock simultaneously. The chances of this happening are quite low I think (especially for short-lived TTLs), but still, it makes Redlock unsafe and not eligible for some scenarios where consistency is crucial.

By using a monotonic clock for the expiration mechanism, that issue would be gone. I guess this is not important when using Redis as a cache layer, where you would just be missing some key that you can easily refresh.

@jhmilan

jhmilan commented Aug 19, 2022

Small thought here... Regardless of the Redlock use case, in my opinion, the TTL mechanism should be reliable, and the fact that it uses the wall clock makes it weak. I never thought about it very thoroughly, but it scopes Redis to just being a simple cache layer and no more than that.

It is a bit disappointing

@madolson
Contributor

Small thought here... Regardless of the Redlock use case, in my opinion, the TTL mechanism should be reliable and the fact that it uses the wall clock makes it weak.

Wall-clocks have been getting more reliable and consistent. Within AWS, you can configure NTP to use AWS time sync which provides a very accurate global clock which correctly slews with changes. I don't think the answer to more precise time measurement is to necessarily use the hardware clocks/monotonic APIs, especially for distributed systems.

@jhmilan

jhmilan commented Aug 19, 2022

@madolson "more reliable" is not enough in my opinion. Your argument is that "you can set up the system in a way that makes this problem less likely" and, ok, but this is not sufficient I would say.

It is not about being against NTP and wall clocks in general, it is just about this specific use case in Redis TTL. I think it should be (100%) reliable. A TTL should never ever expire before its expiration time. It never ever should be affected by something external.

@oranagra
Member

oranagra commented Aug 21, 2022

@jhmilan This is not as simple as it looks, you are looking at it from the perspective of Redlock, but redis has other concerns as well.
First, Redis has both EXPIRE and EXPIREAT commands (the latter taking a Unix time rather than a relative time, but both getting encoded in the same format internally).
In addition to that, we have replicas and AOF / RDB files, and we need to consider the consistency of the dataset that's replicated to other machines, see #8474.

@jhmilan

jhmilan commented Aug 21, 2022

@jhmilan This is not as simple as it looks, you are looking at it from the perspective of Redlock, but redis has other concerns as well.
First, Redis has both EXPIRE and EXPIREAT commands (the latter taking a Unix time rather than a relative time, but both getting encoded in the same format internally).
In addition to that, we have replicas and AOF / RDB files, and we need to consider the consistency of the dataset that's replicated to other machines, see #8474.

Thanks for your response. And please, don't take me wrong.

  1. I'm not saying this is simple. I'm pretty sure it has a lot of corners and implementation details that I don't know.
  2. I stumbled upon this topic from Redlock but now I'm looking at the problem more generically.
  3. I have no clue about AOF/RDB files (just a Redis user)

My only suggestion here is adding some disclaimer in the documentation clearly stating the limitations and risks around TTLs, clocks, etc. (or maybe they are there and I missed them).

As a user of Redis I assumed that TTLs are consistent. It is not something that you think of upfront. So discovering that wall-clock issues can affect Redis TTLs scares you a bit as a user. It would be nice to know exactly what the potential problems are, how you can prevent them, what the implications are, and what you should/should not do, etc...

And particularly, Redlock docs must warn about it very clearly.

I love Redis and I think it is a great solution. My problem has been that I have implemented something that Redis has brought up publicly (Redlock) and after implementing it, I discovered that something in the guts of Redis makes this (official) solution weak.

  • Do you think this TTL/clock issue might be fixed consistently in the future?

Any light you can bring in that sense would be very welcome.

Thank you very much in advance

@oranagra
Member

Hi,
We don't currently have any plans to change or fix this, but maybe if someone suggests a solution we can consider it, or if enough problems are found and we realize we have to find some solution.

Regarding Redlock, it could be that something has changed since it was designed. well, many things have changed, but i can't think of something that broke it.
I agree it should be documented. i invite you to make a PR to https://github.com/redis/redis-doc/blob/master/docs/reference/patterns/distributed-locks.md

Regarding other documentation, about TTL, i'm not certain what's already documented and where it could be added.
@itamarhaber maybe you have some idea?

@jhmilan

jhmilan commented Aug 23, 2022

Hi,
We don't currently have any plans to change or fix this, but maybe if someone suggests a solution we can consider it, or if enough problems are found and we realize we have to find some solution.

Regarding Redlock, it could be that something has changed since it was designed. well, many things have changed, but i can't think of something that broke it.
I agree it should be documented. i invite you to make a PR to https://github.com/redis/redis-doc/blob/master/docs/reference/patterns/distributed-locks.md

I will, for sure.

Thanks for clarifying this to me.

enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 25, 2025
Update adds a general source for retrieving a monotonic time.
In addition, AE has been updated to utilize the new monotonic
clock for timer processing.

This performance improvement is **not** enabled in a default build due to various H/W compatibility
concerns, see README.md for details. It does, however, replace the default use of gettimeofday with
clock_gettime and somewhat improves performance.
