Incremental eviction processing #7653
Conversation
Rebase to latest OSS
@JimB123 nice PR, eviction-related latency is a painful issue indeed.
oranagra left a comment
@JimB123 Thanks for another important improvement.
This is indeed a painful matter, but i'm a bit worried about two things:
- As Guy mentioned, there might be cases where the client traffic manages to overcome the eviction efforts and the memory grows and grows despite it. I think we may need to add some mechanism that gradually increases the timeout and makes it more aggressive when it detects that, despite the background eviction, the memory keeps growing.
- Till now the eviction mechanics were simple, and i suppose it was argued that the implications are predictable and the limitations are acceptable (a compromise between a simple but limited design, and one that attempts to achieve more by being more complicated). This change makes Redis less predictable, and with the addition of a mechanism like i suggested above we add even more complexity.
Still, my feeling is that it would be worth it, it's just that i have a concern.
p.s. i suppose that sooner or later we'll find places where we do want to execute the eviction loop to run to completion, or be more aggressive. maybe it would be a good idea to pass the time limit as an argument, in some contexts we'll want it to be infinite, in others just make it run a bit longer (rather than run a loop that calls performEvictions).
src/evict.c
Outdated
i understand why you removed the loop (latency), but in that case, as long as there's something in the lazy free list, we should return with PENDING, not FAIL.
and if we do that, maybe there's no need for the usleep that's still here.
My logic:
- If we've exited the main loop above with EVICT_FAIL, the main loop has exhausted all evictable items. Though there are some evictable items still being processed by lazy free, we have no idea how many or if there are enough to overcome the current memory deficiency. The safe thing to do in this situation is to return EVICT_FAIL. (Even if we did return EVICT_PASS, it's almost certain that one of the next commands will fail anyway. So it becomes a situation of trading one failure for another.)
- I had considered removing the sleep entirely, but decided that it was better to slightly slow the main thread, giving the background thread some time to perform evictions. This will perform better than the existing code and is a bit more conservative than removing the sleep altogether. ALSO, if the sleep is removed, there's little point in calling `getMemoryState` and therefore no reason to check for `bioPendingJobs` (so the entire `if` clause can be removed).
@oranagra @guybe7 Thanks for reviewing and commenting!
I don't think the extra complexity is necessary. Regardless of the time check, the loop is only checking for time after every 16 evictions. This means that for every command, (at least) 16 keys are being evicted.
Given this, the risk of client traffic overcoming eviction efforts is extremely small.
It should be considered that most often, this will make Redis more predictable. With today's mechanism, latency can vary wildly, even for the simplest of commands. A simple GET command which usually takes <1ms might suddenly take several seconds to return. Any operation performed as a single, large spike will negatively impact predictability. By smoothing evictions over time, predictability is improved.
I can't really think of a case for this. Most of Redis should just want to allow the eviction process time to run. What that eviction process actually does should be up to the eviction process itself. I think it's more likely that we'd want to make the eviction process smarter - to potentially run longer under those unlikely scenarios above - but that logic should be within the eviction process, not coming from outside. Though I don't think it should ever be needed, the existing (run to completion) mechanism can easily be coded as:
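The snippet itself didn't survive extraction, but based on the surrounding discussion (looping on the incremental eviction call until it stops reporting progress), a hedged sketch with stubbed-out internals might look like this. The stub behavior and names other than `performEvictions` and the `EVICT_*` codes are invented for illustration:

```c
#include <assert.h>

enum { EVICT_OK, EVICT_RUNNING, EVICT_FAIL };

/* Toy stand-in for the incremental eviction call: it reports
 * EVICT_RUNNING a few times, then EVICT_OK. Not the Redis internals. */
static int pending_rounds = 3;
static int performEvictions(void) {
    return (pending_rounds-- > 0) ? EVICT_RUNNING : EVICT_OK;
}

/* Run-to-completion built on top of the incremental primitive. */
static int evictToCompletion(void) {
    int res;
    while ((res = performEvictions()) == EVICT_RUNNING)
        ; /* keep evicting until done or nothing left to evict */
    return res;
}
```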
Just to add my thoughts:
I think these basically have to be constructed situations. Before every command, you would have to be inserting more traffic than can be evicted in 500us in steady state. I think this is a very unlikely use case, since it would likely be overwhelming Redis. You can also construct other scenarios where we blow past maxmemory, since there are read commands that affect memory consumption. We can also guard against it with a configuration option if a user really wants that, which is simpler.
madolson left a comment
I already reviewed this so just adding this for accounting.
We have a similar feature, but simpler:
Our users have never hit an eviction latency spike since this feature was implemented.
BTW, if we just return an error after 500us, users may be confused, for example if they set
@soloestoy Thanks for reviewing and for your comments. If I understand your comment correctly, you're saying that up until 10% over the limit, However, I suggest that the submitted solution is better for several reasons:
Re:
I think you misunderstood this. An error is returned ONLY if there are no more evictable keys. After 500us of eviction, client commands will execute as if memory were OK. Meanwhile, Redis will continue to evict before each command processed and in a timerProc.
I agree with everything Jim wrote above. If we do add some mode of aggressiveness to protect against abuse, it should not be a boolean thing changing behaviour above a certain threshold, but rather a gradual thing. Another point I want to raise before I forget: like the test this PR changed, there may be others (tests, client apps, or modules) that look at the memory overuse to decide something.
@JimB123 Thank you for this contribution, very interesting indeed. Your analysis of how this change doesn't increase the risk of eviction not catching up is very convincing. With this kind of change, I'm often concerned about pathological cases that turn out to be more common than expected in production. But if I understand correctly, this (or a similar) mechanism has some production mileage already - is that correct? Another concern is the system-level impact of taking longer to align with
Intuitively, I may be missing something like having
Re-reading the discussion above, I believe this is what @oranagra refers to in the last comment as well, but please correct me if I'm wrong.
@yossigo Thanks for reviewing
An interesting consideration. However, if we consider a system where memory is split between multiple Redis instances, each (smaller) instance is using a smaller portion of total system memory. Memory overages are expected to be somewhat proportional to memory allocation. (The smaller the total memory, the smaller the possible hash-table resize.) So if all of the instances on a host experienced simultaneous overages, that would likely be similar to one large instance experiencing a worst-case overage. In most situations, multiple Redis instances on a host would tend to limit/buffer the impact of transient overages.
I agree that there's opportunity to add the ability for dynamic adjustment of the eviction period. However, I have no evidence to believe that this extra complexity is necessary. If, at a later date, we decide to add the capability for dynamic tuning, I agree with @oranagra that such a mechanism should not be dependent on the amount of the overage, but rather on ensuring that memory is progressing towards a goal. Alternatively, adding a config parameter for the eviction period is a simple option, but in discussion with @madolson, we felt it unlikely that this would add value. Extra knobs and options make configuration more complex, and adding something of dubious value should be avoided. But this is certainly an option if there is enough concern around a fixed value. Setting the eviction period to something like 15 minutes would make the process similar to what it is today. It's also an option to choose a higher fixed value.
I agree that breaking eviction into smaller chunks will make the process slightly less efficient overall. This is a fair trade-off for the reduction in latency spikes which would occur otherwise.
The time to recover memory is less predictable, but the impacts on client command latency are more predictable. I would suggest that predictability for the client is more important. The system is over the limit already. There are many ways to drive Redis over the memory limit - possibly into swap. What's important is to get the system back within its configured memory usage as quickly as possible, without impacting Redis availability, and with minimal impact on client latencies.
As an aside... if there is a lot of concern about recovering memory quickly, we'd probably get more mileage out of improving the eviction scheme. The current method samples a number of values to find the best value to evict. Sample/Evict/Repeat. This is slow - especially on non-cluster-mode hosts having multiple DBs. If the system is significantly over memory, evicting 2 values each loop would practically double the eviction performance.
@JimB123 thanks for your reply. I know it's the rehash problem; the new hash table needs a lot of memory if Redis contains too many keys. And I like the background eviction, it is very similar to our implementation. 10% is not a fixed number, it's configurable; mostly, 10% can balance maxmemory and eviction. Whatever the case, rehash is a problem, but I think it's not the root cause; the root cause is how we distinguish overhead memory from dataset memory. We can see overhead detail via function
Back to this PR: I reread the code, and you are right, it doesn't break
And I also share the concern @yossigo raised; we need more feedback.
The core group didn't reach a consensus here, so I'll post an update on the conversation. There are two major cases:
The mitigation that was outlined was to introduce some type of config like eviction-latency-reduction, an integer from 0-100, with 0 meaning no attempt to reduce latency (attempt to evict keys ASAP, i.e., the current case) and 100 meaning a very gradual release of memory with minimal impact to latency (i.e., we only evict one key at a time each event loop). I'm not the biggest fan of this approach, since the memory usage is still capped to roughly the same amount. @JimB123, if you think the config is a good idea, then I would commit to that (or if you have some other idea). Otherwise I guess this is still open.
FYI, I stumbled upon a comment from Salvatore where he mentioned some design to resolve a similar problem (and try to avoid the problem of not evicting enough): #4583 (comment)
@madolson The suggested approach seems non-intuitive/backwards to me. I understand that you want to make the config generic and not tied to any specific timing values. But in this situation, I would expect 0 to be the default/normal case (the minimum time spent evicting, as submitted) and 100 to indicate block-until-done (the current/bad strategy). So, instead of calling this "eviction-latency-reduction" (or similar), I would suggest "eviction-tenacity", where 0 (the default) indicates minimum latency/effort, and 100 indicates block-until-done. I guess I don't feel terribly strongly about this. If adding a config value will break the log-jam, it's worth it. I don't expect that any users will actually need to configure this, but it will be there if needed. Unless I get other comments, I'll proceed with "eviction-tenacity".
@oranagra I read over the referenced comment. Clearly others are having this problem as well. The problem with Salvatore's suggestion to keep a record of the last call is that there is lots of normal jitter in Redis memory measurements. I wouldn't ever want to base a decision on a single pair of data points. Unless there are objections, I'll add the config parameter as mentioned above. Though I don't believe it will ever be needed/justified, there is always the option to add a more complex mechanism of auto-tuning later. Heck, we could add a complete machine learning model. :)
I absolutely agree with this. We should have a measurement for dataset memory which is independent of overhead memory. We are mixing all types of transient and non-transient memory usage, lumping it together, and using eviction as the single method to address memory concerns. Maybe that's the right thing to do, maybe not (I think not). Regardless, that is out of scope for this PR.
Yes, it is correct that we can construct a scenario which will cause memory to grow. But I don't think that such a scenario is a valid real-world scenario. Today, I can blow through memory with a single SUNIONSTORE command. I can lock up the CPU with a single KEYS command. And yes, with this update, a stream of pipelined MSETs might possibly outstrip eviction. But the important thing is that for 99.99% of systems, this is an important improvement to performance and stability. For that user who is running pipelined MSETs, it really shouldn't be a surprise when that exceeds memory (just like a big SUNIONSTORE would do). As mentioned above, though I don't really think it's necessary, I can/will add a config parameter to allow adjustment of eviction tenacity. With such a parameter, eviction could be tuned to mimic the degenerate case that exists in code today.
@JimB123 I imagine some logarithmic formula that would make a value of 100 block until done, something like 50 will give the default of 16 keys and 500us, and 0 will evict just a single key and leave (maybe without even bothering to check the time). regarding the name, i see all eviction related tunables are named
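A sketch of one such mapping is below. It is purely illustrative: the function name, the 10-point steps, and the doubling factor are assumptions, not anything agreed in this thread. The only anchored points are the ones oranagra mentions: 100 means block until done, and a mid value gives the 500us default.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical mapping from tenacity (0-100) to an eviction time
 * budget in microseconds. 100 means "no limit" (block until done);
 * 50 gives the 500us default; the budget doubles/halves per 10 points. */
static unsigned long evictionTimeLimitUs(int tenacity) {
    if (tenacity >= 100) return ULONG_MAX;  /* block until done */
    unsigned long us = 500;                 /* budget at the midpoint */
    for (int t = 50; t + 10 <= tenacity; t += 10) us *= 2; /* above 50 */
    for (int t = 50; t - 10 >= tenacity; t -= 10) us /= 2; /* below 50 */
    return us;
}
```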
@JimB123 now that the monotonic timer PR is merged, maybe it's time to push this forward. note that the change you made to the tests would no longer be needed once #7726 is merged (not using DEBUG POPULATE anymore). But now i realize that maybe we want to add some tests to cover this new mechanism.
@oranagra, I will rebase and address comments next week. Currently addressing conflicting priorities. I haven't abandoned this. |
rebase from redis
@JimB123 i'm struggling to review what you added.
Apologies @Oran. I wasn't expecting you to review. I'm still addressing the previous comments. This was just to rebase, pulling in the monotonic changes (which had been duplicated here in the first posting). |
Rebased again.
i assume the rebase was trivial.. did you reapply some typo fixes on your version?
Yes, fairly trivial. The one change was in a comment that I had removed/replaced.
Yes, just one. Interestingly, this auto-merged when I had done a global search. For completeness, I'll fix it. The comment is in the wrong place and doesn't make sense anyway.
Rather than blindly evicting until the maxmemory limit is achieved, this update adds a time limit to eviction. While over the maxmemory limit, eviction will run before each command AND as a timeProc when no commands are running.
Thanks Jim. |
itamarhaber left a comment
LGTM
Rather than blindly evicting until the maxmemory limit is achieved, this update adds a time limit to eviction. While over the maxmemory limit, eviction will run before each command AND as a timeProc when no commands are running. This will reduce the latency impact in many cases, especially pathological ones like a massive used-memory increase during dict rehashing. There is a risk that some other edge cases (like massive pipelined use of MGET) could cause Redis memory usage to keep growing despite the eviction attempts, so a new maxmemory-eviction-tenacity config is introduced to let users mitigate that.
This bug was introduced in #7653 (Redis 6.2.0). When `server.maxmemory_eviction_tenacity` is 100, `eviction_time_limit_us` is `ULONG_MAX`, and if we cannot find the best key to delete (e.g. maxmemory-policy is `volatile-lru` and all keys with ttl have been evicted), in `cant_free` Redis will sleep forever if some items are being freed in the lazyfree thread.
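A hedged illustration of that failure mode and the general shape of a fix: when the tenacity-derived budget is unbounded, the wait-for-lazyfree loop needs its own hard cap. All names, the cap value, and the fake clock below are hypothetical, not the actual `evict.c` code.

```c
#include <assert.h>
#include <stdint.h>

/* Fake clock and lazyfree queue standing in for Redis internals. */
static uint64_t now_us = 0;
static int lazyfree_jobs = 5;

static uint64_t monotonicUs(void) { return now_us; }
static void sleepBriefly(void) {
    now_us += 100;                          /* pretend we slept 100us */
    if (lazyfree_jobs > 0) lazyfree_jobs--; /* background thread progresses */
}

/* Wait for lazyfree to drain, but never longer than a hard cap, even
 * when the caller's budget (tenacity 100 => ULONG_MAX) is unbounded. */
static void waitForLazyfree(uint64_t budget_us) {
    const uint64_t cap_us = 1000;
    uint64_t limit = budget_us < cap_us ? budget_us : cap_us;
    uint64_t start = monotonicUs();
    while (lazyfree_jobs > 0 && monotonicUs() - start < limit)
        sleepBriefly();
}
```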
Overview
When Redis breaches the `maxmemory` threshold, `freeMemoryIfNeededAndSafe` is used to perform evictions until the memory situation is corrected. This is a synchronous process, and can cause large periods of unresponsiveness as well as wild variation in command latencies.
Extreme use case
Consider a large Redis instance with 268MM keys. This requires a hash table that's 2GB in size. Add another 1MM keys and the hash table will require expansion. Redis will allocate a new 4GB block of memory for the hash table. (Some time later, after rehashing is complete, the old 2GB table is released.)
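As a back-of-envelope check of those numbers (assuming 8-byte bucket pointers on a 64-bit build and power-of-two table sizing, as Redis dicts use; the helper names are invented):

```c
#include <assert.h>
#include <stdint.h>

/* Bytes used by a dict bucket array of the given size (8-byte pointers,
 * 64-bit build assumed). */
static uint64_t dictTableBytes(uint64_t buckets) {
    return buckets * 8;
}

/* Smallest power of two holding n entries. */
static uint64_t nextPowerOfTwo(uint64_t n) {
    uint64_t size = 1;
    while (size < n) size *= 2;
    return size;
}
```

268MM keys fit in a 2^28-bucket table (2GB of pointers); crossing 2^28 keys forces a doubled 2^29-bucket table (4GB) to be allocated while the old one is still live.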
If the instance was near its `maxmemory` setting before the rehash, we could suddenly find ourselves 4GB over the limit. At this point, Redis will stop everything while running the eviction algorithm until 4GB of memory has been freed. Unresponsiveness in this case might reasonably last several minutes.
Solution
Understand that when eviction is happening, Redis has already breached `maxmemory`. Redis does NOT prevent us from breaching `maxmemory`. At this point, the intent is to recover; however, we don't need to recover instantly. This update limits eviction processing to 500us.
Given that eviction processing is run before every command (read/write/other), Redis will quickly return to its nominal memory usage. Also, this update creates a timer event which will continue to evict in between client requests until normal memory usage is restored.
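The mechanics described above can be sketched as follows. This is a simplified toy, not the actual `evict.c` code: the helpers, the fake clock, and the per-key cost are invented; the 16-key batching between clock checks comes from the discussion earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { EVICT_OK, EVICT_RUNNING, EVICT_FAIL };

/* Toy state standing in for Redis internals (arbitrary units). */
static long used_memory = 100, maxmemory = 50;
static long evictable_keys = 1000;
static uint64_t fake_clock_us = 0;

static uint64_t monotonicUs(void) { return fake_clock_us; }
static bool overMaxmemory(void) { return used_memory > maxmemory; }
static bool evictOneKey(void) {
    if (evictable_keys == 0) return false;
    evictable_keys--;
    used_memory--;        /* each key frees one unit */
    fake_clock_us += 40;  /* pretend each eviction costs 40us */
    return true;
}

/* Evict under a time budget, checking the clock only every 16 keys. */
static int performEvictions(uint64_t time_limit_us) {
    uint64_t start = monotonicUs();
    int keys_evicted = 0;
    while (overMaxmemory()) {
        if (!evictOneKey()) return EVICT_FAIL;
        if ((++keys_evicted % 16) == 0 &&
            monotonicUs() - start >= time_limit_us)
            return EVICT_RUNNING;  /* budget spent, more work remains */
    }
    return EVICT_OK;
}
```

With a 500us budget, the first call stops after one 16-key batch (640us of simulated work) and reports EVICT_RUNNING; repeated calls, as before each command or from the timer event, finish the job.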
Given a large eviction event like the one mentioned above, the effects will be:
For more common, smaller, eviction events:
Refactoring
Included in this update is a slight refactoring. The original function `freeMemoryIfNeeded` is wrapped by `freeMemoryIfNeededAndSafe` (awkward at best). Although these routines are identified to "free memory", the only technique employed is to evict. So this is a very general-purpose name applied to a very specific function. This update replaces the above 2 routines with `processEvictions`, which is more in accordance with what the function actually does. "IfNeeded" and "AndSafe" are not needed, as it should be obvious that `processEvictions` won't/shouldn't evict if unnecessary or unsafe.
The new `processEvictions` function has 3 return codes: `EVICT_OK`, `EVICT_RUNNING`, and `EVICT_FAIL` (nothing left to evict). The `processCommand` function will only reject (write) commands when `EVICT_FAIL` is returned. References updated.
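The command-rejection policy could be sketched as the following predicate (a hypothetical helper, not the actual `processCommand` code): only a hard `EVICT_FAIL` blocks deny-OOM (write) commands, while `EVICT_OK` and `EVICT_RUNNING` both let them through.

```c
#include <assert.h>
#include <stdbool.h>

enum { EVICT_OK, EVICT_RUNNING, EVICT_FAIL };

/* Hypothetical policy check: reject a command only when it is a
 * deny-OOM (write) command AND nothing is left to evict. */
static bool shouldRejectCommand(int evict_result, bool denies_oom) {
    return denies_oom && evict_result == EVICT_FAIL;
}
```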
Dependencies
This update is dependent on #7644. The monotonic clock is used for timing the eviction loop. Once that is approved and merged, this update will be rebased.
Caveat
Eviction processing can still take a long time in the case that large values (hash/set/etc.) are present and `server.lazyfree_lazy_eviction` is false. However, it's generally unlikely that large hashes would be evicted. In the case that a large hash was evicted, performance would be similar to existing performance.