defrag: eliminate persistent kvstore pointer and edge case fixes #1430
Merged
hpatro merged 1 commit into valkey-io:unstable on Dec 12, 2024
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           unstable    #1430    +/-   ##
============================================
+ Coverage     70.76%   70.90%   +0.14%
============================================
  Files           119      119
  Lines         64618    64642     +24
============================================
+ Hits          45728    45836    +108
+ Misses        18890    18806     -84
JimB123
commented
Dec 12, 2024
ce3b933 to
ed3d8e9
Signed-off-by: Jim Brunner <brunnerj@amazon.com>
ed3d8e9 to
005deba
hpatro
reviewed
Dec 12, 2024
Contributor
hpatro
left a comment
The change looks good to me. But if I understand correctly, there isn't enough test coverage to discover such issues. Should we add more defrag tests to avoid things like this in the future?
Comment on lines +1232 to +1234
    int dutyCycleUs = computeDefragCycleUs();
    monotime endtime = starttime + dutyCycleUs;
    bool haveMoreWork = true;
Contributor
I think we were avoiding camel-cased variables in methods, but the clang-format check does not seem to complain about it.
Contributor
hpatro
approved these changes
Dec 12, 2024
vudiep411
pushed a commit
to Autxmaton/valkey
that referenced
this pull request
Dec 15, 2024
…key-io#1430)
This update addresses several issues in defrag:
1. In the defrag redesign (valkey-io#1242), a bug was introduced where `server.cronloops` was no longer being incremented in the `whileBlockedCron()`. This resulted in some memory statistics not being updated while blocked.
2. In the test case for AOF loading, we were seeing errors due to defrag latencies. However, running the math, the latencies are justified given the extremely high CPU target of the testcase. Adjusted the expected latency check to allow longer latencies for this case where defrag is undergoing starvation while AOF loading is in progress.
3. A "stage" is passed a "target". For the main dictionary and expires, we were passing in a `kvstore*`. However, on flushall or swapdb, the pointer may change. It's safer and more stable to use an index for the DB (a DBID). Then if the pointer changes, we can detect the change, and simply abort the stage. (If there's still fragmentation to deal with, we'll pick it up again on the next cycle.)
4. We always start a new stage on a new defrag cycle. This gives the new stage time to run, and prevents latency issues for certain stages which don't operate incrementally. However, often several stages will require almost no work, and this will leave a chunk of our CPU allotment unused. This is mainly an issue in starvation situations (like AOF loading or LUA script) - where defrag is running infrequently, with a large duty-cycle. This change allows a new stage to be initiated if we still have a standard duty-cycle remaining. (This can happen during starvation situations where the planned duty cycle is larger than the standard cycle. Most likely this isn't a concern for real scenarios, but it was observed in testing.)
5. Minor comment correction in `server.h`
Signed-off-by: Jim Brunner <brunnerj@amazon.com>
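To illustrate item 3 of the commit message, here is a minimal sketch of a stage that keeps a DBID instead of a long-lived `kvstore*`. All names below (`db`, `dbs`, `stageCtx`, `stageBegin`, `stageTargetStillValid`) are hypothetical stand-ins and not the identifiers used in valkey; the sketch only shows the idea of re-resolving the pointer through the index and aborting when it no longer matches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the valkey definitions. */
typedef struct kvstore { int unused; } kvstore;
typedef struct db { kvstore *keys; } db;

#define NUM_DBS 2
static db dbs[NUM_DBS];

/* Per-stage context: stores the DB index plus the kvstore pointer that
 * was observed when the stage began. */
typedef struct stageCtx {
    int dbid;
    kvstore *seen; /* only compared against the current pointer */
} stageCtx;

/* Record which DB the stage targets and which kvstore it started with. */
static void stageBegin(stageCtx *ctx, int dbid) {
    ctx->dbid = dbid;
    ctx->seen = dbs[dbid].keys;
}

/* Returns true if the stage may continue. Returns false if the DB's
 * kvstore was replaced in the meantime (as flushall or swapdb may do),
 * in which case the caller should abort the stage. */
static bool stageTargetStillValid(const stageCtx *ctx) {
    return dbs[ctx->dbid].keys == ctx->seen;
}
```

A stage would call `stageTargetStillValid()` at the start of each step and simply stop when it returns false; any remaining fragmentation is left for a later cycle, as the commit message describes.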
kronwerk
pushed a commit
to kronwerk/valkey
that referenced
this pull request
Jan 27, 2025
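This update addresses several issues in defrag:
1. In the defrag redesign (valkey-io#1242), a bug was introduced where `server.cronloops` was no longer being incremented in `whileBlockedCron()`. This resulted in some memory statistics not being updated while blocked.
2. In the test case for AOF loading, we were seeing errors due to defrag latencies. However, running the math, the latencies are justified given the extremely high CPU target of the test case. Adjusted the expected latency check to allow longer latencies for this case, where defrag is undergoing starvation while AOF loading is in progress.
3. A "stage" is passed a "target". For the main dictionary and expires, we were passing in a `kvstore*`. However, on flushall or swapdb, the pointer may change. It's safer and more stable to use an index for the DB (a DBID). Then if the pointer changes, we can detect the change, and simply abort the stage. (If there's still fragmentation to deal with, we'll pick it up again on the next cycle.)
4. We always start a new stage on a new defrag cycle. This gives the new stage time to run, and prevents latency issues for certain stages which don't operate incrementally. However, often several stages will require almost no work, and this will leave a chunk of our CPU allotment unused. This is mainly an issue in starvation situations (like AOF loading or a Lua script), where defrag is running infrequently, with a large duty cycle. This change allows a new stage to be initiated if we still have a standard duty cycle remaining. (This can happen during starvation situations where the planned duty cycle is larger than the standard cycle. Most likely this isn't a concern for real scenarios, but it was observed in testing.)
5. Minor comment correction in `server.h`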
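The rule about starting a new stage when a standard duty cycle is still left in the current cycle can be sketched roughly as follows. This is an assumption-laden illustration, not the code from the PR: `monotime_us` and `canStartNewStage` are hypothetical names, and times are plain microsecond counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical microsecond timestamp type; not valkey's monotime. */
typedef uint64_t monotime_us;

/* Decide whether another stage may be started within the current cycle.
 * now:               current time in microseconds
 * endtime:           planned end of this defrag cycle
 * standard_cycle_us: length of a standard (non-starved) duty cycle
 * A new stage is allowed only if at least one full standard duty cycle
 * remains before endtime. */
static bool canStartNewStage(monotime_us now, monotime_us endtime,
                             uint64_t standard_cycle_us) {
    if (now >= endtime) return false; /* cycle budget already used up */
    return (endtime - now) >= standard_cycle_us;
}
```

With a planned cycle of 5000 us and a standard cycle of 500 us, a stage that finishes at 100 us leaves room for another stage in the same cycle, while one that finishes at 4800 us does not and the next stage waits for the following cycle.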