Skip to content

Conversation

@hpatro
Copy link
Contributor

@hpatro hpatro commented Oct 18, 2023

Temporarily disabling few of the defrag tests in cluster mode to make the daily run stable:

  1. Active defrag eval scripts
  2. Active defrag big keys
  3. Active defrag big list
  4. Active defrag edge case

Few scenarios observed and I'm investigating:

  1. defrag not started - Either the defrag scan got completed real quick or never started.
  2. Actual fragmentation is lower than the expected fragmentation - Might require us to lower the fragmentation expectation or have separate threshold for cluster mode.
  3. Max latency exceeds from the current set threshold.
  4. defrag didn't stop - Sometimes allocator_frag_ratio reaches 1.06 and the test fails. Expectation is 1.05.

Failure run: https://github.com/redis/redis/actions/runs/6527277037/job/17721912648

@oranagra
Copy link
Member

Trueth be told, when you added the loop to run all the Defrag tests twice, I thought it was excessive, but I think the one about big keys, or big list is actually needed (we had some concerns about the DefragLater list in cluster mode)

@hpatro
Copy link
Contributor Author

hpatro commented Oct 19, 2023

Trueth be told, when you added the loop to run all the Defrag tests twice, I thought it was excessive, but I think the one about big keys, or big list is actually needed (we had some concerns about the DefragLater list in cluster mode)

Wasn't expecting this amount of flakiness at each validation level. Seems the tests were particularly crafted for standalone setup. I'm trying to fix them here #12674

@oranagra
Copy link
Member

I don't understand the changes of the other PR, i guess it's still WIP.
since it takes time and the diff here is trivial to revert, i'll merge it just to silence the errors for now.

@oranagra oranagra merged commit becd50d into redis:unstable Oct 19, 2023
@hpatro
Copy link
Contributor Author

hpatro commented Oct 19, 2023

I don't understand the changes of the other PR, i guess it's still WIP. since it takes time and the diff here is trivial to revert, i'll merge it just to silence the errors for now.

@oranagra I've explained the issue with the current logic for defrag not started failure in this comment #12674 (comment). PTAL.

oranagra pushed a commit that referenced this pull request Oct 27, 2023
Fixing issues described in #12672, started after #11695
Related to #12674

Fixes the `defrag didn't stop' issue.

In some cases of how the keys were stored in memory
defrag_later_item_in_progress was not getting reset once we finish
defragging the later items and we move to the next slot. This stopped
the scan to happen in the later slots and did not get 
oranagra added a commit that referenced this pull request Nov 2, 2023
Reverts the skipping defrag tests in cluster mode (done in #12672.
instead it skips only some defrag tests that are relevant for cluster modes.
The test now run well after investigating and making the changes in #12674 and #12694.

Co-authored-by: Oran Agra <oran@redislabs.com>
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Jul 25, 2025
Fixing issues described in redis#12672, started after redis#11695
Related to redis#12674

Fixes the `defrag didn't stop' issue.

In some cases of how the keys were stored in memory
defrag_later_item_in_progress was not getting reset once we finish
defragging the later items and we move to the next slot. This stopped
the scan to happen in the later slots and did not get
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants