Skip to content

Introduce DEBUG_DEFRAG compilation option to allow run test with activedefrag when allocator is not jemalloc#14326

Merged
sundb merged 34 commits intoredis:unstablefrom
sundb:force-defrag
Sep 10, 2025
Merged

Introduce DEBUG_DEFRAG compilation option to allow run test with activedefrag when allocator is not jemalloc#14326
sundb merged 34 commits intoredis:unstablefrom
sundb:force-defrag

Conversation

@sundb
Copy link
Copy Markdown
Collaborator

@sundb sundb commented Sep 2, 2025

This PR is based on valkey-io/valkey#1303

This PR introduces a DEBUG_DEFRAG compilation option that enables activedefrag functionality even when the allocator is not jemalloc, and always forces defragmentation regardless of the amount or ratio of fragmentation.

Using

make SANITIZER=address DEBUG_DEFRAG=<force|fully>
./runtest --debug-defrag
  • DEBUG_DEFRAG=force

    • Ignore the threshold for defragmentation to ensure that defragmentation is always triggered.
    • Always reallocate pointers to probe for correctness issues in pointer reallocation.
  • DEBUG_DEFRAG=fully

    • Includes everything in the option force.
    • Additionally performs a full defrag on every defrag cycle, which is significantly slower but more accurate.

Co-authored-by: Ran Shidlansik ranshid@amazon.com

@snyk-io
Copy link
Copy Markdown

snyk-io bot commented Sep 2, 2025

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found. (View Details)

license/snyk check is complete. No issues have been found. (View Details)

@sundb sundb added release-notes indication that this issue needs to be mentioned in the release notes and removed release-notes indication that this issue needs to be mentioned in the release notes labels Sep 4, 2025
@sundb sundb added this to Redis 8.4 Sep 4, 2025
@github-project-automation github-project-automation bot moved this to Todo in Redis 8.4 Sep 4, 2025
sundb and others added 2 commits September 4, 2025 14:08
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@sundb sundb marked this pull request as ready for review September 4, 2025 08:22
@sundb sundb requested a review from Copilot September 4, 2025 08:23

This comment was marked as outdated.

@sundb
Copy link
Copy Markdown
Collaborator Author

sundb commented Sep 4, 2025

defrag CI(failed) with commit 0257441 (#14326):
https://github.com/sundb/redis/actions/runs/17460851901/job/49584975468

defrag CI(all tests passed except a known issue) with commit c8fa4bb (#14326)

@sundb sundb requested a review from oranagra September 4, 2025 13:02
Copy link
Copy Markdown
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this causes the server to always detect fragmentation, and when it reaches a pointer, it always relocates it, but it still runs only on server cron, so there's a good chance many tests will reach certain states, and then clear these states before the defragger had a chance to mess them up.
maybe add another mode in which we run full defrag cycle in each beforeSleep (then we don't even need to fake high fragmentation metric)

@sundb
Copy link
Copy Markdown
Collaborator Author

sundb commented Sep 6, 2025

maybe add another mode in which we run full defrag cycle in each beforeSleep (then we don't even need to fake high fragmentation metric)

@oranagra in the commit, I changed the defrag duty cycle to 30 seconds to avoid failures caused by a large number of slow tests. The effect is roughly the same as manually forcing a full cycle in beforeSleep(). Through this change, I also discovered another issue related to defragmentation.
The main reason is that if we want to add a full fully defrag in beforeSleep, it will bring quite a lot of code changes. I choose a relatively simple way.

@oranagra
Copy link
Copy Markdown
Member

oranagra commented Sep 6, 2025

Ok. I missed that. That's indeed probably sufficient to scan the whole thing in one cycle.
But I don't currently remember if we added some defragmentation in beforeSleep or is it still only in server serverCron (im AFK).
If it's only in cron, it'll be ineffective (maybe states will get undone before it sees them).
P.s. I suppose we can skip all tests that use large amount of data, and also we can reduce the interval to weekly.

@sundb
Copy link
Copy Markdown
Collaborator Author

sundb commented Sep 7, 2025

Ok. I missed that. That's indeed probably sufficient to scan the whole thing in one cycle. But I don't currently remember if we added some defragmentation in beforeSleep or is it still only in server serverCron (im AFK). If it's only in cron, it'll be ineffective (maybe states will get undone before it sees them).

We are no longer doing defragmentation in cron now, but in ae timer. So after the commit: 2432cff, As long as the fragment is complete and the cycle does not exceed 30 seconds (if one completed cycle ends, we will immediately start the next one), each ae timer will be triggered.

P.s. I suppose we can skip all tests that use large amount of data, and also we can reduce the interval to weekly.

I tried to comment out the tests that are consuming too much time, but there might be quite a few tests that need to be commented out. Below is the test duration consumption when I used fully defrag. The total test time was one and a half hours.

166 seconds - integration/replication-rdbchannel
  370 seconds - unit/scan
  334 seconds - unit/type/zset
  396 seconds - unit/maxmemory
  419 seconds - unit/other
  386 seconds - unit/type/string
  246 seconds - integration/replication-psync
  461 seconds - unit/lazyfree
  352 seconds - integration/replication-2
  508 seconds - integration/replication
  1359 seconds - unit/sort
  2030 seconds - unit/scripting

@oranagra
Copy link
Copy Markdown
Member

oranagra commented Sep 7, 2025

ok. so essentially it'll run on each event loop.

regarding duration, that's not too bad..
a better approach could be to check duration of each individual test, then the ones taking very long will stand out (not the units that just happen to have a lot of tests)

@oranagra
Copy link
Copy Markdown
Member

oranagra commented Sep 7, 2025

from my perspective, you can merge it.

@sundb sundb changed the title Introduce FORCE_DEFRAG compilation option to allow activedefrag run when allocator is not jemalloc Introduce DEBUG_DEFRAG compilation option to allow activedefrag run when allocator is not jemalloc Sep 8, 2025
@sundb sundb changed the title Introduce DEBUG_DEFRAG compilation option to allow activedefrag run when allocator is not jemalloc Introduce DEBUG_DEFRAG compilation option to allow run test with activedefrag when allocator is not jemalloc Sep 8, 2025
@sundb
Copy link
Copy Markdown
Collaborator Author

sundb commented Sep 8, 2025

@sundb
Copy link
Copy Markdown
Collaborator Author

sundb commented Sep 9, 2025

Intentional failures(commit c8876fc) can be detected by this PR.
failed CI: https://github.com/sundb/redis/actions/runs/17576123234/job/49930009235

@sundb sundb requested a review from Copilot September 9, 2025 11:03
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a DEBUG_DEFRAG compilation option that enables active defragmentation functionality for testing purposes even when jemalloc is not available. The changes allow tests to run with defragmentation enabled using alternative allocators like AddressSanitizer.

  • Adds DEBUG_DEFRAG compilation flags (force/fully modes) to enable defrag testing without jemalloc
  • Implements alternative defrag functions for non-jemalloc allocators that always perform defragmentation
  • Updates test framework to support --debug-defrag flag and skip incompatible tests

Reviewed Changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/Makefile Adds DEBUG_DEFRAG compilation flags for force and fully modes
src/zmalloc.h Extends HAVE_DEFRAG macro to include DEBUG_DEFRAG_FORCE
src/zmalloc.c Conditionally compiles jemalloc-specific functions
src/defrag.c Implements alternative defrag functions and forces high fragmentation reporting
tests/test_helper.tcl Adds --debug-defrag command line option
tests/support/server.tcl Adds debug_defrag:skip tag handling and defrag config
.github/workflows/daily.yml Adds CI job for testing with DEBUG_DEFRAG
tests/unit/*.tcl Adds debug_defrag:skip tags to tests incompatible with debug defrag mode
tests/integration/*.tcl Adds debug_defrag:skip tags to replication tests
tests/unit/moduleapi/*.tcl Adds debug_defrag:skip tags to module tests

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@sundb
Copy link
Copy Markdown
Collaborator Author

sundb commented Sep 9, 2025

Run the CI ten times and all passed, i think we can merge it now.

@sundb sundb merged commit 60adba4 into redis:unstable Sep 10, 2025
18 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in Redis 8.4 Sep 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants