Skip to content

Separate jobs for large memory tests with sanitizers#3161

Merged
zuiderkwast merged 6 commits into
valkey-io:unstablefrom
sarthakaggarwal97:skip-tests-asan
Feb 10, 2026
Merged

Separate jobs for large memory tests with sanitizers#3161
zuiderkwast merged 6 commits into
valkey-io:unstablefrom
sarthakaggarwal97:skip-tests-asan

Conversation

@sarthakaggarwal97

@sarthakaggarwal97 sarthakaggarwal97 commented Feb 4, 2026

Copy link
Copy Markdown
Contributor

We have been seeing github actions runners being OOM when large memory tests are run with ASan. The operation eventually is being canceled during the test.

This change moves the large-memory tests with ASan and UBSan to separate jobs, so we get a dedicated runner with its own timeout. We can tweak the number of simultaneous test clients for these tests without affecting the other test jobs.

Some recent failure examples:

  1. https://github.com/valkey-io/valkey/actions/runs/21573028934/job/62155215331#step:10:425
  2. https://github.com/valkey-io/valkey/actions/runs/21553319484/job/62105370976#step:10:425

@sarthakaggarwal97

Copy link
Copy Markdown
Contributor Author

@madolson I skipped the tests right now for scheduled runs. Let me know if you think we should skip the tests altogether. Thanks!

@codecov

codecov Bot commented Feb 4, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.94%. Comparing base (87caeb7) to head (ae55f09).
⚠️ Report is 4 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3161      +/-   ##
============================================
+ Coverage     74.90%   74.94%   +0.03%     
============================================
  Files           129      129              
  Lines         71327    71329       +2     
============================================
+ Hits          53429    53457      +28     
+ Misses        17898    17872      -26     

see 22 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@dvkashapov dvkashapov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the reasons to keep running large memory with asan is to keep new bugs from being introduced, here #1748 large string bug was fixed because we did not have those runs before. Also if someone will see failure in extra tests they will not have reference to wether this change introduced new bug or it was already present.

@sarthakaggarwal97

Copy link
Copy Markdown
Contributor Author

@dvkashapov that's a good example but over there it affected a specific platform. But currently ASan tests with daily are only run on single machine / platform. So we are anyways not covering all the platforms.
I am more worried if these large memory tests might hide important failures since the operation just gets canceled due to machines going OOM.

Alternate is that I saw couple of specific tests which were affecting these runs, and maybe we can change the params (like with lesser memory) of those based on if the run is in ASan mode.

What do you think?

@dvkashapov

Copy link
Copy Markdown
Member

Alternate is that I saw couple of specific tests which were affecting these runs, and maybe we can change the params (like with lesser memory) of those based on if the run is in ASan mode.

Yes, this makes sense, what tests did affect the most?

@zuiderkwast

Copy link
Copy Markdown
Contributor

What if we skip the large-memory tests and then add a new separate job that only runs the large-memory tests?

To use less memory, we can run with --clients 1 (or another small number) instead of the default 10, to avoid parallelizing the tests.

@sarthakaggarwal97

sarthakaggarwal97 commented Feb 5, 2026

Copy link
Copy Markdown
Contributor Author

What if we skip the large-memory tests and then add a new separate job that only runs the large-memory tests?

Like as a part of daily itself? In another workflow? Can try it out.

To use less memory, we can run with --clients 1 (or another small number) instead of the default 10, to avoid parallelizing the tests.

This can slow down the execution time significantly. It takes about an hour right now.

Yes, this makes sense, what tests did affect the most?

test_quicklistCompressAndDecompressQuicklistListpackNode

@sarthakaggarwal97

Copy link
Copy Markdown
Contributor Author

Also, multiple sanitizer runs failed again yesterday: https://github.com/valkey-io/valkey/actions/runs/21693609685/job/62558879812#step:10:423

@zuiderkwast

Copy link
Copy Markdown
Contributor

To use less memory, we can run with --clients 1 (or another small number) instead of the default 10, to avoid parallelizing the tests.

This can slow down the execution time significantly. It takes about an hour right now.

Yes, but if this new job (a separate job in Daily) only runs the large-memory tests and nothing else, then it shouldn't take too long I guess.

@sarthakaggarwal97

Copy link
Copy Markdown
Contributor Author

@zuiderkwast I can try doing this and maybe run the tests and observe.

@sarthakaggarwal97

Copy link
Copy Markdown
Contributor Author

@zuiderkwast I have separated them into different jobs. Let me give it a few tries to run daily in my forked repo and see if it works.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
@sarthakaggarwal97

Copy link
Copy Markdown
Contributor Author

@zuiderkwast the asan tests passed in my local forked repo after separating the jobs: https://github.com/sarthakaggarwal97/valkey/actions/runs/21768844363

@sarthakaggarwal97 sarthakaggarwal97 changed the title Skip large memory tests in ASan Separate Jobs for large memory tests in ASan Feb 9, 2026

@dvkashapov dvkashapov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you!

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@zuiderkwast

Copy link
Copy Markdown
Contributor

It looks like unit test was running but regular tests and module API tests were skipped in your repo:

image

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
@sarthakaggarwal97

Copy link
Copy Markdown
Contributor Author

@zuiderkwast thanks for the call out. I think as I am running these large memory tests we are seeing multiple errors. (#3184, #3174). Good to get this green before release.

@sarthakaggarwal97

Copy link
Copy Markdown
Contributor Author

@zuiderkwast zuiderkwast changed the title Separate Jobs for large memory tests in ASan Separate jobs for large memory tests with sanitizers Feb 10, 2026
@zuiderkwast zuiderkwast merged commit 42e6a0b into valkey-io:unstable Feb 10, 2026
24 checks passed
harrylin98 pushed a commit to harrylin98/valkey_forked that referenced this pull request Feb 19, 2026
We have been seeing github actions runners being OOM when large memory
tests are run with ASan. The operation eventually is being canceled
during the test.

This change moves the large-memory tests with ASan and UBSan to separate
jobs, so we get a dedicated runner with its own timeout. We can tweak
the number of simultaneous test clients for these tests without
affecting the other test jobs.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
hpatro pushed a commit to hpatro/valkey that referenced this pull request Mar 5, 2026
We have been seeing github actions runners being OOM when large memory
tests are run with ASan. The operation eventually is being canceled
during the test.

This change moves the large-memory tests with ASan and UBSan to separate
jobs, so we get a dedicated runner with its own timeout. We can tweak
the number of simultaneous test clients for these tests without
affecting the other test jobs.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
enjoy-binbin pushed a commit that referenced this pull request Mar 12, 2026
Carries on from where #3161 left off. The test-sanitizer-address-large-memory
jobs were being OOM-killed on GitHub-hosted runners (15.6GB RAM) due to
ASAN's 2-3x memory overhead.

Changes:
- Skip 4GB quicklist compression test under ASAN (requires ~16-24GB with
dual buffers + ASAN overhead)
- Reduce integration test sizes from 5GB to 4.1GB (preserves >4GB 32-bit
boundary coverage)
- Reduce XADD iterations from 10 to 3
- Add memory monitoring to track minimum free memory during CI runs

Signed-off-by: Rain Valentine <rsg000@gmail.com>
JimB123 pushed a commit that referenced this pull request Mar 19, 2026
Carries on from where #3161 left off. The test-sanitizer-address-large-memory
jobs were being OOM-killed on GitHub-hosted runners (15.6GB RAM) due to
ASAN's 2-3x memory overhead.

Changes:
- Skip 4GB quicklist compression test under ASAN (requires ~16-24GB with
dual buffers + ASAN overhead)
- Reduce integration test sizes from 5GB to 4.1GB (preserves >4GB 32-bit
boundary coverage)
- Reduce XADD iterations from 10 to 3
- Add memory monitoring to track minimum free memory during CI runs

Signed-off-by: Rain Valentine <rsg000@gmail.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Apr 27, 2026
…o#3263)

Carries on from where valkey-io#3161 left off. The test-sanitizer-address-large-memory
jobs were being OOM-killed on GitHub-hosted runners (15.6GB RAM) due to
ASAN's 2-3x memory overhead.

Changes:
- Skip 4GB quicklist compression test under ASAN (requires ~16-24GB with
dual buffers + ASAN overhead)
- Reduce integration test sizes from 5GB to 4.1GB (preserves >4GB 32-bit
boundary coverage)
- Reduce XADD iterations from 10 to 3
- Add memory monitoring to track minimum free memory during CI runs

Signed-off-by: Rain Valentine <rsg000@gmail.com>
(cherry picked from commit c9ce3e0)
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Apr 28, 2026
…o#3263)

Carries on from where valkey-io#3161 left off. The test-sanitizer-address-large-memory
jobs were being OOM-killed on GitHub-hosted runners (15.6GB RAM) due to
ASAN's 2-3x memory overhead.

Changes:
- Skip 4GB quicklist compression test under ASAN (requires ~16-24GB with
dual buffers + ASAN overhead)
- Reduce integration test sizes from 5GB to 4.1GB (preserves >4GB 32-bit
boundary coverage)
- Reduce XADD iterations from 10 to 3
- Add memory monitoring to track minimum free memory during CI runs

Signed-off-by: Rain Valentine <rsg000@gmail.com>
(cherry picked from commit c9ce3e0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants