Separate jobs for large memory tests with sanitizers by sarthakaggarwal97 · Pull Request #3161 · valkey-io/valkey

sarthakaggarwal97 · 2026-02-04T00:09:43Z

We have been seeing GitHub Actions runners run out of memory (OOM) when the large-memory tests are run with ASan. The run is eventually canceled partway through the tests.

This change moves the large-memory tests with ASan and UBSan to separate jobs, so that each gets a dedicated runner with its own timeout. We can then tweak the number of simultaneous test clients for these tests without affecting the other test jobs.

Some recent failure examples:

sarthakaggarwal97 · 2026-02-04T00:13:32Z

@madolson For now, I have skipped the tests for scheduled runs. Let me know if you think we should skip them altogether. Thanks!

codecov · 2026-02-04T00:29:31Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.94%. Comparing base (87caeb7) to head (ae55f09).
⚠️ Report is 4 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #3161      +/-   ##
============================================
+ Coverage     74.90%   74.94%   +0.03%     
============================================
  Files           129      129              
  Lines         71327    71329       +2     
============================================
+ Hits          53429    53457      +28     
+ Misses        17898    17872      -26

see 22 files with indirect coverage changes


dvkashapov

One of the reasons to keep running the large-memory tests with ASan is to keep new bugs from being introduced. For example, the large-string bug in #1748 had to be fixed because we did not have those runs before. Also, if someone sees a failure in the extra tests, they will have no reference for whether their change introduced a new bug or the bug was already present.

sarthakaggarwal97 · 2026-02-04T20:29:59Z

@dvkashapov That's a good example, but there it affected a specific platform. Currently, the ASan tests in Daily run on only a single machine / platform, so we are not covering all platforms anyway.
I am more worried that these large-memory tests might hide important failures, since the run just gets canceled when the machine goes OOM.

An alternative: I saw a couple of specific tests that were affecting these runs, and maybe we can change their parameters (for example, use less memory) depending on whether the run is in ASan mode.

What do you think?

dvkashapov · 2026-02-05T05:45:44Z

> An alternative: I saw a couple of specific tests that were affecting these runs, and maybe we can change their parameters (for example, use less memory) depending on whether the run is in ASan mode.

Yes, this makes sense. Which tests affected the runs the most?

zuiderkwast · 2026-02-05T09:37:06Z

What if we skip the large-memory tests and then add a new separate job that only runs the large-memory tests?

To use less memory, we can run with --clients 1 (or another small number) instead of the default 10, to avoid parallelizing the tests.

sarthakaggarwal97 · 2026-02-05T17:00:49Z

> What if we skip the large-memory tests and then add a new separate job that only runs the large-memory tests?

As part of Daily itself, or in another workflow? I can try it out.

> To use less memory, we can run with --clients 1 (or another small number) instead of the default 10, to avoid parallelizing the tests.

This can slow down execution significantly. It takes about an hour right now.

> Yes, this makes sense. Which tests affected the runs the most?

test_quicklistCompressAndDecompressQuicklistListpackNode

sarthakaggarwal97 · 2026-02-05T17:02:42Z

Also, multiple sanitizer runs failed again yesterday: https://github.com/valkey-io/valkey/actions/runs/21693609685/job/62558879812#step:10:423

zuiderkwast · 2026-02-05T22:25:35Z

> > To use less memory, we can run with --clients 1 (or another small number) instead of the default 10, to avoid parallelizing the tests.
>
> This can slow down execution significantly. It takes about an hour right now.

Yes, but if this new job (a separate job in Daily) only runs the large-memory tests and nothing else, then it shouldn't take too long, I guess.
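The trade-off discussed here can be illustrated with a rough model. This is only a sketch: the per-test memory footprints and durations are made-up numbers, not measurements from the Valkey test suite, and `worst_case_peak_gb` and `wall_time_min` are hypothetical helpers. It assumes that peak memory is bounded by the largest tests overlapping across clients, and that each test is handed to whichever client frees up first.

```python
import heapq

def worst_case_peak_gb(footprints_gb, clients):
    """Upper bound on simultaneous memory: the `clients` largest tests overlap."""
    return sum(sorted(footprints_gb, reverse=True)[:clients])

def wall_time_min(durations_min, clients):
    """Greedy estimate: each test goes to the client that becomes free first."""
    workers = [0.0] * clients
    heapq.heapify(workers)
    for duration in durations_min:
        # Pop the earliest-free client and push it back with the new finish time.
        heapq.heapreplace(workers, workers[0] + duration)
    return max(workers)

# Hypothetical large-memory tests (illustrative values only).
FOOTPRINTS_GB = [5, 5, 4, 1, 1, 1]
DURATIONS_MIN = [10, 10, 8, 2, 2, 2]

if __name__ == "__main__":
    for clients in (1, 2, 10):
        peak = worst_case_peak_gb(FOOTPRINTS_GB, clients)
        minutes = wall_time_min(DURATIONS_MIN, clients)
        print(f"clients={clients}: peak <= {peak} GB, about {minutes:.0f} min")
```

With these illustrative numbers, 10 clients give a worst-case bound of 17 GB in about 10 minutes, while 1 client stays at 5 GB but takes about 34 minutes. A job that runs only the large-memory tests keeps the serial time small because the list of tests is short.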

sarthakaggarwal97 · 2026-02-05T23:41:50Z

@zuiderkwast I can try doing this and maybe run the tests and observe.

sarthakaggarwal97 · 2026-02-06T22:34:21Z

@zuiderkwast I have separated them into different jobs. Let me run Daily a few times in my forked repo to see if it works.


sarthakaggarwal97 · 2026-02-09T19:16:18Z

@zuiderkwast The ASan tests passed in my forked repo after separating the jobs: https://github.com/sarthakaggarwal97/valkey/actions/runs/21768844363

dvkashapov

Awesome, thank you!

zuiderkwast

Looks good.

zuiderkwast · 2026-02-10T11:53:17Z

It looks like the unit tests ran, but the regular tests and module API tests were skipped in your repo:


sarthakaggarwal97 · 2026-02-10T17:34:12Z

@zuiderkwast Thanks for the callout. As I run these large-memory tests, we are seeing multiple errors (#3184, #3174). It would be good to get this green before the release.

sarthakaggarwal97 · 2026-02-10T17:38:35Z

Latest Run: https://github.com/sarthakaggarwal97/valkey/actions/runs/21871589440/job/63130770511

We have been seeing GitHub Actions runners run out of memory (OOM) when the large-memory tests are run with ASan. The run is eventually canceled partway through the tests. This change moves the large-memory tests with ASan and UBSan to separate jobs, so that each gets a dedicated runner with its own timeout. We can then tweak the number of simultaneous test clients for these tests without affecting the other test jobs.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

Carries on from where #3161 left off. The test-sanitizer-address-large-memory jobs were being OOM-killed on GitHub-hosted runners (15.6GB RAM) due to ASAN's 2-3x memory overhead.

Changes:
- Skip 4GB quicklist compression test under ASAN (requires ~16-24GB with dual buffers + ASAN overhead)
- Reduce integration test sizes from 5GB to 4.1GB (preserves >4GB 32-bit boundary coverage)
- Reduce XADD iterations from 10 to 3
- Add memory monitoring to track minimum free memory during CI runs

Signed-off-by: Rain Valentine <rsg000@gmail.com>
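The figures in this commit message can be checked with a little arithmetic. The sketch below assumes that "dual buffers" means two payload-sized buffers are live at the same time, and that the sizes are binary gigabytes (GiB) for the 32-bit boundary check; neither assumption is confirmed by the source, and the helper names are hypothetical.

```python
GIB = 1024 ** 3

# Figures quoted in the commit message above.
RUNNER_RAM_GB = 15.6
PAYLOAD_GB = 4
ASAN_OVERHEAD_RANGE = (2, 3)

# Assumption: "dual buffers" = two payload-sized buffers alive at once.
LIVE_BUFFERS = 2

def footprint_under_asan_gb(payload_gb, buffers, overhead):
    """Estimated memory needed when every buffer carries the ASan overhead."""
    return payload_gb * buffers * overhead

def crosses_32bit_boundary(size_gib):
    """True if the size, read as GiB, exceeds what a 32-bit length can hold."""
    # Note: read as decimal GB instead, 4.1e9 bytes would be below 2**32.
    return size_gib * GIB > 2 ** 32

low_gb, high_gb = (
    footprint_under_asan_gb(PAYLOAD_GB, LIVE_BUFFERS, factor)
    for factor in ASAN_OVERHEAD_RANGE
)

if __name__ == "__main__":
    print(f"quicklist test under ASan: {low_gb}-{high_gb} GB "
          f"vs {RUNNER_RAM_GB} GB on the runner")
    print("4.1 still crosses the 32-bit boundary:", crosses_32bit_boundary(4.1))
```

Under these assumptions the estimate is 16-24 GB, which matches the range in the change list and exceeds the 15.6GB runner even at the low end. That is consistent with skipping the test under ASAN rather than shrinking it.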

…o#3263) Same commit message as above. (cherry picked from commit c9ce3e0)
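The "memory monitoring to track minimum free memory" item could look roughly like the following. This is not the monitoring added in #3263, whose implementation is not shown here; it is a minimal sketch that assumes a Linux runner exposing `MemAvailable` in `/proc/meminfo`, and all function names are hypothetical.

```python
import time

def mem_available_kib(meminfo_text):
    """Extract the MemAvailable value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")

def read_proc_meminfo():
    """Read the live memory statistics (Linux only)."""
    with open("/proc/meminfo") as f:
        return f.read()

def track_min_available(read_meminfo, samples, interval_s=0.0):
    """Poll `read_meminfo` and return the lowest MemAvailable value seen."""
    lowest = None
    for _ in range(samples):
        current = mem_available_kib(read_meminfo())
        lowest = current if lowest is None else min(lowest, current)
        if interval_s:
            time.sleep(interval_s)
    return lowest

if __name__ == "__main__":
    # Sample once per second for ten seconds while tests run elsewhere.
    low = track_min_available(read_proc_meminfo, samples=10, interval_s=1.0)
    print(f"minimum MemAvailable: {low / 1024 / 1024:.2f} GiB")
```

Passing the reader as a parameter keeps the tracker testable without a real `/proc` filesystem. In CI, the minimum would typically be logged at the end of the job so that a near-OOM run is visible even when it passes.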

sarthakaggarwal97 requested a review from madolson February 4, 2026 00:09

github-actions Bot assigned sarthakaggarwal97 Feb 4, 2026

dvkashapov reviewed Feb 4, 2026


sarthakaggarwal97 force-pushed the skip-tests-asan branch from d0ff007 to e0c1b4d Compare February 6, 2026 22:09

sarthakaggarwal97 added 5 commits February 9, 2026 10:31

- df0fb42: skip large memory tests in asan
- 6406d5f: some fix
- 9e0ef1e: separate for ASan undefined
- af6804c: fix
- 7630f2e: run on schedule

Each commit carries Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

sarthakaggarwal97 force-pushed the skip-tests-asan branch from 641e70f to 7630f2e Compare February 9, 2026 18:32

sarthakaggarwal97 requested review from roshkhatri and zuiderkwast February 9, 2026 22:50

sarthakaggarwal97 changed the title ~~Skip large memory tests in ASan~~ Separate Jobs for large memory tests in ASan Feb 9, 2026

sarthakaggarwal97 requested a review from Nikhil-Manglore February 9, 2026 23:09

dvkashapov approved these changes Feb 10, 2026


zuiderkwast approved these changes Feb 10, 2026


reducing clients

ae55f09

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

zuiderkwast mentioned this pull request Feb 10, 2026

Fix PFADD corrupted sparse HLL handling by restoring hllSparseSet return type #3184

Merged

zuiderkwast approved these changes Feb 10, 2026


zuiderkwast changed the title ~~Separate Jobs for large memory tests in ASan~~ Separate jobs for large memory tests with sanitizers Feb 10, 2026

zuiderkwast merged commit 42e6a0b into valkey-io:unstable Feb 10, 2026
24 checks passed

rainsupreme mentioned this pull request Feb 27, 2026

Fix OOM aborts in large-memory ASAN tests on GitHub runners #3263

Merged
