Additional assertions for listpack by muelstefamzn · Pull Request #2027 · valkey-io/valkey

muelstefamzn · 2025-04-29T20:46:01Z

Adds a few asserts to increase the chance of detecting corrupted listpack data.

It replaces a questionable code path that purposefully returns invalid data with a panic() and adds assert() statements to verify that LP_EOF is only detected at the end of the listpack.

Listpack data is expected to be correctly encoded. These asserts should only trigger if the listpack data was corrupted in some way, in which case it is better to assert than to return invalid data to the client.
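To illustrate the kind of check described above, here is a minimal sketch, not the code from this PR. It assumes the usual listpack layout: a 6-byte header whose first 4 bytes hold the total size in little-endian order, followed by the entries and a single terminating `0xFF` byte. The helper names `lp_total_bytes`, `lp_eof_position_ok`, and `lp_first_checked` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LP_HDR_SIZE 6 /* 4-byte total length + 2-byte element count */
#define LP_EOF 0xFF   /* terminator byte */

/* Read the little-endian total byte count stored in the first 4 bytes. */
static uint32_t lp_total_bytes(const unsigned char *lp) {
    return (uint32_t)lp[0] | ((uint32_t)lp[1] << 8) |
           ((uint32_t)lp[2] << 16) | ((uint32_t)lp[3] << 24);
}

/* Hypothetical helper: returns 1 if p is the only legitimate place to
 * see LP_EOF (the very last byte of the listpack), 0 otherwise. */
static int lp_eof_position_ok(const unsigned char *lp, const unsigned char *p) {
    return p == lp + lp_total_bytes(lp) - 1;
}

/* Hypothetical helper: return the first entry, or NULL if the listpack
 * is empty. An LP_EOF byte anywhere but the end trips the assert. */
static const unsigned char *lp_first_checked(const unsigned char *lp) {
    const unsigned char *p = lp + LP_HDR_SIZE;
    if (p[0] == LP_EOF) {
        assert(lp_eof_position_ok(lp, p)); /* EOF elsewhere => corrupted data */
        return NULL;
    }
    return p;
}
```

With an empty listpack such as `{7, 0, 0, 0, 0, 0, 0xFF}`, the EOF byte sits at offset 6, which is also the last byte, so the assert passes and the function returns NULL. If the header claimed a larger size, the same byte would be treated as corruption.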

codecov · 2025-04-29T21:01:52Z

Codecov Report

Attention: Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.

Project coverage is 71.08%. Comparing base (70f2057) to head (eca224f).
Report is 27 commits behind head on unstable.

Files with missing lines	Patch %	Lines
src/listpack.c	93.33%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #2027      +/-   ##
============================================
+ Coverage     70.97%   71.08%   +0.10%     
============================================
  Files           123      122       -1     
  Lines         66135    66175      +40     
============================================
+ Hits          46941    47039      +98     
+ Misses        19194    19136      -58

Files with missing lines	Coverage Δ
src/listpack.c	`91.60% <93.33%> (+1.81%)`	⬆️

... and 29 files with indirect coverage changes


xbasel · 2025-04-30T08:30:36Z

Few questions:

Is this addressing an issue that has occurred in practice (e.g. corruption caught in real-world usage)?
The code adds assertions in a hot path. Do we have any benchmarks or profiling to understand the performance penalty of adding these assertions?
How has this change been tested?

muelstefamzn · 2025-05-01T17:53:16Z

Thank you for reviewing it!

I see a few reasons why listpack data could be invalid, all have low probability:

Memory corruption due to a hardware defect.
Snapshot corruption during storage/transfer that is not caught by the checksum.
A bug that creates invalid data. I'm not aware of such a bug in Valkey.

If such a situation occurs, I prefer to panic/assert over returning invalid data or undefined behaviour. Of course, this change does not guarantee that it will be caught, just makes it more likely.

I've done a simple valkey-benchmark test:

Pre-filled the node with 100k hash keys with four fields each, containing integers.
Ran valkey-benchmark -r 100000 -n 1000000 HINCRBY __rand_int__ field 1.
This exercises lpFirst(), lpFind(), lpNext(), lpGetWithSize(), and others.
Repeated the test 5x with and without the change.

I did not observe a significant change in performance.

Without the change: 83157.334 RPS (standard deviation +-381.4)
With the change: 83417.814 RPS (standard deviation +-332.5)

This is hardly a comprehensive performance test, of course. At least for this setup, the difference
is less than the standard deviation (+-0.4%). By chance, the runs with the change averaged a little higher.
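As a quick check of that arithmetic, the sketch below computes the relative change between the two reported averages and compares it with the relative standard deviation of the baseline. The function name `relative_change` is made up for this example; the numbers are the ones quoted above.

```c
/* Relative change of candidate against baseline, as a fraction. */
static double relative_change(double baseline, double candidate) {
    return (candidate - baseline) / baseline;
}

/* Relative standard deviation (stddev divided by mean), as a fraction. */
static double relative_stddev(double mean, double stddev) {
    return stddev / mean;
}
```

Calling `relative_change(83157.334, 83417.814)` gives roughly 0.3%, while `relative_stddev(83157.334, 381.4)` is roughly 0.46%, so the observed difference is within the run-to-run noise.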

Other than running the existing tests, I've manually spot checked some of the asserts by attaching
a debugger and modifying the listpack data in-situ to contain the LP_EOF byte at invalid locations.
The asserts trigger as expected, creating a crash report.

Adds a few asserts to increase the chance of detecting corrupted listpack data. Signed-off-by: Stefan Mueller <muelstef@amazon.com>

madolson

LGTM

madolson · 2025-05-13T20:25:01Z

-        uval = 12345678900000000ULL + p[0];
-        negstart = UINT64_MAX;
-        negmax = 0;
+        panic("Invalid listpack encoding. Byte %02hhx is not a valid encoding.", p[0]);


So I suppose we can't do this. While validating listpack entries, we need to fetch some of the data out to validate the integrity. However, if the listpack is itself malformed, we might try to extract the data out. The alternative would be to check every single point in streamValidateListpackIntegrity to verify we still have more data to read.
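The alternative mentioned here, checking at every read that more data is available, could look roughly like the following sketch. It is not taken from streamValidateListpackIntegrity; the `bounded_reader` type and `reader_take` function are hypothetical names used only to show the pattern of refusing a read instead of decoding past the end of untrusted data.

```c
#include <stddef.h>

/* Hypothetical cursor over an untrusted buffer. */
typedef struct {
    const unsigned char *pos; /* next byte to read */
    const unsigned char *end; /* one past the last readable byte */
} bounded_reader;

/* Returns 1, stores the start of the chunk in *out and advances the
 * cursor if n bytes are available. Otherwise returns 0 and leaves the
 * cursor untouched, so the caller can report the payload as invalid. */
static int reader_take(bounded_reader *r, size_t n, const unsigned char **out) {
    if ((size_t)(r->end - r->pos) < n) return 0;
    *out = r->pos;
    r->pos += n;
    return 1;
}
```

A validator built this way would return an error to the caller when `reader_take` fails, rather than panicking or reading whatever bytes happen to follow.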

There is existing code that depends on the previous behaviour. See valkey-io#2027. Signed-off-by: Stefan Mueller <muelstef@amazon.com>

Removing the `panic()` statement introduced previously. There is existing code that depends on the previous behaviour. See #2027. Signed-off-by: Stefan Mueller <muelstef@amazon.com>

zuiderkwast · 2025-05-14T07:19:55Z

The fact that we allow loading corrupted listpacks from RDB is the source of this problem and complexity. We have already decided that we want to validate the dump before we load it, so we will not need to allow corrupted listpacks to exist in the database anymore. The decision is here: #399 (comment) but the PR was never finished.

Removing the `panic()` statement introduced previously. There is existing code that depends on the previous behaviour. See valkey-io#2027. Signed-off-by: Stefan Mueller <muelstef@amazon.com> Signed-off-by: shanwan1 <shanwan1@intel.com>

Remove lpAssertValidEntry calls from lpNext, lpPrev, lpFirst, lpFind, and lpDeleteRangeWithEntry to eliminate redundant per-operation listpack validation.

Safety is maintained by making deep_integrity_validation unconditional in rdbLoadObject, ensuring all data structures (listpack, intset, stream) are fully validated on every load path (RDB and RESTORE). This effectively makes sanitize-dump-payload a no-op.

Benchmark results (Graviton3, 200 clients, P=32, 8 threads):
256-field hash HGET: +4.7% throughput (907K -> 950K rps)
256-field hash HSET: +1.2% throughput (688K -> 696K rps)
100-field hash HSET: +2.4% throughput (688K -> 704K rps)
p99 latency (256-field HSET): -16% (11.15ms -> 9.38ms)
RDB load (1M keys): +1% overhead (acceptable)

Lightweight EOF position assertions from PR valkey-io#2027 remain intact.

Takes over: valkey-io#399

Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>

muelstefamzn force-pushed the validation-1 branch from 835ef54 to 3c97025 Compare April 29, 2025 21:35

xbasel reviewed Apr 30, 2025


Comment thread src/listpack.c Outdated

Comment thread src/listpack.c Outdated

madolson reviewed May 8, 2025


Comment thread src/listpack.c Outdated

Comment thread src/listpack.c Outdated

muelstefamzn force-pushed the validation-1 branch from 3c97025 to df35628 Compare May 8, 2025 23:04

ranshid reviewed May 9, 2025


Comment thread src/listpack.c Outdated

Comment thread src/listpack.c

Additional assertions for listpack

eca224f

Adds a few asserts to increase the chance of detecting corrupted listpack data. Signed-off-by: Stefan Mueller <muelstef@amazon.com>

muelstefamzn force-pushed the validation-1 branch from df35628 to eca224f Compare May 12, 2025 16:38

madolson approved these changes May 12, 2025


madolson merged commit 9f7aa04 into valkey-io:unstable May 12, 2025

madolson reviewed May 13, 2025


muelstefamzn added a commit to muelstefamzn/valkey that referenced this pull request May 13, 2025

Reverting the panic() introduced in valkey-io#2027

b8ed9ca

There is existing code that depends on the previous behaviour. See valkey-io#2027. Signed-off-by: Stefan Mueller <muelstef@amazon.com>

muelstefamzn mentioned this pull request May 13, 2025

Reverting the panic() introduced in #2027 #2081

Merged

jjuleslasarte mentioned this pull request May 14, 2026

Force listpack validation on load to prevent deferred assertions #3721

Merged
