Additional assertions for listpack#2027
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #2027 +/- ##
============================================
+ Coverage 70.97% 71.08% +0.10%
============================================
Files 123 122 -1
Lines 66135 66175 +40
============================================
+ Hits 46941 47039 +98
+ Misses 19194 19136 -58
🚀 New features to boost your workflow:
|
835ef54 to
3c97025
Compare
|
Few questions:
|
|
Thank you for reviewing it! I see a few reasons why listpack data could be invalid, all have low probability:
If such a situation occurs, I prefer to panic/assert over returning invalid data or undefined behaviour. Of course, this change does not guarantee that it will be caught, just makes it more likely. I've done a simple valkey-benchmark test:
I did not observe a significant change in performance.
This is hardly a comprehensive performance test, of course. At least for this setup, the difference Other than running the existing tests, I've manually spot checked some of the asserts by attaching |
Adds a few asserts to increase the chance of detecting corrupted listpack data. Signed-off-by: Stefan Mueller <muelstef@amazon.com>
df35628 to
eca224f
Compare
| uval = 12345678900000000ULL + p[0]; | ||
| negstart = UINT64_MAX; | ||
| negmax = 0; | ||
| panic("Invalid listpack encoding. Byte %02hhx is not a valid encoding.", p[0]); |
There was a problem hiding this comment.
So I suppose we can't do this. While validating listpack entries, we need to fetch some of the data out to validate the integrity. However, if the listpack itself is itself malformed, we might try to extract the data out. The alternative would be to check every single point in streamValidateListpackIntegrity to verify we still have more data to read.
There is existing code that depends on the previous behaviour. See valkey-io#2027. Signed-off-by: Stefan Mueller <muelstef@amazon.com>
|
The fact that we allow loading corrupted listpacks from RDB is the source of this problem and complexity. We have already decided that we want to validate the dump before we load it, so we will not need to allow correpted listpacks to exist in the database anymore. The decision is here: #399 (comment) but the PR was never finished. |
Removing the `panic()` statement introduced previously. There is existing code that depends on the previous behaviour. See valkey-io#2027. Signed-off-by: Stefan Mueller <muelstef@amazon.com> Signed-off-by: shanwan1 <shanwan1@intel.com>
Remove lpAssertValidEntry calls from lpNext, lpPrev, lpFirst, lpFind, and lpDeleteRangeWithEntry to eliminate redundant per-operation listpack validation. Safety is maintained by making deep_integrity_validation unconditional in rdbLoadObject, ensuring all data structures (listpack, intset, stream) are fully validated on every load path (RDB and RESTORE). This effectively makes sanitize-dump-payload a no-op. Benchmark results (Graviton3, 200 clients, P=32, 8 threads): 256-field hash HGET: +4.7% throughput (907K -> 950K rps) 256-field hash HSET: +1.2% throughput (688K -> 696K rps) 100-field hash HSET: +2.4% throughput (688K -> 704K rps) p99 latency (256-field HSET): -16% (11.15ms -> 9.38ms) RDB load (1M keys): +1% overhead (acceptable) Lightweight EOF position assertions from PR valkey-io#2027 remain intact. Takes over: valkey-io#399 Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
Remove lpAssertValidEntry calls from lpNext, lpPrev, lpFirst, lpFind, and lpDeleteRangeWithEntry to eliminate redundant per-operation listpack validation. Safety is maintained by making deep_integrity_validation unconditional in rdbLoadObject, ensuring all data structures (listpack, intset, stream) are fully validated on every load path (RDB and RESTORE). This effectively makes sanitize-dump-payload a no-op. Benchmark results (Graviton3, 200 clients, P=32, 8 threads): 256-field hash HGET: +4.7% throughput (907K -> 950K rps) 256-field hash HSET: +1.2% throughput (688K -> 696K rps) 100-field hash HSET: +2.4% throughput (688K -> 704K rps) p99 latency (256-field HSET): -16% (11.15ms -> 9.38ms) RDB load (1M keys): +1% overhead (acceptable) Lightweight EOF position assertions from PR valkey-io#2027 remain intact. Takes over: valkey-io#399 Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
Remove lpAssertValidEntry calls from lpNext, lpPrev, lpFirst, lpFind, and lpDeleteRangeWithEntry to eliminate redundant per-operation listpack validation. Safety is maintained by making deep_integrity_validation unconditional in rdbLoadObject, ensuring all data structures (listpack, intset, stream) are fully validated on every load path (RDB and RESTORE). This effectively makes sanitize-dump-payload a no-op. Benchmark results (Graviton3, 200 clients, P=32, 8 threads): 256-field hash HGET: +4.7% throughput (907K -> 950K rps) 256-field hash HSET: +1.2% throughput (688K -> 696K rps) 100-field hash HSET: +2.4% throughput (688K -> 704K rps) p99 latency (256-field HSET): -16% (11.15ms -> 9.38ms) RDB load (1M keys): +1% overhead (acceptable) Lightweight EOF position assertions from PR valkey-io#2027 remain intact. Takes over: valkey-io#399 Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
Adds a few asserts to increase the chance of detecting corrupted listpack data.
It replaces a questionable code path that purposefully returns invalid data with a
panic()and addsassert()statements to verify thatLP_EOFis only detected at the end of the listpack.Listpack data is expected to be correctly encoded. These asserts should only trigger if the listpack data was corrupted in some way, in which case it is better to assert than to return invalid data to the client.