Force listpack validation on load to prevent deferred assertions#3721
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRemove per-entry listpack assertions; make listpack integrity validation unconditional and deep at RDB/RESTORE/load time; remove sanitize-dump-payload config/ACL flags (deprecated no-ops); and update tests and fuzzers to expect RESTORE rejection for corrupt payloads and verify server liveness. ChangesValidation Pipeline Refactor
Sequence Diagram (RDB load validation): sequenceDiagram
participant rdbLoad as rdbLoadObject()
participant LPdup as lpValidateIntegrityAndDups()
participant Loader as Quicklist/Set/Zset/Hash/Stream loaders
rdbLoad->>LPdup: always invoke deep validation (pairs)
Loader->>LPdup: call with updated signature
rdbLoad->>Loader: proceed with load after validation
Estimated code review effort: Suggested reviewers:
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
zuiderkwast
left a comment
There was a problem hiding this comment.
Nice!
The main point for this change IMO is security.
Scenario:
- A user has access to RESTORE for legitimate reasons, to migrate data from some other node using DUMP/RESTORE.
- The client is compromized and the attacker sends a RESTORE with a corrupted listpack. It passes.
- Later, on access, the server runs lpAssertValidEntry and hits the assert.
- Result: segfault. Server is down. Denial of service.
Performance is a secondary thing. Thanks for the benchmarks!
We should also note that full sync and RESTORE get slower for users who had validation turned off before.
Please also:
- Update config.c to move the config to the list of deprecated configs (so that we still accept it but it's a no-op).
- Update valkey.conf the section about this config, possibly remove the section entirely or perhaps just keep one line saying that it's deprecated in Valkey 10 and has no effect.
- Remove that ACL flag related to this, mentioned in the chat.
There was a problem hiding this comment.
As @zuiderkwast mentioned: its also important to deprecate config, mention that in valkey.conf and delete ACL user flag. Other than that looks good! Also we would need to update documentation to mention those changes
We should also note that full sync and RESTORE get slower for users who had validation turned off before.
I think we can skip this because of benchmark results: RDB load (1M keys): +1% overhead
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/integration/corrupt-dump-fuzzer.tcl`:
- Around line 86-90: The FreeBSD branch still allows the fuzzer to run once by
setting min_cycles and min_duration to 1; change the FreeBSD handling (the
conditional that checks [exec uname] == "FreeBSD") so it skips execution
entirely instead of reducing to one run—e.g., exit or set a skip flag before the
fuzzer loop—or add an early return/abort when running on FreeBSD so the fuzzer
loop that uses min_cycles/min_duration is never invoked.
- Around line 183-191: When valgrind or ASan issues are detected by
find_valgrind_errors or sanitizer_errors_from_file the test only logs and
restarts; change the handling so the fuzzer run fails immediately for those
payloads by invoking the test failure path (e.g., call fail with a descriptive
message) in addition to setting report_and_restart and print_commands. Update
the block that checks valgrind_errors and asan_errors so that when either is
non-empty it calls fail (including printable_dump and the error details) to
ensure the test run is marked as failed rather than silently continuing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: c84d6888-0ca4-47b0-960d-4f1024481c7f
📒 Files selected for processing (11)
src/acl.csrc/config.csrc/fuzzer_command_generator.csrc/server.hsrc/valkey-check-rdb.ctests/integration/corrupt-dump-fuzzer.tcltests/integration/corrupt-dump.tcltests/integration/rdb.tcltests/unit/acl.tcltests/unit/moduleapi/usercall.tclvalkey.conf
💤 Files with no reviewable changes (4)
- src/valkey-check-rdb.c
- src/fuzzer_command_generator.c
- tests/integration/rdb.tcl
- src/acl.c
✅ Files skipped from review due to trivial changes (1)
- tests/unit/moduleapi/usercall.tcl
|
@dvkashapov @zuiderkwast once this is approved, should i raise the corresponding docs PR? |
|
The PR title is not very understandable for users, e.g. in release notes. Can you change it to describe behaviour instead of internal code? I mean that we sanitize on load, not on access and traversal. I don't like to use the "semanic commit" title style. We don't use it normally so the few PRs that do this stick out. Also can you mention anything about the security aspect in the title or description? That forcing sanitization at load time prevents invalid data from being loaded into memory and later hitting an assert. |
ACK, will do. Yeah, the semantic commit style is my bad, i had an LLM do it for me and they are really prone to it. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/rdb.c (1)
1900-1905:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate the helper comment to match the new API.
Lines 1900-1904 still describe a removed
deepparameter, so this now documents behavior the function no longer supports.As per coding guidelines, "Document why code exists, not just what it does; document all functions in C code".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/rdb.c` around lines 1900 - 1905, Update the function comment for lpValidateIntegrityAndDups to reflect the current API (remove any mention of the removed `deep` parameter) and explain why the validator exists; describe what the function validates now (header integrity and full listpack scanning behavior), the meaning of the `pairs` parameter (0 => unique elements as set, 1 => odd elements unique as key/value map), the expected inputs (`unsigned char *lp`, `size_t size`) and what the return value indicates, and add a short note about when callers should use this check (purpose/context of the validation).
🧹 Nitpick comments (1)
src/rdb.c (1)
1934-1938: Request@core-teamreview for this validation-policy change.This moves corruption handling behavior in
rdbLoadObject(), which is exactly the kind ofrdb.cchange that should get explicit architectural sign-off before merge.As per coding guidelines, "Request
@core-teamarchitectural review for changes to cluster*.c, replication.c, rdb.c, or aof.c".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/rdb.c` around lines 1934 - 1938, You changed corruption-handling behavior by forcing deep_integrity_validation=1 in rdb.c (affects rdbLoadObject()); this requires an explicit architectural review—update the PR and code with an explicit request for `@core-team` review: add a short comment next to the deep_integrity_validation assignment noting the behavioral change and the need for core-team sign-off, and update the PR description to request `@core-team` review and summarize the rationale and potential impact on replication/load semantics so reviewers can evaluate the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/rdb.c`:
- Around line 1900-1905: Update the function comment for
lpValidateIntegrityAndDups to reflect the current API (remove any mention of the
removed `deep` parameter) and explain why the validator exists; describe what
the function validates now (header integrity and full listpack scanning
behavior), the meaning of the `pairs` parameter (0 => unique elements as set, 1
=> odd elements unique as key/value map), the expected inputs (`unsigned char
*lp`, `size_t size`) and what the return value indicates, and add a short note
about when callers should use this check (purpose/context of the validation).
---
Nitpick comments:
In `@src/rdb.c`:
- Around line 1934-1938: You changed corruption-handling behavior by forcing
deep_integrity_validation=1 in rdb.c (affects rdbLoadObject()); this requires an
explicit architectural review—update the PR and code with an explicit request
for `@core-team` review: add a short comment next to the deep_integrity_validation
assignment noting the behavioral change and the need for core-team sign-off, and
update the PR description to request `@core-team` review and summarize the
rationale and potential impact on replication/load semantics so reviewers can
evaluate the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 269010ad-3664-4aaa-a7b3-396868bacf25
📒 Files selected for processing (9)
src/acl.csrc/listpack.csrc/listpack.hsrc/rdb.csrc/stream.hsrc/t_stream.csrc/unit/test_listpack.cpptests/integration/corrupt-dump-fuzzer.tcltests/unit/acl.tcl
🚧 Files skipped from review as they are similar to previous changes (2)
- src/acl.c
- tests/integration/corrupt-dump-fuzzer.tcl
Since validation is now always unconditional, the deep_integrity_validation variable (hardcoded to 1) and all its conditionals are dead code. Remove them and make the guarded operations unconditional. Addresses review feedback from zuiderkwast on PR valkey-io#3721. Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
7345a31 to
ce2f9ff
Compare
Since validation is now always unconditional, the deep_integrity_validation variable (hardcoded to 1) and all its conditionals are dead code. Remove them and make the guarded operations unconditional. Addresses review feedback from zuiderkwast on PR valkey-io#3721. Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
ce2f9ff to
08a953f
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #3721 +/- ##
============================================
+ Coverage 76.68% 76.71% +0.02%
============================================
Files 162 162
Lines 80706 80679 -27
============================================
- Hits 61892 61889 -3
+ Misses 18814 18790 -24
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/unit/test_listpack.cpp (1)
943-950: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdd a corrupt-payload regression for the new always-deep validator.
This still exercises only the happy path, so it won't catch a regression where
lpValidateIntegrity()accepts a listpack with a valid header but a broken entry/backlen. A small negative test here would lock in the new contract.As per coding guidelines
src/unit/**/*.{c,cpp}:Data-structure and low-level logic changes should be covered by unit tests.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/unit/test_listpack.cpp` around lines 943 - 950, Add a negative regression case to TEST_F(ListpackTest, listpackLpValidateIntegrity) that mutates the created listpack (created via createList()) to simulate a broken entry or corrupt backlen (for example flip a byte in the payload or make the backlen inconsistent) and then call lpValidateIntegrity(lp, lpBytes(lp), lpValidation, &count) expecting a failure (return 0) before freeing with lpFree(lp); reference the existing test and functions lpValidateIntegrity, lpBytes, lpValidation, createList, and lpFree so the test asserts the validator rejects a listpack with a valid header but corrupted entry/backlen.src/config.c (1)
454-462:⚠️ Potential issue | 🟠 Major | ⚡ Quick winKeep
CONFIG SET sanitize-dump-payloadas a deprecated no-op.This table is only used while parsing startup config. With
sanitize-dump-payloadno longer registered instatic_configs, runtimeCONFIG SET sanitize-dump-payload ...now falls throughlookupConfig()and returns "Unknown option", which is a compatibility break for existing automation. Please preserve a hidden/runtime no-op path as well.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/config.c` around lines 454 - 462, The startup parsing table deprecated_configs no longer preserves a runtime no-op for "sanitize-dump-payload", causing lookupConfig()/CONFIG SET to return "Unknown option"; restore the entry so CONFIG SET sanitize-dump-payload remains accepted as a deprecated no-op at runtime. Edit the deprecated_configs array to include the "sanitize-dump-payload" entry (matching the format used for other deprecated options), ensure static_configs can omit it but lookupConfig()/configSet path still recognizes it as deprecated and performs no-op behavior, and keep any related parsing/validation logic in config.c (referencing deprecated_configs, lookupConfig, static_configs, and the CONFIG SET handling) consistent with other deprecated entries.
🧹 Nitpick comments (2)
src/rdb.c (2)
1900-1913: ⚡ Quick winUpdate the
lpValidateIntegrityAndDups()contract comment.The doc block still describes the removed
deepargument, so it now documents behavior this function no longer supports.As per coding guidelines
src/**/*.{c,h}:Use comments for non-obvious behavior and rationale, not for restating codeandDocument why code exists, not just what it does; document all functions in C code.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/rdb.c` around lines 1900 - 1913, Update the function docblock for lpValidateIntegrityAndDups to remove the stale mention of the removed `deep` parameter and instead accurately document the current contract: that it validates the listpack header and entries via lpValidateIntegrity with _lpEntryValidation, that `pairs` controls duplicate-checking semantics (0 = all elements must be unique, 1 = odd elements must be unique as map keys), and that it uses an internal hashtable to track seen fields and returns validation status; include brief rationale for why duplicate checking is separated from basic integrity validation. Reference lpValidateIntegrityAndDups, the `pairs` parameter, lpValidateIntegrity, and _lpEntryValidation when editing the comment.
1905-1913: Please request@core-teamreview for thesesrc/rdb.cload-path changes.This PR changes RDB/RESTORE validation behavior in a file the repository explicitly flags for architectural review.
As per coding guidelines
src/{cluster*.c,replication.c,rdb.c,aof.c}:Request@core-teamarchitectural review for changes to cluster*.c, replication.c, rdb.c, or aof.c.Also applies to: 2321-2323, 2453-2558, 2616-2617, 2858-2870
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/rdb.c` around lines 1905 - 1913, This change touches core RDB load-path validation in src/rdb.c (notably the lpValidateIntegrityAndDups function and surrounding lpValidateIntegrity/_lpEntryValidation usage) and per project policy requires an explicit `@core-team` architectural review; update the PR by adding a clear request for `@core-team` review in the PR description and/or add a code comment at the top of the modified region referencing the policy and tagging `@core-team` so reviewers see this change to rdb.c, and ensure the other mentioned ranges (around lines for lpValidateIntegrityAndDups and the 2321-2323, 2453-2558, 2616-2617, 2858-2870 edits) are similarly annotated/requested for core-team review.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/config.c`:
- Around line 454-462: The startup parsing table deprecated_configs no longer
preserves a runtime no-op for "sanitize-dump-payload", causing
lookupConfig()/CONFIG SET to return "Unknown option"; restore the entry so
CONFIG SET sanitize-dump-payload remains accepted as a deprecated no-op at
runtime. Edit the deprecated_configs array to include the
"sanitize-dump-payload" entry (matching the format used for other deprecated
options), ensure static_configs can omit it but lookupConfig()/configSet path
still recognizes it as deprecated and performs no-op behavior, and keep any
related parsing/validation logic in config.c (referencing deprecated_configs,
lookupConfig, static_configs, and the CONFIG SET handling) consistent with other
deprecated entries.
In `@src/unit/test_listpack.cpp`:
- Around line 943-950: Add a negative regression case to TEST_F(ListpackTest,
listpackLpValidateIntegrity) that mutates the created listpack (created via
createList()) to simulate a broken entry or corrupt backlen (for example flip a
byte in the payload or make the backlen inconsistent) and then call
lpValidateIntegrity(lp, lpBytes(lp), lpValidation, &count) expecting a failure
(return 0) before freeing with lpFree(lp); reference the existing test and
functions lpValidateIntegrity, lpBytes, lpValidation, createList, and lpFree so
the test asserts the validator rejects a listpack with a valid header but
corrupted entry/backlen.
---
Nitpick comments:
In `@src/rdb.c`:
- Around line 1900-1913: Update the function docblock for
lpValidateIntegrityAndDups to remove the stale mention of the removed `deep`
parameter and instead accurately document the current contract: that it
validates the listpack header and entries via lpValidateIntegrity with
_lpEntryValidation, that `pairs` controls duplicate-checking semantics (0 = all
elements must be unique, 1 = odd elements must be unique as map keys), and that
it uses an internal hashtable to track seen fields and returns validation
status; include brief rationale for why duplicate checking is separated from
basic integrity validation. Reference lpValidateIntegrityAndDups, the `pairs`
parameter, lpValidateIntegrity, and _lpEntryValidation when editing the comment.
- Around line 1905-1913: This change touches core RDB load-path validation in
src/rdb.c (notably the lpValidateIntegrityAndDups function and surrounding
lpValidateIntegrity/_lpEntryValidation usage) and per project policy requires an
explicit `@core-team` architectural review; update the PR by adding a clear
request for `@core-team` review in the PR description and/or add a code comment at
the top of the modified region referencing the policy and tagging `@core-team` so
reviewers see this change to rdb.c, and ensure the other mentioned ranges
(around lines for lpValidateIntegrityAndDups and the 2321-2323, 2453-2558,
2616-2617, 2858-2870 edits) are similarly annotated/requested for core-team
review.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 712e1328-c295-49ba-bacc-d1c6cc48ab05
📒 Files selected for processing (18)
src/acl.csrc/config.csrc/fuzzer_command_generator.csrc/listpack.csrc/listpack.hsrc/rdb.csrc/server.hsrc/stream.hsrc/t_stream.csrc/unit/test_listpack.cppsrc/valkey-check-rdb.ctests/integration/corrupt-dump-fuzzer.tcltests/integration/corrupt-dump.tcltests/integration/rdb.tcltests/unit/acl.tcltests/unit/cluster/resharding.tcltests/unit/moduleapi/usercall.tclvalkey.conf
💤 Files with no reviewable changes (3)
- tests/unit/cluster/resharding.tcl
- tests/integration/rdb.tcl
- src/valkey-check-rdb.c
✅ Files skipped from review due to trivial changes (2)
- tests/unit/moduleapi/usercall.tcl
- valkey.conf
zuiderkwast
left a comment
There was a problem hiding this comment.
Looks good, thank you! Only a few non-blocking nits.
81ce181 to
d7879ae
Compare
d7879ae to
9fdb4e5
Compare
Remove lpAssertValidEntry calls from lpNext, lpPrev, lpFirst, lpFind, and lpDeleteRangeWithEntry to eliminate redundant per-operation listpack validation. Safety is maintained by making deep_integrity_validation unconditional in rdbLoadObject, ensuring all data structures (listpack, intset, stream) are fully validated on every load path (RDB and RESTORE). This effectively makes sanitize-dump-payload a no-op. Benchmark results (Graviton3, 200 clients, P=32, 8 threads): 256-field hash HGET: +4.7% throughput (907K -> 950K rps) 256-field hash HSET: +1.2% throughput (688K -> 696K rps) 100-field hash HSET: +2.4% throughput (688K -> 704K rps) p99 latency (256-field HSET): -16% (11.15ms -> 9.38ms) RDB load (1M keys): +1% overhead (acceptable) Lightweight EOF position assertions from PR valkey-io#2027 remain intact. Takes over: valkey-io#399 Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
Since deep_integrity_validation is now unconditional, the sanitize-dump-payload config and skip-sanitize-payload ACL flag have no effect. This commit: - Moves sanitize-dump-payload to deprecated_configs[] in config.c - Removes SANITIZE_DUMP_* macros and server.sanitize_dump_payload field - Removes skip-sanitize-payload/sanitize-payload ACL flags entirely - Updates valkey.conf with deprecation notice - Deletes duplicate unsanitized test variants that are now redundant - Cleans up corrupt-dump-fuzzer.tcl and rdb.tcl test references Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
…cated ACL flags - Remove the 'deep' parameter from lpValidateIntegrity() since validation is now always full (deep). Update all callers in rdb.c, t_stream.c, and unit tests. - Remove 'deep' parameter from lpValidateIntegrityAndDups() and streamValidateListpackIntegrity() for the same reason. - Accept 'skip-sanitize-payload' and 'sanitize-payload' as no-op ACL flags for backwards compatibility with Valkey 9 ACL files. These flags are parsed but not emitted, so they get silently dropped on next ACL save. Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
Address PR review feedback on the corrupt-dump-fuzzer integration test: - Skip the test entirely on FreeBSD instead of running a single short cycle. The previous workaround masked any signal the fuzzer might produce on that platform; honestly skipping it is a clearer signal than 'ran but found nothing'. - Track sanitizer (valgrind/asan) findings in a new stat_sanitizer_findings counter and assert it is zero at the end of the run. Previously, sanitizer hits were only logged and the server was restarted, so a real bug surfaced via ASan or Valgrind would not fail the test. Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
Since validation is now always unconditional, the deep_integrity_validation variable (hardcoded to 1) and all its conditionals are dead code. Remove them and make the guarded operations unconditional. Addresses review feedback from zuiderkwast on PR valkey-io#3721. Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
Remove the unused outer 'size_t bytes' declarations in lpNext() and lpFirst() that became dead code after lpAssertValidEntry was removed. Fixes build failure with -Werror (unused-variable warning) in CI. Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
9fdb4e5 to
f42aa10
Compare
Yes, but I don't know which docs are affected. ACL docs? |
|
Yes, we need to update ACL docs to mention that from next release those user flags will be no-op and same in config file |
…key-io#3721) Until now, deep santitization of listpack on load (RDB load, RESTORE) was configurable. That meant that we had to do validation on access and traversal of the listpack. This change: - Force sanitization of listpacks on load and import. - Deprecates the config `sanitize-dump-payload` and makes it a no-op. - Deprecates the ACL flags `skip-sanitize-payload` and `sanitize-payload`, ignoring them when loading old ACL files. - Removes validation asserts on access. Benefits: - Prevents deferred assertions, which would happen if unsanitized listpacks were loaded and later accessed, leading to server shutdown. - Improves read performance. Replaces: valkey-io#399 Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
As per valkey-io/valkey#3721 and valkey-io/valkey#3964, omit `alldbs` rules and drop mentions of `sanitize-payload` and `skip-sanitize-payload` user flags. Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Until now, deep santitization of listpack on load (RDB load, RESTORE) was configurable. That meant that we had to do validation on access and traversal of the listpack. This change:
sanitize-dump-payloadand makes it a no-op.skip-sanitize-payloadandsanitize-payload, ignoring them when loading old ACL files.Benefits:
Replaces: #399
Internal changes
Remove lpAssertValidEntry calls from lpNext, lpPrev, lpFirst, lpFind, and lpDeleteRangeWithEntry to eliminate redundant per-operation listpack validation. Safety is maintained by making deep_integrity_validation unconditional in rdbLoadObject, ensuring all data structures (listpack, intset, stream) are fully validated on every load path (RDB and RESTORE). This effectively makes sanitize-dump-payload a no-op.
Lightweight EOF position assertions from PR #2027 remain intact.
Benchmarks
Benchmark results (Graviton3, 200 clients, P=32, 8 threads):
Benchmark details
hash-max-listpack-entriesset to 512 to force listpack encoding for large hashesResults
256-Field Hash — HSET (updating existing field in 10K pre-populated hashes)
Delta: +1.2% (AFTER faster)
256-Field Hash — HGET (reading field from 10K pre-populated hashes)
Delta: +4.7% (AFTER faster)
100-Field Hash — HSET
Delta: +2.4% (AFTER faster)
100-Field Hash — HGET
Delta: ~0% (no significant difference)
Small Hash — HSET (default 3-field hashes, baseline)
Delta: +1.1% (AFTER faster)
RDB Load (1M keys: 500K hashes, 300K sets, 200K strings)
Delta: +1.0% slower (AFTER slightly slower due to integrity validation during load)
Note: The AFTER binary enables
deep_integrity_validationduring RDB load, which performs the equivalent oflpAssertValidEntrychecks only at load time.flamegraphs
Perf Recording (10M HSET operations on 256-field hashes)
The AFTER flamegraph shows the elimination of
lpAssertValidEntryfrom the hot path, which was previously called on everylpNext/lpPrev/lpSeekoperation during hash field lookups and updates.Notes
lpAssertValidEntrywas called at each step.deep_integrity_validationflag adds ~1% overhead during load, which should be acceptable since it only runs once at startup and provides the same safety guarantees thatlpAssertValidEntrypreviously provided on every operation.