Skip to content

Force listpack validation on load to prevent deferred assertions#3721

Merged
zuiderkwast merged 6 commits into
valkey-io:unstablefrom
jjuleslasarte:remove-lpAssertValidEntry
May 27, 2026
Merged

Force listpack validation on load to prevent deferred assertions#3721
zuiderkwast merged 6 commits into
valkey-io:unstablefrom
jjuleslasarte:remove-lpAssertValidEntry

Conversation

@jjuleslasarte

@jjuleslasarte jjuleslasarte commented May 14, 2026

Copy link
Copy Markdown
Contributor

Until now, deep santitization of listpack on load (RDB load, RESTORE) was configurable. That meant that we had to do validation on access and traversal of the listpack. This change:

  • Force sanitization of listpacks on load and import.
  • Deprecates the config sanitize-dump-payload and makes it a no-op.
  • Deprecates the ACL flags skip-sanitize-payload and sanitize-payload, ignoring them when loading old ACL files.
  • Removes validation asserts on access.

Benefits:

  • Prevents deferred assertions, which would happen if unsanitized listpacks were loaded and later accessed, leading to server shutdown.
  • Improves read performance.

Replaces: #399

Internal changes

Remove lpAssertValidEntry calls from lpNext, lpPrev, lpFirst, lpFind, and lpDeleteRangeWithEntry to eliminate redundant per-operation listpack validation. Safety is maintained by making deep_integrity_validation unconditional in rdbLoadObject, ensuring all data structures (listpack, intset, stream) are fully validated on every load path (RDB and RESTORE). This effectively makes sanitize-dump-payload a no-op.

Lightweight EOF position assertions from PR #2027 remain intact.

Benchmarks

Benchmark results (Graviton3, 200 clients, P=32, 8 threads):

  256-field hash HGET: +4.7% throughput (907K -> 950K rps)
  256-field hash HSET: +1.2% throughput (688K -> 696K rps)
  100-field hash HSET: +2.4% throughput (688K -> 704K rps)
  p99 latency (256-field HSET): -16% (11.15ms -> 9.38ms)
  RDB load (1M keys): +1% overhead 

Benchmark details

  • CPU: ARM Graviton3
  • Cores: 64
  • Memory: 247 GB
  • Each benchmark: 5M requests, 200 clients, pipeline 32, 8 threads, 3 runs per configuration
  • hash-max-listpack-entries set to 512 to force listpack encoding for large hashes
  • IO threads: 1 (default, single-threaded mode)

Results

256-Field Hash — HSET (updating existing field in 10K pre-populated hashes)

Server Run 1 (rps) Run 2 (rps) Run 3 (rps) Average (rps)
BEFORE (with lpAssertValidEntry) 688,137 687,947 687,663 687,916
AFTER (lpAssertValidEntry removed) 712,555 687,758 687,663 695,992

Delta: +1.2% (AFTER faster)

256-Field Hash — HGET (reading field from 10K pre-populated hashes)

Server Run 1 (rps) Run 2 (rps) Run 3 (rps) Average (rps)
BEFORE (with lpAssertValidEntry) 907,276 906,947 906,783 907,002
AFTER (lpAssertValidEntry removed) 949,848 950,390 950,029 950,089

Delta: +4.7% (AFTER faster)

100-Field Hash — HSET

Server Run 1 (rps) Run 2 (rps) Run 3 (rps) Average (rps)
BEFORE (with lpAssertValidEntry) 687,663 688,042 688,137 687,947
AFTER (lpAssertValidEntry removed) 687,474 712,454 712,860 704,263

Delta: +2.4% (AFTER faster)

100-Field Hash — HGET

Server Run 1 (rps) Run 2 (rps) Run 3 (rps) Average (rps)
BEFORE (with lpAssertValidEntry) 950,570 950,209 950,209 950,329
AFTER (lpAssertValidEntry removed) 950,390 950,209 949,668 950,089

Delta: ~0% (no significant difference)

Small Hash — HSET (default 3-field hashes, baseline)

Server Run 1 (rps) Run 2 (rps) Run 3 (rps) Average (rps)
BEFORE (with lpAssertValidEntry) 643,583 643,666 664,894 650,714
AFTER (lpAssertValidEntry removed) 643,583 664,982 665,336 657,967

Delta: +1.1% (AFTER faster)

RDB Load (1M keys: 500K hashes, 300K sets, 200K strings)

Server Run 1 (sec) Run 2 (sec) Run 3 (sec) Average (sec)
BEFORE (no deep validation) 0.483 0.486 0.492 0.487
AFTER (with deep_integrity_validation) 0.516 0.476 0.484 0.492

Delta: +1.0% slower (AFTER slightly slower due to integrity validation during load)

Note: The AFTER binary enables deep_integrity_validation during RDB load, which performs the equivalent of lpAssertValidEntry checks only at load time.

flamegraphs

Perf Recording (10M HSET operations on 256-field hashes)

Metric BEFORE AFTER
Throughput during recording 687,711 rps 712,606 rps
p50 latency 9.167 ms 8.879 ms
p99 latency 11.151 ms 9.375 ms
BEFORE AFTER
flamegraph-before flamegraph-after

The AFTER flamegraph shows the elimination of lpAssertValidEntry from the hot path, which was previously called on every lpNext/lpPrev/lpSeek operation during hash field lookups and updates.

Notes

  1. Read-heavy workloads (HGET) on large hashes see the biggest improvement: +4.7% on 256-field hashes. This is expected since HGET traverses the listpack to find the field, and lpAssertValidEntry was called at each step.
  2. Write workloads (HSET) show modest improvement: +1-2.4% across configurations. The write path has more overhead beyond traversal (memory allocation, reallocation), so the relative impact of removing the assertion is smaller.
  3. Small hashes show minimal difference: ~1% improvement. With only 3 fields, the listpack traversal is trivial and the assertion cost is amortized across other work
  4. RDB load is essentially unchanged: The deep_integrity_validation flag adds ~1% overhead during load, which should be acceptable since it only runs once at startup and provides the same safety guarantees that lpAssertValidEntry previously provided on every operation.
  5. Latency improvements are notable: The p99 latency dropped from 11.15ms to 9.38ms during the flamegraph recording (256-field HSET), a 16% reduction in tail latency.
  6. The change is a net positive: Runtime performance improves across all hash sizes, with the largest gains on the workloads that matter most (large listpack-encoded hashes). Data integrity is preserved through validation at RDB load time.

@coderabbitai

coderabbitai Bot commented May 14, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Remove per-entry listpack assertions; make listpack integrity validation unconditional and deep at RDB/RESTORE/load time; remove sanitize-dump-payload config/ACL flags (deprecated no-ops); and update tests and fuzzers to expect RESTORE rejection for corrupt payloads and verify server liveness.

Changes

Validation Pipeline Refactor

Layer / File(s) Summary
Listpack traversal & public API changes
src/listpack.c, src/listpack.h, src/unit/test_listpack.cpp
Removed internal lpAssertValidEntry() and per-entry assertions from traversal/mutation paths; lpFind advances using lpSkip() and uses bounds assert(); lpValidateIntegrity API now always scans entries; unit tests and header updated.
Unconditional deep validation during RDB load
src/rdb.c, src/t_stream.c, src/stream.h, src/valkey-check-rdb.c
rdbLoadObject() and lpValidateIntegrityAndDups() drop the deep parameter and run deep listpack integrity validation unconditionally; quicklist/set/zset/hash/stream/intset/PEL checks and sanitization counters are now unconditional and call sites updated to new signatures.
Remove sanitize-dump-payload config and ACL flags
src/server.h, src/config.c, src/acl.c, src/fuzzer_command_generator.c, valkey.conf
Removed SANITIZE_DUMP_* macros and server.sanitize_dump_payload field; dropped sanitize-payload/skip-sanitize-payload from emitted ACL flags and defaults while accepting them as deprecated no-ops; moved config to deprecated list and updated enum handling and docs.
Consolidate fuzzers and update corrupt-dump tests
tests/integration/*, tests/integration/corrupt-dump-fuzzer.tcl, tests/unit/*, tests/integration/corrupt-dump.tcl
Consolidated fuzzer test run (removed per-run sanitize param), removed per-test sanitize-dump-payload toggles across many corrupt-dump cases, standardized RESTORE failure + liveness checks and integrity-log assertions, and added a dedicated empty-keys corrupted-RDB test; updated unit/module ACL expectations.

Sequence Diagram (RDB load validation):

sequenceDiagram
  participant rdbLoad as rdbLoadObject()
  participant LPdup as lpValidateIntegrityAndDups()
  participant Loader as Quicklist/Set/Zset/Hash/Stream loaders
  rdbLoad->>LPdup: always invoke deep validation (pairs)
  Loader->>LPdup: call with updated signature
  rdbLoad->>Loader: proceed with load after validation
Loading

Estimated code review effort:
🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers:

  • zuiderkwast
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and specifically describes the main change: making listpack validation unconditional on load to eliminate runtime assertions.
Description check ✅ Passed The description thoroughly explains the change, its rationale, internal mechanics, performance benefits, benchmarks, and security implications related to the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

The main point for this change IMO is security.

Scenario:

  1. A user has access to RESTORE for legitimate reasons, to migrate data from some other node using DUMP/RESTORE.
  2. The client is compromized and the attacker sends a RESTORE with a corrupted listpack. It passes.
  3. Later, on access, the server runs lpAssertValidEntry and hits the assert.
  4. Result: segfault. Server is down. Denial of service.

Performance is a secondary thing. Thanks for the benchmarks!

We should also note that full sync and RESTORE get slower for users who had validation turned off before.

Please also:

  • Update config.c to move the config to the list of deprecated configs (so that we still accept it but it's a no-op).
  • Update valkey.conf the section about this config, possibly remove the section entirely or perhaps just keep one line saying that it's deprecated in Valkey 10 and has no effect.
  • Remove that ACL flag related to this, mentioned in the chat.

Comment thread tests/integration/corrupt-dump.tcl

@dvkashapov dvkashapov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @zuiderkwast mentioned: its also important to deprecate config, mention that in valkey.conf and delete ACL user flag. Other than that looks good! Also we would need to update documentation to mention those changes

We should also note that full sync and RESTORE get slower for users who had validation turned off before.

I think we can skip this because of benchmark results: RDB load (1M keys): +1% overhead

Comment thread src/rdb.c Outdated
@dvkashapov dvkashapov added the needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. label May 15, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integration/corrupt-dump-fuzzer.tcl`:
- Around line 86-90: The FreeBSD branch still allows the fuzzer to run once by
setting min_cycles and min_duration to 1; change the FreeBSD handling (the
conditional that checks [exec uname] == "FreeBSD") so it skips execution
entirely instead of reducing to one run—e.g., exit or set a skip flag before the
fuzzer loop—or add an early return/abort when running on FreeBSD so the fuzzer
loop that uses min_cycles/min_duration is never invoked.
- Around line 183-191: When valgrind or ASan issues are detected by
find_valgrind_errors or sanitizer_errors_from_file the test only logs and
restarts; change the handling so the fuzzer run fails immediately for those
payloads by invoking the test failure path (e.g., call fail with a descriptive
message) in addition to setting report_and_restart and print_commands. Update
the block that checks valgrind_errors and asan_errors so that when either is
non-empty it calls fail (including printable_dump and the error details) to
ensure the test run is marked as failed rather than silently continuing.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c84d6888-0ca4-47b0-960d-4f1024481c7f

📥 Commits

Reviewing files that changed from the base of the PR and between 5faaabb and 2e1cb98.

📒 Files selected for processing (11)
  • src/acl.c
  • src/config.c
  • src/fuzzer_command_generator.c
  • src/server.h
  • src/valkey-check-rdb.c
  • tests/integration/corrupt-dump-fuzzer.tcl
  • tests/integration/corrupt-dump.tcl
  • tests/integration/rdb.tcl
  • tests/unit/acl.tcl
  • tests/unit/moduleapi/usercall.tcl
  • valkey.conf
💤 Files with no reviewable changes (4)
  • src/valkey-check-rdb.c
  • src/fuzzer_command_generator.c
  • tests/integration/rdb.tcl
  • src/acl.c
✅ Files skipped from review due to trivial changes (1)
  • tests/unit/moduleapi/usercall.tcl

Comment thread tests/integration/corrupt-dump-fuzzer.tcl Outdated
Comment thread tests/integration/corrupt-dump-fuzzer.tcl
@jjuleslasarte

Copy link
Copy Markdown
Contributor Author

@dvkashapov @zuiderkwast once this is approved, should i raise the corresponding docs PR?

Comment thread src/acl.c
@zuiderkwast

Copy link
Copy Markdown
Contributor

The PR title is not very understandable for users, e.g. in release notes. Can you change it to describe behaviour instead of internal code? I mean that we sanitize on load, not on access and traversal.

I don't like to use the "semanic commit" title style. We don't use it normally so the few PRs that do this stick out.

Also can you mention anything about the security aspect in the title or description? That forcing sanitization at load time prevents invalid data from being loaded into memory and later hitting an assert.

@jjuleslasarte

Copy link
Copy Markdown
Contributor Author

The PR title is not very understandable for users, e.g. in release notes. Can you change it to describe behaviour instead of internal code? I mean that we sanitize on load, not on access and traversal.

I don't like to use the "semanic commit" title style. We don't use it normally so the few PRs that do this stick out.

Also can you mention anything about the security aspect in the title or description? That forcing sanitization at load time prevents invalid data from being loaded into memory and later hitting an assert.

ACK, will do. Yeah, the semantic commit style is my bad, i had an LLM do it for me and they are really prone to it.

@madolson madolson moved this to Todo in Valkey 10 May 18, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/rdb.c (1)

1900-1905: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the helper comment to match the new API.

Lines 1900-1904 still describe a removed deep parameter, so this now documents behavior the function no longer supports.

As per coding guidelines, "Document why code exists, not just what it does; document all functions in C code".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/rdb.c` around lines 1900 - 1905, Update the function comment for
lpValidateIntegrityAndDups to reflect the current API (remove any mention of the
removed `deep` parameter) and explain why the validator exists; describe what
the function validates now (header integrity and full listpack scanning
behavior), the meaning of the `pairs` parameter (0 => unique elements as set, 1
=> odd elements unique as key/value map), the expected inputs (`unsigned char
*lp`, `size_t size`) and what the return value indicates, and add a short note
about when callers should use this check (purpose/context of the validation).
🧹 Nitpick comments (1)
src/rdb.c (1)

1934-1938: Request @core-team review for this validation-policy change.

This moves corruption handling behavior in rdbLoadObject(), which is exactly the kind of rdb.c change that should get explicit architectural sign-off before merge.

As per coding guidelines, "Request @core-team architectural review for changes to cluster*.c, replication.c, rdb.c, or aof.c".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/rdb.c` around lines 1934 - 1938, You changed corruption-handling behavior
by forcing deep_integrity_validation=1 in rdb.c (affects rdbLoadObject()); this
requires an explicit architectural review—update the PR and code with an
explicit request for `@core-team` review: add a short comment next to the
deep_integrity_validation assignment noting the behavioral change and the need
for core-team sign-off, and update the PR description to request `@core-team`
review and summarize the rationale and potential impact on replication/load
semantics so reviewers can evaluate the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/rdb.c`:
- Around line 1900-1905: Update the function comment for
lpValidateIntegrityAndDups to reflect the current API (remove any mention of the
removed `deep` parameter) and explain why the validator exists; describe what
the function validates now (header integrity and full listpack scanning
behavior), the meaning of the `pairs` parameter (0 => unique elements as set, 1
=> odd elements unique as key/value map), the expected inputs (`unsigned char
*lp`, `size_t size`) and what the return value indicates, and add a short note
about when callers should use this check (purpose/context of the validation).

---

Nitpick comments:
In `@src/rdb.c`:
- Around line 1934-1938: You changed corruption-handling behavior by forcing
deep_integrity_validation=1 in rdb.c (affects rdbLoadObject()); this requires an
explicit architectural review—update the PR and code with an explicit request
for `@core-team` review: add a short comment next to the deep_integrity_validation
assignment noting the behavioral change and the need for core-team sign-off, and
update the PR description to request `@core-team` review and summarize the
rationale and potential impact on replication/load semantics so reviewers can
evaluate the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 269010ad-3664-4aaa-a7b3-396868bacf25

📥 Commits

Reviewing files that changed from the base of the PR and between 2e1cb98 and 7345a31.

📒 Files selected for processing (9)
  • src/acl.c
  • src/listpack.c
  • src/listpack.h
  • src/rdb.c
  • src/stream.h
  • src/t_stream.c
  • src/unit/test_listpack.cpp
  • tests/integration/corrupt-dump-fuzzer.tcl
  • tests/unit/acl.tcl
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/acl.c
  • tests/integration/corrupt-dump-fuzzer.tcl

Comment thread src/rdb.c Outdated
@zuiderkwast zuiderkwast changed the title perf(listpack): Remove lpAssertValidEntry from traversal functions Validate listpacks on load instead of on access to prevent deferred assertions May 22, 2026
@zuiderkwast zuiderkwast changed the title Validate listpacks on load instead of on access to prevent deferred assertions Force listpack validation on load to prevent deferred assertions May 22, 2026
jjuleslasarte added a commit to jjuleslasarte/valkey that referenced this pull request May 26, 2026
Since validation is now always unconditional, the deep_integrity_validation
variable (hardcoded to 1) and all its conditionals are dead code. Remove
them and make the guarded operations unconditional.

Addresses review feedback from zuiderkwast on PR valkey-io#3721.

Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
@jjuleslasarte jjuleslasarte force-pushed the remove-lpAssertValidEntry branch from 7345a31 to ce2f9ff Compare May 26, 2026 17:27
jjuleslasarte added a commit to jjuleslasarte/valkey that referenced this pull request May 26, 2026
Since validation is now always unconditional, the deep_integrity_validation
variable (hardcoded to 1) and all its conditionals are dead code. Remove
them and make the guarded operations unconditional.

Addresses review feedback from zuiderkwast on PR valkey-io#3721.

Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
@jjuleslasarte jjuleslasarte force-pushed the remove-lpAssertValidEntry branch from ce2f9ff to 08a953f Compare May 26, 2026 18:04
@codecov

codecov Bot commented May 26, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 96.77419% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 76.71%. Comparing base (93e84bd) to head (9fdb4e5).
⚠️ Report is 2 commits behind head on unstable.

Files with missing lines Patch % Lines
src/unit/test_listpack.cpp 66.66% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3721      +/-   ##
============================================
+ Coverage     76.68%   76.71%   +0.02%     
============================================
  Files           162      162              
  Lines         80706    80679      -27     
============================================
- Hits          61892    61889       -3     
+ Misses        18814    18790      -24     
Files with missing lines Coverage Δ
src/acl.c 92.67% <100.00%> (+0.08%) ⬆️
src/config.c 78.46% <ø> (+0.36%) ⬆️
src/fuzzer_command_generator.c 77.15% <ø> (+0.23%) ⬆️
src/listpack.c 92.86% <ø> (-0.27%) ⬇️
src/rdb.c 76.94% <100.00%> (-0.22%) ⬇️
src/server.h 100.00% <ø> (ø)
src/t_stream.c 94.45% <100.00%> (-0.01%) ⬇️
src/valkey-check-rdb.c 70.82% <ø> (-0.06%) ⬇️
src/unit/test_listpack.cpp 79.43% <66.66%> (ø)

... and 18 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@jjuleslasarte jjuleslasarte requested a review from zuiderkwast May 26, 2026 19:55

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/unit/test_listpack.cpp (1)

943-950: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add a corrupt-payload regression for the new always-deep validator.

This still exercises only the happy path, so it won't catch a regression where lpValidateIntegrity() accepts a listpack with a valid header but a broken entry/backlen. A small negative test here would lock in the new contract.

As per coding guidelines src/unit/**/*.{c,cpp}: Data-structure and low-level logic changes should be covered by unit tests.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/unit/test_listpack.cpp` around lines 943 - 950, Add a negative regression
case to TEST_F(ListpackTest, listpackLpValidateIntegrity) that mutates the
created listpack (created via createList()) to simulate a broken entry or
corrupt backlen (for example flip a byte in the payload or make the backlen
inconsistent) and then call lpValidateIntegrity(lp, lpBytes(lp), lpValidation,
&count) expecting a failure (return 0) before freeing with lpFree(lp); reference
the existing test and functions lpValidateIntegrity, lpBytes, lpValidation,
createList, and lpFree so the test asserts the validator rejects a listpack with
a valid header but corrupted entry/backlen.
src/config.c (1)

454-462: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep CONFIG SET sanitize-dump-payload as a deprecated no-op.

This table is only used while parsing startup config. With sanitize-dump-payload no longer registered in static_configs, runtime CONFIG SET sanitize-dump-payload ... now falls through lookupConfig() and returns "Unknown option", which is a compatibility break for existing automation. Please preserve a hidden/runtime no-op path as well.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/config.c` around lines 454 - 462, The startup parsing table
deprecated_configs no longer preserves a runtime no-op for
"sanitize-dump-payload", causing lookupConfig()/CONFIG SET to return "Unknown
option"; restore the entry so CONFIG SET sanitize-dump-payload remains accepted
as a deprecated no-op at runtime. Edit the deprecated_configs array to include
the "sanitize-dump-payload" entry (matching the format used for other deprecated
options), ensure static_configs can omit it but lookupConfig()/configSet path
still recognizes it as deprecated and performs no-op behavior, and keep any
related parsing/validation logic in config.c (referencing deprecated_configs,
lookupConfig, static_configs, and the CONFIG SET handling) consistent with other
deprecated entries.
🧹 Nitpick comments (2)
src/rdb.c (2)

1900-1913: ⚡ Quick win

Update the lpValidateIntegrityAndDups() contract comment.

The doc block still describes the removed deep argument, so it now documents behavior this function no longer supports.

As per coding guidelines src/**/*.{c,h}: Use comments for non-obvious behavior and rationale, not for restating code and Document why code exists, not just what it does; document all functions in C code.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/rdb.c` around lines 1900 - 1913, Update the function docblock for
lpValidateIntegrityAndDups to remove the stale mention of the removed `deep`
parameter and instead accurately document the current contract: that it
validates the listpack header and entries via lpValidateIntegrity with
_lpEntryValidation, that `pairs` controls duplicate-checking semantics (0 = all
elements must be unique, 1 = odd elements must be unique as map keys), and that
it uses an internal hashtable to track seen fields and returns validation
status; include brief rationale for why duplicate checking is separated from
basic integrity validation. Reference lpValidateIntegrityAndDups, the `pairs`
parameter, lpValidateIntegrity, and _lpEntryValidation when editing the comment.

1905-1913: Please request @core-team review for these src/rdb.c load-path changes.

This PR changes RDB/RESTORE validation behavior in a file the repository explicitly flags for architectural review.

As per coding guidelines src/{cluster*.c,replication.c,rdb.c,aof.c}: Request @core-team architectural review for changes to cluster*.c, replication.c, rdb.c, or aof.c.

Also applies to: 2321-2323, 2453-2558, 2616-2617, 2858-2870

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/rdb.c` around lines 1905 - 1913, This change touches core RDB load-path
validation in src/rdb.c (notably the lpValidateIntegrityAndDups function and
surrounding lpValidateIntegrity/_lpEntryValidation usage) and per project policy
requires an explicit `@core-team` architectural review; update the PR by adding a
clear request for `@core-team` review in the PR description and/or add a code
comment at the top of the modified region referencing the policy and tagging
`@core-team` so reviewers see this change to rdb.c, and ensure the other mentioned
ranges (around lines for lpValidateIntegrityAndDups and the 2321-2323,
2453-2558, 2616-2617, 2858-2870 edits) are similarly annotated/requested for
core-team review.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/config.c`:
- Around line 454-462: The startup parsing table deprecated_configs no longer
preserves a runtime no-op for "sanitize-dump-payload", causing
lookupConfig()/CONFIG SET to return "Unknown option"; restore the entry so
CONFIG SET sanitize-dump-payload remains accepted as a deprecated no-op at
runtime. Edit the deprecated_configs array to include the
"sanitize-dump-payload" entry (matching the format used for other deprecated
options), ensure static_configs can omit it but lookupConfig()/configSet path
still recognizes it as deprecated and performs no-op behavior, and keep any
related parsing/validation logic in config.c (referencing deprecated_configs,
lookupConfig, static_configs, and the CONFIG SET handling) consistent with other
deprecated entries.

In `@src/unit/test_listpack.cpp`:
- Around line 943-950: Add a negative regression case to TEST_F(ListpackTest,
listpackLpValidateIntegrity) that mutates the created listpack (created via
createList()) to simulate a broken entry or corrupt backlen (for example flip a
byte in the payload or make the backlen inconsistent) and then call
lpValidateIntegrity(lp, lpBytes(lp), lpValidation, &count) expecting a failure
(return 0) before freeing with lpFree(lp); reference the existing test and
functions lpValidateIntegrity, lpBytes, lpValidation, createList, and lpFree so
the test asserts the validator rejects a listpack with a valid header but
corrupted entry/backlen.

---

Nitpick comments:
In `@src/rdb.c`:
- Around line 1900-1913: Update the function docblock for
lpValidateIntegrityAndDups to remove the stale mention of the removed `deep`
parameter and instead accurately document the current contract: that it
validates the listpack header and entries via lpValidateIntegrity with
_lpEntryValidation, that `pairs` controls duplicate-checking semantics (0 = all
elements must be unique, 1 = odd elements must be unique as map keys), and that
it uses an internal hashtable to track seen fields and returns validation
status; include brief rationale for why duplicate checking is separated from
basic integrity validation. Reference lpValidateIntegrityAndDups, the `pairs`
parameter, lpValidateIntegrity, and _lpEntryValidation when editing the comment.
- Around line 1905-1913: This change touches core RDB load-path validation in
src/rdb.c (notably the lpValidateIntegrityAndDups function and surrounding
lpValidateIntegrity/_lpEntryValidation usage) and per project policy requires an
explicit `@core-team` architectural review; update the PR by adding a clear
request for `@core-team` review in the PR description and/or add a code comment at
the top of the modified region referencing the policy and tagging `@core-team` so
reviewers see this change to rdb.c, and ensure the other mentioned ranges
(around lines for lpValidateIntegrityAndDups and the 2321-2323, 2453-2558,
2616-2617, 2858-2870 edits) are similarly annotated/requested for core-team
review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 712e1328-c295-49ba-bacc-d1c6cc48ab05

📥 Commits

Reviewing files that changed from the base of the PR and between 7345a31 and 2d342fe.

📒 Files selected for processing (18)
  • src/acl.c
  • src/config.c
  • src/fuzzer_command_generator.c
  • src/listpack.c
  • src/listpack.h
  • src/rdb.c
  • src/server.h
  • src/stream.h
  • src/t_stream.c
  • src/unit/test_listpack.cpp
  • src/valkey-check-rdb.c
  • tests/integration/corrupt-dump-fuzzer.tcl
  • tests/integration/corrupt-dump.tcl
  • tests/integration/rdb.tcl
  • tests/unit/acl.tcl
  • tests/unit/cluster/resharding.tcl
  • tests/unit/moduleapi/usercall.tcl
  • valkey.conf
💤 Files with no reviewable changes (3)
  • tests/unit/cluster/resharding.tcl
  • tests/integration/rdb.tcl
  • src/valkey-check-rdb.c
✅ Files skipped from review due to trivial changes (2)
  • tests/unit/moduleapi/usercall.tcl
  • valkey.conf

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you! Only a few non-blocking nits.

Comment thread src/listpack.c Outdated
Comment thread src/listpack.c Outdated
Comment thread src/listpack.c Outdated
@zuiderkwast zuiderkwast added the release-notes This issue should get a line item in the release notes label May 27, 2026
@zuiderkwast zuiderkwast moved this from Todo to Needs Review in Valkey 10 May 27, 2026
@jjuleslasarte jjuleslasarte force-pushed the remove-lpAssertValidEntry branch from 81ce181 to d7879ae Compare May 27, 2026 15:45
@jjuleslasarte jjuleslasarte force-pushed the remove-lpAssertValidEntry branch from d7879ae to 9fdb4e5 Compare May 27, 2026 15:46
Remove lpAssertValidEntry calls from lpNext, lpPrev, lpFirst, lpFind,
and lpDeleteRangeWithEntry to eliminate redundant per-operation listpack
validation. Safety is maintained by making deep_integrity_validation
unconditional in rdbLoadObject, ensuring all data structures (listpack,
intset, stream) are fully validated on every load path (RDB and RESTORE).
This effectively makes sanitize-dump-payload a no-op.

Benchmark results (Graviton3, 200 clients, P=32, 8 threads):

  256-field hash HGET: +4.7% throughput (907K -> 950K rps)
  256-field hash HSET: +1.2% throughput (688K -> 696K rps)
  100-field hash HSET: +2.4% throughput (688K -> 704K rps)
  p99 latency (256-field HSET): -16% (11.15ms -> 9.38ms)
  RDB load (1M keys): +1% overhead (acceptable)

Lightweight EOF position assertions from PR valkey-io#2027 remain intact.

Takes over: valkey-io#399

Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
Since deep_integrity_validation is now unconditional, the
sanitize-dump-payload config and skip-sanitize-payload ACL flag
have no effect. This commit:

- Moves sanitize-dump-payload to deprecated_configs[] in config.c
- Removes SANITIZE_DUMP_* macros and server.sanitize_dump_payload field
- Removes skip-sanitize-payload/sanitize-payload ACL flags entirely
- Updates valkey.conf with deprecation notice
- Deletes duplicate unsanitized test variants that are now redundant
- Cleans up corrupt-dump-fuzzer.tcl and rdb.tcl test references

Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
…cated ACL flags

- Remove the 'deep' parameter from lpValidateIntegrity() since
  validation is now always full (deep). Update all callers in rdb.c,
  t_stream.c, and unit tests.
- Remove 'deep' parameter from lpValidateIntegrityAndDups() and
  streamValidateListpackIntegrity() for the same reason.
- Accept 'skip-sanitize-payload' and 'sanitize-payload' as no-op
  ACL flags for backwards compatibility with Valkey 9 ACL files.
  These flags are parsed but not emitted, so they get silently
  dropped on next ACL save.

Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
Address PR review feedback on the corrupt-dump-fuzzer integration test:

- Skip the test entirely on FreeBSD instead of running a single
  short cycle. The previous workaround masked any signal the
  fuzzer might produce on that platform; honestly skipping it is
  a clearer signal than 'ran but found nothing'.

- Track sanitizer (valgrind/asan) findings in a new
  stat_sanitizer_findings counter and assert it is zero at the
  end of the run. Previously, sanitizer hits were only logged and
  the server was restarted, so a real bug surfaced via ASan or
  Valgrind would not fail the test.

Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
Since validation is now always unconditional, the deep_integrity_validation
variable (hardcoded to 1) and all its conditionals are dead code. Remove
them and make the guarded operations unconditional.

Addresses review feedback from zuiderkwast on PR valkey-io#3721.

Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
Remove the unused outer 'size_t bytes' declarations in lpNext() and
lpFirst() that became dead code after lpAssertValidEntry was removed.
Fixes build failure with -Werror (unused-variable warning) in CI.

Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
@jjuleslasarte jjuleslasarte force-pushed the remove-lpAssertValidEntry branch from 9fdb4e5 to f42aa10 Compare May 27, 2026 16:17
@zuiderkwast zuiderkwast merged commit e4fdae4 into valkey-io:unstable May 27, 2026
60 of 61 checks passed
@zuiderkwast

Copy link
Copy Markdown
Contributor

@dvkashapov @zuiderkwast once this is approved, should i raise the corresponding docs PR?

Yes, but I don't know which docs are affected. ACL docs?

@dvkashapov

Copy link
Copy Markdown
Member

Yes, we need to update ACL docs to mention that from next release those user flags will be no-op and same in config file

2030XiaoGe added a commit to 2030XiaoGe/valkey that referenced this pull request May 28, 2026
…key-io#3721)

Until now, deep santitization of listpack on load (RDB load, RESTORE)
was configurable. That meant that we had to do validation on access and
traversal of the listpack. This change:

- Force sanitization of listpacks on load and import.
- Deprecates the config `sanitize-dump-payload` and makes it a no-op.
- Deprecates the ACL flags `skip-sanitize-payload` and
`sanitize-payload`, ignoring them when loading old ACL files.
- Removes validation asserts on access.

Benefits:

- Prevents deferred assertions, which would happen if unsanitized
  listpacks were loaded and later accessed, leading to server
  shutdown.
- Improves read performance.

Replaces: valkey-io#399

Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
@coderabbitai coderabbitai Bot mentioned this pull request Jun 16, 2026
zuiderkwast pushed a commit to valkey-io/valkey-doc that referenced this pull request Jun 18, 2026
As per valkey-io/valkey#3721 and
valkey-io/valkey#3964, omit `alldbs` rules and
drop mentions of `sanitize-payload` and `skip-sanitize-payload` user
flags.

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. release-notes This issue should get a line item in the release notes

Projects

Status: Needs Review

Development

Successfully merging this pull request may close these issues.

4 participants