hashtable: scan no-duplicate guarantee and add HasPassedKey API by rainsupreme · Pull Request #3803 · valkey-io/valkey

rainsupreme · 2026-05-21T19:37:51Z

Description:

When shrinking is blocked for the duration of a scan (via hashtablePauseAutoShrink), the hashtable scan guarantees no entry is emitted more than once. Expansion and rehashing may occur freely — only a shrink is problematic because it introduces a smaller table whose bucket boundaries might not align with the existing cursor position. This was already true but undocumented.

New API: hashtableScanHasPassedKey(ht, key, cursor)

Given a scan cursor, determines whether a key's hashtable position has already been visited. This enables callers to definitively answer: "would this key have been returned already?" — useful for background iteration that needs to track which keys have been seen without maintaining a separate visited-set. The key does not need to exist in the table (position is computed from its hash). The result is exact when shrinking is blocked; best-effort otherwise.

Motivated by #2878 (forkless save, to be used by bgiterator).

Tests: 10 unit tests including fuzz tests with random seeds, a test proving expansion + full rehash completion mid-scan produces no duplicates, and a test verifying shrink is correctly deferred during scan callbacks.

Document the no-duplicate guarantee for scan when rehashing is not active: entries are never emitted more than once, entries present throughout the scan are guaranteed to be returned, and entries created/destroyed during scan are never returned twice. Add hashtableScanHasPassedKey(ht, key, cursor) which determines whether a key's bucket has already been visited by the scan cursor. Uses reverse-bit-increment ordering to compare bucket positions. The key does not need to exist in the table. When rehashing is active, the result is best-effort. Add comprehensive unit tests: - Full scan no-duplicates on static table - No-duplicates with deletions mid-scan - No-duplicates with insertions mid-scan - Fuzz test with random seed, 50 iterations, random ops - HasPassedKey correctness verification against actual emission - HasPassedKey with cursor=0 (scan complete) - HasPassedKey for non-existent keys - HasPassedKey fuzz with random seed - HasPassedKey graceful behavior during active rehash Signed-off-by: Rain Valentine <rainval@amazon.com>

Remove redundant '(or not active)' parenthetical -- if rehashing isn't active there's nothing to pause, so 'when rehashing is paused' covers both cases from the caller's perspective. Signed-off-by: Rain Valentine <rainval@amazon.com>

Consistent with all other hashtable predicate functions. Signed-off-by: Rain Valentine <rainval@amazon.com>

The 'doesn't crash' check adds nothing beyond what the fuzz tests already cover. Signed-off-by: Rain Valentine <rainval@amazon.com>

Signed-off-by: Rain Valentine <rainval@amazon.com>

Avoid exposing internal bucket concept in the public API documentation. Signed-off-by: Rain Valentine <rainval@amazon.com>

coderabbitai · 2026-05-21T19:37:59Z

📝 Walkthrough

Walkthrough

Adds hashtableScanHasPassedKey(), tightens hashtableScan guarantees for scans where shrinking is blocked, implements the helper, and adds unit tests validating scan uniqueness, cursor semantics, mutation interactions, and fuzz scenarios.

Changes

Hashtable scan cursor API

Layer / File(s)	Summary
Scan API declaration and scan guarantees `src/hashtable.h`, `src/hashtable.c`	Public declaration of `hashtableScanHasPassedKey()` and extended `hashtableScan()` documentation covering paused-rehash scans (at-most-once emission; stable-entry coverage guarantees).
hashtableScanHasPassedKey implementation `src/hashtable.c`	Implements `hashtableScanHasPassedKey()` where `cursor==0` is "not started"; computes key bucket with the main table mask and compares using reverse-bit ordering; documents exact vs best-effort depending on shrinking.
Test headers and scan accumulator `src/unit/test_hashtable.cpp`	Adds `<ctime>` and `<set>`. Introduces `ScanContractData` and `scanContractFn` using `std::set` to record scanned values and detect duplicates.
Scan no-duplicate contract tests `src/unit/test_hashtable.cpp`	Tests validating no-duplicate emissions: static scan, mid-scan deletions, mid-scan insertions, and randomized mutation fuzz; all run with auto-shrink paused.
hashtableScanHasPassedKey correctness and fuzz tests `src/unit/test_hashtable.cpp`	Per-cursor correctness tests, cursor==0 semantics, deleted-key checks using captured cursor, and fuzz cross-checks against observed emitted keys while scanning under mutations.
Callback shrink/expansion interaction tests `src/unit/test_hashtable.cpp`	Verifies auto-shrink does not start during a paused scan callback and asserts no duplicates when expansion/rehash starts and completes during an in-progress scan.

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers:

ranshid
yzc-yzc
zuiderkwast

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 29.41% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The title accurately captures the two main changes: adding a no-duplicate scan guarantee and the new HasPassedKey API.
Description check	✅ Passed	The PR description clearly explains the purpose: documenting a no-duplicate scan guarantee and introducing the new hashtableScanHasPassedKey API with motivation and use cases.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-05-21T20:05:24Z

Codecov Report

❌ Patch coverage is 99.13793% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.74%. Comparing base (0321a69) to head (d91db1e).
⚠️ Report is 14 commits behind head on unstable.

Files with missing lines	Patch %	Lines
src/unit/test_hashtable.cpp	99.11%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #3803      +/-   ##
============================================
+ Coverage     76.62%   76.74%   +0.12%     
============================================
  Files           162      162              
  Lines         80674    80939     +265     
============================================
+ Hits          61813    62115     +302     
+ Misses        18861    18824      -37

Files with missing lines	Coverage Δ
src/hashtable.c	`97.89% <100.00%> (+0.01%)`	⬆️
src/unit/test_hashtable.cpp	`96.28% <99.11%> (+0.82%)`	⬆️

... and 27 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Signed-off-by: Rain Valentine <rainval@amazon.com>

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

src/hashtable.c (1)

2048-2048: ⚡ Quick win

Use a boolean literal in the bool return path

Line 2048 returns 1 from a bool API. Prefer true for consistency with the C boolean convention in this repo.

Proposed patch

 bool hashtableScanHasPassedKey(hashtable *ht, const void *key, size_t cursor) {
-    if (cursor == 0) return 1;
+    if (cursor == 0) return true;
     size_t mask = expToMask(ht->bucket_exp[0]);

As per coding guidelines, "Use the boolean type for true/false values in C code".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/hashtable.c` at line 2048, The function currently returns the integer
literal 1 for a bool API; replace that integer literal with the boolean literal
true so the return path uses the bool type consistently (change the "return 1;"
after the cursor check to "return true;"), referencing the same cursor variable
and return statement in this function.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/unit/test_hashtable.cpp`:
- Around line 1418-1420: The test setup loops call hashtableAdd(ht, (void *)j)
but ignore its return value, so failed inserts can silently break test
preconditions; update each setup loop (where hashtableAdd is used with ht and
(void*)j) to check the return and fail the test on error (e.g., replace the bare
call with an assertion that hashtableAdd returned success such as
ASSERT_TRUE/EXPECT_TRUE or equivalent test-framework failure) so any insertion
failure aborts the test and makes the precondition explicit.
- Around line 1413-1434: Wrap each test's global state changes and heap
allocations in RAII-style guards so cleanup always runs even if ASSERT_* aborts:
create a small scoped guard that calls
hashtableSetResizePolicy(HASHTABLE_RESIZE_FORBID) in its constructor and
restores hashtableSetResizePolicy(HASHTABLE_RESIZE_ALLOW) in its destructor, and
use another scoped guard that takes the hashtable* (ht) and calls
hashtableRelease(ht) in its destructor; replace the manual tail reset/release in
the test blocks (which call hashtableSetResizePolicy and hashtableRelease) with
these scope guards around the code in functions that include
hashtableSetResizePolicy, hashtableCreate, and hashtableRelease (e.g., the test
around hashtableSetResizePolicy/HASHTABLE_RESIZE_FORBID, hashtableCreate,
hashtableRelease/hashtableScan), and apply the same pattern to the other test
ranges noted (1438-1464, 1468-1497, 1505-1548, 1555-1593, 1614-1635, 1643-1687).

---

Nitpick comments:
In `@src/hashtable.c`:
- Line 2048: The function currently returns the integer literal 1 for a bool
API; replace that integer literal with the boolean literal true so the return
path uses the bool type consistently (change the "return 1;" after the cursor
check to "return true;"), referencing the same cursor variable and return
statement in this function.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 35e763d5-9662-410c-9320-89f0a94d1fc5

📥 Commits

Reviewing files that changed from the base of the PR and between d778b37 and d19a1f9.

📒 Files selected for processing (3)

src/hashtable.c
src/hashtable.h
src/unit/test_hashtable.cpp

Signed-off-by: Rain Valentine <rainval@amazon.com>

nmvk · 2026-05-21T23:06:56Z

+ * - An entry that is created, destroyed, or re-created during the scan may or
+ *   may not be returned, but will never be returned more than once.


Is it possible that key that is deleted and other key taking its spot can return more than once upon re-creation https://github.com/valkey-io/valkey/blob/unstable/src/hashtable.c#L1080-L1082 as we find lowest unused position?

No. 🙂 As long as rehashing was paused before the scan started and remains paused until the scan finishes, this cannot occur.

When rehashing incrementally the hashtable typically doubles or halves in size so 2 indices in one table correspond to 1 index in the other table, or vice versa. Scan will use the index increment of the table with larger buckets, so it would scan 2 indices of the larger table and 1 index of the smaller table in one scan batch, then stop. Removing an item and adding a new one with the same key can only move it between that group of 3 indices between the two tables, so there is no situation where a key can jump over the cursor position. (This is because our hashtable resolves collisions with chaining, not open indexing.)

The reason we have to pause rehashing is that when we're not rehashing it goes one table index at a time. If rehashing begins between scan calls then the cursor might be left between the two indices of the larger table - i.e. in the middle of one of these groups of three, in which case when we continue scanning we can't be strictly sure which in that group have been covered and which haven't. (We don't actually have to pause rehashing - the key thing to avoid is to not start a rehash! But "pause rehashing" is a much simpler contract to work with.)

Ah, somehow missed you were talking about filling in holes in the bucket chain. That isn't an issue because scan goes index-by-index, i.e. one bucket chain at a time. The code you're looking at does mean that an item could "move" to a different position within a bucket chain, but it will still be at the same table index. (or in the same group of 3 if a rehash is in progress, as I mentioned before.)

Thanks @rainsupreme explanation makes sense. I was thinking about movement within the same bucket chain from the generic hashtable API perspective, but I agree that this is not how the current code paths use the scan callback, and your explanation covers the table index/cursor guarantee being documented here.

Cursor 0 is ambiguous (start or end of scan), but the caller always knows which state they're in. Returning false ('not started') is the safer default -- the caller handles 'scan complete' themselves. Signed-off-by: Rain Valentine <rainval@amazon.com>

- Extract populateSequential helper (DRY) - Remove redundant hashtableExpand calls (no-op with FORBID) - Remove redundant ASSERT_FALSE(hashtableIsRehashing) assertions - Simplify deletion pattern to random per-step - Fix fuzz delete to target full range including new inserts - Remove manual resize policy restore (SetUp handles it) Signed-off-by: Rain Valentine <rainval@amazon.com>

The no-duplicate guarantee requires that no shrink is initiated during the scan. This is already ensured by hashtablePauseRehashing (called internally by scan) which also pauses auto-shrink. Expansion and rehashing progress are safe: the scan cursor step size is based on the smallest table and remains valid when a larger table is introduced. Only a shrink is problematic because it introduces a smaller table whose bucket boundaries don't align with the existing cursor. Add a test verifying that shrink is deferred during a scan callback even when the fill factor drops below the threshold. Signed-off-by: Rain Valentine <rainval@amazon.com>

Tests now match the documented contract precisely: block shrinking (not all resizing) for the duration of the scan. Expansion and rehashing are allowed to occur naturally during the tests. Signed-off-by: Rain Valentine <rainval@amazon.com>

Set a good example for anyone using these tests as reference for the pause/resume pattern. Signed-off-by: Rain Valentine <rainval@amazon.com>

- Add completeRehashing helper; ensure table is static before scan starts - Move PauseAutoShrink/ResumeAutoShrink to tightly scope the scan loop - Replace scan_has_passed_key_nonexistent with scan_has_passed_key_deleted: scan partway, pick one emitted and one non-emitted key, delete both, verify HasPassedKey still reflects their original positions - Fix expand test: single initial scan step (no dead cursor==0 branch), use larger table (5000) to avoid premature scan completion, drive rehash to completion during scan and assert it finished Signed-off-by: Rain Valentine <rainval@amazon.com>

Simplify the doc to one clear statement: exact when shrinking is blocked, best-effort if a shrink has been initiated since the cursor was obtained. Removes the confusing 'rehashing active' vs 'shrinking blocked' apparent contradiction. Also use EXPECT_TRUE in populateSequential helper (correct gtest pattern for non-test functions). Signed-off-by: Rain Valentine <rainval@amazon.com>

Assert that rehashing finishes while the scan loop is still in progress, proving the full expand lifecycle (start, progress, complete) is safe mid-scan. Signed-off-by: Rain Valentine <rainval@amazon.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/unit/test_hashtable.cpp`:
- Around line 1419-1421: The while loops that wait for rehash (using
hashtableIsRehashing and hashtableRehashMicroseconds) are unbounded and must be
guarded: introduce a fail-fast counter/timeout (e.g. max_iters or remaining_us)
initialized before the loop, decrement it on each iteration or subtract the
waited microseconds, and assert/fail with a clear message if the counter reaches
zero so the test fails fast and shows the rehash did not complete; apply this
change to the loop using hashtableIsRehashing/hashtableRehashMicroseconds at
both locations referenced (around lines ~1419 and ~1738) so the test cannot hang
indefinitely.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: f3113ab0-a5f6-4859-a1da-ad0bbfc85cd5

📥 Commits

Reviewing files that changed from the base of the PR and between 476ebe0 and d91db1e.

📒 Files selected for processing (2)

src/hashtable.c
src/unit/test_hashtable.cpp

coderabbitai · 2026-05-23T00:00:33Z

+    while (hashtableIsRehashing(ht)) {
+        hashtableRehashMicroseconds(ht, 1000);
+    }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add fail-fast bounds to rehash wait loops.

Both loops are unbounded and can hang the test process indefinitely if rehash never progresses/starts. Add a guard counter with assertions so failures are diagnosable instead of timing out CI.

Proposed fix

static void completeRehashing(hashtable *ht) { + size_t guard = 0; while (hashtableIsRehashing(ht)) { hashtableRehashMicroseconds(ht, 1000); + ASSERT_LT(++guard, 100000u) << "Rehash did not complete"; } } ... /* Trigger expansion by adding entries until rehashing starts. */ long next = count + 1; + size_t start_rehash_guard = 0; while (!hashtableIsRehashing(ht)) { - hashtableAdd(ht, (void *)next++); + ASSERT_TRUE(hashtableAdd(ht, (void *)next++)); + ASSERT_LT(++start_rehash_guard, 100000u) + << "Expansion rehash did not start"; }

Also applies to: 1738-1740

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/unit/test_hashtable.cpp` around lines 1419 - 1421, The while loops that wait for rehash (using hashtableIsRehashing and hashtableRehashMicroseconds) are unbounded and must be guarded: introduce a fail-fast counter/timeout (e.g. max_iters or remaining_us) initialized before the loop, decrement it on each iteration or subtract the waited microseconds, and assert/fail with a clear message if the counter reaches zero so the test fails fast and shows the rehash did not complete; apply this change to the loop using hashtableIsRehashing/hashtableRehashMicroseconds at both locations referenced (around lines ~1419 and ~1738) so the test cannot hang indefinitely.

rainsupreme added 6 commits May 21, 2026 18:43

hashtable: simplify scan contract wording

1cbc41a

Remove redundant '(or not active)' parenthetical -- if rehashing isn't active there's nothing to pause, so 'when rehashing is paused' covers both cases from the caller's perspective. Signed-off-by: Rain Valentine <rainval@amazon.com>

hashtable: use bool return type for HasPassedKey

6e73a99

Consistent with all other hashtable predicate functions. Signed-off-by: Rain Valentine <rainval@amazon.com>

test: remove trivial HasPassedKey during-rehash test

7afc167

The 'doesn't crash' check adds nothing beyond what the fuzz tests already cover. Signed-off-by: Rain Valentine <rainval@amazon.com>

test: match existing comment style in test_hashtable.cpp

73f1d45

Signed-off-by: Rain Valentine <rainval@amazon.com>

hashtable: use 'position' instead of 'bucket' in HasPassedKey API doc

6005bb0

Avoid exposing internal bucket concept in the public API documentation. Signed-off-by: Rain Valentine <rainval@amazon.com>

github-actions Bot assigned rainsupreme May 21, 2026

rainsupreme requested a review from JimB123 May 21, 2026 19:38

test: remove trailing blank line (clang-format)

d19a1f9

Signed-off-by: Rain Valentine <rainval@amazon.com>

rainsupreme marked this pull request as ready for review May 21, 2026 20:52

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

Comment thread src/unit/test_hashtable.cpp Outdated

Comment thread src/unit/test_hashtable.cpp Outdated

test: assert hashtableAdd return values in scan contract tests

fba20dd

Signed-off-by: Rain Valentine <rainval@amazon.com>

nmvk reviewed May 21, 2026

View reviewed changes

JimB123 requested changes May 22, 2026

View reviewed changes

rainsupreme added 8 commits May 22, 2026 20:54

test: resume auto-shrink after scan in all tests

a61da37

Set a good example for anyone using these tests as reference for the pause/resume pattern. Signed-off-by: Rain Valentine <rainval@amazon.com>

test: verify rehash completes during scan, not after

d91db1e

Assert that rehashing finishes while the scan loop is still in progress, proving the full expand lifecycle (start, progress, complete) is safe mid-scan. Signed-off-by: Rain Valentine <rainval@amazon.com>

rainsupreme force-pushed the hashtable-scan-contract branch from 476ebe0 to d91db1e Compare May 22, 2026 23:56

coderabbitai Bot reviewed May 23, 2026

View reviewed changes

rainsupreme changed the title ~~hashtable: scan no-duplicate guarantee when rehashing paused and add HasPassedKey API~~ hashtable: scan no-duplicate guarantee and add HasPassedKey API May 23, 2026

JimB123 approved these changes Jun 17, 2026

View reviewed changes

JimB123 merged commit 5ae43a4 into valkey-io:unstable Jun 17, 2026
63 checks passed

rainsupreme deleted the hashtable-scan-contract branch June 17, 2026 20:49

coderabbitai Bot mentioned this pull request Jun 19, 2026

Randomize hashtable scan start to avoid cursor=0 restart bias #4007

Open

		* - An entry that is created, destroyed, or re-created during the scan may or
		* may not be returned, but will never be returned more than once.

Uh oh!

Conversation

rainsupreme commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

nmvk May 21, 2026

Choose a reason for hiding this comment

Uh oh!

rainsupreme May 22, 2026

Choose a reason for hiding this comment

Uh oh!

rainsupreme May 22, 2026

Choose a reason for hiding this comment

Uh oh!

nmvk May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rainsupreme commented May 21, 2026 •

edited

Loading

coderabbitai Bot commented May 21, 2026 •

edited

Loading

codecov Bot commented May 21, 2026 •

edited

Loading