Skip to content

Fix crash when active field-expiry leaves a single-entry HT vset bucket#3950

Merged
ranshid merged 1 commit into
valkey-io:unstablefrom
ranshid:repro-hfe-bug
Jun 10, 2026
Merged

Fix crash when active field-expiry leaves a single-entry HT vset bucket#3950
ranshid merged 1 commit into
valkey-io:unstablefrom
ranshid:repro-hfe-bug

Conversation

@ranshid

@ranshid ranshid commented Jun 9, 2026

Copy link
Copy Markdown
Member

When a hash has many fields whose expirations fall in the same volatile-set time-bucket (>127 entries), that bucket is encoded as a hashtable (HT). The active field-expiry path drains expired entries via vsetBucketRemoveExpired_HASHTABLE(), but it is bounded by the per-key expire quota (max_count, from dbReclaimExpiredFields). When the quota stops the drain one entry short, the function only collapsed the bucket to NONE at size 0 and left it as an HT bucket holding a single entry.

That violates the invariant enforced by removeFromBucket_HASHTABLE(), which asserts hashtableSize(ht) > 0 after a delete and downgrades an HT bucket to SINGLE at size 1. A later normal removal of that lone field -- HDEL, or a cross-bucket HSETEX update routed through removeEntryFromRaxBucket() -- deletes the sole entry, drops the size 1 -> 0, and trips the assertion.

Fix: Apply the same downgrade in the expiry path: when draining leaves exactly one entry, downgrade the HT bucket to SINGLE so the invariant holds. Behavior is unchanged when the bucket drains fully (-> NONE) or retains two or more entries (-> HT, as the normal removal path already tolerates).

Add a regression test that fills one time-bucket past the HT threshold, expires all but one entry, then removes the survivor through the normal path; it crashed before this fix and passes after.

When a hash has many fields whose expirations fall in the same volatile-set
time-bucket (>127 entries), that bucket is encoded as a hashtable (HT). The
active field-expiry path drains expired entries via
vsetBucketRemoveExpired_HASHTABLE(), but it is bounded by the per-key expire
quota (max_count, from dbReclaimExpiredFields). When the quota stops the drain
one entry short, the function only collapsed the bucket to NONE at size 0 and
left it as an HT bucket holding a single entry.

That violates the invariant enforced by removeFromBucket_HASHTABLE(), which
asserts hashtableSize(ht) > 0 after a delete and downgrades an HT bucket to
SINGLE at size 1. A later normal removal of that lone field -- HDEL, or a
cross-bucket HSETEX update routed through removeEntryFromRaxBucket() -- deletes
the sole entry, drops the size 1 -> 0, and trips the assertion.

Fix: Apply the same downgrade in the expiry path: when draining leaves exactly one
entry, downgrade the HT bucket to SINGLE so the invariant holds. Behavior is
unchanged when the bucket drains fully (-> NONE) or retains two or more entries
(-> HT, as the normal removal path already tolerates).

Add a regression test that fills one time-bucket past the HT threshold, expires
all but one entry, then removes the survivor through the normal path; it crashed
before this fix and passes after.

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7d0ef7b6-8e1b-4c2a-9b4e-89b80a7edf77

📥 Commits

Reviewing files that changed from the base of the PR and between d0ffbab and 9b5df50.

📒 Files selected for processing (2)
  • src/unit/test_vset.cpp
  • src/vset.c

📝 Walkthrough

Walkthrough

This PR fixes an invariant violation in vset bucket transitions by collapsing or downgrading hashtable-encoded buckets after expired entry removal. The expired-entry drain path now checks the remaining hashtable size and either releases the hashtable entirely (setting bucket to NONE) or downgrades it to SINGLE type (extracting the last entry). A regression test validates the specific scenario where expiration leaves one entry, then removes it via the normal path.

Changes

Hashtable bucket collapse on expired entry removal

Layer / File(s) Summary
Hashtable bucket collapse and downgrade after expiration
src/vset.c, src/unit/test_vset.cpp
After vsetRemoveExpired deletes expired entries from a VSET_BUCKET_HT, the function checks remaining hashtable size and transitions the bucket: size 0 releases the hashtable and sets bucket to NONE; size 1 releases the hashtable and downgrades to SINGLE by extracting the remaining entry. A new regression test (TestVsetRemoveExpiredLeavesHtBucketSizeOne) creates entries past the HT threshold with identical expiry, removes expired entries with a quota leaving exactly one HT entry, then removes that survivor via vsetRemoveEntry to verify invariants hold and the vset becomes empty.

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main fix: addressing a crash that occurs when active field-expiry leaves a single-entry HT vset bucket, which is the core issue resolved by this changeset.
Description check ✅ Passed The description is directly related to the changeset, explaining the root cause, the fix applied, and the regression test added to verify the solution.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Infer (1.2.0)
src/unit/test_vset.cpp

src/unit/test_vset.cpp:7:10: fatal error: 'generated_wrappers.hpp' file not found
7 | #include "generated_wrappers.hpp"
| ^~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Error: the following clang command did not run successfully:
/opt/infer-linux-x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/bin/clang-18
@/tmp/coderabbit-infer/9b5df5027f708542ede2f5d44f3eff00724201d1-a88611519f43b40e/tmp/clang_command_.tmp.2f139a.txt
++Contents of '/tmp/coderabbit-infer/9b5df5027f708542ede2f5d44f3eff00724201d1-a88611519f43b40e/tmp/clang_command_.tmp.2f139a.txt':
"-cc1" "-load"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../../facebook-clang-plugins/libtooling/build/FacebookClangPlugin.dylib"
"-add-plugin" "BiniouASTExporter" "-plugin-arg-BiniouASTExporter" "-"
"-plugin-arg-BiniouASTExporter" "PREPEND_CURRENT_DIR=1"
"-plugin-arg-BiniouASTExporter" "MAX_STRING_SIZE=65535" "-cc1" "-triple"
"x86_64-unknown-linux-gnu" "-emit-obj" "-mre

... [truncated 1085 characters] ...

lang/install/lib/clang/18/include"
"-internal-isystem" "/usr/local/include" "-internal-isystem"
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
"-internal-externc-isystem" "/usr/include/x86_64-linux-gnu"
"-internal-externc-isystem" "/include" "-internal-externc-isystem"
"/usr/include" "-Wno-ignored-optimization-argument" "-Wno-everything"
"-fdeprecated-macro" "-ferror-limit" "19" "-fgnuc-version=4.2.1"
"-fskip-odr-check-in-gmf" "-fcxx-exceptions" "-fexceptions"
"-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o"
"/tmp/coderabbit-infer/a88611519f43b40e/file.o" "-x" "c++"
"src/unit/test_vset.cpp" "-O0" "-fno-builtin" "-include"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../lib/clang_wrappers/global_defines.h"
"-Wno-everything"


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Jun 9, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.57%. Comparing base (69004ca) to head (9b5df50).
⚠️ Report is 1 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3950      +/-   ##
============================================
- Coverage     76.66%   76.57%   -0.09%     
============================================
  Files           162      162              
  Lines         80733    80757      +24     
============================================
- Hits          61895    61842      -53     
- Misses        18838    18915      +77     
Files with missing lines Coverage Δ
src/unit/test_vset.cpp 99.32% <100.00%> (+0.04%) ⬆️
src/vset.c 88.08% <100.00%> (-0.25%) ⬇️

... and 18 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@enjoy-binbin enjoy-binbin moved this to To be backported in Valkey 9.0 Jun 10, 2026
@enjoy-binbin enjoy-binbin moved this to To be backported in Valkey 9.1 Jun 10, 2026
@ranshid

ranshid commented Jun 10, 2026

Copy link
Copy Markdown
Member Author

Tnank you @enjoy-binbin !
we will probably have to backport WITHOUT the unitests, but we need to

@ranshid ranshid merged commit e1bc58c into valkey-io:unstable Jun 10, 2026
64 checks passed
ranshid added a commit that referenced this pull request Jun 10, 2026
#3957)

backport of (#3950)

When a hash has many fields whose expirations fall in the same
volatile-set time-bucket (>127 entries), that bucket is encoded as a
hashtable (HT). The active field-expiry path drains expired entries via
vsetBucketRemoveExpired_HASHTABLE(), but it is bounded by the per-key
expire quota (max_count, from dbReclaimExpiredFields). When the quota
stops the drain one entry short, the function only collapsed the bucket
to NONE at size 0 and left it as an HT bucket holding a single entry.

That violates the invariant enforced by removeFromBucket_HASHTABLE(),
which asserts hashtableSize(ht) > 0 after a delete and downgrades an HT
bucket to SINGLE at size 1. A later normal removal of that lone field --
HDEL, or a cross-bucket HSETEX update routed through
removeEntryFromRaxBucket() -- deletes the sole entry, drops the size 1
-> 0, and trips the assertion.

Fix: Apply the same downgrade in the expiry path: when draining leaves
exactly one entry, downgrade the HT bucket to SINGLE so the invariant
holds. Behavior is unchanged when the bucket drains fully (-> NONE) or
retains two or more entries (-> HT, as the normal removal path already
tolerates).

Add a regression test that fills one time-bucket past the HT threshold,
expires all but one entry, then removes the survivor through the normal
path; it crashed before this fix and passes after.

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
ranshid added a commit that referenced this pull request Jun 10, 2026
#3958)

backport of (#3950)

When a hash has many fields whose expirations fall in the same
volatile-set time-bucket (>127 entries), that bucket is encoded as a
hashtable (HT). The active field-expiry path drains expired entries via
vsetBucketRemoveExpired_HASHTABLE(), but it is bounded by the per-key
expire quota (max_count, from dbReclaimExpiredFields). When the quota
stops the drain one entry short, the function only collapsed the bucket
to NONE at size 0 and left it as an HT bucket holding a single entry.

That violates the invariant enforced by removeFromBucket_HASHTABLE(),
which asserts hashtableSize(ht) > 0 after a delete and downgrades an HT
bucket to SINGLE at size 1. A later normal removal of that lone field --
HDEL, or a cross-bucket HSETEX update routed through
removeEntryFromRaxBucket() -- deletes the sole entry, drops the size 1
-> 0, and trips the assertion.

Fix: Apply the same downgrade in the expiry path: when draining leaves
exactly one entry, downgrade the HT bucket to SINGLE so the invariant
holds. Behavior is unchanged when the bucket drains fully (-> NONE) or
retains two or more entries (-> HT, as the normal removal path already
tolerates).

Add a regression test that fills one time-bucket past the HT threshold,
expires all but one entry, then removes the survivor through the normal
path; it crashed before this fix and passes after.

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request Jun 23, 2026
…et (#3950)

When a hash has many fields whose expirations fall in the same
volatile-set time-bucket (>127 entries), that bucket is encoded as a
hashtable (HT). The active field-expiry path drains expired entries via
vsetBucketRemoveExpired_HASHTABLE(), but it is bounded by the per-key
expire quota (max_count, from dbReclaimExpiredFields). When the quota
stops the drain one entry short, the function only collapsed the bucket
to NONE at size 0 and left it as an HT bucket holding a single entry.

That violates the invariant enforced by removeFromBucket_HASHTABLE(),
which asserts hashtableSize(ht) > 0 after a delete and downgrades an HT
bucket to SINGLE at size 1. A later normal removal of that lone field --
HDEL, or a cross-bucket HSETEX update routed through
removeEntryFromRaxBucket() -- deletes the sole entry, drops the size 1
-> 0, and trips the assertion.

Fix: Apply the same downgrade in the expiry path: when draining leaves
exactly one entry, downgrade the HT bucket to SINGLE so the invariant
holds. Behavior is unchanged when the bucket drains fully (-> NONE) or
retains two or more entries (-> HT, as the normal removal path already
tolerates).

Add a regression test that fills one time-bucket past the HT threshold,
expires all but one entry, then removes the survivor through the normal
path; it crashed before this fix and passes after.

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: To be backported
Status: To be backported

Development

Successfully merging this pull request may close these issues.

2 participants