Sharded pubsub command execution within multi/exec#13
Merged
Conversation
madolson
reviewed
Mar 22, 2024
zuiderkwast
reviewed
Mar 25, 2024
zuiderkwast
approved these changes
Mar 26, 2024
madolson
approved these changes
Mar 27, 2024
PatrickJS
pushed a commit
to PatrickJS/placeholderkv
that referenced
this pull request
Apr 24, 2024
Allow SPUBLISH command within multi/exec on replica.
zuiderkwast
pushed a commit
that referenced
this pull request
Jun 25, 2025
**Current state**
During `hashtableScanDefrag`, rehashing is paused to prevent entries
from moving, but the scan callback can still delete entries which
triggers `hashtableShrinkIfNeeded`. For example, the
`expireScanCallback` can delete expired entries.
**Issue**
This can cause the table to be resized and the old memory to be freed
while the scan is still accessing it, resulting in the following memory
access violation:
```
[err]: Sanitizer error: =================================================================
==46774==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000003100 at pc 0x0000004704d3 bp 0x7fffcb062000 sp 0x7fffcb061ff0
READ of size 1 at 0x611000003100 thread T0
#0 0x4704d2 in isPositionFilled /home/gusakovy/Projects/valkey/src/hashtable.c:422
#1 0x478b45 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1768
#2 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#3 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#4 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#5 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#6 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#7 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#8 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#9 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#10 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#11 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
#12 0x452e39 in _start (/local/home/gusakovy/Projects/valkey/src/valkey-server+0x452e39)
0x611000003100 is located 0 bytes inside of 256-byte region [0x611000003100,0x611000003200)
freed by thread T0 here:
#0 0x7f471a34a1e5 in __interceptor_free (/lib64/libasan.so.4+0xd81e5)
#1 0x4aefbc in zfree_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:400
#2 0x4aeff5 in valkey_free /home/gusakovy/Projects/valkey/src/zmalloc.c:415
#3 0x4707d2 in rehashingCompleted /home/gusakovy/Projects/valkey/src/hashtable.c:456
#4 0x471b5b in resize /home/gusakovy/Projects/valkey/src/hashtable.c:656
#5 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
#6 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
#7 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
#8 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
#9 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
#10 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
#11 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
#12 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
#13 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
#14 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
#15 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#16 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#17 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#18 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#19 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#20 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#21 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#22 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#23 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#24 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
previously allocated by thread T0 here:
#0 0x7f471a34a753 in __interceptor_calloc (/lib64/libasan.so.4+0xd8753)
#1 0x4ae48c in ztrycalloc_usable_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:214
#2 0x4ae757 in valkey_calloc /home/gusakovy/Projects/valkey/src/zmalloc.c:257
#3 0x4718fc in resize /home/gusakovy/Projects/valkey/src/hashtable.c:645
#4 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
#5 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
#6 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
#7 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
#8 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
#9 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
#10 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
#11 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
#12 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
#13 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
#14 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#15 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#16 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#17 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#18 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#19 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#20 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#21 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#22 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#23 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
SUMMARY: AddressSanitizer: heap-use-after-free /home/gusakovy/Projects/valkey/src/hashtable.c:422 in isPositionFilled
Shadow bytes around the buggy address:
0x0c227fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff85f0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x0c227fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8610: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
=>0x0c227fff8620:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8630: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8640: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8660: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8670: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==46774==ABORTING
```
**Solution**
Suggested solution is to also pause auto shrinking during
`hashtableScanDefrag`. I noticed that there was already a
`hashtablePauseAutoShrink` method and `pause_auto_shrink` counter, but
it wasn't actually used in `hashtableShrinkIfNeeded` so I fixed that.
**Testing**
I created a simple tcl test that (most of the times) triggers this
error, but it's a little clunky so I didn't add it as part of the PR:
```
start_server {tags {"expire hashtable defrag"}} {
test {hashtable scan defrag on expiry} {
r config set hz 100
set num_keys 20
for {set i 0} {$i < $num_keys} {incr i} {
r set "key_$i" "value_$i"
}
for {set j 0} {$j < 50} {incr j} {
set expire_keys 100
for {set i 0} {$i < $expire_keys} {incr i} {
# Short expiry time to ensure they expire quickly
r psetex "expire_key_${i}_${j}" 100 "expire_value_${i}_${j}"
}
# Verify keys are set
set initial_size [r dbsize]
assert_equal $initial_size [expr $num_keys + $expire_keys]
after 150
for {set i 0} {$i < 10} {incr i} {
r get "expire_key_${i}_${j}"
after 10
}
}
set remaining_keys [r dbsize]
assert_equal $remaining_keys $num_keys
# Verify server is still responsive
assert_equal [r ping] {PONG}
} {}
}
```
Compiling with ASAN using `make noopt SANITIZER=address valkey-server`
and running the test causes error above. Applying the fix resolves the
issue.
Signed-off-by: Yakov Gusakov <yaakov0015@gmail.com>
ranshid
pushed a commit
to ranshid/valkey
that referenced
this pull request
Sep 30, 2025
…y-io#2257) **Current state** During `hashtableScanDefrag`, rehashing is paused to prevent entries from moving, but the scan callback can still delete entries which triggers `hashtableShrinkIfNeeded`. For example, the `expireScanCallback` can delete expired entries. **Issue** This can cause the table to be resized and the old memory to be freed while the scan is still accessing it, resulting in the following memory access violation: ``` [err]: Sanitizer error: ================================================================= ==46774==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000003100 at pc 0x0000004704d3 bp 0x7fffcb062000 sp 0x7fffcb061ff0 READ of size 1 at 0x611000003100 thread T0 #0 0x4704d2 in isPositionFilled /home/gusakovy/Projects/valkey/src/hashtable.c:422 #1 0x478b45 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1768 #2 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729 #3 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402 #4 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297 #5 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269 #6 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577 #7 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370 #8 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513 valkey-io#9 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543 valkey-io#10 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291 valkey-io#11 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139) valkey-io#12 0x452e39 in _start (/local/home/gusakovy/Projects/valkey/src/valkey-server+0x452e39) 0x611000003100 is located 0 bytes inside of 256-byte region [0x611000003100,0x611000003200) freed by thread T0 here: #0 0x7f471a34a1e5 in __interceptor_free (/lib64/libasan.so.4+0xd81e5) #1 0x4aefbc in zfree_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:400 #2 0x4aeff5 in valkey_free /home/gusakovy/Projects/valkey/src/zmalloc.c:415 #3 0x4707d2 in rehashingCompleted /home/gusakovy/Projects/valkey/src/hashtable.c:456 #4 0x471b5b in resize /home/gusakovy/Projects/valkey/src/hashtable.c:656 #5 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272 #6 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448 #7 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459 #8 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847 valkey-io#9 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490 valkey-io#10 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831 valkey-io#11 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844 valkey-io#12 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70 valkey-io#13 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139 valkey-io#14 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770 valkey-io#15 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729 valkey-io#16 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402 valkey-io#17 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297 valkey-io#18 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269 valkey-io#19 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577 valkey-io#20 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370 valkey-io#21 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513 valkey-io#22 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543 valkey-io#23 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291 valkey-io#24 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139) previously allocated by thread T0 here: #0 0x7f471a34a753 in __interceptor_calloc (/lib64/libasan.so.4+0xd8753) #1 0x4ae48c in ztrycalloc_usable_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:214 #2 0x4ae757 in valkey_calloc /home/gusakovy/Projects/valkey/src/zmalloc.c:257 #3 0x4718fc in resize /home/gusakovy/Projects/valkey/src/hashtable.c:645 #4 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272 #5 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448 #6 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459 #7 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847 #8 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490 valkey-io#9 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831 valkey-io#10 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844 valkey-io#11 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70 valkey-io#12 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139 valkey-io#13 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770 valkey-io#14 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729 valkey-io#15 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402 valkey-io#16 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297 valkey-io#17 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269 valkey-io#18 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577 valkey-io#19 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370 valkey-io#20 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513 valkey-io#21 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543 valkey-io#22 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291 valkey-io#23 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139) SUMMARY: AddressSanitizer: heap-use-after-free /home/gusakovy/Projects/valkey/src/hashtable.c:422 in isPositionFilled Shadow bytes around the buggy address: 0x0c227fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c227fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c227fff85f0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd 0x0c227fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c227fff8610: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa =>0x0c227fff8620:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c227fff8630: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c227fff8640: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c227fff8650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c227fff8660: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c227fff8670: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==46774==ABORTING ``` **Solution** Suggested solution is to also pause auto shrinking during `hashtableScanDefrag`. I noticed that there was already a `hashtablePauseAutoShrink` method and `pause_auto_shrink` counter, but it wasn't actually used in `hashtableShrinkIfNeeded` so I fixed that. **Testing** I created a simple tcl test that (most of the times) triggers this error, but it's a little clunky so I didn't add it as part of the PR: ``` start_server {tags {"expire hashtable defrag"}} { test {hashtable scan defrag on expiry} { r config set hz 100 set num_keys 20 for {set i 0} {$i < $num_keys} {incr i} { r set "key_$i" "value_$i" } for {set j 0} {$j < 50} {incr j} { set expire_keys 100 for {set i 0} {$i < $expire_keys} {incr i} { # Short expiry time to ensure they expire quickly r psetex "expire_key_${i}_${j}" 100 "expire_value_${i}_${j}" } # Verify keys are set set initial_size [r dbsize] assert_equal $initial_size [expr $num_keys + $expire_keys] after 150 for {set i 0} {$i < 10} {incr i} { r get "expire_key_${i}_${j}" after 10 } } set remaining_keys [r dbsize] assert_equal $remaining_keys $num_keys # Verify server is still responsive assert_equal [r ping] {PONG} } {} } ``` Compiling with ASAN using `make noopt SANITIZER=address valkey-server` and running the test causes error above. Applying the fix resolves the issue. Signed-off-by: Yakov Gusakov <yaakov0015@gmail.com>
zuiderkwast
pushed a commit
that referenced
this pull request
Oct 1, 2025
**Current state**
During `hashtableScanDefrag`, rehashing is paused to prevent entries
from moving, but the scan callback can still delete entries which
triggers `hashtableShrinkIfNeeded`. For example, the
`expireScanCallback` can delete expired entries.
**Issue**
This can cause the table to be resized and the old memory to be freed
while the scan is still accessing it, resulting in the following memory
access violation:
```
[err]: Sanitizer error: =================================================================
==46774==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000003100 at pc 0x0000004704d3 bp 0x7fffcb062000 sp 0x7fffcb061ff0
READ of size 1 at 0x611000003100 thread T0
#0 0x4704d2 in isPositionFilled /home/gusakovy/Projects/valkey/src/hashtable.c:422
#1 0x478b45 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1768
#2 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#3 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#4 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#5 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#6 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#7 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#8 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#9 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#10 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#11 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
#12 0x452e39 in _start (/local/home/gusakovy/Projects/valkey/src/valkey-server+0x452e39)
0x611000003100 is located 0 bytes inside of 256-byte region [0x611000003100,0x611000003200)
freed by thread T0 here:
#0 0x7f471a34a1e5 in __interceptor_free (/lib64/libasan.so.4+0xd81e5)
#1 0x4aefbc in zfree_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:400
#2 0x4aeff5 in valkey_free /home/gusakovy/Projects/valkey/src/zmalloc.c:415
#3 0x4707d2 in rehashingCompleted /home/gusakovy/Projects/valkey/src/hashtable.c:456
#4 0x471b5b in resize /home/gusakovy/Projects/valkey/src/hashtable.c:656
#5 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
#6 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
#7 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
#8 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
#9 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
#10 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
#11 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
#12 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
#13 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
#14 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
#15 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#16 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#17 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#18 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#19 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#20 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#21 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#22 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#23 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#24 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
previously allocated by thread T0 here:
#0 0x7f471a34a753 in __interceptor_calloc (/lib64/libasan.so.4+0xd8753)
#1 0x4ae48c in ztrycalloc_usable_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:214
#2 0x4ae757 in valkey_calloc /home/gusakovy/Projects/valkey/src/zmalloc.c:257
#3 0x4718fc in resize /home/gusakovy/Projects/valkey/src/hashtable.c:645
#4 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
#5 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
#6 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
#7 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
#8 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
#9 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
#10 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
#11 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
#12 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
#13 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
#14 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#15 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#16 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#17 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#18 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#19 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#20 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#21 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#22 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#23 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
SUMMARY: AddressSanitizer: heap-use-after-free /home/gusakovy/Projects/valkey/src/hashtable.c:422 in isPositionFilled
Shadow bytes around the buggy address:
0x0c227fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff85f0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x0c227fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8610: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
=>0x0c227fff8620:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8630: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8640: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8660: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8670: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==46774==ABORTING
```
**Solution**
Suggested solution is to also pause auto shrinking during
`hashtableScanDefrag`. I noticed that there was already a
`hashtablePauseAutoShrink` method and `pause_auto_shrink` counter, but
it wasn't actually used in `hashtableShrinkIfNeeded` so I fixed that.
**Testing**
I created a simple tcl test that (most of the times) triggers this
error, but it's a little clunky so I didn't add it as part of the PR:
```
start_server {tags {"expire hashtable defrag"}} {
test {hashtable scan defrag on expiry} {
r config set hz 100
set num_keys 20
for {set i 0} {$i < $num_keys} {incr i} {
r set "key_$i" "value_$i"
}
for {set j 0} {$j < 50} {incr j} {
set expire_keys 100
for {set i 0} {$i < $expire_keys} {incr i} {
# Short expiry time to ensure they expire quickly
r psetex "expire_key_${i}_${j}" 100 "expire_value_${i}_${j}"
}
# Verify keys are set
set initial_size [r dbsize]
assert_equal $initial_size [expr $num_keys + $expire_keys]
after 150
for {set i 0} {$i < 10} {incr i} {
r get "expire_key_${i}_${j}"
after 10
}
}
set remaining_keys [r dbsize]
assert_equal $remaining_keys $num_keys
# Verify server is still responsive
assert_equal [r ping] {PONG}
} {}
}
```
Compiling with ASAN using `make noopt SANITIZER=address valkey-server`
and running the test causes error above. Applying the fix resolves the
issue.
Signed-off-by: Yakov Gusakov <yaakov0015@gmail.com>
hpatro
added a commit
that referenced
this pull request
Oct 8, 2025
With #1401, we introduced additional filters to CLIENT LIST/KILL subcommand. The intended behavior was to pick the last value of the filter. However, we introduced memory leak for all the preceding filters. Before this change: ``` > CLIENT LIST IP 127.0.0.1 IP 127.0.0.1 id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0 ``` Leak: ``` Direct leak of 11 byte(s) in 1 object(s) allocated from: #0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d) #1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156 #2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200 #3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113 #4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264 #5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600 #6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772 #7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434 #8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571 #9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702 #10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812 #11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79 #12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301 #13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486 #14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543 #15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319 #16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139) ``` Note: For filter ID / NOT-ID we group all the option and perform filtering whereas for remaining filters we only pick the last filter option. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
roshkhatri
referenced
this pull request
in roshkhatri/valkey
Oct 14, 2025
With valkey-io#1401, we introduced additional filters to CLIENT LIST/KILL subcommand. The intended behavior was to pick the last value of the filter. However, we introduced memory leak for all the preceding filters. Before this change: ``` > CLIENT LIST IP 127.0.0.1 IP 127.0.0.1 id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0 ``` Leak: ``` Direct leak of 11 byte(s) in 1 object(s) allocated from: #0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d) #1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156 #2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200 #3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113 #4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264 #5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600 #6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772 #7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434 #8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571 #9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702 #10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812 #11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79 #12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301 #13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486 #14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543 #15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319 #16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139) ``` Note: For filter ID / NOT-ID we group all the option and perform filtering whereas for remaining filters we only pick the last filter option. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
cherukum-Amazon
pushed a commit
to cherukum-Amazon/valkey
that referenced
this pull request
Oct 16, 2025
With valkey-io#1401, we introduced additional filters to CLIENT LIST/KILL subcommand. The intended behavior was to pick the last value of the filter. However, we introduced memory leak for all the preceding filters. Before this change: ``` > CLIENT LIST IP 127.0.0.1 IP 127.0.0.1 id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0 ``` Leak: ``` Direct leak of 11 byte(s) in 1 object(s) allocated from: #0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d) valkey-io#1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156 valkey-io#2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200 valkey-io#3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113 valkey-io#4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264 valkey-io#5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600 valkey-io#6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772 valkey-io#7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434 valkey-io#8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571 valkey-io#9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702 valkey-io#10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812 valkey-io#11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79 valkey-io#12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301 valkey-io#13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486 valkey-io#14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543 valkey-io#15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319 valkey-io#16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139) ``` Note: For filter ID / NOT-ID we group all the option and perform filtering whereas for remaining filters we only pick the last filter option. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> (cherry picked from commit 155b0bb)
cherukum-Amazon
pushed a commit
to cherukum-Amazon/valkey
that referenced
this pull request
Oct 17, 2025
With valkey-io#1401, we introduced additional filters to CLIENT LIST/KILL subcommand. The intended behavior was to pick the last value of the filter. However, we introduced memory leak for all the preceding filters. Before this change: ``` > CLIENT LIST IP 127.0.0.1 IP 127.0.0.1 id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0 ``` Leak: ``` Direct leak of 11 byte(s) in 1 object(s) allocated from: #0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d) valkey-io#1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156 valkey-io#2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200 valkey-io#3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113 valkey-io#4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264 valkey-io#5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600 valkey-io#6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772 valkey-io#7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434 valkey-io#8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571 valkey-io#9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702 valkey-io#10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812 valkey-io#11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79 valkey-io#12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301 valkey-io#13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486 valkey-io#14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543 valkey-io#15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319 valkey-io#16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139) ``` Note: For filter ID / NOT-ID we group all the option and perform filtering whereas for remaining filters we only pick the last filter option. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> (cherry picked from commit 155b0bb) Signed-off-by: cherukum-amazon <cherukum@amazon.com>
cherukum-Amazon
pushed a commit
to cherukum-Amazon/valkey
that referenced
this pull request
Oct 19, 2025
With valkey-io#1401, we introduced additional filters to CLIENT LIST/KILL subcommand. The intended behavior was to pick the last value of the filter. However, we introduced memory leak for all the preceding filters. Before this change: ``` > CLIENT LIST IP 127.0.0.1 IP 127.0.0.1 id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0 ``` Leak: ``` Direct leak of 11 byte(s) in 1 object(s) allocated from: #0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d) valkey-io#1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156 valkey-io#2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200 valkey-io#3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113 valkey-io#4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264 valkey-io#5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600 valkey-io#6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772 valkey-io#7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434 valkey-io#8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571 valkey-io#9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702 valkey-io#10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812 valkey-io#11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79 valkey-io#12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301 valkey-io#13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486 valkey-io#14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543 valkey-io#15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319 valkey-io#16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139) ``` Note: For filter ID / NOT-ID we group all the option and perform filtering whereas for remaining filters we only pick the last filter option. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> (cherry picked from commit 155b0bb) Signed-off-by: cherukum-amazon <cherukum@amazon.com>
cherukum-Amazon
pushed a commit
to cherukum-Amazon/valkey
that referenced
this pull request
Oct 21, 2025
With valkey-io#1401, we introduced additional filters to CLIENT LIST/KILL subcommand. The intended behavior was to pick the last value of the filter. However, we introduced memory leak for all the preceding filters. Before this change: ``` > CLIENT LIST IP 127.0.0.1 IP 127.0.0.1 id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0 ``` Leak: ``` Direct leak of 11 byte(s) in 1 object(s) allocated from: #0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d) valkey-io#1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156 valkey-io#2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200 valkey-io#3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113 valkey-io#4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264 valkey-io#5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600 valkey-io#6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772 valkey-io#7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434 valkey-io#8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571 valkey-io#9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702 valkey-io#10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812 valkey-io#11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79 valkey-io#12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301 valkey-io#13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486 valkey-io#14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543 valkey-io#15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319 valkey-io#16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139) ``` Note: For filter ID / NOT-ID we group all the option and perform filtering whereas for remaining filters we only pick the last filter option. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> (cherry picked from commit 155b0bb) Signed-off-by: cherukum-amazon <cherukum@amazon.com>
madolson
pushed a commit
that referenced
this pull request
Oct 21, 2025
With #1401, we introduced additional filters to CLIENT LIST/KILL subcommand. The intended behavior was to pick the last value of the filter. However, we introduced memory leak for all the preceding filters. Before this change: ``` > CLIENT LIST IP 127.0.0.1 IP 127.0.0.1 id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0 ``` Leak: ``` Direct leak of 11 byte(s) in 1 object(s) allocated from: #0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d) #1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156 #2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200 #3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113 #4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264 #5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600 #6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772 #7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434 #8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571 #9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702 #10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812 #11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79 #12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301 #13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486 #14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543 #15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319 #16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139) ``` Note: For filter ID / NOT-ID we group all the option and perform filtering whereas for remaining filters we only pick the last filter option. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> (cherry picked from commit 155b0bb) Signed-off-by: cherukum-amazon <cherukum@amazon.com>
enjoy-binbin
pushed a commit
that referenced
this pull request
Feb 8, 2026
) I was working on ASAN large memory tests when I countered this issue. The issue was that the hardcoded `999` key could land in an early bucket. Then shrink rehash could finish early, and later inserts could trigger a new expansion rehash, resetting rehash_idx low. The test now picks the survivor key dynamically as the key mapped to the highest bucket index. ``` [test_hashtable.c] Memory leak detected of 336 bytes ================================================================= ==3901==ERROR: LeakSanitizer: detected memory leaks Direct leak of 80 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd9c7 in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69 #1 0x563bfdf4c47d in ztrymalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:156 #2 0x563bfdf4c47d in valkey_malloc /home/runner/work/valkey/valkey/src/zmalloc.c:185 #3 0x563bfdd42eaf in hashtableCreate /home/runner/work/valkey/valkey/src/hashtable.c:1217 #4 0x563bfdaa1cbf in test_empty_buckets_rehashing unit/test_hashtable.c:232 #5 0x563bfdae772b in runTestSuite unit/test_main.c:36 #6 0x563bfda86b20 in main unit/test_main.c:108 #7 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) #8 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) #9 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 128 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd40967 in resize /home/runner/work/valkey/valkey/src/hashtable.c:741 #4 0x563bfdd45eb1 in hashtableExpandIfNeeded /home/runner/work/valkey/valkey/src/hashtable.c:1446 #5 0x563bfdd45eb1 in hashtableExpandIfNeeded /home/runner/work/valkey/valkey/src/hashtable.c:1433 #6 0x563bfdd45eb1 in insert /home/runner/work/valkey/valkey/src/hashtable.c:1041 #7 0x563bfdd45eb1 in hashtableAddOrFind /home/runner/work/valkey/valkey/src/hashtable.c:1554 #8 0x563bfdd45eb1 in hashtableAdd /home/runner/work/valkey/valkey/src/hashtable.c:1539 #9 0x563bfdaa1e3b in test_empty_buckets_rehashing unit/test_hashtable.c:254 #10 0x563bfdae772b in runTestSuite unit/test_main.c:36 #11 0x563bfda86b20 in main unit/test_main.c:108 #12 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) #13 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) #14 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 64 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd3f553 in bucketConvertToChained /home/runner/work/valkey/valkey/src/hashtable.c:908 #4 0x563bfdd3f553 in findBucketForInsert /home/runner/work/valkey/valkey/src/hashtable.c:1021 #5 0x563bfdd45d9e in insert /home/runner/work/valkey/valkey/src/hashtable.c:1045 #6 0x563bfdd45d9e in hashtableAddOrFind /home/runner/work/valkey/valkey/src/hashtable.c:1554 #7 0x563bfdd45d9e in hashtableAdd /home/runner/work/valkey/valkey/src/hashtable.c:1539 #8 0x563bfdaa1e3b in test_empty_buckets_rehashing unit/test_hashtable.c:254 #9 0x563bfdae772b in runTestSuite unit/test_main.c:36 #10 0x563bfda86b20 in main unit/test_main.c:108 #11 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) #12 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) #13 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 64 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd40967 in resize /home/runner/work/valkey/valkey/src/hashtable.c:741 #4 0x563bfdaa1df8 in test_empty_buckets_rehashing unit/test_hashtable.c:248 #5 0x563bfdae772b in runTestSuite unit/test_main.c:36 #6 0x563bfda86b20 in main unit/test_main.c:108 #7 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) #8 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) #9 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) SUMMARY: AddressSanitizer: 336 byte(s) leaked in 4 allocation(s). ``` Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
harrylin98
pushed a commit
to harrylin98/valkey_forked
that referenced
this pull request
Feb 19, 2026
…lkey-io#3174) I was working on ASAN large memory tests when I countered this issue. The issue was that the hardcoded `999` key could land in an early bucket. Then shrink rehash could finish early, and later inserts could trigger a new expansion rehash, resetting rehash_idx low. The test now picks the survivor key dynamically as the key mapped to the highest bucket index. ``` [test_hashtable.c] Memory leak detected of 336 bytes ================================================================= ==3901==ERROR: LeakSanitizer: detected memory leaks Direct leak of 80 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd9c7 in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69 #1 0x563bfdf4c47d in ztrymalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:156 #2 0x563bfdf4c47d in valkey_malloc /home/runner/work/valkey/valkey/src/zmalloc.c:185 #3 0x563bfdd42eaf in hashtableCreate /home/runner/work/valkey/valkey/src/hashtable.c:1217 valkey-io#4 0x563bfdaa1cbf in test_empty_buckets_rehashing unit/test_hashtable.c:232 valkey-io#5 0x563bfdae772b in runTestSuite unit/test_main.c:36 valkey-io#6 0x563bfda86b20 in main unit/test_main.c:108 valkey-io#7 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#8 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#9 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 128 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd40967 in resize /home/runner/work/valkey/valkey/src/hashtable.c:741 valkey-io#4 0x563bfdd45eb1 in hashtableExpandIfNeeded /home/runner/work/valkey/valkey/src/hashtable.c:1446 valkey-io#5 0x563bfdd45eb1 in hashtableExpandIfNeeded /home/runner/work/valkey/valkey/src/hashtable.c:1433 valkey-io#6 0x563bfdd45eb1 in insert /home/runner/work/valkey/valkey/src/hashtable.c:1041 valkey-io#7 0x563bfdd45eb1 in hashtableAddOrFind /home/runner/work/valkey/valkey/src/hashtable.c:1554 valkey-io#8 0x563bfdd45eb1 in hashtableAdd /home/runner/work/valkey/valkey/src/hashtable.c:1539 valkey-io#9 0x563bfdaa1e3b in test_empty_buckets_rehashing unit/test_hashtable.c:254 valkey-io#10 0x563bfdae772b in runTestSuite unit/test_main.c:36 valkey-io#11 0x563bfda86b20 in main unit/test_main.c:108 valkey-io#12 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#13 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#14 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 64 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd3f553 in bucketConvertToChained /home/runner/work/valkey/valkey/src/hashtable.c:908 valkey-io#4 0x563bfdd3f553 in findBucketForInsert /home/runner/work/valkey/valkey/src/hashtable.c:1021 valkey-io#5 0x563bfdd45d9e in insert /home/runner/work/valkey/valkey/src/hashtable.c:1045 valkey-io#6 0x563bfdd45d9e in hashtableAddOrFind /home/runner/work/valkey/valkey/src/hashtable.c:1554 valkey-io#7 0x563bfdd45d9e in hashtableAdd /home/runner/work/valkey/valkey/src/hashtable.c:1539 valkey-io#8 0x563bfdaa1e3b in test_empty_buckets_rehashing unit/test_hashtable.c:254 valkey-io#9 0x563bfdae772b in runTestSuite unit/test_main.c:36 valkey-io#10 0x563bfda86b20 in main unit/test_main.c:108 valkey-io#11 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#12 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#13 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 64 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd40967 in resize /home/runner/work/valkey/valkey/src/hashtable.c:741 valkey-io#4 0x563bfdaa1df8 in test_empty_buckets_rehashing unit/test_hashtable.c:248 valkey-io#5 0x563bfdae772b in runTestSuite unit/test_main.c:36 valkey-io#6 0x563bfda86b20 in main unit/test_main.c:108 valkey-io#7 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#8 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#9 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) SUMMARY: AddressSanitizer: 336 byte(s) leaked in 4 allocation(s). ``` Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
hpatro
pushed a commit
to hpatro/valkey
that referenced
this pull request
Mar 5, 2026
…lkey-io#3174) I was working on ASAN large memory tests when I countered this issue. The issue was that the hardcoded `999` key could land in an early bucket. Then shrink rehash could finish early, and later inserts could trigger a new expansion rehash, resetting rehash_idx low. The test now picks the survivor key dynamically as the key mapped to the highest bucket index. ``` [test_hashtable.c] Memory leak detected of 336 bytes ================================================================= ==3901==ERROR: LeakSanitizer: detected memory leaks Direct leak of 80 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd9c7 in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69 #1 0x563bfdf4c47d in ztrymalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:156 #2 0x563bfdf4c47d in valkey_malloc /home/runner/work/valkey/valkey/src/zmalloc.c:185 #3 0x563bfdd42eaf in hashtableCreate /home/runner/work/valkey/valkey/src/hashtable.c:1217 #4 0x563bfdaa1cbf in test_empty_buckets_rehashing unit/test_hashtable.c:232 #5 0x563bfdae772b in runTestSuite unit/test_main.c:36 valkey-io#6 0x563bfda86b20 in main unit/test_main.c:108 valkey-io#7 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#8 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#9 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 128 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd40967 in resize /home/runner/work/valkey/valkey/src/hashtable.c:741 #4 0x563bfdd45eb1 in hashtableExpandIfNeeded /home/runner/work/valkey/valkey/src/hashtable.c:1446 #5 0x563bfdd45eb1 in hashtableExpandIfNeeded /home/runner/work/valkey/valkey/src/hashtable.c:1433 valkey-io#6 0x563bfdd45eb1 in insert /home/runner/work/valkey/valkey/src/hashtable.c:1041 valkey-io#7 0x563bfdd45eb1 in hashtableAddOrFind /home/runner/work/valkey/valkey/src/hashtable.c:1554 valkey-io#8 0x563bfdd45eb1 in hashtableAdd /home/runner/work/valkey/valkey/src/hashtable.c:1539 valkey-io#9 0x563bfdaa1e3b in test_empty_buckets_rehashing unit/test_hashtable.c:254 valkey-io#10 0x563bfdae772b in runTestSuite unit/test_main.c:36 valkey-io#11 0x563bfda86b20 in main unit/test_main.c:108 valkey-io#12 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#13 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#14 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 64 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd3f553 in bucketConvertToChained /home/runner/work/valkey/valkey/src/hashtable.c:908 #4 0x563bfdd3f553 in findBucketForInsert /home/runner/work/valkey/valkey/src/hashtable.c:1021 #5 0x563bfdd45d9e in insert /home/runner/work/valkey/valkey/src/hashtable.c:1045 valkey-io#6 0x563bfdd45d9e in hashtableAddOrFind /home/runner/work/valkey/valkey/src/hashtable.c:1554 valkey-io#7 0x563bfdd45d9e in hashtableAdd /home/runner/work/valkey/valkey/src/hashtable.c:1539 valkey-io#8 0x563bfdaa1e3b in test_empty_buckets_rehashing unit/test_hashtable.c:254 valkey-io#9 0x563bfdae772b in runTestSuite unit/test_main.c:36 valkey-io#10 0x563bfda86b20 in main unit/test_main.c:108 valkey-io#11 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#12 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#13 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 64 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd40967 in resize /home/runner/work/valkey/valkey/src/hashtable.c:741 #4 0x563bfdaa1df8 in test_empty_buckets_rehashing unit/test_hashtable.c:248 #5 0x563bfdae772b in runTestSuite unit/test_main.c:36 valkey-io#6 0x563bfda86b20 in main unit/test_main.c:108 valkey-io#7 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#8 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#9 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) SUMMARY: AddressSanitizer: 336 byte(s) leaked in 4 allocation(s). ``` Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com> Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
GilboaAWS
pushed a commit
to GilboaAWS/valkey
that referenced
this pull request
Jun 24, 2026
* feat(S2.4): compression worker pool
Replaces the Phase 0 stub at src/compression_workers.{c,h} with a real
pool: pthread-managed workers, mutexQueue inbox + mpsc outbox, QSBR
worker contract, runtime resize via Stop+Start, race-free graceful
shutdown via sentinels, grace barriers via wake-all from
compressionRegistryRetire, resize-aware canFree bound. The per-job
body is a deliberate placeholder pass-through — workers accept jobs
and post empty results to the outbox without compressing — so
end-to-end plumbing (enqueue, dequeue, drain, shutdown ordering) is
exercised before ZSTD_compress_usingCDict integration lands in S2.5.
Architecture
============
Inbox: mutexQueue (src/mutexqueue.h). Cond-var blocking gives us
worker parking when idle. Right fit for a feature whose default state
is enabled-but-quiet, where busy-spin would burn a core for no work.
Outbox: mpscQueue from src/queues.h. Lock-free; the main thread polls
non-blocking from compressionAfterSleep. Workers retry on full rather
than drop completed work.
QSBR plumbing concerns surfaced during review and fixed in this commit
=====================================================================
QSBR grace barriers (mutexQueueWakeAll + new mutexQueuePopWakable)
------------------------------------------------------------------
The QSBR design (§4.4) requires that idle workers periodically advance
their per-thread quiescent_gen counter even when no jobs are flowing,
so the registry's GC pass can reclaim retired dicts. Without this, a
parked worker's gen stays frozen at the snapshot taken by
compressionRegistryRetire, blocking reclamation indefinitely.
mutexqueue.h is extended with **two new APIs** that preserve the
existing pop contract for current callers (bio):
- mutexQueueWakeAll(q): pthread_cond_broadcast under the mutex; wakes
every consumer parked in cond_wait without enqueuing anything.
- mutexQueuePopWakable(q, blocking=true): a wake-aware pop variant
that returns NULL once on a spurious wake or wake-all, instead of
re-parking. The compression worker uses this; bio continues to use
the unchanged mutexQueuePop.
The original mutexQueuePop / mutexQueuePopAll APIs keep their original
"never returns NULL when blocking" semantics — they re-park on
spurious wakes via the `while` loop. This addresses reviewer feedback
that adding mutexQueueWakeAll should not change behaviour for existing
callers.
compressionRegistryRetire calls compressionWorkersWakeAll() (a thin
wrapper) right after taking the gen snapshot. The compression worker
distinguishes wake reasons in its loop:
- mutexQueuePopWakable returns NULL → grace-barrier wake. Worker
advances its gen via compressionWorkerReportQuiescent and
continues.
- Returns &kShutdownSentinel → see below.
- Returns compressionJob* → real work.
Race-free shutdown via sentinels (kShutdownSentinel)
-----------------------------------------------------
Wake-all alone for shutdown has a small race: a worker that has read
shutdown_requested == 0 at top-of-loop and is about to enter
mutexQueuePopWakable can be preempted; if Stop's broadcast fires in
that window, the broadcast hits no waiters and is lost; the worker
subsequently parks and deadlocks pthread_join.
Sentinels avoid this because they put DATA in the queue. Once Stop
has pushed N sentinels, the next mutexQueuePopWakable sees length>0
under the mutex and pops one without entering cond_wait. The sentinel
is a unique address (static int) distinguishable by pointer equality;
zero memory cost.
Stop pushes N sentinels (matching pool.n_threads). The Start-rollback
path on pthread_create failure does the same.
Wake-all is reserved for grace barriers because missed barrier wakes
are benign — the next event catches the worker. Shutdown is
correctness-critical and cannot tolerate the race.
Resize-aware canFree (retire_n_workers)
=======================================
After a resize-up post-retire, the registry's snapshot may have been
taken with N workers but canFree iterates N+M. Slots beyond N are
zcalloc-default (retire_worker_gen[i]=0) and the new worker starts
at gen[i]=0 — strict-inequality canFree (`gen > retire_worker_gen`)
fails on these slots, blocking reclamation until normal traffic
advances the gen.
compressionDictPair gains an int retire_n_workers field. startRetirement
captures server.compression_threads alongside the gen snapshot. canFree
iterates `min(retire_n_workers, server.compression_threads)`:
- resize-up after retire: don't check new slots. Safe — the new
worker spawned AFTER retire and cannot have observed this dict
(compressionRegistryActive() returns the current active, never
a retiring one).
- resize-down after retire: don't check dead slots. Safe — the
dead worker is joined and cannot hold a pointer.
This eliminates the only theoretical liveness issue with QSBR + resize.
No gen-bumping at Start needed — keeps the QSBR invariant that gens
advance only via the worker reaching quiescence.
Other plumbing
==============
- compressionInit (compression.c): registry init → workers Start.
- compressionShutdown (new): Stop then Release, in that order, before
any registry teardown. Called from finishShutdown in server.c after
moduleUnloadAllModules.
- compressionAfterSleep drains up to 256 results per main-loop tick.
Comment explains the 256 budget rationale (~2.5 ms install time, fits
inside event-loop budget; surfaceable via outbox-backpressure
counter if undersized).
- CONFIG SET compression-threads applies via applyCompressionThreads
→ compressionWorkersResize. Resize is graceful (Stop + Start).
- compressionWorkersGetThreadCount(): test/introspection accessor for
the live pool size — used by unit tests to verify Resize actually
creates/destroys threads.
Worker contract refinements (review feedback)
=============================================
- compressionWorkersEnqueue dropped the `dict_id` parameter. Per the
design (§4.6), the worker loads the active dict at compress time;
the dict_id field on the job struct is filled by the worker after
compression so the resulting dict_id can be carried into the
compressed-frame header.
- Header documents the ownership contract for `key`/`src` sds
pointers: they are borrowed by the pool for the job's lifetime.
In production (S2.5+) the caller's incrRefCount on the value robj
pins the sds until the drain runs decrRefCount. In the unit-test
fixture the test owns and frees the sds directly.
- compression_registry.c uses COMPRESSION_WORKERS_MAX for sizing
worker_quiescent_gen[] instead of the bare constant 16.
Tests
=====
src/unit/test_compression_workers.cpp — extended:
- Lifecycle: Start{Zero,One,Four,Max}Threads + StopIsIdempotent +
ThreadCountIsZeroBeforeStart sanity check; every transition asserts
compressionWorkersGetThreadCount().
- Enqueue/drain edge cases: EnqueueRejectedBeforeStart,
DrainBeforeStartReturnsZero.
- End-to-end: SingleJobRoundTrip, BurstOf256JobsOneWorker,
BurstOf1024JobsFourWorkers — each asserts post-condition that the
outbox is empty after the expected count surfaced.
- Resize: ResizeStartsPoolWhenNotInitialized, ResizeIsNoOpWhenSameCount,
ResizeUpAndDown (each transition checks thread count),
ResizeToZeroDisablesPool, ResizeRejectsOutOfRange,
ResizeAcrossEnqueuedJobs (drop contract made explicit in comment +
asserted via EXPECT_LE/GE).
- Out-of-range Start is documented as assertion-based (caller-bug);
out-of-range public surface is exercised via Resize.
Design doc
==========
§4.4 and §4.6 in detailed-design.md reflect the wake-all/sentinel
split and the retire_n_workers field.
Verified locally
================
- make -j builds clean.
- ./runtest --single unit/type/compression: 10/10 passes.
- Manual smoke test: server boots with compression-threads=2,
CONFIG SET compression-threads {0, 1, 2, 4, 8} all work at runtime,
shutdown joins all workers cleanly via the sentinel path.
Not verified locally (CI validates):
- gtest unit tests on Linux/macOS/32-bit (no libgtest-dev locally).
- clang-format-18 (not installed locally).
* docs(plan): record S2.4 delivery; add S2.11 bounded-inbox follow-up
Reflects what actually landed in PR valkey-io#13 vs. the original one-liner plan
entry for S2.4:
- mutexqueue dual API (Pop/PopAll keep contract; new PopWakable for
wake-aware consumers)
- sentinel-based race-free shutdown
- retire_n_workers field on compressionDictPair for resize-aware
canFree
- compressionWorkersGetThreadCount accessor
S1.1 marked as merged (PR valkey-io#12) and noted that PR valkey-io#13 extended its
data structures (retire_n_workers + COMPRESSION_WORKERS_MAX usage).
Adds S2.11 — bounded inbox + per-caller back-pressure counters — as
an explicit prerequisite before S2.7 (write-path hook) ships. Was
mentioned in PR valkey-io#13 description and SESSION_CHECKPOINT but not
tracked in the plan. Squash this into the S2.4 commit at merge time.
GilboaAWS
pushed a commit
to GilboaAWS/valkey
that referenced
this pull request
Jun 24, 2026
…key-io#15 (valkey-io#16) Reconcile the planning docs with the implementation-phase changes that landed after the original design walkthrough. - idea-honing.md Q9: add a "Superseded during implementation" annotation pointing to the new training-knob model from PR valkey-io#14 (compression-dict-min-training-keys / -max-training-keys / compression-training-buffer-size). The original walkthrough text is preserved below the annotation for historical record. - detailed-design.md §7.1 transparency-mode example: replace --compression-dict-first-training-keys-count (removed in PR valkey-io#14) with --compression-dict-min-training-keys. - detailed-design.md §2.12 heading: "Advanced knobs (12)" → "(11)". Off-by-one count introduced when PR valkey-io#14 reshuffled the table. - summary.md: add a new "What changed during implementation" section between the walkthrough section and the plan summary, capturing the three substantive post-walkthrough refinements (PR valkey-io#14 training rewrite, PR valkey-io#15 R2.5.6 read-hot gap, PR valkey-io#13 QSBR plumbing). Doc-only; no code changes.
GilboaAWS
pushed a commit
to GilboaAWS/valkey
that referenced
this pull request
Jun 24, 2026
* [S2.7] Compression write-path hook
Wires compressionEnqueueCandidate into dbAddInternal and dbSetValue,
and replaces the TODO(S2.7) placeholder in the drain handler with a
real install path. With this change, writes to eligible STRING values
get queued for background compression and the result is installed back
into the kvstore as an OBJ_ENCODING_COMPRESSED robj.
The decoder (S2.6) is shipped but not yet wired into read paths (S2.8),
so as long as compression-enabled stays no (default), behavior is
unchanged. Once an operator turns the switch on, written values get
compressed, but reads return the compressed bytes until S2.8 lands.
Existing transparency tests verify no regression in the default-off
configuration.
Producer side (compression.c, db.c)
Two seams in db.c — end of dbAddInternal and end of dbSetValue —
call compressionEnqueueCandidate(key, value, db->id). The candidate
function applies four guards:
1. Master switch (compression_enabled, via compressionIsEligible).
2. R2.2 eligibility (type/encoding/size/hot-key — also via predicate).
3. R2.1.5 active-dict check — saves an allocator round-trip when
compression-enabled=yes but training hasn't completed.
4. incrRefCount(value) — pins the bytes for the worker AND
reserves the robj address for the drain handler's pointer-
equality stale check (ABA-safe per R2.4.4 + the lifetime
discussion in PR valkey-io#18).
If the worker pool refuses (not started; future S2.11 inbox full),
the pin is released immediately. RDB-load enqueue is deliberately
skipped — TODO(S2.10): the sweep tick will rediscover RDB-loaded
values without hammering the inbox during load.
API change: compressionWorkersEnqueue
Old: compressionWorkersEnqueue(sds key, int dbid, uint64_t version, sds src)
New: compressionWorkersEnqueue(robj *value, int dbid)
The new form requires a pinned robj; the worker reads
objectGetVal(value) once at enqueue (captured into job->src) and
never touches the robj afterwards (R2.11.4 intact). The drain
handler uses job->value for the kvstore lookup and the pointer-
equality stale check.
The version field is gone — pointer equality, made ABA-safe by the
pin, is sufficient. R2.4.4 explains why: holding incrRefCount(value)
prevents the allocator from reusing the address while the job is
in flight.
Drain install (compression_workers.c)
New compressionInstall() helper:
1. void **slot = kvstoreHashtableFindRef(db->keys, didx, key_sds);
2. If slot == NULL OR *slot != job->value: stale (overwrite, expire,
or COW). Discard.
3. Else: createCompressedObject(OBJ_STRING, job->dst, job->dst_len);
dbReplaceValue installs.
4. compressionRegistryIncRef(job->dict_id) on success.
dbReplaceValue routes through dbSetValue(..., overwrite=0, ...),
which does NOT call signalModifiedKey, moduleNotifyKeyUnlink, or
signalDeletedKeyAsReady. Background compression is a storage-only
change per R2.9.2 — no WATCH dirty_cas, no client-side-caching
invalidations, no keyspace notifications.
Pin released on every drain completion path (success, stale-discard,
net-savings reject, ZSTD error, no-active-dict). Test-mode jobs
(job->value == NULL) skip both install and decRef.
Test migration
The 15 existing test-fixture call sites passed raw sds + dummy
version. Migrated to a new testOnlyCompressionWorkersEnqueueRaw(src,
dbid) that sets job->value = NULL. Tests extract jobs via
testOnlyCompressionWorkersDrainOutbox before the production drain
runs, so production-only paths (install, decRef) are never reached
by the value=NULL sentinel.
No new gtest cases for the install path itself — that requires a
fully-initialized server.db / kvstore that the unit-test environment
doesn't construct. End-to-end coverage will come from the Tcl
transparency harness once S2.8 wires the read path.
TODO(S4.1) markers added at:
- compressionInstall: compression_compressions_per_sec, EMA fold,
compression_compressed_objects.
- compressionEnqueueCandidate: compression_candidates_dropped_total
when S2.11 lands (today the pool-not-started rejection is a
config state, not back-pressure).
Verified locally:
- make -j2 -C src → clean (BUILD_ZSTD=yes default).
- make -j2 -C src BUILD_ZSTD=no → clean.
- ./runtest --single unit/type/compression → 10/10 pass.
gtest unit tests not runnable locally; CI validates.
Diff stat:
.../implementation/plan.md | 4 +-
src/compression.c | 35 +++-
src/compression.h | 27 ++-
src/compression_workers.c | 185 +++++++++++++++------
src/compression_workers.h | 56 +++----
src/db.c | 14 ++
src/unit/test_compression_workers.cpp | 31 ++--
7 files changed, 244 insertions(+), 108 deletions(-)
* [S2.7] PR valkey-io#19 review: assert + design-doc alignment
Two reviewer threads addressed:
Thread #1 (T-3369017721) — production code carrying test concerns
The drain handler had a `if (job->value == NULL)` branch that only
existed to handle test-only jobs from
testOnlyCompressionWorkersEnqueueRaw. Reviewer correctly pointed out
that production code shouldn't carry test-only branches.
Fix: replaced with serverAssert(job->value != NULL) at the top of
the per-job loop. Production drain assumes every job has a real
pinned robj; tests must extract their value=NULL jobs via
testOnlyCompressionWorkersDrainOutbox before this drain runs.
Side effect: removed the conditional `if (job->value != NULL)`
guards around decrRefCount and the install branch — the top-of-loop
assert means every code path can assume value is non-NULL.
Thread valkey-io#2 (T-3356207626) — design doc out of sync with implementation
Design §4.6 still described the original version-counter approach
for staleness detection (`uint64_t version` field on compressionJob,
"if version counter moved, discard"). The implementation has used
pointer equality + the incrRefCount-pin since S2.4 PR valkey-io#13.
Fix: updated §4.6 to:
- compressionJob struct: drop `version`, drop `robj *key`, add
`robj *value` (pinned via incrRefCount), and `sds src` and
`int dbid` separately, matching the actual struct.
- Concurrency notes: replaced the "version counter moved" bullet
with the pointer-equality + ABA-safety reasoning, naming the
incrRefCount-reserves-the-address invariant as the protection
mechanism (same property explained in PR valkey-io#18 review).
Verified locally:
- make -j2 -C src → clean
- ./runtest --single unit/type/compression → 10/10 pass
* [S2.7] Fix CI: remove erroneous & on server.db indexing
build-32bit (and the 30+ downstream cells, all CI cells use -Werror):
compression_workers.c:531:20: error: initialization of 'serverDb *'
from incompatible pointer type 'serverDb **'
[-Werror=incompatible-pointer-types]
`server.db` is `serverDb **` (array of pointers, one per DB). So
`server.db[i]` is already `serverDb *` — the address-of operator was
redundant and produced `serverDb **`.
Fix: drop the `&`. Matches the pattern used everywhere else in the
codebase (db.c, server.c, etc.).
Local make didn't catch this — the default SERVER_CFLAGS doesn't
include -Werror. CI does. Built locally with `make SERVER_CFLAGS=-Werror`
to confirm clean.
* [S2.7] Fix CI: tests must use testOnly drain for value=NULL jobs
5 gtest cases failed on build-32bit (and would on every test cell)
with the new production-drain serverAssert(job->value != NULL):
ASSERTION FAILED: compression_workers.c:591 'job->value != NULL'
in: SingleJobRoundTrip, BurstOf256JobsOneWorker,
BurstOf1024JobsFourWorkers, ResizeAcrossEnqueuedJobs,
NetSavingsGuardRejectsIncompressible
Root cause: the previous commit's reviewer-driven hardening (PR valkey-io#19
review thread #1) made the production drain assert that every job
has a non-NULL pinned robj. The premise was "tests use the testOnly
drain to extract jobs before the production drain runs". That premise
was wrong — many tests ALSO call compressionWorkersDrainOutbox
directly to consume-and-dispose test-mode jobs (the drainUntil helper
is the most-used path).
Fix: add testOnlyCompressionWorkersDrainAndDispose(budget) — pulls
jobs via the existing testOnlyCompressionWorkersDrainOutbox, frees
them via testOnlyCompressionWorkersFreeJob, returns count. Migrate
the test fixture's drainUntil helper and all 8 direct
compressionWorkersDrainOutbox call sites in the test file to the
new helper.
Production drain stays clean — no test concerns. Reviewer thread #1
intent preserved.
Verified locally:
- make -j2 -C src SERVER_CFLAGS=-Werror → clean
- ./runtest --single unit/type/compression → 10/10 pass
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Allow SPUBLISH command within multi/exec on replica
Behavior on unstable:
With this change: