Adds discord link for discussions.#10
Merged
Merged
Conversation
madolson
approved these changes
Mar 22, 2024
naglera
pushed a commit
to naglera/placeholderkv
that referenced
this pull request
Apr 8, 2024
…is missed cases to redis-server. (#12322)
Observed that the sanitizer reported memory leak as clean up is not done
before the process termination in negative/following cases:
**- when we passed '--invalid' as option to redis-server.**
```
-vm:~/mem-leak-issue/redis$ ./src/redis-server --invalid
*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 2
>>> 'invalid'
Bad directive or wrong number of arguments
=================================================================
==865778==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x7f0985f65867 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
valkey-io#1 0x558ec86686ec in ztrymalloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:117
valkey-io#2 0x558ec86686ec in ztrymalloc_usable /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:135
valkey-io#3 0x558ec86686ec in ztryrealloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:276
valkey-io#4 0x558ec86686ec in zrealloc /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:327
valkey-io#5 0x558ec865dd7e in sdssplitargs /home/ubuntu/mem-leak-issue/redis/src/sds.c:1172
valkey-io#6 0x558ec87a1be7 in loadServerConfigFromString /home/ubuntu/mem-leak-issue/redis/src/config.c:472
valkey-io#7 0x558ec87a13b3 in loadServerConfig /home/ubuntu/mem-leak-issue/redis/src/config.c:718
valkey-io#8 0x558ec85e6f15 in main /home/ubuntu/mem-leak-issue/redis/src/server.c:7258
valkey-io#9 0x7f09856e5d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
SUMMARY: AddressSanitizer: 8 byte(s) leaked in 1 allocation(s).
```
**- when we pass '--port' as option and missed to add port number to redis-server.**
```
vm:~/mem-leak-issue/redis$ ./src/redis-server --port
*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 2
>>> 'port'
wrong number of arguments
=================================================================
==865846==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x7fdcdbb1f867 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
valkey-io#1 0x557e8b04f6ec in ztrymalloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:117
valkey-io#2 0x557e8b04f6ec in ztrymalloc_usable /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:135
valkey-io#3 0x557e8b04f6ec in ztryrealloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:276
valkey-io#4 0x557e8b04f6ec in zrealloc /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:327
valkey-io#5 0x557e8b044d7e in sdssplitargs /home/ubuntu/mem-leak-issue/redis/src/sds.c:1172
valkey-io#6 0x557e8b188be7 in loadServerConfigFromString /home/ubuntu/mem-leak-issue/redis/src/config.c:472
valkey-io#7 0x557e8b1883b3 in loadServerConfig /home/ubuntu/mem-leak-issue/redis/src/config.c:718
valkey-io#8 0x557e8afcdf15 in main /home/ubuntu/mem-leak-issue/redis/src/server.c:7258
valkey-io#9 0x7fdcdb29fd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Indirect leak of 10 byte(s) in 1 object(s) allocated from:
#0 0x7fdcdbb1fc18 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164
valkey-io#1 0x557e8b04f9aa in ztryrealloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:287
valkey-io#2 0x557e8b04f9aa in ztryrealloc_usable /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:317
valkey-io#3 0x557e8b04f9aa in zrealloc_usable /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:342
valkey-io#4 0x557e8b033f90 in _sdsMakeRoomFor /home/ubuntu/mem-leak-issue/redis/src/sds.c:271
valkey-io#5 0x557e8b033f90 in sdsMakeRoomFor /home/ubuntu/mem-leak-issue/redis/src/sds.c:295
valkey-io#6 0x557e8b033f90 in sdscatlen /home/ubuntu/mem-leak-issue/redis/src/sds.c:486
valkey-io#7 0x557e8b044e1f in sdssplitargs /home/ubuntu/mem-leak-issue/redis/src/sds.c:1165
valkey-io#8 0x557e8b188be7 in loadServerConfigFromString /home/ubuntu/mem-leak-issue/redis/src/config.c:472
valkey-io#9 0x557e8b1883b3 in loadServerConfig /home/ubuntu/mem-leak-issue/redis/src/config.c:718
valkey-io#10 0x557e8afcdf15 in main /home/ubuntu/mem-leak-issue/redis/src/server.c:7258
valkey-io#11 0x7fdcdb29fd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
SUMMARY: AddressSanitizer: 18 byte(s) leaked in 2 allocation(s).
```
As part analysis found that the sdsfreesplitres is not called when this condition checks are being hit.
Output after the fix:
```
vm:~/mem-leak-issue/redis$ ./src/redis-server --invalid
*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 2
>>> 'invalid'
Bad directive or wrong number of arguments
vm:~/mem-leak-issue/redis$
===========================================
vm:~/mem-leak-issue/redis$ ./src/redis-server --jdhg
*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 2
>>> 'jdhg'
Bad directive or wrong number of arguments
---------------------------------------------------------------------------
vm:~/mem-leak-issue/redis$ ./src/redis-server --port
*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 2
>>> 'port'
wrong number of arguments
```
Co-authored-by: Oran Agra <oran@redislabs.com>
skolosov-snap
pushed a commit
to skolosov-snap/valkey
that referenced
this pull request
Feb 5, 2025
* Implement cluster replicate no one * Fix arg arity * Fix arity of cluster replicate * Fix crash
skolosov-snap
pushed a commit
to skolosov-snap/valkey
that referenced
this pull request
Feb 5, 2025
* Implement cluster replicate no one * Fix arg arity * Fix arity of cluster replicate * Fix crash Signed-off-by: Sergey Kolosov <skolosov@snapchat.com>
zuiderkwast
pushed a commit
that referenced
this pull request
Jun 25, 2025
**Current state**
During `hashtableScanDefrag`, rehashing is paused to prevent entries
from moving, but the scan callback can still delete entries which
triggers `hashtableShrinkIfNeeded`. For example, the
`expireScanCallback` can delete expired entries.
**Issue**
This can cause the table to be resized and the old memory to be freed
while the scan is still accessing it, resulting in the following memory
access violation:
```
[err]: Sanitizer error: =================================================================
==46774==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000003100 at pc 0x0000004704d3 bp 0x7fffcb062000 sp 0x7fffcb061ff0
READ of size 1 at 0x611000003100 thread T0
#0 0x4704d2 in isPositionFilled /home/gusakovy/Projects/valkey/src/hashtable.c:422
#1 0x478b45 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1768
#2 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#3 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#4 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#5 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#6 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#7 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#8 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#9 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#10 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#11 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
#12 0x452e39 in _start (/local/home/gusakovy/Projects/valkey/src/valkey-server+0x452e39)
0x611000003100 is located 0 bytes inside of 256-byte region [0x611000003100,0x611000003200)
freed by thread T0 here:
#0 0x7f471a34a1e5 in __interceptor_free (/lib64/libasan.so.4+0xd81e5)
#1 0x4aefbc in zfree_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:400
#2 0x4aeff5 in valkey_free /home/gusakovy/Projects/valkey/src/zmalloc.c:415
#3 0x4707d2 in rehashingCompleted /home/gusakovy/Projects/valkey/src/hashtable.c:456
#4 0x471b5b in resize /home/gusakovy/Projects/valkey/src/hashtable.c:656
#5 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
#6 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
#7 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
#8 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
#9 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
#10 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
#11 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
#12 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
#13 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
#14 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
#15 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#16 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#17 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#18 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#19 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#20 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#21 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#22 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#23 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#24 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
previously allocated by thread T0 here:
#0 0x7f471a34a753 in __interceptor_calloc (/lib64/libasan.so.4+0xd8753)
#1 0x4ae48c in ztrycalloc_usable_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:214
#2 0x4ae757 in valkey_calloc /home/gusakovy/Projects/valkey/src/zmalloc.c:257
#3 0x4718fc in resize /home/gusakovy/Projects/valkey/src/hashtable.c:645
#4 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
#5 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
#6 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
#7 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
#8 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
#9 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
#10 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
#11 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
#12 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
#13 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
#14 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#15 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#16 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#17 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#18 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#19 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#20 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#21 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#22 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#23 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
SUMMARY: AddressSanitizer: heap-use-after-free /home/gusakovy/Projects/valkey/src/hashtable.c:422 in isPositionFilled
Shadow bytes around the buggy address:
0x0c227fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff85f0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x0c227fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8610: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
=>0x0c227fff8620:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8630: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8640: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8660: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8670: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==46774==ABORTING
```
**Solution**
Suggested solution is to also pause auto shrinking during
`hashtableScanDefrag`. I noticed that there was already a
`hashtablePauseAutoShrink` method and `pause_auto_shrink` counter, but
it wasn't actually used in `hashtableShrinkIfNeeded` so I fixed that.
**Testing**
I created a simple tcl test that (most of the times) triggers this
error, but it's a little clunky so I didn't add it as part of the PR:
```
start_server {tags {"expire hashtable defrag"}} {
test {hashtable scan defrag on expiry} {
r config set hz 100
set num_keys 20
for {set i 0} {$i < $num_keys} {incr i} {
r set "key_$i" "value_$i"
}
for {set j 0} {$j < 50} {incr j} {
set expire_keys 100
for {set i 0} {$i < $expire_keys} {incr i} {
# Short expiry time to ensure they expire quickly
r psetex "expire_key_${i}_${j}" 100 "expire_value_${i}_${j}"
}
# Verify keys are set
set initial_size [r dbsize]
assert_equal $initial_size [expr $num_keys + $expire_keys]
after 150
for {set i 0} {$i < 10} {incr i} {
r get "expire_key_${i}_${j}"
after 10
}
}
set remaining_keys [r dbsize]
assert_equal $remaining_keys $num_keys
# Verify server is still responsive
assert_equal [r ping] {PONG}
} {}
}
```
Compiling with ASAN using `make noopt SANITIZER=address valkey-server`
and running the test causes error above. Applying the fix resolves the
issue.
Signed-off-by: Yakov Gusakov <yaakov0015@gmail.com>
enjoy-binbin
added a commit
that referenced
this pull request
Sep 13, 2025
## Summary - extend replication wait time in `slave-selection` test ``` *** [err]: Node #10 should eventually replicate node #5 in tests/unit/cluster/slave-selection.tcl #10 didn't became slave of #5 ``` ## Testing - `./runtest --single unit/cluster/slave-selection` - `./runtest --single unit/cluster/slave-selection --valgrind` Signed-off-by: Vitali Arbuzov <Vitali.Arbuzov@proton.me> Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Harkrishn Patro <bunty.hari@gmail.com>
rjd15372
pushed a commit
to rjd15372/valkey
that referenced
this pull request
Sep 19, 2025
## Summary - extend replication wait time in `slave-selection` test ``` *** [err]: Node valkey-io#10 should eventually replicate node valkey-io#5 in tests/unit/cluster/slave-selection.tcl valkey-io#10 didn't became slave of valkey-io#5 ``` ## Testing - `./runtest --single unit/cluster/slave-selection` - `./runtest --single unit/cluster/slave-selection --valgrind` Signed-off-by: Vitali Arbuzov <Vitali.Arbuzov@proton.me> Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Harkrishn Patro <bunty.hari@gmail.com>
rjd15372
pushed a commit
to rjd15372/valkey
that referenced
this pull request
Sep 19, 2025
…alkey-io#2612) With valkey-io#2604 merged, the `Node valkey-io#10 should eventually replicate node valkey-io#5` started passing successfully with valgrind, but I guess we are seeing a new daily failure from a `New Master down consecutively` test that runs shortly after. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
rjd15372
pushed a commit
that referenced
this pull request
Sep 23, 2025
## Summary - extend replication wait time in `slave-selection` test ``` *** [err]: Node #10 should eventually replicate node #5 in tests/unit/cluster/slave-selection.tcl #10 didn't became slave of #5 ``` ## Testing - `./runtest --single unit/cluster/slave-selection` - `./runtest --single unit/cluster/slave-selection --valgrind` Signed-off-by: Vitali Arbuzov <Vitali.Arbuzov@proton.me> Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Harkrishn Patro <bunty.hari@gmail.com>
ranshid
pushed a commit
to ranshid/valkey
that referenced
this pull request
Sep 30, 2025
…y-io#2257) **Current state** During `hashtableScanDefrag`, rehashing is paused to prevent entries from moving, but the scan callback can still delete entries which triggers `hashtableShrinkIfNeeded`. For example, the `expireScanCallback` can delete expired entries. **Issue** This can cause the table to be resized and the old memory to be freed while the scan is still accessing it, resulting in the following memory access violation: ``` [err]: Sanitizer error: ================================================================= ==46774==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000003100 at pc 0x0000004704d3 bp 0x7fffcb062000 sp 0x7fffcb061ff0 READ of size 1 at 0x611000003100 thread T0 #0 0x4704d2 in isPositionFilled /home/gusakovy/Projects/valkey/src/hashtable.c:422 #1 0x478b45 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1768 #2 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729 #3 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402 #4 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297 #5 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269 #6 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577 #7 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370 #8 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513 valkey-io#9 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543 valkey-io#10 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291 valkey-io#11 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139) valkey-io#12 0x452e39 in _start (/local/home/gusakovy/Projects/valkey/src/valkey-server+0x452e39) 0x611000003100 is located 0 bytes inside of 256-byte region [0x611000003100,0x611000003200) freed by thread T0 here: #0 0x7f471a34a1e5 in __interceptor_free (/lib64/libasan.so.4+0xd81e5) #1 0x4aefbc in zfree_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:400 #2 0x4aeff5 in valkey_free /home/gusakovy/Projects/valkey/src/zmalloc.c:415 #3 0x4707d2 in rehashingCompleted /home/gusakovy/Projects/valkey/src/hashtable.c:456 #4 0x471b5b in resize /home/gusakovy/Projects/valkey/src/hashtable.c:656 #5 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272 #6 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448 #7 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459 #8 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847 valkey-io#9 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490 valkey-io#10 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831 valkey-io#11 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844 valkey-io#12 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70 valkey-io#13 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139 valkey-io#14 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770 valkey-io#15 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729 valkey-io#16 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402 valkey-io#17 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297 valkey-io#18 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269 valkey-io#19 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577 valkey-io#20 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370 valkey-io#21 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513 valkey-io#22 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543 valkey-io#23 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291 valkey-io#24 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139) previously allocated by thread T0 here: #0 0x7f471a34a753 in __interceptor_calloc (/lib64/libasan.so.4+0xd8753) #1 0x4ae48c in ztrycalloc_usable_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:214 #2 0x4ae757 in valkey_calloc /home/gusakovy/Projects/valkey/src/zmalloc.c:257 #3 0x4718fc in resize /home/gusakovy/Projects/valkey/src/hashtable.c:645 #4 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272 #5 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448 #6 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459 #7 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847 #8 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490 valkey-io#9 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831 valkey-io#10 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844 valkey-io#11 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70 valkey-io#12 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139 valkey-io#13 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770 valkey-io#14 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729 valkey-io#15 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402 valkey-io#16 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297 valkey-io#17 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269 valkey-io#18 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577 valkey-io#19 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370 valkey-io#20 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513 valkey-io#21 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543 valkey-io#22 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291 valkey-io#23 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139) SUMMARY: AddressSanitizer: heap-use-after-free /home/gusakovy/Projects/valkey/src/hashtable.c:422 in isPositionFilled Shadow bytes around the buggy address: 0x0c227fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c227fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c227fff85f0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd 0x0c227fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c227fff8610: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa =>0x0c227fff8620:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c227fff8630: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c227fff8640: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c227fff8650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c227fff8660: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c227fff8670: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==46774==ABORTING ``` **Solution** Suggested solution is to also pause auto shrinking during `hashtableScanDefrag`. I noticed that there was already a `hashtablePauseAutoShrink` method and `pause_auto_shrink` counter, but it wasn't actually used in `hashtableShrinkIfNeeded` so I fixed that. **Testing** I created a simple tcl test that (most of the times) triggers this error, but it's a little clunky so I didn't add it as part of the PR: ``` start_server {tags {"expire hashtable defrag"}} { test {hashtable scan defrag on expiry} { r config set hz 100 set num_keys 20 for {set i 0} {$i < $num_keys} {incr i} { r set "key_$i" "value_$i" } for {set j 0} {$j < 50} {incr j} { set expire_keys 100 for {set i 0} {$i < $expire_keys} {incr i} { # Short expiry time to ensure they expire quickly r psetex "expire_key_${i}_${j}" 100 "expire_value_${i}_${j}" } # Verify keys are set set initial_size [r dbsize] assert_equal $initial_size [expr $num_keys + $expire_keys] after 150 for {set i 0} {$i < 10} {incr i} { r get "expire_key_${i}_${j}" after 10 } } set remaining_keys [r dbsize] assert_equal $remaining_keys $num_keys # Verify server is still responsive assert_equal [r ping] {PONG} } {} } ``` Compiling with ASAN using `make noopt SANITIZER=address valkey-server` and running the test causes error above. Applying the fix resolves the issue. Signed-off-by: Yakov Gusakov <yaakov0015@gmail.com>
zuiderkwast
pushed a commit
that referenced
this pull request
Oct 1, 2025
**Current state**
During `hashtableScanDefrag`, rehashing is paused to prevent entries
from moving, but the scan callback can still delete entries which
triggers `hashtableShrinkIfNeeded`. For example, the
`expireScanCallback` can delete expired entries.
**Issue**
This can cause the table to be resized and the old memory to be freed
while the scan is still accessing it, resulting in the following memory
access violation:
```
[err]: Sanitizer error: =================================================================
==46774==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000003100 at pc 0x0000004704d3 bp 0x7fffcb062000 sp 0x7fffcb061ff0
READ of size 1 at 0x611000003100 thread T0
#0 0x4704d2 in isPositionFilled /home/gusakovy/Projects/valkey/src/hashtable.c:422
#1 0x478b45 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1768
#2 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#3 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#4 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#5 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#6 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#7 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#8 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#9 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#10 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#11 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
#12 0x452e39 in _start (/local/home/gusakovy/Projects/valkey/src/valkey-server+0x452e39)
0x611000003100 is located 0 bytes inside of 256-byte region [0x611000003100,0x611000003200)
freed by thread T0 here:
#0 0x7f471a34a1e5 in __interceptor_free (/lib64/libasan.so.4+0xd81e5)
#1 0x4aefbc in zfree_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:400
#2 0x4aeff5 in valkey_free /home/gusakovy/Projects/valkey/src/zmalloc.c:415
#3 0x4707d2 in rehashingCompleted /home/gusakovy/Projects/valkey/src/hashtable.c:456
#4 0x471b5b in resize /home/gusakovy/Projects/valkey/src/hashtable.c:656
#5 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
#6 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
#7 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
#8 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
#9 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
#10 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
#11 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
#12 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
#13 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
#14 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
#15 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#16 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#17 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#18 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#19 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#20 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#21 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#22 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#23 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#24 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
previously allocated by thread T0 here:
#0 0x7f471a34a753 in __interceptor_calloc (/lib64/libasan.so.4+0xd8753)
#1 0x4ae48c in ztrycalloc_usable_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:214
#2 0x4ae757 in valkey_calloc /home/gusakovy/Projects/valkey/src/zmalloc.c:257
#3 0x4718fc in resize /home/gusakovy/Projects/valkey/src/hashtable.c:645
#4 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
#5 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
#6 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
#7 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
#8 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
#9 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
#10 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
#11 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
#12 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
#13 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
#14 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#15 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#16 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#17 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#18 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#19 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#20 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#21 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#22 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#23 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
SUMMARY: AddressSanitizer: heap-use-after-free /home/gusakovy/Projects/valkey/src/hashtable.c:422 in isPositionFilled
Shadow bytes around the buggy address:
0x0c227fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff85f0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x0c227fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8610: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
=>0x0c227fff8620:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8630: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8640: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8660: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8670: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==46774==ABORTING
```
**Solution**
Suggested solution is to also pause auto shrinking during
`hashtableScanDefrag`. I noticed that there was already a
`hashtablePauseAutoShrink` method and `pause_auto_shrink` counter, but
it wasn't actually used in `hashtableShrinkIfNeeded` so I fixed that.
**Testing**
I created a simple tcl test that (most of the times) triggers this
error, but it's a little clunky so I didn't add it as part of the PR:
```
start_server {tags {"expire hashtable defrag"}} {
test {hashtable scan defrag on expiry} {
r config set hz 100
set num_keys 20
for {set i 0} {$i < $num_keys} {incr i} {
r set "key_$i" "value_$i"
}
for {set j 0} {$j < 50} {incr j} {
set expire_keys 100
for {set i 0} {$i < $expire_keys} {incr i} {
# Short expiry time to ensure they expire quickly
r psetex "expire_key_${i}_${j}" 100 "expire_value_${i}_${j}"
}
# Verify keys are set
set initial_size [r dbsize]
assert_equal $initial_size [expr $num_keys + $expire_keys]
after 150
for {set i 0} {$i < 10} {incr i} {
r get "expire_key_${i}_${j}"
after 10
}
}
set remaining_keys [r dbsize]
assert_equal $remaining_keys $num_keys
# Verify server is still responsive
assert_equal [r ping] {PONG}
} {}
}
```
Compiling with ASAN using `make noopt SANITIZER=address valkey-server`
and running the test causes error above. Applying the fix resolves the
issue.
Signed-off-by: Yakov Gusakov <yaakov0015@gmail.com>
hpatro
added a commit
to hpatro/valkey
that referenced
this pull request
Oct 3, 2025
## Summary - extend replication wait time in `slave-selection` test ``` *** [err]: Node valkey-io#10 should eventually replicate node #5 in tests/unit/cluster/slave-selection.tcl valkey-io#10 didn't became slave of #5 ``` ## Testing - `./runtest --single unit/cluster/slave-selection` - `./runtest --single unit/cluster/slave-selection --valgrind` Signed-off-by: Vitali Arbuzov <Vitali.Arbuzov@proton.me> Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Harkrishn Patro <bunty.hari@gmail.com> Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
hpatro
pushed a commit
to hpatro/valkey
that referenced
this pull request
Oct 3, 2025
…alkey-io#2612) With valkey-io#2604 merged, the `Node valkey-io#10 should eventually replicate node #5` started passing successfully with valgrind, but I guess we are seeing a new daily failure from a `New Master down consecutively` test that runs shortly after. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com> Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
hpatro
added a commit
that referenced
this pull request
Oct 8, 2025
With #1401, we introduced additional filters to CLIENT LIST/KILL subcommand. The intended behavior was to pick the last value of the filter. However, we introduced memory leak for all the preceding filters. Before this change: ``` > CLIENT LIST IP 127.0.0.1 IP 127.0.0.1 id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0 ``` Leak: ``` Direct leak of 11 byte(s) in 1 object(s) allocated from: #0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d) #1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156 #2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200 #3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113 #4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264 #5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600 #6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772 #7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434 #8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571 #9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702 #10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812 #11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79 #12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301 #13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486 #14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543 #15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319 #16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139) ``` Note: For filter ID / NOT-ID we group all the option and perform filtering whereas for remaining filters we only pick the last filter option. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
zuiderkwast
pushed a commit
that referenced
this pull request
Oct 9, 2025
) We have relaxed the `cluster-ping-interval` and `cluster-node-timeout` so that cluster has enough time to stabilize and propagate changes. Fixes this test occasional failure when running with valgrind: [err]: Node #10 should eventually replicate node #5 in tests/unit/cluster/slave-selection.tcl #10 didn't became slave of #5 Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
zuiderkwast
pushed a commit
that referenced
this pull request
Oct 13, 2025
) We have relaxed the `cluster-ping-interval` and `cluster-node-timeout` so that cluster has enough time to stabilize and propagate changes. Fixes this test occasional failure when running with valgrind: [err]: Node #10 should eventually replicate node #5 in tests/unit/cluster/slave-selection.tcl #10 didn't became slave of #5 Backported to the 9.0 branch in #2731. Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
roshkhatri
referenced
this pull request
in roshkhatri/valkey
Oct 14, 2025
With valkey-io#1401, we introduced additional filters to CLIENT LIST/KILL subcommand. The intended behavior was to pick the last value of the filter. However, we introduced memory leak for all the preceding filters. Before this change: ``` > CLIENT LIST IP 127.0.0.1 IP 127.0.0.1 id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0 ``` Leak: ``` Direct leak of 11 byte(s) in 1 object(s) allocated from: #0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d) #1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156 #2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200 #3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113 #4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264 #5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600 #6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772 #7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434 #8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571 #9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702 #10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812 #11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79 #12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301 #13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486 #14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543 #15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319 #16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139) ``` Note: For filter ID / NOT-ID we group all the option and perform filtering whereas for remaining filters we only pick the last filter option. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
roshkhatri
referenced
this pull request
in roshkhatri/valkey
Oct 14, 2025
…lkey-io#2672) We have relaxed the `cluster-ping-interval` and `cluster-node-timeout` so that cluster has enough time to stabilize and propagate changes. Fixes this test occasional failure when running with valgrind: [err]: Node #10 should eventually replicate node #5 in tests/unit/cluster/slave-selection.tcl #10 didn't became slave of #5 Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
cherukum-Amazon
pushed a commit
to cherukum-Amazon/valkey
that referenced
this pull request
Oct 16, 2025
With valkey-io#1401, we introduced additional filters to CLIENT LIST/KILL subcommand. The intended behavior was to pick the last value of the filter. However, we introduced memory leak for all the preceding filters. Before this change: ``` > CLIENT LIST IP 127.0.0.1 IP 127.0.0.1 id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0 ``` Leak: ``` Direct leak of 11 byte(s) in 1 object(s) allocated from: #0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d) valkey-io#1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156 valkey-io#2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200 valkey-io#3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113 valkey-io#4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264 valkey-io#5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600 valkey-io#6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772 valkey-io#7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434 valkey-io#8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571 valkey-io#9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702 valkey-io#10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812 valkey-io#11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79 valkey-io#12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301 valkey-io#13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486 valkey-io#14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543 valkey-io#15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319 valkey-io#16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139) ``` Note: For filter ID / NOT-ID we group all the option and perform filtering whereas for remaining filters we only pick the last filter option. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> (cherry picked from commit 155b0bb)
cherukum-Amazon
pushed a commit
to cherukum-Amazon/valkey
that referenced
this pull request
Oct 17, 2025
With valkey-io#1401, we introduced additional filters to CLIENT LIST/KILL subcommand. The intended behavior was to pick the last value of the filter. However, we introduced memory leak for all the preceding filters. Before this change: ``` > CLIENT LIST IP 127.0.0.1 IP 127.0.0.1 id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0 ``` Leak: ``` Direct leak of 11 byte(s) in 1 object(s) allocated from: #0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d) valkey-io#1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156 valkey-io#2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200 valkey-io#3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113 valkey-io#4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264 valkey-io#5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600 valkey-io#6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772 valkey-io#7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434 valkey-io#8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571 valkey-io#9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702 valkey-io#10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812 valkey-io#11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79 valkey-io#12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301 valkey-io#13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486 valkey-io#14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543 valkey-io#15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319 valkey-io#16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139) ``` Note: For filter ID / NOT-ID we group all the option and perform filtering whereas for remaining filters we only pick the last filter option. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> (cherry picked from commit 155b0bb) Signed-off-by: cherukum-amazon <cherukum@amazon.com>
cherukum-Amazon
pushed a commit
to cherukum-Amazon/valkey
that referenced
this pull request
Oct 19, 2025
With valkey-io#1401, we introduced additional filters to CLIENT LIST/KILL subcommand. The intended behavior was to pick the last value of the filter. However, we introduced memory leak for all the preceding filters. Before this change: ``` > CLIENT LIST IP 127.0.0.1 IP 127.0.0.1 id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0 ``` Leak: ``` Direct leak of 11 byte(s) in 1 object(s) allocated from: #0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d) valkey-io#1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156 valkey-io#2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200 valkey-io#3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113 valkey-io#4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264 valkey-io#5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600 valkey-io#6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772 valkey-io#7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434 valkey-io#8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571 valkey-io#9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702 valkey-io#10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812 valkey-io#11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79 valkey-io#12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301 valkey-io#13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486 valkey-io#14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543 valkey-io#15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319 valkey-io#16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139) ``` Note: For filter ID / NOT-ID we group all the option and perform filtering whereas for remaining filters we only pick the last filter option. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> (cherry picked from commit 155b0bb) Signed-off-by: cherukum-amazon <cherukum@amazon.com>
cherukum-Amazon
pushed a commit
to cherukum-Amazon/valkey
that referenced
this pull request
Oct 21, 2025
With valkey-io#1401, we introduced additional filters to CLIENT LIST/KILL subcommand. The intended behavior was to pick the last value of the filter. However, we introduced memory leak for all the preceding filters. Before this change: ``` > CLIENT LIST IP 127.0.0.1 IP 127.0.0.1 id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0 ``` Leak: ``` Direct leak of 11 byte(s) in 1 object(s) allocated from: #0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d) valkey-io#1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156 valkey-io#2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200 valkey-io#3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113 valkey-io#4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264 valkey-io#5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600 valkey-io#6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772 valkey-io#7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434 valkey-io#8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571 valkey-io#9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702 valkey-io#10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812 valkey-io#11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79 valkey-io#12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301 valkey-io#13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486 valkey-io#14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543 valkey-io#15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319 valkey-io#16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139) ``` Note: For filter ID / NOT-ID we group all the option and perform filtering whereas for remaining filters we only pick the last filter option. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> (cherry picked from commit 155b0bb) Signed-off-by: cherukum-amazon <cherukum@amazon.com>
madolson
pushed a commit
that referenced
this pull request
Oct 21, 2025
With #1401, we introduced additional filters to CLIENT LIST/KILL subcommand. The intended behavior was to pick the last value of the filter. However, we introduced memory leak for all the preceding filters. Before this change: ``` > CLIENT LIST IP 127.0.0.1 IP 127.0.0.1 id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0 ``` Leak: ``` Direct leak of 11 byte(s) in 1 object(s) allocated from: #0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d) #1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156 #2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200 #3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113 #4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264 #5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600 #6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772 #7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434 #8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571 #9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702 #10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812 #11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79 #12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301 #13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486 #14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543 #15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319 #16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139) ``` Note: For filter ID / NOT-ID we group all the option and perform filtering whereas for remaining filters we only pick the last filter option. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com> (cherry picked from commit 155b0bb) Signed-off-by: cherukum-amazon <cherukum@amazon.com>
nitaicaro
pushed a commit
to nitaicaro/valkey
that referenced
this pull request
Jan 21, 2026
forkless save
enjoy-binbin
pushed a commit
that referenced
this pull request
Feb 8, 2026
) I was working on ASAN large memory tests when I countered this issue. The issue was that the hardcoded `999` key could land in an early bucket. Then shrink rehash could finish early, and later inserts could trigger a new expansion rehash, resetting rehash_idx low. The test now picks the survivor key dynamically as the key mapped to the highest bucket index. ``` [test_hashtable.c] Memory leak detected of 336 bytes ================================================================= ==3901==ERROR: LeakSanitizer: detected memory leaks Direct leak of 80 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd9c7 in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69 #1 0x563bfdf4c47d in ztrymalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:156 #2 0x563bfdf4c47d in valkey_malloc /home/runner/work/valkey/valkey/src/zmalloc.c:185 #3 0x563bfdd42eaf in hashtableCreate /home/runner/work/valkey/valkey/src/hashtable.c:1217 #4 0x563bfdaa1cbf in test_empty_buckets_rehashing unit/test_hashtable.c:232 #5 0x563bfdae772b in runTestSuite unit/test_main.c:36 #6 0x563bfda86b20 in main unit/test_main.c:108 #7 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) #8 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) #9 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 128 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd40967 in resize /home/runner/work/valkey/valkey/src/hashtable.c:741 #4 0x563bfdd45eb1 in hashtableExpandIfNeeded /home/runner/work/valkey/valkey/src/hashtable.c:1446 #5 0x563bfdd45eb1 in hashtableExpandIfNeeded /home/runner/work/valkey/valkey/src/hashtable.c:1433 #6 0x563bfdd45eb1 in insert /home/runner/work/valkey/valkey/src/hashtable.c:1041 #7 0x563bfdd45eb1 in hashtableAddOrFind /home/runner/work/valkey/valkey/src/hashtable.c:1554 #8 0x563bfdd45eb1 in hashtableAdd /home/runner/work/valkey/valkey/src/hashtable.c:1539 #9 0x563bfdaa1e3b in test_empty_buckets_rehashing unit/test_hashtable.c:254 #10 0x563bfdae772b in runTestSuite unit/test_main.c:36 #11 0x563bfda86b20 in main unit/test_main.c:108 #12 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) #13 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) #14 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 64 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd3f553 in bucketConvertToChained /home/runner/work/valkey/valkey/src/hashtable.c:908 #4 0x563bfdd3f553 in findBucketForInsert /home/runner/work/valkey/valkey/src/hashtable.c:1021 #5 0x563bfdd45d9e in insert /home/runner/work/valkey/valkey/src/hashtable.c:1045 #6 0x563bfdd45d9e in hashtableAddOrFind /home/runner/work/valkey/valkey/src/hashtable.c:1554 #7 0x563bfdd45d9e in hashtableAdd /home/runner/work/valkey/valkey/src/hashtable.c:1539 #8 0x563bfdaa1e3b in test_empty_buckets_rehashing unit/test_hashtable.c:254 #9 0x563bfdae772b in runTestSuite unit/test_main.c:36 #10 0x563bfda86b20 in main unit/test_main.c:108 #11 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) #12 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) #13 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 64 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd40967 in resize /home/runner/work/valkey/valkey/src/hashtable.c:741 #4 0x563bfdaa1df8 in test_empty_buckets_rehashing unit/test_hashtable.c:248 #5 0x563bfdae772b in runTestSuite unit/test_main.c:36 #6 0x563bfda86b20 in main unit/test_main.c:108 #7 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) #8 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) #9 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) SUMMARY: AddressSanitizer: 336 byte(s) leaked in 4 allocation(s). ``` Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
harrylin98
pushed a commit
to harrylin98/valkey_forked
that referenced
this pull request
Feb 19, 2026
…lkey-io#3174) I was working on ASAN large memory tests when I countered this issue. The issue was that the hardcoded `999` key could land in an early bucket. Then shrink rehash could finish early, and later inserts could trigger a new expansion rehash, resetting rehash_idx low. The test now picks the survivor key dynamically as the key mapped to the highest bucket index. ``` [test_hashtable.c] Memory leak detected of 336 bytes ================================================================= ==3901==ERROR: LeakSanitizer: detected memory leaks Direct leak of 80 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd9c7 in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69 #1 0x563bfdf4c47d in ztrymalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:156 #2 0x563bfdf4c47d in valkey_malloc /home/runner/work/valkey/valkey/src/zmalloc.c:185 #3 0x563bfdd42eaf in hashtableCreate /home/runner/work/valkey/valkey/src/hashtable.c:1217 valkey-io#4 0x563bfdaa1cbf in test_empty_buckets_rehashing unit/test_hashtable.c:232 valkey-io#5 0x563bfdae772b in runTestSuite unit/test_main.c:36 valkey-io#6 0x563bfda86b20 in main unit/test_main.c:108 valkey-io#7 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#8 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#9 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 128 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd40967 in resize /home/runner/work/valkey/valkey/src/hashtable.c:741 valkey-io#4 0x563bfdd45eb1 in hashtableExpandIfNeeded /home/runner/work/valkey/valkey/src/hashtable.c:1446 valkey-io#5 0x563bfdd45eb1 in hashtableExpandIfNeeded /home/runner/work/valkey/valkey/src/hashtable.c:1433 valkey-io#6 0x563bfdd45eb1 in insert /home/runner/work/valkey/valkey/src/hashtable.c:1041 valkey-io#7 0x563bfdd45eb1 in hashtableAddOrFind /home/runner/work/valkey/valkey/src/hashtable.c:1554 valkey-io#8 0x563bfdd45eb1 in hashtableAdd /home/runner/work/valkey/valkey/src/hashtable.c:1539 valkey-io#9 0x563bfdaa1e3b in test_empty_buckets_rehashing unit/test_hashtable.c:254 valkey-io#10 0x563bfdae772b in runTestSuite unit/test_main.c:36 valkey-io#11 0x563bfda86b20 in main unit/test_main.c:108 valkey-io#12 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#13 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#14 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 64 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd3f553 in bucketConvertToChained /home/runner/work/valkey/valkey/src/hashtable.c:908 valkey-io#4 0x563bfdd3f553 in findBucketForInsert /home/runner/work/valkey/valkey/src/hashtable.c:1021 valkey-io#5 0x563bfdd45d9e in insert /home/runner/work/valkey/valkey/src/hashtable.c:1045 valkey-io#6 0x563bfdd45d9e in hashtableAddOrFind /home/runner/work/valkey/valkey/src/hashtable.c:1554 valkey-io#7 0x563bfdd45d9e in hashtableAdd /home/runner/work/valkey/valkey/src/hashtable.c:1539 valkey-io#8 0x563bfdaa1e3b in test_empty_buckets_rehashing unit/test_hashtable.c:254 valkey-io#9 0x563bfdae772b in runTestSuite unit/test_main.c:36 valkey-io#10 0x563bfda86b20 in main unit/test_main.c:108 valkey-io#11 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#12 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#13 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 64 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd40967 in resize /home/runner/work/valkey/valkey/src/hashtable.c:741 valkey-io#4 0x563bfdaa1df8 in test_empty_buckets_rehashing unit/test_hashtable.c:248 valkey-io#5 0x563bfdae772b in runTestSuite unit/test_main.c:36 valkey-io#6 0x563bfda86b20 in main unit/test_main.c:108 valkey-io#7 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#8 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#9 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) SUMMARY: AddressSanitizer: 336 byte(s) leaked in 4 allocation(s). ``` Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
hpatro
pushed a commit
to hpatro/valkey
that referenced
this pull request
Mar 5, 2026
…lkey-io#3174) I was working on ASAN large memory tests when I countered this issue. The issue was that the hardcoded `999` key could land in an early bucket. Then shrink rehash could finish early, and later inserts could trigger a new expansion rehash, resetting rehash_idx low. The test now picks the survivor key dynamically as the key mapped to the highest bucket index. ``` [test_hashtable.c] Memory leak detected of 336 bytes ================================================================= ==3901==ERROR: LeakSanitizer: detected memory leaks Direct leak of 80 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd9c7 in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69 #1 0x563bfdf4c47d in ztrymalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:156 #2 0x563bfdf4c47d in valkey_malloc /home/runner/work/valkey/valkey/src/zmalloc.c:185 #3 0x563bfdd42eaf in hashtableCreate /home/runner/work/valkey/valkey/src/hashtable.c:1217 #4 0x563bfdaa1cbf in test_empty_buckets_rehashing unit/test_hashtable.c:232 #5 0x563bfdae772b in runTestSuite unit/test_main.c:36 valkey-io#6 0x563bfda86b20 in main unit/test_main.c:108 valkey-io#7 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#8 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#9 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 128 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd40967 in resize /home/runner/work/valkey/valkey/src/hashtable.c:741 #4 0x563bfdd45eb1 in hashtableExpandIfNeeded /home/runner/work/valkey/valkey/src/hashtable.c:1446 #5 0x563bfdd45eb1 in hashtableExpandIfNeeded /home/runner/work/valkey/valkey/src/hashtable.c:1433 valkey-io#6 0x563bfdd45eb1 in insert /home/runner/work/valkey/valkey/src/hashtable.c:1041 valkey-io#7 0x563bfdd45eb1 in hashtableAddOrFind /home/runner/work/valkey/valkey/src/hashtable.c:1554 valkey-io#8 0x563bfdd45eb1 in hashtableAdd /home/runner/work/valkey/valkey/src/hashtable.c:1539 valkey-io#9 0x563bfdaa1e3b in test_empty_buckets_rehashing unit/test_hashtable.c:254 valkey-io#10 0x563bfdae772b in runTestSuite unit/test_main.c:36 valkey-io#11 0x563bfda86b20 in main unit/test_main.c:108 valkey-io#12 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#13 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#14 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 64 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd3f553 in bucketConvertToChained /home/runner/work/valkey/valkey/src/hashtable.c:908 #4 0x563bfdd3f553 in findBucketForInsert /home/runner/work/valkey/valkey/src/hashtable.c:1021 #5 0x563bfdd45d9e in insert /home/runner/work/valkey/valkey/src/hashtable.c:1045 valkey-io#6 0x563bfdd45d9e in hashtableAddOrFind /home/runner/work/valkey/valkey/src/hashtable.c:1554 valkey-io#7 0x563bfdd45d9e in hashtableAdd /home/runner/work/valkey/valkey/src/hashtable.c:1539 valkey-io#8 0x563bfdaa1e3b in test_empty_buckets_rehashing unit/test_hashtable.c:254 valkey-io#9 0x563bfdae772b in runTestSuite unit/test_main.c:36 valkey-io#10 0x563bfda86b20 in main unit/test_main.c:108 valkey-io#11 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#12 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#13 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) Indirect leak of 64 byte(s) in 1 object(s) allocated from: #0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214 #2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257 #3 0x563bfdd40967 in resize /home/runner/work/valkey/valkey/src/hashtable.c:741 #4 0x563bfdaa1df8 in test_empty_buckets_rehashing unit/test_hashtable.c:248 #5 0x563bfdae772b in runTestSuite unit/test_main.c:36 valkey-io#6 0x563bfda86b20 in main unit/test_main.c:108 valkey-io#7 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#8 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e) valkey-io#9 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2) SUMMARY: AddressSanitizer: 336 byte(s) leaked in 4 allocation(s). ``` Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com> Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
ikolomi
referenced
this pull request
in ikolomi/valkey
May 28, 2026
Drops the `incompressibleKeys{}` side hashtable design from S2.3 and
the `compression-retry-interval` config knob; refines the existing
drift signal in R2.3.5 to fold rejected attempts into
`compression_live_ratio_10m`. No new config knob.
Why this matters
----------------
Under a fixed dict and a fixed workload, the probability that any
given write at key K produces compressible bytes is constant — time-
based throttling between retries doesn't shift this probability, only
spaces attempts. CPU bounding is already provided at the sweep level
(`compression-sweep-max-cpu-pct`); per-key throttling would be
redundant and would add correctness burden (every mutating code path
— `dbOverwrite`, APPEND, SETRANGE, BITOP, BITFIELD, module DMA writes
— would need a `Clear` integration point to avoid suppressing
compression of legitimately-changed values).
The right trigger for "the rejection might now compress" is **dict
change**, which is handled at the system level. Updated definition:
`compression_live_ratio_10m` is now the rolling ratio over **all
eligible-bytes activity** in the last 10 min — each successful
compression contributes its actual ratio, each rejection contributes
ratio = 1.0 (we end up storing the value uncompressed). Both forms
of degradation feed the metric uniformly:
- workload-content drift among compressible values, and
- workload composition drift (more values failing the net-savings
guard).
A single drift trigger threshold catches both scenarios. Operators
reason about ONE metric; the design has ONE config knob
(`compression-dict-drift-ratio`) for the threshold; one signal feeds
retraining.
This is a deliberate simplification over an earlier design (PR #10)
which proposed a per-key `incompressibleKeys` side hashtable with
`compression-retry-interval` time fallback. That approach added a
module + a config knob + integration points in every mutating code
path, but the underlying assumption (waiting between retries
improves the per-attempt hit rate) is incorrect for fixed-distribution
workloads under a fixed dict. The simpler model — retry on every
sweep tick, let the drift signal handle systemic dict-fit issues — is
correct and operationally cleaner.
Drift comparison direction fixed
--------------------------------
R2.3.5's existing trigger had the comparison backwards. The ratio
convention is `compressed/uncompressed` (lower = more savings, per
R2.10.1). The previous formulation —
fires when live_ratio < drift_ratio × post_training_ratio
— fires when live is BETTER (lower) than 70% of post-training, i.e.,
when compression IMPROVES over post-training. That's not drift.
Corrected to:
fires when live_ratio > post_training_ratio / drift_ratio
With drift_ratio = 0.7 (70%): fires when live exceeds post-training
by a factor of 1/0.7 ≈ 1.43 — i.e., compression efficiency has
degraded by about 43%. Default value preserved; only the comparison
direction changed.
Code changes
------------
- src/compression.c (`compressionIsEligible`):
- Drop branch 6 (the S2.2-stub "always retry-eligible" comment +
the S2.3-anticipated lookup that never landed).
- Drop the `key` parameter from the signature — no longer needed
now that no branch consumes it. Trivial to re-add when a future
branch (e.g., S2.7 write-path hook) wants the key for logging.
- Update doc comment to reflect the simpler predicate.
- src/compression.h: matching signature change.
- src/server.h: drop `compression_retry_interval` field.
- src/config.c: drop `compression-retry-interval` registration.
Advanced knobs count goes 10 → 9.
- src/unit/test_compression_eligibility.cpp: drop `key` parameter
from all test calls (sed s/(o, NULL)/(o)/g).
- tests/unit/type/compression.tcl: drop the
`compression-retry-interval` config-default assertion; update
comment from "Advanced (10)" to "Advanced (9)".
Doc changes
-----------
- detailed-design.md §2.2:
- Predicate: drop `retry_eligible(obj)` line.
- Rationale paragraph rewritten to use the probability argument:
time-throttling doesn't change per-attempt success probability,
only spaces attempts; sweep pacing already bounds CPU. Per-key
tracking is functionless and adds integration burden.
- detailed-design.md §2.3 R2.3.5 drift triggers:
- Single trigger formulation, comparison direction fixed
(< → >). Definition of `compression_live_ratio_10m` made
explicit: includes rejected attempts as ratio = 1.0. Two forms
of drift (content-change among successes; composition shift to
rejections) feed uniformly.
- detailed-design.md §2.12 advanced config table:
- `compression-retry-interval` removed (10 → 9).
- `compression-dict-drift-ratio` description updated to reflect
corrected formula.
- detailed-design.md §6.6 Net-savings guard failure: rewritten with
the probability-argument rationale.
- detailed-design.md §7.1 transparency-mode harness config: drop
`--compression-retry-interval 0` line.
- idea-honing.md Q6 baseline filter "Skip post-compression" bullet:
matching rewrite (probability argument).
- idea-honing.md Q6 consolidated predicate: drop `retry_eligible(obj)`
line and the `where retry_eligible(obj) is...` block.
- idea-honing.md Q6 config table: drop `compression-retry-interval`
row.
- idea-honing.md Q6b answer: matching rewrite with reference to the
S2.3 implementation review (PR #10 design discussion) so the trace
is preserved.
- idea-honing.md §7.1 harness: drop the same flag.
- summary.md: update eligibility row + walkthrough-highlights bullet
to note the post-walkthrough refinement (drop the
`compression-dict-rejection-drift-rate` mention from earlier draft;
no new knob, just refined existing live_ratio definition).
- plan.md S2 row description: drop "incompressible-keys hashtable"
from S2's scope summary.
- plan.md S2.2: drop the trailing "incompressible-keys hashtable
check" claim.
- plan.md S2.3: marked "(REMOVED)" with explanation; struck-through.
- plan.md S1.4: extended description to include the rolling-ratio
semantics extension (rejection contributes ratio = 1.0) — the
drift-detection work absorbed from removed S2.3.
Audit-trail files (DESIGN_TODO.md, pr-feedback.json) intentionally
unchanged — Thread #20's resolution stands as written; only the
implementation interpretation in the live design docs is refined.
Branch ikolomi/s2-incompressible (the abandoned S2.3 implementation)
should be closed/deleted on GitHub.
Verified locally
----------------
- `make -j` builds clean.
- `./runtest --single unit/type/compression` 10/10 passes.
Not verified locally (CI will validate):
- gtest unit tests on Linux/macOS/32-bit (no libgtest-dev locally).
ikolomi
referenced
this pull request
in ikolomi/valkey
May 28, 2026
Drops the `incompressibleKeys{}` side hashtable design from S2.3 and
the `compression-retry-interval` config knob; refines the existing
drift signal in R2.3.5 to fold rejected attempts into
`compression_live_ratio_10m`. No new config knob.
Why this matters
----------------
Under a fixed dict and a fixed workload, the probability that any
given write at key K produces compressible bytes is constant — time-
based throttling between retries doesn't shift this probability, only
spaces attempts. CPU bounding is already provided at the sweep level
(`compression-sweep-max-cpu-pct`); per-key throttling would be
redundant and would add correctness burden (every mutating code path
— `dbOverwrite`, APPEND, SETRANGE, BITOP, BITFIELD, module DMA writes
— would need a `Clear` integration point to avoid suppressing
compression of legitimately-changed values).
The right trigger for "the rejection might now compress" is **dict
change**, which is handled at the system level. Updated definition:
`compression_live_ratio_10m` is now the rolling ratio over **all
eligible-bytes activity** in the last 10 min — each successful
compression contributes its actual ratio, each rejection contributes
ratio = 1.0 (we end up storing the value uncompressed). Both forms
of degradation feed the metric uniformly:
- workload-content drift among compressible values, and
- workload composition drift (more values failing the net-savings
guard).
A single drift trigger threshold catches both scenarios. Operators
reason about ONE metric; the design has ONE config knob
(`compression-dict-drift-ratio`) for the threshold; one signal feeds
retraining.
This is a deliberate simplification over an earlier design (PR #10)
which proposed a per-key `incompressibleKeys` side hashtable with
`compression-retry-interval` time fallback. That approach added a
module + a config knob + integration points in every mutating code
path, but the underlying assumption (waiting between retries
improves the per-attempt hit rate) is incorrect for fixed-distribution
workloads under a fixed dict. The simpler model — retry on every
sweep tick, let the drift signal handle systemic dict-fit issues — is
correct and operationally cleaner.
Drift comparison direction fixed
--------------------------------
R2.3.5's existing trigger had the comparison backwards. The ratio
convention is `compressed/uncompressed` (lower = more savings, per
R2.10.1). The previous formulation —
fires when live_ratio < drift_ratio × post_training_ratio
— fires when live is BETTER (lower) than 70% of post-training, i.e.,
when compression IMPROVES over post-training. That's not drift.
Corrected to:
fires when live_ratio > post_training_ratio / drift_ratio
With drift_ratio = 0.7 (70%): fires when live exceeds post-training
by a factor of 1/0.7 ≈ 1.43 — i.e., compression efficiency has
degraded by about 43%. Default value preserved; only the comparison
direction changed.
Code changes
------------
- src/compression.c (`compressionIsEligible`):
- Drop branch 6 (the S2.2-stub "always retry-eligible" comment +
the S2.3-anticipated lookup that never landed).
- Drop the `key` parameter from the signature — no longer needed
now that no branch consumes it. Trivial to re-add when a future
branch (e.g., S2.7 write-path hook) wants the key for logging.
- Update doc comment to reflect the simpler predicate.
- src/compression.h: matching signature change.
- src/server.h: drop `compression_retry_interval` field.
- src/config.c: drop `compression-retry-interval` registration.
Advanced knobs count goes 10 → 9.
- src/unit/test_compression_eligibility.cpp: drop `key` parameter
from all test calls (sed s/(o, NULL)/(o)/g).
- tests/unit/type/compression.tcl: drop the
`compression-retry-interval` config-default assertion; update
comment from "Advanced (10)" to "Advanced (9)".
Doc changes
-----------
- detailed-design.md §2.2:
- Predicate: drop `retry_eligible(obj)` line.
- Rationale paragraph rewritten to use the probability argument:
time-throttling doesn't change per-attempt success probability,
only spaces attempts; sweep pacing already bounds CPU. Per-key
tracking is functionless and adds integration burden.
- detailed-design.md §2.3 R2.3.5 drift triggers:
- Single trigger formulation, comparison direction fixed
(< → >). Definition of `compression_live_ratio_10m` made
explicit: includes rejected attempts as ratio = 1.0. Two forms
of drift (content-change among successes; composition shift to
rejections) feed uniformly.
- detailed-design.md §2.12 advanced config table:
- `compression-retry-interval` removed (10 → 9).
- `compression-dict-drift-ratio` description updated to reflect
corrected formula.
- detailed-design.md §6.6 Net-savings guard failure: rewritten with
the probability-argument rationale.
- detailed-design.md §7.1 transparency-mode harness config: drop
`--compression-retry-interval 0` line.
- idea-honing.md Q6 baseline filter "Skip post-compression" bullet:
matching rewrite (probability argument).
- idea-honing.md Q6 consolidated predicate: drop `retry_eligible(obj)`
line and the `where retry_eligible(obj) is...` block.
- idea-honing.md Q6 config table: drop `compression-retry-interval`
row.
- idea-honing.md Q6b answer: matching rewrite with reference to the
S2.3 implementation review (PR #10 design discussion) so the trace
is preserved.
- idea-honing.md §7.1 harness: drop the same flag.
- summary.md: update eligibility row + walkthrough-highlights bullet
to note the post-walkthrough refinement (drop the
`compression-dict-rejection-drift-rate` mention from earlier draft;
no new knob, just refined existing live_ratio definition).
- plan.md S2 row description: drop "incompressible-keys hashtable"
from S2's scope summary.
- plan.md S2.2: drop the trailing "incompressible-keys hashtable
check" claim.
- plan.md S2.3: marked "(REMOVED)" with explanation; struck-through.
- plan.md S1.4: extended description to include the rolling-ratio
semantics extension (rejection contributes ratio = 1.0) — the
drift-detection work absorbed from removed S2.3.
Audit-trail files (DESIGN_TODO.md, pr-feedback.json) intentionally
unchanged — Thread #20's resolution stands as written; only the
implementation interpretation in the live design docs is refined.
Branch ikolomi/s2-incompressible (the abandoned S2.3 implementation)
should be closed/deleted on GitHub.
Verified locally
----------------
- `make -j` builds clean.
- `./runtest --single unit/type/compression` 10/10 passes.
Not verified locally (CI will validate):
- gtest unit tests on Linux/macOS/32-bit (no libgtest-dev locally).
ikolomi
referenced
this pull request
in ikolomi/valkey
May 28, 2026
Drops the `incompressibleKeys{}` side hashtable design from S2.3 and
the `compression-retry-interval` config knob; refines the existing
drift signal in R2.3.5 to include rejected attempts in the
`compression_live_ratio_10m` rolling average. No new config knob.
Why this matters
----------------
Under a fixed dict and a fixed workload, the probability that any
given write at key K produces compressible bytes is constant — time-
based throttling between retries doesn't shift this probability, only
spaces attempts. CPU bounding is already provided at the sweep level
(`compression-sweep-max-cpu-pct`); per-key throttling would be
redundant and would add correctness burden (every mutating code path
— `dbOverwrite`, APPEND, SETRANGE, BITOP, BITFIELD, module DMA writes
— would need a `Clear` integration point to avoid suppressing
compression of legitimately-changed values).
The right trigger for "the rejection might now compress" is **dict
change**, which is handled at the system level. Updated definition:
`compression_live_ratio_10m` is now the rolling ratio over compression
*attempts* (successful + rejected) in the 10 min window, weighted by
uncompressed bytes. Each successful compression contributes its actual
`compressed/uncompressed` ratio. Each rejection contributes its actual
measured ratio (typically in `[0.9, 1.05]` since the net-savings guard
rejected it for being too close to 1.0). Worker errors are excluded —
they're tracked separately via `compression_errors_total`.
Both forms of degradation feed the metric uniformly:
- workload-content drift among compressible values, and
- workload composition drift (more values compressing poorly enough
to fail the net-savings guard).
A single drift trigger threshold catches both scenarios. Operators
reason about ONE metric; the design has ONE config knob
(`compression-dict-drift-ratio`) for the threshold.
This is a deliberate simplification over an earlier design (PR #10)
which proposed a per-key `incompressibleKeys` side hashtable with
`compression-retry-interval` time fallback. That approach added a
module + a config knob + integration points in every mutating code
path, but the underlying assumption (waiting between retries
improves the per-attempt hit rate) is incorrect for fixed-distribution
workloads under a fixed dict. The simpler model — retry on every
sweep tick, let the drift signal handle systemic dict-fit issues — is
correct and operationally cleaner.
Drift comparison direction fixed
--------------------------------
R2.3.5's existing trigger had the comparison backwards. The ratio
convention is `compressed/uncompressed` (lower = more savings, per
R2.10.1). The previous formulation —
fires when live_ratio < drift_ratio × post_training_ratio
— fires when live is BETTER (lower) than 70% of post-training, i.e.,
when compression IMPROVES over post-training. That's not drift.
Corrected to:
fires when live_ratio > post_training_ratio / drift_ratio
With drift_ratio = 0.7 (70%): fires when live exceeds post-training
by a factor of 1/0.7 ≈ 1.43 — i.e., compression efficiency has
degraded by about 43%. Default value preserved; only the comparison
direction changed.
Why "real measured ratio" not "1.0" for rejections
--------------------------------------------------
An earlier draft of this commit substituted `ratio = 1.0` for every
rejected attempt, on the theory that we store such values uncompressed
so the effective storage savings is zero. That was reviewed and
revised: the metric's purpose is **drift detection** (workload-
compressibility under this dict), not effective-storage tracking. A
rejected value at 0.91 (just above the 90% net-savings threshold) and
a truly incompressible value at 1.05 carry different information about
dict-workload fit. Substituting 1.0 for both throws away that
distinction; using the actual measured ratio preserves it. The data
is right there — the worker reports the ratio when it returns, we
just record what was measured.
Code changes
------------
- src/compression.c (`compressionIsEligible`):
- Drop branch 6 (the S2.2-stub "always retry-eligible" comment +
the S2.3-anticipated lookup that never landed).
- Drop the `key` parameter from the signature — no longer needed
now that no branch consumes it. Trivial to re-add when a future
branch (e.g., S2.7 write-path hook) wants the key for logging.
- Update doc comment to reflect the simpler predicate.
- src/compression.h: matching signature change.
- src/server.h: drop `compression_retry_interval` field.
- src/config.c: drop `compression-retry-interval` registration.
Advanced knobs count goes 10 → 9.
- src/unit/test_compression_eligibility.cpp: drop `key` parameter
from all test calls.
- tests/unit/type/compression.tcl: drop the
`compression-retry-interval` config-default assertion; update
comment from "Advanced (10)" to "Advanced (9)".
Doc changes
-----------
- detailed-design.md §2.2:
- Predicate: drop `retry_eligible(obj)` line.
- Rationale paragraph rewritten to use the probability argument:
time-throttling doesn't change per-attempt success probability,
only spaces attempts; sweep pacing already bounds CPU. Per-key
tracking is functionless and adds integration burden.
- detailed-design.md §2.3 R2.3.5 drift triggers:
- Single trigger formulation, comparison direction fixed
(< → >). Definition of `compression_live_ratio_10m` made
explicit: rolling ratio over attempts (successful + rejected)
using actual measured ratios; worker errors excluded.
- detailed-design.md §2.12 advanced config table:
- `compression-retry-interval` removed (10 → 9).
- `compression-dict-drift-ratio` description updated to reflect
corrected formula.
- detailed-design.md §6.6 Net-savings guard failure: rewritten with
the probability-argument rationale and real-ratio framing.
- detailed-design.md §7.1 transparency-mode harness config: drop
`--compression-retry-interval 0` line.
- idea-honing.md Q6 baseline filter "Skip post-compression" bullet:
matching rewrite (probability argument, real-ratio framing).
- idea-honing.md Q6 consolidated predicate: drop `retry_eligible(obj)`
line and the `where retry_eligible(obj) is...` block.
- idea-honing.md Q6 config table: drop `compression-retry-interval`
row.
- idea-honing.md Q6b answer: matching rewrite with reference to the
S2.3 implementation review (PR #10 design discussion) so the trace
is preserved.
- idea-honing.md §7.1 harness: drop the same flag.
- summary.md: update eligibility row + walkthrough-highlights bullet
to note the post-walkthrough refinement.
- plan.md S2 row description: drop "incompressible-keys hashtable"
from S2's scope summary.
- plan.md S2.2: drop the trailing "incompressible-keys hashtable
check" claim.
- plan.md S2.3: marked "(REMOVED)" with explanation; struck-through.
- plan.md S1.4: extended description to include the rolling-ratio
semantics extension (rejections contribute their actual measured
ratio) — the drift-detection work absorbed from removed S2.3.
Audit-trail files (DESIGN_TODO.md, pr-feedback.json) intentionally
unchanged — Thread #20's resolution stands as written; only the
implementation interpretation in the live design docs is refined.
Branch ikolomi/s2-incompressible (the abandoned S2.3 implementation)
should be closed/deleted on GitHub.
Verified locally
----------------
- `make -j` builds clean.
- `./runtest --single unit/type/compression` 10/10 passes.
Not verified locally (CI will validate):
- gtest unit tests on Linux/macOS/32-bit (no libgtest-dev locally).
GilboaAWS
pushed a commit
to GilboaAWS/valkey
that referenced
this pull request
Jun 24, 2026
…alkey-io#10) Drops the `incompressibleKeys{}` side hashtable design from S2.3 and the `compression-retry-interval` config knob; refines the existing drift signal in R2.3.5 to include rejected attempts in the `compression_live_ratio_10m` rolling average. No new config knob. Why this matters ---------------- Under a fixed dict and a fixed workload, the probability that any given write at key K produces compressible bytes is constant — time- based throttling between retries doesn't shift this probability, only spaces attempts. CPU bounding is already provided at the sweep level (`compression-sweep-max-cpu-pct`); per-key throttling would be redundant and would add correctness burden (every mutating code path — `dbOverwrite`, APPEND, SETRANGE, BITOP, BITFIELD, module DMA writes — would need a `Clear` integration point to avoid suppressing compression of legitimately-changed values). The right trigger for "the rejection might now compress" is **dict change**, which is handled at the system level. Updated definition: `compression_live_ratio_10m` is now the rolling ratio over compression *attempts* (successful + rejected) in the 10 min window, weighted by uncompressed bytes. Each successful compression contributes its actual `compressed/uncompressed` ratio. Each rejection contributes its actual measured ratio (typically in `[0.9, 1.05]` since the net-savings guard rejected it for being too close to 1.0). Worker errors are excluded — they're tracked separately via `compression_errors_total`. Both forms of degradation feed the metric uniformly: - workload-content drift among compressible values, and - workload composition drift (more values compressing poorly enough to fail the net-savings guard). A single drift trigger threshold catches both scenarios. Operators reason about ONE metric; the design has ONE config knob (`compression-dict-drift-ratio`) for the threshold. This is a deliberate simplification over an earlier design (PR valkey-io#10) which proposed a per-key `incompressibleKeys` side hashtable with `compression-retry-interval` time fallback. That approach added a module + a config knob + integration points in every mutating code path, but the underlying assumption (waiting between retries improves the per-attempt hit rate) is incorrect for fixed-distribution workloads under a fixed dict. The simpler model — retry on every sweep tick, let the drift signal handle systemic dict-fit issues — is correct and operationally cleaner. Drift comparison direction fixed -------------------------------- R2.3.5's existing trigger had the comparison backwards. The ratio convention is `compressed/uncompressed` (lower = more savings, per R2.10.1). The previous formulation — fires when live_ratio < drift_ratio × post_training_ratio — fires when live is BETTER (lower) than 70% of post-training, i.e., when compression IMPROVES over post-training. That's not drift. Corrected to: fires when live_ratio > post_training_ratio / drift_ratio With drift_ratio = 0.7 (70%): fires when live exceeds post-training by a factor of 1/0.7 ≈ 1.43 — i.e., compression efficiency has degraded by about 43%. Default value preserved; only the comparison direction changed. Why "real measured ratio" not "1.0" for rejections -------------------------------------------------- An earlier draft of this commit substituted `ratio = 1.0` for every rejected attempt, on the theory that we store such values uncompressed so the effective storage savings is zero. That was reviewed and revised: the metric's purpose is **drift detection** (workload- compressibility under this dict), not effective-storage tracking. A rejected value at 0.91 (just above the 90% net-savings threshold) and a truly incompressible value at 1.05 carry different information about dict-workload fit. Substituting 1.0 for both throws away that distinction; using the actual measured ratio preserves it. The data is right there — the worker reports the ratio when it returns, we just record what was measured. Code changes ------------ - src/compression.c (`compressionIsEligible`): - Drop branch 6 (the S2.2-stub "always retry-eligible" comment + the S2.3-anticipated lookup that never landed). - Drop the `key` parameter from the signature — no longer needed now that no branch consumes it. Trivial to re-add when a future branch (e.g., S2.7 write-path hook) wants the key for logging. - Update doc comment to reflect the simpler predicate. - src/compression.h: matching signature change. - src/server.h: drop `compression_retry_interval` field. - src/config.c: drop `compression-retry-interval` registration. Advanced knobs count goes 10 → 9. - src/unit/test_compression_eligibility.cpp: drop `key` parameter from all test calls. - tests/unit/type/compression.tcl: drop the `compression-retry-interval` config-default assertion; update comment from "Advanced (10)" to "Advanced (9)". Doc changes ----------- - detailed-design.md §2.2: - Predicate: drop `retry_eligible(obj)` line. - Rationale paragraph rewritten to use the probability argument: time-throttling doesn't change per-attempt success probability, only spaces attempts; sweep pacing already bounds CPU. Per-key tracking is functionless and adds integration burden. - detailed-design.md §2.3 R2.3.5 drift triggers: - Single trigger formulation, comparison direction fixed (< → >). Definition of `compression_live_ratio_10m` made explicit: rolling ratio over attempts (successful + rejected) using actual measured ratios; worker errors excluded. - detailed-design.md §2.12 advanced config table: - `compression-retry-interval` removed (10 → 9). - `compression-dict-drift-ratio` description updated to reflect corrected formula. - detailed-design.md §6.6 Net-savings guard failure: rewritten with the probability-argument rationale and real-ratio framing. - detailed-design.md §7.1 transparency-mode harness config: drop `--compression-retry-interval 0` line. - idea-honing.md Q6 baseline filter "Skip post-compression" bullet: matching rewrite (probability argument, real-ratio framing). - idea-honing.md Q6 consolidated predicate: drop `retry_eligible(obj)` line and the `where retry_eligible(obj) is...` block. - idea-honing.md Q6 config table: drop `compression-retry-interval` row. - idea-honing.md Q6b answer: matching rewrite with reference to the S2.3 implementation review (PR valkey-io#10 design discussion) so the trace is preserved. - idea-honing.md §7.1 harness: drop the same flag. - summary.md: update eligibility row + walkthrough-highlights bullet to note the post-walkthrough refinement. - plan.md S2 row description: drop "incompressible-keys hashtable" from S2's scope summary. - plan.md S2.2: drop the trailing "incompressible-keys hashtable check" claim. - plan.md S2.3: marked "(REMOVED)" with explanation; struck-through. - plan.md S1.4: extended description to include the rolling-ratio semantics extension (rejections contribute their actual measured ratio) — the drift-detection work absorbed from removed S2.3. Audit-trail files (DESIGN_TODO.md, pr-feedback.json) intentionally unchanged — Thread valkey-io#20's resolution stands as written; only the implementation interpretation in the live design docs is refined. Branch ikolomi/s2-incompressible (the abandoned S2.3 implementation) should be closed/deleted on GitHub. Verified locally ---------------- - `make -j` builds clean. - `./runtest --single unit/type/compression` 10/10 passes. Not verified locally (CI will validate): - gtest unit tests on Linux/macOS/32-bit (no libgtest-dev locally).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds discord link for discussions.
Discord Server: https://discord.gg/qqHWHXh3u9