naglera
pushed a commit
to naglera/placeholderkv
that referenced
this pull request
Apr 8, 2024
…is missed cases to redis-server. (#12322)
The sanitizer reported memory leaks because cleanup was not done
before process termination in the following error cases:
**- When '--invalid' is passed as an option to redis-server.**
```
-vm:~/mem-leak-issue/redis$ ./src/redis-server --invalid
*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 2
>>> 'invalid'
Bad directive or wrong number of arguments
=================================================================
==865778==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x7f0985f65867 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
#1 0x558ec86686ec in ztrymalloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:117
#2 0x558ec86686ec in ztrymalloc_usable /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:135
#3 0x558ec86686ec in ztryrealloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:276
#4 0x558ec86686ec in zrealloc /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:327
#5 0x558ec865dd7e in sdssplitargs /home/ubuntu/mem-leak-issue/redis/src/sds.c:1172
#6 0x558ec87a1be7 in loadServerConfigFromString /home/ubuntu/mem-leak-issue/redis/src/config.c:472
#7 0x558ec87a13b3 in loadServerConfig /home/ubuntu/mem-leak-issue/redis/src/config.c:718
#8 0x558ec85e6f15 in main /home/ubuntu/mem-leak-issue/redis/src/server.c:7258
#9 0x7f09856e5d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
SUMMARY: AddressSanitizer: 8 byte(s) leaked in 1 allocation(s).
```
**- When '--port' is passed as an option to redis-server without a port number.**
```
vm:~/mem-leak-issue/redis$ ./src/redis-server --port
*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 2
>>> 'port'
wrong number of arguments
=================================================================
==865846==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x7fdcdbb1f867 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
#1 0x557e8b04f6ec in ztrymalloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:117
#2 0x557e8b04f6ec in ztrymalloc_usable /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:135
#3 0x557e8b04f6ec in ztryrealloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:276
#4 0x557e8b04f6ec in zrealloc /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:327
#5 0x557e8b044d7e in sdssplitargs /home/ubuntu/mem-leak-issue/redis/src/sds.c:1172
#6 0x557e8b188be7 in loadServerConfigFromString /home/ubuntu/mem-leak-issue/redis/src/config.c:472
#7 0x557e8b1883b3 in loadServerConfig /home/ubuntu/mem-leak-issue/redis/src/config.c:718
#8 0x557e8afcdf15 in main /home/ubuntu/mem-leak-issue/redis/src/server.c:7258
#9 0x7fdcdb29fd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Indirect leak of 10 byte(s) in 1 object(s) allocated from:
#0 0x7fdcdbb1fc18 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164
#1 0x557e8b04f9aa in ztryrealloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:287
#2 0x557e8b04f9aa in ztryrealloc_usable /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:317
#3 0x557e8b04f9aa in zrealloc_usable /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:342
#4 0x557e8b033f90 in _sdsMakeRoomFor /home/ubuntu/mem-leak-issue/redis/src/sds.c:271
#5 0x557e8b033f90 in sdsMakeRoomFor /home/ubuntu/mem-leak-issue/redis/src/sds.c:295
#6 0x557e8b033f90 in sdscatlen /home/ubuntu/mem-leak-issue/redis/src/sds.c:486
#7 0x557e8b044e1f in sdssplitargs /home/ubuntu/mem-leak-issue/redis/src/sds.c:1165
#8 0x557e8b188be7 in loadServerConfigFromString /home/ubuntu/mem-leak-issue/redis/src/config.c:472
#9 0x557e8b1883b3 in loadServerConfig /home/ubuntu/mem-leak-issue/redis/src/config.c:718
#10 0x557e8afcdf15 in main /home/ubuntu/mem-leak-issue/redis/src/server.c:7258
#11 0x7fdcdb29fd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
SUMMARY: AddressSanitizer: 18 byte(s) leaked in 2 allocation(s).
```
Analysis showed that `sdsfreesplitres` is not called when these error conditions are hit.
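The pattern behind this kind of fix can be sketched as follows. This is a minimal, self-contained illustration rather than the actual patch: `split_args`, `free_split_args` and `apply_config_line` are hypothetical stand-ins for `sdssplitargs`, `sdsfreesplitres` and the config-loading code, and the only point is that the split result is released on the error paths as well as on success.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for sdssplitargs(): splits a line on spaces. */
static char **split_args(const char *line, int *argc) {
    char **argv = NULL;
    const char *p = line;
    *argc = 0;
    while (*p) {
        while (*p == ' ') p++;
        if (!*p) break;
        const char *start = p;
        while (*p && *p != ' ') p++;
        size_t len = (size_t)(p - start);
        char **grown = realloc(argv, sizeof(char *) * (size_t)(*argc + 1));
        if (!grown) abort();
        argv = grown;
        argv[*argc] = malloc(len + 1);
        if (!argv[*argc]) abort();
        memcpy(argv[*argc], start, len);
        argv[*argc][len] = '\0';
        (*argc)++;
    }
    return argv;
}

/* Hypothetical stand-in for sdsfreesplitres(). */
static void free_split_args(char **argv, int argc) {
    for (int i = 0; i < argc; i++) free(argv[i]);
    free(argv);
}

/* Returns NULL on success, or a static error message on failure.
 * Only the "port" directive is known in this toy model. */
static const char *apply_config_line(const char *line) {
    int argc;
    char **argv = split_args(line, &argc);
    const char *err = NULL;

    if (argc > 0) {
        if (strcmp(argv[0], "port") == 0) {
            if (argc != 2) err = "wrong number of arguments";
        } else {
            err = "Bad directive or wrong number of arguments";
        }
    }
    /* Single exit point: the split result is freed on every path,
     * so the caller can report the error and exit without leaking. */
    free_split_args(argv, argc);
    return err;
}
```

Routing every path through one cleanup call is one way to keep LeakSanitizer quiet even when the process exits right after reporting the error.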
Output after the fix:
```
vm:~/mem-leak-issue/redis$ ./src/redis-server --invalid
*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 2
>>> 'invalid'
Bad directive or wrong number of arguments
vm:~/mem-leak-issue/redis$
===========================================
vm:~/mem-leak-issue/redis$ ./src/redis-server --jdhg
*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 2
>>> 'jdhg'
Bad directive or wrong number of arguments
---------------------------------------------------------------------------
vm:~/mem-leak-issue/redis$ ./src/redis-server --port
*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 2
>>> 'port'
wrong number of arguments
```
Co-authored-by: Oran Agra <oran@redislabs.com>
naglera
pushed a commit
to naglera/placeholderkv
that referenced
this pull request
Apr 8, 2024
## Issues and solutions from #12817
1. Touching ProcessingEventsWhileBlocked and calling moduleCount() without
the GIL in afterSleep()
- Introduced:
Version: 7.0.0
PR: #9963
- Harm Level: Very High
If the module thread calls `RM_Yield()` before the main thread enters
afterSleep()
and modifies `ProcessingEventsWhileBlocked` (+1), the main
thread will not wait for the GIL,
which can lead to all kinds of unforeseen problems, including memory
data corruption.
- Initial / Abandoned Solution:
* Added the `__thread` specifier to ProcessingEventsWhileBlocked.
`ProcessingEventsWhileBlocked` is used to protect against nested event
processing, but event processing
in the main thread and in module threads should be completely independent
of each other, so it is safer
to use TLS.
* Added a cached module count to keep track of the current number of
modules, to avoid having to use `dictSize()`.
- Related Warnings:
```
WARNING: ThreadSanitizer: data race (pid=1136)
Write of size 4 at 0x0001045990c0 by thread T4 (mutexes: write M0):
#0 processEventsWhileBlocked networking.c:4135 (redis-server:arm64+0x10006d124)
#1 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
#2 bg_call_worker <null>:83232836 (blockedclient.so:arm64+0x16a8)
Previous read of size 4 at 0x0001045990c0 by main thread:
#0 afterSleep server.c:1861 (redis-server:arm64+0x100024f98)
#1 aeProcessEvents ae.c:408 (redis-server:arm64+0x10000fd64)
#2 aeMain ae.c:496 (redis-server:arm64+0x100010f0c)
#3 main server.c:7220 (redis-server:arm64+0x10003f38c)
```
2. aeApiPoll() is not thread-safe
When RM_Yield is used to handle events in a module thread and the main
thread has not yet
entered `afterSleep()`, both the module thread and the main thread may
touch `server.el` at the same time.
- Introduced:
Version: 7.0.0
PR: #9963
- Old / Abandoned Solution:
Adding a new mutex to protect the window after beforeSleep() and
before afterSleep().
Defect: If the main thread enters the ae loop without any IO events, it
will wait until
the next timeout or until an event arrives, and the module
thread will
hang until the main thread leaves the event loop.
- Related Warnings:
```
SUMMARY: ThreadSanitizer: data race ae_kqueue.c:55 in addEventMask
==================
==================
WARNING: ThreadSanitizer: data race (pid=14682)
Write of size 4 at 0x000100b54000 by thread T9 (mutexes: write M0):
#0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
#1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
#2 processEventsWhileBlocked networking.c:4138 (redis-server:arm64+0x10006d3c4)
#3 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
#4 bg_call_worker <null>:16042052 (blockedclient.so:arm64+0x169c)
Previous write of size 4 at 0x000100b54000 by main thread:
#0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
#1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
#2 aeMain ae.c:496 (redis-server:arm64+0x100010da8)
#3 main server.c:7238 (redis-server:arm64+0x10003f51c)
```
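To illustrate the thread-local idea from the abandoned solution in item 1, here is a minimal sketch assuming a GCC/Clang toolchain with `__thread` and POSIX threads. The counter `processing_events_depth` and the helper names are hypothetical; they only show that an increment made in a worker thread is invisible to the thread that started it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread nesting counter playing the role of
 * ProcessingEventsWhileBlocked: every thread gets its own copy. */
static __thread int processing_events_depth = 0;

/* Written by the worker, read by the caller only after pthread_join(). */
static int depth_seen_by_worker = -1;

static void *worker(void *arg) {
    (void)arg;
    processing_events_depth++;               /* changes the worker's copy only */
    depth_seen_by_worker = processing_events_depth;
    return NULL;
}

/* Runs one worker to completion and returns the calling thread's own
 * counter afterwards, or -1 if the thread could not be created/joined. */
static int caller_depth_after_worker(void) {
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, NULL) != 0) return -1;
    if (pthread_join(tid, NULL) != 0) return -1;
    return processing_events_depth;
}
```

With a plain global instead of `__thread`, the caller would observe the worker's increment, which is the interference described in item 1.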
## The final fix, as described in the comment:
redis/redis#12817 (comment)
Optimized solution based on the above comment:
First, we add `module_gil_acquring` to indicate whether the main thread
is currently in the acquiring-GIL state.
When the module thread starts to yield, there are two possibilities (we
assume the caller holds the GIL):
1. The main thread is between beforeSleep() and afterSleep(), that
is, `module_gil_acquring` is not 1.
In this case, the module thread wakes up the main thread through
the pipe and leaves the yield,
waiting for the next yield, by which time the main thread may already be in the
acquiring-GIL state.
2. The main thread is in the acquiring-GIL state.
The module thread releases the GIL, yielding the CPU to give the main thread
an opportunity to start
event processing, and then reacquires the GIL once the main thread
releases it.
This is the direction mentioned in
redis/redis#12817 (comment)
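The two cases above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code from the PR: the GIL is modeled as a plain pthread mutex, the wake-up pipe is passed in as a file descriptor, and `module_yield`, `main_thread_acquire_gil` and `yield_result` are hypothetical names. Only the flag name `module_gil_acquring` is taken from the description.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <unistd.h>

typedef enum { YIELD_WOKE_MAIN, YIELD_HANDED_OFF_GIL } yield_result;

static pthread_mutex_t gil = PTHREAD_MUTEX_INITIALIZER; /* hypothetical GIL */
static atomic_int module_gil_acquring; /* 1 while the main thread waits for the GIL */

/* Main-thread side: advertise that we are waiting, then take the GIL. */
static void main_thread_acquire_gil(void) {
    atomic_store(&module_gil_acquring, 1);
    pthread_mutex_lock(&gil);
    atomic_store(&module_gil_acquring, 0);
}

/* Module-thread side. The caller is assumed to hold `gil`. */
static yield_result module_yield(int wake_fd) {
    if (!atomic_load(&module_gil_acquring)) {
        /* Case 1: the main thread is not waiting for the GIL yet.
         * Wake it through the pipe and return; a later yield may
         * find it in the acquiring state. */
        char byte = 'x';
        ssize_t written = write(wake_fd, &byte, 1);
        (void)written;
        return YIELD_WOKE_MAIN;
    }
    /* Case 2: the main thread is waiting. Release the GIL, give up the
     * CPU so it can process events, then take the GIL back. */
    pthread_mutex_unlock(&gil);
    sched_yield();
    pthread_mutex_lock(&gil);
    return YIELD_HANDED_OFF_GIL;
}
```

A plain mutex does not guarantee that the waiting main thread wins the lock after `sched_yield()`, so this only shows the control flow, not the fairness properties of the real implementation.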
---------
Co-authored-by: Oran Agra <oran@redislabs.com>
enjoy-binbin
added a commit
that referenced
this pull request
Aug 14, 2024
We update this variable in the main thread, and the
child threads can print logs at the same time. This
generates a warning with SANITIZER=thread:
```
WARNING: ThreadSanitizer: data race (pid=74208)
Read of size 4 at 0x000102875c10 by thread T3:
#0 serverLogRaw <null>:52173615 (valkey-server:x86_64+0x10003c556)
#1 _serverLog <null>:52173615 (valkey-server:x86_64+0x10003ca89)
#2 bioProcessBackgroundJobs <null>:52173615 (valkey-server:x86_64+0x1001402c9)
Previous write of size 4 at 0x000102875c10 by main thread (mutexes: write M0):
#0 afterSleep <null>:52173615 (valkey-server:x86_64+0x10004989b)
#1 aeProcessEvents <null>:52173615 (valkey-server:x86_64+0x100031e52)
#2 main <null>:52173615 (valkey-server:x86_64+0x100064a3c)
#3 start <null>:52173615 (dyld:x86_64+0xfffffffffff5c365)
#4 start <null>:52173615 (dyld:x86_64+0xfffffffffff5c365)
```
The refresh of daylight_active is not real time; we update
it in afterSleep, so we don't need strong synchronization
and use memory_order_relaxed. Note also that we only do
load/store operations on daylight_active, which is an
aligned 32-bit integer, so using memory_order_relaxed will
not provide more consistency than what we have today.
So this is just a cleanup to clear the warning.
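A minimal sketch of the relaxed load/store pattern described above, using C11 `<stdatomic.h>`. The variable here is a standalone stand-in for the server field, and the two helper functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for the server's daylight_active field. */
static atomic_int daylight_active;

/* Writer side (main thread): no ordering with other memory is needed,
 * so a relaxed store is enough to make the access race-free. */
static void refresh_daylight_active(int is_dst) {
    atomic_store_explicit(&daylight_active, is_dst, memory_order_relaxed);
}

/* Reader side (e.g. a logging thread): may see a slightly stale value,
 * which is acceptable because the refresh is not real time anyway. */
static int read_daylight_active(void) {
    return atomic_load_explicit(&daylight_active, memory_order_relaxed);
}
```

Relaxed ordering keeps each access atomic without adding fences, which matches the commit's point that the goal is to silence the sanitizer rather than to change behavior.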
Signed-off-by: Binbin <binloveplay1314@qq.com>
mapleFU
pushed a commit
to mapleFU/valkey
that referenced
this pull request
Aug 21, 2024
Signed-off-by: mwish <maplewish117@gmail.com>
mapleFU
pushed a commit
to mapleFU/valkey
that referenced
this pull request
Aug 22, 2024
madolson
pushed a commit
that referenced
this pull request
Sep 2, 2024
madolson
pushed a commit
that referenced
this pull request
Sep 3, 2024
zuiderkwast
pushed a commit
that referenced
this pull request
Feb 10, 2025
Fix new unittest networking use-after-free error
```
==96611==ERROR: AddressSanitizer: heap-use-after-free on address 0x503000075e00 at pc 0x55e52cbe1495 bp 0x7ffd9e1fc690 sp 0x7ffd9e1fc688
READ of size 8 at 0x503000075e00 thread T0
#0 0x55e52cbe1494 in freeReplicaReferencedReplBuffer /home/runner/work/valkey/valkey/src/replication.c:401:27
#1 0x55e52cbe7abf in freeClientReplicationData /home/runner/work/valkey/valkey/src/replication.c:1261:5
#2 0x55e52cb17a44 in test_writeToReplica /home/runner/work/valkey/valkey/src/unit/test_networking.c:188:5
#3 0x55e52cac976b in runTestSuite /home/runner/work/valkey/valkey/src/unit/test_main.c:26:28
#4 0x55e52cac9bae in main /home/runner/work/valkey/valkey/src/unit/test_main.c:61:14
#5 0x7fded4c2a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
#6 0x7fded4c2a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
#7 0x55e52c9b5ec4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x177ec4) (BuildId: 587aaf0e86abaf104cbb714f290b1436f8ddf614)
0x503000075e00 is located 16 bytes inside of 24-byte region [0x503000075df0,0x503000075e08)
freed by thread T0 here:
#0 0x55e52ca50a7a in free (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x212a7a) (BuildId: 587aaf0e86abaf104cbb714f290b1436f8ddf614)
#1 0x55e52cb905ba in listEmpty /home/runner/work/valkey/valkey/src/adlist.c:64:9
#2 0x55e52cb179e5 in test_writeToReplica /home/runner/work/valkey/valkey/src/unit/test_networking.c:179:9
#3 0x55e52cac976b in runTestSuite /home/runner/work/valkey/valkey/src/unit/test_main.c:26:28
#4 0x55e52cac9bae in main /home/runner/work/valkey/valkey/src/unit/test_main.c:61:14
#5 0x7fded4c2a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
#6 0x7fded4c2a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
#7 0x55e52c9b5ec4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x177ec4) (BuildId: 587aaf0e86abaf104cbb714f290b1436f8ddf614)
previously allocated by thread T0 here:
#0 0x55e52ca50d13 in malloc (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x212d13) (BuildId: 587aaf0e86abaf104cbb714f290b1436f8ddf614)
#1 0x55e52cbb844f in ztrymalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:155:17
#2 0x55e52cbb844f in valkey_malloc /home/runner/work/valkey/valkey/src/zmalloc.c:184:17
#3 0x55e52cb90be6 in listAddNodeTail /home/runner/work/valkey/valkey/src/adlist.c:126:17
#4 0x55e52cb17873 in test_writeToReplica /home/runner/work/valkey/valkey/src/unit/test_networking.c:167:9
#5 0x55e52cac976b in runTestSuite /home/runner/work/valkey/valkey/src/unit/test_main.c:26:28
#6 0x55e52cac9bae in main /home/runner/work/valkey/valkey/src/unit/test_main.c:61:14
#7 0x7fded4c2a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
#8 0x7fded4c2a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
#9 0x55e52c9b5ec4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x177ec4) (BuildId: 587aaf0e86abaf104cbb714f290b1436f8ddf614)
```
https://github.com/valkey-io/valkey/actions/runs/13230922385/job/36927929457
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
xbasel
pushed a commit
to xbasel/valkey
that referenced
this pull request
Mar 27, 2025
xbasel
pushed a commit
to xbasel/valkey
that referenced
this pull request
Mar 27, 2025
murphyjacob4
pushed a commit
to enjoy-binbin/valkey
that referenced
this pull request
Apr 13, 2025
ranshid
added a commit
that referenced
this pull request
Jun 18, 2025
…2231)
NOTE - this is a backport of #2109
When we refactored the blocking framework we introduced the client
reprocessing infrastructure. In cases where the client was blocked on
keys, it will attempt to reprocess the command. One challenge was to
keep track of the command timeout, since we are reprocessing and do not
want to re-register the client with a fresh timeout each time. The
solution was to consider the client reprocessing flag when the client is
blockedOnKeys:
```
if (!c->flag.reprocessing_command) {
    /* If the client is re-processing the command, we do not set the timeout
     * because we need to retain the client's original timeout. */
    c->bstate->timeout = timeout;
}
```
However, this introduced a new issue. There are cases where the client
goes through consecutive blocking of different types. For example:
```
CLIENT PAUSE 10000 ALL
BZPOPMAX zset 1
```
would leave the client blocked on the zset endlessly if nothing is
written to it.
**Credits to @uriyage for locating this with his fuzzer testing**
The suggested solution is to only flag the client when it is
specifically unblocked on keys.
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
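The interaction described in this commit message can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: the `demo_*` types and functions are hypothetical and are not the server's code, and the model assumes that a client postponed by `CLIENT PAUSE` carries no timeout of its own. Only the names `reprocessing_command` and `timeout` are taken from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the client blocking state. */
typedef enum { DEMO_BLOCKED_NONE, DEMO_BLOCKED_ON_KEYS, DEMO_BLOCKED_PAUSED } demo_block_type;

typedef struct {
    bool reprocessing_command; /* set while the command is being re-run */
    demo_block_type btype;     /* why the client is currently blocked */
    int64_t timeout;           /* deadline in ms; 0 means "no timeout" */
} demo_client;

/* Block the client. The deadline is registered only when the command is
 * not being reprocessed, mirroring the snippet in the commit message. */
static void demo_block_client(demo_client *c, demo_block_type btype, int64_t timeout) {
    c->btype = btype;
    if (!c->reprocessing_command) c->timeout = timeout;
}

/* Unblock the client before its command is re-run. With flag_always set,
 * the flag is raised for every block type (the behavior that caused the
 * bug in this model); otherwise only when the client was blocked on keys. */
static void demo_unblock_client(demo_client *c, bool flag_always) {
    assert(c->btype != DEMO_BLOCKED_NONE);
    c->reprocessing_command = flag_always || c->btype == DEMO_BLOCKED_ON_KEYS;
    c->btype = DEMO_BLOCKED_NONE;
}

/* Models "CLIENT PAUSE ..." followed by a blocking pop with a deadline.
 * Returns the deadline the client ends up with. */
static int64_t demo_pause_then_block(bool flag_always, int64_t deadline) {
    demo_client c = {false, DEMO_BLOCKED_NONE, 0};
    demo_block_client(&c, DEMO_BLOCKED_PAUSED, 0);          /* postponed by the pause */
    demo_unblock_client(&c, flag_always);                   /* pause ends, command re-runs */
    demo_block_client(&c, DEMO_BLOCKED_ON_KEYS, deadline);  /* now blocks on the key */
    return c.timeout;
}
```

In this model, `demo_pause_then_block(true, 1000)` leaves the client without a deadline, which corresponds to the endless block reported above, while `demo_pause_then_block(false, 1000)` keeps the requested deadline.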
hwware
pushed a commit
that referenced
this pull request
Jun 18, 2025
When calling the command `EVAL error{} 0`, Valkey crashes with the
following stack trace. This patch ensures we never leave the
`err_info.msg` field null when we fail to extract a proper error
message.
```
=== VALKEY BUG REPORT START: Cut & paste starting from here ===
2595901:M 18 Jun 2025 01:20:12.917 # valkey 8.1.2 crashed by signal: 11, si_code: 1
2595901:M 18 Jun 2025 01:20:12.917 # Accessing address: (nil)
2595901:M 18 Jun 2025 01:20:12.917 # Crashed running the instruction at: 0x726f8e57ed1d
------ STACK TRACE ------
EIP:
/usr/lib/libc.so.6(+0x16ed1d) [0x726f8e57ed1d]
2595905 bio_aof
/usr/lib/libc.so.6(+0x9de22) [0x726f8e4ade22]
/usr/lib/libc.so.6(+0x91fda) [0x726f8e4a1fda]
/usr/lib/libc.so.6(+0x9264c) [0x726f8e4a264c]
/usr/lib/libc.so.6(pthread_cond_wait+0x14e) [0x726f8e4a4d1e]
valkey-server *:6379(bioProcessBackgroundJobs+0x1b4) [0x6530abb46db4]
/usr/lib/libc.so.6(+0x957eb) [0x726f8e4a57eb]
/usr/lib/libc.so.6(+0x11918c) [0x726f8e52918c]
2595904 bio_close_file
/usr/lib/libc.so.6(+0x9de22) [0x726f8e4ade22]
/usr/lib/libc.so.6(+0x91fda) [0x726f8e4a1fda]
/usr/lib/libc.so.6(+0x9264c) [0x726f8e4a264c]
/usr/lib/libc.so.6(pthread_cond_wait+0x14e) [0x726f8e4a4d1e]
valkey-server *:6379(bioProcessBackgroundJobs+0x1b4) [0x6530abb46db4]
/usr/lib/libc.so.6(+0x957eb) [0x726f8e4a57eb]
/usr/lib/libc.so.6(+0x11918c) [0x726f8e52918c]
2595901 valkey-server *
/usr/lib/libc.so.6(+0x3def0) [0x726f8e44def0]
/usr/lib/libc.so.6(+0x16ed1d) [0x726f8e57ed1d]
valkey-server *:6379(sdscatfmt+0x894) [0x6530abaa24a4]
valkey-server *:6379(luaCallFunction+0x39a) [0x6530abbc66ea]
valkey-server *:6379(+0x1a0992) [0x6530abbc6992]
valkey-server *:6379(scriptingEngineCallFunction+0x98) [0x6530abbc1298]
valkey-server *:6379(+0x11ff55) [0x6530abb45f55]
valkey-server *:6379(call+0x174) [0x6530aba94454]
valkey-server *:6379(processCommand+0x93d) [0x6530aba958dd]
valkey-server *:6379(processCommandAndResetClient+0x21) [0x6530abaa9d11]
valkey-server *:6379(processInputBuffer+0xe3) [0x6530abaaee83]
valkey-server *:6379(readQueryFromClient+0x65) [0x6530abaaef55]
valkey-server *:6379(+0x18e31a) [0x6530abbb431a]
valkey-server *:6379(aeProcessEvents+0x24a) [0x6530aba790ca]
valkey-server *:6379(aeMain+0x2d) [0x6530aba7938d]
valkey-server *:6379(main+0x3f6) [0x6530aba6e7b6]
/usr/lib/libc.so.6(+0x276b5) [0x726f8e4376b5]
/usr/lib/libc.so.6(__libc_start_main+0x89) [0x726f8e437769]
valkey-server *:6379(_start+0x25) [0x6530aba70235]
2595906 bio_lazy_free
/usr/lib/libc.so.6(+0x9de22) [0x726f8e4ade22]
/usr/lib/libc.so.6(+0x91fda) [0x726f8e4a1fda]
/usr/lib/libc.so.6(+0x9264c) [0x726f8e4a264c]
/usr/lib/libc.so.6(pthread_cond_wait+0x14e) [0x726f8e4a4d1e]
valkey-server *:6379(bioProcessBackgroundJobs+0x1b4) [0x6530abb46db4]
/usr/lib/libc.so.6(+0x957eb) [0x726f8e4a57eb]
/usr/lib/libc.so.6(+0x11918c) [0x726f8e52918c]
4/4 expected stacktraces.
------ STACK TRACE DONE ------
------ REGISTERS ------
2595901:M 18 Jun 2025 01:20:12.920 #
RAX:0000000000000000 RBX:0000726f8dd35663
RCX:0000000000000000 RDX:0000000000000000
RDI:0000000000000000 RSI:0000000000000010
RBP:00007ffc2b821a80 RSP:00007ffc2b821938
R8 :000000000000000c R9 :00006530abc111b8
R10:0000000000000001 R11:0000000000000003
R12:00006530abc49adc R13:00006530abc111b7
R14:0000000000000001 R15:0000000000000001
RIP:0000726f8e57ed1d EFL:0000000000010283
CSGSFS:002b000000000033
2595901:M 18 Jun 2025 01:20:12.921 * hide-user-data-from-log is on, skip logging stack content to avoid spilling user data.
------ INFO OUTPUT ------
# Server
redis_version:7.2.4
server_name:valkey
valkey_version:8.1.2
valkey_release_stage:ga
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:38d65aa7b4148d2c
server_mode:standalone
os:Linux 6.14.6-arch1-1 x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
gcc_version:15.1.1
process_id:2595901
process_supervised:no
run_id:a0b75f67a217a81142f17553028c010e86c1ee80
tcp_port:6379
server_time_usec:1750209612917634
uptime_in_seconds:16
uptime_in_days:0
hz:10
configured_hz:10
clients_hz:10
lru_clock:5379148
executable:/home/fusl/valkey-server
config_file:
io_threads_active:0
availability_zone:
listener0:name=tcp,bind=*,bind=-::*,port=6379
# Clients
connected_clients:1
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:0
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
pubsub_clients:0
watching_clients:0
clients_in_timeout_table:0
total_watched_keys:0
total_blocking_keys:0
total_blocking_keys_on_nokey:0
paused_reason:none
paused_actions:none
paused_timeout_milliseconds:0
# Memory
used_memory:911824
used_memory_human:890.45K
used_memory_rss:15323136
used_memory_rss_human:14.61M
used_memory_peak:911824
used_memory_peak_human:890.45K
used_memory_peak_perc:100.29%
used_memory_overhead:892232
used_memory_startup:891824
used_memory_dataset:19592
used_memory_dataset_perc:97.96%
allocator_allocated:1845952
allocator_active:1986560
allocator_resident:6672384
allocator_muzzy:0
total_system_memory:67323842560
total_system_memory_human:62.70G
used_memory_lua:34816
used_memory_vm_eval:34816
used_memory_lua_human:34.00K
used_memory_scripts_eval:184
number_of_cached_scripts:1
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:33792
used_memory_vm_total:68608
used_memory_vm_total_human:67.00K
used_memory_functions:224
used_memory_scripts:408
used_memory_scripts_human:408B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.00
allocator_frag_bytes:0
allocator_rss_ratio:3.36
allocator_rss_bytes:4685824
rss_overhead_ratio:2.30
rss_overhead_bytes:8650752
mem_fragmentation_ratio:17.18
mem_fragmentation_bytes:14431168
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_total_replication_buffers:0
mem_clients_slaves:0
mem_clients_normal:0
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.3.0
mem_overhead_db_hashtable_rehashing:0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1750209596
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_saves:0
rdb_last_cow_size:0
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
# Stats
total_connections_received:1
total_commands_processed:0
instantaneous_ops_per_sec:0
total_net_input_bytes:34
total_net_output_bytes:0
total_net_repl_input_bytes:0
total_net_repl_output_bytes:0
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
evicted_clients:0
evicted_scripts:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:1
total_writes_processed:0
io_threaded_reads_processed:0
io_threaded_writes_processed:0
io_threaded_freed_objects:0
io_threaded_accept_processed:0
io_threaded_poll_processed:0
io_threaded_total_prefetch_batches:0
io_threaded_total_prefetch_entries:0
client_query_buffer_limit_disconnections:0
client_output_buffer_limit_disconnections:0
reply_buffer_shrinks:0
reply_buffer_expands:0
eventloop_cycles:170
eventloop_duration_sum:17739
eventloop_duration_cmd_sum:0
instantaneous_eventloop_cycles_per_sec:9
instantaneous_eventloop_duration_usec:99
acl_access_denied_auth:0
acl_access_denied_cmd:0
acl_access_denied_key:0
acl_access_denied_channel:0
# Replication
role:master
connected_slaves:0
replicas_waiting_psync:0
master_failover_state:no-failover
master_replid:d35a0bb7979f490a60174bb363524431d7eb2428
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:10485760
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
# CPU
used_cpu_sys:0.012543
used_cpu_user:0.016853
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:0.012440
used_cpu_user_main_thread:0.016714
# Modules
# Commandstats
# Errorstats
# Latencystats
# Cluster
cluster_enabled:0
# Keyspace
------ CLIENT LIST OUTPUT ------
id=2 addr=127.0.0.1:41372 laddr=127.0.0.1:6379 fd=10 name=*redacted* age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=12 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=17060 events=r cmd=eval user=*redacted* redir=-1 resp=2 lib-name= lib-ver= tot-net-in=34 tot-net-out=0 tot-cmds=0
------ CURRENT CLIENT INFO ------
id=2 addr=127.0.0.1:41372 laddr=127.0.0.1:6379 fd=10 name=*redacted* age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=12 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=17060 events=r cmd=eval user=*redacted* redir=-1 resp=2 lib-name= lib-ver= tot-net-in=34 tot-net-out=0 tot-cmds=0
argc: 3
argv[0]: "eval"
argv[1]: 7 bytes
argv[2]: 1 bytes
------ EXECUTING CLIENT INFO ------
id=2 addr=127.0.0.1:41372 laddr=127.0.0.1:6379 fd=10 name=*redacted* age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=12 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=17060 events=r cmd=eval user=*redacted* redir=-1 resp=2 lib-name= lib-ver= tot-net-in=34 tot-net-out=0 tot-cmds=0
argc: 3
argv[0]: "eval"
argv[1]: 7 bytes
argv[2]: 1 bytes
------ MODULES INFO OUTPUT ------
------ CONFIG DEBUG OUTPUT ------
repl-diskless-load disabled
debug-context ""
sanitize-dump-payload no
lazyfree-lazy-user-del yes
lazyfree-lazy-server-del yes
import-mode no
lazyfree-lazy-user-flush yes
list-compress-depth 0
dual-channel-replication-enabled no
repl-diskless-sync yes
activedefrag no
lazyfree-lazy-expire yes
io-threads 1
replica-read-only yes
client-query-buffer-limit 1gb
slave-read-only yes
lazyfree-lazy-eviction yes
proto-max-bulk-len 512mb
------ FAST MEMORY TEST ------
2595901:M 18 Jun 2025 01:20:12.921 # Bio worker thread #0 terminated
2595901:M 18 Jun 2025 01:20:12.921 # Bio worker thread #1 terminated
2595901:M 18 Jun 2025 01:20:12.921 # Bio worker thread #2 terminated
*** Preparing to test memory region 6530abce2000 (212992 bytes)
*** Preparing to test memory region 726f8af7f000 (2621440 bytes)
*** Preparing to test memory region 726f8b200000 (8388608 bytes)
*** Preparing to test memory region 726f8ba00000 (4194304 bytes)
*** Preparing to test memory region 726f8bffe000 (8388608 bytes)
*** Preparing to test memory region 726f8c7ff000 (8388608 bytes)
*** Preparing to test memory region 726f8d000000 (8388608 bytes)
*** Preparing to test memory region 726f8dc00000 (4194304 bytes)
*** Preparing to test memory region 726f8e290000 (16384 bytes)
*** Preparing to test memory region 726f8e3d2000 (20480 bytes)
*** Preparing to test memory region 726f8e5f8000 (32768 bytes)
*** Preparing to test memory region 726f8eb58000 (12288 bytes)
*** Preparing to test memory region 726f8eb5c000 (16384 bytes)
*** Preparing to test memory region 726f8ed63000 (4096 bytes)
*** Preparing to test memory region 726f8eef2000 (397312 bytes)
*** Preparing to test memory region 726f8efc7000 (4096 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.
------ DUMPING CODE AROUND EIP ------
Symbol: (null) (base: (nil))
Module: /usr/lib/libc.so.6 (base 0x726f8e410000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=(nil) -D -b binary -m i386:x86-64 /tmp/dump.bin
------
=== VALKEY BUG REPORT END. Make sure to include from START to END. ===
```
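The report above shows the main thread crashing inside `sdscatfmt`, called from `luaCallFunction`, while accessing address `(nil)`. A minimal sketch of the kind of guard the commit message describes (never leaving the message field null) could look like the following. The `demo_*` names and the fallback text are hypothetical and chosen for illustration; they are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback text; the wording used by the real patch is not
 * shown in the commit message. */
#define DEMO_FALLBACK_MSG "ERR unknown script error"

/* Simplified stand-in for an error-info record with a message field. */
typedef struct {
    const char *msg;
} demo_err_info;

/* Store the extracted message, or the fallback when extraction failed
 * and produced NULL, so that later formatting never sees a null string. */
static void demo_set_err_msg(demo_err_info *info, const char *extracted) {
    assert(info != NULL);
    info->msg = (extracted != NULL) ? extracted : DEMO_FALLBACK_MSG;
}

/* A consumer that would dereference a null pointer if msg were NULL. */
static size_t demo_msg_len(const demo_err_info *info) {
    return strlen(info->msg);
}
```

With this guard, a failed extraction yields the fallback text, so a consumer such as `demo_msg_len` always receives a valid string.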
---------
Signed-off-by: Fusl <fusl@meo.ws>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
zuiderkwast
pushed a commit
that referenced
this pull request
Jun 25, 2025
**Current state**
During `hashtableScanDefrag`, rehashing is paused to prevent entries
from moving, but the scan callback can still delete entries, which
triggers `hashtableShrinkIfNeeded`. For example, the
`expireScanCallback` can delete expired entries.
**Issue**
This can cause the table to be resized and the old memory to be freed
while the scan is still accessing it, resulting in the following memory
access violation:
```
[err]: Sanitizer error: =================================================================
==46774==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000003100 at pc 0x0000004704d3 bp 0x7fffcb062000 sp 0x7fffcb061ff0
READ of size 1 at 0x611000003100 thread T0
#0 0x4704d2 in isPositionFilled /home/gusakovy/Projects/valkey/src/hashtable.c:422
#1 0x478b45 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1768
#2 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#3 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#4 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#5 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#6 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#7 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#8 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#9 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#10 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#11 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
#12 0x452e39 in _start (/local/home/gusakovy/Projects/valkey/src/valkey-server+0x452e39)
0x611000003100 is located 0 bytes inside of 256-byte region [0x611000003100,0x611000003200)
freed by thread T0 here:
#0 0x7f471a34a1e5 in __interceptor_free (/lib64/libasan.so.4+0xd81e5)
#1 0x4aefbc in zfree_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:400
#2 0x4aeff5 in valkey_free /home/gusakovy/Projects/valkey/src/zmalloc.c:415
#3 0x4707d2 in rehashingCompleted /home/gusakovy/Projects/valkey/src/hashtable.c:456
#4 0x471b5b in resize /home/gusakovy/Projects/valkey/src/hashtable.c:656
#5 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
#6 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
#7 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
#8 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
#9 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
#10 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
#11 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
#12 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
#13 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
#14 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
#15 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#16 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#17 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#18 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#19 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#20 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#21 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#22 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#23 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#24 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
previously allocated by thread T0 here:
#0 0x7f471a34a753 in __interceptor_calloc (/lib64/libasan.so.4+0xd8753)
#1 0x4ae48c in ztrycalloc_usable_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:214
#2 0x4ae757 in valkey_calloc /home/gusakovy/Projects/valkey/src/zmalloc.c:257
#3 0x4718fc in resize /home/gusakovy/Projects/valkey/src/hashtable.c:645
#4 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
#5 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
#6 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
#7 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
#8 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
#9 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
#10 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
#11 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
#12 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
#13 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
#14 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#15 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#16 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#17 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#18 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#19 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#20 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#21 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#22 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#23 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
SUMMARY: AddressSanitizer: heap-use-after-free /home/gusakovy/Projects/valkey/src/hashtable.c:422 in isPositionFilled
Shadow bytes around the buggy address:
0x0c227fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff85f0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x0c227fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8610: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
=>0x0c227fff8620:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8630: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8640: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8660: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8670: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==46774==ABORTING
```
**Solution**
Suggested solution is to also pause auto shrinking during
`hashtableScanDefrag`. I noticed that there was already a
`hashtablePauseAutoShrink` method and `pause_auto_shrink` counter, but
it wasn't actually used in `hashtableShrinkIfNeeded` so I fixed that.
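The idea of the fix can be sketched with a toy table. This is a simplified model, not the code in `hashtable.c`: the `demo_*` names, the struct layout, and the shrink threshold are all assumptions made for illustration. Only the `pause_auto_shrink` counter name comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy table: only counts entries and buckets. */
typedef struct {
    size_t used;            /* number of stored entries */
    size_t buckets;         /* number of allocated buckets */
    int pause_auto_shrink;  /* > 0 means automatic shrinking is paused */
} demo_table;

static void demo_pause_auto_shrink(demo_table *t) { t->pause_auto_shrink++; }

static void demo_resume_auto_shrink(demo_table *t) {
    assert(t->pause_auto_shrink > 0);
    t->pause_auto_shrink--;
}

/* Shrink when the table is sparse. Returns true if a resize happened.
 * The first check is the one the description says was missing. */
static bool demo_shrink_if_needed(demo_table *t) {
    if (t->pause_auto_shrink > 0) return false;
    /* Arbitrary illustrative threshold: shrink below 1/8 fill. */
    if (t->buckets <= 4 || t->used * 8 > t->buckets) return false;
    t->buckets /= 2;
    return true;
}

/* Delete one entry; returns true if the deletion triggered a resize. */
static bool demo_delete(demo_table *t) {
    if (t->used == 0) return false;
    t->used--;
    return demo_shrink_if_needed(t);
}

/* A scan whose callback deletes n entries. Shrinking is paused for the
 * duration, so the bucket array cannot be replaced mid-scan. Returns the
 * number of resizes observed during the scan. */
static int demo_scan_deleting(demo_table *t, size_t n) {
    int resizes = 0;
    demo_pause_auto_shrink(t);
    for (size_t i = 0; i < n; i++) {
        if (demo_delete(t)) resizes++;
    }
    demo_resume_auto_shrink(t);
    return resizes;
}
```

In this model no resize can occur while the scan holds the pause, and the table is free to shrink once the scan has resumed auto shrinking.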
**Testing**
I created a simple tcl test that (most of the time) triggers this
error, but it's a little clunky so I didn't add it as part of the PR:
```
start_server {tags {"expire hashtable defrag"}} {
    test {hashtable scan defrag on expiry} {
        r config set hz 100
        set num_keys 20
        for {set i 0} {$i < $num_keys} {incr i} {
            r set "key_$i" "value_$i"
        }
        for {set j 0} {$j < 50} {incr j} {
            set expire_keys 100
            for {set i 0} {$i < $expire_keys} {incr i} {
                # Short expiry time to ensure they expire quickly
                r psetex "expire_key_${i}_${j}" 100 "expire_value_${i}_${j}"
            }
            # Verify keys are set
            set initial_size [r dbsize]
            assert_equal $initial_size [expr $num_keys + $expire_keys]
            after 150
            for {set i 0} {$i < 10} {incr i} {
                r get "expire_key_${i}_${j}"
                after 10
            }
        }
        set remaining_keys [r dbsize]
        assert_equal $remaining_keys $num_keys
        # Verify server is still responsive
        assert_equal [r ping] {PONG}
    } {}
}
```
Compiling with ASAN using `make noopt SANITIZER=address valkey-server`
and running the test causes the error above. Applying the fix resolves
the issue.
Signed-off-by: Yakov Gusakov <yaakov0015@gmail.com>
zuiderkwast
pushed a commit
that referenced
this pull request
Aug 22, 2025
When calling the command `EVAL error{} 0`, Valkey crashes with the
following stack trace. This patch ensures we never leave the
`err_info.msg` field null when we fail to extract a proper error
message.
```
=== VALKEY BUG REPORT START: Cut & paste starting from here ===
2595901:M 18 Jun 2025 01:20:12.917 # valkey 8.1.2 crashed by signal: 11, si_code: 1
2595901:M 18 Jun 2025 01:20:12.917 # Accessing address: (nil)
2595901:M 18 Jun 2025 01:20:12.917 # Crashed running the instruction at: 0x726f8e57ed1d
------ STACK TRACE ------
EIP:
/usr/lib/libc.so.6(+0x16ed1d) [0x726f8e57ed1d]
2595905 bio_aof
/usr/lib/libc.so.6(+0x9de22) [0x726f8e4ade22]
/usr/lib/libc.so.6(+0x91fda) [0x726f8e4a1fda]
/usr/lib/libc.so.6(+0x9264c) [0x726f8e4a264c]
/usr/lib/libc.so.6(pthread_cond_wait+0x14e) [0x726f8e4a4d1e]
valkey-server *:6379(bioProcessBackgroundJobs+0x1b4) [0x6530abb46db4]
/usr/lib/libc.so.6(+0x957eb) [0x726f8e4a57eb]
/usr/lib/libc.so.6(+0x11918c) [0x726f8e52918c]
2595904 bio_close_file
/usr/lib/libc.so.6(+0x9de22) [0x726f8e4ade22]
/usr/lib/libc.so.6(+0x91fda) [0x726f8e4a1fda]
/usr/lib/libc.so.6(+0x9264c) [0x726f8e4a264c]
/usr/lib/libc.so.6(pthread_cond_wait+0x14e) [0x726f8e4a4d1e]
valkey-server *:6379(bioProcessBackgroundJobs+0x1b4) [0x6530abb46db4]
/usr/lib/libc.so.6(+0x957eb) [0x726f8e4a57eb]
/usr/lib/libc.so.6(+0x11918c) [0x726f8e52918c]
2595901 valkey-server *
/usr/lib/libc.so.6(+0x3def0) [0x726f8e44def0]
/usr/lib/libc.so.6(+0x16ed1d) [0x726f8e57ed1d]
valkey-server *:6379(sdscatfmt+0x894) [0x6530abaa24a4]
valkey-server *:6379(luaCallFunction+0x39a) [0x6530abbc66ea]
valkey-server *:6379(+0x1a0992) [0x6530abbc6992]
valkey-server *:6379(scriptingEngineCallFunction+0x98) [0x6530abbc1298]
valkey-server *:6379(+0x11ff55) [0x6530abb45f55]
valkey-server *:6379(call+0x174) [0x6530aba94454]
valkey-server *:6379(processCommand+0x93d) [0x6530aba958dd]
valkey-server *:6379(processCommandAndResetClient+0x21) [0x6530abaa9d11]
valkey-server *:6379(processInputBuffer+0xe3) [0x6530abaaee83]
valkey-server *:6379(readQueryFromClient+0x65) [0x6530abaaef55]
valkey-server *:6379(+0x18e31a) [0x6530abbb431a]
valkey-server *:6379(aeProcessEvents+0x24a) [0x6530aba790ca]
valkey-server *:6379(aeMain+0x2d) [0x6530aba7938d]
valkey-server *:6379(main+0x3f6) [0x6530aba6e7b6]
/usr/lib/libc.so.6(+0x276b5) [0x726f8e4376b5]
/usr/lib/libc.so.6(__libc_start_main+0x89) [0x726f8e437769]
valkey-server *:6379(_start+0x25) [0x6530aba70235]
2595906 bio_lazy_free
/usr/lib/libc.so.6(+0x9de22) [0x726f8e4ade22]
/usr/lib/libc.so.6(+0x91fda) [0x726f8e4a1fda]
/usr/lib/libc.so.6(+0x9264c) [0x726f8e4a264c]
/usr/lib/libc.so.6(pthread_cond_wait+0x14e) [0x726f8e4a4d1e]
valkey-server *:6379(bioProcessBackgroundJobs+0x1b4) [0x6530abb46db4]
/usr/lib/libc.so.6(+0x957eb) [0x726f8e4a57eb]
/usr/lib/libc.so.6(+0x11918c) [0x726f8e52918c]
4/4 expected stacktraces.
------ STACK TRACE DONE ------
------ REGISTERS ------
2595901:M 18 Jun 2025 01:20:12.920 #
RAX:0000000000000000 RBX:0000726f8dd35663
RCX:0000000000000000 RDX:0000000000000000
RDI:0000000000000000 RSI:0000000000000010
RBP:00007ffc2b821a80 RSP:00007ffc2b821938
R8 :000000000000000c R9 :00006530abc111b8
R10:0000000000000001 R11:0000000000000003
R12:00006530abc49adc R13:00006530abc111b7
R14:0000000000000001 R15:0000000000000001
RIP:0000726f8e57ed1d EFL:0000000000010283
CSGSFS:002b000000000033
2595901:M 18 Jun 2025 01:20:12.921 * hide-user-data-from-log is on, skip logging stack content to avoid spilling user data.
------ INFO OUTPUT ------
redis_version:7.2.4
server_name:valkey
valkey_version:8.1.2
valkey_release_stage:ga
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:38d65aa7b4148d2c
server_mode:standalone
os:Linux 6.14.6-arch1-1 x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
gcc_version:15.1.1
process_id:2595901
process_supervised:no
run_id:a0b75f67a217a81142f17553028c010e86c1ee80
tcp_port:6379
server_time_usec:1750209612917634
uptime_in_seconds:16
uptime_in_days:0
hz:10
configured_hz:10
clients_hz:10
lru_clock:5379148
executable:/home/fusl/valkey-server
config_file:
io_threads_active:0
availability_zone:
listener0:name=tcp,bind=*,bind=-::*,port=6379
connected_clients:1
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:0
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
pubsub_clients:0
watching_clients:0
clients_in_timeout_table:0
total_watched_keys:0
total_blocking_keys:0
total_blocking_keys_on_nokey:0
paused_reason:none
paused_actions:none
paused_timeout_milliseconds:0
used_memory:911824
used_memory_human:890.45K
used_memory_rss:15323136
used_memory_rss_human:14.61M
used_memory_peak:911824
used_memory_peak_human:890.45K
used_memory_peak_perc:100.29%
used_memory_overhead:892232
used_memory_startup:891824
used_memory_dataset:19592
used_memory_dataset_perc:97.96%
allocator_allocated:1845952
allocator_active:1986560
allocator_resident:6672384
allocator_muzzy:0
total_system_memory:67323842560
total_system_memory_human:62.70G
used_memory_lua:34816
used_memory_vm_eval:34816
used_memory_lua_human:34.00K
used_memory_scripts_eval:184
number_of_cached_scripts:1
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:33792
used_memory_vm_total:68608
used_memory_vm_total_human:67.00K
used_memory_functions:224
used_memory_scripts:408
used_memory_scripts_human:408B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.00
allocator_frag_bytes:0
allocator_rss_ratio:3.36
allocator_rss_bytes:4685824
rss_overhead_ratio:2.30
rss_overhead_bytes:8650752
mem_fragmentation_ratio:17.18
mem_fragmentation_bytes:14431168
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_total_replication_buffers:0
mem_clients_slaves:0
mem_clients_normal:0
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.3.0
mem_overhead_db_hashtable_rehashing:0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1750209596
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_saves:0
rdb_last_cow_size:0
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
total_connections_received:1
total_commands_processed:0
instantaneous_ops_per_sec:0
total_net_input_bytes:34
total_net_output_bytes:0
total_net_repl_input_bytes:0
total_net_repl_output_bytes:0
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
evicted_clients:0
evicted_scripts:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:1
total_writes_processed:0
io_threaded_reads_processed:0
io_threaded_writes_processed:0
io_threaded_freed_objects:0
io_threaded_accept_processed:0
io_threaded_poll_processed:0
io_threaded_total_prefetch_batches:0
io_threaded_total_prefetch_entries:0
client_query_buffer_limit_disconnections:0
client_output_buffer_limit_disconnections:0
reply_buffer_shrinks:0
reply_buffer_expands:0
eventloop_cycles:170
eventloop_duration_sum:17739
eventloop_duration_cmd_sum:0
instantaneous_eventloop_cycles_per_sec:9
instantaneous_eventloop_duration_usec:99
acl_access_denied_auth:0
acl_access_denied_cmd:0
acl_access_denied_key:0
acl_access_denied_channel:0
role:master
connected_slaves:0
replicas_waiting_psync:0
master_failover_state:no-failover
master_replid:d35a0bb7979f490a60174bb363524431d7eb2428
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:10485760
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
used_cpu_sys:0.012543
used_cpu_user:0.016853
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:0.012440
used_cpu_user_main_thread:0.016714
cluster_enabled:0
------ CLIENT LIST OUTPUT ------
id=2 addr=127.0.0.1:41372 laddr=127.0.0.1:6379 fd=10 name=*redacted* age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=12 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=17060 events=r cmd=eval user=*redacted* redir=-1 resp=2 lib-name= lib-ver= tot-net-in=34 tot-net-out=0 tot-cmds=0
------ CURRENT CLIENT INFO ------
id=2 addr=127.0.0.1:41372 laddr=127.0.0.1:6379 fd=10 name=*redacted* age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=12 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=17060 events=r cmd=eval user=*redacted* redir=-1 resp=2 lib-name= lib-ver= tot-net-in=34 tot-net-out=0 tot-cmds=0
argc: 3
argv[0]: "eval"
argv[1]: 7 bytes
argv[2]: 1 bytes
------ EXECUTING CLIENT INFO ------
id=2 addr=127.0.0.1:41372 laddr=127.0.0.1:6379 fd=10 name=*redacted* age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=12 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=17060 events=r cmd=eval user=*redacted* redir=-1 resp=2 lib-name= lib-ver= tot-net-in=34 tot-net-out=0 tot-cmds=0
argc: 3
argv[0]: "eval"
argv[1]: 7 bytes
argv[2]: 1 bytes
------ MODULES INFO OUTPUT ------
------ CONFIG DEBUG OUTPUT ------
repl-diskless-load disabled
debug-context ""
sanitize-dump-payload no
lazyfree-lazy-user-del yes
lazyfree-lazy-server-del yes
import-mode no
lazyfree-lazy-user-flush yes
list-compress-depth 0
dual-channel-replication-enabled no
repl-diskless-sync yes
activedefrag no
lazyfree-lazy-expire yes
io-threads 1
replica-read-only yes
client-query-buffer-limit 1gb
slave-read-only yes
lazyfree-lazy-eviction yes
proto-max-bulk-len 512mb
------ FAST MEMORY TEST ------
2595901:M 18 Jun 2025 01:20:12.921 # Bio worker thread #0 terminated
2595901:M 18 Jun 2025 01:20:12.921 # Bio worker thread #1 terminated
2595901:M 18 Jun 2025 01:20:12.921 # Bio worker thread #2 terminated
*** Preparing to test memory region 6530abce2000 (212992 bytes)
*** Preparing to test memory region 726f8af7f000 (2621440 bytes)
*** Preparing to test memory region 726f8b200000 (8388608 bytes)
*** Preparing to test memory region 726f8ba00000 (4194304 bytes)
*** Preparing to test memory region 726f8bffe000 (8388608 bytes)
*** Preparing to test memory region 726f8c7ff000 (8388608 bytes)
*** Preparing to test memory region 726f8d000000 (8388608 bytes)
*** Preparing to test memory region 726f8dc00000 (4194304 bytes)
*** Preparing to test memory region 726f8e290000 (16384 bytes)
*** Preparing to test memory region 726f8e3d2000 (20480 bytes)
*** Preparing to test memory region 726f8e5f8000 (32768 bytes)
*** Preparing to test memory region 726f8eb58000 (12288 bytes)
*** Preparing to test memory region 726f8eb5c000 (16384 bytes)
*** Preparing to test memory region 726f8ed63000 (4096 bytes)
*** Preparing to test memory region 726f8eef2000 (397312 bytes)
*** Preparing to test memory region 726f8efc7000 (4096 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.
------ DUMPING CODE AROUND EIP ------
Symbol: (null) (base: (nil))
Module: /usr/lib/libc.so.6 (base 0x726f8e410000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=(nil) -D -b binary -m i386:x86-64 /tmp/dump.bin
------
=== VALKEY BUG REPORT END. Make sure to include from START to END. ===
```
---------
Signed-off-by: Fusl <fusl@meo.ws>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
zuiderkwast
pushed a commit
that referenced
this pull request
Oct 1, 2025
When calling the command `EVAL error{} 0`, Valkey crashes with the
following stack trace. This patch ensures we never leave the
`err_info.msg` field null when we fail to extract a proper error
message.
```
=== VALKEY BUG REPORT START: Cut & paste starting from here ===
2595901:M 18 Jun 2025 01:20:12.917 # valkey 8.1.2 crashed by signal: 11, si_code: 1
2595901:M 18 Jun 2025 01:20:12.917 # Accessing address: (nil)
2595901:M 18 Jun 2025 01:20:12.917 # Crashed running the instruction at: 0x726f8e57ed1d
------ STACK TRACE ------
EIP:
/usr/lib/libc.so.6(+0x16ed1d) [0x726f8e57ed1d]
2595905 bio_aof
/usr/lib/libc.so.6(+0x9de22) [0x726f8e4ade22]
/usr/lib/libc.so.6(+0x91fda) [0x726f8e4a1fda]
/usr/lib/libc.so.6(+0x9264c) [0x726f8e4a264c]
/usr/lib/libc.so.6(pthread_cond_wait+0x14e) [0x726f8e4a4d1e]
valkey-server *:6379(bioProcessBackgroundJobs+0x1b4) [0x6530abb46db4]
/usr/lib/libc.so.6(+0x957eb) [0x726f8e4a57eb]
/usr/lib/libc.so.6(+0x11918c) [0x726f8e52918c]
2595904 bio_close_file
/usr/lib/libc.so.6(+0x9de22) [0x726f8e4ade22]
/usr/lib/libc.so.6(+0x91fda) [0x726f8e4a1fda]
/usr/lib/libc.so.6(+0x9264c) [0x726f8e4a264c]
/usr/lib/libc.so.6(pthread_cond_wait+0x14e) [0x726f8e4a4d1e]
valkey-server *:6379(bioProcessBackgroundJobs+0x1b4) [0x6530abb46db4]
/usr/lib/libc.so.6(+0x957eb) [0x726f8e4a57eb]
/usr/lib/libc.so.6(+0x11918c) [0x726f8e52918c]
2595901 valkey-server *
/usr/lib/libc.so.6(+0x3def0) [0x726f8e44def0]
/usr/lib/libc.so.6(+0x16ed1d) [0x726f8e57ed1d]
valkey-server *:6379(sdscatfmt+0x894) [0x6530abaa24a4]
valkey-server *:6379(luaCallFunction+0x39a) [0x6530abbc66ea]
valkey-server *:6379(+0x1a0992) [0x6530abbc6992]
valkey-server *:6379(scriptingEngineCallFunction+0x98) [0x6530abbc1298]
valkey-server *:6379(+0x11ff55) [0x6530abb45f55]
valkey-server *:6379(call+0x174) [0x6530aba94454]
valkey-server *:6379(processCommand+0x93d) [0x6530aba958dd]
valkey-server *:6379(processCommandAndResetClient+0x21) [0x6530abaa9d11]
valkey-server *:6379(processInputBuffer+0xe3) [0x6530abaaee83]
valkey-server *:6379(readQueryFromClient+0x65) [0x6530abaaef55]
valkey-server *:6379(+0x18e31a) [0x6530abbb431a]
valkey-server *:6379(aeProcessEvents+0x24a) [0x6530aba790ca]
valkey-server *:6379(aeMain+0x2d) [0x6530aba7938d]
valkey-server *:6379(main+0x3f6) [0x6530aba6e7b6]
/usr/lib/libc.so.6(+0x276b5) [0x726f8e4376b5]
/usr/lib/libc.so.6(__libc_start_main+0x89) [0x726f8e437769]
valkey-server *:6379(_start+0x25) [0x6530aba70235]
2595906 bio_lazy_free
/usr/lib/libc.so.6(+0x9de22) [0x726f8e4ade22]
/usr/lib/libc.so.6(+0x91fda) [0x726f8e4a1fda]
/usr/lib/libc.so.6(+0x9264c) [0x726f8e4a264c]
/usr/lib/libc.so.6(pthread_cond_wait+0x14e) [0x726f8e4a4d1e]
valkey-server *:6379(bioProcessBackgroundJobs+0x1b4) [0x6530abb46db4]
/usr/lib/libc.so.6(+0x957eb) [0x726f8e4a57eb]
/usr/lib/libc.so.6(+0x11918c) [0x726f8e52918c]
4/4 expected stacktraces.
------ STACK TRACE DONE ------
------ REGISTERS ------
2595901:M 18 Jun 2025 01:20:12.920 #
RAX:0000000000000000 RBX:0000726f8dd35663
RCX:0000000000000000 RDX:0000000000000000
RDI:0000000000000000 RSI:0000000000000010
RBP:00007ffc2b821a80 RSP:00007ffc2b821938
R8 :000000000000000c R9 :00006530abc111b8
R10:0000000000000001 R11:0000000000000003
R12:00006530abc49adc R13:00006530abc111b7
R14:0000000000000001 R15:0000000000000001
RIP:0000726f8e57ed1d EFL:0000000000010283
CSGSFS:002b000000000033
2595901:M 18 Jun 2025 01:20:12.921 * hide-user-data-from-log is on, skip logging stack content to avoid spilling user data.
------ INFO OUTPUT ------
# Server
redis_version:7.2.4
server_name:valkey
valkey_version:8.1.2
valkey_release_stage:ga
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:38d65aa7b4148d2c
server_mode:standalone
os:Linux 6.14.6-arch1-1 x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
gcc_version:15.1.1
process_id:2595901
process_supervised:no
run_id:a0b75f67a217a81142f17553028c010e86c1ee80
tcp_port:6379
server_time_usec:1750209612917634
uptime_in_seconds:16
uptime_in_days:0
hz:10
configured_hz:10
clients_hz:10
lru_clock:5379148
executable:/home/fusl/valkey-server
config_file:
io_threads_active:0
availability_zone:
listener0:name=tcp,bind=*,bind=-::*,port=6379
# Clients
connected_clients:1
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:0
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
pubsub_clients:0
watching_clients:0
clients_in_timeout_table:0
total_watched_keys:0
total_blocking_keys:0
total_blocking_keys_on_nokey:0
paused_reason:none
paused_actions:none
paused_timeout_milliseconds:0
# Memory
used_memory:911824
used_memory_human:890.45K
used_memory_rss:15323136
used_memory_rss_human:14.61M
used_memory_peak:911824
used_memory_peak_human:890.45K
used_memory_peak_perc:100.29%
used_memory_overhead:892232
used_memory_startup:891824
used_memory_dataset:19592
used_memory_dataset_perc:97.96%
allocator_allocated:1845952
allocator_active:1986560
allocator_resident:6672384
allocator_muzzy:0
total_system_memory:67323842560
total_system_memory_human:62.70G
used_memory_lua:34816
used_memory_vm_eval:34816
used_memory_lua_human:34.00K
used_memory_scripts_eval:184
number_of_cached_scripts:1
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:33792
used_memory_vm_total:68608
used_memory_vm_total_human:67.00K
used_memory_functions:224
used_memory_scripts:408
used_memory_scripts_human:408B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.00
allocator_frag_bytes:0
allocator_rss_ratio:3.36
allocator_rss_bytes:4685824
rss_overhead_ratio:2.30
rss_overhead_bytes:8650752
mem_fragmentation_ratio:17.18
mem_fragmentation_bytes:14431168
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_total_replication_buffers:0
mem_clients_slaves:0
mem_clients_normal:0
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.3.0
mem_overhead_db_hashtable_rehashing:0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1750209596
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_saves:0
rdb_last_cow_size:0
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
# Stats
total_connections_received:1
total_commands_processed:0
instantaneous_ops_per_sec:0
total_net_input_bytes:34
total_net_output_bytes:0
total_net_repl_input_bytes:0
total_net_repl_output_bytes:0
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
evicted_clients:0
evicted_scripts:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:1
total_writes_processed:0
io_threaded_reads_processed:0
io_threaded_writes_processed:0
io_threaded_freed_objects:0
io_threaded_accept_processed:0
io_threaded_poll_processed:0
io_threaded_total_prefetch_batches:0
io_threaded_total_prefetch_entries:0
client_query_buffer_limit_disconnections:0
client_output_buffer_limit_disconnections:0
reply_buffer_shrinks:0
reply_buffer_expands:0
eventloop_cycles:170
eventloop_duration_sum:17739
eventloop_duration_cmd_sum:0
instantaneous_eventloop_cycles_per_sec:9
instantaneous_eventloop_duration_usec:99
acl_access_denied_auth:0
acl_access_denied_cmd:0
acl_access_denied_key:0
acl_access_denied_channel:0
# Replication
role:master
connected_slaves:0
replicas_waiting_psync:0
master_failover_state:no-failover
master_replid:d35a0bb7979f490a60174bb363524431d7eb2428
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:10485760
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
# CPU
used_cpu_sys:0.012543
used_cpu_user:0.016853
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:0.012440
used_cpu_user_main_thread:0.016714
# Modules
# Commandstats
# Errorstats
# Latencystats
# Cluster
cluster_enabled:0
# Keyspace
------ CLIENT LIST OUTPUT ------
id=2 addr=127.0.0.1:41372 laddr=127.0.0.1:6379 fd=10 name=*redacted* age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=12 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=17060 events=r cmd=eval user=*redacted* redir=-1 resp=2 lib-name= lib-ver= tot-net-in=34 tot-net-out=0 tot-cmds=0
------ CURRENT CLIENT INFO ------
id=2 addr=127.0.0.1:41372 laddr=127.0.0.1:6379 fd=10 name=*redacted* age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=12 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=17060 events=r cmd=eval user=*redacted* redir=-1 resp=2 lib-name= lib-ver= tot-net-in=34 tot-net-out=0 tot-cmds=0
argc: 3
argv[0]: "eval"
argv[1]: 7 bytes
argv[2]: 1 bytes
------ EXECUTING CLIENT INFO ------
id=2 addr=127.0.0.1:41372 laddr=127.0.0.1:6379 fd=10 name=*redacted* age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=12 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=17060 events=r cmd=eval user=*redacted* redir=-1 resp=2 lib-name= lib-ver= tot-net-in=34 tot-net-out=0 tot-cmds=0
argc: 3
argv[0]: "eval"
argv[1]: 7 bytes
argv[2]: 1 bytes
------ MODULES INFO OUTPUT ------
------ CONFIG DEBUG OUTPUT ------
repl-diskless-load disabled
debug-context ""
sanitize-dump-payload no
lazyfree-lazy-user-del yes
lazyfree-lazy-server-del yes
import-mode no
lazyfree-lazy-user-flush yes
list-compress-depth 0
dual-channel-replication-enabled no
repl-diskless-sync yes
activedefrag no
lazyfree-lazy-expire yes
io-threads 1
replica-read-only yes
client-query-buffer-limit 1gb
slave-read-only yes
lazyfree-lazy-eviction yes
proto-max-bulk-len 512mb
------ FAST MEMORY TEST ------
2595901:M 18 Jun 2025 01:20:12.921 # Bio worker thread #0 terminated
2595901:M 18 Jun 2025 01:20:12.921 # Bio worker thread #1 terminated
2595901:M 18 Jun 2025 01:20:12.921 # Bio worker thread #2 terminated
*** Preparing to test memory region 6530abce2000 (212992 bytes)
*** Preparing to test memory region 726f8af7f000 (2621440 bytes)
*** Preparing to test memory region 726f8b200000 (8388608 bytes)
*** Preparing to test memory region 726f8ba00000 (4194304 bytes)
*** Preparing to test memory region 726f8bffe000 (8388608 bytes)
*** Preparing to test memory region 726f8c7ff000 (8388608 bytes)
*** Preparing to test memory region 726f8d000000 (8388608 bytes)
*** Preparing to test memory region 726f8dc00000 (4194304 bytes)
*** Preparing to test memory region 726f8e290000 (16384 bytes)
*** Preparing to test memory region 726f8e3d2000 (20480 bytes)
*** Preparing to test memory region 726f8e5f8000 (32768 bytes)
*** Preparing to test memory region 726f8eb58000 (12288 bytes)
*** Preparing to test memory region 726f8eb5c000 (16384 bytes)
*** Preparing to test memory region 726f8ed63000 (4096 bytes)
*** Preparing to test memory region 726f8eef2000 (397312 bytes)
*** Preparing to test memory region 726f8efc7000 (4096 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.
------ DUMPING CODE AROUND EIP ------
Symbol: (null) (base: (nil))
Module: /usr/lib/libc.so.6 (base 0x726f8e410000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=(nil) -D -b binary -m i386:x86-64 /tmp/dump.bin
------
=== VALKEY BUG REPORT END. Make sure to include from START to END. ===
```
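The trace above shows the main thread faulting on a nil address inside libc, reached from `sdscatfmt` via `luaCallFunction`, which is consistent with a null message pointer being formatted. A minimal sketch of the kind of guard the commit message describes follows. It is not the actual patch: the `error_info` struct, `extract_error_info`, `error_message_length`, and the fallback text are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the error details extracted from a script error. */
typedef struct {
    const char *msg;    /* human-readable message; must never be NULL */
    const char *source; /* optional, may be NULL */
} error_info;

/* Hypothetical extractor: raw_msg is NULL when no usable message could be
 * read from the error value raised by the script. */
static void extract_error_info(const char *raw_msg, error_info *out) {
    assert(out != NULL);
    out->source = NULL;
    /* Fall back to a fixed string so later formatting code never sees NULL. */
    out->msg = (raw_msg != NULL) ? raw_msg : "ERR unknown error";
}

/* Consumer that, like a "%s"-style formatter, dereferences msg unconditionally. */
static size_t error_message_length(const error_info *info) {
    return strlen(info->msg);
}
```

With this shape, the consumer stays simple because the invariant (a non-null `msg`) is established once, at extraction time, rather than checked at every call site.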
---------
Signed-off-by: Fusl <fusl@meo.ws>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
zuiderkwast
pushed a commit
that referenced
this pull request
Oct 1, 2025
**Current state**
During `hashtableScanDefrag`, rehashing is paused to prevent entries
from moving, but the scan callback can still delete entries, which
triggers `hashtableShrinkIfNeeded`. For example, the
`expireScanCallback` can delete expired entries.
**Issue**
This can cause the table to be resized and the old memory to be freed
while the scan is still accessing it, resulting in the following memory
access violation:
```
[err]: Sanitizer error: =================================================================
==46774==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000003100 at pc 0x0000004704d3 bp 0x7fffcb062000 sp 0x7fffcb061ff0
READ of size 1 at 0x611000003100 thread T0
#0 0x4704d2 in isPositionFilled /home/gusakovy/Projects/valkey/src/hashtable.c:422
#1 0x478b45 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1768
#2 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#3 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#4 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#5 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#6 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#7 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#8 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#9 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#10 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#11 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
#12 0x452e39 in _start (/local/home/gusakovy/Projects/valkey/src/valkey-server+0x452e39)
0x611000003100 is located 0 bytes inside of 256-byte region [0x611000003100,0x611000003200)
freed by thread T0 here:
#0 0x7f471a34a1e5 in __interceptor_free (/lib64/libasan.so.4+0xd81e5)
#1 0x4aefbc in zfree_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:400
#2 0x4aeff5 in valkey_free /home/gusakovy/Projects/valkey/src/zmalloc.c:415
#3 0x4707d2 in rehashingCompleted /home/gusakovy/Projects/valkey/src/hashtable.c:456
#4 0x471b5b in resize /home/gusakovy/Projects/valkey/src/hashtable.c:656
#5 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
#6 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
#7 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
#8 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
#9 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
#10 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
#11 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
#12 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
#13 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
#14 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
#15 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#16 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#17 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#18 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#19 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#20 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#21 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#22 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#23 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#24 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
previously allocated by thread T0 here:
#0 0x7f471a34a753 in __interceptor_calloc (/lib64/libasan.so.4+0xd8753)
#1 0x4ae48c in ztrycalloc_usable_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:214
#2 0x4ae757 in valkey_calloc /home/gusakovy/Projects/valkey/src/zmalloc.c:257
#3 0x4718fc in resize /home/gusakovy/Projects/valkey/src/hashtable.c:645
#4 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
#5 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
#6 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
#7 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
#8 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
#9 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
#10 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
#11 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
#12 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
#13 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
#14 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#15 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#16 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#17 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#18 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#19 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#20 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#21 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#22 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#23 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
SUMMARY: AddressSanitizer: heap-use-after-free /home/gusakovy/Projects/valkey/src/hashtable.c:422 in isPositionFilled
Shadow bytes around the buggy address:
0x0c227fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff85f0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x0c227fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8610: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
=>0x0c227fff8620:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8630: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8640: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8660: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8670: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==46774==ABORTING
```
**Solution**
The suggested solution is to also pause auto-shrinking during
`hashtableScanDefrag`. I noticed that there was already a
`hashtablePauseAutoShrink` function and a `pause_auto_shrink` counter, but
the counter wasn't actually used in `hashtableShrinkIfNeeded`, so I fixed that.
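The guard described above can be sketched on a toy table. This is an illustration under stated assumptions, not the Valkey implementation: `toy_table`, the `toy_*` functions, and the 1/8 fill threshold are hypothetical, and only the idea of checking a `pause_auto_shrink` counter before shrinking comes from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy table: only the fields needed to illustrate the guard. */
typedef struct {
    size_t used;           /* number of stored entries */
    size_t capacity;       /* number of allocated slots */
    int pause_auto_shrink; /* > 0 means shrinking is currently not allowed */
} toy_table;

static void toy_pause_auto_shrink(toy_table *t) { t->pause_auto_shrink++; }

static void toy_resume_auto_shrink(toy_table *t) {
    assert(t->pause_auto_shrink > 0);
    t->pause_auto_shrink--;
}

/* Returns true if the table was shrunk. The 1/8 fill threshold is arbitrary. */
static bool toy_shrink_if_needed(toy_table *t) {
    if (t->pause_auto_shrink > 0) return false; /* the guard */
    if (t->capacity <= 4 || t->used * 8 > t->capacity) return false;
    t->capacity /= 2; /* a real table would reallocate and free old buckets here */
    return true;
}

/* Deleting an entry may trigger a shrink, as in the report above. */
static bool toy_delete(toy_table *t) {
    if (t->used > 0) t->used--;
    return toy_shrink_if_needed(t);
}
```

A scan would call `toy_pause_auto_shrink` before invoking callbacks and `toy_resume_auto_shrink` afterwards, so a delete issued from a callback cannot free the buckets the scan is still reading. A counter rather than a flag lets nested pauses compose.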
**Testing**
I created a simple Tcl test that triggers this error most of the time,
but it's a little clunky so I didn't add it as part of the PR:
```
start_server {tags {"expire hashtable defrag"}} {
    test {hashtable scan defrag on expiry} {
        r config set hz 100
        set num_keys 20
        for {set i 0} {$i < $num_keys} {incr i} {
            r set "key_$i" "value_$i"
        }
        for {set j 0} {$j < 50} {incr j} {
            set expire_keys 100
            for {set i 0} {$i < $expire_keys} {incr i} {
                # Short expiry time to ensure they expire quickly
                r psetex "expire_key_${i}_${j}" 100 "expire_value_${i}_${j}"
            }
            # Verify keys are set
            set initial_size [r dbsize]
            assert_equal $initial_size [expr $num_keys + $expire_keys]
            after 150
            for {set i 0} {$i < 10} {incr i} {
                r get "expire_key_${i}_${j}"
                after 10
            }
        }
        set remaining_keys [r dbsize]
        assert_equal $remaining_keys $num_keys
        # Verify server is still responsive
        assert_equal [r ping] {PONG}
    } {}
}
```
Compiling with ASAN using `make noopt SANITIZER=address valkey-server`
and running the test reproduces the error above. Applying the fix resolves
the issue.
Signed-off-by: Yakov Gusakov <yaakov0015@gmail.com>
hpatro
added a commit
that referenced
this pull request
Oct 8, 2025
With #1401, we introduced additional filters to the CLIENT LIST/KILL subcommands. The intended behavior was to pick the last value of the filter. However, we introduced a memory leak for all the preceding filters.
Before this change:
```
> CLIENT LIST IP 127.0.0.1 IP 127.0.0.1
id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0
```
Leak:
```
Direct leak of 11 byte(s) in 1 object(s) allocated from:
#0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d)
#1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156
#2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200
#3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113
#4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264
#5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600
#6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772
#7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434
#8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571
#9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702
#10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812
#11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79
#12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301
#13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486
#14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543
#15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319
#16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139)
```
Note: For the ID / NOT-ID filters we group all the options and perform filtering, whereas for the remaining filters we only pick the last filter option.
---------
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
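The "last value wins" behavior without the leak can be sketched as follows. This is a minimal illustration, assuming a filter that owns one heap-allocated string per single-valued option; `toy_client_filter` and the `toy_*` helpers are hypothetical and use plain `malloc`/`free` rather than the server's own string and allocator functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical filter struct: one owned string per single-valued filter. */
typedef struct {
    char *ip; /* heap-allocated, or NULL when the filter was not given */
} toy_client_filter;

/* Portable string copy; returns NULL on allocation failure. */
static char *toy_copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL) memcpy(copy, s, n);
    return copy;
}

/* Last value wins: release the previous value before storing the new one. */
static void toy_filter_set_ip(toy_client_filter *f, const char *value) {
    free(f->ip); /* free(NULL) is a no-op, so the first call is safe */
    f->ip = toy_copy_string(value);
}

/* Releases everything the filter owns. */
static void toy_filter_free(toy_client_filter *f) {
    free(f->ip);
    f->ip = NULL;
}
```

Calling `toy_filter_set_ip` twice, as in `CLIENT LIST IP 127.0.0.1 IP 127.0.0.1`, leaves exactly one live allocation; omitting the `free` in the setter would orphan the first copy, which is the pattern the sanitizer report above points at.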
cherukum-Amazon
pushed a commit
to cherukum-Amazon/valkey
that referenced
this pull request
Oct 16, 2025
With valkey-io#1401, we introduced additional filters to the CLIENT LIST/KILL subcommands. The intended behavior was to pick the last value of the filter. However, we introduced a memory leak for all the preceding filters.
Before this change:
```
> CLIENT LIST IP 127.0.0.1 IP 127.0.0.1
id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0
```
Leak:
```
Direct leak of 11 byte(s) in 1 object(s) allocated from:
#0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d)
#1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156
#2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200
#3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113
#4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264
#5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600
#6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772
#7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434
#8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571
#9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702
#10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812
#11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79
#12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301
#13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486
#14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543
#15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319
#16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139)
```
Note: For the ID / NOT-ID filters we group all the options and perform filtering, whereas for the remaining filters we only pick the last filter option.
---------
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
(cherry picked from commit 155b0bb)
cherukum-Amazon
pushed a commit
to cherukum-Amazon/valkey
that referenced
this pull request
Oct 17, 2025
With valkey-io#1401, we introduced additional filters to the CLIENT LIST/KILL subcommands. The intended behavior was to pick the last value of the filter. However, we introduced a memory leak for all the preceding filters.
Before this change:
```
> CLIENT LIST IP 127.0.0.1 IP 127.0.0.1
id=4 addr=127.0.0.1:37866 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=21 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=16989 events=r cmd=client|list user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=49 tot-net-out=0 tot-cmds=0
```
Leak:
```
Direct leak of 11 byte(s) in 1 object(s) allocated from:
#0 0x7f2901aa557d in malloc (/lib64/libasan.so.4+0xd857d)
#1 0x76db76 in ztrymalloc_usable_internal /workplace/harkrisp/valkey/src/zmalloc.c:156
#2 0x76db76 in zmalloc_usable /workplace/harkrisp/valkey/src/zmalloc.c:200
#3 0x4c4121 in _sdsnewlen.constprop.230 /workplace/harkrisp/valkey/src/sds.c:113
#4 0x4dc456 in parseClientFiltersOrReply.constprop.63 /workplace/harkrisp/valkey/src/networking.c:4264
#5 0x4bb9f7 in clientListCommand /workplace/harkrisp/valkey/src/networking.c:4600
#6 0x641159 in call /workplace/harkrisp/valkey/src/server.c:3772
#7 0x6431a6 in processCommand /workplace/harkrisp/valkey/src/server.c:4434
#8 0x4bfa9b in processCommandAndResetClient /workplace/harkrisp/valkey/src/networking.c:3571
#9 0x4bfa9b in processInputBuffer /workplace/harkrisp/valkey/src/networking.c:3702
#10 0x4bffa3 in readQueryFromClient /workplace/harkrisp/valkey/src/networking.c:3812
#11 0x481015 in callHandler /workplace/harkrisp/valkey/src/connhelpers.h:79
#12 0x481015 in connSocketEventHandler.lto_priv.394 /workplace/harkrisp/valkey/src/socket.c:301
#13 0x7d3fb3 in aeProcessEvents /workplace/harkrisp/valkey/src/ae.c:486
#14 0x7d4d44 in aeMain /workplace/harkrisp/valkey/src/ae.c:543
#15 0x453925 in main /workplace/harkrisp/valkey/src/server.c:7319
#16 0x7f2900cd7139 in __libc_start_main (/lib64/libc.so.6+0x21139)
```
Note: For the ID / NOT-ID filters we group all the options and perform filtering, whereas for the remaining filters we only pick the last filter option.
---------
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
(cherry picked from commit 155b0bb)
Signed-off-by: cherukum-amazon <cherukum@amazon.com>
cherukum-Amazon
pushed a commit
to cherukum-Amazon/valkey
that referenced
this pull request
Oct 19, 2025
cherukum-Amazon
pushed a commit
to cherukum-Amazon/valkey
that referenced
this pull request
Oct 21, 2025
madolson
pushed a commit
that referenced
this pull request
Oct 21, 2025
enjoy-binbin
pushed a commit
that referenced
this pull request
Feb 8, 2026
)
I was working on ASAN large memory tests when I encountered this issue. The issue was that the hardcoded `999` key could land in an early bucket. Then shrink rehash could finish early, and later inserts could trigger a new expansion rehash, resetting rehash_idx low. The test now picks the survivor key dynamically as the key mapped to the highest bucket index.
```
[test_hashtable.c] Memory leak detected of 336 bytes
=================================================================
==3901==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 80 byte(s) in 1 object(s) allocated from:
#0 0x7fb0556fd9c7 in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
#1 0x563bfdf4c47d in ztrymalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:156
#2 0x563bfdf4c47d in valkey_malloc /home/runner/work/valkey/valkey/src/zmalloc.c:185
#3 0x563bfdd42eaf in hashtableCreate /home/runner/work/valkey/valkey/src/hashtable.c:1217
#4 0x563bfdaa1cbf in test_empty_buckets_rehashing unit/test_hashtable.c:232
#5 0x563bfdae772b in runTestSuite unit/test_main.c:36
#6 0x563bfda86b20 in main unit/test_main.c:108
#7 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e)
#8 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e)
#9 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2)
Indirect leak of 128 byte(s) in 1 object(s) allocated from:
#0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77
#1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214
#2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257
#3 0x563bfdd40967 in resize /home/runner/work/valkey/valkey/src/hashtable.c:741
#4 0x563bfdd45eb1 in hashtableExpandIfNeeded /home/runner/work/valkey/valkey/src/hashtable.c:1446
#5 0x563bfdd45eb1 in hashtableExpandIfNeeded /home/runner/work/valkey/valkey/src/hashtable.c:1433
#6 0x563bfdd45eb1 in insert /home/runner/work/valkey/valkey/src/hashtable.c:1041
#7 0x563bfdd45eb1 in hashtableAddOrFind /home/runner/work/valkey/valkey/src/hashtable.c:1554
#8 0x563bfdd45eb1 in hashtableAdd /home/runner/work/valkey/valkey/src/hashtable.c:1539
#9 0x563bfdaa1e3b in test_empty_buckets_rehashing unit/test_hashtable.c:254
#10 0x563bfdae772b in runTestSuite unit/test_main.c:36
#11 0x563bfda86b20 in main unit/test_main.c:108
#12 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e)
#13 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e)
#14 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2)
Indirect leak of 64 byte(s) in 1 object(s) allocated from:
#0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77
#1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214
#2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257
#3 0x563bfdd3f553 in bucketConvertToChained /home/runner/work/valkey/valkey/src/hashtable.c:908
#4 0x563bfdd3f553 in findBucketForInsert /home/runner/work/valkey/valkey/src/hashtable.c:1021
#5 0x563bfdd45d9e in insert /home/runner/work/valkey/valkey/src/hashtable.c:1045
#6 0x563bfdd45d9e in hashtableAddOrFind /home/runner/work/valkey/valkey/src/hashtable.c:1554
#7 0x563bfdd45d9e in hashtableAdd /home/runner/work/valkey/valkey/src/hashtable.c:1539
#8 0x563bfdaa1e3b in test_empty_buckets_rehashing unit/test_hashtable.c:254
#9 0x563bfdae772b in runTestSuite unit/test_main.c:36
#10 0x563bfda86b20 in main unit/test_main.c:108
#11 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e)
#12 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e)
#13 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2)
Indirect leak of 64 byte(s) in 1 object(s) allocated from:
#0 0x7fb0556fd340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77
#1 0x563bfdf4c922 in ztrycalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:214
#2 0x563bfdf4c922 in valkey_calloc /home/runner/work/valkey/valkey/src/zmalloc.c:257
#3 0x563bfdd40967 in resize /home/runner/work/valkey/valkey/src/hashtable.c:741
#4 0x563bfdaa1df8 in test_empty_buckets_rehashing unit/test_hashtable.c:248
#5 0x563bfdae772b in runTestSuite unit/test_main.c:36
#6 0x563bfda86b20 in main unit/test_main.c:108
#7 0x7fb05522a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e)
#8 0x7fb05522a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e)
#9 0x563bfda8a5c4 in _start (/home/runner/work/valkey/valkey/src/valkey-unit-tests+0x17c5c4) (BuildId: 44cfc183e6e82e499bcc9f6adc094d7f774ee9d2)
SUMMARY: AddressSanitizer: 336 byte(s) leaked in 4 allocation(s).
```
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
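Picking the survivor key dynamically, as this commit message describes, might look roughly like the sketch below. It is an illustration only: `toy_hash`, `bucket_index`, and `pick_survivor_key` are hypothetical, and the assumption that a bucket index is the low bits of the hash is made for the example, not taken from the hashtable implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit mixer standing in for the table's hash function. */
uint64_t toy_hash(uint64_t key) {
    key ^= key >> 33;
    key *= 0xff51afd7ed558ccdULL;
    key ^= key >> 33;
    return key;
}

/* Bucket index for a table with num_buckets buckets (a power of two).
 * Using the low bits is an assumption of this sketch. */
size_t bucket_index(uint64_t key, size_t num_buckets) {
    assert(num_buckets != 0 && (num_buckets & (num_buckets - 1)) == 0);
    return (size_t)(toy_hash(key) & (num_buckets - 1));
}

/* Scan keys 0..num_keys-1 and return the one mapped to the highest
 * bucket index. On ties the lowest key wins. */
uint64_t pick_survivor_key(uint64_t num_keys, size_t num_buckets) {
    assert(num_keys > 0);
    uint64_t best_key = 0;
    size_t best_idx = bucket_index(0, num_buckets);
    for (uint64_t k = 1; k < num_keys; k++) {
        size_t idx = bucket_index(k, num_buckets);
        if (idx > best_idx) {
            best_idx = idx;
            best_key = k;
        }
    }
    return best_key;
}
```

Compared with a hardcoded key, the scan does not depend on where any particular key happens to hash, which is the property the test fix relies on.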
harrylin98
referenced
this pull request
in harrylin98/valkey_forked
Feb 12, 2026
Remove C unit test framework
harrylin98
referenced
this pull request
in harrylin98/valkey_forked
Feb 19, 2026
…lkey-io#3174)
enjoy-binbin
pushed a commit
that referenced
this pull request
May 7, 2026
CI caught ip and name SDS allocations being leaked in fetchClusterConfiguration. The ip SDS was copied again via sdsnew() before being passed to createClusterNode(), leaking the original. The name SDS was leaked when the node already existed in the dict.
Free ip and name on all exit paths in fetchClusterConfiguration. Remove stale guard in freeClusterNode, no longer needed since #1392.
CI Error -
```
Direct leak of 33 byte(s) in 3 object(s) allocated from:
#0 0x7f4c3a0fd9c7 in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
#1 0x5564620c124a in ztrymalloc_usable_internal /home/runner/work/valkey/valkey/src/zmalloc.c:172
#2 0x5564620c124a in zmalloc_usable /home/runner/work/valkey/valkey/src/zmalloc.c:268
#3 0x5564620dfbe6 in _sdsnewlen.constprop.0 /home/runner/work/valkey/valkey/src/sds.c:102
#4 0x556462050996 in sdsnewlen /home/runner/work/valkey/valkey/src/sds.c:169
#5 0x556462050996 in sdsnew /home/runner/work/valkey/valkey/src/sds.c:185
#6 0x556462050996 in fetchClusterConfiguration /home/runner/work/valkey/valkey/src/valkey-benchmark.c:1477
```
Issue was reproducible locally using `leaks --atExit`.
Signed-off-by: nmvk <r@nmvk.com>
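"Free on all exit paths" is commonly written in C with a single cleanup label and ownership transfer by nulling the local pointer. The sketch below illustrates that pattern under stated assumptions: it uses plain heap strings rather than SDS, a one-slot stand-in for the dict, and the names `dup_string`, `release_string`, `node`, `node_exists`, and `add_node` are hypothetical. The `live_allocs` counter exists only so the example can check itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

int live_allocs = 0; /* outstanding allocations, for demonstration only */

char *dup_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    assert(p != NULL);
    memcpy(p, s, n);
    live_allocs++;
    return p;
}

void release_string(char *p) {
    if (p != NULL) {
        free(p);
        live_allocs--;
    }
}

/* Hypothetical node that owns its ip and name strings. */
typedef struct node {
    char *ip;
    char *name;
} node;

/* One-slot stand-in for a dict lookup by name. */
int node_exists(const node *slot, const char *name) {
    return slot->name != NULL && strcmp(slot->name, name) == 0;
}

/* Returns 1 if the node was stored, 0 if it already existed. */
int add_node(node *slot, const char *ip_src, const char *name_src) {
    int added = 0;
    char *ip = dup_string(ip_src);
    char *name = dup_string(name_src);

    if (node_exists(slot, name)) goto cleanup; /* early exit still frees */

    /* Hand the existing copies to the node instead of copying again. */
    release_string(slot->ip);
    release_string(slot->name);
    slot->ip = ip;
    slot->name = name;
    ip = NULL;   /* ownership moved, so cleanup must not free these */
    name = NULL;
    added = 1;

cleanup:
    release_string(ip);
    release_string(name);
    return added;
}
```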
lucasyonge
pushed a commit
that referenced
this pull request
May 11, 2026
lucasyonge
pushed a commit
that referenced
this pull request
May 12, 2026
GilboaAWS
pushed a commit
to GilboaAWS/valkey
that referenced
this pull request
Jun 24, 2026
Update architecture.md - Fix Mermaid Parsing error
GilboaAWS
pushed a commit
to GilboaAWS/valkey
that referenced
this pull request
Jun 24, 2026
Addresses 5 of 6 review comments on the QSBR design. Comment #6 (`compressionJob.key` extra-lookup concern) is explicitly deferred to a follow-up PR per reviewer guidance.
Comment #1 (line 428) and #5 (line 544) — drop language-comparison framing:
Removed all references to Rust / `Arc<T>` / "memory-safe languages" / `shared_ptr` from §4.4 intro, the "Why QSBR" bullet list, and the §4.6 "Why the worker loads the active dict itself" paragraph. The rationale now stands on its own technical merit (decoupling the registry from worker hot paths; minimal worker contract; safe-directional failure modes) rather than via comparison to another language's type system. C with explicit protocols is the right tool for this problem; the comparison added rhetorical weight without adding signal.
Comment #2 (line 326) — duplication with R2.11.4:
§3.3 Separation invariants restated the worker contract that R2.11.4 already specifies authoritatively. Slimmed the §3.3 bullet to a one-liner that points at §2.11 R2.11.4 and §4.4. Eliminates drift risk between the two places.
Comment #3 (line 439) — bound the retiring list, block on cap:
Added new step 7 to the QSBR section explaining the cap interaction with R2.3.3. The retiring list is a subset of `dicts[]`, capped at `compression-dict-max-versions`. When grace-barrier draining cannot keep up (worker starvation, persistent `frame_refs > 0`), the cap is reached and BOTH training AND promotion are refused per R2.3.3: `LL_WARNING` log entry, `compression_dict_cap_reached` set in INFO, operator intervention required (raise cap or run COMPRESSION SWEEP).
Comment #4 (line 449) — grace-barrier wake-up via cond_broadcast:
The original step 6 proposed enqueueing barrier jobs into the SPMC inbox to force idle workers to advance generations. This doesn't actually work: under work-stealing semantics a single fast worker can drain all barrier jobs while siblings stay asleep on the cond var. Rewrote step 6 to use a wake-all primitive built on `pthread_cond_broadcast`, and added a "Wake-all primitive" paragraph to §4.6 that describes extending `mutexqueue.h` with two new APIs: a broadcast wake-all (for QSBR grace barriers, config changes, etc.) and a shutdown-signal variant (for pool teardown). Step 6 cross-references §4.6 for the mechanism.
Comment #6 (line 513) — DEFERRED:
Reviewer flagged that `compressionJob.key` (a `robj *` carried in the job) implies the main thread does an additional lookup at install time, doubling the per-write lookup cost. The reviewer explicitly tagged this as "follow up PR" — addressing it would require a redesign of the install-side data flow and is out of scope for the QSBR design change. Tracked as an open item; will be addressed before code lands for the install path (S2.7 in the implementation plan).
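A wake-all primitive of the kind mentioned for comment #4 can be sketched with a mutex, a condition variable, and an epoch counter. This is only an illustration of `pthread_cond_broadcast` semantics: `wake_gate` and its functions are hypothetical names and are not the APIs proposed for `mutexqueue.h`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical gate: every broadcast bumps the epoch, so a waiter can
 * tell a real wake-up from a spurious one. */
typedef struct wake_gate {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    uint64_t epoch;
} wake_gate;

void wake_gate_init(wake_gate *g) {
    int rc = pthread_mutex_init(&g->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&g->cond, NULL);
    assert(rc == 0);
    (void)rc; /* keeps the build warning-free when NDEBUG is defined */
    g->epoch = 0;
}

/* Wake every thread blocked in wake_gate_wait. Unlike handing out one
 * job per worker, no single fast thread can consume the wake-up on
 * behalf of the others. */
void wake_gate_broadcast(wake_gate *g) {
    pthread_mutex_lock(&g->lock);
    g->epoch++;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}

/* Block until the epoch differs from the one the caller last saw,
 * then return the current epoch. */
uint64_t wake_gate_wait(wake_gate *g, uint64_t seen) {
    pthread_mutex_lock(&g->lock);
    while (g->epoch == seen) {
        pthread_cond_wait(&g->cond, &g->lock);
    }
    uint64_t now = g->epoch;
    pthread_mutex_unlock(&g->lock);
    return now;
}
```

Each worker would remember the epoch it last observed and pass it to `wake_gate_wait`; a broadcast that happened before the call makes the wait return immediately instead of being lost.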
GilboaAWS
pushed a commit
to GilboaAWS/valkey
that referenced
this pull request
Jun 24, 2026
* [S2.7] Compression write-path hook
Wires compressionEnqueueCandidate into dbAddInternal and dbSetValue,
and replaces the TODO(S2.7) placeholder in the drain handler with a
real install path. With this change, writes to eligible STRING values
get queued for background compression and the result is installed back
into the kvstore as an OBJ_ENCODING_COMPRESSED robj.
The decoder (S2.6) is shipped but not yet wired into read paths (S2.8),
so as long as compression-enabled stays no (default), behavior is
unchanged. Once an operator turns the switch on, written values get
compressed, but reads return the compressed bytes until S2.8 lands.
Existing transparency tests verify no regression in the default-off
configuration.
Producer side (compression.c, db.c)
Two seams in db.c — end of dbAddInternal and end of dbSetValue —
call compressionEnqueueCandidate(key, value, db->id). The candidate
function applies four guards:
1. Master switch (compression_enabled, via compressionIsEligible).
2. R2.2 eligibility (type/encoding/size/hot-key — also via predicate).
3. R2.1.5 active-dict check — saves an allocator round-trip when
compression-enabled=yes but training hasn't completed.
4. incrRefCount(value) — pins the bytes for the worker AND
reserves the robj address for the drain handler's pointer-
equality stale check (ABA-safe per R2.4.4 + the lifetime
discussion in PR valkey-io#18).
If the worker pool refuses (not started; future S2.11 inbox full),
the pin is released immediately. RDB-load enqueue is deliberately
skipped — TODO(S2.10): the sweep tick will rediscover RDB-loaded
values without hammering the inbox during load.
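The guard order and the pin-then-release-on-refusal behavior described above can be sketched as:

```c
#include <assert.h>
#include <stddef.h>

/* Illustration only. value_obj, enqueue_ctx and enqueue_candidate are
 * hypothetical stand-ins, not the code in compression.c; the size check
 * stands in for the full R2.2 eligibility predicate. */
typedef struct value_obj {
    int refcount;
    size_t len;
} value_obj;

typedef struct enqueue_ctx {
    int enabled;          /* master switch */
    size_t min_len;       /* simplified eligibility threshold */
    int have_active_dict; /* training has produced a dictionary */
    int pool_started;     /* worker pool currently accepts jobs */
} enqueue_ctx;

/* Returns 1 if the value was handed to the pool (pin kept), else 0. */
int enqueue_candidate(const enqueue_ctx *ctx, value_obj *v) {
    assert(ctx != NULL && v != NULL);
    if (!ctx->enabled) return 0;          /* guard 1: master switch */
    if (v->len < ctx->min_len) return 0;  /* guard 2: eligibility */
    if (!ctx->have_active_dict) return 0; /* guard 3: no dict yet */
    v->refcount++;                        /* guard 4: pin the value */
    if (!ctx->pool_started) {
        v->refcount--;                    /* pool refused: unpin at once */
        return 0;
    }
    return 1;
}
```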
API change: compressionWorkersEnqueue
Old: compressionWorkersEnqueue(sds key, int dbid, uint64_t version, sds src)
New: compressionWorkersEnqueue(robj *value, int dbid)
The new form requires a pinned robj; the worker reads
objectGetVal(value) once at enqueue (captured into job->src) and
never touches the robj afterwards (R2.11.4 intact). The drain
handler uses job->value for the kvstore lookup and the pointer-
equality stale check.
The version field is gone — pointer equality, made ABA-safe by the
pin, is sufficient. R2.4.4 explains why: holding incrRefCount(value)
prevents the allocator from reusing the address while the job is
in flight.
Drain install (compression_workers.c)
New compressionInstall() helper:
1. void **slot = kvstoreHashtableFindRef(db->keys, didx, key_sds);
2. If slot == NULL OR *slot != job->value: stale (overwrite, expire,
or COW). Discard.
3. Else: createCompressedObject(OBJ_STRING, job->dst, job->dst_len);
dbReplaceValue installs.
4. compressionRegistryIncRef(job->dict_id) on success.
dbReplaceValue routes through dbSetValue(..., overwrite=0, ...),
which does NOT call signalModifiedKey, moduleNotifyKeyUnlink, or
signalDeletedKeyAsReady. Background compression is a storage-only
change per R2.9.2 — no WATCH dirty_cas, no client-side-caching
invalidations, no keyspace notifications.
Pin released on every drain completion path (success, stale-discard,
net-savings reject, ZSTD error, no-active-dict). Test-mode jobs
(job->value == NULL) skip both install and decRef.
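The pointer-equality stale check and the release-on-every-path rule can be sketched as:

```c
#include <assert.h>
#include <stddef.h>

/* Illustration only. value_obj, job, release and install_job are
 * hypothetical; refcounts are tracked but nothing is actually freed,
 * and test-mode jobs (value == NULL) are not modeled. */
typedef struct value_obj {
    int refcount;
    int compressed;
} value_obj;

typedef struct job {
    value_obj *value;  /* pinned at enqueue time */
    value_obj *result; /* compressed replacement built by the worker */
} job;

void release(value_obj *v) {
    assert(v->refcount > 0);
    v->refcount--;
}

/* slot is the address of the keyspace entry for the key, or NULL if the
 * key is gone. Because the job holds a pin, the old object cannot be
 * freed and its address reused, so *slot == j->value means the value is
 * unchanged. Returns 1 if installed, 0 if stale. */
int install_job(value_obj **slot, job *j) {
    assert(j != NULL && j->value != NULL);
    int installed = 0;
    if (slot != NULL && *slot == j->value) {
        release(*slot);    /* keyspace drops its reference to the old value */
        *slot = j->result; /* compressed object takes its place */
        installed = 1;
    }
    release(j->value);     /* drop the job's pin on every path */
    j->value = NULL;
    return installed;
}
```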
Test migration
The 15 existing test-fixture call sites passed raw sds + dummy
version. Migrated to a new testOnlyCompressionWorkersEnqueueRaw(src,
dbid) that sets job->value = NULL. Tests extract jobs via
testOnlyCompressionWorkersDrainOutbox before the production drain
runs, so production-only paths (install, decRef) are never reached
by the value=NULL sentinel.
No new gtest cases for the install path itself — that requires a
fully-initialized server.db / kvstore that the unit-test environment
doesn't construct. End-to-end coverage will come from the Tcl
transparency harness once S2.8 wires the read path.
TODO(S4.1) markers added at:
- compressionInstall: compression_compressions_per_sec, EMA fold,
compression_compressed_objects.
- compressionEnqueueCandidate: compression_candidates_dropped_total
when S2.11 lands (today the pool-not-started rejection is a
config state, not back-pressure).
Verified locally:
- make -j2 -C src → clean (BUILD_ZSTD=yes default).
- make -j2 -C src BUILD_ZSTD=no → clean.
- ./runtest --single unit/type/compression → 10/10 pass.
gtest unit tests not runnable locally; CI validates.
Diff stat:
.../implementation/plan.md | 4 +-
src/compression.c | 35 +++-
src/compression.h | 27 ++-
src/compression_workers.c | 185 +++++++++++++++------
src/compression_workers.h | 56 +++----
src/db.c | 14 ++
src/unit/test_compression_workers.cpp | 31 ++--
7 files changed, 244 insertions(+), 108 deletions(-)
* [S2.7] PR valkey-io#19 review: assert + design-doc alignment
Two reviewer threads addressed:
Thread #1 (T-3369017721) — production code carrying test concerns
The drain handler had a `if (job->value == NULL)` branch that only
existed to handle test-only jobs from
testOnlyCompressionWorkersEnqueueRaw. Reviewer correctly pointed out
that production code shouldn't carry test-only branches.
Fix: replaced with serverAssert(job->value != NULL) at the top of
the per-job loop. Production drain assumes every job has a real
pinned robj; tests must extract their value=NULL jobs via
testOnlyCompressionWorkersDrainOutbox before this drain runs.
Side effect: removed the conditional `if (job->value != NULL)`
guards around decrRefCount and the install branch — the top-of-loop
assert means every code path can assume value is non-NULL.
Thread #2 (T-3356207626) — design doc out of sync with implementation
Design §4.6 still described the original version-counter approach
for staleness detection (`uint64_t version` field on compressionJob,
"if version counter moved, discard"). The implementation has used
pointer equality + the incrRefCount-pin since S2.4 PR valkey-io#13.
Fix: updated §4.6 to:
- compressionJob struct: drop `version`, drop `robj *key`, add
`robj *value` (pinned via incrRefCount), and `sds src` and
`int dbid` separately, matching the actual struct.
- Concurrency notes: replaced the "version counter moved" bullet
with the pointer-equality + ABA-safety reasoning, naming the
incrRefCount-reserves-the-address invariant as the protection
mechanism (same property explained in PR valkey-io#18 review).
Verified locally:
- make -j2 -C src → clean
- ./runtest --single unit/type/compression → 10/10 pass
* [S2.7] Fix CI: remove erroneous & on server.db indexing
build-32bit failed (as did the 30+ downstream cells; all CI cells use -Werror):
compression_workers.c:531:20: error: initialization of 'serverDb *'
from incompatible pointer type 'serverDb **'
[-Werror=incompatible-pointer-types]
`server.db` is `serverDb **` (array of pointers, one per DB). So
`server.db[i]` is already `serverDb *` — the address-of operator was
redundant and produced `serverDb **`.
Fix: drop the `&`. Matches the pattern used everywhere else in the
codebase (db.c, server.c, etc.).
Local make didn't catch this — the default SERVER_CFLAGS doesn't
include -Werror. CI does. Built locally with `make SERVER_CFLAGS=-Werror`
to confirm clean.
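To make the type mismatch concrete, here is a small sketch using hypothetical stand-ins (`sketchDb`, `struct sketchServer`, `sketchSumIds`) rather than the real `serverDb` and `server` definitions. Indexing an array of pointers already yields the element pointer; taking its address adds one level of indirection:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: an array of per-DB pointers, like server.db. */
typedef struct sketchDb { int id; } sketchDb;
struct sketchServer { sketchDb **db; int dbnum; };

/* Sums the ids of all allocated DBs. */
static int sketchSumIds(const struct sketchServer *srv) {
    int sum = 0;
    for (int i = 0; i < srv->dbnum; i++) {
        sketchDb *db = srv->db[i];       /* already a sketchDb * */
        /* sketchDb *bad = &srv->db[i];     has type sketchDb **, which
         * is the incompatible-pointer-types diagnostic quoted above */
        if (db != NULL) sum += db->id;
    }
    return sum;
}
```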
* [S2.7] Fix CI: tests must use testOnly drain for value=NULL jobs
5 gtest cases failed on build-32bit (and would on every test cell)
with the new production-drain serverAssert(job->value != NULL):
ASSERTION FAILED: compression_workers.c:591 'job->value != NULL'
in: SingleJobRoundTrip, BurstOf256JobsOneWorker,
BurstOf1024JobsFourWorkers, ResizeAcrossEnqueuedJobs,
NetSavingsGuardRejectsIncompressible
Root cause: the previous commit's reviewer-driven hardening (PR #19
review thread #1) made the production drain assert that every job
has a non-NULL pinned robj. The premise was "tests use the testOnly
drain to extract jobs before the production drain runs". That premise
was wrong — many tests ALSO call compressionWorkersDrainOutbox
directly to consume-and-dispose test-mode jobs (the drainUntil helper
is the most-used path).
Fix: add testOnlyCompressionWorkersDrainAndDispose(budget) — pulls
jobs via the existing testOnlyCompressionWorkersDrainOutbox, frees
them via testOnlyCompressionWorkersFreeJob, returns count. Migrate
the test fixture's drainUntil helper and all 8 direct
compressionWorkersDrainOutbox call sites in the test file to the
new helper.
Production drain stays clean — no test concerns. Reviewer thread #1
intent preserved.
Verified locally:
- make -j2 -C src SERVER_CFLAGS=-Werror → clean
- ./runtest --single unit/type/compression → 10/10 pass
GilboaAWS
pushed a commit
to GilboaAWS/valkey
that referenced
this pull request
Jun 24, 2026
* [Topic-2 PR-A] COMPRESSION DICT-IMPORT + runtime dict-generation test infra (R2.3.10)
Implements the minimal preshared-dictionary import surface so
integration tests can run before S1.x training lands. R2.3.10 + §4.5
in the design doc: operator base64-encodes a ZSTD-trained dictionary
and installs it via `COMPRESSION DICT-IMPORT <base64-bytes>`. The new
dict is promoted as active; the previous active is retired through
the existing registry path.
The blocker this solves: `compressionEnqueueCandidate` early-returns
when `compressionRegistryActive() == NULL`, so without a trained dict
the entire write/sweep path is a no-op. Tests that exercise end-to-end
compression behaviour (S2.7 write hook, S2.8 read hook, S2.9 sweeper)
need a way to install a dict; this PR is that way. S1.x's full
training implementation (BIO_COMPRESSION_TRAIN +
ZDICT_trainFromBuffer on the bio thread) lands separately on the
@GilboaAWS track.
Implementation
--------------
Hyphenated subcommand `DICT-IMPORT` (CLUSTER COUNT-FAILURE-REPORTS
precedent; RESP doesn't have nested subcommand containers).
Validation, in order:
- Base64 decoding (private static `base64Decode` in compression.c;
standard alphabet, optional `=` padding, whitespace rejected).
- 4-byte ZSTD magic 0xEC30A437: rejects raw-content prefixes and
other non-trained bytes. Exotic operators with raw prefixes will
have to find another route; the 99% case is "I trained a dict, I'm
importing it" and that case wants real validation.
- `ZSTD_createCDict` / `ZSTD_createDDict`: these accept arbitrary
bytes as raw prefixes (never return NULL on garbage), so the magic
check above is the actual content-validity gate. The ZSTD calls
remain as belt-and-suspenders for OOM and similar.
- `compressionRegistryAdd(pair, promote=1)`: same path a trained
dict will use once S1.x lands.
Reply: integer dict_id on success, RESP error on rejection.
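The magic-number gate described above could look roughly like the sketch below. `sketchHasDictMagic` and `SKETCH_ZSTD_DICT_MAGIC` are hypothetical names, not the function in compression.c. The sketch assumes the decoded payload is already in memory and relies on the zstd dictionary format storing the magic number in little-endian byte order:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Magic number quoted in the commit message; the name is illustrative. */
#define SKETCH_ZSTD_DICT_MAGIC 0xEC30A437u

/* Returns 1 if `buf` starts with the trained-dictionary magic, 0 if it
 * is too short or starts with anything else. */
static int sketchHasDictMagic(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    if (len < 4) return 0; /* smaller than the magic header */
    /* Assemble the first four bytes as a little-endian 32-bit value. */
    uint32_t magic = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
                     ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    return magic == SKETCH_ZSTD_DICT_MAGIC;
}
```

Decoded bytes such as "hello world" fail this check even though they are valid base64 content, which mirrors the rejection cases the Tcl tests list.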
INFO renderer
-------------
Replaced the `compression_active_dict_id:0` and
`compression_known_dicts:0` placeholders with live values via the new
registry accessor `compressionRegistryGetKnownCount()`. Operators
running this server can now see imported dicts immediately in
`INFO compression` / `COMPRESSION STATUS`. Other field placeholders
(compressed_objects, ratio, etc.) remain at 0 until later S2 PRs land
their counters.
Test infrastructure: runtime dict generation
--------------------------------------------
Static dict fixtures don't scale to the test matrix the project needs
(per-shape dicts for JSON / kv / log workloads, drift testing where
the dict was trained on shape A but workload arrives as shape B,
retraining cycles where dict A is replaced by dict B). Shipping
multiple ~10 KiB binaries under tests/assets/ would bloat the repo and
still not cover the drift case. Instead, we generate dicts at test
time, on demand, parametrized by data shape.
The dict generator MUST be external to valkey-server. If we used a
server-side test command (e.g. DEBUG COMPRESSION TRAIN-FROM-BYTES), a
bug in the server's training plumbing could mask itself: both the
test fixture and the production training path would share code and
exhibit the same bug. The infrastructure landed here uses a separate
process that calls only ZDICT_trainFromBuffer directly:
- tests/helpers/gen-zstd-dict.c (new): Standalone helper binary.
Reads samples from stdin in a simple binary protocol (4-byte
big-endian length + N bytes per sample, repeated until EOF), trains
a ZSTD dictionary via ZDICT_trainFromBuffer, writes the trained
dict to a path passed on argv. Links against the same vendored
deps/zstd/libzstd.a as valkey-server, so ZDICT API behaviour matches
what production will use, but runs in a separate process with no
shared memory or globals with the SUT.
- src/Makefile: Adds tests/helpers/gen-zstd-dict to
ALL_BUILD_PREREQUISITES when BUILD_ZSTD=yes (gated by the same ifeq
block that controls the feature itself). BUILD_ZSTD=no skips it,
since there's no compression feature to test. clean target updated.
- tests/support/compression-helpers.tcl (new): Sample generators
(gen_kv_samples, gen_json_samples, gen_log_samples) producing
reproducible per-seed sample lists. gen_drifted_samples mixes two
shapes by a `drift` fraction in [0,1] for drift / retraining tests.
train_dict_from_samples pipes samples through the helper binary;
import_dict is the convenience wrapper that trains +
base64-encodes + sends COMPRESSION DICT-IMPORT.
- tests/test_helper.tcl: source the new support file.
- tests/unit/type/compression.tcl: the existing "import a real
trained dict" Tcl test now generates samples + trains at test time
instead of reading a static fixture. New "drift mixer sanity" test
verifies the gen_drifted_samples helper itself.
- tests/assets/test-compression.dict: deleted (no longer needed).
Tests
-----
gtest (392 total, +1 new under CompressionRegistryTest):
- GetKnownCountTracksAddsAndCapEnforcement: verifies the new
accessor moves with each Add and stays at the cap on rejection.
Tcl (28 total, +6 new under unit/type/compression):
- Rejects malformed base64.
- Rejects valid base64 without ZSTD magic ("hello world").
- Rejects payloads smaller than the magic header.
- Validates arity at the command-table level.
- Imports a real runtime-trained dict, verifies INFO reflects
active_dict_id+known_dicts, second import promotes new + retires
previous (count=2).
- Smoke-tests gen_drifted_samples (verifies pure-A / pure-B / 50-50
mixing produce shape-distinct outputs).
Verification
------------
- 392 gtests pass (was 391; +1 new).
- 28 Tcl tests pass (was 22; +6 new).
- BUILD_ZSTD=yes and BUILD_ZSTD=no both clean with -Werror.
- gen-zstd-dict helper builds only when BUILD_ZSTD=yes and is
invoked correctly by the Tcl wrapper end-to-end.
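The sample framing used by the helper (4-byte big-endian length, then that many payload bytes, repeated until the input ends) can be sketched as below. This is not the code in gen-zstd-dict.c: `sketchReadBe32` and `sketchUnframe` are hypothetical, and the sketch parses an in-memory buffer instead of stdin. It produces the concatenated-samples buffer plus per-sample sizes array that `ZDICT_trainFromBuffer` takes as input:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decodes a 4-byte big-endian length prefix. */
static uint32_t sketchReadBe32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Splits the framed stream `in` into samples. Payloads are copied
 * back-to-back into `flat` and their lengths stored in `sizes`.
 * Returns the number of samples, or -1 if the stream is truncated or
 * an output buffer is too small. */
static int sketchUnframe(const unsigned char *in, size_t inlen,
                         unsigned char *flat, size_t flatcap,
                         size_t *sizes, size_t maxsamples) {
    size_t pos = 0, out = 0, n = 0;
    assert(in != NULL && flat != NULL && sizes != NULL);
    while (pos < inlen) {
        if (inlen - pos < 4) return -1;      /* truncated length prefix */
        uint32_t len = sketchReadBe32(in + pos);
        pos += 4;
        if (len > inlen - pos) return -1;    /* truncated payload */
        if (n == maxsamples || len > flatcap - out) return -1;
        memcpy(flat + out, in + pos, len);
        sizes[n++] = len;
        out += len;
        pos += len;
    }
    return (int)n;
}
```

Treating a short read as an error rather than silently dropping the last sample keeps a broken pipe from producing a dictionary trained on partial data.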
Out of scope for this PR
------------------------
- DICT-EXPORT (R2.3.10 mentions both; symmetric implementation is a
small follow-up once we have one operator who needs it).
- DICT-LIST / DICT-DROP (§4.5; pending S4.x observability work).
- Real training (S1.x, @GilboaAWS track).
- Topic-2 PR-B (compression-stress.tcl): the integration stress
test that USES this command. Lands next.
* [Topic-2 PR-B] compression-stress.tcl integration tests + strEncoding fix (#34)
* [Topic-2 PR-B] compression-stress.tcl integration tests + 2 prerequisite fixes
Adds tests/integration/compression.tcl, the first end-to-end exercise
of the merged S2.x stack against a real workload. Builds on top of
PR-A (#33)'s runtime dict-generation infrastructure: each test
imports a freshly-trained dict via
tests/support/compression-helpers.tcl, then exercises one specific
behaviour of the hot path.
Six test cases:
1. Write-path round trip. master=compression + sweeper=disabled.
SET a compressible value, poll until OBJECT ENCODING reports
"compressed", verify GET round-trips the original bytes through the
read-path transient view (R2.5.7).
2. Sweeper compresses pre-existing keys. master=off + populate 100
RAW keys, then flip to master=compression + sweeper=enabled. Wait
for ALL keys compressed; verify EVERY value round-trips (no
spot-checks).
3. Decompression drain. Continuing from #2's compressed state, flip
to master=decompression + sweeper=enabled. Wait for ALL keys
drained back to RAW; verify EVERY value round-trips.
4. COMPRESSION SWEEP FORCE end-to-end. master=compression +
sweeper=disabled (manual-only). Populate uncompressed, then run
COMPRESSION SWEEP FORCE. Wait for ALL keys to be compressed by a
single forced pass; verify EVERY value round-trips.
5. Mixed workload preserves data integrity under live sweeper.
master=compression + sweeper=enabled + 50% pacing. Populate 200
keys, run 500 random ops (GET/SET/APPEND/SETRANGE/DEL).
6. Ineligibility: values outside the size envelope and hot keys.
Verifies the eligibility predicate (R2.2): values below
compression-min-value-size, values above
compression-max-value-size, and freshly-written hot keys must NOT
be compressed even with the sweeper running at maximum cadence.
Prerequisite fix 1: src/object.c strEncoding()
================================================
OBJECT ENCODING was returning "unknown" for compressed values because
strEncoding() didn't have a case for OBJ_ENCODING_COMPRESSED. Design
R2.7.1 requires it returns "compressed". One-line fix that slipped
through earlier S2 PRs.
Prerequisite fix 2: src/compression.c compressionEnqueueCandidate()
====================================================================
Use-after-free caught by AddressSanitizer on the first PR-B CI run.
The eligibility predicate accepts encoding==RAW; a value currently in
transient-view state (R2.5.7) reads as RAW because val_ptr is the
per-iteration temp uncompressed sds. compressionEnqueueCandidate
would then capture job->src = temp_sds, which restoreTransientEntry
frees at beforeSleep, leaving the worker thread's job->src dangling
into freed memory.
Fix: gate the enqueue on !transientViewActive(value). Skipping the
enqueue is functionally harmless: a value in transient view is
already compressed (the original frame is saved in the side-map and
will be restored at beforeSleep). One line added at the top of
compressionEnqueueCandidate, with an explanatory comment naming the
exact ASan trace it fixes.
Tests: 392 gtests + 34 Tcl tests pass (28 unit + 6 integration). Both
BUILD_ZSTD={yes,no} build clean with -Werror. Verified the ASan fix
locally by rebuilding with -fsanitize=address and re-running the
integration suite: no use-after-free.
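The enqueue gate from prerequisite fix 2 can be illustrated with a heavily simplified model. Everything here is an assumption made for the sketch: `sketchValue`, `sketchJob`, `sketchTransientViewActive` and `sketchEnqueueCandidate` are hypothetical, and a flag replaces the real side-map lookup. It shows only why the transient-view check must come before the RAW-encoding check:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a string value. While a transient view is
 * active, the value reads as RAW but `ptr` is a temporary buffer that
 * is freed at the end of the event-loop iteration. */
typedef struct sketchValue {
    int encoding_raw;    /* 1 if the value currently reads as RAW */
    int transient_view;  /* 1 while `ptr` is the temporary buffer */
    const char *ptr;
} sketchValue;

typedef struct sketchJob { const char *src; } sketchJob;

static int sketchTransientViewActive(const sketchValue *v) {
    return v->transient_view;
}

/* Returns 1 if the job captured the value's bytes, 0 if the value was
 * skipped. */
static int sketchEnqueueCandidate(sketchJob *job, const sketchValue *v) {
    assert(job != NULL && v != NULL);
    /* Gate first: capturing `ptr` here would leave job->src dangling
     * once the temporary buffer is released. */
    if (sketchTransientViewActive(v)) return 0;
    if (!v->encoding_raw) return 0; /* eligibility: RAW only */
    job->src = v->ptr;
    return 1;
}
```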
Actually testing everything works.