Skip to content

.#24

Merged
zuiderkwast merged 1 commit into
valkey-io:unstablefrom
PlavorSeol:discord
Mar 25, 2024
Merged

.#24
zuiderkwast merged 1 commit into
valkey-io:unstablefrom
PlavorSeol:discord

Conversation

@PlavorSeol

@PlavorSeol PlavorSeol commented Mar 25, 2024

Copy link
Copy Markdown
Contributor

No description provided.

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this link permanent? We don't want to update it every day......

@zuiderkwast zuiderkwast merged commit 601177b into valkey-io:unstable Mar 25, 2024
@PlavorSeol PlavorSeol deleted the discord branch March 25, 2024 09:35
@PingXie PingXie added the polish label Mar 31, 2024
@PlavorSeol PlavorSeol changed the title Update Discord server invite . Apr 12, 2024
zuiderkwast pushed a commit that referenced this pull request Jun 25, 2025
**Current state**
During `hashtableScanDefrag`, rehashing is paused to prevent entries
from moving, but the scan callback can still delete entries which
triggers `hashtableShrinkIfNeeded`. For example, the
`expireScanCallback` can delete expired entries.

**Issue**
This can cause the table to be resized and the old memory to be freed
while the scan is still accessing it, resulting in the following memory
access violation:

```
[err]: Sanitizer error: =================================================================
==46774==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000003100 at pc 0x0000004704d3 bp 0x7fffcb062000 sp 0x7fffcb061ff0
READ of size 1 at 0x611000003100 thread T0
    #0 0x4704d2 in isPositionFilled /home/gusakovy/Projects/valkey/src/hashtable.c:422
    #1 0x478b45 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1768
    #2 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
    #3 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
    #4 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
    #5 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
    #6 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
    #7 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
    #8 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
    #9 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
    #10 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
    #11 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
    #12 0x452e39 in _start (/local/home/gusakovy/Projects/valkey/src/valkey-server+0x452e39)

0x611000003100 is located 0 bytes inside of 256-byte region [0x611000003100,0x611000003200)
freed by thread T0 here:
    #0 0x7f471a34a1e5 in __interceptor_free (/lib64/libasan.so.4+0xd81e5)
    #1 0x4aefbc in zfree_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:400
    #2 0x4aeff5 in valkey_free /home/gusakovy/Projects/valkey/src/zmalloc.c:415
    #3 0x4707d2 in rehashingCompleted /home/gusakovy/Projects/valkey/src/hashtable.c:456
    #4 0x471b5b in resize /home/gusakovy/Projects/valkey/src/hashtable.c:656
    #5 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
    #6 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
    #7 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
    #8 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
    #9 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
    #10 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
    #11 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
    #12 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
    #13 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
    #14 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
    #15 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
    #16 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
    #17 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
    #18 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
    #19 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
    #20 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
    #21 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
    #22 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
    #23 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
    #24 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)

previously allocated by thread T0 here:
    #0 0x7f471a34a753 in __interceptor_calloc (/lib64/libasan.so.4+0xd8753)
    #1 0x4ae48c in ztrycalloc_usable_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:214
    #2 0x4ae757 in valkey_calloc /home/gusakovy/Projects/valkey/src/zmalloc.c:257
    #3 0x4718fc in resize /home/gusakovy/Projects/valkey/src/hashtable.c:645
    #4 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
    #5 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
    #6 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
    #7 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
    #8 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
    #9 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
    #10 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
    #11 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
    #12 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
    #13 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
    #14 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
    #15 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
    #16 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
    #17 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
    #18 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
    #19 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
    #20 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
    #21 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
    #22 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
    #23 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)

SUMMARY: AddressSanitizer: heap-use-after-free /home/gusakovy/Projects/valkey/src/hashtable.c:422 in isPositionFilled
Shadow bytes around the buggy address:
  0x0c227fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff85f0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c227fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff8610: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
=>0x0c227fff8620:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff8630: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff8640: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c227fff8650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c227fff8660: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c227fff8670: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==46774==ABORTING
```


**Solution**
Suggested solution is to also pause auto shrinking during
`hashtableScanDefrag`. I noticed that there was already a
`hashtablePauseAutoShrink` method and `pause_auto_shrink` counter, but
it wasn't actually used in `hashtableShrinkIfNeeded` so I fixed that.

**Testing**
I created a simple tcl test that (most of the times) triggers this
error, but it's a little clunky so I didn't add it as part of the PR:

```
start_server {tags {"expire hashtable defrag"}} {
    test {hashtable scan defrag on expiry} {

        r config set hz 100

        set num_keys 20
        for {set i 0} {$i < $num_keys} {incr i} {
            r set "key_$i" "value_$i"
        }

        for {set j 0} {$j < 50} {incr j} {
            set expire_keys 100
            for {set i 0} {$i < $expire_keys} {incr i} {
                # Short expiry time to ensure they expire quickly
                r psetex "expire_key_${i}_${j}" 100 "expire_value_${i}_${j}"
            }

            # Verify keys are set
            set initial_size [r dbsize]
            assert_equal $initial_size [expr $num_keys + $expire_keys]
            
            after 150
            for {set i 0} {$i < 10} {incr i} {
                r get "expire_key_${i}_${j}"
                after 10
            }
        }

        set remaining_keys [r dbsize]
        assert_equal $remaining_keys $num_keys

        # Verify server is still responsive
        assert_equal [r ping] {PONG}
    } {}
}
```
Compiling with ASAN using `make noopt SANITIZER=address valkey-server`
and running the test causes error above. Applying the fix resolves the
issue.

Signed-off-by: Yakov Gusakov <yaakov0015@gmail.com>
ranshid pushed a commit to ranshid/valkey that referenced this pull request Sep 30, 2025
…y-io#2257)

**Current state**
During `hashtableScanDefrag`, rehashing is paused to prevent entries
from moving, but the scan callback can still delete entries which
triggers `hashtableShrinkIfNeeded`. For example, the
`expireScanCallback` can delete expired entries.

**Issue**
This can cause the table to be resized and the old memory to be freed
while the scan is still accessing it, resulting in the following memory
access violation:

```
[err]: Sanitizer error: =================================================================
==46774==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000003100 at pc 0x0000004704d3 bp 0x7fffcb062000 sp 0x7fffcb061ff0
READ of size 1 at 0x611000003100 thread T0
    #0 0x4704d2 in isPositionFilled /home/gusakovy/Projects/valkey/src/hashtable.c:422
    #1 0x478b45 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1768
    #2 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
    #3 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
    #4 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
    #5 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
    #6 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
    #7 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
    #8 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
    valkey-io#9 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
    valkey-io#10 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
    valkey-io#11 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
    valkey-io#12 0x452e39 in _start (/local/home/gusakovy/Projects/valkey/src/valkey-server+0x452e39)

0x611000003100 is located 0 bytes inside of 256-byte region [0x611000003100,0x611000003200)
freed by thread T0 here:
    #0 0x7f471a34a1e5 in __interceptor_free (/lib64/libasan.so.4+0xd81e5)
    #1 0x4aefbc in zfree_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:400
    #2 0x4aeff5 in valkey_free /home/gusakovy/Projects/valkey/src/zmalloc.c:415
    #3 0x4707d2 in rehashingCompleted /home/gusakovy/Projects/valkey/src/hashtable.c:456
    #4 0x471b5b in resize /home/gusakovy/Projects/valkey/src/hashtable.c:656
    #5 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
    #6 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
    #7 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
    #8 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
    valkey-io#9 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
    valkey-io#10 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
    valkey-io#11 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
    valkey-io#12 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
    valkey-io#13 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
    valkey-io#14 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
    valkey-io#15 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
    valkey-io#16 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
    valkey-io#17 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
    valkey-io#18 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
    valkey-io#19 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
    valkey-io#20 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
    valkey-io#21 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
    valkey-io#22 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
    valkey-io#23 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
    valkey-io#24 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)

previously allocated by thread T0 here:
    #0 0x7f471a34a753 in __interceptor_calloc (/lib64/libasan.so.4+0xd8753)
    #1 0x4ae48c in ztrycalloc_usable_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:214
    #2 0x4ae757 in valkey_calloc /home/gusakovy/Projects/valkey/src/zmalloc.c:257
    #3 0x4718fc in resize /home/gusakovy/Projects/valkey/src/hashtable.c:645
    #4 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
    #5 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
    #6 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
    #7 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
    #8 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
    valkey-io#9 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
    valkey-io#10 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
    valkey-io#11 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
    valkey-io#12 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
    valkey-io#13 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
    valkey-io#14 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
    valkey-io#15 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
    valkey-io#16 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
    valkey-io#17 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
    valkey-io#18 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
    valkey-io#19 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
    valkey-io#20 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
    valkey-io#21 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
    valkey-io#22 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
    valkey-io#23 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)

SUMMARY: AddressSanitizer: heap-use-after-free /home/gusakovy/Projects/valkey/src/hashtable.c:422 in isPositionFilled
Shadow bytes around the buggy address:
  0x0c227fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff85f0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c227fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff8610: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
=>0x0c227fff8620:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff8630: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff8640: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c227fff8650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c227fff8660: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c227fff8670: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==46774==ABORTING
```


**Solution**
Suggested solution is to also pause auto shrinking during
`hashtableScanDefrag`. I noticed that there was already a
`hashtablePauseAutoShrink` method and `pause_auto_shrink` counter, but
it wasn't actually used in `hashtableShrinkIfNeeded` so I fixed that.

**Testing**
I created a simple tcl test that (most of the times) triggers this
error, but it's a little clunky so I didn't add it as part of the PR:

```
start_server {tags {"expire hashtable defrag"}} {
    test {hashtable scan defrag on expiry} {

        r config set hz 100

        set num_keys 20
        for {set i 0} {$i < $num_keys} {incr i} {
            r set "key_$i" "value_$i"
        }

        for {set j 0} {$j < 50} {incr j} {
            set expire_keys 100
            for {set i 0} {$i < $expire_keys} {incr i} {
                # Short expiry time to ensure they expire quickly
                r psetex "expire_key_${i}_${j}" 100 "expire_value_${i}_${j}"
            }

            # Verify keys are set
            set initial_size [r dbsize]
            assert_equal $initial_size [expr $num_keys + $expire_keys]
            
            after 150
            for {set i 0} {$i < 10} {incr i} {
                r get "expire_key_${i}_${j}"
                after 10
            }
        }

        set remaining_keys [r dbsize]
        assert_equal $remaining_keys $num_keys

        # Verify server is still responsive
        assert_equal [r ping] {PONG}
    } {}
}
```
Compiling with ASAN using `make noopt SANITIZER=address valkey-server`
and running the test causes error above. Applying the fix resolves the
issue.

Signed-off-by: Yakov Gusakov <yaakov0015@gmail.com>
zuiderkwast pushed a commit that referenced this pull request Oct 1, 2025
**Current state**
During `hashtableScanDefrag`, rehashing is paused to prevent entries
from moving, but the scan callback can still delete entries which
triggers `hashtableShrinkIfNeeded`. For example, the
`expireScanCallback` can delete expired entries.

**Issue**
This can cause the table to be resized and the old memory to be freed
while the scan is still accessing it, resulting in the following memory
access violation:

```
[err]: Sanitizer error: =================================================================
==46774==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000003100 at pc 0x0000004704d3 bp 0x7fffcb062000 sp 0x7fffcb061ff0
READ of size 1 at 0x611000003100 thread T0
    #0 0x4704d2 in isPositionFilled /home/gusakovy/Projects/valkey/src/hashtable.c:422
    #1 0x478b45 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1768
    #2 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
    #3 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
    #4 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
    #5 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
    #6 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
    #7 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
    #8 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
    #9 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
    #10 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
    #11 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
    #12 0x452e39 in _start (/local/home/gusakovy/Projects/valkey/src/valkey-server+0x452e39)

0x611000003100 is located 0 bytes inside of 256-byte region [0x611000003100,0x611000003200)
freed by thread T0 here:
    #0 0x7f471a34a1e5 in __interceptor_free (/lib64/libasan.so.4+0xd81e5)
    #1 0x4aefbc in zfree_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:400
    #2 0x4aeff5 in valkey_free /home/gusakovy/Projects/valkey/src/zmalloc.c:415
    #3 0x4707d2 in rehashingCompleted /home/gusakovy/Projects/valkey/src/hashtable.c:456
    #4 0x471b5b in resize /home/gusakovy/Projects/valkey/src/hashtable.c:656
    #5 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
    #6 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
    #7 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
    #8 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
    #9 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
    #10 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
    #11 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
    #12 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
    #13 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
    #14 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
    #15 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
    #16 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
    #17 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
    #18 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
    #19 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
    #20 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
    #21 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
    #22 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
    #23 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
    #24 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)

previously allocated by thread T0 here:
    #0 0x7f471a34a753 in __interceptor_calloc (/lib64/libasan.so.4+0xd8753)
    #1 0x4ae48c in ztrycalloc_usable_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:214
    #2 0x4ae757 in valkey_calloc /home/gusakovy/Projects/valkey/src/zmalloc.c:257
    #3 0x4718fc in resize /home/gusakovy/Projects/valkey/src/hashtable.c:645
    #4 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
    #5 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
    #6 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
    #7 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
    #8 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
    #9 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
    #10 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
    #11 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
    #12 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
    #13 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
    #14 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
    #15 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
    #16 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
    #17 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
    #18 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
    #19 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
    #20 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
    #21 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
    #22 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
    #23 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)

SUMMARY: AddressSanitizer: heap-use-after-free /home/gusakovy/Projects/valkey/src/hashtable.c:422 in isPositionFilled
Shadow bytes around the buggy address:
  0x0c227fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff85f0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c227fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff8610: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
=>0x0c227fff8620:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff8630: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c227fff8640: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c227fff8650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c227fff8660: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c227fff8670: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==46774==ABORTING
```


**Solution**
Suggested solution is to also pause auto shrinking during
`hashtableScanDefrag`. I noticed that there was already a
`hashtablePauseAutoShrink` method and `pause_auto_shrink` counter, but
it wasn't actually used in `hashtableShrinkIfNeeded` so I fixed that.

**Testing**
I created a simple tcl test that (most of the times) triggers this
error, but it's a little clunky so I didn't add it as part of the PR:

```
start_server {tags {"expire hashtable defrag"}} {
    test {hashtable scan defrag on expiry} {

        r config set hz 100

        set num_keys 20
        for {set i 0} {$i < $num_keys} {incr i} {
            r set "key_$i" "value_$i"
        }

        for {set j 0} {$j < 50} {incr j} {
            set expire_keys 100
            for {set i 0} {$i < $expire_keys} {incr i} {
                # Short expiry time to ensure they expire quickly
                r psetex "expire_key_${i}_${j}" 100 "expire_value_${i}_${j}"
            }

            # Verify keys are set
            set initial_size [r dbsize]
            assert_equal $initial_size [expr $num_keys + $expire_keys]
            
            after 150
            for {set i 0} {$i < 10} {incr i} {
                r get "expire_key_${i}_${j}"
                after 10
            }
        }

        set remaining_keys [r dbsize]
        assert_equal $remaining_keys $num_keys

        # Verify server is still responsive
        assert_equal [r ping] {PONG}
    } {}
}
```
Compiling with ASAN using `make noopt SANITIZER=address valkey-server`
and running the test causes error above. Applying the fix resolves the
issue.

Signed-off-by: Yakov Gusakov <yaakov0015@gmail.com>
sarthakaggarwal97 pushed a commit that referenced this pull request Jun 23, 2026
#### Purpose

This workflow was originally introduced in PR
[#3358](#3358), where we detect
the failures in our scheduled `daily` runs and create / update github
issues.

We want to do more things with AI with respect to tests failures. It
could include potentially finding the root cause, any PR that broke the
tests, some helpful dashboard to track daily tests, maybe some analysis
or possible fix as well.
To achieve that, we are moving this issue management out of this
repository and into `valkey-ci-agent`.

The Daily workflow in this repository still records per-job test
failures, consolidates them into `all-test-failures.json`, and uploads
the `all-test-failures` artifact. The workflow being removed here was
only responsible for consuming that artifact and creating or updating
GitHub issues.

#### Changes

Remove `.github/workflows/test-failure-detector.yml`.

Issue creation and updates are now handled by the Test Failure Detector
workflow in `valkey-ci-agent` through this PR
[#24](valkey-io/valkey-ci-agent#24).

#### Notes

This should be merged together with the corresponding `valkey-ci-agent`
change so scheduled test-failure detection continues without a gap.

Signed-off-by: Bonnie Chan <bonchan35@gmail.com>
GilboaAWS pushed a commit to GilboaAWS/valkey that referenced this pull request Jun 24, 2026
* [S2.8] Out-of-process explicit decompression (3/3)

PR 3 of 3 in the S2.8 split. Wires explicit objectGetUncompressedView
calls into the two paths that bypass lookupKey() and thus do not
benefit from PR valkey-io#23's transient-view hook:

  - rioWriteBulkObject (aof.c) — AOF rewrite child iterates kvstore
    via kvstoreIteratorNext; AOF on-disk format is uncompressed RESP
    per R2.6.5.
  - rdbSaveStringObject (rdb.c) — same iteration pattern; per R2.6.5
    (AOF preamble within RDB) and R2.6.8 (full-sync replication RDB),
    uncompressed bytes are mandatory.

Both today panic on OBJ_ENCODING_COMPRESSED (rioWriteBulkObject:
serverPanic("Unknown string encoding"); rdbSaveStringObject:
serverAssertWithInfo on sdsEncodedObject). The crash is latent only
because no dictionary training has merged yet (S1.2 pending) — once
training lands, BGSAVE/BGREWRITEAOF on a compressed key crashes the
server. PR 3 is a prerequisite for S1.2 keeping unstable continuously
safe.

R2.6.1 (compressed disk RDB encoding via RDB_ENC_COMPRESSED + AUX
dict entries) is intentionally deferred to S3.1/S3.2 (@GilboaAWS).
Until that lands, the conservative choice for ALL RDB targets is
uncompressed, matching pre-feature behavior. Disk RDBs pay the
decompression cost once at save time and re-acquire compression at
load time via the post-S1.2 sweeper.

Dict lifetime safety in the forked child: registry is a fork-time
snapshot; DDicts are immutable; child uses its own copy of the
registry pointers. No interaction with the parent's QSBR generation
counters.

Replication feed (feedReplicationBufferWithObject) does NOT need
wiring — already verified to operate on argv arguments and synthetic
SELECT robjs only, never on kvstore values.

DUMP/RESTORE/MIGRATE (R2.6.7) go through lookupKey() which now
decompresses transparently per S2.8 PR 2; nothing to change.

Tests deferred to the post-S1.2 transparency Tcl harness — that's
where real compressed values exist and BGSAVE/BGREWRITEAOF
round-trips can be exercised end-to-end.

Verified locally: make -j2 -C src SERVER_CFLAGS=-Werror clean both
with BUILD_ZSTD=yes (default) and BUILD_ZSTD=no; compression Tcl
tests 10/10 pass.

* docs(planning): mark S2.8 fully complete via PR valkey-io#24 (3/3)

The S2.8 forward-reference in plan.md previously named the
out-of-process explicit-decompression work as 'PR 3 of 3 — NEXT'.
This PR closes that out: PR valkey-io#24 IS that work. Update the entry to
reference PR valkey-io#24 and capture the as-implemented scope (which paths,
which deferred to S3.1/S3.2 for R2.6.1 disk-RDB compressed encoding,
which already covered by PR valkey-io#23's transient-view hook).

S2.8 (Read-path hook) is now fully landed across PRs valkey-io#21, valkey-io#22, valkey-io#23,
and valkey-io#24. Next on the @ikolomi track: S2.9, S2.10, S2.11. Next on the
@GilboaAWS track: S1.2 (training) — now safe to merge because S2.8
PR 3 means the forked-child paths (AOF rewrite, RDB save) no longer
panic when they encounter compressed values.
GilboaAWS pushed a commit to GilboaAWS/valkey that referenced this pull request Jun 24, 2026
* test(compression): COW-invariant merge-blocker (S2.13)

Add tests/unit/compression-cow-invariant.tcl, the runtime form of the
R2.4.5 COW-audit checklist and a v1 merge blocker per design §7.2.

The feature relies on two correctness invariants when a value is, or is
being, compressed:

  (1) Decompress-before-mutate. lookupKey(...,LOOKUP_WRITE) permanently
      decompresses any OBJ_ENCODING_COMPRESSED value to RAW before the
      command handler runs, so no in-place byte-mutating command ever
      operates on a compressed frame (which would corrupt bytes or hit
      getDecodedObject()'s serverPanic — COMPRESSED is neither
      sdsEncodedObject nor INT).

  (2) Worker-snapshot immutability (R2.4.4). While a worker reads a
      value's sds bytes, the value is pinned (refcount>=2), so a
      concurrent in-place mutation COWs via dbUnshareStringValue,
      leaving the worker's bytes untouched.

Both are code-discipline, not type-enforced. The test exercises them at
runtime across every in-place string mutator (APPEND, SETRANGE, SETBIT,
BITFIELD SET, GETSET, GETDEL, SET-overwrite), a 200-iteration mutation
storm against the live worker pool, and a transient-view + write-path
interaction. Each asserts the result matches the value semantics
computed independently in Tcl, that compression_errors_total stays 0,
and that the mutated value re-compresses and round-trips.

The audit behind the test found no violators: t_string.c, bitops.c,
hyperloglog.c, module.c (DMA write / truncate / OpenKey), and debug.c
all honor the write-lookup decompress + dbUnshareStringValue discipline.

Auto-discovered via the tests/unit/*.tcl glob; skips cleanly under
BUILD_ZSTD=no (no gen-zstd-dict helper). Compression is reached via
COMPRESSION SWEEP FORCE (the deterministic operator trigger) rather than
async enqueue timing; startup raises compression-dict-max-versions and
flushes for --external shared-server headroom.

* test(compression): clean up shared-server state in COW-invariant test

The COW-invariant test enables compression (master=compression,
sweeper, an imported dict) and leaves compressed values in the
keyspace. In --external shared-server mode this pollutes the server
for every subsequent test file. The next file, unit/other, runs
DEBUG DIGEST, whose computeDatasetDigest -> xorObjectDigest ->
mixStringObjectDigest path iterates the kvstore directly and calls
getDecodedObject() on the leftover compressed value, hitting
serverPanic("Unknown encoding type") (object.c) -> server crash ->
the external CI jobs (test-external-{standalone,cluster,nodebug})
fail with "I/O error reading reply".

Add an end-of-file cleanup test that flushes the keyspace (dropping
all compressed frames) and turns the feature off, leaving the shared
server clean. dict-max-versions is left at 16 (not reset to the
default 4) so later suites that import dicts into the shared registry
have headroom.

This is test hygiene only. The underlying DEBUG DIGEST crash on
compressed values is a real, pre-existing product bug (a kvstore-
direct reader that does not decompress, the same class PR valkey-io#24 fixed
for rdbSaveStringObject / AOF rewrite but missed for the digest
path). It is tracked separately and must be fixed before transparency
mode (full corpus under --compression) lands.

* test(compression): tag COW-invariant test external:skip

The previous cleanup-at-end approach did not fully solve the shared
external-server pollution: this test churns global compression state
and cannot restore a pristine registry (a dict that ever held frames
is not reliably reclaimable today, so it lingers). That broke
unit/type/compression's documented-defaults / clean-registry
assertions, which run after this file in --external mode.

Tag the file external:skip instead. The test deliberately reconfigures
the server globally (master switch, sweeper, imported dict, compressed
frames), which makes it a poor citizen on a shared, externally-managed
server. Its COW-correctness value is fully exercised in normal
(dedicated-server) mode, the primary CI mode, where the server is torn
down between files. This matches the precedent of other stateful tests
(dump, expire, introspection, ...).

Removes the now-unneeded end-of-file cleanup test.

* docs(plan): mark S2.13 (COW audit) + S6.1 (cow-invariant test) done

PR valkey-io#41 delivers the S2.13 COW audit (no violators) and the S6.1
merge-blocker test tests/unit/compression-cow-invariant.tcl. Record the
audit outcome, the external:skip rationale, and the separately-tracked
DEBUG DIGEST / getDecodedObject crash discovered during the audit
(Phase-B blocker, follow-up PR). Update the Phase-A override note to
reflect S4.x (valkey-io#39) and S2.13 (valkey-io#41) as done.

* test(compression): fix stale comments after external:skip change

The startup and cow_configure comments still described running 'against
the same server as the integration suite in --external mode'. With the
file now tagged external:skip that's inaccurate — it runs only against a
fresh dedicated server. Reword to reflect that the startup flush + cap
headroom are defensive clean-slate setup, not shared-server handling.
GilboaAWS pushed a commit to GilboaAWS/valkey that referenced this pull request Jun 24, 2026
* fix(compression): decode compressed values in DEBUG DIGEST readers

DEBUG DIGEST and DEBUG DIGEST-VALUE iterate the kvstore directly
(computeDatasetDigest via kvstoreIteratorNext; DIGEST-VALUE via dbFind)
so a debug command can digest logically-expired keys. They therefore
bypass the lookupKey() transient-view decompression hook and observe raw
OBJ_ENCODING_COMPRESSED robjs.

Both funnel a string value through xorObjectDigest's OBJ_STRING branch
into mixStringObjectDigest / xorStringObjectDigest, which called
getDecodedObject() — that panics on any encoding other than
RAW/EMBSTR/INT (serverPanic 'Unknown encoding type'), crashing the
server on the first compressed value.

Route the two digest helpers through objectGetUncompressedView (R2.5.2),
the single decoder primitive PR valkey-io#24 used for rdbSaveStringObject /
rioWriteBulkObject. For a compressed value the helper decompresses into
a scratch sds (read directly — not via getDecodedObject, since
incrRefCount on the OBJ_STATIC_REFCOUNT view would panic); otherwise it
falls back to getDecodedObject (still handles INT). No-op passthrough
for non-compressed values and for BUILD_ZSTD=no.

The digest is thus defined over logical (decompressed) bytes: identical
whether a value is stored compressed or not. Pre-existing crash on
unstable; a blocker for transparency mode (full Tcl corpus under
--compression, which runs DEBUG DIGEST extensively).

Test tests/unit/compression-debug-readers.tcl (external:skip): compressed
values -> DEBUG DIGEST / DIGEST-VALUE don't crash, digest is byte-identical
compressed vs decompressed, and a DEBUG RELOAD round-trip preserves both
digest and values.

* docs(compression): record S2.14 (DEBUG DIGEST reader fix)

Mark S2.14 done in plan.md (the follow-up to the S2.13 audit finding) and
list DEBUG DIGEST / DIGEST-VALUE among R2.5.7's explicit-decompression
kvstore-direct readers in the design doc.
GilboaAWS added a commit to GilboaAWS/valkey that referenced this pull request Jun 25, 2026
Compressed on-disk RDB (RDB_ENC_COMPRESSED encode/decode, R2.6.1-R2.6.4)
is a local-snapshot optimization — smaller/faster RDB and faster
warm-up to the compressed state after a restart. It is NOT a
correctness requirement and NOT a sync-speed improvement: full-sync
replication is uncompressed in v1 regardless (R2.6.8).

v1 is already correct and memory-safe without it:
  - All RDB save targets (disk, full-sync, AOF preamble, DUMP/MIGRATE)
    emit uncompressed, routing each value through
    objectGetUncompressedView at save time (PR valkey-io#24). Decompression is
    transient and per-value (one throwaway scratch sds at a time; the
    live compressed robj is never mutated), so a full-keyspace save
    never decompresses everything at once — peak overhead is a single
    value <= compression-max-value-size.
  - On load there is nothing compressed to read; the sweeper
    re-compresses in memory afterward.

Changes:
  - plan.md: S3.3 (full-sync uncompressed) and S3.4 (AOF uncompressed)
    marked complete — covered by PR valkey-io#24. S3.1/S3.2 marked deferred and
    added to the §8 v2 list.
  - design §2.6: scope note at the top; R2.6.1-R2.6.4 tagged v2;
    R2.6.8 and the §1 boundary note updated for the v1/v2 split.

In v1 RDB_ENC_COMPRESSED (=4) stays reserved-but-unused and
RDB_VERSION is not bumped.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants