
Conversation

@sundb sundb commented Aug 13, 2025

Fix #14267
This bug was introduced by #13495

Summary

When a replica clears a large database, it periodically calls processEventsWhileBlocked() from the replicationEmptyDbCallback() callback while keys are being deleted.
If defragmentation is enabled, active defrag can therefore be triggered while the database is being emptied.
The defragmentation process may modify the database at that point, which can lead to crashes when the database is accessed after defragmentation.

Code Path:

```
replicationEmptyDbCallback() -> processEventsWhileBlocked() -> whileBlockedCron() -> defragWhileBlocked()
```
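
For context, the callback at the start of that chain is essentially a periodic event pump. A simplified sketch, assuming the signature of the flush callback in src/replication.c:

```c
/* Simplified sketch of the callback invoked periodically while the
 * replica deletes keys. Pumping events here is what lets
 * whileBlockedCron(), and therefore defragWhileBlocked(), run in the
 * middle of the flush. */
void replicationEmptyDbCallback(dict *d) {
    UNUSED(d);
    /* Keep the replication link alive during the long flush. */
    if (server.repl_state == REPL_STATE_TRANSFER)
        replicationSendNewlineToMaster();
    processEventsWhileBlocked();
}
```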

Solution

This PR temporarily disables active defrag before emptying the database, then restores the active defrag setting after the flush completes.
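
A minimal sketch of the approach, assuming a server flag along the lines of `server.active_defrag_enabled` and the `emptyData()` entry point used during full sync (the actual change lives in src/replication.c):

```c
/* Sketch: save the defrag setting, disable it for the duration of the
 * flush, and restore it afterwards. */
int defrag_was_enabled = server.active_defrag_enabled;
server.active_defrag_enabled = 0;

/* The callback still pumps events, but whileBlockedCron() can no
 * longer kick off defrag while keys are being deleted. */
emptyData(-1, EMPTYDB_NO_FLAGS, replicationEmptyDbCallback);

server.active_defrag_enabled = defrag_was_enabled;
```

Restoring the saved value rather than unconditionally re-enabling defrag preserves whatever the user configured, including the case where defrag was already off.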

Crash Report

Logged crash report (pid 795456):
=== REDIS BUG REPORT START: Cut & paste starting from here ===
795456:S 13 Aug 2025 23:48:58.091 # ------------------------------------------------
795456:S 13 Aug 2025 23:48:58.091 # !!! Software Failure. Press left mouse button to continue
795456:S 13 Aug 2025 23:48:58.091 # Guru Meditation: illegal decrRefCount for object with: type 0, encoding 0, refcount 0 #object.c:594

------ STACK TRACE ------

795458 bio_close_file
/lib/x86_64-linux-gnu/libc.so.6(+0x98d71)[0x7efceda98d71]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x20d)[0x7efceda9b7ed]
src/redis-server 127.0.0.1:21113(bioProcessBackgroundJobs+0x1ea)[0x55ba7619ec9a]
/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4)[0x7efceda9caa4]
/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c)[0x7efcedb29c3c]

795459 bio_aof
/lib/x86_64-linux-gnu/libc.so.6(+0x98d71)[0x7efceda98d71]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x20d)[0x7efceda9b7ed]
src/redis-server 127.0.0.1:21113(bioProcessBackgroundJobs+0x1ea)[0x55ba7619ec9a]
/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4)[0x7efceda9caa4]
/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c)[0x7efcedb29c3c]

795456 redis-server *
src/redis-server 127.0.0.1:21113(+0xe454f)[0x55ba760e854f]
src/redis-server 127.0.0.1:21113(+0x25f889)[0x55ba76263889]
src/redis-server 127.0.0.1:21113(+0x262c35)[0x55ba76266c35]
src/redis-server 127.0.0.1:21113(rdbLoadWithEmptyFunc+0xfc)[0x55ba7611b45c]
src/redis-server 127.0.0.1:21113(readSyncBulkPayload+0xa2c)[0x55ba76108a8c]
src/redis-server 127.0.0.1:21113(+0x227844)[0x55ba7622b844]
src/redis-server 127.0.0.1:21113(aeMain+0xf9)[0x55ba76097a19]
src/redis-server 127.0.0.1:21113(main+0x4a7)[0x55ba760918b7]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7efceda2a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7efceda2a28b]
src/redis-server 127.0.0.1:21113(_start+0x25)[0x55ba76093135]

795460 bio_lazy_free
/lib/x86_64-linux-gnu/libc.so.6(+0x98d71)[0x7efceda98d71]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x20d)[0x7efceda9b7ed]
src/redis-server 127.0.0.1:21113(bioProcessBackgroundJobs+0x1ea)[0x55ba7619ec9a]
/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4)[0x7efceda9caa4]
/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c)[0x7efcedb29c3c]

4/4 expected stacktraces.

------ STACK TRACE DONE ------

------ INFO OUTPUT ------
# Server
redis_version:255.255.255
redis_git_sha1:3fa7a656
redis_git_dirty:1
redis_build_id:4245fdb463b9f9f8
redis_mode:standalone
os:Linux 6.8.0-71-generic x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:13.3.0
process_id:795456
process_supervised:no
run_id:3e7185b71f41e574eb92b621f6f22a247f0e50cb
tcp_port:21113
server_time_usec:1755100138083040
uptime_in_seconds:8
uptime_in_days:0
hz:100
configured_hz:100
lru_clock:10269674
executable:/home/sundb/data/rf_2/src/redis-server
config_file:/home/sundb/data/rf_2/./tests/tmp/redis.conf.795389.6
io_threads_active:0
listener0:name=tcp,bind=127.0.0.1,port=21113
listener1:name=unix,bind=/home/sundb/data/rf_2/tests/tmp/server.795389.5/socket

# Clients
connected_clients:2
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:32
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
pubsub_clients:0
watching_clients:0
clients_in_timeout_table:0
total_watched_keys:0
total_blocking_keys:0
total_blocking_keys_on_nokey:0

# Memory
used_memory:42492976
used_memory_human:40.52M
used_memory_rss:115081216
used_memory_rss_human:109.75M
used_memory_peak:103072816
used_memory_peak_human:98.30M
used_memory_peak_time:1755100135
used_memory_peak_perc:41.23%
used_memory_overhead:10976456
used_memory_startup:651512
used_memory_dataset:31516520
used_memory_dataset_perc:75.32%
allocator_allocated:54865448
allocator_active:103927808
allocator_resident:114491392
allocator_muzzy:0
total_system_memory:66516320256
total_system_memory_human:61.95G
used_memory_lua:31744
used_memory_vm_eval:31744
used_memory_lua_human:31.00K
used_memory_scripts_eval:0
number_of_cached_scripts:0
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:64512
used_memory_vm_total_human:63.00K
used_memory_functions:192
used_memory_scripts:192
used_memory_scripts_human:192B
maxmemory:104857600
maxmemory_human:100.00M
maxmemory_policy:noeviction
allocator_frag_ratio:1.90
allocator_frag_bytes:48986328
allocator_rss_ratio:1.10
allocator_rss_bytes:10563584
rss_overhead_ratio:1.01
rss_overhead_bytes:589824
mem_fragmentation_ratio:2.12
mem_fragmentation_bytes:60746992
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_total_replication_buffers:0
mem_replica_full_sync_buffer:0
mem_clients_slaves:0
mem_clients_normal:4864
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.3.0
mem_overhead_db_hashtable_rehashing:0
active_defrag_running:73
lazyfree_pending_objects:0
lazyfreed_objects:0

# Persistence
loading:1
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:750000
rdb_bgsave_in_progress:0
rdb_last_save_time:1755100130
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_saves:0
rdb_last_cow_size:0
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
loading_start_time:1755100138
loading_total_bytes:177
loading_rdb_used_mem:0
loading_loaded_bytes:0
loading_loaded_perc:0.00
loading_eta_seconds:1

# Threads
io_thread_0:clients=2,reads=605914,writes=605913

# Stats
total_connections_received:3
total_commands_processed:750016
instantaneous_ops_per_sec:109854
total_net_input_bytes:29084246
total_net_output_bytes:4006708
total_net_repl_input_bytes:264
total_net_repl_output_bytes:0
instantaneous_input_kbps:2682.02
instantaneous_output_kbps:429.12
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_subkeys:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
evicted_clients:0
evicted_scripts:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:32441
active_defrag_misses:1379
active_defrag_key_hits:14014
active_defrag_key_misses:1580
total_active_defrag_time:38
current_active_defrag_time:38
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:605914
total_writes_processed:605913
io_threaded_reads_processed:0
io_threaded_writes_processed:0
io_threaded_total_prefetch_batches:0
io_threaded_total_prefetch_entries:0
client_query_buffer_limit_disconnections:0
client_output_buffer_limit_disconnections:0
reply_buffer_shrinks:3
reply_buffer_expands:2
eventloop_cycles:606226
eventloop_duration_sum:3785411
eventloop_duration_cmd_sum:319753
instantaneous_eventloop_cycles_per_sec:86612
instantaneous_eventloop_duration_usec:5
acl_access_denied_auth:0
acl_access_denied_cmd:0
acl_access_denied_key:0
acl_access_denied_channel:0

# Replication
role:slave
master_host:127.0.0.1
master_port:21112
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:1
slave_read_repl_offset:1
slave_repl_offset:1
replica_full_sync_buffer_size:0
replica_full_sync_buffer_peak:0
master_current_sync_attempts:1
master_total_sync_attempts:1
master_sync_total_bytes:0
master_sync_read_bytes:217
master_sync_left_bytes:-217
master_sync_perc:0.00
master_sync_last_io_seconds_ago:0
master_link_down_since_seconds:-1
total_disconnect_time_sec:0
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:2195379e33eb69559471c4a9e947b60ac29bd6ad
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:3.534392
used_cpu_user:0.987811
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:3.534152
used_cpu_user_main_thread:0.987744

# Modules
module:name=vectorset,ver=1,api=1,filters=0,usedby=[],using=[],options=[handle-io-errors|handle-repl-async-load]

# Commandstats
cmdstat_select:calls=2,usec=3,usec_per_call=1.50,rejected_calls=0,failed_calls=0
cmdstat_del:calls=250000,usec=103551,usec_per_call=0.41,rejected_calls=0,failed_calls=0
cmdstat_info:calls=1,usec=115,usec_per_call=115.00,rejected_calls=0,failed_calls=0
cmdstat_ping:calls=1,usec=4,usec_per_call=4.00,rejected_calls=0,failed_calls=0
cmdstat_setrange:calls=500000,usec=215954,usec_per_call=0.43,rejected_calls=0,failed_calls=0
cmdstat_replicaof:calls=1,usec=113,usec_per_call=113.00,rejected_calls=0,failed_calls=0
cmdstat_dbsize:calls=2,usec=0,usec_per_call=0.00,rejected_calls=0,failed_calls=0
cmdstat_config|set:calls=8,usec=10,usec_per_call=1.25,rejected_calls=0,failed_calls=0
cmdstat_config|get:calls=1,usec=3,usec_per_call=3.00,rejected_calls=0,failed_calls=0

# Errorstats

# Latencystats
latency_percentiles_usec_select:p50=1.003,p99=2.007,p99.9=2.007
latency_percentiles_usec_del:p50=0.001,p99=1.003,p99.9=1.003
latency_percentiles_usec_info:p50=115.199,p99=115.199,p99.9=115.199
latency_percentiles_usec_ping:p50=4.015,p99=4.015,p99.9=4.015
latency_percentiles_usec_setrange:p50=0.001,p99=2.007,p99.9=4.015
latency_percentiles_usec_replicaof:p50=113.151,p99=113.151,p99.9=113.151
latency_percentiles_usec_dbsize:p50=0.001,p99=0.001,p99.9=0.001
latency_percentiles_usec_config|set:p50=1.003,p99=3.007,p99.9=3.007
latency_percentiles_usec_config|get:p50=3.007,p99=3.007,p99.9=3.007

# Cluster
cluster_enabled:0

# Keyspace
db9:keys=189318,expires=0,avg_ttl=0,subexpiry=0

# Keysizes
db9_distrib_strings_sizes:128=250000

------ CLIENT LIST OUTPUT ------
id=5 addr=127.0.0.1:35771 laddr=127.0.0.1:21113 fd=12 name= age=8 idle=0 flags=N db=9 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=2048 rbp=1024 obl=0 oll=0 omem=0 tot-mem=2944 events=r cmd=info user=default redir=-1 resp=2 lib-name= lib-ver= io-thread=0 tot-net-in=617 tot-net-out=6696 tot-cmds=14
id=6 addr=127.0.0.1:44391 laddr=127.0.0.1:21113 fd=13 name= age=8 idle=0 flags=N db=9 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=76 obl=0 oll=0 omem=0 tot-mem=1920 events=r cmd=del user=default redir=-1 resp=2 lib-name= lib-ver= io-thread=0 tot-net-in=29083358 tot-net-out=4000005 tot-cmds=750001

------ MODULES INFO OUTPUT ------

------ CONFIG DEBUG OUTPUT ------
sanitize-dump-payload no
lazyfree-lazy-user-del no
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
client-query-buffer-limit 1gb
lazyfree-lazy-server-del no
proto-max-bulk-len 512mb
io-threads 1
repl-diskless-load disabled
repl-diskless-sync yes
lazyfree-lazy-user-flush no
slave-read-only yes
activedefrag yes
replica-read-only yes
list-compress-depth 0

------ FAST MEMORY TEST ------
795456:S 13 Aug 2025 23:48:58.093 # Bio worker thread #0 terminated
795456:S 13 Aug 2025 23:48:58.094 # Bio worker thread #1 terminated
795456:S 13 Aug 2025 23:48:58.094 # Bio worker thread #2 terminated
*** Preparing to test memory region 55ba7641c000 (2322432 bytes)
*** Preparing to test memory region 55ba8cc3f000 (135168 bytes)
*** Preparing to test memory region 7efcdea00000 (90177536 bytes)
*** Preparing to test memory region 7efce4000000 (135168 bytes)
*** Preparing to test memory region 7efce8000000 (16777216 bytes)
*** Preparing to test memory region 7efce90fb000 (9437184 bytes)
*** Preparing to test memory region 7efce99fc000 (8388608 bytes)
*** Preparing to test memory region 7efcea1fd000 (8388608 bytes)
*** Preparing to test memory region 7efcea9fe000 (8388608 bytes)
*** Preparing to test memory region 7efceb1ff000 (8388608 bytes)
*** Preparing to test memory region 7efceba00000 (8388608 bytes)
*** Preparing to test memory region 7efcec200000 (10485760 bytes)
*** Preparing to test memory region 7efced200000 (8388608 bytes)
*** Preparing to test memory region 7efcedc05000 (53248 bytes)
*** Preparing to test memory region 7efcee07a000 (16384 bytes)
*** Preparing to test memory region 7efcee234000 (28672 bytes)
*** Preparing to test memory region 7efcee369000 (8192 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.

=== REDIS BUG REPORT END. Make sure to include from START to END. ===

       Please report the crash by opening an issue on github:

           http://github.com/redis/redis/issues

  If a Redis module was involved, please open in the module's repo instead.

  Suspect RAM error? Use redis-server --test-memory to verify it.

  Some other issues could be detected by redis-server --check-system

@sundb sundb requested a review from oranagra August 13, 2025 15:59

snyk-io bot commented Aug 13, 2025

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found.

license/snyk check is complete. No issues have been found.

@sundb sundb requested a review from tezc August 13, 2025 15:59
@sundb sundb added the release-notes indication that this issue needs to be mentioned in the release notes label Aug 13, 2025

kaplanben commented Aug 13, 2025

Checkmarx One – Scan Summary & Details (scan ID b9e9d3cd-c411-4da4-a1b7-20d33fd9e45a)

New Issues (8)

Checkmarx found the following issues in this Pull Request:

- CRITICAL Buffer_Overflow_Wrong_Buffer_Size, /src/sha1.c: 65
  The buffer buffer created in /src/sha1.c at line 65 is written to a buffer in /src/sha1.c at line 65 by block, but an error in calculating the al...
- CRITICAL Buffer_Overflow_Wrong_Buffer_Size, /src/redis-cli.c: 3677
  The buffer buf created in /src/redis-cli.c at line 3677 is written to a buffer in /deps/hiredis/sds.c at line 234 by newsh, but an error in calc...
- CRITICAL Buffer_Overflow_Wrong_Buffer_Size, /deps/linenoise/linenoise.c: 1200
  The buffer buf created in /deps/linenoise/linenoise.c at line 1200 is written to a buffer in /deps/hiredis/sds.c at line 97 by sh, but an error i...
- CRITICAL Buffer_Overflow_Wrong_Buffer_Size, /src/redis-cli.c: 3677
  The buffer buf created in /src/redis-cli.c at line 3677 is written to a buffer in /deps/hiredis/sds.c at line 234 by hdrlen, but an error in cal...
- CRITICAL Buffer_Overflow_Wrong_Buffer_Size, /src/redis-cli.c: 10594
  The buffer argv created in /src/redis-cli.c at line 10594 is written to a buffer in /deps/hiredis/sds.c at line 97 by sh, but an error in calcul...
- CRITICAL Buffer_Overflow_Wrong_Buffer_Size, /deps/linenoise/linenoise.c: 1166
  The buffer fgetc created in /deps/linenoise/linenoise.c at line 1166 is written to a buffer in /deps/hiredis/sds.c at line 97 by sh, but an error...
- MEDIUM Divide_By_Zero, /modules/vector-sets/fastjson_test.c: 121
  The application performs an illegal operation in generate_random_string, in /modules/vector-sets/fastjson_test.c. In line 121, the program at...
- MEDIUM Divide_By_Zero, /src/redis-cli.c: 6040
  The application performs an illegal operation in clusterManagerNodeMasterRandom, in /src/redis-cli.c. In line 6053, the program attempts to divi...
Fixed Issues (4)

Great job! The following issues were fixed in this Pull Request:

- CRITICAL Buffer_Overflow_Wrong_Buffer_Size, /src/redis-cli.c: 3677
- CRITICAL Buffer_Overflow_Wrong_Buffer_Size, /src/redis-cli.c: 3677
- CRITICAL Buffer_Overflow_Wrong_Buffer_Size, /src/redis-cli.c: 3677
- CRITICAL Buffer_Overflow_Wrong_Buffer_Size, /src/redis-cli.c: 3677

@sundb sundb requested a review from Copilot August 14, 2025 03:32

This comment was marked as outdated.

sundb and others added 4 commits August 14, 2025 11:39
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@sundb sundb requested a review from Copilot August 14, 2025 03:42
Copilot AI left a comment

Pull Request Overview

This PR fixes a critical bug where active defragmentation could be triggered during replica database flush operations, causing crashes due to concurrent database modifications. The fix temporarily disables active defrag before emptying the database and restores the setting afterward.

  • Prevents active defrag from running during replicationEmptyDbCallback() execution
  • Adds comprehensive test coverage to verify the fix works correctly
  • Resolves crashes caused by defrag modifying the database while it's being emptied

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/replication.c | Adds temporary disabling of active defrag around the database flush operation |
| tests/unit/memefficiency.tcl | Adds a test case to reproduce and verify the fix for issue #14267 |


sundb and others added 3 commits August 14, 2025 11:46
@oranagra oranagra left a comment

😱

@oranagra

I see #13495 is part of 8.2, so do we really need to backport that to 8.0?


sundb commented Aug 14, 2025

> I see #13495 is part of 8.2, so do we really need to backport that to 8.0?

@oranagra I saw that it was introduced in 8.0-M01.

@oranagra

Odd. The PR is part of the 8.2 project...

@sundb sundb merged commit 46a3efa into redis:unstable Aug 15, 2025
19 of 20 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in Redis 8.2 Aug 15, 2025
@sundb sundb deleted the defrag-when-empty branch August 15, 2025 03:25
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Aug 18, 2025
@YaacovHazan YaacovHazan mentioned this pull request Aug 18, 2025
YaacovHazan pushed a commit that referenced this pull request Aug 18, 2025
@sundb sundb removed this from Redis 8.2 Aug 19, 2025
@sundb sundb moved this from Todo to Done in Redis 8.4 Aug 19, 2025
@sundb sundb moved this from Todo to Done in Redis 8.2 Backport Aug 19, 2025
sundb added a commit that referenced this pull request Sep 2, 2025
This PR fixes two crashes due to the defragmentation of Lua scripts,
which were introduced by #13108.

1. During long-running Lua script execution, active defragmentation may
be triggered, causing the luaScript structure to be reallocated to a new
memory location; we then access `l->node` (which may have been
reallocated) after script execution to update the Lua LRU list.
In this PR, we don't defrag during blocked scripts, so we don't mess up
the LRU update when the script ends.
   Note that defrag is now only permitted during loading.
This PR also reverts the changes made by
#14274.

2. We forgot to update the Lua LRU list node's value.
Since a `lua_scripts_lru_list` node stores a pointer to the `lua_script`'s
key, we also need to update `node->value` when the key is reallocated.
In this PR, after performing defragmentation on a Lua script, if the
script is in the LRU list, its reference in the LRU list is
unconditionally updated.
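
A sketch of that second fix; `luaScript`, its `node` field, and `activeDefragSds()` follow the names used above, but the exact layout is an assumption:

```c
/* After defrag relocates a script's key, repoint the LRU list node at
 * the new allocation; otherwise node->value keeps dangling on the
 * freed sds. */
sds defragLuaScriptKey(luaScript *script, sds key) {
    sds newkey = activeDefragSds(key); /* NULL when not moved */
    if (newkey && script->node) {
        /* Unconditionally refresh the node's value so the LRU list
         * tracks the relocated key. */
        script->node->value = newkey;
    }
    return newkey ? newkey : key;
}
```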
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Sep 28, 2025
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Sep 29, 2025
sundb added a commit to YaacovHazan/redis that referenced this pull request Sep 30, 2025