Skip to content

[CRASH] Redis cluster.c:1946 crashed by signal 11, si_code: 1 #12441

@Congyu-Liu

Description

@Congyu-Liu

We found a crash (segfault) in 7.0.11 in a 3-node cluster. Addr2line indicates that it happens at cluster.c:1946, where curmaster is NULL when being dereferenced. But this bug seems flaky so I cannot provide concrete instructions to reproduce.

Here's the bug report.

Crash report


=== REDIS BUG REPORT START: Cut & paste starting from here ===
12:M 25 Jul 2023 20:16:44.449 # Redis 7.0.11 crashed by signal: 11, si_code: 1
12:M 25 Jul 2023 20:16:44.449 # Accessing address: 0x84c
12:M 25 Jul 2023 20:16:44.449 # Crashed running the instruction at: 0x6b8b8d

------ STACK TRACE ------
EIP:
/bin/redis-server *:6379 [cluster](clusterUpdateSlotsConfigWith+0x56d)[0x6b8b8d]

Backtrace:
/bin/redis-server *:6379 [cluster](sigsegvHandler+0x2c9)[0x6a1f49]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7ffff7e58420]
/bin/redis-server *:6379 [cluster](clusterUpdateSlotsConfigWith+0x56d)[0x6b8b8d]
/bin/redis-server *:6379 [cluster](clusterProcessPacket+0x3157)[0x6be317]
/bin/redis-server *:6379 [cluster](clusterReadHandler+0x3c4)[0x6c2ec4]
/bin/redis-server *:6379 [cluster][0x808cf1]
/bin/redis-server *:6379 [cluster](aeProcessEvents+0x82f)[0x50ca7f]
/bin/redis-server *:6379 [cluster](aeMain+0x3d)[0x50d75d]
/bin/redis-server *:6379 [cluster](main+0x1274)[0x53fce4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7ffff7c5b083]
/bin/redis-server *:6379 [cluster](__libc_start_main+0x493)[0x8ec063]
/bin/redis-server *:6379 [cluster](_start+0x2e)[0x450a1e]

------ REGISTERS ------
12:M 25 Jul 2023 20:16:44.479 #
RAX:0000000000000100 RBX:00007fffffffd2e0
RCX:0000000000004ccc RDX:00000000000001a4
RDI:000000000000084c RSI:0000000000001554
RBP:00007fffffffd3b0 RSP:00007fffffff51c0
R8 :0000000000000030 R9 :0000000000000000
R10:0000000000000002 R11:0000000000000001
R12:0000000000000000 R13:000061d000006e80
R14:00007fffffff51c0 R15:0000000000001555
RIP:00000000006b8b8d EFL:0000000000010246
CSGSFS:002b000000000033
12:M 25 Jul 2023 20:16:44.479 # (00007fffffff51cf) -> 0000000000000000
12:M 25 Jul 2023 20:16:44.479 # (00007fffffff51ce) -> 0000000000000000
12:M 25 Jul 2023 20:16:44.479 # (00007fffffff51cd) -> 0000000000000000
12:M 25 Jul 2023 20:16:44.479 # (00007fffffff51cc) -> 0000000000000000
12:M 25 Jul 2023 20:16:44.479 # (00007fffffff51cb) -> 0000000000000000
12:M 25 Jul 2023 20:16:44.479 # (00007fffffff51ca) -> 0000000000000000
12:M 25 Jul 2023 20:16:44.479 # (00007fffffff51c9) -> 0000000000000000
12:M 25 Jul 2023 20:16:44.479 # (00007fffffff51c8) -> 0000000000000000
12:M 25 Jul 2023 20:16:44.479 # (00007fffffff51c7) -> 0000000000000000
12:M 25 Jul 2023 20:16:44.479 # (00007fffffff51c6) -> 0000000000000000
12:M 25 Jul 2023 20:16:44.479 # (00007fffffff51c5) -> 0000000000000000
12:M 25 Jul 2023 20:16:44.479 # (00007fffffff51c4) -> 0000000000000000
12:M 25 Jul 2023 20:16:44.479 # (00007fffffff51c3) -> 0000000000000000
12:M 25 Jul 2023 20:16:44.479 # (00007fffffff51c2) -> 00000000006b8620
12:M 25 Jul 2023 20:16:44.479 # (00007fffffff51c1) -> 0000000000956f2b
12:M 25 Jul 2023 20:16:44.479 # (00007fffffff51c0) -> 0000000041b58ab3

------ INFO OUTPUT ------
# Server
redis_version:7.0.11
redis_git_sha1:391aa407
redis_git_dirty:0
redis_build_id:e1cfd8d84705f27e
redis_mode:cluster
os:Linux 5.15.68 x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:4.2.1
process_id:12
process_supervised:no
run_id:87417ce7414ac80853927717729141f42aee787a
tcp_port:6379
server_time_usec:1690316204438089
uptime_in_seconds:9
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:12594604
executable:/bin/redis-server
config_file:
io_threads_active:0

# Clients
connected_clients:0
cluster_connections:2
maxclients:4064
client_recent_max_input_buffer:16390
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# Memory
used_memory:1555608
used_memory_human:1.48M
used_memory_rss:21348352
used_memory_rss_human:20.36M
used_memory_peak:1590828
used_memory_peak_human:1.52M
used_memory_peak_perc:97.79%
used_memory_overhead:1189834
used_memory_startup:1189649
used_memory_dataset:365774
used_memory_dataset_perc:99.95%
allocator_allocated:1544425
allocator_active:21316608
allocator_resident:21316608
total_system_memory:2078552064
total_system_memory_human:1.94G
used_memory_lua:31744
used_memory_vm_eval:31744
used_memory_lua_human:31.00K
used_memory_scripts_eval:0
number_of_cached_scripts:0
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:64512
used_memory_vm_total_human:63.00K
used_memory_functions:181
used_memory_scripts:181
used_memory_scripts_human:181B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:13.80
allocator_frag_bytes:19772183
allocator_rss_ratio:1.00
allocator_rss_bytes:0
rss_overhead_ratio:1.00
rss_overhead_bytes:31744
mem_fragmentation_ratio:13.82
mem_fragmentation_bytes:19803927
mem_not_counted_for_evict:0
mem_replication_backlog:4
mem_total_replication_buffers:0
mem_clients_slaves:0
mem_clients_normal:0
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:libc
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0

# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1690316195
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_saves:0
rdb_last_cow_size:0
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

# Stats
total_connections_received:19
total_commands_processed:29
instantaneous_ops_per_sec:10
total_net_input_bytes:54920
total_net_output_bytes:16941
total_net_repl_input_bytes:0
total_net_repl_output_bytes:0
instantaneous_input_kbps:0.67
instantaneous_output_kbps:0.33
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:2
sync_partial_ok:0
sync_partial_err:2
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:7
dump_payload_sanitizations:0
total_reads_processed:56
total_writes_processed:33
io_threaded_reads_processed:0
io_threaded_writes_processed:0
reply_buffer_shrinks:3
reply_buffer_expands:0

# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:558fce6645c495d52247a26acde2ed2f21712fa6
master_replid2:f0462026e8365b312f488046ab9e66cd078891d3
master_repl_offset:0
second_repl_offset:1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:0

# CPU
used_cpu_sys:0.044078
used_cpu_user:0.087177
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:0.044010
used_cpu_user_main_thread:0.087043

# Modules

# Commandstats
cmdstat_ping:calls=2,usec=33,usec_per_call=16.50,rejected_calls=0,failed_calls=0
cmdstat_info:calls=3,usec=445,usec_per_call=148.33,rejected_calls=0,failed_calls=0
cmdstat_cluster|set-config-epoch:calls=1,usec=15,usec_per_call=15.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|nodes:calls=5,usec=1536,usec_per_call=307.20,rejected_calls=0,failed_calls=0
cmdstat_cluster|meet:calls=2,usec=384,usec_per_call=192.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|info:calls=1,usec=90,usec_per_call=90.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|keyslot:calls=1,usec=40,usec_per_call=40.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|reset:calls=1,usec=40073,usec_per_call=40073.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|setslot:calls=1,usec=34,usec_per_call=34.00,rejected_calls=0,failed_calls=1
cmdstat_cluster|countkeysinslot:calls=0,usec=0,usec_per_call=0.00,rejected_calls=1,failed_calls=0
cmdstat_cluster|myid:calls=4,usec=116,usec_per_call=29.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|addslots:calls=1,usec=536,usec_per_call=536.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|delslotsrange:calls=1,usec=225,usec_per_call=225.00,rejected_calls=0,failed_calls=0
cmdstat_psync:calls=2,usec=323,usec_per_call=161.50,rejected_calls=0,failed_calls=0
cmdstat_rename:calls=0,usec=0,usec_per_call=0.00,rejected_calls=1,failed_calls=0
cmdstat_dump:calls=0,usec=0,usec_per_call=0.00,rejected_calls=1,failed_calls=0
cmdstat_set:calls=0,usec=0,usec_per_call=0.00,rejected_calls=3,failed_calls=0
cmdstat_replconf:calls=4,usec=70,usec_per_call=17.50,rejected_calls=0,failed_calls=0

# Errorstats
errorstat_CLUSTERDOWN:count=2
errorstat_ERR:count=3
errorstat_MOVED:count=2

# Latencystats
latency_percentiles_usec_ping:p50=12.031,p99=21.119,p99.9=21.119
latency_percentiles_usec_info:p50=174.079,p99=201.727,p99.9=201.727
latency_percentiles_usec_cluster|set-config-epoch:p50=15.039,p99=15.039,p99.9=15.039
latency_percentiles_usec_cluster|nodes:p50=354.303,p99=360.447,p99.9=360.447
latency_percentiles_usec_cluster|meet:p50=172.031,p99=212.991,p99.9=212.991
latency_percentiles_usec_cluster|info:p50=90.111,p99=90.111,p99.9=90.111
latency_percentiles_usec_cluster|keyslot:p50=40.191,p99=40.191,p99.9=40.191
latency_percentiles_usec_cluster|reset:p50=40108.031,p99=40108.031,p99.9=40108.031
latency_percentiles_usec_cluster|setslot:p50=34.047,p99=34.047,p99.9=34.047
latency_percentiles_usec_cluster|myid:p50=20.095,p99=50.175,p99.9=50.175
latency_percentiles_usec_cluster|addslots:p50=536.575,p99=536.575,p99.9=536.575
latency_percentiles_usec_cluster|delslotsrange:p50=225.279,p99=225.279,p99.9=225.279
latency_percentiles_usec_psync:p50=115.199,p99=208.895,p99.9=208.895
latency_percentiles_usec_replconf:p50=7.007,p99=39.167,p99.9=39.167

# Cluster
cluster_enabled:1

# Keyspace

------ CLIENT LIST OUTPUT ------

------ MODULES INFO OUTPUT ------

------ CONFIG DEBUG OUTPUT ------
activedefrag no
repl-diskless-sync yes
lazyfree-lazy-eviction no
client-query-buffer-limit 1gb
lazyfree-lazy-user-flush no
lazyfree-lazy-expire no
io-threads 1
io-threads-do-reads no
repl-diskless-load disabled
lazyfree-lazy-user-del no
replica-read-only yes
slave-read-only yes
sanitize-dump-payload no
proto-max-bulk-len 512mb
lazyfree-lazy-server-del no
list-compress-depth 0

------ FAST MEMORY TEST ------
12:M 25 Jul 2023 20:16:44.497 # Bio thread for job type #0 terminated
12:M 25 Jul 2023 20:16:44.497 # Bio thread for job type #1 terminated
12:M 25 Jul 2023 20:16:44.499 # Bio thread for job type #2 terminated
*** Preparing to test memory region b4d000 (9236480 bytes)
*** Preparing to test memory region 7fff7000 (268435456 bytes)
*** Preparing to test memory region 2008fff7000 (15392867561472 bytes)
*** Preparing to test memory region 10007e66a000 (1044480 bytes)
*** Preparing to test memory region 10007e769000 (12288 bytes)
*** Preparing to test memory region 10007e76c000 (1044480 bytes)
*** Preparing to test memory region 10007e86b000 (12288 bytes)
*** Preparing to test memory region 10007e86e000 (1044480 bytes)
*** Preparing to test memory region 10007e96d000 (23638016 bytes)
*** Preparing to test memory region 602000000000 (458752 bytes)
*** Preparing to test memory region 602e00000000 (65536 bytes)
*** Preparing to test memory region 603000000000 (2031616 bytes)
*** Preparing to test memory region 603e00000000 (65536 bytes)
*** Preparing to test memory region 604000000000 (1507328 bytes)
*** Preparing to test memory region 604e00000000 (65536 bytes)
*** Preparing to test memory region 606000000000 (65536 bytes)
*** Preparing to test memory region 606e00000000 (65536 bytes)
*** Preparing to test memory region 607000000000 (65536 bytes)
*** Preparing to test memory region 607e00000000 (65536 bytes)
*** Preparing to test memory region 608000000000 (65536 bytes)
*** Preparing to test memory region 608e00000000 (65536 bytes)
*** Preparing to test memory region 60b000000000 (65536 bytes)
*** Preparing to test memory region 60be00000000 (65536 bytes)
*** Preparing to test memory region 60c000000000 (65536 bytes)
*** Preparing to test memory region 60ce00000000 (65536 bytes)
*** Preparing to test memory region 60d000000000 (65536 bytes)
*** Preparing to test memory region 60de00000000 (65536 bytes)
*** Preparing to test memory region 60e000000000 (65536 bytes)
*** Preparing to test memory region 60ee00000000 (65536 bytes)
*** Preparing to test memory region 60f000000000 (65536 bytes)
*** Preparing to test memory region 60fe00000000 (65536 bytes)
*** Preparing to test memory region 610000000000 (65536 bytes)
*** Preparing to test memory region 610e00000000 (65536 bytes)
*** Preparing to test memory region 611000000000 (65536 bytes)
*** Preparing to test memory region 611e00000000 (65536 bytes)
*** Preparing to test memory region 612000000000 (65536 bytes)
*** Preparing to test memory region 612e00000000 (65536 bytes)
*** Preparing to test memory region 613000000000 (65536 bytes)
*** Preparing to test memory region 613e00000000 (65536 bytes)
*** Preparing to test memory region 614000000000 (65536 bytes)
*** Preparing to test memory region 614e00000000 (65536 bytes)
*** Preparing to test memory region 615000000000 (65536 bytes)
*** Preparing to test memory region 615e00000000 (65536 bytes)
*** Preparing to test memory region 616000000000 (65536 bytes)
*** Preparing to test memory region 616e00000000 (65536 bytes)
*** Preparing to test memory region 617000000000 (65536 bytes)
*** Preparing to test memory region 617e00000000 (65536 bytes)
*** Preparing to test memory region 618000000000 (65536 bytes)
*** Preparing to test memory region 618e00000000 (65536 bytes)
*** Preparing to test memory region 619000000000 (131072 bytes)
*** Preparing to test memory region 619e00000000 (65536 bytes)
*** Preparing to test memory region 61a000000000 (65536 bytes)
*** Preparing to test memory region 61ae00000000 (65536 bytes)
*** Preparing to test memory region 61b000000000 (65536 bytes)
*** Preparing to test memory region 61be00000000 (65536 bytes)
*** Preparing to test memory region 61c000000000 (65536 bytes)
*** Preparing to test memory region 61ce00000000 (65536 bytes)
*** Preparing to test memory region 61d000000000 (65536 bytes)
*** Preparing to test memory region 61de00000000 (65536 bytes)
*** Preparing to test memory region 61e000000000 (65536 bytes)
*** Preparing to test memory region 61ee00000000 (65536 bytes)
*** Preparing to test memory region 621000000000 (524288 bytes)
*** Preparing to test memory region 621e00000000 (65536 bytes)
*** Preparing to test memory region 624000000000 (65536 bytes)
*** Preparing to test memory region 624000010000 (655360 bytes)
*** Preparing to test memory region 624e00000000 (65536 bytes)
*** Preparing to test memory region 625000000000 (65536 bytes)
*** Preparing to test memory region 625e00000000 (65536 bytes)
*** Preparing to test memory region 629000000000 (983040 bytes)
*** Preparing to test memory region 629e00000000 (65536 bytes)
*** Preparing to test memory region 62a000000000 (720896 bytes)
*** Preparing to test memory region 62ae00000000 (65536 bytes)
*** Preparing to test memory region 62b000000000 (458752 bytes)
*** Preparing to test memory region 62be00000000 (65536 bytes)
*** Preparing to test memory region 62d000000000 (131072 bytes)
*** Preparing to test memory region 62de00000000 (65536 bytes)
*** Preparing to test memory region 62e000000000 (196608 bytes)
*** Preparing to test memory region 62ee00000000 (65536 bytes)
*** Preparing to test memory region 62f000000000 (65536 bytes)
*** Preparing to test memory region 62fe00000000 (65536 bytes)
*** Preparing to test memory region 631000000000 (131072 bytes)
*** Preparing to test memory region 631e00000000 (65536 bytes)
*** Preparing to test memory region 640000000000 (12288 bytes)
*** Preparing to test memory region 7ffff3390000 (8388608 bytes)
*** Preparing to test memory region 7ffff3b9f000 (8388608 bytes)
*** Preparing to test memory region 7ffff43ae000 (8388608 bytes)
*** Preparing to test memory region 7ffff4bbc000 (819200 bytes)
*** Preparing to test memory region 7ffff5084000 (135168 bytes)
*** Preparing to test memory region 7ffff50a5000 (33726464 bytes)
*** Preparing to test memory region 7ffff70cf000 (131072 bytes)
*** Preparing to test memory region 7ffff78cf000 (3526656 bytes)
*** Preparing to test memory region 7ffff7c2c000 (4096 bytes)
*** Preparing to test memory region 7ffff7c2d000 (40960 bytes)
*** Preparing to test memory region 7ffff7e25000 (16384 bytes)
*** Preparing to test memory region 7ffff7e63000 (16384 bytes)
*** Preparing to test memory region 7ffff7fc6000 (16384 bytes)
*** Preparing to test memory region 7ffff7ffb000 (4096 bytes)
*** Preparing to test memory region 7ffff7ffe000 (4096 bytes)

Additional information

Steps to reproduce (if any)

The input involves some reconfigurations: at first other 2 nodes become the replica of this node; later one of them failover and becomes the new master; after this node becomes the replica, it gets soft reset (before this it is instructed to send itself a meet message), and eventually crashes when the other replica connects to it.

I am looking into this and trying to figure out the root cause.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions