Skip to content

[CRASH] command Keys * crashed the server when there are more than 10 GB of cached data #486

@paulmchen

Description

@paulmchen

Running "keys *" command crashed keydb server.

Note: it can be reproduced from KeyDB open source when there is a large amount of the data loaded in the db (let's say > 10GB of the cached data). If there are only 500MB of the cached data, "keys *" command works fine.

Crash report

14203:14220:M 29 Aug 2022 16:36:58.128 # Replication backlog is too small, resizing to: 536870912 bytes
14203:14220:M 29 Aug 2022 16:36:58.128 * Switching to disk backed replication backlog due to exceeding memory limits
14203:14220:M 29 Aug 2022 16:40:01.858 * Reclaiming 1879048192 replication backlog bytes
14203:14220:M 29 Aug 2022 16:40:02.035 * Returning to memory backed replication backlog

=== KEYDB BUG REPORT START: Cut & paste starting from here ===
14203:14214:M 29 Aug 2022 16:50:31.869 # KeyDB 255.255.255 crashed by signal: 11, si_code: 1
14203:14214:M 29 Aug 2022 16:50:31.869 # Accessing address: 0x7f4af011893b
14203:14214:M 29 Aug 2022 16:50:31.869 # Crashed running the instruction at: 0x7f5581cb4818

------ STACK TRACE ------
EIP:
/lib/x86_64-linux-gnu/libc.so.6(+0x18b818) [0x7f5581cb4818]

Backtrace:
/lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0) [0x7f5581d2f3c0]
/lib/x86_64-linux-gnu/libc.so.6(+0x18b818) [0x7f5581cb4818]
./keydb-oss 192.168.0.65:7001(ProcessPendingAsyncWrites()+0x28c) [0x55bc2a5887ec]
./keydb-oss 192.168.0.65:7001(AsyncWorkQueue::WorkerThreadMain()+0x9ca) [0x55bc2a68167a]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6d84) [0x7f5581984d84]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f5581d23609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f5581c48163]

------ REGISTERS ------
14203:14214:M 29 Aug 2022 16:50:31.870 #
RAX:00007f5571e68f88 RBX:0000000000000000
RCX:0000000000000000 RDX:fffffffff111874b
RDI:00007f5571e68f88 RSI:00007f4aff000210
RBP:00007f5571e68d00 RSP:00007f5577ffcf58
R8 :0000000000000000 R9 :0000000000001000
R10:0000000000000620 R11:00007f5577fff700
R12:00007f5571e6cf9c R13:00000000f111874b
R14:00007f5577ffcf70 R15:000055bc2a58db80
RIP:00007f5581cb4818 EFL:0000000000010206
CSGSFS:002b000000000033
14203:14214:M 29 Aug 2022 16:50:31.870 # (00007f5577ffcf67) -> 000055bc2a9c4760
14203:14214:M 29 Aug 2022 16:50:31.870 # (00007f5577ffcf66) -> 00007f5577ffd020
14203:14214:M 29 Aug 2022 16:50:31.870 # (00007f5577ffcf65) -> 0000000000139bba
14203:14214:M 29 Aug 2022 16:50:31.870 # (00007f5577ffcf64) -> 000055bc2a9c6260
14203:14214:M 29 Aug 2022 16:50:31.870 # (00007f5577ffcf63) -> 00007f55806471c0
14203:14214:M 29 Aug 2022 16:50:31.870 # (00007f5577ffcf62) -> 0000000000139bba
14203:14214:M 29 Aug 2022 16:50:31.870 # (00007f5577ffcf61) -> 000055bc2a9c6260
14203:14214:M 29 Aug 2022 16:50:31.870 # (00007f5577ffcf60) -> 8d9342c5aa9f9500
14203:14214:M 29 Aug 2022 16:50:31.870 # (00007f5577ffcf5f) -> 000055bc2a9c6260
14203:14214:M 29 Aug 2022 16:50:31.870 # (00007f5577ffcf5e) -> 0000000000000000
14203:14214:M 29 Aug 2022 16:50:31.870 # (00007f5577ffcf5d) -> 000055bc2a9c6260
14203:14214:M 29 Aug 2022 16:50:31.870 # (00007f5577ffcf5c) -> 000055bc2a68c452
14203:14214:M 29 Aug 2022 16:50:31.870 # (00007f5577ffcf5b) -> 0000000000139bba
14203:14214:M 29 Aug 2022 16:50:31.870 # (00007f5577ffcf5a) -> 00007f558060a080
14203:14214:M 29 Aug 2022 16:50:31.870 # (00007f5577ffcf59) -> 00007f5581d2bbb0
14203:14214:M 29 Aug 2022 16:50:31.870 # (00007f5577ffcf58) -> 000055bc2a5887ec

------ INFO OUTPUT ------

Server

redis_version:255.255.255
redis_git_sha1:a3c31897
redis_git_dirty:1
redis_build_id:ff3e899de1d76cef
redis_mode:standalone
os:Linux 5.4.0-66-generic x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:10.3.0
process_id:14203
process_supervised:no
run_id:def22e61a90ddfc24cea0856a0567fee6f48069b
tcp_port:7001
server_time_usec:1661763031870818
uptime_in_seconds:1161
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:818647
executable:/root/paul/DCS3.0-Enterprise-May-RocksDB/./keydb-oss
config_file:/root/paul/DCS3.0-Enterprise-May-RocksDB/./redis_mem_hq3.conf

Clients

connected_clients:1
cluster_connections:0
maxclients:10010
client_recent_max_input_buffer:32
client_recent_max_output_buffer:6442450928
blocked_clients:1
tracking_clients:0
clients_in_timeout_table:0
current_client_thread:1716170698
thread_0_clients:2
thread_1_clients:0
thread_2_clients:0
thread_3_clients:0

Memory

used_memory:23828796592
used_memory_human:22.19G
used_memory_rss:26812321792
used_memory_rss_human:24.97G
used_memory_peak:27880598616
used_memory_peak_human:25.97G
used_memory_peak_perc:85.47%
used_memory_overhead:13517855957
used_memory_startup:3035128
used_memory_dataset:10310940635
used_memory_dataset_perc:43.28%
allocator_allocated:25288792872
allocator_active:25609453568
allocator_resident:27089485824
total_system_memory:135092969472
total_system_memory_human:125.82G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:17179869184
maxmemory_human:16.00G
maxmemory_policy:allkeys-lru
allocator_frag_ratio:1.01
allocator_frag_bytes:320660696
allocator_rss_ratio:1.06
allocator_rss_bytes:1480032256
rss_overhead_ratio:0.99
rss_overhead_bytes:-277164032
mem_fragmentation_ratio:1.06
mem_fragmentation_bytes:1522075560
mem_not_counted_for_evict:268439034
mem_replication_backlog:268435456
mem_clients_slaves:20512
mem_clients_normal:6442471437
mem_aof_buffer:3584
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
storage_provider:none

Persistence

loading:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:229367149
rdb_bgsave_in_progress:0
rdb_last_save_time:1661761901
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:5730304
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
aof_current_size:16023540291
aof_base_size:0
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0

Stats

total_connections_received:2004
total_commands_processed:229368236
instantaneous_ops_per_sec:4
total_net_input_bytes:13755451140
total_net_output_bytes:17170455798
instantaneous_input_kbps:0.19
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:1
sync_partial_ok:0
sync_partial_err:1
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:57021908
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:229370232
total_writes_processed:230777237
instantaneous_lock_contention:2
avg_lock_contention:0.437500
storage_provider_read_hits:0
storage_provider_read_misses:0

Replication

role:master
connected_slaves:1
slave0:ip=192.168.0.185,port=7001,state=online,offset=16023541747,lag=0
master_failover_state:no-failover
master_replid:f913ef45880a3ad06535ce1f64049c74a62a5ed9
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:16023541747
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:268435456
repl_backlog_first_byte_offset:15755106292
repl_backlog_histlen:268435456

CPU

used_cpu_sys:1080.013429
used_cpu_user:2569.902455
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
server_threads:4
long_lock_waits:40490
used_cpu_sys_main_thread:6.875634
used_cpu_user_main_thread:90.755174

Modules

Commandstats

cmdstat_psync:calls=1,usec=11129,usec_per_call=11129.00,rejected_calls=0,failed_calls=0
cmdstat_info:calls=5,usec=192,usec_per_call=38.40,rejected_calls=0,failed_calls=0
cmdstat_keys:calls=1,usec=158,usec_per_call=158.00,rejected_calls=0,failed_calls=0
cmdstat_command:calls=3,usec=2265,usec_per_call=755.00,rejected_calls=0,failed_calls=0
cmdstat_replping:calls=1,usec=2,usec_per_call=2.00,rejected_calls=0,failed_calls=0
cmdstat_replconf:calls=1076,usec=1204,usec_per_call=1.12,rejected_calls=0,failed_calls=0
cmdstat_set:calls=229367149,usec=193984905,usec_per_call=0.85,rejected_calls=0,failed_calls=0

Errorstats

Cluster

cluster_enabled:0

Keyspace

db0:keys=170097245,expires=0,avg_ttl=0,cached_keys=170097245

KeyDB

mvcc_depth:1

------ CLIENT LIST OUTPUT ------
id=6 addr=192.168.0.185:37867 laddr=192.168.0.65:7001 fd=25 name= age=1130 idle=0 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20512 events=r cmd=replconf user=default redir=-1
id=2009 addr=192.168.0.65:48694 laddr=192.168.0.65:7001 fd=28 name= age=101 idle=96 flags=b db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=5 obl=0 oll=0 omem=6442450928 tot-mem=6442471437 events=r cmd=keys user=default redir=-1

------ MODULES INFO OUTPUT ------

------ FAST MEMORY TEST ------
14203:14214:M 29 Aug 2022 16:50:31.882 # main thread terminated
14203:14214:M 29 Aug 2022 16:50:31.882 # Bio thread for job type #0 terminated
14203:14214:M 29 Aug 2022 16:50:31.882 # Bio thread for job type #1 terminated
14203:14214:M 29 Aug 2022 16:50:31.882 # Bio thread for job type #2 terminated

Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.

------ DUMPING CODE AROUND EIP ------
Symbol: (null) (base: (nil))
Module: /lib/x86_64-linux-gnu/libc.so.6 (base 0x7f5581b29000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=(nil) -D -b binary -m i386:x86-64 /tmp/dump.bin

=== KEYDB BUG REPORT END. Make sure to include from START to END. ===

Aditional information

  1. This is a master/slave configuration. Here is the redis.conf of the master. Can be reproduced after loading more than 10GB of data into the cache. (see the command to reproduce in the next section.

  2. Steps to reproduce (if any)

a. start both master and slave servers
b. run the following memtier command to load large amount of the cached data:

 ./memtier_benchmark -s 127.0.0.1 -p 6379 -t 10 -c 100 -n 150000 --command='SET __key__ __data__' --data-size-range=1-32 --key-minimum=1 --key-maximum=150000000 --command-key-pattern=P 

c. run redis-cli

> keys *

The master server will crash (see stack trace from the crash report).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions