@sundb
Collaborator

@sundb sundb commented Jul 2, 2025

This PR fixes #14056 (comment)

Summary

evport uses eventLoop->events[fd].mask to decide whether to remove the event. However, ae.c calls aeApiDelEvent() before it updates eventLoop->events[fd].mask, so evport always sees the old value and, as a result, port_dissociate() is never called to remove the fd.
This issue may not surface easily in a non-multithreaded setup, but with I/O threads enabled we frequently reassign fds to different threads, which makes the crash much more likely to occur.
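
To make the ordering problem concrete, here is a minimal, self-contained sketch. It is a toy model and not the Redis source: every `toy_*` name is hypothetical, the `AE_*` values are defined locally for the example, and clearing `associated` merely stands in for a call to port_dissociate(). It compares a backend that trusts the stored mask with one that derives the remaining mask from the bits being deleted.

```c
#include <assert.h>

#define AE_NONE 0
#define AE_READABLE 1
#define AE_WRITABLE 2

/* Hypothetical stand-in for one slot of eventLoop->events[]. */
typedef struct { int mask; } toy_file_event;

/* Hypothetical stand-in for the backend's view of a single fd. */
typedef struct { int associated; } toy_port;

/* Stale variant: trusts the stored mask, which the caller has not
 * updated yet, so it never sees AE_NONE on a full removal. */
static void toy_del_event_stale(toy_port *port, const toy_file_event *fe,
                                int delmask) {
    (void)delmask;
    int fullmask = fe->mask; /* still the pre-removal value */
    if (fullmask == AE_NONE) port->associated = 0; /* models port_dissociate() */
}

/* Computed variant: derives the remaining mask from the bits being removed. */
static void toy_del_event_computed(toy_port *port, const toy_file_event *fe,
                                   int delmask) {
    int fullmask = fe->mask & ~delmask;
    if (fullmask == AE_NONE) port->associated = 0; /* models port_dissociate() */
}

/* Models the call order described above: backend first, mask update second.
 * Returns 1 if the fd is still registered with the port afterwards. */
static int toy_delete_file_event(int use_computed, int startmask, int delmask) {
    toy_file_event fe = { startmask };
    toy_port port = { 1 };

    assert(startmask != AE_NONE); /* only registered fds are deleted */
    if (use_computed)
        toy_del_event_computed(&port, &fe, delmask);
    else
        toy_del_event_stale(&port, &fe, delmask);
    fe.mask &= ~delmask; /* the shared mask is updated only after the call */
    return port.associated;
}
```

In this model, `toy_delete_file_event(0, AE_READABLE, AE_READABLE)` leaves the fd registered even though no events remain, while the same call with `use_computed` set to 1 releases it.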

Reproduce steps on SmartOS

./src/redis-server --io-threads 2 
./src/redis-cli
CTRL+C # Close redis-cli

Crash report

=== REDIS BUG REPORT START: Cut & paste starting from here ===
286403:M 01 Jul 2025 17:30:28.455 # Redis 8.0.2 crashed by signal: 6, si_code: -1

------ INFO OUTPUT ------
# Server
redis_version:8.0.2
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:7a0f273cdee1da4f
redis_mode:standalone
os:SunOS 5.11 i86pc
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:evport
atomicvar_api:c11-builtin
gcc_version:13.3.0
process_id:286403
process_supervised:no
run_id:22ba5f54089c2aa56adc504ebbdd886da68ba752
tcp_port:6379
server_time_usec:1751405428455589
uptime_in_seconds:2
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:6574964
executable:/opt/local/bin/redis-server
config_file:/opt/local/etc/redis.conf
io_threads_active:1
listener0:name=tcp,bind=127.0.0.1,bind=-::1,port=6379

# Clients
connected_clients:2
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:497929
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
pubsub_clients:0
watching_clients:0
clients_in_timeout_table:0
total_watched_keys:0
total_blocking_keys:0
total_blocking_keys_on_nokey:0

# Memory
used_memory:7216963
used_memory_human:6.88M
used_memory_rss:0
used_memory_rss_human:0B
used_memory_peak:7684355
used_memory_peak_human:7.33M
used_memory_peak_perc:93.92%
used_memory_overhead:953338
used_memory_startup:694074
used_memory_dataset:6263625
used_memory_dataset_perc:96.03%
allocator_allocated:6274378
allocator_active:0
allocator_resident:0
allocator_muzzy:0
total_system_memory:481036337152
total_system_memory_human:448.00G
used_memory_lua:31744
used_memory_vm_eval:31744
used_memory_lua_human:31.00K
used_memory_scripts_eval:0
number_of_cached_scripts:0
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:64512
used_memory_vm_total_human:63.00K
used_memory_functions:221
used_memory_scripts:221
used_memory_scripts_human:221B
maxmemory:64000000000
maxmemory_human:59.60G
maxmemory_policy:allkeys-lfu
allocator_frag_ratio:1.00
allocator_frag_bytes:0
allocator_rss_ratio:-nan
allocator_rss_bytes:0
rss_overhead_ratio:-nan
rss_overhead_bytes:0
mem_fragmentation_ratio:0.00
mem_fragmentation_bytes:-6274378
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_total_replication_buffers:0
mem_replica_full_sync_buffer:0
mem_clients_slaves:0
mem_clients_normal:78067
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:libc
mem_overhead_db_hashtable_rehashing:0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0

# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:13194
rdb_bgsave_in_progress:0
rdb_last_save_time:1751405426
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_saves:0
rdb_last_cow_size:0
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

# Threads
io_thread_0:clients=0,reads=0,writes=0
io_thread_1:clients=1,reads=5400,writes=5390
io_thread_2:clients=0,reads=2307,writes=2310
io_thread_3:clients=1,reads=110,writes=110

# Stats
total_connections_received:3
total_commands_processed:7800
instantaneous_ops_per_sec:4557
total_net_input_bytes:6763399
total_net_output_bytes:3445444
total_net_repl_input_bytes:0
total_net_repl_output_bytes:0
instantaneous_input_kbps:3446.34
instantaneous_output_kbps:1971.31
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_subkeys:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
evicted_clients:0
evicted_scripts:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:460
keyspace_misses:1358
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:7817
total_writes_processed:7810
io_threaded_reads_processed:7817
io_threaded_writes_processed:7810
client_query_buffer_limit_disconnections:0
client_output_buffer_limit_disconnections:0
reply_buffer_shrinks:1
reply_buffer_expands:0
eventloop_cycles:28744
eventloop_duration_sum:135296
eventloop_duration_cmd_sum:37989
instantaneous_eventloop_cycles_per_sec:17341
instantaneous_eventloop_duration_usec:18
acl_access_denied_auth:0
acl_access_denied_cmd:0
acl_access_denied_key:0
acl_access_denied_channel:0

# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:e6218aaed8f3267e369f5a26c525a269a3c9c4ed
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:0.302772
used_cpu_user:0.286820
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000

# Modules
module:name=vectorset,ver=1,api=1,filters=0,usedby=[],using=[],options=[handle-io-errors|handle-repl-async-load]

# Commandstats
cmdstat_multi:calls=1525,usec=463,usec_per_call=0.30,rejected_calls=0,failed_calls=0
cmdstat_get:calls=12,usec=36,usec_per_call=3.00,rejected_calls=0,failed_calls=0
cmdstat_exec:calls=1525,usec=34029,usec_per_call=22.31,rejected_calls=0,failed_calls=0
cmdstat_hgetall:calls=1806,usec=6057,usec_per_call=3.35,rejected_calls=0,failed_calls=0
cmdstat_expire:calls=1466,usec=4739,usec_per_call=3.23,rejected_calls=0,failed_calls=0
cmdstat_hmset:calls=1466,usec=19549,usec_per_call=13.33,rejected_calls=0,failed_calls=0

# Errorstats

# Latencystats
latency_percentiles_usec_multi:p50=0.001,p99=1.003,p99.9=2.007
latency_percentiles_usec_get:p50=2.007,p99=10.047,p99.9=10.047
latency_percentiles_usec_exec:p50=19.071,p99=89.087,p99.9=1163.263
latency_percentiles_usec_hgetall:p50=1.003,p99=19.071,p99.9=307.199
latency_percentiles_usec_expire:p50=3.007,p99=9.023,p99.9=32.127
latency_percentiles_usec_hmset:p50=11.007,p99=51.199,p99.9=1138.687

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=1261,expires=1261,avg_ttl=31535999964,subexpiry=0

# Keysizes
db0_distrib_hashes_items:8=1261

------ CLIENT LIST OUTPUT ------
id=4 addr=127.0.0.1:64792 laddr=127.0.0.1:6379 fd=25 name= age=1 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=482943 argv-mem=0 multi-mem=0 rbs=1024 rbp=1024 obl=0 oll=0 omem=0 tot-mem=484793 events=r cmd=hgetall user=default redir=-1 resp=2 lib-name= lib-ver= io-thread=1
id=6 addr=127.0.0.1:44352 laddr=127.0.0.1:6379 fd=27 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=31976 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=49186 events=r cmd=exec user=default redir=-1 resp=2 lib-name= lib-ver= io-thread=3

------ MODULES INFO OUTPUT ------

------ CONFIG DEBUG OUTPUT ------
io-threads 4
repl-diskless-load disabled
replica-read-only yes
lazyfree-lazy-user-flush no
lazyfree-lazy-server-del no
sanitize-dump-payload no
lazyfree-lazy-user-del no
lazyfree-lazy-expire no
list-compress-depth 0
proto-max-bulk-len 512mb
lazyfree-lazy-eviction no
client-query-buffer-limit 1gb
activedefrag no
slave-read-only yes
repl-diskless-sync yes

=== REDIS BUG REPORT END. Make sure to include from START to END. ===

@snyk-io

snyk-io bot commented Jul 2, 2025

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found. (View Details)

license/snyk check is complete. No issues have been found. (View Details)

@github-project-automation github-project-automation bot moved this to Todo in Redis 8.2 Jul 2, 2025
@sundb sundb requested review from ShooterIT and oranagra July 2, 2025 11:01
@sundb sundb added the release-notes indication that this issue needs to be mentioned in the release notes label Jul 2, 2025
@ShooterIT
Member

ShooterIT commented Jul 2, 2025

You mean we call aeApiDelEvent() first and only then update the mask in aeDeleteFileEvent(), so the mask that aeApiDelEvent() sees is not updated yet?

void aeDeleteFileEvent(aeEventLoop *eventLoop, int fd, int mask)
{
    ...
    aeApiDelEvent(eventLoop, fd, mask);
    fe->mask = fe->mask & (~mask);
    ...
}

@sundb
Collaborator Author

sundb commented Jul 2, 2025

You mean we call aeApiDelEvent() first and only then update the mask in aeDeleteFileEvent(), so the mask that aeApiDelEvent() sees is not updated yet?

Yes, just like the code in ae_epoll:

static void aeApiDelEvent(aeEventLoop *eventLoop, int fd, int delmask) {
    ....
    int mask = eventLoop->events[fd].mask & (~delmask);

It's the responsibility of aeApiDelEvent() to calculate the final mask.
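
As a rough illustration of that responsibility, the sketch below shows how a backend could turn the stored mask and the bits being deleted into a decision. It is a portable toy and makes no kernel calls: the `toy_*` names and the two operations are hypothetical, and the `AE_*` values are defined locally for the example.

```c
#include <assert.h>

#define AE_NONE 0
#define AE_READABLE 1
#define AE_WRITABLE 2

/* Hypothetical result type: what the backend would ask the kernel to do. */
typedef enum { TOY_OP_MODIFY, TOY_OP_REMOVE } toy_op;

typedef struct {
    toy_op op;     /* modify the registration or remove it entirely */
    int remaining; /* events still wanted after the deletion */
} toy_del_plan;

/* The backend computes the final mask itself instead of assuming the
 * caller has already updated the stored one. */
static toy_del_plan toy_plan_delete(int stored_mask, int delmask) {
    toy_del_plan plan;

    /* This toy only models the readable and writable bits. */
    assert((stored_mask & ~(AE_READABLE | AE_WRITABLE)) == 0);
    plan.remaining = stored_mask & ~delmask;
    plan.op = (plan.remaining == AE_NONE) ? TOY_OP_REMOVE : TOY_OP_MODIFY;
    return plan;
}
```

With this shape, the backend's decision depends only on the arguments it is given, so it no longer matters whether the caller updates the stored mask before or after the call.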

Member

@ShooterIT ShooterIT left a comment

good catch

@sundb sundb merged commit 5b7eec4 into redis:unstable Jul 3, 2025
18 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in Redis 8.2 Jul 3, 2025
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Jul 3, 2025
fcostaoliveira pushed a commit to filipecosta90/redis that referenced this pull request Jul 4, 2025
YaacovHazan pushed a commit that referenced this pull request Jul 6, 2025
@sundb sundb mentioned this pull request Aug 4, 2025
sundb added a commit that referenced this pull request Aug 4, 2025
This is the General Availability release of Redis Open Source 8.2.

### Major changes compared to 8.0

- Streams - new commands: `XDELEX` and `XACKDEL`; extension to `XADD`
and `XTRIM`
- Bitmap - `BITOP`: new operators: `DIFF`, `DIFF1`, `ANDOR`, and `ONE`
- Query Engine - new SVS-VAMANA vector index type which supports vector
compression
- More than 15 performance and resource utilization improvements
- New metrics: per-slot usage metrics, key size distributions for basic
data types, and more

### Binary distributions

- Alpine and Debian Docker images - https://hub.docker.com/_/redis
- Install using snap - see https://github.com/redis/redis-snap
- Install using brew - see https://github.com/redis/homebrew-redis
- Install using RPM - see https://github.com/redis/redis-rpm
- Install using Debian APT - see https://github.com/redis/redis-debian


### Operating systems we test Redis 8.2 on

- Ubuntu 22.04 (Jammy Jellyfish), 24.04 (Noble Numbat)
- Rocky Linux 8.10, 9.5
- AlmaLinux 8.10, 9.5
- Debian 12 (Bookworm)
- macOS 13 (Ventura), 14 (Sonoma), 15 (Sequoia)

### Security fixes (compared to 8.2-RC1)

- (CVE-2025-32023) Fix out-of-bounds write in `HyperLogLog` commands
- (CVE-2025-48367) Retry accepting other connections even if the
accepted connection reports an error

### New Features (compared to 8.2-RC1)

- #14141 Keyspace notifications - new event types:
  - `OVERWRITTEN` - the value of a key is completely overwritten
  - `TYPE_CHANGED` - key type change

### Bug fixes (compared to 8.2-RC1)

- #14162 Crash when using evport with I/O threads
- #14163 `EVAL` crash when error table is empty
- #14144 Vector sets - RDB format is not compatible with big endian
machines
- #14165 Endless client blocking for blocking commands
- #14164 Prevent `CLIENT UNBLOCK` from unblocking `CLIENT PAUSE`
- #14216 TTL was not removed by the `SET` command
- #14224 `HINCRBYFLOAT` removes field expiration on replica

### Performance and resource utilization improvements (compared to 8.2-RC1)

- #14200 Store iterators on stack instead of on heap
- #14144 Vector set - improve RDB loading / RESTORE speed by storing the
worst link info
- #Q6430 More compression variants for the SVS-VAMANA vector index
- #Q6535 `SHARD_K_RATIO` parameter - favor network latency over accuracy
for KNN vector query in a Redis cluster (unstable feature) (MOD-10359)

### Modules API

- #14051 `RedisModule_Get*`, `RedisModule_Set*` - allow modules to
access Redis configurations
- #14114 `RM_UnsubscribeFromKeyspaceEvents` - unregister a module from
specific keyspace notifications
@sundb sundb deleted the evport_delevent branch August 7, 2025 02:08
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
@YaacovHazan YaacovHazan moved this from Todo to Done in Redis 8.0 Backport Sep 29, 2025
Labels

release-notes indication that this issue needs to be mentioned in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[CRASH] <Redis 7.0.12>crashed by signal: 6, si_code: -6

2 participants