@hpatro hpatro commented Oct 18, 2023

Fixing issues described in #12672, which started after #11695, when the defrag tests began being executed in cluster mode too.
For some reason the defragmentation finishes too quickly, before the test is able to detect that it's running,
so now, instead of waiting to see that defrag is active, we wait to see that it did some work.

[err]: Active defrag big list: cluster in tests/unit/memefficiency.tcl
defrag not started.
[err]: Active defrag big keys: cluster in tests/unit/memefficiency.tcl
defrag didn't stop.

hpatro commented Oct 19, 2023

Regarding the failure with `defrag not started`: in the failure output, active_defrag_running is 0. However, total_active_defrag_time is greater than 0 (120 in this case), and allocator_frag_ratio has dropped below the threshold (1.05). See the INFO MEMORY / INFO STATS output below.

So the validation can be changed to check that total_active_defrag_time is greater than 0, which is more accurate than checking whether active_defrag_running is set.

Fix: 93b309c

# Memory
used_memory:106934160
used_memory_human:101.98M
used_memory_rss:171999232
used_memory_rss_human:164.03M
used_memory_peak:183308360
used_memory_peak_human:174.82M
used_memory_peak_perc:58.34%
used_memory_overhead:32940064
used_memory_startup:3747088
used_memory_dataset:73994096
used_memory_dataset_perc:71.71%
allocator_allocated:107043576
allocator_active:108261376
allocator_resident:154058752
total_system_memory:99000156160
total_system_memory_human:92.20G
used_memory_lua:18091008
used_memory_vm_eval:18091008
used_memory_lua_human:17.25M
used_memory_scripts_eval:27324288
number_of_cached_scripts:50000
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:18123776
used_memory_vm_total_human:17.28M
used_memory_functions:184
used_memory_scripts:27324472
used_memory_scripts_human:26.06M
maxmemory:0
maxmemory_human:0B
maxmemory_policy:allkeys-lru
allocator_frag_ratio:1.01
allocator_frag_bytes:1217800
allocator_rss_ratio:1.42
allocator_rss_bytes:45797376
rss_overhead_ratio:1.12
rss_overhead_bytes:17940480
mem_fragmentation_ratio:1.61
mem_fragmentation_bytes:65065272
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_total_replication_buffers:0
mem_clients_slaves:0
mem_clients_normal:33424
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.3.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
# Stats
total_connections_received:1
total_commands_processed:1100070
instantaneous_ops_per_sec:8
total_net_input_bytes:150701569
total_net_output_bytes:9976882
total_net_repl_input_bytes:0
total_net_repl_output_bytes:0
instantaneous_input_kbps:0.13
instantaneous_output_kbps:51.73
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:109231
active_defrag_misses:360773
active_defrag_key_hits:7
active_defrag_key_misses:0
total_active_defrag_time:120
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:634334
total_writes_processed:632781
io_threaded_reads_processed:0
io_threaded_writes_processed:0
client_query_buffer_limit_disconnections:0
client_output_buffer_limit_disconnections:0
reply_buffer_shrinks:4
reply_buffer_expands:8
eventloop_cycles:636246
eventloop_duration_sum:6318397
eventloop_duration_cmd_sum:1634310
instantaneous_eventloop_cycles_per_sec:107
instantaneous_eventloop_duration_usec:8
acl_access_denied_auth:0
acl_access_denied_cmd:0
acl_access_denied_key:0
acl_access_denied_channel:0
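Given INFO output like the above, the revised check boils down to looking for evidence that defrag did some work, rather than trying to catch it while it is still running. A minimal Python sketch of that idea (the actual test suite is written in Tcl; `parse_info` and `defrag_did_work` are hypothetical helper names, not Redis APIs):

```python
def parse_info(info_text):
    """Parse Redis INFO output ("key:value" lines) into a dict,
    skipping section headers such as "# Stats"."""
    fields = {}
    for line in info_text.splitlines():
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

def defrag_did_work(info_text):
    """Flaky check: active_defrag_running may already be back to 0 by the
    time we look. Robust check: total_active_defrag_time > 0 proves the
    defragger ran at all, even if it already finished."""
    fields = parse_info(info_text)
    return int(fields.get("total_active_defrag_time", 0)) > 0
```

With the failing output above, `active_defrag_running` is 0 but `total_active_defrag_time` is 120, so this check would correctly report that defrag ran.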

@oranagra oranagra marked this pull request as ready for review October 20, 2023 18:17
oranagra commented Oct 21, 2023

CI test:
https://github.com/redis/redis/actions/runs/6595540519

So if I understand correctly, you're arguing that because of your changes (a few additional keys, expiry data, and cluster mode), the tests run faster and finish too quickly? Sounds unlikely to me.

I think I saw other errors for certain thresholds, not just the "defrag not started" one. I don't remember which tests I saw failing.

hpatro commented Oct 21, 2023

CI run for that unit: https://github.com/redis/redis/actions/runs/6593855885

So if I understand correctly, you're arguing that because of your changes (a few additional keys, expiry data, and cluster mode), the tests run faster and finish too quickly? Sounds unlikely to me.

Well, the info output suggests otherwise.

I think I saw other errors for certain thresholds, not just the "defrag not started" one. I don't remember which tests I saw failing.

Yes @roshkhatri is digging into the other ones listed on #12672 .

@oranagra oranagra left a comment


It seems odd that the per-slot dict and/or cluster mode would cause the defrag to finish sooner,
but I also don't see anything wrong with that change.

@oranagra oranagra merged commit 26eb4ce into redis:unstable Oct 22, 2023
oranagra pushed a commit that referenced this pull request Oct 27, 2023
Fixing issues described in #12672, started after #11695
Related to #12674

Fixes the `defrag didn't stop` issue.

In some cases, depending on how the keys were stored in memory,
defrag_later_item_in_progress was not getting reset once we finished
defragging the later items and moved to the next slot. This stopped
the scan from happening in the later slots and did not get …
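The missing reset can be illustrated with a toy model of the per-slot scan. Redis itself is C and the real logic is more involved; this Python sketch only mirrors the names, using a "big:" key prefix to stand in for items deferred to defrag-later:

```python
def scan_slots(slots):
    """Toy model of a per-slot defrag scan.

    `slots` is a list of key lists, one per slot. Keys prefixed "big:"
    stand in for items whose defrag is deferred ("defrag later"); while
    the in-progress flag is set, the scan keeps working on that item
    instead of advancing, so a flag that leaks across slot boundaries
    would stall the scan of all later slots.
    """
    defrag_later_item_in_progress = False
    scanned = []
    for slot, keys in enumerate(slots):
        for key in keys:
            if key.startswith("big:"):
                defrag_later_item_in_progress = True  # defrag incrementally
            scanned.append((slot, key))
        # The fix described above: reset the flag once this slot's deferred
        # items are done, so the scan proceeds into the next slot.
        defrag_later_item_in_progress = False
    return scanned
```

For example, `scan_slots([["a", "big:list"], ["b"]])` visits the keys of both slots; without the reset, the model of the bug would never move past the slot holding the big item.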
oranagra added a commit that referenced this pull request Nov 2, 2023
Reverts the skipping of defrag tests in cluster mode (done in #12672);
instead it skips only some defrag tests that are relevant for cluster mode.
The tests now run well after investigating and making the changes in #12674 and #12694.

Co-authored-by: Oran Agra <oran@redislabs.com>
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Jul 25, 2025
Fixing issues described in redis#12672, started after redis#11695
Related to redis#12674

Fixes the `defrag didn't stop` issue.

In some cases, depending on how the keys were stored in memory,
defrag_later_item_in_progress was not getting reset once we finished
defragging the later items and moved to the next slot. This stopped
the scan from happening in the later slots and did not get …
