
Conversation

@ShooterIT
Collaborator

  • DEBUG ASM-FAILPOINT command to set a fail point
  • Error handling on both the source and destination sides, plus Tcl tests
  • Error info attached to the task
  • Destination node sends ACKs from its cron
  • Source node checks the applied offset gap and pauses writes
  • Source node sends stream-eof in beforeSleep once the slot command stream is drained

@ShooterIT ShooterIT requested a review from tezc July 3, 2025 16:47
@codecov-commenter

codecov-commenter commented Jul 4, 2025

Codecov Report

Attention: Patch coverage is 95.66787% with 12 lines in your changes missing coverage. Please review.

Please upload report for BASE (cluster-asm@3764612). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/cluster_asm.c 95.75% 11 Missing ⚠️
src/debug.c 75.00% 1 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff               @@
##             cluster-asm      #43   +/-   ##
==============================================
  Coverage               ?   69.20%           
==============================================
  Files                  ?      125           
  Lines                  ?    73523           
  Branches               ?        0           
==============================================
  Hits                   ?    50884           
  Misses                 ?    22639           
  Partials               ?        0           
Files with missing lines Coverage Δ
src/cluster_legacy.c 74.08% <100.00%> (ø)
src/networking.c 91.41% <100.00%> (ø)
src/rdb.c 77.62% <100.00%> (ø)
src/replication.c 87.20% <100.00%> (ø)
src/server.h 100.00% <ø> (ø)
src/debug.c 52.32% <75.00%> (ø)
src/cluster_asm.c 85.78% <95.75%> (ø)

task->main_channel_client = c;

clusterAsmOnEvent(task->slot_ranges, ASM_EVENT_MIGRATE_WAIT_PAUSE, NULL);
if (task->state == ASM_SEND_BULK_AND_STREAM) {
Owner


I guess one problem is that, due to incoming writes, the destination's gap may never drain below the threshold. Perhaps, after some time, we can check whether the gap is shrinking. If it is not, we fail the migration. If it is shrinking but fails to go below the threshold, we proceed and stop the traffic anyway. That is the worst-case scenario, so I feel we can afford to stop writes a bit longer in this case just to complete the migration. Let's think about that in the background; perhaps you have better ideas around it.

Collaborator Author


Yes, agreed. We can have multiple strategies for pausing writes (gap trend, time limit); let's do that later.

@tezc
Copy link
Owner

tezc commented Jul 4, 2025

@ShooterIT looks super nice, just have minor comments.

@ShooterIT ShooterIT merged commit 955122d into tezc:cluster-asm Jul 4, 2025
18 checks passed
@ShooterIT ShooterIT deleted the asm-error-handler branch July 4, 2025 13:24
tezc pushed a commit that referenced this pull request Sep 10, 2025
tezc pushed a commit that referenced this pull request Sep 16, 2025
