Asm error handler #43
ShooterIT commented on Jul 3, 2025
- Add DEBUG ASM-FAILPOINT to set a fail point
- Add error handlers for both the source and destination sides, plus Tcl tests
- Add error info for tasks
- Destination node sends ACK in cron
- Source node checks the applied offset gap and pauses writes
- Source node sends stream-eof after the slot command stream is drained in beforeSleep
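To make the offset-gap check above concrete, here is a minimal sketch of how the source side might compare what it has sent with what the destination has acknowledged as applied. The struct, field names, function names, and the threshold parameter are all hypothetical and are not taken from the patch; the real bookkeeping may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task bookkeeping; these names are illustrative only. */
typedef struct {
    uint64_t sent_offset;   /* bytes of the slot command stream sent by the source */
    uint64_t acked_offset;  /* applied offset last reported in the destination's ACK */
} asm_offsets_sketch;

/* Bytes the destination still has to apply. Clamped to 0 in case an ACK
 * is ever ahead of the locally recorded sent offset. */
uint64_t asm_offset_gap(const asm_offsets_sketch *o) {
    assert(o != NULL);
    if (o->sent_offset < o->acked_offset) return 0;
    return o->sent_offset - o->acked_offset;
}

/* Assumption: the source pauses writes once the gap is at or below max_gap,
 * so the remaining stream can drain quickly while writes are paused. */
bool asm_should_pause_writes(const asm_offsets_sketch *o, uint64_t max_gap) {
    return asm_offset_gap(o) <= max_gap;
}
```

With a sent offset of 1000 and an acknowledged offset of 900, the gap is 100 bytes, so this sketch would pause writes for a threshold of 128 but keep waiting for a threshold of 50.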
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## cluster-asm #43 +/- ##
==============================================
Coverage ? 69.20%
==============================================
Files ? 125
Lines ? 73523
Branches ? 0
==============================================
Hits ? 50884
Misses ? 22639
Partials ? 0
task->main_channel_client = c;

clusterAsmOnEvent(task->slot_ranges, ASM_EVENT_MIGRATE_WAIT_PAUSE, NULL);
if (task->state == ASM_SEND_BULK_AND_STREAM) {
I guess one problem is that, because of incoming writes, we may never see the destination drain below the threshold. Perhaps, after some time, we can check whether the gap is getting smaller. If not, we fail the migration. If it is getting smaller but has failed to go below the threshold, then we proceed and stop the traffic. This is the worst-case scenario, so I feel we can afford to stop writes a bit longer in this case just to complete the migration. Let's think about that in the background; perhaps you have better ideas.
Yes, agreed. We can have multiple strategies for pausing writes: gap change, time limit. Let's do it later.
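For reference when this is picked up later, the fallback discussed in this thread could be sketched roughly as below. This is only an illustration of the idea (pause when under the threshold; after a time limit, pause anyway if the gap shrank, otherwise fail). The enum, struct, and function names, and the use of a single start sample, are assumptions and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of one periodic check on the source side. */
typedef enum {
    PAUSE_KEEP_WAITING,   /* gap still too large, time limit not reached */
    PAUSE_WRITES_NOW,     /* proceed: pause writes and finish the migration */
    PAUSE_FAIL_MIGRATION  /* destination is not catching up, give up */
} pause_decision;

/* Hypothetical sample of the applied offset gap over the waiting window. */
typedef struct {
    uint64_t gap_at_start;  /* gap when the source started waiting */
    uint64_t gap_now;       /* gap computed from the latest ACK */
    uint64_t elapsed_ms;    /* time spent waiting so far */
} gap_sample;

pause_decision decide_pause(const gap_sample *s, uint64_t max_gap,
                            uint64_t time_limit_ms) {
    assert(s != NULL);
    /* Normal case: the destination drained below the threshold. */
    if (s->gap_now <= max_gap) return PAUSE_WRITES_NOW;
    /* Still within the time budget: keep streaming and check again later. */
    if (s->elapsed_ms < time_limit_ms) return PAUSE_KEEP_WAITING;
    /* Time limit hit: accept a longer write pause if the gap is shrinking. */
    if (s->gap_now < s->gap_at_start) return PAUSE_WRITES_NOW;
    /* Gap is not shrinking under the incoming write load: fail. */
    return PAUSE_FAIL_MIGRATION;
}
```

Comparing only the first and latest gap is the simplest possible trend check; a real implementation might prefer a moving window so that a short burst of writes does not decide the outcome.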
@ShooterIT looks super nice, I just have minor comments.