-
Notifications
You must be signed in to change notification settings - Fork 0
Fix CI issues #110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix CI issues #110
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## cluster-asm #110 +/- ##
==============================================
Coverage ? 75.89%
==============================================
Files ? 131
Lines ? 76332
Branches ? 0
==============================================
Hits ? 57935
Misses ? 18397
Partials ? 0
🚀 New features to boost your workflow:
|
src/cluster_asm.c
Outdated
| } else { | ||
| /* Cancel pending trim jobs if becoming a replica. */ | ||
| asmCancelPendingTrimJobs(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a reminder, master and replica handle trim separately in the current design. So, I feel like we should not cancel on replica? e.g. we cancel pending job on replica, replica has unowned keys now. Master may not send TRIMSLOTS for these unowned slots if it has already completed the trim. Next time replica becomes master, it will have unowned keys. Do I miss something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, we should deal with unowned keys as quick as possible, otherwise it causes duplication for modules as they replicate their queries and merge all the results. So, unrelated to the change here, I also feel like reverting asmTrimSlotsIfNotOwned() to trim all the unowned slots, just in case. We'll be sure once we become a master, we won't have unowned keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a reminder, master and replica handle trim separately in the current design.
I see, i cancel the pending trim jobs instead of active jobs. the pending trim jobs are delayed to trigger only on master, all the pending trim jobs will be added to active_trim_jobs once when triggering, besides we don't propagate the pending trim jobs to replica before triggering.
without this change, if we perform cluster failover and there is a slot migration task, the old master will cancel the task and add a trim job into pending list, then trigger a trim job when becoming a replica. the new master may also propagate a trimslot command again since this task is failed from POV of new master
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, your change will prevent double trim? Replica may get TRIMSLOTS for the empty slots. Even if that happens, it is harmless. So, we don't have a bug around here right?
Now, consider the other scenario:
- There is an import but it failed. Master has not replicated any key to replica yet.
- Master created trim job and put it into pending list.
- Failover happens and old master discard pending job.
4-a. The new master does not propagate TRIMSLOTS as it is unaware of the failed migration. The old master stays with unowned keys.
4-b. (Another scenario alternative to 4-a) Another failover happens immediately and this old master becomes master again with unowned keys.
I feel like we should let replica trim its slots. (+ perhaps asmTrimSlotsIfNotOwned() should check all the slots when the node becomes master)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for 4-a, a full sync will happen, right?
for 4-b, it seems a bad cases
perhaps asmTrimSlotsIfNotOwned() should check all the slots when the node becomes master
this change may break legacy slot migration and cause data loss
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit odd but otherwise how do we clean up these slots in case of 4-b? Any idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't have an idea. so you accept double trim, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks harmless to me? Replica will write TRIMSLOTS to AOF on its own. Does that bring a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure, just feel uncomfortable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert
No description provided.