Pull Request Overview
This PR refactors the node swap functionality to use a more robust ChangeEnsemble action pattern. The key purpose is to ensure followers are caught up with the leader before completing ensemble changes to prevent data loss.
- Replaced synchronous `SwapNode` with asynchronous `ChangeEnsemble` that waits for followers to catch up
- Introduced follower catch-up monitoring to prevent premature ensemble changes
- Removed unused helper functions and cleaned up election context management
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| coordinator/coordinator.go | Renamed handler from handleActionSwap to handleActionChangeEnsemble to reflect the new action type |
| coordinator/controllers/shard_controller_test.go | Updated tests to use new ChangeEnsembleAction API and added comprehensive test for data loss prevention |
| coordinator/controllers/shard_controller_election.go | Added follower catch-up monitoring logic and integrated ChangeEnsembleAction into election process |
| coordinator/controllers/shard_controller.go | Refactored from synchronous SwapNode to asynchronous ChangeEnsemble and removed unused helper functions |
| coordinator/balancer/scheduler.go | Updated to create ChangeEnsembleAction with callback instead of SwapNodeAction |
| coordinator/balancer/action_test.go | Removed test file for deprecated SwapNodeAction |
| coordinator/actions/swap.go | Replaced SwapNodeAction with ChangeEnsembleAction including proper error handling and wait mechanisms |
Can we just check if `shardMeta.RemovedNodes` is empty in `shardController.swapNode`? If it is, proceed with the swap; otherwise, skip it.
Hi @dao-jun, good point. Unfortunately, the answer is no so far. The current … Plus, I mentioned "so far" because we still have another issue related to failure recovery. Once that issue is fixed, we can rely on the …
```go
}, func() {
	defer waitGroup.Done()
	err := backoff.RetryNotify(func() error {
		fs, err := e.provider.GetStatus(e.Context, server, &proto.GetStatusRequest{Shard: e.shard})
```
Wait until all followers have caught up with the leader?
```go
term := mutShardMeta.Term
ensemble := mutShardMeta.Ensemble
leader := mutShardMeta.Leader
leaderEntry := candidatesStatus[*leader]
```
Did we ensure the coordinator received the `newTermResponse` from the current leader in `fenceNewTermQuorum`?
```go
"oxia":  "election-monitor-followers-caught-up",
"shard": fmt.Sprintf("%d", e.shard),
}, func() {
	e.ensureFollowerCaught(ensemble, leader, leaderEntry)
```
Since we fenced the leader, can the followers still catch up with it?
Motivation
fixes: #845
In the current implementation of change ensemble (swap node), we update the in-memory metadata first and then trigger the leader election, which persists the new ensemble to the metadata store without any precondition. If updating the elected leader fails, the ensemble has already been changed in memory and might be changed again on the next attempt.
Continuing to move shards without follower-caught-up validation can cause data loss.
Modification
- … `election#stop`.
- Check `IsReadyForChangeEnsemble` before changing the ensemble, and fail fast if the precondition is not satisfied.
- Use copy-on-write for `UpdateShardMetadata` to avoid data races.