fallback to follower when leader is busy #916
Signed-off-by: you06 <you1474600@gmail.com>
// ignore error and use old region info.
logutil.Logger(bo.GetCtx()).Error("load region failure",
	zap.Uint64("regionID", regionID), zap.Error(err))
c.mu.RLock()
Yes, a possible data race.
}

func (state *tryFollower) onSendSuccess(selector *replicaSelector) {
	if !selector.region.switchWorkLeaderToPeer(selector.targetReplica().peer) {
The former name and meaning of the switchWorkLeaderToPeer function are quite confusing; I don't understand its purpose.
Formerly, tryFollower was only entered after accessKnownLeader had failed. In that case, if one of the followers can serve the leader-read request, it must be the new leader, so we switch the leader to this peer.
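The leader-switch logic described in this reply can be sketched as follows. This is a minimal illustration, not the client-go implementation: the `region` type, `switchLeaderToPeer`, and the `fromBusyLeader` flag are hypothetical names, and the assumption that the switch is skipped when the follower was reached through the busy-leader fallback is inferred from this thread.

```go
package main

import "fmt"

// region is a hypothetical stand-in for a cached region entry.
type region struct {
	peers     []string // store addresses of the region's peers
	leaderIdx int      // index of the peer currently treated as leader
}

// switchLeaderToPeer marks peer as the working leader. It returns false when
// the peer is not part of the region, which means the cached entry is stale.
func (r *region) switchLeaderToPeer(peer string) bool {
	for i, p := range r.peers {
		if p == peer {
			r.leaderIdx = i
			return true
		}
	}
	return false
}

// onSendSuccess reports whether the cached region should be invalidated.
// fromBusyLeader is an assumed flag: when the follower was tried only because
// the leader was busy, it served a follower read and is not a new leader.
func onSendSuccess(r *region, target string, fromBusyLeader bool) bool {
	if fromBusyLeader {
		return false
	}
	// A follower that served a leader read must have become the leader.
	return !r.switchLeaderToPeer(target)
}

func main() {
	r := &region{peers: []string{"store-1", "store-2", "store-3"}}
	invalidate := onSendSuccess(r, "store-2", false)
	fmt.Println(invalidate, r.peers[r.leaderIdx])
}
```

Keeping the two cases apart avoids recording a follower as leader merely because it answered a follower read.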
}

// For some reason, the leader is unreachable by now, try followers instead.
func (s *replicaSelector) fallback2Follower(ctx *RPCContext) bool {
For now, is the only situation where this is used the chain stale read -> fallback to leader -> fallback to replicas?
Yes. When falling back from the leader to a replica, the request is a follower read, not a stale read.
/cc @crazycs520 PTAL
Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
ekexium left a comment
Is it necessary to give it a hint or constraint on which follower to try first? For example, we may want it to do the follower read in its local zone as much as possible.
Signed-off-by: you06 <you1474600@gmail.com>
Implemented this strategy, PTAL.
Signed-off-by: you06 <you1474600@gmail.com>
if len(state.labels) > 0 {
	idx, selectReplica := filterReplicas(func(selectReplica *replica) bool {
		return selectReplica.store.IsLabelsMatch(state.labels)
Is the selectReplica.isExhausted(1) check missing here? How about putting it into the default filterReplicas and passing a nil checker function if there are no labels?
The replica may have been exhausted by data-is-not-ready errors, which do not affect follower reads.
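A minimal sketch of the selection order discussed here, assuming a simplified `replica` type. The signatures of `filterReplicas`, `isExhausted`, and `matches` below are hypothetical and only reuse names from the diff for readability. The label pass deliberately skips the exhaustion check, so a local replica that previously answered a stale read with data-is-not-ready is still eligible for a follower read.

```go
package main

import "fmt"

// replica is a simplified, hypothetical view of a region replica.
type replica struct {
	addr     string
	labels   map[string]string
	attempts int // prior attempts, including stale reads that failed
	isLeader bool
}

func (r *replica) isExhausted(max int) bool { return r.attempts >= max }

// matches reports whether the replica carries every requested label.
func (r *replica) matches(labels map[string]string) bool {
	for k, v := range labels {
		if r.labels[k] != v {
			return false
		}
	}
	return true
}

// filterReplicas returns the first replica accepted by keep, or (-1, nil).
func filterReplicas(rs []*replica, keep func(*replica) bool) (int, *replica) {
	for i, r := range rs {
		if keep(r) {
			return i, r
		}
	}
	return -1, nil
}

// pickFollower prefers a label-matching follower and ignores exhaustion in
// that pass; otherwise it takes any follower that has not been tried yet.
func pickFollower(rs []*replica, labels map[string]string) *replica {
	if len(labels) > 0 {
		_, r := filterReplicas(rs, func(r *replica) bool {
			return !r.isLeader && r.matches(labels)
		})
		if r != nil {
			return r
		}
	}
	_, r := filterReplicas(rs, func(r *replica) bool {
		return !r.isLeader && !r.isExhausted(1)
	})
	return r
}

func main() {
	rs := []*replica{
		{addr: "leader", labels: map[string]string{"zone": "z1"}, attempts: 1, isLeader: true},
		{addr: "remote", labels: map[string]string{"zone": "z2"}},
		{addr: "local", labels: map[string]string{"zone": "z1"}, attempts: 1},
	}
	fmt.Println(pickFollower(rs, map[string]string{"zone": "z1"}).addr)
}
```

How the real selector avoids retrying the same labelled replica indefinitely is outside the scope of this sketch.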
* fallback to follower when leader is busy
  Signed-off-by: you06 <you1474600@gmail.com>
* add comment
  Signed-off-by: you06 <you1474600@gmail.com>
* Update internal/locate/region_request.go
  Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
* after fallback to replica read from leader, retry local follower first
  Signed-off-by: you06 <you1474600@gmail.com>
* address comment
  Signed-off-by: you06 <you1474600@gmail.com>
---------
Signed-off-by: you06 <you1474600@gmail.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
* fallback to follower when leader is busy
  Signed-off-by: you06 <you1474600@gmail.com>
  Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
* fallback to follower when leader is busy
  Signed-off-by: you06 <you1474600@gmail.com>
  Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
  Signed-off-by: you06 <you1474600@gmail.com>
* reload region cache when store is resolved from invalid status (#843)
  Signed-off-by: you06 <you1474600@gmail.com>
  Co-authored-by: disksing <i@disksing.com>
* fallback to follower when leader is busy (#916) (#923)
  * fallback to follower when leader is busy
    Signed-off-by: you06 <you1474600@gmail.com>
    Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
    Co-authored-by: cfzjywxk <lsswxrxr@163.com>
* Resume max retry time check for stale read retry with leader option(#903) (#911)
  * Resume max retry time check for stale read retry with leader option
    Signed-off-by: cfzjywxk <lsswxrxr@163.com>
  * add cancel
    Signed-off-by: cfzjywxk <lsswxrxr@163.com>
  ---------
  Signed-off-by: cfzjywxk <lsswxrxr@163.com>
* add region cache state test & fix some issues of replica selector (#910)
  Signed-off-by: you06 <you1474600@gmail.com>
  remove duplicate code
  Signed-off-by: you06 <you1474600@gmail.com>
* enable workflow for tidb-7.1
  Signed-off-by: you06 <you1474600@gmail.com>
* update
  Signed-off-by: you06 <you1474600@gmail.com>
  update
  Signed-off-by: you06 <you1474600@gmail.com>
  fix test
  Signed-off-by: you06 <you1474600@gmail.com>
  fix test
  Signed-off-by: you06 <you1474600@gmail.com>
* lint
  Signed-off-by: you06 <you1474600@gmail.com>
* lint
  Signed-off-by: you06 <you1474600@gmail.com>
* fix flaky test
  Signed-off-by: you06 <you1474600@gmail.com>
---------
Signed-off-by: you06 <you1474600@gmail.com>
Signed-off-by: cfzjywxk <lsswxrxr@163.com>
Co-authored-by: disksing <i@disksing.com>
Co-authored-by: cfzjywxk <cfzjywxk@gmail.com>
Co-authored-by: cfzjywxk <lsswxrxr@163.com>
Fallback to follower when leader is busy.
When data-is-not-ready is injected for stale reads and server-is-busy for the leader, the fallback to the leader gets stuck.
With this patch, a server-is-busy response from the leader makes the request try the followers.