raftstore: properly release snapshot precheck resource after snapshot reception #17903
ti-chi-bot[bot] merged 4 commits into tikv:master
…apshot reception Signed-off-by: Bisheng Huang <hbisheng@gmail.com>
cc @hhwyt
    context.finish(raft_router)
};
async move {
    defer!(cleanup_after_recv(
Why not clean up before responding to the sink?
That might be a good idea. Even if responding to the sink is slow, we don't have to let it block the success of the next snapshot precheck. Do you see any downside with that? @Connor1996
Discussed with @Connor1996 offline. I believe cleaning up before responding to the sink is still an option but it probably doesn't make a significant difference since responding to the sink should be quick. I think I’ll keep it as it is to maintain consistency with the current behavior, where we decrement recving_count after responding to the sink.
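As a rough illustration of the behavior discussed in this thread, the sketch below shows a scope guard that runs cleanup when the scope exits, whether reception succeeds or fails early. `Defer`, `recv_snapshot`, and the logged strings are hypothetical names made up for this example. It does not reproduce the `defer!` macro or `cleanup_after_recv` from the diff, only the general pattern they appear to rely on.

```rust
use std::cell::RefCell;

// Hypothetical guard type for illustration; not the macro used in the PR.
struct Defer<F: FnMut()>(F);

impl<F: FnMut()> Drop for Defer<F> {
    // The closure runs when the guard goes out of scope, on every exit path.
    fn drop(&mut self) {
        (self.0)();
    }
}

// Hypothetical stand-in for snapshot reception. `log` records event order.
fn recv_snapshot(
    fail: bool,
    log: &RefCell<Vec<&'static str>>,
) -> Result<(), &'static str> {
    // Registered first, executed last: cleanup happens after the response
    // to the sink, and also after an early error return.
    let _guard = Defer(|| log.borrow_mut().push("cleanup"));
    if fail {
        log.borrow_mut().push("network error");
        return Err("network error");
    }
    log.borrow_mut().push("respond to sink");
    Ok(())
}
```

With this shape, cleanup follows the response to the sink on the success path, which matches the ordering kept in this PR, and it still runs when reception fails partway through.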
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Connor1996, LykxSassinator
What is changed and how it works?
Issue Number: Close #17881
This PR fixes a case where snapshot precheck may succeed but the receiver would reject the snapshot due to incorrect ordering of resource release and `recving_count` updates.

Previous ordering:

… (`snap_mgr.recv_snap_precheck`)
… (`snap_mgr.recv_snap_complete`)

The issue lies between steps 3 and 4. After releasing the precheck resource (step 3), a new precheck can succeed. However, the `receiving_busy` check on the receiver would fail because `recving_count` hasn't been decremented. This PR ensures that `recving_count` is decremented before releasing the precheck resource.

In addition, this PR fixes another potential issue where the precheck resource is not released when snapshot reception encounters a network error.
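The ordering problem can be sketched with a toy model. Everything below is a simplification made up for illustration: `ReceiverState`, `Step`, `MAX_RECVING`, and `has_reject_window` are hypothetical and are not TiKV types. Only the relative order of the two cleanup steps is taken from the description.

```rust
// Hypothetical, simplified model of the receiver-side bookkeeping.
#[derive(Debug)]
struct ReceiverState {
    precheck_held: bool,  // stands in for the snapshot precheck resource
    recving_count: usize, // stands in for the in-flight receive counter
}

// Assumed limit, chosen only to make the example small.
const MAX_RECVING: usize = 1;

impl ReceiverState {
    // A new sender's precheck succeeds only when the resource is free.
    fn precheck(&self) -> bool {
        !self.precheck_held
    }

    // The receiver rejects new snapshots while it still counts as busy.
    fn receiving_busy(&self) -> bool {
        self.recving_count >= MAX_RECVING
    }
}

#[derive(Clone, Copy)]
enum Step {
    ReleasePrecheck,
    DecrementRecvingCount,
}

// Applies the two cleanup steps in the given order and reports whether,
// after any step, a new sender could pass precheck yet be rejected as busy.
fn has_reject_window(order: [Step; 2]) -> bool {
    let mut state = ReceiverState { precheck_held: true, recving_count: 1 };
    let mut window = false;
    for step in order.iter() {
        match *step {
            Step::ReleasePrecheck => state.precheck_held = false,
            Step::DecrementRecvingCount => state.recving_count -= 1,
        }
        if state.precheck() && state.receiving_busy() {
            window = true;
        }
    }
    window
}
```

In this model, releasing the precheck resource first leaves a moment where precheck passes while the receiver is still busy. Decrementing `recving_count` first closes that window.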