raftstore: campaign newly created regions in time after Split #17625

Merged
ti-chi-bot[bot] merged 38 commits into tikv:master from LykxSassinator:postpone_split_region
Nov 19, 2024

Conversation

@LykxSassinator
Contributor

@LykxSassinator LykxSassinator commented Oct 10, 2024

What is changed and how it works?

Issue Number: Close #12410 and #17602.

What's Changed:

As issues #12410 and #17602 show, the original design of CmdEpochChecker allows BatchSplit or Split proposals even while the last TransferLeader is still ongoing, so the newly created region cannot campaign for leadership immediately. The root cause is that TransferLeader is not recorded in the proposal queue of CmdEpochChecker when the queue has no conflicting admin commands, because it does not change the conf_ver.
These issues damage the stability of TiKV: clients that access the affected data get 9005 errors.

To tackle these issues, this PR triggers the campaign of the newly split regions in time, once the leadership of the parent region is stable after on_role_changed. The newly added step is step 6 in the following diagram.
image

This PR triggers the `campaign` of the newly split regions in time, once the leadership of the parent region is stable after `on_role_changed`.
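
A minimal sketch of the deferred campaign described above, written as a toy model rather than TiKV code. `ParentPeer`, `pending_children` and `campaigned` are hypothetical names introduced for illustration; only `on_ready_split_region` and `on_role_changed` are taken from this conversation, and their real signatures in raftstore differ. The sketch assumes that a child region created while the parent is not leader is remembered, and that it is asked to campaign once the parent's role settles on leader.

```rust
/// Raft role of the parent region's peer on this store (toy model).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Role {
    Follower,
    Candidate,
    Leader,
}

/// Hypothetical stand-in for the parent peer; not TiKV's real peer type.
struct ParentPeer {
    role: Role,
    /// Child region ids created by a split whose campaign is deferred.
    pending_children: Vec<u64>,
    /// Child region ids that have been asked to campaign.
    campaigned: Vec<u64>,
}

impl ParentPeer {
    fn new(role: Role) -> Self {
        ParentPeer {
            role,
            pending_children: Vec::new(),
            campaigned: Vec::new(),
        }
    }

    /// Called when the split is applied and child region `child_id` exists.
    fn on_ready_split_region(&mut self, child_id: u64) {
        if self.role == Role::Leader {
            // The parent is leader: the child can campaign right away.
            self.campaigned.push(child_id);
        } else {
            // Leadership is in flux (assumption): remember the child.
            self.pending_children.push(child_id);
        }
    }

    /// Called whenever the parent's raft role changes.
    fn on_role_changed(&mut self, new_role: Role) {
        self.role = new_role;
        if new_role == Role::Leader {
            // Leadership is stable again: trigger the deferred campaigns.
            self.campaigned.append(&mut self.pending_children);
        }
    }
}
```

In this model, a peer that applies the split while it is a candidate keeps the child in `pending_children`, and a later `on_role_changed(Role::Leader)` moves it to `campaigned`. What should happen to pending children when the peer settles as follower is not modeled here.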

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note

This PR introduces a supplementary rule for checking execution validity: `TransferLeader` is mutually exclusive with commands that change the `conf_ver`.

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Oct 10, 2024
@ti-chi-bot
Contributor

ti-chi-bot bot commented Oct 10, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 10, 2024
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
@LykxSassinator LykxSassinator marked this pull request as ready for review October 10, 2024 07:30
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 10, 2024
@LykxSassinator
Contributor Author

/test pull-unit-test

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
…ansfer leader.

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 21, 2024
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
@SpadeA-Tang
Member

SpadeA-Tang commented Oct 22, 2024

If I'm right, the error is caused by:

  1. peer 1 transfers leader to peer 2
  2. peer 1 becomes follower, peer 2 becomes candidate
  3. on_ready_split_region
    the child region will not campaign because peer 1 is not leader
    the same for peer 2

So the child region has to wait for the election timeout.

Given this, why not just change the campaign condition from leader to leader/candidate?
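
The scenario in this comment can be restated as a small sketch. It is a toy model, not raftstore code: `CampaignRule` and `child_campaigns` are hypothetical names, and the sketch only encodes the two conditions being compared (leader only, versus leader or candidate). It does not evaluate whether the relaxed condition is safe.

```rust
/// Raft role of a parent peer at the time the split is applied (toy model).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Role {
    Follower,
    Candidate,
    Leader,
}

/// The two campaign conditions compared in the comment (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CampaignRule {
    /// The child campaigns only if the parent peer is leader.
    LeaderOnly,
    /// The suggested alternative: leader or candidate.
    LeaderOrCandidate,
}

/// Whether the child region on a peer campaigns right after the split.
fn child_campaigns(rule: CampaignRule, parent_role: Role) -> bool {
    match rule {
        CampaignRule::LeaderOnly => parent_role == Role::Leader,
        CampaignRule::LeaderOrCandidate => {
            parent_role == Role::Leader || parent_role == Role::Candidate
        }
    }
}

/// Whether any peer's child region campaigns without waiting for a timeout.
fn any_child_campaigns(rule: CampaignRule, parent_roles: &[Role]) -> bool {
    parent_roles.iter().any(|&role| child_campaigns(rule, role))
}
```

After step 2 the parent roles are `[Role::Follower, Role::Candidate]` for peers 1 and 2. Under `LeaderOnly` no child campaigns, which is the wait for the election timeout described above; under `LeaderOrCandidate` the child on peer 2 would campaign.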

@LykxSassinator
Contributor Author

LykxSassinator commented Oct 22, 2024

Yep, this proposal is valid.

However, we want to make "TransferLeader" and other operations that change the conf_ver mutually exclusive. This enforces the rule that "only one operation can exist in a region before the subsequent operation" and avoids unexpected behaviors that we have not observed so far.

So this implementation enhances the check in propose_check_epoch.
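
A rough sketch of the mutual-exclusion idea, assuming a simplified checker that keeps a queue of in-flight admin commands. `ToyEpochChecker`, `AdminCmd`, `propose_check` and `finish` are hypothetical names for illustration; the real CmdEpochChecker and propose_check_epoch have different types and more conflict rules, which are omitted here.

```rust
/// Admin commands in the toy model.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AdminCmd {
    TransferLeader,
    /// Stand-in for any admin command that changes `conf_ver`.
    ChangesConfVer,
}

/// Hypothetical checker holding the admin commands still in flight.
#[derive(Default)]
struct ToyEpochChecker {
    pending: Vec<AdminCmd>,
}

impl ToyEpochChecker {
    /// Only the rule from the discussion: TransferLeader and a
    /// conf_ver-changing command exclude each other.
    fn conflicts(a: AdminCmd, b: AdminCmd) -> bool {
        matches!(
            (a, b),
            (AdminCmd::TransferLeader, AdminCmd::ChangesConfVer)
                | (AdminCmd::ChangesConfVer, AdminCmd::TransferLeader)
        )
    }

    /// Accepts `cmd` and records it, or returns the pending command
    /// that it conflicts with.
    fn propose_check(&mut self, cmd: AdminCmd) -> Result<(), AdminCmd> {
        if let Some(&blocker) = self
            .pending
            .iter()
            .find(|&&p| Self::conflicts(p, cmd))
        {
            return Err(blocker);
        }
        // Unlike the original design described in the PR text, the toy
        // records TransferLeader in the queue as well.
        self.pending.push(cmd);
        Ok(())
    }

    /// Removes one finished command from the queue.
    fn finish(&mut self, cmd: AdminCmd) {
        if let Some(i) = self.pending.iter().position(|&p| p == cmd) {
            self.pending.remove(i);
        }
    }
}
```

With this toy, proposing `ChangesConfVer` while a `TransferLeader` is pending is rejected, and it is accepted again once `finish(AdminCmd::TransferLeader)` has been called.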

@glorv
Contributor

glorv commented Oct 23, 2024

For this, why not just change the campaign condition from leader to leader/candidate?

@SpadeA-Tang I think there are 2 ways a follower can become candidate: 1) transfer leader, 2) election timeout. So if the follower became candidate due to election timeout and also applies the split at the same time, the newly created region should not start an election (and possibly become leader), because then there may be two leaders for the same region if the old leader has not finished executing the split command.

@overvenus
Member

If I'm right, the error is caused by:

  1. peer 1 transfers leader to peer 2
  2. peer 1 becomes follower, peer 2 becomes candidate
  3. on_ready_split_region
    the child region will not campaign because peer 1 is not leader
    the same for peer 2

So the child region has to wait for the election timeout.

Given this, why not just change the campaign condition from leader to leader/candidate?

It's not enough: peer 2 may have already split before it receives MsgTimeoutNow.

Member

@Connor1996 Connor1996 left a comment

LGTM

@ti-chi-bot
Contributor

ti-chi-bot bot commented Nov 19, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Connor1996, hbisheng, overvenus, SpadeA-Tang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [Connor1996,SpadeA-Tang,overvenus]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@LykxSassinator
Contributor Author

/test all

@ti-chi-bot ti-chi-bot bot merged commit 361a8eb into tikv:master Nov 19, 2024
@ti-chi-bot ti-chi-bot bot added this to the Pool milestone Nov 19, 2024
@LykxSassinator LykxSassinator added the needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. label Nov 19, 2024
@LykxSassinator LykxSassinator deleted the postpone_split_region branch November 19, 2024 10:22
ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this pull request Nov 19, 2024
…kv#17625)

close tikv#12410

This pr make the `campaign` of the newly splitted regions triggered in time, when the leadership of the parent region is stable after `on_role_changed`.

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #17846.

ti-chi-bot bot pushed a commit that referenced this pull request Nov 19, 2024
…7625) (#17846)

close #12410

This pr make the `campaign` of the newly splitted regions triggered in time, when the leadership of the parent region is stable after `on_role_changed`.

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Co-authored-by: lucasliang <nkcs_lykx@hotmail.com>
@LykxSassinator LykxSassinator added the needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. label Dec 11, 2024
ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this pull request Dec 11, 2024
close tikv#12410

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #17983.

ti-chi-bot bot pushed a commit that referenced this pull request Dec 12, 2024
…7625) (#17983)

close #12410

This pr make the `campaign` of the newly splitted regions triggered in time, when the leadership of the parent region is stable after `on_role_changed`.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Co-authored-by: lucasliang <nkcs_lykx@hotmail.com>
@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. label Dec 13, 2024
ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this pull request Dec 13, 2024
close tikv#12410

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #17996.

@ti-chi-bot ti-chi-bot bot removed the needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. label Dec 13, 2024
@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. label Jan 21, 2025
ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this pull request Jan 21, 2025
close tikv#12410

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-6.5: #18148.

ti-chi-bot bot pushed a commit that referenced this pull request Jan 26, 2025
…7625) (#18148)

close #12410

This pr make the `campaign` of the newly splitted regions triggered in time, when the leadership of the parent region is stable after `on_role_changed`.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Co-authored-by: lucasliang <nkcs_lykx@hotmail.com>
ti-chi-bot bot added a commit that referenced this pull request Feb 20, 2025
…7625) (#17996)

close #12410

This pr make the `campaign` of the newly splitted regions triggered in time, when the leadership of the parent region is stable after `on_role_changed`.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Co-authored-by: lucasliang <nkcs_lykx@hotmail.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Leader transfer should be postponed until on-going split is finished

8 participants