log-backup: Keep the order of observation IDs consistent with the order in which they were received (#18290) #18609

Closed
ti-chi-bot wants to merge 1 commit into tikv:release-7.5 from ti-chi-bot:cherry-pick-18290-to-release-7.5

Conversation

@ti-chi-bot
Member

This is an automated cherry-pick of #18290

What is changed and how it works?

Issue Number: Close #18243

What's Changed:

Keep the order of observation IDs consistent with the order in which they were received

Additional notes

Three observe operations (Stop[Pre-Candidate], Stop[Candidate] and Start[Leader]) are always generated when any peer becomes leader. However, the observe operation Start[Leader] may be lost if no task has been registered yet. In addition, when a log backup task is being registered, the endpoint sends an observe operation Start[Scanned] for each region it leads.
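The behavior described above can be sketched with a toy model. This is only an illustration, not the TiKV implementation: `ObserveOp`, `Endpoint`, `handle` and `register_task` are hypothetical names, and the model assumes a single region and that Stop operations are always accepted.

```rust
/// Hypothetical stand-in for the observe operations named in the description.
/// The string is the reason tag, e.g. "Leader" or "Pre-Candidate".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ObserveOp {
    Start(&'static str),
    Stop(&'static str),
}

/// The three operations generated when a peer becomes leader.
fn ops_on_become_leader() -> Vec<ObserveOp> {
    vec![
        ObserveOp::Stop("Pre-Candidate"),
        ObserveOp::Stop("Candidate"),
        ObserveOp::Start("Leader"),
    ]
}

/// Toy endpoint tracking a single region (an assumption of this sketch).
#[derive(Default)]
struct Endpoint {
    task_registered: bool,
    observing: bool,
}

impl Endpoint {
    /// Returns true if the operation was handled, false if it was ignored.
    fn handle(&mut self, op: ObserveOp) -> bool {
        match op {
            // Without a registered task there is nothing to observe for.
            ObserveOp::Start(_) if !self.task_registered => false,
            ObserveOp::Start(_) => {
                self.observing = true;
                true
            }
            ObserveOp::Stop(_) => {
                self.observing = false;
                true
            }
        }
    }

    /// Registering a task sends Start[Scanned] for a region this store leads.
    fn register_task(&mut self, is_leader: bool) {
        self.task_registered = true;
        if is_leader {
            self.handle(ObserveOp::Start("Scanned"));
        }
    }
}
```

In this model, feeding `ops_on_become_leader()` to a fresh `Endpoint` leaves it not observing, and a later `register_task(true)` recovers the region through Start[Scanned].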

Case 1: The observe operation Start[Leader] is ignored because the task is not registered yet.
In this case, the endpoint is guaranteed to get the region when the new task is registered. The execution order is:

1. RaftStoreEvent::RoleChange -> region_info_accessor.scheduler
2. Start[Leader] -> backup_stream::Endpoint.scheduler [IGNORED]
3. register task ranges so that any observe operation won't be ignored
4. RegionInfoQuery::SeekRegion -> region_info_accessor.scheduler

In this case, step 2 has already happened, so the region update from step 1 is guaranteed to be in the queue of region_info_accessor.scheduler before the endpoint sends RegionInfoQuery::SeekRegion to the region_info_accessor. Therefore, the endpoint can get the region from seek_region.
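The FIFO argument for Case 1 can be sketched as follows. This is a simplified model, not the real region_info_accessor: `Msg`, `RegionInfoAccessor`, `schedule`, `run` and `case_1` are hypothetical names, and the queue is assumed to process messages strictly in the order they were scheduled.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Hypothetical messages sent to the accessor's scheduler.
enum Msg {
    /// A peer of the given region became leader on this store.
    RoleChangeToLeader(u64),
    /// Ask for the regions currently known to be led by this store.
    SeekRegion,
}

#[derive(Default)]
struct RegionInfoAccessor {
    queue: VecDeque<Msg>,
    leaders: BTreeSet<u64>,
}

impl RegionInfoAccessor {
    fn schedule(&mut self, msg: Msg) {
        self.queue.push_back(msg);
    }

    /// Drains the queue in FIFO order and returns the answer to each SeekRegion.
    fn run(&mut self) -> Vec<Vec<u64>> {
        let mut results = Vec::new();
        while let Some(msg) = self.queue.pop_front() {
            match msg {
                Msg::RoleChangeToLeader(id) => {
                    self.leaders.insert(id);
                }
                Msg::SeekRegion => results.push(self.leaders.iter().copied().collect()),
            }
        }
        results
    }
}

/// Replays the Case 1 order for region 1.
fn case_1() -> Vec<Vec<u64>> {
    let mut accessor = RegionInfoAccessor::default();
    accessor.schedule(Msg::RoleChangeToLeader(1)); // step 1
    // Steps 2 and 3 (ignored Start[Leader], task registration) do not touch this queue.
    accessor.schedule(Msg::SeekRegion); // step 4
    accessor.run()
}
```

Because the role change is queued before the seek, the single SeekRegion in `case_1()` sees region 1.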

Case 2: The endpoint cannot get the region from seek_region.
In this case, the observe operation Start[Leader] is guaranteed not to be ignored. The execution order is:

1. register task ranges so that any observe operation won't be ignored
2. RegionInfoQuery::SeekRegion -> region_info_accessor.scheduler
3. RaftStoreEvent::RoleChange -> region_info_accessor.scheduler
4. Start[Leader] -> backup_stream::Endpoint.scheduler

In this case, step 2 has already happened, so the task range is guaranteed to be registered. Therefore, step 4 is not ignored and the observe operation Start[Leader] is scheduled.
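Taken together, the two cases say that either seek_region finds the region or Start[Leader] is accepted. A minimal sketch of that invariant follows; it flattens the two schedulers into a single timeline, which is an assumption of this example, and `Event`, `Outcome` and `replay` are hypothetical names.

```rust
/// The four steps from the two execution orders, as a flat timeline.
#[derive(Clone, Copy)]
enum Event {
    RoleChange,
    StartLeader,
    RegisterTask,
    SeekRegion,
}

#[derive(Debug, Default, PartialEq, Eq)]
struct Outcome {
    seek_found_region: bool,
    start_leader_accepted: bool,
}

/// Replays the events in order and records how the region was (or was not) picked up.
fn replay(events: &[Event]) -> Outcome {
    let mut region_known = false;
    let mut task_registered = false;
    let mut out = Outcome::default();
    for ev in events {
        match ev {
            Event::RoleChange => region_known = true,
            // Start[Leader] is only accepted once a task range is registered.
            Event::StartLeader => out.start_leader_accepted = task_registered,
            Event::RegisterTask => task_registered = true,
            // The seek only finds regions whose role change was already processed.
            Event::SeekRegion => out.seek_found_region = region_known,
        }
    }
    out
}
```

Replaying the Case 1 order yields `seek_found_region == true`, and replaying the Case 2 order yields `start_leader_accepted == true`, so in both orders at least one path picks up the region.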

In summary, the region_operator may see the sequence Start[Scanned] -> Stop[Pre-Candidate] -> Stop[Candidate] -> Start[Leader] and scan the region again, but it won't lose the region as long as the peer is the leader.
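To illustrate why the order matters, here is a small sketch that applies Start/Stop operations strictly in the order received. It is a toy model under stated assumptions: `Op` and `apply_in_order` are hypothetical names, every Start is assumed to trigger one scan, and the reordered sequence in the usage note is a made-up example rather than an observed trace.

```rust
/// Simplified observe operation without the reason tag.
#[derive(Clone, Copy)]
enum Op {
    Start,
    Stop,
}

/// Applies the operations in the given order.
/// Returns (still observing at the end, number of scans triggered).
fn apply_in_order(ops: &[Op]) -> (bool, u32) {
    let mut observing = false;
    let mut scans = 0;
    for op in ops {
        match op {
            Op::Start => {
                observing = true;
                scans += 1; // each Start rescans the region in this model
            }
            Op::Stop => observing = false,
        }
    }
    (observing, scans)
}
```

With the order from the summary, `apply_in_order(&[Op::Start, Op::Stop, Op::Stop, Op::Start])` ends with the region observed after two scans. If the same four operations were applied as Start, Start, Stop, Stop, the region would end up unobserved, which is the kind of loss that keeping the received order avoids.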

Related changes

  • PR to update pingcap/docs / pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note

Fix the issue that the log backup observer loses observation of a region.

close tikv#18243

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot ti-chi-bot added the labels dco-signoff: yes, do-not-merge/hold, release-note, size/M and type/cherry-pick-for-release-7.5 on Jul 1, 2025
@ti-chi-bot
Contributor

ti-chi-bot bot commented Jul 1, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign disksing for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot
Member Author

@Leavrth This PR has conflicts, so I have put it on hold.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

@ti-chi-bot ti-chi-bot bot added the size/XXL label and removed the size/M label on Jul 1, 2025
@ti-chi-bot
Contributor

ti-chi-bot bot commented Jul 1, 2025

@ti-chi-bot: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-unit-test | f9e2b75 | link | true | /test pull-unit-test |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ti-chi-bot ti-chi-bot bot added the cherry-pick-approved label and removed the do-not-merge/cherry-pick-not-approved label on Jul 2, 2025
@Leavrth
Contributor

Leavrth commented Jul 11, 2025

based on #16420

@Leavrth Leavrth closed this Jul 11, 2025
@ti-chi-bot ti-chi-bot bot removed the cherry-pick-approved label on Jul 11, 2025
@ti-chi-bot
Contributor

ti-chi-bot bot commented Jul 11, 2025

This cherry pick PR is for a release branch and has not yet been approved by triage owners.
Adding the do-not-merge/cherry-pick-not-approved label.

To merge this cherry pick:

  1. It must first be approved by the approvers.
  2. AFTER it has been approved by approvers, please wait for the cherry-pick merging approval from triage owners.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels

dco-signoff: yes Indicates the PR's author has signed the dco. do-not-merge/cherry-pick-not-approved do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. type/cherry-pick-for-release-7.5 This PR is cherry-picked to release-7.5 from a source PR.

2 participants