Skip to content

log-backup: Keep the order of observation IDs consistent with the order in which they were received#18290

Merged
ti-chi-bot[bot] merged 5 commits intotikv:masterfrom
Leavrth:observe_operation_in_sequence
Mar 14, 2025
Merged

log-backup: Keep the order of observation IDs consistent with the order in which they were received#18290
ti-chi-bot[bot] merged 5 commits intotikv:masterfrom
Leavrth:observe_operation_in_sequence

Conversation

@Leavrth
Copy link
Contributor

@Leavrth Leavrth commented Mar 11, 2025

What is changed and how it works?

Issue Number: Close #18243

What's Changed:

Keep the order of observation IDs consistent with the order in which they were received

Additional notes

There are always three observe operations (Stop[Pre-Candidate], Stop[Candidate] and Start[Leader]) generated when any peer becomes leader. But the observe operation Start[Leader] may lost due to no task registered yet. Besides, when a log backup task is being registered, the endpoint will send a observe operation Start[Scanned] for a leader.

Case 1: If the observe operation Start[Leader] is ignored because the task is not registered yet.
We can make sure the endpoint must get the region when a new task is being registered. We have the following execution order:

1. RaftStoreEvent::RoleChange -> region_info_accessor.scheduler
2. Start[Leader] -> backup_stream::Endpoint.scheduler [IGNORED]
3. register task ranges so that any observe operation won't be ignored
4. RegionInfoQuery::SeekRegion -> region_info_accessor.scheduler

In this case, the step 2 is already done, so we can make sure that the region update query is already in the queue of region_info_accessor.scheduler when the endpoint sends RegionInfoQuery::SeekRegion to the region_info_accessor. Therefore, the endpoint can get the region from seek_region.

Case2: If the endpoint can not get the region from seek_region.
We can make sure the observe operation Start[Leader] is not ignored. We have the following execution order:

1. register task ranges so that any observe operation won't be ignored
2. RegionInfoQuery::SeekRegion -> region_info_accessor.scheduler
3. RaftStoreEvent::RoleChange -> region_info_accessor.scheduler
4. Start[Leader] -> backup_stream::Endpoint.scheduler

In this case, the step 2 is already done, so we can make sure that the task range is registered. Therefore, the step 4 is not ignored and the observe operation Start[Leader] is scheduled.

In summary, the region_operator may meet the Start[Scanned] -> Stop[Pre-Candidate] -> Stop[Candidate] -> Start[Leader] and repeat scanning the region. But it won't lost the region if it is leader.

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note

Fix the issue that the log backup observer loses observation of a region.

@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has signed the dco. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 11, 2025
Signed-off-by: Jianjun Liao <jianjun.liao@outlook.com>
@Leavrth Leavrth force-pushed the observe_operation_in_sequence branch from 81e52a9 to 5caf380 Compare March 11, 2025 01:39
Leavrth and others added 4 commits March 11, 2025 11:51
Signed-off-by: Jianjun Liao <jianjun.liao@outlook.com>
Signed-off-by: Jianjun Liao <jianjun.liao@outlook.com>
Signed-off-by: Jianjun Liao <36503113+Leavrth@users.noreply.github.com>
@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Mar 13, 2025
.try_for_each(|r| {
tx.blocking_send(ObserveOp::Start {
region: r.region.clone(),
handle: ObserveHandle::new(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If remove the handle here, will some stale region change of Lock CF came and Lock forever?
@YuJuncen WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems no problem, since every places is use equal to condition rather than less or greater to compare handle id.

@ti-chi-bot ti-chi-bot bot added the lgtm label Mar 14, 2025
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 14, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 3pointer, YuJuncen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Mar 14, 2025
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 14, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-03-13 05:57:16.616256327 +0000 UTC m=+421792.221785255: ☑️ agreed by YuJuncen.
  • 2025-03-14 10:32:14.091936696 +0000 UTC m=+6027.776172791: ☑️ agreed by 3pointer.

@ti-chi-bot ti-chi-bot bot merged commit c99d0a7 into tikv:master Mar 14, 2025
8 checks passed
@ti-chi-bot ti-chi-bot bot added this to the Pool milestone Mar 14, 2025
@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. label Jul 1, 2025
ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this pull request Jul 1, 2025
close tikv#18243

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #18609.
But this PR has conflicts, please resolve them!

@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #18980.

@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #18981.

ti-chi-bot bot pushed a commit that referenced this pull request Sep 22, 2025
…er in which they were received (#18290) (#18981)

close #18243

Keep the order of observation IDs consistent with the order in which they were received

Signed-off-by: Jianjun Liao <jianjun.liao@outlook.com>

Co-authored-by: Jianjun Liao <jianjun.liao@outlook.com>
RidRisR pushed a commit to RidRisR/tikv that referenced this pull request Sep 28, 2025
…er in which they were received (tikv#18290)

close tikv#18243

Keep the order of observation IDs consistent with the order in which they were received

Signed-off-by: Jianjun Liao <jianjun.liao@outlook.com>
Signed-off-by: Jianjun Liao <36503113+Leavrth@users.noreply.github.com>
RidRisR pushed a commit to RidRisR/tikv that referenced this pull request Sep 28, 2025
…er in which they were received (tikv#18290)

close tikv#18243

Keep the order of observation IDs consistent with the order in which they were received

Signed-off-by: Jianjun Liao <jianjun.liao@outlook.com>
Signed-off-by: Jianjun Liao <36503113+Leavrth@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The log backup observer loses observation of a region

4 participants