log backup: use global checkpoint ts as source of truth #58135

Merged
ti-chi-bot[bot] merged 7 commits into pingcap:master from 3pointer:issue_58031
Dec 13, 2024

Conversation

@3pointer 3pointer commented Dec 10, 2024

What problem does this PR solve?

Issue Number: close #58031

Problem Summary:

The previous lag calculation relied on c.lastCheckpoint.TS. This is unreliable, especially when ownership changes, because c.lastCheckpoint.TS is not guaranteed to increase monotonically. This PR addresses the issue by computing the lag from the global checkpoint timestamp, which is monotonically non-decreasing.

What changed and how does it work?

The lag calculation now uses the global checkpoint timestamp instead of c.lastCheckpoint.TS. The global timestamp always increases or stays the same, even during ownership transitions, so the lag measurement stays accurate when a new owner takes over.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

Fix the issue that the task is unexpectedly paused when a new advancer owner starts, because the last checkpoint ts is equal to the start ts

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-triage-completed release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 10, 2024
tiprow bot commented Dec 10, 2024

Hi @3pointer. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

codecov bot commented Dec 10, 2024

Codecov Report

❌ Patch coverage is 71.42857% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.9132%. Comparing base (68ac9ec) to head (99c99bb).
⚠️ Report is 1721 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #58135        +/-   ##
================================================
+ Coverage   73.1841%   74.9132%   +1.7291%     
================================================
  Files          1675       1693        +18     
  Lines        461917     466183      +4266     
================================================
+ Hits         338050     349233     +11183     
+ Misses       103127      95400      -7727     
- Partials      20740      21550       +810     
Flag Coverage Δ
integration 46.5771% <0.0000%> (?)
unit 72.7866% <71.4285%> (+0.2292%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling ∅ <ø> (∅)
parser ∅ <ø> (∅)
br 62.5139% <71.4285%> (+15.3956%) ⬆️

@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 10, 2024
@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Dec 11, 2024
})
adv.StartTaskListener(ctx)
c.advanceClusterTimeBy(2 * time.Minute)
// if global ts is not advanced, the checkpoint will not be lagged
Contributor

@RidRisR RidRisR Dec 11, 2024

Why? If there is a new task and the global checkpoint never advances, the task will never be paused, even if it exceeds the limit.

This implies that we should never pause a task that has never advanced.

Contributor Author

Actually, the condition should be that the global ts is less than task.start-ts, which implies there could be some corner cases when starting a new task.

Contributor

	if globalTs <= c.task.StartTs {
		// task is not started yet
		return false, nil
	}

Then maybe this should be < instead of <=?

Contributor Author

globalTs == c.task.StartTs will only happen when a new task has just been created.

Since this is not the common case once a task has been running for some time, I think it's better not to pause in this case by default.

Contributor

Sometimes the task may be stuck right from creation, say, because the advancer doesn't work or one of the TiKV nodes didn't notice the task.

Copy link
Contributor Author

After talking with @YuJuncen, we agreed to make the check include the case where globalTs == task.StartTs. I changed it.

Additionally, I found improper error-return logic when adding a task. I also fixed it in this PR, along with the related test cases.
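To make the boundary case from this thread concrete, here is a small sketch comparing the two guards. It is only an illustration under my reading of the discussion, not the merged code: skipWithLessOrEqual and skipWithLess are hypothetical names, and the timestamps are made-up values.

```go
package main

import "fmt"

// skipWithLessOrEqual mirrors the guard quoted in the review: the lag check
// is skipped whenever globalTs <= startTs.
func skipWithLessOrEqual(globalTs, startTs uint64) bool {
	return globalTs <= startTs
}

// skipWithLess is the variant the thread converges on, as I read it: only
// globalTs < startTs skips the check, so globalTs == startTs is still
// measured against the lag limit.
func skipWithLess(globalTs, startTs uint64) bool {
	return globalTs < startTs
}

func main() {
	const startTs uint64 = 100 // hypothetical task start ts
	for _, globalTs := range []uint64{99, 100, 101} {
		fmt.Printf("globalTs=%d skip(<=)=%t skip(<)=%t\n",
			globalTs,
			skipWithLessOrEqual(globalTs, startTs),
			skipWithLess(globalTs, startTs))
	}
}
```

The two guards differ only at globalTs == startTs. With <, a task whose checkpoint has been stuck at its start ts since creation can still be reported as lagged, which is the stuck-from-creation scenario raised above.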

@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Dec 12, 2024
@3pointer
Contributor Author

/retest

tiprow bot commented Dec 12, 2024

@3pointer: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/retest

ti-chi-bot bot commented Dec 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BornChanger, YuJuncen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Dec 13, 2024
ti-chi-bot bot commented Dec 13, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-12-12 05:24:24.307792113 +0000 UTC m=+502454.396594649: ☑️ agreed by YuJuncen.
  • 2024-12-13 12:29:15.088349356 +0000 UTC m=+614345.177151898: ☑️ agreed by BornChanger.

tiprow bot commented Dec 13, 2024

@3pointer: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| fast_test_tiprow | 99c99bb | link | true | /test fast_test_tiprow |

Full PR test history. Your PR dashboard.

@BornChanger BornChanger added the needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. label Dec 13, 2024
@ti-chi-bot ti-chi-bot bot merged commit e3248e7 into pingcap:master Dec 13, 2024
ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Dec 13, 2024
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #58259.

@BornChanger BornChanger added the needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. label Dec 14, 2024
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #58265.

@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. label Jan 21, 2025
ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Jan 21, 2025
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-6.5: #59061.

@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. label Feb 10, 2025
ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Feb 10, 2025
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #59369.
But this PR has conflicts, please resolve them!

crazycs520 pushed a commit to crazycs520/tidb that referenced this pull request Jun 4, 2025
* skip TestIndexJoin31494
* fix
* global sort: add boundaries to split keys when generating plan (pingcap#58323) (pingcap#58356)
* statistics: get right max table id when to init stats (pingcap#58280) (pingcap#58298)
* executor: Fix the parse problematic slow log panic issue due to empty …
* statstics: trigger evict by the timer (pingcap#58027) (pingcap#58268)
* br: make table existence check unified on different br client (pingcap#58211) (pingcap#58262)
* log backup: use global checkpoint ts as source of truth (pingcap#58135) (pingcap#58265)
* executor: skip execution when build query for VIEW in I_S (pingcap#58203) (pingcap#58236)
* statistics: copy stats when to update it for avoiding data race (pingcap#5810…
* domain,infoschema: make infoschema activity block GC safepoint advanci…
* planner: handle panic when loading bindings at startup (pingcap#58017) (pingcap#58035)
* statistics: right deal with error for reading stats from storage  (pingcap#58…
* statistics: lite init used wrong value to build table stats ver (pingcap#5802…
* lightning, ddl: set TS to engineMeta after ResetEngineSkipAllocTS  (pingcap#5…
* *: avoid unlock of unlocked mutex panic on TableDeltaMap (pingcap#57799) (pingcap#57997)
* ddl: handle context done after sending DDL jobs (pingcap#57945) (pingcap#57989)
* *: activate txn for query on infoschema tables (pingcap#57937) (pingcap#57951)
* lightning: add PK to internal tables (pingcap#57480) (pingcap#57932)
* statistics: correct behavior of non-lite InitStats and stats sync load…
* statistics: avoid stats meta full load after table analysis (pingcap#57756) (pingcap#57911)
* dumpling: use I_S to get table list for TiDB and add database to WHERE…
* br: fix insert gc failed due to slow schema reload (pingcap#57742) (pingcap#57907)
* statistics: do not record historical stats meta if the table is locked…
* metrics: remove the filled colors (pingcap#57838) (pingcap#57866)
* planner: use TableInfo.DBID to locate schema (pingcap#57785) (pingcap#57870)
* *: support cancel query like 'select * from information_schema.tables'…
* session: make `TxnInfo()` return even if process info is empty (pingcap#57044) (pingcap#57161)
* ddl: Fixed partitioning a non-partitioned table with placement rules (…
* *: Reorg partition fix delete ranges and handling non-clustered tables…
* executor: fix query infoschema.tables table_schema/table_name with fil…
* ddl: check context done in isReorgRunnable function (pingcap#57813) (pingcap#57820)
* ddl: fix ExistsTableRow and add tests for skip reorg checks (pingcap#57778) (pingcap#57801)
* *: Fix for TRUNCATE PARTITION and Global Index (pingcap#57724)
* br: prompt k8s.io/api version (pingcap#57791) (pingcap#57802)
* statistics: fix some problem related to stats async load (pingcap#57723) (pingcap#57775)
* expression: fix wrong calculation order of `radians` (pingcap#57672) (pingcap#57688)
* statistics: rightly deal with timout when to send sync load  (pingcap#57712) (pingcap#57751)
* ddl: `tidb_scatter_region` variable supports setting value in both upp…
* planner: fix that vector index output empty result when pk is non-int …
* ddl: dynamically adjusting the max write speed of reorganization job (…
* executor: fix hang in hash agg when exceeding memory limit leads to pa…
* statistics: use infoschema api to get table info (pingcap#57574) (pingcap#57614)
* planner: Use realtimeRowCount when all topN collected (pingcap#56848) (pingcap#57689)
* statistics: handle deleted tables correctly in the PQ (pingcap#57649) (pingcap#57674)
* backup: reset timeout on store level (pingcap#55526) (pingcap#57667)
* planner/core: fix a wrong privilege check for CTE & UPDATE statement (…
@BornChanger BornChanger added the needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. label Sep 30, 2025
ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Sep 30, 2025
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.1: #63826.
But this PR has conflicts, please resolve them!

Labels

approved lgtm needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

log backup advancer stop the log backup task when it failed to get global checkpoint ts at the first tick

5 participants