log backup: use global checkpoint ts as source of truth #58135
ti-chi-bot[bot] merged 7 commits into pingcap:master
Codecov Report
@@ Coverage Diff @@
## master #58135 +/- ##
================================================
+ Coverage 73.1841% 74.9132% +1.7291%
================================================
Files 1675 1693 +18
Lines 461917 466183 +4266
================================================
+ Hits 338050 349233 +11183
+ Misses 103127 95400 -7727
- Partials 20740 21550 +810
    })
    adv.StartTaskListener(ctx)
    c.advanceClusterTimeBy(2 * time.Minute)
    // if global ts is not advanced, the checkpoint will not be lagged
Why? If there is a new task and the global checkpoint is never advanced, the task will never be paused even if it exceeds the limit.
This implies that we should never pause a task that has never advanced.
Actually, the condition should be "if the global ts is less than task.start-ts", which implies there could be some corner cases when starting a new task.
    if globalTs <= c.task.StartTs {
        // task is not started yet
        return false, nil
    }
Then maybe this should be < instead of <=?
globalTs == c.task.StartTs only happens when a new task has just been created.
Since that is not the common case once a task has been running for some time, I think it's better not to pause by default in this case.
Sometimes a task may be stuck right from creation, for example when the advancer doesn't work or one of the TiKV nodes didn't notice the task.
After talking with @YuJuncen, we agreed to make the check also cover the case where globalTs == task.StartTs. I changed it.
Additionally, I found improper error-return logic when adding a task. I fixed that in this PR as well, along with the related test cases.
[APPROVALNOTIFIER] This PR is APPROVED
This pull request has been approved by: BornChanger, YuJuncen
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
In response to a cherrypick label: new pull request created to branch |
* log backup: use global checkpoint ts as source of truth (pingcap#58135) (pingcap#58265)
What problem does this PR solve?
Issue Number: close #58031
Problem Summary:
Previously, the lag was computed from c.lastCheckpoint.TS. That is unreliable, especially when ownership changes, because c.lastCheckpoint.TS is not guaranteed to increase steadily. This PR addresses the issue by introducing a global checkpoint timestamp that never decreases.
What changed and how does it work?
The lag calculation now utilizes a global checkpoint timestamp instead of c.lastCheckpoint.TS. This global timestamp ensures consistency and stability, as it always increases or stays the same, even during ownership transitions. This change guarantees a more robust and accurate lag measurement.
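The non-decreasing property can be illustrated with a small sketch. It is an assumption-laden example rather than the PR's implementation: globalCheckpoint and advance are hypothetical names, and the real global checkpoint may be maintained and read quite differently. The point is only that a value which ignores smaller updates stays stable when a new owner reports an older checkpoint.

```go
package main

import (
	"fmt"
	"sync"
)

// globalCheckpoint is a hypothetical holder for a checkpoint ts that never
// moves backwards.
type globalCheckpoint struct {
	mu sync.Mutex
	ts uint64
}

// advance stores ts only if it moves the checkpoint forward, and returns the
// current value either way.
func (g *globalCheckpoint) advance(ts uint64) uint64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	if ts > g.ts {
		g.ts = ts
	}
	return g.ts
}

func main() {
	var g globalCheckpoint
	// After an ownership change, the new owner may report an older
	// checkpoint (90) than the previous owner did (100).
	for _, reported := range []uint64{100, 90, 120} {
		fmt.Println(g.advance(reported))
	}
	// Prints 100, 100, 120: the stale report does not pull the value back.
}
```

A lag computed from such a value cannot suddenly grow just because a stale per-owner checkpoint was observed.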
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.