Skip to content

txnkv: Add a named error ErrCommitTSLag when waiting for commitWaitUntilTSO timeout#1842

Merged
ti-chi-bot[bot] merged 1 commit intotikv:masterfrom
lcwangchao:commit_ts_lag
Jan 15, 2026
Merged

txnkv: Add a named error ErrCommitTSLag when waiting for commitWaitUntilTSO timeout#1842
ti-chi-bot[bot] merged 1 commit intotikv:masterfrom
lcwangchao:commit_ts_lag

Conversation

@lcwangchao
Copy link
Contributor

@lcwangchao lcwangchao commented Jan 14, 2026

close #1843

This PR:

  • returns an error ErrCommitTSLag if the PD TSO lags too much compared with the value set by txn. SetCommitWaitUntilTSOTimeout
  • Provide a method IsErrorCommitTSLag to determine the above error
  • Add more metrics:
    • txn_lag_commit_ts_wait_seconds: the total time waiting for the lag commit ts to reach the expected value
    • txn_lag_commit_ts_attempt_count: total attempts to get the PD TSO when lag is detected.
    • A new field CommitTsLagWaitTime for CommitDetails, which is the same as txn_lag_commit_ts_wait_seconds. It can be displayed in TiDB slow log details.

Summary by CodeRabbit

  • New Features

    • Adds commit-timestamp lag metrics (wait durations and attempt counts), exposes lag details in commit diagnostics, and introduces an explicit error signal for commit-timestamp lag.
    • Adds instrumentation and public hooks to surface commit-wait configuration and metrics.
  • Tests

    • Adds deterministic tests covering commit-wait scenarios, retries, timeouts, and metric assertions.
  • Chores

    • Bumps Go toolchain and updates indirect dependencies.

✏️ Tip: You can customize this high-level summary in your review settings.

@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 14, 2026
@lcwangchao
Copy link
Contributor Author

/retest

@lcwangchao lcwangchao force-pushed the commit_ts_lag branch 4 times, most recently from f138e86 to 7025a3a Compare January 14, 2026 10:47
@lcwangchao
Copy link
Contributor Author

/retest

1 similar comment
@lcwangchao
Copy link
Contributor Author

/retest

@lcwangchao lcwangchao force-pushed the commit_ts_lag branch 2 times, most recently from 97741b2 to 85e6b4c Compare January 14, 2026 12:31
@lcwangchao
Copy link
Contributor Author

/retest

1 similar comment
@lcwangchao
Copy link
Contributor Author

/retest

@lcwangchao lcwangchao force-pushed the commit_ts_lag branch 3 times, most recently from 03d20f8 to 5b1dba3 Compare January 14, 2026 14:29
@lcwangchao lcwangchao changed the title txnkv: Add an named error ErrCommitTSLag when waiting commitWaitUntilTSO timeout txnkv: Add a named error ErrCommitTSLag when waiting for commitWaitUntilTSO timeout Jan 15, 2026
@coderabbitai
Copy link

coderabbitai bot commented Jan 15, 2026

Tip

For best results, initiate chat on the files or code changes.

@lcwangchao Hello! How can I assist you with this PR? I can help with:

  • Reviewing the changes if you'd like feedback on the implementation
  • Answering questions about the code or approach
  • Investigating issues if there are test failures or concerns
  • Generating documentation or code examples
  • Any other specific task you have in mind

Please let me know what you need!

@lcwangchao
Copy link
Contributor Author

@coderabbitai please review this PR

@coderabbitai
Copy link

coderabbitai bot commented Jan 15, 2026

@lcwangchao I'll review the changes in this PR right away!

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link

coderabbitai bot commented Jan 15, 2026

📝 Walkthrough

Walkthrough

Adds a named error ErrCommitTSLag, a backoff config BoCommitTSLag, Prometheus histograms/observers, commit-timestamp lag detection and backoff retry in transaction commit flow with lag metrics and details, updates commit details model, extends integration tests, and bumps some example module dependencies.

Changes

Cohort / File(s) Change Summary
Error & Retry Config
error/error.go, config/retry/config.go
Add exported error ErrCommitTSLag and matcher IsErrorCommitTSLag(err error) bool; add backoff config BoCommitTSLag = NewConfig("commitTSLag", &metrics.BackoffHistogramEmpty, NewBackoffFnCfg(2, 500, NoJitter), tikverr.ErrCommitTSLag)
Metrics
metrics/metrics.go, metrics/shortcuts.go
Add histogram vectors TiKVTxnLagCommitTSWaitHistogram, TiKVTxnLagCommitTSAttemptHistogram and shortcut observers LagCommitTSWaitHistogramWithOK, LagCommitTSWaitHistogramWithError, LagCommitTSAttemptHistogramWithOK, LagCommitTSAttemptHistogramWithError
Execution Details / Model
util/execdetails.go
Add CommitTSLagDetails (WaitTime, BackoffCnt, FirstLagTS, WaitUntilTS); add LagDetails field to CommitDetails; add CommitTSLagDetails.Merge; include lag merging in CommitDetails.Merge and Clone
Transaction Core
txnkv/transaction/txn.go, txnkv/transaction/2pc.go
Implement lag-aware GetTimestampForCommit with firstAttemptTS, backoff-driven retry using BoCommitTSLag, metrics recording, fillCommitTSLagDetails; add GetCommitWaitUntilTSO() and GetCommitWaitUntilTSOTimeout() getters and CtxInGetTimestampForCommitKey; call fillCommitTSLagDetails from 2PC paths
Tests & Probe
integration_tests/option_test.go, txnkv/transaction/test_probe.go
Introduce mockCommitTSOracle, rewrite TestSetCommitWaitUntilTSO with deterministic TSO scenarios and histogram assertions; remove TxnProbe.GetCommitWaitUntilTSOTimeout() accessor
Examples / Dependencies
examples/txnkv/go.mod
Bump Go toolchain (1.23.0 → 1.23.12) and update indirect dependencies (kvproto, pd/client)

Sequence Diagram(s)

sequenceDiagram
    participant Txn as Transaction
    participant Backoff as Backoffer
    participant PD as PD/TSO
    participant Metrics as Metrics

    Txn->>PD: GetTimestampForCommit (first attempt)
    PD-->>Txn: startTS
    Txn->>Txn: if startTS < commitWaitUntilTSO -> record FirstLagTS, start timer
    loop retry until TS >= commitWaitUntilTSO or timeout
        Txn->>Backoff: apply BoCommitTSLag backoff
        Backoff-->>Txn: delay
        Txn->>PD: GetTimestampForCommit (retry)
        PD-->>Txn: updated TS
    end
    alt success
        Txn->>Metrics: record WaitHistogram (ok) and AttemptHistogram (ok)
        Txn-->>Caller: return TS with LagDetails
    else timeout/failure
        Txn->>Metrics: record WaitHistogram (err) and AttemptHistogram (err)
        Txn-->>Caller: return ErrCommitTSLag
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I twitch my nose at lagging time,
I wait, I back off, hop by hop,
I count the tries and mark each chime,
If TSO stalls, I flag the stop,
Metrics in paw, I name the flop.

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a named error ErrCommitTSLag for commit timeout scenarios, which is the primary objective of the PR.
Linked Issues check ✅ Passed All coding requirements from issue #1843 are met: ErrCommitTSLag error added with IsErrorCommitTSLag matcher; metrics for wait time and attempt count added; lag details exposed in CommitDetails.
Out of Scope Changes check ✅ Passed All changes are scoped to commit TSO lag detection and metrics. Go toolchain and dependency updates in examples/txnkv are development-only maintenance updates supporting the feature development.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

🧹 Recent nitpick comments
txnkv/transaction/txn.go (1)

1949-1951: Consider adding documentation for the exported context key.

CtxInGetTimestampForCommitKey is exported but lacks documentation explaining its purpose. Since it's used for test instrumentation, a brief comment would help maintainers understand when and how to use it.

// CtxInGetTimestampForCommitKey is a context key used in tests to identify
// which transaction's commit timestamp is being fetched, enabling mock oracles
// to return controlled TSO values.
var CtxInGetTimestampForCommitKey = ctxInGetTimestampForCommitKeyType{}

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4dc5b95 and bdc3c3a.

📒 Files selected for processing (10)
  • config/retry/config.go
  • error/error.go
  • examples/txnkv/go.mod
  • integration_tests/option_test.go
  • metrics/metrics.go
  • metrics/shortcuts.go
  • txnkv/transaction/2pc.go
  • txnkv/transaction/test_probe.go
  • txnkv/transaction/txn.go
  • util/execdetails.go
💤 Files with no reviewable changes (1)
  • txnkv/transaction/test_probe.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • error/error.go
  • metrics/metrics.go
  • txnkv/transaction/2pc.go
  • examples/txnkv/go.mod
🧰 Additional context used
🧬 Code graph analysis (2)
config/retry/config.go (2)
metrics/shortcuts.go (1)
  • BackoffHistogramEmpty (75-75)
error/error.go (1)
  • ErrCommitTSLag (95-95)
metrics/shortcuts.go (1)
metrics/metrics.go (2)
  • TiKVTxnLagCommitTSWaitHistogram (130-130)
  • TiKVTxnLagCommitTSAttemptHistogram (131-131)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: integration-local-race
  • GitHub Check: integration-next-gen-tikv
  • GitHub Check: integration-tikv
  • GitHub Check: integration-raw-tikv (v1ttl)
  • GitHub Check: integration-local
  • GitHub Check: integration-raw-tikv (v2)
  • GitHub Check: race-test
  • GitHub Check: test
🔇 Additional comments (12)
metrics/shortcuts.go (1)

195-199: LGTM!

The new lag commit TS metric shortcuts are correctly declared and initialized, following the established patterns in this file. The label values ("ok"/"err") are consistent with other metrics like TxnHeartBeatHistogramOK/TxnHeartBeatHistogramError.

Also applies to: 361-365

config/retry/config.go (1)

136-136: LGTM!

The new BoCommitTSLag backoff configuration is well-placed and follows the established patterns. Using BackoffHistogramEmpty is consistent with similar internal backoffs (BoTxnNotFound, BoMaxTsNotSynced), and the (base=2, cap=500, NoJitter) parameters provide fast initial retries with a reasonable cap for TSO lag scenarios.

util/execdetails.go (2)

141-165: LGTM with a minor note.

The CommitTSLagDetails struct and its Merge method are well-designed. The merge semantics appropriately accumulate WaitTime and BackoffCnt while using the latest timestamps.

Minor: The comment on line 162 says "For sample" - did you mean "For simplicity"?


171-171: LGTM!

The LagDetails field is properly integrated into CommitDetails:

  • Added to the struct at line 171
  • Correctly merged at line 199 via LagDetails.Merge
  • Properly cloned at line 260 (value copy is sufficient since CommitTSLagDetails has no pointer fields)

Also applies to: 199-199, 260-260

txnkv/transaction/txn.go (4)

1955-2043: LGTM!

The GetTimestampForCommit implementation is well-structured:

  1. Early return optimization (lines 1966-1968): Avoids unnecessary work when no lag exists.
  2. Fail-fast for excessive lag (lines 1996-2004): Prevents waiting when the clock drift clearly exceeds the timeout.
  3. Backoff-driven retry loop (lines 2009-2025): Uses the dedicated BoCommitTSLag backoff for controlled retries.
  4. Comprehensive metrics (lines 1970-1986): Records both success and failure cases with appropriate histograms.
  5. Detailed error context (lines 2028-2038): Wraps errors with diagnostic information including timestamps and retry counts.

The loop condition ts <= txn.commitWaitUntilTSO correctly ensures the commit TS strictly exceeds the wait threshold.


224-229: LGTM!

The commitLagWaitStats struct cleanly groups lag-related statistics collected during GetTimestampForCommit for later population into CommitDetails.LagDetails.


362-377: LGTM!

The getter methods GetCommitWaitUntilTSO() and GetCommitWaitUntilTSOTimeout() provide clean access to the commit wait configuration, useful for diagnostics and testing.


2045-2052: LGTM!

The fillCommitTSLagDetails helper correctly populates lag details only when lag actually occurred (indicated by firstAttemptTS > 0), avoiding pollution of metrics when there's no lag.

integration_tests/option_test.go (4)

72-89: LGTM!

The mockCommitTSOracle.GetTimestamp implementation now correctly uses type assertion (startTS, ok := ts.(uint64)) to compare the context value with the expected start timestamp, fixing the interface comparison issue from the previous review.


142-183: Well-structured test cases!

The test table comprehensively covers the key scenarios:

  1. No lag - TSO already exceeds threshold
  2. Lag with retry - One and two retry cases
  3. Excessive lag - Immediate failure when drift exceeds timeout
  4. Timeout during retry - Backoff exceeds configured timeout

207-231: LGTM!

The assertions correctly validate:

  • GetCommitWaitUntilTSO() reflects the configured value
  • Metric histogram increments match expected patterns ([1,1,0,0] for success with lag, [0,0,1,1] for errors)
  • CommitTSLagDetails fields are properly populated when lag occurs
  • Error cases correctly identify ErrCommitTSLag via tikverr.IsErrorCommitTSLag

111-119: LGTM!

The helper methods provide clean access to Prometheus metric values for test assertions. The type assertion is appropriate for test-only code where the metric types are known.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@integration_tests/option_test.go`:
- Line 216: The assertion is checking the method value
txn.KVTxn.SetCommitWaitUntilTSO (a method expression) which is always non-zero
in Go; remove the line asserting s.NotZero(txn.KVTxn.SetCommitWaitUntilTSO) and
rely on the existing check txn.GetCommitWaitUntilTSO() (already at line ~206) to
verify the commitWaitUntilTSO state.

In `@util/execdetails.go`:
- Line 158: Add a clarifying comment on the CommitDetails type explaining that
the LagDetails (CommitTSLagDetails) field is intentionally excluded from Clone()
and Merge(): it records metrics for a single commit attempt and is freshly
populated by fillCommitTSLagDetails() for each attempt rather than being
accumulated or preserved across clones/merges, so Clone() and Merge()
deliberately do not copy or merge LagDetails to avoid stale or double-counted
metrics.
🧹 Nitpick comments (1)
integration_tests/option_test.go (1)

72-88: The asymmetric lock/unlock pattern is confusing but correct.

The locking pattern where unlock happens either in an else branch (line 78) or via defer (line 78 for the matching case) works correctly but is hard to follow. Consider restructuring for clarity.

♻️ Suggested refactor for clarity
 func (o *mockCommitTSOracle) GetTimestamp(ctx context.Context, option *oracle.Option) (uint64, error) {
 	if ts := ctx.Value(transaction.CtxInGetTimestampForCommitKey); ts != nil {
 		o.mu.Lock()
-		if ts != o.mu.startTS {
-			o.mu.Unlock()
-		} else {
-			defer o.mu.Unlock()
+		defer o.mu.Unlock()
+		if ts == o.mu.startTS {
 			if len(o.mu.commitTSOffsets) == 0 {
 				panic("empty mock oracle ts set")
 			}
 			offset := o.mu.commitTSOffsets[0]
 			o.mu.commitTSOffsets = o.mu.commitTSOffsets[1:]
 			return o.mu.startTS + offset, nil
 		}
 	}
 	return o.Oracle.GetTimestamp(ctx, option)
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01cdf70 and 5b1dba3.

📒 Files selected for processing (10)
  • config/retry/config.go
  • error/error.go
  • examples/txnkv/go.mod
  • integration_tests/option_test.go
  • metrics/metrics.go
  • metrics/shortcuts.go
  • txnkv/transaction/2pc.go
  • txnkv/transaction/test_probe.go
  • txnkv/transaction/txn.go
  • util/execdetails.go
💤 Files with no reviewable changes (1)
  • txnkv/transaction/test_probe.go
🧰 Additional context used
🧬 Code graph analysis (2)
config/retry/config.go (2)
metrics/shortcuts.go (1)
  • BackoffHistogramEmpty (75-75)
error/error.go (1)
  • ErrCommitTSLag (95-95)
metrics/shortcuts.go (1)
metrics/metrics.go (2)
  • TiKVTxnLagCommitTSWaitHistogram (130-130)
  • TiKVTxnLagCommitTSAttemptHistogram (131-131)
🔇 Additional comments (15)
examples/txnkv/go.mod (1)

3-3: LGTM - Dependency updates look appropriate.

Go toolchain patch bump (1.23.0 → 1.23.12) and indirect dependency updates align with the new commit timestamp lag features introduced in this PR.

Also applies to: 25-25, 34-34

error/error.go (2)

94-95: LGTM - Error declaration follows existing patterns.

The new ErrCommitTSLag sentinel error is well-documented and placed appropriately alongside other TiKV-related errors.


425-429: LGTM - Helper function correctly uses errors.Is.

The IsErrorCommitTSLag helper follows the established pattern (similar to IsErrorUndetermined) and properly supports wrapped error chains.

util/execdetails.go (1)

141-152: LGTM - Well-documented struct for lag metrics.

The CommitTSLagDetails struct clearly captures all necessary information for commit timestamp lag tracking with appropriate field comments.

metrics/metrics.go (3)

130-131: LGTM - Histogram declarations follow existing patterns.

The new histogram vectors are consistently named and typed, matching the established conventions in this file.


952-970: LGTM - Histogram configuration is appropriate.

The bucket ranges are well-suited for their respective use cases:

  • Wait time: 0.5ms–16s covers typical PD TSO lag scenarios
  • Attempt count: 1–32 is reasonable for retry tracking

Both metrics follow naming conventions (_seconds for duration, _count for counts) and use the LblResult label consistently.


1075-1076: LGTM - Metrics properly registered.

Both new histograms are correctly added to RegisterMetrics().

config/retry/config.go (1)

136-136: LGTM - Backoff config follows established patterns.

BoCommitTSLag uses configuration values (base=2ms, cap=500ms, NoJitter) consistent with similar timestamp-related backoff configs like BoMaxTsNotSynced. Using BackoffHistogramEmpty is appropriate since dedicated lag metrics are tracked separately via TiKVTxnLagCommitTSWaitHistogram.

metrics/shortcuts.go (1)

195-199: LGTM!

The new metric shortcuts follow the established pattern in this file and correctly initialize the histogram observers with "ok" and "err" labels, consistent with other metrics like TxnHeartBeatHistogramOK/TxnHeartBeatHistogramError.

Also applies to: 361-365

txnkv/transaction/txn.go (3)

224-229: LGTM!

The commitLagWaitStats struct is well-designed to capture the lag statistics needed for metrics and diagnostics.


1955-2043: LGTM!

The GetTimestampForCommit implementation correctly handles the commit TS lag scenario:

  • Early return when no lag is detected (line 1966)
  • Fast-fail for excessive lag that exceeds timeout (lines 1991-2004)
  • Retry loop with backoff for recoverable lag (lines 2009-2025)
  • Proper error wrapping with diagnostic information (lines 2027-2040)
  • Metrics are recorded via defer for all lag cases

The loop exit condition ts <= txn.commitWaitUntilTSO correctly continues until we get a TS greater than the target, and lastAttemptTS is properly updated after each successful fetch.


2045-2052: LGTM!

The helper correctly guards with stats.waitDuration > 0 to only populate lag details when the lag path was actually exercised.

txnkv/transaction/2pc.go (2)

1772-1772: LGTM!

Correctly populates lag details after the async commit path's timestamp retrieval.


1908-1908: LGTM!

Correctly populates lag details after the regular 2PC commit timestamp retrieval.

integration_tests/option_test.go (1)

141-231: Well-structured parameterized test with comprehensive coverage.

The test cases cover key scenarios:

  • No lag (single attempt success)
  • Lag with 1-2 retries and success
  • Excessive lag (fail-fast)
  • Timeout during retries

The metric collection pattern using doWithCollectHistSamplesInc is a clean way to verify histogram increments.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@ti-chi-bot
Copy link

ti-chi-bot bot commented Jan 15, 2026

@tiancaiamao: adding LGTM is restricted to approvers and reviewers in OWNERS files.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@integration_tests/option_test.go`:
- Around line 72-88: The comparison in mockCommitTSOracle.GetTimestamp is
comparing an interface{} (ctx.Value(...)) to a uint64, which always fails;
change it to assert the value from
ctx.Value(transaction.CtxInGetTimestampForCommitKey) to (uint64, ok) and only
proceed if ok && assertedValue == o.mu.startTS; update the branch that currently
checks `if ts := ...; ts != nil { ... if ts != o.mu.startTS { ... } else { ... }
}` to use the asserted uint64 value, keep the existing locking/unlocking and
panic/offset logic intact, and fall back to o.Oracle.GetTimestamp when the
assertion fails or the value is nil.
🧹 Nitpick comments (2)
integration_tests/option_test.go (1)

56-63: Consider documenting the mock oracle's expected usage.

The mockCommitTSOracle struct embeds oracle.Oracle anonymously and stores mock state. A brief doc comment explaining that this mock intercepts GetTimestamp calls for commit-phase testing (when CtxInGetTimestampForCommitKey is present) would improve clarity for future maintainers.

txnkv/transaction/txn.go (1)

373-377: Minor documentation clarification needed.

The doc comment states "If it returns zero, it means it is not set and the default timeout should be used", but commitWaitUntilTSOTimeout is initialized to time.Second in NewTiKVTxn (line 248). This means a newly created transaction will never return zero. Consider updating the documentation to reflect that the default value is time.Second, or clarify the intended semantics.

📝 Suggested documentation update
-// GetCommitWaitUntilTSOTimeout returns the timeout to wait for the commit tso reach the expected value.
-// If it returns zero, it means it is not set and the default timeout should be used
+// GetCommitWaitUntilTSOTimeout returns the timeout to wait for the commit tso reach the expected value.
+// The default value is time.Second if not explicitly overridden via SetCommitWaitUntilTSOTimeout.
 func (txn *KVTxn) GetCommitWaitUntilTSOTimeout() time.Duration {
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b1dba3 and 4dc5b95.

📒 Files selected for processing (10)
  • config/retry/config.go
  • error/error.go
  • examples/txnkv/go.mod
  • integration_tests/option_test.go
  • metrics/metrics.go
  • metrics/shortcuts.go
  • txnkv/transaction/2pc.go
  • txnkv/transaction/test_probe.go
  • txnkv/transaction/txn.go
  • util/execdetails.go
💤 Files with no reviewable changes (1)
  • txnkv/transaction/test_probe.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • error/error.go
  • metrics/metrics.go
  • examples/txnkv/go.mod
🧰 Additional context used
🧬 Code graph analysis (3)
integration_tests/option_test.go (7)
txnkv/transaction/txn.go (1)
  • CtxInGetTimestampForCommitKey (1951-1951)
tikv/test_probe.go (1)
  • StoreProbe (54-56)
tikv/kv.go (1)
  • KVStore (124-166)
integration_tests/util_test.go (1)
  • NewTestStore (71-85)
util/execdetails.go (3)
  • CommitDetails (168-193)
  • CommitDetailCtxKey (60-60)
  • CommitTSLagDetails (143-152)
metrics/shortcuts.go (4)
  • LagCommitTSWaitHistogramWithOK (196-196)
  • LagCommitTSAttemptHistogramWithOK (198-198)
  • LagCommitTSWaitHistogramWithError (197-197)
  • LagCommitTSAttemptHistogramWithError (199-199)
error/error.go (1)
  • IsErrorCommitTSLag (427-429)
metrics/shortcuts.go (1)
metrics/metrics.go (2)
  • TiKVTxnLagCommitTSWaitHistogram (130-130)
  • TiKVTxnLagCommitTSAttemptHistogram (131-131)
txnkv/transaction/txn.go (7)
config/retry/backoff.go (2)
  • Backoffer (60-76)
  • NewBackoffer (84-90)
tikv/backoff.go (2)
  • Backoffer (45-45)
  • NewBackoffer (61-63)
metrics/shortcuts.go (3)
  • LagCommitTSAttemptHistogramWithError (199-199)
  • LagCommitTSAttemptHistogramWithOK (198-198)
  • LagCommitTSWaitHistogramWithOK (196-196)
oracle/oracle.go (1)
  • GetTimeFromTS (118-121)
error/error.go (2)
  • ErrCommitTSLag (95-95)
  • IsErrorCommitTSLag (427-429)
config/retry/config.go (1)
  • BoCommitTSLag (136-136)
util/execdetails.go (1)
  • CommitTSLagDetails (143-152)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: integration-raw-tikv (v2)
  • GitHub Check: integration-local-race
  • GitHub Check: integration-raw-tikv (v1ttl)
  • GitHub Check: integration-next-gen-tikv
  • GitHub Check: integration-tikv
  • GitHub Check: race-test
  • GitHub Check: integration-local
🔇 Additional comments (11)
txnkv/transaction/2pc.go (1)

1772-1772: LGTM - Lag details properly captured after timestamp fetching.

The fillCommitTSLagDetails calls are correctly placed immediately after the timestamp fetching operations (GetLatestTsTime and GetCommitTsTime), ensuring lag metrics are captured at the right points in the commit flow for both async commit/1PC and regular 2PC paths.

Also applies to: 1908-1908

config/retry/config.go (1)

136-136: LGTM - Backoff configuration for commit TS lag.

The BoCommitTSLag configuration follows the established pattern with appropriate parameters: a short base (2ms) for quick retries, reasonable cap (500ms), and NoJitter consistent with similar time-sensitive backoffs like BoMaxTsNotSynced.

metrics/shortcuts.go (1)

195-199: LGTM - Metric shortcuts follow established patterns.

The new lag-related histogram observer shortcuts are correctly declared and initialized, following the existing conventions for OK/error label variants (e.g., TxnHeartBeatHistogramOK/Error).

Also applies to: 361-365

util/execdetails.go (2)

141-165: LGTM - New CommitTSLagDetails struct with proper merge semantics.

The struct correctly captures lag metrics, and the Merge method appropriately accumulates wait time and backoff count while using the latest timestamp values.

One minor note: other.FirstLagTS <= 0 on a uint64 is equivalent to == 0 since unsigned integers cannot be negative. This works correctly but == 0 would be clearer.


171-171: LGTM - LagDetails integration into CommitDetails.

The LagDetails field is correctly added and properly handled in both Merge (line 199) and Clone (line 260). Since CommitTSLagDetails contains only value types, the direct struct assignment in Clone provides correct copy semantics.

Also applies to: 199-199, 260-260

integration_tests/option_test.go (1)

120-232: Well-structured table-driven test covering lag scenarios.

The test comprehensively covers:

  • No lag (single TSO fetch succeeds)
  • Lag with retry success (1 and 2 retries)
  • Excessive lag (immediate failure)
  • Timeout during retry

The metric increment collection via doWithCollectHistSamplesInc and lag detail assertions are thorough.

txnkv/transaction/txn.go (5)

71-71: LGTM!

Import of the intest package for test-time instrumentation is appropriate.


224-229: LGTM!

The commitLagWaitStats struct cleanly encapsulates the lag tracking state, with fields that align well with util.CommitTSLagDetails.


1949-1951: LGTM!

The context key pattern using an unexported type with an exported variable is idiomatic Go for context keys, preventing key collisions.


1955-2043: LGTM with minor observation.

The lag handling logic is well-structured:

  1. Early return when no lag detected (line 1966-1968)
  2. Early error if clock drift exceeds timeout (line 1996-2004)
  3. Backoff loop with proper timeout budget (line 2007-2025)
  4. Metrics recorded appropriately in defer with OK/Error distinction

The approach of creating a new Backoffer at line 2007 with maxSleep as its budget is appropriate, as the lag retry has its own dedicated timeout separate from the original operation's budget.


2045-2053: LGTM!

The helper correctly populates lag details only when lag actually occurred (guarded by stats.firstAttemptTS > 0), with proper field mapping to util.CommitTSLagDetails.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

…tilTSO` timeout

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>
Copy link
Contributor

@cfzjywxk cfzjywxk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jan 15, 2026
@ti-chi-bot ti-chi-bot bot added the lgtm label Jan 15, 2026
@ti-chi-bot
Copy link

ti-chi-bot bot commented Jan 15, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cfzjywxk, ekexium, tiancaiamao

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jan 15, 2026
@ti-chi-bot
Copy link

ti-chi-bot bot commented Jan 15, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-01-15 08:07:05.350693941 +0000 UTC m=+49252.964650787: ☑️ agreed by cfzjywxk.
  • 2026-01-15 08:14:29.57463498 +0000 UTC m=+49697.188591826: ☑️ agreed by ekexium.

@ti-chi-bot ti-chi-bot bot merged commit 30119fc into tikv:master Jan 15, 2026
13 checks passed
@lcwangchao lcwangchao deleted the commit_ts_lag branch January 15, 2026 08:26
lcwangchao added a commit to lcwangchao/client-go that referenced this pull request Jan 20, 2026
…tUntilTSO` timeout (tikv#1842)

close tikv#1843

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>
lcwangchao added a commit to lcwangchao/client-go that referenced this pull request Jan 20, 2026
…tUntilTSO` timeout (tikv#1842)

close tikv#1843

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>
joechenrh pushed a commit to joechenrh/client-go that referenced this pull request Jan 21, 2026
…tUntilTSO` timeout (tikv#1842)

close tikv#1843

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>
lcwangchao added a commit to lcwangchao/client-go that referenced this pull request Jan 26, 2026
…tUntilTSO` timeout (tikv#1842)

close tikv#1843

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>
@coderabbitai coderabbitai bot mentioned this pull request Feb 5, 2026
wshwsh12 pushed a commit to wshwsh12/client-go that referenced this pull request Feb 10, 2026
…tUntilTSO` timeout (tikv#1842)

close tikv#1843

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>
wshwsh12 pushed a commit to wshwsh12/client-go that referenced this pull request Feb 10, 2026
…tUntilTSO` timeout (tikv#1842)

close tikv#1843

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add an named error ErrCommitTSLag when waiting commitWaitUntilTSO timeout

4 participants