
kvserver: deflake TestReadLoadMetricAccounting #159877

Merged
craig[bot] merged 1 commit into cockroachdb:master from tbg:deflake-read-load-metric-accounting
Dec 19, 2025

tbg commented Dec 19, 2025

TestReadLoadMetricAccounting has a history of flaking due to lease-related writes interfering with load metric measurements.

Issue #141716 (and #141586) identified the same failure signature:

Error: Max difference between 0 and 85 allowed is 4, but difference was -85
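
The message has the shape of a tolerance check: the expected value is 0 write bytes, the observed value is 85, the allowed deviation is 4, and the signed difference is expected minus actual. A minimal sketch of that arithmetic follows; `withinDelta` is a hypothetical helper written for illustration, not the assertion the test actually calls.

```go
package main

import "fmt"

// withinDelta is a hypothetical helper, not the test's real assertion.
// It reports whether actual lies within delta of expected and returns
// the signed difference expected - actual.
func withinDelta(expected, actual, delta float64) (bool, float64) {
	diff := expected - actual
	if diff < -delta || diff > delta {
		return false, diff
	}
	return true, diff
}

func main() {
	// Values taken from the failure signature: expected 0, got 85, tolerance 4.
	ok, diff := withinDelta(0, 85, 4)
	fmt.Printf("ok=%t diff=%v\n", ok, diff) // prints "ok=false diff=-85"
}
```

A read-only request should contribute no write bytes, so a reading of 85 falls well outside the tolerance of 4.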

The root cause was identified by @pav-kv: an "unexpected" leader lease upgrade write was interfering with the test's write bytes measurements. PR #141843 added tc.MaybeWaitForLeaseUpgrade() to wait for lease upgrades before starting measurements.

The fix from #141843 IS present in the failing SHA. However, the test still flaked with the same error signature (85 write bytes when expecting 0).

The logs show:

  1. AddSSTableRequest evaluated (test setup)
  2. Many LeaseInfoRequest polls (from MaybeWaitForLeaseUpgrade)
  3. RequestLeaseRequest (the lease upgrade write)
  4. More LeaseInfoRequest polls
  5. "lease is now of type: LeaseLeader" - upgrade complete
  6. "test #1" - test loop begins
  7. GetRequest evaluated (the actual test request)
  8. Assertion fails - 85 write bytes observed

The race condition is subtle: MaybeWaitForLeaseUpgrade() waits until FindRangeLeaseEx() reports the lease is upgraded, but it does not guarantee that the write bytes have been recorded to load stats. This is because stats are recorded "awkwardly late" on the client goroutine (SendWithWriteBytes).
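
The ordering problem can be reproduced in isolation. The sketch below is a simplified model, not CockroachDB code: `fakeReplica`, `upgradeLease`, and `observeRace` are hypothetical names, and channels are used to force the unlucky interleaving deterministically. It shows that a waiter who observes "lease upgraded" and then resets the counters can still see the upgrade's write bytes arrive afterwards.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// fakeReplica is a hypothetical stand-in for a replica's load stats.
type fakeReplica struct {
	writeBytes int64 // accessed atomically
}

// upgradeLease models the lease upgrade write: the upgrade becomes visible
// first (applied is closed), and only later are its bytes recorded.
func (r *fakeReplica) upgradeLease(applied chan<- struct{}, recordNow <-chan struct{}, done chan<- struct{}) {
	close(applied)                     // pollers now see the upgraded lease
	<-recordNow                        // models the late recording on the client goroutine
	atomic.AddInt64(&r.writeBytes, 85) // the stats land after the upgrade is visible
	close(done)
}

// observeRace returns the write bytes seen right after the reset and after
// the late recording has happened.
func observeRace() (atReset, afterLateRecord int64) {
	r := &fakeReplica{}
	applied := make(chan struct{})
	recordNow := make(chan struct{})
	done := make(chan struct{})
	go r.upgradeLease(applied, recordNow, done)

	<-applied                          // analogous to the lease-upgrade wait returning
	atomic.StoreInt64(&r.writeBytes, 0) // the test resets load stats
	atReset = atomic.LoadInt64(&r.writeBytes)

	close(recordNow) // the delayed recording now runs
	<-done
	return atReset, atomic.LoadInt64(&r.writeBytes)
}

func main() {
	atReset, after := observeRace()
	fmt.Println(atReset, after) // prints "0 85"
}
```

In the real test the interleaving is not forced, which is why the failure shows up only occasionally.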

The fix:

  1. Wraps each test case iteration in SucceedsSoon
  2. Resets load stats, sends the request, checks results
  3. If any stat doesn't match (due to background activity like lease upgrades), returns an error to trigger retry
  4. Adds a comment noting that test cases must be idempotent (they are—all reads)
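
The steps above can be sketched as a reset, send, check loop. This is an illustration under stated assumptions, not the code from the PR: `retryUntil` is a local stand-in for a SucceedsSoon-style helper, and `loadStats`, `runCase`, and the `pending` interference counter are hypothetical names used to simulate one stray background write.

```go
package main

import (
	"fmt"
	"time"
)

// loadStats is a hypothetical stand-in for the replica's load stats.
type loadStats struct{ writeBytes, readKeys int64 }

// retryUntil is a local stand-in for a SucceedsSoon-style helper: it calls
// fn with backoff until fn returns nil or the timeout elapses.
func retryUntil(timeout time.Duration, fn func() error) error {
	deadline := time.Now().Add(timeout)
	backoff := time.Millisecond
	for {
		err := fn()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("condition not met within %s: %w", timeout, err)
		}
		time.Sleep(backoff)
		if backoff < 100*time.Millisecond {
			backoff *= 2
		}
	}
}

// runCase runs one idempotent (read-only) test case. pending simulates
// background write bytes that land once, during the first attempt.
func runCase(stats *loadStats, pending *int64) (attempts int, err error) {
	err = retryUntil(5*time.Second, func() error {
		attempts++
		*stats = loadStats{}         // reset load stats
		stats.readKeys++             // send the read request (simulated)
		stats.writeBytes += *pending // stray background write, if any
		*pending = 0
		if stats.writeBytes != 0 { // check results; mismatch triggers a retry
			return fmt.Errorf("write bytes: got %d, want 0", stats.writeBytes)
		}
		return nil
	})
	return attempts, err
}

func main() {
	stats := &loadStats{}
	pending := int64(85)
	attempts, err := runCase(stats, &pending)
	fmt.Println(attempts, err, stats.writeBytes, stats.readKeys) // prints "2 <nil> 0 1"
}
```

Retrying is only safe because each case is a read: re-sending it does not change the state the next attempt measures.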

Related Issues/PRs

| Issue/PR | Status | Relevance |
|----------|--------|-----------|
| #159719 | OPEN | Current failure |
| #141716 | CLOSED | Duplicate, Feb 2025 |
| #141586 | CLOSED | Original issue, Feb 2025 |
| #141843 | MERGED | Deflake attempt (wait for lease upgrade) |
| #141599 | MERGED | Added logging to help debug |
| #141905 | CLOSED | Duplicate |
| #134799 | CLOSED | Older occurrence |

This is more robust than trying to synchronize with specific background operations because it handles any source of interference, not just lease upgrades.

Epic: none
Closes #159719.

tbg force-pushed the deflake-read-load-metric-accounting branch from 6a87d6a to eee13a8 December 19, 2025 09:04

Wrap each test case in testutils.SucceedsSoon to handle interference
from background activity (e.g., async stats recording from lease
upgrades). If stats don't match expectations due to background writes,
the test resets and retries instead of failing immediately.

Fixes: cockroachdb#159719

Release note: None

tbg force-pushed the deflake-read-load-metric-accounting branch from eee13a8 to 7be2f39 December 19, 2025 09:57
tbg marked this pull request as ready for review December 19, 2025 10:11
tbg requested a review from a team as a code owner December 19, 2025 10:11
tbg requested a review from pav-kv December 19, 2025 10:11

tbg commented Dec 19, 2025

bors r+

Not sure what's wrong with the pull_request build - looks like an infra problem. Asked here


craig bot commented Dec 19, 2025

craig bot merged commit 5b180f5 into cockroachdb:master Dec 19, 2025
23 of 25 checks passed

tbg commented Feb 11, 2026

blathers backport 26.1 25.4

tbg deleted the deflake-read-load-metric-accounting branch February 11, 2026 10:06

blathers-crl bot commented Feb 11, 2026

Based on the specified backports for this PR, I applied new labels to the following linked issue(s). Please adjust the labels as needed to match the branches actually affected by the issue(s), including adding any known older branches.


Issue #159719: branch-release-25.4, branch-release-26.1.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
