
kvserver: deflake TestReadLoadMetricAccounting #141843

Merged
craig[bot] merged 2 commits into cockroachdb:master from pav-kv:deflake-load-metric-test on Feb 21, 2025


@pav-kv (Collaborator) commented Feb 21, 2025

Occasionally, a leader lease upgrade request interferes with the metrics measured by this test. This commit makes the test wait for the lease upgrade to complete before checking metrics.

Fixes #141716

@pav-kv requested a review from arulajmani February 21, 2025 16:13
@pav-kv requested a review from a team as a code owner February 21, 2025 16:13

@pav-kv force-pushed the deflake-load-metric-test branch from 622c900 to 42e78a2 February 21, 2025 16:14
Occasionally, a leader lease upgrade request interferes with the metrics
measured by this test. This commit makes it wait for the upgrade first,
before checking metrics.

Epic: none
Release note: none
@pav-kv force-pushed the deflake-load-metric-test branch from 42e78a2 to 55d1505 February 21, 2025 16:15
@pav-kv (Collaborator, Author) commented Feb 21, 2025

bors r=tbg

@arulajmani (Collaborator) left a comment

Reviewed 3 of 3 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @pav-kv)

@craig[bot] (Contributor) commented Feb 21, 2025

craig[bot] merged commit 09d4505 into cockroachdb:master on Feb 21, 2025
22 of 24 checks passed
@pav-kv deleted the deflake-load-metric-test branch February 21, 2025 17:20
@pav-kv (Collaborator, Author) commented Feb 24, 2025

blathers backport 25.1

@blathers-crl[bot] commented Feb 24, 2025

Based on the specified backports for this PR, I applied new labels to the following linked issue(s). Please adjust the labels as needed to match the branches actually affected by the issue(s), including adding any known older branches.


Issue #141716: branch-release-25.1.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

craig bot pushed a commit that referenced this pull request Dec 19, 2025
159877: kvserver: deflake TestReadLoadMetricAccounting r=tbg a=tbg

`TestReadLoadMetricAccounting` has a history of flaking due to lease-related writes interfering with load metric measurements.

Issue #141716 (and #141586) identified the same failure signature:
```
Error: Max difference between 0 and 85 allowed is 4, but difference was -85
```
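The message format appears to match a testify-style `InDelta` assertion, which fails when `expected - actual` falls outside `[-delta, delta]`. The hand-written Go reimplementation below (an illustration, not testify's source) reproduces the arithmetic: expecting 0 write bytes with a tolerance of 4 and observing 85 gives a difference of 0 - 85 = -85, far outside the allowed band.

```go
package main

import (
	"fmt"
	"math"
)

// checkInDelta fails when expected-actual lies outside [-delta, delta].
// Hand-written reimplementation for illustration, not testify's code.
func checkInDelta(expected, actual, delta float64) error {
	diff := expected - actual
	if math.IsNaN(diff) || diff < -delta || diff > delta {
		return fmt.Errorf("Max difference between %v and %v allowed is %v, but difference was %v",
			expected, actual, delta, diff)
	}
	return nil
}

func main() {
	// A read-only request should record ~0 write bytes (tolerance 4),
	// but a stray lease write contributed 85 bytes.
	fmt.Println(checkInDelta(0, 85, 4))
	// Noise within the tolerance passes.
	fmt.Println(checkInDelta(0, 3, 4))
}
```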

The root cause was identified by `@pav-kv`: an "unexpected" leader lease upgrade write was interfering with the test's write-bytes measurements. PR #141843 added `tc.MaybeWaitForLeaseUpgrade()` to wait for lease upgrades before starting measurements.

**The fix from #141843 IS present** in the failing SHA. However, the test still flaked with the same error signature (85 write bytes when expecting 0).

The logs show:
1. AddSSTableRequest evaluated (test setup)
2. Many LeaseInfoRequest polls (from MaybeWaitForLeaseUpgrade)
3. RequestLeaseRequest (the lease upgrade write)
4. More LeaseInfoRequest polls
5. "lease is now of type: LeaseLeader" - **upgrade complete**
6. "test #1" - test loop begins
7. GetRequest evaluated (the actual test request)
8. **Assertion fails** - 85 write bytes observed

The race condition is subtle: `MaybeWaitForLeaseUpgrade()` waits until `FindRangeLeaseEx()` reports that the lease is upgraded, but it does **not** guarantee that the lease request's write bytes have already been recorded in the load stats. The stats are recorded "awkwardly late", on the client goroutine (`SendWithWriteBytes`), so the recording can land after the upgrade is already visible and inside the test's measurement window.

The fix:
1. Wraps each test case iteration in `SucceedsSoon`
2. Resets load stats, sends the request, checks results
3. If any stat doesn't match (due to background activity like lease upgrades), returns an error to trigger retry
4. Adds a comment noting that test cases must be idempotent (they are, since all of them are reads)
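The four steps above amount to a retry-until-clean loop around each test case. The Go sketch below illustrates the pattern under stated assumptions: `succeedsSoon` is a hypothetical, simplified stand-in for CockroachDB's `testutils.SucceedsSoon`, and `fakeReplica`, `checkReadCase`, and the exact-equality checks are invented for illustration (the real test compares several stats, some with a tolerance). The simulated replica injects an 85-byte background write into its first two requests; the case then passes on the third attempt.

```go
package main

import (
	"fmt"
	"time"
)

// succeedsSoon retries fn with capped exponential backoff until it returns
// nil or timeout elapses. Hypothetical stand-in for testutils.SucceedsSoon.
func succeedsSoon(timeout time.Duration, fn func() error) error {
	deadline := time.Now().Add(timeout)
	backoff := time.Millisecond
	for {
		err := fn()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("still failing after %s: %w", timeout, err)
		}
		time.Sleep(backoff)
		if backoff < 100*time.Millisecond {
			backoff *= 2
		}
	}
}

// fakeReplica is a hypothetical replica with two load counters. Its first
// `interference` requests also pick up a stray 85-byte background write.
type fakeReplica struct {
	writeBytes, readBytes int64
	interference          int
}

func (r *fakeReplica) resetLoadStats() { r.writeBytes, r.readBytes = 0, 0 }

// sendRead records n read bytes, plus stray write bytes while interference lasts.
func (r *fakeReplica) sendRead(n int64) {
	r.readBytes += n
	if r.interference > 0 {
		r.interference--
		r.writeBytes += 85
	}
}

// checkReadCase is one test-case iteration: reset stats, send the
// (idempotent) read, and return an error on any mismatch so that
// succeedsSoon retries. It reports how many attempts were needed.
func checkReadCase(r *fakeReplica, readBytes int64) (attempts int, err error) {
	err = succeedsSoon(time.Second, func() error {
		attempts++
		r.resetLoadStats()
		r.sendRead(readBytes)
		if r.writeBytes != 0 {
			return fmt.Errorf("write bytes = %d, want 0", r.writeBytes)
		}
		if r.readBytes != readBytes {
			return fmt.Errorf("read bytes = %d, want %d", r.readBytes, readBytes)
		}
		return nil
	})
	return attempts, err
}

func main() {
	attempts, err := checkReadCase(&fakeReplica{interference: 2}, 10)
	fmt.Println("attempts:", attempts, "err:", err) // attempts: 3 err: <nil>
}
```

Because every attempt resets the stats before sending, a retry only needs the request itself to be repeatable, which is why the test cases must be idempotent.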

## Related Issues/PRs

| Issue/PR | Status | Relevance |
|----------|--------|-----------|
| #159719 | OPEN | Current failure |
| #141716 | CLOSED | Duplicate, Feb 2025 |
| #141586 | CLOSED | Original issue, Feb 2025 |
| #141843 | MERGED | Deflake attempt (wait for lease upgrade) |
| #141599 | MERGED | Added logging to help debug |
| #141905 | CLOSED | Duplicate |
| #134799 | CLOSED | Older occurrence |

The retry approach is more robust than trying to synchronize with specific background operations because it tolerates **any** source of interference, not just lease upgrades.

Epic: none
Closes #159719.


Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>

Development

Successfully merging this pull request may close these issues.

kv/kvserver: TestReadLoadMetricAccounting failed

4 participants