Skip to content

kvserver: add tracing to TestStoreRangeSplitBackpressureWrites#160459

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
stevendanna:ssd/TestStoreRangeSplitBackpressureWrites-logging
Jan 6, 2026
Merged

kvserver: add tracing to TestStoreRangeSplitBackpressureWrites#160459
craig[bot] merged 1 commit intocockroachdb:masterfrom
stevendanna:ssd/TestStoreRangeSplitBackpressureWrites-logging

Conversation

@stevendanna
Copy link
Copy Markdown
Collaborator

In #160141 we observed this test fail. This test is rather stable but does depend on two goroutines sending batches to the KV server within a 100ms timeout. One suspicion is that very rarely we are too slow and this time-based coordination is not enough.

This test is rather stable and adding more sophisticated coordination will further complicate an already complicated test. Here, I add some tracing that will hopefully allow us to confirm the timing based theory if this ever happens again.

Fixes #160141

Release note: None

@stevendanna stevendanna requested a review from a team as a code owner January 5, 2026 12:23
@stevendanna stevendanna requested a review from tbg January 5, 2026 12:23
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

given how rare this is, I'm fine with closing this issue as part of this PR.

But let's also skip this test under duress, since we know it has a timing-based vulnerability.

@tbg made 2 comments.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @stevendanna).


pkg/kv/kvserver/client_split_test.go line 1785 at r1 (raw file):

					close(blockSplits)
					t.Fatalf("delete was not blocked on split, returned err %v", err)
				case <-time.After(100 * time.Millisecond):

So the thinking is that we hit this branch before the put/del even got anywhere, and so then we unblock the split prematurely, and get no error from the <-resCh checks below, right?

Annoying problem. We could have a callback via a testing knob that confirms that these are blocking.

Copy link
Copy Markdown
Collaborator Author

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevendanna made 1 comment.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg).


pkg/kv/kvserver/client_split_test.go line 1785 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

So the thinking is that we hit this branch before the put/del even got anywhere, and so then we unblock the split prematurely, and get no error from the <-resCh checks below, right?

Annoying problem. We could have a callback via a testing knob that confirms that these are blocking.

Exactly. I started looking at a testing knob but none of the existing ones seemed like they would be called at the right time (TestingRequestFilter is called after we block on backpressure and moving that feels like it would be more disruptive to other tests than it is worth) and it didn't seem worth it to add a knob for this single test that is otherwise pretty stable.

In cockroachdb#160141 we observed this test fail. This test is rather stable but
does depend on two goroutines sending batches to the KV server within a
100ms timeout. One suspicion is that very rarely we are too slow and
this time-based coordination is not enough.

This test is rather stable and adding more sophisticated coordination
will further complicate an already complicated test. Here, I add some
tracing that will hopefully allow us to confirm the timing based theory
if this ever happens again.

Fixes cockroachdb#160141

Release note: None
@stevendanna stevendanna force-pushed the ssd/TestStoreRangeSplitBackpressureWrites-logging branch from 5faead5 to a1d38b8 Compare January 6, 2026 13:34
@stevendanna
Copy link
Copy Markdown
Collaborator Author

bors r=tbg

craig bot pushed a commit that referenced this pull request Jan 6, 2026
159819: cmd: add ghsearchdump and ghweeklysummary r=wenyihu6 a=wenyihu6

Epic: none 
Release note: none

---
**ghsearchdump**


---
**cmd/ghweeklysummary: add tool for generating weekly PR summaries**

```
Example usage:

# Last 1 week
go run ./pkg/cmd/ghweeklysummary --author wenyihu6 --weeks 1

# Specific date range
go run ./pkg/cmd/ghweeklysummary --author wenyihu6 --after 2025-01-01 --before 2025-02-01

# Save to file
go run ./pkg/cmd/ghweeklysummary --author wenyihu6 > weekly_summary.md
```

Release Notes: None
Epic: None

---
**cmd: move kv dev tools under pkg/cmd/kv/**

Move ghsearchdump and ghweeklysummary under pkg/cmd/kv/ for clearer
ownership. This seems okay given we already have pkg/cmd/allocsim.

```
Example usage:

❯ go run ./pkg/cmd/ghweeklysummary --author wenyihu6 --weeks 1

❯ go run ./pkg/cmd/kv/ghsearchdump \
  --search "repo:cockroachdb/cockroach author:wenyihu6" \
  --contribution-after 2025-12-12
tracking contributions from: [wenyihu6]
```



160459: kvserver: add tracing to TestStoreRangeSplitBackpressureWrites r=tbg a=stevendanna

In #160141 we observed this test fail. This test is rather stable but does depend on two goroutines sending batches to the KV server within a 100ms timeout. One suspicion is that very rarely we are too slow and this time-based coordination is not enough.

This test is rather stable and adding more sophisticated coordination will further complicate an already complicated test. Here, I add some tracing that will hopefully allow us to confirm the timing based theory if this ever happens again.

Fixes #160141

Release note: None

160499: pgwire: handle negative scale r=drewkimball,yuzefovich a=dils2k

The node could crash if it received a decimal with a negative scale value. This commit adds a check and returns an error instead.

Epic: CRDB-57539
Fixes: #158884

Release note (bug fix): CockroachDB could previously crash when handling decimals with negative scales via extended PGWire protocol and this has been fixed (an error is returned, similar to PG).

160526: sql: fix TestUnsplitRanges to work with external test tenants r=rafiss a=rafiss

Previously, TestUnsplitRanges was skipped in external test tenant mode because it scanned meta ranges directly and performed AdminSplit/ AdminUnsplit operations that external tenants cannot do.

This commit fixes the test by:
1. Using the system layer's DB for meta range operations (scanning meta ranges, checking sticky bits, splitting ranges) since tenants cannot access meta ranges directly.
2. Using the application layer's DB for table data operations which tenants can access.
3. Granting the CanAdminUnsplit capability to the external tenant so the GC job can unsplit ranges after dropping tables/indexes.

Resolves: #142388
Epic: CRDB-48944

Release note: None

Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: wenyihu6 <wenyi@cockroachlabs.com>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Co-authored-by: dils2k <dils.matchanov@gmail.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 6, 2026

Build failed (retrying...):

@craig craig bot merged commit f50ee9d into cockroachdb:master Jan 6, 2026
26 checks passed
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 6, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kv/kvserver: TestStoreRangeSplitBackpressureWrites failed

3 participants