server: batch large span stats requests by zachlite · Pull Request #109464 · cockroachdb/cockroach

zachlite · 2023-08-24T21:13:16Z

Previously, span stats requests could only request a maximum of 500 spans at a time, otherwise the request would return an error.

This commit adds transparent batching at both the system tenant's handler and the secondary tenant's handler. The cluster setting server.span_stats.span_batch_limit is used to control the size of the batches. The default value has been raised to 1000.

Resolves #105638
Epic: CRDB-30635
Release note (ops change): Requests for database details or table details from the UI, or usages of SHOW RANGES WITH DETAILS are no longer subject to errors if the number of requested spans is too large.

cockroach-teamcity · 2023-08-24T21:13:26Z

This change is

maryliag

looks like you're missing a ./dev gen bazel
Also, you have tests failing, so not sure if there is something wrong on the code itself

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier)

j82w

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @zachlite)

pkg/server/span_stats_server.go line 387 at r2 (raw file):

// verifySpanStatsRequest returns an error if the request can not be serviced.
// Requests can not be serviced if the active cluster version is less than 23.1,
// or if the request is made using the pre 23.1 format.

Why can't the request serviced? Blocking mixed versions is painful for users and TSEs trying to troubleshoot.

pkg/server/span_stats_server.go line 433 at r2 (raw file):

	// Allocate memory once for the required number of batches.
	totalSpans := len(req.Spans)
	batches := make([][]roachpb.Span, 0, (totalSpans+batchSize-1)/batchSize)

What is the benefit of batching? Why not just send everything at once?

zachlite

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @j82w)

pkg/server/span_stats_server.go line 387 at r2 (raw file):

Previously, j82w (Jake) wrote…

Why can't the request serviced? Blocking mixed versions is painful for users and TSEs trying to troubleshoot.

The semantics of the nodeID field changed between 22.2 and 23.1, and the startKey and endKey params were replaced with the spans field. Basically, the behavior of the endpoint and the wire formats changed enough between 22.2 and 23.1 to warrant breaking compatibility.

But there are no proto behavior changes in this PR - I was just moving some code around and figured I'd add a comment.

pkg/server/span_stats_server.go line 433 at r2 (raw file):

Previously, j82w (Jake) wrote…

What is the benefit of batching? Why not just send everything at once?

Resource consumption. If a SQL pod makes this request, we want to be able to attribute the cost of the request to the pod, while we keep the burden away from KV. KV servicing lots of small requests will consume less memory than one big request. At least in theory :D

j82w

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @zachlite)

pkg/server/span_stats_server.go line 446 at r2 (raw file):

	res.SpanToStats = make(map[string]*roachpb.SpanStats, totalSpans)

	for _, batch := range batches {

Can this be optimized? The batches array allocation can be avoid by just merging the logic above with this loop. A slice can be used to limit the spans being sent.

j82w · 2023-08-31T12:01:21Z

pkg/server/span_stats_server.go line 454 at r2 (raw file):

		for k, v := range batchRes.SpanToStats {
			res.SpanToStats[k] = v

Should the batching logic first split the request by node id then by batch size? It would be bad for perf if the first 800 are for node 1, then next 800 are for node 2, etc because it would end up doing multiple calls to node 2.

j82w · 2023-08-31T12:02:10Z

pkg/server/status.go line 3633 at r2 (raw file):

	batchSize := int(roachpb.SpanStatsBatchLimit.Get(&s.st.SV))
	return batchedSpanStats(ctx, req, s.sqlServer.tenantConnect.SpanStats, batchSize)

What is the reason the batch logic was not done in the s.sqlServer.tenantConnect.SpanStats which already has to split the request by node id?

zachlite

Dismissed @j82w from a discussion.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @j82w)

pkg/server/span_stats_server.go line 454 at r2 (raw file):

Previously, j82w (Jake) wrote…

Should the batching logic first split the request by node id then by batch size? It would be bad for perf if the first 800 are for node 1, then next 800 are for node 2, etc because it would end up doing multiple calls to node 2.

Discussed offline. Batches need to happen before node ID lookups.

pkg/server/status.go line 3633 at r2 (raw file):

Previously, j82w (Jake) wrote…

What is the reason the batch logic was not done in the s.sqlServer.tenantConnect.SpanStats which already has to split the request by node id?

Discussed offline. Requests directly to the system tenant should be batched too.

j82w

Reviewed 1 of 4 files at r1, 1 of 4 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @zachlite)

pkg/server/span_stats_server.go line 466 at r3 (raw file):

		}

		for k, v := range batchRes.SpanToStats {

Is there any chance the same key can be returned multiple times? 1 from the lease holder and 1 from a non-lease holder? Just want to confirm there doesn't need to be some for of merge logic instead of just replacing the value.

pkg/server/span_stats_server_test.go line 88 at r3 (raw file):

		// Assert that the responses from each batch are returned to the caller.
		require.Equal(t, tcase.nSpans, len(res.SpanToStats))

It would be good to validate the actual content is valid.

Previously, span stats requests could only request a maximum of 500 spans at a time, otherwise the request would return an error. This commit adds transparent batching at both the system tenant's handler and the secondary tenant's handler. The cluster setting `server.span_stats.span_batch_limit` is used to control the size of the batches. The default value has been raised to 1000. Resolves cockroachdb#105638 Epic: CRDB-30635 Release note (ops change): Requests for database details or table details from the UI, or usages of `SHOW RANGES WITH DETAILS` are no longer subject to errors if the number of requested spans is too large.

zachlite · 2023-09-01T15:36:16Z

bors r+

craig · 2023-09-01T16:32:37Z

Build succeeded:

Bazel Essential CI (Cockroach)

blathers-crl · 2023-09-01T16:32:49Z

Encountered an error creating backports. Some common things that can go wrong:

The backport branch might have already existed.
There was a merge conflict.
The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.

error setting reviewers, but backport branch blathers/backport-release-23.1-109464 is ready: POST https://api.github.com/repos/cockroachdb/cockroach/pulls/109902/requested_reviewers: 422 Reviews may only be requested from collaborators. One or more of the teams you specified is not a collaborator of the cockroachdb/cockroach repository. []

Backport to branch 23.1.x failed. See errors above.

_{🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.}

zachlite requested a review from a team as a code owner August 24, 2023 21:13

zachlite requested a review from a team August 24, 2023 21:13

zachlite requested a review from a team as a code owner August 24, 2023 21:13

zachlite requested review from a team and abarganier and removed request for a team August 24, 2023 21:13

zachlite added the backport-23.1.x PAST MAINTENANCE SUPPORT: 23.1 patch releases via ER request only label Aug 24, 2023

maryliag reviewed Aug 25, 2023

View reviewed changes

zachlite force-pushed the 230821.span_stats_batches branch from e47fd1d to 1f0a990 Compare August 25, 2023 14:38

zachlite requested a review from a team as a code owner August 25, 2023 14:38

zachlite force-pushed the 230821.span_stats_batches branch from 1f0a990 to a7b8941 Compare August 25, 2023 15:08

j82w reviewed Aug 25, 2023

View reviewed changes

zachlite commented Aug 25, 2023

View reviewed changes

abarganier removed their request for review August 28, 2023 15:00

j82w reviewed Aug 30, 2023

View reviewed changes

zachlite force-pushed the 230821.span_stats_batches branch from a7b8941 to 1aba190 Compare August 31, 2023 16:39

zachlite commented Aug 31, 2023

View reviewed changes

j82w approved these changes Aug 31, 2023

View reviewed changes

zachlite force-pushed the 230821.span_stats_batches branch from 1aba190 to 4e14279 Compare September 1, 2023 14:39

craig bot merged commit d9c137d into cockroachdb:master Sep 1, 2023

blathers-crl bot mentioned this pull request Sep 1, 2023

release-23.1: server: batch large span stats requests #109902

Merged

cockroach-teamcity mentioned this pull request Sep 2, 2023

PR #109464 - server: batch large span stats requests cockroachdb/docs#17811

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

server: batch large span stats requests#109464

server: batch large span stats requests#109464
craig[bot] merged 1 commit intocockroachdb:masterfrom
zachlite:230821.span_stats_batches

zachlite commented Aug 24, 2023

Uh oh!

cockroach-teamcity commented Aug 24, 2023

Uh oh!

maryliag left a comment

Uh oh!

j82w left a comment

Uh oh!

zachlite left a comment

Uh oh!

j82w left a comment

Uh oh!

j82w commented Aug 31, 2023

Uh oh!

j82w commented Aug 31, 2023

Uh oh!

zachlite left a comment

Uh oh!

j82w left a comment

Uh oh!

zachlite commented Sep 1, 2023

Uh oh!

craig bot commented Sep 1, 2023

Uh oh!

blathers-crl bot commented Sep 1, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

zachlite commented Aug 24, 2023

Uh oh!

cockroach-teamcity commented Aug 24, 2023

Uh oh!

maryliag left a comment

Choose a reason for hiding this comment

Uh oh!

j82w left a comment

Choose a reason for hiding this comment

Uh oh!

zachlite left a comment

Choose a reason for hiding this comment

Uh oh!

j82w left a comment

Choose a reason for hiding this comment

Uh oh!

j82w commented Aug 31, 2023

Uh oh!

j82w commented Aug 31, 2023

Uh oh!

zachlite left a comment

Choose a reason for hiding this comment

Uh oh!

j82w left a comment

Choose a reason for hiding this comment

Uh oh!

zachlite commented Sep 1, 2023

Uh oh!

craig bot commented Sep 1, 2023

Uh oh!

blathers-crl bot commented Sep 1, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants