
admission: Split stats for WorkQueueMetrics#88076

Merged
craig[bot] merged 1 commit into cockroachdb:master from andrewbaptist:work_queue_stats on Sep 27, 2022

Conversation

@andrewbaptist

@andrewbaptist andrewbaptist commented Sep 16, 2022

This commit splits the WorkQueueMetrics stats by priority.
Because a large number of new stats could be generated as a
result of this split, only the "important" priorities are
collected and considered for each queue.

Release note: None

@cockroach-teamcity
Member

This change is Reviewable

@andrewbaptist andrewbaptist force-pushed the work_queue_stats branch 15 times, most recently from f4ed0ac to 07bf2d7 on September 21, 2022 19:35
@andrewbaptist andrewbaptist force-pushed the work_queue_stats branch 3 times, most recently from 2389725 to 905c884 on September 22, 2022 15:49
@andrewbaptist andrewbaptist marked this pull request as ready for review September 22, 2022 17:39
@andrewbaptist andrewbaptist requested a review from a team as a code owner September 22, 2022 17:39
@andrewbaptist andrewbaptist requested a review from a team September 22, 2022 17:39
@andrewbaptist andrewbaptist requested a review from a team as a code owner September 22, 2022 17:39
Collaborator

@sumeerbhola sumeerbhola left a comment

Reviewed 5 of 9 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @irfansharif)


pkg/server/server.go line 222 at r1 (raw file):

		admissionOptions.Override(opts)
	}
	admissionOptions.Settings = st

why this change to pull out settings from the options?


pkg/ts/catalog/chart_catalog.go line 3473 at r1 (raw file):

					"admission.errored.kv.0",
					"admission.errored.kv.100",
					"admission.errored.sql-kv-response.0",

is this 0 and 100 the priority values?
I am wondering why we've picked these to populate here and not the others in the admissionpb.WorkPriority enum.


pkg/util/admission/grant_coordinator.go line 158 at r1 (raw file):

	// This is IO work, so override the usesTokens value.
	opts.usesTokens = true
	// TODO(sumeer): add per-store WorkQueue state for debug.zip and db console.

this revised code is also sharing the WorkQueue metrics across all stores, yes?
This would be a good thing to fix in a followup PR.


pkg/util/admission/grant_coordinator.go line 970 at r1 (raw file):

// Prevent the linter from emitting unused warnings.
var _ = NewGrantCoordinatorSQL

why is this removed?


pkg/util/admission/work_queue.go line 1598 at r1 (raw file):

	// TODO(abaptist): This is done to pre-register stats. Need to check that we
	// getOrCreate "enough" of the priorities to be useful.
	for _, pri := range applicablePriorities {

if the lazy creation is working correctly, this eager creation seems unnecessary. Is it needed for the list in chart_catalog?


pkg/kv/kvserver/client_split_test.go line 1224 at r1 (raw file):

			// Set maxBytes to something small so we can exceed the maximum split
			// size without adding 2x64MB of data.
			const maxBytes = 1 << 17

why this change?

Author

@andrewbaptist andrewbaptist left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @sumeerbhola)


pkg/server/server.go line 222 at r1 (raw file):

Previously, sumeerbhola wrote…

why this change to pull out settings from the options?

There was inconsistency in NewGrantCoordinator in terms of how it processed the settings. This wasn't strictly necessary; however, the old structure made it harder to follow where the settings vs. options came from. The goal was really to have more immutable options and settings coming through this code to make reasoning and testing easier. I don't mind reverting this part of the patch (or pulling it into another commit).


pkg/ts/catalog/chart_catalog.go line 3473 at r1 (raw file):

Previously, sumeerbhola wrote…

is this 0 and 100 the priority values?
I am wondering why we've picked these to populate here and not the others in the admissionpb.WorkPriority enum.

These were the most commonly used values (0, 100, and -30). Once lazy loading is in place, ideally this won't be necessary.


pkg/util/admission/grant_coordinator.go line 158 at r1 (raw file):

Previously, sumeerbhola wrote…

this revised code is also sharing the WorkQueue metrics across all stores, yes?
This would be a good thing to fix in a followup PR.

This comment seemed out of place after this change; the metrics are now more obviously inserted as part of makeStoreRequesterFunc.


pkg/util/admission/grant_coordinator.go line 970 at r1 (raw file):

Previously, sumeerbhola wrote…

why is this removed?

Not removed, just moved to right before the method to make it more obvious what it refers to. The NewGrantCoordinatorSQL method will need a little work before it is enabled; it took me a while to understand how it worked (since it repeats metric names) because I didn't initially see that it was unused.


pkg/util/admission/work_queue.go line 1598 at r1 (raw file):

Previously, sumeerbhola wrote…

if the lazy creation is working correctly, this eager creation seems unnecessary. Is it needed for the list in chart_catalog?

Lazy creation works correctly from the metrics perspective and the metrics get registered; unfortunately, the registry does not currently handle lazy addition (see https://cockroachlabs.slack.com/archives/C01CNRP6TSN/p1663779962687259 )


pkg/kv/kvserver/client_split_test.go line 1224 at r1 (raw file):

Previously, sumeerbhola wrote…

why this change?

Two tests started failing after the change to stats. This test initializes a system with a greatly reduced maxBytes and then attempts to create a maxBytes-sized value and check what happens on split. With the new stats, we push the metrics value over the 64 KB limit, so this test fails because the metrics range also gets added to the split queue. The 1 << 16 was somewhat arbitrary, so this change simply makes the limit large enough that the metrics range is no longer included.

Collaborator

@sumeerbhola sumeerbhola left a comment

:lgtm:

Reviewed 4 of 9 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist, @irfansharif, and @sumeerbhola)


-- commits line 7 at r1:
Please add something like
Informs #82743


pkg/server/server.go line 222 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

There was inconsistency in the NewGrantCoordinator in terms of how it processed the settings. This wasn't strictly necessary, however, it made it harder to follow the code in terms of where the settings vs options came from. The goal was really to have more immutable options and settings coming through this code to make reasoning and testing easier. I don't mind reverting this part of the patch (or pulling into another commit).

btw, I don't think we have a principle regarding separating immutable and mutable configuration -- we tend to pile them all into Config/Options structs. Maybe kv code is principled in this regard, in which case I'd also like to know what the idiom is. One example is base.StorageConfig which includes cluster.Settings.

But I am ok with your change.


pkg/ts/catalog/chart_catalog.go line 3473 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

These were the values that were most commonly used (0, 100 and -30). Once lazy loading is in place in the future, ideally this won't be necessary to do.

So will we not see metrics for -30, -50, -100 for kv, kv-stores etc. given that the lazy registration does not work?
If yes, we should eagerly register them. One of the reasons we wanted this change is to use it when diagnosing workloads where we have user-facing work (typically priority 0) and various bulk operations running concurrently (the latter will use -30 and -100).


pkg/util/admission/grant_coordinator.go line 158 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

This comment seemed out of place after this change; the metrics are now more obviously inserted as part of makeStoreRequesterFunc.

nit: makeStoreRequesterFunc is accepting metrics created in makeStoresGrantCoordinators and not creating them itself, so we need a TODO comment about per-store metrics somewhere in StoreGrantCoordinators. See my comment below about a better place.


pkg/util/admission/grant_coordinator.go line 401 at r1 (raw file):

	// These metrics are shared across all stores and all
	// priorities across stores (even the coarser workClasses, which are a
	// mapping from priority, share the same metrics).

Comment needs updating since the "all priorities" part is stale now. Can you add a TODO here about per-store metrics?


pkg/util/admission/grant_coordinator.go line 970 at r1 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

Not removed, just moved up to right before the method to make it more obvious what it was referring to. The NewGrantCoordinatorSQL method will need a little work before it is enabled, and it took me a while to understand how it worked (since it was repeating metrics names) as I didn't see that it was unused.

Ack

"admission.admitted.kv.0",
"admission.admitted.kv.100",
"admission.admitted.sql-kv-response.0",
"admission.admitted.sql-kv-response.100",
Contributor

Instead of the numerical suffix, can we use the name of priority level instead? I can see us needing to slot in other priorities numerically without wanting that to reflect in different metric names/updating UI/warranting a release note. It's also easier to understand what these specific metrics represent without needing to look at code.

case "init-store-grant-coordinator":
clearRequesterAndCoord()
metrics := makeGrantCoordinatorMetrics()
workQueuMetrics := makeWorkQueueMetrics("", registry)
Contributor

Typo.

// getOrCreate will return the metric if it exists or create it and then return
// it if it didn't previously exist.
func (m *WorkQueueMetrics) getOrCreate(priority admissionpb.WorkPriority) workQueueMetricsSingle {
// Try loading from the map first. This will not fin
Contributor

Incomplete sentence.

statPrefix := fmt.Sprintf("%v.%v", m.name, priority)
val, ok = m.byPriority.LoadOrStore(priority, makeWorkQueueMetricsSingle(statPrefix))
if !ok {
m.registry.AddMetricStruct(val)
Contributor

@irfansharif irfansharif Sep 26, 2022

With this lazy registration -- do these metrics show up in CRDB's internal timeseries database? I'm asking whether it works with everything under pkg/server/status/recorder.go since a lot of that code was written with upfront registration in mind.

}
// TODO(abaptist): This is done to pre-register stats. Need to check that we
// getOrCreate "enough" of the priorities to be useful.
for _, pri := range applicablePriorities {
Contributor

So it seems we're doing pre-registration here anyway. Do we need the LoadOrStore above when finding the right metric? If everything is properly pre-registered, shouldn't we fatal if we don't find something in the sync.Map?

Author

@andrewbaptist andrewbaptist left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif and @sumeerbhola)


-- commits line 7 at r1:

Previously, sumeerbhola wrote…

Please add something like
Informs #82743

Done


pkg/ts/catalog/chart_catalog.go line 3468 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Instead of the numerical suffix, can we use the name of priority level instead? I can see us needing to slot in other priorities numerically without wanting that to reflect in different metric names/updating UI/warranting a release note. It's also easier to understand what these specific metrics represent without needing to look at code.

Done


pkg/ts/catalog/chart_catalog.go line 3473 at r1 (raw file):

Previously, sumeerbhola wrote…

So will we not see metrics for -30, -50, -100 for kv, kv-stores etc. given that the lazy registration does not work?
If yes, we should eagerly register them. One of the reasons we wanted this change is to use it when diagnosing workloads where we have user-facing work (typically priority 0) and various bulk operations running concurrently (the latter will use -30 and -100).

Done. I added some additional stats while still trying to avoid inflating the total number of stats. Missing a few stats isn't too bad because they will fall into the "default" bucket.


pkg/util/admission/grant_coordinator.go line 158 at r1 (raw file):

Previously, sumeerbhola wrote…

nit: makeStoreRequesterFunc is accepting metrics created in makeStoresGrantCoordinators and not creating them itself, so we need a TODO comment about per-store metrics somewhere in StoreGrantCoordinators. See my comment below about a better place.

Done


pkg/util/admission/grant_coordinator.go line 401 at r1 (raw file):

Previously, sumeerbhola wrote…

Comment needs updating since the "all priorities" part is stale now. Can you add a TODO here about per-store metrics?

Done


pkg/util/admission/grant_coordinator.go line 970 at r1 (raw file):

Previously, sumeerbhola wrote…

Ack

Done


pkg/util/admission/work_queue.go line 1526 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Incomplete sentence.

Done


pkg/util/admission/work_queue.go line 1537 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

With this lazy registration -- do these metrics show up in CRDB's internal timeseries database? I'm asking whether it works with everything under pkg/server/status/recorder.go since a lot of that code was written with upfront registration in mind.

No - the lazy registration today does not feed either the internal timeseries database or Prometheus. I hope we can remove the pre-registration in the future, but I didn't dive into that code now.


pkg/util/admission/work_queue.go line 1598 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

So it seems we're doing pre-registration here anyway. Do we need the LoadOrStore above when finding the right metric? If everything is properly pre-registered, shouldn't we fatal if we don't find something in the sync.Map?

We don't currently pre-register everything so this shouldn't be fatal. I want to remove pre-registration in the future, but didn't want to try and do this as part of this commit. I kept LoadOrStore for that reason.


pkg/util/admission/granter_test.go line 124 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Typo.

Done

Collaborator

@sumeerbhola sumeerbhola left a comment

Reviewed 4 of 5 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andrewbaptist, @irfansharif, and @sumeerbhola)


pkg/util/admission/work_queue.go line 1529 at r3 (raw file):

	Admitted        *metric.Counter
	Errored         *metric.Counter
	WaitDurationSum *metric.Counter

You are removing this because we don't find it useful, yes?
My original motivation was to be able to look at mean latency, but it is not possible to compute delta(waitDurationSum)/delta(requested) in our internal metrics system, though it is possible with Prometheus.
I'll leave it to @irfansharif and your judgement on whether this is useful or not.

Author

@andrewbaptist andrewbaptist left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif and @sumeerbhola)


pkg/util/admission/work_queue.go line 1529 at r3 (raw file):

Previously, sumeerbhola wrote…

You are removing this because we don't find it useful, yes?
My original motivation was to be able to look at mean latency, but it is not possible to compute delta(waitDurationSum)/delta(requested) in our internal metrics system, though it is possible with Prometheus.
I'll leave it to @irfansharif and your judgement on whether this is useful or not.

This is automatically computed as part of the WaitDurations histogram, and the values are the same, so this was a redundant stat. I'm not sure about the internal metrics, but I assume they could use the sum metric. Prometheus can definitely use the histogram counters to get the sum.

This commit splits the WorkQueueMetrics stats by priority.
Because a large number of new stats could be generated as a
result of this split, only the "important" priorities are
collected and considered for each queue.
Informs cockroachdb#82743.

Release note: None
Contributor

@irfansharif irfansharif left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andrewbaptist, @irfansharif, and @sumeerbhola)


pkg/ts/catalog/chart_catalog.go line 3463 at r4 (raw file):

					"admission.admitted.elastic-cpu",
					"admission.errored.elastic-cpu",
					"admission.admitted.elastic-cpu.bulk",

Optional nit: add the "-pri" suffix so it's even clearer.


pkg/util/admission/work_queue.go line 1529 at r3 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

This is automatically computed as part of the WaitDurations histogram, and the values are the same. So this was a redundant state. I'm not sure about the internal metrics, but I assume they could use the sum metric. Prometheus can definitely use the histogram counter to get the sum

n := float64(mtr.TotalCountWindowed())
fn(name+"-count", n)
avg := mtr.TotalSumWindowed() / n
if math.IsNaN(avg) || math.IsInf(avg, +1) || math.IsInf(avg, -1) {
avg = 0
}
fn(name+"-avg", avg)
for _, pt := range recordHistogramQuantiles {
fn(name+pt.suffix, mtr.ValueAtQuantileWindowed(pt.quantile))
}

The sum is missing, but the avg is computed already. So this is redundant and can be removed.

@andrewbaptist
Author

bors r=irfansharif

@craig
Contributor

craig bot commented Sep 27, 2022

Build succeeded:

@craig craig bot merged commit 471f9b6 into cockroachdb:master Sep 27, 2022
@andreimatei
Contributor

This patch increased the number of timeseries exported to tsdb by crdb nodes from 731 to 1739. If there are histograms in there, it might be even worse for Prometheus.
I don't think we can afford anything like it...

@andrewbaptist
Author

I was concerned about the impact of this change. I can easily disable all of these and will hold off on any backporting. I might need to rethink whether there is a way to get the data we want without exploding the stats count.

@andreimatei
Contributor

I think I fooled myself when counting, sorry. I now think that this patch only took the timeseries count from 1543 to 1739. So, not quite as bad as I thought. Still significant, though.

Collaborator

@sumeerbhola sumeerbhola left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/util/admission/work_queue.go line 1529 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

n := float64(mtr.TotalCountWindowed())
fn(name+"-count", n)
avg := mtr.TotalSumWindowed() / n
if math.IsNaN(avg) || math.IsInf(avg, +1) || math.IsInf(avg, -1) {
avg = 0
}
fn(name+"-avg", avg)
for _, pt := range recordHistogramQuantiles {
fn(name+pt.suffix, mtr.ValueAtQuantileWindowed(pt.quantile))
}

The sum is missing, but the avg is computed already. So this is redundant and can be removed.

The histogram reporting is limited to work that actually waited in the queue. So if a large fraction did not wait in the queue, it will overreport the latency percentiles. We should fix that by reporting everything. I did not do this when writing the first version of this code, because I was worried about lock contention overhead in the histogram and did not have time to investigate how much of a problem it really was.

dhartunian added a commit to dhartunian/cockroach that referenced this pull request Nov 8, 2022
Previously, `AggHistogram` instances would not persist their quantiles to tsdb
due to a missing interface implementation of `metrics.WindowedHistogram`. This
PR adds a trivial implementation that delegates to the aggregate histogram
instance within the struct.

This is relatively safe to do even though an `AggHistogram` could contain many
children because we are only exporting a single set of aggregate quantiles
per-`AggHistogram`. The children are only iterated over via the
`PrometheusIterable` interface which is used by the prometheus exporter, but
not by the metrics recorder.

NOTE: This backported commit includes a test fix to the `kv` package that was
lifted from cockroachdb#88076.

Release note (bug fix, ops change): Previously, certain aggregate histograms
would appear in `_status/vars` but not be available for graphing in the DB
Console. These are now made available. They include changefeed-related
histograms, and row-level-TTL histograms.

Epic: None
@andrewbaptist andrewbaptist deleted the work_queue_stats branch December 15, 2023 21:33
