server/status: add load related metrics by darinpp · Pull Request #79023 · cockroachdb/cockroach

darinpp · 2022-03-30T04:35:12Z

Previously there were only CPU related metrics available on the
_status/load endpoint. For serverless we need in addition to these, the
metrics which show the total number of current sql connections, the
number of sql queries executed and the number of jobs currently running
that are not idle. This PR adds the three new metrics by using selective
prometheus exporter and scraping the MetricsRecorder.

Release justification: Low risk, high reward changes to existing functionality

Release note: None

Will be re-based once #79021 and #79022 are merged.

cockroach-teamcity · 2022-03-30T04:35:21Z

This change is

abarganier · 2022-03-30T15:57:59Z

Similar to 79022, can we add any AlertingRule(s) and/or AggregationRule(s) for this?

darinpp · 2022-03-30T16:29:14Z

Similar to 79022, can we add any AlertingRule(s) and/or AggregationRule(s) for this?

Can you be a bit more specific as to what the rules should be or why do we need to add them? The two of the metrics are not new, they are just being made available through a secondary endpoint. Does that imply a new alerting or aggregation rule?

abarganier

As mentioned in the 79022, we just ask those implementing new metrics to consider it, but it's not mandatory. Thanks for looking into it & considering!

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @samiskin)

andy-kimball

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @darinpp and @samiskin)

pkg/jobs/registry.go, line 1314 at r6 (raw file):

			if isIdle {
				r.metrics.RunningNonIdleJobs.Dec(1)
				jm.CurrentlyIdle.Inc(1)

Why do we bother with the CurrentlyIdle metric if what we really need is the RunningNonIdleJobs metric? Shouldn't we get rid of CurrentlyIdle, which was only added for the benefit of the autoscaler to begin with?

darinpp

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @samiskin)

pkg/jobs/registry.go, line 1314 at r6 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Why do we bother with the CurrentlyIdle metric if what we really need is the RunningNonIdleJobs metric? Shouldn't we get rid of CurrentlyIdle, which was only added for the benefit of the autoscaler to begin with?

I wasn't aware that the only reason for these metrics is the autoscaler. I did find it however very useful for debugging purposes to have a breakdown of what actual jobs are running or idle. There isn't any other way to do that as far as I can see. Whatever we decide - we can remove the per job running/idle stats separately. Not as part of this PR

Previously there were only CPU related metrics available on the _status/load endpoint. For serverless we need in addition to these, the metrics which show the total number of current sql connections, the number of sql queries executed and the number of jobs currently running that are not idle. This PR adds the three new metrics by using selective prometheus exporter and scraping the MetricsRecorder. Release justification: Low risk, high reward changes to existing functionality Release note: None

andy-kimball

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @darinpp, and @samiskin)

pkg/server/load_endpoint.go, line 47 at r7 (raw file):

	// Exporter for the selected metrics that also show in /_status/vars.
	exporter2 := metric.MakePrometheusExporterForSelectedMetrics(map[string]struct{}{

It's gotten strange to call these "load metrics", since they now include metrics like non-idle jobs that aren't really load metrics.

darinpp

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball and @samiskin)

pkg/server/load_endpoint.go, line 47 at r7 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

It's gotten strange to call these "load metrics", since they now include metrics like non-idle jobs that aren't really load metrics.

Agree. Will remove the load endpoint #79388

darinpp · 2022-04-05T23:52:51Z

bors r+

craig · 2022-04-06T01:27:25Z

Build succeeded:

GitHub CI (Cockroach)

darinpp requested a review from a team March 30, 2022 04:35

darinpp requested review from a team as code owners March 30, 2022 04:35

darinpp requested review from samiskin and removed request for a team March 30, 2022 04:35

darinpp force-pushed the add-load-related-metrics branch from 3add6d5 to 651f39f Compare March 30, 2022 15:55

abarganier reviewed Apr 1, 2022

View reviewed changes

andy-kimball reviewed Apr 2, 2022

View reviewed changes

darinpp commented Apr 4, 2022

View reviewed changes

darinpp force-pushed the add-load-related-metrics branch from 651f39f to 6fb0768 Compare April 5, 2022 21:29

andy-kimball approved these changes Apr 5, 2022

View reviewed changes

darinpp commented Apr 5, 2022

View reviewed changes

craig bot merged commit fc97617 into cockroachdb:master Apr 6, 2022

darinpp deleted the add-load-related-metrics branch April 6, 2022 03:19

darinpp mentioned this pull request Apr 6, 2022

release-22.1: Adding metrics required by the serverless autoscaler #79519

Merged

tbg mentioned this pull request Jun 22, 2022

roachperf: regression around 2022-04-06 #82136

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

server/status: add load related metrics#79023

server/status: add load related metrics#79023
craig[bot] merged 1 commit intocockroachdb:masterfrom
darinpp:add-load-related-metrics

darinpp commented Mar 30, 2022

Uh oh!

cockroach-teamcity commented Mar 30, 2022

Uh oh!

abarganier commented Mar 30, 2022

Uh oh!

darinpp commented Mar 30, 2022

Uh oh!

abarganier left a comment

Uh oh!

andy-kimball left a comment

Uh oh!

darinpp left a comment

Uh oh!

andy-kimball left a comment

Uh oh!

darinpp left a comment

Uh oh!

darinpp commented Apr 5, 2022

Uh oh!

craig bot commented Apr 6, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

darinpp commented Mar 30, 2022

Uh oh!

cockroach-teamcity commented Mar 30, 2022

Uh oh!

abarganier commented Mar 30, 2022

Uh oh!

darinpp commented Mar 30, 2022

Uh oh!

abarganier left a comment

Choose a reason for hiding this comment

Uh oh!

andy-kimball left a comment

Choose a reason for hiding this comment

Uh oh!

darinpp left a comment

Choose a reason for hiding this comment

Uh oh!

andy-kimball left a comment

Choose a reason for hiding this comment

Uh oh!

darinpp left a comment

Choose a reason for hiding this comment

Uh oh!

darinpp commented Apr 5, 2022

Uh oh!

craig bot commented Apr 6, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants