server/status: add load related metrics#79023
Conversation
3add6d5 to
651f39f
Compare
|
Similar to 79022, can we add any AlertingRule(s) and/or AggregationRule(s) for this? |
Can you be a bit more specific as to what the rules should be or why do we need to add them? The two of the metrics are not new, they are just being made available through a secondary endpoint. Does that imply a new alerting or aggregation rule? |
abarganier
left a comment
There was a problem hiding this comment.
As mentioned in the 79022, we just ask those implementing new metrics to consider it, but it's not mandatory. Thanks for looking into it & considering!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @samiskin)
andy-kimball
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @darinpp and @samiskin)
pkg/jobs/registry.go, line 1314 at r6 (raw file):
if isIdle { r.metrics.RunningNonIdleJobs.Dec(1) jm.CurrentlyIdle.Inc(1)
Why do we bother with the CurrentlyIdle metric if what we really need is the RunningNonIdleJobs metric? Shouldn't we get rid of CurrentlyIdle, which was only added for the benefit of the autoscaler to begin with?
darinpp
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @samiskin)
pkg/jobs/registry.go, line 1314 at r6 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Why do we bother with the
CurrentlyIdlemetric if what we really need is theRunningNonIdleJobsmetric? Shouldn't we get rid ofCurrentlyIdle, which was only added for the benefit of the autoscaler to begin with?
I wasn't aware that the only reason for these metrics is the autoscaler. I did find it however very useful for debugging purposes to have a breakdown of what actual jobs are running or idle. There isn't any other way to do that as far as I can see. Whatever we decide - we can remove the per job running/idle stats separately. Not as part of this PR
Previously there were only CPU related metrics available on the _status/load endpoint. For serverless we need in addition to these, the metrics which show the total number of current sql connections, the number of sql queries executed and the number of jobs currently running that are not idle. This PR adds the three new metrics by using selective prometheus exporter and scraping the MetricsRecorder. Release justification: Low risk, high reward changes to existing functionality Release note: None
651f39f to
6fb0768
Compare
andy-kimball
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @darinpp, and @samiskin)
pkg/server/load_endpoint.go, line 47 at r7 (raw file):
// Exporter for the selected metrics that also show in /_status/vars. exporter2 := metric.MakePrometheusExporterForSelectedMetrics(map[string]struct{}{
It's gotten strange to call these "load metrics", since they now include metrics like non-idle jobs that aren't really load metrics.
darinpp
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball and @samiskin)
pkg/server/load_endpoint.go, line 47 at r7 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
It's gotten strange to call these "load metrics", since they now include metrics like non-idle jobs that aren't really load metrics.
Agree. Will remove the load endpoint #79388
|
bors r+ |
|
Build succeeded: |
Previously there were only CPU related metrics available on the
_status/load endpoint. For serverless we need in addition to these, the
metrics which show the total number of current sql connections, the
number of sql queries executed and the number of jobs currently running
that are not idle. This PR adds the three new metrics by using selective
prometheus exporter and scraping the MetricsRecorder.
Release justification: Low risk, high reward changes to existing functionality
Release note: None
Will be re-based once #79021 and #79022 are merged.