[Feature]: decouple metrics scraping from metrics recomputation #4962

@jsilvela

Is there an existing issue already for this feature request/idea?

  • I have searched for an existing issue, and could not find anything. I believe this is a new feature request to be evaluated.

What problem is this feature going to solve? Why should it be added?

The SQL queries behind several metrics, including user-defined metrics, run each time the Prometheus metrics endpoint on port 9187 is scraped. Depending on how Prometheus is configured, or even if some other process simply does a GET on :9187/metrics, those queries might run often.
If a user has included some long-running queries for monitoring, this could hog the CPU and even threaten the normal functioning of their cluster.

Describe the solution you'd like

Generally, to respond to a GET on the metrics port, we should simply return the current values of the various Prometheus counters, histograms, etc., without running any queries.
A few select metrics might still warrant a refresh on each scrape.

The recomputation of metrics should happen independently of scrapes.
The refresh interval should be configurable and, if not set, default to something reasonable, e.g. 30 seconds, which seems to be the current default scrape interval for the Prometheus operator.

type MonitoringConfiguration struct {
	<... snipped ...>
	// RefreshInterval is how often the metrics queries are re-run, independently of scrapes.
	RefreshInterval *metav1.Duration `json:"refreshInterval,omitempty"`
}

To decouple scraping from updating the metrics, it is sufficient to have a goroutine that re-runs the metrics queries and updates the Prometheus metrics at the above refresh interval.
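A minimal sketch of that idea, using only the Go standard library. It assumes a hypothetical `MetricsCache` that a background goroutine refreshes on a ticker, while the scrape handler only reads the cached values. The names `MetricsCache`, `QueryFunc`, `effectiveInterval`, and `defaultRefreshInterval` are illustrative and not part of the existing code base. A real implementation would presumably update Prometheus client collectors rather than a plain map, and would read the interval from `RefreshInterval`.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"sort"
	"sync"
	"time"
)

// defaultRefreshInterval mirrors the 30-second default suggested in this issue.
const defaultRefreshInterval = 30 * time.Second

// QueryFunc stands in for running the (possibly expensive) SQL metrics queries.
// Hypothetical type, for illustration only.
type QueryFunc func() (map[string]float64, error)

// MetricsCache holds the most recently computed metric values.
type MetricsCache struct {
	mu     sync.RWMutex
	values map[string]float64
}

// Refresh runs the queries once and, on success, swaps in the new values.
// On failure the previous values are kept, so scrapes still get an answer.
func (c *MetricsCache) Refresh(q QueryFunc) error {
	vals, err := q()
	if err != nil {
		return err
	}
	c.mu.Lock()
	c.values = vals
	c.mu.Unlock()
	return nil
}

// Snapshot returns a copy of the cached values; it never runs a query.
func (c *MetricsCache) Snapshot() map[string]float64 {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := make(map[string]float64, len(c.values))
	for k, v := range c.values {
		out[k] = v
	}
	return out
}

// effectiveInterval applies the default when no positive interval is configured.
func effectiveInterval(configured *time.Duration) time.Duration {
	if configured == nil || *configured <= 0 {
		return defaultRefreshInterval
	}
	return *configured
}

// Run refreshes immediately and then once per interval until ctx is cancelled.
func (c *MetricsCache) Run(ctx context.Context, interval time.Duration, q QueryFunc) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if err := c.Refresh(q); err != nil {
			fmt.Println("metrics refresh failed:", err)
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// ServeHTTP answers a scrape from the cache only, in a simple "name value" form.
func (c *MetricsCache) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	snap := c.Snapshot()
	names := make([]string, 0, len(snap))
	for n := range snap {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Fprintf(w, "%s %g\n", n, snap[n])
	}
}

func main() {
	cache := &MetricsCache{}
	refreshes := 0
	query := func() (map[string]float64, error) {
		refreshes++ // counts how often the "expensive" query actually ran
		return map[string]float64{"example_metric": 42}, nil
	}
	// Short interval and timeout so the demo terminates; Run blocks until then.
	ctx, cancel := context.WithTimeout(context.Background(), 250*time.Millisecond)
	defer cancel()
	cache.Run(ctx, 100*time.Millisecond, query)
	fmt.Println("refreshes:", refreshes, "cached:", cache.Snapshot())
}
```

In a server, `Run` would be started with `go cache.Run(ctx, effectiveInterval(cfg), query)` and the cache registered as the handler for `/metrics`, so the number of query executions depends only on the refresh interval and not on how often the endpoint is scraped.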

A future PR/Issue could tackle query-dependent refresh intervals, so that very expensive queries could still be monitored at a lower frequency while cheaper queries keep a normal frequency.

Describe alternatives you've considered

The Prometheus HTTP library has a parameter, MaxRequestsInFlight, which, if set, makes the handler respond with an HTTP 503 when the limit is exceeded.
That could be a very low-effort, coarse-grained safety mechanism.
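To illustrate the behaviour this alternative relies on, here is a standard-library sketch of an in-flight request limit that answers 503 when the limit is reached. It is not the Prometheus library's implementation. In `promhttp` the setting is the `MaxRequestsInFlight` field of `HandlerOpts`, while `inFlightLimiter` and its methods below are hypothetical names used only for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// inFlightLimiter caps the number of requests served concurrently.
// Hypothetical helper, for illustration only.
type inFlightLimiter struct {
	slots chan struct{}
}

func newInFlightLimiter(max int) *inFlightLimiter {
	return &inFlightLimiter{slots: make(chan struct{}, max)}
}

// tryAcquire takes a slot without blocking; it reports false when all are in use.
func (l *inFlightLimiter) tryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot previously taken with tryAcquire.
func (l *inFlightLimiter) release() { <-l.slots }

// wrap rejects requests with 503 while the limit is reached.
func (l *inFlightLimiter) wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !l.tryAcquire() {
			http.Error(w, "too many concurrent scrapes", http.StatusServiceUnavailable)
			return
		}
		defer l.release()
		next.ServeHTTP(w, r)
	})
}

func main() {
	limiter := newInFlightLimiter(1)
	handler := limiter.wrap(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "example_metric 42")
	}))

	// Simulate a scrape already in progress by holding the only slot.
	limiter.tryAcquire()
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	fmt.Println("status while busy:", rec.Code)

	// Once the slot is free, the scrape goes through.
	limiter.release()
	rec = httptest.NewRecorder()
	handler.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	fmt.Println("status when idle:", rec.Code)
}
```

A limit like this bounds how many scrapes run queries at the same time, but each accepted scrape still triggers the queries, which is why it is only a coarse safeguard compared with decoupling the refresh.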

Additional context

No response

Backport?

Yes

Are you willing to actively contribute to this feature?

Yes

Code of Conduct

  • I agree to follow this project's Code of Conduct
