Is your feature request related to a problem? Please describe.
In addition to /_status/vars, a /_status/load endpoint was created to aid auto-scaling in multi-tenant clusters. /_status/load contains CPU metrics that are generated at request time, but it also requires some metrics from the node metric registry exported by /_status/vars.
Because /_status/vars and /_status/load are pulled at different cadences, there were situations where we could not precisely determine whether load was increasing.
As a fix, we added the ability to export select metrics from the registry, instead of an "all-or-nothing" approach (see #79021).
However, do two separate endpoints need to exist? Can we not achieve this with a single /_status/vars endpoint that takes a metric filter as an argument?
Describe the solution you'd like
A way for clients of /_status/vars to provide a metric filter list (via query params?) and for the endpoint handler to parse and apply these filters. This would allow different types of clients (e.g. the auto-scaler) to consume only the metrics they need, and would cut down on the cost of exporting the entire metrics registry.
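To make the idea concrete, here is a rough sketch of what filter parsing could look like. It is not the actual handler: the `metrics` query parameter, the `parseFilter` and `exportMetrics` helpers, and the sample registry contents are all hypothetical names chosen for illustration, and the output format is a simplified `name value` line per metric.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"sort"
	"strings"
)

// sampleRegistry stands in for the node metric registry.
// The metric names and values are made up for this sketch.
var sampleRegistry = map[string]float64{
	"sys_cpu_user_percent": 0.25,
	"sys_cpu_sys_percent":  0.05,
	"sql_conns":            12,
	"livebytes":            1024,
}

// parseFilter reads the hypothetical "metrics" query parameter. It accepts
// repeated parameters and comma-separated lists. A nil result means that no
// filter was supplied and every metric should be exported.
func parseFilter(rawQuery string) (map[string]struct{}, error) {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, err
	}
	var filter map[string]struct{}
	for _, raw := range values["metrics"] {
		for _, name := range strings.Split(raw, ",") {
			if name = strings.TrimSpace(name); name != "" {
				if filter == nil {
					filter = make(map[string]struct{})
				}
				filter[name] = struct{}{}
			}
		}
	}
	return filter, nil
}

// exportMetrics writes one "name value" line per selected metric,
// sorted by name so the output is deterministic.
func exportMetrics(registry map[string]float64, filter map[string]struct{}) string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		if filter != nil {
			if _, ok := filter[name]; !ok {
				continue
			}
		}
		names = append(names, name)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "%s %g\n", name, registry[name])
	}
	return b.String()
}

// varsHandler applies the filter from the request URL, if any.
func varsHandler(w http.ResponseWriter, r *http.Request) {
	filter, err := parseFilter(r.URL.RawQuery)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	fmt.Fprint(w, exportMetrics(sampleRegistry, filter))
}

func main() {
	// Exercise the handler in-process with a filtered request.
	req := httptest.NewRequest(http.MethodGet,
		"/_status/vars?metrics=sys_cpu_user_percent,sql_conns", nil)
	rec := httptest.NewRecorder()
	varsHandler(rec, req)
	fmt.Print(rec.Body.String())
}
```

In this sketch an absent or empty filter falls back to exporting everything, so existing scrapers of /_status/vars would see no change in behavior. Whether unknown metric names should be silently ignored (as here) or rejected is an open design question.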
It would also decouple the DB release cycle from iterations on the metrics used by the auto-scaler and the /_status/load endpoint, allowing quicker development.
Describe alternatives you've considered
An alternative would be to keep the endpoints separate, as they are today. Note that /_status/load generates some CPU metrics at request time. If that doesn't fit /_status/vars well, it might make more sense to keep the endpoints separate so that we have finer-grained control over which metrics are generated anew with each request.
Jira issue: CRDB-14784