
Unbounded valuesets for metric labels #76302

@logicalhan

Description


Currently, the kube-apiserver exposes a number of metrics that are likely to become problematic in the near future. Specifically, metrics that carry 'resource' as a label have an unbounded set of label values. This is largely due to the emergence of CRDs, which have made the set of resources (and sub-resources) served by the kube-apiserver unbounded. Previously, this dimension was bounded by a fixed set of control-plane-specific critical resources (e.g. pods, nodes, namespaces).

As a result, issues like #69540 will become increasingly likely, since we cannot control the number of time series a given metric generates (these metrics effectively become memory leaks). That issue was (temporarily) resolved by #69895, but only as a mitigation: it does not address the root of the problem, namely that CRDs can cause unbounded metric growth.

Since we still use resource/sub-resource as labels on apiserver request metrics (e.g. apiserver_request_latencies, apiserver_response_sizes, apiserver_request_duration_seconds, apiserver_longrunning_gauge, apiserver_request_count), and since we are still actively adding labels to these metrics, it is likely only a matter of time before this becomes an issue again: the number of values each label can take has a multiplicative effect on metric size.
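The multiplicative effect can be shown with a short worst-case calculation. The sketch below is illustrative only: `seriesUpperBound` is a hypothetical helper, and the per-label value counts are made-up numbers, not measurements from any cluster. Because each combination of label values is a separate series, the upper bound is the product of the per-label value counts.

```go
package main

import "fmt"

// seriesUpperBound returns the worst-case number of time series for a metric,
// given how many distinct values each label can take. Every combination of
// label values is a separate series, so the bound is the product of the counts.
func seriesUpperBound(valuesPerLabel map[string]int) int {
	total := 1
	for _, n := range valuesPerLabel {
		total *= n
	}
	return total
}

func main() {
	// Hypothetical value counts for a request metric's labels.
	labels := map[string]int{
		"verb":        10,
		"resource":    50,
		"subresource": 5,
		"code":        20,
	}
	fmt.Println(seriesUpperBound(labels)) // prints 50000

	// 100 additional CRDs triple the resource dimension, and the bound with it.
	labels["resource"] += 100
	fmt.Println(seriesUpperBound(labels)) // prints 150000

	// Adding one more (hypothetical) label with 3 values multiplies it again.
	labels["newLabel"] = 3
	fmt.Println(seriesUpperBound(labels)) // prints 450000
}
```

For histogram metrics such as apiserver_request_duration_seconds, each label combination additionally carries one series per bucket plus `_sum` and `_count`, so the real footprint is larger still.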

/sig api-machinery instrumentation

Metadata


Labels

kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
sig/api-machinery: Categorizes an issue or PR as relevant to SIG API Machinery.
sig/instrumentation: Categorizes an issue or PR as relevant to SIG Instrumentation.
