
CPU high usage, reconciliation triggered by ConfigMap #7873

@crazymushrooms

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

After upgrading to prometheus-operator v0.85.0, we see drastically increased CPU usage by prometheus-operator.

Using the metric prometheus_operator_triggered_total{triggered_by="ConfigMap"}, we observe that ConfigMap objects trigger reconciliation for "alertmanager" and "prometheus".
We have about 30 ConfigMap objects in the release namespace, but only a couple of them are related to prometheus-operator. Nevertheless, according to the same metric, the number of ConfigMaps triggering reconciliation is also about 30.

The attribute watch_referenced_objects_in_all_namespaces is not set (default: false).

The operator is deployed with the following parameters:

    - --kubelet-service=kube-system/kube-prometheus-stack-kubelet
    - --kubelet-endpoints=true
    - --kubelet-endpointslice=false
    - --log-format=json
    - --log-level=info
    - --namespaces=ops,kube-system
    - --localhost=127.0.0.1
    - --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.85.0
    - --config-reloader-cpu-request=5m
    - --config-reloader-cpu-limit=0
    - --config-reloader-memory-request=50Mi
    - --config-reloader-memory-limit=150Mi
    - --alertmanager-instance-namespaces=foo
    - --alertmanager-config-namespaces=foo
    - --prometheus-instance-namespaces=foo
    - --thanos-default-base-image=quay.io/thanos/thanos:v0.39.2
    - --thanos-ruler-instance-namespaces=foo
    - --secret-field-selector=type!=kubernetes.io/dockercfg,type!=kubernetes.io/service-account-token,type!=helm.sh/release.v1
    - --web.enable-tls=true
    - --web.cert-file=/cert/cert
    - --web.key-file=/cert/key
    - --web.listen-address=:10250
    - --web.tls-min-version=VersionTLS13

Currently, the watched Secret objects can be filtered with secretFieldSelector, which is applied in this line of the operator code:

options.FieldSelector = config.SecretListWatchFieldSelector.String()

However, there is no such option for ConfigMaps.
It seems that all ConfigMap objects in the namespace are watched.
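
To make the difference concrete, here is a minimal sketch of what the Secret selector above does to the watch set. It uses only the Go standard library and re-implements the matching semantics for illustration. It is not the apimachinery or prometheus-operator implementation, and the names requirement, parseSelector and matches are hypothetical. It assumes a selector is a comma-separated list of `field=value`, `field==value` or `field!=value` terms that must all hold.

```go
package main

import (
	"fmt"
	"strings"
)

// requirement is one term of a selector, e.g. "type!=helm.sh/release.v1".
// Hypothetical type for illustration only.
type requirement struct {
	field  string
	value  string
	negate bool // true for "!=", false for "=" and "=="
}

// parseSelector splits a comma-separated selector into its terms.
func parseSelector(s string) ([]requirement, error) {
	var reqs []requirement
	for _, term := range strings.Split(s, ",") {
		term = strings.TrimSpace(term)
		if term == "" {
			continue
		}
		// Check "!=" and "==" before "=" so the operators are not split.
		if i := strings.Index(term, "!="); i >= 0 {
			reqs = append(reqs, requirement{field: term[:i], value: term[i+2:], negate: true})
			continue
		}
		if i := strings.Index(term, "=="); i >= 0 {
			reqs = append(reqs, requirement{field: term[:i], value: term[i+2:]})
			continue
		}
		if i := strings.Index(term, "="); i >= 0 {
			reqs = append(reqs, requirement{field: term[:i], value: term[i+1:]})
			continue
		}
		return nil, fmt.Errorf("invalid selector term %q", term)
	}
	return reqs, nil
}

// matches reports whether an object with the given fields satisfies
// every requirement (terms are combined with logical AND).
func matches(reqs []requirement, fields map[string]string) bool {
	for _, r := range reqs {
		equal := fields[r.field] == r.value
		if equal == r.negate {
			return false
		}
	}
	return true
}

func main() {
	// Selector taken from the --secret-field-selector flag above.
	sel := "type!=kubernetes.io/dockercfg,type!=kubernetes.io/service-account-token,type!=helm.sh/release.v1"
	reqs, err := parseSelector(sel)
	if err != nil {
		panic(err)
	}
	for _, secretType := range []string{"Opaque", "helm.sh/release.v1", "kubernetes.io/service-account-token"} {
		watched := matches(reqs, map[string]string{"type": secretType})
		fmt.Printf("secret type %-40s watched=%v\n", secretType, watched)
	}
}
```

With this selector, Helm release Secrets and service account tokens are dropped before they reach the operator. ConfigMaps have no `type` field, so the same selector cannot simply be reused for them, and without an equivalent filter every ConfigMap in the watched namespaces produces events.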

If there is additional configuration that we have overlooked so far and should apply in order to filter the ConfigMap objects, it would be great to get help here.

Steps to Reproduce

  • deploy prometheus-operator in a version < 0.85.0
  • deploy ConfigMaps in the release namespace
  • upgrade prometheus-operator to v0.85.0

Expected Result

  • reasonable variations in the CPU usage

Actual Result

  • drastic increase in the CPU usage (~2m -> ~100m)

Prometheus Operator Version

0.85.0

Kubernetes Version

v1.31.11

Kubernetes Cluster Type

EKS

How did you deploy Prometheus-Operator?

helm chart: prometheus-community/kube-prometheus-stack

Manifests

prometheus-operator log output

no meaningful logs for this issue

Anything else?

No response
