Skip to content

[Metricbeat][Kubernetes] Refactor watchers used in Kubernetes metricsets for metadata enrichment #37243

@constanca-m

Description

@constanca-m

Background

This issue was first detected when trying to stop node and namespace watchers from starting (issue is this one). At first, it seemed that we only had these watchers based on features add_resource_metadata and hints, but having a better look, we were actually starting watchers at many places.

Issue

All state_* metricsets are using this function

func Init(name string, mapping *prometheus.MetricsMapping) {
to start, except from state_container and state_resourcequota.

This function adds a metadata enricher to each metricset by calling NewResourceMetadataEnricher function:

enricher: util.NewResourceMetadataEnricher(base, resourceName, mod.GetMetricsRepo(), false),

Also pod and node metricsets call NewResourceMetadataEnricher to enrich the events with metadata.

This enricher will create 3 watchers:

watcher, nodeWatcher, namespaceWatcher := getResourceMetadataWatchers(config, res, client, nodeScope)

One for the resource of that metricset, one for node and one for namespace.

Since we can have multiple metricsets enabled, then we will also have multiple node and namespace watchers running. One for each metricset.

To solve this, we need to find a way to share these watchers between the metricsets, similar to what we did to fetch metrics from KSM (PR) and Kubelet.

Additionally, from the way we create the resource specific watcher

watcher, err := kubernetes.NewNamedWatcher("resource_metadata_enricher", client, resource, options, nil)
we can see that in case the function was called from state_node/state_namespace metricset (resource is node/namespace), we will end up creating yet another watcher for node/namespace. We need to, at least, add a condition to stop the duplicated watcher from starting.

More on this, we seem to always start the three watchers for all the resources that the NewResourceMetadataEnricher is called by. These watchers are created for metadata enrichment and for some resources the existence of one or some of these watchers is not relevant. For example node watcher is not needed in case we want to enrich deployments or statefulsets.
We need to start only the relevant watchers for each resource.

Possibly this can be solved if we share the watchers between metricsets.

The better handling of watchers initialization will lead to less Kubernetes API calls and possible issues related to that in large scale clusters.

Issues

Relates to elastic/elastic-agent#3801.

Current watchers

These are the current watchers when starting metricbeat with default configurations.

For example, like this.
- type: kubernetes
  scope: cluster
  node: ${NODE_NAME}
  unique: true
  templates:
    - config:
        - module: kubernetes
          hosts: ["kube-state-metrics:8080"]
          period: 10s
          metricsets:
            - state_node
            - state_deployment
            - state_daemonset
            - state_replicaset
            - state_pod
            - state_container
            - state_cronjob
            - state_job
            - state_resourcequota
            - state_statefulset
            - state_service
            - state_persistentvolume
            - state_persistentvolumeclaim
            - state_storageclass
            - state_namespace

For the state_* metricsets grouped as described above:

Metricset Namespace watcher Node watcher Resource watcher
State cronjob
State daemonset
State deployment
State job
State namespace
State node
State PV
State PVC
State pod
State replicaset
State service
State statefulset
State storage class
Total 14 14 -

Note: It is 14 and not 13, because the resource watcher for namespace adds 1 to the namespace watcher, and same happens for node watcher.

Note 2: state pod can also create 2 more watchers if we add resource_metadata for job and deployments (disabled by default).

For all the other metricsets:

Metricset Namespace watcher Node watcher Resource watcher Source
API Server
Container NewContainerMetadataEnricher
Controller manager
Event When creating metricset
Node NewResourceMetadataEnricher
Pod NewResourceMetadataEnricher
Proxy
Scheduler
State container NewContainerMetadataEnricher
State resource quota
System
Volume

Note: Container and state container can also create 2 more watchers if we add resource_metadata for job and deployments (disabled by default).

Expected watchers

Watchers needed for each metricset by default (without counting add_resource_metadata.deployment/cronjob):

Metricset Namespace watcher Node watcher Resource watcher Notes
API Server
Container
Controller manager
Event
Node Resource watcher should be the same as node watcher.
Pod
Proxy
Scheduler
State container
State cronjob
State daemonset
State deployment
State job
State namespace Resource watcher should be the same as namespace watcher.
State node Resource watcher should be the same as node watcher.
State PV
State PVC
State pod
State replicaset
State resource quota
State service
State statefulset
State storage class
System
Volume

Checks

  • We don't start watchers based on the resource at the same time we start watchers for node/namespace, without adding a condition to avoid duplicates.
  • We don't have multiple watchers for the same resource. Instead we need to find a way to have them shared.
  • We only start watchers that are needed based on either the resource or the configuration.

PRs

### Tasks

Metadata

Metadata

Assignees

Labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions