Background
This issue was first detected while trying to stop the node and namespace watchers from starting (issue is this one). At first, it seemed that these watchers only existed because of the `add_resource_metadata` and `hints` features, but on closer inspection we were actually starting watchers in many places.
Issue
All `state_*` metricsets use this function to start (`beats/metricbeat/helper/kubernetes/state_metricset.go`, line 46 at `d9139c9`), except for `state_container` and `state_resourcequota`:

```go
func Init(name string, mapping *prometheus.MetricsMapping) {
```
This function adds a metadata enricher to each metricset by calling the `NewResourceMetadataEnricher` function (`beats/metricbeat/helper/kubernetes/state_metricset.go`, line 91 at `d9139c9`):

```go
enricher: util.NewResourceMetadataEnricher(base, resourceName, mod.GetMetricsRepo(), false),
```
The `pod` and `node` metricsets also call `NewResourceMetadataEnricher` to enrich their events with metadata.
This enricher creates three watchers (`beats/metricbeat/module/kubernetes/util/kubernetes.go`, line 158 at `d9139c9`):

```go
watcher, nodeWatcher, namespaceWatcher := getResourceMetadataWatchers(config, res, client, nodeScope)
```

One for the resource of that metricset, one for nodes, and one for namespaces.
Since multiple metricsets can be enabled, we also end up with multiple node and namespace watchers running, one per metricset.
To solve this, we need to find a way to share these watchers between the metricsets, similar to what we did to fetch metrics from KSM (PR) and Kubelet.
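One possible shape for this sharing is a small registry that creates each watcher on first request and hands the same instance to every later caller. This is a minimal sketch under assumed names (`watcherRegistry` and `GetOrCreate` are hypothetical, not from the beats codebase); only `kubernetes.Watcher` comes from the `elastic-agent-autodiscover` library already used in the snippets above:

```go
package watchers

import (
	"sync"

	"github.com/elastic/elastic-agent-autodiscover/kubernetes"
)

// watcherRegistry hands out one shared watcher per resource kind so that
// multiple metricsets reuse the same node/namespace watcher.
// Hypothetical sketch; not the actual beats implementation.
type watcherRegistry struct {
	mu       sync.Mutex
	watchers map[string]kubernetes.Watcher // keyed by resource kind, e.g. "node"
}

func newWatcherRegistry() *watcherRegistry {
	return &watcherRegistry{watchers: map[string]kubernetes.Watcher{}}
}

// GetOrCreate returns the already-running watcher for kind, or builds and
// starts a new one via create on first use.
func (r *watcherRegistry) GetOrCreate(kind string, create func() (kubernetes.Watcher, error)) (kubernetes.Watcher, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if w, ok := r.watchers[kind]; ok {
		return w, nil // reuse: no extra Kubernetes API connection for this kind
	}
	w, err := create()
	if err != nil {
		return nil, err
	}
	if err := w.Start(); err != nil {
		return nil, err
	}
	r.watchers[kind] = w
	return w, nil
}
```

With something like this in place, every `NewResourceMetadataEnricher` call could fetch the node and namespace watchers from the registry instead of creating its own pair.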
Additionally, from the way we create the resource-specific watcher (`beats/metricbeat/module/kubernetes/util/kubernetes.go`, line 454 at `d9139c9`):

```go
watcher, err := kubernetes.NewNamedWatcher("resource_metadata_enricher", client, resource, options, nil)
```

we can see that when the function is called from the `state_node` or `state_namespace` metricset (i.e. `resource` is node or namespace), we end up creating yet another watcher for node or namespace. At a minimum, we need to add a condition that stops this duplicated watcher from starting.
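As a rough illustration, the condition could reuse the already-started node or namespace watcher whenever the metricset's own resource is one of those two kinds. The helper below is hypothetical (`resourceWatcherFor` and its `kind` parameter are assumptions); the `kubernetes.NewNamedWatcher` call is the one from the snippet above:

```go
package watchers

import (
	k8sclient "k8s.io/client-go/kubernetes"

	"github.com/elastic/elastic-agent-autodiscover/kubernetes"
)

// resourceWatcherFor returns the watcher to use for a metricset's own
// resource. Assumed dedup logic, not current beats code: for node and
// namespace, the shared watcher doubles as the resource watcher.
func resourceWatcherFor(
	kind string,
	client k8sclient.Interface,
	res kubernetes.Resource,
	opts kubernetes.WatchOptions,
	nodeWatcher, namespaceWatcher kubernetes.Watcher,
) (kubernetes.Watcher, error) {
	switch kind {
	case "node":
		// state_node / node: reuse instead of starting a duplicate node watcher.
		return nodeWatcher, nil
	case "namespace":
		// state_namespace: reuse the namespace watcher.
		return namespaceWatcher, nil
	default:
		return kubernetes.NewNamedWatcher("resource_metadata_enricher", client, res, opts, nil)
	}
}
```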
On top of this, we always start all three watchers for every resource that `NewResourceMetadataEnricher` is called for. These watchers exist purely for metadata enrichment, and for some resources one or more of them is irrelevant. For example, a node watcher is not needed to enrich deployments or statefulsets.
We need to start only the relevant watchers for each resource.
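One way to make this explicit is a lookup of which shared watchers each resource kind actually needs for enrichment, so the enricher starts only those. The map below is an illustrative assumption, with values taken from the "Expected watchers" table later in this issue:

```go
// neededWatchers lists which shared watchers a resource kind requires for
// metadata enrichment. Illustrative only; derived from the expected-watchers
// table in this issue, not from the beats codebase.
type neededWatchers struct {
	Namespace bool
	Node      bool
}

var watchersByKind = map[string]neededWatchers{
	"deployment":  {Namespace: true, Node: false}, // no node metadata needed
	"statefulset": {Namespace: true, Node: false},
	"pod":         {Namespace: true, Node: true},
	"node":        {Namespace: false, Node: true}, // node watcher doubles as resource watcher
	"namespace":   {Namespace: true, Node: false}, // namespace watcher doubles as resource watcher
	// ...remaining kinds follow the "Expected watchers" table below.
}
```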
Sharing the watchers between metricsets could possibly solve this as well.
Better handling of watcher initialization will lead to fewer Kubernetes API calls, and will avoid the problems those extra calls can cause in large-scale clusters.
Issues
Relates to elastic/elastic-agent#3801.
Current watchers
These are the watchers currently started when running Metricbeat with a default configuration, for example:
```yaml
- type: kubernetes
  scope: cluster
  node: ${NODE_NAME}
  unique: true
  templates:
    - config:
        - module: kubernetes
          hosts: ["kube-state-metrics:8080"]
          period: 10s
          metricsets:
            - state_node
            - state_deployment
            - state_daemonset
            - state_replicaset
            - state_pod
            - state_container
            - state_cronjob
            - state_job
            - state_resourcequota
            - state_statefulset
            - state_service
            - state_persistentvolume
            - state_persistentvolumeclaim
            - state_storageclass
            - state_namespace
```
For the `state_*` metricsets grouped as described above:

| Metricset | Namespace watcher | Node watcher | Resource watcher |
| --- | --- | --- | --- |
| State cronjob | ✓ | ✓ | ✓ |
| State daemonset | ✓ | ✓ | ✓ |
| State deployment | ✓ | ✓ | ✓ |
| State job | ✓ | ✓ | ✓ |
| State namespace | ✓ | ✓ | ✓ |
| State node | ✓ | ✓ | ✓ |
| State PV | ✓ | ✓ | ✓ |
| State PVC | ✓ | ✓ | ✓ |
| State pod | ✓ | ✓ | ✓ |
| State replicaset | ✓ | ✓ | ✓ |
| State service | ✓ | ✓ | ✓ |
| State statefulset | ✓ | ✓ | ✓ |
| State storage class | ✓ | ✓ | ✓ |
| Total | 14 | 14 | - |
Note: The total is 14 and not 13 because the resource watcher of `state_namespace` is itself a namespace watcher and adds 1 to that column, and the same happens for `state_node` and the node watcher column.
Note 2: `state_pod` can also create 2 more watchers if `add_resource_metadata` is enabled for jobs and deployments (disabled by default).
For all the other metricsets:
| Metricset | Namespace watcher | Node watcher | Resource watcher | Source |
| --- | --- | --- | --- | --- |
| API Server | ✕ | ✕ | ✕ | |
| Container | ✓ | ✓ | ✓ | `NewContainerMetadataEnricher` |
| Controller manager | ✕ | ✕ | ✕ | |
| Event | ✕ | ✕ | ✓ | When creating metricset |
| Node | ✓ | ✓ | ✓ | `NewResourceMetadataEnricher` |
| Pod | ✓ | ✓ | ✓ | `NewResourceMetadataEnricher` |
| Proxy | ✕ | ✕ | ✕ | |
| Scheduler | ✕ | ✕ | ✕ | |
| State container | ✓ | ✓ | ✓ | `NewContainerMetadataEnricher` |
| State resource quota | ✕ | ✕ | ✕ | |
| System | ✕ | ✕ | ✕ | |
| Volume | ✕ | ✕ | ✕ | |
Note: `container` and `state_container` can also create 2 more watchers each if `add_resource_metadata` is enabled for jobs and deployments (disabled by default).
Expected watchers
Watchers needed for each metricset by default (without counting `add_resource_metadata.deployment`/`add_resource_metadata.cronjob`):

| Metricset | Namespace watcher | Node watcher | Resource watcher | Notes |
| --- | --- | --- | --- | --- |
| API Server | ✕ | ✕ | ✕ | |
| Container | ✓ | ✓ | ✓ | |
| Controller manager | ✕ | ✕ | ✓ | |
| Event | ✓ | ✕ | ✓ | |
| Node | ✕ | ✓ | ✓ | Resource watcher should be the same as node watcher. |
| Pod | ✓ | ✓ | ✓ | |
| Proxy | ✕ | ✕ | ✕ | |
| Scheduler | ✕ | ✕ | ✕ | |
| State container | ✓ | ✓ | ✓ | |
| State cronjob | ✓ | ✕ | ✓ | |
| State daemonset | ✓ | ✕ | ✓ | |
| State deployment | ✓ | ✕ | ✓ | |
| State job | ✓ | ✕ | ✓ | |
| State namespace | ✓ | ✕ | ✓ | Resource watcher should be the same as namespace watcher. |
| State node | ✕ | ✓ | ✓ | Resource watcher should be the same as node watcher. |
| State PV | ✕ | ✕ | ✓ | |
| State PVC | ✓ | ✕ | ✓ | |
| State pod | ✓ | ✓ | ✓ | |
| State replicaset | ✓ | ✕ | ✓ | |
| State resource quota | ✓ | ✕ | ✓ | |
| State service | ✓ | ✕ | ✓ | |
| State statefulset | ✓ | ✕ | ✓ | |
| State storage class | ✕ | ✕ | ✓ | |
| System | ✕ | ✕ | ✕ | |
| Volume | ✕ | ✕ | ✕ | |
Checks
PRs