Make Autodiscover more selective when watching K8s events

### Setup

Observed in Filebeat 8.5.3 running in Azure Kubernetes Service (AKS) `v1.24.6` with the following autodiscover configuration:

<details>

```
filebeat:
  autodiscover:
    providers:
    - templates:
      - condition:
          or:
          - equals:
              kubernetes:
                container:
                  name: elasticsearch
          - equals:
              kubernetes:
                container:
                  name: kibana
        config:
        - paths:
          - /var/log/pods/${data.kubernetes.namespace}_${data.kubernetes.pod.name}_${data.kubernetes.pod.uid}/${data.kubernetes.container.name}/*.log
          processors:
          - decode_json_fields:
              add_error_key: true
              fields:
              - message
              max_depth: 1
              overwrite_keys: true
              process_array: false
              target: ""
          type: container
      type: kubernetes
```

</details>

### Context

While addressing https://github.com/elastic/beats/issues/23139, Node and Namespace watchers were added to Autodiscover so that Node and Namespace metadata changes become immediately available to the related Pods. Currently, when a Node modification is received via a watcher, all Pods located on that Node are retrieved from the watcher store ([src](https://github.com/elastic/elastic-agent-autodiscover/blob/v0.3.0/kubernetes/eventhandler.go#L202)) and stop and start events are emitted for each of them ([src](https://github.com/elastic/beats/blob/v8.5.3/libbeat/autodiscover/providers/kubernetes/pod.go#L158)). A single Pod can emit multiple start and stop events ([src](https://github.com/elastic/beats/blob/v8.5.3/libbeat/autodiscover/providers/kubernetes/pod.go#L278)). This happens regardless of what actually changed in the Node object.
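To make the fan-out concrete, here is a simplified model of the current behaviour. It is a sketch, not the beats code: `Pod`, `Event` and `onNodeUpdate` are hypothetical names, and the per-Pod event count is an assumed input rather than derived from container or port definitions.

```go
package main

import "fmt"

// Pod is a hypothetical stand-in for a Pod held in the watcher store.
// Events is how many events the Pod produces for one start or one stop
// (4 for the Kibana Pod in the log excerpt further down).
type Pod struct {
	Name   string
	Node   string
	Events int
}

// Event is a hypothetical, simplified autodiscover event.
type Event struct {
	Kind string // "start" or "stop"
	Pod  string
}

// onNodeUpdate models the current behaviour: any update of a Node, whatever
// changed, re-emits all stop events and then all start events for every Pod
// scheduled on that Node.
func onNodeUpdate(node string, store []Pod) []Event {
	var out []Event
	for _, p := range store {
		if p.Node != node {
			continue
		}
		for i := 0; i < p.Events; i++ {
			out = append(out, Event{Kind: "stop", Pod: p.Name})
		}
		for i := 0; i < p.Events; i++ {
			out = append(out, Event{Kind: "start", Pod: p.Name})
		}
	}
	return out
}

func main() {
	store := []Pod{
		{Name: "kibana", Node: "node-a", Events: 4},
		{Name: "nmi", Node: "node-a", Events: 4},
		{Name: "elasticsearch", Node: "node-b", Events: 4},
	}
	// A heartbeat-only update of node-a still produces a full stop/start cycle.
	events := onNodeUpdate("node-a", store)
	fmt.Printf("%d events for one update of node-a\n", len(events))
}
```

With two Pods on `node-a` this prints `16 events for one update of node-a`; the count grows linearly with the number of Pods on the Node and is repeated on every Node update.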

In the environment described in the `Setup` section, Node objects were observed to be updated as often as **every 10 seconds**. Filebeat logs do not show what exactly was received via the watch APIs, but this can be verified as follows:

```
kubectl proxy &
stdbuf -oL curl -s '127.0.0.1:8001/api/v1/namespaces?allowWatchBookmarks=true&watch=true&pretty=false' > namespaces.log &
stdbuf -oL curl -s '127.0.0.1:8001/api/v1/nodes?allowWatchBookmarks=true&watch=true&pretty=false' > nodes.log &
stdbuf -oL curl -s '127.0.0.1:8001/api/v1/pods?allowWatchBookmarks=true&watch=true&pretty=false' > pods.log &
```

The Node objects were modified only in the `.status.conditions[]` array, which is updated by the Node health monitoring process, and the only field that changed was `lastHeartbeatTime`. Here's an example of a message received via the watch API:

<details>

```
{
    "type": "MODIFIED",
    "object": {
        "kind": "Node",
        "apiVersion": "v1",
        "metadata": {
            "name": "<REDACTED>",  ...
        },
        "spec": { ...
        },
        "status": {
            "capacity": { ...
            },
            "allocatable": { ...
            },
            "conditions": [
                {
                    "type": "FilesystemCorruptionProblem",
                    "status": "False",
                    "lastHeartbeatTime": "2023-02-27T08:10:59Z", <--- HERE
                    "lastTransitionTime": "2023-02-09T07:41:54Z",
                    "reason": "FilesystemIsOK",
                    "message": "Filesystem is healthy"
                },
                {
                    "type": "ContainerRuntimeProblem",
                    "status": "False",
                    "lastHeartbeatTime": "2023-02-27T08:10:59Z", <--- HERE
                    "lastTransitionTime": "2023-02-10T20:37:25Z",
                    "reason": "ContainerRuntimeIsUp",
                    "message": "container runtime service is up"
                },
                {
                    "type": "KubeletProblem",
                    "status": "False",
                    "lastHeartbeatTime": "2023-02-27T08:10:59Z", <--- HERE
                    "lastTransitionTime": "2023-02-10T20:37:25Z",
                    "reason": "KubeletIsUp",
                    "message": "kubelet service is up"
                },
                {
                    "type": "FrequentDockerRestart",
                    "status": "False",
                    "lastHeartbeatTime": "2023-02-27T08:10:59Z", <--- HERE
                    "lastTransitionTime": "2023-02-09T07:41:54Z",
                    "reason": "NoFrequentDockerRestart",
                    "message": "docker is functioning properly"
                },
                {
                    "type": "FrequentUnregisterNetDevice",
                    "status": "False",
                    "lastHeartbeatTime": "2023-02-27T08:10:59Z", <--- HERE
                    "lastTransitionTime": "2023-02-09T07:41:54Z",
                    "reason": "NoFrequentUnregisterNetDevice",
                    "message": "node is functioning properly"
                },
                {
                    "type": "VMEventScheduled",
                    "status": "False",
                    "lastHeartbeatTime": "2023-02-27T08:10:59Z", <--- HERE
                    "lastTransitionTime": "2023-02-10T20:37:45Z",
                    "reason": "NoVMEventScheduled",
                    "message": "VM has no scheduled event"
                },
                {
                    "type": "ReadonlyFilesystem",
                    "status": "False",
                    "lastHeartbeatTime": "2023-02-27T08:10:59Z", <--- HERE
                    "lastTransitionTime": "2023-02-09T07:41:54Z",
                    "reason": "FilesystemIsNotReadOnly",
                    "message": "Filesystem is not read-only"
                },
                {
                    "type": "FrequentContainerdRestart",
                    "status": "False",
                    "lastHeartbeatTime": "2023-02-27T08:10:59Z", <--- HERE
                    "lastTransitionTime": "2023-02-09T07:41:54Z",
                    "reason": "NoFrequentContainerdRestart",
                    "message": "containerd is functioning properly"
                },
                {
                    "type": "KernelDeadlock",
                    "status": "False",
                    "lastHeartbeatTime": "2023-02-27T08:10:59Z", <--- HERE
                    "lastTransitionTime": "2023-02-09T07:41:54Z",
                    "reason": "KernelHasNoDeadlock",
                    "message": "kernel has no deadlock"
                },
                {
                    "type": "FrequentKubeletRestart",
                    "status": "False",
                    "lastHeartbeatTime": "2023-02-27T08:10:59Z", <--- HERE
                    "lastTransitionTime": "2023-02-09T07:41:54Z",
                    "reason": "NoFrequentKubeletRestart",
                    "message": "kubelet is functioning properly"
                },
                {
                    "type": "MemoryPressure",
                    "status": "False",
                    "lastHeartbeatTime": "2023-02-27T08:15:15Z", <--- HERE
                    "lastTransitionTime": "2023-02-10T20:38:18Z",
                    "reason": "KubeletHasSufficientMemory",
                    "message": "kubelet has sufficient memory available"
                },
                {
                    "type": "DiskPressure",
                    "status": "False",
                    "lastHeartbeatTime": "2023-02-27T08:15:15Z", <--- HERE
                    "lastTransitionTime": "2023-02-10T20:38:18Z",
                    "reason": "KubeletHasNoDiskPressure",
                    "message": "kubelet has no disk pressure"
                },
                {
                    "type": "PIDPressure",
                    "status": "False",
                    "lastHeartbeatTime": "2023-02-27T08:15:15Z", <--- HERE
                    "lastTransitionTime": "2023-02-10T20:38:18Z",
                    "reason": "KubeletHasSufficientPID",
                    "message": "kubelet has sufficient PID available"
                },
                {
                    "type": "Ready",
                    "status": "True",
                    "lastHeartbeatTime": "2023-02-27T08:15:15Z", <--- HERE
                    "lastTransitionTime": "2023-02-10T20:38:18Z",
                    "reason": "KubeletReady",
                    "message": "kubelet is posting ready status. AppArmor enabled"
                }
            ],
            "addresses": [ ...
            ],
            "daemonEndpoints": { ...
            },
            "nodeInfo": { ...
            },
            "images": [ ...
            ],
            "volumesInUse": [ ...
            ],
            "volumesAttached": [ ...
            ]
        }
    }
}
```

</details>

**As Node metadata is not modified, the entire configuration reload process that follows is unnecessary.**

Here's an excerpt from the Filebeat debug log showing that every 10 seconds a sequence of 4 stop and 4 start events is emitted for a single Pod (a Node hosts many Pods, so this is multiplied accordingly):

```
2023-02-14T12:03:02.406Z          autodiscover/autodiscover.go  267     Got a stop event.       esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:02.406Z          autodiscover/autodiscover.go  267     Got a stop event.       esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:02.407Z          autodiscover/autodiscover.go  267     Got a stop event.       esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:02.407Z          autodiscover/autodiscover.go  267     Got a stop event.       esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:02.604Z          autodiscover/autodiscover.go  182     Got a start event.      esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:02.604Z          autodiscover/autodiscover.go  182     Got a start event.      esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:02.605Z          autodiscover/autodiscover.go  182     Got a start event.      esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:02.607Z          autodiscover/autodiscover.go  182     Got a start event.      esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:12.091Z          autodiscover/autodiscover.go  267     Got a stop event.       esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:12.092Z          autodiscover/autodiscover.go  267     Got a stop event.       esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:12.093Z          autodiscover/autodiscover.go  267     Got a stop event.       esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:12.093Z          autodiscover/autodiscover.go  267     Got a stop event.       esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:12.099Z          autodiscover/autodiscover.go  182     Got a start event.      esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:12.100Z          autodiscover/autodiscover.go  182     Got a start event.      esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:12.101Z          autodiscover/autodiscover.go  182     Got a start event.      esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:12.102Z          autodiscover/autodiscover.go  182     Got a start event.      esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:22.415Z          autodiscover/autodiscover.go  267     Got a stop event.       esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:22.415Z          autodiscover/autodiscover.go  267     Got a stop event.       esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:22.416Z          autodiscover/autodiscover.go  267     Got a stop event.       esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:22.417Z          autodiscover/autodiscover.go  267     Got a stop event.       esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:22.715Z          autodiscover/autodiscover.go  182     Got a start event.      esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:22.716Z          autodiscover/autodiscover.go  182     Got a start event.      esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:22.717Z          autodiscover/autodiscover.go  182     Got a start event.      esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:22.718Z          autodiscover/autodiscover.go  182     Got a start event.      esdevazk8swe1-3-kb-84795bfcd6-cd984
```

With the template configuration above, most start/stop events should be ignored because they do not match the template condition; events for the Kibana Pod, however, do match. The following debug log excerpt shows a sequence of stop and start events that leads to a runner error, potentially caused by a race condition in the `log` input (see https://github.com/elastic/beats/issues/34388#issuecomment-1439816785). Once such a runner error occurs, the Autodiscover worker initiates a configuration reload on _every_ received event ([src](https://github.com/elastic/beats/blob/v8.5.3/libbeat/autodiscover/autodiscover.go#L133)), including stop events for unrelated Pods such as `aad-pod-identity-nmi-5jm82` (marked `<--- HERE`).

```
2023-02-14T12:03:02.406Z          autodiscover/autodiscover.go  267     Got a stop event.       esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:02.406Z          autodiscover/autodiscover.go  267     Got a stop event.       esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:02.406Z                       cfgfile/list.go  64      Starting reload procedure, current runners: 3   
2023-02-14T12:03:02.406Z                       cfgfile/list.go  82      Start list: 0, Stop list: 1     
2023-02-14T12:03:02.407Z          autodiscover/autodiscover.go  267     Got a stop event.       esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:02.407Z                       cfgfile/list.go  64      Starting reload procedure, current runners: 2   
2023-02-14T12:03:02.407Z                       cfgfile/list.go  82      Start list: 0, Stop list: 1     
2023-02-14T12:03:02.407Z          autodiscover/autodiscover.go  267     Got a stop event.       esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:02.407Z                       cfgfile/list.go  64      Starting reload procedure, current runners: 1   
2023-02-14T12:03:02.407Z                       cfgfile/list.go  82      Start list: 0, Stop list: 1     
2023-02-14T12:03:02.604Z          autodiscover/autodiscover.go  182     Got a start event.      esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:02.604Z          autodiscover/autodiscover.go  182     Got a start event.      esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:02.605Z                       cfgfile/list.go  64      Starting reload procedure, current runners: 0   
2023-02-14T12:03:02.605Z                       cfgfile/list.go  82      Start list: 1, Stop list: 0     
2023-02-14T12:03:02.605Z                       cfgfile/list.go  107     Error creating runner from config: Can only start an input when all related states are finished: {Id: native::4927659-2049, Finished: false, Fileinfo: &{0.log 31895141 416 {696363713 63811972980 0xaaaad7612520} {2049 4927659 33184 1 0 0 0 0 31895141 4096 0 62304 {1676357648 659620958} {1676376180 696363713} {1676376180 696363713} [0 0]}}, Source: /var/log/pods/elastic_esdevazk8swe1-3-kb-84795bfcd6-cd984_340bb25a-a329-4d4b-b934-e8605c3d4ee8/kibana/0.log, Offset: 29901889, Timestamp: 2023-02-14 12:03:01.723335032 +0000 UTC m=+149.690897691, TTL: -1ns, Type: container, Meta: map[], FileStateOS: 4927659-2049}    
2023-02-14T12:03:02.605Z          autodiscover/autodiscover.go  182     Got a start event.      esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:02.606Z          autodiscover/autodiscover.go  156     Reloading existing autodiscover configs after error     
2023-02-14T12:03:02.606Z                       cfgfile/list.go  64      Starting reload procedure, current runners: 0   
2023-02-14T12:03:02.606Z                       cfgfile/list.go  82      Start list: 2, Stop list: 0     
2023-02-14T12:03:02.607Z                       cfgfile/list.go  107     Error creating runner from config: Can only start an input when all related states are finished: {Id: native::4927659-2049, Finished: false, Fileinfo: &{0.log 31895141 416 {696363713 63811972980 0xaaaad7612520} {2049 4927659 33184 1 0 0 0 0 31895141 4096 0 62304 {1676357648 659620958} {1676376180 696363713} {1676376180 696363713} [0 0]}}, Source: /var/log/pods/elastic_esdevazk8swe1-3-kb-84795bfcd6-cd984_340bb25a-a329-4d4b-b934-e8605c3d4ee8/kibana/0.log, Offset: 29901889, Timestamp: 2023-02-14 12:03:01.723335032 +0000 UTC m=+149.690897691, TTL: -1ns, Type: container, Meta: map[], FileStateOS: 4927659-2049}    
2023-02-14T12:03:02.607Z          autodiscover/autodiscover.go  182     Got a start event.      esdevazk8swe1-3-kb-84795bfcd6-cd984
2023-02-14T12:03:02.608Z          autodiscover/autodiscover.go  156     Reloading existing autodiscover configs after error     
2023-02-14T12:03:02.608Z                       cfgfile/list.go  64      Starting reload procedure, current runners: 1   
2023-02-14T12:03:02.608Z                       cfgfile/list.go  82      Start list: 2, Stop list: 0     
2023-02-14T12:03:02.609Z                       cfgfile/list.go  107     Error creating runner from config: Can only start an input when all related states are finished: {Id: native::4927659-2049, Finished: false, Fileinfo: &{0.log 31895141 416 {696363713 63811972980 0xaaaad7612520} {2049 4927659 33184 1 0 0 0 0 31895141 4096 0 62304 {1676357648 659620958} {1676376180 696363713} {1676376180 696363713} [0 0]}}, Source: /var/log/pods/elastic_esdevazk8swe1-3-kb-84795bfcd6-cd984_340bb25a-a329-4d4b-b934-e8605c3d4ee8/kibana/0.log, Offset: 29901889, Timestamp: 2023-02-14 12:03:01.723335032 +0000 UTC m=+149.690897691, TTL: -1ns, Type: container, Meta: map[], FileStateOS: 4927659-2049}    
2023-02-14T12:03:02.609Z          autodiscover/autodiscover.go  267     Got a stop event.       aad-pod-identity-nmi-5jm82
2023-02-14T12:03:02.609Z          autodiscover/autodiscover.go  156     Reloading existing autodiscover configs after error <--- HERE 
2023-02-14T12:03:02.609Z                       cfgfile/list.go  64      Starting reload procedure, current runners: 2   
2023-02-14T12:03:02.609Z                       cfgfile/list.go  82      Start list: 1, Stop list: 0     
2023-02-14T12:03:02.610Z                       cfgfile/list.go  107     Error creating runner from config: Can only start an input when all related states are finished: {Id: native::4927659-2049, Finished: false, Fileinfo: &{0.log 31895141 416 {696363713 63811972980 0xaaaad7612520} {2049 4927659 33184 1 0 0 0 0 31895141 4096 0 62304 {1676357648 659620958} {1676376180 696363713} {1676376180 696363713} [0 0]}}, Source: /var/log/pods/elastic_esdevazk8swe1-3-kb-84795bfcd6-cd984_340bb25a-a329-4d4b-b934-e8605c3d4ee8/kibana/0.log, Offset: 29901889, Timestamp: 2023-02-14 12:03:01.723335032 +0000 UTC m=+149.690897691, TTL: -1ns, Type: container, Meta: map[], FileStateOS: 4927659-2049}    
2023-02-14T12:03:02.610Z          autodiscover/autodiscover.go  267     Got a stop event.       aad-pod-identity-nmi-5jm82
2023-02-14T12:03:02.610Z          autodiscover/autodiscover.go  156     Reloading existing autodiscover configs after error <--- HERE
```
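The retry behaviour visible in this excerpt can be modeled as below. This is a simplified sketch inferred from the log and the linked source line, not the actual worker code; `reloader`, `event` and `runWorker` are hypothetical, and the reloader simply fails a fixed number of times to stand in for the "Can only start an input when all related states are finished" error.

```go
package main

import (
	"errors"
	"fmt"
)

// reloader is a hypothetical stand-in for the config reloader. It fails a
// fixed number of times, mimicking the runner error from the log above.
type reloader struct {
	failuresLeft int
	reloads      int
}

func (r *reloader) Reload() error {
	r.reloads++
	if r.failuresLeft > 0 {
		r.failuresLeft--
		return errors.New("can only start an input when all related states are finished")
	}
	return nil
}

// event is a hypothetical, simplified autodiscover event; matches is true when
// the event matched a template and therefore changed the set of configs.
type event struct {
	pod     string
	matches bool
}

// runWorker models the loop described above: a reload normally follows only
// events that changed the configs, but once a reload has failed, every later
// event, even one for a Pod that matches no template, triggers another reload
// until one succeeds.
func runWorker(events []event, r *reloader) {
	retry := false
	for _, e := range events {
		if e.matches || retry {
			retry = r.Reload() != nil
		}
	}
}

func main() {
	events := []event{
		{pod: "kibana", matches: true},
		{pod: "aad-pod-identity-nmi", matches: false},
		{pod: "aad-pod-identity-nmi", matches: false},
		{pod: "other", matches: false},
	}
	r := &reloader{failuresLeft: 2}
	runWorker(events, r)
	// One reload for the Kibana event, plus two retries triggered by unrelated Pods.
	fmt.Println("reloads:", r.reloads)
}
```

In this model, the more irrelevant Pod events a Node update produces, the more reloads are attempted while the error persists, which is consistent with the bursts shown below.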

Ultimately, the following number of configuration reloads per second was observed (count, then timestamp):
```
   1 2023-02-14T12:03:01
  66 2023-02-14T12:03:02
 113 2023-02-14T12:03:12 <--- HERE
  27 2023-02-14T12:03:13
   6 2023-02-14T12:03:22
   5 2023-02-14T12:03:32
  79 2023-02-14T12:03:33
   1 2023-02-14T12:03:42
  28 2023-02-14T12:03:43
   1 2023-02-14T12:03:52
  66 2023-02-14T12:03:53
...
```

### Enhancement request

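Autodiscover should ignore Node and Namespace modifications that do not change the object's metadata (for example, updates that only touch `.status.conditions[].lastHeartbeatTime`), instead of re-emitting stop and start events for every related Pod.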
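A minimal sketch of the requested filter follows, assuming the update handler has access to both the previous and the new version of the object and that name, labels and annotations are the metadata Autodiscover propagates to Pod events. All types and functions are hypothetical simplifications, not the Kubernetes or elastic-agent-autodiscover APIs; the same check applies to Namespace objects.

```go
package main

import "fmt"

// ObjectMeta is a hypothetical, trimmed-down version of Kubernetes object
// metadata: only fields Autodiscover could copy into events are modeled.
type ObjectMeta struct {
	Name        string
	Labels      map[string]string
	Annotations map[string]string
}

// Node is a hypothetical stand-in for a Node (or Namespace) object. Heartbeats
// represents .status.conditions[].lastHeartbeatTime, which changes constantly.
type Node struct {
	Meta       ObjectMeta
	Heartbeats []string
}

// equalMaps treats nil and empty maps as equal.
func equalMaps(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || w != v {
			return false
		}
	}
	return true
}

// metadataChanged reports whether an update can affect the metadata attached
// to Pod events. Status-only updates, such as heartbeats, return false.
func metadataChanged(old, cur Node) bool {
	return old.Meta.Name != cur.Meta.Name ||
		!equalMaps(old.Meta.Labels, cur.Meta.Labels) ||
		!equalMaps(old.Meta.Annotations, cur.Meta.Annotations)
}

// onNodeUpdate is a hypothetical handler that re-syncs the Pods on a Node
// (emits their stop/start events) only when its metadata actually changed.
func onNodeUpdate(old, cur Node, resyncPods func(node string)) {
	if !metadataChanged(old, cur) {
		return // e.g. only lastHeartbeatTime changed
	}
	resyncPods(cur.Meta.Name)
}

func main() {
	prev := Node{Meta: ObjectMeta{Name: "node-a", Labels: map[string]string{"team": "a"}}, Heartbeats: []string{"2023-02-27T08:10:59Z"}}
	heartbeat := Node{Meta: prev.Meta, Heartbeats: []string{"2023-02-27T08:15:15Z"}}
	relabeled := Node{Meta: ObjectMeta{Name: "node-a", Labels: map[string]string{"team": "b"}}}

	resync := func(node string) { fmt.Println("re-emitting Pod events for", node) }
	onNodeUpdate(prev, heartbeat, resync) // ignored: only the heartbeat changed
	onNodeUpdate(prev, relabeled, resync) // label changed: Pods are re-synced
}
```

Comparing only the propagated metadata keeps the behaviour introduced for https://github.com/elastic/beats/issues/23139 (label changes still reach Pods) while dropping the heartbeat-driven churn.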

### Workarounds

If `hints` are not used, the Node and Namespace watchers can be disabled entirely ([src](https://github.com/elastic/beats/blob/v8.5.3/libbeat/autodiscover/providers/kubernetes/pod.go#L128-L136)) with the following configuration, at the cost of the Node and Namespace metadata enrichment provided by `add_resource_metadata`:

```
filebeat:
  autodiscover:
    providers:
    - type: kubernetes
      # templates: as in the Setup section above
      add_resource_metadata:
        namespace:
          enabled: false
        node:
          enabled: false
      hints.enabled: false
```
