As @mkirkevo already observed, Filebeat sometimes doesn't collect all of the logs when a pod goes into the Terminating state. I tried to reproduce it and I think I found at least one case in which this can happen: a pod handles the SIGTERM sent when someone deletes it, sleeps for a while, and writes a few more log entries before exiting. As you can see in my findings below, the log entries written after SIGTERM was triggered never reach our backend (Elasticsearch in our case), but we can still see them with kubectl. As far as I can tell, the harvester is not closed prematurely.
It might be related to #14259, but I'm not sure, as I don't know the codebase.
Running configuration:
filebeat.autodiscover:
  providers:
    - type: kubernetes
      hints.enabled: true
      templates:
        - condition:
            regexp:
              kubernetes.pod.name: ".*"
          config:
            - type: container
              paths:
                - /var/lib/docker/containers/${data.kubernetes.container.id}/*.log
              fields:
                kubernetes.cluster.type: eks
              fields_under_root: true
Steps to reproduce:
To reproduce, you need a running Kubernetes cluster with Filebeat up and running using the configuration above (you also need to configure an output; as mentioned, we use Elasticsearch as the backend).
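For completeness, a minimal output section of the kind we use; the host and credentials below are placeholders, not our actual setup:

output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]  # placeholder host, adjust to your cluster
  # username: "elastic"                 # only if your cluster requires auth
  # password: "changeme"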
1. Apply this manifest:

---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  labels:
    app: test
spec:
  containers:
    - name: test
      image: ubuntu:latest
      command:
        - bash
        - -c
        - |
          #!/bin/bash
          trap "{ echo '$(date): SIGTERM triggered'; sleep 5; echo '$(date): Bye bye'; sleep 2; echo '$(date): bye'; exit 1; }" SIGINT SIGTERM
          echo "$(date): started the process"
          while :
          do
            echo "$(date): sleeping 5 seconds"
            sleep 5
          done
This is just a long-running pod that will not be killed immediately after you run kubectl delete pod test-pod: the trap keeps the container alive for roughly 7 more seconds, well within the default 30-second termination grace period.
2. Check the logs of this pod with kubectl logs -f test-pod in a second shell.
3. Delete the pod by running kubectl delete pod test-pod.
4. Check your backend (whatever you configured as the Filebeat output); a query sketch follows this list.
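To check the backend, something along these lines works for us; it assumes the default filebeat-* index pattern and an Elasticsearch endpoint at localhost:9200, so adjust it to your own output settings:

curl -s 'http://localhost:9200/filebeat-*/_search?q=kubernetes.pod.name:test-pod&sort=@timestamp:desc&size=20' \
  | jq -r '.hits.hits[]._source.message'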
You are going to see something like this (please ignore the repeated timestamps after "SIGTERM triggered"; I didn't invest much time in making the trap print a fresh date, see the note after the output):
$ kubectl logs -f test-pod
Wed Apr 1 12:03:30 UTC 2020: started the process
Wed Apr 1 12:03:30 UTC 2020: sleeping 5 seconds
Wed Apr 1 12:03:35 UTC 2020: sleeping 5 seconds
Wed Apr 1 12:03:40 UTC 2020: sleeping 5 seconds
Wed Apr 1 12:03:45 UTC 2020: sleeping 5 seconds
Wed Apr 1 12:03:30 UTC 2020: SIGTERM triggered
Wed Apr 1 12:03:30 UTC 2020: Bye bye
Wed Apr 1 12:03:30 UTC 2020: bye
rpc error: code = Unknown desc = Error: No such container: d5bfd58d4f473a991f88b016079a75c9f96c4bcf0117a88798d641dfc3732c91%
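A side note on those repeated timestamps: the trap in the manifest is defined with double quotes, so $(date) is expanded once, when the trap is registered, not when the signal fires. A single-quoted variant would defer the expansion; this is just a cosmetic tweak and not needed for the reproduction:

# Hypothetical variant of the trap from the manifest above; single quotes
# delay the $(date) expansion until SIGINT/SIGTERM is actually received.
trap '{ echo "$(date): SIGTERM triggered"; sleep 5; echo "$(date): Bye bye"; sleep 2; echo "$(date): bye"; exit 1; }' SIGINT SIGTERM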

In the meantime, I've checked the Filebeat logs, and it seems the harvester is terminated only after the pod has finished completely. The last message arrived at 14:03:50.853, and if you add the 5+2 seconds of sleeps from the trap you end up close to the timestamp at which the file was removed.
2020-04-01T12:03:31.996Z INFO log/input.go:152 Configured paths: [/var/lib/docker/containers/d5bfd58d4f473a991f88b016079a75c9f96c4bcf0117a88798d641dfc3732c91/*.log]
2020-04-01T12:03:31.996Z INFO input/input.go:114 Starting input of type: container; ID: 809693904145508800
2020-04-01T12:03:31.997Z INFO log/harvester.go:297 Harvester started for file: /var/lib/docker/containers/d5bfd58d4f473a991f88b016079a75c9f96c4bcf0117a88798d641dfc3732c91/d5bfd58d4f473a991f88b016079a75c9f96c4bcf0117a88798d641dfc3732c91-json.log
2020-04-01T12:03:59.000Z INFO log/harvester.go:316 File was removed: /var/lib/docker/containers/d5bfd58d4f473a991f88b016079a75c9f96c4bcf0117a88798d641dfc3732c91/d5bfd58d4f473a991f88b016079a75c9f96c4bcf0117a88798d641dfc3732c91-json.log. Closing because close_removed is enabled.
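For what it's worth, and purely as an experiment rather than a proposed fix, the close_removed / clean_removed / close_timeout options of the log-based inputs could be tuned on the container input to see whether the post-SIGTERM lines then get shipped; the values below are illustrative only:

- type: container
  paths:
    - /var/lib/docker/containers/${data.kubernetes.container.id}/*.log
  close_removed: false   # keep the harvester open after the file is deleted
  clean_removed: false   # keep the registry entry for removed files
  close_timeout: 5m      # but don't keep the harvester open forever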