Reported in #11834 (comment). I could reproduce it in 7.10 with the reference documentation and a simple cronjob. Filebeat with autodiscover doesn't collect logs of short-lived jobs.
From what I could see, short-lived containers don't generate any start event; sometimes they generate a stop event.
With kubectl (-w) I could see that pods from short-lived cronjobs don't generate an event with the Running state:
hello-1606148340-spnpj 0/1 Pending 0 1s
hello-1606148340-spnpj 0/1 Pending 0 1s
hello-1606148340-spnpj 0/1 ContainerCreating 0 1s
hello-1606148340-spnpj 0/1 Completed 0 2s
With longer-lived pods, this is the sequence of events seen, and logs are collected:
hello-1606148400-gm82x 0/1 Pending 0 0s
hello-1606148400-gm82x 0/1 Pending 0 0s
hello-1606148400-gm82x 0/1 ContainerCreating 0 0s
hello-1606148400-gm82x 1/1 Running 0 1s
hello-1606148400-gm82x 0/1 Completed 0 11s
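Comparing the two sequences above, the short-lived pod never passes through Running, which matches autodiscover never emitting a start event. A minimal sketch of that difference (plain Python, not Filebeat code; the assumption that a start event requires a Running phase is my reading of the behaviour above):

```python
# Pod phase sequences as observed with `kubectl get pods -w`.
short_lived = ["Pending", "Pending", "ContainerCreating", "Completed"]
longer_lived = ["Pending", "Pending", "ContainerCreating", "Running", "Completed"]

def would_emit_start(phases):
    """Hypothetical check: a start event is only observed when the pod reaches Running."""
    return "Running" in phases

print(would_emit_start(short_lived))   # False: no start event, logs missed
print(would_emit_start(longer_lived))  # True: start event, logs collected
```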
I have also seen that logs from pods that print something and fail fast are not collected; events for these cases look like this:
echo 0/1 Pending 0 0s
echo 0/1 Pending 0 0s
echo 0/1 ContainerCreating 0 0s
echo 0/1 Completed 0 3s
echo 0/1 Completed 1 4s
echo 0/1 CrashLoopBackOff 1 5s
...
In these cases having the logs is important to help investigate what is happening.
If there are init containers, there can be cases where their logs are not collected; in these cases event sequences like this one are seen:
mytarget2 0/1 Init:0/1 0 6s
mytarget2 0/1 PodInitializing 0 15s
mytarget2 1/1 Running 0 19s
For Metricbeat it can be acceptable not to start modules for short-lived processes, but Filebeat should collect logs of containers from the moment they start; this is important to investigate issues.
For confirmed bugs, please report:
Version: 7.10.0 (also reported with 7.9.3)
Discuss Forum URL: [autodiscover] Error creating runner from config: Can only start an input when all related states are finished #11834 (comment)
Steps to Reproduce:
Start Filebeat with the reference configuration and an autodiscover template:
filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      templates:
        - config:
            - type: container
              paths:
                - /var/log/containers/*${data.kubernetes.container.id}.log
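For context, the `${data.kubernetes.container.id}` variable in this template is substituted from fields of the autodiscover event; when an event carries no container id, the template cannot be resolved and autodiscover logs the "field 'data.kubernetes.container.id' not available" error shown in the debug output below. A rough sketch of that substitution (hypothetical helper, not the actual Beats template code):

```python
# Hypothetical sketch of autodiscover template resolution (not actual Beats code).
def resolve_path(template, event):
    """Substitute the container id from an autodiscover event into a path template."""
    container_id = event.get("kubernetes", {}).get("container", {}).get("id")
    if container_id is None:
        # Mirrors the "field 'data.kubernetes.container.id' not available" debug error.
        raise KeyError("field 'data.kubernetes.container.id' not available in event")
    return template.replace("${data.kubernetes.container.id}", container_id)

path = resolve_path("/var/log/containers/*${data.kubernetes.container.id}.log",
                    {"kubernetes": {"container": {"id": "054c10b6b0c8"}}})
print(path)  # /var/log/containers/*054c10b6b0c8.log
```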
Run a cronjob with a short-lived process, like this one:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  failedJobsHistoryLimit: 10
  successfulJobsHistoryLimit: 20
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
With debug logs for autodiscover, this is what is seen for some jobs: errors about the missing container.id, and some stop events, but no start event:
2020-11-23T16:44:10.377Z DEBUG [autodiscover] template/config.go:156 Configuration template cannot be resolved: field 'data.kubernetes.container.id' not available in event or environment accessing 'paths' (source:'/etc/filebeat.yml')
2020-11-23T16:44:10.377Z DEBUG [autodiscover] autodiscover/autodiscover.go:236 Got a stop event: map[config:[] host:10.0.6.13 id:fedc9c0a-113c-414c-95e9-409b3e56ead8 kubernetes:{"annotations":{},"labels":{"controller-uid":"d0b5a9f3-cccc-41c1-90da-7bd4decbcf8c","job-name":"hello-1606149780"},"namespace":"cron","node":{"name":"gke-jsoriano-test-default-pool-a6f338c6-w0b6"},"pod":{"name":"hello-1606149780-bngj7","uid":"fedc9c0a-113c-414c-95e9-409b3e56ead8"}} meta:{"kubernetes":{"labels":{"controller-uid":"d0b5a9f3-cccc-41c1-90da-7bd4decbcf8c","job-name":"hello-1606149780"},"namespace":"cron","node":{"name":"gke-jsoriano-test-default-pool-a6f338c6-w0b6"},"pod":{"name":"hello-1606149780-bngj7","uid":"fedc9c0a-113c-414c-95e9-409b3e56ead8"}}} ports:{} provider:d8bb0011-c4ab-4e20-890c-e7a9ff56dfff stop:true]
2020-11-23T16:44:10.377Z DEBUG [autodiscover] autodiscover/autodiscover.go:236 Got a stop event: map[config:[0xc000763e90] host:10.0.6.13 id:fedc9c0a-113c-414c-95e9-409b3e56ead8.hello kubernetes:{"annotations":{},"container":{"id":"054c10b6b0c8d8530735d3a92bbff5d76f4f76420e9c33319e5a0551be0fbf87","image":"busybox","name":"hello","runtime":"docker"},"labels":{"controller-uid":"d0b5a9f3-cccc-41c1-90da-7bd4decbcf8c","job-name":"hello-1606149780"},"namespace":"cron","node":{"name":"gke-jsoriano-test-default-pool-a6f338c6-w0b6"},"pod":{"name":"hello-1606149780-bngj7","uid":"fedc9c0a-113c-414c-95e9-409b3e56ead8"}} meta:{"container":{"id":"054c10b6b0c8d8530735d3a92bbff5d76f4f76420e9c33319e5a0551be0fbf87","image":{"name":"busybox"},"runtime":"docker"},"kubernetes":{"container":{"image":"busybox","name":"hello"},"labels":{"controller-uid":"d0b5a9f3-cccc-41c1-90da-7bd4decbcf8c","job-name":"hello-1606149780"},"namespace":"cron","node":{"name":"gke-jsoriano-test-default-pool-a6f338c6-w0b6"},"pod":{"name":"hello-1606149780-bngj7","uid":"fedc9c0a-113c-414c-95e9-409b3e56ead8"}}} port:0 provider:d8bb0011-c4ab-4e20-890c-e7a9ff56dfff stop:true]