Relevant telegraf.conf
[agent]
collection_jitter = "0s"
debug = false
flush_interval = "30s"
flush_jitter = "1s"
hostname = "$HOSTNAME"
interval = "30s"
logfile = ""
metric_batch_size = 1000
metric_buffer_limit = 10000
omit_hostname = false
precision = ""
quiet = false
round_interval = true
[[processors.enum]]
[[processors.enum.mapping]]
dest = "status_code"
field = "status"
[processors.enum.mapping.value_mappings]
critical = 3
healthy = 1
problem = 2
[[outputs.influxdb]]
database = "kubernetes"
insecure_skip_verify = false
password = ""
retention_policy = ""
timeout = "5s"
url = "http://influxdb:8086"
user_agent = "telegraf"
username = ""
[[inputs.prometheus]]
monitor_kubernetes_pods = true
[[inputs.internal]]
collect_memstats = false
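For context, `monitor_kubernetes_pods = true` discovers scrape targets through the standard `prometheus.io` pod annotations; a minimal pod spec showing the expected annotations (names, port, and image are illustrative, not from this cluster):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                # illustrative name
  annotations:
    prometheus.io/scrape: "true"   # opt the pod in to scraping
    prometheus.io/port: "9102"     # port to scrape (illustrative)
    prometheus.io/path: "/metrics" # metrics path (the default)
spec:
  containers:
    - name: app
      image: example/app:latest    # illustrative image
```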
System info
Telegraf 1.20.3, Kubernetes 1.20.7
Docker
No response
Steps to reproduce
- Deploy Telegraf into a K8s 1.20 cluster using the Helm chart from the official repo. We have seen this failure in both EKS and AKS.
- Use the configuration shown above via the ConfigMap.
- Observe that the pod produces an error on startup.
Expected behavior
Telegraf should start, and the prometheus input should start scraping the pods it finds via discovery. This works fine in Kubernetes 1.19 but fails in K8s 1.20 as described below.
Actual behavior
Telegraf pod dies immediately with the following error:
2021-11-10T02:06:37Z I! Starting Telegraf 1.20.3
2021-11-10T02:06:37Z I! Using config file: /etc/telegraf/telegraf.conf
2021-11-10T02:06:37Z I! Loaded inputs: internal prometheus
2021-11-10T02:06:37Z I! Loaded aggregators:
2021-11-10T02:06:37Z I! Loaded processors: enum
2021-11-10T02:06:37Z I! Loaded outputs: influxdb
2021-11-10T02:06:37Z I! Tags enabled: host=telegraf-polling-service
2021-11-10T02:06:37Z I! [agent] Config: Interval:30s, Quiet:false, Hostname:"telegraf-polling-service", Flush Interval:30s
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x285f71c]
goroutine 36 [running]:
github.com/influxdata/telegraf/plugins/inputs/prometheus.(*Prometheus).watchPod(0xc000476fc0, {0x575c368, 0xc0002a04c0}, 0x0)
/go/src/github.com/influxdata/telegraf/plugins/inputs/prometheus/kubernetes.go:113 +0xfc
github.com/influxdata/telegraf/plugins/inputs/prometheus.(*Prometheus).startK8s.func1()
/go/src/github.com/influxdata/telegraf/plugins/inputs/prometheus/kubernetes.go:92 +0x24c
created by github.com/influxdata/telegraf/plugins/inputs/prometheus.(*Prometheus).startK8s
/go/src/github.com/influxdata/telegraf/plugins/inputs/prometheus/kubernetes.go:79 +0x2af
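The trace shows `watchPod` being called with a nil third argument (the `0x0` in the goroutine dump) and crashing at `kubernetes.go:113`, which is consistent with a field access through a nil pointer. A minimal Go sketch of this failure class (the types and function names here are illustrative stand-ins, not Telegraf's actual code):

```go
package main

import (
	"errors"
	"fmt"
)

// podWatcher is an illustrative stand-in for the handle that the
// prometheus input dereferences inside watchPod.
type podWatcher struct {
	namespace string
}

// safeWatch demonstrates the crash: reading w.namespace when w is nil
// raises the same "invalid memory address or nil pointer dereference"
// runtime error seen in the panic above. Here it is recovered and
// surfaced as an error instead of killing the process.
func safeWatch(w *podWatcher) (ns string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = errors.New(fmt.Sprint(r))
		}
	}()
	ns = w.namespace // panics when w is nil, mirroring the 0x0 argument
	return ns, nil
}

func main() {
	var w *podWatcher // nil, as in the goroutine dump
	if _, err := safeWatch(w); err != nil {
		fmt.Println("caught:", err)
	}
}
```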
Additional info
No response