[inputs.prometheus] SIGSEGV on startup with Kubernetes 1.20 #10085

@sfitts

Description

Relevant telegraf.conf

[agent]
  collection_jitter = "0s"
  debug = false
  flush_interval = "30s"
  flush_jitter = "1s"
  hostname = "$HOSTNAME"
  interval = "30s"
  logfile = ""
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  omit_hostname = false
  precision = ""
  quiet = false
  round_interval = true

[[processors.enum]]
  [[processors.enum.mapping]]
    dest = "status_code"
    field = "status"
    [processors.enum.mapping.value_mappings]
      critical = 3
      healthy = 1
      problem = 2

[[outputs.influxdb]]
  database = "kubernetes"
  insecure_skip_verify = false
  password = ""
  retention_policy = ""
  timeout = "5s"
  url = "http://influxdb:8086"
  user_agent = "telegraf"
  username = ""

[[inputs.prometheus]]
  monitor_kubernetes_pods = true

[[inputs.internal]]
  collect_memstats = false

System info

Telegraf 1.20.3, Kubernetes 1.20.7

Docker

No response

Steps to reproduce

  1. Deploy Telegraf to a Kubernetes 1.20 cluster using the Helm chart from the official repo. We have seen this failure on both EKS and AKS.
  2. Use the configuration shown above via the ConfigMap.
  3. Observe that the pod produces an error on startup.

Expected behavior

Telegraf should start, and the prometheus input should begin scraping the pods it finds via discovery. This works fine on Kubernetes 1.19 but fails as described below on Kubernetes 1.20.

Actual behavior

Telegraf pod dies immediately with the following error:

2021-11-10T02:06:37Z I! Starting Telegraf 1.20.3
2021-11-10T02:06:37Z I! Using config file: /etc/telegraf/telegraf.conf
2021-11-10T02:06:37Z I! Loaded inputs: internal prometheus
2021-11-10T02:06:37Z I! Loaded aggregators:
2021-11-10T02:06:37Z I! Loaded processors: enum
2021-11-10T02:06:37Z I! Loaded outputs: influxdb
2021-11-10T02:06:37Z I! Tags enabled: host=telegraf-polling-service
2021-11-10T02:06:37Z I! [agent] Config: Interval:30s, Quiet:false, Hostname:"telegraf-polling-service", Flush Interval:30s
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x285f71c]

goroutine 36 [running]:
github.com/influxdata/telegraf/plugins/inputs/prometheus.(*Prometheus).watchPod(0xc000476fc0, {0x575c368, 0xc0002a04c0}, 0x0)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/prometheus/kubernetes.go:113 +0xfc
github.com/influxdata/telegraf/plugins/inputs/prometheus.(*Prometheus).startK8s.func1()
        /go/src/github.com/influxdata/telegraf/plugins/inputs/prometheus/kubernetes.go:92 +0x24c
created by github.com/influxdata/telegraf/plugins/inputs/prometheus.(*Prometheus).startK8s
        /go/src/github.com/influxdata/telegraf/plugins/inputs/prometheus/kubernetes.go:79 +0x2af

Additional info

No response

Labels: bug (unexpected problem or unintended behavior)
