Prometheus pod discovery not working for monitor_kubernetes_pods=true and pod_scrape_scope=cluster since 1.18.3 #9600

@gracewehner

Description

Relevant telegraf.conf:

[[inputs.prometheus]]
  metric_version = 2
  pod_scrape_scope = "cluster"

  ## Scrape Kubernetes pods for the following prometheus annotations:
  ## - prometheus.io/scrape: Enable scraping for this pod
  ## - prometheus.io/scheme: If the metrics endpoint is secured then you will need to
  ##     set this to `https` & most likely set the tls config.
  ## - prometheus.io/path: If the metrics path is not /metrics, define it with this annotation.
  ## - prometheus.io/port: If port is not 9102 use this annotation
  monitor_kubernetes_pods = true

  bearer_token = "/var/run/secrets/kubernetes.io/serviceaccount/token"
  response_timeout = "15s"

  tls_ca = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
  insecure_skip_verify = true

System info:

Telegraf 1.18.3 - 1.19.2
Telegraf executable run as a process in a Kubernetes container

Steps to reproduce:

  1. Deploy a pod that exposes prometheus metrics at <pod_ip>:<port>/<metrics path> with the annotations in the comment for monitor_kubernetes_pods above.
  2. Run telegraf with the prometheus input plugin and the settings monitor_kubernetes_pods = true, pod_scrape_scope = "cluster".
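To make step 1 concrete, here is a minimal sketch of how the annotations listed in the config comment could resolve to a scrape URL. The function name `scrapeURL` is hypothetical and this is not the plugin's actual code; the `/metrics` and `9102` defaults come from the comment above, while the `http` default scheme is an assumption.

```go
package main

import (
	"net"
	"net/url"
)

// scrapeURL is a hypothetical helper, not part of Telegraf. It returns the
// endpoint an annotated pod would be scraped at, and false if the pod has
// not opted in via prometheus.io/scrape.
func scrapeURL(podIP string, annotations map[string]string) (string, bool) {
	if annotations["prometheus.io/scrape"] != "true" {
		return "", false
	}

	// Assumption: plain http unless the scheme annotation says otherwise.
	scheme := annotations["prometheus.io/scheme"]
	if scheme == "" {
		scheme = "http"
	}

	// Defaults taken from the telegraf.conf comment: /metrics and 9102.
	path := annotations["prometheus.io/path"]
	if path == "" {
		path = "/metrics"
	}
	port := annotations["prometheus.io/port"]
	if port == "" {
		port = "9102"
	}

	u := url.URL{Scheme: scheme, Host: net.JoinHostPort(podIP, port), Path: path}
	return u.String(), true
}
```

Calling `scrapeURL` with a pod IP and only `prometheus.io/scrape: "true"` set falls back to the defaults for path and port; a pod without that annotation is skipped.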

Expected behavior:

Telegraf discovers the pods that have the annotations and scrapes the metrics exposed by those pods.

Actual behavior:

No pods are registered in the kubernetes.go code of the prometheus input plugin. Since no pods are discovered/registered, no prometheus metrics are scraped.

Additional info:

This is different from issues #9349 and #9408, which are for pod_scrape_scope = "node".

The issue appears to stem from #8937 and this line in kubernetes.go, where the pod struct is never populated from the event object, so the registered pod has no information about the endpoint to scrape.
Replacing that line with something like:

// Take the pod from the watch event instead of using an empty struct.
pod, ok := event.Object.(*corev1.Pod)
if !ok {
	return fmt.Errorf("unexpected object when getting pods")
}

fixes the issue by getting the pod from the watch event.
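The pattern behind the proposed fix can be sketched in isolation. The `Pod` and `Event` types below are hypothetical stand-ins for `corev1.Pod` and the Kubernetes watch event, reduced to the fields needed here, so that the example compiles without the Kubernetes client libraries; `podFromEvent` is likewise an illustrative name, not a function in the plugin.

```go
package main

import "fmt"

// Pod is a hypothetical stand-in for corev1.Pod, reduced to two fields.
type Pod struct {
	Name  string
	PodIP string
}

// Event is a hypothetical stand-in for a watch event. Object carries
// whatever the watch delivered, so its concrete type must be checked.
type Event struct {
	Type   string
	Object interface{}
}

// podFromEvent extracts the pod carried by the event. Returning the
// asserted pointer, rather than a freshly declared empty Pod, is what
// gives the caller the IP needed to build a scrape target.
func podFromEvent(event Event) (*Pod, error) {
	pod, ok := event.Object.(*Pod)
	if !ok {
		return nil, fmt.Errorf("unexpected object of type %T when getting pods", event.Object)
	}
	return pod, nil
}
```

With an event whose `Object` is a `*Pod`, the returned pod carries the populated fields; any other object type yields an error instead of a silently empty pod.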

I have a fork and can make a PR with this change.

Labels: bug (unexpected problem or unintended behavior)
