Feature Request
Prometheus [Input] plugin - Optimizing for large Kubernetes clusters (500+ pods) when scraping through 'monitor_kubernetes_pods'
Current behavior:
Currently, when 'monitor_kubernetes_pods = true', Telegraf watches for pods with specific annotations and scrapes their metrics as pods come and go (in all namespaces or in specified namespaces). This approach works for smaller clusters, but it does not scale in larger clusters (500+ pods), especially when a single Telegraf pod does the scraping for every pod in the cluster. That single pod is also a point of failure and unreliability for annotation-based scraping with the Telegraf Prometheus input plugin.
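For reference, a minimal sketch of how annotation-based scraping is enabled today (the namespace option is shown from memory and the values are illustrative):

```toml
[[inputs.prometheus]]
  ## Discover and scrape pods annotated with prometheus.io/scrape = "true",
  ## watching them through the Kubernetes API server.
  monitor_kubernetes_pods = true
  ## Optionally restrict discovery to a namespace.
  monitor_kubernetes_pods_namespace = "default"
```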
Proposal:
Introduce an additional option (perhaps 'local_mode', or something more intuitive) which, when true, discovers only the pods running on the same node as the Telegraf instance. Instead of watching pods through the API server as it does today, Telegraf would fetch the node's pod list locally from its kubelet and scrape the pods carrying the same annotations as today. This requires running Telegraf as a DaemonSet (one instance per node), with each instance scraping its own node's pods locally when the option is enabled. The change is backward compatible: the new option defaults to off/false, and users can turn it on as they see the need.
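A minimal sketch of the proposed local discovery, assuming the kubelet's `/pods` endpoint (a `v1.PodList` JSON document) as the source and the same `prometheus.io/scrape` annotation convention the plugin already uses; the type and function names here are illustrative, not the plugin's actual API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PodList models the minimal subset of the kubelet /pods response
// (a v1.PodList) needed to pick scrape targets.
type PodList struct {
	Items []struct {
		Metadata struct {
			Name        string            `json:"name"`
			Namespace   string            `json:"namespace"`
			Annotations map[string]string `json:"annotations"`
		} `json:"metadata"`
		Status struct {
			PodIP string `json:"podIP"`
		} `json:"status"`
	} `json:"items"`
}

// scrapeTargets returns the IPs of pods whose annotations opt in to
// scraping, mirroring the existing annotation-based selection.
func scrapeTargets(raw []byte) ([]string, error) {
	var list PodList
	if err := json.Unmarshal(raw, &list); err != nil {
		return nil, err
	}
	var targets []string
	for _, p := range list.Items {
		if p.Metadata.Annotations["prometheus.io/scrape"] == "true" {
			targets = append(targets, p.Status.PodIP)
		}
	}
	return targets, nil
}

func main() {
	// In local_mode, this JSON would come from an HTTP GET against the
	// node's own kubelet (e.g. the /pods endpoint on the kubelet port)
	// rather than from a watch on the API server.
	sample := []byte(`{"items":[
		{"metadata":{"name":"a","namespace":"default","annotations":{"prometheus.io/scrape":"true"}},"status":{"podIP":"10.0.0.5"}},
		{"metadata":{"name":"b","namespace":"default","annotations":{}},"status":{"podIP":"10.0.0.6"}}
	]}`)
	targets, err := scrapeTargets(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(targets) // only the annotated pod's IP
}
```

Because each node's Telegraf sees only its own kubelet's pod list, the discovery load no longer concentrates on one pod or on the API server.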
Desired behavior:
Pod-annotation-based scraping through Telegraf that scales as the Kubernetes cluster scales.
Use case:
As Kubernetes becomes the de facto platform for running workloads, production clusters keep growing, and Prometheus metric sources are widely available. To monitor them through Telegraf, we need a reliable way to scale metric collection as the cluster grows.