Relevant telegraf.conf:
[global_tags]
# dc = "us-east-1" # will tag all metrics with dc=us-east-1
# rack = "1a"
## Environment variables can be used as tags, and throughout the config file
# user = "$USER"
[agent]
interval = "30s"
round_interval = false
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "2s"
flush_interval = "10s"
flush_jitter = "5s"
precision = ""
debug = false
quiet = false
logfile = ""
hostname = ""
omit_hostname = false
[[inputs.prometheus]]
## An array of urls to scrape metrics from.
urls = ["http://localhost:9000/minio/prometheus/metrics"]
metric_version = 2
name_override = "minio"
tagexclude = ["url"]
System info:
$ lsb_release -rc
Release: 10
Codename: buster
$ dpkg -l | grep telegraf
ii telegraf 1.17.0-1 amd64 Plugin-driven server agent for reporting metrics into InfluxDB.
$ minio -v
minio version RELEASE.2020-12-12T08-39-07Z
Steps to reproduce:
- Install telegraf and minio on a host
- Attempt to scrape prometheus stats from minio using telegraf with metric_version = 2.
$ sudo -u telegraf telegraf --test --config-directory /etc/telegraf/telegraf.d/ --input-filter prometheus
2020-12-24T17:51:35Z I! Starting Telegraf 1.17.0
2020-12-24T17:51:35Z I! Using config file: /etc/telegraf/telegraf.conf
2020-12-24T17:51:35Z E! [inputs.prometheus] Error in plugin: error reading metrics for http://localhost:9000/minio/prometheus/metrics: reading text format failed: text format parsing error in line 1: expected float as value, got ""
2020-12-24T17:51:35Z E! [telegraf] Error running agent: input plugins recorded 1 errors
- Switch to metric_version = 1 and stats will be collected, although the naming is then messed up.
Expected behavior:
- Scraping stats from minio's prometheus status page with metric_version = 2 should succeed. Instead, telegraf fails to parse the output values.
- If I set metric_version = 1 in the prometheus input, the stats are collected.
- This wasn't the case before 1.17.0; metric_version = 2 has been working well for quite some time.
Actual behavior:
Described above.
Additional info:
This is also occurring with other prometheus client endpoints (traefik, for example) on another Debian 10 system.
Example minio stats pulled from the host:
$ curl -Ss localhost:9000/minio/prometheus/metrics | head
# HELP disk_storage_available Total available space left on the disk
# TYPE disk_storage_available gauge
disk_storage_available{disk="/data/Blobstore"} 1.1720781201408e+13
# HELP disk_storage_total Total space on the disk
# TYPE disk_storage_total gauge
disk_storage_total{disk="/data/Blobstore"} 1.6001493008384e+13
# HELP disk_storage_used Total disk storage used on the disk
# TYPE disk_storage_used gauge
disk_storage_used{disk="/data/Blobstore"} 4.280711806976e+12
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
It seems clear to me those are valid prometheus stats. What's happening here?
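To back up the claim that the output is well formed, here is a minimal sketch of a sanity check over the lines shown above. It is a simplified reading of the Prometheus text exposition format (metric name, optional labels, a value that parses as a float, optional timestamp), not the parser telegraf uses. The names `SAMPLE_RE` and `check_exposition` are made up for illustration.

```python
import re

# Simplified pattern for one sample line:
# metric name, optional {labels}, value, optional integer timestamp.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<timestamp>-?\d+))?\s*$'
)


def check_exposition(text):
    """Return (line_number, line) pairs that do not look like valid samples."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Blank lines and '#' lines (HELP, TYPE, comments) carry no sample value.
        if not stripped or stripped.startswith('#'):
            continue
        match = SAMPLE_RE.match(stripped)
        if match is None:
            bad.append((lineno, line))
            continue
        try:
            float(match.group('value'))
        except ValueError:
            bad.append((lineno, line))
    return bad


# The curl output quoted in this report.
SAMPLE = '''# HELP disk_storage_available Total available space left on the disk
# TYPE disk_storage_available gauge
disk_storage_available{disk="/data/Blobstore"} 1.1720781201408e+13
# HELP disk_storage_total Total space on the disk
# TYPE disk_storage_total gauge
disk_storage_total{disk="/data/Blobstore"} 1.6001493008384e+13
# HELP disk_storage_used Total disk storage used on the disk
# TYPE disk_storage_used gauge
disk_storage_used{disk="/data/Blobstore"} 4.280711806976e+12
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
'''

if __name__ == '__main__':
    print(check_exposition(SAMPLE))
```

Under these assumptions the check finds no malformed sample in the quoted output, and line 1 is a comment line with no value at all, so the "line 1: expected float as value" error does not obviously correspond to the text that curl returns.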