I'm gathering metrics from Google mtail (https://github.com/google/mtail) with inputs.prometheus. The problem is that histograms are removed every expiration_interval and reappear on the next interval. Counters from mtail are not removed on expiration_interval.
Setting expiration_interval=0 is not viable because telegraf also gathers other metrics, and there are metrics I want to alert on if they're absent. With metric_version=1 on both input and output, histograms don't disappear every expiration_interval.
Relevant telegraf.conf:
# This is a minimal example for reproducing the behaviour.
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "3s"
flush_interval = "10s"
flush_jitter = "4s"
precision = ""
debug = false
hostname = ""
omit_hostname = true
[[outputs.prometheus_client]]
## Address to listen on
listen = ":9273"
metric_version = 2
path = "/metrics"
# short expiration for debugging
expiration_interval = "30s"
collectors_exclude = ["gocollector", "process"]
# string_as_label = true
## Export metric collection time.
export_timestamp = false
[[inputs.prometheus]]
urls = ["http://localhost:8000/metrics"]
response_timeout = "8s"
metric_version = 2
namedrop = [ "go_*", "process_*" ]
tagexclude = [ "host", "url" ]
interval = "10s"
System info:
Telegraf 1.15.3 (git: HEAD fac8181) (also happens with at least 1.15.1, 1.14.0)
linux(amd64) (fedora32, centos8, centos7)
Steps to reproduce:
- Serve the following file as /metrics, for example by saving it as a file named metrics and running this in the same directory:
python3 -m http.server 8000
# HELP postfix_qmgr_messages_inserted_recipients defined at postfix.mtail:81:13-53
# TYPE postfix_qmgr_messages_inserted_recipients histogram
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="1"} 2
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="2"} 2
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="+Inf"} 2
postfix_qmgr_messages_inserted_recipients_sum{prog="postfix.mtail"} 2
postfix_qmgr_messages_inserted_recipients_count{prog="postfix.mtail"} 2
# HELP postfix_qmgr_messages_removed_total defined at postfix.mtail:89:11-45
# TYPE postfix_qmgr_messages_removed_total counter
postfix_qmgr_messages_removed_total{prog="postfix.mtail"} 2
- Start telegraf with the example config:
telegraf --config telegraf.conf
- Scrape telegraf in a loop:
while true; do curl -s http://localhost:9273/metrics | grep -c ^postfix_ ; sleep 2; done
- Observe that about every expiration_interval the postfix_qmgr_messages_inserted_recipients_* metrics disappear (and reappear after the next input interval). Note: postfix_qmgr_messages_removed_total (counter) doesn't disappear.
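The scrape loop above can also be sketched in Python, which makes it easier to timestamp the moments when the histogram series vanish. This is only an illustration: the function names count_series and watch are made up for this example, and the URL and metric name are the ones from the config and test file in this report.

```python
import time
import urllib.request

# Endpoint of outputs.prometheus_client from the example config (assumed unchanged).
URL = "http://localhost:9273/metrics"
# Histogram family name from the test metrics file.
HISTOGRAM = "postfix_qmgr_messages_inserted_recipients"


def count_series(text, prefix):
    """Count sample lines whose metric name starts with prefix.

    Comment lines (# HELP / # TYPE) start with '#', so they are not counted.
    """
    return sum(1 for line in text.splitlines() if line.startswith(prefix))


def watch(url=URL, period=2.0, rounds=30):
    """Poll the endpoint and print a timestamp whenever the histogram is missing."""
    for _ in range(rounds):
        with urllib.request.urlopen(url, timeout=5) as resp:
            body = resp.read().decode("utf-8")
        if count_series(body, HISTOGRAM) == 0:
            print(time.strftime("%H:%M:%S"), "histogram series missing")
        time.sleep(period)
```

Calling watch() while telegraf is running should print a line roughly once per expiration_interval if the problem reproduces; count_series can also be run on a saved scrape.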
Expected behavior:
The output of curl -s http://localhost:9273/metrics should contain the postfix_qmgr_messages_inserted_recipients_* metrics for as long as they appear in the input.
Actual behavior:
Histogram metrics are incorrectly removed every expiration_interval.
Additional info:
While this is a fairly minimal test config, the problem also appears with actual data from mtail, where the histogram data is updated more frequently than expiration_interval.
Also, with this test setup, the following file output:
[[outputs.file]]
files = ["stdout"]
data_format = "prometheus"
produces weird duplicate histogram values:
# HELP postfix_qmgr_messages_inserted_recipients Telegraf collected metric
# TYPE postfix_qmgr_messages_inserted_recipients histogram
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="+Inf"} 2
postfix_qmgr_messages_inserted_recipients_sum{prog="postfix.mtail"} 2
postfix_qmgr_messages_inserted_recipients_count{prog="postfix.mtail"} 2
# HELP postfix_qmgr_messages_inserted_recipients Telegraf collected metric
# TYPE postfix_qmgr_messages_inserted_recipients histogram
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="1"} 2
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="+Inf"} 0
postfix_qmgr_messages_inserted_recipients_sum{prog="postfix.mtail"} 0
postfix_qmgr_messages_inserted_recipients_count{prog="postfix.mtail"} 0
# HELP postfix_qmgr_messages_inserted_recipients Telegraf collected metric
# TYPE postfix_qmgr_messages_inserted_recipients histogram
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="2"} 2
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="+Inf"} 0
postfix_qmgr_messages_inserted_recipients_sum{prog="postfix.mtail"} 0
postfix_qmgr_messages_inserted_recipients_count{prog="postfix.mtail"} 0
# HELP postfix_qmgr_messages_inserted_recipients Telegraf collected metric
# TYPE postfix_qmgr_messages_inserted_recipients histogram
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="+Inf"} 2
postfix_qmgr_messages_inserted_recipients_sum{prog="postfix.mtail"} 0
postfix_qmgr_messages_inserted_recipients_count{prog="postfix.mtail"} 0
# HELP postfix_qmgr_messages_removed_total Telegraf collected metric
# TYPE postfix_qmgr_messages_removed_total counter
postfix_qmgr_messages_removed_total{prog="postfix.mtail"} 2
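In the Prometheus text format a metric family is expected to have a single TYPE line, so one way to spot the duplication shown above is to count TYPE declarations per family. The following is a small sketch for checking saved output; the helper names type_declarations and split_families are hypothetical and not part of telegraf.

```python
from collections import Counter


def type_declarations(text):
    """Return how many '# TYPE <name> <type>' lines each metric family has."""
    counts = Counter()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            counts[parts[2]] += 1
    return counts


def split_families(text):
    """Names of families declared more than once, i.e. emitted in several pieces."""
    return sorted(name for name, n in type_declarations(text).items() if n > 1)
```

Run against the output above, split_families would list postfix_qmgr_messages_inserted_recipients (declared four times) but not the counter, which is declared once.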