outputs.prometheus_client (metric_version=2) removes (still existing) histograms every expiration_interval #8170

@jjh74

I'm gathering metrics from Google mtail (https://github.com/google/mtail) with inputs.prometheus. The problem is that histograms are removed from the output every expiration_interval and reappear after the next input interval. Counters from mtail are not removed on expiration_interval.

Setting expiration_interval=0 is not viable because telegraf also gathers other metrics, and there are metrics I want to alert on if they're absent. With metric_version=1 on both input and output, histograms don't disappear every expiration_interval.

Relevant telegraf.conf:

# This is a minimal example for reproducing the behaviour.
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "3s"
  flush_interval = "10s"
  flush_jitter = "4s"
  precision = ""
  debug = false
  hostname = ""
  omit_hostname = true

[[outputs.prometheus_client]]
  ## Address to listen on
  listen = ":9273"
  metric_version = 2
  path = "/metrics"
  # short expiration for debugging
  expiration_interval = "30s"
  collectors_exclude = ["gocollector", "process"]
  # string_as_label = true

  ## Export metric collection time.
  export_timestamp = false

[[inputs.prometheus]]
  urls = ["http://localhost:8000/metrics"]
  response_timeout = "8s"

  metric_version = 2

  namedrop = [ "go_*", "process_*" ]
  tagexclude = [ "host", "url" ]

  interval = "10s"

System info:

Telegraf 1.15.3 (git: HEAD fac8181) (also happens with at least 1.15.1, 1.14.0)
linux(amd64) (fedora32, centos8, centos7)

Steps to reproduce:

  1. Serve this file as /metrics (for example with python3 -m http.server 8000):
# HELP postfix_qmgr_messages_inserted_recipients defined at postfix.mtail:81:13-53
# TYPE postfix_qmgr_messages_inserted_recipients histogram
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="1"} 2
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="2"} 2
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="+Inf"} 2
postfix_qmgr_messages_inserted_recipients_sum{prog="postfix.mtail"} 2
postfix_qmgr_messages_inserted_recipients_count{prog="postfix.mtail"} 2
# HELP postfix_qmgr_messages_removed_total defined at postfix.mtail:89:11-45
# TYPE postfix_qmgr_messages_removed_total counter
postfix_qmgr_messages_removed_total{prog="postfix.mtail"} 2
  2. Start telegraf with the example config: telegraf --config telegraf.conf and, in a loop, scrape telegraf: while true; do curl -s http://localhost:9273/metrics | grep -c ^postfix_ ; sleep 2; done and see that about every expiration_interval the postfix_qmgr_messages_inserted_recipients_* metrics disappear (and reappear after the next input interval). Note: postfix_qmgr_messages_removed_total (counter) doesn't disappear.
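
For anyone who prefers a script over the curl/grep loop, here is a rough Python sketch of the same scrape. It assumes the listen address and path from the example config (":9273", "/metrics"); the names count_series and poll are made up for this example, and the timestamp column is an addition that makes it easier to line drops up with expiration_interval.

```python
import time
import urllib.request

# Assumption: telegraf's prometheus_client output from the example config.
TELEGRAF_URL = "http://localhost:9273/metrics"


def count_series(text, prefix="postfix_"):
    """Count sample lines starting with prefix (like grep -c ^postfix_)."""
    return sum(1 for line in text.splitlines() if line.startswith(prefix))


def poll(url=TELEGRAF_URL, period=2.0, rounds=30):
    """Scrape url every `period` seconds and print a timestamped series count."""
    for _ in range(rounds):
        with urllib.request.urlopen(url, timeout=5) as resp:
            body = resp.read().decode("utf-8")
        print(time.strftime("%H:%M:%S"), count_series(body))
        time.sleep(period)
```

Calling poll() with telegraf running prints one line per scrape. "# HELP" and "# TYPE" lines are not counted because they don't start with the prefix.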

Expected behavior:

curl -s http://localhost:9273/metrics output should have postfix_qmgr_messages_inserted_recipients_* metrics as long as they appear in the input.

Actual behavior:

Histogram metrics are incorrectly removed every expiration_interval.
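
To check that the removals really track expiration_interval, the scrape counts can be reduced to a list of drop periods. The sketch below is only an illustration: find_drops and drop_spacing are hypothetical helper names, the sample timestamps are made up (not measured), and full_count=6 assumes five histogram series plus one counter are present when everything is exported.

```python
def find_drops(samples, full_count):
    """Return (start, end) time pairs during which the count was below full_count.

    samples: list of (timestamp_seconds, series_count) in chronological order.
    A drop still open at the last sample is reported with end=None.
    """
    drops = []
    start = None
    for ts, count in samples:
        if count < full_count and start is None:
            start = ts                      # a drop begins
        elif count >= full_count and start is not None:
            drops.append((start, ts))       # the drop ends at this sample
            start = None
    if start is not None:
        drops.append((start, None))
    return drops


def drop_spacing(drops):
    """Seconds between the starts of consecutive drops."""
    starts = [s for s, _ in drops]
    return [b - a for a, b in zip(starts, starts[1:])]


# Made-up observations for illustration only: (seconds, series count).
example = [(0, 6), (2, 6), (4, 1), (6, 1), (8, 6), (10, 6), (34, 1), (36, 6)]
print(find_drops(example, full_count=6))                 # [(4, 8), (34, 36)]
print(drop_spacing(find_drops(example, full_count=6)))   # [30]
```

If the spacing between drop starts stays close to the configured expiration_interval, that supports the description above. It does not say anything about the cause.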

Additional info:

While this is a fairly minimal test config, the problem also appears with actual data + mtail (where the histogram data is updated more frequently than expiration_interval).

Also, with this test setup, a file output:

[[outputs.file]]
  files = ["stdout"]
  data_format = "prometheus"

produces weird duplicate histogram values:

# HELP postfix_qmgr_messages_inserted_recipients Telegraf collected metric
# TYPE postfix_qmgr_messages_inserted_recipients histogram
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="+Inf"} 2
postfix_qmgr_messages_inserted_recipients_sum{prog="postfix.mtail"} 2
postfix_qmgr_messages_inserted_recipients_count{prog="postfix.mtail"} 2
# HELP postfix_qmgr_messages_inserted_recipients Telegraf collected metric
# TYPE postfix_qmgr_messages_inserted_recipients histogram
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="1"} 2
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="+Inf"} 0
postfix_qmgr_messages_inserted_recipients_sum{prog="postfix.mtail"} 0
postfix_qmgr_messages_inserted_recipients_count{prog="postfix.mtail"} 0
# HELP postfix_qmgr_messages_inserted_recipients Telegraf collected metric
# TYPE postfix_qmgr_messages_inserted_recipients histogram
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="2"} 2
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="+Inf"} 0
postfix_qmgr_messages_inserted_recipients_sum{prog="postfix.mtail"} 0
postfix_qmgr_messages_inserted_recipients_count{prog="postfix.mtail"} 0
# HELP postfix_qmgr_messages_inserted_recipients Telegraf collected metric
# TYPE postfix_qmgr_messages_inserted_recipients histogram
postfix_qmgr_messages_inserted_recipients_bucket{prog="postfix.mtail",le="+Inf"} 2
postfix_qmgr_messages_inserted_recipients_sum{prog="postfix.mtail"} 0
postfix_qmgr_messages_inserted_recipients_count{prog="postfix.mtail"} 0
# HELP postfix_qmgr_messages_removed_total Telegraf collected metric
# TYPE postfix_qmgr_messages_removed_total counter
postfix_qmgr_messages_removed_total{prog="postfix.mtail"} 2
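
A quick way to spot this kind of duplication in a captured output is to count how often each metric family is declared with "# TYPE". This is a minimal sketch using only the standard library; type_declarations and repeated_families are hypothetical names, and it only looks at "# TYPE" lines rather than parsing the full exposition format.

```python
from collections import Counter


def type_declarations(text):
    """Count '# TYPE <name> <type>' lines per metric family name."""
    counts = Counter()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            counts[parts[2]] += 1
    return counts


def repeated_families(text):
    """Return the families declared more than once, with their declaration count."""
    return {name: n for name, n in type_declarations(text).items() if n > 1}
```

Fed the stdout output above, repeated_families should report postfix_qmgr_messages_inserted_recipients with a count of 4, while the counter is declared once and is not reported.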

Labels: area/prometheus, bug (unexpected problem or unintended behavior)