With telegraf 1.19.1, internal_write reports buffer_size values for outputs.influxdb that stay below metric_batch_size even when the influxdb instance is down and more points have been generated. The configuration below also includes an http output for comparison, which buffers points correctly (as it did in previous versions).
Relevant telegraf.conf:
[global_tags]
region = "eu-west-1"
[agent]
metric_buffer_limit = 100000
flush_interval = "2s"
[[inputs.internal]]
interval="4ms"
[[outputs.influxdb]]
urls = [ "http://localhost:8086" ]
skip_database_creation = true
[[outputs.http]]
url = "http://127.0.0.1:8080/telegraf"
[[outputs.file]]
files = [ "telegraf.out" ]
System info:
Linux on AMD64 with telegraf 1.19.1 (https://dl.influxdata.com/telegraf/releases/telegraf-1.19.1_linux_amd64.tar.gz) and 1.18.3 (https://dl.influxdata.com/telegraf/releases/telegraf-1.18.3_linux_amd64.tar.gz)
No process is listening on localhost:8086 or 127.0.0.1:8080.
Steps to reproduce:
- Run the 1.19.1 binary with the above configuration for 20s, then rename telegraf.out to telegraf-1.19.1.out
- Run the 1.18.3 binary with the same configuration for 20s, then rename telegraf.out to telegraf-1.18.3.out
- Compare the last lines of telegraf-1.19.1.out and telegraf-1.18.3.out
$ tail telegraf-1.19.1.out | grep write | tail -3
internal_write,host=xxxxx,output=influxdb,region=eu-west-1,version=1.19.1 metrics_written=29000i,metrics_dropped=0i,buffer_size=682i,buffer_limit=100000i,metrics_filtered=0i,write_time_ns=1823419i,errors=29i,metrics_added=29682i 1626627449853000000
internal_write,host=xxxxx,output=http,region=eu-west-1,version=1.19.1 metrics_dropped=0i,buffer_size=29682i,buffer_limit=100000i,metrics_filtered=0i,write_time_ns=11507427i,errors=0i,metrics_added=29682i,metrics_written=0i 1626627449853000000
internal_write,host=xxxxx,output=file,region=eu-west-1,version=1.19.1 metrics_dropped=0i,buffer_size=682i,buffer_limit=100000i,metrics_filtered=0i,write_time_ns=17995062i,errors=0i,metrics_added=29682i,metrics_written=29000i 1626627449853000000
$ tail telegraf-1.18.3.out | grep write | tail -3
internal_write,host=xxxxx,output=influxdb,region=eu-west-1,version=1.18.3 metrics_dropped=0i,buffer_size=29718i,buffer_limit=100000i,metrics_filtered=0i,write_time_ns=3341971i,errors=29i,metrics_added=29718i,metrics_written=0i 1626627746465000000
internal_write,host=xxxxx,output=http,region=eu-west-1,version=1.18.3 buffer_limit=100000i,metrics_filtered=0i,write_time_ns=13574467i,errors=0i,metrics_added=29718i,metrics_written=0i,metrics_dropped=0i,buffer_size=29718i 1626627746465000000
internal_write,host=xxxxx,output=file,region=eu-west-1,version=1.18.3 metrics_added=29718i,metrics_written=29000i,metrics_dropped=0i,buffer_size=718i,buffer_limit=100000i,metrics_filtered=0i,write_time_ns=20608019i,errors=0i 1626627746465000000
Expected behavior:
internal_write should report similar numbers for output=influxdb and output=http, in particular for metrics_written and buffer_size, just as it does with telegraf 1.18.3. Gathered metric points should be buffered for both unavailable outputs, but with 1.19.1 only the http output buffers them.
Actual behavior:
With the 1.19.1 binary, internal_write reports a high metrics_written value even though influxdb is not up, and a low buffer_size value, as if it were not buffering the gathered points.
internal_write,host=xxx,output=influxdb,...,version=1.19.1 metrics_written=29000i,...,buffer_size=682i
By contrast, the 1.18.3 binary works as expected.
internal_write,host=xxxxx,output=influxdb,...,version=1.18.3 ...buffer_size=29718i,...,metrics_written=0i
The http output also works as expected with both binary versions:
internal_write,host=xxxxx,output=http,...,version=1.19.1 ...buffer_size=29682i,...,metrics_written=0i
internal_write,host=xxxxx,output=http,...,version=1.18.3 ...,metrics_written=0i,...,buffer_size=29718i
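To make the comparison above easier to repeat on other telegraf.out files, here is a rough Python sketch that parses internal_write lines and flags the suspicious pattern. The helper names parse_internal_write and looks_unbuffered are made up for this example and are not part of telegraf. The check itself (errors > 0 together with metrics_written > 0) is only a heuristic derived from the numbers in this report, and the parser assumes tag values contain no spaces or escaped commas, which holds for the lines shown here.

```python
# Sample lines copied from the output above (influxdb output, both versions).
LINE_1_19_1 = (
    "internal_write,host=xxxxx,output=influxdb,region=eu-west-1,version=1.19.1 "
    "metrics_written=29000i,metrics_dropped=0i,buffer_size=682i,buffer_limit=100000i,"
    "metrics_filtered=0i,write_time_ns=1823419i,errors=29i,metrics_added=29682i "
    "1626627449853000000"
)
LINE_1_18_3 = (
    "internal_write,host=xxxxx,output=influxdb,region=eu-west-1,version=1.18.3 "
    "metrics_dropped=0i,buffer_size=29718i,buffer_limit=100000i,metrics_filtered=0i,"
    "write_time_ns=3341971i,errors=29i,metrics_added=29718i,metrics_written=0i "
    "1626627746465000000"
)


def parse_internal_write(line):
    """Split one internal_write line into (tags, fields) dictionaries.

    Assumption: no spaces or escaped commas inside tag values, so a plain
    split on " " and "," is enough for the lines in this report.
    """
    head, field_part, _timestamp = line.strip().split(" ")
    measurement, *tag_pairs = head.split(",")
    if measurement != "internal_write":
        raise ValueError("not an internal_write line: " + measurement)
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in field_part.split(","):
        key, value = pair.split("=", 1)
        fields[key] = int(value.rstrip("i"))  # all fields here are integers ("29000i")
    return tags, fields


def looks_unbuffered(fields):
    """Heuristic from this report: writes failed, yet metrics count as written."""
    return fields["errors"] > 0 and fields["metrics_written"] > 0


if __name__ == "__main__":
    for line in (LINE_1_19_1, LINE_1_18_3):
        tags, fields = parse_internal_write(line)
        print(tags["version"], tags["output"],
              "buffer_size:", fields["buffer_size"],
              "metrics_written:", fields["metrics_written"],
              "suspicious:", looks_unbuffered(fields))
```

Feeding it every line from `grep write telegraf.out` instead of the two constants should flag only the output=influxdb line from 1.19.1 in the data shown here.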