I ran into a very confusing issue that I'd like to see documented, and hopefully fixed, so others don't run into the same problem. With Telegraf 1.13.4 running on an AWS EC2 instance and sending to a Wavefront proxy through an AWS Internal ELB, I was seeing missing metrics and connection errors in the Telegraf logs.
Relevant telegraf.conf:
[[outputs.wavefront]]
host = "wfproxy.example.net"
port = 2878
metric_separator = "."
convert_paths = true
System info:
Telegraf 1.13.4
Amazon Linux 2
Connection through AWS Internal ELB with default 60-sec idle connection timeout
Steps to reproduce:
- Create EC2 instance
- Install telegraf 1.13.4
- Configure to use WF Proxy endpoint through AWS Internal ELB
- Configure telegraf interval to 60 sec
- Start telegraf
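For the interval step, a minimal sketch of the agent setting. The original `[agent]` section was not posted, so this assumes everything else in it is left at the defaults:

```toml
# Assumed fragment: only the 60 sec interval comes from the steps above.
[agent]
  interval = "60s"  # collect once every 60 sec, the same length as the ELB idle timeout
```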
Expected behavior:
Metrics are reported normally, once every 60 sec.
Actual behavior:
The first 3-4 minutes of metrics are missed, followed by a single instance of metrics, followed by another 3-4 minutes of missed data.
Additional info:
The telegraf log shows the following connection reset message once every 3-4 minutes:
2020-02-28T21:47:13Z I! resetting wavefront proxy connection
2020-02-28T21:47:13Z I! write tcp 10.234.11.217:36870->10.234.245.107:2878: write: broken pipe
2020-02-28T21:48:10Z I! connected to Wavefront proxy at address: wfproxy.example.net:2878
Workarounds:
- If I raise the AWS Internal ELB idle connection timeout above 60 sec, things seem to work normally, presumably because the connection is no longer idle for the full timeout between 60-sec writes.
- If I switch the Wavefront output plugin to HTTP mode by specifying the 'url' setting instead of 'host' and 'port', it also seems to work normally (perhaps an HTTP keep-alive is sent).
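A sketch of what the second workaround might look like. It reuses the hostname from the config above and assumes the proxy accepts HTTP on the same port (2878), which the report does not confirm:

```toml
[[outputs.wavefront]]
  # Assumption: the Wavefront proxy listens for HTTP on port 2878.
  # host and port are omitted because url replaces them.
  url = "http://wfproxy.example.net:2878"
  metric_separator = "."
  convert_paths = true
```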