Relevant telegraf.conf:
Note: The MQTT Broker is AWS IoT Core.
[[inputs.mqtt_consumer]]
servers = ["ssl://<snip>-ats.iot.us-east-1.amazonaws.com:8883"]
## Topics that will be subscribed to.
topics = [
"<snip>",
"<snip>",
]
qos = 0
connection_timeout = "30s"
client_id = "<snip>"
tls_ca = "/etc/telegraf/ca.pem"
tls_cert = "/etc/telegraf/cert.pem"
tls_key = "/etc/telegraf/key.pem"
name_override = "Relative Humidity"
data_format = "json"
fieldpass = ["value"]
json_time_key = "timestamp"
json_time_format = "unix_ms"
[inputs.mqtt_consumer.tags]
Units = "%"
System info:
Telegraf 1.17
Ubuntu 14.04.5
Docker
docker-compose.yaml:
telegraf:
image: telegraf:1.17-alpine
restart: always
environment:
- INFLUXDB_V2_URL=https://us-west-2-1.aws.cloud2.influxdata.com
- INFLUX_TOKEN=<snip>
- OUR_ORGANIZATION=<snip>
- INFLUXDB_BUCKET=<snip>
volumes:
- …/aws-mqtt-influxcloud-v2.conf:/etc/telegraf/telegraf.conf:ro
- …/AmazonRootCA1.pem:/etc/telegraf/ca.pem
- …/certificate.pem.crt:/etc/telegraf/cert.pem
- …/private.pem.key:/etc/telegraf/key.pem
Steps to reproduce:
- Start Telegraf with MQTT consumer configuration for AWS IoT Core and allow to run successfully for a day.
- LAN device automatically reboots early in the morning, and Telegraf disconnects.
- Telegraf reconnects to MQTT broker after LAN device recovers.
- Telegraf MQTT collection timeouts take the place of metrics gathering thereafter.
Expected behavior:
Recovery of MQTT subscriptions after broker reconnection following network interruption.
Actual behavior:
After a connection interruption Telegraf successfully reconnects to the MQTT broker but collection results in timeouts. Only restarting Telegraf fixes the problem.
This problem was reliably reproducible each morning at 6am when a network device automatically rebooted and interrupted the connection to the AWS IoT Core MQTT Broker. Removing the automatic device reboot solved the problem.
Additional info:
| 2021-01-13T12:23:16Z I! Starting Telegraf 1.17.0
| 2021-01-13T12:23:16Z I! Using config file: /etc/telegraf/telegraf.conf
| 2021-01-13T12:23:16Z I! Loaded inputs: mqtt_consumer (2x)
| 2021-01-13T12:23:16Z I! Loaded aggregators:
| 2021-01-13T12:23:16Z I! Loaded processors: rename
| 2021-01-13T12:23:16Z I! Loaded outputs: influxdb_v2
| 2021-01-13T12:23:16Z I! Tags enabled:
| 2021-01-13T12:23:16Z I! [agent] Config: Interval:20s, Quiet:false, Hostname:"738565c8ccdc", Flush Interval:10s
| 2021-01-13T12:23:16Z I! [inputs.mqtt_consumer] Connected [ssl://<snip>-ats.iot.us-east-1.amazonaws.com:8883]
| 2021-01-13T12:23:16Z I! [inputs.mqtt_consumer] Connected [ssl://<snip>-ats.iot.us-east-1.amazonaws.com:8883]
…
| 2021-01-14T11:00:12Z E! [outputs.influxdb_v2] when writing to [https://us-west-2-1.aws.cloud2.influxdata.com]: Post "https://us-west-2-1.aws.cloud2.influxdata.com/api/v2/write?…": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
| 2021-01-14T11:00:12Z E! [agent] Error writing to outputs.influxdb_v2: Post "https://us-west-2-1.aws.cloud2.influxdata.com/api/v2/write?…": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
| 2021-01-14T11:00:22Z E! [outputs.influxdb_v2] when writing to [https://us-west-2-1.aws.cloud2.influxdata.com]: Post "https://us-west-2-1.aws.cloud2.influxdata.com/api/v2/write?…": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
| 2021-01-14T11:00:22Z E! [agent] Error writing to outputs.influxdb_v2: Post "https://us-west-2-1.aws.cloud2.influxdata.com/api/v2/write?…": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
| 2021-01-14T11:01:21Z E! [inputs.mqtt_consumer] Error in plugin: connection lost: pingresp not received, disconnecting
…
| 2021-01-14T11:02:40Z I! [inputs.mqtt_consumer] Connected [ssl://<snip>-ats.iot.us-east-1.amazonaws.com:8883]
| 2021-01-14T11:02:41Z I! [inputs.mqtt_consumer] Connected [ssl://<snip>-ats.iot.us-east-1.amazonaws.com:8883]
| 2021-01-14T11:04:00Z E! [inputs.mqtt_consumer] Error in plugin: connection lost: pingresp not received, disconnecting
| 2021-01-14T11:04:01Z E! [inputs.mqtt_consumer] Error in plugin: connection lost: pingresp not received, disconnecting
| 2021-01-14T11:04:20Z I! [inputs.mqtt_consumer] Connected [ssl://<snip>-ats.iot.us-east-1.amazonaws.com:8883]
| 2021-01-14T11:04:20Z I! [inputs.mqtt_consumer] Connected [ssl://<snip>-ats.iot.us-east-1.amazonaws.com:8883]
| 2021-01-14T11:04:40Z W! [inputs.mqtt_consumer] Collection took longer than expected; not complete after interval of 20s
| 2021-01-14T11:04:40Z W! [inputs.mqtt_consumer] Collection took longer than expected; not complete after interval of 20s
| 2021-01-14T11:05:00Z W! [inputs.mqtt_consumer] Collection took longer than expected; not complete after interval of 20s
| 2021-01-14T11:05:00Z W! [inputs.mqtt_consumer] Collection took longer than expected; not complete after interval of 20s
| 2021-01-14T11:05:20Z W! [inputs.mqtt_consumer] Collection took longer than expected; not complete after interval of 20s
| 2021-01-14T11:05:20Z W! [inputs.mqtt_consumer] Collection took longer than expected; not complete after interval of 20s
| 2021-01-14T11:05:40Z W! [inputs.mqtt_consumer] Collection took longer than expected; not complete after interval of 20s
Relevant telegraf.conf:
Note: The MQTT Broker is AWS IoT Core.
System info:
Telegraf 1.17
Ubuntu 14.04.5
Docker
docker-compose.yaml:
Steps to reproduce:
Expected behavior:
Recovery of MQTT subscriptions after broker reconnection following network interruption.
Actual behavior:
After a connection interruption Telegraf successfully reconnects to the MQTT broker but collection results in timeouts. Only restarting Telegraf fixes the problem.
This problem was reliably reproducible each morning at 6am when a network device automatically rebooted and interrupted the connection to the AWS IoT Core MQTT Broker. Removing the automatic device reboot solved the problem.
Additional info: