Exponential backoff upon output error #9353

@jrc

Feature Request

Opening a feature request kicks off a discussion.

Current behavior:

We use Telegraf to send data from IoT devices to InfluxDB. At some point we hit a rate limit, and InfluxDB started responding with an error status. Telegraf then kept retrying every minute (default flushing interval of 1m).

Example log:

2021-04-29T07:46:52Z W! [outputs.influxdb_v2] Failed to write; will retry in 0s. (429 Too Many Requests)
2021-04-29T07:46:52Z E! [outputs.influxdb_v2] When writing to [https://eu-central-1-1.aws.cloud2.influxdata.com]: waiting 0s for server before sending metric again
2021-04-29T07:46:52Z E! [agent] Error writing to outputs.influxdb_v2: waiting 0s for server before sending metric again
2021-04-29T07:47:01Z W! [outputs.influxdb_v2] Failed to write; will retry in 0s. (429 Too Many Requests)
2021-04-29T07:47:01Z E! [outputs.influxdb_v2] When writing to [https://eu-central-1-1.aws.cloud2.influxdata.com]: waiting 0s for server before sending metric again

Desired behavior:

Implement exponential backoff/cooldown (i.e. dynamically variable flushing interval) upon any output errors.

For example, Google Cloud's documentation explicitly states that you are "strongly encouraged to implement truncated exponential backoff with introduced jitter": https://cloud.google.com/iot/docs/how-tos/exponential-backoff

Use case:

Because our fleet of devices runs on mobile broadband, these constant retries had the very real consequence of eating up gigabytes of data 💸💸💸 (and killing connectivity) within days!

Labels: area/iot, feature request