Agent won't start if remote HTTP config endpoint is down #7338

@clever-trevor

Feature Request

A useful feature of Telegraf is that it can pull its config from a remote HTTP server, which centralises agent configuration. However, if the HTTP endpoint is unavailable when the agent starts, the agent reports an error and exits.

Proposal:

If the endpoint is unavailable at startup, have the agent retry it periodically. Once the endpoint is back online, the agent pulls the config and starts as normal.

Current behavior:

Agent will not start.

Desired behavior:

Agent retries fetching the configuration, ideally controlled by startup switches for the retry interval and the maximum number of retry attempts.

Use case:

In a large environment, the HTTP endpoint may well go offline (e.g. planned or unplanned outage). Any agents that get restarted during this interval (e.g. reboot, agent installation) would simply die, meaning some recovery process is needed to bring them back online.
Having this retry ability in the agent would make it more reliable and allow it to recover on its own.
