This repository was archived by the owner on Sep 21, 2023. It is now read-only.

The Elasticsearch output should not report itself as degraded based only on the time between events #301

@cmacknz

Description


The implementation in #174 was incompletely specified. We currently consider the shipper's Elasticsearch output degraded whenever it has not written events to Elasticsearch within the past 30 seconds: #239

This is an acceptable proxy for an inability to connect to Elasticsearch, but it does not account for low-volume log sources. Users could tune the timeout, but this isn't something they have traditionally had to do, and it may lead to false-positive degraded states.

Instead, we should only mark the shipper as degraded when we have not published events for 30 seconds and we have detected an explicit error attempting to connect to Elasticsearch. For example, this would include connection refused errors, failed DNS lookups, or invalid credentials.

The most common reasons for failing to connect to Elasticsearch would be incorrect proxy configurations, connectivity outages, or invalidated API keys. We should address these cases specifically instead of using a catch-all timeout that makes assumptions about the steady-state event rate.
