Add fixed failure percentage-based outlier detection mode #8105

@csssuf

Title: Add fixed failure percentage-based outlier detection mode.

Description:
Add a new outlier detection mode that compares each host's failure rate against a fixed, configured threshold, instead of the standard deviation-based threshold used by the existing success rate mode. This lets outlier detection catch failure modes where a node fails intermittently, but:
a) the node does not trigger the consecutive 5XX/gateway failure detection modes, and
b) the standard deviation-based success rate mode has been tuned to avoid false-positive ejections.
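
To make the difference concrete, here is a minimal Python sketch of the two checks. It is illustrative only and is not the proposed implementation: `HostStats`, `fixed_threshold_outliers`, `stdev_outliers`, the `min_requests` guard, and all numbers are assumptions made up for this example.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class HostStats:
    """Hypothetical request counters for one host over one detection interval."""
    name: str
    total: int
    failures: int

    @property
    def failure_pct(self) -> float:
        # Hosts with no traffic are treated as having a 0% failure rate.
        return 100.0 * self.failures / self.total if self.total else 0.0


def fixed_threshold_outliers(hosts, threshold_pct, min_requests=1):
    """Return hosts whose failure percentage is at or above a fixed threshold.

    min_requests is an assumed guard so that low-traffic hosts are not judged
    on a handful of requests.
    """
    return [
        h.name
        for h in hosts
        if h.total >= min_requests and h.failure_pct >= threshold_pct
    ]


def stdev_outliers(hosts, stdev_factor):
    """Return hosts whose success rate falls below mean - stdev_factor * stdev."""
    rates = [100.0 - h.failure_pct for h in hosts]
    cutoff = mean(rates) - stdev_factor * pstdev(rates)
    return [h.name for h, rate in zip(hosts, rates) if rate < cutoff]


# Host "c" fails 15% of its requests; the others fail 1-2%.
hosts = [
    HostStats("a", 1000, 10),
    HostStats("b", 1000, 20),
    HostStats("c", 1000, 150),
    HostStats("d", 1000, 15),
    HostStats("e", 1000, 10),
]

# A conservatively tuned standard deviation factor (2.5 here) does not flag "c".
print(stdev_outliers(hosts, stdev_factor=2.5))

# A fixed 10% failure threshold flags "c" regardless of how the other hosts behave.
print(fixed_threshold_outliers(hosts, threshold_pct=10.0))
```

In this made-up data set, raising the standard deviation factor to avoid false positives also hides the intermittently failing host, while the fixed threshold still catches it. The trade-off is that the fixed threshold has to be chosen per deployment.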

This new mode would be disabled by default: I believe it fills a more niche role in outlier detection, and it is harder to find a safe default value that works for a wide variety of deployments.

I already have an initial implementation, which I can submit as a PR if this seems like a reasonable feature.

Labels: design proposal (Needs design doc/proposal before implementation)
