Title: Add fixed failure percentage-based outlier detection mode.
Description:
Add a new outlier detection mode which compares a host's failure rate to a configured fixed threshold, rather than the current success rate method's standard deviation-based threshold. This enables outlier detection to catch failure modes where a node is intermittently failing, but:
a) the node does not trigger the consecutive 5XX/gateway failure detection modes, and
b) the standard deviation-based success rate mode has been tuned to avoid false-positive ejections.
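To illustrate the gap, here is a minimal Python sketch of the two checks side by side. It is not the proposed implementation: the function names, the stdev factor of 1.9, the 10% threshold, and the use of `>=` for the comparison are all assumptions chosen for the example.

```python
from statistics import mean, pstdev


def success_rate_outliers(success_rates, stdev_factor=1.9):
    """Standard deviation-based check (illustrative).

    Flags hosts whose success rate (in percent) falls below
    mean - stdev_factor * stdev, computed across all hosts.
    """
    rates = list(success_rates.values())
    threshold = mean(rates) - stdev_factor * pstdev(rates)
    return sorted(h for h, r in success_rates.items() if r < threshold)


def failure_percentage_outliers(success_rates, failure_threshold_pct=10.0):
    """Fixed-threshold check (illustrative).

    Flags hosts whose failure percentage (100 - success rate) meets or
    exceeds a configured fixed threshold, independent of the other hosts.
    """
    return sorted(
        h for h, r in success_rates.items()
        if (100.0 - r) >= failure_threshold_pct
    )


# Hypothetical per-host success rates in percent; host "d" fails 15% of requests.
rates = {"a": 100.0, "b": 100.0, "c": 100.0, "d": 85.0}

stdev_flagged = success_rate_outliers(rates)
fixed_flagged = failure_percentage_outliers(rates)
```

With these hypothetical numbers, the standard deviation-based check flags no host, because in a small cluster a single bad host pulls the mean down and inflates the deviation enough to stay inside the threshold. The fixed-threshold check flags host `d`, since its 15% failure rate is above the configured 10%.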
This new mode would be disabled by default: I believe it fills a more niche role in outlier detection, and it is hard to find a safe default threshold that works across a wide variety of deployments.
I already have an initial implementation, which I can submit as a PR if this seems like a reasonable feature.