health check edge intervals incorrectly delayed by health thresholds #3173

@juchem

Description

The current implementation of healthy_edge_interval and unhealthy_edge_interval waits for the corresponding health threshold to be reached before the edge interval is used.

That leads to situations like this:

  • healthy_threshold is set to 2;
  • host health state is currently unhealthy;
  • host health check fails, next check happens after unhealthy_interval;
  • host health check succeeds, next check happens after interval;
  • host health check succeeds, next check happens after healthy_edge_interval;
  • host health check succeeds, next check happens after interval.

The behavior above defeats the purpose of having an edge interval, since its goal is to detect health state changes faster while reducing the burden of health checks on healthy hosts.

The intended behavior, which achieves the edge interval's purpose in the same scenario, would be:

  • host health check fails, next check happens after unhealthy_interval;
  • host health check succeeds, next check happens after healthy_edge_interval;
  • host health check succeeds, next check happens after interval;
  • host health check succeeds, next check happens after interval.
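
The two schedules above can be compared with a small simulation. This is a minimal sketch in Python, not Envoy's implementation: the `Checker` class, the `schedule` helper, and the `edge_on_threshold` flag are hypothetical names, and the model only maps each check result to the interval names used in the scenarios above. It additionally assumes unhealthy_threshold is 2 and that the check preceding the sequence failed.

```python
from dataclasses import dataclass


@dataclass
class Checker:
    """Hypothetical model of interval selection; not Envoy's code."""
    healthy_threshold: int = 2
    unhealthy_threshold: int = 2   # assumed value, not stated in the scenario
    healthy: bool = False          # host starts out unhealthy
    last_ok: bool = False          # assume the previous check failed
    streak: int = 0                # consecutive identical results

    def observe(self, ok: bool, edge_on_threshold: bool) -> str:
        """Record one check result and return the name of the next interval."""
        result_changed = ok != self.last_ok
        self.streak = 1 if result_changed else self.streak + 1
        self.last_ok = ok

        threshold = self.healthy_threshold if ok else self.unhealthy_threshold
        state_changed = ok != self.healthy and self.streak >= threshold
        if state_changed:
            self.healthy = ok

        # Reported behavior: the edge interval waits for the threshold.
        # Intended behavior: the edge interval follows the first changed result.
        edge = state_changed if edge_on_threshold else result_changed
        if ok:
            return "healthy_edge_interval" if edge else "interval"
        return "unhealthy_edge_interval" if edge else "unhealthy_interval"


def schedule(results, edge_on_threshold):
    """Return the interval chosen after each check result in `results`."""
    checker = Checker()
    return [checker.observe(ok, edge_on_threshold) for ok in results]


if __name__ == "__main__":
    results = [False, True, True, True]  # fail, success, success, success
    print("reported:", schedule(results, edge_on_threshold=True))
    print("intended:", schedule(results, edge_on_threshold=False))
```

For the sequence fail, success, success, success, the `edge_on_threshold=True` run reproduces the reported list and the `edge_on_threshold=False` run reproduces the intended one.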

Repro steps:

  • configure unhealthy_threshold to a value greater than 1;
  • configure an arbitrary value for unhealthy_interval;
  • configure unhealthy_edge_interval to a value different from unhealthy_interval;
  • cause a network timeout on the backend host and verify that the second failed health check happens unhealthy_interval after the first failed one, instead of unhealthy_edge_interval after it.
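
What the repro should show can be worked through with numbers. The following is a minimal sketch with made-up values (unhealthy_threshold of 3, unhealthy_interval of 30 s, unhealthy_edge_interval of 5 s, none of which come from this issue); the function name and the `edge_waits_for_threshold` flag are hypothetical, and the function models only the delay between the first and second failed checks on a previously healthy host.

```python
def delay_after_first_failure(unhealthy_threshold: int,
                              unhealthy_interval_s: float,
                              unhealthy_edge_interval_s: float,
                              edge_waits_for_threshold: bool) -> float:
    """Delay, in seconds, between the first and second failed checks."""
    # The host was healthy, so exactly one consecutive failure has been seen.
    consecutive_failures = 1
    if edge_waits_for_threshold:
        # Reported behavior: edge interval only once the threshold is reached.
        use_edge = consecutive_failures >= unhealthy_threshold
    else:
        # Expected behavior: edge interval right after the first failure.
        use_edge = True
    return unhealthy_edge_interval_s if use_edge else unhealthy_interval_s


if __name__ == "__main__":
    # Illustrative values only; they are assumptions, not taken from the issue.
    reported = delay_after_first_failure(3, 30.0, 5.0, edge_waits_for_threshold=True)
    expected = delay_after_first_failure(3, 30.0, 5.0, edge_waits_for_threshold=False)
    print(f"reported: {reported} s, expected: {expected} s")
```

Under these assumed values the reported behavior waits 30 s before the second check, whereas the expected behavior waits 5 s.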
