Race in subset LB host updates #6301

@snowp

During host updates on the worker threads, the subset LB calls health() on the list of all hosts to partition the hosts into healthy/degraded/unhealthy vectors. health() reads from an atomic
bit mask that may be modified by another thread (i.e. by the health checker), and it is called multiple times for each host, so the calls can disagree and produce an inconsistent view of a given HostSet.
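
To make the failure mode concrete, here is a minimal sketch of the pattern described above. It is not Envoy's code: `FakeHost`, `Partition`, and `partitionWithRepeatedReads` are hypothetical names, and the `between_passes` callback is a deterministic stand-in for the health checker thread writing the flag between two reads.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

enum class Health { Unhealthy, Degraded, Healthy };

// Hypothetical stand-in for a host whose health is stored atomically.
class FakeHost {
public:
  explicit FakeHost(Health initial) : state_(static_cast<uint32_t>(initial)) {}
  Health health() const { return static_cast<Health>(state_.load()); }
  void setHealth(Health h) { state_.store(static_cast<uint32_t>(h)); }

private:
  std::atomic<uint32_t> state_;
};

struct Partition {
  std::vector<const FakeHost*> healthy;
  std::vector<const FakeHost*> degraded;
};

// One pass per bucket, each pass re-reading health(). Every individual read
// is atomic, but the two passes are not atomic as a unit.
// `between_passes` simulates another thread updating health mid-update.
template <class Fn>
Partition partitionWithRepeatedReads(const std::vector<FakeHost*>& hosts, Fn between_passes) {
  Partition out;
  for (const FakeHost* host : hosts) {
    if (host->health() == Health::Healthy) {
      out.healthy.push_back(host);
    }
  }
  between_passes();
  for (const FakeHost* host : hosts) {
    if (host->health() == Health::Degraded) {
      out.degraded.push_back(host);
    }
  }
  return out;
}
```

If a host flips from healthy to degraded between the passes, it lands in both vectors. A flip in the other direction leaves it in neither.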

I can think of a few ways of solving this:

  1. Ensure we only call health() exactly once per host per subset update. This will ensure that the health values are consistent within a given HostSet. It would not guarantee that all subsets are updated with the same health values.
  2. Infer the health values of hosts by checking which of the healthy/degraded/all host lists on the original
    priority set they appear in, instead of calling health(). This would ensure consistency between all subsets.

Leaning towards 2), so I'll see if I can get a PR ready.
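
A rough sketch of the two options, for comparison. All names here (`FakeHost`, `HostLists`, `partitionWithSingleRead`, `partitionFromParent`, `in_subset`) are made up for illustration and are not Envoy's interfaces. The sketch assumes the parent priority set already holds a consistent healthy/degraded classification.

```cpp
#include <atomic>
#include <functional>
#include <unordered_set>
#include <vector>

enum class Health { Unhealthy, Degraded, Healthy };

// Hypothetical stand-in for a host; not Envoy's Host interface.
struct FakeHost {
  std::atomic<Health> state{Health::Healthy};
  Health health() const { return state.load(); }
};

using HostVector = std::vector<const FakeHost*>;

struct HostLists {
  HostVector hosts;          // all hosts
  HostVector healthy_hosts;  // subset of hosts
  HostVector degraded_hosts; // subset of hosts
};

// Option 1: read health() exactly once per host, so a host can only land in
// one bucket. Different subsets may still observe different values.
HostLists partitionWithSingleRead(const HostVector& subset_hosts) {
  HostLists out;
  for (const FakeHost* host : subset_hosts) {
    out.hosts.push_back(host);
    switch (host->health()) { // the only read for this host
    case Health::Healthy:
      out.healthy_hosts.push_back(host);
      break;
    case Health::Degraded:
      out.degraded_hosts.push_back(host);
      break;
    case Health::Unhealthy:
      break;
    }
  }
  return out;
}

// Option 2: never call health(). Copy each host's classification from the
// parent's lists, so every subset agrees with the parent.
HostLists partitionFromParent(const HostLists& parent,
                              const std::function<bool(const FakeHost&)>& in_subset) {
  const std::unordered_set<const FakeHost*> healthy(parent.healthy_hosts.begin(),
                                                    parent.healthy_hosts.end());
  const std::unordered_set<const FakeHost*> degraded(parent.degraded_hosts.begin(),
                                                     parent.degraded_hosts.end());
  HostLists out;
  for (const FakeHost* host : parent.hosts) {
    if (!in_subset(*host)) {
      continue;
    }
    out.hosts.push_back(host);
    if (healthy.count(host) > 0) {
      out.healthy_hosts.push_back(host);
    } else if (degraded.count(host) > 0) {
      out.degraded_hosts.push_back(host);
    }
  }
  return out;
}
```

With option 2, a concurrent flag change after the parent lists were built has no effect on the subsets. The cost is the extra set lookups per update.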
