Race condition in remove-replicas could make you lose the last replica of a key #5265

@crusaderky

CC @fjetter @mrocklin @jrbourbeau

This issue is downstream of

  1. Worker state machine refactor #5046
  2. Active Memory Manager framework + discard excess replicas #5111
  3. crusaderky@cea3c4a, which merges the two

The sum of all three is currently available at https://github.com/crusaderky/distributed/tree/amm_wsmr


There is a race condition in the new remove-replicas action, highlighted by https://github.com/crusaderky/distributed/blob/5a8f4ffa3a5fcfde14d57b72a30dfeada1245332/distributed/tests/test_active_memory_manager.py#L128-L153:

  1. An AMM drop policy runs once to drop one of the two replicas of a key
  2. The same policy, or another one, then runs again in a later AMM iteration, before the recommendations from the first iteration have been either enacted or rejected. This second run asks to drop the same key again, but for whatever reason selects the other worker to drop its replica.

In this scenario, the AMM accidentally destroys the last replica of the key.

The cause is the way the current WSMR PR implements remove-replicas:

  1. The scheduler asks the worker to drop the key.
  2. If the worker accepts the suggestion, it sends back a message to the scheduler, which removes the worker from who_has on the scheduler side.
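The race can be reproduced with a toy model of this acknowledge-then-update flow. This is only a sketch: `amm_drop_one`, `deliver_all`, `in_flight` and `worker_data` are hypothetical names invented for illustration, not part of the distributed API, and the workers are assumed to always accept the suggestion.

```python
# Toy model of the current flow. All names are hypothetical, not the real API.
who_has = {"x": {"a", "b"}}              # scheduler-side view: key -> workers
worker_data = {"a": {"x"}, "b": {"x"}}   # what each worker actually holds
in_flight = []                           # remove-replicas requests not yet processed


def amm_drop_one(key, pick):
    """One AMM run: ask for a drop only if the scheduler sees more than one replica."""
    holders = who_has[key]
    if len(holders) > 1:
        in_flight.append((pick(holders), key))


def deliver_all():
    """Workers accept every pending request; the scheduler updates who_has on each ack."""
    while in_flight:
        worker, key = in_flight.pop(0)
        worker_data[worker].discard(key)  # worker drops its replica
        who_has[key].discard(worker)      # scheduler learns about it only now


amm_drop_one("x", pick=min)  # first run selects worker "a"
amm_drop_one("x", pick=max)  # second run, before any ack, selects worker "b"
deliver_all()

replicas_left = sum("x" in keys for keys in worker_data.values())
print(replicas_left)  # 0: both replicas are gone
```

Because `who_has` still lists both workers when the second run starts, the "more than one replica" check passes twice.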

The algorithm should be changed as follows:

  1. The scheduler asks the worker to drop the key and immediately removes the worker from its local who_has.
  2. If the worker rejects the suggestion, it sends back a message to the scheduler which re-adds the worker into who_has on the scheduler side.
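A minimal sketch of the proposed flow, using the same kind of toy model. As before, `amm_drop_one`, `deliver_all`, `in_flight`, `worker_data` and the `rejects` parameter are hypothetical names for illustration only; the real scheduler and worker code will differ.

```python
# Toy model of the proposed flow. All names are hypothetical, not the real API.
who_has = {"x": {"a", "b"}}              # scheduler-side view: key -> workers
worker_data = {"a": {"x"}, "b": {"x"}}   # what each worker actually holds
in_flight = []                           # remove-replicas requests not yet processed


def amm_drop_one(key, pick):
    """One AMM run: remove the worker from who_has as soon as the request is sent."""
    holders = who_has[key]
    if len(holders) > 1:
        worker = pick(holders)
        holders.discard(worker)           # optimistic removal on the scheduler
        in_flight.append((worker, key))


def deliver_all(rejects=frozenset()):
    """Workers process pending requests; a rejection re-adds the worker to who_has."""
    while in_flight:
        worker, key = in_flight.pop(0)
        if worker in rejects:
            who_has[key].add(worker)      # worker kept the key: restore the entry
        else:
            worker_data[worker].discard(key)


amm_drop_one("x", pick=min)  # first run selects "a"; who_has["x"] is now {"b"}
amm_drop_one("x", pick=max)  # second run sees a single replica and does nothing
deliver_all()

replicas_left = sum("x" in keys for keys in worker_data.values())
print(replicas_left)  # 1: worker "b" still holds the key
```

The trade-off in this sketch is that the scheduler may temporarily undercount replicas while a request is in flight, which is the safe direction for a drop policy.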

In practice, this flaw only manifests when the workers and/or the network are so slow that the full round trip of the remove-replicas action takes longer than the interval between AMM runs (5s by default).
