Skip to content

Conversation

@Shillaker
Copy link
Collaborator

@Shillaker Shillaker commented Mar 16, 2022

Previously, when the state server received a request to globally lock a state key, it would block until it acquired that lock. This can result in the following deadlock scenario:

  • We have s server threads serving requests.
  • s + 1 clients request a lock on a state key.
  • One of them gets the lock, the other s cause all the server threads to block.
  • The one with the lock tries to request an unlock, but all the server threads are blocking on the lock.

Rather than fix this, I've decided to remove global state locks for the following reasons:

  • Global locks are implemented without this problem in the distributed synchronisation primitives (as part of the point-to-point group operations).
  • There are ways to avoid needing a global lock entirely, such as using the state append operation to queue updates, or collective communication operations like reduce.

As part of debugging this issue I also tidied up the logging in the state server and client a bit.

@Shillaker Shillaker self-assigned this Mar 16, 2022
@Shillaker Shillaker changed the title Avoid deadlocks when locking global state Remove state global locks Mar 16, 2022
@Shillaker Shillaker marked this pull request as ready for review March 16, 2022 16:48
@Shillaker Shillaker requested a review from csegarragonz March 16, 2022 17:13
@Shillaker Shillaker merged commit 63eefaf into main Mar 17, 2022
@Shillaker Shillaker deleted the state-deadlock branch March 17, 2022 08:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants