SCAN, SSCAN, HSCAN, and ZSCAN are not stable across failovers #12440

@madolson

Scan is supposed to provide the following guarantees (as per https://redis.io/docs/manual/keyspace/):

  1. A full iteration always retrieves all the elements that were present in the collection from the start to the end of a full iteration. This means that if a given element is inside the collection when an iteration is started, and is still there when an iteration terminates, then at some point SCAN returned it to the user.

  2. A full iteration never returns any element that was NOT present in the collection from the start to the end of a full iteration. So if an element was removed before the start of an iteration, and is never added back to the collection for all the time an iteration lasts, SCAN ensures that this element will never be returned.

While playing around with a PoC for cluster-wide scan, I realized that the first guarantee only holds as long as you stay connected to the same node for the entire duration of the scan. If a failover occurs, the replica may have a different seed value for the siphash function, so a cursor obtained from the primary would not point to the same place on the replica. This is likely less of an issue for SCAN, since you normally specify the node you're talking to, but many cluster-mode-enabled (CME) clients transparently handle redirects for SSCAN/HSCAN/ZSCAN during failovers.
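To make the failure mode concrete, here is a minimal simulation, not Redis code. It assumes a fixed-size table with no rehashing, a cursor that simply walks bucket indices in increasing order (Redis actually uses a reverse-binary cursor), and keyed BLAKE2b as a stand-in for seeded siphash. All names (`bucket_of`, `build_table`, `scan`, `full_scan_with_failover`) are hypothetical.

```python
import hashlib

NUM_BUCKETS = 64  # power of two; this sketch never resizes the table


def bucket_of(key: str, seed: bytes) -> int:
    # Keyed BLAKE2b stands in for seeded siphash (an assumption of this sketch).
    digest = hashlib.blake2b(key.encode(), key=seed, digest_size=8).digest()
    return int.from_bytes(digest, "little") & (NUM_BUCKETS - 1)


def build_table(keys, seed):
    # Same data, but bucket placement depends on the node's seed.
    table = [[] for _ in range(NUM_BUCKETS)]
    for k in keys:
        table[bucket_of(k, seed)].append(k)
    return table


def scan(table, cursor, count):
    # Visit `count` buckets starting at `cursor`; a returned cursor of 0 means done.
    end = min(cursor + count, NUM_BUCKETS)
    found = [k for bucket in table[cursor:end] for k in bucket]
    return (0 if end == NUM_BUCKETS else end), found


def full_scan_with_failover(keys, primary_seed, replica_seed, switch_at):
    # Scan the first `switch_at` buckets on the primary, then resume the
    # same cursor on the replica, as a redirect-following client would.
    primary = build_table(keys, primary_seed)
    replica = build_table(keys, replica_seed)
    seen = []
    cursor, batch = scan(primary, 0, switch_at)
    seen.extend(batch)
    while cursor != 0:
        cursor, batch = scan(replica, cursor, 8)
        seen.extend(batch)
    return seen


keys = [f"member:{i}" for i in range(1000)]
same_seed = full_scan_with_failover(keys, b"A" * 16, b"A" * 16, 32)
diff_seed = full_scan_with_failover(keys, b"A" * 16, b"B" * 16, 32)
print("missed with same seed:     ", len(set(keys) - set(same_seed)))
print("missed with different seed:", len(set(keys) - set(diff_seed)))
```

With identical seeds the resumed scan covers every element. With different seeds, any element that hashes into the unvisited half on the primary but into the already-skipped half on the replica is never returned, which violates the first guarantee. Other elements can be returned twice, which SCAN already permits.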

I'm not sure we strictly need to do anything about SCAN, since I don't know how often a SCAN is resumed after a failover. The other commands might need a fix, though.

I think this is worth addressing, and I see three ways to do it:

  1. Simply update the documentation to add the caveat. This doesn't feel right to me, because most users will not really be aware of this.
  2. Add a configuration so that seeds can be set externally. This would allow operators to configure this consistency, but has limited benefit for those who are unaware of the issue.
  3. Allow replicas to sync their hash seed from their primaries. This makes their cursors consistent. We can also persist the seed into the RDB so that it remains accurate after a restart. I'm most in favor of this option.
