kv/storage: requests need retries after a LeaseTransfer

Here's a nice one: node `A` executes a `TransferLease`, transferring to node `B`. Immediately after, if `A` receives a request, it will say `NotLeaseHolderError` and redirect to `B` (as the post-commit trigger for `TransferLease` has updated the lease).
`B` is not aware of this transfer, as it hasn't executed the respective trigger yet, so when it receives the request it will say `NotLeaseHolderError` and redirect back to `A`.
The `DistSender.sendToReplicas` runs through all the replicas and then returns a `SendError`, which the higher-level `sendChunk` takes to mean that the range descriptor in its cache is stale, and so it evicts it from the cache and tries again, with infinite retries. This hides the error from clients, but still this is not ideal.

I'm wondering what to do about it, if anything. I think options are:
a) We make the `*Admin*TransferLease` RPC (not the Raft `TransferLease`) block until `B` has applied the new lease. This is doable particularly since I'm currently introducing a command for reading the current lease. This only solves the problem for tests which explicitly do a transfer and then want to send more commands. And currently only tests use lease transfers... But then again, this is not really a problem for tests anyway.
b) We make the `DistSender` understand this situation and localize the retries in `sendToReplicas`. This can be done, for example, by having `sendToReplicas` keep track of the `NotLeaseHolderError`s it gets and seeing if nodes contradict each other. This is helped by the fact that I'm currently working to include the current lease in these errors, so the `DistSender` could assert that this case is happening is the leases reported back by different nodes don't agree and the alleged lease holder by the most recent lease is itself reporting a stale lease.
c) We try to reduce the possibility of a node being oblivious to the fact that it has a lease coming its way by making a transferee aware of the incoming lease before Raft actually applies it, and have it block incoming requests for the respective range for a grace period, until the lease has been applied. Maybe there's a way to peak at the Raft commands a node votes on? And better yet, maybe there's a way to require that a particular node vote on a request before the Raft leader considers a proposal committed (so that a transferer can require the transferee to become aware of a transfer early)?
d) (suggested by @tschottdorf) We try to reduce the possibility of this happening by attempting to move Raft leadership before proposing a `LeaseTransfer`. This would increase the probability that the transferee applies the `LeaseTransfer` before the transferer returns control.
This proposal doesn't seem to have a downside, except that it's all best effort?

This would become a real problem in prod if we start using lease transfers on node shutdown and down-replication (which I think we should). It's not currently a problem because leases only move once the old one expires, in which case nodes decide to acquire a lease while serving some request, and that request is blocked until the lease is acquired. So there's never a situation in which a "lease holder" is not aware of the most recent lease. But of course this all has the bigger price of having to wait for a lease to expire until a new holder emerges.

One thing that became apparent while looking at this is that we don't have any visibility into `DistSender` retries.

@bdarnell any thoughts?


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kv/storage: requests need retries after a LeaseTransfer #8816

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

kv/storage: requests need retries after a LeaseTransfer #8816

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions