storage: NotLeaseholderError redirection can get in tight loop #22837

@nvb

Description

I've observed that in at least v2.0-alpha.20180122 and v2.0-alpha.20180212 it's possible for NotLeaseholderError redirection to result in a tight loop where DistSender continually ping-pongs requests back and forth between two replicas. I first noticed this in the RPC Errors graph of the Admin UI, where the number of Not Leaseholder Errors occasionally jumped into the 10k range even though I only had about 50 SQL clients. This was backed up by the RPCs graph, which showed a similar spike.

Later, I got lucky and caught this on the debug/requests page. Here I saw the following:

2018/02/20 08:10:24.413824 	13.750285 	kv.DistSender: sending partial batch
08:10:24.413826 	 .     2 	... txnID:bb4d74d6-fc80-46eb-b3e2-9121a450a1c2
08:10:24.414053 	 .   227 	... [client=127.0.0.1:35854,user=rk_user,n1] r52: sending batch 2 CPut, 1 BeginTxn to (n2,s2):3
08:10:24.414071 	 .    18 	... [client=127.0.0.1:35854,user=rk_user,n1] sending request to nathan-high-mem-0002:26257
08:10:24.419545 	 .  5473 	... [client=127.0.0.1:35854,user=rk_user,n1] application error: [NotLeaseHolderError] r52: replica (n2,s2):3 not lease holder; current lease is repl=(n1,s1):1 seq=0 start=1519114215.919013222,0 epo=1 pro=1519114215.919015761,0
08:10:24.419597 	 .    52 	... [client=127.0.0.1:35854,user=rk_user,n1] error: {(err: [NotLeaseHolderError] r52: replica (n2,s2):3 not lease holder; current lease is repl=(n1,s1):1 seq=0 start=1519114215.919013222,0 epo=1 pro=1519114215.919015761,0) <nil>}; trying next peer (n1,s1):1
08:10:24.419614 	 .    17 	... [client=127.0.0.1:35854,user=rk_user,n1] sending request to local server
08:10:24.419852 	 .   238 	... [client=127.0.0.1:35854,user=rk_user,n1] application error: [NotLeaseHolderError] r52: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):3 seq=3 start=1519114118.566713425,0 epo=1 pro=1519114118.566716884,0
08:10:24.419861 	 .     9 	... [client=127.0.0.1:35854,user=rk_user,n1] error: {(err: [NotLeaseHolderError] r52: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):3 seq=3 start=1519114118.566713425,0 epo=1 pro=1519114118.566716884,0) <nil>}; trying next peer (n2,s2):3
08:10:24.419878 	 .    17 	... [client=127.0.0.1:35854,user=rk_user,n1] sending request to nathan-high-mem-0002:26257
08:10:24.426201 	 .  6324 	... [client=127.0.0.1:35854,user=rk_user,n1] application error: [NotLeaseHolderError] r52: replica (n2,s2):3 not lease holder; current lease is repl=(n1,s1):1 seq=0 start=1519114215.919013222,0 epo=1 pro=1519114215.919015761,0
08:10:24.426220 	 .    18 	... [client=127.0.0.1:35854,user=rk_user,n1] error: {(err: [NotLeaseHolderError] r52: replica (n2,s2):3 not lease holder; current lease is repl=(n1,s1):1 seq=0 start=1519114215.919013222,0 epo=1 pro=1519114215.919015761,0) <nil>}; trying next peer (n1,s1):1
08:10:24.426233 	 .    13 	... [client=127.0.0.1:35854,user=rk_user,n1] sending request to local server
08:10:24.426578 	 .   345 	... [client=127.0.0.1:35854,user=rk_user,n1] application error: [NotLeaseHolderError] r52: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):3 seq=3 start=1519114118.566713425,0 epo=1 pro=1519114118.566716884,0
08:10:24.426587 	 .     9 	... [client=127.0.0.1:35854,user=rk_user,n1] error: {(err: [NotLeaseHolderError] r52: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):3 seq=3 start=1519114118.566713425,0 epo=1 pro=1519114118.566716884,0) <nil>}; trying next peer (n2,s2):3
08:10:24.426612 	 .    24 	... [client=127.0.0.1:35854,user=rk_user,n1] sending request to nathan-high-mem-0002:26257
...
for multiple seconds
...

We can see n1 and n2 continuously redirecting to each other. I believe a situation like this is possible if a node requests a range lease and then quickly falls behind in the Raft log before seeing the application of its new lease. In that case, I'm not sure there's much we can do to inform the new leaseholder about its new lease, because it's not obviously safe to communicate lease information through a side-channel outside of Raft. Still, this results in 0 QPS across the entire range, so I wonder if there's something else we can do to prevent the situation entirely.
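To make the failure mode above concrete, here is a minimal sketch of the mutual redirection. The types and names are illustrative, not CockroachDB's actual ones: each replica answers requests based on its own applied view of the lease, and in this scenario n1 has not yet applied its own pending lease while n2 has, so each side names the other as leaseholder.

```go
package main

import "fmt"

// leaseView is each replica's locally applied view of who holds the lease.
// n1 requested the lease but fell behind in the Raft log, so it still sees
// n2's older lease (seq=3); n2 has applied n1's newer lease (seq=0 in the
// trace above is its not-yet-applied state on n1's side). Illustrative only.
var leaseView = map[string]string{
	"n1": "n2", // n1's applied state: lease held by n2
	"n2": "n1", // n2's applied state: lease held by n1
}

// send models DistSender trying a replica: if the replica does not believe
// it holds the lease, it returns a NotLeaseHolderError-style redirect naming
// the replica it thinks is the leaseholder.
func send(replica string) (ok bool, redirectTo string) {
	if holder := leaseView[replica]; holder != replica {
		return false, holder // redirect: try this replica instead
	}
	return true, ""
}

func main() {
	target := "n2"
	for attempt := 1; attempt <= 6; attempt++ {
		ok, next := send(target)
		if ok {
			fmt.Printf("attempt %d: %s served the request\n", attempt, target)
			return
		}
		fmt.Printf("attempt %d: %s redirected to %s\n", attempt, target, next)
		target = next // ping-pong: each side points at the other
	}
	fmt.Println("no progress: tight redirect loop")
}
```

Since neither replica's applied state ever names itself, the loop never terminates on its own; only the next Raft application on n1 breaks the cycle.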

At a minimum, we should have some kind of backoff at the DistSender level to prevent such a tight loop from occurring and blowing up the RPC count.
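One shape such a backoff could take, sketched below under my own assumptions (this is not DistSender's actual code): follow NotLeaseholderError redirects normally, but once the loop revisits a replica it has already tried in the current pass, treat that as a redirect cycle and sleep with exponential backoff before continuing.

```go
package main

import (
	"fmt"
	"time"
)

// sendWithBackoff is a hypothetical sketch of the proposed mitigation: keep
// following redirects, but when we come back to a replica we already tried
// in this pass, back off exponentially so the RPC count stays bounded while
// we wait for the new lease to be applied.
func sendWithBackoff(try func(replica string) (ok bool, redirectTo string),
	first string, maxAttempts int) bool {

	const baseBackoff = 10 * time.Millisecond
	const maxBackoff = 1 * time.Second

	seen := map[string]bool{}
	backoff := baseBackoff
	target := first
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if seen[target] {
			// Revisiting an already-tried replica means we are cycling.
			// Sleep to give the pending lease time to apply, then retry.
			fmt.Printf("redirect cycle detected; backing off %v\n", backoff)
			time.Sleep(backoff)
			if backoff *= 2; backoff > maxBackoff {
				backoff = maxBackoff
			}
			seen = map[string]bool{} // start a fresh pass over the replicas
		}
		seen[target] = true
		ok, next := try(target)
		if ok {
			return true
		}
		target = next
	}
	return false
}
```

With this in place, the trace above would still ping-pong once, but subsequent retries would be spaced out at 10ms, 20ms, 40ms, and so on, instead of issuing thousands of back-to-back RPCs per second against the same two replicas.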

Metadata

Labels

A-kv-client: Relating to the KV client and the KV interface.
C-performance: Perf of queries or internals. Solution not expected to change functional behavior.
