storage: NotLeaseholderError redirection can get in tight loop #22837

@nvb

Description

I've observed that in at least v2.0-alpha.20180122 and v2.0-alpha.20180212 it's possible for NotLeaseholderError redirection to result in a tight loop where DistSender continually ping-pongs requests back and forth between two replicas. I first noticed this in the RPC Errors graph of the Admin UI, where the number of Not Leaseholder Errors occasionally jumped into the 10k range even though I only had about 50 SQL clients. This was backed up by the RPCs graph, which showed a similar spike.

Later, I got lucky and caught this on the debug/requests page. Here I saw the following:

2018/02/20 08:10:24.413824 	13.750285 	kv.DistSender: sending partial batch
08:10:24.413826 	 .     2 	... txnID:bb4d74d6-fc80-46eb-b3e2-9121a450a1c2
08:10:24.414053 	 .   227 	... [client=127.0.0.1:35854,user=rk_user,n1] r52: sending batch 2 CPut, 1 BeginTxn to (n2,s2):3
08:10:24.414071 	 .    18 	... [client=127.0.0.1:35854,user=rk_user,n1] sending request to nathan-high-mem-0002:26257
08:10:24.419545 	 .  5473 	... [client=127.0.0.1:35854,user=rk_user,n1] application error: [NotLeaseHolderError] r52: replica (n2,s2):3 not lease holder; current lease is repl=(n1,s1):1 seq=0 start=1519114215.919013222,0 epo=1 pro=1519114215.919015761,0
08:10:24.419597 	 .    52 	... [client=127.0.0.1:35854,user=rk_user,n1] error: {(err: [NotLeaseHolderError] r52: replica (n2,s2):3 not lease holder; current lease is repl=(n1,s1):1 seq=0 start=1519114215.919013222,0 epo=1 pro=1519114215.919015761,0) <nil>}; trying next peer (n1,s1):1
08:10:24.419614 	 .    17 	... [client=127.0.0.1:35854,user=rk_user,n1] sending request to local server
08:10:24.419852 	 .   238 	... [client=127.0.0.1:35854,user=rk_user,n1] application error: [NotLeaseHolderError] r52: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):3 seq=3 start=1519114118.566713425,0 epo=1 pro=1519114118.566716884,0
08:10:24.419861 	 .     9 	... [client=127.0.0.1:35854,user=rk_user,n1] error: {(err: [NotLeaseHolderError] r52: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):3 seq=3 start=1519114118.566713425,0 epo=1 pro=1519114118.566716884,0) <nil>}; trying next peer (n2,s2):3
08:10:24.419878 	 .    17 	... [client=127.0.0.1:35854,user=rk_user,n1] sending request to nathan-high-mem-0002:26257
08:10:24.426201 	 .  6324 	... [client=127.0.0.1:35854,user=rk_user,n1] application error: [NotLeaseHolderError] r52: replica (n2,s2):3 not lease holder; current lease is repl=(n1,s1):1 seq=0 start=1519114215.919013222,0 epo=1 pro=1519114215.919015761,0
08:10:24.426220 	 .    18 	... [client=127.0.0.1:35854,user=rk_user,n1] error: {(err: [NotLeaseHolderError] r52: replica (n2,s2):3 not lease holder; current lease is repl=(n1,s1):1 seq=0 start=1519114215.919013222,0 epo=1 pro=1519114215.919015761,0) <nil>}; trying next peer (n1,s1):1
08:10:24.426233 	 .    13 	... [client=127.0.0.1:35854,user=rk_user,n1] sending request to local server
08:10:24.426578 	 .   345 	... [client=127.0.0.1:35854,user=rk_user,n1] application error: [NotLeaseHolderError] r52: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):3 seq=3 start=1519114118.566713425,0 epo=1 pro=1519114118.566716884,0
08:10:24.426587 	 .     9 	... [client=127.0.0.1:35854,user=rk_user,n1] error: {(err: [NotLeaseHolderError] r52: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):3 seq=3 start=1519114118.566713425,0 epo=1 pro=1519114118.566716884,0) <nil>}; trying next peer (n2,s2):3
08:10:24.426612 	 .    24 	... [client=127.0.0.1:35854,user=rk_user,n1] sending request to nathan-high-mem-0002:26257
...
for multiple seconds
...

We can see n1 and n2 continuously redirecting to each other. I believe a situation like this is possible if a node requests a range lease and then quickly falls behind in the Raft log before seeing the application of its new lease. In that case, I'm not sure there's much we can do to inform the new leaseholder about its new lease, because it's not obviously safe to communicate lease information through a side-channel outside of Raft. Still, this results in 0 QPS across the entire range, so I wonder if there's something else we can do to prevent the situation entirely.
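To make the failure mode above concrete, here is a minimal sketch of the mutual redirection. The types and names are illustrative, not CockroachDB's actual ones: each replica answers requests based on its own applied view of the lease, and in this scenario n1 has not yet applied its own pending lease while n2 has, so each side names the other as leaseholder.

```go
package main

import "fmt"

// leaseView is each replica's locally applied view of who holds the lease.
// n1 requested the lease but fell behind in the Raft log, so it still sees
// n2's older lease (seq=3); n2 has applied n1's newer lease (seq=0 in the
// trace above is its not-yet-applied state on n1's side). Illustrative only.
var leaseView = map[string]string{
	"n1": "n2", // n1's applied state: lease held by n2
	"n2": "n1", // n2's applied state: lease held by n1
}

// send models DistSender trying a replica: if the replica does not believe
// it holds the lease, it returns a NotLeaseHolderError-style redirect naming
// the replica it thinks is the leaseholder.
func send(replica string) (ok bool, redirectTo string) {
	if holder := leaseView[replica]; holder != replica {
		return false, holder // redirect: try this replica instead
	}
	return true, ""
}

func main() {
	target := "n2"
	for attempt := 1; attempt <= 6; attempt++ {
		ok, next := send(target)
		if ok {
			fmt.Printf("attempt %d: %s served the request\n", attempt, target)
			return
		}
		fmt.Printf("attempt %d: %s redirected to %s\n", attempt, target, next)
		target = next // ping-pong: each side points at the other
	}
	fmt.Println("no progress: tight redirect loop")
}
```

Since neither replica's applied state ever names itself, the loop never terminates on its own; only the next Raft application on n1 breaks the cycle.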

At a minimum, we should have some kind of backoff at the DistSender level to prevent such a tight loop from occurring and blowing up the RPC count.
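One shape such a backoff could take, sketched below under my own assumptions (this is not DistSender's actual code): follow NotLeaseholderError redirects normally, but once the loop revisits a replica it has already tried in the current pass, treat that as a redirect cycle and sleep with exponential backoff before continuing.

```go
package main

import (
	"fmt"
	"time"
)

// sendWithBackoff is a hypothetical sketch of the proposed mitigation: keep
// following redirects, but when we come back to a replica we already tried
// in this pass, back off exponentially so the RPC count stays bounded while
// we wait for the new lease to be applied.
func sendWithBackoff(try func(replica string) (ok bool, redirectTo string),
	first string, maxAttempts int) bool {

	const baseBackoff = 10 * time.Millisecond
	const maxBackoff = 1 * time.Second

	seen := map[string]bool{}
	backoff := baseBackoff
	target := first
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if seen[target] {
			// Revisiting an already-tried replica means we are cycling.
			// Sleep to give the pending lease time to apply, then retry.
			fmt.Printf("redirect cycle detected; backing off %v\n", backoff)
			time.Sleep(backoff)
			if backoff *= 2; backoff > maxBackoff {
				backoff = maxBackoff
			}
			seen = map[string]bool{} // start a fresh pass over the replicas
		}
		seen[target] = true
		ok, next := try(target)
		if ok {
			return true
		}
		target = next
	}
	return false
}
```

With this in place, the trace above would still ping-pong once, but subsequent retries would be spaced out at 10ms, 20ms, 40ms, and so on, instead of issuing thousands of back-to-back RPCs per second against the same two replicas.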

Metadata

Labels

A-kv-client: Relating to the KV client and the KV interface.
C-performance: Perf of queries or internals. Solution not expected to change functional behavior.
