kv: evict leaseholder on RPC error#23885
Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom Aug 16, 2018
Merged
Conversation
Member
Contributor
|
This is kind of the converse of #3196. Since we don't know when the lease expires, we want to evict the cache on RPC errors. If we know that the lease hadn't expired yet, we'd want to stick to that node even if we did get transient RPC errors. Review status: 0 of 3 files reviewed at latest revision, all discussions resolved, some commit checks failed. Comments from Reviewable |
Contributor
067a41e to
7f398ad
Compare
This addresses a situation in which we would not evict a stale leaseholder for a long time. Consider the replica [s1,s2,s3] and s1 is down but is the cached leaseholder, while s2 is the actual lease holder. The RPC layer will try s1, get an RPC error, try s2 and succeed. Since there is no NotLeaseHolderError involved, the cache would not get updated, and so every request pays the overhead of trying s1 first. Fixes cockroachdb#23601. Release note: None
7f398ad to
bc2bffd
Compare
tbg
commented
Aug 16, 2018
Member
Author
tbg
left a comment
There was a problem hiding this comment.
Rebased this and added a test, PTAL.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale)
solongordon
approved these changes
Aug 16, 2018
Contributor
solongordon
left a comment
There was a problem hiding this comment.
Test looks good, thanks!
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale)
Member
Author
|
bors r=solongoron |
craig bot
pushed a commit
that referenced
this pull request
Aug 16, 2018
23885: kv: evict leaseholder on RPC error r=solongoron a=tschottdorf This addresses a situation in which we would not evict a stale leaseholder for a long time. Consider the replicas [s1,s2,s3] and s1 is down but is the cached leaseholder, while s2 is the actual lease holder. The RPC layer will try s1, get an RPC error, try s2 and succeed. Since there is no NotLeaseHolderError involved, the cache would not get updated, and so every request pays the overhead of trying s1 first. WIP because needs testing. Touches #23601. Release note (bug fix): Improve request routing during node outages. 28609: opt: Make additional perf improvements r=andy-kimball a=andy-kimball Make several more fixes: 1. Do not qualify column names in metadata, since that requires expensive string formatting up-front (also cleanup the factoring of this code, which had gotten messy). 2. Inline Metadata into Memo. 3. Inline logicalPropsBuilder into the Memo. Together, these changes improve KV perf from: ``` Phases/kv-read/OptBuild 18.4µs ± 1% ``` to: ``` Phases/kv-read/OptBuild 17.8µs ± 1% ``` 28661: storage: don't include RHS data in merge trigger r=bdarnell,tschottdorf a=benesch Now that we require ranges to be collocated during a merge and the RHS replicas to be up-to-date before the merge commits, we no longer need to include a snapshot of the RHS in the merge trigger. We know that the copy of the data that already exists in the local store is perfectly up-to-date. So, stop sending the data in the merge trigger. Release note: None 28689: sqlbase: avoid using SERIAL in system tables r=knz a=knz Needed for #28575. We'll soon want special behavior for SERIAL. We can't afford the definition of system tables to be subject to a discussion about what SERIAL means. So this patch ensures system tables don't use SERIAL. Release note: None Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com> Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com> Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com> Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Contributor
Build succeeded |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This addresses a situation in which we would not evict a stale
leaseholder for a long time. Consider the replicas [s1,s2,s3] and
s1 is down but is the cached leaseholder, while s2 is the actual lease
holder. The RPC layer will try s1, get an RPC error, try s2 and succeed.
Since there is no NotLeaseHolderError involved, the cache would not get
updated, and so every request pays the overhead of trying s1 first.
WIP because needs testing.
Touches #23601.
Release note (bug fix): Improve request routing during node outages.