Skip to content

kv: evict leaseholder on RPC error#23885

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
tbg:distsender/leaseeviction
Aug 16, 2018
Merged

kv: evict leaseholder on RPC error#23885
craig[bot] merged 1 commit intocockroachdb:masterfrom
tbg:distsender/leaseeviction

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented Mar 15, 2018

This addresses a situation in which we would not evict a stale
leaseholder for a long time. Consider the replicas [s1,s2,s3] and
s1 is down but is the cached leaseholder, while s2 is the actual lease
holder. The RPC layer will try s1, get an RPC error, try s2 and succeed.
Since there is no NotLeaseHolderError involved, the cache would not get
updated, and so every request pays the overhead of trying s1 first.

WIP because needs testing.

Touches #23601.

Release note (bug fix): Improve request routing during node outages.

@tbg tbg requested a review from a team March 15, 2018 00:30
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@bdarnell
Copy link
Copy Markdown
Contributor

:lgtm:

This is kind of the converse of #3196. Since we don't know when the lease expires, we want to evict the cache on RPC errors. If we know that the lease hadn't expired yet, we'd want to stick to that node even if we did get transient RPC errors.


Review status: 0 of 3 files reviewed at latest revision, all discussions resolved, some commit checks failed.


Comments from Reviewable

@solongordon
Copy link
Copy Markdown
Contributor

solongordon commented Mar 15, 2018

:lgtm:

Nice, evicting is more straightforward than trying to populate the cache with the correct value, which I had been looking at.

This addresses a situation in which we would not evict a stale
leaseholder for a long time. Consider the replica [s1,s2,s3] and
s1 is down but is the cached leaseholder, while s2 is the actual lease
holder. The RPC layer will try s1, get an RPC error, try s2 and succeed.
Since there is no NotLeaseHolderError involved, the cache would not get
updated, and so every request pays the overhead of trying s1 first.

Fixes cockroachdb#23601.

Release note: None
@tbg tbg force-pushed the distsender/leaseeviction branch from 7f398ad to bc2bffd Compare August 16, 2018 11:09
Copy link
Copy Markdown
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased this and added a test, PTAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale)

@tbg tbg requested a review from nvb August 16, 2018 12:03
@tbg tbg changed the title wip: kv: evict leaseholder on RPC error kv: evict leaseholder on RPC error Aug 16, 2018
Copy link
Copy Markdown
Contributor

@solongordon solongordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test looks good, thanks!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale)

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Aug 16, 2018

bors r=solongoron
TFTR!

craig bot pushed a commit that referenced this pull request Aug 16, 2018
23885: kv: evict leaseholder on RPC error r=solongoron a=tschottdorf

This addresses a situation in which we would not evict a stale
leaseholder for a long time. Consider the replicas [s1,s2,s3] and
s1 is down but is the cached leaseholder, while s2 is the actual lease
holder. The RPC layer will try s1, get an RPC error, try s2 and succeed.
Since there is no NotLeaseHolderError involved, the cache would not get
updated, and so every request pays the overhead of trying s1 first.

WIP because needs testing.

Touches #23601.

Release note (bug fix): Improve request routing during node outages.

28609: opt: Make additional perf improvements r=andy-kimball a=andy-kimball

Make several more fixes:

1. Do not qualify column names in metadata, since that
requires expensive string formatting up-front (also cleanup
the factoring of this code, which had gotten messy).
2. Inline Metadata into Memo.
3. Inline logicalPropsBuilder into the Memo.

Together, these changes improve KV perf from:
```
Phases/kv-read/OptBuild  18.4µs ± 1%
```
to:
```
Phases/kv-read/OptBuild  17.8µs ± 1%
```

28661: storage: don't include RHS data in merge trigger r=bdarnell,tschottdorf a=benesch

Now that we require ranges to be collocated during a merge and the RHS
replicas to be up-to-date before the merge commits, we no longer need to
include a snapshot of the RHS in the merge trigger. We know that the
copy of the data that already exists in the local store is perfectly
up-to-date.

So, stop sending the data in the merge trigger.

Release note: None

28689: sqlbase: avoid using SERIAL in system tables r=knz a=knz

Needed for  #28575.

We'll soon want special behavior for SERIAL. We can't afford the
definition of system tables to be subject to a discussion about what
SERIAL means. So this patch ensures system tables don't use SERIAL.

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>
Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 16, 2018

Build succeeded

@craig craig bot merged commit bc2bffd into cockroachdb:master Aug 16, 2018
@tbg tbg deleted the distsender/leaseeviction branch August 20, 2018 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants