Skip to content

Fix leaking LeaseRejectedError#2996

Merged
mrtracy merged 1 commit intocockroachdb:masterfrom
mrtracy:mtracy/fix_leaserejected
Nov 5, 2015
Merged

Fix leaking LeaseRejectedError#2996
mrtracy merged 1 commit intocockroachdb:masterfrom
mrtracy:mtracy/fix_leaserejected

Conversation

@mrtracy
Copy link
Copy Markdown
Contributor

@mrtracy mrtracy commented Nov 3, 2015

In a very particular race condition, a LeaseRejectedError could leak back to
clients. This race occurs when a replica requests the leader lease, but it is no
longer part of the range's raft group (having not yet been GCed). Due to a logic
quirk, this was not being properly converted to a "NotLeaderError" before being
passed back to the sender level.

Review on Reviewable

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you double-check how that lease is interpreted by the caller? In the "new" case here, our lease is possibly outdated, which I'm not sure the caller knows. Might be better to return a lease-less NewLeaderError then.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I restored the original logic which did just this; however, there is now an else case which returns a leaderless NotLeaderError in other cases.

@tbg
Copy link
Copy Markdown
Member

tbg commented Nov 3, 2015

Can you add a simple test case that exercises this new change?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't print a timestamp as %d

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(how did this pass vet?)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think vet knows about Warningf by default.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that must be it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to %v

@mrtracy mrtracy force-pushed the mtracy/fix_leaserejected branch 2 times, most recently from 06bd623 to fe9f1bc Compare November 4, 2015 20:28
@mrtracy
Copy link
Copy Markdown
Contributor Author

mrtracy commented Nov 4, 2015

Added a simple test to exercise this.

@mrtracy mrtracy force-pushed the mtracy/fix_leaserejected branch from fe9f1bc to f48152e Compare November 4, 2015 20:34
@bdarnell
Copy link
Copy Markdown
Contributor

bdarnell commented Nov 4, 2015

LGTM

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be the else if for line 397 below? After all, it's expected that this happen when multiple replicas race for the lease.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good point. Moved.

@mrtracy mrtracy force-pushed the mtracy/fix_leaserejected branch 2 times, most recently from feaa94b to b8bb25c Compare November 4, 2015 20:52
@tbg
Copy link
Copy Markdown
Member

tbg commented Nov 4, 2015

LGTM

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably should do something with this error.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

In a very particular race condition, a LeaseRejectedError could leak back to
clients. This race occurs when a replica requests the leader lease, but it is no
longer part of the range's raft group (having not yet been GCed). Due to a logic
quirk, this was not being properly converted to a "NotLeaderError" before being
passed back to the sender level.
@mrtracy mrtracy force-pushed the mtracy/fix_leaserejected branch from b8bb25c to 8f478b4 Compare November 5, 2015 15:01
mrtracy added a commit that referenced this pull request Nov 5, 2015
@mrtracy mrtracy merged commit d457af5 into cockroachdb:master Nov 5, 2015
@mrtracy mrtracy deleted the mtracy/fix_leaserejected branch November 5, 2015 15:30
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the lErr == nil check is not useful; it's not possible for ok to be true and lErr to be nil at the same time. I'll send a PR with other fixes and grab this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't that the "it's possible, but we've decided that it's not something we'll ever have" case?

var err error
err = (*roachpb.NotLeaderError)(nil)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's true. Still, it seems to be in alignment with our practices to avoid this check. For the curious: http://play.golang.org/p/tLd4XGxKjQ

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. We had this discussion before.

bdarnell added a commit to bdarnell/cockroach that referenced this pull request May 21, 2018
This introduction of proactive lease renewals (cockroachdb#25322) made this test
flaky in two ways. First, the test was (oddly) creating two Replica
objects for range 1 on the same store (since cockroachdb#2996), leading to races
when combined with the background lease renewal thread. Second, the
test expects leases to expire so it needs to disable automatic
renewals.

Fixes cockroachdb#25748

Release note: None
craig bot pushed a commit that referenced this pull request May 21, 2018
25781: storage: Deflake TestReplicaLease r=tschottdorf a=bdarnell

This introduction of proactive lease renewals (#25322) made this test
flaky in two ways. First, the test was (oddly) creating two Replica
objects for range 1 on the same store (since #2996), leading to races
when combined with the background lease renewal thread. Second, the
test expects leases to expire so it needs to disable automatic
renewals.

Fixes #25748

Release note: None

Co-authored-by: Ben Darnell <ben@cockroachlabs.com>
a-robinson pushed a commit to a-robinson/cockroach that referenced this pull request Aug 29, 2018
This introduction of proactive lease renewals (cockroachdb#25322) made this test
flaky in two ways. First, the test was (oddly) creating two Replica
objects for range 1 on the same store (since cockroachdb#2996), leading to races
when combined with the background lease renewal thread. Second, the
test expects leases to expire so it needs to disable automatic
renewals.

Fixes cockroachdb#25748

Release note: None
@cockroach-teamcity cockroach-teamcity mentioned this pull request Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants