Fix leaking LeaseRejectedError #2996
Conversation
storage/replica.go
can you double-check how that lease is interpreted by the caller? In the "new" case here, our lease is possibly outdated, which I'm not sure the caller knows. Might be better to return a lease-less NotLeaderError then.
I restored the original logic which did just this; however, there is now an else case which returns a leaderless NotLeaderError in other cases.
Can you add a simple test case that exercises this new change?
storage/replica.go
can't print a timestamp as %d
I don't think vet knows about Warningf by default.
Force-pushed 06bd623 to fe9f1bc
Added a simple test to exercise this.
Force-pushed fe9f1bc to f48152e
LGTM
storage/replica.go
should this be the else if for line 397 below? After all, it's expected that this happen when multiple replicas race for the lease.
Yeah, good point. Moved.
Force-pushed feaa94b to b8bb25c
LGTM
probably should do something with this error.
In a very particular race condition, a LeaseRejectedError could leak back to clients. This race occurs when a replica requests the leader lease, but it is no longer part of the range's raft group (having not yet been GCed). Due to a logic quirk, this was not being properly converted to a "NotLeaderError" before being passed back to the sender level.
Force-pushed b8bb25c to 8f478b4
the lErr == nil check is not useful; it's not possible for ok to be true and lErr to be nil at the same time. I'll send a PR with other fixes and grab this.
isn't that the "it's possible, but we've decided that it's not something we'll ever have" case?
var err error
err = (*roachpb.NotLeaderError)(nil)
Yeah, that's true. Still, it seems to be in alignment with our practices to avoid this check. For the curious: http://play.golang.org/p/tLd4XGxKjQ
agreed. We had this discussion before.
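The playground link above demonstrates the classic typed-nil gotcha: storing a nil pointer in an interface makes the interface itself non-nil, because the interface still carries type information. A self-contained sketch (the error type name mirrors the one under discussion):

```go
package main

import "fmt"

type NotLeaderError struct{}

func (e *NotLeaderError) Error() string { return "not leader" }

func main() {
	var err error
	// Assigning a typed nil pointer to an interface yields a non-nil
	// interface: the (type, value) pair is (*NotLeaderError, nil).
	err = (*NotLeaderError)(nil)
	fmt.Println(err == nil) // prints "false"
}
```

This is why a plain `err != nil` check can "succeed" while the concrete pointer inside is still nil, and why code paths that could produce such values deserve scrutiny even when they seem impossible in practice.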
25781: storage: Deflake TestReplicaLease r=tschottdorf a=bdarnell

The introduction of proactive lease renewals (cockroachdb#25322) made this test flaky in two ways. First, the test was (oddly) creating two Replica objects for range 1 on the same store (since cockroachdb#2996), leading to races when combined with the background lease renewal thread. Second, the test expects leases to expire, so it needs to disable automatic renewals.

Fixes cockroachdb#25748

Release note: None

Co-authored-by: Ben Darnell <ben@cockroachlabs.com>