kvserver/concurrency: disable lock-table when not leaseholder #45830
nvb merged 1 commit into cockroachdb:master
Conversation
115a566 to 6fd02a4
tbg
left a comment
Reviewed 8 of 8 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)
pkg/kv/kvserver/concurrency/lock_table.go, line 1631 at r1 (raw file):
// by AcquireLock or AddDiscoveredLock, we'll never be able to form a
// wait-queue or update a lock. However, it's not clear whether making
// that change makes things more clear or less. I'm open to suggestions.
I would leave a comment and remove the code. You have test coverage for this case, right?
sumeerbhola
left a comment
after tbg's comment is addressed.
Reviewed 7 of 8 files at r1.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/testdata/lock_table/disable, line 39 at r1 (raw file):
infinite loop
what infinite loop is being referred to here?
6fd02a4 to 9f1ae4a
nvb
left a comment
Done.
TFTRs!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @sumeerbhola and @tbg)
pkg/kv/kvserver/concurrency/lock_table.go, line 1631 at r1 (raw file):
I would leave a comment and remove the code
Done.
You have test coverage for this case, right?
Yes, all of the subtests in TestConcurrencyManagerBasic/range_state_listener hit this.
pkg/kv/kvserver/concurrency/testdata/lock_table/disable, line 39 at r1 (raw file):
Previously, sumeerbhola wrote…
infinite loop
what infinite loop is being referred to here?
Updated to:
# NOTE: this won't end up in an infinite loop of scanning a disabled lock-table
# and discovering but ignoring the same lock in practice because the second pass
# through evaluation is likely to hit a NotLeaseholderError, bouncing the
# request back to the client.
9f1ae4a to 280e694
Fixes #38257.
Fixes #45792.
Potentially fixes other instability over the past 4 days.
In #38257, we tracked down that a request was getting stuck in a
lock-table wait-queue that was on a follower replica. This was possible
even though we cleared the lock-table when a replica loses the lease
because the request was re-discovering a lock and inserting it into the
lock-table after the table had been cleared. Once in this state, the
follower's lock-table would never hear about updates to the lock, so
even when the request reached out and resolved the lock, it was not able
to proceed.
This commit fixes this bug by adding a "disabled" state to the lock-table.
When disabled, the lock-table does not track any locks or, by extension,
any wait-queues. This allows requests to proceed past the lock-table and
notice that the replica has lost its lease, which bounces it back to the
client.
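The "disabled" state described above can be illustrated with a minimal sketch. This is a hypothetical, heavily simplified model written for this explanation; the type and method names (`lockTable`, `Enable`, `Disable`, `AddDiscoveredLock`) are illustrative and do not mirror the real `pkg/kv/kvserver/concurrency` implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// lockTable is a simplified sketch of a per-replica table of locks that can
// be disabled when the replica is not the leaseholder.
type lockTable struct {
	mu      sync.Mutex
	enabled bool
	locks   map[string]string // key -> holding txn ID
}

// Enable is called when the replica acquires the lease.
func (lt *lockTable) Enable() {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	lt.enabled = true
}

// Disable is called when the replica loses the lease. All tracked locks
// (and, by extension, their wait-queues) are dropped so that no request
// can get stuck waiting on state the replica will never hear updates for.
func (lt *lockTable) Disable() {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	lt.enabled = false
	lt.locks = nil
}

// AddDiscoveredLock records a lock found during request evaluation. When
// the table is disabled it is a no-op: the request proceeds past the
// lock-table, re-evaluates, and is expected to notice the lost lease and
// bounce back to the client.
func (lt *lockTable) AddDiscoveredLock(key, txnID string) bool {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	if !lt.enabled {
		return false // ignored; lock-table is disabled
	}
	if lt.locks == nil {
		lt.locks = make(map[string]string)
	}
	lt.locks[key] = txnID
	return true
}

func main() {
	lt := &lockTable{}
	lt.Enable()
	fmt.Println(lt.AddDiscoveredLock("a", "txn1")) // true: lock is tracked

	lt.Disable()
	fmt.Println(lt.AddDiscoveredLock("a", "txn1")) // false: discovery ignored
}
```

The key point of the fix is the second call: before this PR, a follower replica could re-insert a discovered lock into a cleared table, creating a wait-queue no resolution would ever clean up; with the disabled state, discovery after lease loss is simply ignored.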