kv, kvserver: add query locks command#76998
Conversation
nvb
left a comment
It's exciting to see this all falling into place.
Reviewed 5 of 5 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @nvanbenschoten)
pkg/kv/kvserver/replica_read.go, line 373 at r3 (raw file):
	// and becomes the inclusive start key of what was read.
	header.Key = t.ResumeSpan.EndKey
case *roachpb.QueryLocksResponse:
This function is used for optimistic evaluation of MVCC operations. It shouldn't apply to a metadata operation like QueryLocksRequest.
pkg/kv/kvserver/batcheval/cmd_query_locks.go, line 36 at r3 (raw file):
	maxOffset time.Duration,
) {
	DefaultDeclareKeys(rs, header, req, latchSpans, lockSpans, maxOffset)
Do we want to acquire read latches across the key span being observed? I didn't think we needed to.
pkg/kv/kvserver/batcheval/cmd_query_locks.go, line 41 at r3 (raw file):
}

func QueryLocks(
We should add some kv-level testing for this request type.
pkg/roachpb/api.go, line 667 at r3 (raw file):
// Method implements the Request interface.
func (*QueryLocksRequest) Method() Method { return QueryLocks }
We'll want to implement the combinable interface for this type earlier in this file so that requests that are split on range boundaries can have their responses merged back together.
b4fec0e to 52179f1
AlexTalks
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @nvanbenschoten)
pkg/kv/kvserver/batcheval/cmd_query_locks.go, line 36 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do we want to acquire read latches across the key span being observed? I didn't think we needed to.
Perhaps not necessary, but is there any reason we might want to? Perhaps to not race with operations on these keys that may acquire locks? But let me know if we think this feature shouldn't attempt to block that during evaluation...
pkg/kv/kvserver/batcheval/cmd_query_locks.go, line 41 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We should add some kv-level testing for this request type.
Working - should finish this shortly
52179f1 to 49d8566
49d8566 to 464d452
AlexTalks
left a comment
Reviewed 6 of 23 files at r4, 10 of 17 files at r5, 6 of 10 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @nvanbenschoten)
464d452 to 4142cbf
AlexTalks
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @nvanbenschoten)
pkg/kv/kvserver/client_split_test.go, line 403 at r8 (raw file):
// create transactions that hold locks on keys across the ranges, and then
// utilize the QueryLocks API to validate the exposed lock table information.
func TestStoreRangeSplitAndQueryLocks(t *testing.T) {
FYI, still not entirely sure this is the correct place for this test, especially because it feels like it's using different approaches to the KV APIs (i.e., most things in this module use a Store / store.TestSender(), whereas what we want to test is sending a batch with QueryLocksRequest using the DistSender). I think this test does work properly, but please let me know if it should be moved and/or cleaned up at all!
nvb
left a comment
Reviewed 4 of 17 files at r5, 3 of 10 files at r6, 1 of 3 files at r7, 5 of 5 files at r8, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @AlexTalks)
pkg/kv/kvserver/client_split_test.go, line 403 at r8 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
FYI, still not entirely sure this is the correct place for this test, especially because it feels like it's using different approaches to the KV APIs (i.e., most things in this module use a Store / store.TestSender(), whereas what we want to test is sending a batch with QueryLocksRequest using the DistSender). I think this test does work properly, but please let me know if it should be moved and/or cleaned up at all!
I think this is a fine place for the test.
But CI is indicating that the test is flaky. We may just need to bump the 10ms up to testutils.DefaultSucceedsSoonDuration. Or remove the manual timeout entirely and defer to the test's timeout. That's what you did in the other test.
pkg/kv/kvserver/replica_test.go, line 11250 at r8 (raw file):
require.Equal(t, roachpb.Key("a"), lockInfo.Key)
require.NotNil(t, lockInfo.LockHolder, "expected lock to be held")
if !lockInfo.LockHolder.ID.Equal(txn.ID) || !lockInfo.LockHolder.WriteTimestamp.Equal(txn.WriteTimestamp) {
minor nit: any reason to not use a pair of require.Equal calls here (and below)?
andreimatei
left a comment
mod the deadlines in one of the tests
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @AlexTalks)
pkg/kv/kvserver/concurrency/concurrency_control.go, line 467 at r8 (raw file):
ResumeReason roachpb.ResumeReason
ResumeNextBytes int64
TotalBytes int64
please comment this new field, and also on ResumeNextBytes
pkg/roachpb/api.proto, line 1118 at r8 (raw file):
// A QueryLocksRequest is arguments to the QueryLocks() method. It requests
// the state of the locks held in the lock table over the given key span,
consider putting some words here about what locks are in the lock table and what locks aren't
pkg/roachpb/api.proto, line 1125 at r8 (raw file):
// for example in the case of an unreplicated lock.
I think this "for example" is confusing. If you want to say anything, I think you need to say more to make the connection with replicated vs unreplicated locks; you could say that replicated locks are not part of the lock table if they were never contended. I think you should also say that only locks in the lock table are returned, regardless of this flag (right?).
pkg/kv/kvserver/client_split_test.go, line 403 at r8 (raw file):
// create transactions that hold locks on keys across the ranges, and then
// utilize the QueryLocks API to validate the exposed lock table information.
func TestStoreRangeSplitAndQueryLocks(t *testing.T) {
nit: make the name of this test start with TestQueryLocks. Maybe TestQueryLocksAcrossRanges. The fact that this test "does splits" is not very relevant.
pkg/kv/kvserver/client_split_test.go, line 403 at r8 (raw file):
I think this is a fine place for the test.
+1
pkg/kv/kvserver/client_split_test.go, line 412 at r8 (raw file):
Store: &kvserver.StoreTestingKnobs{
	DisableMergeQueue: true,
	DisableSplitQueue: true,
how come DisableSplitQueue is needed? Consider commenting.
pkg/kv/kvserver/client_split_test.go, line 480 at r8 (raw file):
case err := <-scanRes:
	t.Fatalf("scan did not wait on locks, returned err %v", err)
case <-time.After(10 * time.Millisecond):
Is this assuming that the scan will block in 10ms? This seems like a recipe for flakiness. Let's use some sort of a knob to get notified when the scan blocks.
pkg/kv/kvserver/client_split_test.go, line 529 at r8 (raw file):
case err := <-scanRes:
	require.NoError(t, err, "txn3.Scan() should have finished without error after txns unlock")
case <-time.After(10 * time.Millisecond):
I would make this deadline a lot larger, or remove it completely.
AlexTalks
left a comment
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @AlexTalks, @andreimatei, and @nvanbenschoten)
pkg/kv/kvserver/concurrency/concurrency_control.go, line 467 at r8 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
please comment this new field, and also on ResumeNextBytes
Will do - sg
pkg/roachpb/api.proto, line 1125 at r8 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
// for example in the case of an unreplicated lock.
I think this "for example" is confusing. If you want to say anything, I think you need to say more to make the connection with replicated vs unreplicated locks; you could say that replicated locks are not part of the lock table if they were never contended. I think you should also say that only locks in the lock table are returned, regardless of this flag (right?).
Will do - sg
pkg/kv/kvserver/client_split_test.go, line 403 at r8 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: make the name of this test start with TestQueryLocks. Maybe TestQueryLocksAcrossRanges. The fact that this test "does splits" is not very relevant.
Will do - sg
pkg/kv/kvserver/client_split_test.go, line 412 at r8 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
how come DisableSplitQueue is needed? Consider commenting.
This was definitely cargo-culting, heh. It is likely unnecessary, can remove.
pkg/kv/kvserver/client_split_test.go, line 480 at r8 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Is this assuming that the scan will block in 10ms? This seems like a recipe for flakiness. Let's use some sort of a knob to get notified when the scan blocks.
Do we have a knob that I can use to get a signal?
pkg/kv/kvserver/client_split_test.go, line 529 at r8 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I would make this deadline a lot larger, or remove it completely.
Will do - sg
pkg/kv/kvserver/replica_test.go, line 11250 at r8 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
minor nit: any reason to not use a pair of require.Equal calls here (and below)?
Not really, can change
andreimatei
left a comment
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @AlexTalks, @andreimatei, and @nvanbenschoten)
pkg/kv/kvserver/client_split_test.go, line 480 at r8 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
Do we have a knob that I can use to get a signal?
I don't see anything in the lock table; I think we should add something.
Alternatively, we could install a RequestFilter and watch for a PushTxn I guess, although the push is not immediate (but I think there's a knob you can use to make it immediate).
Or, you could record the trace of the respective request and poll it periodically to see if there's some event (or, with #74878, some tag) about the lock. We could also introduce a trace-based notification system for such tests.
Btw, if this patch needs to go in today, feel free to merge as is if you'll clean up later.
andreimatei
left a comment
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @AlexTalks, @andreimatei, and @nvanbenschoten)
pkg/kv/kvserver/client_split_test.go, line 480 at r8 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I don't see anything in the lock table; I think we should add something.
Alternatively, we could install a RequestFilter and watch for a PushTxn I guess, although the push is not immediate (but I think there's a knob you can use to make it immediate).
Or, you could record the trace of the respective request and poll it periodically to see if there's some event (or, with #74878, some tag) about the lock. We could also introduce a trace-based notification system for such tests. Btw, if this patch needs to go in today, feel free to merge as is if you'll clean up later.
actually, there is a lockTableWaiterImpl.onContentionEvent that might be useful
4142cbf to 6fafe7f
AlexTalks
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andreimatei and @nvanbenschoten)
pkg/kv/kvserver/concurrency/concurrency_control.go, line 467 at r8 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
Will do - sg
Done.
pkg/roachpb/api.proto, line 1118 at r8 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
consider putting some words here about what locks are in the lock table and what locks aren't
Done.
pkg/roachpb/api.proto, line 1125 at r8 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
Will do - sg
Done.
pkg/kv/kvserver/client_split_test.go, line 412 at r8 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
This was definitely cargo-culting, heh. It is likely unnecessary, can remove.
Done
pkg/kv/kvserver/client_split_test.go, line 480 at r8 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
actually, there is a lockTableWaiterImpl.onContentionEvent that might be useful
ok cool - I'll stress it to make sure it's not flaking now but will plan to merge shortly. Especially since the existing tests in replica_test.go seem to wait 5ms for operations to block on locks (though I realize that's a bit different since it's using the replica Send directly).
pkg/kv/kvserver/client_split_test.go, line 529 at r8 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
Will do - sg
Done.
AlexTalks
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andreimatei and @nvanbenschoten)
pkg/kv/kvserver/client_split_test.go, line 480 at r8 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
ok cool - I'll stress it to make sure it's not flaking now but will plan to merge shortly. Especially since the existing tests in replica_test.go seem to wait 5ms for operations to block on locks (though I realize that's a bit different since it's using the replica Send directly).
Unfortunately (of course) onContentionEvent only gets called when the contention is resolved, not started. I think the options are either to increase the wait time here or install a new event...
f2c877e to 7ff0dac
This change adds a new KV API, `QueryLocks`, allowing a client to query the state of a replica's lock table over a provided key span. This exposes the in-memory locks currently managed by Concurrency Control, and is part of the work needed to implement cockroachdb#75541.

Resolves cockroachdb#74833.

Release note: None

Release justification: Low-risk new functionality.
7ff0dac to e9910b0
bors r+

Build succeeded