sql: add the builtin crdb_internal.list_sql_keys_in_range#58314
sql: add the builtin crdb_internal.list_sql_keys_in_range#58314craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
|
Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. I have added a few people who may be able to assist in reviewing: 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
|
Happy to bikeshed about naming, return values/types etc, whether you still want this etc |
irfansharif
left a comment
There was a problem hiding this comment.
Nice, thanks for sending this! but I'm also not an expert here, so cc-ing @solongordon.
Reviewed 11 of 11 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @rohany)
docs/generated/sql/functions.md, line 2568 at r1 (raw file):
<tr><td><a name="crdb_internal.lease_holder"></a><code>crdb_internal.lease_holder(key: <a href="bytes.html">bytes</a>) → <a href="int.html">int</a></code></td><td><span class="funcdesc"><p>This function is used to fetch the leaseholder corresponding to a request key</p> </span></td></tr> <tr><td><a name="crdb_internal.list_sql_keys_in_range"></a><code>crdb_internal.list_sql_keys_in_range(range_id: <a href="int.html">int</a>) → tuple{string AS key, string AS value}</code></td><td><span class="funcdesc"><p>TODO (rohany): Fill this out</p>
Remember to address this.
pkg/sql/logictest/testdata/logic_test/sql_keys, line 1 at r1 (raw file):
# LogicTest: local
Mind moving this test into crdb_internal?
pkg/sql/logictest/testdata/logic_test/sql_keys, line 13 at r1 (raw file):
# TODO (rohany): I thought ranges were split automatically on table creation, but # I seem to have to do the SPLIT AT above. Is something different in logic tests?
Dunno, but I think we're having to do the same thing here:
cockroach/pkg/sql/logictest/testdata/logic_test/crdb_internal
Lines 370 to 380 in ad3dea8
pkg/sql/sem/builtins/generator_builtins.go, line 1313 at r1 (raw file):
} // Scan all of the K/V's within the target range. // TODO (rohany): Is it OK to have maxRows = 0 here? Since range sizes are capped (64 mb),
Range sizes now 512mb by default, and given we'd typically expect to use this when debugging live, minimizing any sort of impact seems prudent. Is it much work to fold it into the iterator instead?
knz
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @rohany)
pkg/sql/sem/builtins/generator_builtins.go, line 1315 at r1 (raw file):
// TODO (rohany): Is it OK to have maxRows = 0 here? Since range sizes are capped (64 mb), // we shouldn't OOM here. If not, we can do these scans as part of the iterator. kvs, err := txn.Scan(ctx, rangeDesc.StartKey, rangeDesc.EndKey, 0 /* maxRows */)
Please don't issue the Scan for all the rows inside Start() and then load them in RAM.
There should be buffering in the generator struct, and a conditional re-call of Scan in the Next() function instead.
At any point in time there should be no more than N bytes of data buffered inside the builtin struct, where N is fixed, small-ish (let's say, less than 1KiB) and independent of the range's actual size.
Also add a TODO: the KV interface currently has a maxRows parameters, but we're changing that to have a maxBytes parameter instead. That's what we really want. If some keys happen to be very large, maxRows doesn't help with memory usage.
6603f16 to
7983cff
Compare
|
Thank you for updating your pull request. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
Done, forgot to remake
I wanted to keep this test separate as it depends on descriptor ID's. The Moved the builtin to a buffering / paged version. |
knz
left a comment
There was a problem hiding this comment.
very nice. thanks.
just noticed that the meta scan has the same issue, see comment
Reviewed 1 of 6 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif and @rohany)
pkg/sql/sem/builtins/generator_builtins.go, line 1248 at r2 (raw file):
// rangeKeyIterator requests at a time. If this changes, make sure // to update the test in sql_keys. // TODO (kv): The current KV API only supports a maxRows limitation
nit: there's a linter that wants TODO(kv) without a space
pkg/sql/sem/builtins/generator_builtins.go, line 1254 at r2 (raw file):
var rangeKeyIteratorType = types.MakeLabeledTuple( // TODO (rohany): These could be bytes if we don't want to display the
ditto
pkg/sql/sem/builtins/generator_builtins.go, line 1307 at r2 (raw file):
func (rk *rangeKeyIterator) Start(ctx context.Context, txn *kv.Txn) error { rk.txn = txn // Scan the range meta K/V's to find the target range.
similar comment to previous: if there are many ranges in the cluster, then this is going to blow up too.
I'd suggest to page through meta with no more than 100 or 1000 ranges at a time. (This can be achieved with maxRows.)
7983cff to
a6f30c6
Compare
|
Thanks on the heads up for the linter fixed.
Done, nice catch. The existing ScanMetaKV's API / users didn't lend well to a paging style use, so I inlined some of the logic here. |
knz
left a comment
There was a problem hiding this comment.
LGTM modulo placing the code in the right place(s).
Reviewed 2 of 6 files at r2, 10 of 10 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rohany)
pkg/sql/sem/builtins/generator_builtins.go, line 1308 at r3 (raw file):
// Scan the range meta K/V's to find the target range. We do this in a // chunk-wise fashion to avoid loading all ranges into memory. var ranges []kv.KeyValue
I appreciate that the existing kv scan function was not appropriate, but the code you have written here is not really appropriate for a "SQL builtins" package either. It's very much low level and belongs to KV.
Therefore I'd encourage you to move this code into a new function in the appropriate package, and call it from here.
|
Do you have a suggestion of a package that this could go in? I'm not familiar with the packages in |
|
can you put them in then also move (as well as the reference in a comment in |
|
sorry I meant |
df2d57d to
2750f3e
Compare
|
Done, thanks! |
2750f3e to
4d16120
Compare
knz
left a comment
There was a problem hiding this comment.
Reviewed 16 of 16 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rohany)
5ac322f to
f7ad02a
Compare
Fixes cockroachdb#54975. Add the internal builtin function crdb_internal.list_sql_keys_in_range. This function return all SQL keys within a target range (given by RangeID). This commit additionally moves the `sql.ScanMetaKVs` function into the `kvclient` package. Release note: None
f7ad02a to
f4b475a
Compare
|
looks like this finally passed tests. Can one of you bors this ? |
|
bors r+ |
58314: sql: add the builtin crdb_internal.list_sql_keys_in_range r=irfansharif a=rohany Fixes #54975. Add the internal builtin function crdb_internal.list_sql_keys_in_range. This function return all SQL keys within a target range (given by RangeID). Release note: None Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
|
Build failed: |
|
this looks like an infra flake. bors r+ |
|
Build failed (retrying...): |
|
Build succeeded: |
Fixes #54975.
Add the internal builtin function crdb_internal.list_sql_keys_in_range.
This function return all SQL keys within a target range (given by
RangeID).
Release note: None