Skip to content

sql: add the builtin crdb_internal.list_sql_keys_in_range#58314

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
rohany:list-kvs-in-range
Jan 4, 2021
Merged

sql: add the builtin crdb_internal.list_sql_keys_in_range#58314
craig[bot] merged 1 commit intocockroachdb:masterfrom
rohany:list-kvs-in-range

Conversation

@rohany
Copy link
Copy Markdown
Contributor

@rohany rohany commented Dec 28, 2020

Fixes #54975.

Add the internal builtin function crdb_internal.list_sql_keys_in_range.
This function return all SQL keys within a target range (given by
RangeID).

Release note: None

@rohany rohany requested a review from a team as a code owner December 28, 2020 21:09
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Dec 28, 2020

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

I have added a few people who may be able to assist in reviewing:

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Dec 28, 2020
@blathers-crl blathers-crl bot requested a review from irfansharif December 28, 2020 21:09
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@rohany
Copy link
Copy Markdown
Contributor Author

rohany commented Dec 28, 2020

Happy to bikeshed about naming, return values/types etc, whether you still want this etc

Copy link
Copy Markdown
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for sending this! :lgtm: but I'm also not an expert here, so cc-ing @solongordon.

Reviewed 11 of 11 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rohany)


docs/generated/sql/functions.md, line 2568 at r1 (raw file):

<tr><td><a name="crdb_internal.lease_holder"></a><code>crdb_internal.lease_holder(key: <a href="bytes.html">bytes</a>) &rarr; <a href="int.html">int</a></code></td><td><span class="funcdesc"><p>This function is used to fetch the leaseholder corresponding to a request key</p>
</span></td></tr>
<tr><td><a name="crdb_internal.list_sql_keys_in_range"></a><code>crdb_internal.list_sql_keys_in_range(range_id: <a href="int.html">int</a>) &rarr; tuple{string AS key, string AS value}</code></td><td><span class="funcdesc"><p>TODO (rohany): Fill this out</p>

Remember to address this.


pkg/sql/logictest/testdata/logic_test/sql_keys, line 1 at r1 (raw file):

# LogicTest: local

Mind moving this test into crdb_internal?


pkg/sql/logictest/testdata/logic_test/sql_keys, line 13 at r1 (raw file):


# TODO (rohany): I thought ranges were split automatically on table creation, but
#  I seem to have to do the SPLIT AT above. Is something different in logic tests?

Dunno, but I think we're having to do the same thing here:

statement ok
CREATE TABLE foo (a INT PRIMARY KEY, INDEX idx(a)); INSERT INTO foo VALUES(1)
statement ok
ALTER TABLE foo SPLIT AT VALUES(2)
query TTT colnames
SELECT start_pretty, end_pretty, split_enforced_until FROM crdb_internal.ranges WHERE split_enforced_until IS NOT NULL
----
start_pretty end_pretty split_enforced_until
/Table/56/1/2 /Max 2262-04-11 23:47:16.854776 +0000 +0000


pkg/sql/sem/builtins/generator_builtins.go, line 1313 at r1 (raw file):

	}
	// Scan all of the K/V's within the target range.
	// TODO (rohany): Is it OK to have maxRows = 0 here? Since range sizes are capped (64 mb),

Range sizes now 512mb by default, and given we'd typically expect to use this when debugging live, minimizing any sort of impact seems prudent. Is it much work to fold it into the iterator instead?

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rohany)


pkg/sql/sem/builtins/generator_builtins.go, line 1315 at r1 (raw file):

	// TODO (rohany): Is it OK to have maxRows = 0 here? Since range sizes are capped (64 mb),
	//  we shouldn't OOM here. If not, we can do these scans as part of the iterator.
	kvs, err := txn.Scan(ctx, rangeDesc.StartKey, rangeDesc.EndKey, 0 /* maxRows */)

Please don't issue the Scan for all the rows inside Start() and then load them in RAM.

There should be buffering in the generator struct, and a conditional re-call of Scan in the Next() function instead.

At any point in time there should be no more than N bytes of data buffered inside the builtin struct, where N is fixed, small-ish (let's say, less than 1KiB) and independent of the range's actual size.

Also add a TODO: the KV interface currently has a maxRows parameters, but we're changing that to have a maxBytes parameter instead. That's what we really want. If some keys happen to be very large, maxRows doesn't help with memory usage.

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Dec 30, 2020

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@rohany
Copy link
Copy Markdown
Contributor Author

rohany commented Dec 30, 2020

Remember to address this.

Done, forgot to remake

Mind moving this test into crdb_internal?

I wanted to keep this test separate as it depends on descriptor ID's. The crdb_internal tests are already a minefield of descriptor ID dependencies.

Moved the builtin to a buffering / paged version.

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice. thanks.
just noticed that the meta scan has the same issue, see comment

Reviewed 1 of 6 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif and @rohany)


pkg/sql/sem/builtins/generator_builtins.go, line 1248 at r2 (raw file):

// rangeKeyIterator requests at a time. If this changes, make sure
// to update the test in sql_keys.
// TODO (kv): The current KV API only supports a maxRows limitation

nit: there's a linter that wants TODO(kv) without a space


pkg/sql/sem/builtins/generator_builtins.go, line 1254 at r2 (raw file):

var rangeKeyIteratorType = types.MakeLabeledTuple(
	// TODO (rohany): These could be bytes if we don't want to display the

ditto


pkg/sql/sem/builtins/generator_builtins.go, line 1307 at r2 (raw file):

func (rk *rangeKeyIterator) Start(ctx context.Context, txn *kv.Txn) error {
	rk.txn = txn
	// Scan the range meta K/V's to find the target range.

similar comment to previous: if there are many ranges in the cluster, then this is going to blow up too.
I'd suggest to page through meta with no more than 100 or 1000 ranges at a time. (This can be achieved with maxRows.)

@rohany
Copy link
Copy Markdown
Contributor Author

rohany commented Dec 30, 2020

Thanks on the heads up for the linter fixed.

similar comment to previous: if there are many ranges in the cluster, then this is going to blow up too.
I'd suggest to page through meta with no more than 100 or 1000 ranges at a time. (This can be achieved with maxRows.)

Done, nice catch. The existing ScanMetaKV's API / users didn't lend well to a paging style use, so I inlined some of the logic here.

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo placing the code in the right place(s).

Reviewed 2 of 6 files at r2, 10 of 10 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rohany)


pkg/sql/sem/builtins/generator_builtins.go, line 1308 at r3 (raw file):

	// Scan the range meta K/V's to find the target range. We do this in a
	// chunk-wise fashion to avoid loading all ranges into memory.
	var ranges []kv.KeyValue

I appreciate that the existing kv scan function was not appropriate, but the code you have written here is not really appropriate for a "SQL builtins" package either. It's very much low level and belongs to KV.

Therefore I'd encourage you to move this code into a new function in the appropriate package, and call it from here.

@rohany
Copy link
Copy Markdown
Contributor Author

rohany commented Dec 30, 2020

Do you have a suggestion of a package that this could go in? I'm not familiar with the packages in kv.

@knz
Copy link
Copy Markdown
Contributor

knz commented Dec 30, 2020

can you put them in pkg/kv/kvserver (new file scan_meta.go)

then also move sql.ScanMetaKVs there and update the call points accordingly

(as well as the reference in a comment in pkg/keys/keys.go)

@knz
Copy link
Copy Markdown
Contributor

knz commented Dec 30, 2020

sorry I meant kvclient not kvserver, my bad

@rohany rohany force-pushed the list-kvs-in-range branch 2 times, most recently from df2d57d to 2750f3e Compare December 30, 2020 20:58
@rohany
Copy link
Copy Markdown
Contributor Author

rohany commented Dec 30, 2020

Done, thanks!

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 16 of 16 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rohany)

@rohany rohany force-pushed the list-kvs-in-range branch 3 times, most recently from 5ac322f to f7ad02a Compare January 1, 2021 01:13
Fixes cockroachdb#54975.

Add the internal builtin function crdb_internal.list_sql_keys_in_range.
This function return all SQL keys within a target range (given by
RangeID).

This commit additionally moves the `sql.ScanMetaKVs` function into
the `kvclient` package.

Release note: None
@rohany rohany force-pushed the list-kvs-in-range branch from f7ad02a to f4b475a Compare January 3, 2021 21:18
@rohany
Copy link
Copy Markdown
Contributor Author

rohany commented Jan 3, 2021

looks like this finally passed tests. Can one of you bors this ?

@irfansharif
Copy link
Copy Markdown
Contributor

bors r+

craig bot pushed a commit that referenced this pull request Jan 4, 2021
58314: sql: add the builtin crdb_internal.list_sql_keys_in_range r=irfansharif a=rohany

Fixes #54975.

Add the internal builtin function crdb_internal.list_sql_keys_in_range.
This function return all SQL keys within a target range (given by
RangeID).

Release note: None

Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 4, 2021

Build failed:

@knz
Copy link
Copy Markdown
Contributor

knz commented Jan 4, 2021

this looks like an infra flake.

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 4, 2021

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 4, 2021

Build succeeded:

@craig craig bot merged commit 17092dc into cockroachdb:master Jan 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

O-community Originated from the community X-blathers-triaged blathers was able to find an owner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: introduce shorthand to retrieve sql keys for a given range

4 participants