sql: extend SHOW RANGES to include object size estimates#103128
sql: extend SHOW RANGES to include object size estimates#103128craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
a3d14df to
85f87f8
Compare
knz
left a comment
There was a problem hiding this comment.
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @zachlite)
-- commits line 19 at r1:
I have studied your change and even tested the change locally, and I am facing the following observation:
- your code does, indeed, do what you wanted it to do and what is written here. 💯
- sadly, the resulting UX is not yet valuable.
(This is a shortcoming in the original spec of the feature - and I think we wouldn't have noticed prior to you putting this PR out to try it in practice. So the work here is valuable in any case.)
Okay so what went wrong. SHOW RANGES is meant to emphasize ranges primarily.
- without WITH INDEXES/TABLES, we want DETAILS to indicate the total stats for the range. You got this right already - nothing else to say here.
- when WITH INDEXES/TABLES is specified, we want DETAILS to report stats about the intersection between the SQL objects and each range.
Intersection means that:
- we don't want repeated information across multiple rows -- each row should report details about a "segment" of the SQL object
- each row should pertain to the data just in the range mentioned in the range_id column.
The code as currently written works fine in the case where a SQL object is fully contained in 1 range. However, when the SQL object spans across 2 or more ranges, we see the span stats computed across the entire SQL object (i.e. across all enclosing ranges), not per range.
There are two "criteria" to identify when success will be achieved:
- in any combination of SHOW RANGES options, the span stats
range_countmust always be = 1. - when WITH TABLES/WITH INDEXES is specified, it should be possible to aggregate the span stats with SUM across all the rows for 1 SQL object to obtain an estimate of the size of that SQL object.
Currently, when the code is run over an index which spans, say, 3 ranges, we see 1) range_count=3 2) the same span stats repeated over 3 different rows, such that the SUM will report 3x the actual size of the index.
How to achieve this?
- The code will need to add an
IF()orCASEexpression to decide which span start/end key to use. See some sketch below: - While trying to do this myself, I also noticed that the columns
span_start_key/span_end_keyas currently computed in the code also have ... the same bug. This is a mistake I made in the initial implementation. I'll send a separate PR, which you can use here.
Sketch of what I mean above (this assumes WITH INDEXES was specified)
WITH ranges AS (
SELECT
r.range_id,
r.start_key, r.end_key, -- SAME AS CURRENTLY / UNCHANGED
...
s.start_key AS span_start_key, s.end_key AS span_end_key, -- SAME AS CURRENTLY / UNCHANGED
-- NEW (I WILL SEND A NEW PR)
IF(s.start_key<r.start_key,r.start_key,s.start_key) AS per_range_start_key,
IF(s.end_key<r.end_key,s.end_key,r.end_key) AS per_range_end_key
FROM crdb_internal.ranges_no_leases r
INNER JOIN defaultdb.crdb_internal.index_spans s
ON ...
),
named_ranges AS ( ... )
-- CHANGE IN YOUR CODE:
all_span_stats AS (SELECT * FROM crdb_internal.tenant_span_stats(ARRAY(SELECT (per_range_start_key, per_range_end_key) FROM named_ranges))),
intermediate AS (
SELECT r.*, sps.stats AS span_stats
FROM named_ranges r
INNER JOIN all_span_stats sps ON r.per_range_start_key = sps.start_key AND r.per_range_end_key = sps.end_key)
-- main query
SELECT ...
-- CHANGE (I WILL SEND SEPARATE PR)
'…'||crdb_internal.pretty_key(per_range_start_key, 1) AS index_start_key,
'…'||crdb_internal.pretty_key(per_range_end_key, 1) AS index_end_key,
...
-- as in your PR - unchanged:
span_statsIf you want to test it for yourself before you change your PR, here's a full query that works:
Details
```sql WITH ranges AS ( SELECT r.range_id, r.start_key, r.end_key, s.descriptor_id AS table_id, s.index_id, s.start_key AS span_start_key, s.end_key AS span_end_key, IF(s.start_key r.start_key AND s.descriptor_id = 105 ), named_ranges AS ( SELECT r.* , t.schema_name, t.name AS table_name, ti.index_name FROM ranges r LEFT OUTER JOIN defaultdb.crdb_internal.table_indexes ti ON r.table_id = ti.descriptor_id AND r.index_id = ti.index_id LEFT OUTER JOIN defaultdb.crdb_internal.tables t ON r.table_id = t.table_id ), all_span_stats AS (SELECT * FROM crdb_internal.tenant_span_stats(ARRAY(SELECT (per_range_start_key, per_range_end_key) FROM named_ranges))), intermediate AS ( SELECT r.*, sps.stats AS span_stats FROM named_ranges r INNER JOIN all_span_stats sps ON r.per_range_start_key = sps.start_key AND r.per_range_end_key = sps.end_key) SELECT CASE WHEN r.start_key = span_start_key THEN '…/' WHEN r.start_key = crdb_internal.table_span(table_id)[1] THEN '…/' WHEN r.start_key < crdb_internal.table_span(table_id)[1] THEN '' ELSE '…'||crdb_internal.pretty_key(r.start_key, 1) END AS start_key, CASE WHEN r.end_key = span_end_key THEN '…/…/' WHEN r.end_key = crdb_internal.table_span(table_id)[2] THEN '…/' WHEN r.end_key > crdb_internal.table_span(table_id)[2] THEN '' ELSE '…'||crdb_internal.pretty_key(r.end_key, 1) END AS end_key, range_id, index_name, index_id, '…'||crdb_internal.pretty_key(per_range_start_key, 1) AS index_start_key, '…'||crdb_internal.pretty_key(per_range_end_key, 1) AS index_end_key, span_stats FROM intermediate r ORDER BY r.start_key, r.span_start_key ; ```Move the computation of all_span_stats below this point, and use named_ranges instead as input which has already computed the DISTINCT span bounds. This will avoid duplicate work.
pkg/sql/sem/builtins/generator_builtins.go line 608 at r1 (raw file):
volatility.Stable, ), // Span overload
Either expand this comment (and make it a sentence with a final period) or you can remove it entirely.
pkg/sql/sem/builtins/generator_builtins.go line 615 at r1 (raw file):
spanStatsGeneratorType, makeSpanStatsGenerator, "foo bar",
This description would benefit from more words.
knz
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @zachlite)
pkg/sql/delegate/show_ranges.go line 381 at r1 (raw file):
Reviewable ate my comment in the previous review. here it is:
Move the computation of
all_span_statsbelow this point, and usenamed_rangesinstead as input which has already computed the DISTINCT span bounds. This will avoid duplicate work.
|
After this change is completed, I am noticing that it remains cumbersome to compute the span stats for an entire SQL object without performing a manual aggregation on the output of SHOW RANGES. This is OK for SHOW RANGES, but users will ask for a shorthand. This is not in-scope for this PR, and we can work on that separately afterwards. |
knz
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @zachlite)
Previously, knz (Raphael 'kena' Poss) wrote…
I have studied your change and even tested the change locally, and I am facing the following observation:
- your code does, indeed, do what you wanted it to do and what is written here. 💯
- sadly, the resulting UX is not yet valuable.
(This is a shortcoming in the original spec of the feature - and I think we wouldn't have noticed prior to you putting this PR out to try it in practice. So the work here is valuable in any case.)
Okay so what went wrong. SHOW RANGES is meant to emphasize ranges primarily.
- without WITH INDEXES/TABLES, we want DETAILS to indicate the total stats for the range. You got this right already - nothing else to say here.
- when WITH INDEXES/TABLES is specified, we want DETAILS to report stats about the intersection between the SQL objects and each range.
Intersection means that:
- we don't want repeated information across multiple rows -- each row should report details about a "segment" of the SQL object
- each row should pertain to the data just in the range mentioned in the range_id column.
The code as currently written works fine in the case where a SQL object is fully contained in 1 range. However, when the SQL object spans across 2 or more ranges, we see the span stats computed across the entire SQL object (i.e. across all enclosing ranges), not per range.
There are two "criteria" to identify when success will be achieved:
- in any combination of SHOW RANGES options, the span stats
range_countmust always be = 1.- when WITH TABLES/WITH INDEXES is specified, it should be possible to aggregate the span stats with SUM across all the rows for 1 SQL object to obtain an estimate of the size of that SQL object.
Currently, when the code is run over an index which spans, say, 3 ranges, we see 1) range_count=3 2) the same span stats repeated over 3 different rows, such that the SUM will report 3x the actual size of the index.
How to achieve this?
- The code will need to add an
IF()orCASEexpression to decide which span start/end key to use. See some sketch below:- While trying to do this myself, I also noticed that the columns
span_start_key/span_end_keyas currently computed in the code also have ... the same bug. This is a mistake I made in the initial implementation. I'll send a separate PR, which you can use here.Sketch of what I mean above (this assumes WITH INDEXES was specified)
WITH ranges AS ( SELECT r.range_id, r.start_key, r.end_key, -- SAME AS CURRENTLY / UNCHANGED ... s.start_key AS span_start_key, s.end_key AS span_end_key, -- SAME AS CURRENTLY / UNCHANGED -- NEW (I WILL SEND A NEW PR) IF(s.start_key<r.start_key,r.start_key,s.start_key) AS per_range_start_key, IF(s.end_key<r.end_key,s.end_key,r.end_key) AS per_range_end_key FROM crdb_internal.ranges_no_leases r INNER JOIN defaultdb.crdb_internal.index_spans s ON ... ), named_ranges AS ( ... ) -- CHANGE IN YOUR CODE: all_span_stats AS (SELECT * FROM crdb_internal.tenant_span_stats(ARRAY(SELECT (per_range_start_key, per_range_end_key) FROM named_ranges))), intermediate AS ( SELECT r.*, sps.stats AS span_stats FROM named_ranges r INNER JOIN all_span_stats sps ON r.per_range_start_key = sps.start_key AND r.per_range_end_key = sps.end_key) -- main query SELECT ... -- CHANGE (I WILL SEND SEPARATE PR) '…'||crdb_internal.pretty_key(per_range_start_key, 1) AS index_start_key, '…'||crdb_internal.pretty_key(per_range_end_key, 1) AS index_end_key, ... -- as in your PR - unchanged: span_statsIf you want to test it for yourself before you change your PR, here's a full query that works:
Details
```sql WITH ranges AS ( SELECT r.range_id, r.start_key, r.end_key, s.descriptor_id AS table_id, s.index_id, s.start_key AS span_start_key, s.end_key AS span_end_key, IF(s.start_key r.start_key AND s.descriptor_id = 105 ), named_ranges AS ( SELECT r.* , t.schema_name, t.name AS table_name, ti.index_name FROM ranges r LEFT OUTER JOIN defaultdb.crdb_internal.table_indexes ti ON r.table_id = ti.descriptor_id AND r.index_id = ti.index_id LEFT OUTER JOIN defaultdb.crdb_internal.tables t ON r.table_id = t.table_id ), all_span_stats AS (SELECT * FROM crdb_internal.tenant_span_stats(ARRAY(SELECT (per_range_start_key, per_range_end_key) FROM named_ranges))), intermediate AS ( SELECT r.*, sps.stats AS span_stats FROM named_ranges r INNER JOIN all_span_stats sps ON r.per_range_start_key = sps.start_key AND r.per_range_end_key = sps.end_key) SELECT CASE WHEN r.start_key = span_start_key THEN '…/' WHEN r.start_key = crdb_internal.table_span(table_id)[1] THEN '…/' WHEN r.start_key < crdb_internal.table_span(table_id)[1] THEN '' ELSE '…'||crdb_internal.pretty_key(r.start_key, 1) END AS start_key, CASE WHEN r.end_key = span_end_key THEN '…/…/' WHEN r.end_key = crdb_internal.table_span(table_id)[2] THEN '…/' WHEN r.end_key > crdb_internal.table_span(table_id)[2] THEN '' ELSE '…'||crdb_internal.pretty_key(r.end_key, 1) END AS end_key, range_id, index_name, index_id, '…'||crdb_internal.pretty_key(per_range_start_key, 1) AS index_start_key, '…'||crdb_internal.pretty_key(per_range_end_key, 1) AS index_end_key, span_stats FROM intermediate r ORDER BY r.start_key, r.span_start_key ; ```
Here's the patch you'll need as starting point: #103667
dcceb39 to
9c97daf
Compare
zachlite
left a comment
There was a problem hiding this comment.
This is not in-scope for this PR, and we can work on that separately afterwards.
Ack!
@knz, I've rebased on top of your work in #103667.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz)
Previously, knz (Raphael 'kena' Poss) wrote…
Here's the patch you'll need as starting point: #103667
Done.
pkg/sql/delegate/show_ranges.go line 381 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Move the computation of
all_span_statsbelow this point, and usenamed_rangesinstead as input which has already computed the DISTINCT span bounds. This will avoid duplicate work.
Done.
pkg/sql/delegate/show_ranges.go line 381 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Reviewable ate my comment in the previous review. here it is:
Move the computation of
all_span_statsbelow this point, and usenamed_rangesinstead as input which has already computed the DISTINCT span bounds. This will avoid duplicate work.
Done.
pkg/sql/sem/builtins/generator_builtins.go line 608 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Either expand this comment (and make it a sentence with a final period) or you can remove it entirely.
Done.
pkg/sql/sem/builtins/generator_builtins.go line 615 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
This description would benefit from more words.
☠️ missed this!
Done.
knz
left a comment
There was a problem hiding this comment.
This PR has the potential to fill me with joy, but the joy will only appear after there's a unit test for the new functionality.
Reviewed 5 of 5 files at r3, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @zachlite)
pkg/sql/delegate/show_ranges.go line 388 at r3 (raw file):
} fmt.Fprintf(&buf, "all_span_stats AS (\n"+
nit: use backticks for the literal string and avoid the string concatenation.
103667: sql: fix the index/table span generation in SHOW RANGES r=tbg a=knz Fixes #103665. Needed for #103128. Epic: CRDB-24928. Prior to this patch, the `table_`/`index_` start/end key pairs emitted under WITH TABLES/INDEXES were incorrectly computed to span the entire SQL object, not just the intersection between the object and the current range. This patch fixes it. Release note (bug fix): When the option `WITH TABLES` or `WITH INDEXES` is passed to `SHOW RANGES`, the per-object start/end key columns now properly refer to the part of the object included inside the range identified by the current row. Previously, they could incorrectly point to keys outside of the current range's boundaries. This bug had been introduced in v23.1. Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
zachlite
left a comment
There was a problem hiding this comment.
I have found this bug during the span stats fan-out: #103809
While working on a patch, I've found some other curious behavior in span_stats_test.go. I will know more soon.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz)
9c97daf to
28a7465
Compare
knz
left a comment
There was a problem hiding this comment.
The overall structure of the code LGTM.
After trying the feature out however, I found myself raising eyebrows at the meaning of the various JSON fields.
I have two opinions here, one i feel strongly about and one that i feel less strongly about:
-
strongly held opinion: regardless of the fields we choose to include in the JSON stats, they need to be part of a documented public interface for SHOW RANGES. That is, we need to document the list of fields (if only in the release note), and there should be some rudimentary testing that all the fields we guarantee to provide are populated in a way that makes sense.
-
less strongly held opinion: some of the fields are kinda "internal" and are hard to reason about for an end-user. I would personally leave them out. For example, "contains_estimates", "last_update_nanos", "gc_bytes_age". Some are confusing by being separated, I'm not sure we need to separate them (can we combine them perhaps), namely the "range_key_X" and "key_X" fields, and "separated_intent_X" vs "intent_X". I also think that "range_count" should be omitted entirely in SHOW RANGES because it's always going to be =1 so it's just noise.
The two opinions combine together: we could select a subset of the stats to present using a fixed, documented interface; and tell users "if you want to know more, you can use that reserved/private span_stats() function. What would be a good subset to expose via SHOW RANGES? I think the following would make the cut: key/val/sys/live/intent count and bytes. They are all useful and straightforward to explain.
Finally, a question for you: is there a feature somewhere that gives us the total on-disk size of a range? Users probably want to use SHOW RANGES for that (in the 1 row per range output mode) and I don't think this can be inferred from the MVCCStats payload. I also asked this on slack here.
Reviewed 5 of 11 files at r4, 7 of 7 files at r5, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @zachlite)
pkg/sql/sem/builtins/generator_builtins.go line 579 at r5 (raw file):
), "crdb_internal.tenant_span_stats": makeBuiltin(genProps(), // Tenant overload
When you look at these comments again, maybe help here by adding a period after each of the comments from here.
pkg/sql/sem/builtins/generator_builtins.go line 608 at r5 (raw file):
volatility.Stable, ), // Span overload
comment?
pkg/sql/sem/builtins/generator_builtins.go line 615 at r5 (raw file):
spanStatsGeneratorType, makeSpanStatsGenerator, "foo bar",
foo bar?
28a7465 to
a847eb7
Compare
zachlite
left a comment
There was a problem hiding this comment.
I think the following would make the cut: key/val/sys/live/intent count and bytes.
I like it. This was easy enough to implement. This is documented in the release note and I've updated the test to include assertions that these values are present.
is there a feature somewhere that gives us the total on-disk size of a range?
approximate_disk_bytes has always been a part of the span stats response but has been omitted by the marshaler when it has a value of 0 (empty). The value comes from pebble's EstimateDiskUsage, and has a "warmup time" where it returns 0 for a minute or so after the cluster starts up.
For this PR, I've adjusted the proto definition with an explicit json tag so the value is not emitted when empty. As much as this feature should be authoritative, I suppose since the thing has 'approximate' in the name, it's ok.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz)
pkg/sql/delegate/show_ranges.go line 388 at r3 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
nit: use backticks for the literal string and avoid the string concatenation.
Done.
pkg/sql/sem/builtins/generator_builtins.go line 579 at r5 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
When you look at these comments again, maybe help here by adding a period after each of the comments from here.
Done. My bad - I had resolved these comment issues previously but they got lost in the shuffle as I changed branches.
a847eb7 to
310215d
Compare
knz
left a comment
There was a problem hiding this comment.
Amazing work.
Reviewed 10 of 10 files at r6.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @zachlite)
-- commits line 31 at r6:
nit: the doc writer knows nothing about enginepb.MVCCStats. To help them it would be useful to copy-paste the description strings for those fields into the release note.
zachlite
left a comment
There was a problem hiding this comment.
Thank you for all of your help and for your thorough review!
I see. will do.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @knz)
310215d to
d92a6a9
Compare
This commit adds object size estimates to `SHOW RANGES WITH DETAILS`. A new override of `crdb_internal.tenant_span_stats` is introduced to leverage batched span statistics requests to KV, so only 1 fan-out is required. Resolves: cockroachdb#97858 Epic: https://cockroachlabs.atlassian.net/browse/CRDB-24928 Release note (sql change): The SHOW RANGES command will now emit span statistics when the DETAILS option is specified. The statistics are included in a new column named 'span_stats', as a JSON object. The statistics are calculated for the identifier of each row: `SHOW RANGES WITH DETAILS` will compute span statistics for each range. `SHOW RANGES WITH TABLES, DETAILS` will compute span statistics for each table, etc. The 'span_stats' JSON object has the following keys: - approximate_disk_bytes - [key|val|sys|live|intent]_count - [key|val|sys|live|intent]_bytes 'approximate_disk_bytes' is an approximation of the total on-disk size of the given object. 'key_count' is the number of meta keys tracked under key_bytes. 'key_bytes' is the number of bytes stored in all non-system point keys, including live, meta, old, and deleted keys. Only meta keys really account for the "full" key; value keys only for the timestamp suffix. 'val_count' is the number of meta values tracked under val_bytes. 'val_bytes' is the number of bytes in all non-system version values, including meta values. 'sys_count' is the number of meta keys tracked under sys_bytes. 'sys_bytes' is the number of bytes stored in system-local kv-pairs. This tracks the same quantity as (key_bytes + val_bytes), but for system-local metadata keys (which aren't counted in either key_bytes or val_bytes). Each of the keys falling into this group is documented in keys/constants.go under the LocalPrefix constant and is prefixed by either LocalRangeIDPrefix or LocalRangePrefix. 'live_count' is the number of meta keys tracked under live_bytes. 'live_bytes' is the number of bytes stored in keys and values which can in principle be read by means of a Scan or Get in the far future, including intents but not deletion tombstones (or their intents). Note that the size of the meta kv pair (which could be explicit or implicit) is included in this. Only the meta kv pair counts for the actual length of the encoded key (regular pairs only count the timestamp suffix). 'intent_count' is the number of keys tracked under intent_bytes. It is equal to the number of meta keys in the system with a non-empty Transaction proto. 'intent_bytes' is the number of bytes in intent key-value pairs (without their meta keys).
d92a6a9 to
4263471
Compare
|
bors r+ |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 4263471 to blathers/backport-release-23.1-103128: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
|
Build succeeded: |
This commit adds object size estimates to
SHOW RANGES WITH DETAILS.A new override of
crdb_internal.tenant_span_statsis introducedto leverage batched span statistics requests to KV, so only 1 fan-out is
required.
Resolves: #97858
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-24928
Release note (sql change): The SHOW RANGES command will now
emit span statistics when the DETAILS option is specified.
The statistics are included in a new column named 'span_stats',
as a JSON object.
The statistics are calculated for the identifier of each row:
SHOW RANGES WITH DETAILSwill compute span statistics for each range.SHOW RANGES WITH TABLES, DETAILSwill compute span statisticsfor each table, etc.
The 'span_stats' JSON object has the following keys:
'approximate_disk_bytes' is an approximation of the total on-disk size
of the given object.
'[key|val|sys|live|intent]_count' and '[key|val|sys|live|intent]_bytes'
are defined in enginepb.MVCCStats.