opt: permit coexistance of inv, forward histograms#84927
opt: permit coexistance of inv, forward histograms#84927craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
|
Note that this is a slightly more precise fix than the one I proposed in #84529. It will only ignore the histogram if the histogram is of the wrong type: in the steady state, the trigram inverted filter will successfully use the trigram inverted histogram when one exists. I do have a concern: will the reverse problem exist? In other words, once we build an inverted histogram on the trigram-indexed column, will a forward scan use the wrong histogram too? I couldn't reproduce any problem of this nature, but it surprises me - I'd expect that something would be going wrong of that nature. |
mgartner
left a comment
There was a problem hiding this comment.
LGTM! Thanks for fixing this!
| statement ok | ||
| CREATE TABLE b (a) AS SELECT 'foo'; | ||
| ANALYZE b; | ||
| CREATE INVERTED INDEX ON b(a gin_trgm_ops) |
There was a problem hiding this comment.
Don't you need to analyze after the index is created? The inverted column doesn't exist until the index is created.
There was a problem hiding this comment.
The bug was explicitly in this state, where there is an inverted index but a forward histogram on the column that's being inverted indexed. Once you re-analyze, there is no bug. I've added a test case to ensure that both scenarios work okay.
| // wouldn't properly reject the histogram. The proper fix, tracked | ||
| // in #50655, is to add a type field onto the persisted histogram. | ||
| if inputHist.BucketCount() == 0 || | ||
| inputHist.Bucket(0).UpperBound.ResolvedType().Family() == types.BytesFamily { |
There was a problem hiding this comment.
Internally the type we use for inverted columns is types.EncodedKey:
cockroach/pkg/sql/types/types.go
Lines 479 to 485 in 10a054b
But I guess we write the type bytes to the persisted histogram?
There was a problem hiding this comment.
Yeah, we do - there is some commentary about it where we write the persisted inverted histograms.
| // looking at a forward histogram, since inverted histograms always | ||
| // are of type DBytes. We cannot use the histogram if this is the | ||
| // case. | ||
| // Note that this is a sketchy situation (no pun intended), because |
There was a problem hiding this comment.
Not only is it sketchy, it means that we won't have accurate row count estimates for queries on trigram indexes. As a result, the optimizer probably won't pick the best plan in a number of cases.
There was a problem hiding this comment.
Yeah :( And even worse, isn't it actually more like we won't have accurate row count estimates for queries against text columns that happen to also be inverted indexed? I think that's the thing I'm most scared of here...
There was a problem hiding this comment.
Yeah :( And even worse, isn't it actually more like we won't have accurate row count estimates for queries against text columns that happen to also be inverted indexed? I think that's the thing I'm most scared of here...
Correct, that's what I meant to say.
|
Hmm, a bunch of tests failed... I'm going to investigate, but I wonder if there is something more fishy underfoot here. |
63395ee to
4e03e3a
Compare
rytaft
left a comment
There was a problem hiding this comment.
Thanks for fixing this!
Reviewed 1 of 3 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @mgartner)
pkg/sql/opt/memo/statistics_builder.go line 871 at r1 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Yeah :( And even worse, isn't it actually more like we won't have accurate row count estimates for queries against text columns that happen to also be inverted indexed? I think that's the thing I'm most scared of here...
Probably worth adding a stats-specific test for this. Can you add some tests in opt/memo/testdata/stats/scan similar to the tests you added in logic /execbuilder tests?
We can probably fix this issue by adding a check before line 865 or saving the previous value of s.RowCount.
pkg/sql/opt/memo/statistics_builder.go line 884 at r2 (raw file):
continue case types.BytesFamily: break LOOP
nit: I'd like to avoid using these GOTO style statements here if possible. I feel like this would be a good candidate for a helper function.
jordanlewis
left a comment
There was a problem hiding this comment.
The execbuilder test I added doesn't have stable results - if I dump the histogram after running ANALYZE, the results differ from run to run. I know the statistics collection uses random sampling, but I guess I figured in logic tests we'd fix the seed. Is the proper way to fix this to inject artificial histograms rather than running ANALYZE? And if so, should all of the test cases that need the histograms really just belong in the stats testdata?
Also, in adding the stats test you suggested, I'm noticing that the stats tests have nice histograms with just a few buckets. How do you generate those nice test histograms? Is there a way to ask the stats system to use a lower bucket count?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/opt/memo/statistics_builder.go line 871 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Probably worth adding a stats-specific test for this. Can you add some tests in
opt/memo/testdata/stats/scansimilar to the tests you added in logic /execbuilder tests?We can probably fix this issue by adding a check before line 865 or saving the previous value of
s.RowCount.
That's true, we should also avoid editing the row count if we're looking at the wrong kind of histogram. I've fixed that.
But isn't there still the problem in the opposite case, where we're trying to use a histogram for an ordinary, non-inverted scan on a text column, and we see a set of inverted statistics? I think we'd want a guard in the opposite direction, in applyIndexConstraint, that makes sure that we aren't using an inverted histogram for a forward scan.
What I don't understand is why this theoretical problem doesn't manifest as a type error, like the original bug did.
rytaft
left a comment
There was a problem hiding this comment.
The execbuilder test I added doesn't have stable results - if I dump the histogram after running ANALYZE, the results differ from run to run. I know the statistics collection uses random sampling, but I guess I figured in logic tests we'd fix the seed. Is the proper way to fix this to inject artificial histograms rather than running ANALYZE? And if so, should all of the test cases that need the histograms really just belong in the stats testdata?
You could use INJECT STATISTICS, but I think it's also fine to get rid of these execbuilder tests altogether.
Also, in adding the stats test you suggested, I'm noticing that the stats tests have nice histograms with just a few buckets. How do you generate those nice test histograms? Is there a way to ask the stats system to use a lower bucket count?
We don't currently have a way to control the number of histogram buckets collected by CREATE STATISTICS/ANALYZE (it's hardcoded as 200 for index columns, and 2 for other columns). Those histograms in the stats tests were just manually constructed.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/opt/memo/statistics_builder.go line 871 at r1 (raw file):
That's true, we should also avoid editing the row count if we're looking at the wrong kind of histogram. I've fixed that.
Doesn't look like you've pushed the fix.
But isn't there still the problem in the opposite case, where we're trying to use a histogram for an ordinary, non-inverted scan on a text column, and we see a set of inverted statistics? I think we'd want a guard in the opposite direction, in applyIndexConstraint, that makes sure that we aren't using an inverted histogram for a forward scan.
I think you're right.
What I don't understand is why this theoretical problem doesn't manifest as a type error, like the original bug did.
I'm not sure -- it would be good to add some tests that exercise this.
4e03e3a to
defb512
Compare
jordanlewis
left a comment
There was a problem hiding this comment.
This is ready for another look, I redid it like we discussed on Slack. Now, the code carefully segments inverted histograms from forward histograms in the statistics map, so there's no chance for code to accidentally get the wrong type of histogram when building plans.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
pkg/sql/opt/memo/statistics_builder.go line 871 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Yeah :( And even worse, isn't it actually more like we won't have accurate row count estimates for queries against text columns that happen to also be inverted indexed? I think that's the thing I'm most scared of here...
Correct, that's what I meant to say.
Removed this code.
pkg/sql/opt/memo/testdata/stats/inverted-geo line 774 at r3 (raw file):
│ │ └── stats: [rows=1, distinct(2)=1, null(2)=0, avgsize(2)=4] │ │ histogram(2)= 0 4.9994e-05 0.99988 7.4991e-05 │ │ <--- '0107000020E6100000010000000103000020E610000001000000040000008C7D4198BD2B574080A6FD3A111E4D40F0DF86928AA12BC03A59212197A35140CEA3CEE206B863C0FC7649EB60BA53408C7D4198BD2B574080A6FD3A111E4D40' --------- '0102000020E61000000300000005D8E086BB6365C03F9E5737DD1A53C0C04ECDED673B55C06711C00C7C0240C0B8EABD96072856404A9D2C529FC74EC0'
I think the reason for this change is that the test is buggy.
The injected statistics here should have histo_col_type=BYTES, not GEOGRAPHY, unless they were collected on the ordinary column. But I think we don't collect forward statistics on GEOGRAPHY columns. So I think this test is just wrong, thoughts? The fix would be to change the injected stats to be bytes type.
defb512 to
979b5fb
Compare
rytaft
left a comment
There was a problem hiding this comment.
thanks for working through these details!
I thought we had discussed iterating through older statistics to find both forward and inverted histograms? Feel free to open an issue and/or save that work for another PR. I think it's probably worth doing at some point, though (even before we fix #50655), since we can limit the extra computation to only cases where it's useful. For example, we can limit it to cases where we have a string column that also has an inverted index. We will (almost) always collect a forward histogram (it will just be smaller if there is no forward index), so we should try to find both the forward and inverted histograms for these columns.
Reviewed 3 of 5 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @mgartner, and @rytaft)
pkg/sql/opt/memo/statistics_builder.go line 666 at r4 (raw file):
if needHistogram { invertedColOrds = invertedIndexCols[stat.ColumnOrdinal(0)] invertedStatistic = len(invertedColOrds) > 0 && !histogramIsNonBytes(stat.Histogram())
nit: can we remove the double-negative and change this to histogramIsBytes?
pkg/sql/opt/memo/statistics_builder.go line 687 at r4 (raw file):
// to calling stats.ColStats.Add() on any inverted column statistics // created above. colStat, _ = stats.ColStats.Lookup(cols)
I don't think you need to do this anymore
pkg/sql/opt/memo/testdata/stats/inverted-geo line 774 at r3 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
I think the reason for this change is that the test is buggy.
The injected statistics here should have histo_col_type=BYTES, not GEOGRAPHY, unless they were collected on the ordinary column. But I think we don't collect forward statistics on GEOGRAPHY columns. So I think this test is just wrong, thoughts? The fix would be to change the injected stats to be bytes type.
Your change looks good
jordanlewis
left a comment
There was a problem hiding this comment.
I am probably missing something, but I thought that we were actually already doing that (and just ignoring all but the most recent one because of the way colStatsMap works:
for i := 0; i < tab.StatisticCount(); i++ {
stat := tab.Statistic(i)
As I understood, this loop finds every statistic in the table, including historical statistics.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @mgartner, and @rytaft)
rytaft
left a comment
There was a problem hiding this comment.
That's correct -- we are ignoring all but the most recent one for each set of columns. So in order to make sure that we get both types of histograms, I think we can make a relatively small change (I added a comment below).
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @mgartner, and @rytaft)
pkg/sql/opt/memo/statistics_builder.go line 669 at r4 (raw file):
} if colStat, ok := stats.ColStats.Add(cols); ok {
I think we can ensure we get both types of histograms by changing this slightly as follows:
if colStat, ok := stats.ColStats.Add(cols); ok {
// Keep what's already here
} else if colStat.Histogram == nil && needHistogram && !invertedStatistic {
col := cols.SingleColumn()
colStat.Histogram = &props.Histogram{}
colStat.Histogram.Init(sb.evalCtx, col, stat.Histogram())
}
This takes advantage of the fact that stats.ColStats.Add(cols) returns the existing statistic if ok is false.
jordanlewis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @mgartner, and @rytaft)
pkg/sql/opt/memo/statistics_builder.go line 669 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think we can ensure we get both types of histograms by changing this slightly as follows:
if colStat, ok := stats.ColStats.Add(cols); ok { // Keep what's already here } else if colStat.Histogram == nil && needHistogram && !invertedStatistic { col := cols.SingleColumn() colStat.Histogram = &props.Histogram{} colStat.Histogram.Init(sb.evalCtx, col, stat.Histogram()) }This takes advantage of the fact that
stats.ColStats.Add(cols)returns the existing statistic ifokis false.
Sorry for being dense, but I still don't understand what situation this branch defends against that the code doesn't already handle. The code as written does handle finding both the forward and inverted statistic for a given column set regardless of where either is in the list of historical statistics, since the new branch on line 700 gets executed regardless of whether we already have a forward statistic for the column set.
The only situation I can think of where this would change behavior is something like this list of statistics for a column t:
- a forward statistic with no histogram at time x
- an inverted statistic with a histogram at time x
- a forward statistic with a histogram at time x-1
Is this the situation you're thinking of too or am I still missing something?
979b5fb to
f72952b
Compare
jordanlewis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)
pkg/sql/opt/memo/statistics_builder.go line 666 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: can we remove the double-negative and change this to
histogramIsBytes?
I'm not sure - would we ever get into a situation in which the histogram contains all null buckets? If so, we wouldn't be able to guarantee that the histogram was all bytes, just that it had no non-bytes buckets. That's why I named it this way.
pkg/sql/opt/memo/statistics_builder.go line 687 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I don't think you need to do this anymore
Done.
rytaft
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis, @mgartner, and @rytaft)
pkg/sql/opt/memo/statistics_builder.go line 666 at r4 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
I'm not sure - would we ever get into a situation in which the histogram contains all null buckets? If so, we wouldn't be able to guarantee that the histogram was all bytes, just that it had no non-bytes buckets. That's why I named it this way.
I think that could happen if every row in the table is NULL. Maybe the better question is whether we would want to default to creating an inverted or a forward histogram in this case. I'm not sure it really matters in most cases, since I think we should be smart enough to just use the NULL counts and row counts directly to get the correct selectivity even without a histogram. The case where it might matter is if you implement my proposed change below to go back and add a forward histogram when one is found. In particular, if the most recent stats show a table with all NULLs, but historical stats have some non-nulls, we could end up using stale stats.
Alternatively, we could update the TableStatistic interface to include a HistogramType function that returns the type directly (then we wouldn't need to guess and you can get rid of the histogramIsNonBytes function). This info should be available with a bit of plumbing. If you think that's a better route, I'm happy to open a separate PR with that change.
Regardless, it would be good to add a test for this case.
pkg/sql/opt/memo/statistics_builder.go line 669 at r4 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Sorry for being dense, but I still don't understand what situation this branch defends against that the code doesn't already handle. The code as written does handle finding both the forward and inverted statistic for a given column set regardless of where either is in the list of historical statistics, since the new branch on line 700 gets executed regardless of whether we already have a forward statistic for the column set.
The only situation I can think of where this would change behavior is something like this list of statistics for a column t:
- a forward statistic with no histogram at time x
- an inverted statistic with a histogram at time x
- a forward statistic with a histogram at time x-1
Is this the situation you're thinking of too or am I still missing something?
I'm thinking of this:
- an inverted statistic on column c with a histogram at time x
- a forward statistic on column c with a histogram at time x
- ... more stats ...
For the first statistic, we will add a forward stat for column c with no histogram, and an inverted stat with a histogram. Once we get to the second statistic, as the code is currently written, we will see that a forward stat has already been added and it will be a no-op. My proposed change would ensure that we still save the histogram for this case.
As an alternative, you could just not add the forward stat if invertedStatistic is true. That might also work.
b278752 to
f6f54b3
Compare
jordanlewis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)
pkg/sql/opt/memo/statistics_builder.go line 666 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think that could happen if every row in the table is NULL. Maybe the better question is whether we would want to default to creating an inverted or a forward histogram in this case. I'm not sure it really matters in most cases, since I think we should be smart enough to just use the NULL counts and row counts directly to get the correct selectivity even without a histogram. The case where it might matter is if you implement my proposed change below to go back and add a forward histogram when one is found. In particular, if the most recent stats show a table with all NULLs, but historical stats have some non-nulls, we could end up using stale stats.
Alternatively, we could update the
TableStatisticinterface to include aHistogramTypefunction that returns the type directly (then we wouldn't need to guess and you can get rid of thehistogramIsNonBytesfunction). This info should be available with a bit of plumbing. If you think that's a better route, I'm happy to open a separate PR with that change.Regardless, it would be good to add a test for this case.
Great point, I took a look and your suggestion was very simple to implement (no plumbing was necessary, the actual data was available on both the test catalog and production stat struct as-is) so I just went ahead and did it.
In the case where the data is all null, no histogram is created at all, so I think we're fine.
{
"avg_size": 0,
"columns": [
"a"
],
"created_at": "2022-07-28 04:05:28.848149",
"distinct_count": 1,
"histo_col_type": "",
"null_count": 3,
"row_count": 3
}
pkg/sql/opt/memo/statistics_builder.go line 669 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I'm thinking of this:
- an inverted statistic on column c with a histogram at time x
- a forward statistic on column c with a histogram at time x
- ... more stats ...
For the first statistic, we will add a forward stat for column c with no histogram, and an inverted stat with a histogram. Once we get to the second statistic, as the code is currently written, we will see that a forward stat has already been added and it will be a no-op. My proposed change would ensure that we still save the histogram for this case.
As an alternative, you could just not add the forward stat if
invertedStatisticis true. That might also work.
Ahh, I see, thanks for spelling it out, you're totally right of course.
We can't not add the forward stat if invertedStatistic is true, because in the case of ordinary inverted index types like JSON, we don't ever collect forward stats on that type - so we'd be missing out on the non-histogram stats.
I think your example snippet is also going to do something slightly weird - it has the potential of creating a statistic that consists of counts from an inverted histogram at time t, and a histogram from a forward histogram at a time t-1 when the counts were different.
The updated code now takes the entirety of the most recent forward statistic that has a histogram... I think that is the optimal thing.
And confirmed this change updated a test case in the trigram stats test file for the better! It now can use a forward histogram for a forward scan even though an inverted histogram on the same column was present.
jordanlewis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)
pkg/sql/opt/memo/statistics_builder.go line 669 at r4 (raw file):
Wait, is this still going to present a problem in a different case? You mention something similar above:
In particular, if the most recent stats show a table with all NULLs, but historical stats have some non-nulls, we could end up using stale stats.
time t-1: statistic with forward histogram for column c
time t: user truncates table
time t+1: statistic with no histogram because the table is empty
In this case, I think the code I just pushed (and your suggested snippet?) would fall into a similar hazard. The newest statistic is missing a histogram because the table is empty, so we replace it with an older statistic that has a histogram, that's now stale and out of date.
We could avoid this hazard with an extra bit per ColumnStatistic that is set to true if the information was derived from an inverted histogram, I guess...
jordanlewis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)
pkg/sql/opt/memo/statistics_builder.go line 669 at r4 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Wait, is this still going to present a problem in a different case? You mention something similar above:
In particular, if the most recent stats show a table with all NULLs, but historical stats have some non-nulls, we could end up using stale stats.
time t-1: statistic with forward histogram for column c
time t: user truncates table
time t+1: statistic with no histogram because the table is emptyIn this case, I think the code I just pushed (and your suggested snippet?) would fall into a similar hazard. The newest statistic is missing a histogram because the table is empty, so we replace it with an older statistic that has a histogram, that's now stale and out of date.
We could avoid this hazard with an extra bit per
ColumnStatisticthat is set to true if the information was derived from an inverted histogram, I guess...
I added a commit on top that does this, but it bloats the memo by a bit. PTAL. Bummed that this keeps increasing in complexity.
rytaft
left a comment
There was a problem hiding this comment.
Reviewed 4 of 5 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis, @mgartner, and @rytaft)
pkg/sql/opt/memo/statistics_builder.go line 666 at r4 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Great point, I took a look and your suggestion was very simple to implement (no plumbing was necessary, the actual data was available on both the test catalog and production stat struct as-is) so I just went ahead and did it.
In the case where the data is all null, no histogram is created at all, so I think we're fine.
{ "avg_size": 0, "columns": [ "a" ], "created_at": "2022-07-28 04:05:28.848149", "distinct_count": 1, "histo_col_type": "", "null_count": 3, "row_count": 3 }
Nice!
pkg/sql/opt/memo/statistics_builder.go line 669 at r4 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
I added a commit on top that does this, but it bloats the memo by a bit. PTAL. Bummed that this keeps increasing in complexity.
Is there a way to avoid storing the extra bit in the ColStatsMap since we don't need this info outside of this function? One idea is to just check if we already have an inverted stat for this column. e.g., make a function like this:
func alreadyHasInvertedStat(invColOrds int[], stats *props.Statistics) {
if len(invColOrds) > 0 {
invCol := tabID.ColumnID(invColOrds[0])
invCols := opt.MakeColSet(invCol)
_, ok := stats.ColStats.Lookup(invCols)
return ok
}
return false
}
|
Previously, rytaft (Rebecca Taft) wrote…
(BTW I didn't check if that compiles -- looks like there's at least a couple fixes needed) |
e8f76c0 to
bf119db
Compare
9a34714 to
9a7000f
Compare
jordanlewis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)
pkg/sql/opt/memo/statistics_builder.go line 669 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
(BTW I didn't check if that compiles -- looks like there's at least a couple fixes needed)
I moved the state into a map that we're already keeping around in the statsbuilder loop and added more tests that ensure we don't use old histograms even if new statistics have no histograms at all for both the inverted and forward case.
rytaft
left a comment
There was a problem hiding this comment.
thanks for all your work on this!
Reviewed 1 of 5 files at r3, 14 of 14 files at r9, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @mgartner)
pkg/sql/opt/memo/statistics_builder.go line 686 at r9 (raw file):
// If these conditions hold, it means that the previous histogram // we found for the current colset was derived from an inverted // histogram, and therefore doesn't have a histogram at all, and the
nit: and therefore doesn't -> and therefore the existing forward statistic doesn't
pkg/sql/opt/memo/statistics_builder.go line 715 at r9 (raw file):
// Add inverted histograms if necessary. if needHistogram && invertedStatistic { // Record the fact that we just added an inverted statistic to this
nit: just added -> are adding
Previously, when trying to use persisted histograms to filter an inverted column, if the histogram read from disk was a "forward histogram", the operation would fail with an internal error. This is because inverted histograms and forward histograms use different datatypes on disk, and it's not possible to compare a forward histogram (represented as the datatype of the column) with the inverted constraint (represented as DBytes). Now, this problem is corrected by making sure that we only store forward histogram data in forward histogram column sets, and likewise for inverted histogram data being only stored in inverted histogram column sets. Release note: None
9a7000f to
963deb8
Compare
jordanlewis
left a comment
There was a problem hiding this comment.
Thanks for the patient reviews!
bors r+
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
|
Build failed (retrying...): |
|
Build succeeded: |
Closes #84529.
Previously, when trying to use persisted histograms to filter an
inverted column, if the histogram read from disk was a "forward
histogram", the operation would fail with an internal error.
This is because inverted histograms and forward histograms use different
datatypes on disk, and it's not possible to compare a forward histogram
(represented as the datatype of the column) with the inverted constraint
(represented as DBytes).
Now, this problem is corrected by making sure that we only store forward
histogram data in forward histogram column sets, and likewise for
inverted histogram data being only stored in inverted histogram column
sets.
Release note: None