sql: index rec with better index to replace and NotVisible indexes#87174
sql: index rec with better index to replace and NotVisible indexes#87174craig[bot] merged 3 commits intocockroachdb:masterfrom
Conversation
|
I want to draw attention to this line: Now I am using |
|
I'm also not sure if we want to panic if something odd happens here. Is it too disruptive? |
472b5a5 to
3b8fbb1
Compare
|
Previously, wenyihu6 (Wenyi Hu) wrote…
My bad. I see now. |
mgartner
left a comment
There was a problem hiding this comment.
Left some feedback for the first two commits.
Reviewed 9 of 9 files at r4, 6 of 6 files at r5.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @wenyihu6)
-- commits line 19 at r5:
nit: the index rec call => the index recommender calls
-- commits line 32 at r5:
nit: set of columns are useful => set of columns that are useful
pkg/sql/opt/indexrec/hypothetical_table_test.go line 52 at r5 (raw file):
} func TestCheckSameExplicitCols(t *testing.T) {
nit: these unit tests are difficult to understand, because you have to go to the helper functions to look at the values returned. Are these covering code that is not covered by the data drive tests? If you think they are necessary, I recommend trying to convert them to table driven tests if possible. Example:
pkg/sql/opt/indexrec/rec.go line 127 at r3 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
My bad. I see now.
KeyColumnCount()includes columns that are implicitly stored such as row id or other columns, so KeyColumnCount() is more correct here. I reverted the changes.
Explicit columns are the columns explicit listed when the index is created, not including columns in STORING. Key columns are the columns the make up the key for each key-value pair in the index. Key columns is generally longer than explicit columns.
For example
CREATE TABLE t (
id1 INT,
id2 INT,
a INT,
b INT,
c INT,
PRIMARY KEY (id1, id2),
INDEX idx (a, b) STORING (c)
)
The explicit columns of idx are (a, b). The key columns of idx are (a, b, id1, id2).
pkg/sql/opt/indexrec/rec.go line 24 at r4 (raw file):
) type indexRecType uint8
It's a bit weird to export the constants of this type, but not the actual type, so maybe export this? You could name it just Type so that references to it outside the package at indexrec.Type instead of the repetitive indexrec.IndexRecType.
pkg/sql/opt/indexrec/rec.go line 29 at r4 (raw file):
TypeCreateIndex indexRecType = iota TypeReplaceIndex TypeUseless
nit: You can unexport this (i.e., rename to typeUseless) if it won't be used outside of this package.
pkg/sql/opt/indexrec/rec.go line 91 at r5 (raw file):
} // checkSameExplicitCols is a helper function that checks whether the given
nit: the statement "is a helper function" isn't necessary.
pkg/sql/opt/indexrec/rec.go line 94 at r5 (raw file):
// existing index has the same explicit columns as the given indexCols. If so, // it returns true. func checkSameExplicitCols(
I think this should be a method of hypotheticalIndex since it's always used in the context of a hypo-index. Consider renaming it to hasExplicitCols.
pkg/sql/opt/indexrec/rec.go line 100 at r5 (raw file):
return false } for j, m := 0, existingIndex.ExplicitColumnCount(); j < m; j++ {
With this current implementation, the order of columns matters. For example INDEX (a, b) would not match INDEX (b, a). I think that's desired, but just wanted you to confirm. If order does matter, the comment on the function should make that clear.
pkg/sql/opt/indexrec/rec.go line 119 at r5 (raw file):
// getStoredCols returns stored columns of the given existingIndex including in // the STORING clause.
nit: remove "including in the STORING clause" - it's redundant
nit: make it clear that the ordinals of the stored columns are returned
pkg/sql/opt/indexrec/rec.go line 129 at r5 (raw file):
// findBestExistingIndexToReplace finds the best existing index to replace given // the table with existing indexes, hypothetical index's explicit columns, and
nit "hypothetical index's explicit columns" => "the hypothetical index's explicit columns"
pkg/sql/opt/indexrec/rec.go line 133 at r5 (raw file):
// // We are looking for the best existing index candidate to replace. To be a // candidate, the existing index has to be 1. not a partial index (because we do
nit: => "has to 1. not be a partial index"
pkg/sql/opt/indexrec/rec.go line 134 at r5 (raw file):
// We are looking for the best existing index candidate to replace. To be a // candidate, the existing index has to be 1. not a partial index (because we do // not want to recommend dropping a partial index) 2. stores the same explicit
nit: We should only use the word "store(s)" when referring to columns in STORING. So instead of "stores the same explicit columns as the hypIndex", how about "has the same explicit columns as hypIndex".
pkg/sql/opt/indexrec/rec.go line 143 at r5 (raw file):
// means that there does not exist an index that satisfy the requirement to be a // candidate. So no existing indexes can be replaced, and creating a new index // is necessary. It returns TypeCreateIndex, nil, and util.FastIntSet{}.
It'd be more idiomatic to return an ok bool as false if an existing index is not found. So the return type of the function would be (idx cat.Index, storedCols util.FastIntSet, ok bool).
pkg/sql/opt/indexrec/rec.go line 186 at r5 (raw file):
// but not in the existing index's stored cols which will be the ones we // recommend to store later. numOfStoredColsDiff := actuallyScannedCols.Difference(existingIndexStoredCols)
nit: name this something other than num... because it's a set not a number
pkg/sql/opt/indexrec/rec.go line 191 at r5 (raw file):
// every column in actuallyScannedCol. // // Theoretically, this should never happen because optimizer should use
This should also never happen because at least one of the scanned cols should be a key column in the index, not a stored column.
pkg/sql/opt/indexrec/rec.go line 196 at r5 (raw file):
// means optimizer must be using some columns from the hypothetical // indexes that are not stored by the existing index. panic("Index recommendation should not be redundant.")
We can't panic from this code, because callers of indexrec.FindRecs are do not have panic catchers (recover blocks). It would likely crash an entire cockroach DB node. If this is a case that should error, we'll need to propagate the error up as a returned value. I think we should simply ignore this case - if the scanned columns are all stored by the index then just continue the search for an existing index. Maybe just do this
if numOfStoredColsDiff.Empty() {
continue
}pkg/sql/opt/indexrec/rec.go line 222 at r5 (raw file):
// inverted) INDEX ON tableName(indexCols) STORING (storing);" and // TypeCreateIndex. func outputCreateIndexRec(indexCols []tree.IndexElem, storing []tree.Name, tableName tree.Name, inverted bool, unique bool) Rec {
I think it'd be simpler to move the logic in outputCreateIndexRec and outputReplaceIndex directly into constructIndexRec, similar to how it was before.
Also, we should continue to use strings.Builder here instead of + to reduce allocations.
pkg/sql/opt/indexrec/rec.go line 358 at r5 (raw file):
// formatToOutput flattens the given map rcMap to a 1D list of Rec containing // all index recommendations as the final output for findRecs. func formatToOutput(rcMap map[cat.Table][]Rec) []Rec {
nit: This isn't really "formatting". Maybe name it flattenRecMap?
8f42e78 to
ce1a526
Compare
|
Previously, mgartner (Marcus Gartner) wrote…
I agree. They were confusing to me as well. Some were useful when I checked the helper function but no longer useful now. I have removed them. |
|
Previously, mgartner (Marcus Gartner) wrote…
Done. |
|
Previously, mgartner (Marcus Gartner) wrote…
Done. |
|
Previously, mgartner (Marcus Gartner) wrote…
That makes sense. Thanks for explaining to me! |
|
Previously, mgartner (Marcus Gartner) wrote…
That's right; for an existing index to be a candidate to be dropped, it has to have identical explicit columns as the hypothetical index. (same length, same element, and same order). I think this is what the index recommendation has prior to this commit as well (although this is room for index rec to improve here -> for example, if we know a specific explicit column is actually not used in a workload, we can still recommend to replace it). |
|
Previously, wenyihu6 (Wenyi Hu) wrote…
I've updated the comment. |
|
Previously, mgartner (Marcus Gartner) wrote…
Good point. Done. |
|
Previously, mgartner (Marcus Gartner) wrote…
Type is returned here to make introducing alter index visible easier. This logic returns TypeAlterIndex if an invisible index could be used in the 3rd commit. |
|
Previously, mgartner (Marcus Gartner) wrote…
I see. I've removed panic. I'm returning TypeUseless here instead of continue to match up with our behaviour prior to this commit. Before this commit, we call |
|
Previously, mgartner (Marcus Gartner) wrote…
Done. |
c873e9a to
416a9ff
Compare
mgartner
left a comment
There was a problem hiding this comment.
Very nicely done! Let's get @maryliag to sign off on the new recommendation type before merging.
Reviewed 10 of 10 files at r9, 8 of 8 files at r10, 10 of 10 files at r11, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @wenyihu6)
-- commits line 76 at r11:
The release note should have a category like Release note (sql change): ..
pkg/sql/explain_plan.go line 152 at r11 (raw file):
plural = "s" } else if recs[i].RecType == indexrec.TypeAlterIndex { recType = "index alteration"
nit: consider a switch statement here
pkg/sql/idxrecommendations/idx_recommendations.go line 54 at r11 (raw file):
if recs[i].RecType == indexrec.TypeReplaceIndex { recType = "replacement" } else if recs[i].RecType == indexrec.TypeAlterIndex {
nit: consider a switch statement here
pkg/sql/idxrecommendations/idx_recommendations.go line 55 at r11 (raw file):
recType = "replacement" } else if recs[i].RecType == indexrec.TypeAlterIndex { recType = "alteration"
cc @maryliag - is there anything else we need to do when introducing a new index recommendation type?
pkg/sql/opt/indexrec/rec.go line 119 at r10 (raw file):
// this means that there does not exist an index that satisfy the requirement to // be a candidate. So no existing indexes can be replaced, and creating a new // index is necessary. It returns TypeCreateIndex, nil, and util.FastIntSet{}.
nit: mention when typeUseless is returned
pkg/sql/opt/indexrec/rec.go line 227 at r10 (raw file):
Inverted: ir.index.IsInverted(), } sb.WriteString(createCmd.String() + ";")
nit:
sb.WriteString(createCmd.String())
sb.WriteByte(';')pkg/sql/opt/indexrec/rec.go line 244 at r10 (raw file):
Inverted: ir.index.IsInverted(), } sb.WriteString(createCmd.String() + "; " + dropCmd.String() + ";")
nit: WriteString, WriteByte, WriteString, WriteByte like I mentioned above
pkg/sql/opt/indexrec/rec.go line 172 at r11 (raw file):
existingIndexAllCols := getAllCols(existingIndex) if actuallyScannedCols.Difference(existingIndexAllCols).Empty() { // There exists an invisible index storing every explicit column in
nit: It's not only "storing", it's all columns, right? Maybe it'd be less confusing to say "an invisible index containing ..."?
pkg/sql/opt/indexrec/rec.go line 284 at r11 (raw file):
sb.WriteString(createCmd.String() + "; " + dropCmd.String() + ";") return Rec{sb.String(), TypeReplaceIndex} } else if recType == TypeAlterIndex {
nit: switch statement?
9bd64fc to
bad5a9b
Compare
a9d6fdd to
0e3fb49
Compare
|
@maryliag Friendly ping : ) hoping to merge before the branch cuts |
maryliag
left a comment
There was a problem hiding this comment.
there are some changes on the UI that will be required with this change, but you probably wouldn't be able to make them by eod, so I'll create an issue and look into it.
Reviewed 3 of 10 files at r11, 1 of 12 files at r12, 3 of 7 files at r14, 2 of 5 files at r15, 5 of 5 files at r16.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @maryliag, @mgartner, and @wenyihu6)
Sounds good. Thank you! |
0e3fb49 to
07c053e
Compare
This commit introduces a type indexRecType which acts as an enum to represent the type of index recommendation. This commit does not change any existing functionality, and the main purpose is to make future commits cleaner. Release justification: This is a low risk change to the existing functionality. Release note: None
Background information on how index recommendation works now: It first finds possible index candidates with the given query. These index candidates determine which columns to store inside hypothetical indexes definition. In addition, the hypothetical indexes store every other column from the table (as they are stored in a STORING clause). Then, the index recommender calls the optimizer with the updated table that stores both existing and hypothetical indexes. The index recommender then scans over the optimal plan and sees if a hypothetical index is used. If so, an index recommendation is constructed. However, we do not know the entire set of useful columns yet. While processing the entire optimal plan, we update the index recommendation instances to add useful columns. Prior to this commit, index recommendation decides what existing index can be potentially replaced when the index recommendation is constructed. It looks for the first existing index with the same explicit column as the hypothetical index definition. This is not ideal because the best existing index to replace might not be the first one. This commit changes the logic so that the index recommender decides what the best existing index to replace is after it fully processes the optimal plan and decides the entire set of columns that are useful. The best existing index to replace is the one that stores the same explicit column as the hypothetical index and stores the most columns from what was actually scanned. Consider the following example. ``` CREATE INDEX idx_1 ON t(v) CREATE INDEX idx_2 ON t(v) STORING (i) ``` Given `SELECT i, j FROM t5 WHERE v > 1`, the index recommendation previously recommends replacing idx_1 with `idx(v) STORING (i, j)`. It now recommends replacing `idx_2 with idx(v) STORING (i, j)` as idx_2 is more similar to what we want to replace later on. In addition, this commit also makes adding invisible indexes to index recommender much easier. Release justification: This is a low risk change to the existing functionality. Release note (sql change): Instead of always recommending to replace the first existing index with the same explicit columns, the index recommendation now considers all existing indexes in the table to decide the best one to replace.
Now that we have introduced not visible index feature, we should update index recommendations accordingly. Prior to this commit, the index recommendation engine treats all indexes as they are visible. If an index already exists in the table and is the same as we want to recommend, this index recommendation is not created. However, it is possible that this index is not visible now. This commits fixes this by adding not visible indexes to the logic. The index recommender only recommends alter index visible if the index already contains every explicit column in the hypothetical index and every useful column. It will never recommend dropping an invisible index. Fixes: cockroachdb#85477 Release justification: This commit does not interfere with existing functionality and fixes the index recommendation engine to take an index visibility into account. Release note: Index recommendation now considers not visible indexes and can also make index recommendations for alter index … visible.
07c053e to
003550c
Compare
|
@mgartner could you help me merge this PR? (PRs can only be merged by project collaborators.) |
With the new index recommendation type introduced on cockroachdb#87174, this commit does the proper handle of the alter option and show the recommendation with its proper docs accordingly. Fixes cockroachdb#87414 Rename "Create new index" to "Create Index" Release justification: low risk change Release note (ui change): Showing index recommendation of `ALTER index` type (both in Statement Details page and on Insights page).
87453: authors: add andrew.zhang to authors r=ayz-lex a=ayz-lex Release note: None Release justification: non-production code change. 87458: ui: showing ALTER index recommendations r=maryliag a=maryliag With the new index recommendation type introduced on #87174, this commit does the proper handle of the alter option and show the recommendation with its proper docs accordingly. Fixes #87414 Rename "Create new index" to "Create Index" Release justification: low risk change Release note (ui change): Showing index recommendation of `ALTER index` type (both in Statement Details page and on Insights page). 87460: githooks: fix UX of release note template r=maryliag a=maryliag The PR #87176 introduced improvements on the Release note template, but it didn't have the space required between `note` and `(...`. This commit adds the space so you can select the proper release note and don't get a error message about the missing space. Release justification: CRL end UX improvement Release note: None Co-authored-by: Andrew Zhang <andrewyzhanglex@gmail.com> Co-authored-by: Marylia Gutierrez <marylia@cockroachlabs.com>
|
bors r=mgartner,maryliag |
Thank you! : ) |
No, thank YOU! :) |
|
Build succeeded: |
sql: introduce index rec type
This commit introduces a type indexRecType which acts as an enum to represent
the type of index recommendation. This commit does not change any existing
functionality, and the main purpose is to make future commits cleaner.
Release justification: This is a low risk change to the existing functionality.
Release note: None
sql: finds better index to replace in index rec
Background information on how index recommendation works now: It first finds
possible index candidates with the given query. These index candidates determine
which columns to store inside hypothetical indexes definition. In addition, the
hypothetical indexes store every other column from the table (as they are stored
in a STORING clause). Then, the index recommender calls the optimizer with the
updated table that stores both existing and hypothetical indexes. The index
recommender then scans over the optimal plan and sees if a hypothetical index is
used. If so, an index recommendation is constructed. However, we do not know the
entire set of useful columns yet. While processing the entire optimal plan, we
update the index recommendation instances to add useful columns.
Prior to this commit, index recommendation decides what existing index can be
potentially replaced when the index recommendation is constructed. It looks for
the first existing index with the same explicit column as the hypothetical index
definition. This is not ideal because the best existing index to replace might
not be the first one. This commit changes the logic so that the index
recommender decides what the best existing index to replace is after it fully
processes the optimal plan and decides the entire set of columns that are
useful. The best existing index to replace is the one that stores the same
explicit column as the hypothetical index and stores the most columns from
what was actually scanned.
Consider the following example.
Given
SELECT i, j FROM t5 WHERE v > 1, the index recommendationpreviously recommends replacing idx_1 with
idx(v) STORING (i, j). It nowrecommends replacing
idx_2 with idx(v) STORING (i, j)as idx_2 is more similarto what we want to replace later on.
In addition, this commit also makes adding invisible indexes to index
recommender much easier.
Release justification: This is a low risk change to the existing functionality.
Release note: Instead of always recommending to replace the first existing index
with the same explicit columns, the index recommendation now considers all
existing indexes in the table to decide the best one to replace.
sql: index rec with NotVisible indexes
Now that we have introduced not visible index feature, we should update index
recommendations accordingly. Prior to this commit, the index recommendation
engine treats all indexes as they are visible. If an index already exists in the
table and is the same as we want to recommend, this index recommendation
is not created. However, it is possible that this index is not visible now. This
commits fixes this by adding not visible indexes to the logic.
The index recommender only recommends alter index visible if the index already
contains every explicit column in the hypothetical index and every useful column.
It will never recommend dropping an invisible index.
Fixes: #85477
Release justification: This commit does not interfere with existing
functionality and fixes the index recommendation engine to take an index
visibility into account.
Release note: Index recommendation now considers not visible indexes and can
also make index recommendations for alter index … visible.