opt, cat: cleaner API for mutation columns#51400
opt, cat: cleaner API for mutation columns#51400craig[bot] merged 3 commits intocockroachdb:masterfrom
Conversation
a95abcf to
3023579
Compare
andy-kimball
left a comment
There was a problem hiding this comment.
LGTM, with some nits.
pkg/sql/opt/cat/utils.go
Outdated
| // ConvertColumnIDsToOrdinals converts a list of ColumnIDs (such as from a | ||
| // tree.TableRef), to a list of ordinal positions of columns within the given | ||
| // table. See tree.Table for more information on column ordinals. | ||
| // tree.TableRef), to a list of ordinal positions of non-mutation columns within |
There was a problem hiding this comment.
It seems like an API bug that this only looks at non-mutation cols. Why not let it handle mutation cols as well, since the API is generic? I don't think any existing caller calls it with mutation cols, so there should be no panics.
pkg/sql/opt/cat/utils.go
Outdated
| @@ -147,8 +147,8 @@ func ConvertColumnIDsToOrdinals(tab Table, columns []tree.ColumnID) (ordinals [] | |||
| // FindTableColumnByName returns the ordinal of the non-mutation column having | |||
| // the given name, if one exists in the given table. Otherwise, it returns -1. | |||
| func FindTableColumnByName(tab Table, name tree.Name) int { | |||
There was a problem hiding this comment.
Should we make this return mutation cols as well and require caller to test for mutation/system cols? The more col types we add, the more confusing it is when some generic-sounding APIs filter cols and others don't.
| for i, n := 0, tab.ColumnCount(); i < n; i++ { | ||
| if private.IsColumnOutput(i) { | ||
| for i, n := 0, tab.AllColumnCount(); i < n; i++ { | ||
| if !cat.IsMutationColumn(tab, i) && private.IsColumnOutput(i) { |
There was a problem hiding this comment.
Actually, I think we can just get rid of the comment and the check for IsMutationColumn. private.IsColumnOutput should always return false for mutation columns, or else there's a bug somewhere else in the code. We always set private.ReturnCols entries to zero for mutation cols.
| // the column must be not null. | ||
| for i, n := 0, tab.ColumnCount(); i < n; i++ { | ||
| for i, n := 0, tab.AllColumnCount(); i < n; i++ { | ||
| if cat.IsMutationColumn(tab, i) { |
There was a problem hiding this comment.
The OutputCols check should already take care of this case, since mutation cols will never be output cols by the check above. I think it's good to let the mutation builder decide which cols are output or not, and let this props builder code just blindly process the cols without checking again.
| for i, cnt := 0, baseTable.ColumnCount(); i < cnt; i++ { | ||
| unfilteredCols.Add(t.Table.ColumnID(i)) | ||
| for i, cnt := 0, baseTable.AllColumnCount(); i < cnt; i++ { | ||
| if !cat.IsMutationColumn(baseTable, i) { |
There was a problem hiding this comment.
I don't see any harm in adding mutation cols here. I think Drew just used ColumnCount b/c it seemed like the right thing. But really we want all columns.
|
|
||
| var cols opt.ColSet | ||
| for i := 0; i < tab.ColumnCount(); i++ { | ||
| for i := 0; i < tab.AllColumnCount(); i++ { |
There was a problem hiding this comment.
I wonder how mutation cols play with stats. A good example of a case where using ColumnCount can cause us to not consider a case that is potentially interesting to think about, or at least document.
| @@ -1054,9 +1057,9 @@ func (mb *mutationBuilder) mapColumnNamesToOrdinals(names tree.NameList) util.Fa | |||
| var ords util.FastIntSet | |||
There was a problem hiding this comment.
Can you add to the mapColumnNamesToOrdinals to make it clear that it skips mutation columns?
|
|
||
| if needResults { | ||
| // Only non-mutation columns are output columns. ReturnCols needs to have | ||
| // DeletableColumnCount entries, but only the first ColumnCount entries |
There was a problem hiding this comment.
Need to update the comment to remove ref to DeletableColumnCount.
pkg/sql/opt/optbuilder/select.go
Outdated
| @@ -577,8 +578,8 @@ func (b *Builder) addCheckConstraintsForTable(tabMeta *opt.TableMeta) { | |||
|
|
|||
| // Find the non-nullable table columns. | |||
There was a problem hiding this comment.
Add parenthetical to the comment, saying that mutation cols can be NULL during backfill, so they should be excluded.
| var tableScope *scope | ||
| tab := tabMeta.Table | ||
| for i, n := 0, tab.ColumnCount(); i < n; i++ { | ||
| for i, n := 0, tab.AllColumnCount(); i < n; i++ { |
There was a problem hiding this comment.
Add comment to explain why mutation cols should not be included here.
|
From an outside perspective, I like this a lot. It's much more explicit about what count to use/what columns are considered in an operation, and will also make it easier for me to catch all of the cases where we should or shouldn't include system columns. |
The catalog currently relies on a particular ordering to separate different kinds of columns (ordinary, write-only, delete-only). This is unfortunate because it restricts the underlying implementation; in particular, it makes it tricky to add a new kind of column (like the `mvcc_timestamp` system column) without potentially breaking existing code in subtle ways. This change reworks the cat API to have a single `ColumnCount` method and a separate `ColumnKind` method used to figure out the column kind. This forces all code that cares about the kind to check it explicitly rather than relying on a potentially faulty column count. Fixes cockroachdb#51323. Release note: None
Also add some tests that fail if we don't ignore mutation columns. Release note: None
3023579 to
a6ad229
Compare
RaduBerinde
left a comment
There was a problem hiding this comment.
Updated.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @mgartner, @RaduBerinde, @rohany, and @rytaft)
pkg/sql/opt/cat/utils.go, line 125 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
It seems like an API bug that this only looks at non-mutation cols. Why not let it handle mutation cols as well, since the API is generic? I don't think any existing caller calls it with mutation cols, so there should be no panics.
The caller does care - this is used when referencing a table by ID. Without the check, we allow a mutation column (which in practice leads to a more obscure error a bit later). I added some tests and I moved this into the optbuilder as it's specific to building numeric references.
pkg/sql/opt/cat/utils.go, line 149 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Should we make this return mutation cols as well and require caller to test for mutation/system cols? The more col types we add, the more confusing it is when some generic-sounding APIs filter cols and others don't.
Moved to optbuilder, renamed to findPublicTableColumnByName.
pkg/sql/opt/memo/logical_props_builder.go, line 1272 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Actually, I think we can just get rid of the comment and the check for
IsMutationColumn.private.IsColumnOutputshould always return false for mutation columns, or else there's a bug somewhere else in the code. We always setprivate.ReturnColsentries to zero for mutation cols.
Done.
pkg/sql/opt/memo/logical_props_builder.go, line 1293 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
The
OutputColscheck should already take care of this case, since mutation cols will never be output cols by the check above. I think it's good to let the mutation builder decide which cols are output or not, and let this props builder code just blindly process the cols without checking again.
Done.
pkg/sql/opt/memo/multiplicity_builder.go, line 122 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
I don't see any harm in adding mutation cols here. I think Drew just used
ColumnCountb/c it seemed like the right thing. But really we want all columns.
Done.
pkg/sql/opt/optbuilder/insert.go, line 1057 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Can you add to the
mapColumnNamesToOrdinalsto make it clear that it skips mutation columns?
Done.
pkg/sql/opt/optbuilder/mutation_builder.go, line 832 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Need to update the comment to remove ref to
DeletableColumnCount.
Done.
pkg/sql/opt/optbuilder/select.go, line 579 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Add parenthetical to the comment, saying that mutation cols can be NULL during backfill, so they should be excluded.
Done.
pkg/sql/opt/optbuilder/select.go, line 630 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Add comment to explain why mutation cols should not be included here.
Done.
mgartner
left a comment
There was a problem hiding this comment.
This is great! I think we currently do something similar with the Index catalog, so it'd be great to apply similar changes there at some point.
Reviewed 10 of 20 files at r1, 4 of 8 files at r2, 3 of 4 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @mgartner, @RaduBerinde, @rohany, and @rytaft)
|
bors r+ |
Build failed |
|
bors r+ |
Build succeeded |
This change renames AllColumnCount to ColumnCount. In cockroachdb#51400 I meant to rename before merging but I forgot. Release note: None
51462: storage: optimize MVCCGarbageCollect for large number of undeleted version r=itsbilal a=ajwerner This is a better-tested take-2 of #51184. * The first commit introduces a `SupportsPrev` method to the `Iterator` interface. * The second commit adjusts the benchmark to use a regular batch. * The third commit adjusts the logic in #51184 to fix the bugs there and adds testing. The win is now only pronounced on pebble, as might be expected. 51489: opt, cat: rename Table.AllColumnCount to Table.ColumnCount r=RaduBerinde a=RaduBerinde This change renames AllColumnCount to ColumnCount. In #51400 I meant to rename before merging but I forgot. Release note: None Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com> Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
NOTE I used
AllColumnCountso that the compiler forces me to look at all callsites, and left it like this so they are also apparent in the diff. I intend to rename it toColumnCountbefore merging.The catalog currently relies on a particular ordering to separate
different kinds of columns (ordinary, write-only, delete-only). This
is unfortunate because it restricts the underlying implementation; in
particular, it makes it tricky to add a new kind of column (like the
mvcc_timestampsystem column) without potentially breaking existingcode in subtle ways.
This change reworks the cat API to have a single
ColumnCountmethodand a separate
ColumnKindmethod used to figure out the column kind.This forces all code that cares about the kind to check it explicitly
rather than relying on a potentially faulty column count.
Fixes #51323.
Release note: None