
sql/schemachanger: enable create index by default #92128

Merged
craig[bot] merged 16 commits into cockroachdb:master from fqazi:addIdxFinal
Nov 30, 2022

Conversation

@fqazi
Collaborator

@fqazi fqazi commented Nov 18, 2022

Epic: CRDB-19164

This patch enables CREATE INDEX by default inside the declarative schema changer by fixing the following:

  1. Fix validation of supported types for inverted indexes
  2. Implement full inverted index support
  3. Reuse and properly create columns for index expressions
  4. Add support for partial indexes
  5. Add support for partitioned indexes
  6. Address some hash-sharded index bugs related to wrong column types. Support is still disabled by default, since we need constraint support

@cockroach-teamcity
Member

This change is Reviewable

@fqazi fqazi force-pushed the addIdxFinal branch 3 times, most recently from c5adf05 to fd862b9 on November 18, 2022 17:52
@fqazi fqazi marked this pull request as ready for review November 18, 2022 20:35
@fqazi fqazi requested a review from a team November 18, 2022 20:35
@fqazi fqazi requested a review from a team as a code owner November 18, 2022 20:35
@fqazi fqazi marked this pull request as draft November 18, 2022 20:35
@fqazi fqazi force-pushed the addIdxFinal branch 2 times, most recently from 2f97e25 to 19b7594 on November 19, 2022 02:31
@fqazi fqazi marked this pull request as ready for review November 19, 2022 02:47
@fqazi fqazi force-pushed the addIdxFinal branch 2 times, most recently from 4c8f385 to 23ab71b on November 20, 2022 03:06
@fqazi fqazi requested a review from a team as a code owner November 20, 2022 03:06
@fqazi fqazi requested review from msbutler and removed request for a team November 20, 2022 03:06
func newUndefinedOpclassError(opclass tree.Name) error {
	return pgerror.Newf(pgcode.UndefinedObject, "operator class %q does not exist", opclass)
}

Contributor

nit: add a simple comment here like "inaccessible and system columns cannot be referenced in an index"

Question: What about hidden columns, virtual columns, and expression columns? I vaguely remember they can be referenced in an index, right?

Collaborator Author

Yes, hidden, virtual, and stored expression columns can be referenced. Updated to add a comment.

}
}
}

Contributor

nit: similarly, can you add a comment to explain, at a high level, what this function does?

Contributor

Question: Is this logic copied/mimicked from somewhere else?

Contributor

it should be :) it's mostly from create_index.go of the legacy schema changer, for example:

"operator classes are only allowed for the last column of an inverted index")

Collaborator Author

Added the comment:

// checkColumnAccessibilityForIndex validates that any columns explicitly referenced in an index, either as a key or for storage, are accessible and not system columns.
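As a standalone sketch of the rule in that comment (a minimal illustration with a hypothetical `column` type, not the actual crdb implementation, which works on descriptor elements):

```go
package main

import "fmt"

// column is a hypothetical, simplified stand-in for a table column
// descriptor; the real element carries many more fields.
type column struct {
	Name           string
	IsSystem       bool
	IsInaccessible bool
}

// checkColumnAccessibilityForIndex mirrors the rule above: columns
// explicitly referenced by an index (as a key or for storage) must be
// accessible and must not be system columns.
func checkColumnAccessibilityForIndex(cols []column) error {
	for _, c := range cols {
		if c.IsSystem {
			return fmt.Errorf("cannot use system column %q in an index", c.Name)
		}
		if c.IsInaccessible {
			return fmt.Errorf("cannot use inaccessible column %q in an index", c.Name)
		}
	}
	return nil
}

func main() {
	// An ordinary column passes; a system column is rejected.
	fmt.Println(checkColumnAccessibilityForIndex([]column{{Name: "a"}}))
	fmt.Println(checkColumnAccessibilityForIndex([]column{{Name: "mvcc_ts", IsSystem: true}}))
}
```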

Collaborator Author

Question: Is this logic copied/mimicked from somewhere else?

Yes, this is taken from checkIndexColumns.

}
}

func newUndefinedOpclassError(opclass tree.Name) error {
Contributor

Question: what is an operator class?

Contributor

It's mostly for inverted indexes: https://www.postgresql.org/docs/current/gin-builtin-opclasses.html
But it looks like only geometry and trigram are taken care of in crdb:

case types.ArrayFamily:

Collaborator Author

@Xiang-Gu https://www.postgresql.org/docs/current/indexes-opclass.html, it pretty much determines how values are compared/stored for a given index. I'll add comments in the areas where this is checked:

// OpClass in Cockroach is used to determine how values will be compared/stored within the index,
// similar to Postgres.
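One of the checks quoted earlier in this thread ("operator classes are only allowed for the last column of an inverted index") can be sketched like this (hypothetical `indexColumn` type, not crdb code):

```go
package main

import "fmt"

// indexColumn is a simplified, hypothetical representation of one
// column in a CREATE INDEX statement.
type indexColumn struct {
	Name    string
	OpClass string // empty means no explicit operator class
}

// validateOpClasses enforces the rule quoted from create_index.go: an
// explicit operator class is only accepted on the last column, and
// only when the index is inverted.
func validateOpClasses(cols []indexColumn, inverted bool) error {
	for i, c := range cols {
		if c.OpClass == "" {
			continue
		}
		if !inverted || i != len(cols)-1 {
			return fmt.Errorf(
				"operator classes are only allowed for the last column of an inverted index")
		}
	}
	return nil
}

func main() {
	// Opclass on the last column of an inverted index: allowed.
	cols := []indexColumn{{Name: "j"}, {Name: "body", OpClass: "gin_trgm_ops"}}
	fmt.Println(validateOpClasses(cols, true))
	// Same columns on a non-inverted index: rejected.
	fmt.Println(validateOpClasses(cols, false))
}
```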

Comment on lines +417 to +430
// Check if the column name was duplicated from the STORING clause, in which
// case this isn't allowed.
// Note: The column IDs for the key suffix columns are not resolved, so we couldn't
// do this earlier.
scpb.ForEachColumnName(relationElements, func(current scpb.Status, target scpb.TargetStatus, cn *scpb.ColumnName) {
if cn.ColumnID == e.ColumnID &&
cn.TableID == e.TableID {
if _, found := columnRefs[cn.Name]; found {
panic(pgerror.Newf(pgcode.InvalidObjectDefinition,
"index %q already contains column %q", n.Name, cn.Name))
}
columnRefs[cn.Name] = struct{}{}
}
})
Contributor

  1. Why do we have to check "whether there is duplicate STORING columns" here, but not earlier? I didn't understand the comment "Note: The column IDs for the key suffix columns are not resolved, so we couldn't do this earlier."

  2. You are computing the name of column e, right? If so, how is this related to STORING columns in the to-be-added index as e is a key suffix column in the to-be-added index?

  3. I think you can just call function mustRetrieveColumnNameElem to get the name of a column by specifying table ID and column ID.

Collaborator Author

  1. The key suffix columns are derived from the primary key: pretty much any column that isn't contained in the key columns of the index. To make this clearer I'll change this comment to:
    // Note: The column IDs for the key suffix columns are derived by finding columns in the primary index, which
    // are not covered by the key columns within the index. We checked the key columns against the storing ones
    // earlier, so this covers any extra columns.

  2. If the column is already in the key, either implicitly via the suffix or directly via the referenced key columns, then it makes no sense for it to be stored again. So this check is pretty much saying the set of storing columns and key suffix columns don't overlap.

  3. TIL, let me replace these
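The derivation described in point 1 and the overlap check in point 2 can be sketched with plain string sets (an illustration only; the real crdb logic operates on descriptor elements and column IDs, and the helper names here are hypothetical):

```go
package main

import "fmt"

// keySuffixColumns derives the key suffix of a secondary index: the
// primary-key columns that are not already among the index's key
// columns.
func keySuffixColumns(primaryKey, indexKey []string) []string {
	inKey := map[string]struct{}{}
	for _, c := range indexKey {
		inKey[c] = struct{}{}
	}
	var suffix []string
	for _, c := range primaryKey {
		if _, ok := inKey[c]; !ok {
			suffix = append(suffix, c)
		}
	}
	return suffix
}

// checkStoringOverlap rejects STORING columns that are already in the
// key, either directly or implicitly via the key suffix.
func checkStoringOverlap(storing, indexKey, suffix []string) error {
	covered := map[string]struct{}{}
	for _, c := range append(append([]string{}, indexKey...), suffix...) {
		covered[c] = struct{}{}
	}
	for _, c := range storing {
		if _, ok := covered[c]; ok {
			return fmt.Errorf("index already contains column %q", c)
		}
	}
	return nil
}

func main() {
	// Primary key (a, b), index key (c, a): the suffix is [b].
	suffix := keySuffixColumns([]string{"a", "b"}, []string{"c", "a"})
	fmt.Println(suffix) // [b]
	// STORING b is rejected, because b is implicitly in the key suffix.
	fmt.Println(checkStoringOverlap([]string{"b"}, []string{"c", "a"}, suffix))
}
```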

Comment on lines +356 to +517
if columnNode.Expr == nil {
colName := columnNode.Column.Normalize()
if _, found := columnRefs[colName]; found {
panic(pgerror.Newf(pgcode.InvalidObjectDefinition,
"index %q contains duplicate column %q", n.Name, colName))
}
columnRefs[colName] = struct{}{}
} else {
colExpr := columnNode.Expr.String()
if _, found := columnExprRefs[colExpr]; found {
panic(pgerror.Newf(pgcode.InvalidObjectDefinition,
"index %q contains duplicate expression", n.Name))
}
columnExprRefs[colExpr] = struct{}{}
Contributor

can you educate me on the bad cases we try to avoid here by giving two examples, one for each if branch? Thank you.

e.g.

  • duplicate column: CREATE INDEX idx ON t(i, i)?
  • duplicate expression: CREATE INDEX idx ON t(lower(i), j, lower(i))?

Collaborator Author

Yes, those are the two cases: pretty much either the same column referenced twice or the same expression referenced twice, since neither makes sense. I'll add a comment:

// Detect if either the same column or expression repeats twice in this
// index definition, e.g.:
// 1) CREATE INDEX idx ON t(i, i) or
// 2) CREATE INDEX idx ON t(lower(i), j, lower(i)).
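Distilled to its core, the quoted check is two set-membership tests, one keyed on normalized column names and one on serialized expression text (hypothetical standalone sketch, not the crdb function):

```go
package main

import "fmt"

// checkDuplicates rejects an index definition where the same column or
// the same expression appears twice, e.g.
//   CREATE INDEX idx ON t(i, i)
//   CREATE INDEX idx ON t(lower(i), j, lower(i))
// Columns are compared by normalized name, expressions by their
// serialized text, as in the quoted snippet.
func checkDuplicates(columns, exprs []string) error {
	seenCols := map[string]struct{}{}
	for _, c := range columns {
		if _, ok := seenCols[c]; ok {
			return fmt.Errorf("index contains duplicate column %q", c)
		}
		seenCols[c] = struct{}{}
	}
	seenExprs := map[string]struct{}{}
	for _, e := range exprs {
		if _, ok := seenExprs[e]; ok {
			return fmt.Errorf("index contains duplicate expression")
		}
		seenExprs[e] = struct{}{}
	}
	return nil
}

func main() {
	fmt.Println(checkDuplicates([]string{"i", "i"}, nil))
	fmt.Println(checkDuplicates(nil, []string{"lower(i)", "lower(i)"}))
	fmt.Println(checkDuplicates([]string{"i", "j"}, []string{"lower(i)"}))
}
```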

Comment on lines +676 to +860
// Mutate the accessibility flag on this column; it should be inaccessible.
{
ers := b.ResolveColumn(tbl.TableID, d.Name, ResolveParams{
RequiredPrivilege: privilege.CREATE,
})
_, _, col := scpb.FindColumn(ers)
col.IsInaccessible = true
}
Contributor

Why? Do you mean a virtual column is always inaccessible?

Collaborator Author

Yes, any new virtual columns added by CREATE INDEX need to be inaccessible; this is the current behaviour. Just being hidden isn't sufficient.

Collaborator Author

I'll add a comment to make it clearer:

// When a virtual column for an index expression gets added for CREATE INDEX it must be inaccessible,
// so we will manipulate the newly added element to be inaccessible after going through the normal add
// column code path.

Comment on lines +545 to +564
// TODO(fqazi): this doesn't work when referencing newly added columns.
_, err := schemaexpr.ValidatePartialIndexPredicate(
b.ctx,
b.descCache[tableID].desc.(catalog.TableDescriptor),
expr,
&tn,
b.semaCtx)
if err != nil {
panic(err)
}
Contributor

I encountered exactly the same issue when implementing ADD CONSTRAINT CHECK, where I also needed to validate the check expression in the builder. What I did was refactor the code in schemaexpr.DequalifyAndValidateExpr so that the caller provides the columns, which circumvents this limitation.

You will benefit from that refactoring as well, but this PR does not need to be blocked by it.

fqazi added 13 commits November 29, 2022 21:32
…expression reuse

Previously, the declarative schema changer supported creating indexes
with expressions but had no support for reusing virtual columns. Additionally,
there were other bugs in this expression logic, like accessibility flags
and lack of type checking. This patch addresses all of those concerns.

Release note: None
…decomposition

To help support partitioning, the decomposition needs to be able to
track the kind of inverted index column (trigram vs. default) and
which columns are implicitly present in an index.

Release note: None
Previously, inverted indexes were not properly supported for non-geo
types, and we would fall back to the legacy schema changer. This patch
enables them in the declarative schema changer when non-geo types are
used.

Release note: None
Previously, the declarative schema changer did not support
CREATE INDEX statements with partitions. This patch adds
support for them.

Release note: None
Previously, the CREATE INDEX statement in declarative
schema changer did not support partial indexes. This
patch adds support for them.

Release note: None
Previously, the declarative schema changer did not support
geometry/geographic types in CREATE INDEX. This patch
adds support for them including geoconfig parameters
during CREATE INDEX.

Release note: None
Previously, our hash-sharded index implementation had bugs
which would prevent us from enabling CREATE INDEX by default.
This patch disables support for CREATE INDEX in the declarative
schema changer if hash-sharded indexes are detected. It also fixes
the types for the columns used, so that it behaves correctly
once other pieces are in place to support it.

Release note: None
Previously, the declarative schema changer did not correctly
do the following for CREATE INDEX:
1) Correctly return execution errors from validation.
2) Generate event log entries for CREATE INDEX.
3) Ensure that index names are assigned before validation, so
   proper errors are returned.
4) Enable CREATE INDEX support by default.

Release note: None
Previously, the CREATE INDEX logic inside the declarative
schema changer did not implement telemetry. This patch
adds telemetry counters for different variants of
the CREATE INDEX statement.

Release note: None
Previously, there were scenarios where the declarative schema changer,
while validating expressions, could run into cases where types were not
fully hydrated. This patch addresses one such case by forcefully
hydrating a type when parsing/evaluating constants in expressions.

Release note: None
Previously, CREATE INDEX ran by default inside the legacy schema changer;
these changes enable it inside the declarative one.

Release note: None
…tests

Some tests exercise low-level functionality of the legacy schema
changer's CREATE INDEX hooks. We will temporarily disable the
declarative schema changer in these tests. Additionally, update
logic test files for new expected output from the declarative
schema changer.

Release note: None
Previously, invalid partition configurations would
generate generic unexpected errors. This patch properly
maps these errors to the InvalidObjectDefinition pgcode.

Release note: None
@fqazi
Collaborator Author

fqazi commented Nov 30, 2022

The only failure observed is a known flake inside: TestEvictionTokenCoalesce.

bors r+

@craig
Contributor

craig bot commented Nov 30, 2022

Build succeeded:

@craig craig bot merged commit 7823e2f into cockroachdb:master Nov 30, 2022
rafiss added a commit to rafiss/cockroach that referenced this pull request Mar 26, 2025
When cockroachdb#73861 got merged, we seem to have accidentally made it so the index
backfiller will never select a new read timestamp. That is contrary to
the goals of cockroachdb#63945, where we initially added the ability to advance the
read timestamp forward.

This mistake is why we were seeing behavior where a GC threshold error
would cause the backfill to retry infinitely (which in turn led us to treat
as a permanent error in cockroachdb#139203, which was something we never should have
done). We never noticed the bug because when CREATE INDEX was added to the
declarative schema changer in cockroachdb#92128, we turned off the declarative schema
changer in the test that would have caught this (i.e. TestIndexBackfillAfterGC).

This commit makes it so we choose a current timestamp as the read
timestamp when planning the backfill, and fixes up that test to show
that this works for index backfills in the declarative schema changer.

Release note (bug fix): Fixed a bug where a GC threshold error (which
appears as "batch timestamp must be after replica GC threshold ...")
could cause a schema change that backfills data to fail. Now, the error
will cause the backfill to be retried at a higher timestamp to avoid the
error.
craig bot pushed a commit that referenced this pull request Mar 26, 2025
143270: kvserver: better obs in TestTxnReadWithinUncertaintyIntervalAfterRangeMerge r=tbg a=tbg

Closes #143260.

This is a complex test with a rare failure mode. Trace all of the relevant operations so that we can meaningfully engage with it.

Epic: none
Release note: none

143451: sql,backfill: avoid holding a protected timestamp during the backfill r=rafiss a=rafiss

### sql,rowexec: use WriteTS as timestamp for AddSSTable in backfill

PR #64023 made it so we use the backfill's read TS as the batch TS for
the AddSSTable operation. Simultaneously with that change, #63945 was
being worked on, which made it so that we use a fixed _write_ timestamp
for the keys that are backfilled. This write timestamp is the actual
minimum lower bound timestamp for the backfill operation, and the read
timestamp is allowed to change, so it makes more sense to use the write
timestamp for the AddSSTable operation. Looking at the diffs in the PRs
makes it seem pretty likely that this was just missed as part of a
trivial branch skew.

This commit does not cause any behavior change, since at the time of
writing, the read timestamp and write timestamp are always identical for
the backfill.

Release note: None

### sql,backfill: choose new read timestamp when backfill job is resumed

When #73861 got merged, we seem to have accidentally made it so the index
backfiller will never select a new read timestamp. That is contrary to
the goals of #63945, where we initially added the ability to advance the
read timestamp forward.

This mistake is why we were seeing behavior where a GC threshold error
would cause the backfill to retry infinitely (which in turn led us to treat
as a permanent error in #139203, which was something we never should have
done). We never noticed the bug because when CREATE INDEX was added to the
declarative schema changer in #92128, we turned off the declarative schema
changer in the test that would have caught this (i.e. TestIndexBackfillAfterGC).

This commit makes it so we choose a current timestamp as the read
timestamp when planning the backfill, and fixes up that test to show
that this works for index backfills in the declarative schema changer.

Release note (bug fix): Fixed a bug where a GC threshold error (which
appears as "batch timestamp must be after replica GC threshold ...")
could cause a schema change that backfills data to fail. Now, the error
will cause the backfill to be retried at a higher timestamp to avoid the
error.

### sql, backfill: use a current timestamp to scan each chunk of the source index

There's no need for the backfill to use a fixed read timestamp for the
entire job. Instead, each chunk of the original index that needs to be
scanned can use a current timestamp. This allows us to remove all the
logic for adding protected timestamps for the backfill.

Release note (performance improvement): Schema changes that require
data to be backfilled no longer hold a protected timestamp for the
entire duration of the backfill, which means there is less overhead
caused by MVCC garbage collection after the backfill completes.

### sql: stop setting readAsOf in index backfiller

As of the parent commit, this is unused by the index backfiller.

Release note: None

---

fixes #140629
informs #142339
informs #142117
informs #141773

Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>