Skip to content

schemachanger: add OFFLINE as intermediate state when dropping descriptors#83915

Merged
craig[bot] merged 6 commits intocockroachdb:masterfrom
postamar:offline-in-drop
Jul 12, 2022
Merged

schemachanger: add OFFLINE as intermediate state when dropping descriptors#83915
craig[bot] merged 6 commits intocockroachdb:masterfrom
postamar:offline-in-drop

Conversation

@postamar
Copy link
Copy Markdown

@postamar postamar commented Jul 6, 2022

The addition of a revertible transition makes it possible to combine
descriptor drops with other schema changes in a way that preserves their
revertibility while immediately making the dropped descriptor no longer
public.

Release note (sql change): DROP statements performed by the declarative
schema changer (which is the case by default) now transition the
descriptor states to OFFLINE in the initial schema change transaction,
before subsequently transitioning them to DROP in a subsequent
transaction executed by the schema change job. The behaviour of
changefeeds on dropped tables will be correspondingly affected.
Previously they'd throw an error about the table being dropped.
They may now do either that or throw an error about the table being
taken offline. Furthermore, it's possible for a concurrent backup
to see the table as OFFLINE before it reaches DROP and this may cause
the offline table to be included in the data. We don't expect the impact
of these changes to be significant, but they may nonetheless be noticeable.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this worked out shockingly well. We should make sure that backups don't get sad about it.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@postamar
Copy link
Copy Markdown
Author

postamar commented Jul 7, 2022

There were, in fact, some wonky non-trivial bugs related to backups not caring enough about non-public type descriptors and the offline state in general.

@postamar postamar force-pushed the offline-in-drop branch 2 times, most recently from 679a9ca to 62f383e Compare July 7, 2022 13:34
@ajwerner
Copy link
Copy Markdown
Contributor

ajwerner commented Jul 7, 2022

One issue is that backups do want to backup OFFLINE descriptors, at least, I think they do. Consider the case of an IMPORT. It takes a descriptor offline and then when it's done, puts it back online. I think we want to back up these tables too, right?

It feels like in some cases, if we were more sophisticated, we could move the descriptors from OFFLINE to DROPPED in PreCommit if at that point it were inevitable.

@postamar
Copy link
Copy Markdown
Author

postamar commented Jul 7, 2022

One issue is that backups do want to backup OFFLINE descriptors, at least, I think they do.

I think so too, which is why I relaxed the filtering in the backup targeting to include more OFFLINE descriptor than it previously was. Previous behaviour was also slightly erroneous irrespective of current changes in the declarative schema changer: an OFFLINE table would be included in a BACKUP DATABASE but not the parent OFFLINE schema.

I'll make sure this PR gets reviewed by Bulk-IO as these changes also have an impact on changefeeds, which now can error complaining about a table being offline instead of dropped.

@ajwerner
Copy link
Copy Markdown
Contributor

ajwerner commented Jul 7, 2022

an OFFLINE table would be included in a BACKUP DATABASE but not the parent OFFLINE schema.

Ah, I suspect we've never had an OFFLINE schema before

@postamar
Copy link
Copy Markdown
Author

postamar commented Jul 7, 2022

Indeed. Also I don't think anyone could have noticed anything before the public schema got an explicit descriptor.

@postamar
Copy link
Copy Markdown
Author

postamar commented Jul 7, 2022

This is ready for review now.

@postamar postamar marked this pull request as ready for review July 7, 2022 16:43
@postamar postamar requested a review from a team July 7, 2022 16:43
@postamar postamar requested a review from a team as a code owner July 7, 2022 16:43
@postamar postamar requested review from a team, dt and miretskiy and removed request for a team July 7, 2022 16:43
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- this proved to be important for some changes I needed to make for DROP COLUMN. I'd prefer we merge it sooner rather than later.

// hydrated where and test the API boundary accordingly.
if table, isTable := desc.(catalog.TableDescriptor); isTable {
desc, err = tc.hydrateTypesInTableDesc(ctx, txn, table)
desc, err = tc.hydrateTypesInTableDescWitOptions(ctx, txn, table, flags.IncludeOffline, flags.AvoidLeased)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this typo on master, so this needs a rebase, otherwise it'll skew and to adopt the fix, otherwise it'll skew

@postamar
Copy link
Copy Markdown
Author

In a few hours I'll push a commit which addresses #84137 on top of this branch (and rebase).

@postamar postamar requested a review from a team July 11, 2022 14:01
@postamar
Copy link
Copy Markdown
Author

In a few hours I'll push a commit which addresses #84137 on top of this branch (and rebase).

The opposite is true, I actually need to rebase this branch on top of another. This should not hinder any reviews.

craig bot pushed a commit that referenced this pull request Jul 12, 2022
83646: streamingccl: create new tenant automatically for replication stream r=samiskin a=samiskin

Resolves #76952

Originally tenants had to be created manually prior to replication
stream creation.  This change instead creates a new tenant upon calling
the RESTORE INTO command using the tenant ID that was provided.  The
tenant is considered inactive until the cutover point, at which the
tenant is activated.

Release note: None

83677: server: return non-live, live info per table r=maryliag a=maryliag

This commit adds 3 new parameter to the table
details endpoint:
- dataTotalBytes
- dataNonLiveBytes
- dataNonLivePercentage

Partially addresses #82617

Release note (api change): Add information about total bytes,
non live (MVCC) bytes and non live (MVCC) percentage to
Table Details endpoint.

84117: prometheus: fix chaining in WithWorkload r=srosenberg a=tbg

Release note: None


84189: sql: fix dropping offline objects r=postamar a=postamar

Fixes #84137.

I intend to base #83915 on this PR, backport the first two commits to 22.1, and backport only the first commit to 21.2.

Co-authored-by: Shiranka Miskin <shiranka.miskin@gmail.com>
Co-authored-by: Marylia Gutierrez <marylia@cockroachlabs.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: Marius Posta <marius@cockroachlabs.com>
Marius Posta added 5 commits July 12, 2022 05:57
This commit addresses two issues:
1. Until now, type descriptors were never properly filtered. It was
   therefore possible to include some with a DROP state in a backup.
2. OFFLINE schema descriptors were never included in targeted backups.

These issues were of little importance until now: we want to
use the OFFLINE state in the declarative schema changer.

Release note: None
These bugs became apparent while introducing the OFFLINE state in the
declarative schema changer DROP path.

Release note: None
This patch fixes an instance where we ignored offline descriptors by
mistake.

Release note: None
The addition of a revertible transition makes it possible to combine
descriptor drops with other schema changes in a way that preserves their
revertibility while immediately making the dropped descriptor no longer
public.

Release note (sql change): DROP statements performed by the declarative
schema changer (which is the case by default) now transition the
descriptor states to OFFLINE in the initial schema change transaction,
before subsequently transitioning them to DROP in a subsequent
transaction executed by the schema change job. The behaviour of
changefeeds on dropped tables will be correspondingly affected.
Previously they'd throw an error about the table being dropped.
They may now do either that or throw an error about the table being
taken offline. Furthermore, it's possible for a concurrent backup
to see the table as OFFLINE before it reaches DROP and this may cause
the offline table to be included in the data. We don't expect the impact
of these changes to be significant, but they may nonetheless be noticeable.
The default timeout was sometimes causing CI flakes.
The recent changes to DROP behaviour caused more cases to be run
in this test suite.

This change can (and should) be reverted once we're able to parallelize
test runs better in CI.

Release note: None
@postamar
Copy link
Copy Markdown
Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 12, 2022

Build succeeded:

@craig craig bot merged commit a69de1e into cockroachdb:master Jul 12, 2022
msbutler added a commit to msbutler/cockroach that referenced this pull request Aug 11, 2022
…flag set

Historically, offline tables were excluded from backups (e.g. restoring or
importing tables) because incremental backups could not reason correctly about
their MVCC history. Because all within tenant operations are now MVCC, BACKUP
can begin incrementally backing up offline tables.

Independent of the MVCC history thread, cockroachdb#83915 allowed backup to implicitly
target _all_ offline descriptors (e.g. database, schema, table, type) to
account for a class of compound online schema changes. Note with this PR, the
user can't _explicitly_ target an offline descriptor (e.g. BACKUP DATABASE
my_offline_database).

This patch reverts part of cockroachdb#82915 such that BACKUP targets offline tables
iff it contains  an in-progress import and when an external flag is set. Since
the class of online schema changes described in cockroachdb#82915 will never occur in
22.2, there isn't actually a need to back up a larget set of offline tables.

This patch merely refactors the internal targeting logic in
backupresolver.ResolveTargetsToDescriptors, and does not change how backups
behave. In other words, after this patch, no offline descriptors will get
targeted, as all callers set the function's backupImports flag to false. A
future PR will change these callsites.

Informs cockroachdb#76722

Release note: none
msbutler added a commit to msbutler/cockroach that referenced this pull request Aug 16, 2022
Historically, offline tables were excluded from backups (e.g. restoring or
importing tables) because incremental backups could not reason correctly about
their MVCC history. Now that all within tenant operations are now MVCC, BACKUP
can begin incrementally backing up offline tables.

Independent of the MVCC history thread, cockroachdb#83915 allowed backup to implicitly
target _all_ offline descriptors (e.g. database, schema, table, type) to
account for a class of compound online schema changes. This is a regression
from the expected backup behavior and will get reverted in future work.

This patch ensures that offline tables with importing data are not targetted
iff an external flag is set.

This patch merely refactors the internal targeting logic in
backupresolver.ResolveTargetsToDescriptors, and does not change how backups
behave. In other words, after this patch, no offline descriptors will get
targeted, as all callers set the function's backupImports flag to false. A
future PR will change these callsites.

Informs cockroachdb#76722

Release note: none

Release note (<category, see below>): <what> <show> <why>
msbutler added a commit to msbutler/cockroach that referenced this pull request Aug 19, 2022
A previous PR (cockroachdb#83915) enabled BACKUP to target offline descriptors
(including IMPORTing and RESTOREing tables), but didn't fully handle their
rollback on RESTORE. This PR ensures RESTORE properly handles these offline
tables:
- An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The
  table does not exist in the restoring cluster.
- An offline table created by an IMPORT INTO has all importing data elided in
  the restore processor and is restored online to its pre import state.

If the RESTORE occurs AOST after the table returns online, the table is
RESTOREd to it's post offline state.

Currently, all tables that return online are fully re-backed up. For many
situations above, this is unecessary. Future work should ensure that these
tables are not re-backed up when we know all operations within the
incremental backup interval on the table were MVCC.

Release note (sql change): offline tables with in-progress IMPORT INTO imports
(i.e. tables that existed before the import job began) now get
backed up once the cluster has upgraded to v22.2. During a cluster/database
Restore that executes AOST during the in-progress import, all data from the
import is elided and leaves the table in its pre-import state (online, with
pre-import data, if any). If the RESTORE executes AOST after the import
completes, all the imported data ingests into the restored data.

This patch only applies to database and cluster BACKUP and RESTORE, and a
future PR should allow a user to BACKUP/RESTORE an existing offline table
undergoing an IMPORT.

Release justification: fixes bug where offline tables were not properly handled
in restore.
msbutler added a commit to msbutler/cockroach that referenced this pull request Aug 22, 2022
A previous PR (cockroachdb#83915) enabled BACKUP to target offline descriptors
(including IMPORTing and RESTOREing tables), but didn't fully handle their
rollback on RESTORE. This PR ensures RESTORE properly handles these offline
tables:
- An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The
  table does not exist in the restoring cluster.
- An offline table created by an IMPORT INTO has all importing data elided in
  the restore processor and is restored online to its pre import state.

If the RESTORE occurs AOST after the table returns online, the table is
RESTOREd to it's post offline state.

Currently, all tables that return online are fully re-backed up. For many
situations above, this is unecessary. Future work should ensure that these
tables are not re-backed up when we know all operations within the
incremental backup interval on the table were MVCC.

Release note (sql change): offline tables with in-progress IMPORT INTO imports
(i.e. tables that existed before the import job began) now get
backed up once the cluster has upgraded to v22.2. During a cluster/database
Restore that executes AOST during the in-progress import, all data from the
import is elided and leaves the table in its pre-import state (online, with
pre-import data, if any). If the RESTORE executes AOST after the import
completes, all the imported data ingests into the restored data.

This patch only applies to database and cluster BACKUP and RESTORE, and a
future PR should allow a user to BACKUP/RESTORE an existing offline table
undergoing an IMPORT.

Release justification: fixes bug where offline tables were not properly handled
in restore.
msbutler added a commit to msbutler/cockroach that referenced this pull request Aug 24, 2022
A previous PR (cockroachdb#83915) enabled BACKUP to target offline descriptors
(including IMPORTing and RESTOREing tables), but didn't fully handle their
rollback on RESTORE. This PR ensures RESTORE properly handles these offline
tables:
- An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The
  table does not exist in the restoring cluster.
- An offline table created by an IMPORT INTO has all importing data elided in
  the restore processor and is restored online to its pre import state.

If the RESTORE occurs AOST after the table returns online, the table is
RESTOREd to it's post offline state.

Currently, all tables that return online are fully re-backed up. For many
situations above, this is unecessary. Future work should ensure that these
tables are not re-backed up when we know all operations within the
incremental backup interval on the table were MVCC.

Release note (sql change): offline tables with in-progress IMPORT INTO imports
(i.e. tables that existed before the import job began) now get
backed up once the cluster has upgraded to v22.2. During a cluster/database
Restore that executes AOST during the in-progress import, all data from the
import is elided and leaves the table in its pre-import state (online, with
pre-import data, if any). If the RESTORE executes AOST after the import
completes, all the imported data ingests into the restored data.

This patch only applies to database and cluster BACKUP and RESTORE, and a
future PR should allow a user to BACKUP/RESTORE an existing offline table
undergoing an IMPORT.

Release justification: fixes bug where offline tables were not properly handled
in restore.
msbutler added a commit to msbutler/cockroach that referenced this pull request Aug 26, 2022
A previous PR (cockroachdb#83915) enabled BACKUP to target offline descriptors
(including IMPORTing and RESTOREing tables), but didn't fully handle their
rollback on RESTORE. This PR ensures RESTORE properly handles these offline
tables:
- An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The
  table does not exist in the restoring cluster.
- An offline table created by an IMPORT INTO has all importing data elided in
  the restore processor and is restored online to its pre import state.

If the RESTORE occurs AOST after the table returns online, the table is
RESTOREd to it's post offline state.

Currently, all tables that return online are fully re-backed up. For many
situations above, this is unecessary. Future work should ensure that these
tables are not re-backed up when we know all operations within the
incremental backup interval on the table were MVCC.

Release note (sql change): offline tables with in-progress IMPORT INTO imports
(i.e. tables that existed before the import job began) now get
backed up once the cluster has upgraded to v22.2. During a cluster/database
Restore that executes AOST during the in-progress import, all data from the
import is elided and leaves the table in its pre-import state (online, with
pre-import data, if any). If the RESTORE executes AOST after the import
completes, all the imported data ingests into the restored data.

This patch only applies to database and cluster BACKUP and RESTORE, and a
future PR should allow a user to BACKUP/RESTORE an existing offline table
undergoing an IMPORT.

Release justification: fixes bug where offline tables were not properly handled
in restore.
msbutler added a commit to msbutler/cockroach that referenced this pull request Aug 30, 2022
A previous PR (cockroachdb#83915) enabled BACKUP to target offline descriptors
(including IMPORTing and RESTOREing tables), but didn't fully handle their
rollback on RESTORE. This PR ensures RESTORE properly handles these offline
tables:
- An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The
  table does not exist in the restoring cluster.
- An offline table created by an IMPORT INTO has all importing data elided in
  the restore processor and is restored online to its pre import state.

If the RESTORE occurs AOST after the table returns online, the table is
RESTOREd to it's post offline state.

Currently, all tables that return online are fully re-backed up. For many
situations above, this is unecessary. Future work should ensure that these
tables are not re-backed up when we know all operations within the
incremental backup interval on the table were MVCC.

This patch only applies to BACKUP and RESTORE flavors that implicitly target an
offline importing/restoring table. A future PR should allow a user to
explicitly target an existing offline table undergoing an IMPORT.

Release justification: fixes bug where offline tables were not properly handled
in restore.
msbutler added a commit to msbutler/cockroach that referenced this pull request Aug 30, 2022
A previous PR (cockroachdb#83915) enabled BACKUP to target offline descriptors
(including IMPORTing and RESTOREing tables), but didn't fully handle their
rollback on RESTORE. This PR ensures RESTORE properly handles these offline
tables:
- An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The
  table does not exist in the restoring cluster.
- An offline table created by an IMPORT INTO has all importing data elided in
  the restore processor and is restored online to its pre import state.

If the RESTORE occurs AOST after the table returns online, the table is
RESTOREd to it's post offline state.

Currently, all tables that return online are fully re-backed up. For many
situations above, this is unecessary. Future work should ensure that these
tables are not re-backed up when we know all operations within the
incremental backup interval on the table were MVCC.

This patch only applies to BACKUP and RESTORE flavors that implicitly target an
offline importing/restoring table. A future PR should allow a user to
explicitly target an existing offline table undergoing an IMPORT.

Release justification: fixes bug where offline tables were not properly handled
in restore.
craig bot pushed a commit that referenced this pull request Aug 30, 2022
85920: backupccl: elide data from offline tables during RESTORE r=adityamaru a=msbutler

A previous PR (#83915) enabled BACKUP to target offline descriptors
(including IMPORTing and RESTOREing tables), but didn't fully handle their
rollback on RESTORE. This PR ensures RESTORE properly handles these offline
tables:
- An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The
  table does not exist in the restoring cluster.
- An offline table created by an IMPORT INTO has all importing data elided in
  the restore processor and is restored online to its pre import state.

If the RESTORE occurs AOST after the table returns online, the table is
RESTOREd to its post offline state.

Currently, all tables that return online are fully re-backed up. For many
situations above, this is unecessary. Future work should ensure that these
tables are not re-backed up when we know all operations within the
incremental backup interval on the table were MVCC.

This patch only applies to BACKUP and RESTORE flavors that implicitly target an
offline importing/restoring table. A future PR should allow a user to
explicitly target an existing offline table undergoing an IMPORT.

Release justification: fixes bug where offline tables were not properly handled
in restore.

86826: workload/ycsb: fix --request-distribution flag r=nvanbenschoten a=nvanbenschoten

This change fixes the handling of the `--request-distribution` flag so
that any user overrides are respected. Before, the default request
distribution for a given workload was always used, regardless of the
flag.

This bug dates back to #37804. I fear this has caused a good deal of
confusion in the past, as we often turn to YCSB-A uniform as an example
of an uncontended read/update workload.

Release justification: workload only.

87047: pgwire: fix txn options for prepared BEGIN  r=ZhouXing19 a=rafiss

fixes #87012

Release note (bug fix): The statement tag for SHOW command
results in the pgwire protocol no longer contain the number of returned
rows.

Release note (bug fix): Fixed a bug where the options given to the BEGIN
TRANSACTION command would be ignored if the BEGIN was a prepared
statement.

Release justification: low risk bug fix

87113: ci: run `acceptance` test in `ci` config r=jlinder a=rickystewart

We need this to get detailed info in the `test.xml`.

Release justification: Non-production code changes
Release note: None

87114: ci: fix nightlies that are invoking `bazci` incorrectly r=rail a=rickystewart

Some of these nightlies were broken as a result of
`c8d66a9c334dd87acd62190f314a4f3ec236ce82`.

Release justification: Non-production code changes
Release note: None

87116: bazci: delete unused `testdata` directory r=celiala a=rickystewart

Release justification: Non-production code changes
Release note: None

Co-authored-by: Michael Butler <butler@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants