schemachanger: add OFFLINE as intermediate state when dropping descriptors by postamar · Pull Request #83915 · cockroachdb/cockroach

postamar · 2022-07-06T17:33:14Z

The addition of a revertible transition makes it possible to combine
descriptor drops with other schema changes in a way that preserves their
revertibility while immediately making the dropped descriptor no longer
public.

Release note (sql change): DROP statements performed by the declarative
schema changer (which is the case by default) now transition the
descriptor states to OFFLINE in the initial schema change transaction,
before subsequently transitioning them to DROP in a subsequent
transaction executed by the schema change job. The behaviour of
changefeeds on dropped tables will be correspondingly affected.
Previously they'd throw an error about the table being dropped.
They may now do either that or throw an error about the table being
taken offline. Furthermore, it's possible for a concurrent backup
to see the table as OFFLINE before it reaches DROP and this may cause
the offline table to be included in the data. We don't expect the impact
of these changes to be significant, but they may nonetheless be noticeable.

cockroach-teamcity · 2022-07-06T17:33:26Z

This change is

ajwerner

Well this worked out shockingly well. We should make sure that backups don't get sad about it.

Reviewable status: complete! 0 of 0 LGTMs obtained

postamar · 2022-07-07T02:02:44Z

There were, in fact, some wonky non-trivial bugs related to backups not caring enough about non-public type descriptors and the offline state in general.

ajwerner · 2022-07-07T13:37:29Z

One issue is that backups do want to backup OFFLINE descriptors, at least, I think they do. Consider the case of an IMPORT. It takes a descriptor offline and then when it's done, puts it back online. I think we want to back up these tables too, right?

It feels like in some cases, if we were more sophisticated, we could move the descriptors from OFFLINE to DROPPED in PreCommit if at that point it were inevitable.

postamar · 2022-07-07T13:47:18Z

One issue is that backups do want to backup OFFLINE descriptors, at least, I think they do.

I think so too, which is why I relaxed the filtering in the backup targeting to include more OFFLINE descriptor than it previously was. Previous behaviour was also slightly erroneous irrespective of current changes in the declarative schema changer: an OFFLINE table would be included in a BACKUP DATABASE but not the parent OFFLINE schema.

I'll make sure this PR gets reviewed by Bulk-IO as these changes also have an impact on changefeeds, which now can error complaining about a table being offline instead of dropped.

ajwerner · 2022-07-07T13:51:21Z

an OFFLINE table would be included in a BACKUP DATABASE but not the parent OFFLINE schema.

Ah, I suspect we've never had an OFFLINE schema before

postamar · 2022-07-07T13:54:52Z

Indeed. Also I don't think anyone could have noticed anything before the public schema got an explicit descriptor.

postamar · 2022-07-07T16:43:47Z

This is ready for review now.

ajwerner

LGTM -- this proved to be important for some changes I needed to make for DROP COLUMN. I'd prefer we merge it sooner rather than later.

ajwerner · 2022-07-10T18:32:00Z

pkg/sql/catalog/descs/descriptor.go

 		// hydrated where and test the API boundary accordingly.
 		if table, isTable := desc.(catalog.TableDescriptor); isTable {
-			desc, err = tc.hydrateTypesInTableDesc(ctx, txn, table)
+			desc, err = tc.hydrateTypesInTableDescWitOptions(ctx, txn, table, flags.IncludeOffline, flags.AvoidLeased)


I fixed this typo on master, so this needs a rebase, otherwise it'll skew and to adopt the fix, otherwise it'll skew

postamar · 2022-07-11T13:59:48Z

In a few hours I'll push a commit which addresses #84137 on top of this branch (and rebase).

postamar · 2022-07-11T17:20:05Z

In a few hours I'll push a commit which addresses #84137 on top of this branch (and rebase).

The opposite is true, I actually need to rebase this branch on top of another. This should not hinder any reviews.

83646: streamingccl: create new tenant automatically for replication stream r=samiskin a=samiskin Resolves #76952 Originally tenants had to be created manually prior to replication stream creation. This change instead creates a new tenant upon calling the RESTORE INTO command using the tenant ID that was provided. The tenant is considered inactive until the cutover point, at which the tenant is activated. Release note: None 83677: server: return non-live, live info per table r=maryliag a=maryliag This commit adds 3 new parameter to the table details endpoint: - dataTotalBytes - dataNonLiveBytes - dataNonLivePercentage Partially addresses #82617 Release note (api change): Add information about total bytes, non live (MVCC) bytes and non live (MVCC) percentage to Table Details endpoint. 84117: prometheus: fix chaining in WithWorkload r=srosenberg a=tbg Release note: None 84189: sql: fix dropping offline objects r=postamar a=postamar Fixes #84137. I intend to base #83915 on this PR, backport the first two commits to 22.1, and backport only the first commit to 21.2. Co-authored-by: Shiranka Miskin <shiranka.miskin@gmail.com> Co-authored-by: Marylia Gutierrez <marylia@cockroachlabs.com> Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com> Co-authored-by: Marius Posta <marius@cockroachlabs.com>

This commit addresses two issues: 1. Until now, type descriptors were never properly filtered. It was therefore possible to include some with a DROP state in a backup. 2. OFFLINE schema descriptors were never included in targeted backups. These issues were of little importance until now: we want to use the OFFLINE state in the declarative schema changer. Release note: None

These bugs became apparent while introducing the OFFLINE state in the declarative schema changer DROP path. Release note: None

Release note: None

This patch fixes an instance where we ignored offline descriptors by mistake. Release note: None

The addition of a revertible transition makes it possible to combine descriptor drops with other schema changes in a way that preserves their revertibility while immediately making the dropped descriptor no longer public. Release note (sql change): DROP statements performed by the declarative schema changer (which is the case by default) now transition the descriptor states to OFFLINE in the initial schema change transaction, before subsequently transitioning them to DROP in a subsequent transaction executed by the schema change job. The behaviour of changefeeds on dropped tables will be correspondingly affected. Previously they'd throw an error about the table being dropped. They may now do either that or throw an error about the table being taken offline. Furthermore, it's possible for a concurrent backup to see the table as OFFLINE before it reaches DROP and this may cause the offline table to be included in the data. We don't expect the impact of these changes to be significant, but they may nonetheless be noticeable.

The default timeout was sometimes causing CI flakes. The recent changes to DROP behaviour caused more cases to be run in this test suite. This change can (and should) be reverted once we're able to parallelize test runs better in CI. Release note: None

postamar · 2022-07-12T10:12:50Z

bors r+

craig · 2022-07-12T11:27:54Z

Build succeeded:

GitHub CI (Cockroach)

…flag set Historically, offline tables were excluded from backups (e.g. restoring or importing tables) because incremental backups could not reason correctly about their MVCC history. Because all within tenant operations are now MVCC, BACKUP can begin incrementally backing up offline tables. Independent of the MVCC history thread, cockroachdb#83915 allowed backup to implicitly target _all_ offline descriptors (e.g. database, schema, table, type) to account for a class of compound online schema changes. Note with this PR, the user can't _explicitly_ target an offline descriptor (e.g. BACKUP DATABASE my_offline_database). This patch reverts part of cockroachdb#82915 such that BACKUP targets offline tables iff it contains an in-progress import and when an external flag is set. Since the class of online schema changes described in cockroachdb#82915 will never occur in 22.2, there isn't actually a need to back up a larget set of offline tables. This patch merely refactors the internal targeting logic in backupresolver.ResolveTargetsToDescriptors, and does not change how backups behave. In other words, after this patch, no offline descriptors will get targeted, as all callers set the function's backupImports flag to false. A future PR will change these callsites. Informs cockroachdb#76722 Release note: none

Historically, offline tables were excluded from backups (e.g. restoring or importing tables) because incremental backups could not reason correctly about their MVCC history. Now that all within tenant operations are now MVCC, BACKUP can begin incrementally backing up offline tables. Independent of the MVCC history thread, cockroachdb#83915 allowed backup to implicitly target _all_ offline descriptors (e.g. database, schema, table, type) to account for a class of compound online schema changes. This is a regression from the expected backup behavior and will get reverted in future work. This patch ensures that offline tables with importing data are not targetted iff an external flag is set. This patch merely refactors the internal targeting logic in backupresolver.ResolveTargetsToDescriptors, and does not change how backups behave. In other words, after this patch, no offline descriptors will get targeted, as all callers set the function's backupImports flag to false. A future PR will change these callsites. Informs cockroachdb#76722 Release note: none Release note (<category, see below>): <what> <show> <why>

A previous PR (cockroachdb#83915) enabled BACKUP to target offline descriptors (including IMPORTing and RESTOREing tables), but didn't fully handle their rollback on RESTORE. This PR ensures RESTORE properly handles these offline tables: - An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The table does not exist in the restoring cluster. - An offline table created by an IMPORT INTO has all importing data elided in the restore processor and is restored online to its pre import state. If the RESTORE occurs AOST after the table returns online, the table is RESTOREd to it's post offline state. Currently, all tables that return online are fully re-backed up. For many situations above, this is unecessary. Future work should ensure that these tables are not re-backed up when we know all operations within the incremental backup interval on the table were MVCC. Release note (sql change): offline tables with in-progress IMPORT INTO imports (i.e. tables that existed before the import job began) now get backed up once the cluster has upgraded to v22.2. During a cluster/database Restore that executes AOST during the in-progress import, all data from the import is elided and leaves the table in its pre-import state (online, with pre-import data, if any). If the RESTORE executes AOST after the import completes, all the imported data ingests into the restored data. This patch only applies to database and cluster BACKUP and RESTORE, and a future PR should allow a user to BACKUP/RESTORE an existing offline table undergoing an IMPORT. Release justification: fixes bug where offline tables were not properly handled in restore.

A previous PR (cockroachdb#83915) enabled BACKUP to target offline descriptors (including IMPORTing and RESTOREing tables), but didn't fully handle their rollback on RESTORE. This PR ensures RESTORE properly handles these offline tables: - An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The table does not exist in the restoring cluster. - An offline table created by an IMPORT INTO has all importing data elided in the restore processor and is restored online to its pre import state. If the RESTORE occurs AOST after the table returns online, the table is RESTOREd to it's post offline state. Currently, all tables that return online are fully re-backed up. For many situations above, this is unecessary. Future work should ensure that these tables are not re-backed up when we know all operations within the incremental backup interval on the table were MVCC. This patch only applies to BACKUP and RESTORE flavors that implicitly target an offline importing/restoring table. A future PR should allow a user to explicitly target an existing offline table undergoing an IMPORT. Release justification: fixes bug where offline tables were not properly handled in restore.

85920: backupccl: elide data from offline tables during RESTORE r=adityamaru a=msbutler A previous PR (#83915) enabled BACKUP to target offline descriptors (including IMPORTing and RESTOREing tables), but didn't fully handle their rollback on RESTORE. This PR ensures RESTORE properly handles these offline tables: - An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The table does not exist in the restoring cluster. - An offline table created by an IMPORT INTO has all importing data elided in the restore processor and is restored online to its pre import state. If the RESTORE occurs AOST after the table returns online, the table is RESTOREd to its post offline state. Currently, all tables that return online are fully re-backed up. For many situations above, this is unecessary. Future work should ensure that these tables are not re-backed up when we know all operations within the incremental backup interval on the table were MVCC. This patch only applies to BACKUP and RESTORE flavors that implicitly target an offline importing/restoring table. A future PR should allow a user to explicitly target an existing offline table undergoing an IMPORT. Release justification: fixes bug where offline tables were not properly handled in restore. 86826: workload/ycsb: fix --request-distribution flag r=nvanbenschoten a=nvanbenschoten This change fixes the handling of the `--request-distribution` flag so that any user overrides are respected. Before, the default request distribution for a given workload was always used, regardless of the flag. This bug dates back to #37804. I fear this has caused a good deal of confusion in the past, as we often turn to YCSB-A uniform as an example of an uncontended read/update workload. Release justification: workload only. 87047: pgwire: fix txn options for prepared BEGIN r=ZhouXing19 a=rafiss fixes #87012 Release note (bug fix): The statement tag for SHOW command results in the pgwire protocol no longer contain the number of returned rows. Release note (bug fix): Fixed a bug where the options given to the BEGIN TRANSACTION command would be ignored if the BEGIN was a prepared statement. Release justification: low risk bug fix 87113: ci: run `acceptance` test in `ci` config r=jlinder a=rickystewart We need this to get detailed info in the `test.xml`. Release justification: Non-production code changes Release note: None 87114: ci: fix nightlies that are invoking `bazci` incorrectly r=rail a=rickystewart Some of these nightlies were broken as a result of `c8d66a9c334dd87acd62190f314a4f3ec236ce82`. Release justification: Non-production code changes Release note: None 87116: bazci: delete unused `testdata` directory r=celiala a=rickystewart Release justification: Non-production code changes Release note: None Co-authored-by: Michael Butler <butler@cockroachlabs.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com> Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>

ajwerner reviewed Jul 6, 2022

View reviewed changes

postamar force-pushed the offline-in-drop branch from bac790f to 9d3c764 Compare July 7, 2022 01:57

postamar force-pushed the offline-in-drop branch 2 times, most recently from 679a9ca to 62f383e Compare July 7, 2022 13:34

postamar marked this pull request as ready for review July 7, 2022 16:43

postamar requested a review from a team July 7, 2022 16:43

postamar requested a review from a team as a code owner July 7, 2022 16:43

postamar requested review from a team, dt and miretskiy and removed request for a team July 7, 2022 16:43

ajwerner approved these changes Jul 10, 2022

View reviewed changes

postamar requested a review from a team July 11, 2022 14:01

postamar mentioned this pull request Jul 11, 2022

sql: fix dropping offline objects #84189

Merged

Marius Posta added 5 commits July 12, 2022 05:57

rewrite: fix schema changer state rewrite bugs

c58c791

These bugs became apparent while introducing the OFFLINE state in the declarative schema changer DROP path. Release note: None

scexec: add MarkDescriptorAsPublic and MarkDescriptorAsOffline ops

214c156

Release note: None

descs: fix type hydration bug

fde0594

This patch fixes an instance where we ignored offline descriptors by mistake. Release note: None

postamar force-pushed the offline-in-drop branch from 1e7f861 to 854719d Compare July 12, 2022 10:11

craig bot merged commit a69de1e into cockroachdb:master Jul 12, 2022

cockroach-teamcity mentioned this pull request Jul 12, 2022

opgen: add OFFLINE as intermediate state when dropping descriptors cockroachdb/docs#14513

Closed

msbutler mentioned this pull request Aug 11, 2022

backupccl: target offline importing tables if flag is set #85979

Closed

msbutler mentioned this pull request Aug 18, 2022

backupccl: elide data from offline tables during RESTORE #85920

Merged

ajwerner mentioned this pull request Aug 23, 2022

sql/schemachanger: ensure dropping descriptors are not taken OFFLINE #86626

Closed

Conversation

postamar commented Jul 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cockroach-teamcity commented Jul 6, 2022

Uh oh!

ajwerner left a comment

Choose a reason for hiding this comment

Uh oh!

postamar commented Jul 7, 2022

Uh oh!

ajwerner commented Jul 7, 2022

Uh oh!

postamar commented Jul 7, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ajwerner commented Jul 7, 2022

Uh oh!

postamar commented Jul 7, 2022

Uh oh!

postamar commented Jul 7, 2022

Uh oh!

ajwerner left a comment

Choose a reason for hiding this comment

Uh oh!

ajwerner Jul 10, 2022

Choose a reason for hiding this comment

Uh oh!

postamar commented Jul 11, 2022

Uh oh!

postamar commented Jul 11, 2022

Uh oh!

postamar commented Jul 12, 2022

Uh oh!

craig bot commented Jul 12, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

postamar commented Jul 6, 2022 •

edited

Loading

postamar commented Jul 7, 2022 •

edited

Loading