schemachanger: add OFFLINE as intermediate state when dropping descriptors#83915
schemachanger: add OFFLINE as intermediate state when dropping descriptors#83915craig[bot] merged 6 commits intocockroachdb:masterfrom
Conversation
ajwerner
left a comment
There was a problem hiding this comment.
Well this worked out shockingly well. We should make sure that backups don't get sad about it.
Reviewable status:
complete! 0 of 0 LGTMs obtained
|
There were, in fact, some wonky non-trivial bugs related to backups not caring enough about non-public type descriptors and the offline state in general. |
679a9ca to
62f383e
Compare
|
One issue is that backups do want to backup OFFLINE descriptors, at least, I think they do. Consider the case of an IMPORT. It takes a descriptor offline and then when it's done, puts it back online. I think we want to back up these tables too, right? It feels like in some cases, if we were more sophisticated, we could move the descriptors from OFFLINE to DROPPED in |
I think so too, which is why I relaxed the filtering in the backup targeting to include more OFFLINE descriptor than it previously was. Previous behaviour was also slightly erroneous irrespective of current changes in the declarative schema changer: an OFFLINE table would be included in a BACKUP DATABASE but not the parent OFFLINE schema. I'll make sure this PR gets reviewed by Bulk-IO as these changes also have an impact on changefeeds, which now can error complaining about a table being offline instead of dropped. |
Ah, I suspect we've never had an |
|
Indeed. Also I don't think anyone could have noticed anything before the |
|
This is ready for review now. |
ajwerner
left a comment
There was a problem hiding this comment.
LGTM -- this proved to be important for some changes I needed to make for DROP COLUMN. I'd prefer we merge it sooner rather than later.
pkg/sql/catalog/descs/descriptor.go
Outdated
| // hydrated where and test the API boundary accordingly. | ||
| if table, isTable := desc.(catalog.TableDescriptor); isTable { | ||
| desc, err = tc.hydrateTypesInTableDesc(ctx, txn, table) | ||
| desc, err = tc.hydrateTypesInTableDescWitOptions(ctx, txn, table, flags.IncludeOffline, flags.AvoidLeased) |
There was a problem hiding this comment.
I fixed this typo on master, so this needs a rebase, otherwise it'll skew and to adopt the fix, otherwise it'll skew
|
In a few hours I'll push a commit which addresses #84137 on top of this branch (and rebase). |
The opposite is true, I actually need to rebase this branch on top of another. This should not hinder any reviews. |
83646: streamingccl: create new tenant automatically for replication stream r=samiskin a=samiskin Resolves #76952 Originally tenants had to be created manually prior to replication stream creation. This change instead creates a new tenant upon calling the RESTORE INTO command using the tenant ID that was provided. The tenant is considered inactive until the cutover point, at which the tenant is activated. Release note: None 83677: server: return non-live, live info per table r=maryliag a=maryliag This commit adds 3 new parameter to the table details endpoint: - dataTotalBytes - dataNonLiveBytes - dataNonLivePercentage Partially addresses #82617 Release note (api change): Add information about total bytes, non live (MVCC) bytes and non live (MVCC) percentage to Table Details endpoint. 84117: prometheus: fix chaining in WithWorkload r=srosenberg a=tbg Release note: None 84189: sql: fix dropping offline objects r=postamar a=postamar Fixes #84137. I intend to base #83915 on this PR, backport the first two commits to 22.1, and backport only the first commit to 21.2. Co-authored-by: Shiranka Miskin <shiranka.miskin@gmail.com> Co-authored-by: Marylia Gutierrez <marylia@cockroachlabs.com> Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com> Co-authored-by: Marius Posta <marius@cockroachlabs.com>
This commit addresses two issues: 1. Until now, type descriptors were never properly filtered. It was therefore possible to include some with a DROP state in a backup. 2. OFFLINE schema descriptors were never included in targeted backups. These issues were of little importance until now: we want to use the OFFLINE state in the declarative schema changer. Release note: None
These bugs became apparent while introducing the OFFLINE state in the declarative schema changer DROP path. Release note: None
Release note: None
This patch fixes an instance where we ignored offline descriptors by mistake. Release note: None
The addition of a revertible transition makes it possible to combine descriptor drops with other schema changes in a way that preserves their revertibility while immediately making the dropped descriptor no longer public. Release note (sql change): DROP statements performed by the declarative schema changer (which is the case by default) now transition the descriptor states to OFFLINE in the initial schema change transaction, before subsequently transitioning them to DROP in a subsequent transaction executed by the schema change job. The behaviour of changefeeds on dropped tables will be correspondingly affected. Previously they'd throw an error about the table being dropped. They may now do either that or throw an error about the table being taken offline. Furthermore, it's possible for a concurrent backup to see the table as OFFLINE before it reaches DROP and this may cause the offline table to be included in the data. We don't expect the impact of these changes to be significant, but they may nonetheless be noticeable.
The default timeout was sometimes causing CI flakes. The recent changes to DROP behaviour caused more cases to be run in this test suite. This change can (and should) be reverted once we're able to parallelize test runs better in CI. Release note: None
1e7f861 to
854719d
Compare
|
bors r+ |
|
Build succeeded: |
…flag set Historically, offline tables were excluded from backups (e.g. restoring or importing tables) because incremental backups could not reason correctly about their MVCC history. Because all within tenant operations are now MVCC, BACKUP can begin incrementally backing up offline tables. Independent of the MVCC history thread, cockroachdb#83915 allowed backup to implicitly target _all_ offline descriptors (e.g. database, schema, table, type) to account for a class of compound online schema changes. Note with this PR, the user can't _explicitly_ target an offline descriptor (e.g. BACKUP DATABASE my_offline_database). This patch reverts part of cockroachdb#82915 such that BACKUP targets offline tables iff it contains an in-progress import and when an external flag is set. Since the class of online schema changes described in cockroachdb#82915 will never occur in 22.2, there isn't actually a need to back up a larget set of offline tables. This patch merely refactors the internal targeting logic in backupresolver.ResolveTargetsToDescriptors, and does not change how backups behave. In other words, after this patch, no offline descriptors will get targeted, as all callers set the function's backupImports flag to false. A future PR will change these callsites. Informs cockroachdb#76722 Release note: none
Historically, offline tables were excluded from backups (e.g. restoring or importing tables) because incremental backups could not reason correctly about their MVCC history. Now that all within tenant operations are now MVCC, BACKUP can begin incrementally backing up offline tables. Independent of the MVCC history thread, cockroachdb#83915 allowed backup to implicitly target _all_ offline descriptors (e.g. database, schema, table, type) to account for a class of compound online schema changes. This is a regression from the expected backup behavior and will get reverted in future work. This patch ensures that offline tables with importing data are not targetted iff an external flag is set. This patch merely refactors the internal targeting logic in backupresolver.ResolveTargetsToDescriptors, and does not change how backups behave. In other words, after this patch, no offline descriptors will get targeted, as all callers set the function's backupImports flag to false. A future PR will change these callsites. Informs cockroachdb#76722 Release note: none Release note (<category, see below>): <what> <show> <why>
A previous PR (cockroachdb#83915) enabled BACKUP to target offline descriptors (including IMPORTing and RESTOREing tables), but didn't fully handle their rollback on RESTORE. This PR ensures RESTORE properly handles these offline tables: - An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The table does not exist in the restoring cluster. - An offline table created by an IMPORT INTO has all importing data elided in the restore processor and is restored online to its pre import state. If the RESTORE occurs AOST after the table returns online, the table is RESTOREd to it's post offline state. Currently, all tables that return online are fully re-backed up. For many situations above, this is unecessary. Future work should ensure that these tables are not re-backed up when we know all operations within the incremental backup interval on the table were MVCC. Release note (sql change): offline tables with in-progress IMPORT INTO imports (i.e. tables that existed before the import job began) now get backed up once the cluster has upgraded to v22.2. During a cluster/database Restore that executes AOST during the in-progress import, all data from the import is elided and leaves the table in its pre-import state (online, with pre-import data, if any). If the RESTORE executes AOST after the import completes, all the imported data ingests into the restored data. This patch only applies to database and cluster BACKUP and RESTORE, and a future PR should allow a user to BACKUP/RESTORE an existing offline table undergoing an IMPORT. Release justification: fixes bug where offline tables were not properly handled in restore.
A previous PR (cockroachdb#83915) enabled BACKUP to target offline descriptors (including IMPORTing and RESTOREing tables), but didn't fully handle their rollback on RESTORE. This PR ensures RESTORE properly handles these offline tables: - An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The table does not exist in the restoring cluster. - An offline table created by an IMPORT INTO has all importing data elided in the restore processor and is restored online to its pre import state. If the RESTORE occurs AOST after the table returns online, the table is RESTOREd to it's post offline state. Currently, all tables that return online are fully re-backed up. For many situations above, this is unecessary. Future work should ensure that these tables are not re-backed up when we know all operations within the incremental backup interval on the table were MVCC. Release note (sql change): offline tables with in-progress IMPORT INTO imports (i.e. tables that existed before the import job began) now get backed up once the cluster has upgraded to v22.2. During a cluster/database Restore that executes AOST during the in-progress import, all data from the import is elided and leaves the table in its pre-import state (online, with pre-import data, if any). If the RESTORE executes AOST after the import completes, all the imported data ingests into the restored data. This patch only applies to database and cluster BACKUP and RESTORE, and a future PR should allow a user to BACKUP/RESTORE an existing offline table undergoing an IMPORT. Release justification: fixes bug where offline tables were not properly handled in restore.
A previous PR (cockroachdb#83915) enabled BACKUP to target offline descriptors (including IMPORTing and RESTOREing tables), but didn't fully handle their rollback on RESTORE. This PR ensures RESTORE properly handles these offline tables: - An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The table does not exist in the restoring cluster. - An offline table created by an IMPORT INTO has all importing data elided in the restore processor and is restored online to its pre import state. If the RESTORE occurs AOST after the table returns online, the table is RESTOREd to it's post offline state. Currently, all tables that return online are fully re-backed up. For many situations above, this is unecessary. Future work should ensure that these tables are not re-backed up when we know all operations within the incremental backup interval on the table were MVCC. Release note (sql change): offline tables with in-progress IMPORT INTO imports (i.e. tables that existed before the import job began) now get backed up once the cluster has upgraded to v22.2. During a cluster/database Restore that executes AOST during the in-progress import, all data from the import is elided and leaves the table in its pre-import state (online, with pre-import data, if any). If the RESTORE executes AOST after the import completes, all the imported data ingests into the restored data. This patch only applies to database and cluster BACKUP and RESTORE, and a future PR should allow a user to BACKUP/RESTORE an existing offline table undergoing an IMPORT. Release justification: fixes bug where offline tables were not properly handled in restore.
A previous PR (cockroachdb#83915) enabled BACKUP to target offline descriptors (including IMPORTing and RESTOREing tables), but didn't fully handle their rollback on RESTORE. This PR ensures RESTORE properly handles these offline tables: - An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The table does not exist in the restoring cluster. - An offline table created by an IMPORT INTO has all importing data elided in the restore processor and is restored online to its pre import state. If the RESTORE occurs AOST after the table returns online, the table is RESTOREd to it's post offline state. Currently, all tables that return online are fully re-backed up. For many situations above, this is unecessary. Future work should ensure that these tables are not re-backed up when we know all operations within the incremental backup interval on the table were MVCC. Release note (sql change): offline tables with in-progress IMPORT INTO imports (i.e. tables that existed before the import job began) now get backed up once the cluster has upgraded to v22.2. During a cluster/database Restore that executes AOST during the in-progress import, all data from the import is elided and leaves the table in its pre-import state (online, with pre-import data, if any). If the RESTORE executes AOST after the import completes, all the imported data ingests into the restored data. This patch only applies to database and cluster BACKUP and RESTORE, and a future PR should allow a user to BACKUP/RESTORE an existing offline table undergoing an IMPORT. Release justification: fixes bug where offline tables were not properly handled in restore.
A previous PR (cockroachdb#83915) enabled BACKUP to target offline descriptors (including IMPORTing and RESTOREing tables), but didn't fully handle their rollback on RESTORE. This PR ensures RESTORE properly handles these offline tables: - An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The table does not exist in the restoring cluster. - An offline table created by an IMPORT INTO has all importing data elided in the restore processor and is restored online to its pre import state. If the RESTORE occurs AOST after the table returns online, the table is RESTOREd to it's post offline state. Currently, all tables that return online are fully re-backed up. For many situations above, this is unecessary. Future work should ensure that these tables are not re-backed up when we know all operations within the incremental backup interval on the table were MVCC. This patch only applies to BACKUP and RESTORE flavors that implicitly target an offline importing/restoring table. A future PR should allow a user to explicitly target an existing offline table undergoing an IMPORT. Release justification: fixes bug where offline tables were not properly handled in restore.
A previous PR (cockroachdb#83915) enabled BACKUP to target offline descriptors (including IMPORTing and RESTOREing tables), but didn't fully handle their rollback on RESTORE. This PR ensures RESTORE properly handles these offline tables: - An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The table does not exist in the restoring cluster. - An offline table created by an IMPORT INTO has all importing data elided in the restore processor and is restored online to its pre import state. If the RESTORE occurs AOST after the table returns online, the table is RESTOREd to it's post offline state. Currently, all tables that return online are fully re-backed up. For many situations above, this is unecessary. Future work should ensure that these tables are not re-backed up when we know all operations within the incremental backup interval on the table were MVCC. This patch only applies to BACKUP and RESTORE flavors that implicitly target an offline importing/restoring table. A future PR should allow a user to explicitly target an existing offline table undergoing an IMPORT. Release justification: fixes bug where offline tables were not properly handled in restore.
85920: backupccl: elide data from offline tables during RESTORE r=adityamaru a=msbutler A previous PR (#83915) enabled BACKUP to target offline descriptors (including IMPORTing and RESTOREing tables), but didn't fully handle their rollback on RESTORE. This PR ensures RESTORE properly handles these offline tables: - An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The table does not exist in the restoring cluster. - An offline table created by an IMPORT INTO has all importing data elided in the restore processor and is restored online to its pre import state. If the RESTORE occurs AOST after the table returns online, the table is RESTOREd to its post offline state. Currently, all tables that return online are fully re-backed up. For many situations above, this is unecessary. Future work should ensure that these tables are not re-backed up when we know all operations within the incremental backup interval on the table were MVCC. This patch only applies to BACKUP and RESTORE flavors that implicitly target an offline importing/restoring table. A future PR should allow a user to explicitly target an existing offline table undergoing an IMPORT. Release justification: fixes bug where offline tables were not properly handled in restore. 86826: workload/ycsb: fix --request-distribution flag r=nvanbenschoten a=nvanbenschoten This change fixes the handling of the `--request-distribution` flag so that any user overrides are respected. Before, the default request distribution for a given workload was always used, regardless of the flag. This bug dates back to #37804. I fear this has caused a good deal of confusion in the past, as we often turn to YCSB-A uniform as an example of an uncontended read/update workload. Release justification: workload only. 87047: pgwire: fix txn options for prepared BEGIN r=ZhouXing19 a=rafiss fixes #87012 Release note (bug fix): The statement tag for SHOW command results in the pgwire protocol no longer contain the number of returned rows. Release note (bug fix): Fixed a bug where the options given to the BEGIN TRANSACTION command would be ignored if the BEGIN was a prepared statement. Release justification: low risk bug fix 87113: ci: run `acceptance` test in `ci` config r=jlinder a=rickystewart We need this to get detailed info in the `test.xml`. Release justification: Non-production code changes Release note: None 87114: ci: fix nightlies that are invoking `bazci` incorrectly r=rail a=rickystewart Some of these nightlies were broken as a result of `c8d66a9c334dd87acd62190f314a4f3ec236ce82`. Release justification: Non-production code changes Release note: None 87116: bazci: delete unused `testdata` directory r=celiala a=rickystewart Release justification: Non-production code changes Release note: None Co-authored-by: Michael Butler <butler@cockroachlabs.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com> Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
The addition of a revertible transition makes it possible to combine
descriptor drops with other schema changes in a way that preserves their
revertibility while immediately making the dropped descriptor no longer
public.
Release note (sql change): DROP statements performed by the declarative
schema changer (which is the case by default) now transition the
descriptor states to OFFLINE in the initial schema change transaction,
before subsequently transitioning them to DROP in a subsequent
transaction executed by the schema change job. The behaviour of
changefeeds on dropped tables will be correspondingly affected.
Previously they'd throw an error about the table being dropped.
They may now do either that or throw an error about the table being
taken offline. Furthermore, it's possible for a concurrent backup
to see the table as OFFLINE before it reaches DROP and this may cause
the offline table to be included in the data. We don't expect the impact
of these changes to be significant, but they may nonetheless be noticeable.