Skip to content

*: use bootstrap.BootstrappedSystemIDChecker in tests#75464

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
postamar:remove-system-id-oracle
Jan 27, 2022
Merged

*: use bootstrap.BootstrappedSystemIDChecker in tests#75464
craig[bot] merged 2 commits intocockroachdb:masterfrom
postamar:remove-system-id-oracle

Conversation

@postamar
Copy link
Copy Markdown

Instead of relying on hard-coded constants, tests now rely on the
bootstrapped system schema to determine which IDs belong to system
tables or not.

Release note: None

@postamar postamar requested a review from a team January 24, 2022 22:25
@postamar postamar requested review from a team as code owners January 24, 2022 22:25
@postamar postamar requested a review from a team January 24, 2022 22:25
@postamar postamar requested a review from a team as a code owner January 24, 2022 22:25
@postamar postamar requested review from HonoreDB and dt and removed request for a team January 24, 2022 22:25
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@postamar postamar requested review from ajwerner and removed request for a team, HonoreDB and dt January 24, 2022 22:26
@postamar postamar force-pushed the remove-system-id-oracle branch from 7695066 to 108cd22 Compare January 25, 2022 02:19
@postamar postamar requested review from a team as code owners January 25, 2022 02:19
@postamar postamar requested review from srosenberg and removed request for a team January 25, 2022 02:19
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. I sort of think it might be better to plumb in an object as we might eventually make it so that we bootstrap testclusters with a store dir from an old version or from a different upgrade path, but for now this is good.

@postamar postamar force-pushed the remove-system-id-oracle branch 3 times, most recently from 1919854 to 1ecb905 Compare January 25, 2022 22:05
@postamar
Copy link
Copy Markdown
Author

Sorry for the CI noise. I'm getting there. I followed up on this PR by removing the keys.SystemIDChecker interface altogether. There were a few use-cases outside of tests, especially in backupccl/importccl but I was able to work around all of them.

Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: other than the restore questions

Reviewed 20 of 52 files at r1, 7 of 9 files at r2, 28 of 28 files at r3, 37 of 41 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @postamar, and @srosenberg)


pkg/ccl/backupccl/restore_job.go, line 121 at r4 (raw file):

	ctx, span := tracing.ChildSpan(ctx, "WriteDescriptors")
	defer span.Finish()
	defer func() {

did anything change here other than de-denting and decorating the error? I didn't spot anything


pkg/ccl/backupccl/restore_job.go, line 756 at r4 (raw file):

	for id := range details.DescriptorRewrites {
		if id > tempSystemDBID {
			tempSystemDBID = id

I wish that I understood this logic. It's going to set the tempSystemDBID to the max value of the rewrite sources? What is that?

It seems to me like this maybe just works because we assume the database is empty and so we never end up in this loop? I'm stumped by this code. @dt do you have any ideas? This was added in #55685.

I'm not blocking this PR on retaining this logic, but I do want to understand what's going on here.


pkg/ccl/backupccl/restore_job.go, line 881 at r4 (raw file):

	var tempSystemDBID descpb.ID
	if details.DescriptorCoverage == tree.AllDescriptors {
		for id := range details.DescriptorRewrites {

does this have a gnarly edge case if there are no rewrites? In particular, imagine restoring an empty full cluster backup.


pkg/ccl/backupccl/restore_job.go, line 2490 at r4 (raw file):

		if table.GetParentID() != tempSystemDBID {
			continue
		}

Does this definitely do nothing? Seems like this could very easily be meaningful.

Code quote:

		if table.GetParentID() != tempSystemDBID {
			continue
		}

pkg/keys/system.go, line 44 at r3 (raw file):

// MinUserDescriptorID returns the smallest possible non-system descriptor ID
// after a cluster is bootstrapped.
func MinUserDescriptorID(idChecker SystemIDChecker) uint32 {

Note to self: come back and caveat and/or remove this function

Code quote:

// MinUserDescriptorID returns the smallest possible non-system descriptor ID
// after a cluster is bootstrapped.
func MinUserDescriptorID(idChecker SystemIDChecker) uint32 {

pkg/sql/catalog/bootstrap/metadata.go, line 409 at r3 (raw file):

// from the minimum value allowed in a simple unit test setting.
func TestingUserDescID(offset uint32) uint32 {
	return keys.MinUserDescriptorID(BootstrappedSystemIDChecker()) + offset

in practice doesn't this end up being dumb because of the defaultdb and postgres databases? We should just bootstrap those too and get rid of the forever migration. I hate that thing. Leave it to me for a subsequent PR. Just mentioning that this isn't really right and might work by happenstance in the few use cases.

Copy link
Copy Markdown
Author

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I provided some answers and added some comments in the code that should satisfy you.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @srosenberg)


pkg/ccl/backupccl/restore_job.go, line 121 at r4 (raw file):

Previously, ajwerner wrote…

did anything change here other than de-denting and decorating the error? I didn't spot anything

Nothing of substance changed here. The error decoration was, in fact, already there. I removed the closure because it was useless and annoying. I meant to put that in a separate commit and forgot, sorry.


pkg/ccl/backupccl/restore_job.go, line 756 at r4 (raw file):

Previously, ajwerner wrote…

I wish that I understood this logic. It's going to set the tempSystemDBID to the max value of the rewrite sources? What is that?

It seems to me like this maybe just works because we assume the database is empty and so we never end up in this loop? I'm stumped by this code. @dt do you have any ideas? This was added in #55685.

I'm not blocking this PR on retaining this logic, but I do want to understand what's going on here.

Note: for some reason reviewable is placing this comment nowhere near the code it's quoting, which pertains to the computation of tempSystemDBID when executing the restore.

It took me a while to figure out too. The restore planner pre-allocates enough descriptor IDs for each of the descriptors in the backup, plus one extra for the temporary system database in the case where it's planning full cluster restore. Thus, the temporary system database descriptor ID is the last in this pre-allocated range, which is encoded in the keys of DescriptorRewrites.

This also explains how we "know" that DescriptorRewrites isn't empty: it never is in the case of a full cluster restore, because we always at least restore the system database.

I mean, really, this ID should be stored in a separate field in the restore job details, instead of inferring it like this. I'll add a comment explaining this.


pkg/ccl/backupccl/restore_job.go, line 881 at r4 (raw file):

Previously, ajwerner wrote…

does this have a gnarly edge case if there are no rewrites? In particular, imagine restoring an empty full cluster backup.

As explained above, the if above ensures that there is at least one rewrite, that of the temp system db desc.


pkg/ccl/backupccl/restore_job.go, line 2490 at r4 (raw file):

Previously, ajwerner wrote…

Does this definitely do nothing? Seems like this could very easily be meaningful.

I'm not finding this in my current revision, and the comment seems out of place in reviewable. Is this comment stale?


pkg/keys/system.go, line 44 at r3 (raw file):

Previously, ajwerner wrote…

Note to self: come back and caveat and/or remove this function

This has since been removed.


pkg/sql/catalog/bootstrap/metadata.go, line 409 at r3 (raw file):

Previously, ajwerner wrote…

in practice doesn't this end up being dumb because of the defaultdb and postgres databases? We should just bootstrap those too and get rid of the forever migration. I hate that thing. Leave it to me for a subsequent PR. Just mentioning that this isn't really right and might work by happenstance in the few use cases.

I certainly have no objection!

@postamar postamar force-pushed the remove-system-id-oracle branch from 1ecb905 to 08d49ca Compare January 27, 2022 01:11
Instead of relying on hard-coded constants, tests now rely on the
bootstrapped system schema to determine which IDs belong to system
tables or not.

Release note: None
@postamar postamar force-pushed the remove-system-id-oracle branch from 08d49ca to 424d476 Compare January 27, 2022 01:58
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @postamar, and @srosenberg)


pkg/ccl/backupccl/restore_job.go, line 2490 at r4 (raw file):

Previously, postamar (Marius Posta) wrote…

I'm not finding this in my current revision, and the comment seems out of place in reviewable. Is this comment stale?

Well that's exactly it, you removed this logic, but it's not clear that this logic was doing nothing. It was checking whether the table had its parentID set to the tempSystemDBID. It seem possible that there was a code path that called this with tables in the slice that had some other parent set and thus should be skipped.


pkg/keys/system.go, line 44 at r3 (raw file):

Previously, postamar (Marius Posta) wrote…

This has since been removed.

💯

Copy link
Copy Markdown
Author

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @srosenberg)


pkg/ccl/backupccl/restore_job.go, line 2490 at r4 (raw file):

Previously, ajwerner wrote…

Well that's exactly it, you removed this logic, but it's not clear that this logic was doing nothing. It was checking whether the table had its parentID set to the tempSystemDBID. It seem possible that there was a code path that called this with tables in the slice that had some other parent set and thus should be skipped.

I misunderstood, sorry. I'll put it back. Chesterton's fence and all that.

Code quote:

		if table.GetParentID() != tempSystemDBID {
			continue
		}

Now that the non-test use cases of the keys.SystemIDChecker oracle have
been clearly identified, these have been replaced with something
equivalent which doesn't require such an oracle.

In so doing, we reintroduce the keys.MaxReservedDescID = 49 constant, as
this magic value still has a special significance when it comes to
splitting ranges. It also serves as a lower bound on the IDs of
non-system descriptors.

Release note: None
@postamar postamar force-pushed the remove-system-id-oracle branch from 424d476 to 9246c20 Compare January 27, 2022 03:12
@postamar
Copy link
Copy Markdown
Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 27, 2022

Build succeeded:

@craig craig bot merged commit 83e2df7 into cockroachdb:master Jan 27, 2022
@postamar postamar deleted the remove-system-id-oracle branch January 27, 2022 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants