Skip to content

sql_instance: migrate to rbr compatible index#98445

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
jeffswenson:jeffswenson-sqlinstance-migration
Mar 15, 2023
Merged

sql_instance: migrate to rbr compatible index#98445
craig[bot] merged 1 commit intocockroachdb:masterfrom
jeffswenson:jeffswenson-sqlinstance-migration

Conversation

@jeffswenson
Copy link
Copy Markdown
Collaborator

@jeffswenson jeffswenson commented Mar 11, 2023

Migrate sql_instance to a regional by row compatible index. The version
gates are intended to follow the protocol discussed in the comment at
the top of upgrades/system_rbr_indexes.go

The crdb_region column ID was changed from 5 to 6 in order to match the
logical order in which the sql_addr and crdb_region columns were added.
The exact ID doesn't really matter in this case since the sql_addr
column was added in v23.1.

Most of the rbr migration work is the same for sqlliveness, lease, and
sql_instances. The main exception to that is the migration cache used by
the sql instance reader. The cache is backed by a range feed and we need
to switch implementations when the version setting changes.

Part of #94843

Relase note: None

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@jeffswenson jeffswenson force-pushed the jeffswenson-sqlinstance-migration branch 2 times, most recently from 6b69604 to 11a767b Compare March 12, 2023 18:59
@jeffswenson jeffswenson changed the title sqlinstance: migrate to rbr compatible table sql_instance: migrate to rbr compatible table Mar 12, 2023
@jeffswenson jeffswenson force-pushed the jeffswenson-sqlinstance-migration branch 2 times, most recently from f7bbe2e to 3ad8b79 Compare March 12, 2023 21:54
@jeffswenson jeffswenson force-pushed the jeffswenson-sqlinstance-migration branch 4 times, most recently from 63ecb4d to d681cbe Compare March 13, 2023 20:37
@jeffswenson jeffswenson changed the title sql_instance: migrate to rbr compatible table sql_instance: migrate to rbr compatible index Mar 13, 2023
@jeffswenson jeffswenson force-pushed the jeffswenson-sqlinstance-migration branch from d681cbe to df42bba Compare March 13, 2023 20:51
@jeffswenson jeffswenson marked this pull request as ready for review March 13, 2023 20:52
@jeffswenson jeffswenson requested a review from a team March 13, 2023 20:52
@jeffswenson jeffswenson requested review from a team as code owners March 13, 2023 20:52
@jeffswenson jeffswenson force-pushed the jeffswenson-sqlinstance-migration branch from df42bba to 7ccf9e9 Compare March 13, 2023 21:18
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is close. Some of the logic is quite subtle and deserves a bit more commentary and another pass to try to make the control flow clearer.

Reviewed 13 of 22 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jeffswenson)


pkg/sql/sqlinstance/instancestorage/instancecache.go line 223 at r2 (raw file):

// from the old implementation to the new implementation when the version
// changes to V23_1_SystemRbrReadNew.
func newMigrationCache(

this function is a beast.


pkg/sql/sqlinstance/instancestorage/instancecache.go line 253 at r2 (raw file):

		// Use the background context because we don't want RPC cancellation to
		// prevent starting the new range feed.
		err := stopper.RunAsyncTask(context.Background(), "start-new-cache-implementation", func(ctx context.Context) {

nit: annotate the context with logtags. Probably cacheCtx := logtags.AddTags(context.Background(), logtags.FromContext(ctx))


pkg/sql/sqlinstance/instancestorage/instancecache.go line 256 at r2 (raw file):

			log.Ops.Info(ctx, "starting new system.sql_instance cache")

			cache, err := newCache(ctx)

I'd feel better if there were a defer which sent on this channel the return value of a function. In general, I feel like it'd be nicer if you made closures for two functions which send on the channels and lay out the invariants about who is going to send on which channel. After having scrutinized this a bunch, I don't see obvious bugs if there are no panics, but this thing is surely complex.


pkg/sql/sqlinstance/instancestorage/instancecache.go line 306 at r2 (raw file):

	select {
	case err = <-newReady:

I think there's a subtle race here. Imagine that the version change notification happens and then the stopper gets shut down. There's a case where the versionChanged field could get set to true, and the RunAsyncTask code can choose not to run. We'll log a warning but nobody will send on either channel. I think you can fix that by sending on the channel when unable to start the new cache.

All of this code is pretty subtle. My sense is that splitting out some of the logic into helpers can make the behavior a bit clearer.


pkg/sql/tests/system_table_test.go line 223 at r2 (raw file):

		require.NoError(t, desctestutils.TestingValidateSelf(gen))

		switch gen.GetID() {

Explain this in commentary

@jeffswenson jeffswenson force-pushed the jeffswenson-sqlinstance-migration branch from 7ccf9e9 to ac50d3e Compare March 14, 2023 13:54
@jeffswenson jeffswenson requested a review from a team as a code owner March 14, 2023 13:54
@jeffswenson jeffswenson requested a review from yuzefovich March 14, 2023 13:54
Copy link
Copy Markdown
Collaborator Author

@jeffswenson jeffswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @yuzefovich)


pkg/sql/sqlinstance/instancestorage/instancecache.go line 223 at r2 (raw file):

Previously, ajwerner wrote…

this function is a beast.

I reworked the function and tried to clean up some of the state. Let me know what you think.


pkg/sql/sqlinstance/instancestorage/instancecache.go line 256 at r2 (raw file):

Previously, ajwerner wrote…

I'd feel better if there were a defer which sent on this channel the return value of a function. In general, I feel like it'd be nicer if you made closures for two functions which send on the channels and lay out the invariants about who is going to send on which channel. After having scrutinized this a bunch, I don't see obvious bugs if there are no panics, but this thing is surely complex.

I've reworked the implementation. Let me know if it is still difficult to follow and I will take another crack at it.


pkg/sql/sqlinstance/instancestorage/instancecache.go line 306 at r2 (raw file):

Previously, ajwerner wrote…

I think there's a subtle race here. Imagine that the version change notification happens and then the stopper gets shut down. There's a case where the versionChanged field could get set to true, and the RunAsyncTask code can choose not to run. We'll log a warning but nobody will send on either channel. I think you can fix that by sending on the channel when unable to start the new cache.

All of this code is pretty subtle. My sense is that splitting out some of the logic into helpers can make the behavior a bit clearer.

Done.


pkg/sql/tests/system_table_test.go line 223 at r2 (raw file):

Previously, ajwerner wrote…

Explain this in commentary

Done.

@jeffswenson jeffswenson force-pushed the jeffswenson-sqlinstance-migration branch 2 times, most recently from 04895a4 to 192893a Compare March 14, 2023 15:14
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Migrate sql_instance to a regional by row compatible index. The version
gates are intended to follow the protocol discussed in the comment at
the top of upgrades/system_rbr_indexes.go

The crdb_region column ID was changed from 5 to 6 in order to match the
logical order in which the sql_addr and crdb_region columns were added.
The exact ID doesn't really matter in this case since the sql_addr
column was added in v23.1.

Most of the rbr migration work is the same for sqlliveness, lease, and
sql_instances. The main exception to that is the migration cache used by
the sql instance reader. The cache is backed by a range feed and we need
to switch implementations when the version setting changes.

Part of cockroachdb#94843

Relase note: None
@jeffswenson jeffswenson force-pushed the jeffswenson-sqlinstance-migration branch from 192893a to c9d2ad8 Compare March 14, 2023 19:43
@jeffswenson
Copy link
Copy Markdown
Collaborator Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 14, 2023

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 15, 2023

Build succeeded:

@craig craig bot merged commit d5ff210 into cockroachdb:master Mar 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants