sql_instance: migrate to rbr compatible index#98445
sql_instance: migrate to rbr compatible index#98445craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
6b69604 to
11a767b
Compare
f7bbe2e to
3ad8b79
Compare
63ecb4d to
d681cbe
Compare
d681cbe to
df42bba
Compare
df42bba to
7ccf9e9
Compare
ajwerner
left a comment
There was a problem hiding this comment.
This is close. Some of the logic is quite subtle and deserves a bit more commentary and another pass to try to make the control flow clearer.
Reviewed 13 of 22 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @jeffswenson)
pkg/sql/sqlinstance/instancestorage/instancecache.go line 223 at r2 (raw file):
// from the old implementation to the new implementation when the version // changes to V23_1_SystemRbrReadNew. func newMigrationCache(
this function is a beast.
pkg/sql/sqlinstance/instancestorage/instancecache.go line 253 at r2 (raw file):
// Use the background context because we don't want RPC cancellation to // prevent starting the new range feed. err := stopper.RunAsyncTask(context.Background(), "start-new-cache-implementation", func(ctx context.Context) {
nit: annotate the context with logtags. Probably cacheCtx := logtags.AddTags(context.Background(), logtags.FromContext(ctx))
pkg/sql/sqlinstance/instancestorage/instancecache.go line 256 at r2 (raw file):
log.Ops.Info(ctx, "starting new system.sql_instance cache") cache, err := newCache(ctx)
I'd feel better if there were a defer which sent on this channel the return value of a function. In general, I feel like it'd be nicer if you made closures for two functions which send on the channels and lay out the invariants about who is going to send on which channel. After having scrutinized this a bunch, I don't see obvious bugs if there are no panics, but this thing is surely complex.
pkg/sql/sqlinstance/instancestorage/instancecache.go line 306 at r2 (raw file):
select { case err = <-newReady:
I think there's a subtle race here. Imagine that the version change notification happens and then the stopper gets shut down. There's a case where the versionChanged field could get set to true, and the RunAsyncTask code can choose not to run. We'll log a warning but nobody will send on either channel. I think you can fix that by sending on the channel when unable to start the new cache.
All of this code is pretty subtle. My sense is that splitting out some of the logic into helpers can make the behavior a bit clearer.
pkg/sql/tests/system_table_test.go line 223 at r2 (raw file):
require.NoError(t, desctestutils.TestingValidateSelf(gen)) switch gen.GetID() {
Explain this in commentary
7ccf9e9 to
ac50d3e
Compare
jeffswenson
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @yuzefovich)
pkg/sql/sqlinstance/instancestorage/instancecache.go line 223 at r2 (raw file):
Previously, ajwerner wrote…
this function is a beast.
I reworked the function and tried to clean up some of the state. Let me know what you think.
pkg/sql/sqlinstance/instancestorage/instancecache.go line 256 at r2 (raw file):
Previously, ajwerner wrote…
I'd feel better if there were a defer which sent on this channel the return value of a function. In general, I feel like it'd be nicer if you made closures for two functions which send on the channels and lay out the invariants about who is going to send on which channel. After having scrutinized this a bunch, I don't see obvious bugs if there are no panics, but this thing is surely complex.
I've reworked the implementation. Let me know if it is still difficult to follow and I will take another crack at it.
pkg/sql/sqlinstance/instancestorage/instancecache.go line 306 at r2 (raw file):
Previously, ajwerner wrote…
I think there's a subtle race here. Imagine that the version change notification happens and then the stopper gets shut down. There's a case where the
versionChangedfield could get set to true, and theRunAsyncTaskcode can choose not to run. We'll log a warning but nobody will send on either channel. I think you can fix that by sending on the channel when unable to start the new cache.All of this code is pretty subtle. My sense is that splitting out some of the logic into helpers can make the behavior a bit clearer.
Done.
pkg/sql/tests/system_table_test.go line 223 at r2 (raw file):
Previously, ajwerner wrote…
Explain this in commentary
Done.
04895a4 to
192893a
Compare
Migrate sql_instance to a regional by row compatible index. The version gates are intended to follow the protocol discussed in the comment at the top of upgrades/system_rbr_indexes.go The crdb_region column ID was changed from 5 to 6 in order to match the logical order in which the sql_addr and crdb_region columns were added. The exact ID doesn't really matter in this case since the sql_addr column was added in v23.1. Most of the rbr migration work is the same for sqlliveness, lease, and sql_instances. The main exception to that is the migration cache used by the sql instance reader. The cache is backed by a range feed and we need to switch implementations when the version setting changes. Part of cockroachdb#94843 Relase note: None
192893a to
c9d2ad8
Compare
|
bors r+ |
|
Build failed (retrying...): |
|
Build succeeded: |
Migrate sql_instance to a regional by row compatible index. The version
gates are intended to follow the protocol discussed in the comment at
the top of upgrades/system_rbr_indexes.go
The crdb_region column ID was changed from 5 to 6 in order to match the
logical order in which the sql_addr and crdb_region columns were added.
The exact ID doesn't really matter in this case since the sql_addr
column was added in v23.1.
Most of the rbr migration work is the same for sqlliveness, lease, and
sql_instances. The main exception to that is the migration cache used by
the sql instance reader. The cache is backed by a range feed and we need
to switch implementations when the version setting changes.
Part of #94843
Relase note: None