systemschema: support rbr system.sql_instances #92010
craig[bot] merged 2 commits into cockroachdb:master from
Conversation
The first 3 commits are from #91970.
ajwerner
left a comment
These migrations are going to be good times.
Reviewed 1 of 4 files at r4, 19 of 20 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jaylim-crl, @jeffswenson, and @msbutler)
pkg/sql/catalog/systemschema/system.go line 667 at r5 (raw file):
session_id BYTES, locality JSONB, crdb_region BYTES NOT NULL,
should we mark this as NOT VISIBLE?
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 233 at r5 (raw file):
ctx context.Context, region []byte, db dbScan,
) (base.SQLInstanceID, error) {
	rows, err := s.getInstanceRows(ctx, region, db)
This is in the tail, but we might want for the initial grabbing of an ID to do a SKIP LOCKED scan in case there's an ongoing task to re-allocate IDs between regions. Honestly, I feel like the reallocation code could use another pass.
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 257 at r5 (raw file):
}

// reclaimRegion will reclaim instances belong to expired sessions and delete
nit: belonging.
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 259 at r5 (raw file):
// reclaimRegion will reclaim instances belong to expired sessions and delete
// surplus sessions.
func (s *Storage) reclaimRegion(ctx context.Context, region []byte) error {
If we were using the cached reader we wouldn't need to worry about blocking when checking for liveness. There's a real risk with the cached reader that it won't be up-to-date, but that only really matters if everything is in a crash loop using up sessions which are indeed dead. If there's literally nothing to grab, perhaps the right thing to do is find a way to ask the cache when its in-flight queries are done or something before trying again. Exponential backoff in the retry if there's literally nothing available is probably just as good.
Code quote:
// reclaimRegion will reclaim instances belong to expired sessions and delete
// surplus sessions.
func (s *Storage) reclaimRegion(ctx context.Context, region []byte) error {
// In a seperate transaction, read all rows that exist in the region. This
pkg/sql/sqlinstance/instancestorage/row_codec.go line 167 at r5 (raw file):
instanceID: instanceID,
}
if value == nil || !value.IsPresent() {
value.IsPresent has a nil check in it already
pkg/ccl/multiregionccl/multiregion_system_table_test.go line 144 at r5 (raw file):
tDB := sqlutils.MakeSQLRunner(tenantSQL)
tDB.Exec(t, `SET descriptor_validation = off`)
don't need this anymore
Force-pushed b0b4900 to 87441c6
jeffswenson
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @jaylim-crl, and @msbutler)
pkg/sql/catalog/systemschema/system.go line 667 at r5 (raw file):
Previously, ajwerner wrote…
should we mark this as NOT VISIBLE?
I would prefer to keep the table visible. Since this is an internal system state, I think we should optimize for showing the actual state of the system.
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 233 at r5 (raw file):
Previously, ajwerner wrote…
This is in the tail, but we might want for the initial grabbing of an ID to do a SKIP LOCKED scan in case there's an ongoing task to re-allocate IDs between regions. Honestly, I feel like the reallocation code could use another pass.
Done
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 259 at r5 (raw file):
Previously, ajwerner wrote…
If we were using the cached reader we wouldn't need to worry about blocking when checking for liveness. There's a real risk with the cached reader that it won't be up-to-date, but that only really matters if everything is in a crash loop using up sessions which are indeed dead. If there's literally nothing to grab, perhaps the right thing to do is find a way to ask the cache when its in-flight queries are done or something before trying again. Exponential backoff in the retry if there's literally nothing available is probably just as good.
We were actually using the cached reader. I updated the PR to remove the cache wrapper, since my changes allow us to use the blocking API instead.
reclaimRegion is only called by the background routine, so blocking on an IsAlive call to clean things up is fine (and preferable). reclaimRegion is only writing to a single region, so it will use the 1PC path and avoid locking.
Allocating on start up will only attempt to reclaim local rows. The only reason getAvailableInstanceIDForRegion reclaims unused IDs is that it is cheaper to reclaim a local expired row than it is to perform the global allocation operation. It also improves availability, as it can reclaim local rows even if a region is unavailable.
pkg/sql/sqlinstance/instancestorage/row_codec.go line 167 at r5 (raw file):
Previously, ajwerner wrote…
value.IsPresent has a nil check in it already
That is surprising and useful.
pkg/ccl/multiregionccl/multiregion_system_table_test.go line 144 at r5 (raw file):
Previously, ajwerner wrote…
don't need this anymore
Done
ajwerner
left a comment
Reviewed 1 of 1 files at r1, 2 of 5 files at r2, 2 of 2 files at r3, 1 of 7 files at r6, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jaylim-crl, @jeffswenson, and @msbutler)
pkg/sql/catalog/systemschema/system.go line 667 at r5 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
I would prefer to keep the table visible. Since this is an internal system state, I think we should optimize for showing the actual state of the system.
Fair enough. In general with REGIONAL BY ROW, this is NOT VISIBLE.
Force-pushed 87441c6 to 41fd56d
Refactor the codec interface. The new codec interface will be used to support the legacy index format and the RBR compatible index format. Part of cockroachdb#85736
Force-pushed 1ed957c to ea3a299
jaylim-crl
left a comment
Reviewed 27 of 28 files at r8, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @jeffswenson, and @msbutler)
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 220 at r8 (raw file):
// trigger reclaiming, and retry. This blocks until the reclaim process
// completes.
if err := s.generateAvailableInstanceRows(ctx, [][]byte{region}, sessionExpiration); err != nil {
nit: can we add a comment about only needing our own region here to reduce cold start latency?
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 266 at r8 (raw file):
// reclaimRegion will reclaim instances belonging to expired sessions and
// delete surplus sessions. reclaimRegion should only be called by the back
// ground clean up and allocation job.
nit: background?
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 268 at r8 (raw file):
// ground clean up and allocation job.
func (s *Storage) reclaimRegion(ctx context.Context, region []byte) error {
	// In a seperate transaction, read all rows that exist in the region. This
nit: s/seperate/separate
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 338 at r8 (raw file):
batch := txn.NewBatch()
if waitPolicy == lock.WaitPolicy_SkipLocked {
	batch.Header.WaitPolicy = waitPolicy
How come we only set the wait policy here if it is SkipLocked? If Block isn't required, I think we should make it take a bool instead (e.g. skipRowIfLocked bool)
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 439 at r8 (raw file):
// regions. func (s *Storage) generateAvailableInstanceRows( ctx context.Context, regions [][]byte, sessionExpiration hlc.Timestamp,
One thing to note: taking in an additional [][]byte here means that the reclaimGroup may not do what it is intended to do. e.g. If the caller calls generateAvailableInstanceRows(..., [][]byte{region1}, ...) and generateAvailableInstanceRows(..., [][]byte{region2}, ...) at the same time, only one of them will run, given that the key is the same. If we include the region in the key, then that defeats the initial purpose.
Does it matter if the retry loop in CreateInstance only handles its own instance? (e.g. we talked about how round trip doesn't matter much in that case if we had to allocate).
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 447 at r8 (raw file):
_, _, err := s.reclaimGroup.Do("allocate-instance-ids", func() (interface{}, error) {
	err := s.db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
		instances, err := s.getInstanceRows(ctx /*global*/, nil, txn, lock.WaitPolicy_Block)
nit: I think you meant to put global for nil?
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 448 at r8 (raw file):
_, _, err := s.reclaimGroup.Do("reclaim-instance-ids", func() (interface{}, error) {
	err := s.db.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
		// Set the transaction deadline to the session expiration to ensure
Why don't we continue setting the transaction deadline here?
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 496 at r8 (raw file):
target int, regions [][]byte, instances []instancerow,
) (toAllocate []instancerow) {
	// available per region
nit: unnecessary comment I think
Force-pushed bf92324 to 2a06280
jeffswenson
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @jaylim-crl, and @msbutler)
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 220 at r8 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
nit: can we add a comment about only needing our own region here to reduce cold start latency?
Done.
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 338 at r8 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
How come we only set the wait policy here if it is SkipLocked? If Block isn't required, I think we should make it take a bool instead (e.g. skipRowIfLocked bool)
Thanks! I removed the if block. The if block was intended as a temporary test.
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 439 at r8 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
One thing to note: taking in an additional [][]byte here means that the reclaimGroup may not do what it is intended to do. e.g. If the caller calls generateAvailableInstanceRows(..., [][]byte{region1}, ...) and generateAvailableInstanceRows(..., [][]byte{region2}, ...) at the same time, only one of them will run, given that the key is the same. If we include the region in the key, then that defeats the initial purpose.
Does it matter if the retry loop in CreateInstance only handles its own instance? (e.g. we talked about how round trip doesn't matter much in that case if we had to allocate).
That is a good point. I decided to remove the single flight. It is okay for the transactions to run in parallel, so the single flight is an optimization to reduce contention. Since one is run at start up and the other is run 10 minutes after start up, it is unlikely the transactions will run at the same time.
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 447 at r8 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
nit: I think you meant to put global for nil?
Thanks! I fixed it. This is an annoying quirk of the go format tool. It reformats `func(arg1, /*comment*/ arg2)` into `func(arg1 /*comment*/, arg2)`.
pkg/sql/sqlinstance/instancestorage/instancestorage.go line 448 at r8 (raw file):
Previously, jaylim-crl (Jay Lim) wrote…
Why don't we continue setting the transaction deadline here?
We don't need to set a transaction deadline. I have a PR that I will send for review that removes the expiration argument and transaction deadline from CreateInstance. It's valid to create an instance owned by an expired session. The session expiration only needs to be checked when using the instance id.
jaylim-crl
left a comment
Reviewed 2 of 2 files at r9, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @jeffswenson, and @msbutler)
Follow up to cockroachdb#92010 and cockroachdb#91694. This commit addresses a TODO comment by loading all region enum members from the system database when reclaiming IDs. The list of regions is read each time the loop runs since there's a possibility that regions are added or removed from the system DB.
Epic: None
Release note: None
Force-pushed 2a06280 to 5aa0a18
I commented out the `ALTER TABLE ... SET LOCALITY REGIONAL BY ROW` commands in TestMrSystemDatabase. They caused a difficult-to-debug test flake. alterTableLocalityToRegionalByRow is defensive: it will generate a new primary key index even though the existing primary key is compatible with regional by row. Generating the new index deletes the existing sqlliveness index, which causes the session heartbeat loop to terminate.
The sql_instances table now supports an index format compatible with regional by row tables. This is building on the work from cockroachdb#90408.
```
$ COCKROACH_MR_SYSTEM_DATABASE=true ./cockroach-short demo
demo@127.0.0.1:26257/movr> SELECT * FROM system.sql_instances LIMIT 2;
  id |      addr       |                session_id                |             locality              | crdb_region
-----+-----------------+------------------------------------------+-----------------------------------+--------------
   1 | 127.0.0.1:26257 | \x0101800c678f05d4114b3aa17bcd61d336308a | {"Tiers": "region=us-east1,az=b"} | \x80
   2 | NULL            | NULL                                     | NULL                              | \x80
(2 rows)

$ COCKROACH_MR_SYSTEM_DATABASE=false ./cockroach-short demo
demo@127.0.0.1:26257/movr> SELECT * FROM system.sql_instances LIMIT 2;
  id |      addr       |                session_id                |             locality
-----+-----------------+------------------------------------------+------------------------------------
   1 | 127.0.0.1:26257 | \x010180fb9227b38ca9445ba27ac8b1f5a2204d | {"Tiers": "region=us-east1,az=b"}
   2 | NULL            | NULL                                     | NULL
(2 rows)
```
Part of cockroachdb#85736
Force-pushed 5aa0a18 to 654fa25
bors r+
Build succeeded:
92035: sql: use collection in zone config hydration r=chengxiong-ruan a=chengxiong-ruan
part of #88571
Previously, we used a `getKey` function (which is basically a kv lookup) to fetch the zone configs and table descriptors required for zone config hydration. That redundantly repeated protobuf deserialization work already done by the descriptor collection. This commit changes it to use a helper interface to look up zone configs and descriptors directly. This also makes sure that we utilize cached zone configs when possible.
Release note: None

92248: instancestorage: load all regions from the system DB when reclaiming IDs r=JeffSwenson a=jaylim-crl
Follow up to #92010 and #91694. This commit addresses a TODO comment by loading all region enum members from the system database when reclaiming IDs. The list of regions is read each time the loop runs since there's a possibility that regions are added or removed from the system DB.
Epic: None
Release note: None

92709: multitenant: replace EXPERIMENTAL_RELOCATE StoreID to NodeID lookup r=arulajmani a=ecwall
Fixes #91628
Replace EXPERIMENTAL_RELOCATE's StoreID -> NodeID lookup's gossip impl with an in-memory cache.
Release note: None

92970: kvserver: track allocator errors via replicate queue action metrics r=AlexTalks a=AlexTalks
While allocator errors were intended to be reported via the "Replicate Queue Failures by Allocator Action" metric, i.e. `queue.replicate.<action>.error`, these were not getting reported. This change makes sure to report these errors whenever we are not in a dry run.
Epic: None
Release note: None

Co-authored-by: Chengxiong Ruan <chengxiongruan@gmail.com>
Co-authored-by: Jay <jay@cockroachlabs.com>
Co-authored-by: Evan Wall <wall@cockroachlabs.com>
Co-authored-by: Alex Sarkesian <sarkesian@cockroachlabs.com>
crdb_internal.unsafe_optimize_system_database optimizes the system database for multi region deployments by configuring latency critical tables as global or regional by row. Tables that are sensitive to read latency were converted to global. Tables that are sensitive to write latency were converted to regional by row.
system.sqlliveness and system.sql_instances were reworked by (cockroachdb#90408) and (cockroachdb#92010) to have a primary key index that is binary compatible with regional by row tables.
The conversion to regional by row is implemented as a built in function because it is not possible to use `ALTER TABLE ... SET LOCALITY REGIONAL BY ROW` to perform the conversion. Using alter table to convert the RBR tables has two challenges:
1. There is no way to convert the crdb_region column from []byte to the region enum type.
2. Changing the locality to RBR always rewrites the primary index. The sqlliveness and sql_instance subsystems use the raw kv API and expect a specific index ID. Rewriting the primary index changes the ID and breaks the subsystems.
Release note: None
Part of cockroachdb#85736
92395: multiregionccl: add unsafe_optimize_system_database r=JeffSwenson a=JeffSwenson
crdb_internal.unsafe_optimize_system_database optimizes the system database for multi region deployments by configuring latency critical tables as global or regional by row. Tables that are sensitive to read latency were converted to global. Tables that are sensitive to write latency were converted to regional by row.
system.sqlliveness and system.sql_instances were reworked by (#90408) and (#92010) to have a primary key index that is binary compatible with regional by row tables.
The conversion to regional by row is implemented as a built in function because it is not possible to use `ALTER TABLE ... SET LOCALITY REGIONAL BY ROW` to perform the conversion. Using alter table to convert the RBR tables has two challenges:
1. There is no way to convert the crdb_region column from []byte to the region enum type.
2. Changing the locality to RBR always rewrites the primary index. The sqlliveness and sql_instance subsystems use the raw kv API and expect a specific index ID. Rewriting the primary index changes the ID and breaks the subsystems.
Release note: None
Part of #85736
Co-authored-by: Jeff <swenson@cockroachlabs.com>
instancestorage: clean up codec interface
Refactor the codec interface. The new codec interface will be used to
support the legacy index format and the RBR compatible index format.
Part of #85736
systemschema: support rbr system.sql_instances
The sql_instances table now supports an index format compatible with
regional by row tables. This is building on the work from #90408.
Part of #85736