sqlinstance: Update instancereader with cache backed by rangefeed #69976
craig[bot] merged 1 commit into cockroachdb:master
Conversation
ajwerner
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @rimadeodhar)
pkg/sql/catalog/lease/lease_internal_test.go, line 1 at r1 (raw file):
// Copyright 2015 The Cockroach Authors.
?
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 77 at r1 (raw file):
Quoted 10 lines of code…

```go
cacheConfig := cache.Config{
	Policy: cache.CacheLRU,
	ShouldEvict: func(size int, key, value interface{}) bool {
		// SQL instance data is expected to stay small enough
		// (order of a few KB), so we don't need an eviction strategy
		// and can retain cached data indefinitely.
		return false
	},
}
r.mu.instances = cache.NewUnorderedCache(cacheConfig)
```
why use this data structure if you're never going to evict? Why not just a *btree.BTree?
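For intuition, here is a minimal, hypothetical sketch (not the PR's code; `instanceInfo` and the method names are invented) of what a never-evicting instance store looks like when it is just a mutex-protected map updated from rangefeed events:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// instanceInfo is a trimmed-down stand-in for the real instance row.
type instanceInfo struct {
	ID   int
	Addr string
}

// instanceStore keeps every known instance; entries are only removed
// when a rangefeed tombstone arrives, never evicted for space, since
// the full data set is expected to stay on the order of a few KB.
type instanceStore struct {
	mu        sync.RWMutex
	instances map[int]instanceInfo
}

func newInstanceStore() *instanceStore {
	return &instanceStore{instances: make(map[int]instanceInfo)}
}

// upsert applies a rangefeed event: tombstones delete, others overwrite.
func (s *instanceStore) upsert(info instanceInfo, tombstone bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if tombstone {
		delete(s.instances, info.ID)
		return
	}
	s.instances[info.ID] = info
}

// all returns a sorted snapshot so callers can mutate it freely.
func (s *instanceStore) all() []instanceInfo {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]instanceInfo, 0, len(s.instances))
	for _, info := range s.instances {
		out = append(out, info)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	s := newInstanceStore()
	s.upsert(instanceInfo{ID: 2, Addr: "10.0.0.2"}, false)
	s.upsert(instanceInfo{ID: 1, Addr: "10.0.0.1"}, false)
	s.upsert(instanceInfo{ID: 2, Addr: ""}, true) // tombstone deletes ID 2
	fmt.Println(s.all())                          // prints [{1 10.0.0.1}]
}
```

With the data bounded to a few KB, eviction-policy machinery buys nothing; the only removals come from tombstones.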
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 163 at r1 (raw file):
```go
}
instancesTablePrefix := r.codec.TablePrefix(uint32(r.tableID))
```
Consider tightening this to just the primary index.
pkg/sql/sqlinstance/instancestorage/instancereader_test.go, line 52 at r1 (raw file):
```go
tDB := sqlutils.MakeSQLRunner(tc.ServerConn(0))
dbName := t.Name()
s := tc.Server(0)
```
The test is failing because rangefeeds aren't enabled on the table used for the test. They are enabled by default on all system table ranges and on all tenant ranges.
See this error:
W210914 16:49:20.321486 1529 kv/kvclient/rangefeed/rangefeed.go:228 [rangefeed=sql_instances] 62 rangefeed failed 0 times, restarting: kv/kvserver/replica_rangefeed.go:151: rangefeeds require the kv.rangefeed.enabled setting. See https://www.cockroachlabs.com/docs/v21.2/change-data-capture.html#enable-rangefeeds-to-reduce-latency
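For standalone test clusters that need a rangefeed on an arbitrary (non-system, non-tenant) table, the setting named in the error can be enabled explicitly. A config fragment (in Go tests this is typically issued through the test cluster's SQL runner):

```sql
SET CLUSTER SETTING kv.rangefeed.enabled = true;
```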
ajwerner
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @rimadeodhar)
pkg/sql/sqlinstance/sqlinstance.go, line 33 at r1 (raw file):
Quoted 9 lines of code…

```go
// AddressResolver exposes API for retrieving the instance address and all live instances for a tenant.
type AddressResolver interface {
	// GetInstance returns the InstanceInfo for a pod given the instance ID.
	// Returns a NonExistentInstanceError if an instance with the given ID
	// does not exist.
	GetInstance(context.Context, base.SQLInstanceID) (InstanceInfo, error)
	// GetAllInstances returns a list of all SQL instances for the tenant.
	GetAllInstances(context.Context) ([]InstanceInfo, error)
}
```
I've been thinking about these interfaces a bit. How would you feel about simplifying them like what I have below? I also took a hammer to the type names. Consider reading them as `sqlinstance.Info`, `sqlinstance.InfoIterator`, `sqlinstance.Resolver`, etc.
```go
// Info exposes information on a SQL instance such as ID, network address and
// the associated sqlliveness.SessionID.
type Info struct {
	InstanceID   base.SQLInstanceID
	InstanceAddr string
	SessionID    sqlliveness.SessionID
}

// InfoIterator is used to iterate instance info stored in a resolver.
type InfoIterator func(Info) error

// Resolver exposes an API for retrieving the instance address and all live
// instances for a tenant.
type Resolver interface {
	// Iterate iterates the set of known, live instances stored in the resolver.
	// From indicates the first instance from which iteration should occur.
	// Iteration proceeds in ascending order, halting after the instance with
	// the largest known instance ID. Returning iterutil.StopIteration will
	// stop the iteration and return no error.
	Iterate(ctx context.Context, from base.SQLInstanceID, f InfoIterator) error
}

// GetInfo returns the Info for an instance given the instance ID as reported
// by the Resolver. Returns a NonExistentInstanceError if an instance with the
// given ID does not exist.
func GetInfo(ctx context.Context, r Resolver, id base.SQLInstanceID) (Info, error) {
	var ret Info
	if err := r.Iterate(ctx, id, func(info Info) error {
		ret = info
		return iterutil.StopIteration()
	}); err != nil {
		return Info{}, err
	}
	if ret.InstanceID == id {
		return ret, nil
	}
	return Info{}, NonExistentInstanceError
}
```

You can then implement this interface on a slice like:
```go
// Infos is a slice of Info. If it is sorted, it can be used as a Resolver.
// Calls to Iterate have undefined behavior if Infos is not sorted.
type Infos []Info

var _ Resolver = (Infos)(nil)

// Iterate implements the Resolver interface.
func (infos Infos) Iterate(ctx context.Context, from base.SQLInstanceID, f InfoIterator) error {
	idx := sort.Search(len(infos), func(i int) bool {
		return infos[i].InstanceID >= from
	})
	for i := idx; i < len(infos); i++ {
		if err := f(infos[i]); err != nil {
			if iterutil.Done(err) {
				err = nil
			}
			return err
		}
	}
	return nil
}
```

I think that such an interface will make the cache nicer to use.
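To make the proposed iteration semantics concrete, here is a self-contained, runnable sketch using only the standard library; `errStop`, `info`, and `getInfo` are simplified stand-ins for `iterutil.StopIteration`, `sqlinstance.Info`, and the proposed `GetInfo` helper:

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// errStop mimics iterutil.StopIteration: a sentinel that halts
// iteration without being surfaced as an error to the caller.
var errStop = errors.New("stop iteration")

type info struct{ ID int }

// infos must be sorted by ID for iterate to behave correctly.
type infos []info

// iterate visits entries with ID >= from in ascending order,
// swallowing errStop, mirroring the proposed Resolver.Iterate.
func (is infos) iterate(from int, f func(info) error) error {
	idx := sort.Search(len(is), func(i int) bool { return is[i].ID >= from })
	for i := idx; i < len(is); i++ {
		if err := f(is[i]); err != nil {
			if errors.Is(err, errStop) {
				return nil
			}
			return err
		}
	}
	return nil
}

// getInfo finds a single instance by ID using only iterate, the way
// the proposed GetInfo helper does: start iterating at id, stop after
// one entry, then check whether it was an exact match.
func getInfo(is infos, id int) (info, bool) {
	var ret info
	_ = is.iterate(id, func(in info) error {
		ret = in
		return errStop
	})
	return ret, ret.ID == id
}

func main() {
	is := infos{{1}, {3}, {7}}
	in, ok := getInfo(is, 3)
	fmt.Println(in.ID, ok) // 3 true
	_, ok = getInfo(is, 4) // first entry >= 4 is 7, not an exact match
	fmt.Println(ok)        // false
}
```

This shows how a single primitive (`Iterate`) can serve both point lookups and full scans, which was the unification ajwerner was after.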
rimadeodhar
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @rimadeodhar)
pkg/sql/sqlinstance/sqlinstance.go, line 33 at r1 (raw file):
Previously, ajwerner wrote…
(quoted the proposed `Resolver`/`Info`/`InfoIterator` interface above)
Maybe it's just me, but I find this interface harder to use. Also, I'm kind of averse to updating the external-facing API of the sqlinstance subsystem for a caching change, which should be an internal implementation detail of sqlinstance. I could be convinced otherwise if the reason were compelling enough, but in this case I feel like the original interface is good enough. Thoughts?
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 77 at r1 (raw file):
Previously, ajwerner wrote…
(quoted the cache config above) why use this data structure if you're never going to evict? Why not just a `*btree.BTree`?
I just (mistakenly) assumed the cache data structure is used everywhere in the code for caching. I'll update this.
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 163 at r1 (raw file):
Previously, ajwerner wrote…
Consider tightening this to just the primary index.
Can you please elaborate on this? How can I reference just the primary index? All other rangefeeds seem to reference the entire table; is this the recommended approach going forward?
pkg/sql/sqlinstance/instancestorage/instancereader_test.go, line 52 at r1 (raw file):
Previously, ajwerner wrote…
The test is failing because rangefeeds aren't enabled on the table used for the test. They are enabled by default on all system table ranges and on all tenant ranges.
See this error:
W210914 16:49:20.321486 1529 kv/kvclient/rangefeed/rangefeed.go:228 [rangefeed=sql_instances] 62 rangefeed failed 0 times, restarting: kv/kvserver/replica_rangefeed.go:151: rangefeeds require the kv.rangefeed.enabled setting. See https://www.cockroachlabs.com/docs/v21.2/change-data-capture.html#enable-rangefeeds-to-reduce-latency
Thanks!! This fixed the test.
Force-pushed from 5a9d127 to d8e7570
This change adds reporting of the next live instance ID to the tenant cost controller. For now we query the sqlinstance.Provider at most once a minute. This is temporary until the Provider is changed to a range-feed-driven cache (cockroachdb#69976).

Release note: None

Release justification: Necessary fix for the distributed rate limiting functionality, which is vital for the upcoming Serverless MVP release. It allows CRDB to throttle clusters that have run out of free or paid request units (which measure CPU and I/O usage). This functionality is only enabled in multi-tenant scenarios and should have no impact on our dedicated customers.
RaduBerinde
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @RaduBerinde, and @rimadeodhar)
pkg/server/server_sql.go, line 951 at r2 (raw file):
```go
// sqlInstanceProvider must always be started after sqlLivenessProvider
// as sqlInstanceProvider relies on the session initialized and maintained by
// sqlLivenessProvider
```
[nit] period missing
pkg/sql/sqlinstance/sqlinstance.go, line 33 at r1 (raw file):
Previously, rimadeodhar (Rima Deodhar) wrote…
Maybe it's just me, but I find this interface harder to use. Also, I'm kind of averse to updating the external-facing API of the `sqlinstance` subsystem for a caching change, which should be an internal implementation detail of `sqlinstance`. I could be convinced otherwise if the reason were compelling enough, but in this case I feel like the original interface is good enough. Thoughts?
FWIW, I also think the proposed interface feels harder to use.
pkg/sql/sqlinstance/sqlinstance.go, line 50 at r2 (raw file):
```go
Instance(context.Context) (base.SQLInstanceID, error)

// Start starts the instanceprovider. This will block until
// the instancereader has been started.
```
[nit] instancereader doesn't mean much in the context of documenting the interface
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 101 at r2 (raw file):
```go
	r.stopper.AddCloser(rf)
}
return r.getStartError()
```
[nit] I think it would be cleaner to send the error through the channel
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 110 at r2 (raw file):
```go
	}
}

func (r *Reader) maybeStartRf(ctx context.Context) *rangefeed.RangeFeed {
```
[nit] maybeStartRangeFeed.
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 112 at r2 (raw file):
```go
func (r *Reader) maybeStartRf(ctx context.Context) *rangefeed.RangeFeed {
	if r.started() {
		// Nothing to do, return
```
[nit] We're doing a bunch of stuff to tolerate invalid usage like calling Start twice or calling the other functions without Start; this feels unnecessary.
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 133 at r2 (raw file):
```go
// This event corresponds to a deletion.
if tombstone {
	r.updateInstanceMap(instance, true)
```
[nit] can just do `r.updateInstanceMap(instance, tombstone)`
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 196 at r2 (raw file):
```go
}

// GetAllInstances implements sqlinstance.AddressResolver interface.
```
[nit] It's worth documenting that this implementation does not block (on RPCs or the like). Well, at least when slReader is a cached reader but I assume that's the case outside of tests.
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 254 at r2 (raw file):
```go
// getAllInstanceRows returns all instancerow objects contained
// within the map.
```
[nit] add ", in arbitrary order."
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 286 at r2 (raw file):
```go
}

func (r *Reader) getStartError() error {
```
[nit] checkStarted might be a better name
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 166 at r2 (raw file):
```go
	return instancerow{}, errors.Wrapf(err, "could not decode data for instance %d", instanceID)
}
if tombstone {
```
[nit] this seems redundant with the Value check above. I'd just ignore the new return.
pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 212 at r2 (raw file):
```go
}
if tombstone {
	// Corresponds to deleted data, continue.
```
[nit] A Scan does not return deleted data. I'd just ignore the new return.
ajwerner
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @RaduBerinde, and @rimadeodhar)
pkg/sql/sqlinstance/sqlinstance.go, line 33 at r1 (raw file):
Previously, RaduBerinde wrote…
FWIW I also think the proposed interface feels harder to use..
I buy that :). My motivation was:
- The returned slice is mutable so either we're going to allocate a new one each time or we need the caller to not mutate it. I am almost definitely over-allergic to allocations.
- I was hoping to unify getting a single ID and getting multiple.
I'm willing to accept neither are worth it.
If we keep the slice, let's note its ordering properties; ideally it'd be sorted. Also note the mutability properties.
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 163 at r1 (raw file):
Previously, rimadeodhar (Rima Deodhar) wrote…
Please can you elaborate on this? How can I reference just the primary index? All other rangefeeds seems to reference the entire table, is this the recommended approach going forward?
I think in the other cases it's just being sort of lazy and knowing that we only have a single index, the primary index. That's the case here too. It's interesting to think about what it would mean to run a schema change on this table in subsequent releases. The method would be codec.IndexPrefix and in this case, the primary index is 1.
In light of schema changes, we'd need to consider taking a catalog.TableDescriptor or something like that. Definitely future work. For now the whole table and the index are synonymous.
69380: docs/RFCs: draft MVCC-ification of bulk operations r=tbg a=dt

Release note: none. Release justification: none.

70520: tenantcostclient: pass next live instance ID r=RaduBerinde a=RaduBerinde

This set of changes adds tracking instances on the server side. The main motivator at this stage is to detect duplicate requests and avoid double-charging in some cases. A secondary motivator is that doing this later will cause headaches during upgrade. Informs #68479.

#### tenantcostclient: plumb SQL instance and session ID

This commit plumbs the SQL Instance ID and corresponding sqlliveness.SessionID to the tenant cost controller.

Release note: None

Release justification: Necessary fix for the distributed rate limiting functionality, which is vital for the upcoming Serverless MVP release. It allows CRDB to throttle clusters that have run out of free or paid request units (which measure CPU and I/O usage). This functionality is only enabled in multi-tenant scenarios and should have no impact on our dedicated customers.

#### tenantcost: implement request sequence number

This commit adds a monotonic request sequence number for the TokenBucket API. This is used to detect duplicate requests and avoid double-charging.

Release note: None

Release justification: Necessary fix for the distributed rate limiting functionality, which is vital for the upcoming Serverless MVP release. It allows CRDB to throttle clusters that have run out of free or paid request units (which measure CPU and I/O usage). This functionality is only enabled in multi-tenant scenarios and should have no impact on our dedicated customers.

#### tenantcostserver: cleanup stale instances

This commit implements the server-side logic for cleaning up stale instances from the tenant_usage table, according to the following scheme:

- each tenant sends the ID of the next instance in circular order. The live instance set is maintained on the tenant side by a separate subsystem.
- the server uses this information as a "hint" that some instances might be stale. When the next ID does not match the expected value, a cleanup for a specific instance ID range is triggered. The cleanup ultimately checks that the last update is stale, so that stale information from the tenant side doesn't cause incorrect removals.

Instances are cleaned up one at a time, with a limit of 10 instance removals per request. This solution avoids queries that scan ranges of the table which may contain a lot of tombstones.

Release note: None

Release justification: Necessary fix for the distributed rate limiting functionality, which is vital for the upcoming Serverless MVP release. It allows CRDB to throttle clusters that have run out of free or paid request units (which measure CPU and I/O usage). This functionality is only enabled in multi-tenant scenarios and should have no impact on our dedicated customers.

#### tenantcostclient: pass next live instance ID

This change adds reporting of the next live instance ID to the tenant cost controller. For now we query the sqlinstance.Provider at most once a minute. This is temporary until the Provider is changed to a range-feed-driven cache (#69976).

Release note: None

Release justification: Necessary fix for the distributed rate limiting functionality, which is vital for the upcoming Serverless MVP release. It allows CRDB to throttle clusters that have run out of free or paid request units (which measure CPU and I/O usage). This functionality is only enabled in multi-tenant scenarios and should have no impact on our dedicated customers.

70531: sql: add version gate to regrole type r=RichardJCai a=RichardJCai

Without the version gate, in a mixed cluster setting, creating a table on the new binary version with type REGROLE can cause errors when inserting on the old version. Errors can also happen on schema changes where the old binary uses the new type.

Release justification: GA blocker; without a version gate this can cause panics. Release note: None. Not in production.

Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Co-authored-by: richardjcai <caioftherichard@gmail.com>
Force-pushed from d8e7570 to 5727bbe
rimadeodhar
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @RaduBerinde, and @rimadeodhar)
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 101 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] I think it would be cleaner to send the error through the channel
I have kept it separate because we reuse this method elsewhere to return an error if the Start method hasn't been called or returned an error.
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 112 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] We're doing a bunch of stuff to tolerate invalid usage like calling Start twice or calling the other functions without Start; this feels unnecessary.
I agree, but it feels necessary, since the functionality executed in Start() is critical for the Reader's other methods to work correctly.
rimadeodhar
left a comment
Updated the PR based on the comments so far. PTAL.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @RaduBerinde, and @rimadeodhar)
rimadeodhar
left a comment
@ajwerner: Gentle ping for a review.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @RaduBerinde, and @rimadeodhar)
ajwerner
left a comment
Sorry it took me so long to get back to this 😓
Reviewed 1 of 11 files at r1, 1 of 7 files at r2, 4 of 7 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rimadeodhar)
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 96 at r3 (raw file):
Quoted 7 lines of code…

```go
case <-r.initialScanDone:
	if rf != nil {
		// Add rangefeed to the stopper to ensure it
		// is shutdown correctly.
		r.stopper.AddCloser(rf)
	}
	return r.checkStarted()
```
FWIW: this means that starting the subsystem is a synchronous scan. I'm not sure we need that. Perhaps a TODO or something to make the waiting for an initial scan lazy until calls to read?
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 124 at r3 (raw file):
```go
})
if err != nil {
	log.Ops.Warningf(ctx, "failed to decode settings row %v: %v", keyVal.Key, err)
```
return?
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 154 at r3 (raw file):
```go
	EndKey: instancesTablePrefix.PrefixEnd(),
}
rf, err := r.f.RangeFeed(ctx, "sql_instances", instancesTableSpan, r.clock.Now(), updateCacheFn, rangefeed.WithInitialScan(initialScanDoneFn), rangefeed.WithOnInitialScanError(initialScanErrFn))
```
nit: wrap this
Force-pushed from 5727bbe to 1a1c8ff
Force-pushed from 1a1c8ff to 7e63022
rimadeodhar
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @RaduBerinde)
pkg/sql/sqlinstance/instancestorage/instancereader.go, line 96 at r3 (raw file):
Previously, ajwerner wrote…
(quoted the initialScanDone block above) FWIW: this means that starting the subsystem is a synchronous scan. I'm not sure we need that. Perhaps a TODO or something to make the waiting for an initial scan lazy until calls to read?
Okay, I have added a TODO for now. Will make a followup PR with the change.
ajwerner
left a comment
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @RaduBerinde)
ajwerner
left a comment
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @RaduBerinde, and @rimadeodhar)
pkg/sql/sqlinstance/instancestorage/instancereader_test.go, line 272 at r4 (raw file):
Quoted 15 lines of code…

```go
		if err != sqlinstance.NonExistentInstanceError {
			return errors.Newf("expected non existent instance error")
		}
		return nil
	})
}
// Verify request for instance with expired session results in an error.
{
	err := slStorage.Delete(ctx, sessionIDs[1])
	if err != nil {
		t.Fatal(err)
	}
	testutils.SucceedsSoon(t, func() error {
		_, err = reader.GetInstance(ctx, instanceIDs[0])
		if err != sqlinstance.NonExistentInstanceError {
```
the linter wants `errors.Is` here
This PR adds a rangefeed-backed cache for the sqlinstance table. This will prevent the SQL instance lookup code from making multiple roundtrips, which should yield good performance gains once we support multi-region for serverless.

Release note (performance improvement): The sqlinstance subsystem no longer reads from the backing SQL table for every request for SQL instance details. This will result in improved performance once we support a multi-region setup for the multi-tenant architecture.
Force-pushed from 7e63022 to 4cd3062
TFTR!

bors r=ajwerner

Build succeeded.

Is it now safe to update the code below to just run GetAllInstances every time? (cockroach/pkg/server/tenant.go, line 610 in d7a9883)

Yes, if the only blocker to running GetAllInstances every time was caching, then you can go ahead and make the change.
The tenant's cost control code requires a function that retrieves the next live SQL instance ID. The implementation of this function currently caches the value and only refreshes it periodically. This is now unnecessary: cockroachdb#69976 changed the instance reader to use a rangefeed and serve values from memory. Release note: None
73484: kvserver: use background context when applying snapshots r=nvanbenschoten,tbg a=erikgrinaker

Following a3fd4fb, context errors are now propagated back up the stack when receiving snapshots. However, this triggered a `maybeFatalOnRaftReadyErr` assertion which crashed the node. `handleRaftReadyRaftMuLocked` (which is called directly when applying snapshots) does not appear prepared to safely handle arbitrary context cancellation, as it is typically called with the Raft scheduler's background context. Furthermore, by the time we call it we have already received the entire snapshot, so it does not seem useful to abort snapshot application just because the caller goes away.

This patch therefore uses a new background context for applying snapshots, disconnected from the client's context, once the entire snapshot has been received.

Resolves #73371. Resolves #73469. Resolves #73482.

Release note: None

73523: server: remove ad-hoc next-instance-id cache (tenant only) r=RaduBerinde a=RaduBerinde

The tenant's cost control code requires a function that retrieves the next live SQL instance ID. The implementation of this function currently caches the value and only refreshes it periodically. This is now unnecessary: #69976 changed the instance reader to use a rangefeed and serve values from memory.

Release note: None

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>