sqlinstance: Update instancereader with cache backed by rangefeed#69976

Merged
craig[bot] merged 1 commit into cockroachdb:master from rimadeodhar:instance_id_cache on Oct 27, 2021
Conversation

@rimadeodhar
Collaborator

@rimadeodhar rimadeodhar commented Sep 9, 2021

This PR adds a rangefeed-backed cache for the sqlinstance
table. It prevents the SQL instance lookup code from
making multiple round trips to the backing table. This should yield
meaningful performance gains once we support multi-region
deployments for Serverless.

Release note (performance improvement): The sqlinstance subsystem
no longer reads from the backing SQL table on every request for
SQL instance details. This will improve performance once
multi-region setups are supported for the multi-tenant
architecture.

@cockroach-teamcity
Member

This change is Reviewable

@rimadeodhar rimadeodhar added the do-not-merge label (bors won't merge a PR with this label) on Sep 9, 2021
Contributor

@ajwerner ajwerner left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rimadeodhar)


pkg/sql/catalog/lease/lease_internal_test.go, line 1 at r1 (raw file):

// Copyright 2015 The Cockroach Authors.

?


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 77 at r1 (raw file):

Quoted 10 lines of code…
	cacheConfig := cache.Config{
		Policy: cache.CacheLRU,
		ShouldEvict: func(size int, key, value interface{}) bool {
			// SQL instance data is expected to stay small enough
			// (order of a few KB), so we don't need an eviction strategy
			// and can retain cached data indefinitely.
			return false
		},
	}
	r.mu.instances = cache.NewUnorderedCache(cacheConfig)

Why use this data structure if you're never going to evict? Why not just a *btree.BTree?


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 163 at r1 (raw file):

	}

	instancesTablePrefix := r.codec.TablePrefix(uint32(r.tableID))

Consider tightening this to just the primary index.


pkg/sql/sqlinstance/instancestorage/instancereader_test.go, line 52 at r1 (raw file):

	tDB := sqlutils.MakeSQLRunner(tc.ServerConn(0))
	dbName := t.Name()
	s := tc.Server(0)

The test is failing because rangefeeds aren't enabled on the table used for the test. They are enabled by default on all system table ranges and on all tenant ranges.

See this error:

W210914 16:49:20.321486 1529 kv/kvclient/rangefeed/rangefeed.go:228  [rangefeed=sql_instances] 62  rangefeed failed 0 times, restarting: kv/kvserver/replica_rangefeed.go:151: rangefeeds require the kv.rangefeed.enabled setting. See https://www.cockroachlabs.com/docs/v21.2/change-data-capture.html#enable-rangefeeds-to-reduce-latency

Contributor

@ajwerner ajwerner left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rimadeodhar)


pkg/sql/sqlinstance/sqlinstance.go, line 33 at r1 (raw file):

Quoted 9 lines of code…
// AddressResolver exposes API for retrieving the instance address and all live instances for a tenant.
type AddressResolver interface {
	// GetInstance returns the InstanceInfo for a pod given the instance ID.
	// Returns a NonExistentInstanceError if an instance with the given ID
	// does not exist.
	GetInstance(context.Context, base.SQLInstanceID) (InstanceInfo, error)
	// GetAllInstances returns a list of all SQL instances for the tenant.
	GetAllInstances(context.Context) ([]InstanceInfo, error)
}

I've been thinking about these interfaces a bit. How would you feel about simplifying the interface to something like what I have below? I also took a hammer to the type names. Consider reading them as sqlinstance.Info, sqlinstance.InfoIterator, sqlinstance.Resolver, etc.

// Info exposes information on a SQL instance such as ID, network address and
// the associated sqlliveness.SessionID.
type Info struct {
	InstanceID   base.SQLInstanceID
	InstanceAddr string
	SessionID    sqlliveness.SessionID
}

// InfoIterator is used to iterate instance info stored in a resolver. 
type InfoIterator func(Info) error

// Resolver exposes API for retrieving the instance address and all live instances for a tenant.
type Resolver interface {
    // Iterate iterates the set of known, live instances stored in the resolver.
    // From indicates the first instance from which iteration should occur. Iteration
    // proceeds in ascending order, halting after the instance with the largest known
    // instance ID. Returning iterutil.StopIteration will stop the iteration and return
    // no error. 
    Iterate(ctx context.Context, from base.SQLInstanceID, f InfoIterator) error
}

// GetInfo returns the Info for an instance given the instance ID as reported
// by the Resolver. Returns a NonExistentInstanceError if an instance with the given ID
// does not exist.
func GetInfo(ctx context.Context, r Resolver, id base.SQLInstanceID) (Info, error) {
	var ret Info
	if err := r.Iterate(ctx, id, func(info Info) error {
		ret = info
		return iterutil.StopIteration()
	}); err != nil {
		return Info{}, err
	}
	if ret.InstanceID == id {
		return ret, nil
	}
	return Info{}, NonExistentInstanceError
}

You can then implement this interface on a slice like:

// Infos is a slice of Info. If it is sorted, it can be used as a Resolver. Calls to
// Iterate have undefined behavior if Infos is not sorted.
type Infos []Info

var _ Resolver = (Infos)(nil)

// Iterate implements Resolver, binary-searching for the first instance
// with an ID at or above from.
func (infos Infos) Iterate(ctx context.Context, from base.SQLInstanceID, f InfoIterator) error {
	idx := sort.Search(len(infos), func(i int) bool {
		return infos[i].InstanceID >= from
	})
	for i := idx; i < len(infos); i++ {
		if err := f(infos[i]); err != nil {
			if iterutil.Done(err) {
				err = nil
			}
			return err
		}
	}
	return nil
}

I think that such an interface will make the cache nicer to use.

Collaborator Author

@rimadeodhar rimadeodhar left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @rimadeodhar)


pkg/sql/sqlinstance/sqlinstance.go, line 33 at r1 (raw file):

Previously, ajwerner wrote…
I've been thinking about these interfaces a bit. How would you feel about simplifying the interface to something like what I have below? I also took a hammer to the type names. […] I think that such an interface will make the cache nicer to use.

Maybe it's just me, but I find this interface harder to use. Also, I'm averse to updating the external-facing API of the sqlinstance subsystem for a caching change, which should be an internal implementation detail of sqlinstance. I could be convinced otherwise if the reason were compelling enough, but in this case I feel the original interface is good enough. Thoughts?


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 77 at r1 (raw file):

Previously, ajwerner wrote…
why use this data structure if you're never going to evict? Why not just a *btree.BTree?

I just (mistakenly) assumed the cache data structure is what we use everywhere in the code for caching. I'll update this.


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 163 at r1 (raw file):

Previously, ajwerner wrote…

Consider tightening this to just the primary index.

Can you please elaborate on this? How can I reference just the primary index? All other rangefeeds seem to reference the entire table; is this the recommended approach going forward?


pkg/sql/sqlinstance/instancestorage/instancereader_test.go, line 52 at r1 (raw file):

Previously, ajwerner wrote…

The test is failing because rangefeeds aren't enabled on the table used for the test. They are enabled by default on all system table ranges and on all tenant ranges.


Thanks!! This fixed the test.

@rimadeodhar rimadeodhar removed the do-not-merge label (bors won't merge a PR with this label) on Sep 17, 2021
@rimadeodhar rimadeodhar requested a review from a team September 17, 2021 21:07
@rimadeodhar rimadeodhar marked this pull request as ready for review September 17, 2021 21:10
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Sep 21, 2021
This change adds reporting of the next live instance ID to the tenant
cost controller.

For now we query the sqlinstance.Provider at most once a minute. This
is temporary until the Provider is changed to a range-feed-driven
cache (cockroachdb#69976).

Release note: None

Release justification: Necessary fix for the distributed rate limiting
functionality, which is vital for the upcoming Serverless MVP release.
It allows CRDB to throttle clusters that have run out of free or paid
request units (which measure CPU and I/O usage). This functionality is
only enabled in multi-tenant scenarios and should have no impact on
our dedicated customers.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Sep 21, 2021
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Sep 21, 2021
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Sep 22, 2021
Member

@RaduBerinde RaduBerinde left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @RaduBerinde, and @rimadeodhar)


pkg/server/server_sql.go, line 951 at r2 (raw file):

	// sqlInstanceProvider must always be started after sqlLivenessProvider
	// as sqlInstanceProvider relies on the session initialized and maintained by
	// sqlLivenessProvider

[nit] period missing


pkg/sql/sqlinstance/sqlinstance.go, line 33 at r1 (raw file):

Previously, rimadeodhar (Rima Deodhar) wrote…

Maybe its just me, but I find this interface harder to use. Also, I'm kind of averse to updating the external facing API of the sqlinstance subsystem for a caching change which should be an internal implementation detail for sqlinstance. I could be convinced otherwise if the reason was compelling enough. But in this case, I feel like the original interface is good enough. Thoughts?

FWIW, I also think the proposed interface feels harder to use.


pkg/sql/sqlinstance/sqlinstance.go, line 50 at r2 (raw file):

	Instance(context.Context) (base.SQLInstanceID, error)
	// Start starts the instanceprovider. This will block until
	// the instancereader has been started.

[nit] instancereader doesn't mean much in the context of documenting the interface


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 101 at r2 (raw file):

			r.stopper.AddCloser(rf)
		}
		return r.getStartError()

[nit] I think it would be cleaner to send the error through the channel


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 110 at r2 (raw file):

	}
}
func (r *Reader) maybeStartRf(ctx context.Context) *rangefeed.RangeFeed {

[nit] maybeStartRangeFeed.


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 112 at r2 (raw file):

func (r *Reader) maybeStartRf(ctx context.Context) *rangefeed.RangeFeed {
	if r.started() {
		// Nothing to do, return

[nit] We're doing a bunch of stuff to tolerate invalid usage like calling Start twice or calling the other functions without Start; this feels unnecessary.


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 133 at r2 (raw file):

		// This event corresponds to a deletion.
		if tombstone {
			r.updateInstanceMap(instance, true)

[nit] can just do r.updateInstanceMap(instance, tombstone)


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 196 at r2 (raw file):

}

// GetAllInstances implements sqlinstance.AddressResolver interface.

[nit] It's worth documenting that this implementation does not block (on RPCs or the like). Well, at least when slReader is a cached reader but I assume that's the case outside of tests.


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 254 at r2 (raw file):

// getAllInstanceRows returns all instancerow objects contained
// within the map.

[nit] add ", in arbitrary order."


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 286 at r2 (raw file):

}

func (r *Reader) getStartError() error {

[nit] checkStarted might be a better name


pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 166 at r2 (raw file):

		return instancerow{}, errors.Wrapf(err, "could not decode data for instance %d", instanceID)
	}
	if tombstone {

[nit] this seems redundant with the Value check above. I'd just ignore the new return.


pkg/sql/sqlinstance/instancestorage/instancestorage.go, line 212 at r2 (raw file):

		}
		if tombstone {
			// Corresponds to deleted data, continue.

[nit] A Scan does not return deleted data. I'd just ignore the new return.

Contributor

@ajwerner ajwerner left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @RaduBerinde, and @rimadeodhar)


pkg/sql/sqlinstance/sqlinstance.go, line 33 at r1 (raw file):

Previously, RaduBerinde wrote…

FWIW I also think the proposed interface feels harder to use..

I buy that :). My motivation was:

  1. The returned slice is mutable so either we're going to allocate a new one each time or we need the caller to not mutate it. I am almost definitely over-allergic to allocations.
  2. I was hoping to unify getting a single ID and getting multiple.

I'm willing to accept neither are worth it.

Let's note the slice's ordering properties if we keep the slice; ideally it'd be sorted. Also note the mutability properties.


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 163 at r1 (raw file):

Previously, rimadeodhar (Rima Deodhar) wrote…

Please can you elaborate on this? How can I reference just the primary index? All other rangefeeds seems to reference the entire table, is this the recommended approach going forward?

I think in the other cases it's just being sort of lazy and knowing that we only have a single index, the primary index. That's the case here too. It's interesting to think about what it would mean to run a schema change on this table in subsequent releases. The method would be codec.IndexPrefix and in this case, the primary index is 1.

In light of schema changes, we'd need to consider taking a catalog.TableDescriptor or something like that. Definitely future work. For now the whole table and the index are synonymous.

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Sep 22, 2021
craig bot pushed a commit that referenced this pull request Sep 23, 2021
69380: docs/RFCs: draft MVCC-ification of bulk operations r=tbg a=dt

Release note: none.
Release justification: none.

70520: tenantcostclient: pass next live instance ID r=RaduBerinde a=RaduBerinde

This set of changes adds tracking instances on the server side. The main motivator at this stage is to detect duplicate requests and avoid double-charging in some cases. A secondary motivator is that doing this later will cause headaches during upgrade.

Informs #68479.

#### tenantcostclient: plumb SQL instance and session ID

This commit plumbs the SQL Instance ID and corresponding
sqlliveness.SessionID to the tenant cost controller.

Release note: None

Release justification: Necessary fix for the distributed rate limiting
functionality, which is vital for the upcoming Serverless MVP release.
It allows CRDB to throttle clusters that have run out of free or paid
request units (which measure CPU and I/O usage). This functionality is
only enabled in multi-tenant scenarios and should have no impact on
our dedicated customers.

#### tenantcost: implement request sequence number

This commit adds a monotonic request sequence number for the
TokenBucket API. This is used to detect duplicate requests and avoid
double-charging.

Release note: None

Release justification: Necessary fix for the distributed rate limiting
functionality, which is vital for the upcoming Serverless MVP release.
It allows CRDB to throttle clusters that have run out of free or paid
request units (which measure CPU and I/O usage). This functionality is
only enabled in multi-tenant scenarios and should have no impact on
our dedicated customers.

#### tenantcostserver: cleanup stale instances

This commit implements the server-side logic for cleaning up stale
instances from the tenant_usage table, according to the following
scheme:

 - each tenant sends the ID of the next instance in circular order.
   The live instance set is maintained on the tenant side by a
   separate subsystem.

 - the server uses this information as a "hint" that some instances
   might be stale. When the next ID does not match the expected value,
   a cleanup for a specific instance ID range is triggered. The
   cleanup ultimately checks that the last update is stale, so that
   stale information from the tenant-side doesn't cause incorrect
   removals.

Instances are cleaned up one at a time, with a limit of 10 instance
removals per request. This solution avoids queries that scan ranges of
the table which may contain a lot of tombstones.

Release note: None

Release justification: Necessary fix for the distributed rate limiting
functionality, which is vital for the upcoming Serverless MVP release.
It allows CRDB to throttle clusters that have run out of free or paid
request units (which measure CPU and I/O usage). This functionality is
only enabled in multi-tenant scenarios and should have no impact on
our dedicated customers.

#### tenantcostclient: pass next live instance ID

This change adds reporting of the next live instance ID to the tenant
cost controller.

For now we query the sqlinstance.Provider at most once a minute. This
is temporary until the Provider is changed to a range-feed-driven
cache (#69976).

Release note: None

Release justification: Necessary fix for the distributed rate limiting
functionality, which is vital for the upcoming Serverless MVP release.
It allows CRDB to throttle clusters that have run out of free or paid
request units (which measure CPU and I/O usage). This functionality is
only enabled in multi-tenant scenarios and should have no impact on
our dedicated customers.

70531: sql: add version gate to regrole type r=RichardJCai a=RichardJCai

Without the version gate, in a mixed cluster setting, creating
a table on the new binary version with type REGROLE can cause
errors when inserting on the old verison.

Errors can also happen on schema changes where the old binary uses
the new type.

Release justification: GA blocker, without a version gate this can
cause panics.
Release note: None. Not in production.

Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Co-authored-by: richardjcai <caioftherichard@gmail.com>
Collaborator Author

@rimadeodhar rimadeodhar left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @RaduBerinde, and @rimadeodhar)


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 101 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] I think it would be cleaner to send the error through the channel

I have kept it separate because we reuse this method elsewhere to return an error if the Start method hasn't been called or has returned an error.


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 112 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] We're doing a bunch of stuff to tolerate invalid usage like calling Start twice or calling the other functions without Start; this feels unnecessary.

I agree, but it feels necessary, as the functionality executed in Start() is critical for the Reader's other methods to work correctly.

Collaborator Author

@rimadeodhar rimadeodhar left a comment


Updated the PR based on the comments so far. PTAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @RaduBerinde, and @rimadeodhar)

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Sep 24, 2021
Collaborator Author

@rimadeodhar rimadeodhar left a comment


@ajwerner: Gentle ping for a review.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @RaduBerinde, and @rimadeodhar)

Contributor

@ajwerner ajwerner left a comment


Sorry it took me so long to get back to this 😓

Reviewed 1 of 11 files at r1, 1 of 7 files at r2, 4 of 7 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rimadeodhar)


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 96 at r3 (raw file):

Quoted 7 lines of code…
	case <-r.initialScanDone:
		if rf != nil {
			// Add rangefeed to the stopper to ensure it
			// is shutdown correctly.
			r.stopper.AddCloser(rf)
		}
		return r.checkStarted()

FWIW: this means that starting the subsystem is a synchronous scan. I'm not sure we need that. Perhaps a TODO or something to make the waiting for an initial scan lazy until calls to read?


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 124 at r3 (raw file):

		})
		if err != nil {
			log.Ops.Warningf(ctx, "failed to decode settings row %v: %v", keyVal.Key, err)

return?


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 154 at r3 (raw file):

		EndKey: instancesTablePrefix.PrefixEnd(),
	}
	rf, err := r.f.RangeFeed(ctx, "sql_instances", instancesTableSpan, r.clock.Now(), updateCacheFn, rangefeed.WithInitialScan(initialScanDoneFn), rangefeed.WithOnInitialScanError(initialScanErrFn))

nit: wrap this

Collaborator Author

@rimadeodhar rimadeodhar left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @RaduBerinde)


pkg/sql/sqlinstance/instancestorage/instancereader.go, line 96 at r3 (raw file):

Previously, ajwerner wrote…
	case <-r.initialScanDone:
		if rf != nil {
			// Add rangefeed to the stopper to ensure it
			// is shutdown correctly.
			r.stopper.AddCloser(rf)
		}
		return r.checkStarted()

FWIW: this means that starting the subsystem is a synchronous scan. I'm not sure we need that. Perhaps a TODO or something to make the waiting for an initial scan lazy until calls to read?

Okay, I have added a TODO for now. Will make a followup PR with the change.

Contributor

@ajwerner ajwerner left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @RaduBerinde)

Contributor

@ajwerner ajwerner left a comment


Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @RaduBerinde, and @rimadeodhar)


pkg/sql/sqlinstance/instancestorage/instancereader_test.go, line 272 at r4 (raw file):

Quoted 15 lines of code…
				if err != sqlinstance.NonExistentInstanceError {
					return errors.Newf("expected non existent instance error")
				}
				return nil
			})
		}
		// Verify request for instance with expired session results in an error.
		{
			err := slStorage.Delete(ctx, sessionIDs[1])
			if err != nil {
				t.Fatal(err)
			}
			testutils.SucceedsSoon(t, func() error {
				_, err = reader.GetInstance(ctx, instanceIDs[0])
				if err != sqlinstance.NonExistentInstanceError {

the linter wants errors.Is here

@rimadeodhar
Collaborator Author

TFTR!

bors r=ajwerner

@craig
Contributor

craig bot commented Oct 27, 2021

Build succeeded:

@RaduBerinde
Member

Is it now safe to update the code below to just run GetAllInstances every time?

const interval = 1 * time.Minute

@rimadeodhar
Collaborator Author

Yes, if the only blocker to running GetAllInstances every time was caching, then you can go ahead and make the change. The sqlinstance subsystem caches all instance data, and GetAllInstances simply retrieves the data from the cache without issuing any SQL queries.

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Dec 6, 2021
The tenant's cost control code requires a function that retrieves the
next live SQL instance ID. The implementation of this function
currently caches the value and only refreshes it periodically.

This is now unnecessary: cockroachdb#69976 changed the instance reader to use a
rangefeed and serve values from memory.

Release note: None
craig bot pushed a commit that referenced this pull request Dec 7, 2021
73484: kvserver: use background context when applying snapshots r=nvanbenschoten,tbg a=erikgrinaker

Following a3fd4fb, context errors are now propagated back up the
stack when receiving snapshots. However, this triggered a
`maybeFatalOnRaftReadyErr` assertion which crashed the node.

`handleRaftReadyRaftMuLocked` (which is called directly when applying
snapshots) does not appear prepared to safely handle arbitrary context
cancellation, as it is typically called with the Raft scheduler's
background context. Furthermore, by the time we call it we have already
received the entire snapshot, so it does not seem useful to abort
snapshot application just because the caller goes away.

This patch therefore uses a new background context for applying
snapshots, disconnected from the client's context, once the entire
snapshot has been received.

Resolves #73371.
Resolves #73469.
Resolves #73482.

Release note: None

73523: server: remove ad-hoc next-instance-id cache (tenant only) r=RaduBerinde a=RaduBerinde

The tenant's cost control code requires a function that retrieves the
next live SQL instance ID. The implementation of this function
currently caches the value and only refreshes it periodically.

This is now unnecessary: #69976 changed the instance reader to use a
rangefeed and serve values from memory.

Release note: None

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>