Skip to content

migrations: introduce clusterversion.SeedTenantSpanConfigs#73623

Closed
irfansharif wants to merge 3 commits intocockroachdb:masterfrom
irfansharif:211207.tenant-spanconfig-migration
Closed

migrations: introduce clusterversion.SeedTenantSpanConfigs#73623
irfansharif wants to merge 3 commits intocockroachdb:masterfrom
irfansharif:211207.tenant-spanconfig-migration

Conversation

@irfansharif
Copy link
Copy Markdown
Contributor

SeedTenantSpanConfigs populates system.span_configurations with seed
data for secondary tenants. This state is what ensures that we always
split on tenant boundaries when using the span configs infrastructure.
This version also comes with a migration to populate the same seed data
for existing tenants.

Release note: None


First two commits are from #71994.

For newly created tenants, we want to ensure hard splits on tenant
boundaries. The source of truth for split points in the span configs
subsystem is the contents of `system.span_configurations`. To ensure
hard splits, we insert a single key record at the tenant prefix. In a
future commit we'll introduce the spanconfig.Reconciler process, which
runs per tenant and governs all config entries under each tenant's
purview. This has implications for this initial record we're talking
about (this record might get cleaned up for e.g.); we'll explore it in
tests for the Reconciler itself.

Creating a single key record is easy enough -- we could've written
directly to `system.span_configurations`. When a tenant is GC-ed
however, we need to clear out all tenant state transactionally. To that
end we plumb in a txn-scoped KVAccessor into the planner where
`crdb_internal.destroy_tenant` is executed. This lets us easily delete
all abandoned tenant span config records.

Note: We get rid of `spanconfig.experimental_kvaccessor.enabled`. Access
to spanconfigs infrastructure is already sufficiently gated through
COCKROACH_EXPERIMENTAL_SPAN_CONFIGS. Now that
crdb_internal.create_tenant attempts to write through the KVAccessor,
it's cumbersome to have to enable the setting manually in every
multi-tenant test (increasingly the default) enabling some part of the
span configs infrastructure.

---

This commit introduces the need for a migration -- for existing clusters
with secondary tenants, when upgrading we need to install this initial
record at the tenant prefix for all extant tenants (and make sure to
continue doing so for subsequent newly created tenants). This is to
preserve the hard-split-on-tenant-boundary invariant we wish to provide.
It's possible for an upgraded multi-tenant cluster to have dormant sql
pods that have never reconciled. If KV switches over to the span configs
subsystem, splitting only on the contents of
`system.span_configurations`, we'll fail to split on all tenant
boundaries. We'll introduce this migration in a future PR (before
enabling span configs by default).

Release note: None
Reconciler is responsible for reconciling a tenant's zone configs (SQL
construct) with the cluster's span configs (KV construct). It's the
central engine for the span configs infrastructure; a single Reconciler
instance is active for every tenant in the system.

    type Reconciler interface {
      // Reconcile starts the incremental reconciliation process from
      // the given checkpoint. If it does not find MVCC history going
      // far back enough[1], it falls back to a scan of all
      // descriptors and zone configs before being able to do more
      // incremental work. The provided callback is invoked with
      // timestamps that can be safely checkpointed. A future
      // Reconciliation attempt can make use of this timestamp to
      // reduce the amount of necessary work (provided the MVCC
      // history is still available).
      //
      // [1]: It's possible for system.{zones,descriptor} to have been
      //      GC-ed away; think suspended tenants.
      Reconcile(
        ctx context.Context,
        checkpoint hlc.Timestamp,
        callback func(checkpoint hlc.Timestamp) error,
      ) error
    }

Let's walk through what it does. At a high-level, we maintain an
in-memory data structure that's up-to-date with the contents of the KV
(at least the subset of spans we have access to, i.e. the keyspace
carved out for our tenant ID). We watch for changes to SQL state
(descriptors, zone configs), translate the SQL updates to the flattened
span+config form, "diff" the updates against our data structure to see
if there are any changes we need to inform KV of. If so, we do, and
ensure that our data structure is kept up-to-date. We continue watching
for future updates and repeat as necessary.

There's only single instance of the Reconciler running for a given
tenant at a given point it time (mutual exclusion/leasing is provided by
the jobs subsystem). We needn't worry about contending writers, or the
KV state being changed from underneath us. What we do have to worry
about, however, is suspended tenants' not being reconciling while
suspended. It's possible for a suspended tenant's SQL state to be GC-ed
away at older MVCC timestamps; when watching for changes, we could fail
to observe tables/indexes/partitions getting deleted. Left as is, this
would result in us never issuing a corresponding deletion requests for
the dropped span configs -- we'd be leaving orphaned span configs lying
around (taking up storage space and creating pointless empty ranges). A
"full reconciliation pass" is our attempt to find all these extraneous
entries in KV and to delete them.

We can use our span config data structure here too, one that's
pre-populated with the contents of KV. We translate the entire SQL state
into constituent spans and configs, diff against our data structure to
generate KV updates that we then apply. We follow this with clearing out
all these spans in our data structure, leaving behind all extraneous
entries to be found in KV -- entries we can then simply issue deletes
for.

Release note: None
@irfansharif irfansharif requested a review from a team as a code owner December 8, 2021 21:26
@irfansharif irfansharif requested a review from a team December 8, 2021 21:26
@irfansharif irfansharif requested a review from a team as a code owner December 8, 2021 21:26
@irfansharif irfansharif requested review from tbg and removed request for a team December 8, 2021 21:26
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@irfansharif irfansharif removed request for a team and tbg December 8, 2021 21:26
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migration looks fine to me.

Reviewed 13 of 13 files at r1, 1 of 29 files at r2, 1 of 10 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, @irfansharif, and @nvanbenschoten)


pkg/sql/tenant.go, line 155 at r1 (raw file):

			Span: roachpb.Span{
				Key:    tenantPrefix,
				EndKey: tenantPrefix.Next(),

I think I prefer PrefixEnd() to Next(). Any problems that causes?

SeedTenantSpanConfigs populates system.span_configurations with seed
data for secondary tenants. This state is what ensures that we always
split on tenant boundaries when using the span configs infrastructure.
This version also comes with a migration to populate the same seed data
for existing tenants.

Release note: None
@irfansharif irfansharif force-pushed the 211207.tenant-spanconfig-migration branch from 40f430f to f148f39 Compare December 9, 2021 01:06
Copy link
Copy Markdown
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge this after #71994 lands.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, and @nvanbenschoten)


pkg/sql/tenant.go, line 155 at r1 (raw file):

Previously, ajwerner wrote…

I think I prefer PrefixEnd() to Next(). Any problems that causes?

No problems per se (see [1] in the comment above), but it does mean that in the first ever reconciliation attempt by every SQL pod we'll:

  1. Delete [tenantPrefix, tenantPrefix.PrefixEnd())
  2. Add spans for actual tables ([/tenant/<id>/Table/ID, /tenant/<id>/Table/ID+1), etc.)
  3. Add spans + configs for gaps between actual tables, containing the config found in (1)
  4. Delete all the "extraneous" span configs added in (3).

Using Next() just avoids steps 3 and 4, shaving 1 RTT (steps 2 and 3 happen in the same request).

@irfansharif
Copy link
Copy Markdown
Contributor Author

Just merged this code into #71994.

@irfansharif irfansharif deleted the 211207.tenant-spanconfig-migration branch December 10, 2021 03:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants