Skip to content

spanconfig: disable secondary tenant zone configs#78299

Closed
irfansharif wants to merge 1 commit intocockroachdb:masterfrom
irfansharif:220322.limit-secondary-cfgs
Closed

spanconfig: disable secondary tenant zone configs#78299
irfansharif wants to merge 1 commit intocockroachdb:masterfrom
irfansharif:220322.limit-secondary-cfgs

Conversation

@irfansharif
Copy link
Copy Markdown
Contributor

@irfansharif irfansharif commented Mar 23, 2022

#77344 outlines the various ways how without additional cost
attribution or resource limiting, tenant zone configs provides
a sufficient API to induce (costly) work in the host cluster. While
better attribution/limiting is underway, this PR introduces two cluster
settings to effectively switch off this machinery all throughout, and/or
enable it selectively for specific tenants. These settings will also
help us roll out this functionality more gradually as we build the
controls described in #77344.

  • spanconfig.tenant_reconciliation_job.enabled, a tenant-readonly
    setting that allows the host to disable the span config reconcilation
    job across all tenants. Defaults to false.
  • spanconfig.tenant_kvaccessor.enabled, a system only setting that
    rejects all KVAccessor requests issued by secondary tenants. Defaults
    to true, but acts as a global kill switch to reject all
    tenant reconciliation requests.

Release note: None

Jira issue: CRDB-14786

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@irfansharif
Copy link
Copy Markdown
Contributor Author

Letting it run through CI first; a few span config tests will need to set these cluster settings explicitly. I'm using #77344 as the corresponding issue to track whatever's needed for 22.1.

@irfansharif irfansharif force-pushed the 220322.limit-secondary-cfgs branch from 8260c1f to 8d7b2eb Compare March 23, 2022 00:23
Copy link
Copy Markdown
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @irfansharif)


pkg/spanconfig/spanconfigmanager/manager.go, line 63 at r1 (raw file):

// XXX: Needs a test. Also a bunch of span config tests need to set it when
// testing reconciliation for secondary tenants.
var tenantJobEnabledSetting = settings.RegisterBoolSetting(

We probably want to allow the operator to turn off already running reconciliation jobs when this setting is turned off as well, right? Should we make the reconciliation job cancellable and add a check for when this setting flips to false?

@irfansharif
Copy link
Copy Markdown
Contributor Author

My current thinking is to just use spanconfig.tenant_reconciliation_job.enabled to guide our rollout, specifically to avoid a thundering herd of tenant split activity. This will only really apply to existing clusters with unsplit tenants — this setting will start off as disabled, and the operator could gradually enable it for individual tenants or groups of tenants at a time while monitoring stability. In 22.2 we should already have split tenants, so this cluster setting would no longer apply. We could also use this to disable reconciliation/cancel the tenant side job like you’re suggesting — perhaps pairing it with a builtin to conveniently replace a tenant’s span configs in the system table to get rid of all its splits (though using it for break-the-glass emergencies only, since it’s out of band and might revert protection policies or configs the tenant may have installed).

\cockroachdb#77344 outlines the various ways how without additional cost
attribution or resource limiting, tenant zone configs provides
a sufficient API to induce (costly) work in the host cluster. While
better attribution/limiting is underway, this PR introduces two cluster
settings to effectively switch off this machinery all throughout, and/or
enable it selectively for specific tenants. These settings will also
help us roll out this functionality more gradually as we build the
controls described in cockroachdb#77344.

 - spanconfig.tenant_reconciliation_job.enabled, a tenant-readonly
   setting that allows the host to disable the span config reconcilation
   job across all tenants. Defaults to false.
 - spanconfig.tenant_kvaccessor.enabled, a system only setting that
   rejects all KVAccessor requests issued by secondary tenants. Defaults
   to true, but acts as a global kill switch to reject all
   tenant reconciliation requests.

Release note: None
@irfansharif irfansharif force-pushed the 220322.limit-secondary-cfgs branch from 8d7b2eb to 9059f17 Compare April 4, 2022 22:00
@irfansharif irfansharif requested a review from a team April 4, 2022 22:00
@irfansharif irfansharif requested a review from a team as a code owner April 4, 2022 22:00
@irfansharif
Copy link
Copy Markdown
Contributor Author

Updated test files to take into account these new settings. Kept spanconfig.tenant_kvaccessor.enabled but defaulted it to true -- might be useful as a global kill switch. spanconfig.tenant_reconciliation_job.enabled is a tenant-readonly setting that we can use to slowly rollout span configs to all tenants -- defaults to false to prevent a thundering herd of splits when upgrading our multi-tenant clusters.

We could also use this to disable reconciliation/cancel the tenant side job like you’re suggesting — perhaps pairing it with a builtin to conveniently replace a tenant’s span configs in the system table to get rid of all its splits

Haven't done this. Holding off until we discuss the rollout plan more fully.

@irfansharif
Copy link
Copy Markdown
Contributor Author

Shouldn't be necessary after #79700 (comment).

@irfansharif irfansharif deleted the 220322.limit-secondary-cfgs branch April 13, 2022 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants