Skip to content

spanconfig: disable infrastructure unless envvar is set#69658

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
irfansharif:210831.feature-gate-env
Sep 14, 2021
Merged

spanconfig: disable infrastructure unless envvar is set#69658
craig[bot] merged 2 commits intocockroachdb:masterfrom
irfansharif:210831.feature-gate-env

Conversation

@irfansharif
Copy link
Copy Markdown
Contributor

Cluster settings are too easy a switch to reach for to enable the new
span configs machinery. Let's gate it behind a necessary envvar as
well and use cluster settings to selectively toggle individual
components.

This commit also fixes a mighty silly bug introduced in #69047; for the
two methods we intended to gate use
spanconfig.experimental_kvaccessor.enabled, we were checking the
opposite condition or not checking it at all. Oops.

Release note: None
Release justification: non-production code changes

Cluster settings are too easy a switch to reach for to enable the new
span configs machinery. Let's gate it behind a necessary envvar as
well and use cluster settings to selectively toggle individual
components.

This commit also fixes a mighty silly bug introduced in cockroachdb#69047; for the
two methods we intended to gate use
`spanconfig.experimental_kvaccessor.enabled`, we were checking the
opposite condition or not checking it at all. Oops.

Release note: None
Release justification: non-production code changes
@irfansharif irfansharif requested review from a team as code owners August 31, 2021 18:15
@irfansharif irfansharif requested review from tbg and removed request for a team August 31, 2021 18:15
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@irfansharif irfansharif removed the request for review from tbg August 31, 2021 18:22
Copy link
Copy Markdown
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 6 of 11 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)

Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif)


pkg/server/server_sql.go, line 836 at r1 (raw file):

	// dependencies.
	spanConfigKnobs, _ := cfg.TestingKnobs.SpanConfig.(*spanconfig.TestingKnobs)
	spanConfigMgr := spanconfigmanager.New(

given you are constructing this thing unconditionally, why not just push the setting of whether or not this thing is disabled down into it so you don't need the two levels of checking in kvserver

Copy link
Copy Markdown
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added another commit, PTAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @arulajmani)


pkg/server/server_sql.go, line 836 at r1 (raw file):

Previously, ajwerner wrote…

given you are constructing this thing unconditionally, why not just push the setting of whether or not this thing is disabled down into it so you don't need the two levels of checking in kvserver

Appended another commit here; it's actually better to gate this construction on the host tenant -- re-purposed the same envvar to make that happen. Also added another knob to disable the job itself.

@irfansharif irfansharif force-pushed the 210831.feature-gate-env branch from be1ef9a to 3f6dd52 Compare September 7, 2021 21:00
@irfansharif irfansharif requested a review from a team as a code owner September 7, 2021 21:00
@irfansharif irfansharif force-pushed the 210831.feature-gate-env branch from 3f6dd52 to 2f4fe6f Compare September 7, 2021 22:25
Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my education, can you outline what happens when some nodes have the envvar enabled and some nodes haven't?

Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @arulajmani, and @irfansharif)


pkg/server/node.go, line 1454 at r1 (raw file):

Quoted 4 lines of code…
	if !n.storeCfg.SpanConfigsEnabled {
		return nil, errors.New("use of span configs disabled")
	}
	entries, err := n.spanConfigAccessor.GetSpanConfigEntriesFor(ctx, req.Spans)

I still find it a bit painful that we do this two level checking. One here and one down a level in the implementation. I don't think it's awful but it has a bit of a smell. One option that I sort of like would be to have a no-op implementation of the accessor which returns the disabled error and plumb that thing through at construction time if its disabled instead of nil.


pkg/spanconfig/spanconfigkvaccessor/kvaccessor.go, line 63 at r1 (raw file):

	ctx context.Context, spans []roachpb.Span,
) (resp []roachpb.SpanConfigEntry, retErr error) {
	if kvAccessorEnabled.Get(&k.settings.SV) {

make yourself a helper that returns the same error so we don't have to keep repeating it.

if err := ensureEnabled(k.settings); err != nil {
    return nil, err
}

@irfansharif irfansharif force-pushed the 210831.feature-gate-env branch from 2f4fe6f to 33677ad Compare September 13, 2021 18:00
Copy link
Copy Markdown
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my education, can you outline what happens when some nodes have the envvar enabled and some nodes haven't?

Stuff breaks. This (EXPERIMENTAL) envvar exists is in the same space as our COCKROACH_EXPERIMENTAL_LINEARIZABLE one (busted for other reasons). The code being feature gated here is code we want to make very difficult to run for 21.2 customers. Possibly only our Serverless team will be trying it out in its current form (with all nodes running it). It was deemed that just the cluster settings are too easy a knob to reach for to start running all this experimental code, hence the additional hurdle.

TFTRs! Will merge on green unless there are other objections.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @arulajmani)


pkg/server/node.go, line 1454 at r1 (raw file):

Previously, ajwerner wrote…
	if !n.storeCfg.SpanConfigsEnabled {
		return nil, errors.New("use of span configs disabled")
	}
	entries, err := n.spanConfigAccessor.GetSpanConfigEntriesFor(ctx, req.Spans)

I still find it a bit painful that we do this two level checking. One here and one down a level in the implementation. I don't think it's awful but it has a bit of a smell. One option that I sort of like would be to have a no-op implementation of the accessor which returns the disabled error and plumb that thing through at construction time if its disabled instead of nil.

sure, Done.


pkg/spanconfig/spanconfigkvaccessor/kvaccessor.go, line 63 at r1 (raw file):

Previously, ajwerner wrote…

make yourself a helper that returns the same error so we don't have to keep repeating it.

if err := ensureEnabled(k.settings); err != nil {
    return nil, err
}

We're only checking it just twice (the interface is not going to get wider), and using the helper is less verbose at the call sites than just checking the setting -- have a slight preference for keeping things as they are.

@irfansharif irfansharif force-pushed the 210831.feature-gate-env branch from 33677ad to 555730a Compare September 13, 2021 18:01
@irfansharif irfansharif added backport-21.2.x release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. GA-blocker and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. branch-release-21.2 labels Sep 13, 2021
@irfansharif irfansharif force-pushed the 210831.feature-gate-env branch 2 times, most recently from 195723a to 8da4da3 Compare September 13, 2021 22:31
@irfansharif irfansharif force-pushed the 210831.feature-gate-env branch from 8da4da3 to 44ca933 Compare September 14, 2021 14:55
We'll repurpose the COCKROACH_EXPERIMENTAL_SPAN_CONFIGS envvar
introduced in the previous commit to prevent initializing the span
config manager for the host tenant (now truly switching off all span
config infrastructure unless explicitly opted into).

For both host and secondary tenants, we also want to disable the
creation of the span config reconciliation job by default. We introduce
`spanconfig.experimental_reconciliation_job.enabled` to that end.

---

We re-work some of the controls around when we start the reconciliation
job. It was previously possible for a tenant pod to start with a
conservative view of it's cluster version setting, decide not to create
the reconciliation job, find out about the up-to-date cluster version
(introducing the span config job), but only create it when the next
`check_interval` timer fired. This was undesirable.

We also want to "bounce" the job immediately when the `check_interval`
setting is changed. The earlier code didn't quite provide that property.

Release justification: non-production code changes
Release note: None
@irfansharif irfansharif force-pushed the 210831.feature-gate-env branch from 44ca933 to f3c94ef Compare September 14, 2021 16:48
@irfansharif
Copy link
Copy Markdown
Contributor Author

bors r+

Copy link
Copy Markdown
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 11 files at r1, 3 of 8 files at r3, 1 of 7 files at r4, 1 of 4 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @arulajmani, and @irfansharif)


pkg/server/server_sql.go, line 835 at r6 (raw file):

	if !codec.ForSystemTenant() || cfg.SpanConfigsEnabled {
		// Instantiate a span config manager. If we're the host tenant we'll
		// only do it if COCKROACH_EXPERIMENTAL_SPAN_CONFIGS is set.

Is there a reason we instantiate a manager for secondary tenants if COCKROACH_EXPERIMENTAL_SPAN_CONFIGS is unset?


pkg/spanconfig/spanconfigmanager/manager.go, line 46 at r6 (raw file):

//
// For the host tenant it has no effect unless
// COCKROACH_EXPERIMENTAL_SPAN_CONFIGS is also set.

Is there a reason to make a host tenant distinction here?


pkg/spanconfig/spanconfigmanager/manager.go, line 154 at r6 (raw file):

	triggerJobCheck()
	for {
		timer.Reset(checkReconciliationJobInterval.Get(&m.settings.SV))

Any reason to get rid of the "last checked" notion?


pkg/spanconfig/spanconfigmanager/manager.go, line 157 at r6 (raw file):

		select {
		case <-jobCheckCh:
			timer.Read = true

This is independent of the timer, is it not?


pkg/spanconfig/spanconfigmanager/manager_test.go, line 220 at r6 (raw file):

	var interceptCount int32
	checkInterceptCountGreaterThan := func(min int32) int32 {

Does this check need to be greater than? Can't it just be last count + 1?


pkg/sql/logictest/logic.go, line 463 at r6 (raw file):

	// cluster. Connections on behalf of the logic test will go to that tenant.
	useTenant bool
	// If true, XXX:

Unfinished comment?

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Sep 14, 2021

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Sep 14, 2021

Build succeeded:

@craig craig bot merged commit 1f98510 into cockroachdb:master Sep 14, 2021
@irfansharif irfansharif deleted the 210831.feature-gate-env branch September 14, 2021 23:05
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Sep 14, 2021
Accidentally left unfinished in cockroachdb#69658.

Release note: None
Copy link
Copy Markdown
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/server/server_sql.go, line 835 at r6 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Is there a reason we instantiate a manager for secondary tenants if COCKROACH_EXPERIMENTAL_SPAN_CONFIGS is unset?

See other comment thread.


pkg/spanconfig/spanconfigmanager/manager.go, line 46 at r6 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Is there a reason to make a host tenant distinction here?

These jobs are created regardless of the envvar for secondary tenants (provided the setting is enabled). The envvar is protection only for the host tenant/KV (we discussed this offline during our pod meeting).


pkg/spanconfig/spanconfigmanager/manager.go, line 154 at r6 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Any reason to get rid of the "last checked" notion?

We don't need it, this code block is doing the same thing as we were before. Tried to elaborate in the commit message why I reworked this bit (+ wrote a hopefully readable test):

We re-work some of the controls around when we start the reconciliation
job. It was previously possible for a tenant pod to start with a
conservative view of it's cluster version setting, decide not to create
the reconciliation job, find out about the up-to-date cluster version
(introducing the span config job), but only create it when the next
`check_interval` timer fired. This was undesirable.

We also want to "bounce" the job immediately when the `check_interval`
setting is changed. The earlier code didn't quite provide that property.

pkg/spanconfig/spanconfigmanager/manager.go, line 157 at r6 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

This is independent of the timer, is it not?

It's the contract of the timer util package:

// var timer timeutil.Timer
// defer timer.Stop()
// for {
// timer.Reset(wait)
// switch {
// case <-timer.C:
// timer.Read = true
// ...
// }
// }

Cause we're resetting the timer at the start of the for loop, we'd need to mark this as Read even if this branch fires.


pkg/spanconfig/spanconfigmanager/manager_test.go, line 220 at r6 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Does this check need to be greater than? Can't it just be last count + 1?

I was seeing the OnChange callback get triggered multiple times when a setting was changed. I couldn't figure out why, something in the settings package. Instead of reworking that stuff to be 'guaranteed invoked once per setting change', this tiny testing change was easier (read: lazier).


pkg/sql/logictest/logic.go, line 463 at r6 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Unfinished comment?

Woops. #70232.

craig bot pushed a commit that referenced this pull request Sep 15, 2021
70232: logictest: fix a field comment r=irfansharif a=irfansharif

Accidentally left unfinished in #69658.

Release note: None

70240: sql: remove a couple of tests for dropping interleaved indexes r=yuzefovich a=yuzefovich

Addresses: #69150.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Sep 20, 2021
Accidentally left unfinished in cockroachdb#69658.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants