spanconfig: disable infrastructure unless envvar is set #69658
craig[bot] merged 2 commits into cockroachdb:master from
Conversation
Cluster settings are too easy a switch to reach for to enable the new span configs machinery. Let's gate it behind a necessary envvar as well and use cluster settings to selectively toggle individual components.

This commit also fixes a mighty silly bug introduced in cockroachdb#69047; for the two methods we intended to gate using `spanconfig.experimental_kvaccessor.enabled`, we were checking the opposite condition or not checking it at all. Oops.

Release note: None
Release justification: non-production code changes
arulajmani
left a comment
Reviewed 6 of 11 files at r1, all commit messages.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)
ajwerner
left a comment
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @irfansharif)
pkg/server/server_sql.go, line 836 at r1 (raw file):
// dependencies.
spanConfigKnobs, _ := cfg.TestingKnobs.SpanConfig.(*spanconfig.TestingKnobs)
spanConfigMgr := spanconfigmanager.New(
Given you are constructing this thing unconditionally, why not just push the setting of whether or not it's disabled down into it, so you don't need the two levels of checking in kvserver?
irfansharif
left a comment
Added another commit, PTAL.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @arulajmani)
pkg/server/server_sql.go, line 836 at r1 (raw file):
Previously, ajwerner wrote…
given you are constructing this thing unconditionally, why not just push the setting of whether or not this thing is disabled down into it so you don't need the two levels of checking in kvserver
Appended another commit here; it's actually better to gate this construction on the host tenant -- re-purposed the same envvar to make that happen. Also added another knob to disable the job itself.
Force-pushed be1ef9a to 3f6dd52
Force-pushed 3f6dd52 to 2f4fe6f
knz
left a comment
For my education, can you outline what happens when some nodes have the envvar enabled and some nodes haven't?
ajwerner
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @arulajmani, and @irfansharif)
pkg/server/node.go, line 1454 at r1 (raw file):
Quoted 4 lines of code…
if !n.storeCfg.SpanConfigsEnabled {
    return nil, errors.New("use of span configs disabled")
}
entries, err := n.spanConfigAccessor.GetSpanConfigEntriesFor(ctx, req.Spans)
I still find it a bit painful that we do this two level checking. One here and one down a level in the implementation. I don't think it's awful but it has a bit of a smell. One option that I sort of like would be to have a no-op implementation of the accessor which returns the disabled error and plumb that thing through at construction time if its disabled instead of nil.
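The no-op-implementation idea suggested here could look roughly like the following self-contained Go sketch. All names (`KVAccessor`, `disabledKVAccessor`, `newKVAccessor`) are hypothetical stand-ins for illustration, not the actual spanconfig interfaces:

```go
package main

import (
	"errors"
	"fmt"
)

// KVAccessor is a stand-in for the span config accessor interface;
// the real interface in pkg/spanconfig is wider.
type KVAccessor interface {
	GetSpanConfigEntries() ([]string, error)
}

var errDisabled = errors.New("use of span configs disabled")

// disabledKVAccessor is a no-op implementation that always returns the
// "disabled" error, so callers never need a nil check or a second gate.
type disabledKVAccessor struct{}

func (disabledKVAccessor) GetSpanConfigEntries() ([]string, error) {
	return nil, errDisabled
}

// realKVAccessor stands in for the implementation that would actually
// read span config state.
type realKVAccessor struct{}

func (realKVAccessor) GetSpanConfigEntries() ([]string, error) {
	return []string{"entry"}, nil
}

// newKVAccessor picks the implementation once, at construction time,
// instead of threading an "enabled" flag through every call site.
func newKVAccessor(enabled bool) KVAccessor {
	if !enabled {
		return disabledKVAccessor{}
	}
	return realKVAccessor{}
}

func main() {
	_, err := newKVAccessor(false).GetSpanConfigEntries()
	fmt.Println(err) // use of span configs disabled
}
```

The payoff of this shape is that the two-level check collapses: code like `node.go` can call the accessor unconditionally and let the injected implementation decide.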
pkg/spanconfig/spanconfigkvaccessor/kvaccessor.go, line 63 at r1 (raw file):
ctx context.Context, spans []roachpb.Span,
) (resp []roachpb.SpanConfigEntry, retErr error) {
    if kvAccessorEnabled.Get(&k.settings.SV) {
make yourself a helper that returns the same error so we don't have to keep repeating it.
if err := ensureEnabled(k.settings); err != nil {
return nil, err
}
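A self-contained sketch of the suggested helper might look like this; `Settings` and the boolean field are stand-ins for the real cluster-settings plumbing and the `spanconfig.experimental_kvaccessor.enabled` setting:

```go
package main

import (
	"errors"
	"fmt"
)

// Settings is a stand-in for *cluster.Settings; the bool plays the role
// of the spanconfig.experimental_kvaccessor.enabled cluster setting.
type Settings struct{ kvAccessorEnabled bool }

var errDisabled = errors.New("experimental span config KV accessor disabled")

// ensureEnabled centralizes the gate so every gated method returns the
// same error without repeating the message.
func ensureEnabled(s *Settings) error {
	if !s.kvAccessorEnabled {
		return errDisabled
	}
	return nil
}

// getSpanConfigEntries is a hypothetical gated method showing the
// call-site shape the reviewer is suggesting.
func getSpanConfigEntries(s *Settings) ([]string, error) {
	if err := ensureEnabled(s); err != nil {
		return nil, err
	}
	return []string{"entry"}, nil
}

func main() {
	_, err := getSpanConfigEntries(&Settings{kvAccessorEnabled: false})
	fmt.Println(err) // experimental span config KV accessor disabled
}
```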
Force-pushed 2f4fe6f to 33677ad
irfansharif
left a comment
For my education, can you outline what happens when some nodes have the envvar enabled and some nodes haven't?
Stuff breaks. This (EXPERIMENTAL) envvar is in the same space as our COCKROACH_EXPERIMENTAL_LINEARIZABLE one (busted for other reasons). The code being feature-gated here is code we want to make very difficult for 21.2 customers to run. Possibly only our Serverless team will be trying it out in its current form (with all nodes running it). It was deemed that the cluster settings alone are too easy a knob to reach for to start running all this experimental code, hence the additional hurdle.
TFTRs! Will merge on green unless there are other objections.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @arulajmani)
pkg/server/node.go, line 1454 at r1 (raw file):
Previously, ajwerner wrote…
if !n.storeCfg.SpanConfigsEnabled {
    return nil, errors.New("use of span configs disabled")
}
entries, err := n.spanConfigAccessor.GetSpanConfigEntriesFor(ctx, req.Spans)

I still find it a bit painful that we do this two level checking. One here and one down a level in the implementation. I don't think it's awful but it has a bit of a smell. One option that I sort of like would be to have a no-op implementation of the accessor which returns the disabled error and plumb that thing through at construction time if it's disabled instead of nil.
sure, Done.
pkg/spanconfig/spanconfigkvaccessor/kvaccessor.go, line 63 at r1 (raw file):
Previously, ajwerner wrote…
make yourself a helper that returns the same error so we don't have to keep repeating it.
if err := ensureEnabled(k.settings); err != nil {
    return nil, err
}
We're only checking it twice (the interface is not going to get wider), and using the helper is barely less verbose at the call sites than just checking the setting directly -- I have a slight preference for keeping things as they are.
Force-pushed 33677ad to 555730a
Force-pushed 195723a to 8da4da3
Force-pushed 8da4da3 to 44ca933
We'll repurpose the COCKROACH_EXPERIMENTAL_SPAN_CONFIGS envvar introduced in the previous commit to prevent initializing the span config manager for the host tenant (now truly switching off all span config infrastructure unless explicitly opted into). For both host and secondary tenants, we also want to disable the creation of the span config reconciliation job by default. We introduce `spanconfig.experimental_reconciliation_job.enabled` to that end.

---

We re-work some of the controls around when we start the reconciliation job. It was previously possible for a tenant pod to start with a conservative view of its cluster version setting, decide not to create the reconciliation job, find out about the up-to-date cluster version (introducing the span config job), but only create it when the next `check_interval` timer fired. This was undesirable. We also want to "bounce" the job immediately when the `check_interval` setting is changed; the earlier code didn't quite provide that property.

Release justification: non-production code changes
Release note: None
Force-pushed 44ca933 to f3c94ef
bors r+
arulajmani
left a comment
Reviewed 5 of 11 files at r1, 3 of 8 files at r3, 1 of 7 files at r4, 1 of 4 files at r6, all commit messages.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @arulajmani, and @irfansharif)
pkg/server/server_sql.go, line 835 at r6 (raw file):
if !codec.ForSystemTenant() || cfg.SpanConfigsEnabled {
    // Instantiate a span config manager. If we're the host tenant we'll
    // only do it if COCKROACH_EXPERIMENTAL_SPAN_CONFIGS is set.
Is there a reason we instantiate a manager for secondary tenants if COCKROACH_EXPERIMENTAL_SPAN_CONFIGS is unset?
pkg/spanconfig/spanconfigmanager/manager.go, line 46 at r6 (raw file):
//
// For the host tenant it has no effect unless
// COCKROACH_EXPERIMENTAL_SPAN_CONFIGS is also set.
Is there a reason to make a host tenant distinction here?
pkg/spanconfig/spanconfigmanager/manager.go, line 154 at r6 (raw file):
triggerJobCheck()
for {
    timer.Reset(checkReconciliationJobInterval.Get(&m.settings.SV))
Any reason to get rid of the "last checked" notion?
pkg/spanconfig/spanconfigmanager/manager.go, line 157 at r6 (raw file):
select {
case <-jobCheckCh:
    timer.Read = true
This is independent of the timer, is it not?
pkg/spanconfig/spanconfigmanager/manager_test.go, line 220 at r6 (raw file):
var interceptCount int32
checkInterceptCountGreaterThan := func(min int32) int32 {
Does this check need to be greater than? Can't it just be last count + 1?
pkg/sql/logictest/logic.go, line 463 at r6 (raw file):
// cluster. Connections on behalf of the logic test will go to that tenant.
useTenant bool
// If true, XXX:
Unfinished comment?
Build failed (retrying...)

Build succeeded
Accidentally left unfinished in cockroachdb#69658. Release note: None
irfansharif
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/server/server_sql.go, line 835 at r6 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Is there a reason we instantiate a manager for secondary tenants if COCKROACH_EXPERIMENTAL_SPAN_CONFIGS is unset?
See other comment thread.
pkg/spanconfig/spanconfigmanager/manager.go, line 46 at r6 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Is there a reason to make a host tenant distinction here?
These jobs are created regardless of the envvar for secondary tenants (provided the setting is enabled). The envvar is protection only for the host tenant/KV (we discussed this offline during our pod meeting).
pkg/spanconfig/spanconfigmanager/manager.go, line 154 at r6 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Any reason to get rid of the "last checked" notion?
We don't need it; this code block is doing the same thing as before. I tried to elaborate in the commit message on why I reworked this bit (+ wrote a hopefully readable test):
We re-work some of the controls around when we start the reconciliation
job. It was previously possible for a tenant pod to start with a
conservative view of its cluster version setting, decide not to create
the reconciliation job, find out about the up-to-date cluster version
(introducing the span config job), but only create it when the next
`check_interval` timer fired. This was undesirable.
We also want to "bounce" the job immediately when the `check_interval`
setting is changed. The earlier code didn't quite provide that property.
pkg/spanconfig/spanconfigmanager/manager.go, line 157 at r6 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
This is independent of the timer, is it not?
It's the contract of the timer util package (cockroach/pkg/util/timeutil/timer.go, lines 38 to 47 in c097a16). Because we're resetting the timer at the start of the for loop, we'd need to mark this as Read even if this branch fires.
pkg/spanconfig/spanconfigmanager/manager_test.go, line 220 at r6 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Does this check need to be greater than? Can't it just be last count + 1?
I was seeing the OnChange callback get triggered multiple times when a setting was changed. I couldn't figure out why; something in the settings package. Instead of reworking that stuff to be 'guaranteed invoked once per setting change', this tiny testing change was easier (read: lazier).
pkg/sql/logictest/logic.go, line 463 at r6 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Unfinished comment?
Whoops. #70232.
70232: logictest: fix a field comment r=irfansharif a=irfansharif

Accidentally left unfinished in #69658. Release note: None

70240: sql: remove a couple of tests for dropping interleaved indexes r=yuzefovich a=yuzefovich

Addresses: #69150. Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>