*: configure the use of separated intents in a cluster #59829
craig[bot] merged 1 commit into cockroachdb:master
Conversation
Force-pushed: c72ed24 to e3e08d7
nvb left a comment
Reviewed 57 of 57 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal, @petermattis, and @sumeerbhola)
pkg/kv/kvserver/store_snapshot.go, line 98 at r1 (raw file):
// Separated locks and snapshots send/receive:
// When running in a mixed version cluster with 21.1 and 20.2, snapshots sent
// by 20.1 nodes will attempt to read the lock table key space and send any
s/20.1/21.1/
pkg/kv/kvserver/rditer/replica_data_iter.go, line 83 at r1 (raw file):
func makeRangeKeyRanges(d *roachpb.RangeDescriptor, replicatedOnly bool) []KeyRange {
	rangeIDLocal := MakeRangeIDLocalKeyRange(d.RangeID, replicatedOnly)
Can the functions in this file be simplified at all now that storage.DisallowSeparatedIntents is gone? It seems like they can, especially if we're ok with assuming the len of the array returned by makeRangeLockTableKeyRanges at its callers.
pkg/server/sticky_engine.go, line 108 at r1 (raw file):
id:     spec.StickyInMemoryEngineID,
closed: false,
// This engine will stay alive after the node dies, so don't want the caller to pass
"so we don't"
pkg/storage/intent_reader_writer.go, line 62 at r1 (raw file):
// initialize it up-front, but don't know if they are being used by code
// that cares about intents (e.g. a temporary Engine used for disk-spilling
// during query execution will never read-write intents).
There are trade-offs with this lazy initialization. On the one hand, it avoids the cost of computing writeSeparatedIntents for users who do not care about writing intents. On the other hand, it means that each intent write that a request issues will be forced to perform the computation, which looks moderately expensive. For instance, I think I see heap allocations and proto unmarshalling in clusterVersionSetting.activeVersionOrEmpty.
If we aren't relying on long-lived intentDemuxWriter instances observing changes to the cluster version or SeparatedIntentsEnabled setting, then maybe we can take a hybrid approach and memoize the lazy computation of writeSeparatedIntents. That would eliminate the trade-off with the lazy approach, at the expense of a bit more code.
EDIT: actually, is this structure long-lived?
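A minimal sketch of the hybrid approach described above — compute writeSeparatedIntents lazily on first use, then cache it for the lifetime of the writer. The type and the expensiveCheck stand-in here are simplified hypotheticals, not the actual CockroachDB code:

```go
package main

import (
	"fmt"
	"sync"
)

// checkCalls counts invocations of the stand-in for the expensive
// settings lookup, so the memoization is observable.
var checkCalls int

// expensiveCheck stands in for reading the cluster version and the
// SeparatedIntentsEnabled setting (the real lookup allegedly allocates
// and unmarshals protos, hence the desire to memoize).
func expensiveCheck() bool {
	checkCalls++
	return true
}

// intentDemuxWriter sketches the hybrid approach: the flag is computed
// lazily on first use, then cached for the lifetime of the writer.
type intentDemuxWriter struct {
	once                  sync.Once
	writeSeparatedIntents bool
}

func (w *intentDemuxWriter) separated() bool {
	w.once.Do(func() {
		w.writeSeparatedIntents = expensiveCheck()
	})
	return w.writeSeparatedIntents
}

func main() {
	w := &intentDemuxWriter{}
	for i := 0; i < 1000; i++ {
		_ = w.separated() // only the first call pays the cost
	}
	fmt.Println(checkCalls) // prints: 1
}
```

The caveat Nathan raises applies: this caching is only safe if the writer's lifetime is short enough that it need not observe later changes to the cluster version or setting.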
pkg/storage/intent_reader_writer_test.go, line 218 at r1 (raw file):
// doesn't matter how the original call to createTestPebbleEngine
// behaved in terms of separated intents config.
w = wrapIntentWriter(&pw, makeSettingsForSeparatedIntents(false, separated))
nit: include a /* argName */ for false.
pkg/storage/pebble.go, line 881 at r1 (raw file):
	errors.Errorf("Pebble without cluster.Settings does not support SafeToWriteSeparatedIntents")
}
return p.settings.Version.IsActive(ctx, clusterversion.SeparatedIntents), nil
nit: Should we reach inside the wrappedIntentWriter for the settings so that this impl directly mirrors the one in pebbleBatch?
Force-pushed: fd654be to c21fbee
sumeerbhola left a comment
TFTR!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)
pkg/kv/kvserver/store_snapshot.go, line 98 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/20.1/21.1/
Done
pkg/kv/kvserver/rditer/replica_data_iter.go, line 83 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Can the functions in this file be simplified at all now that storage.DisallowSeparatedIntents is gone? It seems like they can, especially if we're ok with assuming the len of the array returned by makeRangeLockTableKeyRanges at its callers.
Done
pkg/server/sticky_engine.go, line 108 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"so we don't"
Done
pkg/storage/intent_reader_writer.go, line 62 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
There are trade-offs with this lazy initialization. On the one hand, it avoids the cost of computing writeSeparatedIntents for users who do not care about writing intents. On the other hand, it means that each intent write that a request issues will be forced to perform the computation, which looks moderately expensive. For instance, I think I see heap allocations and proto unmarshalling in clusterVersionSetting.activeVersionOrEmpty.
If we aren't relying on long-lived intentDemuxWriter instances observing changes to the cluster version or SeparatedIntentsEnabled setting, then maybe we can take a hybrid approach and memoize the lazy computation of writeSeparatedIntents. That would eliminate the trade-off with the lazy approach, at the expense of a bit more code.
EDIT: actually, is this structure long-lived?
I overlooked the expense -- good that you noticed.
We do have a long-lived intentDemuxWriter in Pebble which should not be getting used by performance sensitive code. I can cache the settings for the use in pebbleBatch which has the same lifetime as the batch.
Done.
pkg/storage/intent_reader_writer_test.go, line 218 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: include a /* argName */ for false.
Done
pkg/storage/pebble.go, line 881 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: Should we reach inside the wrappedIntentWriter for the settings so that this impl directly mirrors the one in pebbleBatch?
Done
nvb left a comment
Reviewed 17 of 17 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal and @petermattis)
Force-pushed: c00ad5f to 93f71af
This is done using a combination of the cluster version,
which prevents separated intents while any node is
running below the SeparatedIntents version, and then by the
storage.transaction.separated_intents.enabled cluster
setting which is on by default.
This replaces the temporary consts that were being used
previously for configuration.
The safety of transition from 20.2 to 21.1 is
discussed in:
- comments in intent_reader_writer.go (where the cluster
setting is declared).
- the new Writer.SafeToWriteSeparatedIntents which
gates the use of MVCCMetadata.TxnDidNotUpdateMeta
so that replicas do not diverge across new and
old nodes that do not understand this field.
- comments in store_snapshot.go regarding why snapshots
exchanged between 20.2 and 21.1 nodes are fine
even though the latter will iterate through the
lock table key space for outgoing snapshots and
construct ssts for deleting the lock table key
space for incoming snapshots.
The only non-test code that cares about this configuration
lives in the storage package and accesses the config
via the optional cluster.Settings provided when
constructing a Pebble Engine.
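The version-plus-setting gate described above can be sketched roughly as follows. The types and names here are simplified stand-ins for illustration, not the actual cluster.Settings API:

```go
package main

import "fmt"

// clusterState is a hypothetical stand-in for the two knobs the commit
// message describes: the SeparatedIntents cluster version gate and the
// storage.transaction.separated_intents.enabled setting (on by default).
type clusterState struct {
	separatedIntentsVersionActive bool // true once no node runs below SeparatedIntents
	separatedIntentsEnabled       bool // the cluster setting
}

// safeToWriteSeparatedIntents combines the two gates: separated intents
// are written only when the version gate is active AND the setting is on,
// so a mixed 20.2/21.1 cluster never writes them.
func safeToWriteSeparatedIntents(s clusterState) bool {
	return s.separatedIntentsVersionActive && s.separatedIntentsEnabled
}

func main() {
	mixed := clusterState{separatedIntentsVersionActive: false, separatedIntentsEnabled: true}
	upgraded := clusterState{separatedIntentsVersionActive: true, separatedIntentsEnabled: true}
	fmt.Println(safeToWriteSeparatedIntents(mixed), safeToWriteSeparatedIntents(upgraded)) // prints: false true
}
```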
Affected tests are in 3 categories:
- tests that had different datadriven test files for
different configuration settings. These now iterate
through all the configurations. This is a small
subset of storage tests.
- tests that were adjusting their expected values
based on the configuration settings. These now
randomize the configuration settings.
- tests that were ignorant of the configuration
settings. These also randomize the settings, which
is done at different levels of abstraction depending
on the kind of test.
- When constructing the Engine in the test. For tests
that directly construct an Engine or indirectly
via LocalTestCluster or kvserver.testContext.
This allows for manipulating both the version
and the cluster setting.
- When constructing cluster.Settings for
server.TestServerFactory. This only manipulates
the cluster setting.
Informs cockroachdb#41720
Release note (ops change): adds the
storage.transaction.separated_intents.enabled cluster
setting, which enables separated intents by default.
Force-pushed: 93f71af to cee3bcf
bors r+

Build succeeded:
btw, I spent most of today trying to investigate the occasional TestLogic timeouts after 30m. I can't reproduce it locally -- even with parallel tests disabled, TestLogic consistently completes within 40min on my machine. And it is highly variable on teamcity, e.g. 12m, 18m, and of course the timeouts. There was nothing I could discern as bad error messages or test stuckness. For example, tests like testdata/logic_test/fk and testdata/logic_test/aggregate were showing reasonable progress indicators when it timed out, though eyeballing testdata/logic_test/fk (which has a higher rate of progress) suggested that failed runs were running slower. I assumed this was lower cpu allocation on the test machine(s) -- is that a reasonable assumption?
And increasing TESTTIMEOUT to 1h, which I tried here https://teamcity.cockroachdb.com/viewLog.html?buildId=2672528&buildTypeId=Cockroach_UnitTests&tab=testsInfo caused
@sumeerbhola @nvanbenschoten I have been investigating a significant microbenchmark regression. I am running:
At 48bcbf1 (commit right before this PR):
At cee3bcf:
At cee3bcf with
The benchmark sets up a table with a FK to another table with a few rows and inserts 1000 rows in the child table. Any ideas what could have caused this?