
*: configure the use of separated intents in a cluster #59829

Merged
craig[bot] merged 1 commit into cockroachdb:master from sumeerbhola:config_sep
Feb 12, 2021

Conversation

@sumeerbhola
Collaborator

@sumeerbhola sumeerbhola commented Feb 4, 2021

This is done using a combination of the cluster version,
which prevents separated intents while any node in the
cluster is running below the SeparatedIntents version,
and the storage.transaction.separated_intents.enabled
cluster setting, which is on by default.
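The two-level gate described above can be sketched as follows. This is an editorial simplification with illustrative names, not the actual CockroachDB declarations: the real code checks `clusterversion.SeparatedIntents` via `Settings.Version.IsActive` and reads the cluster setting.

```go
package main

import "fmt"

// Stand-in for clusterversion.SeparatedIntents (illustrative value).
const separatedIntentsVersion = 211

// settings is a simplified stand-in for cluster.Settings.
type settings struct {
	activeVersion           int  // minimum version every node is known to run
	separatedIntentsEnabled bool // storage.transaction.separated_intents.enabled
}

// writeSeparatedIntents shows the ordering of the two gates: the version
// check comes first, so while any node may still be on 20.2 the setting
// alone cannot enable the feature.
func writeSeparatedIntents(s settings) bool {
	return s.activeVersion >= separatedIntentsVersion && s.separatedIntentsEnabled
}

func main() {
	fmt.Println(writeSeparatedIntents(settings{211, true}))  // true: all 21.1, setting on
	fmt.Println(writeSeparatedIntents(settings{202, true}))  // false: mixed-version cluster
	fmt.Println(writeSeparatedIntents(settings{211, false})) // false: setting turned off
}
```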

This replaces the temporary consts previously used
for configuration.

The safety of the transition from 20.2 to 21.1 is
discussed in:

  • comments in intent_reader_writer.go (where the cluster
    setting is declared).
  • the new Writer.SafeToWriteSeparatedIntents which
    gates the use of MVCCMetadata.TxnDidNotUpdateMeta
    so that replicas do not diverge across new and
    old nodes that do not understand this field.
  • comments in store_snapshot.go regarding why snapshots
    exchanged between 20.2 and 21.1 nodes are fine
    even though the latter will iterate through the
    lock table key space for outgoing snapshots and
    construct ssts for deleting the lock table key
    space for incoming snapshots.
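The divergence hazard behind the second bullet can be illustrated with a hypothetical sketch (simplified stand-in types, not the real CockroachDB code): replicas apply the same command on every node, so a 21.1 node must not set MVCCMetadata.TxnDidNotUpdateMeta while a 20.2 replica, which does not understand the field, would persist different bytes for the same command.

```go
package main

import "fmt"

// mvccMeta is a simplified stand-in for MVCCMetadata.
type mvccMeta struct {
	txnDidNotUpdateMeta bool
}

// writeMeta sketches the gate: the new field is populated only when a
// check standing in for Writer.SafeToWriteSeparatedIntents reports that
// every node in the cluster understands it.
func writeMeta(safeToWriteSeparated bool, txnFirstWrite bool) mvccMeta {
	var m mvccMeta
	if safeToWriteSeparated && txnFirstWrite {
		m.txnDidNotUpdateMeta = true
	}
	return m
}

func main() {
	fmt.Println(writeMeta(true, true).txnDidNotUpdateMeta)  // true: all nodes are 21.1
	fmt.Println(writeMeta(false, true).txnDidNotUpdateMeta) // false: a 20.2 node may apply this
}
```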

The only non-test code that cares about this configuration
lives in the storage package and accesses the config
via the optional cluster.Settings provided when
constructing a Pebble Engine.

Affected tests fall into three categories:

  • tests that had different datadriven test files for
    different configuration settings. These now iterate
    through all the configurations. This is a small
    subset of storage tests.
  • tests that were adjusting their expected values
    based on the configuration settings. These now
    randomize the configuration settings.
  • tests that were ignorant of the configuration
    settings. These also randomize the settings, which
    is done at different levels of abstraction depending
    on the kind of test.
    • When constructing the Engine in the test. For tests
      that directly construct an Engine or indirectly
      via LocalTestCluster or kvserver.testContext.
      This allows for manipulating both the version
      and the cluster setting.
    • When constructing cluster.Settings for
      server.TestServerFactory. This only manipulates
      the cluster setting.

Informs #41720

Release note (ops change): adds the
storage.transaction.separated_intents.enabled cluster
setting, which enables separated intents by default.

@sumeerbhola sumeerbhola requested a review from a team February 4, 2021 23:35
@sumeerbhola sumeerbhola requested a review from a team as a code owner February 4, 2021 23:35
@sumeerbhola sumeerbhola requested review from pbardea and removed request for a team February 4, 2021 23:35
@cockroach-teamcity
Member

This change is Reviewable

@sumeerbhola sumeerbhola removed request for a team and pbardea February 4, 2021 23:35
@sumeerbhola sumeerbhola force-pushed the config_sep branch 2 times, most recently from c72ed24 to e3e08d7 Compare February 5, 2021 15:21
Contributor

@nvb nvb left a comment


:lgtm_strong:

Reviewed 57 of 57 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal, @petermattis, and @sumeerbhola)


pkg/kv/kvserver/store_snapshot.go, line 98 at r1 (raw file):

// Separated locks and snapshots send/receive:
// When running in a mixed version cluster with 21.1 and 20.2, snapshots sent
// by 20.1 nodes will attempt to read the lock table key space and send any

s/20.1/21.1/


pkg/kv/kvserver/rditer/replica_data_iter.go, line 83 at r1 (raw file):

func makeRangeKeyRanges(d *roachpb.RangeDescriptor, replicatedOnly bool) []KeyRange {
	rangeIDLocal := MakeRangeIDLocalKeyRange(d.RangeID, replicatedOnly)

Can the functions in this file be simplified at all now that storage.DisallowSeparatedIntents is gone? It seems like they can, especially if we're ok with assuming the len of the array returned by makeRangeLockTableKeyRanges at its callers.


pkg/server/sticky_engine.go, line 108 at r1 (raw file):

		id:     spec.StickyInMemoryEngineID,
		closed: false,
		// This engine will stay alive after the node dies, so don't want the caller to pass

"so we don't"


pkg/storage/intent_reader_writer.go, line 62 at r1 (raw file):

	// initialize it up-front, but don't know if they are being used by code
	// that cares about intents (e.g. a temporary Engine used for disk-spilling
	// during query execution will never read-write intents).

There are trade-offs with this lazy initialization. On the one hand, it avoids the cost of computing writeSeparatedIntents for users who do not care about writing intents. On the other hand, it means that each intent write that a request issues will be forced to perform the computation, which looks moderately expensive. For instance, I think I see heap allocations and proto unmarshalling in clusterVersionSetting.activeVersionOrEmpty.

If we aren't relying on long-lived intentDemuxWriter instances observing changes to the cluster version or SeparatedIntentsEnabled setting, then maybe we can take a hybrid approach and memoize the lazy computation of writeSeparatedIntents. That would eliminate the trade-off with the lazy approach, at the expense of a bit more code.

EDIT: actually, is this structure long-lived?
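The hybrid the comment above proposes might be sketched like this (illustrative names, not the actual CockroachDB types): keep the lazy initialization, but memoize its result so only the first intent write pays the allocating computation, assuming the writer need not observe later changes to the version or setting.

```go
package main

import (
	"fmt"
	"sync"
)

// lazySeparatedConfig memoizes an expensive boolean computation: the
// compute function runs at most once, on first use.
type lazySeparatedConfig struct {
	once           sync.Once
	writeSeparated bool
	computeCalls   int // instrumentation for this sketch only
}

func (c *lazySeparatedConfig) get(compute func() bool) bool {
	c.once.Do(func() {
		c.computeCalls++
		c.writeSeparated = compute()
	})
	return c.writeSeparated
}

func main() {
	var cfg lazySeparatedConfig
	// Stands in for reading the cluster version and setting, which the
	// comment above notes can allocate and unmarshal protos.
	expensive := func() bool { return true }
	for i := 0; i < 3; i++ {
		fmt.Println(cfg.get(expensive)) // true each time
	}
	fmt.Println("compute calls:", cfg.computeCalls) // 1: later calls reuse the memoized value
}
```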


pkg/storage/intent_reader_writer_test.go, line 218 at r1 (raw file):

				// doesn't matter how the original call to createTestPebbleEngine
				// behaved in terms of separated intents config.
				w = wrapIntentWriter(&pw, makeSettingsForSeparatedIntents(false, separated))

nit: include a /* argName */ for false.
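The convention the nit refers to annotates opaque boolean literals with the parameter name at the call site. A minimal illustration (the function and parameter names here are made up, not the actual signature of makeSettingsForSeparatedIntents):

```go
package main

import "fmt"

// makeConfig is a hypothetical function with boolean parameters whose
// meaning is invisible at the call site without annotation.
func makeConfig(enabled bool, separated bool) string {
	return fmt.Sprintf("enabled=%t separated=%t", enabled, separated)
}

func main() {
	// The /* name */ comments tell the reader what each bare literal means.
	fmt.Println(makeConfig(false /* enabled */, true /* separated */))
}
```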


pkg/storage/pebble.go, line 881 at r1 (raw file):

			errors.Errorf("Pebble without cluster.Settings does not support SafeToWriteSeparatedIntents")
	}
	return p.settings.Version.IsActive(ctx, clusterversion.SeparatedIntents), nil

nit: Should we reach inside the wrappedIntentWriter for the settings so that this impl directly mirrors the one in pebbleBatch?

@sumeerbhola sumeerbhola force-pushed the config_sep branch 2 times, most recently from fd654be to c21fbee Compare February 10, 2021 22:14
Collaborator Author

@sumeerbhola sumeerbhola left a comment


TFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)


pkg/kv/kvserver/store_snapshot.go, line 98 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/20.1/21.1/

Done


pkg/kv/kvserver/rditer/replica_data_iter.go, line 83 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can the functions in this file be simplified at all now that storage.DisallowSeparatedIntents is gone? It seems like they can, especially if we're ok with assuming the len of the array returned by makeRangeLockTableKeyRanges at its callers.

Done


pkg/server/sticky_engine.go, line 108 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"so we don't"

Done


pkg/storage/intent_reader_writer.go, line 62 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

There are trade-offs with this lazy initialization. On the one hand, it avoids the cost of computing writeSeparatedIntents for users who do not care about writing intents. On the other hand, it means that each intent write that a request issues will be forced to perform the computation, which looks moderately expensive. For instance, I think I see heap allocations and proto unmarshalling in clusterVersionSetting.activeVersionOrEmpty.

If we aren't relying on long-lived intentDemuxWriter instances observing changes to the cluster version or SeparatedIntentsEnabled setting, then maybe we can take a hybrid approach and memoize the lazy computation of writeSeparatedIntents. That would eliminate the trade-off with the lazy approach, at the expense of a bit more code.

EDIT: actually, is this structure long-lived?

I overlooked the expense -- good that you noticed.
We do have a long-lived intentDemuxWriter in Pebble, which should not be used by performance-sensitive code. I can cache the settings for use in pebbleBatch, which has the same lifetime as the batch.
Done.


pkg/storage/intent_reader_writer_test.go, line 218 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: include a /* argName */ for false.

Done


pkg/storage/pebble.go, line 881 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: Should we reach inside the wrappedIntentWriter for the settings so that this impl directly mirrors the one in pebbleBatch?

Done

Contributor

@nvb nvb left a comment


:lgtm:

Reviewed 17 of 17 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal and @petermattis)

@sumeerbhola sumeerbhola force-pushed the config_sep branch 7 times, most recently from c00ad5f to 93f71af Compare February 12, 2021 16:49
@sumeerbhola
Collaborator Author

bors r+

@craig
Contributor

craig bot commented Feb 12, 2021

Build succeeded:

@craig craig bot merged commit 98beba6 into cockroachdb:master Feb 12, 2021
@sumeerbhola
Collaborator Author

btw, I spent most of today trying to investigate the occasional TestLogic timeouts after 30m. I can't reproduce it locally -- even with parallel tests disabled, TestLogic consistently completes within 40min on my machine. And it is highly variable on teamcity, e.g. 12m, 18m, and of course the timeouts. There was nothing I could discern like bad error messages or stuck tests. For example, tests like testdata/logic_test/fk and testdata/logic_test/aggregate were showing reasonable progress indicators when it timed out, though eyeballing testdata/logic_test/fk (which has a higher rate of progress) suggested that failed runs were running slower. I assumed this was lower cpu allocation on the test machine(s) -- is that a reasonable assumption?

@sumeerbhola
Collaborator Author

And increasing TESTTIMEOUT to 1h, which I tried here https://teamcity.cockroachdb.com/viewLog.html?buildId=2672528&buildTypeId=Cockroach_UnitTests&tab=testsInfo, caused TestLogic to finish in 37m.

@RaduBerinde
Member

@sumeerbhola @nvanbenschoten I have been investigating a significant microbenchmark regression in BenchmarkSQL/Cockroach/InsertFK/count=1000/nFks=1. Git bisect pointed me to this commit (cee3bcf). I noticed that the separated intents were disabled later, but that doesn't fix the regression.

I am running:

GOMAXPROCS=1 make bench PKG=./pkg/bench BENCHES='BenchmarkSQL/^^Cockroach/InsertFK/count=1000/nFks=1$$' TESTFLAGS='-count 8'

At 48bcbf1 (commit right before this PR):

SQL/Cockroach/InsertFK/count=1000/nFks=1  16.4ms ±21%

At cee3bcf:

SQL/Cockroach/InsertFK/count=1000/nFks=1  30.0ms ±33%

At cee3bcf with storage.transaction.separated_intents.enabled set to false:

SQL/Cockroach/InsertFK/count=1000/nFks=1  31.6ms ±44%

The benchmark sets up a table with a FK to another table with a few rows and inserts 1000 rows in the child table. Any ideas what could have caused this?
