kvnemesis: structure the config for the operation generator #46017
craig[bot] merged 1 commit into cockroachdb:master
Conversation
// (see CrossRangeTxnWrapperSender) if they are. roachpb.SpanGroup can be used
// to efficiently check this.
config.OpPs[OpPBatch] = 0
config.Ops.Batch = BatchOperationConfig{}
Okay so this is wild. I tried making this into config.Ops.Batch.Ops.PutExisting = 0 instead, but it was failing assertions. The following batch committed at 2 different timestamps!? This seems to be a different case than the comment is describing.
{
b := &Batch{}
b.Put(ctx, /Table/50/"c1556caf", v-5) // nil
b.Get(ctx, /Table/50/"43275686") // (nil, nil)
b.Put(ctx, /Table/50/"0998fa3b", v-6) // nil
db0.Run(ctx, b) // nil
}
committed batch different timestamps: /Table/50/"0998fa3b":1583976139.381769000,0->v-6 /Table/50/"c1556caf":1583976139.381491000,0->v-5
This seems to go away if I also disallow Get in the batches.
Were the two writes to different ranges?
I'm able to reproduce and do see the requests going to the same range, but it looks like they somehow ended up in their own non-transactional batch or something? More investigation needed.
Oh, I see, we're just straight-up missing a check that non-transactional batches don't get split into sub-batches due to request compatibility rules (as opposed to range boundaries). We'd want something like:
diff --git a/pkg/kv/kvclient/kvcoord/dist_sender.go b/pkg/kv/kvclient/kvcoord/dist_sender.go
index 36954d9867..9ff2620ddd 100644
--- a/pkg/kv/kvclient/kvcoord/dist_sender.go
+++ b/pkg/kv/kvclient/kvcoord/dist_sender.go
@@ -692,6 +693,12 @@ func (ds *DistSender) Send(
splitET = true
}
parts := splitBatchAndCheckForRefreshSpans(ba, splitET)
+ if len(parts) > 1 && ba.Txn == nil && ba.IsTransactional() && ba.ReadConsistency == roachpb.CONSISTENT {
+ // If there's no transaction and ba spans ranges, possibly re-run as part of
+ // a transaction for consistency. The case where we don't need to re-run is
+ // if the read consistency is not required.
+ return nil, roachpb.NewError(&roachpb.OpRequiresTxnError{})
+ }
if len(parts) > 1 && (ba.MaxSpanRequestKeys != 0 || ba.TargetBytes != 0) {
// We already verified above that the batch contains only scan requests of the same type.
// Such a batch should never need splitting.
This seems like such a large oversight that I wonder if there's something I'm missing about how this is expected to work. This is all super old code that hasn't been considered in a while. @tbg do you know how we could have missed this?
Also, I occasionally hit missing lock at sequence after that patch. This is being fixed in #46019 in case you also run into it.
@tbg do you know how we could have missed this?
Easy enough, we don't ever use such batches ourselves and much of the non-transactional API has basically been neglected for a long time. My expectation is that we want to remove it, I'd be curious to know if you see any roadblocks to doing so (i.e. nontransactional stuff will always be wrapped in a txn). AFAICT the few legitimate uses of nontransactional operations that we have (id generators etc) will be fine with the 1PC path (perhaps with the "allow only 1PC flag" sprinkled in).
nvb
left a comment
Reviewed 2 of 3 files at r1, 1 of 1 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @danhhz)
pkg/kv/kvnemesis/generator.go, line 182 at r1 (raw file):
I'd be curious to know if you see any roadblocks to doing so
The only thing I can think of is that there may be cases where non-transactional requests benefit from their lazy timestamp assignment, but that's just speculation. I agree that we should push to remove it, but only to some degree. From the client's perspective, I think the batch API still makes sense, but we should probably just immediately put the batch into a Txn in client.DB.
We'd also need to figure out what to do with requests that truly aren't transactional. Maybe they're all marked with the isAlone flag already?
Anyway, I think we should fix this for now. I'll send something out tomorrow.
pkg/kv/kvnemesis/generator_test.go, line 29 at r2 (raw file):
forEachIntField = func(v reflect.Value) bool {
  typ := v.Type()
  if typ.Kind() == reflect.Ptr {
nit: this is more easily structured as a switch on c.Type().Kind()
Previously, it was all flattened into a string->int map. This caused the addition of another "client" operation, such as CPut, to require quite a bit of unnecessary duplication. It also was limiting in that the proportion of client operations couldn't be varied between DB, Txn, Batch, and CommitInBatch. They can now, though this isn't yet used.

Release justification: non-production code changes
Release note: None
danhhz
left a comment
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten and @tbg)
pkg/kv/kvnemesis/generator.go, line 182 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'd be curious to know if you see any roadblocks to doing so
The only thing I can think of is that there may be cases where non-transactional requests benefit from their lazy timestamp assignment, but that's just speculation. I agree that we should push to remove it, but only to some degree. From the client's perspective, I think the batch API still makes sense, but we should probably just immediately put the batch into a Txn in client.DB. We'd also need to figure out what to do with requests that truly aren't transactional. Maybe they're all marked with the isAlone flag already? Anyway, I think we should fix this for now. I'll send something out tomorrow.
Filed this as #46081
pkg/kv/kvnemesis/generator_test.go, line 29 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: this is more easily structured as a switch on
c.Type().Kind()
Done
bors r=nvanbenschoten
bors ping
pong
bors r=nvanbenschoten (Retrying)
Build failed
Failed because the merge commit didn't have a release justification.
bors r=nvanbenschoten
Build succeeded |
46085: kv/kvserver: use LAI from before split when setting initialMaxClosed r=nvanbenschoten a=nvanbenschoten

Addresses 1 of 2 bugs in #44878. Skews with #46017, so I'll let that go in first.

Before this change, `TestKVNemesisMultiNode` with splits enabled and run with 3x increased frequency failed every ~2 minutes under roachprod-stress. After this change, I have yet to see it fail after ~~20~~ 40 minutes.

The initialMaxClosed is assigned to the RHS replica to ensure that follower reads do not regress following the split. After the split occurs there will be no information in the closedts subsystem about the newly minted RHS range from its leaseholder's store. Furthermore, the RHS will have a lease start time equal to that of the LHS, which might be quite old. This means that timestamps which follow the lease start time for the LHS but are below the current closed timestamp for the LHS would no longer be readable on the RHS after the split.

It is necessary for correctness that the call to maxClosed used to determine the current closed timestamp happens during the splitPreApply so that it uses a LAI that is _before_ the index at which this split is applied. If it were to refer to a LAI equal to or after the split then the value of initialMaxClosed might be unsafe. Concretely, any closed timestamp based on an LAI that is equal to or above the split index might be larger than the initial closed timestamp assigned to the RHS range's initial leaseholder. This is because the LHS range's leaseholder could continue closing out timestamps at the split's LAI after applying the split. Slow followers in that range could hear about these closed timestamp notifications before applying the split themselves. If these slow followers were allowed to pass these closed timestamps created after the split to the RHS replicas they create during the application of the split, then these RHS replicas might end up with initialMaxClosed values above their current range's official closed timestamp. The leaseholder of the RHS range could then propose a write at a timestamp below this initialMaxClosed, violating the closed timestamp system's most important property.

Using an LAI from before the index at which this split is applied avoids the hazard and ensures that no replica on the RHS is created with an initialMaxClosed that could be violated by a proposal on the RHS's initial leaseholder.

See #44878.

Release justification: fixes a high-severity bug in existing functionality which could cause CDC to violate its resolved timestamp guarantee. To some observers, this could look like data loss in CDC.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>