Skip to content

kvserver: introduce ProbeRequest#72972

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
tbg:probe-request
Dec 9, 2021
Merged

kvserver: introduce ProbeRequest#72972
craig[bot] merged 2 commits intocockroachdb:masterfrom
tbg:probe-request

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented Nov 19, 2021

This commit introduces a new KV request type ProbeRequest. This
request performs a mutationless round-trip through the replication
layer. The proximate reason to add this request is #33007, where we want
to fail-fast requests on a Replica until a ProbeRequest succeeds.

The alternative to introducing a new request type is using a a Put on
keys.RangeProbeKey (a key introduced for kvprober). However, when
the Replica circuit breakers perform a probe, they have to make sure
that this probe doesn't get caught up in the circuit breakers
themselves. In particular, the circuit breaker's request needs special
casing with respect to lease requests (which would otherwise also bounce
on the circuit breaker), and in particular we would also like the probe
to be proposed (via the local raft instance) when the local replica is
not the leaseholder, as this allows the circuit breakers to cover more
ground and ascertain that the Replica has access to the replication
layer regardless of its status.

ProbeRequest bypasses lease checks and can thus be processed by any
Replica. Upon application, they are guaranteed to result in a forced
error, and we suppress this error above the replication layer (i.e. at
the waiting request). This means that receiving a nil error from the
probe implies that the probe successfully made it through latching,
evaluation, and application.

While no concrete plans exist for adoption of ProbeRequest elsewhere,
this might be of interest for kvprober and as a general building block
for quickly assessing overall KV health; for example we may extend #72732
to facilitate sending a probe to the leaseholders of all Ranges in the
system.

ProbeRequest is a point request. A ranged version is possible but
since DistSender will not return partial responses, it is not
introduced at this point. This can be done when a concrete need
arises.

ProbeRequest by default will be routed to the leaseholder, but it
can also be used with the NEAREST routing policy.

Release note: None

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@tbg tbg force-pushed the probe-request branch 4 times, most recently from b97d297 to 64ad3a8 Compare November 19, 2021 17:01
tbg added a commit to tbg/cockroach that referenced this pull request Nov 22, 2021
I found myself having to make a change to this method in cockroachdb#72972 and was
really unsure about when this returned a nonzero lease status and
whether that even mattered. Also, the code was hard to grok due to
the large number of "fallthrough" if-else statements.

Now it follows a standard pattern of checking and immediately returning
errors as early as possible, and prefers early returns for the success
cases as well.

The method now clearly states what its contract (now) is: it will either
return a valid lease status and no error (leaseholder read), or a zero
lease status and no error (follower read or lease check skipped), or a
zero lease status and an error (can't serve request, for example invalid
lease, out of bounds, or pending merge, etc). Leases previously returned
a half-populated lease status up for the purpose of communicating the
sequence to use during proposal. However, it was cleaner to return a
zero status and to repurpose the existing special casing for leases
in `evalAndPropose` path to pull the sequence from the lease request.

I checked the callers to make sure that none of the now absent lease
statuses introduce a behavior change. The TL;DR is that callers always
ignored all but the valid lease status.

----

The simplest caller is `executeAdminBatch` which doesn't even assign
the returned lease status:

https://github.com/cockroachdb/cockroach/blob/55720d0a4c0dd8030cd19645461226d173eefc2e/pkg/kv/kvserver/replica_send.go#L759-L775

The second caller is `executeReadOnlyBatch`:

https://github.com/cockroachdb/cockroach/blob/86e2edece61091944d8575336b5de74fafc28b35/pkg/kv/kvserver/replica_read.go#L42-L46

`st` is used only if `err == nil`, in two places:

https://github.com/cockroachdb/cockroach/blob/64aadfec4b3877371b6f71bf5f9aa83bd97aa515/pkg/kv/kvserver/observedts/limit.go#L42-L50

This will return early in case of a zero lease status, i.e. is a no-op
in this case.

The second use is storing `st` into the `endCmds`:

https://github.com/cockroachdb/cockroach/blob/86e2edece61091944d8575336b5de74fafc28b35/pkg/kv/kvserver/replica_read.go#L152

which is accessed only under a similar lease validity check:

https://github.com/cockroachdb/cockroach/blob/55720d0a4c0dd8030cd19645461226d173eefc2e/pkg/kv/kvserver/replica_send.go#L1019-L1021

The third caller is `executeWriteBatch`:

https://github.com/cockroachdb/cockroach/blob/455cdddc6d75c03645f486b22970e5c6198a8d56/pkg/kv/kvserver/replica_write.go#L81-L89

We handled `ComputeLocalUncertaintyLimit` already. Other than that, `st`
is passed to `evalAndPropose`. In that method, since it is the write
path, we shouldn't see follower reads and so the remaining case are
leases, where the status will be zero and the special-casing in that
method takes over.

----

This commit also simplifies leaseGoodToGoRLocked. It was repeatedly
reassigning `status` which I found hard to follow.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Nov 23, 2021
I found myself having to make a change to this method in cockroachdb#72972 and was
really unsure about when this returned a nonzero lease status and
whether that even mattered. Also, the code was hard to grok due to
the large number of "fallthrough" if-else statements.

Now it follows a standard pattern of checking and immediately returning
errors as early as possible, and prefers early returns for the success
cases as well.

The method now clearly states what its contract (now) is: it will either
return a valid lease status and no error (leaseholder read), or a zero
lease status and no error (follower read or lease check skipped), or a
zero lease status and an error (can't serve request, for example invalid
lease, out of bounds, or pending merge, etc). Leases previously returned
a half-populated lease status up for the purpose of communicating the
sequence to use during proposal. However, it was cleaner to return a
zero status and to repurpose the existing special casing for leases
in `evalAndPropose` path to pull the sequence from the lease request.

I checked the callers to make sure that none of the now absent lease
statuses introduce a behavior change. The TL;DR is that callers always
ignored all but the valid lease status.

----

The simplest caller is `executeAdminBatch` which doesn't even assign
the returned lease status:

https://github.com/cockroachdb/cockroach/blob/55720d0a4c0dd8030cd19645461226d173eefc2e/pkg/kv/kvserver/replica_send.go#L759-L775

The second caller is `executeReadOnlyBatch`:

https://github.com/cockroachdb/cockroach/blob/86e2edece61091944d8575336b5de74fafc28b35/pkg/kv/kvserver/replica_read.go#L42-L46

`st` is used only if `err == nil`, in two places:

https://github.com/cockroachdb/cockroach/blob/64aadfec4b3877371b6f71bf5f9aa83bd97aa515/pkg/kv/kvserver/observedts/limit.go#L42-L50

This will return early in case of a zero lease status, i.e. is a no-op
in this case.

The second use is storing `st` into the `endCmds`:

https://github.com/cockroachdb/cockroach/blob/86e2edece61091944d8575336b5de74fafc28b35/pkg/kv/kvserver/replica_read.go#L152

which is accessed only under a similar lease validity check:

https://github.com/cockroachdb/cockroach/blob/55720d0a4c0dd8030cd19645461226d173eefc2e/pkg/kv/kvserver/replica_send.go#L1019-L1021

The third caller is `executeWriteBatch`:

https://github.com/cockroachdb/cockroach/blob/455cdddc6d75c03645f486b22970e5c6198a8d56/pkg/kv/kvserver/replica_write.go#L81-L89

We handled `ComputeLocalUncertaintyLimit` already. Other than that, `st`
is passed to `evalAndPropose`. In that method, since it is the write
path, we shouldn't see follower reads and so the remaining case are
leases, where the status will be zero and the special-casing in that
method takes over.

----

This commit also simplifies leaseGoodToGoRLocked. It was repeatedly
reassigning `status` which I found hard to follow.

Release note: None
craig bot pushed a commit that referenced this pull request Nov 24, 2021
72978: kvserver: simplify requestCanProceed and leaseGoodToGo r=nvanbenschoten a=tbg

I found myself having to make a change to this method in #72972 and was
really unsure about when this returned a nonzero lease status and
whether that even mattered. Also, the code was hard to grok due to
the large number of "fallthrough" if-else statements.

Now it follows a standard pattern of checking and immediately returning
errors as early as possible, and prefers early returns for the success
cases as well.

The method now clearly states what its contract (now) is: it will either
return a valid lease status and no error (leaseholder read), or a zero
lease status and no error (follower read or lease check skipped), or a
zero lease status and an error (can't serve request, for example invalid
lease, out of bounds, or pending merge, etc). Leases previously returned
a half-populated lease status up for the purpose of communicating the
sequence to use during proposal. However, it was cleaner to return a
zero status and to repurpose the existing special casing for leases
in `evalAndPropose` path to pull the sequence from the lease request.

I checked the callers to make sure that none of the now absent lease
statuses introduce a behavior change. The TL;DR is that callers always
ignored all but the valid lease status.

----

The simplest caller is `executeAdminBatch` which doesn't even assign
the returned lease status:

https://github.com/cockroachdb/cockroach/blob/55720d0a4c0dd8030cd19645461226d173eefc2e/pkg/kv/kvserver/replica_send.go#L759-L775

The second caller is `executeReadOnlyBatch`:

https://github.com/cockroachdb/cockroach/blob/86e2edece61091944d8575336b5de74fafc28b35/pkg/kv/kvserver/replica_read.go#L42-L46

`st` is used only if `err == nil`, in two places:

https://github.com/cockroachdb/cockroach/blob/64aadfec4b3877371b6f71bf5f9aa83bd97aa515/pkg/kv/kvserver/observedts/limit.go#L42-L50

This will return early in case of a zero lease status, i.e. is a no-op
in this case.

The second use is storing `st` into the `endCmds`:

https://github.com/cockroachdb/cockroach/blob/86e2edece61091944d8575336b5de74fafc28b35/pkg/kv/kvserver/replica_read.go#L152

which is accessed only under a similar lease validity check:

https://github.com/cockroachdb/cockroach/blob/55720d0a4c0dd8030cd19645461226d173eefc2e/pkg/kv/kvserver/replica_send.go#L1019-L1021

The third caller is `executeWriteBatch`:

https://github.com/cockroachdb/cockroach/blob/455cdddc6d75c03645f486b22970e5c6198a8d56/pkg/kv/kvserver/replica_write.go#L81-L89

We handled `ComputeLocalUncertaintyLimit` already. Other than that, `st`
is passed to `evalAndPropose`. In that method, since it is the write
path, we shouldn't see follower reads and so the remaining case are
leases, where the status will be zero and the special-casing in that
method takes over.

----

This commit also simplifies leaseGoodToGoRLocked. It was repeatedly
reassigning `status` which I found hard to follow.

Release note: None


73095: sql: reset indexedTypeFormatter on FmtCtx close r=RichardJCai a=RichardJCai

Previously indexedTypeFormatter wouldn't be reset when
FmtCtx.Close() was called causing the indexedTypeFormatter
function to potentially be reused.

Release note (enterprise change): Fix a bug where restore can sometimes map OIDs
to invalid types in certain circumstances containing UDTs.

Co-authored-by: Tobias Grieger <tobias.schottdorf@gmail.com>
Co-authored-by: richardjcai <caioftherichard@gmail.com>
@tbg tbg force-pushed the probe-request branch 2 times, most recently from 9940e1b to efc5809 Compare November 26, 2021 15:22
@tbg tbg marked this pull request as ready for review November 26, 2021 15:22
@tbg tbg requested a review from a team as a code owner November 26, 2021 15:22
@tbg tbg requested review from a team and erikgrinaker November 26, 2021 15:22
Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 20 of 20 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


-- commits, line 9 at r1:
"a a"


-- commits, line 23 at r1:
It might be useful to, in the future, extend this with a variant that also respects the lease (e.g. with a request parameter). This might make sense e.g. for kvprober, which I suppose would ultimately want to know that the lease is also valid and able to propose and apply a command. I don't think there should be anything preventing this, right?


-- commits, line 50 at r1:
FitlerFilter


pkg/kv/kvserver/replica_application_state_machine.go, line 175 at r1 (raw file):

// corresponding to a ProbeRequest is handled.
//
// This error is used with `errors.Is`, so don't change it or mixed-version

I don't think this is accurate. The error is instantiated on the local node during command application, and the errors.Is() check is also done on the local node after passing it via local memory, so it shouldn't be possible to get an error from a different version here.


pkg/kv/kvserver/replica_application_state_machine.go, line 204 at r1 (raw file):

	replicaState *kvserverpb.ReplicaState,
) (uint64, proposalReevaluationReason, *roachpb.Error) {
	if raftCmd.ReplicatedEvalResult.IsProbe {

Update the function comment to describe this condition as well.


pkg/kv/kvserver/replica_application_state_machine.go, line 1278 at r1 (raw file):

	// NB: we intentionally never zero out rResult.IsProbe because probes are
	// implemented by always catching a forced error and thus never show up in
	// this method.

Consider adding a sentence saying that this effectively asserts that IsProbe returned a forced error.


pkg/kv/kvserver/replica_probe_test.go, line 33 at r1 (raw file):

// TestReplicaProbeRequest verifies that all Replicas can serve a ProbeRequest,
// and that it applies downstream of raft via a forced error.
func TestReplicaProbeRequest(t *testing.T) {

Consider adding a test-case when there is no leader as well, to make sure it actually fails/times out when the Raft group can't make progress. It's arguably a bit overkill, but that is the crucial property that we want from these requests.


pkg/kv/kvserver/replica_probe_test.go, line 52 at r1 (raw file):

		}
		if args.ForcedError == nil {
			t.Error("probe has no forcedError") // avoid Fatal on goroutine

I believe you could do if !assert.NotNil { return 0, args.ForcedError } here as well.


pkg/kv/kvserver/replica_probe_test.go, line 116 at r1 (raw file):

			return errors.Errorf("waiting for stores to apply command: %d/%d", act, exp)
		}
		n := 2 * len(tc.Servers) // sent two probes per server

Do we want to assert the routing policies here, or is that out of scope?


pkg/kv/kvserver/replica_probe_test.go, line 152 at r1 (raw file):

		ba.Add(probeReq)
		_, pErr := repl.Send(ctx, ba)
		require.True(t, errors.Is(pErr.GoError(), injErr.GoError()), "%+v", pErr.GoError())

require.ErrorIs()


pkg/kv/kvserver/replica_raft.go, line 184 at r1 (raw file):

		// to mutate state.
		var seq roachpb.LeaseSequence
		switch t := ba.Requests[0].GetInner().(type) {

I'd prefer to keep this in GetPrevLeaseForLeaseRequest() and return a zero-valued lease (or separate ok boolean) for irrelevant request types, to keep this sort of request logic in one place.


pkg/kv/kvserver/replica_write.go, line 276 at r1 (raw file):

				propResult.Err = roachpb.NewError(applicationErr)
			}
			if propResult.Err != nil && ba.IsSingleProbeRequest() && errors.Is(propResult.Err.GoError(), noopOnProbeCommandErr.GoError()) {

Wrap at 100.


pkg/kv/kvserver/testing_knobs.go, line 72 at r1 (raw file):

	// *Error replace the existing proposalReevaluationReason (if initially zero
	// only) and forced error.
	TestingApplyForcedErrFilter kvserverbase.ReplicaApplyFilter

It's unclear to me why TestingApplyFilter couldn't handle this case as well.


pkg/kv/kvserver/batcheval/cmd_probe.go, line 28 at r1 (raw file):

// Probe causes an effectless round-trip through the replication layer,
// i.e. it is a write that does not change any kv pair. It declares a
// read on the targeted key.

It declares read/write on the targeted key, but only with a latch (no lock).


pkg/roachpb/api.proto, line 178 at r1 (raw file):

// latches, and declare key access, but it will not check locks (i.e.
// if an intent exists on the key that is being probed, the probe will
// not observe it). ProbeRequest can be served by any Replica including

Consider being explicit about bypassing the lease here, i.e. it checks that the Raft group can make progress but not that the leaseholder (and thus the range) can make progress.

@tbg tbg requested a review from erikgrinaker December 2, 2021 10:25
Copy link
Copy Markdown
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! I addressed all the small fry, didn't touch the test yet since I had a question about whether we even want to acquire any latch here, let's discuss that and then I will address the remaining open comments.
I "dismissed" you from all comments that were 100% trivial (I think your default disposition is different from most reviewers', might be something to look into for you, may not be intentional)

Dismissed @erikgrinaker from 9 discussions.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @tbg)


-- commits, line 9 at r1:

Previously, erikgrinaker (Erik Grinaker) wrote…

"a a"

Done.


-- commits, line 23 at r1:

Previously, erikgrinaker (Erik Grinaker) wrote…

It might be useful to, in the future, extend this with a variant that also respects the lease (e.g. with a request parameter). This might make sense e.g. for kvprober, which I suppose would ultimately want to know that the lease is also valid and able to propose and apply a command. I don't think there should be anything preventing this, right?

Yeah, should be possible, but doesn't seem necessary and I'm also not sure there is a strong case for a lease-respecting probe for kvprober since it can do the same thing using the existing KV API (it's doing that now) and I'd rather not complicate our KV API.


-- commits, line 50 at r1:

Previously, erikgrinaker (Erik Grinaker) wrote…

FitlerFilter

Done.


pkg/kv/kvserver/replica_application_state_machine.go, line 175 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

I don't think this is accurate. The error is instantiated on the local node during command application, and the errors.Is() check is also done on the local node after passing it via local memory, so it shouldn't be possible to get an error from a different version here.

Done.


pkg/kv/kvserver/replica_application_state_machine.go, line 204 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Update the function comment to describe this condition as well.

Done.


pkg/kv/kvserver/replica_application_state_machine.go, line 1278 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Consider adding a sentence saying that this effectively asserts that IsProbe returned a forced error.

Done.


pkg/kv/kvserver/replica_probe_test.go, line 52 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

I believe you could do if !assert.NotNil { return 0, args.ForcedError } here as well.

That would work, but we don't use assert much and I want to be extra clear here.


pkg/kv/kvserver/replica_probe_test.go, line 152 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

require.ErrorIs()

No, we need cockroachdb/errors.Is here. I think this is because we are round-tripping through (roachpb.Error).EncodedErr:

        	Error:      	Target error should be in err chain:
        	            	expected: "kv/kvserver/replica_probe_test.go:139: bang"
        	            	in chain: "kv/kvserver/replica_probe_test.go:139: bang"
        	            		"kv/kvserver/replica_probe_test.go:139: bang"
        	            		"bang"
        	            		"bang"

pkg/kv/kvserver/replica_raft.go, line 184 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

I'd prefer to keep this in GetPrevLeaseForLeaseRequest() and return a zero-valued lease (or separate ok boolean) for irrelevant request types, to keep this sort of request logic in one place.

This is bespoke logic for these requests that is part of how they evaluate, which is not something I like to see tucked far away in roachpb, which is why I inlined this code. I think the previous abstraction was premature and made me anxious about what exactly the contracts were.


pkg/kv/kvserver/replica_write.go, line 276 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Wrap at 100.

Done.


pkg/kv/kvserver/testing_knobs.go, line 72 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

It's unclear to me why TestingApplyFilter couldn't handle this case as well.

All existing users of this filter would need to address the special case of getting a forced error. This would incur a large diff in this PR but also add a subtle condition that would need to be handled to all future users of this filter, which would likely be forgotten and lead to weird flakes in some cases.


pkg/kv/kvserver/batcheval/cmd_probe.go, line 28 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

It declares read/write on the targeted key, but only with a latch (no lock).

Oh, you're right, thanks.

I have been a little torn over whether to declare keys at all. From a purity-based point of view it would be nice for this request to really only check the replication layer, i.e. completely bypass latching. From the POV of the circuit breaker though, you want to make sure that you trip if the replication layer is ok but something doesn't release the latch. I think I lean towards the pure approach and to solve the rest in the circuit breaker probe? So the probe would

  • check replication layer via ProbeRequest
  • check lease (which is now up to date thanks to having seen the probe pop up from below raft)
  • if leaseholder is elsewhere & lease is valid, declare success
  • send ScanRequest(desc.Start, desc.End, limit=1) (with special-cased ctx that bypasses circuit breaker, and also code that translates that ctx into the lease request which has its own ctx to prevent that from bouncing on the breaker)
  • success

We want some of that lease stuff anyway, so baking a latch into ProbeRequest isn't really giving us a single-step circuit breaker probe, and the latch makes a failing probe just a little difficult to interpret since now there are more things that could block it.

WDYT?


pkg/roachpb/api.proto, line 178 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Consider being explicit about bypassing the lease here, i.e. it checks that the Raft group can make progress but not that the leaseholder (and thus the range) can make progress.

Done.

Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: modulo the latch tweak and test improvements.

I "dismissed" you from all comments that were 100% trivial (I think your default disposition is different from most reviewers', might be something to look into for you, may not be intentional)

Ah, yeah, it was set to blocking by default. I don't really use the disposition that actively, so changed it to discussing. Thanks for the heads-up.

Reviewed 9 of 9 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


-- commits, line 23 at r1:

Previously, tbg (Tobias Grieger) wrote…

Yeah, should be possible, but doesn't seem necessary and I'm also not sure there is a strong case for a lease-respecting probe for kvprober since it can do the same thing using the existing KV API (it's doing that now) and I'd rather not complicate our KV API.

Suppose that's true.


pkg/kv/kvserver/replica_probe_test.go, line 52 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

That would work, but we don't use assert much and I want to be extra clear here.

Sure, nbd.


pkg/kv/kvserver/replica_probe_test.go, line 152 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

No, we need cockroachdb/errors.Is here. I think this is because we are round-tripping through (roachpb.Error).EncodedErr:

        	Error:      	Target error should be in err chain:
        	            	expected: "kv/kvserver/replica_probe_test.go:139: bang"
        	            	in chain: "kv/kvserver/replica_probe_test.go:139: bang"
        	            		"kv/kvserver/replica_probe_test.go:139: bang"
        	            		"bang"
        	            		"bang"

Ah, you're right, forgot about that part.


pkg/kv/kvserver/batcheval/cmd_probe.go, line 28 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Oh, you're right, thanks.

I have been a little torn over whether to declare keys at all. From a purity-based point of view it would be nice for this request to really only check the replication layer, i.e. completely bypass latching. From the POV of the circuit breaker though, you want to make sure that you trip if the replication layer is ok but something doesn't release the latch. I think I lean towards the pure approach and to solve the rest in the circuit breaker probe? So the probe would

  • check replication layer via ProbeRequest
  • check lease (which is now up to date thanks to having seen the probe pop up from below raft)
  • if leaseholder is elsewhere & lease is valid, declare success
  • send ScanRequest(desc.Start, desc.End, limit=1) (with special-cased ctx that bypasses circuit breaker, and also code that translates that ctx into the lease request which has its own ctx to prevent that from bouncing on the breaker)
  • success

We want some of that lease stuff anyway, so baking a latch into ProbeRequest isn't really giving us a single-step circuit breaker probe, and the latch makes a failing probe just a little difficult to interpret since now there are more things that could block it.

WDYT?

I agree, I don't think latching adds anything here -- let's check that elsewhere in the circuit breaker.

tbg and others added 2 commits December 9, 2021 16:42
This commit introduces a new KV request type `ProbeRequest`. This
request performs a mutationless round-trip through the replication
layer. The proximate reason to add this request is cockroachdb#33007, where we want
to fail-fast requests on a Replica until a `ProbeRequest` succeeds.

The alternative to introducing a new request type is using a `Put` on
`keys.RangeProbeKey` (a key introduced for [kvprober]). However, when
the Replica circuit breakers perform a probe, they have to make sure
that this probe itself doesn't get caught up in the circuit breakers. In
particular, the circuit breaker's request needs special casing with
respect to lease requests (which would otherwise also bounce on the
circuit breaker). But even with that added (not too large a change),
the lease would be required, so we wouldn't be able to apply the
circuit breakers anywhere but the leaseholder (i.e. we would have
to heal the breaker when the lease is "not obtainable". What we are
after is really a per-Replica concept that is disjoint from the lease,
where we check that the replica can manage to get a command into the
log and observe it coming out of the replication layer.

`ProbeRequest` bypasses lease checks and can thus be processed by any
Replica. Upon application, they are guaranteed to result in a forced
error, and we suppress this error above the replication layer (i.e. at
the waiting request). This means that receiving a nil error from the
probe implies that the probe successfully made it through latching
(where it doesn't declare any key, i.e. should not conflict with long-
evaluating requests), evaluation, and application.

While no concrete plans exist for adoption of `ProbeRequest` elsewhere,
this might be of interest for [kvprober] and as a general building block
for quickly assessing overall KV health; for example we may extend cockroachdb#72732
to facilitate sending a probe to the leaseholders of all Ranges in the
system.

`ProbeRequest` is a point request. A ranged version is possible but
since `DistSender` will not return partial responses, it is not
introduced at this point. This can be done when a concrete need
arises.

`ProbeRequest` by default will be routed to the leaseholder, but it
can also be used with the `NEAREST` routing policy.

Some details helpful for reviewers follow.

- to enable testing of this new request type, `TestingApplyFilter` was
  augmented with a `TestingApplyForcedErrFilter` which gets to inspect
  and mutate forced errors below raft. The semantics of
  `TestingApplyFilter` remain unchainged; the newly added field
  `ForcedErr` is always nil when handed to such a filter.
- `ProbeRequest` is the first request type that skips the lease check
  and is not itself a lease check. This prompted a small refactor to
  `GetPrevLeaseForLeaseRequest`, which can now return false when the
  underlying request is a `Probe` and not `{Transfer,Request}Lease`.

[kvprober]: https://github.com/cockroachdb/cockroach/tree/2ad2bee257e78970ce2c457ddd6996099ed6727a/pkg/kv/kvprober
[routing policy]: https://github.com/cockroachdb/cockroach/blob/e1c7d6a55ed147aeb20d6d5c564d3d9bf73d2624/pkg/roachpb/api.proto#L48-L60

Release note: None
The preceding commit introduced `ProbeRequest`. The future user(s) of
this request type will need to check this cluster version.

Release note: None
@tbg
Copy link
Copy Markdown
Member Author

tbg commented Dec 9, 2021

bors r=erikgrinaker

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 9, 2021

Build succeeded:

@craig craig bot merged commit 936e3c3 into cockroachdb:master Dec 9, 2021
@tbg tbg deleted the probe-request branch December 9, 2021 17:09
tbg added a commit to tbg/cockroach that referenced this pull request Jan 11, 2022
Fixes cockroachdb#33007.
Closes cockroachdb#61311.

This PR uses the circuit breaker package introduced in cockroachdb#73641 to address
issue cockroachdb#33007: When a replica becomes unavailable, it should eagerly
refuse traffic that it believes would simply hang.

Concretely, every request to the Replica gets a cancelable context that
is sensitive to the circuit breaker tripping (relying on context
cancellation makes sure we don't end up with a second channel that needs
to be plumbed to everyone and their dog; Go makes it really difficult to
join two cancellation chains); if the breaker is tripped when the
request arrives, it is refused right away. In either case, the outgoing
error is augmented to carry information about the tripped breaker. This
isn't 100% trivial, since retreating from in-flight replication
typically causes an `AmbiguousResultError`, and while we could avoid it
in some cases we can't in all. A similar problem occurs when the
canceled request was waiting on a lease, in which case the result is a
NotLeaseholderError.

For any request that made it in, if it enters replication but does not
manage to succeed within `base.SlowRequestThreshold` (15s at time of
writing), the breaker is tripped, cancelling all inflight and future
requests until the breaker heals. Perhaps surprisingly, the "slowness"
existing detection (the informational "have been waiting ... for
proposal" warning in `executeWriteBatch`) was moved deeper into
replication (`refreshProposalsLocked`), where it now trips the breaker.
This has the advantage of providing a unified path for lease requests
(which don't go through `executeWriteBatch`) and pipelined writes (which
return before waiting on the inflight replication process). To make this
work, we need to pick a little fight with how leases are (not)
refreshed (cockroachdb#74711) and we need to remember the ticks at which a proposal
was first inserted (despite potential reproposals).

A tripped breaker deploys a background probe which sends a
`ProbeRequest` (introduced in cockroachdb#72972) to determine when to heal; this is
roughly the case whenever replicating a command through Raft is possible
for the Replica, either by appending to the log as the Raft leader, or
by forwarding to the Raft leader. A tiny bit of special casing allows
requests made by the probe to bypass the breaker.

As of this PR, all of this is off via the
`kv.replica_circuit_breakers.enabled` cluster setting while we determine
which of the many follow-up items to target for 22.1 (see cockroachdb#74705).
Primarily though, the breaker-sensitive context cancelation is a toy
implementation that comes with too much of a performance overhead (one
extra goroutine per request); cockroachdb#74707 will address that.

Other things not done here that we certainly want in the 22.1 release
are

- UI work (cockroachdb#74713)
- Metrics (cockroachdb#74505)

The release note is deferred to cockroachdb#74705 (where breakers are turned on).

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Jan 11, 2022
Fixes cockroachdb#33007.
Closes cockroachdb#61311.

This PR uses the circuit breaker package introduced in cockroachdb#73641 to address
issue cockroachdb#33007: When a replica becomes unavailable, it should eagerly
refuse traffic that it believes would simply hang.

Concretely, every request to the Replica gets a cancelable context that
is sensitive to the circuit breaker tripping (relying on context
cancellation makes sure we don't end up with a second channel that needs
to be plumbed to everyone and their dog; Go makes it really difficult to
join two cancellation chains); if the breaker is tripped when the
request arrives, it is refused right away. In either case, the outgoing
error is augmented to carry information about the tripped breaker. This
isn't 100% trivial, since retreating from in-flight replication
typically causes an `AmbiguousResultError`, and while we could avoid it
in some cases we can't in all. A similar problem occurs when the
canceled request was waiting on a lease, in which case the result is a
NotLeaseholderError.

For any request that made it in, if it enters replication but does not
manage to succeed within `base.SlowRequestThreshold` (15s at time of
writing), the breaker is tripped, cancelling all inflight and future
requests until the breaker heals. Perhaps surprisingly, the existing
"slowness" detection (the informational "have been waiting ... for
proposal" warning in `executeWriteBatch`) was moved deeper into
replication (`refreshProposalsLocked`), where it now trips the breaker.
This has the advantage of providing a unified path for lease requests
(which don't go through `executeWriteBatch`) and pipelined writes (which
return before waiting on the inflight replication process). To make this
work, we need to pick a little fight with how leases are (not) refreshed
(cockroachdb#74711) and we need to remember the ticks at which a proposal was first
inserted (despite potential reproposals).

A tripped breaker deploys a background probe which sends a
`ProbeRequest` (introduced in cockroachdb#72972) to determine when to heal; this is
roughly the case whenever replicating a command through Raft is possible
for the Replica, either by appending to the log as the Raft leader, or
by forwarding to the Raft leader. A tiny bit of special casing allows
requests made by the probe to bypass the breaker.

As of this PR, all of this is opt-in via the
`kv.replica_circuit_breakers.enabled` cluster setting while we determine
which of the many follow-up items to target for 22.1 (see cockroachdb#74705).  For
example, the breaker-sensitive context cancelation is a toy
implementation that comes with too much of a performance overhead (one
extra goroutine per request); cockroachdb#74707 will address that.

Other things not done here that we certainly want in the 22.1 release
are UI work (cockroachdb#74713) and metrics (cockroachdb#74505).

The release note is deferred to cockroachdb#74705 (where breakers are turned on).

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Jan 13, 2022
Fixes cockroachdb#33007.
Closes cockroachdb#61311.

This PR uses the circuit breaker package introduced in cockroachdb#73641 to address
issue cockroachdb#33007: When a replica becomes unavailable, it should eagerly
refuse traffic that it believes would simply hang.

Concretely, every request to the Replica gets a cancelable context that
is sensitive to the circuit breaker tripping (relying on context
cancellation makes sure we don't end up with a second channel that needs
to be plumbed to everyone and their dog; Go makes it really difficult to
join two cancellation chains); if the breaker is tripped when the
request arrives, it is refused right away. In either case, the outgoing
error is augmented to carry information about the tripped breaker. This
isn't 100% trivial, since retreating from in-flight replication
typically causes an `AmbiguousResultError`, and while we could avoid it
in some cases we can't in all. A similar problem occurs when the
canceled request was waiting on a lease, in which case the result is a
NotLeaseholderError.

For any request that made it in, if it enters replication but does not
manage to succeed within the timeout specified by the
`kv.replica_circuit_breaker.slow_replication_threshold` cluster setting,
the breaker is tripped, cancelling all inflight and future requests
until the breaker heals. Perhaps surprisingly, the existing "slowness"
detection (the informational "have been waiting ... for proposal"
warning in `executeWriteBatch`) was moved deeper into replication
(`refreshProposalsLocked`), where it now trips the breaker.  This has
the advantage of providing a unified path for lease requests (which
don't go through `executeWriteBatch`) and pipelined writes (which return
before waiting on the inflight replication process). To make this work,
we need to pick a little fight with how leases are (not) refreshed
(cockroachdb#74711) and we need to remember the ticks at which a proposal was first
inserted (despite potential reproposals).

A tripped breaker deploys a background probe which sends a
`ProbeRequest` (introduced in cockroachdb#72972) to determine when to heal; this is
roughly the case whenever replicating a command through Raft is possible
for the Replica, either by appending to the log as the Raft leader, or
by forwarding to the Raft leader. A tiny bit of special casing allows
requests made by the probe to bypass the breaker.

As of this PR, the cluster setting defaults to zero (disabling the
entire mechanism) until some necessary follow-up items have been
addressed (see cockroachdb#74705). For example, the breaker-sensitive context
cancelation is a toy implementation that comes with too much of a
performance overhead (one extra goroutine per request); cockroachdb#74707 will
address that.  Other things not done here that we certainly want in the
22.1 release are UI work (cockroachdb#74713) and metrics (cockroachdb#74505).

The release note is deferred to cockroachdb#74705 (where breakers are turned on).

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Jan 13, 2022
Fixes cockroachdb#33007.
Closes cockroachdb#61311.

This PR uses the circuit breaker package introduced in cockroachdb#73641 to address
issue cockroachdb#33007: When a replica becomes unavailable, it should eagerly
refuse traffic that it believes would simply hang.

Concretely, every request to the Replica gets a cancelable context that
is sensitive to the circuit breaker tripping (relying on context
cancellation makes sure we don't end up with a second channel that needs
to be plumbed to everyone and their dog; Go makes it really difficult to
join two cancellation chains); if the breaker is tripped when the
request arrives, it is refused right away. In either case, the outgoing
error is augmented to carry information about the tripped breaker. This
isn't 100% trivial, since retreating from in-flight replication
typically causes an `AmbiguousResultError`, and while we could avoid it
in some cases we can't in all. A similar problem occurs when the
canceled request was waiting on a lease, in which case the result is a
NotLeaseholderError.

For any request that made it in, if it enters replication but does not
manage to succeed within the timeout specified by the
`kv.replica_circuit_breaker.slow_replication_threshold` cluster setting,
the breaker is tripped, cancelling all inflight and future requests
until the breaker heals. Perhaps surprisingly, the existing "slowness"
detection (the informational "have been waiting ... for proposal"
warning in `executeWriteBatch`) was moved deeper into replication
(`refreshProposalsLocked`), where it now trips the breaker.  This has
the advantage of providing a unified path for lease requests (which
don't go through `executeWriteBatch`) and pipelined writes (which return
before waiting on the inflight replication process). To make this work,
we need to pick a little fight with how leases are (not) refreshed
(cockroachdb#74711) and we need to remember the ticks at which a proposal was first
inserted (despite potential reproposals).

Perhaps surprisingly, when the breaker is tripped, *all* traffic to
the Replica gets the fail-fast behavior, not just mutations. This is
because even though a read may look good to serve based on the lease,
we would also need to check for latches, and in particular we would
need to fail-fast if there is a transitive dependency on any write
(since that write is presumably stuck). This is not trivial and so
we don't do it in this first iteration (see cockroachdb#74799).

A tripped breaker deploys a background probe which sends a
`ProbeRequest` (introduced in cockroachdb#72972) to determine when to heal; this is
roughly the case whenever replicating a command through Raft is possible
for the Replica, either by appending to the log as the Raft leader, or
by forwarding to the Raft leader. A tiny bit of special casing allows
requests made by the probe to bypass the breaker.

As of this PR, the cluster setting defaults to zero (disabling the
entire mechanism) until some necessary follow-up items have been
addressed (see cockroachdb#74705). For example, the breaker-sensitive context
cancelation is a toy implementation that comes with too much of a
performance overhead (one extra goroutine per request); cockroachdb#74707 will
address that.  Other things not done here that we certainly want in the
22.1 release are UI work (cockroachdb#74713) and metrics (cockroachdb#74505).

The release note is deferred to cockroachdb#74705 (where breakers are turned on).

Release note: None

touchie
tbg added a commit to tbg/cockroach that referenced this pull request Jan 13, 2022
Fixes cockroachdb#33007.
Closes cockroachdb#61311.

This PR uses the circuit breaker package introduced in cockroachdb#73641 to address
issue cockroachdb#33007: When a replica becomes unavailable, it should eagerly
refuse traffic that it believes would simply hang.

Concretely, every request to the Replica gets a cancelable context that
is sensitive to the circuit breaker tripping (relying on context
cancellation makes sure we don't end up with a second channel that needs
to be plumbed to everyone and their dog; Go makes it really difficult to
join two cancellation chains); if the breaker is tripped when the
request arrives, it is refused right away. In either case, the outgoing
error is augmented to carry information about the tripped breaker. This
isn't 100% trivial, since retreating from in-flight replication
typically causes an `AmbiguousResultError`, and while we could avoid it
in some cases we can't in all. A similar problem occurs when the
canceled request was waiting on a lease, in which case the result is a
NotLeaseholderError.

For any request that made it in, if it enters replication but does not
manage to succeed within the timeout specified by the
`kv.replica_circuit_breaker.slow_replication_threshold` cluster setting,
the breaker is tripped, cancelling all inflight and future requests
until the breaker heals. Perhaps surprisingly, the existing "slowness"
detection (the informational "have been waiting ... for proposal"
warning in `executeWriteBatch`) was moved deeper into replication
(`refreshProposalsLocked`), where it now trips the breaker.  This has
the advantage of providing a unified path for lease requests (which
don't go through `executeWriteBatch`) and pipelined writes (which return
before waiting on the inflight replication process). To make this work,
we need to pick a little fight with how leases are (not) refreshed
(cockroachdb#74711) and we need to remember the ticks at which a proposal was first
inserted (despite potential reproposals).

Perhaps surprisingly, when the breaker is tripped, *all* traffic to
the Replica gets the fail-fast behavior, not just mutations. This is
because even though a read may look good to serve based on the lease,
we would also need to check for latches, and in particular we would
need to fail-fast if there is a transitive dependency on any write
(since that write is presumably stuck). This is not trivial and so
we don't do it in this first iteration (see cockroachdb#74799).

A tripped breaker deploys a background probe which sends a
`ProbeRequest` (introduced in cockroachdb#72972) to determine when to heal; this is
roughly the case whenever replicating a command through Raft is possible
for the Replica, either by appending to the log as the Raft leader, or
by forwarding to the Raft leader. A tiny bit of special casing allows
requests made by the probe to bypass the breaker.

As of this PR, the cluster setting defaults to zero (disabling the
entire mechanism) until some necessary follow-up items have been
addressed (see cockroachdb#74705). For example, the breaker-sensitive context
cancelation is a toy implementation that comes with too much of a
performance overhead (one extra goroutine per request); cockroachdb#74707 will
address that.  Other things not done here that we certainly want in the
22.1 release are UI work (cockroachdb#74713) and metrics (cockroachdb#74505).

The release note is deferred to cockroachdb#74705 (where breakers are turned on).

Release note: None

touchie
craig bot pushed a commit that referenced this pull request Jan 14, 2022
71806: kvserver: circuit-break requests to unavailable ranges r=erikgrinaker a=tbg

Fixes #33007.
Closes #61311.

This PR uses the circuit breaker package introduced in #73641 to address
issue #33007: When a replica becomes unavailable, it should eagerly
refuse traffic that it believes would simply hang.

Concretely, every request to the Replica gets a cancelable context that
is sensitive to the circuit breaker tripping (relying on context
cancellation makes sure we don't end up with a second channel that needs
to be plumbed to everyone and their dog; Go makes it really difficult to
join two cancellation chains); if the breaker is tripped when the
request arrives, it is refused right away. In either case, the outgoing
error is augmented to carry information about the tripped breaker. This
isn't 100% trivial, since retreating from in-flight replication
typically causes an `AmbiguousResultError`, and while we could avoid it
in some cases we can't in all. A similar problem occurs when the
canceled request was waiting on a lease, in which case the result is a
NotLeaseholderError.

For any request that made it in, if it enters replication but does not
manage to succeed within `base.SlowRequestThreshold` (15s at time of
writing), the breaker is tripped, cancelling all inflight and future
requests until the breaker heals. Perhaps surprisingly, the existing
"slowness" detection (the informational "have been waiting ... for
proposal" warning in `executeWriteBatch`) was moved deeper into
replication (`refreshProposalsLocked`), where it now trips the breaker.
This has the advantage of providing a unified path for lease requests
(which don't go through `executeWriteBatch`) and pipelined writes (which
return before waiting on the inflight replication process). To make this
work, we need to pick a little fight with how leases are (not) refreshed
(#74711) and we need to remember the ticks at which a proposal was first
inserted (despite potential reproposals).

A tripped breaker deploys a background probe which sends a
`ProbeRequest` (introduced in #72972) to determine when to heal; this is
roughly the case whenever replicating a command through Raft is possible
for the Replica, either by appending to the log as the Raft leader, or
by forwarding to the Raft leader. A tiny bit of special casing allows
requests made by the probe to bypass the breaker.

As of this PR, all of this is opt-in via the
`kv.replica_circuit_breakers.enabled` cluster setting while we determine
which of the many follow-up items to target for 22.1 (see #74705).  For
example, the breaker-sensitive context cancelation is a toy
implementation that comes with too much of a performance overhead (one
extra goroutine per request); #74707 will address that.

Other things not done here that we certainly want in the 22.1 release
are UI work (#74713) and metrics (#74505).

The release note is deferred to #74705 (where breakers are turned on).

Release note: None


Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants