kv: clarify the Sender contract and pre-allocate commit batches#30485
craig[bot] merged 3 commits into cockroachdb:master
Conversation
Tobi, putting this up on the flag pole to see if the cat licks it up.
tbg
left a comment
I'm with you in spirit, and I'm glad the time has come where such reuse is actually important for performance, but I'm afraid that turning it off and going wild-west on the whole thing is going to cost us dearly in obscure bugs. Let's figure out what the contracts should be (to make the optimizations most powerful) and then let's think about the best way to enforce them (I hope that in the process we can get rid of this transport, which is quite expensive and slows down race a bunch).
The race transport doesn't only exist because we might be manipulating data on the server (in the local case). It also exists because grpc might be reading the data even after the call came back (or so we assume). In the spirit of there are no benign data races, it follows that unless you can prove that what you gave to gRPC is no longer going to be accessed by it, you don't want to mutate it. I haven't looked at what grpc does with the passed-in message exactly. Ideally it would encode it before it does anything async and then we're home free and can forget about the race transport for that purpose.
Then what's left is finding better ways to enforce the guarantees we want in our code.
For example,
Server side code is currently not allowed to mutate objects referenced by a Batch.
Can (mostly) be caught by hashing the batch on the way in and on the way out and comparing at the (Node).Batch endpoint (unless the request gets passed elsewhere out-of-band, but that doesn't happen often and we can audit it). But I agree that we're probably fairly good here, though that too may change as we move objects between the heap and the stack (and so what was a shallow copy ceases to be one).
Ideally I think we get to a place where the incoming request (incl. the Txn proto) is treated immutable at the node (i.e. below DistSender) and where we feel comfortable that it's not passed around across goroutine boundaries anywhere in DistSender or above (and so can be pooled and reused aggressively).
Reviewable status:
complete! 0 of 0 LGTMs obtained
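The hash-and-compare check described above could look roughly like this. This is a sketch under stated assumptions: `BatchRequest`, `fingerprint`, and `checkNoMutation` are hypothetical stand-ins, not the actual (Node).Batch code, and the real check would hash the proto's marshaled bytes.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// BatchRequest stands in for roachpb.BatchRequest; in real code the
// fingerprint would be taken over the marshaled proto bytes.
type BatchRequest struct {
	Data []byte
}

// fingerprint hashes the request's serialized form.
func fingerprint(ba *BatchRequest) [32]byte {
	return sha256.Sum256(ba.Data)
}

// checkNoMutation wraps a handler and asserts that the server did not
// mutate the incoming request, per the proposed contract.
func checkNoMutation(ba *BatchRequest, handler func(*BatchRequest)) error {
	before := fingerprint(ba)
	handler(ba)
	if fingerprint(ba) != before {
		return fmt.Errorf("server mutated the incoming BatchRequest")
	}
	return nil
}

func main() {
	ba := &BatchRequest{Data: []byte("scan [a, b)")}
	// A well-behaved handler reads but does not mutate.
	err := checkNoMutation(ba, func(ba *BatchRequest) { _ = len(ba.Data) })
	fmt.Println("read-only handler:", err)
	// A buggy handler mutates the request in place and is caught.
	err = checkNoMutation(ba, func(ba *BatchRequest) { ba.Data[0] = 'X' })
	fmt.Println("mutating handler:", err)
}
```

As noted in the comment, this only catches in-band mutation; requests handed off out-of-band would need a separate audit.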
Force-pushed 1eb4e63 to c876f8e
andreimatei
left a comment
I looked into grpc... At first it looked pretty convincing that there's no funny business and that, once the thing returns from an Invoke() call, there's no referencing of the request in the background or anything. But then I realized that I was looking at an older version. In the current version it's a bit harder to assert anything; it's not the most traceable code. But I didn't see any clues that there might be monkey business, and thinking about it I'm confident enough that it's all fine - like, in order for it to not be fine, the code would really have to try explicitly to screw you.
I've added better comments as per our convo to the Sender iface. PTAL.
bdarnell
left a comment
I think it's reasonable to assume that grpc won't touch the input proto once the client method has returned. But I'm not sure about making that same assumption about DistSender. Couldn't tasks started by sendPartialBatchAsync still be running when DistSender.Send returns?
Ben, I think your concern is valid. I hadn't seen this on prior inspection but, upon more looking, I think the following early return is a problem: cockroach/pkg/kv/dist_sender.go, line 811 in 8886bf1. Looks like we return without waiting for all the sub-batches when we get an error trying to "combine" two responses. How legitimate these errors are, I don't know. I'll see if I can get rid of them, or otherwise make the DistSender wait for all sub-batches regardless. So I guess in the meantime I'll take the relevant commit out of #30448.
Force-pushed 81e441b to e2c0dda
I got rid of those combine errors.
Force-pushed 6c6c382 to d5b3b2e
nvb
left a comment
Reviewed 1 of 5 files at r1, 1 of 3 files at r2.
pkg/internal/client/txn.go, line 41 at r3 (raw file):
typ TxnType
// alloc holds pre-allocated fields that can be used to avoid heap allocations
Nice! The only thing I'll point out is that this kind of optimization is even more impactful when we start pooling the Txn object itself. Let's leave a TODO to do that, along with the TxnCoordSender. They're both huge objects that would benefit a lot from being pooled. Pooling them would also force us to clean up their lifetimes, which is a good idea on its own.
pkg/internal/client/txn.go, line 44 at r3 (raw file):
// for batches with lifetimes tied to a Txn.
alloc struct {
	requests [4]roachpb.RequestUnion
We're only ever using a single element out of this, so let's give this a length of 1.
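The pre-allocation pattern under review can be sketched as follows, using hypothetical stand-in types (the real fields live on client.Txn and use the roachpb protos). Per the review comment, the array has length 1.

```go
package main

import "fmt"

// EndTxnRequest and RequestUnion stand in for the roachpb types.
type EndTxnRequest struct{ Commit bool }
type RequestUnion struct{ EndTxn *EndTxnRequest }

// Txn pre-allocates the pieces needed to build a commit batch, so that
// committing doesn't hit the heap. This is safe because a single txn
// never runs two commits concurrently.
type Txn struct {
	alloc struct {
		requests [1]RequestUnion // only one element is ever used
		endTxn   EndTxnRequest
	}
}

// commitBatch builds the commit request out of the pre-allocated fields
// instead of allocating fresh ones on every commit.
func (t *Txn) commitBatch() []RequestUnion {
	t.alloc.endTxn = EndTxnRequest{Commit: true}
	t.alloc.requests[0] = RequestUnion{EndTxn: &t.alloc.endTxn}
	return t.alloc.requests[:]
}

func main() {
	txn := &Txn{}
	reqs := txn.commitBatch()
	fmt.Println(len(reqs), reqs[0].EndTxn.Commit) // 1 true
}
```

The design point is that the slice and the EndTransaction request both alias memory owned by the Txn, which is exactly why the Sender contract about who may reference the request after Send() returns matters.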
pkg/roachpb/api.go, line 246 at r1 (raw file):
}
if rh.ResumeSpan != nil {
	log.Fatalf(ctx, "combining %+v with %+v", rh.ResumeSpan, otherRH.ResumeSpan)
I don't know about this. We shouldn't fatal if we get something we don't like back from a different node in the cluster. Why don't we just change the defer in divideAndSendBatchToRange to treat the error like any other error? It can return it later without immediately breaking from the loop.
bdarnell
left a comment
Reviewed 4 of 5 files at r1, 3 of 3 files at r2, 1 of 1 files at r3.
pkg/roachpb/api.go, line 246 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I don't know about this. We shouldn't fatal if we get something we don't like back from a different node in the cluster. Why don't we just change the defer in `divideAndSendBatchToRange` to treat the error like any other error? It can return it later without immediately breaking from the loop.
I agree. Changing a panic to a fatal here doesn't hurt anything, but it's moving in the wrong direction to force us to fatal instead of returning an error. It would be better to keep the error return from combine() and just delay the return until everything is done.
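The error-returning shape being advocated here can be sketched with stand-in types (the real combine lives in pkg/roachpb/api.go and operates on response protos; `ScanResponse`/`PutResponse` below are toy versions):

```go
package main

import "fmt"

type Response interface{ isResponse() }
type ScanResponse struct{ Rows []string }
type PutResponse struct{}

func (ScanResponse) isResponse() {}
func (PutResponse) isResponse()  {}

// combine merges a partial response into an accumulated one. Rather than
// fatal-ing on a mismatched pair, it returns an error the caller can
// propagate once all sub-batches have been waited for.
func combine(acc, other Response) (Response, error) {
	a, ok1 := acc.(ScanResponse)
	b, ok2 := other.(ScanResponse)
	if !ok1 || !ok2 {
		return nil, fmt.Errorf("can't combine %T with %T", acc, other)
	}
	return ScanResponse{Rows: append(a.Rows, b.Rows...)}, nil
}

func main() {
	r, err := combine(ScanResponse{Rows: []string{"a"}}, ScanResponse{Rows: []string{"b"}})
	fmt.Println(r, err)
	_, err = combine(ScanResponse{}, PutResponse{})
	fmt.Println(err)
}
```

The caller's loop records such an error but keeps draining results, returning it only after every sub-batch has completed.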
andreimatei
left a comment
pkg/roachpb/api.go, line 246 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
I agree. Changing a panic to a fatal here doesn't hurt anything, but it's moving in the wrong direction to force us to fatal instead of returning an error. It would be better to keep the error return from `combine()` and just delay the return until everything is done.
Before I change everything back, I'd like to litigate this more.
Ben, the position you've expressed when we were talking before seems inconsistent to me: you said you're fine with assertions about the behavior of a single process, but not so much about what is received from a different node. And you said that this policy preference is not necessarily related to version mismatches between nodes. But generally it seems to me like the only sane thing to do is have the same policy across nodes that we have within a single process. Like, we make a million assumptions about what another node will return; we can't accept arbitrary data, and trying to tolerate programming errors on another node that we don't tolerate locally... I don't see why we'd do that. If a remote node says this is the data at this key, we take it at face value. We don't assert that it returned data from the key we requested, and such.
What gives?
I'm just coming from the general principle that any crash that can be triggered by data received over the network is a bug (not necessarily a high-priority bug, but a bug nonetheless). If the existing code didn't have an error return I wouldn't be advocating for adding one at this point, but since it's there, removing it seems like a step in the wrong direction. I think it's worth a little extra effort to keep the error return from combine() and modify the loop to wait for all results regardless of error.
andreimatei
left a comment
pkg/roachpb/api.go, line 246 at r1 (raw file):
Pasting Ben's response here:
I'm just coming from the general principle that any crash that can be triggered by data received over the network is a bug (not necessarily a high-priority bug, but a bug nonetheless). If the existing code didn't have an error return I wouldn't be advocating for adding one at this point, but since it's there removing it seems like a step in the wrong direction.
I think it's worth a little extra effort to keep the error return from combine() and modify the loop to wait for all results regardless of error. But it's not worth a lot of extra effort. I was also mainly reacting to Nathan's comment without looking too closely at the particular panic here. If combining ResumeSpans and Errors are the only way combine can fail, then I'm not too worried about the panics.
Thanks. I'd still be inclined to debate what exactly you're thinking when you say "crash that can be triggered by data received over the network is a bug" - like, if we try to read the value of a sequence and we get back a string, would it be a "bug" if we crashed? Or an inconsistent lease? I don't think you want to argue that, so you probably want to refine your statement to only refer to some response "metadata". But even then, do you think that a more precise description of what you want to express can be phrased, and still be general enough that I can tattoo it on my chest?
We don't need to have a general discussion if you don't want to; we can discuss this specific case.
- Combining something with a `ResumeSpan` was a panic; now it's a fatal. I think this indicates a situation where the gateway split a batch that it shouldn't have.
- Combining a response that had an error was an error. Now it's a fatal. This indicates a bug in the DistSender's logic that's supposed to strip these errors before combining anything.
- Combining two "non-combinable" responses was an error. The patch makes it a fatal. "Non-combinable" means we were trying to combine, say, a `ScanResponse` with a `PutResponse`. Even before the patch, trying to combine a `ScanResponse` with a `DelRangeResponse` (where they're both "combinable") would be a crash. This indicates a bug in how the gateway calculates the positions of the responses it combines or, I guess, a server-side bug where we rearrange the responses to individual constituents of a batch.
So, given these specific changes, do you object to the two new fatals introduced?
I do want to argue that. We should strive to return an error instead of panicking for things like this. Data-dependent crashes are nearly always bugs (sometimes, like at startup, it really doesn't make sense to continue if we can't read what's on the disk, but ideally even that would be handled by returning an error up the stack).
Not enough to keep arguing about it.
+1 to @bdarnell's stance here. My general rule of thumb is that errors that can be scoped to a single request should be.
... before returning from DistSender.Send(). It had an obscure error case in which it wasn't waiting for all of them. This waiting starts being mandated by the next commits. Release note: None
This patch clarifies the transport's contract wrt ownership and lifetime of the request. In particular, it makes it clear that the client is allowed to mutate the request after the Send() call completes. The race transport is then relaxed a bit to allow some specific mutations. Release note: None
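The clarified contract might read something like the sketch below. This is a paraphrase with toy types, not the actual client.Sender interface from the PR; `echoSender` is a hypothetical implementation used to illustrate the ownership rule.

```go
package main

import "fmt"

type BatchRequest struct{ Key string }
type BatchResponse struct{ Value string }

// Sender sends a batch. Per the clarified contract: ownership of the
// request stays with the caller, and once Send returns, the caller is
// free to mutate or reuse the BatchRequest; implementations must not
// retain references to it past the call.
type Sender interface {
	Send(*BatchRequest) (*BatchResponse, error)
}

// echoSender is a toy implementation that copies what it needs out of the
// request instead of holding on to it, as the contract demands.
type echoSender struct{}

func (echoSender) Send(ba *BatchRequest) (*BatchResponse, error) {
	return &BatchResponse{Value: ba.Key}, nil // copies the string; no retained reference
}

func main() {
	var s Sender = echoSender{}
	ba := &BatchRequest{Key: "a"}
	br, _ := s.Send(ba)
	ba.Key = "mutated" // legal under the contract once Send has returned
	fmt.Println(br.Value)
}
```

Because the implementation copied out what it needed, the post-Send mutation of `ba` cannot affect the response, which is exactly the property the relaxed race transport is allowed to assume.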
Force-pushed d5b3b2e to 755d2be
When committing, the client.Txn used to allocate a slice of requests (inside a BatchRequest), a RequestUnion, and an EndTransactionRequest. This shows up on profiles of a single-node read-heavy workload. This patch moves to pre-allocating these inside a Txn. This works since there are no concurrent commits on a single txn. Release note: None
Force-pushed 755d2be to 06c1adf
andreimatei
left a comment
I've reintroduced the combine errors (and converted the panic we had to an error too).
PTAL.
pkg/internal/client/txn.go, line 41 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Nice! The only thing I'll point out is that this kind of optimization is even more impactful when we start pooling the `Txn` object itself. Let's leave a TODO to do that, along with the TxnCoordSender. They're both huge objects that would benefit a lot from being pooled. Pooling them would also force us to clean up their lifetimes, which is a good idea on its own.
Done.
pkg/internal/client/txn.go, line 44 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We're only ever using a single element out of this, so let's give this a length of 1.
Done.
pkg/roachpb/api.go, line 246 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
This conversation has moved to the top-level PR comments.
bdarnell
left a comment
Reviewed 2 of 8 files at r4, 3 of 3 files at r5, 1 of 1 files at r6.
nvb
left a comment
Reviewed 8 of 8 files at r4, 2 of 3 files at r5, 1 of 1 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained
Timed out
bors r+
…On Thu, Sep 27, 2018 at 5:34 PM craig[bot] wrote:
Timed out
Build succeeded
This reverts commit 06c1adf. Revert the last commit from cockroachdb#30485 - the one that introduced pre-allocation and possible reuse of EndTransaction batches. It's causing races. I'll figure it out and re-send the original patch. Touches cockroachdb#30769 Release note: None
30773: Revert "client: don't allocate some commit batches" r=andreimatei a=andreimatei This reverts commit 06c1adf. Revert the last commit from #30485 - the one that introduced pre-allocation and possible reuse of EndTransaction batches. It's causing races. I'll figure it out and re-send the original patch. Touches #30769 Release note: None Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
See individual commits.
Release note: None