technote: add a technote about CRDB retries#18575
technote: add a technote about CRDB retries#18575andreimatei wants to merge 1 commit intocockroachdb:masterfrom
Conversation
|
Only the Raft level sections are ready to be read. You can see the rendered doc at: (We'll figure out the publishing details later). |
4901b8e to
2d50722
Compare
|
This is incredibly detailed. Wow! |
|
Github has its own rendered view. We shouldn't introduce a new documentation format (RST) and a new rendering pipeline (whatever undocumented script was used here) casually; please stick with markdown. |
|
The github renderer doesn't support admonitions (notes and sidebars) - they told me that my feedback is in good hands. I find those very useful. |
|
the markdown ship will get shored up soon: cockroachdb/docs#1716 |
|
(Though I agree with @bdarnell) I don't think it's worth the time converting this technote from ps @andreimatei I prefer GitHub's rendered view over yours. Other than that, we should probably limit this discussion to the contents of the tech note, and less its presentation. |
|
|
Aside from the format debate, the content generally LGTM Review status: 0 of 3 files reviewed at latest revision, 9 unresolved discussions, all commit checks successful. docs/tech-notes/Retries-in-CRDB.rst, line 93 at r1 (raw file):
Too many backticks here and on the next line. docs/tech-notes/Retries-in-CRDB.rst, line 378 at r1 (raw file):
When the lease holder is not the leader, it's possible that the leader has committed the original proposal but the lease holder hasn't heard yet, so reproposals still introduce potential ambiguity and rely on the LeaseAppliedIndex to prevent this. docs/tech-notes/Retries-in-CRDB.rst, line 525 at r1 (raw file):
It's RequestLease, not ProposeLease. docs/tech-notes/Retries-in-CRDB.rst, line 535 at r1 (raw file):
I don't think that's true. Batches containing EndTransaction never leave intents because evalEndTransaction cleans up all intents on its range. And EndTransaction only writes to the transaction record if there is an existing record in the pending state. This would never be true if a previous instance of the command applied. docs/tech-notes/Retries-in-CRDB.rst, line 611 at r1 (raw file):
File issues for these. docs/tech-notes/Retries-in-CRDB.rst, line 631 at r1 (raw file):
Correct. DistSender will avoid generating AmbiguousResultError when there is no commit, but when lower levels generate AmbiguousResultError it gets passed along as-is. This should probably be replaced with something that catches AmbiguousResultError and converts it to a TxnRestartError unless it affects a commit. docs/tech-notes/Retries-in-CRDB.rst, line 645 at r1 (raw file):
The original proposal was applied, but its BatchResponse was lost. So I think we're going to need some sort of retry here anyway. docs/tech-notes/Retries-in-CRDB.rst, line 648 at r1 (raw file):
A command can't stay in the command queue across leases moving around, so that shouldn't be an issue. docs/tech-notes/Retries-in-CRDB.rst, line 654 at r1 (raw file):
Stopping review here. Comments from Reviewable |
|
Thanks for writing this up! Reviewed 2 of 3 files at r1. docs/tech-notes/Retries-in-CRDB.rst, line 21 at r1 (raw file):
docs/tech-notes/Retries-in-CRDB.rst, line 76 at r1 (raw file):
Throughout this section, we use the term "node". I think replica is clearer in the context of docs/tech-notes/Retries-in-CRDB.rst, line 82 at r1 (raw file):
"generally"? Can it ever know with certainty? docs/tech-notes/Retries-in-CRDB.rst, line 127 at r1 (raw file):
nit: docs/tech-notes/Retries-in-CRDB.rst, line 131 at r1 (raw file):
Consider making a note that this protection isn't needed because of the latching provided by the docs/tech-notes/Retries-in-CRDB.rst, line 189 at r1 (raw file):
Ah, that's why my understanding was lacking! Previous explanations were missing the "Raft message flow sidebar". I'm sold on RST. docs/tech-notes/Retries-in-CRDB.rst, line 209 at r1 (raw file):
Is this true? Aren't we using gRPC streams for this, which provide ordering guarantees? Is this related to the issue of multiple streams which may violate the ordering guarantee of a single stream? docs/tech-notes/Retries-in-CRDB.rst, line 271 at r1 (raw file):
That still doesn't get us exactly once delivery. docs/tech-notes/Retries-in-CRDB.rst, line 287 at r1 (raw file):
There are also cases where the docs/tech-notes/Retries-in-CRDB.rst, line 321 at r1 (raw file):
This is an optimization I've been thinking about for some time. If we could batch together multiple docs/tech-notes/Retries-in-CRDB.rst, line 494 at r1 (raw file):
Comments from Reviewable |
|
We should get this cleaned up and merged when we have time. Lots of good stuff in here! |
No description provided.