Skip to content

technote: add a technote about CRDB retries#18575

Open
andreimatei wants to merge 1 commit intocockroachdb:masterfrom
andreimatei:technote-export
Open

technote: add a technote about CRDB retries#18575
andreimatei wants to merge 1 commit intocockroachdb:masterfrom
andreimatei:technote-export

Conversation

@andreimatei
Copy link
Copy Markdown
Contributor

No description provided.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@andreimatei
Copy link
Copy Markdown
Contributor Author

Only the Raft level sections are ready to be read.

You can see the rendered doc at:
https://rawgit.com/andreimatei/cockroach/technote-export/docs/tech-notes/Retries-in-CRDB.html

(We'll figure out the publishing details later).

@petermattis
Copy link
Copy Markdown
Collaborator

This is incredibly detailed. Wow!

@bdarnell
Copy link
Copy Markdown
Contributor

Github has its own rendered view.

We shouldn't introduce a new documentation format (RST) and a new rendering pipeline (whatever undocumented script was used here) casually; please stick with markdown.

@andreimatei
Copy link
Copy Markdown
Contributor Author

andreimatei commented Sep 19, 2017

The github renderer doesn't support admonitions (notes and sidebars) - they told me that my feedback is in good hands. I find those very useful.
Another cool thing we can do if we control the process is embed a Discus comments section.

@knz
Copy link
Copy Markdown
Contributor

knz commented Sep 19, 2017

the markdown ship will get shored up soon: cockroachdb/docs#1716

@tbg
Copy link
Copy Markdown
Member

tbg commented Sep 19, 2017

(Though I agree with @bdarnell) I don't think it's worth the time converting this technote from .rst to .md because it doesn't seem to matter, but we shouldn't check in a generated html that nobody knows how to update, and we also shouldn't check in a random generator that we then have to maintain. Let's wait until cockroachdb/docs#1716 is solved and then use the same thing.

ps @andreimatei I prefer GitHub's rendered view over yours.

Other than that, we should probably limit this discussion to the contents of the tech note, and less its presentation.

@knz
Copy link
Copy Markdown
Contributor

knz commented Sep 25, 2017

  1. Regarding md vs rst:

    • don't check in the html
    • keep rst
    • place a note at the top of the document with a reminder that the GH view is incomplete, and instructions on how to regenerate the HTML locally (using the default rst stylesheet) if the reader wants more.
  2. Regarding the process of merging this tech note. I am ok with merging this without content for the missing sections; however I am not OK with dangling section titles. Make a new section "Future work / other considerations" and list the sections there that are in want of more information with a small bullet list of what you'd like to see in them.

@bdarnell
Copy link
Copy Markdown
Contributor

bdarnell commented Oct 5, 2017

Aside from the format debate, the content generally LGTM


Review status: 0 of 3 files reviewed at latest revision, 9 unresolved discussions, all commit checks successful.


docs/tech-notes/Retries-in-CRDB.rst, line 93 at r1 (raw file):

retries are of two types: reproposals and re-evaluations. Both of them are
triggered by Raft processing; they're triggered by
```replica.handleRaftReady()``. Reproposals are lower-level operation - they're

Too many backticks here and on the next line.


docs/tech-notes/Retries-in-CRDB.rst, line 378 at r1 (raw file):

  will not be allowed to apply.

.. note:: Reminder that reproposals never introduce any ambiguity: we only repropose when we know that the original proposal has not been applied yet.

When the lease holder is not the leader, it's possible that the leader has committed the original proposal but the lease holder hasn't heard yet, so reproposals still introduce potential ambiguity and rely on the LeaseAppliedIndex to prevent this.


docs/tech-notes/Retries-in-CRDB.rst, line 525 at r1 (raw file):

- Other special requests (leases, splits, etc.) rely on ad-hoc mechanism. Some
  of them are tied to an ``EndTransaction``, so they use its protections. The
  ``ProposeLease`` command, for example, relies on the current lease verification [#]_.

It's RequestLease, not ProposeLease.


docs/tech-notes/Retries-in-CRDB.rst, line 535 at r1 (raw file):

   on the second application. However, relying on this and not on the
   ``TimestampCache`` entry would have the disadvantage that the second
   application would leave dangling intents and a dangling transaction record.

I don't think that's true. Batches containing EndTransaction never leave intents because evalEndTransaction cleans up all intents on its range. And EndTransaction only writes to the transaction record if there is an existing record in the pending state. This would never be true if a previous instance of the command applied.


docs/tech-notes/Retries-in-CRDB.rst, line 611 at r1 (raw file):

.. _TODO_repropose_on_proposalIllegalLeaseIndex:

File issues for these.


docs/tech-notes/Retries-in-CRDB.rst, line 631 at r1 (raw file):

**TODO:** ``AmbiguousResultErrors`` are never transformed to txn
retry errors, as far as I can see. They’re converted in SQL to errors

Correct. DistSender will avoid generating AmbiguousResultError when there is no commit, but when lower levels generate AmbiguousResultError it gets passed along as-is. This should probably be replaced with something that catches AmbiguousResultError and converts it to a TxnRestartError unless it affects a commit.


docs/tech-notes/Retries-in-CRDB.rst, line 645 at r1 (raw file):

be ambiguous**. We'd be able to compare the re-evaluation result with the result
of the original evaluation and, if they're not exactly the same, then we could
conclude that the original proposal was applied. This way we'd resolve the

The original proposal was applied, but its BatchResponse was lost. So I think we're going to need some sort of retry here anyway.


docs/tech-notes/Retries-in-CRDB.rst, line 648 at r1 (raw file):

direction of the ambiguity that's currently unresolved. There may be difficulty
with variance in some conditions that make evaluation non-deterministic (e.g.
leases moving around and the timestamp cache overflowing), but it seems that at

A command can't stay in the command queue across leases moving around, so that shouldn't be an issue.


docs/tech-notes/Retries-in-CRDB.rst, line 654 at r1 (raw file):

----------------------

.. warning:: EVERYTHING BELOW IS WIP, NOT READY FOR REVIEW

Stopping review here.


Comments from Reviewable

@nvb
Copy link
Copy Markdown
Contributor

nvb commented Oct 19, 2017

Thanks for writing this up!


Reviewed 2 of 3 files at r1.
Review status: 2 of 3 files reviewed at latest revision, 20 unresolved discussions, all commit checks successful.


docs/tech-notes/Retries-in-CRDB.rst, line 21 at r1 (raw file):

These are all actions that the code does consciously; none of them are
introduced transparently by a network transport or such. Executing things twice,
or executing them at a later point in time then when we intended the execution -

s/then/than/


docs/tech-notes/Retries-in-CRDB.rst, line 76 at r1 (raw file):

``DistSender`` retries
  The ``DistSender`` sends *requests* to a remote node for *evaluation* (and the

Throughout this section, we use the term "node". I think replica is clearer in the context of DistSender because we're picking which remote node to send requests to based on which ones contain replicas of the desired range.


docs/tech-notes/Retries-in-CRDB.rst, line 82 at r1 (raw file):

  the request again, to a different node (it’s true that generally a single
  node -  the lease holder of the respective range - can service any given
  request, but it’s also true that a ``DistSender`` doesn’t generally know with

"generally"? Can it ever know with certainty?


docs/tech-notes/Retries-in-CRDB.rst, line 127 at r1 (raw file):

  instructions (e.g. “set value of key *k* to *v*\ ” or “delete key *k”).* These
  batches have been produced by the evaluation process starting from a given
  Rocksdb snapshot. At least given the way we use them, these batches can, as

nit: RocksDB


docs/tech-notes/Retries-in-CRDB.rst, line 131 at r1 (raw file):

  from the snapshot on top of which they were produced - no error will be
  generated and there’s no need to “merge changes”. So RocksDB does not protect
  us from overwriting values we didn’t intend to overwrite. Also keep in mind

Consider making a note that this protection isn't needed because of the latching provided by the CommandQueue. That would segway into the issue with removing from the CommandQueue
while a reproposal is in-flight.


docs/tech-notes/Retries-in-CRDB.rst, line 189 at r1 (raw file):

The second question that comes to mind is how exactly can command C run before
the second application of A/B, given that C was proposed later? The explanation
requires the Raft message flow sidebar.

Ah, that's why my understanding was lacking! Previous explanations were missing the "Raft message flow sidebar".

I'm sold on RST.


docs/tech-notes/Retries-in-CRDB.rst, line 209 at r1 (raw file):

Raft transport protocol, which is not currently
under CRDB control and does not make any ordering guarantees

Is this true? Aren't we using gRPC streams for this, which provide ordering guarantees? Is this related to the issue of multiple streams which may violate the ordering guarantee of a single stream?


docs/tech-notes/Retries-in-CRDB.rst, line 271 at r1 (raw file):

   the transport to obviate the need to propose commands twice would be
   reliable *exactly once delivery* semantics, which we're unlikely to get from
   a transport layer (except if we'd make the commands idempotent; see below).

That still doesn't get us exactly once delivery.


docs/tech-notes/Retries-in-CRDB.rst, line 287 at r1 (raw file):

- Other special commands carry other special side-effects (splits, merges,
  rebalances, leases, log truncation). Some of these are currently not
  idempotent.

There are also cases where the WriteBatch itself is not idempotent, such as when it includes MVCCMerge operations (which are used for time series data).


docs/tech-notes/Retries-in-CRDB.rst, line 321 at r1 (raw file):

  close in time to one another. That might save some Raft work at the cost of
  increased latency for some of the requests. This optimization would be
  possible with or without absolute values for the statistics.

This is an optimization I've been thinking about for some time. If we could batch together multiple WriteBatches before the actual Raft proposal, we could multiplex multiple requests into the same Raft command/log entry and presumably get a lot of the benefits of #15648 without some of the associated complications. I'm sure this would come with its own complications, such as the added latency you mentioned. I'm curious if @bdarnell has thought about this at all.


docs/tech-notes/Retries-in-CRDB.rst, line 494 at r1 (raw file):

succeeds. Why exactly is it that a successful evaluation proves that the command
did not previously apply? What's preventing the re-evaluation from succeeding
even if the command had previously apply? The answer depends on the type of the

s/apply/applied/


Comments from Reviewable

@nvb
Copy link
Copy Markdown
Contributor

nvb commented Jul 21, 2018

We should get this cleaned up and merged when we have time. Lots of good stuff in here!

@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label Jun 19, 2019
@bdarnell bdarnell removed their assignment Oct 30, 2019
@cockroach-teamcity
Copy link
Copy Markdown
Member

cockroach-teamcity commented Mar 10, 2020

CLA assistant check
All committers have signed the CLA.

@andreimatei andreimatei deleted the technote-export branch April 24, 2020 18:55
@andreimatei andreimatei restored the technote-export branch April 24, 2020 18:55
@andreimatei andreimatei deleted the technote-export branch April 24, 2020 18:59
@andreimatei andreimatei restored the technote-export branch April 24, 2020 19:00
@andreimatei andreimatei reopened this Apr 24, 2020
@tbg tbg removed their assignment Dec 20, 2022
@nvb nvb removed their assignment Aug 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

X-noremind Bots won't notify about PRs with X-noremind

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants