Conversation
|
Review status: 0 of 1 files reviewed at latest revision, 10 unresolved discussions. docs/RFCS/leader_evaluated_raft.md, line 95 [r1] (raw file): My initial reaction on seeing this list was that this is exactly the stuff that we may need to change and we haven't gained much if it's just a way to make changes to the relatively simple commands that aren't on this list. But after looking at the list of migrations considered in #5985 it does seem to cover most of them. And even for these commands the bulk of the work could still be done with the mvcc batch. For instance, only the call to docs/RFCS/leader_evaluated_raft.md, line 96 [r1] (raw file): docs/RFCS/leader_evaluated_raft.md, line 178 [r1] (raw file): docs/RFCS/leader_evaluated_raft.md, line 208 [r1] (raw file): docs/RFCS/leader_evaluated_raft.md, line 232 [r1] (raw file): docs/RFCS/leader_evaluated_raft.md, line 238 [r1] (raw file): docs/RFCS/leader_evaluated_raft.md, line 251 [r1] (raw file): docs/RFCS/leader_evaluated_raft.md, line 254 [r1] (raw file):
docs/RFCS/leader_evaluated_raft.md, line 256 [r1] (raw file): Splits will also be large if the abort cache is copied into KV Write pairs. docs/RFCS/leader_evaluated_raft.md, line 260 [r1] (raw file): Comments from Reviewable |
fca7031 to
41ce979
Compare
|
Added a second commit to address @bdarnell's initial round of comments. Review status: 0 of 1 files reviewed at latest revision, 10 unresolved discussions, some commit checks pending. docs/RFCS/leader_evaluated_raft.md, line 95 [r1] (raw file):
Yes, exactly. I added more detail on the individual triggers in the mock proto below. docs/RFCS/leader_evaluated_raft.md, line 96 [r1] (raw file): docs/RFCS/leader_evaluated_raft.md, line 178 [r1] (raw file): docs/RFCS/leader_evaluated_raft.md, line 208 [r1] (raw file): docs/RFCS/leader_evaluated_raft.md, line 232 [r1] (raw file): docs/RFCS/leader_evaluated_raft.md, line 251 [r1] (raw file): docs/RFCS/leader_evaluated_raft.md, line 254 [r1] (raw file): docs/RFCS/leader_evaluated_raft.md, line 256 [r1] (raw file): docs/RFCS/leader_evaluated_raft.md, line 260 [r1] (raw file): Comments from Reviewable |
41ce979 to
ff7bb96
Compare
|
Review status: 0 of 1 files reviewed at latest revision, 11 unresolved discussions, all commit checks successful. docs/RFCS/leader_evaluated_raft.md, line 178 [r1] (raw file): Underneath the hood, a RocksDB WriteBatch stores all of the writes in a C++ I see two reasonable options here. One is that we simply pass around the internal WriteBatch buffer. RocksDB already has an interface for retrieving this internal data and adding an interface to create a WriteBatch from such a buffer would be straightforward. The second option would be to iterate over the WriteBatch mutations and build up our own set of mutations to pass over the wire. This would be slightly less efficient, but would divorce us from RocksDB. My preference is for the first option (using the internal docs/RFCS/leader_evaluated_raft.md, line 361 [r4] (raw file): Underneath the hood, a RocksDB WriteBatch stores all of the writes in a C++ I see two reasonable options here. One is that we simply pass around the internal WriteBatch buffer. RocksDB already has an interface for retrieving this internal data ( The second option would be to iterate over the WriteBatch mutations and build up our own set of mutations to pass over the wire. This would be slightly less efficient, but would divorce us from RocksDB. My preference is for the first option (using the internal Comments from Reviewable |
|
Review status: 0 of 1 files reviewed at latest revision, 11 unresolved discussions, all commit checks successful. docs/RFCS/leader_evaluated_raft.md, line 361 [r4] (raw file): Comments from Reviewable |
|
Review status: 0 of 1 files reviewed at latest revision, 11 unresolved discussions, all commit checks successful. docs/RFCS/leader_evaluated_raft.md, line 178 [r1] (raw file): Comments from Reviewable |
|
Reviewed 1 of 1 files at r4. docs/RFCS/leader_evaluated_raft.md, line 280 [r4] (raw file): Comments from Reviewable |
|
Review status: all files reviewed at latest revision, 7 unresolved discussions, all commit checks successful. docs/RFCS/leader_evaluated_raft.md, line 254 [r1] (raw file): docs/RFCS/leader_evaluated_raft.md, line 361 [r4] (raw file): Comments from Reviewable |
ff7bb96 to
460f9c2
Compare
|
Updated, PTAL. I feel that this PR has split up relatively well. We have (depending on how you count) 7 well-contained prerequisites which could in theory all be worked on in parallel (the one needing most thought likely being the Freeze/Unfreeze part, which is queued first) and which would (mostly) be useful independently. The remaining "forklift" portion which actually moves the bulk processing pre-Raft is still going to take some effort, but it does appear to me that the bulk of the work is in those prerequisites. Review status: 0 of 1 files reviewed at latest revision, 6 unresolved discussions. docs/RFCS/leader_evaluated_raft.md, line 254 [r1] (raw file): docs/RFCS/leader_evaluated_raft.md, line 280 [r4] (raw file): docs/RFCS/leader_evaluated_raft.md, line 361 [r4] (raw file): For reference, LevelDB has the following: Doesn't seem unlikely that I updated the RFC to reflect that this is the suggested implementation. Comments from Reviewable |
|
Review status: 0 of 1 files reviewed at latest revision, 6 unresolved discussions. docs/RFCS/leader_evaluated_raft.md, line 361 [r4] (raw file): Comments from Reviewable |
|
Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions. docs/RFCS/leader_evaluated_raft.md, line 219 [r7] (raw file): The first number ( Comments from Reviewable |
460f9c2 to
0f1027c
Compare
|
Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, some commit checks pending. docs/RFCS/leader_evaluated_raft.md, line 361 [r4] (raw file): docs/RFCS/leader_evaluated_raft.md, line 219 [r7] (raw file): Comments from Reviewable |
0f1027c to
65d8a3e
Compare
|
Review status: 0 of 1 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed. docs/RFCS/leader_evaluated_raft.md, line 361 [r4] (raw file): docs/RFCS/leader_evaluated_raft.md, line 219 [r7] (raw file): Comments from Reviewable |
|
Seems that there's a good consensus so far - @spencerkimball and anyone else interested, I'll keep this open until Monday night ET (and longer if there's discussion). PTAL. Once merged, I'll create a roadmap (issues for each points outlined in the implementation strategy below and the remainder (and perhaps a label)), but we should also determine how long we're hoping to get this done and who will work on it. Review status: 0 of 1 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending. docs/RFCS/leader_evaluated_raft.md, line 361 [r4] (raw file): docs/RFCS/leader_evaluated_raft.md, line 219 [r7] (raw file): Comments from Reviewable |
|
Review status: 0 of 1 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed. docs/RFCS/leader_evaluated_raft.md, line 219 [r7] (raw file): Comments from Reviewable |
|
Is the presumption here that an upgrade/migration of a node must be performed while a transaction is in progress? I mean, why not use the following model:
BTW, with Cassandra, if the application is reading and writing with quorum consistency, any single node could be brought down without breaking quorum, assuring absolute non-stop operation. The node would then catch up due to hinted hand-off, provided that the down time was small, under an hour or whatever the handoff timeout grace period is. It would be nice if there was an application return code indicating that the server is in maintenance mode and that the application should hold off new transactions for some interval of time, rather than the application simply getting some error and mindlessly retrying the operation or assuming that the server is in a bad state rather than simply being maintained. |
|
No, this happens below the level of transactions. The primary source of complication is that we keep byte-wise consistent replicas (a problem Cassandra does not have) and that the post-raft evaluation of commands makes the bulk of code live downstream of the point at which byte-wise output must be identical. Review status: 0 of 1 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed. Comments from Reviewable |
This is a hastily written, initial draft released early to get early feedback for more refined iterations. [ci skip]
Misc edits and flesh out - remaining post-Raft code - replay protection - migration strategy - drawbacks - unresolved questions
Flesh out - use of RocksDB WriteBatch for zero-copy serialization - concretize implementation strategy - updated discussion on removal of write-write blocking [ci skip]
ca7f5e8 to
625193e
Compare
|
|
||
| * any key-value writes: For example the outcome of a `Put`, `DeleteRange`, `CPut`, `BeginTransaction`, or `LeaderLease`. | ||
| Some alternatives exist: writes could be encoded as `[]MVCCKeyValue`, or even | ||
| as post-MVCC raw (i.e. opaque) key-value pairs. We'll assume the former for |
There was a problem hiding this comment.
Why assume []MVCCKeyValue for the "effects" contents? Then we're tying up potential for migration in how we encode MVCC information. I think it's more prudent to send "post-MVCC raw" key-value pairs.
|
Any follow-up discussions are best held in the appropriate subissue: https://github.com/cockroachdb/cockroach/labels/leader-proposed-raft Review status: 0 of 1 files reviewed at latest revision, 9 unresolved discussions, all commit checks successful. docs/RFCS/leader_evaluated_raft.md, line 71 [r17] (raw file): docs/RFCS/leader_evaluated_raft.md, line 98 [r17] (raw file): docs/RFCS/leader_evaluated_raft.md, line 175 [r17] (raw file): docs/RFCS/leader_evaluated_raft.md, line 177 [r17] (raw file): docs/RFCS/leader_evaluated_raft.md, line 274 [r17] (raw file): Comments from Reviewable |
|
Review status: 0 of 1 files reviewed at latest revision, 10 unresolved discussions, all commit checks successful. docs/RFCS/leader_evaluated_raft.md, line 35 [r17] (raw file): Comments from Reviewable |
Remove the `storagebase.RaftCommand` proto by moving its fields onto `ReplicatedProposalData` while preserving the tag numbers. Use that message instead of `RaftCommand` throughout, including on the wire (since the tag numbers were preserved, this does not require any special handling). This in preparation for a follow-up change which adds an experimental switch to use proposer-evaluated KV (cockroachdb#6166).
Remove the `storagebase.RaftCommand` proto by moving its fields onto `ReplicatedProposalData` while preserving the tag numbers. Use that message instead of `RaftCommand` throughout, including on the wire (since the tag numbers were preserved, this does not require any special handling). This in preparation for a follow-up change which adds an experimental switch to use proposer-evaluated KV (cockroachdb#6166).
Remove the `storagebase.RaftCommand` proto by moving its fields onto `ReplicatedProposalData` while preserving the tag numbers. Use that message instead of `RaftCommand` throughout, including on the wire (since the tag numbers were preserved, this does not require any special handling). This in preparation for a follow-up change which adds an experimental switch to use proposer-evaluated KV (cockroachdb#6166).
Remove the `storagebase.RaftCommand` proto by moving its fields onto `ReplicatedProposalData` while preserving the tag numbers. Use that message instead of `RaftCommand` throughout, including on the wire (since the tag numbers were preserved, this does not require any special handling). This in preparation for a follow-up change which adds an experimental switch to use proposer-evaluated KV (cockroachdb#6166).
Read the latest version here.
This change is