Skip to content

Refactor Consensus Protocol#48

Merged
zonotope merged 35 commits intomainfrom
refactor/consensus-protocol
Apr 12, 2024
Merged

Refactor Consensus Protocol#48
zonotope merged 35 commits intomainfrom
refactor/consensus-protocol

Conversation

@zonotope
Copy link
Contributor

@zonotope zonotope commented Apr 1, 2024

This patch among other things is a fix for the raft command queue being overwhelmed during a large reindexing process because of a lack of back pressure. The changes adding back pressure when forwarding new index file notifications to the consensus group are all contained in this commit

In the process of identifying changes we could make to the consensus api I found that, even though we had a TxGroup protocol for consensus, we had raft-specific implementation details throughout the code on either side of the protocol, so the rest of these changes are meant to isolate the raft implementation and tighten up the consensus protocol to make it more modular, extensible, and easier to add new consensus types in the future. I have refactored the TxGroup protocol to only contain the methods we need for general consensus, moved all raft-specific implementation details to the fluree.server.consensus.raft* namespace hierarchy, and removed unused code.

@zonotope zonotope requested a review from a team April 1, 2024 15:17
@zonotope zonotope self-assigned this Apr 1, 2024
Base automatically changed from feature/sync-sid-calculation to main April 2, 2024 02:59
@cap10morgan
Copy link
Contributor

@zonotope Looks like a potentially legit test failure? I think the expectation just needs to be updated (or the test itself to have something reasonable to expect now) due to the changes brought in by the db version bump.

@zonotope
Copy link
Contributor Author

zonotope commented Apr 8, 2024

@zonotope Looks like a potentially legit test failure? I think the expectation just needs to be updated (or the test itself to have something reasonable to expect now) due to the changes brought in by the db version bump.

yep, a legit test failure brought on by the db version bump. I'm pretty sure it came from fluree/db#744. I'm still thinking about the best way to fix it though, and the answer might be more philosophical than technical. if you query for an iri specifically, but we have no facts about it, should we give you nothing or an id map with that iri and nothing else? I could make the argument either way.

@zonotope
Copy link
Contributor Author

zonotope commented Apr 9, 2024

@zonotope Looks like a potentially legit test failure? I think the expectation just needs to be updated (or the test itself to have something reasonable to expect now) due to the changes brought in by the db version bump.

yep, a legit test failure brought on by the db version bump. I'm pretty sure it came from fluree/db#744. I'm still thinking about the best way to fix it though, and the answer might be more philosophical than technical. if you query for an iri specifically, but we have no facts about it, should we give you nothing or an id map with that iri and nothing else? I could make the argument either way.

I went with the id map with nothing else instead of empty results. The subject exists by virtue of the user specifying the iri; we just don't know anything about it. The id map fits most closely with that I think.

Copy link
Contributor

@cap10morgan cap10morgan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, suggestions, and questions to consider. Once you're happy I'm happy. Overall, looks like a great improvement!

(ns fluree.server.consensus
"To allow for pluggable consensus, we have a TxGroup protocol. In order to allow
for a new consensus type, we need to create a record with all of the following
methods. Currently, we support a Raft and Solo.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/we support a Raft/we support Raft/ ?

(queue-new-ledger [group ledger-id tx-id txn opts]
(raft-queue-new-ledger group ledger-id tx-id txn opts))
(queue-new-transaction [group ledger-id tx-id txn opts]
(raft-queue-new-transaction group ledger-id tx-id txn opts)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to ask if it made sense to drop the raft- prefixes in these fn names, but I see that it serves to differentiate them from the protocol method names here. Perhaps something like queue-new-ledger* would be better? The "this is a piece of a larger whole" implication maybe makes sense for protocol method implementation fns? I like that it would make me at least go look for that and be aware of it, but could still invoke these fns directly when / if it made sense to do so.


(defn build-snapshot-config
"Returns a map of the necessary configurations for snapshot reading/writing, etc.
used automatically by the raft system to handle all snapshot activities automaticallly."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize the 'u' in "used"? And there are two "automatically"'s here, probably just need one (and the second one is misspelled).

io-file/canonicalize-path)
ledger-directory* (-> (or ledger-directory
(str "./data/" (name this-server) "/ledger/"))
io-file/canonicalize-path)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe replace the ->'s with the canonicalize-path calls?

This is used in the consensus handler to issue additional raft commands after doing
some work.

Returns a promise channel, which could contain an exception so be sure to check!"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the word channel now? Looks like it just returns a regular ol' promise.

(new-command! config command params timeout-ms callback)
p)))

(defn leader-new-command!-async
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to add a docstring here

callback (fn [resp]
(async/put! ch resp
(fn [_]
(async/close! ch))))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use a promise-chan for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, I think that should work

(case (:event next-file-event)
:new-index-file (push-index-file config next-file-event)
:new-commit (consensus-push-index-commit config (:data next-file-event)))
:new-index-file (<! (push-index-file config next-file-event))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💙

to wait for the operation to complete through consensus before providing the
response back to the API requester. These operations get delivered via
consensus, where they end up being picked up by a responsible server from a
queue, processed, and then the results broadcast back out."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like ns docstrings and I think we should use them more often, but this one kind of feels like it's missing the part about what this ns does. I think it's good to keep what's here as an example use case, but would love to have it start with a short explanation of what this ns is for in the abstract.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. This was a raw comment at the top of the ns and I just copied it verbatim into the ns docstring instead. I'll try to add a description of what this ns does as well.

(defn close
[watcher]
(let [watcher-atom (:watcher-atom watcher)
watches (vals @watcher-atom)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let form alignment

@zonotope zonotope merged commit 857cd0a into main Apr 12, 2024
@zonotope zonotope deleted the refactor/consensus-protocol branch April 12, 2024 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants