Merge upstream#2

Merged
alesapin merged 43 commits into alesapin:master from eBay:master
Jan 22, 2022
@alesapin
Owner

No description provided.

greensky00 and others added 30 commits April 7, 2021 22:20
* The destruction order of global static variables is non-deterministic.
* Make auto-forwarding serialized and parallel

* Added a new parameter `auto_forwarding_max_connections_`, and
auto-forwarding can be parallelized up to this number.

* The Raft server now actively manages the RPC clients so as to avoid
races on the same connection.

* [Update PR] Unlock before calling `send()`
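
The idea of capping parallel auto-forwarding and releasing the lock before the send can be sketched roughly as below. This is a minimal illustration, not the actual NuRaft code: `ForwardingPool`, `forward`, `release`, and `in_flight` are hypothetical names, and the `send` callable stands in for whatever the RPC client does.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical pool that caps the number of in-flight forwarding
// connections (the role `auto_forwarding_max_connections_` plays).
class ForwardingPool {
public:
    explicit ForwardingPool(std::size_t max_connections)
        : busy_(max_connections, false) {}

    // Forwards `msg` on a free slot. Returns false if every slot is busy.
    bool forward(const std::string& msg,
                 const std::function<void(std::size_t, const std::string&)>& send) {
        std::size_t slot = busy_.size();
        {
            std::lock_guard<std::mutex> guard(lock_);
            for (std::size_t i = 0; i < busy_.size(); ++i) {
                if (!busy_[i]) {
                    busy_[i] = true;
                    slot = i;
                    break;
                }
            }
        } // The lock is released here, before the potentially slow send.
        if (slot == busy_.size()) return false;
        send(slot, msg);
        return true;
    }

    // Marks a slot as free again once its response has been handled.
    void release(std::size_t slot) {
        std::lock_guard<std::mutex> guard(lock_);
        assert(slot < busy_.size());
        busy_[slot] = false;
    }

    std::size_t in_flight() {
        std::lock_guard<std::mutex> guard(lock_);
        std::size_t n = 0;
        for (bool b : busy_) {
            if (b) ++n;
        }
        return n;
    }

private:
    std::mutex lock_;
    std::vector<bool> busy_; // one flag per connection slot
};
```

Each slot is owned by at most one request at a time, which is one way to avoid two requests racing on the same connection.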
…separate_function (#195)

Split server startup into separate methods

(cherry picked from commit c35819f)
* Public headers should not include internal headers.
* Need to have enough communication before checking commit.
* If we use `unordered_map`, its traversal order is not deterministic
and depends on the platform the test executable is running on. We
should make the test deterministic.
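
One common way to get a deterministic traversal, sketched here under the assumption that the keys are plain integers, is to sort the keys before iterating (switching to `std::map` is another). `sorted_keys` is a hypothetical helper, not something taken from the test code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Returns the keys of an unordered_map in ascending order, so callers can
// iterate in the same order on every platform and standard library.
std::vector<int> sorted_keys(const std::unordered_map<int, std::string>& m) {
    std::vector<int> keys;
    keys.reserve(m.size());
    for (const auto& entry : m) {
        keys.push_back(entry.first);
    }
    std::sort(keys.begin(), keys.end());
    return keys;
}
```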
* Should use lock for list manipulation.

* Added explicit shutdown for things related to auto-forwarding.
* The new API returns the last log index and response time.
* Disable reconnection back-off timer for deterministic tests

* For deterministic tests based on mock network layer (raft_server_test
and failure_test), we should disable back-off timer as it makes test
depend on the timing.

* [Update PR] Fix to testing issue
* This option prevents a server from becoming a leader while
the server's state machine is lagging behind the last committed index.
* Retry snapshot read on the next heartbeat

* If reading snapshot fails, it will be re-attempted upon the next
heartbeat, with the latest (newer) snapshot.

* [Update PR] Log level and missing return
* If the commit thread executes the state machine operation before the
user thread (in the middle of `append_entries`) installs the commit
element, the user's callback function can be missed.

* Commit thread should make sure that if order inversion is detected,
it should let user thread invoke the callback, by generating the
corresponding commit element initialized with the state machine result.
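
The hand-off described above can be sketched as follows. This is only an illustration of the idea under simplifying assumptions (string results, each log index reported once); `CommitTracker`, `install`, and `committed` are hypothetical names rather than the real NuRaft internals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical tracker: whichever thread arrives first creates the
// commit element, and the second one completes the callback.
class CommitTracker {
public:
    using Callback = std::function<void(const std::string&)>;

    // User thread: installs the callback for `log_idx`. If the commit
    // thread already ran (order inversion), the stored result is
    // delivered right here by the user thread.
    void install(std::uint64_t log_idx, Callback cb) {
        std::string result;
        {
            std::lock_guard<std::mutex> guard(lock_);
            auto it = elems_.find(log_idx);
            if (it == elems_.end()) {
                elems_[log_idx].callback = std::move(cb); // normal order
                return;
            }
            result = std::move(it->second.result);
            elems_.erase(it);
        }
        cb(result);
    }

    // Commit thread: reports the state machine result for `log_idx`.
    // If no element exists yet, it creates one initialized with the result.
    void committed(std::uint64_t log_idx, std::string result) {
        Callback cb;
        {
            std::lock_guard<std::mutex> guard(lock_);
            auto it = elems_.find(log_idx);
            if (it == elems_.end()) {
                elems_[log_idx].result = std::move(result); // inversion
                return;
            }
            cb = std::move(it->second.callback);
            elems_.erase(it);
        }
        if (cb) cb(result);
    }

    std::size_t pending() {
        std::lock_guard<std::mutex> guard(lock_);
        return elems_.size();
    }

private:
    struct Element {
        Callback callback;
        std::string result;
    };
    std::mutex lock_;
    std::map<std::uint64_t, Element> elems_;
};
```

Callbacks are invoked outside the lock so that a slow user callback does not block the other thread.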
* Close snapshot context on timeout

* When the snapshot receiver (follower) has not responded for a long
time, the snapshot and its user context should be closed after the
configured timeout.

* The same timeout should be applied to the cases below:
  - When the snapshot receiver is removed from the group.
  - When a new joining server is not responding.

* [Update PR] Refactoring
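
A timeout sweep of this kind might look like the sketch below. It assumes each context records the time of its last activity; `SnapshotContext` and `close_expired` are hypothetical names, and the real code would also release the user context where the comment indicates.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>

using Clock = std::chrono::steady_clock;

// Hypothetical per-peer snapshot context.
struct SnapshotContext {
    Clock::time_point last_activity;
    bool open = true;
};

// Closes every open context that has been idle for at least `timeout`.
// Returns the number of contexts closed by this call.
std::size_t close_expired(std::map<int, SnapshotContext>& ctxs,
                          Clock::time_point now,
                          std::chrono::milliseconds timeout) {
    std::size_t closed = 0;
    for (auto& kv : ctxs) {
        SnapshotContext& ctx = kv.second;
        if (ctx.open && now - ctx.last_activity >= timeout) {
            ctx.open = false; // a real implementation frees the user context here
            ++closed;
        }
    }
    return closed;
}
```

Passing `now` in as a parameter keeps the check testable without sleeping.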
* Asio disconnection event is not immediately fired under code
coverage mode.
* Add snapshot IO manager

* Reading a snapshot object is an expensive operation, but currently
it is executed synchronously in Raft threads, which hurts the overall
latency of the leader.

* Added an experimental option to execute snapshot IO in background
asynchronously. It is managed by the newly introduced global snapshot
IO manager.

* [Update PR] Fix race condition

* While a request in the queue is being processed, a new request for
the same peer shouldn't be enqueued.

* [Update PR] Fix compiler error on Mac

* [Update PR] Update functional test for both sync/async IO
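
The per-peer de-duplication mentioned in the race-condition fix can be sketched as a queue that keeps a peer marked as active until its request has finished. `SnapshotIoQueue`, `push`, `pop`, and `done` are hypothetical names; the actual snapshot IO manager is not shown in this PR text.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <set>

// Hypothetical background snapshot IO queue with one request per peer.
class SnapshotIoQueue {
public:
    // Returns false if a request for `peer_id` is already queued
    // or is currently being processed.
    bool push(int peer_id) {
        std::lock_guard<std::mutex> guard(lock_);
        if (!active_peers_.insert(peer_id).second) return false;
        queue_.push_back(peer_id);
        return true;
    }

    // Worker side: takes the next request. The peer stays marked as
    // active until done() is called.
    bool pop(int& peer_id_out) {
        std::lock_guard<std::mutex> guard(lock_);
        if (queue_.empty()) return false;
        peer_id_out = queue_.front();
        queue_.pop_front();
        return true;
    }

    // Worker side: signals that processing for `peer_id` has finished.
    void done(int peer_id) {
        std::lock_guard<std::mutex> guard(lock_);
        active_peers_.erase(peer_id);
    }

private:
    std::mutex lock_;
    std::deque<int> queue_;
    std::set<int> active_peers_;
};
```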
* Replaying config log shouldn't result in reverting committed
configurations.
* The config received during join request should be durable via the
state manager.

* During the replaying of logs, the newer config should not be
overwritten by the older config.
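
The rule that replay must not move the configuration backwards amounts to a log-index comparison. The sketch below assumes each config carries the index of the log entry that produced it; `ClusterConfig` and `apply_if_newer` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical cluster configuration tagged with its log index.
struct ClusterConfig {
    std::uint64_t log_idx = 0;
    std::string members;
};

// Applies `candidate` only if it is newer than `current`.
// Returns true if the current config was replaced.
bool apply_if_newer(ClusterConfig& current, const ClusterConfig& candidate) {
    if (candidate.log_idx <= current.log_idx) {
        return false; // older or same config seen during replay: keep current
    }
    current = candidate;
    return true;
}
```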
* Add new APIs to pause/resume state machine execution

* The state machine execution is done in background asynchronously.
We need an API to control the execution flow for various purposes.

* [Update PR] Log and test
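
As a rough picture of what a pause/resume control could look like, the sketch below gates a simplified apply loop with an atomic flag. `StateMachineExecutor` and its methods are hypothetical and do not reflect the names of the APIs added by this commit.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical executor whose apply loop can be paused and resumed.
class StateMachineExecutor {
public:
    void pause() { paused_.store(true); }
    void resume() { paused_.store(false); }
    bool is_paused() const { return paused_.load(); }

    // Called by the background commit loop: applies logs up to `target`
    // unless paused. Returns the last applied index.
    std::uint64_t apply_up_to(std::uint64_t target) {
        while (last_applied_ < target && !paused_.load()) {
            ++last_applied_; // a real implementation would call the state machine here
        }
        return last_applied_;
    }

private:
    std::atomic<bool> paused_{false};
    std::uint64_t last_applied_ = 0;
};
```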
* `buffer_serializer` should include `string`.

* Modified README.md to fetch the correct Asio version.
* Add conf condition at the beginning of snapshot_and_compact.

* Update handle_commit.cxx

Co-authored-by: zhangxiao871 <zhangxiao871@ZBMAC-C02DN6312.local>
Co-authored-by: Li, Yong <yoli@ebay.com>
Co-authored-by: Jung-Sang Ahn <jungsang.ahn@gmail.com>
* `get_next_log_idx` may not be accurate if we want to get the last
log successfully sent to the peer.
* If we use matched index, it can be reset to 0 on reconnection so
that the peer can be treated as "stale" incorrectly.
* If this mode is on and all members in the cluster are healthy, the
leader will commit an incoming request only after reaching consensus
with all members. It guarantees that all members have the data at the
moment the log is committed.

* If any unhealthy (not responding) member exists, the regular quorum
based consensus will be used.
* Even if a few unhealthy members exist, if the number of healthy
members is greater than the regular quorum size, we will pursue
full consensus among them.
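
The commit rule described in the two messages above can be sketched as follows. This follows the wording literally ("greater than the regular quorum size") and treats the member list as including the leader; `Peer`, `quorum_commit_index`, and `full_consensus_commit_index` are hypothetical names, and the real implementation may differ in its boundary conditions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical view of a cluster member (the leader included).
struct Peer {
    std::uint64_t matched_idx;
    bool healthy;
};

// Regular rule: the largest index stored on at least `quorum_size` members.
std::uint64_t quorum_commit_index(std::vector<std::uint64_t> idxs,
                                  std::size_t quorum_size) {
    assert(quorum_size >= 1 && quorum_size <= idxs.size());
    std::sort(idxs.begin(), idxs.end(), std::greater<std::uint64_t>());
    return idxs[quorum_size - 1];
}

// Full-consensus rule: if more than `quorum_size` members are healthy,
// commit only what every healthy member has; otherwise fall back to the
// regular quorum rule over all members.
std::uint64_t full_consensus_commit_index(const std::vector<Peer>& members,
                                          std::size_t quorum_size) {
    std::vector<std::uint64_t> all;
    std::vector<std::uint64_t> healthy;
    for (const Peer& p : members) {
        all.push_back(p.matched_idx);
        if (p.healthy) healthy.push_back(p.matched_idx);
    }
    if (healthy.size() > quorum_size) {
        return *std::min_element(healthy.begin(), healthy.end());
    }
    return quorum_commit_index(all, quorum_size);
}
```

With five members and a quorum of three, four healthy members still get full consensus among themselves, while three healthy members fall back to the regular quorum rule.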
* Since `request_vote()` calls `save_state()`, `initiate_vote()`
doesn't need to call `save_state()` right before `request_vote()`.
* User can adjust commit index based on the list of peers' log indexes.
* When a new member is added, its election timer is disabled until it
fully catches up with the leader. Suppose the leader runs into a
problem and a leader election is initiated. In that case, the
new member may refuse the vote request if the candidate's priority is
lower than its own priority. Since the election timer is disabled,
there is no way to decrease the target priority of the new member;
consequently, the leader election will never succeed.

* To avoid such a situation, the new member in catch-up mode should
ignore priority for the vote.
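
Reduced to the priority check alone, the fix could be pictured like this. The function below is a hypothetical simplification: it models only the priority comparison and leaves out the term and log checks that a real vote handler also performs.

```cpp
#include <cassert>

// Hypothetical priority check for a vote request.
// `catching_up` is true while a newly added member has not yet
// caught up with the leader (its election timer is disabled).
bool passes_priority_check(int candidate_priority,
                           int my_target_priority,
                           bool catching_up) {
    if (catching_up) {
        return true; // priority is ignored in catch-up mode
    }
    return candidate_priority >= my_target_priority;
}
```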
* With this API,

  - User can set an expected term. `append_entries` will succeed only
    if the current server's term matches it, which behaves like a
    compare-and-swap operation.

  - User can set a callback function to know the log index and term
    of the log that is just appended.
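
The compare-and-swap behavior and the append callback can be illustrated with a toy in-memory log. Everything here (`AppendParams`, `MiniLog`, `on_appended`) is hypothetical and chosen for the example; it is not the signature of the API this commit adds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical extended parameters for an append request.
struct AppendParams {
    bool has_expected_term = false;
    std::uint64_t expected_term = 0;
    // Invoked with the index and term of the log that was just appended.
    std::function<void(std::uint64_t, std::uint64_t)> on_appended;
};

// Toy single-node log, used only to show the term check.
class MiniLog {
public:
    explicit MiniLog(std::uint64_t term) : term_(term) {}

    void set_term(std::uint64_t term) { term_ = term; }

    // Appends `data` only if the expected term (when given) matches
    // the current term, similar to a compare-and-swap.
    bool append(const std::string& data, const AppendParams& params) {
        if (params.has_expected_term && params.expected_term != term_) {
            return false; // term changed: reject without appending
        }
        entries_.push_back(data);
        if (params.on_appended) {
            params.on_appended(entries_.size(), term_); // 1-based log index
        }
        return true;
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::uint64_t term_;
    std::vector<std::string> entries_;
};
```

A caller that read some state under term 3 can pass that term along, so the append fails instead of silently landing in a later term.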
* If pre-vote is rejected and the server starts receiving heartbeats
again, the pre-vote rejection counter should be reset to 0. But
currently `become_follower` will not be called, because the role of
the server remains `follower`.
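
In other words, the reset cannot rely on a role transition alone. A minimal sketch of the intended behavior, with hypothetical names throughout:

```cpp
#include <cassert>

// Hypothetical holder of the pre-vote rejection counter.
class PreVoteState {
public:
    void on_pre_vote_rejected() { ++rejections_; }

    // Role transition: resets the counter, but is not triggered when
    // the server was already a follower.
    void on_become_follower() { rejections_ = 0; }

    // Heartbeat from a live leader: resets the counter even though the
    // role did not change.
    void on_heartbeat() { rejections_ = 0; }

    int rejections() const { return rejections_; }

private:
    int rejections_ = 0;
};
```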
@alesapin alesapin merged commit 6f26688 into alesapin:master Jan 22, 2022
6 participants