Merge upstream by alesapin · Pull Request #2 · alesapin/NuRaft

alesapin · 2022-01-22T20:35:33Z

No description provided.

* Make auto-forwarding serialized and parallel
* Added a new parameter `auto_forwarding_max_connections_`; auto-forwarding can be parallelized up to this number.
* The Raft server now actively manages the RPC client to avoid races on the same connection.
* [Update PR] Unlock before calling `send()`

* Disable reconnection back-off timer for deterministic tests
* For deterministic tests based on the mock network layer (raft_server_test and failure_test), we should disable the back-off timer, as it makes the tests depend on timing.
* [Update PR] Fix to testing issue

* If the commit thread executes the state machine operation before the user thread (in the middle of `append_entries`) installs the commit element, the user's callback function can be missed.
* If the commit thread detects this order inversion, it should let the user thread invoke the callback by generating the corresponding commit element, initialized with the state machine result.

* Close snapshot context on timeout
* When the snapshot receiver (follower) has not responded for a long time, the snapshot and its user context should be closed after the configured timeout.
* The same timeout should be applied to the cases below:
  - When the snapshot receiver is removed from the group.
  - When a new joining server is not responding.
* [Update PR] Refactoring

* Add snapshot IO manager
* Reading a snapshot object is an expensive operation, but currently it is executed synchronously in Raft threads, which hurts the overall latency of the leader.
* Added an experimental option to execute snapshot IO asynchronously in the background. It is managed by the newly introduced global snapshot IO manager.
* [Update PR] Fix race condition
* While a request in the queue is being processed, a new request for the same peer shouldn't be enqueued.
* [Update PR] Fix compiler error on Mac
* [Update PR] Update functional test for both sync/async IO
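
The race-condition rule above (no second request for a peer while one is queued or being processed) can be sketched as follows. This is a minimal illustration, not NuRaft's snapshot IO manager: `ToySnapshotQueue`, `push`, `pop`, and `done` are hypothetical names, and peers are identified by a plain `int`.

```cpp
#include <deque>
#include <mutex>
#include <set>

// Hypothetical request queue: at most one outstanding request per peer.
class ToySnapshotQueue {
public:
    // Enqueue a request for `peer_id`. Returns false if that peer already
    // has a request that is queued or still being processed.
    bool push(int peer_id) {
        std::lock_guard<std::mutex> guard(lock_);
        if (!active_.insert(peer_id).second) {
            return false;
        }
        queue_.push_back(peer_id);
        return true;
    }

    // Take the next request for processing. The peer stays "active"
    // until done() is called, so push() keeps rejecting it meanwhile.
    bool pop(int& peer_id) {
        std::lock_guard<std::mutex> guard(lock_);
        if (queue_.empty()) {
            return false;
        }
        peer_id = queue_.front();
        queue_.pop_front();
        return true;
    }

    // Mark processing for `peer_id` as finished; new requests are allowed again.
    void done(int peer_id) {
        std::lock_guard<std::mutex> guard(lock_);
        active_.erase(peer_id);
    }

private:
    std::mutex lock_;
    std::deque<int> queue_;
    std::set<int> active_;  // peers with a queued or in-flight request
};
```

Tracking the peer in a separate set, rather than scanning the queue, is what covers the window between `pop()` and `done()`, when the request is no longer in the queue but is still being processed.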

* If this mode is on and all members in the cluster are healthy, the leader will commit an incoming request only after getting consensus from all members. This guarantees that all members have the data at the moment the log is committed.
* If any unhealthy (not responding) member exists, the regular quorum-based consensus will be used.

* When a new member is added, its election timer is disabled until it fully catches up with the leader. Suppose the leader then runs into a problem and a leader election is initiated. In that case, the new member may refuse the vote request if the candidate's priority is lower than its own target priority. Since the election timer is disabled, there is no way to decrease the target priority of the new member; consequently, the leader election will never succeed.
* To avoid such a situation, the new member in catch-up mode should ignore priority for the vote.
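
The priority rule described above might look roughly like this. It is a sketch under assumptions: `Voter`, `VoteRequest`, and `passes_priority_check` are hypothetical names, and only the priority part of the vote decision is modeled (a real vote also depends on term and log checks).

```cpp
// Hypothetical view of the voter's state relevant to the priority check.
struct Voter {
    int target_priority;  // lowest candidate priority this voter accepts
    bool catching_up;     // true while a newly added member is still syncing
};

// Hypothetical vote request carrying only the candidate's priority.
struct VoteRequest {
    int candidate_priority;
};

// Returns true if the request passes the priority part of the vote check.
bool passes_priority_check(const Voter& voter, const VoteRequest& req) {
    if (voter.catching_up) {
        // The election timer is disabled in catch-up mode, so the target
        // priority can never decay. Ignore priority to avoid a deadlock.
        return true;
    }
    return req.candidate_priority >= voter.target_priority;
}
```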

* With this API,
  - User can set an expected term. `append_entries` will succeed only if the current server's term matches, behaving similarly to a compare-and-swap operation.
  - User can set a callback function to know the log index and term of the log that is just appended.
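
As an illustration of the compare-and-swap behavior described above, here is a toy model. It does not use NuRaft's actual types or signatures: `ToyServer`, `append_if_term`, and `AppendedCallback` are hypothetical, and log indexes are assumed to start at 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy single-node model; replication is not modeled at all.
class ToyServer {
public:
    using AppendedCallback =
        std::function<void(std::uint64_t log_idx, std::uint64_t log_term)>;

    explicit ToyServer(std::uint64_t term) : term_(term) {}

    void set_term(std::uint64_t term) { term_ = term; }
    std::size_t size() const { return log_.size(); }

    // Append only if the caller's expected term equals the current term.
    // On success, report the index and term of the entry just appended.
    bool append_if_term(std::uint64_t expected_term,
                        std::string payload,
                        const AppendedCallback& on_appended) {
        if (expected_term != term_) {
            return false;  // term moved on: reject, like a failed CAS
        }
        log_.emplace_back(term_, std::move(payload));
        if (on_appended) {
            on_appended(log_.size(), term_);
        }
        return true;
    }

private:
    std::uint64_t term_;
    std::vector<std::pair<std::uint64_t, std::string>> log_;
};
```

A caller that read some state under term `T` can pass `T` as the expected term, so the write is rejected if leadership changed in between.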

greensky00 and others added 30 commits April 7, 2021 22:20

Get rid of auto_destoyer (#193)

bc2c5df

* The destruction order of global static variables is non-deterministic.

Merge pull request #16 from ClickHouse-Extras/move_server_startup_to_…

8a19e7d

…separate_function (#195) Split server startup into separate methods (cherry picked from commit c35819f)

Get rid of event_awaiter.h from raft_server.hxx (#197)

4a33312

* We should not include an internal header in a public one.

Fix testing issue (#198)

8699f3b

* Need to have enough communication before checking commit.

Make fake network transfer order deterministic (#202)

cf68d33

* If we use `unordered_map`, its traversal order is not deterministic and depends on the platform the test executable is running on. We should make the test deterministic.
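
A small sketch of the general idea, assuming the fix amounts to iterating an ordered container: `std::map` visits keys in ascending order on every platform, whereas `std::unordered_map` makes no ordering guarantee. `PendingMessages` and `delivery_order` are hypothetical names, not taken from the NuRaft test code.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a mock network: pending messages per endpoint.
using PendingMessages = std::map<std::string, std::vector<std::string>>;

// Returns the endpoints in the order a delivery loop would visit them.
// With std::map this is ascending key order, regardless of insertion order
// or of the standard library implementation in use.
std::vector<std::string> delivery_order(const PendingMessages& pending) {
    std::vector<std::string> order;
    order.reserve(pending.size());
    for (const auto& entry : pending) {
        order.push_back(entry.first);
    }
    return order;
}
```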

Fix to intermittent test failure of auto forwarding (#203)

4680429

* Should use lock for list manipulation.
* Added explicit shutdown for things related to auto-forwarding.

Add APIs to get peer info (#204)

f004f4c

* The new API returns the last log index and response time.

Add an option for grace period of lagging state machine (#208)

b1f7c07

* This option prevents a server from becoming a leader while its state machine is lagging behind the last committed index.

Exit if ctrl-d received (#211)

42b41b1

Retry snapshot read on the next heartbeat (#213)

256082e

* Retry snapshot read on the next heartbeat
* If reading snapshot fails, it will be re-attempted upon the next heartbeat, with the latest (newer) snapshot.
* [Update PR] Log level and missing return

Fix the testing issue under code coverage mode (#218)

ca8f903

* Asio disconnection event is not immediately fired under code coverage mode.

Address thread sanitizer warnings (#219)

8715f27

fix segfault when logger is set to nullptr (#220)

6f23733

Fix to compiler error on Mac (#221)

714db11

Fix to peer's next_log_idx_ overflow (#227)

7207197

Correct ambiguous comment of rollback (#230)

61e4fe3

Add test for replaying config log (#231)

b76cddc

* Replaying config log shouldn't result in reverting committed configurations.

Fix to bug on the restart of joining server (#232)

1c63694

* The config received during join request should be durable via the state manager.
* During the replaying of logs, the newer config should not be overwritten by the older config.

Add new APIs to pause/resume state machine execution (#234)

8b97ca7

* Add new APIs to pause/resume state machine execution
* The state machine execution is done asynchronously in the background. We need an API to control the execution flow for various purposes.
* [Update PR] Log and test

Improve test cases for snapshot and state machine pause (#235)

3fd5d65

Improve snapshot test cases (#236)

9c9cfc0

Temporarily suppress test failure for experimental option (#237)

29c28d1

Fix to Windows build (#240)

0ffb76b

* `buffer_serializer` should include `string`.
* Modified README.md to fetch the correct Asio version.

Add conf condition at the beginning of snapshot_and_compact. (#244)

10aa8a3

* Add conf condition at the beginning of snapshot_and_compact.
* Update handle_commit.cxx

Co-authored-by: zhangxiao871 <zhangxiao871@ZBMAC-C02DN6312.local>

Add lifecycle callback functions for worker threads. (#245)

7aa8eb1

Co-authored-by: Li, Yong <yoli@ebay.com>
Co-authored-by: Jung-Sang Ahn <jungsang.ahn@gmail.com>

greensky00 added 13 commits August 4, 2021 23:23

Fix to bug in get_peer_info (#246)

9584558

* `get_next_log_idx` may not be accurate if we want to get the last log successfully sent to the peer.

Use last accepted index for staleness checking (#249)

2ae1c50

* If we use the matched index, it can be reset to 0 on reconnection, so the peer can incorrectly be treated as "stale".

Revise full consensus mode (#250)

9553f3e

* Even if a few unhealthy members exist, if the number of healthy members is greater than the regular quorum size, we will pursue full consensus among them.
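
One way to read this rule is as a function giving the number of acknowledgements the leader waits for. The sketch below is an interpretation of the commit message, not code from NuRaft: `required_acks` is a hypothetical name, both counts are assumed to include the leader itself, and the regular quorum is assumed to be a simple majority.

```cpp
#include <cstddef>

// Number of acknowledgements needed before committing, given the cluster
// size and the number of members currently considered healthy.
std::size_t required_acks(std::size_t total_members,
                          std::size_t healthy_members) {
    const std::size_t regular_quorum = total_members / 2 + 1;
    if (healthy_members > regular_quorum) {
        // Enough healthy members: pursue full consensus among them.
        return healthy_members;
    }
    // Otherwise fall back to the regular quorum-based consensus.
    return regular_quorum;
}
```

For a five-member cluster this gives 5 with all members healthy, 4 with one unhealthy member, and the regular quorum of 3 once only three or fewer members are healthy.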

Fix typo (#251)

2c7bdab

Fix to TSAN alerts (#261)

4aa8728

Fix to duplicate save_state call on a vote (#262)

b46b638

* Since `request_vote()` calls `save_state()`, `initiate_vote()` doesn't need to call `save_state()` right before `request_vote()`.

Add a new callback for custom quorum condition (#263)

789cc75

* User can adjust commit index based on the list of peers' log indexes.

Add a test case for doing write on non-leader server (#272)

28952ce

Print Asio error message along with error code (#276)

66bcf56

Change role to candidate on initiating pre-vote (#277)

af7ccba

* If pre-vote is rejected and the server starts receiving heartbeats again, the pre-vote rejection counter should be reset to 0. But currently `become_follower` will not be called, because the role of the server remains `follower`.

alesapin merged commit 6f26688 into alesapin:master Jan 22, 2022

alesapin merged 43 commits into alesapin:master from eBay:master
