Skip to content

raft: Make flow control more aggressive#9985

Merged
xiang90 merged 1 commit intoetcd-io:masterfrom
bdarnell:flow-control
Aug 8, 2018
Merged

raft: Make flow control more aggressive#9985
xiang90 merged 1 commit intoetcd-io:masterfrom
bdarnell:flow-control

Conversation

@bdarnell
Copy link
Copy Markdown
Contributor

@bdarnell bdarnell commented Aug 7, 2018

We allow multiple in-flight append messages, but prior to this change
the only way we'd ever send them is if there is a steady stream of new
proposals. Catching up a follower that is far behind would be
unnecessarily slow (this is exacerbated by a quirk of CockroachDB's
use of raft which limits our ability to catch up via snapshot in some
cases).

See cockroachdb/cockroach#27983

Please read https://github.com/coreos/etcd/blob/master/CONTRIBUTING.md#contribution-flow.

@codecov-io
Copy link
Copy Markdown

Codecov Report

Merging #9985 into master will decrease coverage by 0.22%.
The diff coverage is 92.3%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9985      +/-   ##
==========================================
- Coverage   69.34%   69.11%   -0.23%     
==========================================
  Files         386      386              
  Lines       35914    35918       +4     
==========================================
- Hits        24905    24826      -79     
- Misses       9212     9297      +85     
+ Partials     1797     1795       -2
Impacted Files Coverage Δ
raft/raft.go 92.01% <92.3%> (+0.73%) ⬆️
client/client.go 58.82% <0%> (-25.82%) ⬇️
pkg/transport/timeout_conn.go 80% <0%> (-20%) ⬇️
proxy/grpcproxy/register.go 69.44% <0%> (-11.12%) ⬇️
clientv3/namespace/watch.go 72.72% <0%> (-6.07%) ⬇️
pkg/logutil/zap_grpc.go 47.61% <0%> (-4.77%) ⬇️
pkg/transport/listener.go 58.67% <0%> (-4.09%) ⬇️
lease/leasehttp/http.go 58.08% <0%> (-1.48%) ⬇️
etcdctl/ctlv3/command/printer_simple.go 72.48% <0%> (-1.35%) ⬇️
etcdserver/raft.go 80.09% <0%> (-0.72%) ⬇️
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 93be31d...a5ea496. Read the comment docs.

@xiang90 xiang90 self-assigned this Aug 7, 2018
raft/raft.go Outdated
// argument controls whether messages with no entries will be sent
// ("empty" messages are useful to convey updated Commit indexes, but
// are undesirable when we're sending multiple messages in a batch).
func (r *raft) sendAppend(to uint64, sendIfEmpty bool) bool {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we keep the old sendAppend? and add a new func maybeSendAppend (or sendAppendIfNotEmpty)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@xiang90
Copy link
Copy Markdown
Contributor

xiang90 commented Aug 7, 2018

@bdarnell Have you tried this in CockroachDB setup?

@xiang90
Copy link
Copy Markdown
Contributor

xiang90 commented Aug 7, 2018

LGTM after fixing the nit.

We allow multiple in-flight append messages, but prior to this change
the only way we'd ever send them is if there is a steady stream of new
proposals. Catching up a follower that is far behind would be
unnecessarily slow (this is exacerbated by a quirk of CockroachDB's
use of raft which limits our ability to catch up via snapshot in some
cases).

See cockroachdb/cockroach#27983
@bdarnell
Copy link
Copy Markdown
Contributor Author

bdarnell commented Aug 8, 2018

Yes, we've tested it. We had a node that had accumulated a multi-gigabyte raft log. Without this change it was going to take weeks to catch up, and with this it caught up in hours.

@xiang90 xiang90 merged commit 6002bf3 into etcd-io:master Aug 8, 2018
bdarnell added a commit to cockroachdb/vendored that referenced this pull request Aug 13, 2018
Picks up etcd-io/etcd#9982 and etcd-io/etcd#9985 (and no other changes
to packages we use).
bdarnell added a commit to bdarnell/cockroach that referenced this pull request Aug 13, 2018
Picks up etcd-io/etcd#9982 and etcd-io/etcd#9985 (and no other changes
to packages we use).

Fixes cockroachdb#27983
Fixes cockroachdb#27804

Release note (bug fix): Additional fixes for out-of-memory errors
caused by very large raft logs.

Release note (performance improvement): Greatly improved performance
when catching up followers that are behind when raft logs are large.
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Aug 13, 2018
28511: vendor: Update etcd r=tschottdorf a=bdarnell

Picks up etcd-io/etcd#9982 and etcd-io/etcd#9985 (and no other changes
to packages we use).

Fixes #27983
Fixes #27804

Release note (bug fix): Additional fixes for out-of-memory errors
caused by very large raft logs.

Release note (performance improvement): Greatly improved performance
when catching up followers that are behind when raft logs are large.

Co-authored-by: Ben Darnell <ben@bendarnell.com>
@siddontang
Copy link
Copy Markdown
Contributor

Yes, we've tested it. We had a node that had accumulated a multi-gigabyte raft log. Without this change it was going to take weeks to catch up, and with this it caught up in hours.

Hi @benbjohnson

Why don't you use snapshot here? seem your Range size is 64 MB, the total log size has exceeded this threshold too much.

BusyJay added a commit to BusyJay/raft-rs that referenced this pull request Feb 27, 2020
Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
BusyJay added a commit to tikv/raft-rs that referenced this pull request Mar 3, 2020
The patch is to speed up log replication when a node is way
behind than leader and logs are not compacted yet.

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
BusyJay added a commit to BusyJay/raft-rs that referenced this pull request Mar 26, 2020
The patch is to speed up log replication when a node is way behind than
leader and logs are not compacted yet.

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
gengliqi pushed a commit to tikv/raft-rs that referenced this pull request Mar 27, 2020
The patch is to speed up log replication when a node is way behind than
leader and logs are not compacted yet.

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
shbieng pushed a commit to shbieng/raft-rs that referenced this pull request Sep 19, 2022
The patch is to speed up log replication when a node is way behind than
leader and logs are not compacted yet.

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants