kvserver: fix 'observed raft log position' assertion#107412
kvserver: fix 'observed raft log position' assertion#107412craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
Fixes cockroachdb#107336. Fixes cockroachdb#106123. Fixes cockroachdb#107156. Fixes cockroachdb#106589. It's possible to hit this assertion under --stress --race when the proposing replica is starved enough for raft ticks that it loses leadership right when it steps proposals through raft. We're relying on undocumented API semantics in the etcd raft library whereby it mutates stepped entries with the term+index its to end up in. But that's only applicable if stepping through entries as a leader. Simply relax this assertion instead. Release note: None
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
sumeerbhola
left a comment
There was a problem hiding this comment.
can you also get a quick look from someone in repl-team? I have limited understanding of etcd/raft implementation details.
Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @irfansharif)
|
I think this is fine, but I'll have a closer look tomorrow. |
There was a problem hiding this comment.
This seems OK at a glance.
This means that AC is effectively disabled when the leader and leaseholder isn't colocated. That should be rare enough that we can disregard it, at least initially.
How do we handle leader changes and such here? Let's say the leader proposes 1000 entries, and deducts tokens for them, but is unable to commit the entries. A new leader is elected which replaces the old leader's uncommitted tail. Will we return all of the tokens in that case? Otherwise, if the old leader reacquires leadership, can it deadlock because it will never commit the log entries AC is waiting for? Conversely, if we reset them, will flapping leadership effectively reset AC? We expect to see such flapping precisely under overload, if leaders are unable to heartbeat in time.
|
Thanks for looking! bors r+
Yes. We’ll err on the side of over-admission to avoid token leaks/write stalls.
Yes, for IO/flow tokens, not for CPU slots. I suspect CPU overload comes into play more so when we’re unable to tick raft groups in time. Perhaps also if IO latencies are severely degraded (>10ms p99s under IOPS/bandwidth saturation) but AC doesn’t monitor IO latencies directly today. |
|
Build succeeded: |
Fixes #107336.
Fixes #106123.
Fixes #107156.
Fixes #106589.
It's possible to hit this assertion under --stress --race when the proposing replica is starved enough for raft ticks that it loses leadership right when it steps proposals through raft. We're relying on undocumented API semantics in the etcd raft library whereby it mutates stepped entries with the term+index its to end up in. But that's only applicable if stepping through entries as a leader. Simply relax this assertion instead.
Release note: None