kvserver: fix closed timestamp regression on clock jump#75298
kvserver: fix closed timestamp regression on clock jump#75298andreimatei wants to merge 1 commit intocockroachdb:masterfrom
Conversation
Before this patch, the following scenario was possible: - a node is stopped - the clock jumps backwards - the node is restarted - a replica from the node proposes a command with a closed timestamp lower than timestamps closed before the restart. This causes an assertion to fire on application. The problem is that, after the restart, the propBuf, which is in charge of assigning closed timestamps to proposals, doesn't have info on what had been closed prior to the restart. The propBuf maintains the b.assignedClosedTimestamp field, which is supposed to be in advance of r.mu.state.RaftClosedTimestamp on leaseholder replicas, but nobody initializes that field on restart. For ranges with epoch-based leases, I believe we don't have a problem because the range will need to acquire a new lease after restart before proposing any commands - and lease acquisitions initialize b.assignedClosedTimestamp. But for expiration-based leases, I believe a lease from before the restart might be used(*). (*) This is probably a bad thing which we should discuss separately. I think we don't want leases to be used after a restart for multiple reasons and we have the r.minLeaseProposedTS[1] guard that supposed to protect against using them (in addition to the epoch protection we have for epoch-based leases, I think). But I believe this protection can be elided by a backwards clock jump - we refuse to use leases acquired *before* minLeaseProposedTS, and minLPTS is assigned to time.Now() on start; if the clock went backwards, the leases will appear to be good. [1] https://github.com/cockroachdb/cockroach/blob/6664d0c34df0fea61de4fff1e97987b7de609b9e/pkg/kv/kvserver/replica.go#L468 Release note: A bug causing nodes to repeatedly crash with a "raft closed timestamp regression" was fixed. The bug occurred when the node in question had been stopped and the machine clock jumped backwards before the node was restarted. Fixes cockroachdb#70894
|
There some things here I'd like to discuss:
|
|
Thanks for looking into this! I'm curious, can you explain the failed state assertion before this patch? I agree that fundamentally the problem is not the missing initialization, but the fact that we allow using the expiration-based lease that is there. Couldn't we - sort of - prevent that if we forwarded the Giant clock jumps put CRDB into unsupported territory anyway, so I don't think we have to do something that "always works". Curious if you like the "max(ProposedTS).Next()` idea which would be closer, in spirit, to the "correct" solution. |
Before this patch, the following scenario was possible:
lower than timestamps closed before the restart. This causes an
assertion to fire on application.
The problem is that, after the restart, the propBuf, which is in charge
of assigning closed timestamps to proposals, doesn't have info on what
had been closed prior to the restart. The propBuf maintains the
b.assignedClosedTimestamp field, which is supposed to be in advance of
r.mu.state.RaftClosedTimestamp on leaseholder replicas, but nobody
initializes that field on restart. For ranges with epoch-based leases, I
believe we don't have a problem because the range will need to acquire a
new lease after restart before proposing any commands - and lease
acquisitions initialize b.assignedClosedTimestamp. But for
expiration-based leases, I believe a lease from before the restart
might be used(*).
(*) This is probably a bad thing which we should discuss separately. I
think we don't want leases to be used after a restart for multiple
reasons and we have the r.minLeaseProposedTS[1] guard that supposed to
protect against using them (in addition to the epoch protection we have
for epoch-based leases, I think). But I believe this protection can be
elided by a backwards clock jump - we refuse to use leases acquired
before minLeaseProposedTS, and minLPTS is assigned to time.Now() on
start; if the clock went backwards, the leases will appear to be good.
[1]
cockroach/pkg/kv/kvserver/replica.go
Line 468 in 6664d0c
Release note: A bug causing nodes to repeatedly crash with a "raft
closed timestamp regression" was fixed. The bug occurred when the node
in question had been stopped and the machine clock jumped backwards
before the node was restarted.
Fixes #70894