-
Notifications
You must be signed in to change notification settings - Fork 962
Description
I have found a sequence of events that could lead to repeated failure of a ledger recovery process that would lead to data unavailability. I have verified this with both my TLA+ specification (https://github.com/Vanlightly/bookkeeper-tlaplus) of the replication protocol and new unit tests.
BUG REPORT
When a second writer performs recovery, it can end up trying to create an invalid ensemble which will cause recovery to fail.
The following example involves no timeouts on LAC reads, but an empty ensemble returning -1 for LAC reads. However, the same result would follow if LAC reads timed out, causing the value of -1 to be used as the default.
participant w1
participant w2
participant b1
participant b2
participant b3
participant b4
participant b5
participant b6
participant zk
note over w1,zk:Ensembles: 0:{b1,b2}, 1000:{b2,b3}
w1->b1: e1999
w1->b2: e1999
b1->w1: ack e1999
b2->w1: ack e1999
w1--xb1: e2000
w1--xb2: e2000
w1->zk:Add ensemble 2000:{b4,b5}
w2->zk:Start recovery
w2->b4: Read LAC
w2->b5: Read LAC
b4->w2: LAC -1
b5->w2: LAC -1
w2->b1: Read e0
w2->b2: Read e0
b1->w2:e0
b2->w2:e0
w2->b4:e0
w2--xb5:e0
b4->w2:ack e0
w2->zk:Add ensemble 0:{b4,b6}
note over w2,zk:INVALID ENSEMBLE
Any ledger recovery is vulnerable to this situation if the ledger has at least 1 existing ensemble, a default of -1 is used for the LAC and an error occurs during the write-back phase causing a new ensemble to be created (from entry 0).
THE FIX
The fix is to not use -1 (or 0 in the TLA+ spec) as the minimum but to take the first entry id of the current ensemble - 1 as the minimum. This ensures we only try to recover the current ensemble. Previous ensembles, if any, have already been committed and so recovery reads/writes of those ensembles is unnecessary. If a failure occurs during recovery of the current ensemble, then updating that ensemble is a legal operation.
The TLA+ specification uses this new minimum value and so the spec will not currently reach this illegal state.
I will shortly submit a PR with the code fix and new unit tests.