A transaction running around 4.5s after cluster creation (think tests) can easily catch a `TransactionAbortedError`. This is because all ranges start out with expiration-based leases (likely a path dependency / accident: they all come from the first range, and splits maintain the original expiration-based lease on both sides). 4.5s later, these leases become eligible for a refresh. The new leases are generally epoch-based, so a new lease is not `Equivalent()` to the old one and gets a new `Sequence`, which in turn causes the new lease acquisition to trigger this code which resets the timestamp cache.
After that ts cache wipe, a concurrent `BeginTxn` can fail its tscache check, resulting in `TransactionAbortedError(ABORT_REASON_TIMESTAMP_CACHE_REJECTED_POSSIBLE_REPLAY)`.
I believe we've seen this be a cause of flakiness for multiple tests.
Discussing with @bdarnell, it seems that we have a couple of options:
- if the rhs of a split wants epoch-based leases, have it not inherit the expiration-based lease from the lhs. Exactly how to do that is yet unclear. Can a range not have a lease at all? Perhaps we can give the rhs an expired lease.
- make `Lease.Equivalent()` understand this transition from exp to epo, and have it consider the two equivalent.
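
A rough sketch of what the second option could look like, using simplified stand-in types rather than the real lease proto. Everything here (`toyLease`, `strictEquivalent`, `relaxedEquivalent`, `nextSequence`) is hypothetical and only illustrates the idea that an expiration-to-epoch upgrade by the same holder would keep the sequence number, and so would not trigger the timestamp cache reset.

```go
package main

import "fmt"

type leaseType int

const (
	expirationBased leaseType = iota
	epochBased
)

// toyLease is a simplified, hypothetical stand-in for a range lease.
type toyLease struct {
	holder   int // store ID of the leaseholder
	typ      leaseType
	sequence int
}

// strictEquivalent models the behavior described in this issue: leases of
// different types are never equivalent, even for the same holder.
func strictEquivalent(prev, next toyLease) bool {
	return prev.holder == next.holder && prev.typ == next.typ
}

// relaxedEquivalent models the proposal: additionally treat an
// expiration-based -> epoch-based transition by the same holder as equivalent.
func relaxedEquivalent(prev, next toyLease) bool {
	if prev.holder != next.holder {
		return false
	}
	return prev.typ == next.typ ||
		(prev.typ == expirationBased && next.typ == epochBased)
}

// nextSequence keeps the sequence for equivalent leases and bumps it
// otherwise. A bumped sequence is what leads to the ts cache reset.
func nextSequence(prev, next toyLease, equiv func(a, b toyLease) bool) int {
	if equiv(prev, next) {
		return prev.sequence
	}
	return prev.sequence + 1
}

func main() {
	prev := toyLease{holder: 1, typ: expirationBased, sequence: 1}
	next := toyLease{holder: 1, typ: epochBased}

	fmt.Println("strict: ", nextSequence(prev, next, strictEquivalent))
	fmt.Println("relaxed:", nextSequence(prev, next, relaxedEquivalent))
}
```

Note that the relaxed check is deliberately one-directional in this sketch; whether the reverse transition should also count as equivalent is an open question, not something settled here.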
cc @tbg @benesch