Skip to content

kv: DeleteRange anomaly at SNAPSHOT #6240

@tbg

Description

@tbg

TestTxnDBLostUpdateAnomaly_DeleteRange currently fails with variations of

iso=[SNAPSHOT SNAPSHOT SNAPSHOT], pri=[3 2 1]
I1.1(A)[1] C1.1 DR3.1(A-C) R2.1(A)[1] C3.1 W2.1(B-A)[1] C2.1

and

iso=[SNAPSHOT SNAPSHOT SNAPSHOT], pri=[2 3 1]
I1.1(A)[1] C1.1 R2.1(A)[1] DR3.1(A-C) W2.1(B-A)[1] C3.1 C2.1

In the first

  • the value A=1 is written and committed
  • transaction 3 deletes A-C (which means leaving an intent on A and
    populating the write timestamp cache interval A-C.
  • transaction 2 pushes the intent on A and reads A=1.
    The write timestamp cache entry on A isn't updated properly in the process
    (which is an unrelated issue that likely incurs anomalies).
  • transaction 2 writes B=1, commits.
    This is the anomaly: logically, there is DeleteRange intent sitting there
    with a higher timestamp, so this should be a write-write conflict. But there
    isn't an actual intent there (since the key was previously vacant, just like
    the countably infinite rest of vacant keys around it).
  • transaction 3 commits (at its higher timestamp), so now A=,
    but note that we still have B=1 (written by transaction 2).

Note that this example should fail in the same way with only transaction 3 at
SNAPSHOT isolation.

The one obvious fix that comes to mind is to "somehow" push the whole range
entry in the timestamp cache forward along with the push of one of its intents.
However, that seems brittle and may not reliably be possible since the
timestamp cache's current implementation freely splits up its intervals as new
entries are added.

In the second one, the read of A and the range deletion are interchanged:

iso=[SNAPSHOT SNAPSHOT SNAPSHOT], pri=[2 3 1]
I1.1(A)[1] C1.1 R2.1(A)[1] DR3.1(A-C) W2.1(B-A)[1] C3.1 C2.1

In other words:

  • write and commit A=1 as before
  • txn2 reads A=1
  • txn3 deletes A-C (so ts(txn3) > ts(txn2))
  • txn2 writes B=1 (pushing its timestamp due to tsCache entry on A-C); now
    again ts(txn3) < ts(txn2)
  • both commit and the visible values are A=, B=1; we've again lost txn3's
    deletion of B.

This seems a little less anomalous than the above since essentially what
happens is that txn2 has write skew: It reads a stale value of A and writes
it somewhere, except that this "somewhere" would be a write-write conflict in
most monolithic databases.

See #6227 for the PR that introduced these tests.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions