Skip to content

kv: document DeleteRange (SI) and Delete (SSI) anomalies#6227

Merged
tbg merged 5 commits intocockroachdb:masterfrom
tbg:delete-anomaly-snapshot
Apr 22, 2016
Merged

kv: document DeleteRange (SI) and Delete (SSI) anomalies#6227
tbg merged 5 commits intocockroachdb:masterfrom
tbg:delete-anomaly-snapshot

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented Apr 22, 2016

DeleteRange

The newly added test (which was pruned to make its runtime palatable) currently
returns ~22 failing histories at SNAPSHOT isolation owing to two anomalies.
This was found looking into #6124, but is not technically related to it (but see the next section).
The first (and more severe) anomaly:

iso=[SNAPSHOT SNAPSHOT SNAPSHOT], pri=[3 2 1]
I1.1(A)[1] C1.1 DR3.1(A-C) R2.1(A)[1] C3.1 W2.1(B-A)[1] C2.1

In other words

  • the value A=1 is written and committed
  • transaction 3 deletes A-C (which means leaving an intent on A and
    populating the write timestamp cache interval A-C.
  • transaction 2 pushes the intent on A and reads A=1.
    The write timestamp cache entry on A isn't updated properly in the process
    (which is an unrelated issue that likely incurs anomalies).
  • transaction 2 writes B=1, commits.
    This is the anomaly: logically, there is DeleteRange intent sitting there
    with a higher timestamp, so this should be a write-write conflict. But there
    isn't an actual intent there (since the key was previously vacant, just like
    the countably infinite rest of vacant keys around it).
  • transaction 3 commits (at its higher timestamp), so now A=,
    but note that we still have B=1 (written by transaction 2).

Note that this example should fail in the same way with only transaction 3 at
SNAPSHOT isolation.

The one obvious fix that comes to mind is to "somehow" push the whole range
entry in the timestamp cache forward along with the push of one of its intents.
However, that seems brittle and may not reliably be possible since the
timestamp cache's current implementation freely splits up its intervals as new
entries are added.

Then there's also this failing history (the key point is essentially that the
read of A and the range deletion are interchanged):

iso=[SNAPSHOT SNAPSHOT SNAPSHOT], pri=[2 3 1]
I1.1(A)[1] C1.1 R2.1(A)[1] DR3.1(A-C) W2.1(B-A)[1] C3.1 C2.1

In other words:

  • write and commit A=1 as before
  • txn2 reads A=1
  • txn3 deletes A-C (so ts(txn3) > ts(txn2))
  • txn2 writes B=1 (pushing its timestamp due to tsCache entry on A-C); now
    again ts(txn3) < ts(txn2)
  • both commit and the visible values are A=, B=1; we've again lost txn3's
    deletion of B.

This seems a little less anomalous than the above since essentially what
happens is that txn2 has write skew: It reads a stale value of A and writes
it somewhere, except that this "somewhere" would be a write-write conflict in
most monolithic databases.

Delete

As pointed out in #6124, Delete currently neither populates the timestamp
cache nor leaves a tombstone when there is no previous value.

Consider the history I1(A) C1 R2(A) D3(A) D3(B) W2(B-A) C2 C3

  • A=1 is written and committed
  • txn2 reads A=1
  • txn3 deletes A and B (the latter of which is vacant, i.e. has no old version)
    There is no conflict since ts(txn3) > ts(txn2).
  • txn2 writes B=1 and both txns commit.
    This is illegal since txn2 wrote into the past of txn3s deletion of B.

A tombstone on B would have fixed this (txn2 would have caught write-too-old),
or an entry in the write timestamp cache.

Writing the tombstone is likely the more economical option.

See #6124 for discovery and discussion of this anomaly.


This change is Reviewable

@tbg tbg force-pushed the delete-anomaly-snapshot branch 4 times, most recently from 35b91c4 to a806e2c Compare April 22, 2016 09:00
@tbg tbg changed the title kv: document DeleteRange anomaly at SNAPSHOT kv: document DeleteRange (SI) and Delete (SSI) anomalies Apr 22, 2016
@vivekmenezes
Copy link
Copy Markdown
Contributor

In the second failing history is ts2 still reading A=1?


Review status: 0 of 2 files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@vivekmenezes
Copy link
Copy Markdown
Contributor

I've to step out for an hour and a half. I'll review it when I get back.

@bdarnell
Copy link
Copy Markdown
Contributor

LGTM.

For the DeleteRange anomaly, what if we marked intents laid down by commands that also update the timestamp cache (just DeleteRange?) as unpushable, so the intent would never move to a timestamp inconsistent with the cache? I guess that would be equivalent to making all transactions containing DeleteRange SSI instead of SI. That seems easier than trying to update the timestamp cache when a transaction is pushed.

@vivekmenezes
Copy link
Copy Markdown
Contributor

Review status: 0 of 2 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


kv/txn_correctness_test.go, line 471 [r5] (raw file):
can you document this in the comment above?


kv/txn_correctness_test.go, line 696 [r5] (raw file):
delete key?


kv/txn_correctness_test.go, line 783 [r5] (raw file):
issue #6277 doesn't exist

You can make this my TODO since I'm fixing the delete case in #6124


kv/txn_correctness_test.go, line 796 [r5] (raw file):
update to the correct issue


Comments from Reviewable

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Apr 22, 2016

@vivekmenezes yes, txn2 always reads A=1 in the failing cases.


Review status: 0 of 2 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


kv/txn_correctness_test.go, line 471 [r5] (raw file):
Done.


kv/txn_correctness_test.go, line 696 [r5] (raw file):
Done.


kv/txn_correctness_test.go, line 783 [r5] (raw file):
Done.


kv/txn_correctness_test.go, line 796 [r5] (raw file):
Done.


Comments from Reviewable

tbg added 5 commits April 22, 2016 12:39
The newly added test (which was pruned to make its runtime palatable) currently
returns ~22 failing histories at SNAPSHOT isolation owing to two anomalies.
This was found looking into cockroachdb#6124, but is not technically related to it (the
issue there is for `Delete` and an anomaly exists but isn't exercised here).

The first (and more severe) anomaly:

```
iso=[SNAPSHOT SNAPSHOT SNAPSHOT], pri=[3 2 1]
I1.1(A)[1] C1.1 DR3.1(A-C) R2.1(A)[1] C3.1 W2.1(B-A)[1] C2.1
```

In other words
* the value A=1 is written and committed
* transaction 3 deletes A-C (which means leaving an intent on A and
  populating the write timestamp cache interval A-C.
* transaction 2 pushes the intent on A and reads A=1.
  The write timestamp cache entry on A isn't updated properly in the process
  (which is an unrelated issue that likely incurs anomalies).
* transaction 2 writes B=1, commits.
  This is the anomaly: logically, there is DeleteRange intent sitting there
  with a higher timestamp, so this should be a write-write conflict. But there
  isn't an actual intent there (since the key was previously vacant, just like
  the countably infinite rest of vacant keys around it).
* transaction 3 commits (at its higher timestamp), so now A=<nil>,
  but note that we still have B=1 (written by transaction 3).

Note that this example should fail in the same way with only transaction 3 at
SNAPSHOT isolation.

The one obvious fix that comes to mind is to "somehow" push the whole range
entry in the timestamp cache forward along with the push of one of its intents.
However, that seems brittle and may not reliably be possible since the
timestamp cache's current implementation freely splits up its intervals as new
entries are added.

Then there's also this failing history (the key point is essentially that the
read of A and the range deletion are interchanged):

```
iso=[SNAPSHOT SNAPSHOT SNAPSHOT], pri=[2 3 1]
I1.1(A)[1] C1.1 R2.1(A)[1] DR3.1(A-C) W2.1(B-A)[1] C3.1 C2.1
```

In other words:

* write and commit A=1 as before
* txn2 reads A=1
* txn3 deletes A-C (so `ts(txn3) > ts(txn2)`)
* txn2 writes B=1 (pushing its timestamp due to tsCache entry on A-C); now
  again `ts(txn3) < ts(txn2)`
* both commit and the visible values are A=<nil>, B=1; we've again lost txn3's
  deletion of B.

This seems a little less anomalous than the above since essentially what
happens is that txn2 has write skew: It reads a stale value of A and writes
it somewhere, except that this "somewhere" would be a write-write conflict in
most monolithic databases.
As pointed out in cockroachdb#6124, `Delete` currently neither populates the timestamp
cache nor leaves a tombstone when there is no previous value.

Consider the history `I1(A) C1 R2(A) D3(A) D3(B) W2(B-A) C2 C3`

* A=1 is written and committed
* txn2 reads `A=1`
* txn3 deletes A and B (the latter of which is vacant, i.e. has no old version)
  There is no conflict since `ts(txn3) > ts(txn2)`.
* txn2 writes `B=1` and both txns commit.
  This is illegal since `txn2` wrote into the past of `txn3`s deletion of B.

A tombstone on B would have fixed this (txn2 would have caught write-too-old),
or an entry in the write timestamp cache.

Writing the tombstone is likely the more economical option.

See cockroachdb#6124 for discovery and discussion of this anomaly.
@tbg tbg force-pushed the delete-anomaly-snapshot branch from a806e2c to 6495dab Compare April 22, 2016 16:40
@vivekmenezes
Copy link
Copy Markdown
Contributor

:lgtm:


Review status: 0 of 2 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


Comments from Reviewable

@tbg tbg merged commit 371a60e into cockroachdb:master Apr 22, 2016
@tbg tbg deleted the delete-anomaly-snapshot branch April 22, 2016 16:56
BramGruneir added a commit to BramGruneir/cockroach that referenced this pull request Apr 28, 2017
This removes the unused dynamic client, load and repair test which are all no longer needed.
This test and the associated files was originally added before we had the localcluster which tests this functionality already.

Fixes cockroachdb#6227.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants