kv: document DeleteRange (SI) and Delete (SSI) anomalies#6227
kv: document DeleteRange (SI) and Delete (SSI) anomalies#6227tbg merged 5 commits intocockroachdb:masterfrom
Conversation
35b91c4 to
a806e2c
Compare
|
In the second failing history is ts2 still reading A=1? Review status: 0 of 2 files reviewed at latest revision, all discussions resolved, all commit checks successful. Comments from Reviewable |
|
I've to step out for an hour and a half. I'll review it when I get back. |
|
LGTM. For the DeleteRange anomaly, what if we marked intents laid down by commands that also update the timestamp cache (just DeleteRange?) as unpushable, so the intent would never move to a timestamp inconsistent with the cache? I guess that would be equivalent to making all transactions containing DeleteRange SSI instead of SI. That seems easier than trying to update the timestamp cache when a transaction is pushed. |
|
Review status: 0 of 2 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful. kv/txn_correctness_test.go, line 471 [r5] (raw file): kv/txn_correctness_test.go, line 696 [r5] (raw file): kv/txn_correctness_test.go, line 783 [r5] (raw file): You can make this my TODO since I'm fixing the delete case in #6124 kv/txn_correctness_test.go, line 796 [r5] (raw file): Comments from Reviewable |
|
@vivekmenezes yes, txn2 always reads A=1 in the failing cases. Review status: 0 of 2 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful. kv/txn_correctness_test.go, line 471 [r5] (raw file): kv/txn_correctness_test.go, line 696 [r5] (raw file): kv/txn_correctness_test.go, line 783 [r5] (raw file): kv/txn_correctness_test.go, line 796 [r5] (raw file): Comments from Reviewable |
The newly added test (which was pruned to make its runtime palatable) currently returns ~22 failing histories at SNAPSHOT isolation owing to two anomalies. This was found looking into cockroachdb#6124, but is not technically related to it (the issue there is for `Delete` and an anomaly exists but isn't exercised here). The first (and more severe) anomaly: ``` iso=[SNAPSHOT SNAPSHOT SNAPSHOT], pri=[3 2 1] I1.1(A)[1] C1.1 DR3.1(A-C) R2.1(A)[1] C3.1 W2.1(B-A)[1] C2.1 ``` In other words * the value A=1 is written and committed * transaction 3 deletes A-C (which means leaving an intent on A and populating the write timestamp cache interval A-C. * transaction 2 pushes the intent on A and reads A=1. The write timestamp cache entry on A isn't updated properly in the process (which is an unrelated issue that likely incurs anomalies). * transaction 2 writes B=1, commits. This is the anomaly: logically, there is DeleteRange intent sitting there with a higher timestamp, so this should be a write-write conflict. But there isn't an actual intent there (since the key was previously vacant, just like the countably infinite rest of vacant keys around it). * transaction 3 commits (at its higher timestamp), so now A=<nil>, but note that we still have B=1 (written by transaction 3). Note that this example should fail in the same way with only transaction 3 at SNAPSHOT isolation. The one obvious fix that comes to mind is to "somehow" push the whole range entry in the timestamp cache forward along with the push of one of its intents. However, that seems brittle and may not reliably be possible since the timestamp cache's current implementation freely splits up its intervals as new entries are added. Then there's also this failing history (the key point is essentially that the read of A and the range deletion are interchanged): ``` iso=[SNAPSHOT SNAPSHOT SNAPSHOT], pri=[2 3 1] I1.1(A)[1] C1.1 R2.1(A)[1] DR3.1(A-C) W2.1(B-A)[1] C3.1 C2.1 ``` In other words: * write and commit A=1 as before * txn2 reads A=1 * txn3 deletes A-C (so `ts(txn3) > ts(txn2)`) * txn2 writes B=1 (pushing its timestamp due to tsCache entry on A-C); now again `ts(txn3) < ts(txn2)` * both commit and the visible values are A=<nil>, B=1; we've again lost txn3's deletion of B. This seems a little less anomalous than the above since essentially what happens is that txn2 has write skew: It reads a stale value of A and writes it somewhere, except that this "somewhere" would be a write-write conflict in most monolithic databases.
As pointed out in cockroachdb#6124, `Delete` currently neither populates the timestamp cache nor leaves a tombstone when there is no previous value. Consider the history `I1(A) C1 R2(A) D3(A) D3(B) W2(B-A) C2 C3` * A=1 is written and committed * txn2 reads `A=1` * txn3 deletes A and B (the latter of which is vacant, i.e. has no old version) There is no conflict since `ts(txn3) > ts(txn2)`. * txn2 writes `B=1` and both txns commit. This is illegal since `txn2` wrote into the past of `txn3`s deletion of B. A tombstone on B would have fixed this (txn2 would have caught write-too-old), or an entry in the write timestamp cache. Writing the tombstone is likely the more economical option. See cockroachdb#6124 for discovery and discussion of this anomaly.
a806e2c to
6495dab
Compare
|
Review status: 0 of 2 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful. Comments from Reviewable |
This removes the unused dynamic client, load and repair test which are all no longer needed. This test and the associated files was originally added before we had the localcluster which tests this functionality already. Fixes cockroachdb#6227.
DeleteRange
The newly added test (which was pruned to make its runtime palatable) currently
returns ~22 failing histories at SNAPSHOT isolation owing to two anomalies.
This was found looking into #6124, but is not technically related to it (but see the next section).
The first (and more severe) anomaly:
In other words
populating the write timestamp cache interval A-C.
The write timestamp cache entry on A isn't updated properly in the process
(which is an unrelated issue that likely incurs anomalies).
This is the anomaly: logically, there is DeleteRange intent sitting there
with a higher timestamp, so this should be a write-write conflict. But there
isn't an actual intent there (since the key was previously vacant, just like
the countably infinite rest of vacant keys around it).
but note that we still have B=1 (written by transaction 2).
Note that this example should fail in the same way with only transaction 3 at
SNAPSHOT isolation.
The one obvious fix that comes to mind is to "somehow" push the whole range
entry in the timestamp cache forward along with the push of one of its intents.
However, that seems brittle and may not reliably be possible since the
timestamp cache's current implementation freely splits up its intervals as new
entries are added.
Then there's also this failing history (the key point is essentially that the
read of A and the range deletion are interchanged):
In other words:
ts(txn3) > ts(txn2))again
ts(txn3) < ts(txn2)deletion of B.
This seems a little less anomalous than the above since essentially what
happens is that txn2 has write skew: It reads a stale value of A and writes
it somewhere, except that this "somewhere" would be a write-write conflict in
most monolithic databases.
Delete
As pointed out in #6124,
Deletecurrently neither populates the timestampcache nor leaves a tombstone when there is no previous value.
Consider the history
I1(A) C1 R2(A) D3(A) D3(B) W2(B-A) C2 C3A=1There is no conflict since
ts(txn3) > ts(txn2).B=1and both txns commit.This is illegal since
txn2wrote into the past oftxn3s deletion of B.A tombstone on B would have fixed this (txn2 would have caught write-too-old),
or an entry in the write timestamp cache.
Writing the tombstone is likely the more economical option.
See #6124 for discovery and discussion of this anomaly.
This change is