Skip to content

[dnm] use scans in kvnemeses#47120

Closed
tbg wants to merge 2 commits intocockroachdb:masterfrom
tbg:kvnemeses
Closed

[dnm] use scans in kvnemeses#47120
tbg wants to merge 2 commits intocockroachdb:masterfrom
tbg:kvnemeses

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented Apr 7, 2020

Hack up just enough support for scans to see if we can perhaps tickle
the release blocker here:

#46652

No validation was added. We're just using kvnemeses to send random
scans for us.

Release note: None

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Hack up just enough support for scans to see if we can perhaps tickle
the release blocker here:

cockroachdb#46652

No validation was added. We're just using kvnemeses to send random
scans for us.

Release note: None
@tbg
Copy link
Copy Markdown
Member Author

tbg commented Apr 7, 2020

Here's an example run: https://gist.github.com/tbg/aaa88d3f8dedf0aba4792b11fe21297a

Not much to see other than that we're actually doing a bunch of scans, and they sometimes return data. (kvnemesis can't handle multiple results, so I'm pretending scans just return their first hit if any, and nothing is validated).

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Apr 7, 2020

❌ The GitHub CI (Cockroach) build has failed on 655c9eac.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. I am experimental - my owner is otan.

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Apr 8, 2020

I added reverse scans.

@nvanbenschoten this might be of interest to you (full gist)

 b := &Batch{}
    b.ReverseScan(ctx, /Table/50/"b8f60bac", /Table/50/"bccc30d8", 4) // (nil, resolving lock at /Table/50/"0010408c" on end transaction [COMMITTED]: Not implemented: SeekForPrev() not supported)
    b.ReverseScan(ctx, /Table/50/"83e7a5df", /Table/50/"cc2cbde9", 9) // (nil, resolving lock at /Table/50/"0010408c" on end transaction [COMMITTED]: Not implemented: SeekForPrev() not supported)
    b.ReverseScan(ctx, /Table/50/"dd14914e", /Table/50/"f2207731", 5) // (nil, resolving lock at /Table/50/"0010408c" on end transaction [COMMITTED]: Not implemented: SeekForPrev() not supported)
    txn.CommitInBatch(ctx, b) // resolving lock at /Table/50/"0010408c" on end transaction [COMMITTED]: Not implemented: SeekForPrev() not supported

and this is of interest as well:

F200408 10:24:52.248408 5210 kv/kvclient/kvcoord/dist_sender.go:699  [n2,txn=3bab6373] batch with MaxSpanRequestKeys=11, TargetBytes=0 needs splitting

If I ignore both of those errors in the output during make stress, it seems to pass, but that's not saying much since the first is a simple error that probably occurs in most runs.

I'll try to avoid the reversescan+commit scenario in the generator. It's clear from the error why that fails.

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Apr 8, 2020

Oh, the "batch needs splitting" is similarly an artifact of committing a batch with an endtxn.

Here, we allow the batch to consist of limited ops plus a commit:

if ba.MaxSpanRequestKeys != 0 || ba.TargetBytes != 0 {
// Verify that the batch contains only specific range requests or the
// EndTxnRequest. Verify that a batch with a ReverseScan only contains
// ReverseScan range requests.
isReverse := ba.IsReverse()
for _, req := range ba.Requests {
inner := req.GetInner()
switch inner.(type) {
case *roachpb.ScanRequest, *roachpb.ResolveIntentRangeRequest,
*roachpb.DeleteRangeRequest, *roachpb.RevertRangeRequest:
// Accepted forward range requests.
if isReverse {
return roachpb.NewErrorf("batch with limit contains both forward and reverse scans")
}
case *roachpb.ReverseScanRequest:
// Accepted reverse range requests.
case *roachpb.QueryIntentRequest, *roachpb.EndTxnRequest:
// Accepted point requests that can be in batches with limit.

This already makes me nervous, though I guess it's fine? But then it seems pretty possible that we'll pass canSplitET = true to ba.Split via this code

if ba.Txn != nil && ba.Txn.Epoch > 0 && !require1PC {
splitET = true
}
parts := splitBatchAndCheckForRefreshSpans(&ba, splitET)

and that will happily split off the EndTxn from the scans. Oops, real bug.

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Apr 8, 2020

Ready for more, @nvanbenschoten? Here's another fatal:

https://gist.github.com/tbg/0a793101e942a3e998d3967f3d68df10

panic: expected latches held, found none [recovered]
panic: expected latches held, found none [recovered]
panic: expected latches held, found none

goroutine 2534 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc0002cea20, 0x7e00c80, 0xc0013a09f0)
        /Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:183 +0x11f
panic(0x6fd0c60, 0x7d37080)
        /usr/local/Cellar/go/1.13.4/libexec/src/runtime/panic.go:679 +0x1b2
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).Send.func1(0xc0010a56b8, 0xc0010a5740, 0xc0010a5738, 0xc000579c00, 0x1603d1e37e88ee18, 0x0)
        /Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_send.go:115 +0x1f9
panic(0x6fd0c60, 0x7d37080)
        /usr/local/Cellar/go/1.13.4/libexec/src/runtime/panic.go:679 +0x1b2
github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency.(*Guard).AssertLatches(...)
        /Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency/concurrency_manager.go:469
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeBatchWithConcurrencyRetries(0xc00057a300, 0x7e00c80, 0xc0013a0a80, 0xc001743d00, 0x76f24f0, 0x0, 0xc0013b2e60)
        /Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:249 +0xdec
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).sendWithRangeID(0xc00057a300, 0x7e00c80, 0xc0013a0a50, 0x1, 0xc001743d00, 0x0, 0x0)
        /Users/tschottdorf/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_send.go:98 +0x2f3

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Apr 8, 2020

❌ The GitHub CI (Cockroach) build has failed on ef5f93b8.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. I am experimental - my owner is otan.

@nvb
Copy link
Copy Markdown
Contributor

nvb commented Apr 8, 2020

expected latches held, found none

I'm happy you hit this. I saw this once a few weeks ago while working on #46639 but was unable to hit it again when I went back to try to reproduce. If you're still stressing this to hit the limit bug, do you mind applying the following diff:

diff --git a/pkg/kv/kvserver/concurrency/concurrency_manager.go b/pkg/kv/kvserver/concurrency/concurrency_manager.go
index a3f61c9..d3f11ae 100644
--- a/pkg/kv/kvserver/concurrency/concurrency_manager.go
+++ b/pkg/kv/kvserver/concurrency/concurrency_manager.go
@@ -13,6 +13,7 @@ package concurrency
 import (
        "context"
        "sync"
+       "fmt"

        "github.com/cockroachdb/cockroach/pkg/kv"
        "github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency/lock"
@@ -466,7 +467,9 @@ func (g *Guard) HoldingLatches() bool {
 // AssertLatches asserts that the guard is non-nil and holding latches.
 func (g *Guard) AssertLatches() {
        if !g.HoldingLatches() {
-               panic("expected latches held, found none")
+               var ba roachpb.BatchRequest
+               ba.Requests = g.Req.Requests
+               panic(fmt.Sprintf("expected latches held, found none: %s", ba))
        }
 }

I'll also spin up some kvnemeses to assist the effort.

@nvb
Copy link
Copy Markdown
Contributor

nvb commented Apr 8, 2020

txn.CommitInBatch(ctx, b) // resolving lock at /Table/50/"0010408c" on end transaction [COMMITTED]: Not implemented: SeekForPrev() not supported

This one is strange. We're calling SeekLT in either MVCCResolveWriteIntentUsingIter or MVCCResolveWriteIntentRangeUsingIter. I guess we shouldn't be on a *rocksDBBatch.

@nvb
Copy link
Copy Markdown
Contributor

nvb commented Apr 8, 2020

It's clear from the error why that fails.

Do you mind expanding? It's not so clear why it fails to me. I could understand if it failed during the ReverseScan evaluation, but this coming from here. What does the ReverseScan have to do with this?

@nvb
Copy link
Copy Markdown
Contributor

nvb commented Apr 8, 2020

diff --git a/pkg/kv/kvserver/batcheval/cmd_scan.go b/pkg/kv/kvserver/batcheval/cmd_scan.go
index fd10986..97a2c3e 100644
--- a/pkg/kv/kvserver/batcheval/cmd_scan.go
+++ b/pkg/kv/kvserver/batcheval/cmd_scan.go
@@ -18,6 +18,7 @@ import (
        "github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency/lock"
        "github.com/cockroachdb/cockroach/pkg/roachpb"
        "github.com/cockroachdb/cockroach/pkg/storage"
+       "github.com/cockroachdb/cockroach/pkg/util/log"
 )

 func init() {
@@ -30,7 +31,7 @@ func init() {
 // (MaxInt64 for no limit).
 func Scan(
        ctx context.Context, reader storage.Reader, cArgs CommandArgs, resp roachpb.Response,
-) (result.Result, error) {
+) (_ result.Result, retErr error) {
        args := cArgs.Args.(*roachpb.ScanRequest)
        h := cArgs.Header
        reply := resp.(*roachpb.ScanResponse)
@@ -47,6 +48,11 @@ func Scan(
                FailOnMoreRecent: args.KeyLocking != lock.None,
                Reverse:          false,
        }
+       defer func() {
+               if _, ok := retErr.(*roachpb.WriteIntentError); ok && opts.Inconsistent {
+                       log.Fatalf(ctx, "inconsistent scan found intent %+v %+v", opts, err)
+               }
+       }()

        switch args.ScanFormat {
        case roachpb.BATCH_RESPONSE:

Well, that's an interesting assertion to see fire. This is what's responsible for the expected latches held, found none panic. It's not clear whether this is a misunderstanding of MVCCScanOptions.Inconsistent on my end or a bug in MVCC.

@nvb
Copy link
Copy Markdown
Contributor

nvb commented Apr 8, 2020

Huh, the error is coming from CollectIntentRows.

@nvb
Copy link
Copy Markdown
Contributor

nvb commented Apr 8, 2020

I've filed this as #47219.

@tbg tbg closed this Jul 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants