Skip to content

kvnemesis,kvserver: add coverage for illegal lease applied index and reproposal errors#105272

Merged
craig[bot] merged 8 commits intocockroachdb:masterfrom
tbg:kvnemesis-reproposals
Jun 27, 2023
Merged

kvnemesis,kvserver: add coverage for illegal lease applied index and reproposal errors#105272
craig[bot] merged 8 commits intocockroachdb:masterfrom
tbg:kvnemesis-reproposals

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented Jun 21, 2023

This adds testing knobs that allow probabilistically invalidating the
MaxLeaseIndex of proposals in the replication layer, and also allows
injecting errors into the reproposals that result from doing so.o

It then uses these knobs in kvnemesis and in a newly added "more targeted"
kvserver unit test. This unit test was co-developed with the kvnemesis
flavor, and various amounts of cross-pollination have ensued. The idea is that
kvnemesis produces "more randomness" and thus has a higher chance of finding
interesting behavior, while the kvserver test is more suitable to
reproducing/regression testing that behavior. For example, tracing initially
caused problems in kvnemesis, so I added tracking to the kvserver test, and
sorted out the problems this encountered (which in turn addressed the problem
in kvnemesis as well).

I am adding this testing as part of my work on
#97779. Because the test is
agnostic to how exactly reproposals are implemented, I am landing it
separately which will make it very clear that it can pass without these
changes (and once did). With #97779, the plan is to metamorphically swap
the implementations to retain coverage for both, until we are willing to
commit to the new method in #97779.

Release note: None
Epic: CRDB-25287

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Jun 21, 2023

No good deed goes unpunished, the new testing is bringing out some weird results:

Details
--- FAIL: TestKVNemesisSingleNode_ReproposalChaos (10.98s)
    test_log_scope.go:167: test logs captured to: /Users/tobias/go/src/github.com/cockroachdb/cockroach/tmp/_tmp/3071ecfd607c2dcda710d9ffa8fdef40/logTestKVNemesisSingleNode_ReproposalChaos720228959
    test_log_scope.go:81: use -show-logs to present logs inline
    kvnemesis_test.go:278: seed: 4043305725150643922
    kvnemesis_test.go:192: kvnemesis logging to /Users/tobias/go/src/github.com/cockroachdb/cockroach/tmp/_tmp/3071ecfd607c2dcda710d9ffa8fdef40/kvnemesis3540977959
    kvnemesis.go:168: error applying x.Scan(ctx, tk(13779811691630448898), tk(17053492883382310021), 0) // failed to push meta={id=0d4951ff key=/Table/100/"d63f744e684afbd2" iso=ReadCommitted pri=0.04459242 epo=5 ts=1687351579.475751000,0 min=1687351579.468285000,0 seq=3} lock=true stat=PENDING rts=0,0 wto=false gul=0,0: {<nil> 0 {0x14000723090} index:0  1687351579.500642000,0}
    kvnemesis.go:180: could not scan actual latest values: injected error
        (1)
          | (opaque error wrapper)
          | type name: github.com/cockroachdb/errors/withstack/*withstack.withStack
          | reportable 0:
          |
          | github.com/cockroachdb/cockroach/pkg/kv/kvnemesis.init
          |     github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis.go:30
          | runtime.doInit
          |     GOROOT/src/runtime/proc.go:6349
          | runtime.doInit
          |     GOROOT/src/runtime/proc.go:6326
          | runtime.main
          |     GOROOT/src/runtime/proc.go:233
          | runtime.goexit
          |     GOROOT/src/runtime/asm_arm64.s:1172
        Wraps: (2) injected error
        Error types: (1) *errbase.opaqueWrapper (2) *errutil.leafError
    kvnemesis.go:188: failures(verbose): /Users/tobias/go/src/github.com/cockroachdb/cockroach/tmp/_tmp/3071ecfd607c2dcda710d9ffa8fdef40/kvnemesis3540977959/failures
        repro steps: /Users/tobias/go/src/github.com/cockroachdb/cockroach/tmp/_tmp/3071ecfd607c2dcda710d9ffa8fdef40/kvnemesis3540977959/repro.go
        rangefeed KVs: /Users/tobias/go/src/github.com/cockroachdb/cockroach/tmp/_tmp/3071ecfd607c2dcda710d9ffa8fdef40/kvnemesis3540977959/kvs-rangefeed.txt
        scan KVs: 
    kvnemesis_test.go:305: 
                Error Trace:    github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:305
                                                        github.com/cockroachdb/cockroach/pkg/kv/kvnemesis/pkg/kv/kvnemesis/kvnemesis_test.go:242
                Error:          Should be zero, but was 1
                Test:           TestKVNemesisSingleNode_ReproposalChaos
                Messages:       kvnemesis detected failures
    panic.go:522: -- test log scope end --

I am not sure how errInjected even ends up in a Scan. Need to look at this.

Edit: ah, better logging shows:

    kvnemesis.go:180: could not scan actual latest values: on 1 ResolveIntent at /Table/100/"e69071dc080bf2f5": injected error

Edit: I don't understand the push error though, this is a *TransactionPushError and we only ever create those in evaluating PushTxn. So why are we failing a push on a db.Get in a way that bubbles all the way up? I assumed that shouldn't be possible. And how is it only occurring when I inject reproposals+reprop errors? This doesn't repro (readily) on master.

edit: filed #105330.
edit: added a workaround.
edit: also found another new issues, filed #105332. That one is rarer.

@tbg tbg force-pushed the kvnemesis-reproposals branch from bd731ea to c8e86ba Compare June 21, 2023 13:16
tbg added a commit to tbg/cockroach that referenced this pull request Jun 21, 2023
@tbg tbg force-pushed the kvnemesis-reproposals branch from c8e86ba to b2531ef Compare June 22, 2023 10:28
@tbg tbg requested a review from erikgrinaker June 22, 2023 11:19
@tbg tbg marked this pull request as ready for review June 22, 2023 11:20
@tbg tbg requested a review from a team June 22, 2023 11:20
@tbg tbg requested a review from a team as a code owner June 22, 2023 11:20
Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to have coverage of this!

})
}

func TestKVNemesisSingleNode_ReproposalChaos(t *testing.T) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it we don't really need any long-running nightlies for this, it's sufficient with a few runs as part of regular stress runs?

tbg added 5 commits June 26, 2023 17:35
This allows setting a LeaseFilter from outside the `kvserver` package.
Epic: none
Release note: none
This test can shake out bugs in the reproposal pipeline by actively introducing
reproposals.

Epic: none
Release note: none
It hasn't bit us yet, but with upcoming code changes we will see more situations
in which a batch will hard-fail. There were a few places in
`ClosureTxnOperation` where this would leave the remaining operations in the txn
with uninitialized results. `kvnemesis` requires all results to be explicit
(i.e. success or error). Now, extending the previous but incomplete approach,
they are all set to `errOmitted`.
@tbg tbg force-pushed the kvnemesis-reproposals branch from b2531ef to f2da8d3 Compare June 26, 2023 15:35
tbg added 3 commits June 27, 2023 10:50
This introduces a flavor of single-node kvnemesis where KV mutations randomly
bounce around in the replication layer a few times before actually applying.
We're frequently getting a TransactionPushError back from `db.{Get,Scan,...}`.

Accept this as "ok" while we investigate the failure, to avoid delaying merging
the kvnemesis updates.

cockroachdb#105330
@tbg tbg force-pushed the kvnemesis-reproposals branch from f2da8d3 to e3bafec Compare June 27, 2023 08:51
@tbg
Copy link
Copy Markdown
Member Author

tbg commented Jun 27, 2023

Here goes, TFTR!

bors r=erikgrinaker

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jun 27, 2023

Build succeeded:

@craig craig bot merged commit 19b8fca into cockroachdb:master Jun 27, 2023
tbg added a commit to tbg/cockroach that referenced this pull request Jun 28, 2023
This test can't be salvaged and besides, hasn't been run in >1.5 years. We just
merged decent enough coverage in cockroachdb#105272, though.

Closes cockroachdb#71148.

Epic: none
Release note: None
craig bot pushed a commit that referenced this pull request Jun 28, 2023
105622: kvserver: rm TestProposalNotAcknowledgedOrReproposedAfterApplication r=erikgrinaker a=tbg

This test can't be salvaged and besides, hasn't been run in >1.5 years. We just
merged decent enough coverage in #105272, though.

Closes #71148.

Epic: none
Release note: None


Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants