kvnemesis,kvserver: add coverage for illegal lease applied index and reproposal errors#105272
Conversation
|
No good deed goes unpunished, the new testing is bringing out some weird results: DetailsI am not sure how Edit: ah, better logging shows: Edit: I don't understand the push error though, this is a edit: filed #105330. |
bd731ea to
c8e86ba
Compare
These are in cockroachdb#105272.
c8e86ba to
b2531ef
Compare
erikgrinaker
left a comment
There was a problem hiding this comment.
Great to have coverage of this!
| }) | ||
| } | ||
|
|
||
| func TestKVNemesisSingleNode_ReproposalChaos(t *testing.T) { |
There was a problem hiding this comment.
I take it we don't really need any long-running nightlies for this, it's sufficient with a few runs as part of regular stress runs?
Epic: none Release note: none
This allows setting a LeaseFilter from outside the `kvserver` package.
Epic: none Release note: none
This test can shake out bugs in the reproposal pipeline by actively introducing reproposals. Epic: none Release note: none
It hasn't bit us yet, but with upcoming code changes we will see more situations in which a batch will hard-fail. There were a few places in `ClosureTxnOperation` where this would leave the remaining operations in the txn with uninitialized results. `kvnemesis` requires all results to be explicit (i.e. success or error). Now, extending the previous but incomplete approach, they are all set to `errOmitted`.
b2531ef to
f2da8d3
Compare
This introduces a flavor of single-node kvnemesis where KV mutations randomly bounce around in the replication layer a few times before actually applying.
We're frequently getting a TransactionPushError back from `db.{Get,Scan,...}`.
Accept this as "ok" while we investigate the failure, to avoid delaying merging
the kvnemesis updates.
cockroachdb#105330
f2da8d3 to
e3bafec
Compare
|
Here goes, TFTR! bors r=erikgrinaker |
|
Build succeeded: |
This test can't be salvaged and besides, hasn't been run in >1.5 years. We just merged decent enough coverage in cockroachdb#105272, though. Closes cockroachdb#71148. Epic: none Release note: None
105622: kvserver: rm TestProposalNotAcknowledgedOrReproposedAfterApplication r=erikgrinaker a=tbg This test can't be salvaged and besides, hasn't been run in >1.5 years. We just merged decent enough coverage in #105272, though. Closes #71148. Epic: none Release note: None Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
This adds testing knobs that allow probabilistically invalidating the
MaxLeaseIndexof proposals in the replication layer, and also allowsinjecting errors into the reproposals that result from doing so.o
It then uses these knobs in kvnemesis and in a newly added "more targeted"
kvserverunit test. This unit test was co-developed with the kvnemesisflavor, and various amounts of cross-pollination have ensued. The idea is that
kvnemesis produces "more randomness" and thus has a higher chance of finding
interesting behavior, while the kvserver test is more suitable to
reproducing/regression testing that behavior. For example, tracing initially
caused problems in kvnemesis, so I added tracking to the kvserver test, and
sorted out the problems this encountered (which in turn addressed the problem
in kvnemesis as well).
I am adding this testing as part of my work on
#97779. Because the test is
agnostic to how exactly reproposals are implemented, I am landing it
separately which will make it very clear that it can pass without these
changes (and once did). With #97779, the plan is to metamorphically swap
the implementations to retain coverage for both, until we are willing to
commit to the new method in #97779.
Release note: None
Epic: CRDB-25287