kv: kvnemesis can thrash and livelock on multiple concurrent Range merges #46639
Labels
A-kv-distribution: Relating to rebalancing and leasing.
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Description
When stressing TestKVNemesisSingleNode with a short timeout (2-5 minutes), it's easy to see the test time out. This appears to be due to thrashing within the Range merge transaction.
The Range merge transaction does not use the standard transaction retry mechanism (i.e. epochs). Instead, it uses a completely separate transaction when restarting due to retry errors. This leaves room for thrashing and livelock if multiple transactions keep stepping on each other's toes.
make roachprod-stress PKG=./pkg/kv/kvnemesis TESTS=TestKVNemesisSingleNode TESTTIMEOUT=2m TESTFLAGS='-v -show-logs' STRESSFLAGS='-stderr -maxfails 1' CLUSTER=<cluster-name>