
kv: operations on a range can stall during lease changes when under heavy load #32367

@ajwerner

Description

When a range is under very heavy load and a lease change occurs, throughput on the range can stall for minutes at a time. It appears that the requests which pile up during the lease change overwhelm the new leaseholder, preventing it from catching up.

To Reproduce

The stall was observed while running the kv workload at high concurrency on large machines. The stall can be reliably reproduced with the following steps:

  1. Set up CockroachDB with 3 database nodes on large machines (I was using 64-core machines; the stall doesn't seem to reproduce on 4-core machines)
  2. Put load on the cluster using the following workload:
    roachprod run ${CLUSTER}:4 -- ./workload run kv '{pgurl:1-3}' --init --read-percent=95 --concurrency=1024
  3. While the cluster is under load, execute the following SQL command to trigger lease transfers:
    ALTER TABLE kv.kv SCATTER;
  4. Watch throughput tank and the command hang.
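
The steps above can be collected into a single script. This is a sketch, not a verified runbook: it assumes a roachprod cluster named in $CLUSTER with CockroachDB running on nodes 1-3 and the workload binary on node 4, as in the command above, and the sleep duration is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Reproduction sketch (assumes $CLUSTER is an existing roachprod cluster).
set -euo pipefail

# 1. Drive heavy load against the cluster (95% reads, high concurrency).
roachprod run "${CLUSTER}:4" -- \
  ./workload run kv '{pgurl:1-3}' --init --read-percent=95 --concurrency=1024 &

# 2. Once throughput reaches steady state, trigger lease transfers.
sleep 60
roachprod sql "${CLUSTER}:1" -- -e 'ALTER TABLE kv.kv SCATTER;'

# 3. Watch the workload output / admin UI for the throughput stall.
```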

Expected behavior
The scatter should complete in a timely manner and the throughput should not be adversely affected for a long period of time.

Additional data / screenshots

The hypothesis is that when load is too high, by the time the newly elected leader recognizes that it is the leader, there are too many outstanding requests for it to catch up. Adding backoff in DistSender when a NotLeaseHolderError is encountered appears to resolve the problem. The small diff below has been verified to resolve the problem for the reproduction steps above.

diff --git a/pkg/kv/dist_sender.go b/pkg/kv/dist_sender.go
index 1ccf87b11a..8822551bdd 100644
--- a/pkg/kv/dist_sender.go
+++ b/pkg/kv/dist_sender.go
@@ -1324,6 +1324,7 @@ func (ds *DistSender) sendToReplicas(
        }
        br, err := transport.SendNext(ctx, ba)

+       r := retry.StartWithCtx(ctx, ds.rpcRetryOptions)
        // This loop will retry operations that fail with errors that reflect
        // per-replica state and may succeed on other replicas.
        for {
@@ -1388,6 +1389,7 @@ func (ds *DistSender) sendToReplicas(
                                // via the NotLeaseHolderError or nil error paths, both of which update the                                                                                                       
                                // leaseholder cache.
                        case *roachpb.NotLeaseHolderError:
+                               r.Next()
                                ds.metrics.NotLeaseHolderErrCount.Inc(1)
                                if lh := tErr.LeaseHolder; lh != nil {
                                        // Update the leaseholder cache. Naively this would also happen when the 

@tbg, @nvanbenschoten mentioned you were making changes here recently and might want to take a look.

Environment:

  • CockroachDB version v2.2.0-alpha.00000000-2316-ga25f97f
  • Server OS: Ubuntu 16.04
  • Client app: workload
  • Server: GCE n1-standard-64 machines

Additional context

The problem was initially thought to relate to #22837, but this seems to be somewhat more specific.

Metadata

Labels: A-kv-client (Relating to the KV client and the KV interface), C-investigation (Further steps needed to qualify. C-label will change.)
