-
Notifications
You must be signed in to change notification settings - Fork 4.1k
Leaseholders temporarily move outside of regional table's configured region #83951
Copy link
Copy link
Closed
Closed
Copy link
Labels
A-kv-distributionRelating to rebalancing and leasing.Relating to rebalancing and leasing.C-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.T-kvKV TeamKV Team
Description
Describe the problem
Given a regional table, leases sometimes transfer out of the table's region, even when sufficient resources (nodes, CPU, etc.) are available in the region.
This leads to unexpectedly higher SQL service latencies during the period that the leases are not in the primary region.
To Reproduce
- Setup a 3-region cluster with 3 nodes in each region
For example, using roachprod:
roachprod create jon-multiregion -n 9 --clouds=gce \
--gce-zones=us-east1-b:1,us-east1-c:1,us-east1-d:1,us-east4-a:1,us-east4-b:1,us-east4-c:1,us-west1-a:1,us-west1-b:1,us-west1-c:1 \
--gce-machine-type=n1-standard-4
- Stage a
22.1release
roachprod stage jon-multiregion release v22.1.2
- Start the cluster
roachprod start jon-multiregion
- Setup a multi-region database and a simple table:
CREATE DATABASE birdie;
ALTER DATABASE birdie SET PRIMARY REGION "us-east1";
ALTER DATABASE birdie ADD REGION "us-west1";
ALTER DATABASE birdie ADD REGION "us-east4";
ALTER DATABASE birdie SURVIVE REGION FAILURE;
CREATE TABLE birdie.t (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
v STRING
);
- Get external IPs for a node in each region (we'll only use
us-east1):
export PGURL_USEAST1=`roachprod pgurl -q --external $CLUSTER:1 | tr -d \'`
export PGURL_USEAST4=`roachprod pgurl -q --external $CLUSTER:4 | tr -d \'`
export PGURL_USWEST1=`roachprod pgurl -q --external $CLUSTER:7 | tr -d \'`
- Monitor leaseholders outside of the primary region (queries
crdb_internal.rangesevery 5 seconds):
cockroach sql --insecure --url "$PGURL_USEAST1" --execute "SELECT now(), range_id, node_id, node_id = lease_holder as is_leasholder, ARRAY[node_id] <@ voting_replicas as voter, locality, database_name, table_name, start_pretty, end_pretty FROM [ SELECT start_pretty, end_pretty, range_id, unnest(replicas) as node_id, unnest(replica_localities) as locality, lease_holder, voting_replicas, database_name, table_name FROM [ SELECT * FROM crdb_internal.ranges WHERE database_name = 'birdie' ] GROUP BY range_id, node_id, lease_holder, start_pretty, end_pretty, locality, voting_replicas, database_name, table_name ] WHERE node_id = lease_holder and locality not like 'cloud=gce,region=us-east1%' ORDER BY range_id ASC, node_id = lease_holder DESC, ARRAY[node_id] <@ voting_replicas ASC;" --watch 5s
- Run concurrent INSERTs into
us-east1(I used a simple go program with 500 concurrent INSERTs):
INSERT INTO birdie.t (v) VALUES ('test')
- Observe the output of the leaseholder monitoring query. Leaseholders will temporarily appear outside of the primary region and eventually move back.
Expected behavior
Given sufficient resources in the primary region (nodes, CPU, etc.), the leaseholders should never move outside of the primary region.
Additional data / screenshots
Zone configuration for table birdie.t:
> SELECT raw_config_sql FROM [SHOW ZONE CONFIGURATION FOR TABLE birdie.t];
raw_config_sql
----------------------------------------------------------------------------------------
ALTER DATABASE birdie CONFIGURE ZONE USING
range_min_bytes = 134217728,
range_max_bytes = 536870912,
gc.ttlseconds = 90000,
num_replicas = 5,
num_voters = 5,
constraints = '{+region=us-east1: 1, +region=us-east4: 1, +region=us-west1: 1}',
voter_constraints = '{+region=us-east1: 2}',
lease_preferences = '[[+region=us-east1]]'
Example range monitoring query output after about 30 seconds of running INSERTs:
now | range_id | node_id | is_leasholder | voter | locality | database_name | table_name | start_pretty | end_pretty
--------------------------------+----------+---------+---------------+-------+-------------------------------------------+---------------+------------+------------------------------------------------------------------+-------------------------------------------------------------
2022-07-06 17:50:06.420554+00 | 60 | 5 | true | true | cloud=gce,region=us-east4,zone=us-east4-b | birdie | t | /Table/108/1/"\xbe\x991\xaa\x8b\x9dDk\x97\xe5\xe8\xad\xf7\x84Lp" | /Table/108/1/"\xe2\x8fIk#\xe2Ob\x9e\x1f\xda\xcd<\x8e\x8aw"
(1 row)
Lease history from range_id 60 indicating a cooperative lease transfer:
Environment:
- CockroachDB version 22.1.2
- Server OS: Linux
- Client app: Go/pgx, cockroach sql
Jira issue: CRDB-17367
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
A-kv-distributionRelating to rebalancing and leasing.Relating to rebalancing and leasing.C-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.T-kvKV TeamKV Team
