Skip to content

kvserver: add lease preference metrics#107505

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
kvoli:230720.lease-preference-metrics
Jul 25, 2023
Merged

kvserver: add lease preference metrics#107505
craig[bot] merged 2 commits intocockroachdb:masterfrom
kvoli:230720.lease-preference-metrics

Conversation

@kvoli
Copy link
Copy Markdown
Contributor

@kvoli kvoli commented Jul 24, 2023

There were no existing metrics to monitor the lease preference
conformance. This commit adds two metric gauges:
leases.preferences.violating and leases.preferences.less-preferred.
These metrics are reported by the store.

leases.preferences.violating indicates the number of valid leases a
store owns, which satisfy none of the preferences applied.

leases.preferences.less-preferred indicates the number of valid leases
a store owns, which satisfy some of the preferences applied, but not the
first one.

For example, with a lease preference '[[+zone=a],[+zone=b]]', the
metric values with different leaseholders are:

leaseholder_locality="zone=c"
leases.preferences.less-preferred: 0
leases.preferences.violating: 1
leaseholder_locality="zone=b"
leases.preferences.less-preferred: 1
leases.preferences.violating: 0
leaseholder_locality="zone=a"
leases.preferences.less-preferred: 0
leases.preferences.violating: 0

When no preferences are applied, the lease is not counted in either
metric.

Epic: none

Informs: #106100

Release note (ops change): Introduce two new metrics to monitor lease
range preference conformance. leases.preferences.violating indicates
the number of valid leases a store owns, which satisfy none of the
preferences applied. leases.preferences.less-preferred indicates the
number of valid leases a store owns, which satisfy some of the
preferences applied, but not the first one.

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Jul 24, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Previously, `ConjunctionsCheck` took a store descriptor as an input.
However, it only needed to know the store/node attributes and locality.
Some upcoming callers (lease acquisition) can't easily construct a full
store descriptor since the locking order would cause deadlocks.

This patch changes it to only take the attributes and locality instead
of the entire store descriptor, and renames it to `CheckConjunction()`.
It also adds `CheckStoreConjunction()` as a convenience method that
takes a store descriptor, and migrates all existing callers.

Epic: none
Release note: None
@kvoli kvoli force-pushed the 230720.lease-preference-metrics branch from f26c552 to 8ef4e36 Compare July 24, 2023 23:28
@kvoli kvoli marked this pull request as ready for review July 24, 2023 23:41
@kvoli kvoli requested a review from a team as a code owner July 24, 2023 23:41
Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @kvoli)


pkg/kv/kvserver/replica_range_lease.go line 1545 at r2 (raw file):

	// leasePreferencesViolating indicates the leaseholder does not
	// satisfy any lease preference applied.
	leasePreferencesViolating leasePreferencesStatus = iota

Let's add a leasePreferencesUnknown zero value.


pkg/kv/kvserver/replica_range_lease.go line 1586 at r2 (raw file):

) leasePreferencesStatus {
	if !leaseStatus.IsValid() || !leaseStatus.Lease.OwnedBy(storeID) {
		return leasePreferencesOK

This condition is a bit odd: even if some other node owns the lease, it may still violate the preferences.

I know it doesn't matter for the purposes of LeaseViolatesPreferences(), but if this is intended to be general, I think we should change this condition to return leasePreferencesUnknown. If the lease isn't valid, there may be a valid lease we don't know about, and if the lease is owned by a different store than the one we passed in attributes for then we can't know if it satisfies the preferences or not.

There were no existing metrics to monitor the lease preference
conformance. This commit adds two metric gauges:
`leases.preferences.violating` and `leases.preferences.less-preferred`.
These metrics are reported by the store.

`leases.preferences.violating` indicates the number of valid leases a
store owns, which satisfy none of the preferences applied.

`leases.preferences.less-preferred` indicates the number of valid leases
a store owns, which satisfy some of the preferences applied, but not the
first one.

For example, with a lease preference `'[[+zone=a],[+zone=b]]'`, the
metric values with different leaseholders are:

```
leaseholder_locality="zone=c"
leases.preferences.less-preferred: 0
leases.preferences.violating: 1
```

```
leaseholder_locality="zone=b"
leases.preferences.less-preferred: 1
leases.preferences.violating: 0
```

```
leaseholder_locality="zone=a"
leases.preferences.less-preferred: 0
leases.preferences.violating: 0
```

When no preferences are applied, the lease is not counted in either
metric.

Epic: none

Informs: cockroachdb#106100

Release note (ops change): Introduce two new metrics to monitor lease
range preference conformance. `leases.preferences.violating` indicates
the number of valid leases a store owns, which satisfy none of the
preferences applied. `leases.preferences.less-preferred` indicates the
number of valid leases a store owns, which satisfy some of the
preferences applied, but not the first one.
@kvoli kvoli force-pushed the 230720.lease-preference-metrics branch from 8ef4e36 to e393afb Compare July 25, 2023 13:54
Copy link
Copy Markdown
Contributor Author

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @erikgrinaker)


pkg/kv/kvserver/replica_range_lease.go line 1545 at r2 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Let's add a leasePreferencesUnknown zero value.

Done.


pkg/kv/kvserver/replica_range_lease.go line 1586 at r2 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

This condition is a bit odd: even if some other node owns the lease, it may still violate the preferences.

I know it doesn't matter for the purposes of LeaseViolatesPreferences(), but if this is intended to be general, I think we should change this condition to return leasePreferencesUnknown. If the lease isn't valid, there may be a valid lease we don't know about, and if the lease is owned by a different store than the one we passed in attributes for then we can't know if it satisfies the preferences or not.

I see what you mean, good idea. Added in leasePreferencesUnknown.

@kvoli
Copy link
Copy Markdown
Contributor Author

kvoli commented Jul 25, 2023

TYFTR

bors r=erikgrinaker

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 25, 2023

This PR was included in a batch that was canceled, it will be automatically retried

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 25, 2023

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 25, 2023

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 25, 2023

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 25, 2023

Build succeeded:

@craig craig bot merged commit b0a4853 into cockroachdb:master Jul 25, 2023
craig bot pushed a commit that referenced this pull request Jul 26, 2023
105441: server: notify tenant servers of metadata changes r=yuzefovich,stevendanna a=knz

Informs (towards fixing) #98431.
Fixes #105721.
Epic: CRDB-26691

See the individual commits for details.


106325: sql/parser: parse CREATE PROCEDURE r=mgartner a=mgartner

#### sql/parser: parse CREATE PROCEDURE

`CREATE PROCEDURE` statements can now be parsed by the SQL parser.
Currently, executing `CREATE PROCEDURE` will produce an unimplemented
error.

Epic: CRDB-25388

Release note: None

#### sem/tree: rename function AST nodes

This commit renames UDF-related AST nodes with "function" in the name to
use the more general term "routine". The term "routine" is inclusive of
both UDFs and procedures (e.g., Postgres implements a `DROP ROUTINE`
statement which drops both UDFs and procedures), which is fitting
because we'll be using the same AST nodes for both `CREATE FUNCTION` and
`CREATE PROCEDURE` statements.

Release note: None

#### sem/tree: rename udf.go to create_routine.go

Release note: None


107470: server, ccl: adjust more tests to work with test tenant r=yuzefovich a=yuzefovich

Informs: #76378.
Epic: CRDB-18499.


107507:  kvserver: transfer lease when acquiring lease outside preferences r=erikgrinaker,andrewbaptist a=kvoli

First 2 commits are from #107505.

When a leaseholder is lost, any surviving replica may acquire the lease, even if it violates lease preferences. There are two main reasons for this: we need to elect a new Raft leader who will acquire the lease, which is agnostic to lease preferences, and there may not even be any surviving replicas that satisfy the lease preferences at all, so we don't want to keep the range unavailable while we try to figure this out (considering e.g. network timeouts can delay this for many seconds).

However, after acquiring a lease, we rely on the replicate queue to transfer the lease back to a replica that conforms with the preferences, which can take several minutes. In multi-region clusters, this can cause severe latency degradation if the lease is acquired in a remote region.

This PR will detect lease preference violations when a replica acquires a new lease, and eagerly enqueue it in the replicate queue for transfer (if possible). If the first process fails, the replica will be sent to purgatory and retried soon after.

Additionally, two roachtests are added for lease preference conformance timing. The first test, `.../partial-first-preference-down`, takes down one of the preferred locality nodes (1/2). The second test, `.../full-first-preference-down`, takes down both the preferred locality nodes (2/2).

Resolves #106100.
Epic: none

Release note (bug fix): When losing a leaseholder and using lease preferences, the lease can be acquired by any other replica (regardless of lease preferences) in order to restore availability as soon as possible. The new leaseholder will now immediately check if it violates the lease preferences, and attempt to transfer the lease to a replica that satisfies the preferences if possible.


107512: sql: fix crdb_internal.encode_key in some contexts r=yuzefovich a=yuzefovich

Previously, we forgot to set `CatalogBuiltins` for the internal planner which is used by `crdb_internal.encode_key` (which, in turn, is used to power `SHOW RANGE ... FOR ROW`), so evaluating it would lead to an error. For example, the internal planner is used in the backfill. This is now fixed.

Fixes: #106397.

Release note (bug fix): CockroachDB would previously return an error when using `SHOW RANGE ... FOR ROW ...` in `CREATE TABLE ... AS ...` construct, and this is now fixed.

107537: workload/ycsb: default --read-modify-write-in-txn to false r=erikgrinaker a=nvanbenschoten

The `read-modify-write-in-txn` flag was added to ycsb in #103117. This commit changes the default of the flag from true to false. This makes the default configuration of ycsb more closely mirror the official ycsb implementation, to avoid confusion when external users run the workload and compare.

We saw in #103117 and in #107517 that this improves throughput substantially. We will expect to see the same in roachperf once this change is merged. After merging the PR, I will add a roachperf annotation.

Epic: None
Release note: None

107549: persistedsqlstats: add AOST clause to statement statistics size check r=maryliag a=THardy98

Epic: None

This change as an AOST clause to the statement statistics size check.
This helps mitigate contention on the `system.statement_statistics`
table which has caused the sql stats compaction job to fail.

Release note: None

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Thomas Hardy <thardy@cockroachlabs.com>
kvoli added a commit to kvoli/cockroach that referenced this pull request Dec 19, 2023
Lease preference metrics were added in cockroachdb#107505, however were only
visible in DB Console via custom metrics. This commit adds the
`leases.preferences.violating` and `leases.preferences.less-preferred`
metrics to the replication dashboard under a new lease preferences
graph.

Resolves: cockroachdb#116678

Release note (ui change): Introduce a new lease preferences graph on the
replication dashboard. The lease preferences graph will display when the
current leaseholder is not the first lease preference and where the
current leaseholder satisfies no applied lease preference.
craig bot pushed a commit that referenced this pull request Dec 20, 2023
116709: ui: add lease preference metrics to replication dashboard r=andrewbaptist a=kvoli

Lease preference metrics were added in #107505, however were only
visible in DB Console via custom metrics. This commit adds the
`leases.preferences.violating` and `leases.preferences.less-preferred`
metrics to the replication dashboard under a new lease preferences
graph.

Resolves: #116678

Release note (ui change): Introduce a new lease preferences graph on the
replication dashboard. The lease preferences graph will display when the
current leaseholder is not the first lease preference and where the
current leaseholder satisfies no applied lease preference.

116888: roachtest/cdc: disable distribution strategy in old versions  r=jayshrivastava a=jayshrivastava

oachtest/cdc: disable distribution strategy in old versions
The `DistributionStrategy` feature flag was added to randomly set the
`changefeed.default_range_distribution_strategy` cluster setting. Since this setting
was added in 23.2, this feature flag should be disabled when testing versions < 23.2.

This change updates the feature flag to not be set in versions < 23.2. It also
refactors how enum feature flags work: they should use the `featureUnset`, `featureDisabled`,
and `featureEnabled` states. For enum feature flags, unset and disabled can be considered
equivalent (ie. for a cluster setting which is an enum, don't set a value). In the future,
this can be changed to add a specific case for disabled. For now, this suffices.

Closes: #116865
Release note: None
Epic: None

Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
Co-authored-by: Jayant Shrivastava <jayants@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants