kvserver: add lease preference metrics#107505
Conversation
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Previously, `ConjunctionsCheck` took a store descriptor as an input. However, it only needed to know the store/node attributes and locality. Some upcoming callers (lease acquisition) can't easily construct a full store descriptor since the locking order would cause deadlocks. This patch changes it to only take the attributes and locality instead of the entire store descriptor, and renames it to `CheckConjunction()`. It also adds `CheckStoreConjunction()` as a convenience method that takes a store descriptor, and migrates all existing callers. Epic: none Release note: None
f26c552 to
8ef4e36
Compare
erikgrinaker
left a comment
There was a problem hiding this comment.
Reviewed 7 of 7 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @kvoli)
pkg/kv/kvserver/replica_range_lease.go line 1545 at r2 (raw file):
// leasePreferencesViolating indicates the leaseholder does not // satisfy any lease preference applied. leasePreferencesViolating leasePreferencesStatus = iota
Let's add a leasePreferencesUnknown zero value.
pkg/kv/kvserver/replica_range_lease.go line 1586 at r2 (raw file):
) leasePreferencesStatus { if !leaseStatus.IsValid() || !leaseStatus.Lease.OwnedBy(storeID) { return leasePreferencesOK
This condition is a bit odd: even if some other node owns the lease, it may still violate the preferences.
I know it doesn't matter for the purposes of LeaseViolatesPreferences(), but if this is intended to be general, I think we should change this condition to return leasePreferencesUnknown. If the lease isn't valid, there may be a valid lease we don't know about, and if the lease is owned by a different store than the one we passed in attributes for then we can't know if it satisfies the preferences or not.
There were no existing metrics to monitor the lease preference conformance. This commit adds two metric gauges: `leases.preferences.violating` and `leases.preferences.less-preferred`. These metrics are reported by the store. `leases.preferences.violating` indicates the number of valid leases a store owns, which satisfy none of the preferences applied. `leases.preferences.less-preferred` indicates the number of valid leases a store owns, which satisfy some of the preferences applied, but not the first one. For example, with a lease preference `'[[+zone=a],[+zone=b]]'`, the metric values with different leaseholders are: ``` leaseholder_locality="zone=c" leases.preferences.less-preferred: 0 leases.preferences.violating: 1 ``` ``` leaseholder_locality="zone=b" leases.preferences.less-preferred: 1 leases.preferences.violating: 0 ``` ``` leaseholder_locality="zone=a" leases.preferences.less-preferred: 0 leases.preferences.violating: 0 ``` When no preferences are applied, the lease is not counted in either metric. Epic: none Informs: cockroachdb#106100 Release note (ops change): Introduce two new metrics to monitor lease range preference conformance. `leases.preferences.violating` indicates the number of valid leases a store owns, which satisfy none of the preferences applied. `leases.preferences.less-preferred` indicates the number of valid leases a store owns, which satisfy some of the preferences applied, but not the first one.
8ef4e36 to
e393afb
Compare
kvoli
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @erikgrinaker)
pkg/kv/kvserver/replica_range_lease.go line 1545 at r2 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Let's add a
leasePreferencesUnknownzero value.
Done.
pkg/kv/kvserver/replica_range_lease.go line 1586 at r2 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
This condition is a bit odd: even if some other node owns the lease, it may still violate the preferences.
I know it doesn't matter for the purposes of
LeaseViolatesPreferences(), but if this is intended to be general, I think we should change this condition to returnleasePreferencesUnknown. If the lease isn't valid, there may be a valid lease we don't know about, and if the lease is owned by a different store than the one we passed in attributes for then we can't know if it satisfies the preferences or not.
I see what you mean, good idea. Added in leasePreferencesUnknown.
|
TYFTR bors r=erikgrinaker |
|
This PR was included in a batch that was canceled, it will be automatically retried |
|
Build failed (retrying...): |
|
Build failed (retrying...): |
|
Build failed (retrying...): |
|
Build succeeded: |
105441: server: notify tenant servers of metadata changes r=yuzefovich,stevendanna a=knz Informs (towards fixing) #98431. Fixes #105721. Epic: CRDB-26691 See the individual commits for details. 106325: sql/parser: parse CREATE PROCEDURE r=mgartner a=mgartner #### sql/parser: parse CREATE PROCEDURE `CREATE PROCEDURE` statements can now be parsed by the SQL parser. Currently, executing `CREATE PROCEDURE` will produce an unimplemented error. Epic: CRDB-25388 Release note: None #### sem/tree: rename function AST nodes This commit renames UDF-related AST nodes with "function" in the name to use the more general term "routine". The term "routine" is inclusive of both UDFs and procedures (e.g., Postgres implements a `DROP ROUTINE` statement which drops both UDFs and procedures), which is fitting because we'll be using the same AST nodes for both `CREATE FUNCTION` and `CREATE PROCEDURE` statements. Release note: None #### sem/tree: rename udf.go to create_routine.go Release note: None 107470: server, ccl: adjust more tests to work with test tenant r=yuzefovich a=yuzefovich Informs: #76378. Epic: CRDB-18499. 107507: kvserver: transfer lease when acquiring lease outside preferences r=erikgrinaker,andrewbaptist a=kvoli First 2 commits are from #107505. When a leaseholder is lost, any surviving replica may acquire the lease, even if it violates lease preferences. There are two main reasons for this: we need to elect a new Raft leader who will acquire the lease, which is agnostic to lease preferences, and there may not even be any surviving replicas that satisfy the lease preferences at all, so we don't want to keep the range unavailable while we try to figure this out (considering e.g. network timeouts can delay this for many seconds). However, after acquiring a lease, we rely on the replicate queue to transfer the lease back to a replica that conforms with the preferences, which can take several minutes. In multi-region clusters, this can cause severe latency degradation if the lease is acquired in a remote region. This PR will detect lease preference violations when a replica acquires a new lease, and eagerly enqueue it in the replicate queue for transfer (if possible). If the first process fails, the replica will be sent to purgatory and retried soon after. Additionally, two roachtests are added for lease preference conformance timing. The first test, `.../partial-first-preference-down`, takes down one of the preferred locality nodes (1/2). The second test, `.../full-first-preference-down`, takes down both the preferred locality nodes (2/2). Resolves #106100. Epic: none Release note (bug fix): When losing a leaseholder and using lease preferences, the lease can be acquired by any other replica (regardless of lease preferences) in order to restore availability as soon as possible. The new leaseholder will now immediately check if it violates the lease preferences, and attempt to transfer the lease to a replica that satisfies the preferences if possible. 107512: sql: fix crdb_internal.encode_key in some contexts r=yuzefovich a=yuzefovich Previously, we forgot to set `CatalogBuiltins` for the internal planner which is used by `crdb_internal.encode_key` (which, in turn, is used to power `SHOW RANGE ... FOR ROW`), so evaluating it would lead to an error. For example, the internal planner is used in the backfill. This is now fixed. Fixes: #106397. Release note (bug fix): CockroachDB would previously return an error when using `SHOW RANGE ... FOR ROW ...` in `CREATE TABLE ... AS ...` construct, and this is now fixed. 107537: workload/ycsb: default --read-modify-write-in-txn to false r=erikgrinaker a=nvanbenschoten The `read-modify-write-in-txn` flag was added to ycsb in #103117. This commit changes the default of the flag from true to false. This makes the default configuration of ycsb more closely mirror the official ycsb implementation, to avoid confusion when external users run the workload and compare. We saw in #103117 and in #107517 that this improves throughput substantially. We will expect to see the same in roachperf once this change is merged. After merging the PR, I will add a roachperf annotation. Epic: None Release note: None 107549: persistedsqlstats: add AOST clause to statement statistics size check r=maryliag a=THardy98 Epic: None This change as an AOST clause to the statement statistics size check. This helps mitigate contention on the `system.statement_statistics` table which has caused the sql stats compaction job to fail. Release note: None Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net> Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Austen McClernon <austen@cockroachlabs.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Thomas Hardy <thardy@cockroachlabs.com>
Lease preference metrics were added in cockroachdb#107505, however were only visible in DB Console via custom metrics. This commit adds the `leases.preferences.violating` and `leases.preferences.less-preferred` metrics to the replication dashboard under a new lease preferences graph. Resolves: cockroachdb#116678 Release note (ui change): Introduce a new lease preferences graph on the replication dashboard. The lease preferences graph will display when the current leaseholder is not the first lease preference and where the current leaseholder satisfies no applied lease preference.
116709: ui: add lease preference metrics to replication dashboard r=andrewbaptist a=kvoli Lease preference metrics were added in #107505, however were only visible in DB Console via custom metrics. This commit adds the `leases.preferences.violating` and `leases.preferences.less-preferred` metrics to the replication dashboard under a new lease preferences graph. Resolves: #116678 Release note (ui change): Introduce a new lease preferences graph on the replication dashboard. The lease preferences graph will display when the current leaseholder is not the first lease preference and where the current leaseholder satisfies no applied lease preference. 116888: roachtest/cdc: disable distribution strategy in old versions r=jayshrivastava a=jayshrivastava oachtest/cdc: disable distribution strategy in old versions The `DistributionStrategy` feature flag was added to randomly set the `changefeed.default_range_distribution_strategy` cluster setting. Since this setting was added in 23.2, this feature flag should be disabled when testing versions < 23.2. This change updates the feature flag to not be set in versions < 23.2. It also refactors how enum feature flags work: they should use the `featureUnset`, `featureDisabled`, and `featureEnabled` states. For enum feature flags, unset and disabled can be considered equivalent (ie. for a cluster setting which is an enum, don't set a value). In the future, this can be changed to add a specific case for disabled. For now, this suffices. Closes: #116865 Release note: None Epic: None Co-authored-by: Austen McClernon <austen@cockroachlabs.com> Co-authored-by: Jayant Shrivastava <jayants@cockroachlabs.com>
There were no existing metrics to monitor the lease preference
conformance. This commit adds two metric gauges:
leases.preferences.violatingandleases.preferences.less-preferred.These metrics are reported by the store.
leases.preferences.violatingindicates the number of valid leases astore owns, which satisfy none of the preferences applied.
leases.preferences.less-preferredindicates the number of valid leasesa store owns, which satisfy some of the preferences applied, but not the
first one.
For example, with a lease preference
'[[+zone=a],[+zone=b]]', themetric values with different leaseholders are:
When no preferences are applied, the lease is not counted in either
metric.
Epic: none
Informs: #106100
Release note (ops change): Introduce two new metrics to monitor lease
range preference conformance.
leases.preferences.violatingindicatesthe number of valid leases a store owns, which satisfy none of the
preferences applied.
leases.preferences.less-preferredindicates thenumber of valid leases a store owns, which satisfy some of the
preferences applied, but not the first one.