
server/status: selective metric export#79021

Merged
craig[bot] merged 1 commit into cockroachdb:master from darinpp:prometheus-exporter-select
Apr 5, 2022

Conversation

@darinpp
Contributor

@darinpp darinpp commented Mar 30, 2022

Previously PrometheusExporter could only export all the metrics in a
registry, without the ability to select a subset. For serverless we use a
separate metric endpoint (_status/load) that currently shows CPU
utilization metrics that are generated each time the metrics are
pulled. However, we need some additional metrics that are currently
tracked by MetricRecorder. Exporting all the metrics tracked by the
MetricRecorder is not desirable, as this incurs a performance penalty
given the higher poll rate on the load endpoint.
So this PR modifies PrometheusExporter to scrape only a subset of
the metrics.
A second change is how locking is done when scraping and writing the
scraped output. Previously, locking was external and the responsibility
of the caller. This PR adds a ScrapeAndPrintAsText method to the
exporter that is thread-safe and does the locking internally.

Release justification: Low risk, high reward changes to existing functionality
Release note: None

Jira issue: CRDB-14803

@darinpp darinpp requested a review from a team March 30, 2022 04:11
@darinpp darinpp requested review from a team as code owners March 30, 2022 04:11
@cockroach-teamcity
Member

This change is Reviewable

Contributor

@abarganier abarganier left a comment


Nice - looking good. A couple small comments but otherwise :lgtm:

As an aside, @darinpp you mentioned elsewhere the idea of updating /_status/vars to take in a metric filter argument. This would allow us to do away with /_status/load altogether in favor of using the existing endpoint with a filter. I'm curious - do we expect the metric set required by /_status/load for use by the auto-scaler to change somewhat frequently? If so, your suggested approach sounds like it could enable much faster iteration cycles when it comes to tweaking the metrics used by the auto-scaler.

cc @dhartunian @rimadeodhar, curious to hear your thoughts on this. If ya'll think it's a good idea, we can create an issue.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @darinpp and @knz)


pkg/util/metric/prometheus_exporter.go, line 25 at r1 (raw file):

// PrometheusExporter contains a map of metric families (a metric with multiple labels).
// It initializes each metric family once and reuses it for each prometheus scrape.
// It is NOT thread-safe.

nit: does this need to be updated? Can we also add some information about the selection functionality?

Code quote:

// It is NOT thread-safe.

pkg/util/metric/prometheus_exporter.go, line 110 at r1 (raw file):

// prometheus' text format. It removes individual metrics from the families
// as it goes, readying the families for another round of registry additions.
func (pm *PrometheusExporter) PrintAsText(w io.Writer) error {

Do we still need this function? Or can we remove it from the *PrometheusExporter's exported API now that we have ScrapeAndPrintAsText?

Code quote:

func (pm *PrometheusExporter) PrintAsText(w io.Writer) error {

@abarganier
Contributor

cc @dhartunian @rimadeodhar, curious to hear your thoughts on this. If ya'll think it's a good idea, we can create an issue.

Just an update - we discussed this during our team weekly and think it's a good idea. Let's continue the discussion here: #79388

Previously PrometheusExporter could only export all the metrics in a
registry, without the ability to select a subset. For serverless we use a
separate metric endpoint (_status/load) that currently shows CPU
utilization metrics that are generated each time the metrics are
pulled. However, we need some additional metrics that are currently
tracked by MetricRecorder. Exporting all the metrics tracked by the
MetricRecorder is not desirable, as this incurs a performance penalty
given the higher poll rate on the load endpoint.
So this PR modifies PrometheusExporter to scrape only a subset of
the metrics.
A second change is how locking is done when scraping and writing the
scraped output. Previously, locking was external and the responsibility
of the caller. This PR adds a ScrapeAndPrintAsText method to the
exporter that is thread-safe and does the locking internally.

Release justification: Low risk, high reward changes to existing functionality
Release note: None
@darinpp darinpp force-pushed the prometheus-exporter-select branch from 981e3a8 to a3690d8 Compare April 4, 2022 22:13
@darinpp darinpp requested a review from a team as a code owner April 4, 2022 22:13
Contributor Author

@darinpp darinpp left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier and @knz)


pkg/util/metric/prometheus_exporter.go, line 25 at r1 (raw file):

Previously, abarganier (Alex Barganier) wrote…

nit: does this need to be updated? Can we also add some information about the selection functionality?

Changed and added a comment about selecting metrics.


pkg/util/metric/prometheus_exporter.go, line 110 at r1 (raw file):

Previously, abarganier (Alex Barganier) wrote…

Do we still need this function? Or can we remove it from the *PrometheusExporter's exported API now that we have ScrapeAndPrintAsText?

I made it private and changed the few remaining places that were calling it directly.

@darinpp
Contributor Author

darinpp commented Apr 4, 2022

bors r+

@darinpp
Contributor Author

darinpp commented Apr 4, 2022

As an aside, @darinpp you mentioned elsewhere the idea of updating /_status/vars to take in a metric filter argument. This would allow us to do away with /_status/load altogether in favor of using the existing endpoint with a filter. I'm curious - do we expect the metric set required by /_status/load for use by the auto-scaler to change somewhat frequently? If so, your suggested approach sounds like it could enable much faster iteration cycles when it comes to tweaking the metrics used by the auto-scaler.

I don't expect the metrics to change frequently. For the most part, the metrics didn't change with the auto-scaler improvements. We just needed to fetch them quicker and with less delay. The one new metric that we had to add is the number of jobs that are running and not reporting idle.

@craig
Contributor

craig bot commented Apr 5, 2022

Build failed (retrying...):

@darinpp
Contributor Author

darinpp commented Apr 5, 2022

bors r+

@craig
Contributor

craig bot commented Apr 5, 2022

Already running a review

@craig
Contributor

craig bot commented Apr 5, 2022

Build succeeded:

@craig craig bot merged commit 5d5d0c9 into cockroachdb:master Apr 5, 2022
@darinpp darinpp deleted the prometheus-exporter-select branch April 5, 2022 15:41
craig bot pushed a commit that referenced this pull request Apr 5, 2022
…79409 #79427 #79428 #79433 #79444

76312: kvserver, batcheval: pin Engine state during read-only command evaluation r=aayushshah15 a=aayushshah15

This commit makes it such that we eagerly pin the engine state of the `Reader`
created during the evaluation of read-only requests.

Generally, reads will hold latches throughout the course of their evaluation
(particularly, while they do their `MVCCScan`). Mainly, this commit paves the
way for us to move to a world where we avoid holding latches during the
MVCCScan. Additionally, it also lets us make MVCC garbage collection latchless
as described in #55293.

There are a few notable changes in this patch:

1. Pinning the engine state eagerly runs into #70974. To resolve this, the
closed timestamp of the `Replica` is now captured at the time the `EvalContext`
is created, and not during the command evaluation of
`QueryResolvedTimestampRequest`.

2. `EvalContext` now has a `ImmutableEvalContext` embedded into it. The
`ImmutableEvalContext` is supposed to encapsulate state that must not change
after the `EvalContext` is created. The closed timestamp of the replica is part
of the `ImmutableEvalContext`.

3. `Replica` no longer fully implements the `EvalContext` interface. Instead,
it implements everything but `GetClosedTimestamp()` (which is implemented by
`ImmutableEvalContext` instead).

Relates to #55293
Resolves #55461
Resolves #70974

Release note: None
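The `ImmutableEvalContext` embedding described in points 2 and 3 can be sketched in a few lines of Go. This is a hypothetical illustration, not the real kvserver code: the type and field names are assumptions, but it shows the pattern of capturing the closed timestamp once, at `EvalContext` creation, so later advances of the replica's closed timestamp cannot leak into an evaluation that has already pinned its engine state.

```go
package main

import "fmt"

// Timestamp is a minimal stand-in for the real HLC timestamp type.
type Timestamp struct{ WallTime int64 }

// ImmutableEvalContext encapsulates state that must not change after
// the EvalContext is created.
type ImmutableEvalContext struct {
	closedTS Timestamp
}

func (c ImmutableEvalContext) GetClosedTimestamp() Timestamp { return c.closedTS }

// Replica exposes a live, continually advancing closed timestamp.
type Replica struct{ closedTS Timestamp }

// EvalContext embeds the immutable snapshot. It deliberately does NOT
// forward GetClosedTimestamp to the live Replica, mirroring how the
// real Replica no longer fully implements the EvalContext interface.
type EvalContext struct {
	ImmutableEvalContext
	r *Replica
}

// NewEvalContext captures the closed timestamp at creation time.
func NewEvalContext(r *Replica) EvalContext {
	return EvalContext{ImmutableEvalContext{closedTS: r.closedTS}, r}
}

func main() {
	r := &Replica{closedTS: Timestamp{WallTime: 100}}
	ec := NewEvalContext(r)
	r.closedTS = Timestamp{WallTime: 200} // the replica advances later...
	fmt.Println(ec.GetClosedTimestamp().WallTime) // ...but evaluation still sees 100
}
```

Because `GetClosedTimestamp` resolves to the embedded snapshot, command evaluation (e.g. `QueryResolvedTimestampRequest`) observes a value consistent with the pinned engine state.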


78652: sql: implement to_reg* builtins r=otan a=e-mbrown

Resolves #77838
This commit implements the `to_regclass`, `to_regnamespace`, `to_regproc`,
`to_regprocedure`, `to_regrole`, and `to_regtype` builtins.

Release note (<category, see below>): The `to_regclass`, `to_regnamespace`, `to_regproc`,
`to_regprocedure`, `to_regrole`, and `to_regtype` builtin functions are now supported,
improving compatibility with PostgreSQL.

79022: server/status: add running non-idle jobs metric r=darinpp a=darinpp

Previously serverless was using the sql jobs running metric to determine
if a tenant process is idle and can be shut down. With the introduction
of continuously running jobs this isn't a good indicator anymore. A
recent addition is a per-job metric that shows running or idle. The auto
scaler doesn't care about the individual jobs and only cares about the
total number of jobs that are running but haven't reported as being idle.
The poll rate is also very high, so retrieving all the individual
running/idle metrics for each job type isn't optimal. So this PR adds a
single metric that just aggregates and tracks the total count of jobs
running and not idle.

Release justification: Bug fixes and low-risk updates to new functionality
Release note: None

Will be re-based once #79021 is merged
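The aggregation in #79022 can be pictured with a short sketch. This is illustrative only, with assumed names, not the real jobs-registry code: per-job running/idle state is folded into one counter so the high-frequency auto-scaler poll reads a single gauge instead of per-job-type metrics.

```go
package main

import (
	"fmt"
	"sync"
)

type jobState struct{ running, idle bool }

// JobsMetrics tracks per-job state and exposes one aggregated gauge.
type JobsMetrics struct {
	mu   sync.Mutex
	jobs map[int64]jobState
}

func NewJobsMetrics() *JobsMetrics {
	return &JobsMetrics{jobs: map[int64]jobState{}}
}

// Set records the latest running/idle state for a job.
func (m *JobsMetrics) Set(id int64, running, idle bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.jobs[id] = jobState{running, idle}
}

// RunningNonIdle is the single aggregated count the auto-scaler polls:
// jobs that are running but have not reported as being idle.
func (m *JobsMetrics) RunningNonIdle() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	n := 0
	for _, s := range m.jobs {
		if s.running && !s.idle {
			n++
		}
	}
	return n
}

func main() {
	m := NewJobsMetrics()
	m.Set(1, true, false)  // running and active
	m.Set(2, true, true)   // continuously running, but idle
	m.Set(3, false, false) // not running
	fmt.Println(m.RunningNonIdle()) // prints 1
}
```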

79157: cli: tweak slow decommission message r=knz a=cameronnunez

Release note: None

79313: opt: do not push LIMIT into the scan of a virtual table r=msirek a=msirek

Fixes #78578

Previously, a LIMIT operation could be pushed into the scan of a virtual
table with an ORDER BY clause.              

This was inadequate because in-order scans of virtual indexes aren't
supported. When an index that should provide the order requested by a
query is used, a sort is actually produced under the covers:
```
EXPLAIN(vec)
SELECT oid, typname FROM pg_type ORDER BY OID;
               info
----------------------------------
  │
  └ Node 1
    └ *colexec.sortOp
      └ *sql.planNodeToRowSource

```
Functions `CanLimitFilteredScan` and `GenerateLimitedScans` are modified
to avoid pushing LIMIT operations into ordered scans of virtual indexes. 

Release justification: Low risk fix for incorrect results in queries
involving virtual system tables.

Release note (bug fix): LIMIT queries with an ORDER BY clause that scan
the index of a virtual system table, such as `pg_type`, could
previously return incorrect results. This is corrected by teaching the
optimizer that LIMIT operations cannot be pushed into ordered scans of
virtual indexes.


79346: ccl/sqlproxyccl: add rebalancer queue for connection rebalancing r=JeffSwenson a=jaylim-crl

#### ccl/sqlproxyccl: add rebalancer queue for rebalance requests 

This commit adds a rebalancer queue implementation to the balancer component.
The queue will be used for rebalance requests for the connection migration
work. This is done to ensure a centralized location that invokes the
TransferConnection method on the connection handles. Doing this also enables
us to limit the number of concurrent transfers within the proxy.

Release note: None

#### ccl/sqlproxyccl: run rebalancer queue processor in the background 

The previous commit added a rebalancer queue. This commit connects the queue to
the balancer, and runs the queue processor in the background. By default,
we limit the number of concurrent transfers to 100 at any point in time, and
each transfer will be retried up to 3 times.

Release note: None

Jira issue: CRDB-14727

79362: kv: remove stale comment in processOneChange r=nvanbenschoten a=nvanbenschoten

The comment was added in 2fb56bd and hasn't been accurate since 5178559.

Jira issue: CRDB-14753

79368: ccl/sqlproxyccl: include DRAINING pods in the directory cache r=JeffSwenson a=jaylim-crl

Previously, #67452 removed DRAINING pods from the directory cache. This commit
adds that back. The connector will now need to filter for RUNNING pods manually
before invoking the balancer. This is needed so that we can track DRAINING
pods, and wait until 60 seconds have elapsed before transferring connections
away from them. To support that, we also update the Pod's proto definition to
include a StateTimestamp field to represent the timestamp at which the state
field was last updated.

The plan is to have a polling mechanism every X seconds to check DRAINING pods,
and use that information to start migrating connections.

Release note: None

Jira issue: CRDB-14759

79386: colexec: remove redundant benchmarks r=yuzefovich a=yuzefovich

This commit finishes the transition of some of the benchmarks in the
colexec package started in 22.1 cycle.

Fixes: #75106.

Release note: None

Jira issue: CRDB-14783

79409: sql: refactor deps tests to use bazel r=yuzefovich a=yuzefovich

This commit refactors most `VerifyNoImports` dependency tests in the sql
folder to use the newly introduced bazel test utilities.

Release note: None

Jira issue: CRDB-14814

79427: backupccl: allow cluster restore from different tenant r=dt a=stevendanna

This removes a prohibition for cluster restores with mismatched tenant
IDs since we believe they are now correct as of #73831

This allows users to take a cluster backup in a tenant and restore it
into another tenant.

The new tenant_settings table needs special care since it may exist in
the source tenant but not the target tenant when the source tenant is
the system tenant.

In this change, we throw an error in the case of a non-empty
tenant_settings table being restored into a non-system tenant. This is
a bit user-unfriendly since we detect this error rather late in the
restore process.

Release note: None

Jira issue: CRDB-14844

79428: backupccl: Refactor encryption utility functions into their own file. r=benbardin a=benbardin

Release note: None

Jira issue: CRDB-14845

79433: sql: use new ALTER TENANT syntax in tests r=stetvendanna a=rafiss

Release note: None

79444: roachtest: warmup follower-reads for fixed duration, not fixed number of ops r=nvanbenschoten a=nvanbenschoten

Fixes #78596.

This change switches the warmup phase of the follower-read roachtest
suite from running a fixed number of operations (100) to running for a
fixed duration (15s). This should ensure that the single-region variant
of the test is given sufficient time to warm up follower reads immediately
after one of its nodes is restarted.

Before this change, the single-region variant was only being given about
500ms after startup to catch up on the closed timestamp, which made the
test flaky.

Release justification: testing only

Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
Co-authored-by: e-mbrown <ebsonari@gmail.com>
Co-authored-by: Darin Peshev <darinp@gmail.com>
Co-authored-by: Cameron Nunez <cameron@cockroachlabs.com>
Co-authored-by: Mark Sirek <sirek@cockroachlabs.com>
Co-authored-by: Jay <jay@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Co-authored-by: Ben Bardin <bardin@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
craig bot pushed a commit that referenced this pull request Apr 6, 2022
79023: server/status: add load related metrics r=darinpp a=darinpp

Previously there were only CPU-related metrics available on the
_status/load endpoint. For serverless we need, in addition to these,
metrics showing the total number of current SQL connections, the
number of SQL queries executed, and the number of jobs currently running
that are not idle. This PR adds the three new metrics by using the selective
prometheus exporter and scraping the MetricsRecorder.

Release justification: Low risk, high reward changes to existing functionality

Release note: None

Will be re-based once #79021 and #79022 are merged.

Co-authored-by: Darin Peshev <darinp@gmail.com>
