Skip to content

metrics: introduce CounterFloat64 and use for tenant RUs#96109

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
dhartunian:add-counterfloat64-for-tenant-metrics
Feb 9, 2023
Merged

metrics: introduce CounterFloat64 and use for tenant RUs#96109
craig[bot] merged 1 commit intocockroachdb:masterfrom
dhartunian:add-counterfloat64-for-tenant-metrics

Conversation

@dhartunian
Copy link
Copy Markdown
Collaborator

Previously, we were prevented from using float64s directly in metric counters since counters were limited to ints. This led to the use of Gauges in situations where Counters would be preferable since we didn't have code to help manage a monotonically increasing float64 value.

This commit introduces some helpers around atomically adding float64s together and increasing one monotonically. Those primitives are composed into a CounterFloat64 type further used to construct AggCounterFloat64 which can be used in place of AggCounter to track per-tenant metrics.

The two GaugeFloat64 types used for tenant totalRU and totalKVRU metrics are replaced with the new CounterFloat64 type to properly reflect the fact that these are monotonically increasing values. This helps Prometheus when scraping these metrics to correctly account for missing data if necessary.

Resolves #68291

Epic: CRDB0-14536

Release note: None

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@dhartunian dhartunian marked this pull request as ready for review January 27, 2023 18:16
@dhartunian dhartunian requested a review from a team January 27, 2023 18:16
@dhartunian dhartunian requested a review from a team as a code owner January 27, 2023 18:16
@dhartunian dhartunian force-pushed the add-counterfloat64-for-tenant-metrics branch from a732fae to 5c6944b Compare January 27, 2023 18:18
@dhartunian dhartunian requested a review from tbg January 27, 2023 18:18
Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @abarganier, @andy-kimball, @dhartunian, and @tbg)


pkg/ccl/multitenantccl/tenantcostserver/token_bucket.go line 125 at r1 (raw file):

	// Report current consumption.
	metrics.totalRU.UpdateIfHigher(consumption.RU)

I think I'd prefer to let it update the value unconditionally.. If there's a bug (e.g. some kind of wraparound in an internal calculation), the IfHigher will hide what's going on (it will look like it's frozen)..

Copy link
Copy Markdown
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having to Update a counter strikes me as weird. Counters are for counting! And we are counting. It's just not happening on the metric:

if c.shouldAccountForExternalIORUs() {
c.mu.consumption.RU += float64(totalRU)
}

It's probably not worth spending too much time on pushing the metrics update down; after all that field is on a protobuf (which doesn't mean we can't use it anyway but still).

Re: Update checking, I'm fine either way, I think it is a judgment call. Ideally we don't use counters like we do here often but if we do hopefully we're using them only in incrementing fashions, so UpdateIfHigher seems mildly preferrable. Yes there could be a bug where we're putting a bogus number in and lock it forever, but there could also be concurrency bugs where we wipe out the correct number frequently. It's anyone's guess which one is more likely, both seem pretty unlikely.

:lgtm:

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @abarganier, @andy-kimball, and @dhartunian)


pkg/util/metric/aggmetric/counter.go line 219 at r1 (raw file):

// Unlink disconnects this Counter from its parents. Unlike Gauge.Destroy, it
// does not decrement its value from its parent.

Would you mind matching this more closely to the (very recent) updated comment here?

// Unlink unlinks this child from the parent, i.e. the parent will no longer
// track this child (i.e. won't generate labels for it, etc). However, the child
// will continue to be functional and reference the parent, meaning updates to
// it will be reflected in the aggregate stored in the parent.
//
// See tenantrate.TestUseAfterRelease.
func (g *GaugeFloat64) Unlink() {
g.parent.remove(g)
}

Code quote:

does not decrement its value from its parent

@dhartunian dhartunian force-pushed the add-counterfloat64-for-tenant-metrics branch from 5c6944b to 11597cb Compare February 7, 2023 19:32
Copy link
Copy Markdown
Collaborator Author

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs. My preference is to keep UpdateIfHigher for now. The original request was to switch to counters for nicer Prometheus treatment and I think the code should enforce those expectations around the data since decrementing counters will cause downstream problems.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @abarganier and @andy-kimball)


pkg/util/metric/aggmetric/counter.go line 219 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Would you mind matching this more closely to the (very recent) updated comment here?

// Unlink unlinks this child from the parent, i.e. the parent will no longer
// track this child (i.e. won't generate labels for it, etc). However, the child
// will continue to be functional and reference the parent, meaning updates to
// it will be reflected in the aggregate stored in the parent.
//
// See tenantrate.TestUseAfterRelease.
func (g *GaugeFloat64) Unlink() {
g.parent.remove(g)
}

Done.

@dhartunian dhartunian force-pushed the add-counterfloat64-for-tenant-metrics branch from 11597cb to e963f97 Compare February 8, 2023 22:05
Previously, we were prevented from using float64s directly in metric counters
since counters were limited to ints. This led to the use of Gauges in
situations where Counters would be preferable since we didn't have code to help
manage a monotonically increasing float64 value.

This commit introduces some helpers around atomically adding float64s together
and increasing one monotonically. Those primitives are composed into a
`CounterFloat64` type further used to construct `AggCounterFloat64` which can
be used in place of `AggCounter` to track per-tenant metrics.

The two `GaugeFloat64` types used for tenant `totalRU` and `totalKVRU` metrics
are replaced with the new `CounterFloat64` type to properly reflect the fact
that these are monotonically increasing values. This helps Prometheus when
scraping these metrics to correctly account for missing data if necessary.

Resolves cockroachdb#68291

Epic: CRDB0-14536

Release note: None
@dhartunian dhartunian force-pushed the add-counterfloat64-for-tenant-metrics branch from e963f97 to f96b21f Compare February 9, 2023 16:17
@dhartunian
Copy link
Copy Markdown
Collaborator Author

bors r=RaduBerinde,tbg

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 9, 2023

Build succeeded:

@craig craig bot merged commit faeebee into cockroachdb:master Feb 9, 2023
dhartunian added a commit to dhartunian/cockroach that referenced this pull request Feb 10, 2023
PR cockroachdb#96109 failed to amend the hdr test files.

Resolves cockroachdb#96909

Epic: None

Release note: None
craig bot pushed a commit that referenced this pull request Feb 10, 2023
96921: metrics: fix agg metrics hdr test files r=dt a=dhartunian

PR #96109 failed to amend the hdr test files.

Resolves #96909

Epic: None

Release note: None

Co-authored-by: David Hartunian <davidh@cockroachlabs.com>
@dhartunian dhartunian deleted the add-counterfloat64-for-tenant-metrics branch February 5, 2024 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tenantcostserver: use counters instead of gauges for tenant usage metrics

4 participants