metrics: introduce CounterFloat64 and use for tenant RUs#96109
Conversation
a732fae to
5c6944b
Compare
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @abarganier, @andy-kimball, @dhartunian, and @tbg)
pkg/ccl/multitenantccl/tenantcostserver/token_bucket.go line 125 at r1 (raw file):
// Report current consumption. metrics.totalRU.UpdateIfHigher(consumption.RU)
I think I'd prefer to let it update the value unconditionally.. If there's a bug (e.g. some kind of wraparound in an internal calculation), the IfHigher will hide what's going on (it will look like it's frozen)..
tbg
left a comment
There was a problem hiding this comment.
Having to Update a counter strikes me as weird. Counters are for counting! And we are counting. It's just not happening on the metric:
cockroach/pkg/ccl/multitenantccl/tenantcostclient/tenant_side.go
Lines 875 to 877 in ffc9606
It's probably not worth spending too much time on pushing the metrics update down; after all that field is on a protobuf (which doesn't mean we can't use it anyway but still).
Re: Update checking, I'm fine either way, I think it is a judgment call. Ideally we don't use counters like we do here often but if we do hopefully we're using them only in incrementing fashions, so UpdateIfHigher seems mildly preferrable. Yes there could be a bug where we're putting a bogus number in and lock it forever, but there could also be concurrency bugs where we wipe out the correct number frequently. It's anyone's guess which one is more likely, both seem pretty unlikely.
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @abarganier, @andy-kimball, and @dhartunian)
pkg/util/metric/aggmetric/counter.go line 219 at r1 (raw file):
// Unlink disconnects this Counter from its parents. Unlike Gauge.Destroy, it // does not decrement its value from its parent.
Would you mind matching this more closely to the (very recent) updated comment here?
cockroach/pkg/util/metric/aggmetric/gauge.go
Lines 230 to 238 in 7038e70
Code quote:
does not decrement its value from its parent5c6944b to
11597cb
Compare
dhartunian
left a comment
There was a problem hiding this comment.
TFTRs. My preference is to keep UpdateIfHigher for now. The original request was to switch to counters for nicer Prometheus treatment and I think the code should enforce those expectations around the data since decrementing counters will cause downstream problems.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @abarganier and @andy-kimball)
pkg/util/metric/aggmetric/counter.go line 219 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Would you mind matching this more closely to the (very recent) updated comment here?
cockroach/pkg/util/metric/aggmetric/gauge.go
Lines 230 to 238 in 7038e70
Done.
11597cb to
e963f97
Compare
Previously, we were prevented from using float64s directly in metric counters since counters were limited to ints. This led to the use of Gauges in situations where Counters would be preferable since we didn't have code to help manage a monotonically increasing float64 value. This commit introduces some helpers around atomically adding float64s together and increasing one monotonically. Those primitives are composed into a `CounterFloat64` type further used to construct `AggCounterFloat64` which can be used in place of `AggCounter` to track per-tenant metrics. The two `GaugeFloat64` types used for tenant `totalRU` and `totalKVRU` metrics are replaced with the new `CounterFloat64` type to properly reflect the fact that these are monotonically increasing values. This helps Prometheus when scraping these metrics to correctly account for missing data if necessary. Resolves cockroachdb#68291 Epic: CRDB0-14536 Release note: None
e963f97 to
f96b21f
Compare
|
bors r=RaduBerinde,tbg |
|
Build succeeded: |
PR cockroachdb#96109 failed to amend the hdr test files. Resolves cockroachdb#96909 Epic: None Release note: None
Previously, we were prevented from using float64s directly in metric counters since counters were limited to ints. This led to the use of Gauges in situations where Counters would be preferable since we didn't have code to help manage a monotonically increasing float64 value.
This commit introduces some helpers around atomically adding float64s together and increasing one monotonically. Those primitives are composed into a
CounterFloat64type further used to constructAggCounterFloat64which can be used in place ofAggCounterto track per-tenant metrics.The two
GaugeFloat64types used for tenanttotalRUandtotalKVRUmetrics are replaced with the newCounterFloat64type to properly reflect the fact that these are monotonically increasing values. This helps Prometheus when scraping these metrics to correctly account for missing data if necessary.Resolves #68291
Epic: CRDB0-14536
Release note: None