[native] Fix recording of SUM type in PrometheusStatsReporter by lingbin · Pull Request #23622 · prestodb/presto

lingbin · 2024-09-11T13:41:45Z

For SUM type worker metrics, the corresponding type in
PrometheusStatsReporter is prometheus::Gauge.

For these metrics, each time they are recorded (via
RECORD_METRIC_VALUE), a "delta" is passed in, so Gauge::Increment()
should be used in PrometheusStatsReporter instead of Gauge::Set()
(which overwrites the old value).
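
To make the difference concrete, here is a minimal sketch. `FakeGauge` is a hypothetical stand-in that mimics only the `Set()`, `Increment()` and `Value()` calls of `prometheus::Gauge`; it is not the real prometheus-cpp class, and the helper names are invented for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for prometheus::Gauge, reduced to the calls
// discussed in this PR. Not the real prometheus-cpp implementation.
class FakeGauge {
 public:
  void Set(double value) { value_ = value; }         // overwrites the old value
  void Increment(double delta) { value_ += delta; }  // adds to the old value
  double Value() const { return value_; }

 private:
  double value_ = 0.0;
};

// Records every delta with Set(); only the last delta survives.
double finalValueWithSet(const std::vector<double>& deltas) {
  assert(!deltas.empty());  // the example expects at least one recording
  FakeGauge gauge;
  for (double delta : deltas) {
    gauge.Set(delta);
  }
  return gauge.Value();
}

// Records every delta with Increment(); the gauge holds the running sum.
double finalValueWithIncrement(const std::vector<double>& deltas) {
  assert(!deltas.empty());
  FakeGauge gauge;
  for (double delta : deltas) {
    gauge.Increment(delta);
  }
  return gauge.Value();
}
```

With the deltas 2, 3 and 5, the `Set()` variant ends at 5 while the `Increment()` variant ends at 10.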

== NO RELEASE NOTE ==

lingbin · 2024-09-11T13:45:34Z

@majetideepak Could you please help to take a look? Thanks.

majetideepak · 2024-09-12T09:50:19Z

@lingbin thanks for the fix!
@karteekmurthys can you help review this?

aditi-pandit · 2024-09-12T10:53:36Z

@lingbin: Thanks for this fix. Would it be possible to write a test for it?


lingbin · 2024-09-12T14:16:04Z

@aditi-pandit A unit test has been added, and the branch has been rebased. Please take another look, thanks.

karteekmurthys · 2024-09-12T17:03:40Z

+    }
+    case velox::StatType::SUM: {
+      auto* gauge = reinterpret_cast<::prometheus::Gauge*>(statsInfo.metricPtr);
+      gauge->Increment(static_cast<double>(value));


The intent is to capture the change in value since the last time it was recorded. You can compute the cumulative sum in a visualization tool like Grafana. If we increment the value, the metric is as good as a COUNT type. Please refer to this doc, which explains delta vs. cumulative SUM capture: https://opentelemetry.io/docs/specs/otel/metrics/data-model/#sums

We are capturing the delta and we don't want to increment here.

@karteekmurthys
Thank you for your explanation, I benefited a lot.

However, for Prometheus, I feel that when we use the "pull mode" (through the fetchMetrics() interface), it is impossible to accurately express the semantics of "SUM capture Delta".

I can give an example. If a metric is updated every 1 second but we pull the full set of metrics every 5 seconds, we will lose information if we use "SUM capture Delta" in Prometheus.
For simplicity, suppose the delta value generated every second is 2:

    t0  t1  t2  t3  t4  t5 ...
    2,  2,  2,  2,  2,  2 ...
    |                   |
    v                   v
    pull_1              pull_2

As a result, only 2 points, (t0, 2) and (t5, 2), are saved to Prometheus. The values at t1, t2, t3, and t4 are lost, so we are NOT able to compute the "cumulative sum" anymore (we expect 12 but can only get 4).
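
The arithmetic in this example can be checked with a small simulation. This is only a sketch under the assumptions stated in the comment: one recording per second, a pull every `scrapeEvery` seconds starting at t0, and a plain `double` standing in for the gauge. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Values seen by the scraper when each recording overwrites the gauge
// (the Gauge::Set() behaviour).
std::vector<double> scrapeOverwrittenGauge(const std::vector<double>& deltas,
                                           std::size_t scrapeEvery) {
  assert(scrapeEvery > 0);
  std::vector<double> scraped;
  double gauge = 0.0;
  for (std::size_t t = 0; t < deltas.size(); ++t) {
    gauge = deltas[t];  // overwrite with the latest delta
    if (t % scrapeEvery == 0) {
      scraped.push_back(gauge);  // a pull happens at this second
    }
  }
  return scraped;
}

// Values seen by the scraper when each recording is added to the gauge
// (the Gauge::Increment() behaviour).
std::vector<double> scrapeAccumulatedGauge(const std::vector<double>& deltas,
                                           std::size_t scrapeEvery) {
  assert(scrapeEvery > 0);
  std::vector<double> scraped;
  double gauge = 0.0;
  for (std::size_t t = 0; t < deltas.size(); ++t) {
    gauge += deltas[t];  // accumulate the delta
    if (t % scrapeEvery == 0) {
      scraped.push_back(gauge);
    }
  }
  return scraped;
}

double sumOf(const std::vector<double>& values) {
  return std::accumulate(values.begin(), values.end(), 0.0);
}
```

For six deltas of 2 and a pull every 5 seconds, the overwritten gauge yields the samples 2 and 2 (sum 4), while the accumulated gauge yields 2 and 12, so the last sample already carries the full total.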

So, in my opinion, in Prometheus's "pull mode", we can only use the "Cumulative SUM" method.

Maybe I missed something, please correct me if I am wrong.

That is a good point @lingbin. We can minimize the data loss by setting the Prometheus scrape interval to 1s or 2s. Of course, this doesn't guarantee no data loss.

If we capture the cumulative sum, we can only answer questions like "what is the total bandwidth/cpu/iops?". In query performance analysis we would like to see the variation of data points when a specific query was run, like "how much of a resource did query 1 consume vs. query 2?".
Maybe this can be solved by some function on the visualization tool side, but the accuracy of the trend will again depend on the scrape interval.
eg:

    Time:       0 | 1 | 2  | 3  | 4  | 5 | 6
    Event:      2 | 3 | 5  | 1  | 5  ....
    Cumulative: 2 | 5 | 10 | 11 | 16 ....

Let's say Q1 ran in interval t0 to t1 and, if we report only the cumulative sum, we read the metric as 2. Let's say Q2 ran in interval t3 to t4 (Event value = 1) but the cumulative sum is 16, assuming your Prometheus scrape interval is 4s. Let's say that in the visualization tool you used some form of delta function that gives the delta from the previously recorded metric. So we see {2 - 0, 16 - 2} = {2, 14}. But Q2 didn't consume 14 units of that resource.
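
The {2, 14} result can be reproduced with a short sketch. It assumes one event per second, a scrape every `scrapeEvery` seconds starting at t0, and a delta function that subtracts the previous scraped value (starting from 0). The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Accumulates events into a cumulative value and returns the difference
// between consecutive scraped values, as a visualization-side delta
// function would compute it.
std::vector<double> deltasBetweenScrapes(const std::vector<double>& events,
                                         std::size_t scrapeEvery) {
  assert(scrapeEvery > 0);
  std::vector<double> deltas;
  double cumulative = 0.0;
  double lastScraped = 0.0;
  for (std::size_t t = 0; t < events.size(); ++t) {
    cumulative += events[t];
    if (t % scrapeEvery == 0) {
      deltas.push_back(cumulative - lastScraped);
      lastScraped = cumulative;
    }
  }
  return deltas;
}
```

Called with the events 2, 3, 5, 1, 5 and a scrape interval of 4, this returns 2 and 14: the second value covers everything recorded between the two scrapes, not just the event at t4.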

Reporting cumulative vs. delta sums both have similar pros/cons. I would like to hear your thoughts on this.

The scrape interval should also be chosen with your system in mind. In our case, it is Presto, which is meant to run analytic workloads where a single query can run for several seconds or minutes. So scraping values every 2 or 3 seconds should give a decent picture of what is happening.

@karteekmurthys Thanks for your further explanation.

Some metrics are updated much more frequently than the "scrape interval", right? For example, they may be updated many times per second (depending on the number of events). I think there must be such metrics among those that are not updated periodically by PeriodicStatsReporter.

Even more unfortunately, for metrics with longer update periods, reducing the "scrape interval" to a very small value will only give more wrong results.

For example, let's assume a scenario where we scrape every 1 second, but the metric is updated every 4 seconds.

    Time:          0 | 1 | 2 | 3 | 4 | 5 | 6 ...
    Event:         2 | - | - | - | 5 | - | - ...
    Cumulative:    2 | 2 | 2 | 2 | 7 | 7 | 7 ...
    scraped-value: 2 | 2 | 2 | 2 | 5 | 5 | 5 ...

In this case, we can no longer calculate the "Cumulative SUM" correctly from the scraped values, and we still cannot accurately express the "delta".
- At time t6, we expect the SUM to be 7 (2+5), but we get 23 (2+2+2+2+5+5+5).
- At times t1, t2, t3, we expect the DELTA to be 0, but we get 2.

If we see such a chart in the monitoring system, it will definitely be very misleading.

We expect to use Prometheus as a monitoring tool for online production systems. Because the results described above would be seriously misleading, I think we can only use the "Cumulative SUM" method. What do you think?
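
The numbers in this scenario can be reproduced with the following sketch. It assumes a scrape on every one-second tick and models the gauge as a plain `double`; the `Tick` struct and the function names are hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// One second of the timeline: whether the metric was recorded in that
// second and, if so, with which delta.
struct Tick {
  bool updated;
  double delta;
};

// Scrapes the gauge once per tick. With accumulate == false the gauge is
// overwritten by each delta (Set); with accumulate == true the delta is
// added to it (Increment).
std::vector<double> scrapeEverySecond(const std::vector<Tick>& ticks,
                                      bool accumulate) {
  std::vector<double> scraped;
  double gauge = 0.0;
  for (const Tick& tick : ticks) {
    if (tick.updated) {
      gauge = accumulate ? gauge + tick.delta : tick.delta;
    }
    scraped.push_back(gauge);  // the gauge keeps its value between updates
  }
  return scraped;
}

double total(const std::vector<double>& values) {
  return std::accumulate(values.begin(), values.end(), 0.0);
}

double lastValue(const std::vector<double>& values) {
  assert(!values.empty());
  return values.back();
}
```

With updates of 2 at t0 and 5 at t4 over seven ticks, summing the overwritten samples gives 23, whereas the last accumulated sample is 7, which matches the expected SUM.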

After switching to the "Cumulative SUM" method, if we need to calculate a "delta", we can use the rate() function in Prometheus, which calculates the per-second average rate of increase of the time series.
Admittedly, it cannot express the exact delta since the last time the metric was recorded.
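
As a rough illustration of what an average per-second rate over cumulative samples means, here is a simplified sketch. It is not Prometheus's actual rate() algorithm, which also handles counter resets and extrapolates to the window boundaries; the `Sample` struct and function names are hypothetical.

```cpp
#include <cassert>

// A scraped sample of a cumulative metric.
struct Sample {
  double timeSec;
  double value;
};

// Total increase between two samples of a cumulative series.
double increaseBetween(const Sample& first, const Sample& last) {
  return last.value - first.value;
}

// Average per-second rate of increase between two samples.
double averageRatePerSecond(const Sample& first, const Sample& last) {
  assert(last.timeSec > first.timeSec);  // samples must be in time order
  return increaseBetween(first, last) / (last.timeSec - first.timeSec);
}
```

For cumulative samples of 2 at t0 and 16 at t4, the increase is 14 and the average rate is 3.5 per second. The per-event deltas inside that window are not recoverable from the two samples.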

aditi-pandit

Thanks @lingbin

lingbin requested a review from a team as a code owner September 11, 2024 13:41

majetideepak requested a review from karteekmurthys September 12, 2024 09:43

lingbin force-pushed the fix-recording-worker-metrics-of-sum-type branch from af2f976 to f5cfdf1 Compare September 12, 2024 14:11

lingbin force-pushed the fix-recording-worker-metrics-of-sum-type branch from f5cfdf1 to 7896180 Compare September 12, 2024 14:12

karteekmurthys reviewed Sep 12, 2024


karteekmurthys approved these changes Sep 16, 2024


majetideepak approved these changes Sep 17, 2024


aditi-pandit approved these changes Sep 18, 2024


aditi-pandit merged commit ba811b9 into prestodb:master Sep 18, 2024

lingbin mentioned this pull request Nov 5, 2025

fix(native): Fix OS metrics to report cumulative values for AVG type #26517

Merged

aditi-pandit merged 1 commit into prestodb:master from lingbin:fix-recording-worker-metrics-of-sum-type


4 participants
