Skip to content

[native] Fix recording of SUM type in PrometheusStatsReporter#23622

Merged
aditi-pandit merged 1 commit into
prestodb:masterfrom
lingbin:fix-recording-worker-metrics-of-sum-type
Sep 18, 2024
Merged

[native] Fix recording of SUM type in PrometheusStatsReporter#23622
aditi-pandit merged 1 commit into
prestodb:masterfrom
lingbin:fix-recording-worker-metrics-of-sum-type

Conversation

@lingbin

@lingbin lingbin commented Sep 11, 2024

Copy link
Copy Markdown
Contributor

For SUM type worker metrics, the corresponding type in
PrometheusStatsReporter is prometheus::Gauge.

For these metrics, each time they are recorded (via
RECORD_METRIC_VALUE), a "delta" is passed in, so Gauge::Increment()
should be used in PrometheusStatsReporter instead of Gauge::Set()
(which overwrites the old value).

== NO RELEASE NOTE ==

@lingbin lingbin requested a review from a team as a code owner September 11, 2024 13:41
@lingbin

lingbin commented Sep 11, 2024

Copy link
Copy Markdown
Contributor Author

@majetideepak Could you please help to take a look? Thanks.

@majetideepak

Copy link
Copy Markdown
Collaborator

@lingbin thanks for the fix!
@karteekmurthys can you help review this?

@aditi-pandit

Copy link
Copy Markdown
Contributor

@lingbin : Thanks for this fix. Would it be possible to write a test for it ?

@lingbin lingbin force-pushed the fix-recording-worker-metrics-of-sum-type branch from af2f976 to f5cfdf1 Compare September 12, 2024 14:11
For SUM type worker metrics, the corresponding type in
`PrometheusStatsReporter` is `prometheus::Gauge`.

For these metrics, each time they are recorded (via
`RECORD_METRIC_VALUE`), a "delta" is passed in, so `Gauge::Increment()`
should be used in `PrometheusStatsReporter` instead of `Gauge::Set()`
(which overwrites the old value).
@lingbin lingbin force-pushed the fix-recording-worker-metrics-of-sum-type branch from f5cfdf1 to 7896180 Compare September 12, 2024 14:12
@lingbin

lingbin commented Sep 12, 2024

Copy link
Copy Markdown
Contributor Author

@aditi-pandit A unit test has been added. And already rebased. Please take a look at it again, thanks.

}
case velox::StatType::SUM: {
auto* gauge = reinterpret_cast<::prometheus::Gauge*>(statsInfo.metricPtr);
gauge->Increment(static_cast<double>(value));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent is to capture the change in value since the last time it was recorded. You can compute the cumulative sum in the visualization tool like Grafana. If we increment the value, we are as good as COUNT type. Please refer this doc that explains SUM capture Delta vs Cumulative SUM: https://opentelemetry.io/docs/specs/otel/metrics/data-model/#sums

We are capturing the delta and we don't want to increment here.

@lingbin lingbin Sep 12, 2024

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karteekmurthys
Thank you for your explanation, I benefited a lot.

However, for Prometheus, I feel that when we use the "pull mode"(through the fetchMetrics() interface), it is impossible to accurately express the semantics of "SUM capture Delta".

I can give an example. If a metric is updated every 1 second, but we pull the full amount of metrics every 5 seconds, if we use "SUM capture Delta" in Prometheus, we will lose information.
For simplicity, suppose the delta value generated every second is 2,

  t0   t1    t2    t3    t4    t5    ...
  2,    2,    2,    2,    2,    2    ...
  |                             |
  v                             v
pull_1                        pull_2

As a result, only 2 points (t0, 2) and (t5, 2) are saved to Prometheus, that is, t1, t2, t3, t4 are lost, so we are NOT able to compute the "cumulative sum" anymore(We expect 12 but can only get 4).

So, in my opinion, in Prometheus's "pull mode", we can only use the "Cumulative SUM" method.

Maybe I missed something, please correct me if I am wrong.

@karteekmurthys karteekmurthys Sep 12, 2024

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point @lingbin. We can minimize the data loss by setting scrape interval by prometheus to 1s or 2s. Of course, this doesn't guarantee no data loss.

If we capture the cumulative sum. We can only answer questions like "what is the total bandwidth/cpu/iops?" etc. In query performance analysis we would like to see the variation of data points when a specific query was run. Like "how must did query 1 consume a resource vs query 2?".
May be this can be solved by some function at visualization tool end, but the accuracy of trend will again depend on scrape interval.
eg:

Time:              0 | 1 | 2 | 3 | 4 | 5 | 6
Event:             2 | 3 | 5 | 1 | 5 ....
Cumulative:        2 | 5 | 10| 11| 16.....

Let's say Q1 ran in interval t0 to t1 and if we report only cumulative sum, we read metric as 2.
Let' say Q2 ran in interval t3 to t4 ( Event value = 1) but cumulative sum is 16,
 assuming your prometheus scrape interval is 4s.

Let's say on visualization tool you used some form of delta function that gives 
us delta from prev recording metric. So we see {2 - 0, 16 -2} = {2, 14}. 
But Q2 didn't consume 14 units of that resource.

Reporting cumulative vs sum both have similar pros/cons. I would like to hear your thoughts on this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scrape interval must be aware of your system as well. In our case, it is presto, which is meant to run analytic workloads where a single query can run for several seconds/minutes. So scraping values every 2/3s should give a decent picture of what is happening.

@lingbin lingbin Sep 13, 2024

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karteekmurthys Thanks for your further explanation.

Some metrics are updated much more frequently than the "scrape interval", right? For example, they are updated many times per second (depending on the number of events). I think there must be such metrics for those metrics that are not updated periodically in PeriodicStatsReporter.

Even more unfortunately, for those metrics with longer update periods, reducing the "scrape interval" to a very small value will only give more wrong results.

For example, let's assume a scenario where we scrape every 1 second, but the metric is updated every 4 seconds.

Time:              0 | 1 | 2 | 3 | 4 | 5 | 6 ...
Event:             2 | - | - | - | 5 | - | - ...
Cumulative:        2 | 2 | 2 | 2 | 7 | 7 | 7 ...
scraped-value:     2 | 2 | 2 | 2 | 5 | 5 | 5 ...

In this case, we can no longer even calculate the "Cumulative SUM" correctly any more 
based on the values ​​scraped, and still cannot accurately express "dalta". 

- At time t6, we expect the SUM to be 7 (2+5), but we get 23(2+2+2+2+5+5+5).  
- At time t1, t2, t3, we expect the DELTA to be 0, but we get 2.

If we see such a chart in the monitoring system, it will definitely be very misleading.

We expect to use Prometheus as a monitoring tool for online production systems. The serious misleading mentioned above determines that we can only use the "Cumulative SUM" method, what do you think?

@lingbin lingbin Sep 13, 2024

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After using the "Cumulative SUM" method, if we need to calculate "delta", we can use the rate() function in Prometheus, which calculates the per-second average rate of increase of the time series.
Yes, it really cannot express the delta since the last time it was recorded.

@aditi-pandit aditi-pandit left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lingbin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants