[native] Fix recording of SUM type in PrometheusStatsReporter#23622
Conversation
|
@majetideepak Could you please help to take a look? Thanks. |
|
@lingbin thanks for the fix! |
|
@lingbin : Thanks for this fix. Would it be possible to write a test for it ? |
af2f976 to
f5cfdf1
Compare
For SUM type worker metrics, the corresponding type in `PrometheusStatsReporter` is `prometheus::Gauge`. For these metrics, each time they are recorded (via `RECORD_METRIC_VALUE`), a "delta" is passed in, so `Gauge::Increment()` should be used in `PrometheusStatsReporter` instead of `Gauge::Set()` (which overwrites the old value).
f5cfdf1 to
7896180
Compare
|
@aditi-pandit A unit test has been added. And already rebased. Please take a look at it again, thanks. |
| } | ||
| case velox::StatType::SUM: { | ||
| auto* gauge = reinterpret_cast<::prometheus::Gauge*>(statsInfo.metricPtr); | ||
| gauge->Increment(static_cast<double>(value)); |
There was a problem hiding this comment.
The intent is to capture the change in value since the last time it was recorded. You can compute the cumulative sum in the visualization tool like Grafana. If we increment the value, we are as good as COUNT type. Please refer this doc that explains SUM capture Delta vs Cumulative SUM: https://opentelemetry.io/docs/specs/otel/metrics/data-model/#sums
We are capturing the delta and we don't want to increment here.
There was a problem hiding this comment.
@karteekmurthys
Thank you for your explanation, I benefited a lot.
However, for Prometheus, I feel that when we use the "pull mode"(through the fetchMetrics() interface), it is impossible to accurately express the semantics of "SUM capture Delta".
I can give an example. If a metric is updated every 1 second, but we pull the full amount of metrics every 5 seconds, if we use "SUM capture Delta" in Prometheus, we will lose information.
For simplicity, suppose the delta value generated every second is 2,
t0 t1 t2 t3 t4 t5 ...
2, 2, 2, 2, 2, 2 ...
| |
v v
pull_1 pull_2
As a result, only 2 points (t0, 2) and (t5, 2) are saved to Prometheus, that is, t1, t2, t3, t4 are lost, so we are NOT able to compute the "cumulative sum" anymore(We expect 12 but can only get 4).
So, in my opinion, in Prometheus's "pull mode", we can only use the "Cumulative SUM" method.
Maybe I missed something, please correct me if I am wrong.
There was a problem hiding this comment.
That is a good point @lingbin. We can minimize the data loss by setting scrape interval by prometheus to 1s or 2s. Of course, this doesn't guarantee no data loss.
If we capture the cumulative sum. We can only answer questions like "what is the total bandwidth/cpu/iops?" etc. In query performance analysis we would like to see the variation of data points when a specific query was run. Like "how must did query 1 consume a resource vs query 2?".
May be this can be solved by some function at visualization tool end, but the accuracy of trend will again depend on scrape interval.
eg:
Time: 0 | 1 | 2 | 3 | 4 | 5 | 6
Event: 2 | 3 | 5 | 1 | 5 ....
Cumulative: 2 | 5 | 10| 11| 16.....
Let's say Q1 ran in interval t0 to t1 and if we report only cumulative sum, we read metric as 2.
Let' say Q2 ran in interval t3 to t4 ( Event value = 1) but cumulative sum is 16,
assuming your prometheus scrape interval is 4s.
Let's say on visualization tool you used some form of delta function that gives
us delta from prev recording metric. So we see {2 - 0, 16 -2} = {2, 14}.
But Q2 didn't consume 14 units of that resource.
Reporting cumulative vs sum both have similar pros/cons. I would like to hear your thoughts on this.
There was a problem hiding this comment.
The scrape interval must be aware of your system as well. In our case, it is presto, which is meant to run analytic workloads where a single query can run for several seconds/minutes. So scraping values every 2/3s should give a decent picture of what is happening.
There was a problem hiding this comment.
@karteekmurthys Thanks for your further explanation.
Some metrics are updated much more frequently than the "scrape interval", right? For example, they are updated many times per second (depending on the number of events). I think there must be such metrics for those metrics that are not updated periodically in PeriodicStatsReporter.
Even more unfortunately, for those metrics with longer update periods, reducing the "scrape interval" to a very small value will only give more wrong results.
For example, let's assume a scenario where we scrape every 1 second, but the metric is updated every 4 seconds.
Time: 0 | 1 | 2 | 3 | 4 | 5 | 6 ...
Event: 2 | - | - | - | 5 | - | - ...
Cumulative: 2 | 2 | 2 | 2 | 7 | 7 | 7 ...
scraped-value: 2 | 2 | 2 | 2 | 5 | 5 | 5 ...
In this case, we can no longer even calculate the "Cumulative SUM" correctly any more
based on the values scraped, and still cannot accurately express "dalta".
- At time t6, we expect the SUM to be 7 (2+5), but we get 23(2+2+2+2+5+5+5).
- At time t1, t2, t3, we expect the DELTA to be 0, but we get 2.
If we see such a chart in the monitoring system, it will definitely be very misleading.
We expect to use Prometheus as a monitoring tool for online production systems. The serious misleading mentioned above determines that we can only use the "Cumulative SUM" method, what do you think?
There was a problem hiding this comment.
After using the "Cumulative SUM" method, if we need to calculate "delta", we can use the rate() function in Prometheus, which calculates the per-second average rate of increase of the time series.
Yes, it really cannot express the delta since the last time it was recorded.
For
SUMtype worker metrics, the corresponding type inPrometheusStatsReporterisprometheus::Gauge.For these metrics, each time they are recorded (via
RECORD_METRIC_VALUE), a "delta" is passed in, soGauge::Increment()should be used in
PrometheusStatsReporterinstead ofGauge::Set()(which overwrites the old value).