Use Elasticsearch types in Cockroachdb module #17736
jsoriano wants to merge 9 commits into elastic:main
| { | ||
| "agg_with": "avg", | ||
| "field": "prometheus.metrics.raft_process_logcommit_latency_count", | ||
| "field": "prometheus.raft_process_logcommit_latency.histogram", |
The current dashboard uses sum and count to calculate the average of this value. I think it makes sense now to calculate percentiles instead, but I haven't managed to use histograms in TSVB yet. @exekias do you know if they are already supported?
It works with other visualizations, so I will go on with line graphs for now.
Yes, currently only Visualize supports this type
I have replaced the graphs that were using sum and count to calculate averages and they are using 99th percentile now (as the CockroachDB admin UI does). It is quite ok now but the timings are in nanoseconds and I haven't found a way to format them.
Pinging @elastic/integrations-platforms (Team:Platforms)
I have moved changes for fields validation to #17759
Fix some overflows on Prometheus histogram rate calculations. They could be caused by:
* New buckets added to existing histograms at runtime; this happens at least with CockroachDB (see #17736).
* Buckets with bigger upper limits having lower counters. This is wrong and has only been reproduced in tests, but it is handled just in case, to avoid losing other data if this happens with some service.

Rate calculation methods now also return a boolean to differentiate whether a zero value is returned because it was the first call or because the rate is actually zero.
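The rate calculation described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the Metricbeat code: `counterCache`, `newCounterCache` and `rateUint64` are hypothetical names, and treating a decreasing counter as a zero rate is an assumption about how the overflow is avoided.

```go
package main

import "fmt"

// counterCache remembers the last value seen for each counter.
// Hypothetical stand-in for illustration, not the Metricbeat type.
type counterCache struct {
	last map[string]uint64
}

func newCounterCache() *counterCache {
	return &counterCache{last: make(map[string]uint64)}
}

// rateUint64 returns the increase of a counter since the previous call.
// The boolean is false only on the first call for a given name, so callers
// can tell "no previous sample" apart from "the rate really is zero".
func (c *counterCache) rateUint64(name string, value uint64) (uint64, bool) {
	prev, seen := c.last[name]
	c.last[name] = value
	if !seen {
		// First sample (for example, a bucket that appeared at runtime).
		return 0, false
	}
	if prev > value {
		// Assumption: a decreasing counter yields a zero rate instead of
		// letting the unsigned subtraction wrap around.
		return 0, true
	}
	return value - prev, true
}

func main() {
	cache := newCounterCache()
	for _, v := range []uint64{10, 15, 15, 3} {
		rate, ok := cache.rateUint64("bucket_le_1", v)
		fmt.Printf("value=%d rate=%d ok=%v\n", v, rate, ok)
	}
}
```

With this shape, a caller can skip a bucket whose boolean is false rather than report a misleading zero.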
f34de4f to 55e9891
| metrics_path: /_status/vars | ||
| use_types: true | ||
| processors: | ||
| - drop_fields: |
Wondering if this could make use of metrics_filters instead?
We could, as all sql_mem_sql... metrics are duplicated in CockroachDB. But only the histograms are problematic; the rest are gauges, and in principle Metricbeat doesn't have any problem with duplicated gauges.
Also, removing these metrics at this point ensures that they are never collected, even if users set their own metrics_filters or add their own processors.
This pull request does not have a backport label. Could you fix it @jsoriano? 🙏
This pull request is now in conflicts. Could you fix it? 🙏
@jsoriano - Closing this one as there was no activity for a while
What does this PR do?
Use native Elasticsearch types for CockroachDB data so histograms are stored much more efficiently.
Adapt dashboard to use these types and some other fixes (see section about this below).
Why is it important?
Align CockroachDB module with latest Prometheus changes to leverage the use of new histogram type.
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Author's Checklist
How to test this PR locally
Run the CockroachDB module, check that dashboard works.
Related issues
Dashboard
There are some changes in dashboards: