Use Elasticsearch types in Cockroachdb module by jsoriano · Pull Request #17736 · elastic/beats

jsoriano · 2020-04-15T19:30:55Z

What does this PR do?

Use native Elasticsearch types for CockroachDB data so histograms are stored much more efficiently.

Adapt the dashboard to use these types, and include some other fixes (see the Dashboard section below).

Why is it important?

Align the CockroachDB module with the latest Prometheus module changes so that it uses the new histogram type.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

Verify that the fix in histograms makes sense. (Moved to Fix prometheus histogram rate overflows #17753)
Create a follow-up issue to investigate responses with duplicated histograms.
Find a way to maintain previous visualizations for raft latency.
- Can nanoseconds be formatted?
Add some additional check to system tests.

How to test this PR locally

Run the CockroachDB module, check that dashboard works.

Related issues

Relates to #14064

Dashboard

There are some changes in dashboards:

Calculation of ranges is more accurate now.
Histogram-based visualizations have been replaced to show more representative data, and they now use the 99th percentile instead of the average.

jsoriano · 2020-04-15T19:33:58Z

.../metricbeat/module/cockroachdb/_meta/kibana/7/dashboard/Metricbeat-cockroachdb-overview.json

                  {
                    "agg_with": "avg",
-                    "field": "prometheus.metrics.raft_process_logcommit_latency_count",
+                    "field": "prometheus.raft_process_logcommit_latency.histogram",


The current dashboard uses sum and count to calculate the average of this value. I think it would make sense now to calculate percentiles instead, but I haven't managed to use histograms in TSVB yet. @exekias do you know if they are already supported?

jsoriano · 2020-04-16

It works with other visualizations, so I will go on with line graphs for now.

exekias · 2020-04-16

Yes, currently only Visualize supports this type.

jsoriano · 2020-04-17

I have replaced the graphs that used sum and count to calculate averages; they now use the 99th percentile (as the CockroachDB admin UI does). It looks quite OK now, but the timings are in nanoseconds and I haven't found a way to format them.
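To illustrate why a 99th percentile can be more representative than the sum/count average, here is a minimal sketch in Python. It assumes a pre-aggregated histogram shaped as parallel `values` and `counts` lists, and it uses a simple nearest-rank percentile. Elasticsearch and Kibana use their own approximation algorithms, so this is not what the dashboard actually executes. The function names and the sample numbers are hypothetical.

```python
import math
from typing import Sequence


def percentile_from_histogram(values: Sequence[float],
                              counts: Sequence[int],
                              q: float) -> float:
    """Nearest-rank percentile over a pre-aggregated histogram.

    values: representative value of each bucket, sorted ascending.
    counts: number of observations in each bucket.
    q: percentile in the range (0, 100].
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    # 1-based rank of the observation that marks the percentile.
    rank = max(math.ceil(q * total / 100), 1)
    seen = 0
    for value, count in zip(values, counts):
        seen += count
        if seen >= rank:
            return value
    return values[-1]


def average_from_histogram(values: Sequence[float],
                           counts: Sequence[int]) -> float:
    """Equivalent of the old sum/count calculation."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    return sum(v * c for v, c in zip(values, counts)) / total


def ns_to_ms(nanoseconds: float) -> float:
    """Convert nanoseconds to milliseconds for display."""
    return nanoseconds / 1_000_000


# Hypothetical latency histogram, values in nanoseconds.
latency_values = [1_000_000, 5_000_000, 50_000_000]
latency_counts = [90, 9, 1]
```

With these sample numbers the average is 1.85 ms, while the 99th percentile is 5 ms, so the percentile exposes the slow tail that the average hides.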

x-pack/metricbeat/module/prometheus/collector/histogram.go

elasticmachine · 2020-04-15T19:40:25Z

Pinging @elastic/integrations-platforms (Team:Platforms)

jsoriano · 2020-04-16T15:00:21Z

I have moved changes for fields validation to #17759

Fix some overflows in Prometheus histogram rate calculations. They could be caused by:
* New buckets added to existing histograms at runtime; this happens at least with CockroachDB (see #17736).
* Buckets with bigger upper limits having lower counters. This is wrong and has only been reproduced in tests, but it is handled just in case, to avoid losing other data if it happens with some service.

Rate calculation methods now also return a boolean, so callers can tell whether a zero value was returned because it was the first call or because the rate is actually zero.
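The two failure modes described above can be sketched as follows. This is an illustrative Python model, not the Go code in histogram.go: the `CounterCache` class, the `bucket_counts` function, and the way a counter reset is handled are assumptions made for the example. Python integers do not overflow, so the comments mark the places where an unsigned subtraction would wrap around.

```python
from typing import Dict, List, Tuple


class CounterCache:
    """Remembers the last value seen for each counter name (hypothetical)."""

    def __init__(self) -> None:
        self._last: Dict[str, int] = {}

    def rate(self, name: str, value: int) -> Tuple[int, bool]:
        """Return (delta, found).

        found is False on the first call for a name, so the caller can
        tell "no baseline yet" apart from "the rate really is zero".
        """
        previous = self._last.get(name)
        self._last[name] = value
        if previous is None:
            return 0, False
        if value < previous:
            # Assumption: treat a decreasing counter as a reset.
            return 0, True
        return value - previous, True


def bucket_counts(cache: CounterCache, name: str,
                  buckets: List[Tuple[int, int]]) -> List[int]:
    """Turn cumulative Prometheus buckets into per-bucket deltas.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper bound.
    """
    counts: List[int] = []
    running = 0  # delta already attributed to smaller buckets
    for upper, cumulative in buckets:
        delta, found = cache.rate(f"{name}:{upper}", cumulative)
        if not found:
            # New bucket without a baseline. Subtracting `running` from
            # its zero rate would wrap around with unsigned integers.
            counts.append(0)
            continue
        # A bigger bucket reporting a smaller delta is clamped to zero
        # instead of producing a negative (or wrapped) count.
        counts.append(max(delta - running, 0))
        running = max(running, delta)
    return counts
```

For example, if a first scrape reports buckets `[(1, 5), (2, 8)]` and a second scrape reports `[(1, 7), (2, 12), (4, 12)]`, the second call yields `[2, 2, 0]`: the bucket with upper bound 4 is new, so it contributes zero rather than a wrapped value.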

ChrsMark · 2020-04-21T07:47:50Z

x-pack/metricbeat/module/cockroachdb/status/manifest.yml

    metrics_path: /_status/vars
+    use_types: true
+processors:
+  - drop_fields:


Wondering if this could make use of metrics_filters instead?

jsoriano · 2020-04-21

We could, as all sql_mem_sql... metrics are duplicated in CockroachDB. But only histograms are problematic; the rest are gauges, and in principle Metricbeat doesn't have any problem with duplicated gauges.
Also, removing these metrics at this point ensures that they are never collected, even if users set their own metrics_filters or add their own processors.

mergify · 2021-09-22T11:47:20Z

This pull request does not have a backport label. Could you fix it @jsoriano? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-v./d./d./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

mergify · 2021-11-10T14:31:05Z

This pull request now has conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b cockroachdb-use-types upstream/cockroachdb-use-types
git merge upstream/master
git push upstream cockroachdb-use-types

jlind23 · 2022-03-31T14:41:38Z

@jsoriano - Closing this one as there has been no activity for a while

jsoriano added 5 commits April 15, 2020 19:53

Use prometheus types in cockroachdb module

2180359

Add counts and value keys for histograms

ffd182b

Add common fields into the x-pack prometheus module

d581f47

Adapt dashboard

e0d0cc1

Fix zero-valued rates in histograms

780c3e5

jsoriano commented Apr 15, 2020

jsoriano self-assigned this Apr 15, 2020

jsoriano commented Apr 15, 2020

x-pack/metricbeat/module/prometheus/collector/histogram.go (outdated, resolved)

jsoriano added the [zube]: In Progress, enhancement, in progress, module, and Team:Platforms labels Apr 15, 2020

PEP8

e977e84

This was referenced Apr 16, 2020

Fix prometheus histogram rate overflows #17753

Merged

Add fields validation for histogram subfields #17759

Merged

jsoriano mentioned this pull request Apr 17, 2020

Cherry-pick #17753 to 7.x: Fix prometheus histogram rate overflows #17783

Merged

6 tasks

jsoriano mentioned this pull request Apr 17, 2020

Cherry-pick #17753 to 7.7: Fix prometheus histogram rate overflows #17784

Merged

6 tasks

This was referenced Apr 17, 2020

Cherry-pick #17759 to 7.x: Add fields validation for histogram subfields #17785

Merged

Cherry-pick #17759 to 7.7: Add fields validation for histogram subfields #17786

Merged

jsoriano added 2 commits April 17, 2020 14:56

Merge remote-tracking branch 'origin/master' into cockroachdb-use-types

410095c

Drop duplicated histograms

55e9891

jsoriano force-pushed the cockroachdb-use-types branch from f34de4f to 55e9891 April 17, 2020 14:32

Update dashboard to use new histograms

85d01bc

jsoriano mentioned this pull request Apr 17, 2020

Support pre-aggregated histogram type elastic/kibana#52426

Closed

3 tasks

ChrsMark reviewed Apr 21, 2020

masci mentioned this pull request Dec 28, 2020

Move cockroachdb module to GA elastic/integrations#480

Closed

mergify bot added the backport-skip label Sep 22, 2021

jlind23 closed this Mar 31, 2022

zube bot added [zube]: Done and removed [zube]: In Progress labels Mar 31, 2022

zube bot removed the [zube]: Done label Jun 30, 2022

jsoriano wants to merge 9 commits into elastic:main from jsoriano:cockroachdb-use-types


5 participants
