Skip to content

[Windows] Add metric_type mapping for the fields of service datastream.#7200

Merged
ritalwar merged 10 commits intoelastic:mainfrom
ritalwar:windows_tsdb_metrictype_service_6993
Aug 28, 2023
Merged

[Windows] Add metric_type mapping for the fields of service datastream.#7200
ritalwar merged 10 commits intoelastic:mainfrom
ritalwar:windows_tsdb_metrictype_service_6993

Conversation

@ritalwar
Copy link
Copy Markdown
Contributor

@ritalwar ritalwar commented Aug 1, 2023

  • Enhancement

What does this PR do?

This PR adds metric type mapping for the fields of service datastream.

Checklist

Screenhots

Refer: #6993

@ritalwar ritalwar requested review from a team as code owners August 1, 2023 05:00
@ritalwar ritalwar mentioned this pull request Aug 1, 2023
6 tasks
@elasticmachine
Copy link
Copy Markdown

elasticmachine commented Aug 1, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-08-28T11:35:51.121+0000

  • Duration: 20 min 47 sec

Test stats 🧪

Test Results
Failed 0
Passed 150
Skipped 0
Total 150

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine
Copy link
Copy Markdown

elasticmachine commented Aug 1, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 100.0% (8/8) 💚
Files 91.667% (11/12) 👍
Classes 91.667% (11/12) 👍
Methods 85.156% (109/128) 👍
Lines 92.459% (5726/6193) 👍
Conditionals 100.0% (0/0) 💚

- name: uptime.ms
type: long
format: duration
metric_type: gauge
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should uptime be a gauge/counter is a common doubt. LGTM!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the uptime is not cumulative and continuously increases without any resets, it is more appropriate to represent it as a gauge.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the uptime is not cumulative and continuously increases without any resets

That sounds like a counter right? @felixbarny any thoughts?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does indeed sound like a counter if the following assumptions are true:

  • The value is monotonically incrementing over time
  • It resets when the service restarts

However, it's somewhat different from other counters in that it wouldn't make sense to visualize the rate of that counter as the rate will always be the elapsed time: In a 60s interval, the value will increase by 60, so the rate will just be a flat line. But it still seems like a counter.

Are we visualizing the uptime in any way? If so, how?

Copy link
Copy Markdown
Contributor

@agithomas agithomas Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look at what is used across various packages for assigning metric_type for uptime metrics. The distribution goes as below.

Gauge

  • Redis
  • GCP Redis
  • GCP Compute
  • HA Proxy
  • Memcached
  • Influxdb
  • Elasticsearch (JVM max uptime)
  • Mongodb
  • Couchbase

Counter

  • Apache
  • Elastic Package Registry
  • System (uptime datastream)
  • AWS (RDS)

So, we may have a lack of consistency here. But, as uptime datastream of system package already considers metric_type as counter, in the absence of a clear source of truth, uptime.ms of this (service) datastream can be assigned counter type. This ensures there exists consistency within the same package.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that the definition of monotonically increasing?

Yes, I just wanted to rule out any confusion associated with continuous or monotonically increasing increments, making it a "counter."

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It resets as it restarts, right?

Yes.

Copy link
Copy Markdown

@salvatore-campagna salvatore-campagna Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it really matter if we define it as a counter or a gauge in this specific scenario? Excluding the monotonicity property of a counter I think when it comes to deciding if a metric is a counter or a gauge the question we need to ask is, for instance: does it make sense to calculate a sum (or average or...) aggregate over that metric? Or do we need to first calculate a rate and then aggregate? Also imagine to use the uptime in a computation...for instance you divide a quantity by the uptime to get some kind of rate (over time, for instance average number of bytes processed by a host in a certain (up)time window). In that case you would need to use the difference between two values of the uptime (at t1 and t2)...dividing just by t1 or t2 does not make sense, right? For a gauge you would need to sum all values between t1 and t2, to account for possible negative values... Which means that a measure as a point in time value does not make sense. So, in my opinion uptime is a counter.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm +1 for counter.

There are two aspects to a counter:

  1. Monotonically increasing between resets - true for uptime
  2. Discrete - theoretically that's true for classical counter use cases (e.g. number of requests for a webpage) and false for uptime. In practice everything is discrete in our current computing systems so it doesn't matter (we always count seconds or ms or some unit of time). The problem is that it is not intuitive to users because they think of time as continuous and not discrete. On the plus side they will get the right visualization and behavior, because practically this data is exactly like counter data. I think the upside out-weights the downside in this case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the offline discussions, we decided to display "uptime" metrics as gauge. This choice comes from discussing the best metric type. We realized that using a counter for uptime could make calculating changes over time difficult. By considering the idea of "temporality," which is about reporting metrics as cumulative or delta values, we agreed that uptime, being a distinct metric, should be shown as a gauge. This way, it fits well with the immediate and non-negative nature of uptime values.

agithomas
agithomas previously approved these changes Aug 1, 2023
Copy link
Copy Markdown
Contributor

@agithomas agithomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@agithomas agithomas dismissed their stale review August 1, 2023 10:20

Dismissing as the PR link is not correct.

Copy link
Copy Markdown
Contributor

@agithomas agithomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ritalwar ritalwar merged commit fecc5c6 into elastic:main Aug 28, 2023
@elasticmachine
Copy link
Copy Markdown

Package windows - 1.34.1 containing this change is available at https://epr.elastic.co/search?package=windows

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants