[Windows] Add metric_type mapping for the fields of service datastream #7200
- name: uptime.ms
  type: long
  format: duration
  metric_type: gauge
Whether uptime should be a gauge or a counter is a common question. LGTM!
Since the uptime is not cumulative and continuously increases without any resets, it is more appropriate to represent it as a gauge.
"Since the uptime is not cumulative and continuously increases without any resets"
That sounds like a counter, right? @felixbarny any thoughts?
It does indeed sound like a counter if the following assumptions are true:
- The value is monotonically incrementing over time
- It resets when the service restarts
However, it's somewhat different from other counters in that it wouldn't make sense to visualize the rate of that counter as the rate will always be the elapsed time: In a 60s interval, the value will increase by 60, so the rate will just be a flat line. But it still seems like a counter.
Are we visualizing the uptime in any way? If so, how?
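The flat-line observation can be checked with a small sketch. The sample values and the helper name `per_interval_rate` are hypothetical, and uptime is taken in seconds as in the comment above (for `uptime.ms` the same reasoning gives 1000 ms per second).

```python
def per_interval_rate(samples):
    """Return the rate of change between consecutive (timestamp_s, value) samples."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rates.append((v1 - v0) / (t1 - t0))
    return rates


# Hypothetical uptime samples collected every 60 s: (timestamp in s, uptime in s).
uptime_samples = [(0, 120), (60, 180), (120, 240), (180, 300)]

# Every interval adds exactly the elapsed time, so the rate is constant.
uptime_rates = per_interval_rate(uptime_samples)
print(uptime_rates)  # [1.0, 1.0, 1.0]
```

As long as the service does not restart, the rate carries no information beyond "the service was up", which is why plotting it yields a flat line.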
I had a look at which metric_type the various packages assign to their uptime metrics. The distribution is as follows.
Gauge
- Redis
- GCP Redis
- GCP Compute
- HAProxy
- Memcached
- InfluxDB
- Elasticsearch (JVM max uptime)
- MongoDB
- Couchbase
Counter
- Apache
- Elastic Package Registry
- System (uptime datastream)
- AWS (RDS)
So we lack consistency here. But since the uptime datastream of the system package already uses counter as its metric_type, and there is no clear source of truth, uptime.ms of this (service) datastream can be assigned the counter type. This keeps things consistent within the same package.
"Isn't that the definition of monotonically increasing?"
Yes, I just wanted to rule out any confusion associated with continuous or monotonically increasing increments, making it a "counter."
"It resets as it restarts, right?"
Yes.
Does it really matter whether we define it as a counter or a gauge in this specific scenario? Leaving aside the monotonicity property of a counter, I think the question to ask when deciding between counter and gauge is: does it make sense to calculate a sum (or average, or...) aggregate over that metric directly, or do we need to calculate a rate first and then aggregate? Also imagine using the uptime in a computation. For instance, you divide a quantity by the uptime to get some kind of rate over time, such as the average number of bytes processed by a host in a certain (up)time window. In that case you would need to use the difference between two values of the uptime (at t1 and t2); dividing by just t1 or just t2 does not make sense, right? For a gauge you would need to sum all values between t1 and t2 to account for possible negative values, which means that treating the measure as a point-in-time value does not make sense. So, in my opinion, uptime is a counter.
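A minimal sketch of the computation described above, assuming two samples that each carry the uptime and a cumulative byte count. The key names `uptime_ms` and `bytes_total`, the helper `average_throughput`, and the sample values are all hypothetical.

```python
def average_throughput(sample_t1, sample_t2):
    """Average bytes per second of uptime between two samples.

    Each sample is a dict with hypothetical keys:
      "uptime_ms"   - service uptime in milliseconds
      "bytes_total" - cumulative bytes processed since the service started
    """
    uptime_delta_s = (sample_t2["uptime_ms"] - sample_t1["uptime_ms"]) / 1000
    if uptime_delta_s <= 0:
        # Uptime went backwards or stood still: the service restarted (or the
        # samples are identical), so this window has no meaningful rate.
        return None
    bytes_delta = sample_t2["bytes_total"] - sample_t1["bytes_total"]
    return bytes_delta / uptime_delta_s


t1 = {"uptime_ms": 60_000, "bytes_total": 1_000_000}
t2 = {"uptime_ms": 120_000, "bytes_total": 4_000_000}
throughput = average_throughput(t1, t2)
print(throughput)  # 50000.0
```

The divisor is the difference between the two uptime readings, not either reading on its own, which is the counter-style usage the comment argues for.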
I'm +1 for counter.
There are two aspects to a counter:
- Monotonically increasing between resets - true for uptime
- Discrete - theoretically that's true for classical counter use cases (e.g. number of requests for a webpage) and false for uptime. In practice everything is discrete in our current computing systems, so it doesn't matter (we always count seconds or ms or some unit of time). The problem is that it is not intuitive to users, because they think of time as continuous rather than discrete. On the plus side, they will get the right visualization and behavior, because in practice this data is exactly like counter data. I think the upside outweighs the downside in this case.
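To illustrate "monotonically increasing between resets", here is a rough sketch of reset-aware accumulation. It follows a common convention for counters, where any drop in value is read as a restart from zero; it is not a claim about how Elasticsearch or Kibana handle resets. The helper name `total_increase` and the sample values are hypothetical.

```python
def total_increase(values):
    """Sum the increase of a counter, treating any drop as a reset to zero."""
    total = 0
    for prev, curr in zip(values, values[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # The value dropped, so assume the service restarted and the
            # counter began again from zero before reaching `curr`.
            total += curr
    return total


# Hypothetical uptime.ms samples; the service restarts before the 4th sample.
uptime_ms = [60_000, 120_000, 180_000, 5_000, 65_000]
increase_ms = total_increase(uptime_ms)
print(increase_ms)  # 185000
```

Under this convention the series yields the total time the service was up across the restart, which is the counter-like behavior described above.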
Following offline discussions about the best metric type, we decided to represent "uptime" metrics as a gauge. We realized that using a counter for uptime could make calculating changes over time difficult. Considering "temporality", that is, whether a metric is reported as cumulative or delta values, we agreed that uptime is a distinct kind of metric and should be a gauge. This fits the point-in-time, non-negative nature of uptime values.
Dismissing as the PR link is not correct.
Package windows - 1.34.1 containing this change is available at https://epr.elastic.co/search?package=windows
What does this PR do?
This PR adds metric_type mapping for the fields of the service datastream.
Checklist
I have reviewed tips for building integrations and this pull request is aligned with them.
I have verified that all data streams collect metrics or logs.
I have added an entry to my package's changelog.yml file.
I have verified that Kibana version constraints are current according to guidelines.
Relates Windows TSDB Enablement #6993
Screenshots
Refer: #6993