[Stats] Add metrics collector and stats API#17773
Conversation
|
Since this will give metrics an accumulator that remembers previous values, the next step we can do is collect CPU Utilization with |
💚 Build Succeeded |
💚 Build Succeeded |
edb307f to
d873a04
Compare
💔 Build Failed |
There was a problem hiding this comment.
+1 on moving metrics out of status API with 7.0. I think there is value in having different API endpoints to expose different metrics and states like ES does.
There was a problem hiding this comment.
Does this fetch the cluster_uuid on each request from ES?
There was a problem hiding this comment.
it does. thoughts on a different endpoint if we need the cluster uuid? it looks like these are all independent of the cluster except for uuid, and it adds a network request to in memory data
There was a problem hiding this comment.
Yeah, hm. I like the idea of not making that request to ES if the user doesn't care about cluster_uuid in the first place.
How keep the same endpoint, and keep it as a GET, but add an optional query param? /api/stats?with_cluster_uuid or something?
There was a problem hiding this comment.
@ruflin does it make a difference on the Beats side whether fetching the cluster_uuid is on or off by default? I could make a separate endpoint completely, but it seems like over-engineering - it'd have to be named and documented reasonably.
There was a problem hiding this comment.
I could get behind something like an ?extended flag that would include elasticsearch data. Saved object counts, uuid, can't think of anything else.
There was a problem hiding this comment.
++ for extended. Thanks!
There was a problem hiding this comment.
@tsullivan We can make the field optional and then it's fetched when it's there and otherwise ignored. I like it to have in the same document as it means we don't have to fire an additional http request.
If we have an extended mode an additional endpoint could make more sense, but param for 1 endpoint also works.
@tsullivan For the ES cluster_id: Can KB be connected to more then one cluster in the future? And could we perhaps cache the ES cluster uuid locally and only update it if KB for example reconnets to ES and check if it's still the same (if there is such a thing like reconnect).
There was a problem hiding this comment.
Looks like a non async call which I like.
There was a problem hiding this comment.
how would you feel if the hapi specifics were separated with this a class solely for accumulating events? I ask because it'll let us swap out hapi events one by one as we move away from it.
it's pretty much there, maybe renaming hapiEvent to stats is enough. in the future we could allow for many collect calls that only accumulate the fields sent in.
There was a problem hiding this comment.
++ for changing to stats
There was a problem hiding this comment.
i think this is specific enough to make it part of the metrics collector class. thoughts?
There was a problem hiding this comment.
++ it was separate for testability, but it could be a static method of the class for the same idea
6b6dbf1 to
97fbf02
Compare
💚 Build Succeeded |
|
Depends on #17788 (cleanup changes broken out) |
💚 Build Succeeded |
💚 Build Succeeded |
💚 Build Succeeded |
💚 Build Succeeded |
💚 Build Succeeded |
|
@jbudz I've added an optional query param for Gets the cluster_uuid:Doesn't get the cluster_uuid: |
💚 Build Succeeded |
There was a problem hiding this comment.
nit, in theory this may end up with a slightly different number. thoughts on setting this to process.uptime_ms?
There was a problem hiding this comment.
yeah, good idea. It makes the TODO make more sense. 0abf0ff
845c336 to
634dfc8
Compare
|
rebased to master, fixed conflicts from #17788 |
| avg_in_millis: get(hapiEvent, ['responseTimes', port, 'avg']), | ||
| max_in_millis: get(hapiEvent, ['responseTimes', port, 'max']) | ||
| // TODO: rename to use `_ms` suffix per beats naming conventions | ||
| avg_in_millis: isNaN(avgInMillis) ? undefined : avgInMillis, // convert NaN to undefined |
There was a problem hiding this comment.
thoughts on getting this over with now? if it's a new api we don't have to worry about breaking changes yet
There was a problem hiding this comment.
This module is still used by the status API though, so a change in the field name here would break that.
| sinon.stub(metrics, 'captureEvent').returns({ 'a': [{ 'b': 2 }, { 'd': 4 }], process: { uptime_ms: 1980 } }); | ||
| sinon.stub(metrics, 'captureCGroupsIfAvailable').returns({ 'a': [{ 'c': 3 }, { 'e': 5 }] }); | ||
| sinon.stub(Date.prototype, 'toISOString').returns('2017-04-14T18:35:41.534Z'); | ||
| sinon.stub(process, 'uptime').returns(5000); |
There was a problem hiding this comment.
this was existing code, but it looks like stubbing uptime this way actually affects all the tests. That's why this had to be moved down to parses the hapi event
💔 Build Failed |
💔 Build Failed |
please test this |
|
"please test this" doesn't work, looks like jenkins test this |
💚 Build Succeeded |
|
Preliminary changes have been pulled out and merged, this is rebased on top, and it's all green now. Ready for last review |
pickypg
left a comment
There was a problem hiding this comment.
Functionality LGTM, but I think we can simplify the sumAccumulate method to make it a little more readable for anyone else that may have to read this code.
| static sumAccumulate(property, accum, next) { | ||
| const accumValue = accum[property]; | ||
| const nextValue = next[property]; | ||
| if (Object.prototype.toString.call(nextValue) === '[object Object]') { |
There was a problem hiding this comment.
I believe the ideal approach to looking for basic Objects is checking the constructor (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/constructor):
nextValue && nextValue.constructor === ObjectI wonder if this function should be structured differently:
if (nextValue === null || nextValue === undefined) {
// return ?
} else if (nextValue.constructor === Object) {
// recurse
} else if (nextValue.constructor === Number) {
// sum
} else {
// return raw value or drop it since we don't know what it is?
}This should avoid string manipulation for the comparison and it should read a little cleaner. Also, we're ignoring Arrays, but since the OS / process stats report system load that should be okay.
There was a problem hiding this comment.
this is a good idea, since we should not try to accumulate null
| }, | ||
| async handler(req, reply) { | ||
| const { extended } = req.query; | ||
| const isExtended = extended !== undefined && extended !== 'false'; |
| const { cluster_uuid: uuid } = await callWithRequest(req, 'info', { filterPath: 'cluster_uuid', }); | ||
| clusterUuid = uuid; | ||
| } catch (err) { | ||
| clusterUuid = undefined; // fallback from anonymous access or auth failure, redundant for explicitness |
There was a problem hiding this comment.
You don't really need to remove it, but it's implicitly undefined regardless of this line if we are catching an error.
There was a problem hiding this comment.
yeah, I'm leaving it to be redundant for explicitness, and also because it would fail lint to have an empty catch block
There was a problem hiding this comment.
I'm pretty sure that the linter will succeed given a comment, but this works too.
💚 Build Succeeded |
* [Stats] Add metrics collector and stats API * uptime_ms in the process namespace * make uptime_in_millis always equal process.uptime_ms * fix api integration test * fix api integration test better * fix false positive with last change * change object detection, add fallbacks to return undefined
|
6.x/6.4.0: #18677 |
* [Stats] Add metrics collector and stats API * uptime_ms in the process namespace * make uptime_in_millis always equal process.uptime_ms * fix api integration test * fix api integration test better * fix false positive with last change * change object detection, add fallbacks to return undefined
Closes #17567
Depends on #17788
This adds a
MetricsCollectormodule that's intended to be wrapped in the ops event handler. Currently the ops event handler is implemented with theeven-betterservice (a fork ofgoodfrom the HapiJS framework). The metrics collector class isn't a service and doesn't need start and stop methods.The module accumulates the metrics data it receives via the
collect()method. Importing the module doesn't access a singleton, so any function that needs access to the accumulated data (usinggetStats()) needs to have the collector passed to it.The metrics collector instance is passed down to a new route for metrics stats:
/api/stats, and this PR adds an API integration test for the new API route.The intention for the new stats API is to eventually replace the existing
/api/statusroute. That change can happen in the next major version as it would be a non-backwards compatible change. Therefore, theMetricsclass, which was in use for the existing/api/statusdata, has been moved to the metrics collector's module directory, and the paths that reference it have been updated in the existing code.Release notes: Added a new
/api/statsroute to expose metrics for Kibana Metricbeat.