Skip to content

[Stats] Add metrics collector and stats API#17773

Merged
tsullivan merged 8 commits intoelastic:masterfrom
tsullivan:stats/add-metrics-collector-and-route
May 1, 2018
Merged

[Stats] Add metrics collector and stats API#17773
tsullivan merged 8 commits intoelastic:masterfrom
tsullivan:stats/add-metrics-collector-and-route

Conversation

@tsullivan
Copy link
Copy Markdown
Member

@tsullivan tsullivan commented Apr 18, 2018

Closes #17567
Depends on #17788

This adds a MetricsCollector module that's intended to be wrapped in the ops event handler. Currently the ops event handler is implemented with the even-better service (a fork of good from the HapiJS framework). The metrics collector class isn't a service and doesn't need start and stop methods.

The module accumulates the metrics data it receives via the collect() method. Importing the module doesn't access a singleton, so any function that needs access to the accumulated data (using getStats()) needs to have the collector passed to it.

The metrics collector instance is passed down to a new route for metrics stats: /api/stats, and this PR adds an API integration test for the new API route.

The intention for the new stats API is to eventually replace the existing /api/status route. That change can happen in the next major version as it would be a non-backwards compatible change. Therefore, the Metrics class, which was in use for the existing /api/status data, has been moved to the metrics collector's module directory, and the paths that reference it have been updated in the existing code.

Release notes: Added a new /api/stats route to expose metrics for Kibana Metricbeat.

@tsullivan
Copy link
Copy Markdown
Member Author

tsullivan commented Apr 18, 2018

Since this will give metrics an accumulator that remembers previous values, the next step we can do is collect CPU Utilization with process.cpuUsage: https://nodejs.org/dist/latest-v8.x/docs/api/process.html#process_process_cpuusage_previousvalue

@tsullivan tsullivan mentioned this pull request Apr 18, 2018
@elasticmachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

@elasticmachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

@tsullivan tsullivan force-pushed the stats/add-metrics-collector-and-route branch from edb307f to d873a04 Compare April 18, 2018 20:51
@elasticmachine
Copy link
Copy Markdown
Contributor

💔 Build Failed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on moving metrics out of status API with 7.0. I think there is value in having different API endpoints to expose different metrics and states like ES does.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this fetch the cluster_uuid on each request from ES?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does. thoughts on a different endpoint if we need the cluster uuid? it looks like these are all independent of the cluster except for uuid, and it adds a network request to in memory data

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, hm. I like the idea of not making that request to ES if the user doesn't care about cluster_uuid in the first place.

How keep the same endpoint, and keep it as a GET, but add an optional query param? /api/stats?with_cluster_uuid or something?

Copy link
Copy Markdown
Member Author

@tsullivan tsullivan Apr 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ruflin does it make a difference on the Beats side whether fetching the cluster_uuid is on or off by default? I could make a separate endpoint completely, but it seems like over-engineering - it'd have to be named and documented reasonably.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could get behind something like an ?extended flag that would include elasticsearch data. Saved object counts, uuid, can't think of anything else.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ for extended. Thanks!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsullivan We can make the field optional and then it's fetched when it's there and otherwise ignored. I like it to have in the same document as it means we don't have to fire an additional http request.

If we have an extended mode an additional endpoint could make more sense, but param for 1 endpoint also works.

@tsullivan For the ES cluster_id: Can KB be connected to more then one cluster in the future? And could we perhaps cache the ES cluster uuid locally and only update it if KB for example reconnets to ES and check if it's still the same (if there is such a thing like reconnect).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a non async call which I like.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would you feel if the hapi specifics were separated with this a class solely for accumulating events? I ask because it'll let us swap out hapi events one by one as we move away from it.

it's pretty much there, maybe renaming hapiEvent to stats is enough. in the future we could allow for many collect calls that only accumulate the fields sent in.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ for changing to stats

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this is specific enough to make it part of the metrics collector class. thoughts?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ it was separate for testability, but it could be a static method of the class for the same idea

@tsullivan tsullivan force-pushed the stats/add-metrics-collector-and-route branch 2 times, most recently from 6b6dbf1 to 97fbf02 Compare April 19, 2018 18:01
@elasticmachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

@tsullivan
Copy link
Copy Markdown
Member Author

Depends on #17788 (cleanup changes broken out)

@elasticmachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

@elasticmachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

@elasticmachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

@elasticmachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

@elasticmachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

@tsullivan
Copy link
Copy Markdown
Member Author

@jbudz I've added an optional query param for extended. The cluster_uuid isn't fetched by default.

Gets the cluster_uuid:

curl --silent 'https://elastic:changeme@spicy.local:5601/api/stats?extended' -H 'kbn-version: 7.0.0-alpha1'
curl --silent 'https://elastic:changeme@spicy.local:5601/api/stats?extended=true' -H 'kbn-version: 7.0.0-alpha1'

Doesn't get the cluster_uuid:

curl --silent 'https://elastic:changeme@spicy.local:5601/api/stats' -H 'kbn-version: 7.0.0-alpha1'
curl --silent 'https://elastic:changeme@spicy.local:5601/api/stats?extended=false' -H 'kbn-version: 7.0.0-alpha1'

@elasticmachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, in theory this may end up with a slightly different number. thoughts on setting this to process.uptime_ms?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, good idea. It makes the TODO make more sense. 0abf0ff

@tsullivan tsullivan force-pushed the stats/add-metrics-collector-and-route branch from 845c336 to 634dfc8 Compare April 25, 2018 17:48
@tsullivan
Copy link
Copy Markdown
Member Author

rebased to master, fixed conflicts from #17788

avg_in_millis: get(hapiEvent, ['responseTimes', port, 'avg']),
max_in_millis: get(hapiEvent, ['responseTimes', port, 'max'])
// TODO: rename to use `_ms` suffix per beats naming conventions
avg_in_millis: isNaN(avgInMillis) ? undefined : avgInMillis, // convert NaN to undefined
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts on getting this over with now? if it's a new api we don't have to worry about breaking changes yet

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module is still used by the status API though, so a change in the field name here would break that.

sinon.stub(metrics, 'captureEvent').returns({ 'a': [{ 'b': 2 }, { 'd': 4 }], process: { uptime_ms: 1980 } });
sinon.stub(metrics, 'captureCGroupsIfAvailable').returns({ 'a': [{ 'c': 3 }, { 'e': 5 }] });
sinon.stub(Date.prototype, 'toISOString').returns('2017-04-14T18:35:41.534Z');
sinon.stub(process, 'uptime').returns(5000);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was existing code, but it looks like stubbing uptime this way actually affects all the tests. That's why this had to be moved down to parses the hapi event

@elasticmachine
Copy link
Copy Markdown
Contributor

💔 Build Failed

@elasticmachine
Copy link
Copy Markdown
Contributor

💔 Build Failed

@tsullivan
Copy link
Copy Markdown
Member Author

22:26:52        ?- ? fail: "discover app discover app query load query should show query name"
22:26:52        ?        tryForTime timeout: [POST http://localhost:9515/session/b28072116d3c83dc16daf1b559fff01b/element / {"using":"partial link text","value":"Query # 1"}] no such element: Unable to locate element: {"method":"partial link text","selector":"Query # 1"}

please test this

@tsullivan
Copy link
Copy Markdown
Member Author

"please test this" doesn't work, looks like

jenkins test this

@elasticmachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

@tsullivan
Copy link
Copy Markdown
Member Author

Preliminary changes have been pulled out and merged, this is rebased on top, and it's all green now. Ready for last review

Copy link
Copy Markdown
Member

@pickypg pickypg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionality LGTM, but I think we can simplify the sumAccumulate method to make it a little more readable for anyone else that may have to read this code.

static sumAccumulate(property, accum, next) {
const accumValue = accum[property];
const nextValue = next[property];
if (Object.prototype.toString.call(nextValue) === '[object Object]') {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the ideal approach to looking for basic Objects is checking the constructor (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/constructor):

nextValue && nextValue.constructor === Object

I wonder if this function should be structured differently:

if (nextValue === null || nextValue === undefined) {
  // return ?
} else if (nextValue.constructor === Object) {
  // recurse
} else if (nextValue.constructor === Number) {
  // sum
} else {
  // return raw value or drop it since we don't know what it is?
}

This should avoid string manipulation for the comparison and it should read a little cleaner. Also, we're ignoring Arrays, but since the OS / process stats report system load that should be okay.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good idea, since we should not try to accumulate null

},
async handler(req, reply) {
const { extended } = req.query;
const isExtended = extended !== undefined && extended !== 'false';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like ES APIs! I like it.

const { cluster_uuid: uuid } = await callWithRequest(req, 'info', { filterPath: 'cluster_uuid', });
clusterUuid = uuid;
} catch (err) {
clusterUuid = undefined; // fallback from anonymous access or auth failure, redundant for explicitness
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't really need to remove it, but it's implicitly undefined regardless of this line if we are catching an error.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I'm leaving it to be redundant for explicitness, and also because it would fail lint to have an empty catch block

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that the linter will succeed given a comment, but this works too.

@elasticmachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

Copy link
Copy Markdown
Member

@pickypg pickypg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tsullivan tsullivan merged commit c10dc75 into elastic:master May 1, 2018
@tsullivan tsullivan deleted the stats/add-metrics-collector-and-route branch May 1, 2018 03:46
tsullivan added a commit to tsullivan/kibana that referenced this pull request May 1, 2018
* [Stats] Add metrics collector and stats API

* uptime_ms in the process namespace

* make uptime_in_millis always equal process.uptime_ms

* fix api integration test

* fix api integration test better

* fix false positive with last change

* change object detection, add fallbacks to return undefined
@tsullivan
Copy link
Copy Markdown
Member Author

6.x/6.4.0: #18677

tsullivan added a commit that referenced this pull request May 1, 2018
* [Stats] Add metrics collector and stats API

* uptime_ms in the process namespace

* make uptime_in_millis always equal process.uptime_ms

* fix api integration test

* fix api integration test better

* fix false positive with last change

* change object detection, add fallbacks to return undefined
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants