Proposal: Container and Engine telemetry by discordianfish · Pull Request #9130 · moby/moby

discordianfish · 2014-11-13T00:05:55Z

This is proposing to add per container metrics, based on cgroups, as well as docker engine metrics about Docker's internal health and job management.

This will either solve or at least provide a basis to address #8875, #1091, #4530, #5473, #8842

Why?

Tools like cAdvisor talk to the docker daemon as well as access the cgroup kernel features to read metrics from. Even though both use libcontainer, it's error prone and racy. Metrics are coupled to the container and since this container is managed by Docker, it should also provide the metrics for it.

Beside that, metrics are key for every production deployment of Docker. So this is a problem that everyone needs to solve. This is also reflected in the high number of issues asking for native docker metrics.

This also proposes white box telemetry. The idea is to expose various metrics about the Docker internals. For now, some failure/success counters and latencies for the internal job management as well as latencies for the API endpoints.

With the proposed change, it's easy to pull the metrics from Docker into any monitoring system. For smaller setups and on the fly debugging, there will be a metrics endpoint.

How?

The exposed metrics are pretty raw: We won't do rate conversions or advanced features like histograms and leave this to the monitoring tool. This has the limitation that docker metrics [container] will only show the current raw, values. If -f is used, it will convert the values into a rate where appropriate. To make it explicit whether a metric is a gauge or a counter, the type is included in the API.
The API is self-explanatory by including a description for each metric.

/cc @aluzzardi @crosbymichael @johncosta @kencochrane

Additions

expvar/ to get various stats (memstats, http handler) and maybe use it for implementation
container uptime and restart counters
correlate jobs to containers where possible; have a container dimension for the jobs

aluzzardi · 2014-11-13T00:24:49Z

Great. We talked a lot with @crosbymichael about something like this.
Your infra/prometheus background is really appreciated for metrics.

I'll have a deeper look once I get some spare cycles. Thanks!

SvenDowideit · 2014-11-13T01:22:23Z

docs/sources/reference/commandline/cli.md

when the code works, it would be good to add an example output.

oh, can you please also make a man/docker-metrics-1.md?

@SvenDowideit [x] done!
I kept it simple because I was too lazy to generate output manually. Once implemented, I can extend the man page. I also realize that we probably should add some filtering for the metrics so you get only what you are interested in. But that's something to IMO discuss later.

@discordianfish We can use go template, just like docker inspect. No need to implement the filter flags otherwise the code may become more and more complicated 😃

@tobegit3hub Good point. Let's do that. But as I said, I would like to get this first approved and implemented before having to decide on all those minor UI things. We can do that on the PR implementing this I suppose.

SvenDowideit · 2014-11-13T01:31:15Z

nice!

discordianfish · 2014-11-13T20:22:22Z

FYI: I've change the metrics format significantly. It uses now pretty much the same format as prometheus which makes it much more consistent at the price that it's a little bit harder to parse, given that it uses a labels notion.

First I proposed different structures for different kind of metrics and implicit encoding of dimensions into keys. But that's ugly and in the end harder to implement and to parse properly.

aluzzardi · 2014-11-13T22:17:46Z

I'll let @crosbymichael validate the format. I'm +1'ing the concept though.

thaJeztah · 2014-11-14T00:24:05Z

I think there's a similar PR in progress here; #8886 should both PRs be put next to each other and decided which one to proceed with?

discordianfish · 2014-11-14T00:54:15Z

@thaJeztah Thanks for pointed that out. I think we should first figure out the format/UI/API, then implement it. This PR so far doesn't include any implementation, so happy to use whatever matches the design.

thaJeztah · 2014-11-14T07:44:01Z

@discordianfish yes, I noticed this was in a "design" stage and the other PR started "spontaneously".

The main reason for pointing it out was to prevent people (the other PR?) investing a lot of time in something that might not get approved. But possibly they can join in after approval to help implement.

Nice proposal, btw, thanks for that!

proppy · 2014-11-15T01:51:52Z

/cc @vmarmol

I wonder if you considered http://golang.org/pkg/expvar/ as an (additional?) way to expose this?

discordianfish · 2014-11-16T10:25:25Z

@proppy Good idea! I would add those metrics as well and use it where possible.

juliusv · 2014-11-17T13:14:49Z

docs/sources/articles/runmetrics.md

Just a note: we ended up wrapping numbers in quotes in Prometheus since JSON doesn't support NaN, +Inf, or -Inf, and we needed to support those special values. Not sure if you'll ever need to support those kinds of values though.

I can't think of any situation where we would need NaN or Inf. Anyone else who can?

The kernel, today, typically uses int64max for those types of values.

huslage · 2014-11-20T21:06:14Z

I think we also need to include a couple of other stats:

how long it took to pull the image that this container is running from.
container uptime.

Both of these can be sort of derived, but it would be good to have them available in the metrics.

pnasrat · 2014-11-20T22:03:43Z

Yeah I'm +1 on expvar - we expose the endpoint in debug mode eg http://localhost:4243/debug/vars

discordianfish · 2014-11-21T11:24:48Z

@huslage Good point. I'll look into this, but can't say yet how it will look into details therefor I've added it to this PRs description (instead of docs)

This is proposing to add per container metrics, based on cgroups, as well as docker engine metrics about Docker's internal health and job management. This will either solve or at least provide a basis to address moby#8875, moby#1091, moby#4530, moby#5473, moby#8842 Signed-off-by: Johannes 'fish' Ziemke <github@freigeist.org>

discordianfish · 2015-01-05T12:01:22Z

@shykes @aluzzardi @crosbymichael Can we decide soon what's next with this proposal? Even though there isn't any code here yet, it already starts to get messy when rebasing..

crosbymichael · 2015-01-21T18:10:04Z

#9984 was just merged, I don't know if you want to refine this as this also includes the engine stats

discordianfish · 2015-01-22T11:30:25Z

@crosbymichael You mean I shall update the proposal based on the implementation? LOL.

SvenDowideit added Proposal labels Nov 13, 2014

SvenDowideit reviewed Nov 13, 2014
View reviewed changes

discordianfish force-pushed the propose-docker-metrics branch 4 times, most recently from 3323fd9 to f337e32 Compare November 13, 2014 20:06

discordianfish force-pushed the propose-docker-metrics branch from f337e32 to 32859a6 Compare November 13, 2014 20:23

juliusv reviewed Nov 17, 2014
View reviewed changes

discordianfish mentioned this pull request Jan 2, 2015

Get metrics from docker daemon #8842

Closed

discordianfish force-pushed the propose-docker-metrics branch from 32859a6 to b218adc Compare January 5, 2015 11:54

SvenDowideit mentioned this pull request Jan 13, 2015

docker stats live container resource metrics #9984

Merged

discordianfish closed this Jan 22, 2015

Conversation

discordianfish commented Nov 13, 2014

Why?

How?

Additions

Uh oh!

aluzzardi commented Nov 13, 2014

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SvenDowideit commented Nov 13, 2014

Uh oh!

discordianfish commented Nov 13, 2014

Uh oh!

aluzzardi commented Nov 13, 2014

Uh oh!

thaJeztah commented Nov 14, 2014

Uh oh!

discordianfish commented Nov 14, 2014

Uh oh!

thaJeztah commented Nov 14, 2014

Uh oh!

proppy commented Nov 15, 2014

Uh oh!

discordianfish commented Nov 16, 2014

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

huslage commented Nov 20, 2014

Uh oh!

pnasrat commented Nov 20, 2014

Uh oh!

discordianfish commented Nov 21, 2014

Uh oh!

discordianfish commented Jan 5, 2015

Uh oh!

crosbymichael commented Jan 21, 2015

Uh oh!

discordianfish commented Jan 22, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

11 participants