Add prometheus metrics output to docker #25820

Merged
thaJeztah merged 1 commit into moby:master from crosbymichael:prom
Oct 27, 2016

Conversation

@crosbymichael
Contributor

This adds metrics support to docker via the prometheus output. There is a new /metrics endpoint where the prometheus handler is installed.

This adds metrics to the daemon package for basic container, image, and daemon operations.

The client package being used is located here:

https://github.com/docker/go-metrics

This PR is the beginning; many more packages need to be instrumented, but help/suggestions from the individual subsystem maintainers are needed to collect relevant and useful information.

Contributor

Thoughts on subscribing to the health events for this container and keeping counters? This way you can track/alert on containers that are flapping.
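A minimal sketch of what such health-event counters could look like; the event statuses, type names, and flapping threshold here are assumptions for illustration, not the daemon's actual events API:

```go
package main

import "fmt"

// healthFlips counts unhealthy transitions per container, the kind of
// counter the comment suggests keeping from the events stream.
type healthFlips map[string]int

// observe records one health_status event for a container.
func (h healthFlips) observe(containerID, status string) {
	if status == "unhealthy" {
		h[containerID]++
	}
}

// flapping reports containers whose unhealthy count reached a
// threshold, which an operator could alert on.
func (h healthFlips) flapping(threshold int) []string {
	var out []string
	for id, n := range h {
		if n >= threshold {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	h := healthFlips{}
	for i := 0; i < 3; i++ {
		h.observe("abc123", "unhealthy")
		h.observe("abc123", "healthy")
	}
	fmt.Println(h.flapping(3)) // [abc123]
}
```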

Contributor Author

Sounds like a good idea. I was trying to keep the scope small at this point to make sure we have the basics monitored before adding more metrics. This is just the first metrics PR; we will expand the scope to other packages afterwards.

@jhorwit2
Contributor

Thoughts on adding an expvar collector if the daemon is in debug mode?

❤️ this PR! 😃

Contributor

StartTimer costs a closure allocation. This interface is the most performant, but simple stuff can use the sugar.

@crosbymichael
Contributor Author

@xbglowx you might be interested in this PR

@cpuguy83
Member

So cool design LGTM

@discordianfish
Contributor

@crosbymichael Ha, nice! You might want to look into #9130 for the naming I proposed. That follows the Prometheus best practices better. If in doubt, see: https://prometheus.io/docs/practices/naming/
Would be really great if those metrics followed the best practices.

@fabxc You might be interested in this as well.

@stevvooe
Contributor

Would be really great if those metrics followed the best practices.

@discordianfish We've read through this in detail, a few times now, and we think we are following them. It would be more constructive if you could identify the areas where we don't follow best practices or suggest how making them different could help out.

Contributor

Not sure if you are aware, but this isn't called anywhere.

Contributor Author

ya, i removed it for now until we have multiple registries

@crosbymichael crosbymichael added this to the 1.13.0 milestone Aug 31, 2016
@discordianfish
Contributor

@stevvooe I referred to my PR for specific naming suggestions. Besides that, from quickly skimming through:

  • Standardize on seconds everywhere
  • Use self-explanatory metric names (not network_tx but network_tx_packets for example, you could also try to use similar/same names as in node_exporter or container_exporter)

Since we're migrating away from Docker, I can't really put much more time in this though. Still, if you have specific questions I'm happy to answer.
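A toy check capturing part of the naming guidance above (names should be self-explanatory and end in a base unit); the accepted suffixes are a partial, illustrative list, not the full Prometheus rules:

```go
package main

import (
	"fmt"
	"strings"
)

// hasUnitSuffix reports whether a metric name ends in one of a few
// conventional base-unit suffixes, so "network_tx" fails while
// "network_tx_packets_total" passes.
func hasUnitSuffix(name string) bool {
	for _, suffix := range []string{"_seconds", "_bytes", "_total"} {
		if strings.HasSuffix(name, suffix) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasUnitSuffix("network_tx"))                // false
	fmt.Println(hasUnitSuffix("network_tx_packets_total"))  // true
	fmt.Println(hasUnitSuffix("container_actions_seconds")) // true
}
```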

@icecrime
Contributor

icecrime commented Sep 9, 2016

Ping @LK4D4: PTAL!

Contributor

@crosbymichael The recommendation from @discordianfish was to standardize on using seconds. Should we update go-metrics to follow that standard?

Contributor Author

for this, everyone expects cputime in nanoseconds; that is what all stat and cgroup calls return. i don't know if it makes sense to do it here or not. what do you think?
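If the metrics were standardized on seconds, converting the nanosecond values that stat and cgroup interfaces return would be a single division when exporting as a float64 counter; a sketch with a made-up helper name:

```go
package main

import "fmt"

// nanosToSeconds converts cputime from the nanoseconds that cgroup and
// stat reads report into the base seconds that the Prometheus naming
// conventions recommend.
func nanosToSeconds(ns uint64) float64 {
	return float64(ns) / 1e9
}

func main() {
	cpuNanos := uint64(1500000000) // 1.5s of cputime from a cgroup read
	fmt.Println(nanosToSeconds(cpuNanos)) // 1.5
}
```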

Contributor

I'm not a huge fan of the units type. I don't think people will take time to make a considered choice in practice, leaving us with inconsistent counters throughout the project.

However, I do see your point.

In this case, I am really not sure, since this is actually a gauge (may need to be a counter). We don't really get a guaranteed conversion here, since it will still be incremented as a float.

Contributor

yah i think this should be a counter.

@LK4D4
Contributor

LK4D4 commented Sep 26, 2016

@crosbymichael I wonder how heavy all the registrations in init are. Reexec is still used in the overlay2 storage driver, chrootarchive, and in multiple places in libnetwork.

@LK4D4
Contributor

LK4D4 commented Sep 27, 2016

Ok, seems like it's quite heavy, but that's the supposed way to use the prometheus client.
@crosbymichael this needs a rebase.

@crosbymichael
Contributor Author

Rebased and placed this feature behind the experimental flag.

@LK4D4 for the initialization, these counters are not heavy at all and have no out-of-process calls or allocations. They are just initializing types, and they are small.

Contributor

d is unused

Contributor

let's split it a little and return an error if MetricsAddress is not empty on non-experimental.
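A minimal sketch of the validation being asked for; the field and flag names are illustrative, not necessarily the daemon's actual configuration names:

```go
package main

import (
	"errors"
	"fmt"
)

// validateMetricsConfig rejects a configured metrics address outright
// when the daemon is not running in experimental mode, rather than
// silently ignoring it.
func validateMetricsConfig(metricsAddress string, experimental bool) error {
	if metricsAddress != "" && !experimental {
		return errors.New("metrics address requires experimental mode to be enabled")
	}
	return nil
}

func main() {
	fmt.Println(validateMetricsConfig("127.0.0.1:9323", false)) // metrics address requires experimental mode to be enabled
	fmt.Println(validateMetricsConfig("127.0.0.1:9323", true))  // <nil>
}
```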

This adds a metrics package that creates additional metrics. Add the
metrics endpoint to the docker api server under `/metrics`.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Add metrics to daemon package

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

api: use standard way for metrics route

Also add "type" query parameter

Signed-off-by: Alexander Morozov <lk4d4@docker.com>

Convert timers to ms

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
@LK4D4 LK4D4 removed the status/failing-ci (indicates that the PR in its current state fails the test suite) label Oct 27, 2016
@LK4D4
Contributor

LK4D4 commented Oct 27, 2016

LGTM

@LK4D4
Contributor

LK4D4 commented Oct 27, 2016

ping @thaJeztah

@thaJeztah
Member

Since this is a new API, we need a new section in the docs for it. Let's do that in a follow-up before 1.13 is released

/cc @mstanleyjones

@thaJeztah
Member

opened #27843 for tracking docs

@crosbymichael crosbymichael deleted the prom branch October 28, 2016 16:57
@outcoldman
Contributor

Am I right that in the current implementation only Prometheus can be used for collecting metrics? Why not have extensibility as in log drivers? The current implementation is really tied to the Prometheus client library, without any abstractions.

@brian-brazil

Am I right that in the current implementation only Prometheus can be used for collecting metrics?

For clarity I define "collection" to mean instrumentation, and "ingestion" as the process of getting the collected metric data into a monitoring system.

In that context, the answer is yes as I understand things on the Docker side.

Why not have extensibility as in log drivers? The current implementation is really tied to the Prometheus client library, without any abstractions.

The challenge is that metric instrumentation is nowhere near as standardised as logging. Indeed, Prometheus client libraries can be viewed as one standardisation option to allow for extensibility, as they are designed to be an open ecosystem.

You could add an abstraction at the instrumentation level, as I believe you are suggesting. You'd likely end up with a lowest-common-denominator library, which would lose the benefits of things like labels and floating point numbers, take a performance hit, and not be idiomatic for any monitoring system. In short, you'd gain abstraction at the cost of good instrumentation. I personally do not believe that is a good tradeoff, and have never seen it work out well when attempted.

With Prometheus client libraries there's abstraction at the ingestion level. The Prometheus approach is to instrument full-on with Prometheus client libraries, and then you use a small shim (called a "bridge") to output to other systems like Graphite (currently in code review to be included out of the box for the Go library, already exists for Java/Python), New Relic, InfluxDB, CloudWatch etc. There's no need for a user to run a Prometheus server in such a scenario, see https://www.robustperception.io/exporting-to-graphite-with-the-prometheus-python-client/ for example. There are other ways you can plumb that too.
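A toy illustration of what such a bridge does: flatten a Prometheus sample with labels into Graphite's plaintext "path value timestamp" line. This is a sketch of the idea, not the client_golang bridge's actual code:

```go
package main

import "fmt"

// toGraphite rewrites one Prometheus sample into the Graphite plaintext
// protocol, folding labels into the dotted metric path. With more than
// one label, map iteration order would make the path non-deterministic;
// a real bridge sorts the label names first.
func toGraphite(prefix, name string, labels map[string]string, value float64, ts int64) string {
	path := prefix + "." + name
	for k, v := range labels {
		path += "." + k + "." + v
	}
	return fmt.Sprintf("%s %g %d", path, value, ts)
}

func main() {
	fmt.Println(toGraphite("docker", "engine_daemon_container_actions_total",
		map[string]string{"action": "start"}, 42, 1477580400))
}
```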

Prometheus client libraries put all the smarts in the server rather than the instrumentation and have a data model that's on the powerful end of the scale, so it's almost always possible to automagically convert to something sensible in other monitoring/instrumentation systems. The reverse is unfortunately rarely true.

@peterbourgon
Contributor

You could add an abstraction at the instrumentation level . . .

Self-plug: one candidate is go-kit/kit/metrics, which provides exactly this abstraction and suffers none of the drawbacks you mention. I'm not necessarily advocating for it in this circumstance—it seems exceedingly unlikely that Docker should or would use a different metrics backend than Prometheus—but at least the option is there.
