Add prometheus metrics output to docker #25820
Conversation
daemon/metrics.go
Outdated
Thoughts on subscribing to the health events for this container and keeping counters? This way you can track/alert containers that are flapping.
Sounds like a good idea. I was trying to keep the scope small at this point to make sure we have the basics monitored before adding more metrics. This is just the first metrics PR, we will expand the scope after to other packages/etc.
Thoughts on adding an expvar collector if the daemon is in debug mode? ❤️ this PR! 😃
StartTimer costs a closure allocation. This interface is the most performant, but simple stuff can use the sugar.
@xbglowx you might be interested in this PR
Very cool design, LGTM
@crosbymichael Ha, nice! You might want to look into #9130 for the naming I proposed. That follows the prometheus best practices better. In doubt, see: https://prometheus.io/docs/practices/naming/ @fabxc You might be interested in this as well.
@discordianfish We've read through this in detail, a few times now, and we think we are following them. It would be more constructive if you could identify the areas where we don't follow best practices or suggest how making them different could help out. |
daemon/metrics.go
Outdated
Not sure if you are aware, but this isn't called anywhere.
ya, i removed it for now until we have multiple registries
@stevvooe I referred to my PR for specific naming suggestions. Besides that, from quickly skimming through:
Since we're migrating away from Docker, I can't really put much more time into this. Still, if you have specific questions I'm happy to answer.
Ping @LK4D4: PTAL!
daemon/metrics.go
Outdated
@crosbymichael The recommendation from @discordianfish was to standardize on using seconds. Should we update go-metrics to follow that standard?
for this, everyone expects cputime in nanoseconds, since that is what all stat and cgroup calls return. i don't know if it makes sense to do it here or not. what do you think?
I'm not a huge fan of the units type. I don't think people will take time to make a considered choice in practice, leaving us with inconsistent counters throughout the project.
However, I do see your point.
In this case, I am really not sure, since this is actually a gauge (may need to be a counter). We don't really get a guaranteed conversion here, since it will still be incremented as a float.
yah i think this should be a counter.
@crosbymichael I wonder how heavy all registrations in
Ok, seems like it's quite heavy, but that's the supposed way to use the prometheus client.
Rebased and placed this feature behind the experimental flag. @LK4D4 for the initialization, these counters are not heavy at all and have no out-of-process calls or allocations. They are just initializing types, and they are small.
cmd/dockerd/metrics.go
Outdated
cmd/dockerd/daemon.go
Outdated
let's split it a little and return an error if MetricsAddress is not empty on non-experimental.
This adds a metrics package that creates additional metrics. Add the metrics endpoint to the docker api server under `/metrics`.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Add metrics to daemon package
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

api: use standard way for metrics route
Also add "type" query parameter
Signed-off-by: Alexander Morozov <lk4d4@docker.com>

Convert timers to ms
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
LGTM
ping @thaJeztah
Since this is a new API, we need a new section in the docs for this. Let's do so in a follow-up before 1.13 is released. /cc @mstanleyjones
Opened #27843 for tracking docs.
Am I right that in the current implementation only prometheus can be used for collecting metrics? Why not have extensibility as in log drivers? The current implementation is really tied to the prometheus client library without any abstractions.
For clarity I define "collection" to mean instrumentation, and "ingestion" as the process of getting the collected metric data into a monitoring system. In that context, the answer is yes as I understand things on the Docker side.
The challenge is that metric instrumentation is nowhere near as standardised as logs. Indeed, Prometheus client libraries can be viewed as one standardisation option to allow for extensibility, as they are designed to be an open ecosystem.

You could add an abstraction at the instrumentation level, as I believe you are suggesting. You'd likely end up with a lowest-common-denominator library, which would lose you the benefits of things like labels and floating point numbers, take a performance hit, and not be idiomatic for any monitoring system. In short, you'd gain abstraction at the cost of good instrumentation. I personally do not believe that is a good tradeoff, and have never seen it work out well when attempted.

With Prometheus client libraries there's abstraction at the ingestion level instead. The Prometheus approach is to instrument full-on with Prometheus client libraries, and then use a small shim (called a "bridge") to output to other systems like Graphite (currently in code review to be included out of the box for the Go library; it already exists for Java/Python), New Relic, InfluxDB, CloudWatch, etc. There's no need for a user to run a Prometheus server in such a scenario; see https://www.robustperception.io/exporting-to-graphite-with-the-prometheus-python-client/ for example. There are other ways you can plumb that too.

Prometheus client libraries put all the smarts in the server rather than the instrumentation, and have a data model that's on the powerful end of the scale, so it's almost always possible to automagically convert to something sensible in other monitoring/instrumentation systems. The reverse is unfortunately rarely true.
Self-plug: one candidate is go-kit/kit/metrics, which provides exactly this abstraction and suffers none of the drawbacks you mention. I'm not necessarily advocating for it in this circumstance (it seems exceedingly unlikely that Docker should or would use a different metrics backend than Prometheus) but at least the option is there.
This adds metrics support to docker via the prometheus output. There is a new `/metrics` endpoint where the prometheus handler is installed. This adds metrics to the daemon package for basic container, image, and daemon operations.
The client package being used is located here:
https://github.com/docker/go-metrics
This PR is the beginning; many more packages need to be instrumented, but help/suggestions from the individual subsystem maintainers are needed to collect relevant and useful information.