Description
I have an application that emits some histogram metrics using django-prometheus. We recently migrated from daphne to gunicorn, so I started looking at enabling multi-process support in prometheus-client. We have unit tests that run the output of our metrics endpoint through text_string_to_metric_families to ensure that it is well-formed, and those tests showed that the output is not well-formed when using the multi-process collector with OpenMetrics exposition. Part of the output looks like this:
```
# HELP django_http_requests_latency_seconds_by_view_method Histogram of request processing time labelled by view.
# TYPE django_http_requests_latency_seconds_by_view_method histogram
django_http_requests_latency_seconds_by_view_method_sum{method="GET",view="homepage:homepage"} 0.017995235999933357
django_http_requests_latency_seconds_by_view_method_sum{method="GET",view="api:open-metrics"} 0.03084654700023748
django_http_requests_latency_seconds_by_view_method_bucket{le="0.01",method="GET",view="homepage:homepage"} 0.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.025",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.05",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.075",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.1",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.25",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.5",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.75",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="1.0",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="2.5",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="5.0",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="7.5",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="10.0",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="25.0",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="50.0",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="75.0",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="+Inf",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_count{method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.01",method="GET",view="api:open-metrics"} 0.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.025",method="GET",view="api:open-metrics"} 0.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.05",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.075",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.1",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.25",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.5",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.75",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="1.0",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="2.5",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="5.0",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="7.5",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="10.0",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="25.0",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="50.0",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="75.0",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="+Inf",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_count{method="GET",view="api:open-metrics"} 1.0
You can see that the histogram series distinguished by view="homepage:homepage" and view="api:open-metrics" are interleaved, contrary to the OpenMetrics text format specification (https://prometheus.io/docs/specs/om/open_metrics_spec/#text-format), which says that "Metrics MUST NOT be interleaved".
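To make the failure concrete, here is a minimal, self-contained version of the validation our tests do. The metric names and values below are illustrative stand-ins for the real output, and the exact ValueError message may vary between prometheus-client versions:

```python
# Minimal reproduction: feed interleaved histogram series to the OpenMetrics
# parser. Metric names and values here are made up for brevity.
from prometheus_client.openmetrics.parser import text_string_to_metric_families

INTERLEAVED = """\
# TYPE request_latency_seconds histogram
request_latency_seconds_sum{view="home"} 0.018
request_latency_seconds_sum{view="api"} 0.031
request_latency_seconds_bucket{le="0.1",view="home"} 1.0
request_latency_seconds_bucket{le="+Inf",view="home"} 1.0
request_latency_seconds_count{view="home"} 1.0
request_latency_seconds_bucket{le="0.1",view="api"} 1.0
request_latency_seconds_bucket{le="+Inf",view="api"} 1.0
request_latency_seconds_count{view="api"} 1.0
# EOF
"""

try:
    # The parser is a generator, so it has to be consumed for its
    # validation checks to run.
    list(text_string_to_metric_families(INTERLEAVED))
except ValueError as err:
    print(f"not well-formed: {err}")
else:
    print("parsed cleanly")
```

Reordering the payload so that both series are contiguous makes the same call parse cleanly.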
I think what's happening here is that MultiProcessCollector._accumulate_metrics adds the accumulated _bucket and _count samples in a separate loop after adding the samples read from the mmapped files (for each metric family), so they end up positioned after all the _sum samples when iterating over the resulting dictionary. The simplest and most reliable fix is probably to add an extra layer to the samples dictionary, so that all the samples for a given label set are kept together. I'm working on a PR for that.
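For what it's worth, here is a hypothetical sketch of that grouping idea; the names and the flat dictionary layout are illustrative only, not the actual _accumulate_metrics internals. Keying the accumulated samples by label set first means that flattening the dictionary naturally emits every sample for one series before moving on to the next:

```python
# Hypothetical sketch of the grouping idea; names and layout are made up
# and are not the library's actual internals.
from collections import defaultdict

# Roughly the flat layout today: keyed by (sample name, labels), so the _sum
# samples inserted first end up separated from the _bucket/_count samples
# that the accumulation loop appends later.
flat = {
    ('latency_sum', (('view', 'home'),)): 0.018,
    ('latency_sum', (('view', 'api'),)): 0.031,
    ('latency_bucket', (('le', '+Inf'), ('view', 'home'))): 1.0,
    ('latency_count', (('view', 'home'),)): 1.0,
    ('latency_bucket', (('le', '+Inf'), ('view', 'api'))): 1.0,
    ('latency_count', (('view', 'api'),)): 1.0,
}

# The proposed extra layer: group by the label set (ignoring 'le') so that
# all samples for one series stay adjacent when the dict is flattened.
grouped = defaultdict(dict)
for (name, labels), value in flat.items():
    labelset = tuple(l for l in labels if l[0] != 'le')
    grouped[labelset][(name, labels)] = value

# Flattening now yields home's _sum, _bucket and _count together, then api's.
for labelset, samples in grouped.items():
    for (name, labels), value in samples.items():
        print(name, dict(labels), value)
```

Since dicts preserve insertion order in Python 3.7+, flattening the nested dictionary keeps each series contiguous without any extra sorting.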