Description
I have an application that emits some histogram metrics using django-prometheus. We recently migrated from daphne to gunicorn, so I started looking at enabling multi-process support in prometheus-client. We have unit tests that run the output of our metrics endpoint through text_string_to_metric_families to ensure that it is well-formed, and those tests showed that the output is not well-formed when using the multi-process collector with OpenMetrics exposition. Part of the output looks like this:
```
# HELP django_http_requests_latency_seconds_by_view_method Histogram of request processing time labelled by view.
# TYPE django_http_requests_latency_seconds_by_view_method histogram
django_http_requests_latency_seconds_by_view_method_sum{method="GET",view="homepage:homepage"} 0.017995235999933357
django_http_requests_latency_seconds_by_view_method_sum{method="GET",view="api:open-metrics"} 0.03084654700023748
django_http_requests_latency_seconds_by_view_method_bucket{le="0.01",method="GET",view="homepage:homepage"} 0.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.025",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.05",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.075",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.1",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.25",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.5",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.75",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="1.0",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="2.5",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="5.0",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="7.5",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="10.0",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="25.0",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="50.0",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="75.0",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="+Inf",method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_count{method="GET",view="homepage:homepage"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.01",method="GET",view="api:open-metrics"} 0.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.025",method="GET",view="api:open-metrics"} 0.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.05",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.075",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.1",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.25",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.5",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.75",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="1.0",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="2.5",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="5.0",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="7.5",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="10.0",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="25.0",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="50.0",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="75.0",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_bucket{le="+Inf",method="GET",view="api:open-metrics"} 1.0
django_http_requests_latency_seconds_by_view_method_count{method="GET",view="api:open-metrics"} 1.0
You can see that the histogram series distinguished by view="homepage:homepage" and view="api:open-metrics" are interleaved, contrary to the OpenMetrics text format specification (https://prometheus.io/docs/specs/om/open_metrics_spec/#text-format), which says that "Metrics MUST NOT be interleaved".
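To make the failure concrete, here is a minimal, self-contained version of the validation our tests do. The metric names and values below are illustrative stand-ins for the real output, and the exact ValueError message may vary between prometheus-client versions:

```python
# Minimal reproduction: feed interleaved histogram series to the OpenMetrics
# parser. Metric names and values here are made up for brevity.
from prometheus_client.openmetrics.parser import text_string_to_metric_families

INTERLEAVED = """\
# TYPE request_latency_seconds histogram
request_latency_seconds_sum{view="home"} 0.018
request_latency_seconds_sum{view="api"} 0.031
request_latency_seconds_bucket{le="0.1",view="home"} 1.0
request_latency_seconds_bucket{le="+Inf",view="home"} 1.0
request_latency_seconds_count{view="home"} 1.0
request_latency_seconds_bucket{le="0.1",view="api"} 1.0
request_latency_seconds_bucket{le="+Inf",view="api"} 1.0
request_latency_seconds_count{view="api"} 1.0
# EOF
"""

try:
    # The parser is a generator, so it has to be consumed for its
    # validation checks to run.
    list(text_string_to_metric_families(INTERLEAVED))
except ValueError as err:
    print(f"not well-formed: {err}")
else:
    print("parsed cleanly")
```

Reordering the payload so that both series are contiguous makes the same call parse cleanly.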
I think what's happening here is that MultiProcessCollector._accumulate_metrics adds the accumulated _bucket and _count samples in a separate loop after adding the samples read from the mmapped files (for each metric family), so they end up positioned after all the _sum samples when iterating over the resulting dictionary. The simplest and most reliable fix is probably to add an extra layer to the samples dictionary, so that all the samples for a given label set are kept together. I'm working on a PR for that.
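For what it's worth, here is a hypothetical sketch of that grouping idea; the names and the flat dictionary layout are illustrative only, not the actual _accumulate_metrics internals. Keying the accumulated samples by label set first means that flattening the dictionary naturally emits every sample for one series before moving on to the next:

```python
# Hypothetical sketch of the grouping idea; names and layout are made up
# and are not the library's actual internals.
from collections import defaultdict

# Roughly the flat layout today: keyed by (sample name, labels), so the _sum
# samples inserted first end up separated from the _bucket/_count samples
# that the accumulation loop appends later.
flat = {
    ('latency_sum', (('view', 'home'),)): 0.018,
    ('latency_sum', (('view', 'api'),)): 0.031,
    ('latency_bucket', (('le', '+Inf'), ('view', 'home'))): 1.0,
    ('latency_count', (('view', 'home'),)): 1.0,
    ('latency_bucket', (('le', '+Inf'), ('view', 'api'))): 1.0,
    ('latency_count', (('view', 'api'),)): 1.0,
}

# The proposed extra layer: group by the label set (ignoring 'le') so that
# all samples for one series stay adjacent when the dict is flattened.
grouped = defaultdict(dict)
for (name, labels), value in flat.items():
    labelset = tuple(l for l in labels if l[0] != 'le')
    grouped[labelset][(name, labels)] = value

# Flattening now yields home's _sum, _bucket and _count together, then api's.
for labelset, samples in grouped.items():
    for (name, labels), value in samples.items():
        print(name, dict(labels), value)
```

Since dicts preserve insertion order in Python 3.7+, flattening the nested dictionary keeps each series contiguous without any extra sorting.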