Redesign parser interface to return metrics and not lines#15115
Conversation
…eries: if we turn on NHCB, we need to parse classic values even if they are dropped as classic series, to be able to do NHCB. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
The idea is to load metrics into curr, but if we see a change in metric name or labels we load them into next, return curr as detected, and continue from next. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
```go
	Next() (Entry, error)

	// Parse returns the next metric with all information collected about it, such as
	// metric type, help text, unit, created timestamps, samples, etc.
	Next(d DropperCache, keepClassicHistogramSeries bool) (interface{}, error)
```
Quite a redesign! As we discussed on Slack, it's getting similar to expfmt, should we consider merging those?
I wonder how to make that decision: a total redesign to return interfaces/types vs. doubling down on a method for each complex metric type (similar to proto getters, kind of)...
My understanding of this parser was that we were asking callers to provide methods because we wanted to avoid type-to-type conversions. This is because we want to append low-level values directly. For protobuf you already unmarshal into a "type" (unless we write our own parser or something), so that's hard to avoid. With this work we seem to create new types to convert from/into. Isn't that a regression? Wouldn't this be simpler if we just used the protobuf (dto) types (which is essentially what expfmt is doing)? Then the question is.. should it be OM or Prom proto (:
To sum up, what's the best way to make decision here?
Also, if we do such a redesign, I would highly suggest textparsev2 or expparse or some other package. This will give us a way to slowly add implementations/formats, slowly migrate stuff, and compare efficiencies.
Also, what's the TL;DR blocker for doing line by line?
For NHCB - we already “abstract” line by line with the CreatedTimestamp logic. Can we assume a group of classic histogram series that represents an NHCB is our new "one" line?
For proto - it's weird indeed, but is it really slower/less efficient because of that interface?
Ok, I understand the small confusion. I think we focused here on the idea that we can have a wrapper parser for all 3 implementations 🤔 That is one perspective, but I would argue that for low-level parsing, with our learnings from CT parsing in OM Text, it's much faster/leaner to NOT use abstracted Metrics/Series etc. methods, but to use things from the lexer directly.
This is because we want to append directly low-level values.
From what I could gather this, in practice, means:
- parser takes a byte slice
- that byte slice is a metrics response body, so the whole body needs to be first read into memory, often also uncompressed fully
- this means that scrape code needs a lot of byte slices, so it ends up with a lot of GC pressure and so there's a sync.Pool to help mitigate this
- these low level values need to be parsed later to get the actual metrics
- this also means that scrape code uses raw response metric strings instead of parsed ones, which isn't always correct - see #14712 ("Scrape cache should use labels hash after relabeling")
IMHO it would be great if the parser were given an io.Reader instead of a byte slice, so that there's no need to fully read the response before starting to parse metrics; then there would be no need to use sync.Pool and, hopefully, memory pressure from scrapes would go down.
But I do understand that the current implementation avoids parsing most metrics on every scrape if they're already in the scrape cache (but that can cause issues like #14712). Maybe a cache in the parser itself could mitigate that.
I gave up this approach because it turned out to be way more complicated than I thought. There's a lot of baggage due to the text formats and how scrape works now, so it just didn't fit our timeline to try and implement this.
This rewrite would be an XL size project, so might as well include io.Reader change as @prymitive suggests.
Anyway, since I'm unlikely to have time for this, I'm closing the PR, but anyone should feel free to take it on. See PR #14978, which was merged instead, and its many follow-ups.
Alternate solution used in the end: #14978
Related to #13529, to simplify the implementation there.
The current exposition parser design is returning individual lines from the text formats and emulates this behavior over Protobuf. This PR rewrites the interface to return metrics, that is individual series with all information about the series in one interface: all samples of a histogram/summary, type, unit, help, exemplars, created timestamp. This should make implementing NHCB simpler.
The code will be fairly similar to the text-to-DTO parser code in prometheus/common; however, it should also support OpenMetrics and be more efficient.
To be tested with #15083 .