feat(storage): add new appender interface and compatibility layer #17104
Conversation
"CombinedAppender" seems to be a name that has meaning if you know the history. It would be better to have a name that has meaning at the time someone reads it.
Agree, v2 is fair. I was thinking about a storagev2 package for this too, but we don't really want to change the read interface a lot, so I discarded that idea. Happy to update it to

EDIT: It's now AppenderV2
When the storage is remote storage over remote-write 1.0, the metadata is keyed on the metric family name, not the series. So basically this is needed for backwards compatibility. Some implementations can just ignore it.
👍
Got it. This is fair, but for backward compatibility, why not offer instead the old
That interface also doesn't receive the metric family name, so that's not helpful either. We could just keep AppenderV2 and what we use in the OTLP endpoint separate: call the CombinedAppender for OTLP actually OTLPAppender. In Prometheus it could call AppenderV2 and just drop the metric family name, and others could still get the metric family name.
…stamp

Following discussion in #17104

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Copying good questions from Slack, by @ringerc
I don't entirely understand why the OTLP client has to calculate the metric family name from OTLP ingestion (I don't think it has this info ready, right?) to ingest data into Prometheus natively. Do you know why? @krajorama
When would Prometheus append directly to remote write 1.0? Also, official remote write does not have metadata support 🙈 Sounds like this problem is only applicable to Mimir. I'm sorry to be brutal here, and I'm obviously biased (I don't work for Grafana), but it feels like the fact that Mimir is still using the unofficial, soon-to-be-deprecated remote write v1, with experimental flows that were never released in a spec, is slowly affecting Prometheus codebase decisions more and more. What would motivate Mimir to stop depending on this v1 flow, so we can remove the experimental code from Prometheus?
I don't understand it. With the current direction of OM2 discussions and type and unit labels, we already know we need to support metadata per series.
I'm not sure I agree -- this flow is only used for scrape target purposes AFAIK. This is not needed in OTLP ingestion, right?

Summary: Indeed, two interfaces for Mimir purposes would be fine for now, but let's be explicit in the comment that we do that for Mimir code (so we know we can remove it when Mimir moves to a non-v1 flow). Otherwise... we could also add
That's a good question. If it only happens for RW 1.0's sake, is otherwise of zero use in Prometheus, AND we have all the information to calculate it inside our AppenderV2 implementation, we could move the logic to Mimir and drop the code/parameter in Prometheus right away. Needs a short POC on my part; give me a couple of days.
I've started to look into this. For all metrics the OTLP endpoint already calculates the Prometheus metric name (both in baseline and this PR). In the case of classic histograms in particular, this is used to generate the classic histogram series by appending (

On the other hand, Mimir is implementing https://prometheus.io/docs/prometheus/latest/querying/api/#querying-metric-metadata which is kind of a weird endpoint (in both Prom and Mimir) because it returns the metadata keyed on the metric family name (most of the time, except counters, where it returns the metric name with

So I think the huge performance gain of the #16951 PR more than makes up for passing an extra argument that we calculate anyway.
We can either do that (keep the metric family name in AppenderV2) or, as I suggested, have AppenderV2 clean (without the extra arg) but have an
Hmm. I'm not a fan of adding something that might never get used, but it's hard to argue against context. Up to you.
We discussed on Slack with @krajorama, but tl;dr: a metric family name sounds useful for Mimir in some contexts, and also if we keep per-metric-family metadata in Prometheus going forward. We ruled out explicit arguments, as they would be hard to remove later. ctx and variadic options are on the table, but perhaps we could extend the metadata struct to include the metric family name. We have that in some other metadata representations (API?)
I'll make some measurements with the new metadata struct passed by value and by reference.
According to the first measurement with #16951 using pass by value:

Yeah, so I think this means the metadata fits on the stack; am I understanding this correctly?
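As a quick sanity check on the pass-by-value point: a metadata-like struct of a few string fields is small enough (three 16-byte string headers on 64-bit platforms) that copying it per call is cheap and allocation-free. A minimal sketch with an assumed `Metadata` shape, not the actual Prometheus struct:

```go
package main

import (
	"fmt"
	"unsafe"
)

// Metadata is an assumed stand-in for the metadata struct discussed
// above; the real Prometheus type differs.
type Metadata struct {
	Type string
	Unit string
	Help string
}

// appendSample takes metadata by value. For a struct this small the
// copy is a few stack/register moves; no pointer chasing, no heap.
func appendSample(name string, v float64, meta Metadata) string {
	return fmt.Sprintf("%s{type=%s} = %g", name, meta.Type, v)
}

func main() {
	m := Metadata{Type: "counter", Unit: "seconds", Help: "Total requests."}
	// Three string headers: 48 bytes on 64-bit platforms.
	fmt.Println("sizeof(Metadata):", unsafe.Sizeof(m))
	fmt.Println(appendSample("http_requests_total", 42, m))
}
```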
Updated my PR, @krajorama
krajorama left a comment:
This is nicely taking shape; we're getting down to the nitty-gritty details.
```go
// The type of samples (float64, histogram, etc) appended for a given series must remain same within an Appender.
// The behaviour is undefined if samples of different types are appended to the same series in a single Commit().
type AppenderV2 interface {
	// AppendSample appends a float sample and related exemplars, metadata, and created timestamp to the storage.
```
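For reference, a rough, self-contained Go sketch of the interface shape under discussion. The `Labels` and `Metadata` types and the exact signature here are simplified assumptions, not the real Prometheus definitions (which use labels.Labels, metadata.Metadata, exemplar.Exemplar, and so on):

```go
package main

import "fmt"

// Simplified stand-ins for illustration only; the real Prometheus
// types are richer.
type Labels map[string]string

type Metadata struct {
	Type, Unit, Help string
}

// AppenderV2 sketch: one call carries the sample plus its auxiliary
// data (metadata, created timestamp), instead of separate
// Append/AppendExemplar/UpdateMetadata calls that each repeat the
// series lookup. Note ct comes before t, matching the PR's point
// that the created timestamp conceptually precedes the sample.
type AppenderV2 interface {
	AppendSample(ls Labels, meta Metadata, ct, t int64, v float64) error
	Commit() error
	Rollback() error
}

// nopAppender is a trivial implementation used to exercise the contract.
type nopAppender struct{ n int }

func (a *nopAppender) AppendSample(ls Labels, meta Metadata, ct, t int64, v float64) error {
	a.n++
	return nil
}
func (a *nopAppender) Commit() error   { return nil }
func (a *nopAppender) Rollback() error { return nil }

func main() {
	a := &nopAppender{}
	var app AppenderV2 = a
	app.AppendSample(Labels{"__name__": "http_requests_total"}, Metadata{Type: "counter"}, 0, 1000, 1)
	app.Commit()
	fmt.Println("appends:", a.n)
}
```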
I think we need to define how errors should be handled: what is the contract?
So far we have the OTLP endpoint (Prom+Mimir) and scrape (Prom) use cases, and I can think of Remote Write as well.
I think all share that just because an exemplar fails, we'd still like to update the metadata.
Probably the same for the CT zero sample insert: you'd still want to ingest everything else.
See the implementation in #16951.
Yes, this is a good point to agree on for the initial error setup. I think we have three choices:

A) Don't mention specific partial-error guarantees on the interface level.

B) Mention the current (likely) implementation. This is problematic because tomorrow this might change, plus it definitely changes with different Appender config options (e.g. enableCTzero): the same input would produce different outcomes.

C) Return complex errors that give ultimate control, and mention that we try to append all information and return PartialError{sampleErr, metaErr, zeroCTErr, exemplarErr} when one of the pieces failed. This feels unnecessary given our clients drop non-core errors anyway.

So what about a mix of A+B:
```go
// AppendSample appends a float sample and related exemplars, metadata, and created timestamp to the storage.
//
// Implementations MUST return an error if the core sample data (value and timestamp) was not successfully appended. Implementations MAY return a nil error (success) if the core data was appended but any of the auxiliary data appends, like exemplars, CT as zero, or metadata, failed.
```
I think this might be a good balance of client expectations vs the implementation changing. I think with this it's obvious that an exemplar failure should not block other append attempts. WDYT? @krajorama
Ok, I added the A+B and fixed the behaviour.

Unfortunately scrape still depends on some information from the failed appends, e.g. exemplar appends (to add a warning log line and increment some scrape metrics; see TODOs in the compatibility layer).

We need to double check whether we want to move this log/metric to the implementation (TSDB) or keep it in the clients that want this logic (only scrape for now, but likely OTLP and RW should follow the same, no?) with option (C) (special error handling).

Given it's the TSDB's (or implementation's) role to append samples together with exemplars, I think we should do that warning logging (AND metric) on the TSDB side. We could keep the same metric name (scrape namespace) for compatibility. Thoughts?
> Given it's the TSDB's (or implementation's) role to append samples together with exemplars, I think we should do that warning logging (AND metric) on the TSDB side. We could keep the same metric name (scrape namespace) for compatibility. Thoughts?
The TSDB interface will have two kinds of users: scrape, and remote write/OTLP (server). Scrape is totally internal, so that can naturally move error handling and logging into TSDB. However, remote write (especially 2.0)/OTLP needs to return some information to the user (partial error 4xx and statistics).
Just from the numbers above we have 1 scrape vs 2 remote cases :)
We could solve returning errors with https://pkg.go.dev/github.com/hashicorp/go-multierror or that PartialError (CombinedError 😄) and have a Statistics() function to get counters from the interface at any point, even after Commit. But then you have to check for those errors on the caller side. That could still be more efficient than separate calls to the Appender with hash lookups. Needs a benchmark, probably.

A good interface is hard to do, so I think we need to bite the bullet, do option C, and measure. If it's still more efficient for scrape than the current approach, we're good.
Thanks, I was thinking along those lines today too. C is also much cleaner for everyone IMO (especially when TSDB changes, or for other people).

I also think we should drop the CT zero injection feature from the appender. This should be (and can be) done on the caller side. This will simplify error handling and our code too.
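A sketch of what caller-side CT zero injection could look like, assuming a plain append callback; the names and placement are hypothetical, not the actual Prometheus code:

```go
package main

import "fmt"

type sample struct {
	t int64
	v float64
}

// appendWithCTZero injects a synthetic zero sample at the created
// timestamp (ct) before appending the real sample, moving the CT
// logic out of the appender and into the caller as proposed above.
// appendFn stands in for any AppenderV2-style append; hypothetical.
func appendWithCTZero(appendFn func(t int64, v float64) error, ct, t int64, v float64) error {
	if ct != 0 && ct < t {
		// Best effort: a failed CT zero append should not block the real sample.
		_ = appendFn(ct, 0)
	}
	return appendFn(t, v)
}

func main() {
	var got []sample
	appendFn := func(t int64, v float64) error {
		got = append(got, sample{t, v})
		return nil
	}
	appendWithCTZero(appendFn, 100, 200, 5)
	fmt.Println(got)
}
```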
Hm ok, maybe encapsulating CT is easier.

I also noticed you implemented A+B for OTLP ingestion in your combined appender WITHOUT caring about propagating errors ;p

> The TSDB interface will have two kinds of users: scrape, and remote write/OTLP (server). Scrape is totally internal, so that can naturally move error handling and logging into TSDB. However, remote write (especially 2.0)/OTLP needs to return some information to the user (partial error 4xx and statistics).

I assume you would like to fix that later on (or in this PR).
Ah no, you implemented sort of C (returning the last important errors as they are). Cool, will match that.
To be more in line with the future direction in #17104. Also use some helper variables in appender tests to make them more readable and shorter.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Depends on #17104

Signed-off-by: bwplotka <bwplotka@gmail.com>
Added a few TODOs and started to see how an OTLP-ingestion combined appender would look as a first user: #17175 (draft)
Depends on #17104

Signed-off-by: bwplotka <bwplotka@gmail.com>
FWIW, the metric family name would also be useful in the OTel collector's appender implementation in the prometheus receiver. Today, we trim common suffixes to get the family name.
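For illustration, suffix trimming along those lines might look like the sketch below. The suffix list here is an assumption; the real collector logic is type-aware (e.g. only counters end in `_total`):

```go
package main

import (
	"fmt"
	"strings"
)

// metricFamilyName derives a family name by trimming well-known
// Prometheus/OpenMetrics suffixes. Assumed suffix list, for
// illustration only; not the actual collector implementation.
func metricFamilyName(name string) string {
	for _, suffix := range []string{"_total", "_bucket", "_sum", "_count", "_created", "_info"} {
		if trimmed, ok := strings.CutSuffix(name, suffix); ok && trimmed != "" {
			return trimmed
		}
	}
	return name
}

func main() {
	fmt.Println(metricFamilyName("http_requests_total"))            // http_requests
	fmt.Println(metricFamilyName("request_duration_seconds_bucket")) // request_duration_seconds
	fmt.Println(metricFamilyName("up"))                              // up
}
```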
Replacement: #17610
This PR proposes a new storage interface `AppenderV2` for TSDB, along with a compatibility layer to translate from and to `Appender`. The interface is taken from @krajorama's PR, who already did amazing research on the various possibilities.

We seem to generally agree on what we want from this interface, so I propose we add it properly AND slowly migrate to it, instead of adding it only for the OTLP endpoint.

Rationales for `AppenderV2`:

In the chained PRs I will show (without breaking Prometheus) how we can slowly use and migrate to the new interface in:

I did diverge a bit from @krajorama's interface here, notably:

- `t, ct` with `ct, t`, because ct has to come before, conceptually.

Does this PR introduce a user-facing change?