refactor(remote)[PART4]: switch remote receivers (RW, OTEL) to AppenderV2 #17675
Conversation
```go
// Used to ensure we only update metadata and created timestamps once, and to share storage.SeriesRefs.
// To detect hash collision it also stores the labels.
// There is no overflow/conflict list, the TSDB will handle that part.
refs map[uint64]seriesRef
```
I am not sure I fully understood the purpose of this. Was it only to cache refs for efficiency, for separate Append vs UpdateMetadata/AppendExemplar calls on the same series? If yes, then it's safe to remove.
cc @krajorama
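For illustration, the cache being discussed could look roughly like the self-contained sketch below. All names here (`refCache`, `getOrStore`) are hypothetical; `SeriesRef` stands in for `storage.SeriesRef`, and labels are flattened to a string instead of `labels.Labels`:

```go
package main

import "fmt"

// SeriesRef mimics storage.SeriesRef (a numeric handle into the TSDB head).
type SeriesRef uint64

// seriesRef caches the ref together with the labels that produced the hash,
// so a hash collision can be detected by comparing labels.
type seriesRef struct {
	ref    SeriesRef
	labels string // flattened labels; the real code stores labels.Labels
}

type refCache struct {
	refs map[uint64]seriesRef
}

// getOrStore returns the cached ref for the given hash, storing a new entry
// on a miss. On a hash collision (same hash, different labels) it reports
// ok=false so the caller can fall back to the TSDB, which handles conflicts.
func (c *refCache) getOrStore(hash uint64, ls string, newRef SeriesRef) (SeriesRef, bool) {
	if cached, found := c.refs[hash]; found {
		if cached.labels == ls {
			return cached.ref, true
		}
		return 0, false // collision: no conflict list here, TSDB handles it
	}
	c.refs[hash] = seriesRef{ref: newRef, labels: ls}
	return newRef, true
}

func main() {
	c := &refCache{refs: map[uint64]seriesRef{}}
	r, ok := c.getOrStore(42, `{job="a"}`, 1)
	fmt.Println(r, ok) // 1 true: stored
	r, ok = c.getOrStore(42, `{job="a"}`, 99)
	fmt.Println(r, ok) // 1 true: cache hit, original ref shared
	_, ok = c.getOrStore(42, `{job="b"}`, 2)
	fmt.Println(ok) // false: hash collision detected via labels
}
```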
This should be ready, PTAL cc @aknuds1 @krajorama. But no rush, if you are busy we can tackle this after xmas 🤗
ywwg left a comment:
I would feel a lot better if there were a way to select which appender is in use while we are in transition, rather than doing a hard cutover. There is so much changing in all of these PRs that it seems plausible there will be bugs (some have already been found in review), and we want it to be easy to roll back to V1 in the event of an emergency. If the two appenders can co-exist, as seems implied by the code I've read, it should be possible to put V2 behind a default-on feature flag that can be flipped.
```go
		h.logger.Debug("Error while adding exemplar in AppendExemplar", "series", ls.String(), "exemplar", fmt.Sprintf("%+v", e), "err", err)
	}
}
es = append(es, e)
```
(Why not make `es` a slice with a preallocated capacity of `len(ts.Exemplars)`?)
Good question!
Ideally we reuse the slice from the previous iteration, which should be enough for the typical 1-10 max elements. But we could optimize further:

```go
es = es[:0]
if cap(es) < len(ts.Exemplars) {
	es = make([]exemplar.Exemplar, 0, len(ts.Exemplars))
}
```

Benchmarks would tell if this really matters, so I went with the most readable yet efficient enough solution for now. I would postpone this optimization for later, can add a TODO, wdyt?
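The reuse-or-grow pattern discussed above can be shown as a self-contained sketch; `reuseBuffer` is a hypothetical name and `int` stands in for `exemplar.Exemplar`:

```go
package main

import "fmt"

// reuseBuffer resets the shared slice to zero length and only reallocates
// when its capacity cannot hold the needed number of elements, avoiding a
// fresh allocation on every request.
func reuseBuffer(es []int, need int) []int {
	es = es[:0]
	if cap(es) < need {
		es = make([]int, 0, need)
	}
	return es
}

func main() {
	buf := make([]int, 0, 8)
	buf = reuseBuffer(buf, 4)
	fmt.Println(cap(buf)) // 8: existing capacity reused, no allocation
	buf = reuseBuffer(buf, 16)
	fmt.Println(cap(buf)) // 16: grown to fit
	fmt.Println(len(buf)) // 0: always reset to empty
}
```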
```go
factoryAr := func(context.Context) api_v1.AlertmanagerRetriever { return h.notifier }
FactoryRr := func(context.Context) api_v1.RulesRetriever { return h.ruleManager }

var app storage.Appendable
```
Here is an example where it looks like we could use either one appender or the other: the perfect place to have a switchable selection for rollbacks.
It's a good place IF we want to double down on the tech debt of keeping an extra ~3k LOC of potential prod code (most of the code I am removing in this PR would need to be left in). I acknowledge the risk, but I propose we don't do a dual path anywhere other than scrape and TSDB. We can revert this PR/commit if needed.
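For context, the flag-based selection being debated could be sketched as below. This is purely illustrative of the suggestion, not code from the PR; `Appendable`, `selectAppendable`, and both implementations are hypothetical stand-ins for the real Prometheus types:

```go
package main

import "fmt"

// Appendable is a stand-in for storage.Appendable.
type Appendable interface{ Name() string }

type v1Appendable struct{}

func (v1Appendable) Name() string { return "appender-v1" }

type v2Appendable struct{}

func (v2Appendable) Name() string { return "appender-v2" }

// selectAppendable picks the implementation behind a default-on feature
// flag, keeping the old path available for an emergency rollback.
func selectAppendable(useV2 bool) Appendable {
	if useV2 {
		return v2Appendable{}
	}
	return v1Appendable{}
}

func main() {
	fmt.Println(selectAppendable(true).Name())  // appender-v2
	fmt.Println(selectAppendable(false).Name()) // appender-v1
}
```

The cost, as argued below, is that both implementations must then be kept alive and maintained for the lifetime of the flag.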
**Dual Appender Path or Switch**

Thanks for pointing this out, understandable opinion. A dual appender here and in other pieces would be possible, but it's a big ask. Keeping two paths is enormous work. We are talking about ~15k LOC of dual-path tech debt we already added to critical code (TSDB, agent), and doing a dual path for more components likely adds ~10k+ more. Ideally we switch straight away for less complex/less imported functionality. Otherwise, the moment you fix/develop something in dual-appender-enabled code, which path do you fix? Which path do you read to debug? How much confusion do we create? Both? New only? Who will maintain the dual bug-porting process? Initially with @krajorama (see the Proposal section on #17632 (comment)) we were thinking to only keep the dual appender path for scrape and TSDB/agent.

The receiver implementation is important, but practically speaking I'd say 1000x fewer people use the receiver than scrape, so it gives us a good % of users for a "canary" switch in 3.9 or so. We fix all the issues until we get it right. I propose we balance the risk of switching against doing extra work for a dual path and risking maintainability issues here.

**Concrete dual path tech debt problem**

Let me give you an example of why it's worth switching immediately. The reason the switch PR also simplifies some code is thanks to the new AppenderV2 having one method. For example, the receiverAppender abstractions would be really hard to pull out otherwise, because repeating the same code across ~6 methods would defy the point of DRY-ing here. With a dual path we are talking about a second abstraction for AppenderV2 too.

To sum up, one could argue that doing both the switch to AppenderV2 and some DRY and consistency abstractions is bad, makes a PR bigger. There's some truth to that, but the alternative yields way bigger PRs and more work for everyone, because the DRY-ing to some extent IS making the review easier (less code to review, easier to fix bugs). And because we switch, we don't need to leave old code behind that is unused and only serves as a potential backup.

**Proposal**

For the reasons above, I propose we keep the original plan of a full switch for all parts other than scrape and TSDB/agent. We swarm on it and derisk by switching individual components one by one (remote, PromQL tests, scrape, etc.), revertable on demand. But we keep a balance of tech debt vs risk. This is meant to be a refactor, so no significant functionality should change for users. We risk changes and bugs, but we also take risk now by leaving the messy AppenderV1 alive for long, and especially by supporting 2 paths. Adding a feature flag for a refactor feels like adding a feature flag to every bigger PR that changes prod code without breaking user functionality. It would be safer, but would it be productive? 🙃
I don't really agree that this is a simple refactor: there is a huge amount of code changing, a bunch of new logic to detect what kind of append is being asked for, and massively rewritten tests. In my experience, a "noop refactor" usually contains extremely minimal changes above the layer of the refactor, whereas components using this new API also seem to be changing a lot. I am pushing back on this characterization because just saying "it's a refactor, don't worry" is very unconvincing to me.

I would prefer to have a solid plan in place for how to handle a situation where we get a bug report that something major is broken with this rewrite. If we can't have a simple flag, then which PRs need to be reverted? Do we need to roll back 4 PRs totalling more than 20,000 lines of code? Or can we construct a final switch-over PR that gives us a clear path and breathing room to fix any problems that come up?

There is a significant risk in creating this much churn: if there are problems, downstream projects and users will be stuck on the last version of Prometheus released before this rewrite, waiting for us to work out the issues before being able to resync with HEAD. I am not saying this is an inevitability, I am saying it feels wise to have a clear plan for that possibility.
I acknowledge your pushback, though I never said this is a "simple" refactor. I wonder if your worries are coming from some place of fear (how familiar are you with this component?) which exaggerates some risks here.

No, you would only need to roll back this PR in the worst case (~1500 removed LOC total).

This is the final switch-over PR for the remote component. (: I wonder how this PR is different from a similar recent PR that switched OTLP to a new interface (#16951). This PR is essentially reverting that whole thing, plus doing the same for RW. And it changes more or less the same amount of LOC (actually fewer lines). Why was that one reviewed and accepted then? 🙃
To sum up, together with #17631 there are a few high-level feedback points around the way we change the API in various places:

A) @ywwg wishes for components to support a dual-appender path for the migration period, with a runtime switch flag.
B) The final switch-over PRs should be as small and self-contained as possible.
C) Unrelated stylistic fixes should be kept out of the switch PRs.

For (A), it's unrealistic, and I would need to spend 10h to show you by adding another ~8k LOC PR that copies the appenderv1 paths and duplicates them for v2 next to v1. Much more stuff to review, only to finally do the switch and remove the whole thing ASAP to avoid big tech debt. Abstractions and usage have to change, unless we build a complex common abstraction, but that would change prod appenderv1 code, defying the point?

For (B), I agree, but literally this is it (#16951 is the example "final switch" PR for OTLP). It's not going to be a lot smaller than this PR. I could try to split the Remote Write receive vs OTLP receive switch to make things smaller along the (B) point. Would that help?

(C) is a matter of taste, but I can try to minimize unrelated fixes; some of them are trivial and it's really hard to ignore trivial trash to fix (: I can try.

Would a split towards (B) and (C) work for you here?
I think it looks mostly good after reviewing the Grafana Mimir vendoring, but I'm fairly certain I spotted a bug.
My main criticism otherwise is there are too many stylistic changes in the OTLP code. These are better proposed separately from this refactor, so the diff can be minimized.
```go
if rollbackErr := appender.Rollback(); rollbackErr != nil {
	f.logger.Error("Squashed rollback error on commit", "err", rollbackErr)
```

[Nit] Suggested change:

```diff
 if rollbackErr := appender.Rollback(); rollbackErr != nil {
-	f.logger.Error("Squashed rollback error on commit", "err", rollbackErr)
+	f.logger.Error("Squashed secondary rollback error on commit", "err", rollbackErr)
```
Will address in a separate PR, trying to limit unrelated changes as per feedback.
```go
case err == nil:
	err = rollbackErr
case rollbackErr != nil:
	f.logger.Error("Squashed rollback error on rollback", "err", rollbackErr)
```

Isn't the following more descriptive?

```diff
-	f.logger.Error("Squashed rollback error on rollback", "err", rollbackErr)
+	secondaryErr := appender.Rollback()
+	switch {
+	case err == nil:
+		err = secondaryErr
+	case secondaryErr != nil:
+		f.logger.Error("Squashed secondary error on rollback", "err", rollbackErr)
```
Why do you propose to roll back twice? Something is off with this suggestion 🤔
```go
// Best effort timestamp check within a single request. Doing this across messages would incur overhead.
// We are doing this, since the storage doesn't check ordering within a single Append.
err = validateTimestamp(hp.Timestamp, seriesLastTs)
if err == nil {
```
Can you merge this and the following if err == nil block? It would simplify the code.
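The suggested simplification, merging the two consecutive `if err == nil` blocks into a single branch, could look roughly like the sketch below. `validateTimestamp` here is a simplified stand-in and `appendIfValid` is a hypothetical name:

```go
package main

import (
	"errors"
	"fmt"
)

// validateTimestamp is a simplified stand-in for the helper in the PR:
// within one request, timestamps for a series must be increasing.
func validateTimestamp(ts, lastTs int64) error {
	if ts <= lastTs {
		return errors.New("timestamp not increasing within request")
	}
	return nil
}

// appendIfValid shows the merged form: validation and the append step share
// one error branch instead of two consecutive "if err == nil" blocks.
func appendIfValid(samples []int64, ts, lastTs int64) ([]int64, error) {
	if err := validateTimestamp(ts, lastTs); err != nil {
		return samples, err
	}
	return append(samples, ts), nil
}

func main() {
	var samples []int64
	samples, err := appendIfValid(samples, 10, 5)
	fmt.Println(len(samples), err) // 1 <nil>
	_, err = appendIfValid(samples, 3, 10)
	fmt.Println(err) // timestamp not increasing within request
}
```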
```go
for _, e := range pErr.ExemplarErrors {
	if errors.Is(err, storage.ErrOutOfOrderExemplar) {
		a.oooExemplarErrCount++
		a.logger.Debug("Out of order exemplar", "series", ls.String(), "timestamp", t, "sampleType", sTyp, "exemplars", opts.Exemplars, "err", e.Error())
```
Should this be debug or warn level?
At this level I would say debug. It mimics the RW 1.0 flow.
```go
// Partial error due to exemplars, account for that.
a.stats.Exemplars += len(opts.Exemplars) - len(pErr.ExemplarErrors)
for _, e := range pErr.ExemplarErrors {
	if errors.Is(err, storage.ErrOutOfOrderExemplar) {
```

This looks like a bug to me.

```diff
-	if errors.Is(err, storage.ErrOutOfOrderExemplar) {
+	if errors.Is(e, storage.ErrOutOfOrderExemplar) {
```
Thank you for the thoughtful responses. It is true that I am not very familiar with the code being touched, but I also have too many horror stories of things breaking because a PR was "too big to roll back". The last thing I want is a panicked fix-forward as we try to plug holes in the dam. Yes, I think splitting out the final switch PRs would help make this easier, thank you. I agree that maintaining dual appender interfaces for any significant period of time is unsustainable.
Splitting this PR into smaller ones as per the feedback points. All comments, e.g. @aknuds1's, are being addressed in subsequent PRs. For now:
Related to #17632
Depends on:
This PR:
AppenderV2to our appender mock.storage/remotetostorage.AppenderV2. This means OTLP and RW receiving will now use strictlyAppenderV2TSDB implementation.otlptranslator/prometheusremotewrite/CombinedAppender. This means it's a breaking change for Mimir (we ensured with @krajorama Mimir can now nicely use standardAppenderV2. I think this is the point when we could test Mimir conversion (cc @aknuds1 as I think you wanted to try it out 🤗 )samplesAppendedWithoutMetadatabecause no such errors can happen anymore (the only error it was guarding against was mismatched series ref, which couldn't really happened in pre PR handler flow I think too 🤔 ). Unless this metrics was suppose to reflect OTLP samples without Prometheus metadata (this was not what was implemented). cc @aknuds1Does this PR introduce a user-facing change?