
refactor(remote)[PART4]: switch remote receivers (RW, OTEL) to AppenderV2#17675

Closed
bwplotka wants to merge 13 commits into main from bwplotka/a2-remote

Conversation

@bwplotka (Member) commented Dec 12, 2025

Related to #17632

Depends on:

This PR:

  • Adds support for AppenderV2 to our appender mock.
  • Adds a common way of handling Append errors and statistics.
  • Switches storage/remote to storage.AppenderV2. This means OTLP and RW receiving will now use strictly the AppenderV2 TSDB implementation.
  • Removes otlptranslator/prometheusremotewrite/CombinedAppender. This is a breaking change for Mimir (we ensured with @krajorama that Mimir can now nicely use the standard AppenderV2). I think this is the point when we could test the Mimir conversion (cc @aknuds1 as I think you wanted to try it out 🤗).
  • Removes samplesAppendedWithoutMetadata because no such errors can happen anymore (the only error it was guarding against was a mismatched series ref, which couldn't really happen in the pre-PR handler flow either, I think 🤔). Unless this metric was supposed to reflect OTLP samples without Prometheus metadata (that is not what was implemented). cc @aknuds1
samplesAppendedWithoutMetadata: promauto.With(reg).NewCounter(prometheus.CounterOpts{
	Namespace: "prometheus",
	Subsystem: "api",
	Name:      "otlp_appended_samples_without_metadata_total",
	Help:      "The total number of samples ingested from OTLP without corresponding metadata.",
}),

Does this PR introduce a user-facing change?

NONE

@bwplotka bwplotka changed the title refactor(remote)[PART3]: switch remote receivers (RW, OTEL) to AppenderV2 refactor(remote)[PART4]: switch remote receivers (RW, OTEL) to AppenderV2 Dec 12, 2025
@bwplotka bwplotka force-pushed the bwplotka/a2-remote branch 2 times, most recently from 8121456 to dc5c344 on December 13, 2025 06:22
@bwplotka bwplotka marked this pull request as ready for review December 13, 2025 06:25
@bwplotka bwplotka requested review from krajorama and ywwg and removed request for tomwilkie December 13, 2025 06:30
@bwplotka bwplotka force-pushed the bwplotka/a2-remote branch 7 times, most recently from f544bf9 to 61a888f on December 15, 2025 10:31
@bwplotka bwplotka force-pushed the bwplotka/scrapeloop-ref branch 4 times, most recently from a8e2fad to d6712cd on December 15, 2025 11:42
…tstorage.Appender mock

Signed-off-by: bwplotka <bwplotka@gmail.com>

debug
@bwplotka bwplotka force-pushed the bwplotka/scrapeloop-ref branch from d6712cd to aea8139 on December 16, 2025 09:33
Signed-off-by: bwplotka <bwplotka@gmail.com>
@bwplotka bwplotka force-pushed the bwplotka/scrapeloop-ref branch from d3f8c3f to 10624b9 on December 16, 2025 13:54
@bwplotka bwplotka force-pushed the bwplotka/a2-remote branch 4 times, most recently from 7121a95 to 5d55839 on December 17, 2025 09:09
// Used to ensure we only update metadata and created timestamps once, and to share storage.SeriesRefs.
// To detect hash collisions it also stores the labels.
// There is no overflow/conflict list; the TSDB will handle that part.
refs map[uint64]seriesRef
@bwplotka (Member Author):

I am not sure I fully understood the purpose of it. Was it only to cache refs for efficiency, for separate Append vs UpdateMetadata/AppendExemplar calls on the same series? If yes, then it's safe to remove.

cc @krajorama

Signed-off-by: bwplotka <bwplotka@gmail.com>

comm

Signed-off-by: bwplotka <bwplotka@gmail.com>
…erV2

Signed-off-by: bwplotka <bwplotka@gmail.com>
@bwplotka (Member Author):

This should be ready for a PTAL cc @aknuds1 @krajorama

But no rush, if you are busy we can tackle this after xmas 🤗

@ywwg (Member) left a comment:

I would feel a lot better if there were a way to select which appender is in use while we are in transition, rather than doing a hard cutover. There is so much changing in all of these PRs that it seems plausible there will be bugs (some have already been found in review), and we want it to be easy to roll back to V1 in the event of an emergency. If the two appenders can co-exist, as seems implied by the code I've read, it should be possible to put V2 behind a default-on feature flag that can be flipped.

h.logger.Debug("Error while adding exemplar in AppendExemplar", "series", ls.String(), "exemplar", fmt.Sprintf("%+v", e), "err", err)
}
}
es = append(es, e)
@ywwg (Member):
(why not make es a slice with a preallocated length of len(ts.Exemplars)?)

@bwplotka (Member Author):
Good question!

Ideally we reuse the previous slice, which should be enough for the typical 1-10 elements. But we could optimize further:

es = es[:0]
if cap(es) < len(ts.Exemplars) {
    es = make([]exemplar.Exemplar, 0, len(ts.Exemplars))
}

Benchmarks would tell if this really matters, so I went with the most readable yet efficient-enough solution for now... I would postpone this optimization for later; can add a TODO, wdyt?

factoryAr := func(context.Context) api_v1.AlertmanagerRetriever { return h.notifier }
factoryRr := func(context.Context) api_v1.RulesRetriever { return h.ruleManager }

var app storage.Appendable
@ywwg (Member):

Here is an example where it looks like we could use either one appender or the other -- the perfect place to have a switchable selection for rollbacks.

@bwplotka (Member Author) commented Dec 18, 2025:

It's a good place IF we want to double down on the tech debt of ~3k extra LOC of potentially-prod code to keep (most of the code I am removing in this PR would need to stay). I acknowledge the risk, but I propose we don't do a dual path anywhere other than scrape and TSDB. We can revert this PR/commit if needed.


@bwplotka bwplotka requested a review from ywwg December 18, 2025 08:06
@bwplotka (Member Author) commented Dec 18, 2025

Dual Appender Path or Switch

@ywwg: I would feel a lot better if there were a way to select which appender is in use while we are in transition, rather than doing a hard cutover. There is so much changing in all of these PRs that it seems plausible there will be bugs (some have already been found in review), and we want it to be easy to roll back to V1 in the event of an emergency. If the two appenders can co-exist, as seems implied by the code I've read, it should be possible to put V2 behind a default-on feature flag that can be flipped.

Thanks for pointing this out, understandable opinion.

Dual appenders here and in other pieces would be possible, but it's a big ask. Keeping two paths is enormous work. We are talking about ~15k LOC of dual-path tech debt we already added to critical code (TSDB, agent), and doing a dual path for more components likely adds ~10k+ more. Ideally we switch straight away for less complex/less imported functionality. Otherwise, the moment you fix/develop something in dual-appender-enabled code, which path do you fix? Which path do you read to debug? How much confusion do we create? Both? New only? Who will maintain the dual bug-porting process?

Initially with @krajorama (see the Proposal section on #17632 (comment)) we were thinking of keeping the dual appender path only for scrape and the head and db implementations, and then immediately removing v1:

  • From scrape, the moment we get an ack from Otel/Alloy for moving to v2.
  • From TSDB/agent, the moment we get an ack from Cortex/Thanos/Mimir/Alloy.

The receiver implementation is important, but practically speaking I'd say 1000x fewer people use the receiver than scrape, so it gives us a good % of users for a "canary" switch in 3.9 or so. We fix all the issues until we get it right. I propose we balance the risk of switching against doing extra work for a dual path and risking maintainability issues here.

Concrete dual path tech debt problem

Let me give you an example of why it's worth switching immediately. The reason the switch PR also simplifies some code is that the new AppenderV2 has one method. For example, the receiverAppender abstractions would be really hard to pull out otherwise, because repeating the same code across ~6 methods would defeat the point of DRY-ing here. With a dual path we are talking about a second abstraction for AppenderV2 too.

To sum up, the AppenderV2 switch not only changes the flow of appends to be one append instead of 3 (sample, meta, exemplar), but also unlocks great consistency and simplifications, achieved easily.

One could argue that doing both the switch to AppenderV2 and some DRY and consistency abstractions is bad, as it makes the PR bigger. There's some truth to that, but the alternative yields way bigger PRs and more work for everyone, because the DRY-ing to some extent IS making the review easier (less code to review, easier to fix bugs). And because we switch, we don't need to leave old code behind that is unused and only serves as a potential backup.

Proposal

For the reasons above, I propose we keep the original plan of a full switch for all parts other than scrape and TSDB/agent. We swarm on it and derisk by switching individual components one by one (remote, PromQL tests, scrape, etc.), revertible on demand. But we keep the balance of tech debt vs risk.

This is meant to be a refactor, so no significant functionality should change for the user. We risk changes and bugs, but we also take a risk by leaving the messy AppenderV1 alive for long, and especially by supporting 2 paths.

Adding a feature flag for a refactor feels like adding a feature flag to every bigger PR that changes prod code without breaking user functionality. It would be safer, but would it be productive? 🙃

Signed-off-by: bwplotka <bwplotka@gmail.com>
@ywwg (Member) commented Dec 18, 2025

I don't really agree that this is a simple refactor -- there is a huge amount of code changing, a bunch of new logic to detect what kind of append is being asked for, and massively rewritten tests. In my experience, a "noop refactor" usually contains extremely minimal changes above the layer of the refactor, whereas components using this new API also seem to be changing a lot. I am pushing back on this characterization because just saying "it's a refactor, don't worry" is very unconvincing to me.

I would prefer to have a solid plan in place for how to handle a situation where we get a bug report that something major is broken with this rewrite. If we can't have a simple flag, then which PRs need to be reverted? Do we need to roll back 4 PRs totalling more than 20,000 lines of code? Or can we construct a final switch-over PR that gives us a clear path and breathing room to fix any problems that come up?

There is a significant risk here in creating this much churn that if there are problems, downstream projects and users will be stuck on the last version of Prometheus released before this rewrite, waiting for us to work out the issues before being able to resync with HEAD. I am not saying this is an inevitability, I am saying that it feels wise to have a clear plan for that possibility.

@bwplotka (Member Author):

I acknowledge your pushback, but I also never said this is a "simple" refactor. I wonder if your worries are coming from some place of fear (how familiar are you with this component?) that exaggerates some of the risks here.

I would prefer to have a solid plan in place for how to handle a situation where we get a bug report that something major is broken with this rewrite. If we can't have a simple flag, then which PRs need to be reverted? Do we need to roll back 4 PRs totalling more than 20,000 lines of code?

No, you would need to roll back this PR in the worst case (~1500 removed LOC total).

Or can we construct a final switch-over PR that gives us a clear path and breathing room to fix any problems that come up?

This is the final switch-over PR for the remote component. (:

I wonder how this PR is different from a similar recent PR that switches OTEL to a new interface (#16951). This PR is essentially reverting that whole thing, plus doing the same for RW. And this PR changes more or less the same amount of LOC (actually fewer lines). Why was that one reviewed and accepted then? 🙃

@bwplotka (Member Author):

To sum up, together with #17631 there are a few high-level feedback points around the way we change the API in various places:

A) @ywwg wishes for components to support a dual-appender path for the migration period, with a runtime switch flag.
B) @ywwg: if we can't do a dual appender path, the PR that switches should be as small as possible and only do the switching. Essentially, a concrete plan for what to do if the AppenderV2 path is buggy.
C) @krajorama prefers a limited diff/PR size, even if it means splitting trivial refactors (e.g. changing deprecated vars on the way, renames) into separate PRs.

(A) is unrealistic, and I would need to spend 10h to show you by adding another ~8k LOC PR that copies the appenderv1 paths and ports them to v2 next to v1. Much more stuff to review, only to finally do the switch and remove the whole thing ASAP to avoid big tech debt. Abstractions and usage have to change, unless we build a complex common abstraction, but that would change prod appenderv1 code, defeating the point.

For (B) I agree, but this is literally it (#16951 is an example "final switch" for OTLP). It's not going to be a lot smaller than this PR. I could try to split the Remote Write receive vs OTLP receive switch to make things smaller along the lines of (B). Would that help?

(C) is a matter of taste, but I can try to minimize unrelated fixes; it's just that some of them are trivial and it's really hard to ignore trivial trash to fix (: I can try.

Would some split towards (B) and (C) work for you here?

@aknuds1 (Contributor) left a comment:

I think it looks mostly good after reviewing the Grafana Mimir vendoring, but I'm fairly certain I spotted a bug.

My main criticism otherwise is there are too many stylistic changes in the OTLP code. These are better proposed separately from this refactor, so the diff can be minimized.

Comment on lines +315 to +316
if rollbackErr := appender.Rollback(); rollbackErr != nil {
f.logger.Error("Squashed rollback error on commit", "err", rollbackErr)
@aknuds1 (Contributor) commented Dec 19, 2025:

[Nit]

Suggested change
  if rollbackErr := appender.Rollback(); rollbackErr != nil {
-     f.logger.Error("Squashed rollback error on commit", "err", rollbackErr)
+     f.logger.Error("Squashed secondary rollback error on commit", "err", rollbackErr)

@bwplotka (Member Author):

Will address in separate PR, trying to limit unrelated changes as per feedback.

case err == nil:
err = rollbackErr
case rollbackErr != nil:
f.logger.Error("Squashed rollback error on rollback", "err", rollbackErr)
@aknuds1 (Contributor):

Isn't the following more descriptive?

Suggested change
- f.logger.Error("Squashed rollback error on rollback", "err", rollbackErr)
+ secondaryErr := appender.Rollback()
+ switch {
+ case err == nil:
+     err = secondaryErr
+ case secondaryErr != nil:
+     f.logger.Error("Squashed secondary error on rollback", "err", rollbackErr)

@bwplotka (Member Author):

Why do you propose to roll back twice? Something is off with this suggestion 🤔

// Best effort timestamp check within a single request. Doing this across messages would incur overhead.
// We are doing this since the storage doesn't check ordering within a single Append.
err = validateTimestamp(hp.Timestamp, seriesLastTs)
if err == nil {
@aknuds1 (Contributor):

Can you merge this and the following if err == nil block? It would simplify the code.

for _, e := range pErr.ExemplarErrors {
if errors.Is(err, storage.ErrOutOfOrderExemplar) {
a.oooExemplarErrCount++
a.logger.Debug("Out of order exemplar", "series", ls.String(), "timestamp", t, "sampleType", sTyp, "exemplars", opts.Exemplars, "err", e.Error())
@aknuds1 (Contributor):

Should this be debug or warn level?

@bwplotka (Member Author):

On this level I would say debug. It mimics the RW 1.0 flow.

// Partial error due to exemplars, account for that.
a.stats.Exemplars += len(opts.Exemplars) - len(pErr.ExemplarErrors)
for _, e := range pErr.ExemplarErrors {
if errors.Is(err, storage.ErrOutOfOrderExemplar) {
@aknuds1 (Contributor):

This looks like a bug to me.

Suggested change
- if errors.Is(err, storage.ErrOutOfOrderExemplar) {
+ if errors.Is(e, storage.ErrOutOfOrderExemplar) {

@ywwg (Member) commented Dec 19, 2025

For (B) I agree, but this is literally it (#16951 is an example "final switch" for OTLP). It's not going to be a lot smaller than this PR. I could try to split the Remote Write receive vs OTLP receive switch to make things smaller along the lines of (B). Would that help?

Thank you for the thoughtful responses. It is true that I am not very familiar with the code being touched, but I also have too many horror stories of things breaking because a PR was "too big to roll back". The last thing I want is a panic fix-forward as we try to plug holes in the dam.

Yes I think splitting out the final switch PRs would help make this easier, thank you. I agree that maintaining dual appender interfaces for any significant period of time is unsustainable.

@bwplotka (Member Author):

Splitting this PR into smaller ones per the feedback points. All comments, e.g. @aknuds1's, are being addressed in subsequent PRs.

For now:
