Skip to content

PromQL: Add fill*() binop modifiers to provide default values for missing series#17644

Merged
juliusv merged 8 commits intomainfrom
binop-fill-modifier
Jan 19, 2026
Merged

PromQL: Add fill*() binop modifiers to provide default values for missing series#17644
juliusv merged 8 commits intomainfrom
binop-fill-modifier

Conversation

@juliusv
Copy link
Member

@juliusv juliusv commented Dec 3, 2025

This starts implementing the proposal for default value binop modifiers throughout the stack, from the PromQL parser+engine all the way to the UI and PromLens visualizer.

So you can now do something like:

# Fill in missing series from either side with a default value of 0.
metric1 + fill(0) metric2

Or:

# Fill in missing custom rate thresholds with a default value of 23.
some_rates > fill_right(23) some_rate_thresholds

A few special considerations to note:

  • Other binop modifiers like on(), group_left(), etc. are reserved words that are NOT allowed as metric names. I did not want to break existing users, so I could not adopt the same behavior for the new fill modifiers. For example, a user might have an expression like foo + fill, where both are metric names, and that still needs to be parseable as a metric name. Thus I had to extend the lexer to only emit fill modifier tokens if there is a parenthesis following them. Maybe there's a better way of doing this directly in the grammar, but I think yacc works makes that hard? Input / experimentation by others welcome.
  • I am not implementing the parameterless variants for now, so you always have to include the parentheses with a value inside.
  • The current binOp fill implementation in the engine introduces a closure function in a place that might be performance-sensitive. It was the easiest way to implement things for now, but we need to benchmark this properly and determine if this is ok or whether the binop evaluation should be changed in such a way that it is centered around the match label hashbucket (similar to how Thanos does it), with both sides (LHS + RHS) being filled in for each bucket before performing the final binop operation (or filling in of missing series).

Other than those notes, progress is roughly:

  • Add the fill, fill_left, and fill_right modifiers to the PromQL lexer + parser.
  • Implement the evaluation logic for the new modifiers in the PromQL engine.
  • Update the Go-based PromQL printer / formatter to correctly render the new modifiers.
  • Update the PromQL grammar for the CodeMirror code editor to recognize the new modifiers.
  • Add autocompletion support for the new modifiers in the PromQL editor.
  • Add linting support for the new modifiers in the PromQL editor. (HALF DONE, can be completed later)
  • Support the new modifier in the PromLens-style tree view.
  • Support visualizing the new modifier behavior in the "Explain" tab for binary operator nodes.
  • Add unit tests and integration tests to cover the new functionality.
  • Update the Prometheus documentation to include the new modifiers and their usage.

Here's a screenshot of how it looks in the PromLens tree view and the matching visualization in the "Explain" tab:

image

Fixes #13625

Does this PR introduce a user-facing change?

[FEATURE] PromQL: Add `fill()` / `fill_left()` / `fill_right()` binop modifiers for specifying default values for missing series

@juliusv juliusv force-pushed the binop-fill-modifier branch from a1b8c38 to f03a45a Compare December 10, 2025 18:25
@juliusv juliusv changed the title [WIP] PromQL: Add fill*() binop modifiers to provide default values for missing series PromQL: Add fill*() binop modifiers to provide default values for missing series Dec 11, 2025
@juliusv
Copy link
Member Author

juliusv commented Dec 11, 2025

I added tests and documentation for the modifiers now and removed the [WIP] from the PR description, so I think it's ready for broader review.

For the closure that I introduced in the binop computation, it seems fine, or at least it doesn't seem to change the BenchmarkJoinQuery benchmark at all in terms of allocs or speed.

Before:

BenchmarkJoinQuery/expr=rpc_request_success_total_+_rpc_request_error_total/steps=10000-20         	       1	2612443093 ns/op	1580342576 B/op	  577780 allocs/op
BenchmarkJoinQuery/expr=rpc_request_success_total_+_ON_(job,_instance)_GROUP_LEFT_rpc_request_error_total/steps=10000-20         	       1	3083784580 ns/op	1977043056 B/op	20567736 allocs/op
BenchmarkJoinQuery/expr=rpc_request_success_total_AND_rpc_request_error_total{instance=~"0.*"}/steps=10000-20                    	       2	 521685498 ns/op	39702924 B/op	  306865 allocs/op
BenchmarkJoinQuery/expr=rpc_request_success_total_OR_rpc_request_error_total{instance=~"0.*"}/steps=10000-20                     	       1	1077586308 ns/op	518800192 B/op	  311608 allocs/op
BenchmarkJoinQuery/expr=rpc_request_success_total_UNLESS_rpc_request_error_total{instance=~"0.*"}/steps=10000-20                 	       2	 986880906 ns/op	39771716 B/op	  306864 allocs/op

After:

BenchmarkJoinQuery/expr=rpc_request_success_total_+_rpc_request_error_total/steps=10000-20         	       1	2619260793 ns/op	1580348944 B/op	  577797 allocs/op
BenchmarkJoinQuery/expr=rpc_request_success_total_+_ON_(job,_instance)_GROUP_LEFT_rpc_request_error_total/steps=10000-20         	       1	3126463664 ns/op	1977042880 B/op	20567734 allocs/op
BenchmarkJoinQuery/expr=rpc_request_success_total_AND_rpc_request_error_total{instance=~"0.*"}/steps=10000-20                    	       2	 509301824 ns/op	39703108 B/op	  306866 allocs/op
BenchmarkJoinQuery/expr=rpc_request_success_total_OR_rpc_request_error_total{instance=~"0.*"}/steps=10000-20                     	       1	1054013104 ns/op	518800112 B/op	  311607 allocs/op
BenchmarkJoinQuery/expr=rpc_request_success_total_UNLESS_rpc_request_error_total{instance=~"0.*"}/steps=10000-20                 	       2	 971900134 ns/op	39771844 B/op	  306863 allocs/op

A remaining issue is the interplay of delayed metric name removal (--enable-feature=promql-delayed-name-removal) and fill modifiers that @vpranckaitis pointed out: If a series exists at some time steps and not others, it will exist with its metric name (metric_a{case="2"}) when it is present and without the metric name ({case="2"}) when it has to be filled in. This is no problem at each single resolution step (as only one variant of the series will be present at any given time, either the real one or the filled-in one), but apparently the output vectors from multiple resolution steps are gathered into a single result matrix, and with delayed metric name removal, the metric name is removed only after that matrix has been gathered, causing collisions at that point in the process, when metric_a{case="2"} gets reduced to {case="2"}, which already exists in the matrix. Any advice on how to correctly address this is welcome.

@juliusv
Copy link
Member Author

juliusv commented Dec 11, 2025

CC @jcreixell as the author of the delayed metric name removal feature, would you be able to advise on the following?

A remaining issue is the interplay of delayed metric name removal (--enable-feature=promql-delayed-name-removal) and fill modifiers that @vpranckaitis pointed out: If a series exists at some time steps and not others, it will exist with its metric name (metric_a{case="2"}) when it is present and without the metric name ({case="2"}) when it has to be filled in. This is no problem at each single resolution step (as only one variant of the series will be present at any given time, either the real one or the filled-in one), but apparently the output vectors from multiple resolution steps are gathered into a single result matrix, and with delayed metric name removal, the metric name is removed only after that matrix has been gathered, causing collisions at that point in the process, when metric_a{case="2"} gets reduced to {case="2"}, which already exists in the matrix. Any advice on how to correctly address this is welcome.

There's never really a collision at each individual time step - only when merging different time steps together.

@bboreham
Copy link
Member

I believe Jorge is on paternity leave for a few months, so may not reply.

@roidelapluie
Copy link
Member

@juliusv do you have a unit test highlighting the issue so we can better help?

@juliusv
Copy link
Member Author

juliusv commented Dec 12, 2025

@roidelapluie Yes, I've noticed now that the problem is actually more general and does not require the new fill modifiers to be provoked. The problem is in general that we:

  • First gather all resolution steps over time in a big matrix,
  • ...only then apply modifications to that matrix (removing the metric name) that could result in some series names colliding across steps
  • ...and only then detecting that as a collision in ContainsSameLabelset:

    prometheus/promql/value.go

    Lines 321 to 341 in 583bc01

    // ContainsSameLabelset checks if a matrix has samples with the same labelset.
    // Such a behavior is semantically undefined.
    // https://github.com/prometheus/prometheus/issues/4562
    func (m Matrix) ContainsSameLabelset() bool {
    switch len(m) {
    case 0, 1:
    return false
    case 2:
    return m[0].Metric.Hash() == m[1].Metric.Hash()
    default:
    l := make(map[uint64]struct{}, len(m))
    for _, ss := range m {
    hash := ss.Metric.Hash()
    if _, ok := l[hash]; ok {
    return true
    }
    l[hash] = struct{}{}
    }
    return false
    }
    }

So you can also provoke it like this:

load 10m
    metric_a   1  _
    metric_b   3  4

eval range from 0 to 20m step 10m -metric_a or -metric_b
    {} -1  -4

Or if you want to provoke it with the new fill modifiers, you could do:

load 10m
    metric_a   1  _
    metric_b   _  4

eval range from 0 to 20m step 10m metric_a <= bool fill(0) metric_b
    {} 0  1

There's also more ways to provoke this, like:

load 10m
    metric_a   1  _
    metric_b   _  4

eval range from 0 to 20m step 10m -metric_a or on(__name__) -metric_b
    {} -1  -4

All these queries run fine without delayed metric name removal, but cause collision errors when delayed metric name removal is enabled. I mentioned the same example queries in #15855 already as well. So I think this is generally an issue that can and should be fixed separately from the new fill modifiers. I guess you'd either have to:

  • In the big final matrix, merge together series with the same label set that don't have overlapping sample values at the same timestamp.
  • Or do any transformations that could cause collisions before gathering everything into one big final matrix.

juliusv added a commit that referenced this pull request Dec 12, 2025
juliusv added a commit that referenced this pull request Dec 12, 2025
juliusv added a commit that referenced this pull request Dec 12, 2025
@juliusv
Copy link
Member Author

juliusv commented Dec 12, 2025

Btw. I built a tenative fix for merging colliding series after delayed metric name removal in this branch: https://github.com/prometheus/prometheus/commits/delayed-name-removal-series-merge/ - maybe I can still simplify it though, so not opening a PR yet. In general, I'm wondering whether we should keep delayed metric name removal at all. It just makes everything more confusing and less clean IMO. But that's again not really related to the fill modifiers :)

@roidelapluie
Copy link
Member

WDYT of #17678 ?

@jcreixell
Copy link
Contributor

Hi, as @bboreham mentioned, I am on parental leave until February and nowhere near a laptop to look into this in depth right now. I wanted to keep @beorn7 in the loop as he is really the brain behind this feature (I mostly followed his advice while implementing it).

In terms of whether to keep the feature or not, I can see that this has come up a few times, and I think it's a philosophical question on whether every processing step should be self contained and independent (which helps tooling like promlens, and might be more elegant and intuitive, particularly for devs) or whether we should defer some processing to the last step for efficiency reasons and to resolve some otherwise surprising (to the less experience user) query errors due to implementation details of the query engine.

I don't have a definitive answer and don't know much about the specifics on how multiple steps are merged together during query processing (intuitively, it sounds like having duplicated labels for a single step at the end of processing should indeed cause an error, but i need to look into this in more detail to fully understand it).

I just wanted to say that I am not married to this feature and I fully trust the team to make the right call on this, so please don't be blocked by me and do not hesitate to remove it if it doesn't make sense or creates too many problems.

And thank you @juliusv and @roidelapluie looking into this and proposing solutions 🙏

@juliusv
Copy link
Member Author

juliusv commented Dec 15, 2025

Hi, as @bboreham mentioned, I am on parental leave until February and nowhere near a laptop to look into this in depth right now. I wanted to keep @beorn7 in the loop as he is really the brain behind this feature (I mostly followed his advice while implementing it).

Thanks for chiming in, and I hope you are enjoying your parental leave :)

I just wanted to say that I am not married to this feature and I fully trust the team to make the right call on this, so please don't be blocked by me and do not hesitate to remove it if it doesn't make sense or creates too many problems.

Good to know! And yep, my opinion is also not super strong on this, but I do feel it has some complexity and understandability drawbacks at least that people should consider when deciding to keep it or not. But it's probably better discussed on the issues related to delayed metric name removal in due time.

…issing series

Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Julius Volz <julius.volz@gmail.com>
@juliusv juliusv force-pushed the binop-fill-modifier branch from f0d2ad4 to d6aa6a3 Compare January 15, 2026 06:59
@linasm
Copy link
Contributor

linasm commented Jan 15, 2026

Did a basic search, did not find anything, perhaps I has missed something - my question is:
was there a discussion on whether this should be rolled out as an experimental feature first, or is is going straight for GA?

@juliusv
Copy link
Member Author

juliusv commented Jan 15, 2026

@linasm I wanted to bring up that question today as well. While I am tempted to just merge it without a flag immediately, I think putting it behind an feature flag might actually be the safer and more responsible way to go. So far it feels like only a small handful of people have taken a look at the details, and it's probably best to not set things 100% in stone yet. You probably would agree?

@linasm
Copy link
Contributor

linasm commented Jan 15, 2026

@linasm I wanted to bring up that question today as well. While I am tempted to just merge it without a flag immediately, I think putting it behind an feature flag might actually be the safer and more responsible way to go. So far it feels like only a small handful of people have taken a look at the details, and it's probably best to not set things 100% in stone yet. You probably would agree?

In general, it is better to be safe than sorry, but I don't have a strong opinion on this one. More interested in what the maintainers would say.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
@juliusv
Copy link
Member Author

juliusv commented Jan 15, 2026

Alright, I added a new commit that puts the new modifiers behind a new promql-binop-fill-modifiers feature flag.

@bwplotka
Copy link
Member

bwplotka commented Jan 15, 2026

nit: Why not reusing experimental-promql-functions flag for this? It's is a function in some way (:

https://prometheus.io/docs/prometheus/latest/feature_flags/#experimental-promql-functions

@bwplotka
Copy link
Member

Chatted on Slack, it is modifier, so fine for another flag. I wanted to try to reduce the ff surface, but it's a bit of an abuse to reuse.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

The only nit is around test case for the metric names like `"fill + fill" that hits the special case we added.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Julius Volz <julius.volz@gmail.com>
@juliusv
Copy link
Member Author

juliusv commented Jan 19, 2026

Will merge for now to get it in before more merge conflicts, since it's already approved and I only made the minimal requested changes. Happy to make follow-up changes of course, if desired.

@juliusv juliusv merged commit 1d3d98e into main Jan 19, 2026
50 checks passed
@juliusv juliusv deleted the binop-fill-modifier branch January 19, 2026 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Proposal: PromQL arithmetic with default value, or outer join

7 participants