feat(promql): offset from time expressions#15899
Conversation
17831fb to
7f5188e
Compare
Related to #12318 . Allow using a PromQL expression for setting the offset, provided that: the expression evaluates to a scalar the expression does not contain vector selector (no TSDB access) the expression does not call the info() function (no TSDB access) the time() function is only called on the top level of the expression, not inside a subquery We'll call such expression a time expression. During parsing we no longer set OriginalOffset field in sub-queries and vector selectors, instead we set a new OriginalOffsetExpr field to the time expression, even if the expression is a literal number. Before evaluating the overall PromQL expression, the engine shall evaluate the time expressions and set the OriginalOffset field in vector selectors and subqueries to the calculated value. This makes the change fairly non intrusive. Later we can change the code to start using the expression more directly. Note: this prohibits using time() function inside a sub-query as the evaluation time is not constant inside a sub-query. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
7f5188e to
2f144dd
Compare
beorn7
left a comment
There was a problem hiding this comment.
Very cool that this works, and fascinating to read. I just have no idea if this is a sane approach. We should definitely loop in some experts to tell us if so. Maybe @juliusv or @aknuds1 for starters?
@juliusv can probably also tell us if there will be problems mirroring this in the frontend code (for syntax highlighting, auto completion, etc.). Which we need to do before merging this. And documentation, of course…
| } | ||
| return fmt.Sprintf("%s%s", sign, model.Duration(time.Duration(v)*time.Second)) | ||
| } | ||
| return e.String() |
There was a problem hiding this comment.
I guess we could have a DurationString method that works like the original String method just that it does what you have implemented above whenever it hits a NumberLiteral leave. (Not meant as an action item for this PR, just thinking aloud.)
There was a problem hiding this comment.
I'll see the need when moving on to range:step and timestamps.
promql/engine.go
Outdated
| return output, bufHelpers | ||
| } | ||
|
|
||
| // timeValueOf calculates the scalar time value of an expression in float64. |
There was a problem hiding this comment.
Currently, it's more a duration than a time value. I guess later we want to use the same for timestamps (with @), so maybe timeValue is a good word, but this should explain that it might stand for a duration in seconds or a timestamp in seconds.
|
Another thought about how to merge and deliver this: Is there a good way to put this behind a feature flag? This is one of those things that might easily kreep in because it is so handy, i.e. people might just use and like it in large numbers. And then it will be hard to pull out again in case we realize it was a dead alley, even if we marked it as experimental, because people might have used it without knowing it was experimental. On the other hand, we did exactly that with the "numbers and timestamp/durations are the same" thing (declared it experimental without hiding it behind a feature flag). We were reasonably confident that we wouldn't change the feature dramatically. I guess the same is true here. And most importantly, of course: We don't change any behavior of existing valid PromQL expressions. So maybe we could go without a feature flag. |
I think we can add a feature flag and already error out in the parser if it's not set and you still try to use time expression. |
promql/engine.go
Outdated
| if nl, ok := expr.(*parser.NumberLiteral); ok { | ||
| return nl.Val, nil | ||
| } |
There was a problem hiding this comment.
I think we should check for a NumberLiteral before we check for the storage/eval time dependencies since that's likely going to be the most common expression type
promql/engine.go
Outdated
| } | ||
|
|
||
| func (tev timeExprVisitor) Visit(node parser.Node, path []parser.Node) (parser.Visitor, error) { | ||
| checkTimeExpr := func(expr parser.Expr) (float64, error) { |
There was a problem hiding this comment.
Why is checkTimeExpr is defined within Visit? It doesn't use any of the Visit variables/parameters.
Also I think evaluateTimeExpr would be a better name, since the result of the expression is returned
There was a problem hiding this comment.
I've updated the name in 7cd13e2, but kept in Visit since that's the only function that uses this function
| if dur.Type() != ValueTypeScalar { | ||
| yylex.(*parser).unexpected("offset", "time expression does not evaluate to a scalar") | ||
| } |
There was a problem hiding this comment.
For function calls, type check of args is done in checkAST(). For consistency, I'd put the type checking of the time expression there too.
But as we discussed IRL it's not required. Skipping feature flag for now. |
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
|
Note: Rebasing this should fix the CI tests. |
|
@krajorama do you plan to update the "frontend" in this PR, too? (I can try and dig out similar changes for reference, if you don't know those already.) It worries me if we merge something in PromQL itself that isn't supported in the UI (so you get red squiggly lines when trying the new feature, and autocomplete doesn't work). |
beorn7
left a comment
There was a problem hiding this comment.
@fionaliao is already deeper in the parser woods than I would ever get, so I'll leave this part to her.
My one concern is about the time expression always being parenthesized. That's fine for the usual a + b, but it will feel weird and inconsistent if the parentheses aren't really needed from a syntax perspective.
And of course, we need the UI update (where the parser is mirrored), and it would give me peace of mind of @juliusv could double-check for any concerns.
| subqueries. For example, 5 minute offset may be calculated from adding 2 minutes | ||
| and 180 seconds together: | ||
|
|
||
| http_requests_total offset (2m + 180) |
There was a problem hiding this comment.
The reader might wonder how this is of any use. ("I could just write 5m. 🤔")
Maybe we should add that this is useful if queries are generated via some templating system that isn't capable of doing this kind of math on its own.
And maybe we should add that we plan to add support for time expressions to @ and to range selectors and sub-queries. (Or maybe not. Just anticipating what users will ask when they read this.)
There was a problem hiding this comment.
I think it's useful to clarify, updated 229652a
I also added your suggestion of mentioning we are planning to support it in more cases and also linking to the issue, maybe that way we might get some comments from people who are interested in the additional cases
| if v[0].H != nil { | ||
| return 0, ErrTimeExprNotAFloat("vector with a histogram element") | ||
| } | ||
| return v[0].F, nil |
There was a problem hiding this comment.
Do we actually need to support this "one element vector" case? In the documentation, you say a time expression has to be a scalar. It's easy for the user to just say offset scalar(one_element_vector) if they need to. I would even say that allowing a one element vector with the implicit conversion here could be dangerous.
So maybe really just allow "proper" scalars, change the error to ErrTimeExprNotAScalar and then also get rid of all the elaborate error generation here because the user will easily see their mistake if they get the general ErrTimeExprNotAScalar error.
There was a problem hiding this comment.
As far as I can tell, eval() mostly returns Matrix types, even if e.Type() == parser.ValueTypeScalar. For example binary exprs between scalars still use rangeEval which returns a Matrix with a single value.
Lines 2061 to 2067 in 74d7d37
Line 1308 in 74d7d37
The check done in
prometheus/promql/parser/generated_parser.y
Lines 450 to 457 in 2f144dd
should ensure the value returned is a scalar (i.e. only a single value being returned for a vector or matrix) even if the actual value received is within a matrix. So we shouldn't reach the cases where there are multiple values returned.
There is also an additional check+panic in scalarValueOf
Lines 4008 to 4011 in 0efd36e
| yylex.(*parser).addOffset($1, dur) | ||
| $$ = $1 | ||
| } | ||
| | expr OFFSET paren_expr |
There was a problem hiding this comment.
Does this mean that the time expression has to be in parentheses? For example, if I need to use some PromQL functions to calculate my offset, I could shoehorn that like scalar(sqrt(vector(2))), but could I then write foo offset scalar(sqrt(vector(2))), or do I have to write foo offset (scalar(sqrt(vector(2)))) ? Looks like the latter.
There was a problem hiding this comment.
Yes, the latter.
The problem is that currently when you write metric offset 5m + 1, metric offset 5m will be evaluated first and then + 1. Allowing offset expressions without parentheses would mean that 5m + 1 could be evaluated first and then the offset part, which will make old queries behave differently. The old queries would have to be rewritten in the format (metric offset 5m) + 1.
I can see this being confusing because a single number_duration_literal doesn't need brackets while everything else does. But I think it's better than the alternative.
There was a problem hiding this comment.
I'm fully onboard with offset (5m +1 ). What I'm talking about are the other expression I mentioned, namely foo offset scalar(sqrt(vector(2))). What I want is that the latter works, and offset (5m +1 ) works, and metric offset 5m + 1 would still mean that we are having an offset of 5m, then evaluate metric at that time, and then we add 1 to the result. Or in other words, it should behave "as expected" :)
There was a problem hiding this comment.
Hmm, maybe the problem is that offset isn't formally an operator, but a modifier. So it's not part of the operator precedence rules. But maybe we can still make it work? foo offset scalar(sqrt(vector(2))) doesn't even contain an operator. Is "function_call" a thing in the parser code? (Sorry for being so illiterate with parser generation.)
There was a problem hiding this comment.
I've tried messing around in yacc to get this to work, haven't got there yet, but I noticed one thing around subqueries
some_metric offset 1m [1m:] currently means (some_metric offset 1m) [1m:]
While for binary operators, vector(0) + vector(0)[1m:] currently means vector(0) + (vector(0)[1m:]) (which errors)
If we treat offsets like binary operators then some_metric offset 1m [1m:] should mean some_metric offset (1m [1m:]) which is probably not what we want
We could have different rules for offsets vs other binary operators, but that starts to get confusing.
There was a problem hiding this comment.
Are you suggesting we just allow functions and number literals without parentheses? And anything else (e.g. 1 + 1) has to go in parentheses to be evaluated as part of the offset time expression?
So like krajo/duration-arithmetic...fionaliao-function-bracketless
There was a problem hiding this comment.
Yes, I think that will help. (Or maybe that's exactly what I want.)
So yeah, we cannot make offset formally an operator. It's a modifier. But we can probably say that the time expression has to be in parentheses if it is necessary to avoid ambiguities. (Later, when we will use time expressions as duration in range selectors, we won't need parentheses in those cases. So we cannot say that time expressions generally have to be in parentheses if they contain an operator or something.)
There was a problem hiding this comment.
If we were introducing offsets for the first time and decided at that point that we wanted time expressions too, I would have preferred all expression types including just durations like 5m to be inside parentheses for less ambiguity.
With the current context, where numbers/durations are allowed without parentheses, and also with the @ modifier allowing start() and end() without parentheses, I'm okay with the rule being functions/numbers/durations don't need to be bracketed, everything else does.
I would wait for @krajorama's opinion as well though, in case he had additional reasons for the parentheses in all non-number and duration cases
There was a problem hiding this comment.
I used parenthesis for everything other than number literals for simplicity and get to a working MVP.
To me the question is about documentation , UX: what's more complicated/confusing ?
- offset time expression is a number or duration or an expression in parenthesis
- offset time expression is a number, duration or function call or an expression inside parenthesis
- offset time expression must be inside a parenthesis if otherwise the expression is ambiguous
I don't really want to force the user to figure out what's ambiguous and what's not, so I think we'll end up with something concrete (first two options or maybe a wider list?) anyway.
I can also agree to
With the current context, where numbers/durations are allowed without parentheses, and also with the @ modifier allowing start() and end() without parentheses, I'm okay with the rule being functions/numbers/durations don't need to be bracketed, everything else does.
I'll implement that.
There was a problem hiding this comment.
I guess if we designed PromQL from scratch, we might have required parentheses around the "argument" of offset in all cases. We essentially didn't do that back then because we only ever allowed a duration literal as that "argument". Now it would be a pretty big change to suddenly require offset(5m) rather than offset 5m.
What I would argue for is to do something that meets what users are used to from other languages. And the rule there is usually that you are always allowed to use parentheses so that you don't have to care about operator precedence and other ambiguities, but that there is a set of rules in place to resolve those ambiguities if there are no parentheses present. And that's what I would like to implement. For offset, I think the quoted rule is equivalent in effect, i.e.:
With the current context, where numbers/durations are allowed without parentheses, and also with the @ modifier allowing start() and end() without parentheses, I'm okay with the rule being functions/numbers/durations don't need to be bracketed, everything else does.
It gets more interesting when we arrive at durations in a range selector. I guess we all agree that we want to allow foo[5m + 15s] and not enforce foo[(5m + 15s)] (although the latter should still be possible, of course). So the rule in the end cannot be that a time expression has parentheses unless it's just a single number or duration literal or a function call. The rule in the end will be that a time expression requires parentheses to avoid ambiguity (with the recommendation to use parentheses in doubt). Having different rules depending on the context is IMHO more confusing.
While @Nexucis built & understands the autocompletion/highlighting/linting UI code way better than me, I suspect that it should be doable. Maybe some invalid queries (like accessing the TSDB) would only be detected during runtime evaluation and not by the linter though. But since you asked me in general, my larger question (as an old curmudgeon) is of course going to be around whether it's is worth adding the conceptual complexity and special-casing ("it's like a normal expression but it's not") around a niche use case, or if this is actually a pretty common use case that I'm not aware of. I'm just wondering that because it's the first time I've seen that linked issue and someone requesting something like that. And in that issue it's still not clear to me whether they could solve their use case by just setting the overall query time parameter, or whether they require a range query with some regularly moving selectors and others that have a computed fixed timestamp. |
To clarify, we just took the linked issue as the one umbrella issue, but arithmetic on durations and timestamps has been asked for repeatedly and for a long time. The more appropriate reference is probably my exploratory doc. A fairly regular request I see in my Grafana bubble comes from Grafana users that use Grafana template variables and want to multiply them or add a number to them, but that's not supported in the Grafana templating. (It might be at some point in the future, but Prometheus might just have more velocity here, and there are also other reasons for Prometheus to have this.) Another more visionary use case is utilization of
"Both", kind of… I think this is one of the more common remaining use cases (note that this PR only implements The other thing is that we also want to make "any" expression work (if that's feasible). So you could store a range interval in another series (like we do for alert thresholds) and things like that. Then the special case would disappear. |
|
@beorn7 Thanks for the additional context, yeah the Grafana multiplication use case makes the most sense to me.
In general, this would mean that we are splitting what is currently a single expression node into a parent and a child expression (with potentially more children). So in a PromLens-style tree view, even ...would turn into something like: ...unless we special-case simple duration literals again or choose to display them differently. Same for |
Yes, that's the idea at the core, so really unavoidable unless we don't do the feature at all. As a UI improvement, it sounds OK to me to collapse the child structure into the current representation if a duration or timestamp is just a literal. |
Co-authored-by: Björn Rabenstein <beorn@grafana.com> Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
5f30055 to
5940b31
Compare
I guess that could work ok, although it was always nice to say "Everything that is its own PromQL expression gets its own box in PromLens and can be selected and evaluated/explained individually". Special cases always necessitate special explanations to users and can be confusing. And it'll be easier to handle for the pure tree display case than in-tree editing, where you need to be able to go back and forth between a full expression with children and a simple inlined literal (not sure if I or someone else would get to implementing that in the full PromLens anytime soon TBH). I'm still slightly worried about missing something else down the line where this could create issues, but that's more of an unknown unknown :) |
|
if @krajorama you don't want to update the UI, I will handle it no worries |
|
fyi - I been working on this PR while Krajo is on PTO |
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
99540bc to
229652a
Compare
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
| panic("parser should have discarded non scalar time expression") | ||
| } | ||
|
|
||
| // TODO(krajorama): would we ever get annotations here? |
There was a problem hiding this comment.
Currently this could happen in an extremely contrived case like offset (scalar(limit_ratio(-2, vector(0))))
Note some tests are currently failing (giving confusing error messages). This will be modified in a later commit. Also fixed some linting errors. Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Using addParseErrf instead of unexpected so the ")" doesn't get reported as the problem. Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
de5b98e to
446d0b2
Compare
| { | ||
| dur, _ := $3.(*ParenExpr) | ||
| if dur.Type() != ValueTypeScalar { | ||
| yylex.(*parser).addParseErrf($3.PositionRange(), "offset time expression does not evaluate to a scalar") | ||
| } | ||
| yylex.(*parser).addOffset($1, dur) | ||
| $$ = $1 | ||
| } |
There was a problem hiding this comment.
I haven't moved this logic to checkAST(), but I did make a change to the error reporting in 446d0b2
Previously if the value wasn't a scalar, the error would be 1:36: parse error: unexpected ")" in offset, expected time expression does not evaluate to a scalar. The second half of the error message is correct, but it reports the problem being the ). This is because unexpected() logs the last item produced by the lexer. Instead, changed to using addParseErrf(). I'm not 100% happy with that either because the error is reported as parse error: ... but it's a type issue rather than a problem with the structure of the query so it's not really a parse error. I've done it this way because that's how all errors around the parser is being reported, including for type issues with binops in checkAST().
An additional change was to add $$ = $1 to the paren_expr rule. It doesn't seem necessary (because the node is modified in-place in the parser stack, and the stack position that $$ would write to already contains the same modified node), adding this makes the code more consistent with other rules.
That would be appreciated thank you. Please note that we have changed the syntax a little bit , see #15899 (comment) . We have an event next week at my work, but I'll get back to the implementation after that. |
|
Closing this in favor of #16249 |
Please squash in when merging.
Related to #12318 .
In this iteration we implement the following as a first step:
Allow using a PromQL expression for setting the
offset, provided that:info()function (no TSDB access)time()function is only called on the top level of the expression, not inside a subqueryWe'll call such expression a time expression.
During parsing we no longer set
OriginalOffsetfield in sub-queries and vector selectors, instead we set a newOriginalOffsetExprfield to the time expression, even if the expression is a literal number.Before evaluating the overall PromQL expression, the engine shall evaluate the time expressions and set the
OriginalOffsetfield in vector selectors and subqueries to the calculated value. This makes the change fairly non intrusive. Later we can change the code to start using the expression more directly. Note: this prohibits usingtime()function inside a sub-query as the evaluation time is not constant inside a sub-query.Literals: https://prometheus.io/docs/prometheus/latest/querying/basics/#float-literals-and-time-durations