Description
This is meant as an umbrella issue for adding warnings (and possibly non-warning, just-informational annotations) to PromQL results. (I failed to find an existing issue.) If parts of this turn out to be more involved, separate issues might be filed for them.
This topic was discussed at the dev summit, as can be seen in the meeting notes. Very generally, there is a long-standing desire to “explain” the results of a PromQL query better. Most commonly, this would affect things that went wrong in a way that isn't an outright error. That's where the term “warning” comes from. But there are also cases where the explanation isn't even a warning, e.g. performance stats about the evaluation or (relevant with the new histograms) information about the accuracy of a result (based on the resolution of the histograms involved).
The effort has two parts: (1) Providing the plumbing to deliver the warnings (or “annotations” or whatever we call them). (2) Actually issuing warnings/annotations.
Plumbing
The query API already has a warnings field (only used for remote-read issues so far) and returns stats. What's missing is support in the UI to display warnings and stats, and maybe a notion of “annotations” for valuable information that isn't supposed to flag anything “wrong” so that framing it as a warning would be confusing.
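To make the existing plumbing concrete, here is a minimal sketch of how a client might read the warnings field from a query API response. The struct and function names are my own, the payload is illustrative, and the warning text in it is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queryResponse models only the envelope fields relevant here.
// The type name is hypothetical; the JSON keys follow the query API envelope.
type queryResponse struct {
	Status   string          `json:"status"`
	Data     json.RawMessage `json:"data"`
	Warnings []string        `json:"warnings,omitempty"`
}

// decodeWarnings returns the warnings carried by a query API response body.
func decodeWarnings(body []byte) ([]string, error) {
	var resp queryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	return resp.Warnings, nil
}

func main() {
	// Illustrative payload; the warning text is invented for this example.
	body := []byte(`{"status":"success","data":{"resultType":"vector","result":[]},"warnings":["example warning from remote read"]}`)
	warnings, err := decodeWarnings(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, w := range warnings {
		fmt.Println("warning:", w)
	}
}
```

A UI would need to do roughly this and then render the strings next to the result instead of dropping them.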
A concern is that the promql.Result type contains the warnings as storage.Warnings, presumably because the warnings are currently only used for issues coming from remote-read. With more warnings (or even “annotations”), they will also come from other sources, like the PromQL engine itself, and the storage.Warnings type would be misleading.
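One possible shape for a source-neutral replacement is sketched below. Everything in it (Annotation, Annotations, Severity, the Source strings) is hypothetical and only meant to illustrate the idea of separating warnings from informational notes regardless of where they originate; it is not an existing Prometheus type.

```go
package main

import "fmt"

// Severity distinguishes warnings from purely informational annotations.
// Hypothetical type, not part of Prometheus.
type Severity int

const (
	Info Severity = iota
	Warning
)

// Annotation is a hypothetical, source-neutral note attached to a query result.
type Annotation struct {
	Severity Severity
	Source   string // e.g. "remote-read" or "promql-engine" (illustrative values)
	Message  string
}

// Annotations collects notes from all sources involved in a query.
type Annotations []Annotation

// Warnings returns only the entries that flag something as potentially wrong,
// formatted as strings so they could fill a plain warnings field.
func (a Annotations) Warnings() []string {
	var out []string
	for _, an := range a {
		if an.Severity == Warning {
			out = append(out, an.Source+": "+an.Message)
		}
	}
	return out
}

func main() {
	anns := Annotations{
		{Severity: Warning, Source: "remote-read", Message: "partial response"},
		{Severity: Info, Source: "promql-engine", Message: "result based on low-resolution histograms"},
	}
	for _, w := range anns.Warnings() {
		fmt.Println(w)
	}
}
```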
Actual warnings/annotations to add
Reasons for warnings expressed in the past include:
- Explain a failed label match (or any label match).
- Warn about technically correct but nonsensical usage (e.g. quantile(10, foo)).
- Warn about applying gauge functions to counters or counter functions to gauges (based on the naming (..._total) or even based on what's stored in the metadata buffer; it would of course be better to have a proper persistent metadata storage).
- Warn about rate and similar calculations that fail for a lack of samples covered by the used range.
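Two of the checks listed above are cheap enough to sketch. The following is only an illustration under assumptions: the function names are made up, the function sets are a small hand-picked sample, and the counter detection uses nothing but the _total suffix, so it would produce false positives for counter-like series with other suffixes.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// Small, hand-picked samples of functions meant for counters or gauges.
var (
	counterFuncs = map[string]bool{"rate": true, "irate": true, "increase": true}
	gaugeFuncs   = map[string]bool{"delta": true, "idelta": true, "deriv": true}
)

// quantileWarning returns a message if q is not a meaningful quantile,
// or the empty string if there is nothing to warn about.
func quantileWarning(q float64) string {
	if math.IsNaN(q) || q < 0 || q > 1 {
		return fmt.Sprintf("quantile value %g is outside [0, 1]", q)
	}
	return ""
}

// typeMismatchWarning guesses the metric type from its name only.
// This is a naming heuristic, not a lookup in real metadata.
func typeMismatchWarning(fn, metric string) string {
	looksLikeCounter := strings.HasSuffix(metric, "_total")
	switch {
	case gaugeFuncs[fn] && looksLikeCounter:
		return fmt.Sprintf("%s() is meant for gauges, but %q looks like a counter", fn, metric)
	case counterFuncs[fn] && !looksLikeCounter:
		return fmt.Sprintf("%s() is meant for counters, but %q does not look like one", fn, metric)
	}
	return ""
}

func main() {
	fmt.Println(quantileWarning(10))
	fmt.Println(typeMismatchWarning("delta", "http_requests_total"))
	fmt.Println(typeMismatchWarning("rate", "node_memory_free_bytes"))
}
```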
Some of those are not trivial. E.g. the last point about not enough samples to calculate a rate shouldn't warn if it happens for a “legitimate” reason (e.g. the series ends or starts within the range or even outright before or after the range), but if “simply” extending the range a bit would allow the calculation to succeed, a warning would be helpful. Finding out about that might be costly, so we might not want those warnings to be “on by default”… (Again, a fully-fledged metadata storage would help, which could know when a series starts or ends and what intended scrape interval it has.)
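A rough sketch of the decision described above, under strong assumptions: the engine somehow already knows how many samples fell into the range and when the last sample before the range was taken (which is exactly the potentially costly part). The rangeInfo type, its fields, and the maxExtension threshold are all hypothetical.

```go
package main

import "fmt"

// rangeInfo is a hypothetical summary of one series within one range
// selector. All timestamps are in milliseconds.
type rangeInfo struct {
	Start          int64 // start of the range
	SamplesInRange int   // number of samples inside the range
	PrevSample     int64 // timestamp of the last sample before Start
	HasPrev        bool  // false if the series has no sample before Start
}

// shouldWarnShortRange reports whether a rate-like calculation failed only
// because the range was slightly too short. maxExtension is the largest
// range extension (in ms) that still counts as "a bit".
func shouldWarnShortRange(r rangeInfo, maxExtension int64) bool {
	if r.SamplesInRange != 1 {
		// Two or more samples: the calculation works.
		// Zero samples: treated here as a series that is legitimately absent.
		return false
	}
	if !r.HasPrev {
		// The series starts within the range: a legitimate reason.
		return false
	}
	// One more sample would be covered by a modestly longer range.
	return r.Start-r.PrevSample <= maxExtension
}

func main() {
	// Samples at 30s and 90s, range covering 60s to 120s.
	r := rangeInfo{Start: 60000, SamplesInRange: 1, PrevSample: 30000, HasPrev: true}
	fmt.Println(shouldWarnShortRange(r, 60000))
}
```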
The new sparse histograms add a whole lot of more opportunities to warn:
- New histogram samples are mixed up with conventional samples.
- Incompatible bucket layouts prevent an aggregation (over time or between different histograms).
- …