Skip to content

Add warnings (and annotations) to PromQL query results#12152

Merged
beorn7 merged 35 commits intoprometheus:mainfrom
zenador:bubblebobble
Sep 14, 2023
Merged

Add warnings (and annotations) to PromQL query results#12152
beorn7 merged 35 commits intoprometheus:mainfrom
zenador:bubblebobble

Conversation

@zenador
Copy link
Copy Markdown
Contributor

@zenador zenador commented Mar 19, 2023

Addresses some of #10839 and #11216 as a hackathon project.

Most of the code changes are just changing from storage.Warnings to notes.Warnings to move it to a more neutral location as the new warnings we are adding are not related to storage.

Annotations are created and handled separately from warnings here, but they are merged together with the warnings in the PromQL engine because the HTTP API and the UI don't support it yet and we want to avoid making external changes for now. By creating annotations separately though, it would make it easier in future when we want to display them differently.

Due to the way warnings are created, we end up with many duplicate warnings, or perhaps we find the warning is not needed later on, e.g. for the RangeTooSmallWarning, we may add this warning at each section of the range, or we may only add it at the beginning or end but still return a non-empty result for the query. For now we do the deduplication and the removal of the unnecessary warning at the final response writing step in the API.

New warnings (util/notes/warnings.go):

  • RangeTooSmallWarning = errors.New("Need at least 2 points to compute, perhaps time range is too small")
  • MixedFloatsHistogramsWarning = errors.New("Range contains a mix of histograms and floats")
  • MixedOldNewHistogramsWarning = errors.New("Range contains a mix of conventional and native histograms")
  • InvalidQuantileWarning = errors.New("Quantile value should be between 0 and 1")
  • BadBucketLabelWarning = errors.New("No bucket label or malformed label value")
  • PossibleNonCounterWarning = errors.New("Metric might not be a counter (name does not end in _total/_sum/_count)")

Possible future improvements:

  • We might want to change notes.Warnings ([]error) in the PromQL engine to structs that contain both warnings and annotations (the new notes.Notes) to delay the merging of warnings and annotations (which would be helpful if we eventually separate them in the final output), and to have it as an independent struct to make any future changes easier (not done yet as it is not strictly necessary at this stage and I'm not sure if we want to do this)
  • With the above change, we can do the deduplication at an earlier stage each time we merge Notes to avoid collecting a long list of duplicate warnings until the final step. With a struct, we can easily add internal maps to support the deduplication in the merge function.

Copy link
Copy Markdown
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. As you can see from the comments, I'm not sure yet how we should tackle this problem. Comments and thoughts from @codesome, @roidelapluie, or any other interested maintainer are highly welcome.

As a minor point: Apparently, the linter has given up on enforcing a doc comment for every exported type. However, for many of the new exported types in this PR, a doc comment would be really helpful. Could you go through them and add a doc comment wherever it appears even remotely plausible to do so?

@beorn7
Copy link
Copy Markdown
Member

beorn7 commented Apr 5, 2023

So, in summary, I think we have to crack to nuts here:

  1. How to deal with the "not enough samples in the range" warning/annotation.
  2. How to implement the plumbing for conveying warnings/annotations at all.

To reduce the problem surface, I think we should focus on (2) for now and only start thinking about (1) once the plumbing is in place.

WRT (2): To give some preference among the many thoughts I have expressed in the comments, my gut feeling is currently to…

  1. …not think about deduplication but go for more specific warnings/annotations so that deduplication might not be needed (but could be revisited if needed).
  2. …treat everything as Go errors utilizing the errors package tooling (which you have already done partially anyway).

In that way, we won't need any structures (like notes.Notes) but just specific error values and types. Maybe have an error value called PromQLAnnotation and then an (unexported) type promQLAnnotation with an Is method returning true for target == PromQLAnnotation.

Copy link
Copy Markdown
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My apologies again for taking so long. This is a big deal, both in reward and effort, and smaller (also important) things kept preempting this bulky thing.

Thank you very much for addressing a lot of the detailed comments (and postponing the "range too short" issue). Before diving into a detailed code level review, let's find a way forward with the fundamental structure of the annotations. See my comments here. I'll also try to write down a proposal in the next hour or so. Maybe we should jump on a call to discuss our options in detail.

@beorn7
Copy link
Copy Markdown
Member

beorn7 commented Jun 1, 2023

Here some rough ideas how to approach the problem. Happy to discuss here or in a call to refine.

  • Let's make the annotation.Annotations type the only thing we hand around and propagate (essentially replacing storage.Warnings in the current upstream code). And let's get rid of annotations.Info and annotations.Errors.
  • annotations.Annotations could have map[string]error as the underlying type, with the stringified version of each error as the key, thereby providing de-duplication.
  • The errors we put into that map could wrap marker errors in a suitable way (see Go documentation and details below).

To expand on the last point:

Here is a slight variation of what you had already done:

var (
	PromQLInfo    = errors.New("PromQL info")
	PromQLWarning = errors.New("PromQL warning")

	BadBucketLabelWarning  = fmt.Errorf("%w: no bucket label or malformed label value", PromQLWarning)
	PossibleNonCounterInfo = fmt.Errorf("%w: metric might not be a counter (name does not end in _total/_sum/_count)", PromQLInfo)
)

func NewBadBucketLabelWarning(metricName, label string) error {
	return fmt.Errorf("%w: %s %s", BadBucketLabelWarning, metricName, label)
}

func NewPossibleNonCounterInfo(metricName string) error {
	return fmt.Errorf("%w: %s", PossibleNonCounterInfo, metricName)
}

For this example, I have only implemented two annotations here: BadBucketLabelWarning precisely as before, and NewPossibleNonCounterInfo as an info version of the previous PossibleNonCounterWarning.

Now a new version of the API could return the info annotations in a separate section by iterating through all annotations and testing them with errors.Is(annotation, annotations.PromQLInfo) and putting them into the new section if true or in the old warnings section if false, more or less like this:

for _, annotation := range annotations {
    if errors.Is(annotation, annotations.PromQLInfo) {
        // Add to info section.
    } else {
        // Add to warning section as before.
    }
}

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Copy link
Copy Markdown
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much. This is a huge accomplishment.

I'll merge this now, but there is one more thing to do:

Rule evaluations have ignored warnings so far, which was probably fine as they only came from remote read. Now that warnings AKA annotations are much more useful for recording rules, we need to log and count the annotations (and maybe even display them in the rules page of the web UI). The starting point is probably the EngineQueryFunc, where we need to extract the annotations from the res and return it as an additional return argument.

@zenador
Copy link
Copy Markdown
Contributor Author

zenador commented Sep 14, 2023

Thank you for your thorough reviews and all your suggestions!

juliusv added a commit that referenced this pull request Oct 14, 2023
Related to PR #12152

Signed-off-by: Julius Volz <julius.volz@gmail.com>
juliusv added a commit that referenced this pull request Oct 15, 2023
)

Related to PR #12152

Signed-off-by: Julius Volz <julius.volz@gmail.com>
return enh.Out, annos.Add(annotations.NewMixedFloatsHistogramsWarning(metricName, args[0].PositionRange()))
}

if isCounter && !strings.HasSuffix(metricName, "_total") && !strings.HasSuffix(metricName, "_sum") && !strings.HasSuffix(metricName, "_count") {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should add metricName=="" as a case that is not warned.
This would happen for instance if the input to rate comes from a calculation, and not from a series.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... but is there a valid scenario where that can happen? Rates should always be done on raw counters coming from the storage before any other operation is done on them... to pass in a derived expression into rate() you would have to do a subquery to turn it into a range vector, and I'm not sure there's ever a case where it makes sense to do that for counters before rate-ing them?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

second derivative?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmpf. Ok fine :)

LeviHarrison pushed a commit to LeviHarrison/prometheus that referenced this pull request Oct 15, 2023
LeviHarrison pushed a commit to LeviHarrison/prometheus that referenced this pull request Oct 15, 2023
…metheus#12982)

Related to PR prometheus#12152

Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Levi Harrison <git@leviharrison.dev>
LeviHarrison pushed a commit to LeviHarrison/prometheus that referenced this pull request Oct 15, 2023
…metheus#12982)

Related to PR prometheus#12152

Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Levi Harrison <git@leviharrison.dev>
LeviHarrison pushed a commit to LeviHarrison/prometheus that referenced this pull request Oct 15, 2023
…metheus#12982)

Related to PR prometheus#12152

Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Levi Harrison <git@leviharrison.dev>
LeviHarrison pushed a commit to LeviHarrison/prometheus that referenced this pull request Oct 15, 2023
…metheus#12982)

Related to PR prometheus#12152

Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Levi Harrison <git@leviharrison.dev>
@beorn7
Copy link
Copy Markdown
Member

beorn7 commented Oct 15, 2023

We already have #12945 and #12922 . Please discuss there. (I'm pretty sure we should not warn on nameless metrics. Once we have stronger typing in PromQL, we can do it properly.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants