insights: concerns #85345
craig[bot] merged 1 commit into cockroachdb:master from matthewtodd:insights-concerns
Conversation
maryliag
left a comment
Reviewed 2 of 5 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @maryliag and @matthewtodd)
pkg/sql/sqlstats/insights/detector.go line 140 at r1 (raw file):
```go
}

func (l latencyThresholdDetector) examine(s *Statement) (concerns []Concern) {
```
Should you also have a `hasConcerns` function, so that callers don't need to know what having concerns entails?
Right now it's just `if length > 0`, but we could potentially have different checks in the future (e.g., filtering by specific concerns).
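The suggested helper might look like this (a hypothetical sketch; `hasConcerns` is the proposed name, and `Concern`/`Statement` here are simplified stand-ins for the package's types):

```go
package main

import "fmt"

// Concern is a simplified stand-in for the insights package's Concern enum.
type Concern int

const (
	Concern_SlowExecution Concern = iota
)

// hasConcerns hides the "len > 0" check behind a named predicate, so
// callers need not know what "having concerns" entails; the check could
// later grow filters for specific concerns without touching call sites.
func hasConcerns(concerns []Concern) bool {
	return len(concerns) > 0
}

func main() {
	fmt.Println(hasConcerns(nil))                              // false
	fmt.Println(hasConcerns([]Concern{Concern_SlowExecution})) // true
}
```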
pkg/sql/sqlstats/insights/registry.go line 87 at r1 (raw file):
```go
delete(r.mu.statements, sessionID) // TODO(todd): How careful do we need to be with this map allocation?
```
Do you have any limits elsewhere, or checks for memory usage?
pkg/sql/sqlstats/insights/detector_test.go line 55 at r1 (raw file):
```go
	&fakeDetector{stubExamine: []Concern{Concern_UnusuallySlow}},
}}

require.Equal(t, []Concern{Concern_Slow, Concern_UnusuallySlow}, detector.examine(&Statement{}))
```
Does it make sense to have something be both "unusually slow" and "slow"? That is, everything unusually slow will also be "regular" slow, so it might be worth emitting just the one concern.
As we expand the previous outliers system to handle more insights, we add a field to start capturing our "concerns," things we detect that might be amiss.

Note that we don't yet expose these values in the crdb_internal virtual tables of execution insights. We can do that next.

Release note: None
matthewtodd
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @maryliag)
pkg/sql/sqlstats/insights/detector.go line 140 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
Should you also have a `hasConcerns` function, so that callers don't need to know what having concerns entails? Right now it's just `if length > 0`, but we could potentially have different checks in the future (e.g., filtering by specific concerns).
Yes, I see what you mean. I think that hasConcerns method would live downstream of here, to help with interpreting the concerns field. Let's keep an eye out for when we need something like that. (For now, I think we're just concerned with populating the list, not interpreting it.)
pkg/sql/sqlstats/insights/detector_test.go line 55 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
Does it make sense to have something be both "unusually slow" and "slow"? That is, everything unusually slow will also be "regular" slow, so it might be worth emitting just the one concern.
Yeah, I was looking for better words here, hoping to distinguish between >100ms and >p99. But the UI design just calls everything "slow execution," so I've updated here.
pkg/sql/sqlstats/insights/registry.go line 87 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
Do you have any limits elsewhere, or checks for memory usage?
Yes, especially in the LatencyQuantileDetector. What I was concerned about here was the cost of the map allocation, and whether I should try pooling them, but looking around on the internet suggests that probably doesn't make sense, so I've deleted the TODO.
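For context, the pooling approach being ruled out would have looked roughly like this (a hypothetical sketch, not code from the PR; the map type is simplified). The catch it illustrates: a pooled map must be cleared entry-by-entry before reuse, which can cost as much as a fresh allocation for small, short-lived maps.

```go
package main

import (
	"fmt"
	"sync"
)

// statementPool recycles per-session statement maps instead of letting
// them be garbage collected.
var statementPool = sync.Pool{
	New: func() any { return make(map[string]int) },
}

func getStatements() map[string]int {
	return statementPool.Get().(map[string]int)
}

// putStatements clears the map before returning it to the pool; this
// per-entry delete loop is the overhead that makes pooling unattractive.
func putStatements(m map[string]int) {
	for k := range m {
		delete(m, k)
	}
	statementPool.Put(m)
}

func main() {
	m := getStatements()
	m["sess-1"] = 1
	putStatements(m)
	fmt.Println(len(getStatements())) // 0: a map from the pool is always empty
}
```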
maryliag
left a comment
Reviewed 5 of 5 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @matthewtodd)
bors r+

Build succeeded:
86415: insights: identify statements with high retry counts r=matthewtodd a=matthewtodd

Addresses #85827.

This change undoes the previous "concerns" concept from #85345 and replaces it with a "problems" field.[^1][^2] We then mark slow statements retried more than 5 times as having a "HighRetryCount" problem.[^3] Slow statements without any specifically identified problems will have "Unknown" in their list of problems. We will add support for further problem detection in future commits.

[^1]: Every execution insight we offer happens because a statement was slow, so the "SlowExecution" concern was the only one we'd ever have.

[^2]: The protocol buffer changes are safe because we haven't deployed this code anywhere yet.

[^3]: This threshold may be configured via the `sql.insights.high_retry_count.threshold` cluster setting.

Release justification: Category 2: Bug fixes and low-risk updates to new functionality

Release note (ops change): The new `sql.insights.high_retry_count.threshold` cluster setting may be used to configure how many times a slow statement (as identified by the execution insights system) must have been retried to be marked as having a high retry count.

Co-authored-by: Matthew Todd <todd@cockroachlabs.com>