insights: concerns#85345

Merged
craig[bot] merged 1 commit into cockroachdb:master from matthewtodd:insights-concerns on Aug 8, 2022
Conversation

@matthewtodd

As we expand the previous outliers system to handle more insights, we
add a field to start capturing our "concerns," things we detect that
might be amiss.

Note that we don't yet expose these values in the crdb_internal virtual
tables of execution insights. We can do that next.

Release note: None

@matthewtodd matthewtodd requested a review from a team July 29, 2022 20:30
@cockroach-teamcity
Member

This change is Reviewable

Contributor

@maryliag maryliag left a comment


Reviewed 2 of 5 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag and @matthewtodd)


pkg/sql/sqlstats/insights/detector.go line 140 at r1 (raw file):

}

func (l latencyThresholdDetector) examine(s *Statement) (concerns []Concern) {

Should you also have a hasConcerns function, so whoever is using this doesn't need to know what having concerns entails?
Right now it's just checking length > 0, but we could potentially have different checks in the future (e.g. filtering by specific concerns, etc.)


pkg/sql/sqlstats/insights/registry.go line 87 at r1 (raw file):

	delete(r.mu.statements, sessionID)

	// TODO(todd): How careful do we need to be with this map allocation?

do you have any limitation somewhere else or checks for memory usage?


pkg/sql/sqlstats/insights/detector_test.go line 55 at r1 (raw file):

			&fakeDetector{stubExamine: []Concern{Concern_UnusuallySlow}},
		}}
		require.Equal(t, []Concern{Concern_Slow, Concern_UnusuallySlow}, detector.examine(&Statement{}))

Does it make sense to have something both unusually slow and slow? In the sense that everything unusually slow will also be "regular" slow, it might be worth adding just that one?
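The test above exercises a composite detector that concatenates the concerns reported by each sub-detector. A minimal, self-contained sketch of that pattern, using stand-in types (`Concern`, `Statement`, and the detector names here are reconstructed for illustration, not the exact types from the diff):

```go
package main

import (
	"fmt"
	"reflect"
)

// Concern is a hypothetical stand-in for the enum added in this PR.
type Concern int

const (
	Concern_Slow Concern = iota
	Concern_UnusuallySlow
)

// Statement is a minimal stand-in for the insights Statement type.
type Statement struct{}

// detector mirrors the examine interface exercised in detector_test.go.
type detector interface {
	examine(*Statement) []Concern
}

// fakeDetector returns a canned list of concerns, as in the test above.
type fakeDetector struct{ stubExamine []Concern }

func (f fakeDetector) examine(*Statement) []Concern { return f.stubExamine }

// compositeDetector concatenates the concerns from each sub-detector.
type compositeDetector struct{ detectors []detector }

func (d compositeDetector) examine(s *Statement) (concerns []Concern) {
	for _, sub := range d.detectors {
		concerns = append(concerns, sub.examine(s)...)
	}
	return concerns
}

func main() {
	d := compositeDetector{[]detector{
		&fakeDetector{stubExamine: []Concern{Concern_Slow}},
		&fakeDetector{stubExamine: []Concern{Concern_UnusuallySlow}},
	}}
	got := d.examine(&Statement{})
	fmt.Println(reflect.DeepEqual(got, []Concern{Concern_Slow, Concern_UnusuallySlow})) // true
}
```

Composition keeps each detector focused on one signal while the registry only ever talks to the one `examine` entry point.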

Author

@matthewtodd matthewtodd left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag)


pkg/sql/sqlstats/insights/detector.go line 140 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

Should you also have a hasConcerns function, so whoever is using this doesn't need to know what having concerns entails?
Right now it's just checking length > 0, but we could potentially have different checks in the future (e.g. filtering by specific concerns, etc.)

Yes, I see what you mean -- I think that hasConcerns method would live downstream of here, to help with the interpretation of the concerns field. Let's keep an eye out for when we need something like that. (For now, I think we're just concerned with populating the list, not interpreting it.)
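If such a downstream helper were added later, it could look something like the sketch below. Everything here is hypothetical (the `Insight` type and `hasConcerns` name are illustrations of the suggestion, not code from this PR):

```go
package main

import "fmt"

// Concern is a hypothetical stand-in for the protobuf enum from this PR.
type Concern int

const Concern_SlowExecution Concern = 0

// Insight is a hypothetical downstream type carrying the new concerns field.
type Insight struct {
	Concerns []Concern
}

// hasConcerns hides the "len > 0" check behind a name, as suggested above,
// leaving room for richer checks (e.g. filtering specific concerns) later.
func (i Insight) hasConcerns() bool {
	return len(i.Concerns) > 0
}

func main() {
	fmt.Println(Insight{}.hasConcerns())                                           // false
	fmt.Println(Insight{Concerns: []Concern{Concern_SlowExecution}}.hasConcerns()) // true
}
```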


pkg/sql/sqlstats/insights/detector_test.go line 55 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

Does it make sense to have something both unusually slow and slow? In the sense that everything unusually slow will also be "regular" slow, it might be worth adding just that one?

Yeah, I was looking for better words here, hoping to distinguish between >100ms and >p99. But the UI design just calls everything "slow execution," so I've updated here.


pkg/sql/sqlstats/insights/registry.go line 87 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

do you have any limitation somewhere else or checks for memory usage?

Yes, especially in the LatencyQuantileDetector. What I was concerned about here was the cost of the map allocation, and whether I should try pooling them, but looking around on the internet suggests that probably doesn't make sense, so I've deleted the TODO.
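The deleted TODO asked whether the per-session map should be pooled. In Go, a fresh small map allocation is cheap, and pooling maps adds clearing/return bookkeeping (and retained-memory risk) for little gain, which matches the conclusion above. A hypothetical sketch of the registry pattern in question (types and names invented for illustration):

```go
package main

import "fmt"

type sessionID string

// registry buffers statements per session, roughly mirroring the
// r.mu.statements map referenced in the review comment.
type registry struct {
	statements map[sessionID][]string
}

func (r *registry) observe(id sessionID, stmt string) {
	if r.statements == nil {
		// Allocated fresh rather than drawn from a sync.Pool; small map
		// allocations are cheap enough that pooling isn't worth it here.
		r.statements = make(map[sessionID][]string)
	}
	r.statements[id] = append(r.statements[id], stmt)
}

// finish drops the session's buffer: the entry is simply deleted and the
// slice is left for the garbage collector rather than pooled and reused.
func (r *registry) finish(id sessionID) {
	delete(r.statements, id)
}

func main() {
	r := &registry{}
	r.observe("s1", "SELECT 1")
	r.finish("s1")
	fmt.Println(len(r.statements)) // 0
}
```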

Contributor

@maryliag maryliag left a comment


:lgtm:

Reviewed 5 of 5 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @matthewtodd)

@matthewtodd
Author

bors r+

@craig
Contributor

craig bot commented Aug 8, 2022

Build succeeded:

@craig craig bot merged commit fbe4fea into cockroachdb:master Aug 8, 2022
@matthewtodd matthewtodd deleted the insights-concerns branch August 11, 2022 15:41
craig bot pushed a commit that referenced this pull request Aug 19, 2022
86415: insights: identify statements with high retry counts r=matthewtodd a=matthewtodd

Addresses #85827.

This change undoes the previous "concerns" concept from #85345 and
replaces it with a "problems" field.[^1][^2] We then mark slow statements
retried more than 5 times as having a "HighRetryCount" problem.[^3] Slow
statements without any specifically identified problems will have
"Unknown" in their list of problems. We will add support for further
problem detection in future commits.

[^1]: Every execution insight we offer happens because a statement was
  slow, so the "SlowExecution" concern was the only one we'd ever have.

[^2]: The protocol buffer changes are safe because we haven't deployed
  this code anywhere yet.

[^3]: This threshold may be configured via the
  `sql.insights.high_retry_count.threshold` cluster setting.

Release justification: Category 2: Bug fixes and low-risk updates to new
functionality

Release note (ops change): The new
`sql.insights.high_retry_count.threshold` cluster setting may be used to
configure how many times a slow statement (as identified by the
execution insights system) must have been retried to be marked as having
a high retry count.

Co-authored-by: Matthew Todd <todd@cockroachlabs.com>
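The follow-up change described above (mark statements retried more than the configured threshold with "HighRetryCount", otherwise "Unknown") can be sketched as follows. The `Problem` values come from the commit message; the function shape and names are assumptions for illustration, and the threshold parameter stands in for the `sql.insights.high_retry_count.threshold` cluster setting:

```go
package main

import "fmt"

// Problem mirrors the "problems" field that replaced "concerns" in #86415.
type Problem int

const (
	Problem_Unknown Problem = iota
	Problem_HighRetryCount
)

// problems classifies an already-slow statement: retried more than the
// threshold means HighRetryCount; with no specifically identified problem,
// the statement is marked Unknown, per the commit message above.
func problems(retries, threshold int64) []Problem {
	if retries > threshold {
		return []Problem{Problem_HighRetryCount}
	}
	return []Problem{Problem_Unknown}
}

func main() {
	fmt.Println(problems(6, 5)) // [1] (HighRetryCount)
	fmt.Println(problems(2, 5)) // [0] (Unknown)
}
```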