ui: add an Overload dashboard#66595
Conversation
|
pkg/ui/src/views/cluster/containers/nodeGraphs/dashboards/overload.tsx, line 128 at r1 (raw file):
I couldn't figure out how to group these so I can show both metrics for n1, then both metrics for n2, etc. I tried putting both |
dhartunian
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @sumeerbhola)
pkg/ts/catalog/chart_catalog.go, line 2271 at r1 (raw file):
"rocksdb.compacted-bytes-read", "rocksdb.compacted-bytes-written", "rocksdb.flushed-bytes",
Minor: why is this repeated here?
pkg/ui/src/views/cluster/containers/nodeGraphs/dashboards/overload.tsx, line 128 at r1 (raw file):
Previously, RaduBerinde wrote…
I couldn't figure out how to group these so I can show both metrics for n1, then both metrics for n2, etc. I tried putting both
<Metric>s in the same=> ( ), not sure what I'm doing wrong.
You might have been missing one of: {} instead of (), return in the lambda, or <> to group the multiple <Metrics> into a single JSX return object. If your lambda looks like the following it should work:
(nid) => {
...
return (
<>
<Metric ... />
<Metric ... />
</>
)
}
If it doesn't maybe try flatmap. I think the code that renders the plots, iterates through the children of <Axis> so it might get confused if they're grouped in 2s:
nodeIDs.flatMap((nid) => {
...
return [
<Metric ... />,
<Metric ... />
]
})
7133832 to
45ffa45
Compare
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @sumeerbhola)
pkg/ts/catalog/chart_catalog.go, line 2271 at r1 (raw file):
Previously, dhartunian (David Hartunian) wrote…
Minor: why is this repeated here?
Oops, no idea. Fixed.
pkg/ui/src/views/cluster/containers/nodeGraphs/dashboards/overload.tsx, line 128 at r1 (raw file):
Previously, dhartunian (David Hartunian) wrote…
You might have been missing one of:
{}instead of(),returnin the lambda, or<>to group the multiple<Metrics>into a single JSX return object. If your lambda looks like the following it should work:(nid) => { ... return ( <> <Metric ... /> <Metric ... /> </> ) }If it doesn't maybe try flatmap. I think the code that renders the plots, iterates through the children of
<Axis>so it might get confused if they're grouped in 2s:nodeIDs.flatMap((nid) => {
...
return [
<Metric ... />,
<Metric ... />
]
})</blockquote></details> Your `<>` sorcery worked! Thanks! <!-- Sent from Reviewable.io -->
45ffa45 to
ed4b235
Compare
|
@sumeerbhola let me know if this is ok with you. This is not necessarily what I'm proposing we'll have in the release, but at least for now having all these on the same page would be useful to understand what admission control is doing. |
I meant to look at this on monday but am very behind on things because of oncall. |
sumeerbhola
left a comment
There was a problem hiding this comment.
sorry about the delay
Reviewed 3 of 5 files at r1, 2 of 2 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @RaduBerinde)
pkg/kv/kvserver/metrics.go, line 410 at r2 (raw file):
} metaRdbL0Sublevels = metric.Metadata{ Name: "rocksdb.l0-sublevels",
There is no concept of sub-levels in rocksdb. I think we should start adding new metrics like this as pebble.l0-sublevels even though it potentially hurts discoverability.
@petermattis @itsbilal @jbowens do you have an opinion?
pkg/ui/src/views/cluster/containers/nodeGraphs/dashboards/overload.tsx, line 36 at r2 (raw file):
} = props; return [
I am wondering whether sql latency should be on this dashboard. I think of an overload dashboard as something that one goes to to see if there is a resource overload, after one has seen some bad externally visible behavior (like high sql latency) on other dashboards. So it would focus on "resources": cpu, LSM health, root-sql BytesMonitor capacity and usage, (and when we have it) contention metrics for latches and locks.
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @RaduBerinde, and @sumeerbhola)
pkg/ui/src/views/cluster/containers/nodeGraphs/dashboards/overload.tsx, line 36 at r2 (raw file):
Previously, sumeerbhola wrote…
I am wondering whether sql latency should be on this dashboard. I think of an overload dashboard as something that one goes to to see if there is a resource overload, after one has seen some bad externally visible behavior (like high sql latency) on other dashboards. So it would focus on "resources": cpu, LSM health, root-sql BytesMonitor capacity and usage, (and when we have it) contention metrics for latches and locks.
I agree with you in principle but I wanted to put them here for practical reasons - many times you are looking for correlations between the latency/throughput graphs and overload-related stats. It's hard to look for that if you have to keep switching between dashboards. It's even harder to extract a screenshot that shows a correlation.
If we had a way to see two dashboards one below the other, this wouldn't be necessary. Maybe I should file an issue about that?
|
pkg/kv/kvserver/metrics.go, line 410 at r2 (raw file): Previously, sumeerbhola wrote…
Yeah I agree, we shouldn't be naming more things |
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @jbowens, @petermattis, and @RaduBerinde)
pkg/ui/src/views/cluster/containers/nodeGraphs/dashboards/overload.tsx, line 36 at r2 (raw file):
Maybe I should file an issue about that?
Yes, I would prefer that to be solved in a different manner. This issue occurs already, and I don't think we should pollute our curated dashboards to attempt to solve it.
One approach I've seen is a pin beside each graph which one can click. All the pinned graphs can then be seen on a separate pinned dashboard.
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @jbowens, @petermattis, and @sumeerbhola)
pkg/kv/kvserver/metrics.go, line 410 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Yeah I agree, we shouldn't be naming more things
rocksdb.at least. I'd also prefer calling itstorage.l0-sublevelsas I'm not sure how discoverable thepebblename is, but either of those two is better thanrocksdb.
What would be the consequences of renaming all existing rocksdb.* to pebble. or 'storage.'?
In any case, seems like I'd be setting a precedent here so I want to make sure I make the right choice for these new stats.
pkg/ui/src/views/cluster/containers/nodeGraphs/dashboards/overload.tsx, line 36 at r2 (raw file):
Previously, sumeerbhola wrote…
Maybe I should file an issue about that?
Yes, I would prefer that to be solved in a different manner. This issue occurs already, and I don't think we should pollute our curated dashboards to attempt to solve it.
One approach I've seen is a pin beside each graph which one can click. All the pinned graphs can then be seen on a separate pinned dashboard.
Filed #66798 and I'll remove the SQL Queries and SQL Latency graphs from this dashboard.
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @jbowens, @petermattis, and @sumeerbhola)
pkg/kv/kvserver/metrics.go, line 410 at r2 (raw file):
Previously, RaduBerinde wrote…
What would be the consequences of renaming all existing
rocksdb.*topebble.or 'storage.'?In any case, seems like I'd be setting a precedent here so I want to make sure I make the right choice for these new stats.
I don't think we want to rename existing ones. Customers may have tooling or custom dashboard links that rely on the existing names. So we'll have a mix of prefixes: rocksdb and pebble/storage.
f620f37 to
22a23fe
Compare
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @jbowens, @petermattis, and @sumeerbhola)
pkg/kv/kvserver/metrics.go, line 410 at r2 (raw file):
Previously, sumeerbhola wrote…
I don't think we want to rename existing ones. Customers may have tooling or custom dashboard links that rely on the existing names. So we'll have a mix of prefixes: rocksdb and pebble/storage.
Renamed the new ones to storage.*
pkg/ui/src/views/cluster/containers/nodeGraphs/dashboards/overload.tsx, line 36 at r2 (raw file):
Previously, RaduBerinde wrote…
Filed #66798 and I'll remove the
SQL QueriesandSQL Latencygraphs from this dashboard.
Done.
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewed 4 of 4 files at r3.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @dhartunian, @jbowens, @petermattis, and @sumeerbhola)
e6c79fc to
d431e43
Compare
This commit adds an Overload dashboard in the metrics view. This is intended to be a convenient way to monitor admission control. The dashboard contains: - CPU Percent - Runnable Goroutines per CPU - L0 Sublevels and Files Release note (ui change): a new Overload dashboard groups metrics that are useful for admission control.
d431e43 to
92bc49c
Compare
|
bors r+ |
|
Build failed: |
|
Some acceptance flake. bors r+ |
|
Build succeeded: |
This commit adds an Overload dashboard in the metrics view. This is
intended to be a convenient way to monitor admission control.
The dashboard contains:
Release note (ui change): a new Overload dashboard groups metrics that
are useful for admission control.