sqlstats: include regions in statement_statistics.#92259
sqlstats: include regions in statement_statistics.#92259matthewtodd wants to merge 1 commit intocockroachdb:masterfrom matthewtodd:sqlstats-regions
Conversation
Part of #89949. Before this change, tenants did not have a way to see which regions their statements were executing in: the SQL Activity UI depends on `cockroach.server.serverpb.NodesUI`, which is not available for tenants, and there doesn't seem to be any way to gather the information via SQL. I am not sure that the approach I've taken here will be the one that makes the most sense architecturally, so I've laid out my thinking and would welcome some advice. An alternative path would be to implement the `NodesUI` endpoint for tenants, to allow the UI to continue working as is. This work has been suggested in #92065, but I am wondering if we perhaps don't want to be talking about "nodes" for serverless tenants at all? In the meantime, the concept of regions _is_ still meaningful for serverless tenants. So, we attach a `regions` array to the `statistics` JSON of `crdb_internal.statement_statistics` bearing the regions of the nodes the statement executed on. Note that, as of this writing, this change only includes the region of the gateway node. We will want to do more, but an immediate solution was not at hand, so I figured it was worth first validating this direction before pushing further. Release note (sql change): A `regions` field was added to the statistics column of `crdb_internal.statement_statistics`, reporting the regions of the nodes on which the statement was executed.
maryliag
left a comment
There was a problem hiding this comment.
Reviewed 12 of 13 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd)
-- commits line 23 at r1:
Seems like a reasonable approach to me and easy for gateway node. You will need to find a good approach to gather the remaining node/regions info, and check if that would cause any performance issues.
pkg/sql/executor_statement_metrics.go line 188 at r1 (raw file):
// TODO(todd): Is there a way to gather region info from the other nodes // the query traverses? (Compare with getNodesFromPlanner.) While the
can you check how the nodes endpoint does it? maybe something there that you can use as a guide
maryliag
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd)
pkg/ui/workspaces/cluster-ui/src/util/appStats/appStats.ts line 199 at r1 (raw file):
: b.last_exec_timestamp, nodes: uniqueLong([...a.nodes, ...b.nodes]), regions: [],
you should be combining the value here, the same way we do for plan gist and idx recommendations
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @maryliag and @matthewtodd)
pkg/sql/executor_statement_metrics.go line 188 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
can you check how the nodes endpoint does it? maybe something there that you can use as a guide
We already collect the regions information in some cases (when sampling the stmt) for the telemetry, so we could piggy-back on that. This happens in populateQueryLevelStatsAndRegions and the regions info is stored in instrumentationHelper.regions. I believe we currently don't have other places where we compute this info.
There is #88498 to improve things further. I just looked into getAllNodeDescriptors (what we use for determining the region locality) and it appears to not work in multi-tenant environment, so it seems like instrumentationHelper.regions will be empty for serverless even when we do sample a stmt.
maryliag
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd and @yuzefovich)
pkg/ui/workspaces/db-console/src/views/statements/statements.spec.tsx line 530 at r1 (raw file):
}, nodes: [Long.fromInt(1), Long.fromInt(2), Long.fromInt(3)], regions: [],
better to use actual values, otherwise you won't see anything when testing
maryliag
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd and @yuzefovich)
-- commits line 33 at r1:
isn't this also adding to the system table?
|
Thank you both for your reviews! Closing this for now in favor of the stop-gap fix in #93268. |
93268: statusccl: temporarily serve /_status/nodes to tenants r=matthewtodd a=matthewtodd Fixes #92065 Part of #89949 Our current UI architecture has many of the SQL Activity pages mapping node or sql instance ids to regions at page load time, using the response from this `/_status/nodes` endpoint to get the information it needs. This approach is problematic for ephemeral serverless tenants but will need a broader reimagining, probably ending up with us storing locality information directly in many of the sqlstats tables, which will probably require something like sending instance localities along in the distsql tracing. An initial exploration in this direction happened in #92259. In the meantime, as a stop-gap measure, we implement a reduced version of this `/_status/nodes` endpoint for tenants, keeping the SQL Activity pages somewhat working for multiregion tenants. Release note: None Co-authored-by: Matthew Todd <todd@cockroachlabs.com>
95449: sqlstats: include regions in statement_statistics r=matthewtodd a=matthewtodd Part of #89949. This completes the work abandoned in #92259, properly storing the regions in which a statement was executed, rather than waiting until UI render time to try to map (dangerously ephemeral) node IDs to their respective regions. With this work in place, we will be able to remove both the UI mapping code (usages of this [selector][]) and the stop-gap work of #93268. [selector]: https://github.com/cockroachdb/cockroach/blob/0f6333cbd6c78264a59dc435324c0c33d75ea148/pkg/ui/workspaces/cluster-ui/src/store/nodes/nodes.selectors.ts#L52-L64 Release note (sql change): A regions field was added to the statistics column of crdb_internal.statement_statistics, reporting the regions of the nodes on which the statement was executed. Co-authored-by: Matthew Todd <todd@cockroachlabs.com>
Part of #89949.
Before this change, tenants did not have a way to see which regions their statements were executing in: the SQL Activity UI depends on
cockroach.server.serverpb.NodesUI, which is not available for tenants, and there doesn't seem to be any way to gather the information via SQL.I am not sure that the approach I've taken here will be the one that makes the most sense architecturally, so I've laid out my thinking and would welcome some advice.
An alternative path would be to implement the
NodesUIendpoint for tenants, to allow the UI to continue working as is. This work has been suggested in #92065, but I am wondering if we perhaps don't want to be talking about "nodes" for serverless tenants at all?In the meantime, the concept of regions is still meaningful for serverless tenants.
So, we attach a
regionsarray to thestatisticsJSON ofcrdb_internal.statement_statisticsbearing the regions of the nodes the statement executed on.Note that, as of this writing, this change only includes the region of the gateway node. We will want to do more, but an immediate solution was not at hand, so I figured it was worth first validating this direction before pushing further.
Release note (sql change): A
regionsfield was added to the statistics column ofcrdb_internal.statement_statistics, reporting the regions of the nodes on which the statement was executed.