-
Notifications
You must be signed in to change notification settings - Fork 4.1k
sql: Statement statistics should be persisted #56219
Description
Today, Statement statistics (e.g. count, retries, timings grouped by statement fingerprint) are accumulated in memory. When the AdminUI calls the Statement API on the gateway CRDB node, it will fan-out the call to all other CRDB nodes. Every node will return its in-memory statement statistics, and the gateway CRDB node will merge together the results and return them. At no time are these statement statistics persisted to a table.
There are several problems with this behavior:
- When CRDB nodes are upgraded or restart, they lose their accumulated statement statistics. This can cause unexpected results to be shown to the user that are missing key bits of information.
- Statement statistics are periodically cleared on the server, meaning that users can only get a limited history of statement statistics. Customers have repeatedly asked for a longer history, so they can examine trends going back hours or days.
- This design does not work well with multi-tenant clusters, since pure-SQL nodes do not know about one another, and will be brought up/down on a more frequent basis than single-tenant CRDB nodes.
One idea to address these problems is to periodically persist statement statistics to system tables that can later be queried by the user using SQL (e.g. similar to the "replication reports" system tables). This gives a rich query-ability experience and should scale up to large amounts of data. We could implement a configurable retention period / data volume to prevent runaway storage usage. Once we had this, the Statements API would simply query this table using SQL. Or the user could query this table directly using SQL.
One downside to the suggested approach is that data would not be as "fresh", since it would only be persisted periodically (say once every 30 or 60 seconds). If we need fresher data in the AdminUI, we could still do a fan-out that triggers persistence of the latest in-memory data (i.e. a "flush" operation).