-
Notifications
You must be signed in to change notification settings - Fork 4.1k
reporting: capture scrubbed information about queries and schemas #13968
Copy link
Copy link
Closed
Milestone
Description
To gather information to guide work on optimizations, and to track which features are used, we want to receive more information about SQL queries from production clusters.
Here's how it looks like:
- reporting is enabled by default, users can opt-out
- server will push information to our registration server
- what information is pushed: SQL query distribution & database schemas.
- from every node in the cluster
- at a predefined interval (say, once per day) this info is encoded, compressed, sent to our reg server, then cleared on the node
- regarding "SQL query distribution": during execution, each node maintains internally (in memory) a map from "query structure" (see below for definition) to a tuple
- tuple: (count, first/last latency quantiles, nr rows, error status)
- count = how many times that structure was encountered
- first latency = from the point the query is received on pgwire to the point the first result is returned
- last latency = from the point the query is received on pgwire to the point the last result is returned
- SQL query structure: either SQL query syntax or logical plan (or both) are used as a key in the distribution map defined above, with a twist:
- all SQL Datums are replaced by placeholders;
- all db/table/column names are replaced by their numeric IDs
(We work under the assumption that applications always reuse a small set of query structure, but with a large variability on the constant parameters)
- database schemas are also reported, alongside with the distribution map defined above
- reports table IDs together with types, constraints, index definitions etc (names scrubbed away)
- default/check expressions also scrubbed from datums & names; all names replaced by equivalent IDs
- CLI utility or web UI allows users to inspect the data that is being collected.
Proposed approach by Peter:
- make application naming functional sql: support setting an application name for a session. #14085 pgwire: propagate application_name if set during connect. #14089
- equip the executor with per-app statistics sql: set up foundations for per-application statistics #14092
- store some rudimentary stats (count, error code, number of result rows) in a per-node struct with proper mutex sql: per-statement statistics, step two #14181
- expose the data via a virtual table sql: per-statement statistics, step two #14181
- make a function that scrubs a query sql: statement statistics, step three #14845
- make a function that scrubs the table/db names from the collected statics.
- add settings: how frequently it's reported, how frequently it's cleared
- add a cmdline flag to identify the operator/user
- add a setting for the user
- link the registration form to editing this user
- update the reg server schema to add a table
- update the reg server program/binary to accept the new queries
- reporting loop - upload the collected stats to reg server with user, clear them out
- document what is being reported - either in docs or admin UIs
Bonus features:
- make a function that scrubs a schema
- make a json endpoint in the debug interface
- make a CLI utility that dumps it
- submit data to reporting server + UI to opt-out
Reactions are currently unavailable