Is your feature request related to a problem? Please describe.
Currently, we have a number of crdb_internal.* tables that exposed user-defined SQL statements. These are useful for debugging purposes.
While some of these statement strings have been anonymized to remove any user-supplied constants (e.g. something like SELECT * FROM x WHERE name = $1 instead of SELECT * FROM x WHERE name = 'Pat'), some tables expose the user-supplied constants.
As our compliance requirements grow, this exposure of user-supplied constants restricts our ability to use these crdb_internal.* tables. For example, in a redacted version of a debug zip bundle, we're forced to omit these columns from the crdb_internal table dumps to meet our compliance requirements, which reduces observability into SQL queries.
Describe the solution you'd like
We'd like to see these statement strings properly anonymized wherever possible in various crdb_internal.* tables, so that we can expose the anonymized statement strings in redacted debug zip bundles.
The following columns have been identified as exposing unredacted statement strings and related data:
stmt from crdb_internal.cluster_distsql_flows
query from crdb_internal.cluster_queries
active_queries and last_active_query from crdb_internal.cluster_sessions
create_statement, create_nofks, and alter_statements from crdb_internal.create_statements
create_statement and enum_members from crdb_internal.create_type_statements
list_value and range_value from crdb_internal.partitions
query from crdb_internal.node_queries
active_queries and last_active_query from crdb_internal.node_sessions
Describe alternatives you've considered
We've also considered the idea of using something like a session setting to indicated to the vtable handlers that the results should be redacted, where we could optionally show the anonymized statement strings only upon request. Is this necessary, though? Can we get by in a world where they're always anonymized? There are plenty of other areas where the anonymized statement strings seem to work just fine. I'm not against doing this conditionally, but simple is good.
Jira issue: CRDB-19996
Is your feature request related to a problem? Please describe.
Currently, we have a number of
crdb_internal.*tables that exposed user-defined SQL statements. These are useful for debugging purposes.While some of these statement strings have been anonymized to remove any user-supplied constants (e.g. something like
SELECT * FROM x WHERE name = $1instead ofSELECT * FROM x WHERE name = 'Pat'), some tables expose the user-supplied constants.As our compliance requirements grow, this exposure of user-supplied constants restricts our ability to use these
crdb_internal.*tables. For example, in a redacted version of adebug zipbundle, we're forced to omit these columns from thecrdb_internaltable dumps to meet our compliance requirements, which reduces observability into SQL queries.Describe the solution you'd like
We'd like to see these statement strings properly anonymized wherever possible in various
crdb_internal.*tables, so that we can expose the anonymized statement strings in redacted debug zip bundles.The following columns have been identified as exposing unredacted statement strings and related data:
stmtfromcrdb_internal.cluster_distsql_flowsqueryfromcrdb_internal.cluster_queriesactive_queriesandlast_active_queryfromcrdb_internal.cluster_sessionscreate_statement,create_nofks, andalter_statementsfromcrdb_internal.create_statementscreate_statementandenum_membersfromcrdb_internal.create_type_statementslist_valueandrange_valuefromcrdb_internal.partitionsqueryfromcrdb_internal.node_queriesactive_queriesandlast_active_queryfromcrdb_internal.node_sessionsDescribe alternatives you've considered
We've also considered the idea of using something like a session setting to indicated to the vtable handlers that the results should be redacted, where we could optionally show the anonymized statement strings only upon request. Is this necessary, though? Can we get by in a world where they're always anonymized? There are plenty of other areas where the anonymized statement strings seem to work just fine. I'm not against doing this conditionally, but simple is good.
Jira issue: CRDB-19996