Skip to content

sql: anonymize query strings exposed in various crdb_internal tables. #88823

@abarganier

Description

@abarganier

Is your feature request related to a problem? Please describe.
Currently, we have a number of crdb_internal.* tables that exposed user-defined SQL statements. These are useful for debugging purposes.

While some of these statement strings have been anonymized to remove any user-supplied constants (e.g. something like SELECT * FROM x WHERE name = $1 instead of SELECT * FROM x WHERE name = 'Pat'), some tables expose the user-supplied constants.

As our compliance requirements grow, this exposure of user-supplied constants restricts our ability to use these crdb_internal.* tables. For example, in a redacted version of a debug zip bundle, we're forced to omit these columns from the crdb_internal table dumps to meet our compliance requirements, which reduces observability into SQL queries.

Describe the solution you'd like
We'd like to see these statement strings properly anonymized wherever possible in various crdb_internal.* tables, so that we can expose the anonymized statement strings in redacted debug zip bundles.

The following columns have been identified as exposing unredacted statement strings and related data:

  1. stmt from crdb_internal.cluster_distsql_flows
  2. query from crdb_internal.cluster_queries
  3. active_queries and last_active_query from crdb_internal.cluster_sessions
  4. create_statement, create_nofks, and alter_statements from crdb_internal.create_statements
  5. create_statement and enum_members from crdb_internal.create_type_statements
  6. list_value and range_value from crdb_internal.partitions
  7. query from crdb_internal.node_queries
  8. active_queries and last_active_query from crdb_internal.node_sessions

Describe alternatives you've considered
We've also considered the idea of using something like a session setting to indicated to the vtable handlers that the results should be redacted, where we could optionally show the anonymized statement strings only upon request. Is this necessary, though? Can we get by in a world where they're always anonymized? There are plenty of other areas where the anonymized statement strings seem to work just fine. I'm not against doing this conditionally, but simple is good.

Jira issue: CRDB-19996

Metadata

Metadata

Assignees

Labels

C-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions