telemetry: expose operational sensitive data in telemetry logs without redaction #76595
Description
Is your feature request related to a problem? Please describe.
Telemetry logging is low-fidelity and we'd like to have more information about schema structure available in cases where sql telemetry is enabled.
Describe the solution you'd like
The functionality would be gated behind the sql.telemetry.query_sampling.enabled cluster setting that's used to gate telemetry output and would reduce the redaction on some pieces of data:
SQL statements are output with table and column names intact:
For example, instead of collecting this select _, _, _ from _ where _ = _ (which we do today), we’d want to collect something like this select organization_id, account_id, ssn from tableName where organization_id = <redacted value>
Log tags containing IP addresses output those addresses without redaction:
Before:
"tags": {
"client": "‹108.214.21.187:50084›",
"hostssl": "",
"peer": "‹10.0.30.33:54092›",
"sql": "",
"user": "‹jordan›"
},
After:
"tags": {
"client": "108.214.21.187:50084",
"hostssl": "",
"peer": "10.0.30.33:54092",
"sql": "",
"user": "‹jordan›"
},
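The two output changes requested here can be illustrated with a small Python sketch. It only shows the intended difference in output; it says nothing about how the change would be made in the logging code itself. The function names `redact_literals` and `unredact_ip_tags` are hypothetical, and the regex-based literal matching is a toy assumption (a real implementation would work from the parsed statement).

```python
import ipaddress
import re

# Redaction markers as they appear around sensitive values in the log tags.
START, END = "\u2039", "\u203a"

# Toy literal matcher (an assumption for illustration): single-quoted strings
# with '' escapes, or bare integer/decimal numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def redact_literals(sql: str) -> str:
    """Replace constant values with "_" but keep table and column names."""
    return _LITERAL.sub("_", sql)


def _is_ip_endpoint(value: str) -> bool:
    """Return True if value looks like "<ip address>:<port>"."""
    host, sep, port = value.rpartition(":")
    if not sep or not port.isdigit():
        return False
    try:
        ipaddress.ip_address(host.strip("[]"))
    except ValueError:
        return False
    return True


def unredact_ip_tags(tags: dict) -> dict:
    """Strip redaction markers from tag values that are IP endpoints.

    Other marked values (for example user names) stay redacted.
    """
    out = {}
    for key, value in tags.items():
        marked = value.startswith(START) and value.endswith(END)
        if marked and _is_ip_endpoint(value[1:-1]):
            out[key] = value[1:-1]
        else:
            out[key] = value
    return out


if __name__ == "__main__":
    print(redact_literals(
        "select organization_id, account_id, ssn from tableName "
        "where organization_id = 42"
    ))
    print(unredact_ip_tags({
        "client": START + "108.214.21.187:50084" + END,
        "peer": START + "10.0.30.33:54092" + END,
        "user": START + "jordan" + END,
    }))
```

In this sketch the `user` tag keeps its markers because its value is not an IP endpoint, which matches the Before/After example above.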
Describe alternatives you've considered
The option of introducing a separate set of redaction markers for operational data was considered but rejected due to the high added complexity. If we later find that introducing this distinction directly in the code is worthwhile, we could pursue it, but right now the benefits of doing so are not clear.
We also considered changing the redaction behavior only for telemetry output. This was rejected because changing redaction behavior only in certain cases would cause confusion: the code would likely be more complex, and the behavior would be inconsistent and harder to understand.
Additional context
See #66359 for prior work here.
Epic CC-6083