sql,cli: add redacted sql stmts to debug zip#92263
sql,cli: add redacted sql stmts to debug zip#92263craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
ccd216c to
7a3d0d5
Compare
maryliag
left a comment
There was a problem hiding this comment.
have you checked performance after using the new builtin? (vs adding a new column redacted and the debuzip deciding which one to use)
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @xinhaoz)
pkg/sql/sem/builtins/builtins.go line 7357 at r1 (raw file):
}, ), "crdb_internal.redact_sql_constants": makeBuiltin(tree.FunctionProperties{
can you add tests to the new function?
7a3d0d5 to
a6cd9ea
Compare
xinhaoz
left a comment
There was a problem hiding this comment.
I didn't benchmark this but I think this approach is preferred to extra columns because if we were to create a new redacted column for each of these tables, we'd be redacting every time we query when we don't need to and also showing a redundant extra column quite often in the select all. Would parsing and redacting the stmts during debug zip creation add that much overhead?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @maryliag)
pkg/sql/sem/builtins/builtins.go line 7357 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
can you add tests to the new function?
Done.
maryliag
left a comment
There was a problem hiding this comment.
Would parsing and redacting the stmts during debug zip creation add that much overhead
that's my question. I don't know, so I want to be safe that when you're getting a debug zip that already adds some overhead, you wouldn't be adding even more
we'd be redacting every time we query when we don't need to
from the issue, I had the impression we were already redacting and was a matter of using that value on the columns, instead of the non-redacted value, so my assumption is that it wouldn't be adding extra overhead
I'm okay with either approach, but want to make sure there is no performance degradation on either approach
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @abarganier)
a6cd9ea to
6a421b6
Compare
maryliag
left a comment
There was a problem hiding this comment.
Reviewed 2 of 2 files at r1, 1 of 1 files at r2, 1 of 2 files at r3, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @xinhaoz)
pkg/sql/logictest/testdata/logic_test/builtin_function line 3686 at r4 (raw file):
subtest crdb_internal.redact_sql_constants query T
can you add a few more complex queries, with IN (...) or something with strings (to confirm is keeping the ' for example)
6a421b6 to
40a319b
Compare
xinhaoz
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @maryliag)
pkg/sql/logictest/testdata/logic_test/builtin_function line 3686 at r4 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
can you add a few more complex queries, with
IN (...)or something with strings (to confirm is keeping the'for example)
Done.
maryliag
left a comment
There was a problem hiding this comment.
Reviewed 4 of 4 files at r5.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @abarganier)
abarganier
left a comment
There was a problem hiding this comment.
🔥
Apologies for the review delay - this looks great! TSE is going to really appreciate this - thank you @xinhaoz!
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @maryliag and @xinhaoz)
pkg/cli/zip_table_registry.go line 670 at r6 (raw file):
"crdb_internal.node_sessions": { // `active_queries` and `last_active_query` columns contain unredacted // SQL statement strings.
nit: here and above, let's remove these comments for columns that are no longer omitted.
Code quote:
// `active_queries` and `last_active_query` columns contain unredacted
// SQL statement strings.
// `client_address` contains unredacted client IP addresses.40a319b to
220a245
Compare
220a245 to
3f825ed
Compare
xinhaoz
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @maryliag)
pkg/cli/zip_table_registry.go line 670 at r6 (raw file):
Previously, abarganier (Alex Barganier) wrote…
nit: here and above, let's remove these comments for columns that are no longer omitted.
Done.
3f825ed to
152e5b1
Compare
dhartunian
left a comment
There was a problem hiding this comment.
@xinhaoz @maryliag
I have a quick question about the use of the word "redact" in this API. The redaction that's applied here is not the same as our log redaction which uses the special bracket markers etc. Just wondering if that might be confusing. Is there another word we can use? It kinda turns the statement into a fingerprint, right? Is there a reason to use different terminology here? Maybe "anonymize" is better. Not sure. I don't feel super strongly here but just want to get thoughts from folks.
Reviewed 1 of 4 files at r5, 1 of 3 files at r7, 2 of 2 files at r9.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @xinhaoz)
|
@dhartunian I see, if |
maryliag
left a comment
There was a problem hiding this comment.
I like the anonymize option
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dhartunian)
e-mbrown
left a comment
There was a problem hiding this comment.
Reviewed all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @dhartunian)
152e5b1 to
a4e8b2c
Compare
|
@maryliag Sounds good, rename has been pushed |
maryliag
left a comment
There was a problem hiding this comment.
Reviewed 3 of 4 files at r11, 1 of 1 files at r12, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @xinhaoz)
dhartunian
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @xinhaoz)
|
TFTR, all! |
|
Build failed (retrying...): |
|
Build failed (retrying...): |
|
bors r- |
|
Canceled. |
a4e8b2c to
ed7c286
Compare
This commit adds the builtin, `crdb_internal.anonymize_sql_constants` which takes in a sql string and returns it with constants hidden. This will be used to safely expose columns that are sql stmts in the redacted debug zip. Release note: None
Closes cockroachdb#88823 This commit adds the following fields to the redacted debug zip: crdb_internal.create_statements: - create_statement - create_nofks - alter_statements (each elem is redacted) crdb_internal.create_function_statements: - create_statement crdb_internal.{node,cluster}_distsql_flows: - stmt crdb_internal.{cluster,node}_sessions: - last_active - active_queries crdb_internal.{cluster,node}_queries: - query Release note (cli change): The following fields have been redacted and added to the redacted debug zip: crdb_internal.create_statements: - create_statement - create_nofks - alter_statements (each elem is redacted) crdb_internal.create_function_statements: - create_statement crdb_internal.{node,cluster}_distsql_flows: - stmt crdb_internal.{cluster,node}_sessions: - last_active - active_queries crdb_internal.{cluster,node}_queries: - query
ed7c286 to
1224138
Compare
|
bors r+ |
|
Build succeeded: |
|
@abarganier do you want this backported to 22.2? |
Commit 1
This commit adds the builtin,
crdb_internal.anonymize_sql_constantswhich takes in a sql string and returns it with constants redacted.
This will be used to redact columns that are sql stmts in the
redacted debug zip.
Release note: None
Commit 2
Closes #88823
This commit adds the following fields to the redacted
debug zip:
crdb_internal.create_statements:
crdb_internal.create_function_statements:
crdb_internal.{node,cluster}_distsql_flows:
crdb_internal.{cluster,node}_sessions:
crdb_internal.{cluster,node}_queries:
Release note (cli change):
The following fields have been redacted and added to
the redacted debug zip:
crdb_internal.create_statements:
crdb_internal.create_function_statements:
crdb_internal.{node,cluster}_distsql_flows:
crdb_internal.{cluster,node}_sessions:
crdb_internal.{cluster,node}_queries:
Running ycsb, tpcc and movr default workloads for 15 minutes and requesting the debug zip on a fresh node in master vs with new changes:

master
branch
