Skip to content

sql: add partial redactability to logs#66359

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
THardy98:partial_redactability_logs
Jul 13, 2021
Merged

sql: add partial redactability to logs#66359
craig[bot] merged 1 commit intocockroachdb:masterfrom
THardy98:partial_redactability_logs

Conversation

@THardy98
Copy link
Copy Markdown

@THardy98 THardy98 commented Jun 11, 2021

Prior to this change, enabling log redaction redacted entire SQL
statements. Consequently, troubleshooting efforts were hindered by the
limited SQL statement information available in the redacted logs. This
change introduces partial redaction to log SQL statements, wherein only
sensitive information is redacted. As a result, more information is
available to those using redacted logs for troubleshooting.

Changes were introduced to the statement formatter. The formatter now
checks where the redaction flag is set. If so, redaction markers are
placed are specific node types in the statement's AST (namely, the
Datum, Constant, Name, and Unrestricted Name types).

Virtual schemas/tables are not redacted. The public schema is also not
redacted.

Note: redaction markers are ‹ and ›. Terms between the redaction markers are redacted.

Redaction Marker Examples:

1) SELECT city, revenue FROM rides LIMIT 10;

Before: ‹SELECT city, revenue FROM rides LIMIT 10›
After: SELECT ‹city›, ‹revenue› FROM ‹\"\"›.‹\"\"›.‹rides› LIMIT ‹10›
2) SELECT key FROM crdb_internal.node_statement_statistics LIMIT 10;

Before: ‹SELECT key FROM crdb_internal.node_statement_statistics LIMIT 10›
After: SELECT ‹key› FROM crdb_internal.node_statement_statistics LIMIT ‹10›

Redaction Examples:

1) SELECT city, revenue FROM rides LIMIT 10;

Before: ‹×›
After: SELECT ‹×›, ‹×› FROM ‹×›.‹×›.‹×› LIMIT ‹×›
2) SELECT key FROM crdb_internal.node_statement_statistics LIMIT 10;

Before: ‹×›
After: SELECT ‹×› FROM crdb_internal.node_statement_statistics LIMIT ‹×›

Resolves: #65401

Release note (bug fix): Added partial redactability to log SQL
statements. This change provides greater visibility to SQL usage in the
logs, enabling greater ability to troubleshoot.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@maryliag maryliag marked this pull request as draft June 11, 2021 14:25
@knz
Copy link
Copy Markdown
Contributor

knz commented Jun 11, 2021

can you perhaps rebase your branch on top of master to pick up the latest changes to sql/event_log.go thanks

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing we can do before we discuss:
can you outline in the PR description (in prose) a high-level summary of what you did, and why? That will help clarify your understanding and ensure the convo doesn't make mistaken assumptions.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@THardy98 THardy98 force-pushed the partial_redactability_logs branch from a854963 to 3868842 Compare June 11, 2021 14:44
@THardy98 THardy98 changed the title sql: Added partial redactionability to logs (WIP) [draft] sql: Added partial redactionability to logs Jun 11, 2021
@THardy98 THardy98 force-pushed the partial_redactability_logs branch from 3868842 to 3b16e16 Compare June 11, 2021 14:50
Copy link
Copy Markdown
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @THardy98)


pkg/sql/event_log.go, line 181 at r1 (raw file):

		user:         p.User(),
		//stmt:         tree.AsStringWithFQNames(p.stmt.AST, p.extendedEvalCtx.EvalContext.Annotations),
		stmt: string(tree.AsRedactableStringWithFQNames(p.stmt.AST, p.extendedEvalCtx.EvalContext.Annotations)),

since the name of the function already indicates that it will return a string, you don't need the extra string() here


pkg/sql/sem/tree/format.go, line 439 at r1 (raw file):

	ctx := NewFmtCtxEx(FmtAlwaysQualifyTableNames|FmtRedactNode, ann)
	ctx.FormatNode(n)
	return ctx.CloseAndGetRedactableString()

you can add the content of the CloseAndGetRedactableString() directly here, no need to create a new function just for that since you're not using this part of code anywhere

Copy link
Copy Markdown
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @THardy98)


pkg/util/log/redact.go, line 40 at r1 (raw file):

	// part(s) of the SQL statement, in contrast with withRedaction, which redacts
	// the entirety of the SQL statement
	withPartialRedaction = 8

We should keep using the flag withRedaction, instead of creating a new one and then fix the behaviours on that one, since I don't think hiding the whole statement is useful anyway

@knz
Copy link
Copy Markdown
Contributor

knz commented Jun 11, 2021

Hint: this work should not require any changes to the util/log package.

@THardy98
Copy link
Copy Markdown
Author

Hint: this work should not require any changes to the util/log package.

Ah, so changes should have been made for WithoutSensitiveData

@knz
Copy link
Copy Markdown
Contributor

knz commented Jun 11, 2021

Ah, so changes should have been made for WithoutSensitiveData

what kind of changes?

@THardy98
Copy link
Copy Markdown
Author

THardy98 commented Jun 11, 2021

Ah, so changes should have been made for WithoutSensitiveData

what kind of changes?

None in util/log as you mentioned, but the redactablePackages it receives will have changed from changes in event_log and format

@knz
Copy link
Copy Markdown
Contributor

knz commented Jun 11, 2021

Please use more words with examples :)
(does not need to be exact/precise)

@THardy98
Copy link
Copy Markdown
Author

THardy98 commented Jun 11, 2021

Please use more words with examples :)
(does not need to be exact/precise)

When maybeRedactEntry is called during log entry output, and the redaction editor is WithoutSensitiveData, the changes in event_log.go and format.go with cause the provided redactablePackage to contain a message in a different format than how it is formatted currently.

Currently, it would receive something like:
‹SELECT city, revenue FROM rides LIMIT 10›

But with changes to event_log and format, it would receive a statement in a different format. The changes in this PR cause the statement to be formatted as:
SELECT ‹city›, ‹revenue› FROM ‹""›.‹""›.‹rides› LIMIT ‹10›

@knz
Copy link
Copy Markdown
Contributor

knz commented Jun 11, 2021

yes that sounds reasonable.

@THardy98
Copy link
Copy Markdown
Author

@knz do you have a suggestions to enforcing no redaction on virtual schemas/tables? I noted in the code how I can't use VirtualTabler in format.go making it difficult to check whether schemas/tables are virtual while parsing the AST.

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @THardy98)


pkg/sql/sem/tree/format.go, line 496 at r1 (raw file):

func (ctx *FmtCtx) formatNodeMaybeRedact(n NodeFormatter) {
	if ctx.flags.HasFlags(FmtRedactNode) {
		switch v := n.(type) {

This is not the right location for this.
Instead of this method,
modify hideNonVirtualTableNameFunc in sql/exec_util.go

and conditionally replace the formatting flags using (*FmtCtx).WithFlags() inside the reformat function implemented there.

@THardy98 THardy98 force-pushed the partial_redactability_logs branch 2 times, most recently from 0ed6a0f to 7967603 Compare June 14, 2021 15:37
@THardy98 THardy98 force-pushed the partial_redactability_logs branch from 7967603 to 271b23b Compare June 14, 2021 19:27
@THardy98 THardy98 changed the title [draft] sql: Added partial redactionability to logs [draft] sql: Added partial redactability to logs Jun 14, 2021
@THardy98 THardy98 changed the title [draft] sql: Added partial redactability to logs [draft] sql: add partial redactability to logs Jun 15, 2021
craig bot pushed a commit that referenced this pull request Jun 16, 2021
66431: eventpb,sql: lay some foundation for redactable statement strings r=maryliag,dhartunian a=knz

Informs #65401.
Needed as prerequisite for  #66359. 

Prior to this change, the centralized structured logging logic and the
query logging mechanism were both assuming that the entirety of the
statement string was unsafe for reporting.

We want in a later change to introduce some nuance into this. However,
the structured event logging encoding did not know anything else than
strings that were either completely-safe or completely-unsafe.

This patch introduces the possibility for a structured event to
contain a string of go type `redact.RedactableString`, where the
caller has already taken responsibility for placing redaction markers.

Release note: None

66518: builtins: implement ST_HasArc r=rafiss a=otan

Release note (sql change): Implement the ST_HasArc function. This adds
better out-of-the-box support for Geoserver.

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
@THardy98 THardy98 force-pushed the partial_redactability_logs branch 3 times, most recently from b95e1a0 to 49ae063 Compare June 16, 2021 16:49
@THardy98 THardy98 marked this pull request as ready for review June 16, 2021 19:20
@THardy98 THardy98 changed the title [draft] sql: add partial redactability to logs sql: add partial redactability to logs Jun 16, 2021
Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM, but you also still need to iron out your grep for CI.

Reviewed 4 of 11 files at r6, 1 of 3 files at r8, 5 of 7 files at r11, 2 of 4 files at r12, 1 of 1 files at r13, 1 of 1 files at r14.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @maryliag and @THardy98)


pkg/cli/interactive_tests/test_exec_log.tcl, line 103 at r14 (raw file):

# succeeds.
system "for i in `seq 1 3`; do
  grep 'SELECT .+660.+' $logfile && exit 0;
  1. the + sign is part of extended regular expressions. Basic grep doesn't support it. You probably want to use ..* instead or try your luck with grep -E
  2. also I don't think that you just want to match "anything" after the number. Obviously, in a log file the data will be enclosed in double quotes and other punctuation. I think you want this match to say "anything BUT some other word or punctuation that I'm otherwise also expecting after the number". In this case, the thing that's expected afterwards is a space followed by a plus sign.

same comments apply below.


pkg/ccl/importccl/import_csv_mark_redaction_test.go, line 22 at r14 (raw file):

// TestMarkRedactionCCLStatement verifies that CCL statements are marked
// for redaction correctly.

they are not "marked for redaction" - that phrasing would suggest that they will necessarily be redacted at some point
I think you mean "... verifies that the redactable parts of CCL statements are marked correctly."


pkg/sql/statement_mark_redaction_test.go, line 27 at r14 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

nit: period

I also think you need to rephrase the comment, see my previous comment on this above.

@THardy98 THardy98 force-pushed the partial_redactability_logs branch 4 times, most recently from 7fbcc7c to da22982 Compare July 8, 2021 18:57
Copy link
Copy Markdown
Author

@THardy98 THardy98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @Azhng, @knz, @maryliag, and @THardy98)


pkg/cli/interactive_tests/test_exec_log.tcl, line 103 at r14 (raw file):

Previously, knz (kena) wrote…
  1. the + sign is part of extended regular expressions. Basic grep doesn't support it. You probably want to use ..* instead or try your luck with grep -E
  2. also I don't think that you just want to match "anything" after the number. Obviously, in a log file the data will be enclosed in double quotes and other punctuation. I think you want this match to say "anything BUT some other word or punctuation that I'm otherwise also expecting after the number". In this case, the thing that's expected afterwards is a space followed by a plus sign.

same comments apply below.

Modified per your suggestion, but a last look would be nice.


pkg/sql/exec_util.go, line 1631 at r14 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

nit: remove blank line between the comment and the code it's referring to

Done.


pkg/ccl/importccl/import_csv_mark_redaction_test.go, line 22 at r12 (raw file):

Previously, Azhng (Archer Zhang) wrote…

nit: period

Done.


pkg/ccl/importccl/import_csv_mark_redaction_test.go, line 22 at r14 (raw file):

Previously, knz (kena) wrote…

they are not "marked for redaction" - that phrasing would suggest that they will necessarily be redacted at some point
I think you mean "... verifies that the redactable parts of CCL statements are marked correctly."

Done.


pkg/sql/statement_mark_redaction_test.go, line 27 at r14 (raw file):

Previously, knz (kena) wrote…

I also think you need to rephrase the comment, see my previous comment on this above.

Done.

@THardy98
Copy link
Copy Markdown
Author

THardy98 commented Jul 8, 2021

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 8, 2021

Build failed (retrying...):

@otan
Copy link
Copy Markdown
Contributor

otan commented Jul 9, 2021

bors r-

🚨 bors is busted, doing a reset 🚨

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 9, 2021

Canceled.

@otan
Copy link
Copy Markdown
Contributor

otan commented Jul 9, 2021

bors r=knz

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 9, 2021

This PR was included in a batch that was canceled, it will be automatically retried

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 9, 2021

Build failed (retrying...):

@otan
Copy link
Copy Markdown
Contributor

otan commented Jul 9, 2021

bors r-

^ that last failure seems related to this PR. may need a rebase.

[05:06:58]Step 4/4: Assert Workspace Clean (Command Line) (25s)
[05:06:58][Step 4/4] Starting: /home/agent/temp/agentTmp/custom_script15181142398405742973
[05:06:58][Step 4/4] in directory: /home/agent/work/.go/src/github.com/cockroachdb/cockroach
[05:06:58][Step 4/4] On branch staging
[05:06:58][Step 4/4] Your branch is up to date with 'origin/staging'.
[05:06:58][Step 4/4] 
[05:06:58][Step 4/4] You are in a sparse checkout with 100% of tracked files present.
[05:06:58][Step 4/4] 
[05:06:58][Step 4/4] Changes not staged for commit:
[05:06:58][Step 4/4]   (use "git add <file>..." to update what will be committed)
[05:06:58][Step 4/4]   (use "git restore <file>..." to discard changes in working directory)
[05:06:58][Step 4/4] 	modified:   pkg/util/log/eventpb/json_encode_generated.go
[05:06:58][Step 4/4] 
[05:06:58][Step 4/4] no changes added to commit (use "git add" and/or "git commit -a")
[05:06:58][Step 4/4] diff --git a/pkg/util/log/eventpb/json_encode_generated.go b/pkg/util/log/eventpb/json_encode_generated.go
[05:06:58][Step 4/4] index 32466e6c17..78af421479 100644
[05:06:58][Step 4/4] --- a/pkg/util/log/eventpb/json_encode_generated.go
[05:06:58][Step 4/4] +++ b/pkg/util/log/eventpb/json_encode_generated.go
[05:06:58][Step 4/4] @@ -15,9 +15,7 @@ var safeRe1 = regexp.MustCompile(`^root|node$`)
[05:06:58][Step 4/4]  var safeRe2 = regexp.MustCompile(`^\$.*$`)
[05:06:58][Step 4/4]  
[05:06:58][Step 4/4]  // AppendJSONFields implements the EventPayload interface.
[05:06:58][Step 4/4] -func (m *AdminQuery) AppendJSONFields(
[05:06:58][Step 4/4] -	printComma bool, b redact.RedactableBytes,
[05:06:58][Step 4/4] -) (bool, redact.RedactableBytes) {
[05:06:58][Step 4/4] +func (m *AdminQuery) AppendJSONFields(printComma bool, b redact.RedactableBytes) (bool, redact.RedactableBytes) {
[05:06:58][Step 4/4]  
[05:06:58][Step 4/4]  	printComma, b = m.CommonEventDetails.AppendJSONFields(printComma, b)

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 9, 2021

Canceled.

@THardy98 THardy98 force-pushed the partial_redactability_logs branch from da22982 to b624e29 Compare July 9, 2021 15:29
@THardy98
Copy link
Copy Markdown
Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 12, 2021

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 12, 2021

Build failed (retrying...):

@THardy98
Copy link
Copy Markdown
Author

bors r-

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 12, 2021

Canceled.

Prior to this change, enabling log redaction redacted entire SQL
statements. Consequently, troubleshooting efforts were hindered   by the
limited SQL statement information available in the redacted logs. This
change introduces partial redaction to log SQL statements, wherein only
sensitive information is redacted. As a result, more information is
available to those using redacted logs for troubleshooting.

Changes were introduced to the statement formatter. The formatter now
checks where the redaction flag is set. If so, redaction markers are
placed are specific node types in the statement's AST (namely, the
Datum, Constant, Name, and Unrestricted Name types).

Virtual schemas/tables are not redacted. The `public` schema is also not
redacted.

Resolves: cockroachdb#65401

Release note (bug fix): Added partial redactability to log SQL
statements. This change provides greater visibility to SQL usage in the
logs, enabling greater ability to troubleshoot.
@THardy98
Copy link
Copy Markdown
Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 13, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: print out statements in logs with redaction markers only around sensitive bits (make SQL keywords etc non-redactable)

6 participants