rfcs: new RFC on log file redaction (stripping PII and confidential data) by knz · Pull Request #48076 · cockroachdb/cockroach

knz · 2020-04-27T21:37:06Z

Release note: None

cockroach-teamcity · 2020-04-27T21:37:12Z

This change is

tbg

Thanks Raphael, I'm excited to see this. I agree with you that a move to structured logging is not giving us the best return for our efforts and that it spawns lots of new questions.
Using marker tags as you proposed generally checks out in my opinion. My exploration about them (below) is light-hearted in nature - putting non-printable characters into logs is always going to be the source of some annoyance, and I just wanted to expose myself to it a little bit to see what it's like (and fine it is - just have to get less -r into muscle memory, which is a small ask). SI/SO seems like a perfectly reasonable choice.

One danger that I see with the "PII-free debug zip" here is that we could be in a place where customers that would previously (perhaps with a grumble or two) have sent a full debug zip now heavily gravitate towards the PII-free one, which initially perhaps wouldn't contain the relevant information (such as the range statuses).
I would ask you to mention what you consider the MVP for PII-free debug zip, and how it would fit in with that concern. For instance, if the initial PII-free zip is just the PII-free logs, that may not be great if it turns out that customers will only send us that in 100% of cases and will refuse anything else. This concern might be overblown, but there could be strategies to mitigate, such as not advertising the PII-free option until it becomes clear that a customer will not settle for anything less. Another strategy is simply to make the debug zip useful enough without PII and I would be interested in your thoughts on how much work that is.

Some questions/comments:

what will happen if someone stuffs the special markers into a log message? I assume they will be escaped accordingly, but I didn't see this mentioned
I didn't understand why the UTF-8 suggestion is worse. Could you explain what the copy-paste problems are and why they don't apply to the SI/SO suggestion?
Are there concerns about mutated log files? You could imagine someone going into a log file and accidentally (or intentionally) making an edit that effects an SI/SO, which could potentially invert the redaction. The resulting file should trigger some alarm bells when it gets included in a PII-free zip.

As far as I can tell, SI/SO are simply omitted when copy-pasting. For example, copying the output of this (from my iterm2)

$ echo -e "safe\x0Eunsafe\x0Fsafe"
safeunsafesafe

into the go playground

package main

import (
	"fmt"
)

func main() {
	fmt.Printf("%q", "safeunsafesafe")
}

prints "safeunsafesafe".

Of course, when never visually copy-pasted, it works:

printf "%q" $(echo -e "safe\x0Eunsafe\x0Fsafe" | pbcopy; pbpaste)
$'safe\016unsafe\017safe'

When I paste the clipboard into a file in vim and safe, the marker is gone: printf "%q" $(cat foo), safeunsafesafe
However, when I paste directly into a file and open it in vim, I do get the markers: safe�unsafe�safe (they print as ^N and ^O).

I wonder whether you've considered using color escape sequences instead of SI/SO. For example, dim and reset dim:

It's more bytes, but at the same time it possibly adds utility because less -r will highlight the PII-free information (my less -r simply ignores SI/SO).

Reviewed 1 of 1 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @bdarnell, @knz, and @petermattis)

docs/RFCS/20200427_pii_free_logs.md, line 85 at r1 (raw file):

- we extend the "context tags" mechanism to also support `log.Safe`,
  so that non-safe bits of data in context tags also gets marked by SI/SO.

👍 in particular, we should be redacting keys in an intelligent way so that we keep the table and index ID and only strip the user-defined parts. This needs work in the pretty printer (not too much I assume).

docs/RFCS/20200427_pii_free_logs.md, line 262 at r1 (raw file):

- It's unclear whether we can effectively identify a set of
  "important" logging events to convert to the new API within the
  given time frame (summer 20202 deployments).

typo

docs/RFCS/20200427_pii_free_logs.md, line 264 at r1 (raw file):

  given time frame (summer 20202 deployments).

- There will be a long period of time with two APIs side by side,

We have >6000 calls to log.X (where X != V) and out of those ~5500 in prod code, so yes, this will need a decent amount of work.
A fork lift is possible by introducing log.TODO() which is unsafe but can be used in log.SafeLog. That way, we get to remove the deprecated API pretty much at once.

docs/RFCS/20200427_pii_free_logs.md, line 270 at r1 (raw file):

- While the `log.SafeLog` API gets introduced (over the course of one
  or two release cycles), there will be more calls to the old API
  introduced in the code base. We will need to design a "catch up" process.

Something that could aid in "strangling" is CI tooling that does not allow changing/adding lines following the deprecated format.

docs/RFCS/20200427_pii_free_logs.md, line 274 at r1 (raw file):

- In day-to-day programming, the ability to log/trace potentially
  PII-laden information *is important* - it helps troubleshoot
  problems during development. It is possible that we will not be able

I consider this the most relevant con here. By forcing PII out of log method calls, we're putting ourselves in a tight spot when it comes to ergonomics.

Here's a counter-proposal: we keep the current log methods but instead of the signature

(ctx context.Context, format string, args ...interface{})

we require

(ctx context.Context, formatNoPII string /* enforced via linting: only const strings */, args ...SafeMessager)

and use

func PII(v interface{}) log.SafeMessager {
  return safeMessager{pii: true, msg: fmt.Sprint(v)}
}

in

log.Infof(ctx, "something interesting happened with %v and %s", log.PII("i'm secret"), log.Safe("i'm ok to share"))

As mentioned earlier, we can fork lift this via

func TodoPII(v interface{}) SafeMessager {
  return PII(v)
}

to avoid parallel APIs.

As a result, I think we've achieved the goal here: we've put PII-free logs on the path to first class citizenship due to the new signature. Down the road, we could easily decide to write completely PII-free logs (just change the logging code), while at the same time keeping PII during dev work (just set a flag to not strip). And we can also do the thing you're favoring here, using SI/SO.

All that said, if we are committing to SI/SO (or any other delimiter) then we don't strictly need to do this now. We would do it once we want to boost the usefulness of the redacted logs. Which we may find to be the case sooner rather than later.

docs/RFCS/20200427_pii_free_logs.md, line 385 at r1 (raw file):

## Unresolved questions

- Should the "confidentiality mode" in `cockroach debug zip` be opt in or opt out?

If we can get a PII-laden one, that's what we'll prefer, at least initially - PII free debug zip needs significant work (imo) to be almost as useful.

petermattis

Some of my comments below overlap with @tbg's. Apologies for the duplication.

I generally like the idea of tagging sensitive information in the log files so it can be redacted later. I have concerns about making these markers invisible, though. I suspect that will cause more long-term confusion than visible markers would. Using a marker like $...$ or <...> would be nicely visual while still allowing redaction. In addition to reducing surprises, visible markers would make it clear when redaction hasn't occurred which could indicate the presence of a bug, or that someone should take care with copying and pasting data. You seem to have taken the assumption that invisible markers are strictly better, but I'm not seeing a discussion of why.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @bdarnell, @knz, and @tbg)

docs/RFCS/20200427_pii_free_logs.md, line 46 at r1 (raw file):

- however, the parts of the error messages that contain PII or
  potentially sensitive information would be annotated with special
  characters.

Is there any precedent for using special characters like this? I would feel a bit reassured if we're not inventing something new here.

I would guess there is precedent for using visible characters to mark sensitive information. I wonder if it would be less surprising from a UX perspective to use visible characters for this purpose.

docs/RFCS/20200427_pii_free_logs.md, line 82 at r1 (raw file):

  (Note also that this enclosing must be accompanied by careful
  escaping of any SI/SO bytes present in the datum being logged,
  so that we don't result in mismatched SI/SO pairs.)

We also need to disallow SI/SO in the format string, or automatically escape them when present.

docs/RFCS/20200427_pii_free_logs.md, line 111 at r1 (raw file):

- enhance the `cockroachdb/errors` library to also perform this marking
  when an error object is logged, so that we get insight into errors
  even in "confidientality" mode.

Nit: `s/confidientality/confidentiality/g

docs/RFCS/20200427_pii_free_logs.md, line 175 at r1 (raw file):

  formatting again every time the messages are to be displayed.

- certain data types display differently depening on the `%` verb

Nit: s/depening/depending/g

docs/RFCS/20200427_pii_free_logs.md, line 224 at r1 (raw file):

   contract that *everything* logged through this API can be considered
   PII-safe and suitable for reporting. The contract would
   be enforced largely by linting and code reviews.

If we take this strangler approach, I'd suggest that we do an initial step 0 of renaming the old log API to log.Deprecated*. That makes it obvious at the call sites that a deprecated API is being used and that the caller should switch to the new API.

docs/RFCS/20200427_pii_free_logs.md, line 232 at r1 (raw file):

2. teach engineers how to design their logging code so that it's
   always PII-safe (i.e. use `log.Redact()` calls upfront) and start
   using `log.SafeLog()` in new code.

I'm not sure what you have in mind here, but I think we need logging to default to assuming data contains PII so that we "fail safe" if an engineer forgets about this.

docs/RFCS/20200427_pii_free_logs.md, line 264 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

We have >6000 calls to log.X (where X != V) and out of those ~5500 in prod code, so yes, this will need a decent amount of work.
A fork lift is possible by introducing log.TODO() which is unsafe but can be used in log.SafeLog. That way, we get to remove the deprecated API pretty much at once.

Fixing 6000+ call sites is onerous, but doable if we divide up the work. That's "only" 200 call sites per engineer for 30 engineers. Smells like a good fixit task to me.

docs/RFCS/20200427_pii_free_logs.md, line 320 at r1 (raw file):

Cons:

- the redaction post-processing becomes more complicated.

Complicated and error prone. I'm thumbs down on this approach.

docs/RFCS/20200427_pii_free_logs.md, line 377 at r1 (raw file):

  - pros: better behaved in text editors than control characters like SI/SO.

  - cons: does not appear invisible when displayed in a terminal or `less -r`.

This might be a pro. I have a suspicion that putting hidden characters in our log files will be surprising. I'll forget about them and then my grep's will silently fail if the pattern crosses the hidden character.

aaron-crl · 2020-04-29T14:30:44Z

I am strongly against any form of special characters or unprintable markers in the log files. These have a habit of choking up log post-processing and aggregation systems in unpredictable ways and as was mentioned above would likely be very surprising to an operator.

The strangler approach has a lot of benefits and comes with the caveat that it may help address some security tech debt as we go through and update those call sites.

On an aside, in a past life I saw systems demarcate each structured log entry with its security properties. This allowed for post processing while also being syslog and log aggregate friendly. We might consider adding such log decorators to the strangler approach as they would support post process routing and blunt information stripping.

andreimatei · 2020-04-29T15:38:07Z

I think it'd be useful for the text to make note of the SafeMessager interface, even if just to note that it doesn't help.

…

On Wed, Apr 29, 2020 at 10:31 AM aaron-crl ***@***.***> wrote: I am strongly against any form of special characters or unprintable markers in the log files. These have a habit of choking up log post-processing and aggregation systems in unpredictable ways and as was mentioned above would likely be very surprising to an operator. The strangler approach has a lot of benefits and comes with the caveat that it may help address some security tech debt as we go through and update those call sites. On an aside, in a past life I saw systems demarcate each structured log entry with its security properties. This allowed for post processing while also being syslog and log aggregate friendly. We might consider adding such log decorators to the strangler approach as they would support post process routing and blunt information stripping. — You are receiving this because your review was requested. Reply to this email directly, view it on GitHub <#48076 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAC4C4MRAQYBAQ7UCGVU3RDRPA22JANCNFSM4MSJCBJQ> .

bdarnell

If unredacted logs are available by default, we'll continue to rely on them, and there won't be pressure to add log.Safe calls, etc. I think it might be useful to have a "obscure" mode (and use it in cockroach-internal clusters) in which the PII is still available but obscured in some way (rot13 or similar). That way when we're doing some debugging that requires PII, we'll be aware of it and be annoyed enough to think about ways to get our debugging done without requiring access to PII :)

I also join the others in supporting ordinary visible characters instead of hidden ones for this.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @knz, and @tbg)

docs/RFCS/20200427_pii_free_logs.md, line 231 at r1 (raw file):

2. teach engineers how to design their logging code so that it's
   always PII-safe (i.e. use `log.Redact()` calls upfront) and start

This seems like a migration in the wrong direction, to require and trust that logs are PII-free unless explicitly redacted. No matter how good we are with linting and code reviews, we'll make mistakes, and with this new interface those mistakes would mean leakage of PII instead of less-useful log messages.

knz · 2020-05-04T19:04:33Z

Thank you for all the comments. The biggest lesson learned (and surprise) was to discover/understand that visible markers are preferable to invisible markers. It certainly makes things easier in several ways.

I will investigate this and the remainder of the comments throughout the week.

knz

Thanks again for the comments, I have modified the text and the example implementation accordingly. The biggest change is that the markers are now visible as this had total consensus so far.

I would ask you to mention what you consider the MVP for PII-free debug zip, and how it would fit in with that concern. For instance, if the initial PII-free zip is just the PII-free logs, that may not be great if it turns out that customers will only send us that in 100% of cases and will refuse anything else

I added a note in a couple of places that this RFC is focusing on log anonymization. There must be additional effort spent separately on extending debug zip to anonymize other things, or skip over system tables/RPCs that can deliver sensitive data. I planned to work on this afterwards.

what will happen if someone stuffs the special markers into a log message? I assume they will be escaped accordingly, but I didn't see this mentioned

It is mentioned (in multiple places), now with examples.

I didn't understand why the UTF-8 suggestion is worse.

That makes the special characters multi-byte, and increases the computational work to do escaping, substitutions etc.

Could you explain what the copy-paste problems are and why they don't apply to the SI/SO suggestion?

They actually apply, as you've found out.

Are there concerns about mutated log files? You could imagine someone going into a log file and accidentally (or intentionally) making an edit that effects [a marker], which could potentially invert the redaction. The resulting file should trigger some alarm bells when it gets included in a PII-free zip.

I'm not sure how to think about this. I hope that @aaron-crl can walk us through the risk.

have concerns about making these markers invisible, though. [...] You seem to have taken the assumption that invisible markers are strictly better, but I'm not seeing a discussion of why.

I added that discussion in the "alternatives" section. The idea was to facilitate glancing at logs and copy-paste on a terminal.

If unredacted logs are available by default, we'll continue to rely on them, and there won't be pressure to add log.Safe calls, etc. I think it might be useful to have a "obscure" mode (and use it in cockroach-internal clusters) in which the PII is still available but obscured in some way (rot13 or similar). That way when we're doing some debugging that requires PII, we'll be aware of it and be annoyed enough to think about ways to get our debugging done without requiring access to PII :)

I understand the concern. Can you clarify though: are you proposing that this "obscure" mode be active for e.g. all our roachtest clusters or unit tests? I am not sure about workflow: what happens when a test fails?

if we let the strings "obscured" in the CI output, e.g. in the issues that get filed, that will be a major productivity barrier for engineers.
if we automate the de-obscuration of the data in the CI output, then I don't see how that incentivizes anything.

An alternative I could see would be to enable the feature by default for CC clusters, and request from SREs that they use redaction by default when they submit logs to us for analysis (and only deliver fuller logs after we've experienced the pain of missing data). Would this work better perhaps?

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @bdarnell, @petermattis, and @tbg)

docs/RFCS/20200427_pii_free_logs.md, line 46 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Is there any precedent for using special characters like this? I would feel a bit reassured if we're not inventing something new here.

I would guess there is precedent for using visible characters to mark sensitive information. I wonder if it would be less surprising from a UX perspective to use visible characters for this purpose.

I changed the proposal to use visible characters.

docs/RFCS/20200427_pii_free_logs.md, line 82 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

We also need to disallow SI/SO in the format string, or automatically escape them when present.

Done.

docs/RFCS/20200427_pii_free_logs.md, line 85 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

👍 in particular, we should be redacting keys in an intelligent way so that we keep the table and index ID and only strip the user-defined parts. This needs work in the pretty printer (not too much I assume).

Done.

docs/RFCS/20200427_pii_free_logs.md, line 224 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

If we take this strangler approach, I'd suggest that we do an initial step 0 of renaming the old log API to log.Deprecated*. That makes it obvious at the call sites that a deprecated API is being used and that the caller should switch to the new API.

Done.

docs/RFCS/20200427_pii_free_logs.md, line 231 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This seems like a migration in the wrong direction, to require and trust that logs are PII-free unless explicitly redacted. No matter how good we are with linting and code reviews, we'll make mistakes, and with this new interface those mistakes would mean leakage of PII instead of less-useful log messages.

Good point. Added as first "Con" below.

docs/RFCS/20200427_pii_free_logs.md, line 232 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I'm not sure what you have in mind here, but I think we need logging to default to assuming data contains PII so that we "fail safe" if an engineer forgets about this.

I think that's the problem Ben is identifying above.

For the record, I have changed my mind and I think a new API is not the way to go in any case.
I only include it in the RFC for the sake of discussion and because the idea came up already twice as I was brainstorming with other people.

docs/RFCS/20200427_pii_free_logs.md, line 262 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

typo

Done.

docs/RFCS/20200427_pii_free_logs.md, line 264 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Fixing 6000+ call sites is onerous, but doable if we divide up the work. That's "only" 200 call sites per engineer for 30 engineers. Smells like a good fixit task to me.

Do you want me to elaborate on this point in the RFC text?

(I'd rather not since it's unlikely we go in this direction anyway.)

docs/RFCS/20200427_pii_free_logs.md, line 270 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Something that could aid in "strangling" is CI tooling that does not allow changing/adding lines following the deprecated format.

Good point, added in the text.

docs/RFCS/20200427_pii_free_logs.md, line 274 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I consider this the most relevant con here. By forcing PII out of log method calls, we're putting ourselves in a tight spot when it comes to ergonomics.

Here's a counter-proposal: we keep the current log methods but instead of the signature
(ctx context.Context, format string, args ...interface{})
we require
(ctx context.Context, formatNoPII string /* enforced via linting: only const strings */, args ...SafeMessager)
and use
func PII(v interface{}) log.SafeMessager {
  return safeMessager{pii: true, msg: fmt.Sprint(v)}
}
in
log.Infof(ctx, "something interesting happened with %v and %s", log.PII("i'm secret"), log.Safe("i'm ok to share"))
As mentioned earlier, we can fork lift this via
func TodoPII(v interface{}) SafeMessager {
  return PII(v)
}
to avoid parallel APIs.

As a result, I think we've achieved the goal here: we've put PII-free logs on the path to first class citizenship due to the new signature. Down the road, we could easily decide to write completely PII-free logs (just change the logging code), while at the same time keeping PII during dev work (just set a flag to not strip). And we can also do the thing you're favoring here, using SI/SO.

All that said, if we are committing to SI/SO (or any other delimiter) then we don't strictly need to do this now. We would do it once we want to boost the usefulness of the redacted logs. Which we may find to be the case sooner rather than later.

I understand your proposal.
(from an ergonomics perspective it's mandating log.PII/Safe() calls for every argument value. This looks cumbersome!)
But it would do what you say.

Do you want me to include this alternative design in the RFC?

docs/RFCS/20200427_pii_free_logs.md, line 320 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Complicated and error prone. I'm thumbs down on this approach.

Added "and error prone" to the text. We agree on the outcome already.

docs/RFCS/20200427_pii_free_logs.md, line 377 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This might be a pro. I have a suspicion that putting hidden characters in our log files will be surprising. I'll forget about them and then my grep's will silently fail if the pattern crosses the hidden character.

Done.

docs/RFCS/20200427_pii_free_logs.md, line 385 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

If we can get a PII-laden one, that's what we'll prefer, at least initially - PII free debug zip needs significant work (imo) to be almost as useful.

Done.

knz · 2020-05-12T13:08:56Z

RFAL - with special requests:

since I have responded to all the open points, is there any reason left to not do what is proposed here? (I'd like at least a nod before I spend more days on productionizing the MVP)
are there any major design aspects that need to be considered before I commit to a particular implementation design?

thx

tbg

LGTM

One thing I'm curious about is what "real" log lines would look like with the redaction markers. You've thought about this more than I have, but are you planning to minimize the number of markers? For example, log(ctx, "%s %s %s", pii1, pii2, pii3) gets <pii1 pii2 pii3> instead of <pii1> <pii2> <pii3>?

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @bdarnell, @knz, @petermattis, and @tbg)

docs/RFCS/20200427_pii_free_logs.md, line 274 at r1 (raw file):

Previously, knz (kena) wrote…

I understand your proposal.
(from an ergonomics perspective it's mandating log.PII/Safe() calls for every argument value. This looks cumbersome!)
But it would do what you say.

Do you want me to include this alternative design in the RFC?

No, that's fine. But I think there are also two concerns here that are getting conflated a bit in this PR (assuming we've decided against a structured logging format). One is how to handle PII in logs and more specifically, how to get logs that are free of PII. The other is how to encourage that the logs without PII are useful. The RFC right now implicitly assumes that the logs will not benefit from a different (more structured) API if they're not structured. I think the redacted logs will be more useful with a logging API that forces explicit designation as PII or not-PII, and that would ultimately be the goal I personally think is the one to aim for long-term - my gut feeling is that log messages are not super frequently coded up and when they are, the effort is worth it. The one counter-argument is perhaps printf debugging, but thoughtful API design can minimize the overhead here, too.
I don't think the RFC needs to bother with that though, I just wanted to put it on your radar.

knz · 2020-05-12T13:47:05Z

You've thought about this more than I have, but are you planning to minimize the number of markers?

No. I did think about it, so I did some experimental review of the logs produced with this proposal in place; I observed we simply don't have this type of printing in the code, so I don't get the impression it's a problem that needs solving.

I think the redacted logs will be more useful with a logging API that forces explicit designation as PII or not-PII

I thought about this and the fundamental problem I'm seeing is that the call point is not the right position in the code to make this distinction: it's much too easy for engineer A to mark some item in the log.Info call as "not PII" and then engineer B change the definition of the struct in an unrelated file to start containing PII.

It's a hard problem that's solved in PL theory by using type annotations and taint checking but given the tools we're working with that's out of our reach.

What I'm settling on here is:

log.Safe() as an opt-in promise by the implementor of the log.Info call that they have enforced the PII-freeness of the argument by other means. It also pops out during reviews. This is a weak solution and I don't like it.
a type registry that globally marks certain Go types as always non-PII; this will include node IDs, range IDs, store IDs, table IDs and the like. We can then over time increase this via an interface (maybe SafeDetailer, maybe something better) that's used by the formatter instead of the value itself. I think this will bring us to better places.

andreimatei · 2020-05-12T13:57:39Z

I'm still thinking about the SafeMesseger - there are types that we log frequently that have some PII in them, but are still quite useful with the PII stripped - in particular transactions and batches. I'm thinking it'd be really useful to let these types string-ify themselves such that the PII-parts are enclosed by the markers, but the others aren't. What do you think?

knz · 2020-05-12T14:02:02Z

I'm thinking it'd be really useful to let these types string-ify themselves such that the PII-parts are enclosed by the markers, but the others aren't. What do you think?

Yes this is a very good idea. I was planning to do exactly this for error objects already, but I think we can extend the algorithm to more objects. Thanks for the heads up.

bdarnell

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @bdarnell, @knz, @petermattis, and @tbg)

docs/RFCS/20200427_pii_free_logs.md, line 43 at r2 (raw file):

From a user's perspective, the feature would work as follows:

- the feature is enabled by passing the `--redactable-logs` flag on

This should be the default in 20.2. I know you're planning to backport this, but I think the main "guide-level explanation" should focus on the main upcoming release instead of the backported state.

docs/RFCS/20200427_pii_free_logs.md, line 80 at r2 (raw file):

I200508 12:34:36.229124 56 server/server.go:1423 password of user admin was set to "s3cr34›!@x_"

This isn't a great example since we shouldn't be logging passwords anyway (and we're not, in any code path that knows that they're passwords). I suggest the "initiating a split at key %s" message as a more realistic example that includes PII.

It would also be nice to include some tags in the example - I had to go double-check a log file to verify that tags are after the gap instead of before it.

docs/RFCS/20200427_pii_free_logs.md, line 146 at r2 (raw file):

- in v20.2: TBD, probably `true` — new feature, enabled by default.
- in tests: `false` — test logs are always marker-free.

Why? The markers shouldn't be intrusive and we should get used to the new default log format.

In fact, I'd almost argue that we should filter test and in-house cluster logs through --anonymize-logs by default to give us an incentive to expose enough information to be useful in the unredacted form (but I wouldn't want to force us to repro tests just to be able to get clearer logs).

docs/RFCS/20200427_pii_free_logs.md, line 155 at r2 (raw file):

so that logs can be readily shared should a problem ever arise.

Client-s1ize, `cockroach debug zip` always defaults to

s/s1ize/side/ (i think)

docs/RFCS/20200427_pii_free_logs.md, line 163 at r2 (raw file):

them. We do not want to promote the use of the flags unless a concern
was already voiced by the user, because "more information" always
helps with troubleshooting problems faster.

I'm not sure I agree. Having more information is also a liability for us; we don't want customer information except as a last resort. The end goal of the log redaction effort should be that the redacted logs are sufficient in most cases and we only need the unredacted logs in rare, novel situations.

tbg

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @bdarnell, @knz, @petermattis, and @tbg)

docs/RFCS/20200427_pii_free_logs.md, line 101 at r2 (raw file):

- marking was used to delimit the sensitive datums `admin` and `"s3cr34?!@x_"`
- occurrences of the markers inside the datums have been escaped: `"s3cr34‹!@x_"` has become `"s3cr34?!@x_"`.
- there is now a vertical ellipsis "⋮" between the logging message prefix and the message itself. This

Note that this is likely breaking cockroach debug merge-logs. Not a big deal, since it's an internal tool and we usually use a recent version, but something that will need to be rectified together with the impl work.

docs/RFCS/20200427_pii_free_logs.md, line 146 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why? The markers shouldn't be intrusive and we should get used to the new default log format.

In fact, I'd almost argue that we should filter test and in-house cluster logs through --anonymize-logs by default to give us an incentive to expose enough information to be useful in the unredacted form (but I wouldn't want to force us to repro tests just to be able to get clearer logs).

I think I also would prefer it if we just always had the markers with no way to turn them off. At that point you could also avoid changing the log format (i.e. drop the v-ellipsis), by handling "pre-this RFC" logs via a heuristic: if there isn't a single redaction char in the file, redact the whole log file.

Before making that call though it would be nice to see the first, say, ten seconds of logs in the life of a node with the tags.

As for redacting in-house, yes, ideally - but then we need to fork-lift the usefulness of our redacted logs and this is not something that has been dared to be declared in-scope. Without the fork lift, it's too much of an interruption to dev-work and likely an ongoing tax + hate inducer.

docs/RFCS/20200427_pii_free_logs.md, line 163 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I'm not sure I agree. Having more information is also a liability for us; we don't want customer information except as a last resort. The end goal of the log redaction effort should be that the redacted logs are sufficient in most cases and we only need the unredacted logs in rare, novel situations.

Ben, that may be true in theory, but in practice I have yet to see a debugging session in which I have wished for less information. I agree that ultimately we should have usable redacted logs, but we might shoot ourselves badly in the foot if we release something that by default results in basically no useful logs - round trips to the customer are the bane of support.

knz

I'm suggesting the former. [...] I claim that there are few issues where customer keys or other private data is really necessary.

Yes that has been the working assumption all along. If we discover this assumption doesn't hold, that'd be a big setback and compromises the approach as a whole.

However, as a practical matter, I don't think we could do that from day one of the new logging scheme. [...] We'd need to do a first pass of adding log.Safe calls to our most important logging messages before we could consider obscuring logs by default.

I am mildly more optimistic about this, based on my reading of log files with the current implementation. I think what I'd like to do next is expose some support KV engineers to a redacted log from a broken cluster and ask them what they can make of it. Then work upwards from there.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @bdarnell, @BramGruneir, @knz, and @tbg)

docs/RFCS/20200427_pii_free_logs.md, line 1 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Maybe: s/PII/user data/g

PII is a subset of what we want to redact. Data can be sensitive or confidential without identifying a person.

Done.

See "log redaction" RFC from cockroachdb#48076 for details. General concept: - logs stored on disk contain markers around data items potentially containing PII or confidential data (“unsafe data”). - `debug zip`, `debug merge-log`, and the `Logs` and `LogFiles` RPCs edit the log entries upon request to replace all the unsafe data by `‹×›`. See also the release notes below for a summary of UX. Example redacted output: ``` I200426 20:06:45.558765 1 cli/start.go:671 ⋮ GEOS initialized at ‹×› I200426 20:06:45.558772 1 cli/start.go:676 ⋮ starting cockroach node I200426 20:06:45.562212 75 storage/rocksdb.go:606 ⋮ opening rocksdb instance at ‹×› I200426 20:06:45.574608 75 server/server.go:750 ⋮ [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled I200426 20:06:45.574782 75 storage/rocksdb.go:606 ⋮ opening rocksdb instance at ‹×› I200426 20:06:45.581451 75 server/config.go:576 ⋮ [n?] ‹×› storage engine‹×› initialized I200426 20:06:45.581459 75 server/config.go:579 ⋮ [n?] ‹×› I200426 20:06:45.581466 75 server/config.go:579 ⋮ [n?] ‹×› I200426 20:06:45.602204 75 server/server.go:1270 ⋮ [n?] **** add additional nodes by specifying --join=‹×› I200426 20:06:45.602218 75 server/init.go:166 ⋮ [n?] no stores bootstrapped and --join flag specified, awaiting init command or join with an already initialized node. I200426 20:06:45.602234 75 server/init.go:186 ⋮ [n?] **** cluster ‹×› has been created I200426 20:06:47.676567 117 gossip/gossip.go:1538 ⋮ [n1] node has connected to cluster via gossip I200426 20:06:47.676620 117 kv/kvserver/stores.go:255 ⋮ [n1] wrote ‹×› node addresses to persistent storage I200508 12:34:37.238123 1346 kv/kvserver/replica_consistency.go:255 ⋮ [n1,consistencyChecker,s2,r‹×›] triggering stats recomputation to resolve delta of ‹×› ``` This output was produced from the following raw log: ``` I200426 20:06:45.558765 1 cli/start.go:671 ⋮ GEOS initialized at ‹/data/home/kena/src/go/src/github.com/cockroachdb/cockroach/lib/libgeos_c.so› I200426 20:06:45.558772 1 cli/start.go:676 ⋮ starting cockroach node I200426 20:06:45.562212 75 storage/rocksdb.go:606 ⋮ opening rocksdb instance at ‹"/data/home/kena/src/go/src/github.com/cockroachdb/cockroach/cockroach-data/cockroach-temp302681397"› I200426 20:06:45.574608 75 server/server.go:750 ⋮ [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled I200426 20:06:45.574782 75 storage/rocksdb.go:606 ⋮ opening rocksdb instance at ‹"/data/home/kena/src/go/src/github.com/cockroachdb/cockroach/cockroach-data"› I200426 20:06:45.581451 75 server/config.go:576 ⋮ [n?] ‹1› storage engine‹› initialized I200426 20:06:45.581459 75 server/config.go:579 ⋮ [n?] ‹RocksDB cache size: 128 MiB› I200426 20:06:45.581466 75 server/config.go:579 ⋮ [n?] ‹store 0: RocksDB, max size 0 B, max open file limit 1878853› I200426 20:06:45.602204 75 server/server.go:1270 ⋮ [n?] **** add additional nodes by specifying --join=‹kenax:26257› I200426 20:06:45.602218 75 server/init.go:166 ⋮ [n?] no stores bootstrapped and --join flag specified, awaiting init command or join with an already initialized node. I200426 20:06:45.602234 75 server/init.go:186 ⋮ [n?] **** cluster ‹2a7d34b5-a2aa-469b-8efd-ef7deb62ca73› has been created I200426 20:06:47.676567 117 gossip/gossip.go:1538 ⋮ [n1] node has connected to cluster via gossip I200426 20:06:47.676620 117 kv/kvserver/stores.go:255 ⋮ [n1] wrote ‹0› node addresses to persistent storage I200508 12:34:37.238123 1346 kv/kvserver/replica_consistency.go:255 ⋮ [n1,consistencyChecker,s2,r‹4/1:/System{/tsd-tse}›] triggering stats recomputation to resolve delta of ‹{ContainsEstimates:1438 LastUpdateNanos:1588941276231756380 IntentAge:0 GCBytesAge:0 LiveBytes:-35159 LiveCount:-692 KeyBytes:-33660 KeyCount:-692 ValBytes:-1499 ValCount:-692 IntentBytes:0 IntentCount:0 SysBytes:0 SysCount:0}› ``` Release note (cli change): Certain kinds of advanced troubleshooting information, for example internal details from the Go runtime, are now output to a separate log file in the configured log directory. This file is named `<program>-stderr.xxx.log`. Release note (cli change): The server CLI commands (`start`, `start-single-node`, `demo`) now support a new flag `--redactable-logs`, which introduces markers in generated log entries to facilitate redaction of sensitive information by `cockroach debug zip` and other CockroachDB APIs. This flag is currently not enabled by default, but is recommended for all deployments and will be enabled by default in a later version. This feature is experimental. Note: the flag is *advertised* (in `--help`) by `cockroach start` and `cockroach start-single-node`, where log files are expected as a matter of course; it is also supported, yet *hidden* (from `--help`), by every other CLI command because every command can also produce log entries when passed `--log-dir` or `--logtostderr`. Release note (cli change): The commands `cockroach debug zip` and `cockroach debug merge-logs` support a new flag `--redact-logs`. When specified, it causes log messages that potentially contain confidential data or PII to be redacted away. This feature is experimental. Note 1: At the time of this writing, this redaction is extremely aggressive and may hinder the ability of Cockroach Labs to investigate issues from log files, especially if `--redactable-logs` is not enabled server-side. Note 2: Other potentially-confidential data may still be retrieved by `cockroach debug zip`; the new flag only applies to retrieved log files inside the output zip file.

48051: util/log: automatic log redaction r=tbg,bdarnell a=knz Fixes #46031 See "log redaction" RFC from #48076 for details. General concept: - logs stored on disk contain markers around data items potentially containing PII or confidential data (“unsafe data”). - `debug zip`, `debug merge-log`, and the `Logs` and `LogFiles` RPCs edit the log entries upon request to replace all the unsafe data by `‹×›`. See also the release notes below for a summary of UX. Example redacted output: ``` I200426 20:06:45.558765 1 cli/start.go:671 ⋮ GEOS initialized at ‹×› I200426 20:06:45.558772 1 cli/start.go:676 ⋮ starting cockroach node I200426 20:06:45.562212 75 storage/rocksdb.go:606 ⋮ opening rocksdb instance at ‹×› I200426 20:06:45.574608 75 server/server.go:750 ⋮ [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled I200426 20:06:45.574782 75 storage/rocksdb.go:606 ⋮ opening rocksdb instance at ‹×› I200426 20:06:45.581451 75 server/config.go:576 ⋮ [n?] ‹×› storage engine‹×› initialized I200426 20:06:45.581459 75 server/config.go:579 ⋮ [n?] ‹×› I200426 20:06:45.581466 75 server/config.go:579 ⋮ [n?] ‹×› I200426 20:06:45.602204 75 server/server.go:1270 ⋮ [n?] **** add additional nodes by specifying --join=‹×› I200426 20:06:45.602218 75 server/init.go:166 ⋮ [n?] no stores bootstrapped and --join flag specified, awaiting init command or join with an already initialized node. I200426 20:06:45.602234 75 server/init.go:186 ⋮ [n?] **** cluster ‹×› has been created I200426 20:06:47.676567 117 gossip/gossip.go:1538 ⋮ [n1] node has connected to cluster via gossip I200426 20:06:47.676620 117 kv/kvserver/stores.go:255 ⋮ [n1] wrote ‹×› node addresses to persistent storage I200508 12:34:37.238123 1346 kv/kvserver/replica_consistency.go:255 ⋮ [n1,consistencyChecker,s2,r‹×›] triggering stats recomputation to resolve delta of ‹×› ``` This output was produced from the following raw log: ``` I200426 20:06:45.558765 1 cli/start.go:671 ⋮ GEOS initialized at ‹/data/home/kena/src/go/src/github.com/cockroachdb/cockroach/lib/libgeos_c.so› I200426 20:06:45.558772 1 cli/start.go:676 ⋮ starting cockroach node I200426 20:06:45.562212 75 storage/rocksdb.go:606 ⋮ opening rocksdb instance at ‹"/data/home/kena/src/go/src/github.com/cockroachdb/cockroach/cockroach-data/cockroach-temp302681397"› I200426 20:06:45.574608 75 server/server.go:750 ⋮ [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled I200426 20:06:45.574782 75 storage/rocksdb.go:606 ⋮ opening rocksdb instance at ‹"/data/home/kena/src/go/src/github.com/cockroachdb/cockroach/cockroach-data"› I200426 20:06:45.581451 75 server/config.go:576 ⋮ [n?] ‹1› storage engine‹› initialized I200426 20:06:45.581459 75 server/config.go:579 ⋮ [n?] ‹RocksDB cache size: 128 MiB› I200426 20:06:45.581466 75 server/config.go:579 ⋮ [n?] ‹store 0: RocksDB, max size 0 B, max open file limit 1878853› I200426 20:06:45.602204 75 server/server.go:1270 ⋮ [n?] **** add additional nodes by specifying --join=‹kenax:26257› I200426 20:06:45.602218 75 server/init.go:166 ⋮ [n?] no stores bootstrapped and --join flag specified, awaiting init command or join with an already initialized node. I200426 20:06:45.602234 75 server/init.go:186 ⋮ [n?] **** cluster ‹2a7d34b5-a2aa-469b-8efd-ef7deb62ca73› has been created I200426 20:06:47.676567 117 gossip/gossip.go:1538 ⋮ [n1] node has connected to cluster via gossip I200426 20:06:47.676620 117 kv/kvserver/stores.go:255 ⋮ [n1] wrote ‹0› node addresses to persistent storage I200508 12:34:37.238123 1346 kv/kvserver/replica_consistency.go:255 ⋮ [n1,consistencyChecker,s2,r‹4/1:/System{/tsd-tse}›] triggering stats recomputation to resolve delta of ‹{ContainsEstimates:1438 LastUpdateNanos:1588941276231756380 IntentAge:0 GCBytesAge:0 LiveBytes:-35159 LiveCount:-692 KeyBytes:-33660 KeyCount:-692 ValBytes:-1499 ValCount:-692 IntentBytes:0 IntentCount:0 SysBytes:0 SysCount:0}› ``` Release note (cli change): Certain kinds of advanced troubleshooting information, for example internal details from the Go runtime, are now output to a separate log file in the configured log directory. This file is named `<program>-stderr.xxx.log`. Release note (cli change): The server CLI commands (`start`, `start-single-node`, `demo`) now support a new flag `--redactable-logs`, which introduces markers in generated log entries to facilitate redaction of sensitive information by `cockroach debug zip` and other CockroachDB APIs. This flag is currently not enabled by default, but is recommended for all deployments and will be enabled by default in a later version. This feature is experimental. Note: the flag is *advertised* (in `--help`) by `cockroach start` and `cockroach start-single-node`, where log files are expected as a matter of course; it is also supported, yet *hidden* (from `--help`), by every other CLI command because every command can also produce log entries when passed `--log-dir` or `--logtostderr`. Release note (cli change): The commands `cockroach debug zip` and `cockroach debug merge-logs` support a new flag `--redact-logs`. When specified, it causes log messages that potentially contain confidential data or PII to be redacted away. This feature is experimental. Note 1: At the time of this writing, this redaction is extremely aggressive and may hinder the ability of Cockroach Labs to investigate issues from log files, especially if `--redactable-logs` is not enabled server-side. Note 2: Other potentially-confidential data may still be retrieved by `cockroach debug zip`; the new flag only applies to retrieved log files inside the output zip file. 49978: importccl: add basic tests for CSV export of enum types r=miretskiy a=rohany Work for #49971. This PR adds some basic tests for CSV export of enum types. Release note: None Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net> Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>

Release note: None

knz · 2020-06-10T11:34:28Z

bors r=tbg,bdarnell,andreimatei

craig · 2020-06-10T12:17:52Z

Build succeeded

GitHub CI (Cockroach)

knz requested review from aaron-crl, bdarnell, petermattis and tbg April 27, 2020 21:37

knz requested a review from a team as a code owner April 27, 2020 21:37

knz force-pushed the 20200427-pii-free-logging branch from 4e8b87a to aa9cc1b Compare April 27, 2020 21:40

knz mentioned this pull request Apr 27, 2020

util/log: automatic log redaction #48051

Merged

knz force-pushed the 20200427-pii-free-logging branch 2 times, most recently from d6f244e to 30ff0da Compare April 28, 2020 08:31

tbg reviewed Apr 28, 2020

View reviewed changes

petermattis reviewed Apr 28, 2020

View reviewed changes

bdarnell reviewed May 4, 2020

View reviewed changes

irfansharif mentioned this pull request May 7, 2020

cli: support debug zip --anonymize #46031

Closed

knz commented May 8, 2020

View reviewed changes

knz force-pushed the 20200427-pii-free-logging branch from 30ff0da to 48e34bd Compare May 8, 2020 13:38

tbg requested review from bdarnell and petermattis May 12, 2020 13:33

tbg approved these changes May 12, 2020

View reviewed changes

tbg self-requested a review May 12, 2020 13:33

bdarnell approved these changes May 18, 2020

View reviewed changes

tbg requested a review from bdarnell May 19, 2020 07:56

tbg reviewed May 19, 2020

View reviewed changes

knz changed the title ~~rfcs: new RFC on PII-free logging~~ rfcs: new RFC on log file redaction (stripping PII and confidential data) Jun 2, 2020

knz force-pushed the 20200427-pii-free-logging branch from e4a3683 to 556ab96 Compare June 3, 2020 12:45

knz commented Jun 3, 2020

View reviewed changes

knz force-pushed the 20200427-pii-free-logging branch from 556ab96 to 6c88b4d Compare June 5, 2020 17:52

knz force-pushed the 20200427-pii-free-logging branch from 6c88b4d to a9bfba4 Compare June 8, 2020 08:45

rfcs: new RFC on log file redaction

58a2c36

Release note: None

knz force-pushed the 20200427-pii-free-logging branch from a9bfba4 to 58a2c36 Compare June 10, 2020 11:34

craig bot merged commit fd08ff1 into cockroachdb:master Jun 10, 2020

knz deleted the 20200427-pii-free-logging branch June 10, 2020 12:19

thtruo mentioned this pull request Jul 1, 2020

log: provide option to redact PII and confidential data from logs #50905

Closed

Conversation

knz commented Apr 27, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cockroach-teamcity commented Apr 27, 2020

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

petermattis left a comment

Choose a reason for hiding this comment

Uh oh!

aaron-crl commented Apr 29, 2020

Uh oh!

andreimatei commented Apr 29, 2020 via email

Uh oh!

bdarnell left a comment

Choose a reason for hiding this comment

Uh oh!

knz commented May 4, 2020

Uh oh!

knz left a comment

Choose a reason for hiding this comment

Uh oh!

knz commented May 12, 2020

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

knz commented May 12, 2020

Uh oh!

andreimatei commented May 12, 2020

Uh oh!

knz commented May 12, 2020

Uh oh!

bdarnell left a comment

Choose a reason for hiding this comment

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

knz left a comment

Choose a reason for hiding this comment

Uh oh!

knz commented Jun 10, 2020

Uh oh!

craig bot commented Jun 10, 2020

Build succeeded

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

knz commented Apr 27, 2020 •

edited

Loading