Skip to content

cli: support COCKROACH_REDACTION_POLICY_MANAGED env var#86475

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
abarganier:log-redaction-fixes-1
Sep 8, 2022
Merged

cli: support COCKROACH_REDACTION_POLICY_MANAGED env var#86475
craig[bot] merged 1 commit intocockroachdb:masterfrom
abarganier:log-redaction-fixes-1

Conversation

@abarganier
Copy link
Copy Markdown
Contributor

@abarganier abarganier commented Aug 19, 2022

Currently, log redaction policies have no way to discern their own
runtime environment. Logged objects that may be considered sensitive
and unsafe in on-prem deployments of CockroachDB might be otherwise
safe when we're running within a managed service such as Cockroach
Cloud. For example, CLI argument lists included as part of the
cockroach start command are already known to those operating the
managed service, so there's no reason we should be redacting this
information from logs in this case.

This patch adds the --managed flag to the start commands. This
flag is plumbed through to the global logging config object where
the log package has access to it.

We also introduce log.SafeManaged(s interface{}), which conditionally
marks an object with redact.Safe() depending on whether or not we
are running as a managed service. This is inspired by the original
log.SafeOperational(s interface{}) function.

I believe that this new --managed flag should not be advertised in
our public documentation, as its intended use is for those running
Cockroach Cloud.

Release justification: low-risk, high benefit changes to existing
functionality. The new CLI flag has a minimal impact on DB
operations and provides high value reduction of log redaction,
which will be necessary for support staff with our latest compliance
requirements.

Release note (cli change): cockroach start commands now have an
additional --managed flag that can be used to indicate whether
or not the node is running as part of a managed service (e.g.
Cockroach Cloud). Perhaps this shouldn't be advertised in our
public facing docs, as its only intended for use by those running
Cockroach Cloud and not for on-prem deployments.

Addresses #86316

@abarganier abarganier requested review from a team, knz and logston August 19, 2022 19:02
@abarganier abarganier requested review from a team as code owners August 19, 2022 19:02
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@abarganier abarganier removed request for a team August 19, 2022 19:02
@abarganier abarganier force-pushed the log-redaction-fixes-1 branch 3 times, most recently from 7445944 to b6b2a76 Compare August 19, 2022 19:48
Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 15 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @logston)


pkg/base/config.go line 163 at r2 (raw file):

	// of a managed service (e.g. CockroachCloud). Impacts log
	// redaction policies.
	Managed bool

Not here. Add the field to cliCtx in pkg/cli/context.go.


pkg/base/config.go line 270 at r2 (raw file):

func (cfg *Config) InitDefaults() {
	cfg.Insecure = defaultInsecure
	cfg.Managed = defaultManaged

Do this in setCliConextDefaults, pkg/cli/context.go.


pkg/cli/context.go line 454 at r2 (raw file):

	serverCertPrincipalMap []string
	serverListenAddr       string
	serverManaged          bool

no need for this.


pkg/cli/flags.go line 476 at r2 (raw file):

		// NB: Insecure is deprecated. See #53404.
		cliflagcfg.BoolFlag(f, &startCtx.serverInsecure, cliflags.ServerInsecure)
		cliflagcfg.BoolFlag(f, &startCtx.serverManaged, cliflags.ServerManaged)

cliCtx.managed


pkg/cli/flags.go line 894 at r2 (raw file):

		// NB: Insecure is deprecated. See #53404.
		cliflagcfg.BoolFlag(f, &startCtx.serverInsecure, cliflags.ServerInsecure)
		cliflagcfg.BoolFlag(f, &startCtx.serverManaged, cliflags.ServerManaged)

ditto


pkg/cli/flags.go line 994 at r2 (raw file):

	serverCfg.User = username.NodeUserName()
	serverCfg.Insecure = startCtx.serverInsecure
	serverCfg.Managed = startCtx.serverManaged

no need for this


pkg/util/log/redact.go line 223 at r2 (raw file):

// and does not need to be treated as sensitive. This function marks
// the provided object as safe/unsafe accordingly.
func SafeManaged(s interface{}) redact.RedactableString {

Nah, this is not the right way to go about this. Let's get away from global variables. Try this instead:

  1. here:
func SafeManaged(a interface{}) interface{} {
   return safeManaged{a}
}

type safeManaged struct {a  interface{}}
  1. in makeUnstructuredEntry, taking inspiration from #82891, something like that:
for i, a := range args {
			if e, ok := a.(safeManaged); ok && logger.managed {
				args[i] = e.a
			}
		}

(you might need to copy the args slice if there's a safeManaged in it, instead of overwriting it in-place, i'm not sure)

this will likely need to make makeUnstructuredEntry a method of *loggerT, but I think that's OK.

The flip side is that it opens the door to choosing the redaction per logger (i.e. per channel) too.

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @logston)


pkg/util/log/redact.go line 223 at r2 (raw file):

safeManaged

btw that new safeManaged type will need a Format and/or SafeFormat method too.

@knz
Copy link
Copy Markdown
Contributor

knz commented Aug 22, 2022

I encourage you to study how the redact library does this, see the source in internal/redact/wrappers.go.

Copy link
Copy Markdown
Contributor Author

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback - RFAL

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @logston)


pkg/base/config.go line 163 at r2 (raw file):

Previously, knz (kena) wrote…

Not here. Add the field to cliCtx in pkg/cli/context.go.

Done.


pkg/base/config.go line 270 at r2 (raw file):

Previously, knz (kena) wrote…

Do this in setCliConextDefaults, pkg/cli/context.go.

Done.


pkg/cli/context.go line 454 at r2 (raw file):

Previously, knz (kena) wrote…

no need for this.

Done.


pkg/cli/flags.go line 476 at r2 (raw file):

Previously, knz (kena) wrote…

cliCtx.managed

Done.


pkg/cli/flags.go line 894 at r2 (raw file):

Previously, knz (kena) wrote…

ditto

Done.


pkg/cli/flags.go line 994 at r2 (raw file):

Previously, knz (kena) wrote…

no need for this

Done.


pkg/util/log/redact.go line 223 at r2 (raw file):

Previously, knz (kena) wrote…

safeManaged

btw that new safeManaged type will need a Format and/or SafeFormat method too.

Thanks for this advice - the separate safeManaged type implementing redact.SafeFormatter is a much better approach.

I gave modifying makeUnstructuredEntry a try and defining it on the *loggerT itself. Things were going great, until some testing at the end revealed that handling the conditional unwrapping depending on logger.managed within makeUnstructuredEntry doesn't cover cases such as this one where RedactableBytes() is called on a redact.StringBuilder.

Instead, we need to know the state of the managed flag at the time SafeManaged was called, or at the time SafeFormat is called. Otherwise, my interpretation is that we won't have full coverage. Let me know if I'm missing something.

As a result, I chose to store a copy of the state of the managed flag at call time for SafeManaged (but I don't feel strongly about it over checking isManaged() when executing SafeFormat). I've added a managed bool to the new type, which we can reference within SafeFormat to make the determination there on whether to mark safe or not. This covers cases such as RedactableBytes nicely.

I realize this still leaves us depending on the package-global logging variable 🤔. As an alternative, I chose to at least define setManaged and isManaged on the loggingT itself. I didn't see a good way to include it as part of the loggerT itself, but I'm open to alternatives - let me know any thoughts you might have.

Copy link
Copy Markdown
Contributor Author

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some thinking about this over the weekend. To optimize for meeting our requirements before we cut for 22.2, I'd like to keep the implementation here simple where we choose to wrap in safeManaged/leave as-is at the time we call SafeManaged.

Looking forward, I am feeling the need for a more holistic solution to these "conditional redaction" cases. I've created #87038 to track this work & come up with a broader solution that aligns with our future goals of multiple logging configurations per-process.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @logston)


pkg/util/log/redact.go line 223 at r2 (raw file):

As a result, I chose to store a copy of the state of the managed flag at call time for SafeManaged (but I don't feel strongly about it over checking isManaged() when executing SafeFormat). I've added a managed bool to the new type, which we can reference within SafeFormat to make the determination there on whether to mark safe or not. This covers cases such as RedactableBytes nicely.

After some sleep & advice from the team, I realized that storing the state of managed within safeManaged is completely unnecessary. We'll just check it at the time SafeManaged is called & make the decision to wrap/leave as-is then.

@abarganier abarganier force-pushed the log-redaction-fixes-1 branch 2 times, most recently from f42a094 to 838aa0a Compare August 30, 2022 20:17
Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @knz, and @logston)


pkg/cli/cliflags/flags.go line 729 at r4 (raw file):

	}

	ServerManaged = FlagInfo{

instead of a flag, would it be better to make this part of the logconfig.Config ?


pkg/cli/cliflags/flags.go line 733 at r4 (raw file):

		EnvVar: "COCKROACH_MANAGED",
		Description: `
Indicate that the node being started is being run as part of a managed service.

s/being started/


pkg/util/log/redact.go line 226 at r4 (raw file):

// NB: If the argument itself implements the redact.SafeFormatter interface,
// then we delegate to its implementation in either case.
func SafeManaged(a interface{}) interface{} {

consider leaving a link to #87038 in a comment


pkg/util/log/redact.go line 230 at r4 (raw file):

return safeManaged{a}

instead of this, could we do return redact.Safe(a) and get rid of safeManaged ? Wouldn't that be simpler?
I think the safeManaged struct would be useful if the decision to the argument or not was deferred to logging time, but given the current structure I don't understand what it's for.

@abarganier abarganier force-pushed the log-redaction-fixes-1 branch from 838aa0a to c2e3bcd Compare September 1, 2022 19:17
Copy link
Copy Markdown
Contributor Author

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR, RFAL

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, and @logston)


pkg/cli/cliflags/flags.go line 729 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

instead of a flag, would it be better to make this part of the logconfig.Config ?

I gave this some thought and I'd like to keep it as a flag. I could see this flag being used in the future for purposes beyond just logging, where we want to alter behavior elsewhere if we're running in a managed service.


pkg/cli/cliflags/flags.go line 733 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/being started/

Done.


pkg/util/log/redact.go line 226 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

consider leaving a link to #87038 in a comment

Done.


pkg/util/log/redact.go line 230 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

return safeManaged{a}

instead of this, could we do return redact.Safe(a) and get rid of safeManaged ? Wouldn't that be simpler?
I think the safeManaged struct would be useful if the decision to the argument or not was deferred to logging time, but given the current structure I don't understand what it's for.

You're right - we've essentially come full circle on this 😅 . Switched it to the simpler approach.

Copy link
Copy Markdown
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @abarganier, @andreimatei, @knz, and @logston)


pkg/cli/cliflags/flags.go line 729 at r4 (raw file):

Previously, abarganier (Alex Barganier) wrote…

I gave this some thought and I'd like to keep it as a flag. I could see this flag being used in the future for purposes beyond just logging, where we want to alter behavior elsewhere if we're running in a managed service.

I don't very clearly see how "running as a managed service" is tied to the logging changes being made here. It seems to me that it's not the fact that it's a managed service that makes all the things you're unredacting safe, but that it's CRL who runs this managed service and we don't intend to give anyone else debug.zip's produced by it. So perhaps consider --crl-managed as the flag name. I generally feel that we haven't dissected enough what it is that makes what we're doing here safe, so I'd still vote we relegate this setting to a more narrowly-scoped log config option. But whatever you think.


pkg/cli/cliflags/flags.go line 733 at r5 (raw file):

		EnvVar: "COCKROACH_MANAGED",
		Description: `
Indicate that the node is being run as part of a managed service/ This primarily

there's a slash at the end of the first sentence

Copy link
Copy Markdown
Contributor Author

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @abarganier, @andreimatei, @knz, and @logston)


pkg/cli/cliflags/flags.go line 729 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I don't very clearly see how "running as a managed service" is tied to the logging changes being made here. It seems to me that it's not the fact that it's a managed service that makes all the things you're unredacting safe, but that it's CRL who runs this managed service and we don't intend to give anyone else debug.zip's produced by it. So perhaps consider --crl-managed as the flag name. I generally feel that we haven't dissected enough what it is that makes what we're doing here safe, so I'd still vote we relegate this setting to a more narrowly-scoped log config option. But whatever you think.

I think you make a good point - I'll prepare a patch to move this into the logging config

@abarganier abarganier force-pushed the log-redaction-fixes-1 branch from c2e3bcd to c5a9f9d Compare September 6, 2022 14:25
Copy link
Copy Markdown
Contributor Author

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @knz, and @logston)


pkg/cli/cliflags/flags.go line 729 at r4 (raw file):

Previously, abarganier (Alex Barganier) wrote…

I think you make a good point - I'll prepare a patch to move this into the logging config

After further consideration, our time table to get this delivered as part of the larger PCI redaction project is too tight to rework the implementation at this point. I'm going to leave as-is.


pkg/cli/cliflags/flags.go line 733 at r5 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

there's a slash at the end of the first sentence

Done.

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall OK, but I'm 👎 on adding a CLI flag at all. IMHO this should be exclusively an env var. See rationale below.

Reviewed 8 of 15 files at r2, 12 of 14 files at r3, 1 of 2 files at r4, 1 of 2 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @andreimatei, @knz, and @logston)


pkg/cli/cliflags/flags.go line 730 at r6 (raw file):

	ServerManaged = FlagInfo{
		Name:   "managed",

I think a flag with this word structure --managed is not sufficiently descriptive.
Can we make it --keep-operational-details-during-redaction? or perhaps --log-redact-policy=managed

But really I am also wondering if this needs a documented command-line flag at all. I think the env var should be sufficient? Especially for CC deployments, adding a new flag means that the operator team needs to detect the version of crdb before they decide to add the flag.

With an env var, they can set it for every version without causing errors and the new version will pick it up automatically.

Copy link
Copy Markdown
Contributor Author

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @knz, and @logston)


pkg/cli/cliflags/flags.go line 730 at r6 (raw file):

Previously, knz (kena) wrote…

I think a flag with this word structure --managed is not sufficiently descriptive.
Can we make it --keep-operational-details-during-redaction? or perhaps --log-redact-policy=managed

But really I am also wondering if this needs a documented command-line flag at all. I think the env var should be sufficient? Especially for CC deployments, adding a new flag means that the operator team needs to detect the version of crdb before they decide to add the flag.

With an env var, they can set it for every version without causing errors and the new version will pick it up automatically.

That SGTM, especially since this is primarily meant to only be used by CRL. Thanks for the guidance here - I'll prepare a patch to remove the flag & solely rely on the env var instead.

@abarganier abarganier force-pushed the log-redaction-fixes-1 branch from c5a9f9d to f2ffb93 Compare September 7, 2022 17:45
Copy link
Copy Markdown
Contributor Author

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the flag in favor of an environment variable, and adjusted the naming to be more descriptive.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @knz, and @logston)

@abarganier abarganier changed the title cli: add --managed flag, log.SafeManaged redaction function & usage cli: support COCKROACH_REDACTION_POLICY_MANAGED env var Sep 7, 2022
@abarganier abarganier changed the title cli: support COCKROACH_REDACTION_POLICY_MANAGED env var cli: support COCKROACH_REDACTION_POLICY_MANAGED env var Sep 7, 2022
Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is missing unit tests.

Reviewed 11 of 12 files at r7, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @andreimatei, @knz, and @logston)


pkg/cli/flags.go line 526 at r7 (raw file):

		}

		cliCtx.redactionPolicyManaged = envutil.EnvOrDefaultBool(redactionPolicyManagedEnvVar, false)

Not quite. I'm expecting this will crash.
Also, it won't properly reset in-between unit tests.

I recommend you move this logic in context.go (same directory), in setCliContextDefaults().

Currently, log redaction policies have no way to discern their own
runtime environment. Logged objects that may be considered sensitive
and unsafe in on-prem deployments of CockroachDB might be otherwise
safe when we're running within a managed service such as Cockroach
Cloud. For example, CLI argument lists included as part of the
`cockroach start` command are already known to those operating the
managed service, so there's no reason we should be redacting this
information from logs in this case.

This patch adds the `COCKROACH_REDACTION_POLICY_MANAGED` env var to
be handled in the various the start commands. This flag is plumbed
through to the global logging config object where the log package has
access to it.

We also introduce `log.SafeManaged(s interface{})`, which conditionally
marks an object with `redact.Safe()` depending on whether or not we
are running as a managed service. This is a successor to the original
`log.SafeOperational(s interface{})` function.

I believe that this new env var should not be advertised in
our public documentation, as its intended use is for those running
Cockroach Cloud.

This patch also implements a handful of these redaction tweaks.
While it's not meant to be exhaustive, it supplies a good
baseline example of how `log.SafeManaged()` is used, and
provides an immediate reduction of redaction when it comes
to critical information for support staff, such as CLI args,
network addresses, and more.

Additionally, a few log lines have been identified as being
redacted when it was not necessary. This patch also makes a
few spot improvements using `redact.Safe` as well in such
cases.

We intentionally avoid a release note here, as we aim to keep
the usage of this environment variable internal.

Release justification: low-risk, high benefit changes to existing
functionality. The new CLI flag has a minimal impact on DB
operations and provides high value reduction of log redaction,
which will be necessary for support staff with our latest compliance
requirements.

Release note: none
@abarganier abarganier force-pushed the log-redaction-fixes-1 branch from f2ffb93 to 4640c58 Compare September 7, 2022 18:51
Copy link
Copy Markdown
Contributor Author

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added unit tests and reduced invasiveness on pkg/cli. We now contain the handling of the env var within pkg/util/log.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @knz, and @logston)


pkg/cli/flags.go line 526 at r7 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Not quite. I'm expecting this will crash.
Also, it won't properly reset in-between unit tests.

I recommend you move this logic in context.go (same directory), in setCliContextDefaults().

I didn't experience any crash during manual testing, but nonetheless, if we're not using a CLI flag anymore then it doesn't feel like we need to invade on this package.

I've instead opted to move all of this into log.ApplyConfig. We can read the env variable there, allowing tests that touch logging (e.g. use TestingResetActive() and log.ApplyConfig()) to reset in-between unit tests.

@abarganier abarganier requested a review from knz September 7, 2022 18:53
Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 8 files at r8, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @knz, and @logston)

@abarganier
Copy link
Copy Markdown
Contributor Author

bors r=knz

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Sep 8, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants