roachprod: default to secure clusters unless GCE cockroach-ephemeral#151230
roachprod: default to secure clusters unless GCE cockroach-ephemeral#151230craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
|
No flag, Secure flag, No flag, non Insecure flag, non- |
62bad8b to
7eba42f
Compare
herkolategan
left a comment
There was a problem hiding this comment.
I understand passing a tri-state value to convey if the user explicitly intended to set secure, otherwise we infer it based on context. But since the only context that we really need to infer it is at CLI level, could we not use this https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachprod/cli/util.go#L140, and determine if someone is passing a project or set the env var for a project that we change the default behaviour. Then it will automatically trickle through to the roachprod library, which then doesn't have to care about intent?
Since they won't be able to access their cluster without setting the correct project in any case? And if the project is set we can infer the default at CLI level?
| ClusterSettings map[string]string | ||
|
|
||
| // This is used to pass the CLI flag | ||
| secureFlagsOpt SecureOption |
There was a problem hiding this comment.
Nit: Having a hidden flag hitching a ride on ClusterSettings here has me a bit worried. I kind of liked the fact that ClusterSettings was a pure exported struct without any hidden tricks.
Not sure I understand your point. The issue that forced me to pass a tri-state value is that the cluster In a nutshel:
At step 3, the cli doesn't know about the project the cluster lives in and cannot pre-compute a default applied to the project. So it could pass a general |
You're right, forgot about that aspect. In my mind to run "start" (or any other command) on the cluster in question you have to have to pass the project as well, but forgot the cluster cache keeps that info, so you don't have to. I'll take another pass in a minute :) |
|
I might be nitpicking now, and I may have some ulterior motives to avoid a conflict on ClusterSettingOption with one of my PRs... but hear me out: Could But it may still be a significant change since any command that uses "secure" would need to be updated to pass the cluster name, usually arg 0, to Wrap [1] - so that it can do the default secure logic. I could be wrong, but I think most or all of the operations would use the cache, so it wouldn't be an expensive call, and then the decision can be made on the CLI? [1] master/pkg/cmd/roachprod/cli/util.go#L140 / https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachprod/cli/util.go#L106 |
Took the night to reflect on it, and I do feel like it's putting a lot of (too much?) intelligence into the CLI. Making the decision on a default behavior depending on a cluster state is a business rule and not user-defined. If what really bothers you is the addition of this dual-state or tri-state boolean in the |
| } | ||
|
|
||
| type SecureOption interface { | ||
| apply(settings *ClusterSettings) |
There was a problem hiding this comment.
can replace apply(settings *ClusterSettings) with ClusterSettingOption inline, to make the composition clear that SecureOption is a ClusterSettingOption
There was a problem hiding this comment.
Not really, or it wouldn't satisfy the ClusterSettingOptions interface anymore:
// ClusterSettingOption is the interface satisfied by options to MakeClusterSettings.
type ClusterSettingOption interface {
apply(settings *ClusterSettings)
}
There was a problem hiding this comment.
Pretty sure this works:
type SecureOption interface {
ClusterSettingOption
compute(c *SyncedCluster) error
}
Checked out the PR to confirm and run tests.
There was a problem hiding this comment.
This will ensure if someone updates ClusterSettingOption, that your option inherits the changes (i.e., it would provide better hints that the implementation needs to be updated).
There was a problem hiding this comment.
Oh! Smart! Thanks!
I thought you wanted to change the signature of the apply() method on the SecureOption interface :)
Edit: updated
|
I guess this all boils down to the fact that we don't store any state, especially The 3 states of I'm not opposed to the solution in this PR. It just felt a bit heavy to support a very specific use case, but given the current state of things there's likely no squeaky clean way around it. |
Yep, I've restarted the work on This is intended as a stop gap PR, though we all know what stop gap usually means....... |
7eba42f to
fd6c483
Compare
There was a problem hiding this comment.
LGTM, I tried to think of a different approach, but nothing especially cleaner comes to mind.
I did wonder for a second if we should make the "main/exported" option (ClusterSettings.Secure) on the struct itself have multiple states {True, False, Default}, and then compute at the last minute where we need to convert Default -> True or False. But this would probably result in a much larger foot print to keep track of.
| settings.Secure = bool(o) | ||
| type SecureOption interface { | ||
| ClusterSettingOption | ||
| compute(c *SyncedCluster) error |
There was a problem hiding this comment.
Nit: compute is somewhat ambiguous relative to apply. It also doesn't actually compute much. How about making it into,
overrideForCluster(c *SyncedCluster) error)
There was a problem hiding this comment.
Went with the verbose overrideBasedOnClusterSettings(c *SyncedCluster) error).
| } | ||
| c.Nodes = nodes | ||
|
|
||
| if c.ClusterSettings.secureFlagsOpt != nil { |
There was a problem hiding this comment.
Nit: for brevity, could fold the nullity check into the method.
There was a problem hiding this comment.
As this is an interface, this method is bound to have multiple implementation, so this check being implemented multiple times vs. the single call. It feels safer to implement it there once and for all.
| // we make it insecure by default. This is to avoid dealing with certificates | ||
| // for ephemeral engineering test clusters. | ||
| if len(c.Clouds()) == 1 && c.Clouds()[0] == fmt.Sprintf("%s:%s", gce.ProviderName, gce.DefaultProjectID) { | ||
| fmt.Printf("WARN: cluster %s defaults to insecure, because it is in project %s\n", |
There was a problem hiding this comment.
Nit: this could spam the log multiple times. E.g., I see it twice during roachprod start; the second time is during updatePrometheusTargets. Maybe we could suppress it by persisting the override into the cluster metadata (i.e., *cloud.Cluster)?
There was a problem hiding this comment.
Feels like a much more involved project. Let's revisit and store the secure state of a cluster when we implement a service registry? Or in a subsequent PR if logging verbosity is getting out of hand.
Since cockroachdb#147737, to ease testing for engineering teams, roachprod defaults to insecure clusters. This change has led to security impact for other entities at CRL. This patch brings smart selection of secure vs. insecure cluster depending on the Cloud project the cluster is provisionned in. If the cluster is provisionned in the engineering's ephemeral GCP project, it will default to insecure, while all other configurations will default to secure. In order to achieve that, and because this depends on the cluster's configuration that is not available at the CLI package level, the CLI now keeps track of the user's arguments (--secure, --insecure, default) and passes the information to the roachprod library. Epic: none Release note: None
fd6c483 to
c48f303
Compare
|
TFTRs! bors r=herkolategan,srosenberg |
As of cockroachdb#151230, roachprod's default behavior is to provision secure clusters unless provisioning is in the cockroach-ephemeral GCP project. To warn the user about the fact that the cluster is insecure in this configuration, a log was added to stdout. The addition of this log message broke some K/V performance scripts, so this patch moves the warning to stderr instead. Epic: none Release note: None
151737: sql: add support for automatically repairing dangling comments r=fqazi a=fqazi Previously, we added logic to detect dangling comments on descriptors but did not include a mechanism to clean them up. This could block initial upgrades due to dangling entries in the system.comments table. This patch adds support for automatically cleaning up these dangling comments. Fixes: #151497 Release note (bug fix): Added an automatic repair for dangling or invalid entries in the system.comment table. 152079: roachprod: insecure cluster warn log r=herkolategan a=golgeek As of #151230, roachprod's default behavior is to provision secure clusters unless provisioning is in the cockroach-ephemeral GCP project. To warn the user about the fact that the cluster is insecure in this configuration, a log was added to stdout. The addition of this log message broke some K/V performance scripts, so this patch moves the warning to stderr instead. Epic: none Release note: None Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com> Co-authored-by: Ludovic Leroux <ludo.leroux@cockroachlabs.com>
152079: roachprod: insecure cluster warn log r=herkolategan a=golgeek As of #151230, roachprod's default behavior is to provision secure clusters unless provisioning is in the cockroach-ephemeral GCP project. To warn the user about the fact that the cluster is insecure in this configuration, a log was added to stdout. The addition of this log message broke some K/V performance scripts, so this patch moves the warning to stderr instead. Epic: none Release note: None 152170: workload/schemachange: handle LTREE in mixed version CREATE TYPE r=rafiss a=rafiss Using this type in a composite type in a mixed version cluster will cause UndefinedObject errors, so we need to make the workload aware of that. informs #150547 Release note: None Co-authored-by: Ludovic Leroux <ludo.leroux@cockroachlabs.com> Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Since #147737, to ease testing for engineering teams, roachprod defaults to insecure clusters. This change has led to security impact for other entities at CRL.
This patch brings smart selection of secure vs. insecure cluster depending on the Cloud project the cluster is provisionned in. If the cluster is provisionned in the engineering's ephemeral GCP project, it will default to insecure, while all other configurations will default to secure.
In order to achieve that, and because this depends on the cluster's configuration that is not available at the CLI package level, the CLI now keeps track of the user's arguments (--secure, --insecure, default) and passes the information to the roachprod library.
Epic: none
Release note: None