Skip to content

add upstream info#2434

Closed
kyessenov wants to merge 1 commit intoistio:masterfrom
kyessenov:add_upstream_tls_attributes
Closed

add upstream info#2434
kyessenov wants to merge 1 commit intoistio:masterfrom
kyessenov:add_upstream_tls_attributes

Conversation

@kyessenov
Copy link
Copy Markdown
Contributor

Signed-off-by: Kuat Yessenov kuat@google.com

Adds:

  • upstream.mtls a bool indicating whether client-side connection is mTLS
  • upstream.failure_reason a string recording connection reset log line (usually due to TLS)

/cc @nrjpoddar @douglas-reid

Signed-off-by: Kuat Yessenov <kuat@google.com>
@kyessenov kyessenov requested a review from a team September 30, 2019 20:11
@googlebot googlebot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Sep 30, 2019
@istio-testing istio-testing added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 30, 2019
@douglas-reid
Copy link
Copy Markdown
Contributor

can you add detail on how these new attributes impact the calculation of the connection security policy labels, etc.?

@nrjpoddar
Copy link
Copy Markdown
Member

Thanks @kyessenov for this PR. Can you add in a comment few examples of how the newly added attributes show up in mixer access logs or telemetry?

static const char kConnectionId[];
// Record TCP connection status: open, continue, close
static const char kConnectionEvent[];
static const char kUpstreamMtls[];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does work with this attribute kConnectionMtls

Copy link
Copy Markdown
Contributor Author

@kyessenov kyessenov Sep 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connection.mtls means downstream connection mTLS status.
upstream.mtls means upstream connection mTLS status.
Depending on whether the proxy acts like an outbound or an inbound sidecar, you use one or the other.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see, may be this attribute kConnectionMtls needs to explicitly say that it's related to downstream. It's can be confusing specially after adding this new attribute.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't change the meaning of existing attributes. There are UIs that have expectations. We can change the metric, however, in istio config.

@kyessenov
Copy link
Copy Markdown
Contributor Author

@douglas-reid

We should modify the mtls label to be conditional(context.reporter.kind == "client", upstream.mtls | false, connection.mtls | false).
Failure reason is more useful in access logs, since we don't have a good catalog of errors yet.

Copy link
Copy Markdown
Member

@nrjpoddar nrjpoddar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the types of error messages that can be reported via GetUpstreamFailureReason? I'm worried about cardinality here if we add this on a metric specially if it is directly tied to the Envoy message.

One way to deal with it would be to create known types based of the set of error messages we care about/know in proxy repo and then skip the other kinds to decouple label cardinality explosion from Envoy string message directly.

static const char kConnectionId[];
// Record TCP connection status: open, continue, close
static const char kConnectionEvent[];
static const char kUpstreamMtls[];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see, may be this attribute kConnectionMtls needs to explicitly say that it's related to downstream. It's can be confusing specially after adding this new attribute.

@kyessenov
Copy link
Copy Markdown
Contributor Author

@nrjpoddar yes, we'll have to update the attribute vocabulary once this gets in. I think this is better than changing the meaning of the attribute. It was always about downstream, just never properly documented.

Upstream failure reason is coming straight from boringSSL. Yes, it should not be used for metrics, unless we can catalog all the error kinds in the library.

@nrjpoddar
Copy link
Copy Markdown
Member

Upstream failure reason is coming straight from boringSSL. Yes, it should not be used for metrics, unless we can catalog all the error kinds in the library. -> I can work with you if needed and we can divide and conquer. Getting this information in metrics, even at low resolution where all SSL related errors are reported as boolean is very useful. An alert via metrics + more information in logs/tracing will be an acceptable short time solution for me.

@kyessenov
Copy link
Copy Markdown
Contributor Author

kyessenov commented Sep 30, 2019

@nrjpoddar Sure. I still think we need to propagate full-fidelity info from envoy, in case we get our catalog wrong, or there is a need for more details. As a first pass, we should use a CEL expression (upstream.failure_reason.startsWith("blah")) as a rough error type metric.

@kyessenov
Copy link
Copy Markdown
Contributor Author

Hey @PiotrSikora , as a WG lead, can you review or assign someone to review? We need this for 1.4 in case Mixer v2 slips.

builder.FlattenMapOfStringToStruct(report_data->GetDynamicFilterState());

builder.AddBool(utils::AttributeName::kUpstreamMtls,
report_data->IsUpstreamMutualTLS());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a nit. instead of having a bool for mutual TLs, why not have an enum of sorts? upstream: plaintext, tls, mutualTLS ? We are coming across a lot of setups where the sidecar has to terminate one way TLS on VMs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enums are generally painful to deal with in proto. mTLS is special since that's the end state, so it's needed to track the progress on getting to that state. We can populate other TLS properties later (e.g. version, cipher suite ID, etc).

std::string* trust_domain);

// Returns true if connection is mutual TLS enabled.
// Returns true if downstream connection is mutual TLS enabled.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename this to IsDownstreamMutualTLS?


bool IsUpstreamMutualTLS(const StreamInfo::StreamInfo& stream_info) {
const auto ssl = stream_info.upstreamSslConnection();
return ssl != nullptr && ssl->peerCertificatePresented();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused on the usage of ssl->peerCertificatePresented(). If this flag is set to true, does this imply client was using TLS for talking to upstream and got a successful certificate from upstream or it should mean that client is using TLS irrespective of how the negotiation went and what upstream server presented?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, I think this might be an issue. We should be checking whether a local certificate was presented, not peer.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's what I thought. I don't see an Envoy API for that. @lizan may be you know better?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can extend envoy to report whether a local certificate is presented. That might not mean that the server validated it. Not sure how to get that info.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we didn't have this in Envoy API because this has been simply reflecting what in the TLS context config, though we're adding more dynamic nature to the config for incremental adoption etc, so it makes sense to add, thanks for posting envoyproxy/envoy#8464.

@nrjpoddar
Copy link
Copy Markdown
Member

@kyessenov in general, from a user point of view I only care about if the connection over which this transaction is happening is plaintext, TLS or mTLS. In all of these states, I care about errors if any occur.

I'm struggling to see how users can get benefit from this low level separation of upstream/downstream without getting more confused.

@kyessenov
Copy link
Copy Markdown
Contributor Author

kyessenov commented Oct 2, 2019

@nrjpoddar We cannot conflate the two for gateways. We only have one set of attributes we report from the gateways, and downstream/upstream connections have very different characteristics.

@istio-testing
Copy link
Copy Markdown
Collaborator

@kyessenov: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
prow/proxy-presubmit-release.sh 1939596 link /test proxy-presubmit-release
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Nov 2, 2019
@istio-policy-bot
Copy link
Copy Markdown

🧭 This issue or pull request has been automatically marked as stale because it has not had activity from an Istio team member since 2019-10-02. It will be closed on 2019-12-01 unless an Istio team member takes action. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.

@istio-policy-bot
Copy link
Copy Markdown

🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2019-10-02. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.

@istio-policy-bot istio-policy-bot added the lifecycle/automatically-closed Indicates a PR or issue that has been closed automatically. label Nov 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. lifecycle/automatically-closed Indicates a PR or issue that has been closed automatically. lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants