Skip to content

Properly fall back in the dst_router#291

Merged
kleimkuhler merged 5 commits intomasterfrom
kleimkuhler/unresolvable-addrs
Jul 26, 2019
Merged

Properly fall back in the dst_router#291
kleimkuhler merged 5 commits intomasterfrom
kleimkuhler/unresolvable-addrs

Conversation

@kleimkuhler
Copy link
Contributor

@kleimkuhler kleimkuhler commented Jul 26, 2019

Motivation

TLDR: Requests whose target authority is a socket address do not properly fall
back. Requests to unresolvable authorities which are DNS names do fall back as
expected.

In a control plane branch, I have the tap controller properly adding the header
metadata to tap requests, and the pod proxy’s expecting the tap identity the
requests would be coming from. After installing a local build of this control
plane and the proxy that supports l5d-require-id, I noticed that the tap
controller’s proxy was still not setting identity on outbound::Endpoints when
the header was there.

I began looking for the debug! lines that would be hit in the
orig_dst_router because I would expect to be hitting that router, and did not
see them—indicating those requests were still going through the balancer.

It turns out that requests with an authority that we cannot resolve still get a
single slot in the balancer, and therefore do not fallback because they do not
return the Unresolvable error. We match on Socket here and make a new
slot in the balancer here

This is an issue because if these requests contain the l5d-require-id header,
the Endpoint identity is not set because we do not go through the
orig_dst_router and therefore do not have a TLS connection

Solution

In outbound::discover, Resolve will now only attempt to resolve
Addr::Name, and return a Unresolvable error on Addr::Socket. For requests
that are Addr::Socket, we rely on getting the original dst of the request and
therefore routes through orig_dst_router.

Tests

The tests that specifically test SO_ORIGINAL_DST have been removed because those
tests now result in errors which is the expected behavior now.

Signed-off-by: Kevin Leimkuhler kleimkuhler@icloud.com

@kleimkuhler kleimkuhler requested review from adleong, hawkw and olix0r July 26, 2019 00:05
@kleimkuhler kleimkuhler self-assigned this Jul 26, 2019
match self.orig_dst {
None => None,
None => {
trace!("no SO_ORIGINAL_DST on source");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trace!s in this file are not needed, but I found them helpful when testing and therefore probably a good reason to keep around.

@hawkw
Copy link
Contributor

hawkw commented Jul 26, 2019

Motivation

TLDR: Fallback layer was not actually falling back on requests that cannot be
resolved

This is specifically in the case of requests whose target authority is a socket address. Requests to unresolvable authorities which are DNS names do fall back as expected.

Just felt like that should be clarified.

@kleimkuhler
Copy link
Contributor Author

@hawkw Yes thank you for pointing out the more specific case of this! I updated the comment.

Copy link
Contributor

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with making this change, provided that we agree that it's fine for the proxy to fail all requests without an original destination (and therefore, not work at all on platforms without SO_ORIGINAL_DST). I'd like to hear what @olix0r and @seanmonstar think about that.

let resolution = match resolution.poll() {
Ok(Async::Ready(resolution)) => resolution,
Ok(Async::NotReady) => return Ok(Async::NotReady),
Err(_) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know that Unresolvable is the only error that the control::destination resolver will return?

ref mut resolution,
}) = self.resolving
{
let resolution = match resolution.poll() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this just be try_ready!?

#[derive(Clone, Debug)]
pub struct Resolve<R: resolve::Resolve<NameAddr>>(R);
#[derive(Clone)]
pub struct Resolver<R: resolve::Resolve<NameAddr>>(R);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it might be worth considering just moving all the remaining functionality in this module into control::destination::Resolver. It seems like this module is just a pretty thin layer over the destination resolver and we could probably put all that functionality in one place. I think a majority of the code here now is just re-implementing some future combinators, and there's only a few lines of actual logic here.

On the other hand, moving that to control::destination might be a worse separation of concerns. 🤷‍♀

Copy link
Member

@adleong adleong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice there are a bunch of things changing here in addition to the primary change of making Addr::Socket unresolvable:

  • Resolve renamed to Resolver
  • Debug no longer impl'd
  • Introduction of a new future type instead of having Resolution impl Future directly

I'm wondering if this can be accomplished with a smaller diff by simply changing the Resolving enum to

enum Resolving<R> {
  Name(NameAddr, R),
  Unresolvable,
}

This allows almost all of the existing logic to stay the same and allows us to continue with Future being impl'd for Resolution.

@hawkw
Copy link
Contributor

hawkw commented Jul 26, 2019

  • Debug no longer impl'd

Ah, yeah, I missed this --- can we put debug back @kleimkuhler?

@olix0r
Copy link
Member

olix0r commented Jul 26, 2019

(Why) is it necessary to change the naming scheme for this module?

Copy link
Member

@olix0r olix0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer that this diff be smaller and not rename types that need not be renamed. But the change seems fine to me, functionally.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
@kleimkuhler kleimkuhler force-pushed the kleimkuhler/unresolvable-addrs branch from 55a0222 to 0053f1e Compare July 26, 2019 18:12
@kleimkuhler
Copy link
Contributor Author

@adleong Thanks for the great insight about changing the Resolving variants to get a minimal diff for this fix! The diff is much smaller now.

I will open a new issue that addresses what @hawkw has pointed out about making some larger changes to this module that remove some complexity around wrapping the destination resolver.

@kleimkuhler
Copy link
Contributor Author

I have also confirmed that this fixes the issue with a local build of this branch injected into my local build of the control plane.

Copy link
Member

@adleong adleong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's beautiful 😭

@kleimkuhler kleimkuhler merged commit 9faa7ca into master Jul 26, 2019
@kleimkuhler kleimkuhler deleted the kleimkuhler/unresolvable-addrs branch July 26, 2019 20:49
kleimkuhler added a commit to linkerd/linkerd2 that referenced this pull request Jul 27, 2019
* Update h2 to v0.1.26
* Properly fall back in the dst_router (linkerd/linkerd2-proxy#291)

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
kleimkuhler added a commit to linkerd/linkerd2 that referenced this pull request Jul 30, 2019
### Summary

In order for Pods' tap servers to start authorizing tap clients, the tap
controller must open TLS connections so that it can identity itself to the
server.

This change introduces the use of `l5d-require-id` header on outbound tap
requests.

### Details

When tap requests are made by the tap controller, the `Authority` header is an
IP address. The proxy does not attempt to do service discovery on such requests
and therefore the connection is over plaintext. By introducing the
`l5d-require-id` header the proxy can require a server name on the connection.
This allows the tap controller to identity itself as the client making tap
requests. The name value for the header can be made from the Pod Spec and tap
request, so the change is rather minimal.

#### Proxy Changes

* Update h2 to v0.1.26
* Properly fall back in the dst_router (linkerd/linkerd2-proxy#291)

### Testing

Unit tests for the header have not been added mainly because [no test
infrastructure currently exists](https://github.com/linkerd/linkerd2/blob/065c2218588093831dc2f8f49516e85e4a71b896/controller/tap/server_test.go#L241) to mock proxy requests. After talking with
@siggy a little about this, it makes to do in a separate change at some point
when behavior like this cannot be reliably tested through integration tests
either.

Integration tests do test this well, and will continue to do once
linkerd/linkerd2-proxy#290 lands.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Aug 9, 2019
* Update h2 to v0.1.26
* Properly fall back in the dst_router (linkerd/linkerd2-proxy#291)
* Tap server authorizes clients when identity is expected (linkerd/linkerd2-proxy#290)
* update-rust-version: Check usage (linkerd/linkerd2-proxy#298)
* tap: fix tap response streams never ending (linkerd/linkerd2-proxy#299)
* Require identity on tap requests (linkerd/linkerd2-proxy#295)
* Authority label should reflect logical dst (linkerd/linkerd2-proxy#300)
* Replace futures_watch with tokio::sync::watch (linkerd/linkerd2-proxy#301)
* metrics: add `request_handle_us` histogram (linkerd/linkerd2-proxy#294)
* linkerd2-proxy: Adopt Rust 2018 (linkerd/linkerd2-proxy#302)
* Remove futures-mpsc-lossy (linkerd/linkerd2-proxy#305)
* Adopt std::convert::TryFrom (linkerd/linkerd2-proxy#304)
* lib: Rename directories to match crate names (linkerd/linkerd2-proxy#303)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants