Conversation
This adds a wrapped connection pool whose purpose is to control how connection pools are allocated and how connections are mapped to them. These allocations and mappings depend on various criteria of the request, chosen by an abstracted mapping, when a stream is first handled. Signed-off-by: Kyle Larose <kyle@agilicus.com>
If src transparency is enabled on a cluster, we need to ensure that requests from the same IP go to the same connection pool. We do this to simplify the stream matching criteria. The existing ConnectionPools maintain a set of open connections to their upstream hosts. When a new request comes in, one of those is chosen without the connection being reestablished. This is normally fine. However, when src transparency is enabled, we expect the upstream connection's source IP, and maybe port, to match the source IP/port of the downstream connection. This means that we need logic to choose the upstream connection based on the request. This does not exist in the connection pools. Rather than add that complexity to the pools, we'll build selection logic into the wrapper so that pools are married to specific connection information as long as necessary. Then, the wrapper can handle the mapping logic the same way for all types of connection pools. We can even change the mapping logic without changing the pools! Magic! Signed-off-by: Kyle Larose <kyle@agilicus.com>
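The "pools married to specific connection information" idea can be sketched roughly as follows. This is a minimal illustration with hypothetical names (`ConnPool`, `SrcIpMapper` are stand-ins, not Envoy's actual classes): the mapper assigns at most one pool per downstream source IP, so every request from the same IP lands on the same pool, while the pools themselves stay unchanged.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for a real connection pool; it only records the source address
// it has been "married" to.
struct ConnPool {
  std::string bound_source;
};

// Hypothetical mapper: all selection logic lives here, outside the pools,
// so the mapping policy can change without touching any pool code.
class SrcIpMapper {
public:
  // Returns the pool for this downstream source IP, creating one on first use.
  ConnPool& assignPool(const std::string& src_ip) {
    auto it = pools_.find(src_ip);
    if (it == pools_.end()) {
      it = pools_.emplace(src_ip, ConnPool{src_ip}).first;
    }
    return it->second;
  }

  std::size_t poolCount() const { return pools_.size(); }

private:
  std::map<std::string, ConnPool> pools_;
};
```

Two requests from `10.0.0.1` resolve to the same pool object, while a request from a different IP gets its own pool.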
This class will eventually be responsible for mapping requests to connection pools. Currently it is just a stub until we flesh out the interface a bit more. Signed-off-by: Kyle Larose <kyle@agilicus.com>
We want to abstract creation of the mapper itself for a few reasons:
1) The complexity of creating it based off config/etc doesn't belong at
its creation point
2) It helps with test introspection.
Signed-off-by: Kyle Larose <kyle@agilicus.com>
Signed-off-by: Kyle Larose <kyle@agilicus.com>
This tests the connection pool factory by showing that it correctly creates the wrapped connection pool when a transparent pool is desired. Signed-off-by: Kyle Larose <kyle@agilicus.com>
We want to be able to pass the LoadBalancerContext to the connection pools in case they use it to partition the connection. So, create a new version of newStream that takes it as a parameter. We do this rather than changing the existing newStream for two reasons: 1) We may know that we don't want to use the LoadBalancerContext when calling newStream, so keep the old one around for that purpose 2) We don't need to change everywhere that uses the old method, since for all intents and purposes, the old calls to newStream wouldn't use LoadBalancerContext anyway! Signed-off-by: Kyle Larose <kyle@agilicus.com>
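The overload pattern described here can be sketched like this. These are illustrative stand-in types, not Envoy's real interfaces: the old zero-argument `newStream()` stays and forwards a null context, so existing call sites keep working, while context-aware pools can key partitioning off the new parameter.

```cpp
#include <cassert>
#include <string>

// Stand-in for Envoy's LoadBalancerContext; here it just carries one field
// a pool might partition on.
struct LoadBalancerContext {
  std::string downstream_source_ip;
};

class Pool {
public:
  // Existing entry point: unchanged call sites land here and get the old
  // behavior (no context).
  std::string newStream() { return newStream(nullptr); }

  // New entry point: a pool that partitions connections may consult the
  // context; pools that don't care simply ignore it.
  std::string newStream(const LoadBalancerContext* ctx) {
    return ctx != nullptr ? ctx->downstream_source_ip : "<no-context>";
  }
};
```

Callers that never cared about the context compile and behave exactly as before, which is what makes reason 2 above hold.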
We need access to the connection_mapper mock from multiple locations. Consequently, it makes sense for connection_mapper to be in include, not source. This refactors it out and updates the various locations that use it. Signed-off-by: Kyle Larose <kyle@agilicus.com>
This should speed up compilation for tests that use only these mocks. Signed-off-by: Kyle Larose <kyle@agilicus.com>
Starting to add concrete functionality using mock to test via the UT infrastructure. Lots more to do! Signed-off-by: Kyle Larose <kyle@agilicus.com>
Since the connection mapper knows when a pool has been newly assigned, it can register a single callback to let it know when the pool is fully drained. This allows us to tell anything that may have pending requests that they can try again. Contrast this with the WrappedConnectionPool. It doesn't know if a pool is newly assigned. So, it would have to register with every stream, which isn't really efficient. Signed-off-by: Kyle Larose <kyle@agilicus.com>
The WrappedConnectionPool needs to store the load balancer context used for a request when pushing a pending request since it needs to use it later when trying to assign the request again. Extend the ConnectionPoolImplBase to optionally track this information. Signed-off-by: Kyle Larose <kyle@agilicus.com>
Progress: we now have pending requests! They are processed when an idle callback is triggered from the mapper. We still need to do a bit more work here, because the sub-pool itself could also make the connection idle. In that case, we still need to propagate the cancellation callback. Signed-off-by: Kyle Larose <kyle@agilicus.com>
Handle the case where the stream was pending, then it became pending again when allocated to a pool. Signed-off-by: Kyle Larose <kyle@agilicus.com>
More work to minimize the size of headers in UTs to hopefully speed them up. Signed-off-by: Kyle Larose <kyle@agilicus.com>
We need to be able to handle failures to allocate pools/etc from the wrapped pool. We also need to be able to pass that information back up to our owner. Consequently, we need to wrap the ConnectionPool callbacks. This change adds that logic. Signed-off-by: Kyle Larose <kyle@agilicus.com>
If the initial call to newStream results in a pool, but calling newStream on it returns a Cancellable we need to make sure we wrap it so that we can properly cleanup on a call to cancel from the parent. Signed-off-by: Kyle Larose <kyle@agilicus.com>
If there are waiting connections, we shouldn't drain. Don't! Signed-off-by: Kyle Larose <kyle@agilicus.com>
The connection pool could immediately invoke a callback as part of creating a new stream. We need to make sure we properly clean up and pass forward the callback in both cases where we create the new stream: 1) When a pool is allocated on the first attempt 2) When a pool is allocated after first failing. Signed-off-by: Kyle Larose <kyle@agilicus.com>
PendingWrapper no longer just tracks pending requests. It tracks the entire 'stream' setup lifetime, so rename it to more appropriately capture this. Signed-off-by: Kyle Larose <kyle@agilicus.com>
When we are told to drain, we need to in turn tell each of the partitioned pools we control to drain. Do this through the mapper, which actually knows about all the pools. Further, use the mapper to determine whether or not all pools are idle so we can empty them out when checking for drained as part of the idle callback. Signed-off-by: Kyle Larose <kyle@agilicus.com>
We're going to be adding more wrapper code. Makes sense to put it into its own subdirectory so we can easily view it all. At the same time, fix up a UT that was failing due to some mock changes. Signed-off-by: Kyle Larose <kyle@agilicus.com>
Currently just very simple stuff. More to come! Signed-off-by: Kyle Larose <kyle@agilicus.com>
The network mocks header is quite large. I only need access to the connection mocks for my UT. So, split out the network mocks into their own header to decrease test compile times. Signed-off-by: Kyle Larose <kyle@agilicus.com>
We now map IPv4 addresses to pools. Next up, IPv6. Signed-off-by: Kyle Larose <kyle@agilicus.com>
Now with extra bits! Signed-off-by: Kyle Larose <kyle@agilicus.com>
We can now idle pools so that they will be reused for future requests, even if the address differs. Signed-off-by: Kyle Larose <kyle@agilicus.com>
A pool going idle is interesting, since it means that a request which previously failed may now be assigned. Issue callbacks to any interested parties when this happens. Signed-off-by: Kyle Larose <kyle@agilicus.com>
We can't transparently map the IP if there isn't a valid IP! Catch these cases and throw an exception when they occur. This may not be the best solution in the long term, but it's good enough for now. Signed-off-by: Kyle Larose <kyle@agilicus.com>
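A minimal sketch of this guard, using stand-in names (the real code works on Envoy address types, not strings): reject anything that doesn't parse as an IPv4 or IPv6 address, e.g. a unix domain socket path, before attempting a transparent mapping.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>

#include <cassert>
#include <stdexcept>
#include <string>

// Returns true iff the string parses as an IPv4 or IPv6 address.
bool isIpAddress(const std::string& s) {
  in_addr v4;
  in6_addr v6;
  return inet_pton(AF_INET, s.c_str(), &v4) == 1 ||
         inet_pton(AF_INET6, s.c_str(), &v6) == 1;
}

// Hypothetical guard mirroring the commit: throw if a transparent mapping
// is requested for a non-IP source.
void checkMappable(const std::string& src) {
  if (!isIpAddress(src)) {
    throw std::invalid_argument("src transparency requires a valid IP: " + src);
  }
}
```

As the commit message says, throwing may not be the best long-term answer, but it makes the invalid case loud instead of silently mapping garbage.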
Forgot to clear out the optional v4/v6 address in the PoolTracker. This could lead to us accidentally assigning a new pool for an IP that was already active. Signed-off-by: Kyle Larose <kyle@agilicus.com>
Minor cleanup for the PR Signed-off-by: Kyle Larose <kyle@agilicus.com>
The signature for SrcIpTransparentMapper's constructor's definition differed from its declaration. Make them match. Signed-off-by: Kyle Larose <kyle@agilicus.com>
|
@lizan can you take a first pass at this? |
|
Sure I will take first pass. @klarose definitely want to see 1. as a separate PR so it can be merged sooner. The gdoc in the description is 404, can you update? |
|
@lizan thanks for taking a look. I've fixed the link. Let me know if you still have issues. It's a relatively long document, covering more than the work I've done here. So, if you want to focus on this, just look at the design section near the end. If anything is unclear, or you'd like more of an explanation, let me know. I'll update the document/we can chat about it. As for 1. I'll try to get that up in the next few hours. |
|
@klarose @lizan Have you looked at the This functionality is currently used by Cilium to separate the upstream connection pools by the I have recently prototyped multiplexed transport on top of a normal TCP listener, where the original 5-tuple is carried in the beginning of each frame. I used a IMO this could be used to implement source transparency so that a Socket::Option instance would set the upstream Socket's localAddress (using So, it seems to me that a single-proxy version of source transparency could be implemented entirely using the existing extension mechanisms already built into Envoy. It could be one more listener filter in |
|
@jrajahalme That's certainly an interesting approach. I think it could work, as long as the pools are cleaned up fairly regularly. However, I have a few concerns with it that I think are addressed by the wrapped pool I have proposed:
On the surface, I think your proposed approach is nicer. It's simple, and it uses the existing mechanisms as you've suggested. However, I worry that it won't scale as well as we'd like, and that we may need to build something similar in the future anyway. What do the others think? I'm willing to revisit the design. |
|
@jrajahalme Out of curiosity, what are you using for the multiplexed transport? PROXY protocol? |
#4973 adds TransportSocketOptions for SNI with |
|
@lizan @jrajahalme Alright. You have me convinced. It's worth taking a look at. I'll see what I can do today to prove it out. |
|
Hmm. I'm not sure this will cover all cases I'm interested in, at least as a listener filter. I want to be able to use the downstream remote address set by x-forwarded-for, which is set in EDIT: Looks like a network read filter may be the answer. |
|
@klarose Http |
|
@klarose I've prototyped a multiplexed transport for kTLS, for which kernel-side eBPF program creates custom shim headers, basically just the 5-tuple and frame length in a binary representation. So, very similar to proxy protocol, but different :-) |
|
@jrajahalme Thanks for the info. My current plan is to create a network filter which uses the ReadFilterCallbacks to get access to the socket option list. It'll add the option which sets the transparency/mark/etc as well as the source IP of the downstream. Being a network filter, it can come after listener filters such as the proxy protocol filter, and the http connection manager (which sets the downstream from x-forwarded-for if it's trusted). |
|
@klarose Interesting, I never realized you can have network filters after the http connection manager, I've kind of assumed that it will consume the payload, but maybe it does not matter for setting metadata like this. |
While developing the source transparency feature, I became frustrated with both the compile time of some of the tests and the fact that changes in seemingly unrelated headers caused many tests to recompile. Much of this was caused by the monolithic mocks headers we use. Since they include everything for a high-level component, a change to any of the many headers that component depends on can trigger a recompilation. Further, simply including that much slows down compilation. As an example, I found that splitting out the connection pool mocks decreased my unit test compile time from 13 seconds to 8 seconds. While not a drastic speedup, 5 seconds per compile is still quite a bit if you're doing TDD. You want feedback as quickly as possible! This PR was split out of #5255 with some minor changes to remove code that only applied in that PR. Risk Level: Low. This is just testing headers. Testing: Ran the tests multiple times. Docs Changes: N/A Release Notes: N/A Signed-off-by: Kyle Larose <kyle@agilicus.com>
|
@jrajahalme Yeah, I think the filters work by doing some work up front when the connection is established, on all filters, then they proceed to do work with the data. I'll make these changes at connection establishment time. We'll see if it works. :) |
|
@jrajahalme Looking at adding the new socket option to do this... Unfortunately, Ways around this I can think of offhand?
What do you think? I prefer 1, even though it's a little gross to pollute an interface with knowledge of its subclasses... That's the visitor pattern for you, though... I guess it's not really the interface we're polluting, since Option is a separate class. It just happens to be nested in Socket. |
|
It looks like createClientConnection doesn't use the sourceAddress of the connection. Should be simple to change, though. sourceAddress comes from the clusterInfo. The pre-bind options are set just before this. So, I could check whether the local address is set; if it is, I'll use it to override the cluster's source address. |
|
@klarose I had used a dynamic_cast in the prototype, but did not remember that fact, sorry. I'd also want to see a non-null value returned by Socket::localAddress() override the cluster's source address setting. |
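The dynamic_cast approach mentioned here can be sketched as follows. The type names are hypothetical, not Envoy's real `Socket::Option` hierarchy: a concrete option subclass carries the source address, and a consumer downcasts the generic option to discover it, instead of widening the base interface.

```cpp
#include <cassert>
#include <string>

// Stand-in for the abstract socket option base class.
struct Option {
  virtual ~Option() = default;
};

// Hypothetical concrete option carrying the downstream source address.
struct SourceAddressOption : public Option {
  explicit SourceAddressOption(std::string addr) : address(std::move(addr)) {}
  std::string address;
};

// Returns the source address if this option carries one, empty otherwise.
// The downcast keeps the base Option interface free of subclass knowledge,
// at the cost of RTTI and a slightly "gross" cast, as discussed above.
std::string sourceAddressOf(const Option& opt) {
  if (const auto* src = dynamic_cast<const SourceAddressOption*>(&opt)) {
    return src->address;
  }
  return "";
}
```

The trade-off is exactly the one debated in the thread: option 1 pollutes the interface but is cast-free; the dynamic_cast keeps the interface clean but ties the consumer to a concrete subclass.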
|
I have it working for PROXY protocol. However, it doesn't work when I want to consume the XFF. This makes sense, now that I think about it -- the XFF is only set once all the http headers have been parsed, and that's too late. So, I think I need to make two filters: a listener filter that sets the source address at connection establishment time, and an http decoder filter that consumes the XFF once the headers have been parsed.
The listener filter can be used with ProxyProtocol, for tcp or http. The http decoder can only be used for http. Sad, because it's more work than I anticipated, and I need to do some refactoring, but that's life I guess. :/ I'll submit a PR for the listener one, so we can take a look, then get on to implementing the http one. |
|
Cancelling this as it has been subsumed by the work for another approach (#5337) |
Description:
This PR contains the bulk of the IP Src Transparency work for #4128. It is missing one piece to tie it together so that it may be enabled. I'll integrate it into this PR, or submit a new PR, when #5035 is merged.
This PR can be viewed as three major chunks of work:
The PR as it stands is very large. I'm willing to break it up if the reviewers so desire. Just let me know how you'd like me to go about doing it. We'd likely need to do the merges in the order I've provided above, though number 2 won't stand alone as described, since it itself requires a change or two to some public interfaces within envoy.
For the general approach, see Google Docs. While this does not 100% follow that document, the general approach matches.
In a nutshell:
We create a new type of ConnectionPool which wraps the Http connection pools. This connection pool uses abstract mapping logic to map a new stream request to one of many connection pools. The pools themselves are told at mapping time how to connect to their upstream, via a method added to the high-level connection pool interface.
Connection pools are recycled into an "idle" list when all upstream connections on the pool have been drained. When a new stream is assigned, we try to reuse an idle pool first. If there are none, we allocate a new pool, up to a limit. Multiple streams may be assigned to a pool if they are mapped to it concurrently.
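The recycling scheme above can be sketched with a small counter-based model. This is illustrative only (the real implementation tracks actual pool objects and pending requests): drained pools move to an idle list, new assignments reuse an idle pool before allocating a fresh one, and past the limit the caller must queue the request as pending.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical allocator modeling the idle-list recycling described above.
class PoolAllocator {
public:
  explicit PoolAllocator(std::size_t limit) : limit_(limit) {}

  // True if a pool was assigned: idle reuse first, then a new allocation
  // up to the limit. False means the caller queues the request as pending.
  bool acquire() {
    if (idle_ > 0) {
      --idle_;
      ++active_;
      return true;
    }
    if (active_ + idle_ < limit_) {
      ++active_;
      return true;
    }
    return false;
  }

  // Called when all of a pool's upstream connections have drained; the pool
  // becomes reusable for future requests, even for a different address.
  void recycle() {
    --active_;
    ++idle_;
  }

  std::size_t idleCount() const { return idle_; }
  std::size_t activeCount() const { return active_; }

private:
  std::size_t active_ = 0;
  std::size_t idle_ = 0;
  const std::size_t limit_;
};
```

A failed `acquire()` is exactly the point where the wrapped pool's pending-request machinery (and the mapper's idle callback) takes over: when a pool later drains, pending requests get another chance.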
General TODOs and notes:
Risk Level: Medium
This is a large change, but it cannot yet be enabled.
Testing:
Docs Changes:
I need to document the feature:
I'll do this in a future PR.
Release Notes:
This will need a release note so people are aware of the feature. Doesn't make sense just yet though.