listener: add support for multiple filter chains.#3217
listener: add support for multiple filter chains.#3217ggreenway merged 5 commits intoenvoyproxy:masterfrom
Conversation
|
This is still WIP (but code complete, and works fine with manual testing), split into 4 commits for easier review. Sending this early to get feedback on the interface changes (mostly from @mattklein123). |
include/envoy/network/filter.h
Outdated
| class Connection; | ||
| class ConnectionSocket; | ||
| class TransportSocket; | ||
| typedef std::unique_ptr<TransportSocket> TransportSocketPtr; |
There was a problem hiding this comment.
Just include transport socket here? If you use unique_ptr, this is going to force files including this file to include transport_socket as well.
There was a problem hiding this comment.
Unfortunately, including transport_socket.h here introduces circular dependency.
There was a problem hiding this comment.
Note that if FilterChain would be defined somewhere in include/envoy/server/... (i.e. Server::Configuration:: namespace) instead, then we wouldn't need to declare TransportSocketPtr and NetworkFilterFactoryCb here.
It felt a bit weird defining FilterChain outside of Network:: namespace, so it's here now, but I don't mind moving it into Server::Configuration:: namespace if you think that's better.
There was a problem hiding this comment.
I think you can change transport_socket.h a bit to avoid circular dependency by forward declaring Network::Connection there. Since it only use reference to it (which doesn't force includes).
ggreenway
left a comment
There was a problem hiding this comment.
A few comments to start, mostly from a high-level. I still need to spend more time going through the code in detail.
| google.protobuf.UInt32Value destination_port = 8; | ||
|
|
||
| // TODO | ||
| string transport_protocol = 9; |
There was a problem hiding this comment.
Should sni_domains (not part of this PR) be renamed to something more generic, so that if we eventually sniff the Host header from plaintext http it can match on that?
There was a problem hiding this comment.
Yes! It's been annoying me for a while, but the proto is frozen, and it's not clear whether it's worth adding a new field and deprecating sni_domains just yet.
| if (filter_chain == nullptr) { | ||
| ENVOY_LOG_TO_LOGGER(parent_.logger_, debug, | ||
| "closing connection: no matching filter chain found"); | ||
| socket->close(); |
There was a problem hiding this comment.
Please increment a stat in this case
| } | ||
|
|
||
| // Automatically inject TLS Inspector if it wasn't configured explicitly and it's needed. | ||
| if (install_tls_inspector) { |
There was a problem hiding this comment.
This makes the system easier to use, but at the possible cost of flexibility. Envoy has a pluggable filter system, but this is hard-coding that a particular filter is added automatically in this case. Do we care to support someone providing their own filter that provides the same information as tls_inspector? Given that most/all configs are machine-generated, should we just make people manually add the tls_inspector filter to their config? I'm torn on this; what does the group think?
There was a problem hiding this comment.
I actually don't like this code too much, and I agree that it should be explicitly configured, so I'm happy to remove it (which is one of the reasons why it was split into separate commit).
The downside of removing it, however, is that transport protocol and SNI-based filter chain matching won't work at all without it, and worse, traffic might be incorrectly routed to a filter chain with empty SNI and/or wrong transport protocol.
There was a problem hiding this comment.
What if we log a warning anytime a filterchainmatch specifies a transport protocol, but the detectedTransportProtocol hasn't been set? And then revert the commit that automatically adds in this filter?
There was a problem hiding this comment.
With the current implementation of TLS sniffer, we have (and I should have caught this earlier, sorry!):
- TLS sniffer: set
"tls"or set"raw_buffer". - 2nd sniffer: set
"dummy"or set"raw_buffer".
However, this clearly doesn't work with more than one sniffer, since 2nd sniffer would overwrite value set by TLS sniffer, therefore we need:
- TLS sniffer: set
"tls"or do nothing. - 2nd sniffer: set
"dummy"or do nothing. - listener: set
"raw_buffer"if nothing was detected.
See PR #3260.
We could emit warning in 3., but after above change empty detectedTransportProtocol() is a valid state (i.e. no transport protocol was detected).
Also, I'd really like to be able to detect this at config time, and not at runtime, but nothing obvious comes to mind... Perhaps listener filters could indicate which transport protocols they can detect?
There was a problem hiding this comment.
(same is true for missing SNI, btw.)
There was a problem hiding this comment.
I think my general opinion here is to just inject it, as we do similar things in other places. If in the future someone really wants to not use TLS inspector we can make injection configurable. I would definitely add some type of warning though if we do inject telling people what we did, and make it more obvious if the config then fails to load because TLS inspector was not compiled/linked.
There was a problem hiding this comment.
I left the auto-inject, but removed "downstream send first" guard, since it wouldn't allow mixing with filter chains with custom transport sockets.
| } | ||
|
|
||
| const Network::FilterChainSharedPtr | ||
| ListenerImpl::findFilterChain(const std::string& transport_protocol_name, |
There was a problem hiding this comment.
The documentation says "The FilterChain with the most specific FilterChainMatch criteria is used on a connection."
I think this is too imprecise to be useful. What exactly does "most specific" mean? Given that every field except sni_domains is marked not-implemented-hide, I think this algorithm is fine. But as soon as we un-hide any of the other fields, we'll have to revisit this, and update the documentation to specify how it works.
FWIW, this is the matching algorithm that I want/need for my use case: hash-table lookup by SNI, as opposed to O(n) walking a list looking for matches.
There was a problem hiding this comment.
Yes, we'll need to revisit this sooner rather than later (i.e. I need to add filter chain matching based on the destination IP:port to get ride of the bind_to_port setting), but I think we can iterate on it as we unhide features, instead of blocking this PR on that.
| std::vector<Network::NetworkFilterFactoryCb> filters_factory); | ||
| static bool isWildcardServerName(const std::string& name); | ||
|
|
||
| std::unordered_map<std::string, std::unordered_map<std::string, Network::FilterChainSharedPtr>> |
There was a problem hiding this comment.
Please comment on what the two strings are in the map keys. I think it's transport-socket-type and SNI value.
There was a problem hiding this comment.
Done.
Also, as I was writing comments and optimizing the code, I've started questioning my previous choice of having 2 separate maps (one for exact server names and one for the wildcard domains), so I merged them together.
48e1ecb to
9a6f285
Compare
| google.protobuf.UInt32Value destination_port = 8; | ||
|
|
||
| // TODO | ||
| string transport_protocol = 9; |
There was a problem hiding this comment.
I think this should be an enum, not a string.
There was a problem hiding this comment.
I disagree. Enum won't work with custom transport protocols (e.g. Istio's ALTS). cc @lizan
There was a problem hiding this comment.
In other places in the config, there is an enum for the builtin types, and then a 2nd string field for custom types. I'm not sure if that was always the desired model, or if that happened via evolution. I don't feel very strongly either way about enum vs string, just trying to aim for consistency. Any maintainers know the "preferred" style for something like this?
There was a problem hiding this comment.
I don't have a strong preference here either. We do string for ALPN already so IMO it's not a big deal to do string only.
There was a problem hiding this comment.
Ok, I'm fine with leaving it as a string; just document the valid values.
include/envoy/network/filter.h
Outdated
| */ | ||
| virtual bool createNetworkFilterChain(Connection& connection) PURE; | ||
| virtual bool | ||
| createNetworkFilterChain(Connection& connection, |
There was a problem hiding this comment.
This list of params will keep growing as we implement all the filter matches. Should we pass the ConnectionSocket here instead?
There was a problem hiding this comment.
I'm fine changing this to ConnectionSocket, but please note that (a) I don't envision more parameters being added here (they would be added to the findFilterChain() function), (b) createNetworkFilterChain() is part of the Network::FilterChainFactory interface that's also implemented by upstreams, and it appears that it's used there (but I cannot find where it's being called from, to be honest, so maybe that's a dead code).
Thoughts? Should I change it here and remove from upstreams or leave it as-is?
There was a problem hiding this comment.
Sorry, commented on the wrong line. I meant this comment for findFilterChain().
source/server/http/admin.cc
Outdated
| bool AdminImpl::createNetworkFilterChain( | ||
| Network::Connection& connection, | ||
| const std::vector<Network::NetworkFilterFactoryCb>& filter_factories) { | ||
| UNREFERENCED_PARAMETER(filter_factories); |
There was a problem hiding this comment.
nit: remove UNREFERENCED_PARAMETER, and lave this un-named in the function parameters
source/server/http/admin.h
Outdated
| const Network::FilterChainSharedPtr | ||
| findFilterChain(const std::string& transport_socket_name, | ||
| const std::string& server_name) const override { | ||
| UNREFERENCED_PARAMETER(transport_socket_name); |
There was a problem hiding this comment.
nit: remove UNREFERENCED_PARAMETER
ggreenway
left a comment
There was a problem hiding this comment.
I'm pretty happy with the overall direction of this change. I really like how this simplified the ssl_context.
There's still some tests failing, and we'll need to add coverage for the new code.
@envoyproxy/maintainers Anyone else want to comment on the overall direction, structure, architecture of this?
|
@PiotrSikora @ggreenway I will take a look tomorrow. |
mattklein123
left a comment
There was a problem hiding this comment.
In general I love the direction of this change. I agree with @ggreenway that it simplifies a lot of stuff and I think makes the overall system a lot more well structured. I didn't really focus on the detailed code that much. I'm happy to take a much more detailed pass through once it's fully ready.
One suggestion, you might want to just do a dedicated PR now with some of the namespace changes. It will make the final PR easier to review.
| google.protobuf.UInt32Value destination_port = 8; | ||
|
|
||
| // TODO | ||
| string transport_protocol = 9; |
There was a problem hiding this comment.
I don't have a strong preference here either. We do string for ALPN already so IMO it's not a big deal to do string only.
include/envoy/network/filter.h
Outdated
| * TODO | ||
| */ | ||
| virtual const FilterChainSharedPtr findFilterChain(const std::string& transport_socket_name, | ||
| const std::string& server_name) const PURE; |
There was a problem hiding this comment.
What does "server_name" mean here? SNI? Is this related to @ggreenway comment above about potentially allowing sniffing plaintext HTTP host? I think it's probably fine to continue calling this "server_name" but it would be good to add some good docs/comments on example ways this can be populated in the real world.
| } | ||
|
|
||
| // Automatically inject TLS Inspector if it wasn't configured explicitly and it's needed. | ||
| if (install_tls_inspector) { |
There was a problem hiding this comment.
I think my general opinion here is to just inject it, as we do similar things in other places. If in the future someone really wants to not use TLS inspector we can make injection configurable. I would definitely add some type of warning though if we do inject telling people what we did, and make it more obvious if the config then fails to load because TLS inspector was not compiled/linked.
There was a problem hiding this comment.
I think in general, you can imagine listener filters setting arbitrary key/value and wanting to match here. In that sense, {transport_protocol=tls, sni_domains=[foo.com]} is one example. Does it make sense to generalize to a Struct match?
There was a problem hiding this comment.
This is a bit more complex than a simple key-value matching, i.e. options covered in this PR, and the one I'm going to do next (destination IP:port), are matching on:
- transport protocol,
- server names / sni domains,
- destination IP:port.
Those are not equal, and we need to filter on destination IP:port first, then on server names (or use catch-all), then on transport protocol (or use catch-all).
...and even that isn't totally obvious, i.e. should {requestedServerName=www.example.com, detectedTransportProtocol=tls} be matched against {sni_domains=[www.example.com],transport_protocol=} or {sni_domains=,transport_protocol=tls}?
I'm in favor of the former, since it better matches "virtual hosts" concept, but I could see the arguments being made for the either of those.
There was a problem hiding this comment.
Yeah, I agree that the full logic can't be captured as a simple key/value match. I'm curious though if you think it makes sense to allow listener filters more expressive power than just a string match on protocol here? We could punt this to future work with a TODO if it makes sense.
htuch
left a comment
There was a problem hiding this comment.
At a high-level, I think the direction in this PR is great. I have one API level question and will do a more detailed review next week.
htuch
left a comment
There was a problem hiding this comment.
This is really fantastic, thanks for making this happen @PiotrSikora.
There was a problem hiding this comment.
Can you fix the build so that we can generate a coverage run?
There was a problem hiding this comment.
Sigh, sorry for the delay, I've got stuck trying to figure out why mocking findFilterChain() results in socket being destroyed... Ultimately, I gave up and implemented it in the tests instead.
There was a problem hiding this comment.
Should we bump a stat or something else that could be monitored? Seems a debug log here is effectively a silent fail in production. I realize this is existing behavior, so a TODO is fine as well.
There was a problem hiding this comment.
Do you think we have adequate documentation on filter chain match precedence? E.g. if we have a CIDR match, do we consider the most specific filter chain the winner, or the first in the list of filter chains?
There was a problem hiding this comment.
CIDR match isn't implemented yet, but the idea is that the most specific match wins.
There was a problem hiding this comment.
Sure. I think if you can add some comments on what the end goal behavior is in the docs or API it would be great.
There was a problem hiding this comment.
I wasn't involved in the initial reviews for this feature, but I'm wondering about the tradeoff here between making this explicit as behavior the user should guarantee if they want the feature vs. implicit manipulation of the filter chain?
There was a problem hiding this comment.
Well, without TLS inspector, sni_domains and transport_protocol requirements are never going to be matched, so we can either:
- Automatically inject TLS Inspector when
sni_domainsortransport_protocolare specified (this is the behavior right now). - Reject configuration and emit error when TLS Inspector is needed (i.e. when
sni_domainsortransport_protocolare specified), but isn't configured. - Do nothing and let user figure out why things like SNI don't work.
I've heard the "config is machine generated" argument enough times to not care anymore, so let me know which one you prefer and I'm going to change the code if necessary.
There was a problem hiding this comment.
I'm fine with either (1) or (2). Definitely not (3) please. @ggreenway @htuch opinions?
There was a problem hiding this comment.
Prefer (2), since it's least surprising. Alternatively, make the TLS inspector impossible to configure by hand and then (1) is fine as well. I would find it surprising as a user if sometimes filters appear by automatic insertion and other times I'm expected or allowed to put them there by hand (or machine generation).
There was a problem hiding this comment.
If choosing between (1) and (2) I also have a mild preference for (2) so let's just go with that with a very clear error message on what the user needs to configure.
There was a problem hiding this comment.
One thing to note about (2) is that once people start adding custom transport protocols and sniffers, we might end up with config like:
filter_chain_match:
- sni_domains: "example.com"
- transport_protocol: "bla"
which we would reject without TLS inspector, but the user don't really need it here...
In theory we could limit exception only to cases when transport_protocol: "tls", but then we wouldn't reject configurations without TLS inspector that list only sni_domains as their criteria.
Not sure if we want to deal with this now, though...
There was a problem hiding this comment.
I think (2) is better than (1). But to allow custom transport protocols with custom listener filters, we'd need a way for either to filter to say that it provides this connection data, or a way to turn off the error, or something similar.
I think (1) is a bad idea because it's very hard to change in the future, because it could make a previously-working configuration no longer working if we remove it.
With (2), if we change/remove it, we're just being less-strict in validation, so that would be an easier change to make in the future.
There was a problem hiding this comment.
(2) done, with a check that should work for custom transport protocols as well.
There was a problem hiding this comment.
I think it would be great to document in user facing documentation somewhere what the rules regarding order are here.
There was a problem hiding this comment.
Agreed, I'll write something up tomorrow.
htuch
left a comment
There was a problem hiding this comment.
Also, can you add release notes as this is user facing behavior?
|
@PiotrSikora can you ping me after you have gone though @htuch comments? I would like to take a more complete pass also. |
|
This is almost complete (missing docs & release notes), with automatic TLS Inspector injection being something that needs to be decided. Also, I disabled ( |
|
Thanks @PiotrSikora will spend some dedicated time on this tomorrow. |
|
I rewrote the old The only outstanding question is what should we do about the automatic injection of TLS Inspector, the options are:
@ggreenway @htuch @mattklein123 please take a look. Also, it seems that I broke the DCO bot, because my merge commits are not singed... any way to fix it? |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks @PiotrSikora this is totally awesome, really love seeing this come together from an API/code perspective. So many hacks removed. Few small comments. I didn't look at the tests carefully. Hopefully @ggreenway can do that.
api/envoy/api/v2/lds.proto
Outdated
There was a problem hiding this comment.
Can we still link to the FAQ from somewhere relevant (or multiple places)? This is a super common question and it's still going to confuse people so would like to maximize cross linking.
There was a problem hiding this comment.
Please check rendered docs. I think this might not render correctly as a bullet list.
include/envoy/network/filter.h
Outdated
There was a problem hiding this comment.
nit: Could this be ConnectionSocketConstPtr ?
Also, is there a reason to return a shared pointer here? From my read of the code we never actually persistently store the result. Could we just return a const FilterChain* and avoid the shared_ptr inc/dec overhead?
There was a problem hiding this comment.
wrt ConnectionSocketConstPtr - do you mean const unique_ptr<ConnectionSocket> (i.e. const ConnectionSocketPtr) or unique_ptr<const ConnectionSocket>?
wrt FilterChainSharedPtr - there is no reason why we couldn't return const FilterChain*, but I thought the whole idea behind unique_ptr / shared_ptr is to avoid passing raw pointers around.
There was a problem hiding this comment.
I meant unique_ptr<const ConnectionSocket>
For FilterChainSharedPtr raw pointer is fine in a case like this where we are just referencing constant data that might be nullptr.
There was a problem hiding this comment.
Erm, but we cannot safely create ConnectionSocketConstPtr from ConnectionSocketPtr without move... and few lines below, we need ConnectionSocketPtr to pass into createServerConnection to create ConnectionImpl, which means that we would need to move twice.
Per @lizan's suggestion, perhaps we should pass const ConnectionSocket* here instead?
There was a problem hiding this comment.
I ended up with const FilterChain* findFilterChain(const ConnectionSocket* socket) const.
source/server/http/admin.h
Outdated
There was a problem hiding this comment.
Can we do this via proto constraint? Do we need the check here?
There was a problem hiding this comment.
Actually, it looks that it's already enforced by the proto constraint.
I removed the check and test for it.
There was a problem hiding this comment.
I'm fine with either (1) or (2). Definitely not (3) please. @ggreenway @htuch opinions?
|
@PiotrSikora re: DCO bot, I think it ignores merge commits, so not sure what the problem is. You could do interactive rebase and just fix the offending commits, then force push. Or better, I would just get final sign off from everyone then we can squash/rebace for push and merge. |
|
@mattklein123 I found one "revert" commit without |
htuch
left a comment
There was a problem hiding this comment.
Implementation and API docs look really great, a couple of comments. Would be nice to see coverage run when taking a look at tests.
There was a problem hiding this comment.
What about when we have CIDR and SNI, what order do we resolve with? In general, it would be helpful to clarify where we want to go in the end state of the API for all the match criteria and precedence.
There was a problem hiding this comment.
I've been thinking about this a fair bit. I think the precedence should be configurable. I can think of a few ways that might be useful; I'm not sure yet what the right answer is.
One option is to let the user provide a list for the field precedence order. For instance, [SNI, source-ip, transport protocol]. Another possible matching type is the entire list of matches as an ordered list, with first-match-wins. I'm sure there's other options as well. I would be surprised if we can come up with a one-size-fits-all matching algorithm with this many potential criteria.
I think this can all be solved in a later PR.
There was a problem hiding this comment.
Sure, I can see this being a useful thing to add later. So, for now, as long as we have precedence for the fields that are actually implemented, and a TODO for the rest I'd be happy.
There was a problem hiding this comment.
As mentioned earlier (I think?), the destination IP would take precedence over SNI, over transport protocol. I didn't consider source IPs.
I could see having filter_chain_match_precedence: ["server_names", "source_ranges", "transport_protocol"] as a way of controlling precedence, but it doesn't seem that this is something required by most of the use cases, so I'm going to let someone that has an use case for it work on this.
As for the "end goal" docs - I'm under the impression that the API and docs document current state, and not future vision, so I don't think the "end goal" should be added here, especially since things change during implementation, and I wouldn't want to promise stuff that won't be eventually implemented.
There was a problem hiding this comment.
This isn't quite correct. The user facing API docs should reflect the current state. We do explicit hidden TODOs in various places to signal to developers and those interested in understanding future directions where we want to head, e.g.
.Please do add appropriate TODOs in that style and preferably open an issue to track future work here. That's all that's being asked at this point.
There was a problem hiding this comment.
Added TODOs about few upcoming rules and the precedence we discussed here (I'll open separate issue to track it).
There was a problem hiding this comment.
I intentionally didn't add it here, since this can only happen if filter chain is intentionally configured without filters (i.e. blackhole), therefore I don't see a value in stat here.
But feel free to disagree, and I'll add one.
There was a problem hiding this comment.
I don't understand why this is removed. Is this handled elsewhere now, or we just want to let the counter for no matching filter chain go up, or is there some different semantic in place now for this case?
There was a problem hiding this comment.
This is enforced by proto constraint already, so the check was useless:
repeated listener.FilterChain filter_chains = 3
[(validate.rules).repeated .min_items = 1, (gogoproto.nullable) = false];
include/envoy/network/filter.h
Outdated
There was a problem hiding this comment.
nit: Sorry on the param stuff, but can it be const ConnectionSocket& socket? It can't be null right?
include/envoy/network/filter.h
Outdated
There was a problem hiding this comment.
nit: update return param type
There was a problem hiding this comment.
I think you can lose the const_cast here when you switch to const reference passing? (In general const_cast is an anti-pattern)
There was a problem hiding this comment.
If choosing between (1) and (2) I also have a mild preference for (2) so let's just go with that with a very clear error message on what the user needs to configure.
include/envoy/network/filter.h
Outdated
There was a problem hiding this comment.
You're copying two method from TransportSocketFactory, why don't just have an accessor returns a const TransportSocketFactory&, like Cluster does?
There was a problem hiding this comment.
To be honest, I don't see much value in exposing implementation details / member classes, since it makes it harder to make changes in the future.
What would be benefit of returning TransportSocketFactory directly here? Calls to those 2 methods are done in separate places, so it wouldn't save any calls.
There was a problem hiding this comment.
It is not exposing implementation details / member classes, the TransportSocketFactory is a pure interface that could be returned here. Just to be more consistent with cluster, also avoid copy and paste same interface around.
There was a problem hiding this comment.
Fair enough, changed.
|
@PiotrSikora this is great, thanks, LGTM. If you can just add the TODO and fix DCO, we can ship it. |
|
@ggreenway @mattklein123 @htuch @lizan please take last look at the interface changes in 5256e84. If it's fine with you, I'm going to push squashed commit to fix the DCO bot. |
*Risk Level*: Medium *Testing*: bazel test //test/... *Docs Changes*: Added *Release Notes*: Added Fixes envoyproxy#1843. Signed-off-by: Piotr Sikora <piotrsikora@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
LGTM. Let's squash/rebase!
5256e84 to
b24190c
Compare
|
Updated description to indicate that it also fixes #1280. |
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Signed-off-by: Piotr Sikora <piotrsikora@google.com>
@htuch can't get this PR to open without unicorn, but he approved via slack.
Signed-off-by: Greg Greenway <ggreenway@apple.com>
|
Thanks for fixing conflicts, @ggreenway! I didn't expect those 2 PRs to be merged in that order. |
Risk Level: Medium
Testing: bazel test //test/...
Docs Changes: Added
Release Notes: Added
Fixes #1280, #1843.
Signed-off-by: Piotr Sikora piotrsikora@google.com