TCP listener connection rebalancing#8422
Conversation
9a84d75 to
266abd1
Compare
This commit introduces optional connection rebalancing for TCP listeners, targeted as cases where there are a small number of long lived connections such as service mesh HTTP2/gRPC egress. Part of this change involved tracking connection counts at the per-listener level, which made it clear that we have quite a bit of tech debt in some of our interfaces in this area. I did various cleanups in service of this change which leave the connection handler / accept path in a cleaner state. Fixes #4602 Signed-off-by: Matt Klein <mklein@lyft.com>
266abd1 to
c1ca210
Compare
|
This is ready for review. @danzh2010 @jmarantz do you mind taking a first pass? |
|
/azp run envoy-macos |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
cc @oschaaf also if you want to take a look. |
Signed-off-by: Matt Klein <mklein@lyft.com>
jmarantz
left a comment
There was a problem hiding this comment.
flushing comments. looks really good so far.
jmarantz
left a comment
There was a problem hiding this comment.
So I have been through all the code and left some superficial comments. I have a very high level understanding of the change you are making here -- which sounds awesome -- and a very low level understanding of the changes in lines of code you've made. But I don't have a deep enough understanding of how this all works together.
We can to 3 ways here:
- I'm happy to learn the connection architecture more deeply but that will take some time.
- We can bring in another reviewer who has more familiarity the subsystems you are modifying to help as
well - We can take it on faith that these are good modifications to an already-robust system and go from there.
One medium-level comment: I see changes to the integration test infrastructure, and to unit tests, but I was thinking there might be a new integration test demonstrating the new dynamics.
danzh2010
left a comment
There was a problem hiding this comment.
Sorry for the delay. It took me a while to understand the whole workflow. But overall it's a very clear change.
|
So I wanted to share some thoughts. As I understand it, currently a connection/transaction is fully handled on a single thread. If we migrate sockets because of rebalancing, will we see transaction handling for |
I think @danzh2010 is an expert in this code so she can cover.
See integration_test.cc. I modified the stats test to actually do balancing. I also verified that without balancing that test does not work reliably over 1000 runs. |
We only do rebalancing at listener accept, so once a connection is accepted, it will stay on that worker forever and have similar characteristics as today. I don't foresee any performance implications beyond the (hopefully positive) ones that this change is targeted for in particular use cases. I also don't think this will have any effect on extensions as this happens before any extensions are run. |
Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
jmarantz
left a comment
There was a problem hiding this comment.
basically looks good modulo tiny code nits and requests for more comments,
Signed-off-by: Matt Klein <mklein@lyft.com>
|
@jmarantz @danzh2010 updated per comments. @jmarantz I tried to address your comment about the assert and lack of test. It's a little ugly but I think covers it. Let me know what you think. |
Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
|
@danzh2010 @jmarantz updated per comments to date. |
Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
|
/azp run envoy-macos |
|
Azure Pipelines successfully started running 1 pipeline(s). |
jmarantz
left a comment
There was a problem hiding this comment.
just a couple of nits; up to you if you want to apply fixes.
Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
This commit introduces optional connection rebalancing for TCP listeners, targeted as cases where there are a small number of long lived connections such as service mesh HTTP2/gRPC egress. Part of this change involved tracking connection counts at the per-listener level, which made it clear that we have quite a bit of tech debt in some of our interfaces in this area. I did various cleanups in service of this change which leave the connection handler / accept path in a cleaner state. Fixes envoyproxy#4602 Signed-off-by: Matt Klein <mklein@lyft.com>
|
@mattklein123 at the end of istio/istio#18152, I add a test result based on envoy proxy 1.14.1(istio1.6.1), but seems the balancing does not work, could you please help to check there is any configuration error or else I missed? thanks, |
This commit introduces optional connection rebalancing
for TCP listeners, targeted as cases where there are a
small number of long lived connections such as service
mesh HTTP2/gRPC egress.
Part of this change involved tracking connection counts
at the per-listener level, which made it clear that we
have quite a bit of tech debt in some of our interfaces
in this area. I did various cleanups in service of this
change which leave the connection handler / accept path
in a cleaner state.
Fixes #4602
Risk Level: Medium
Testing: Unit/integration/manual
Docs Changes: Added
Release Notes: Added