Skip to content

api: add envoy internal address#12837

Merged
mattklein123 merged 21 commits intoenvoyproxy:masterfrom
lambdai:eiaddr
Sep 11, 2020
Merged

api: add envoy internal address#12837
mattklein123 merged 21 commits intoenvoyproxy:masterfrom
lambdai:eiaddr

Conversation

@lambdai
Copy link
Copy Markdown
Contributor

@lambdai lambdai commented Aug 27, 2020

Commit Message:

Add envoy internal address for envoy internal connection.

Signed-off-by: Yuchen Dai silentdai@gmail.com

Additional Description:
Part of #11725

Risk Level: low (code unused)
Testing: new unit tests
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #12837 was opened by lambdai.

see: more, trace.

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@lambdai
Copy link
Copy Markdown
Contributor Author

lambdai commented Sep 2, 2020

Chatted offline: I want to clarify that

  1. This internal address is used by Envoy cluster to locate the virtual listener(or internal listener).
    Dispatcher.registerPipeFactory(InternalAddress virtual_listener_address, ListenerCallbacks callbacks);
  2. A natural requirement is that the virtual listener need to find the original address(as well as other socket options).
    Conceptually, the virtual connection should have those attributes
    internal_address: listener_9000, typed InternalAddressInstance
    internal_peer_address:  no prototype yet but C++ type is InternalAddressInstance
    original_L4_address: 10.0.0.1:443, typed Ipv4AddressInstance
    original_peer_L4_address: 10.0.0.2:12345: typed Ipv4AddressInstance

These connection attributes will be gradually added in the further PRs so that virtual address could support various of network filters, listener filters

Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - just a few minor nits on my end.

const Ip* ip() const override { return nullptr; }
const Pipe* pipe() const override { return nullptr; }
const EnvoyInternalAddress* envoyInternalAddress() const override { return &internal_address_; }
const sockaddr* sockAddr() const override { NOT_IMPLEMENTED_GCOVR_EXCL_LINE; }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mildly concerned that a panic() isn't the right call here. Can you take a peek at call sites and see what we should do here to make sure these can never be called for the new address type?

We should probably also update the comments in the include directory to make it clear when it's safe to call and when it's not.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With NOT_IMPLEMENTED_GCOVR_EXCL_LINE it means
If the later version of control plane send the config containing EnvoyInternalAddress,
this particular version with this PR but no further of envoy might crash.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I can do is to find the potential usage of
this new type of address.
Listener and Endpoint as a immediate candidate, making sure these are either
invalidate config per config update, mostly in master thread.
or fail the connection when this address is used.

What else can I do?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we are making implementation of parts of the interface optional (panic if called), we either need to have TODOs to implement later or TODOs to check all call sites to make them internal-address compatible. Having bits of an interface which may work for some address and may not work for others, and no checks in the code base makes me uncomfortable. Again cc @mattklein123 in case he has other ideas of how to aim the decaped TCP stream back into a new envoy filter chain without a half-working address class.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to return nullptr and I mentally go through all the sockAddr() usage in the existing code base: all of them accepts nullptr usage without crash.
Luckily these usages are inline followed by syscall (connect/bind/send/recv) so I could inject syscall error code as if the following syscalls fail.
Now I believe these errors are now non-fatal.

Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also can you make sure to fill out the rest of the template for your PRs?
for this one I'd say something like

Risk Level: low (code unused)
Testing: new unit tests
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@lambdai
Copy link
Copy Markdown
Contributor Author

lambdai commented Sep 3, 2020

For internal address as server address: I add the validation that all listener should reference internal address.
Such listener config will fails the addOrUpdateListener().

Internal address as client address is much harder.
Theoretically I can validate any internal address is not used in endpoint. However, we will allow it in the future.
And we need to fail the connections anyway.
I find that Envoy never fail fast at connection construction. I am not sure if this is good though.
I switch to provide a NullIoSocketIoHandler which always fails at sock related OS call. I assume the upper
connection should eventually close the connection given this NullIoSocket.
This client side efforts is WIP

I am more than happy to hear the alternative approaches from you @alyssawilk Thank you in advance!

Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check CI, it looks like it is not passing.

const Ip* ip() const override { return nullptr; }
const Pipe* pipe() const override { return nullptr; }
const EnvoyInternalAddress* envoyInternalAddress() const override { return &internal_address_; }
const sockaddr* sockAddr() const override { NOT_IMPLEMENTED_GCOVR_EXCL_LINE; }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we are making implementation of parts of the interface optional (panic if called), we either need to have TODOs to implement later or TODOs to check all call sites to make them internal-address compatible. Having bits of an interface which may work for some address and may not work for others, and no checks in the code base makes me uncomfortable. Again cc @mattklein123 in case he has other ideas of how to aim the decaped TCP stream back into a new envoy filter chain without a half-working address class.

const Address::InstanceConstSharedPtr addr) {
// Cannot create IoHandle for internal address yet.
if (addr->envoyInternalAddress()) {
return nullptr;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this stands, there should be a unit test please.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are so right! My brain stop working last night so I pushed before I save this change.

The latest update has an integration test connecting to a internal address

static inline IoHandlePtr ioHandleForAddr(Socket::Type type,
const Address::InstanceConstSharedPtr addr) {
// Cannot create IoHandle for internal address yet.
if (addr->envoyInternalAddress()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this looks like it's solid error handling but most callers of ioHandleForAddr will crash if it returns nullptr.

@mattklein123 @lambdai 's plan is basically creating this new listener type to be able to point a "cluster" at this new virtual address. Any thoughts on how that might best be done without causing this cascading chain of not-fully-implemented things?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this by providing the NullIoSocketHandleImpl to give it a socket handle allow upper layer to create a connection but will fail to send/recv.
https://github.com/envoyproxy/envoy/pull/12837/files#r483259684

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ideal case is control plane never send unsupported addresses before envoy could support.

  1. Mark the cluster config invalid if it contains internal address. It's always safer to reject cluster configs at master.
    I did this check in LDS config as well. That is easy since LDS is the only entrance to receive new listener config.
  2. In case of escaped internal address which we fail to invalid, the NullIoSocketHandle could fail the per client connect.
    This NullSocketHandle could also support mingled addresses such as
endpoints:
    - internal address: "not_yet_supported_listener_address"
    - socket address: "127.0.0.1:80"

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Comment on lines +96 to +102
/**
* IoHandle for invalid socket. This is rquired because Envoy expects IoHandle creation never fail.
* This NullIoSocketHandleImpl allows Envoy surives on unsupported socket options.
*/
class NullIoSocketHandleImpl : public IoHandle, protected Logger::Loggable<Logger::Id::io> {
public:
NullIoSocketHandleImpl() : closed_{false} {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NullIoSocketHandle

Comment on lines +90 to +112
// Test that connecting to internal address should always followed by disconnection with no crash.
TEST_P(SocketInterfaceIntegrationTest, InternalAddressWithSocketInterface) {
BaseIntegrationTest::initialize();

ConnectionStatusCallbacks connect_callbacks_;
Network::ClientConnectionPtr client_;
const Network::SocketInterface* sock_interface = Network::socketInterface(
"envoy.extensions.network.socket_interface.default_socket_interface");
Network::Address::InstanceConstSharedPtr address =
std::make_shared<Network::Address::EnvoyInternalInstance>("listener_0", sock_interface);

client_ = dispatcher_->createClientConnection(address, Network::Address::InstanceConstSharedPtr(),
Network::Test::createRawBufferSocket(), nullptr);

client_->addConnectionCallbacks(connect_callbacks_);
client_->connect();

while (!connect_callbacks_.closed()) {
dispatcher_->run(Event::Dispatcher::RunType::NonBlock);
}

client_->close(Network::ConnectionCloseType::FlushWrite);
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing Dispatcher::createClientConnection(). This should cover the client tcp connection IMO.
I have the concern of unknown unknown...

Now I am adding UDP socket test

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@alyssawilk
Copy link
Copy Markdown
Contributor

clang tidy is still failing. Let me know when this is ready for another pass?

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@mattklein123 mattklein123 self-assigned this Sep 9, 2020
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@lambdai
Copy link
Copy Markdown
Contributor Author

lambdai commented Sep 9, 2020

@alyssawilk @mattklein123

The recent update is that socket manipulation on internal address are non-fatal.
Either with socket() syscall series, or send()/recv() syscall series.

The goal is to enable internal address usage everywhere.

Let me know if this is good enough to move on. Thanks!

Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, as discussed on slack let's hide the config and remove some of the temporary classes with TODOs (it's Ok to crash if you're configuring Envoy with hidden incomplete-implementation code :-P )
Thanks so much for bearing with me on this!

@htuch
Copy link
Copy Markdown
Member

htuch commented Sep 10, 2020

Yes, fine to leave docs until we un-hide.

@htuch
Copy link
Copy Markdown
Member

htuch commented Sep 10, 2020

/lgtm api

Copy link
Copy Markdown
Contributor Author

@lambdai lambdai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Updating PR

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@lambdai
Copy link
Copy Markdown
Contributor Author

lambdai commented Sep 10, 2020

I find a machine running sphinx correct and fixed the docs. It turns out to be EnvoyInternalAddress field(located in one_of) must be marked hidden as well. Now the doc should pass on CI...

@lambdai
Copy link
Copy Markdown
Contributor Author

lambdai commented Sep 10, 2020

I find a machine running sphinx correct and fixed the docs. It turns out to be EnvoyInternalAddress field(located in one_of) must be marked hidden as well. Now the doc should pass on CI...

@htuch Sorry... another api approval is needed b/c my doc fix :(

alyssawilk
alyssawilk previously approved these changes Sep 10, 2020
Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! over to @mattklein123

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM at a high level, though, I wonder whether it's really worth landing this PR vs. the real implementation. Some small comments. Thank you!

/wait

Comment on lines +340 to +344
ENVOY_LOG(debug,
"listener {} has envoy internal address {}. Internal address cannot be used by "
"listener yet",
config.name(), config.address().envoy_internal_address().DebugString());
return false;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use NOT_IMPLEMENTED? Or just ASSERT not has_envoy_internal_address? Seems not worth it to handle this case right now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOT_IMPLEMENTED will crash the Envoy. That is not expected.

ASSERT could replace the the ENVOY_LOG, but this if branch should be kept in case Envoy does receive a listener config with internal address. Schema check is not helping here.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 Sep 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of this PR? It's not that many lines of code and it doesn't actually do anything? I would rather you either a) just hold this PR and implement the full thing or b) put in ASSERT or RELEASE_ASSERT for stuff like this since you have a not implemented and not documented feature. It doesn't make any sense to add this type of logic.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to push api first as the conversion.

I don't get your plan b RELEASE_ASSERT is easier but it will crash Envoy. Why do you prefer an unsupported listener config crashing Envoy? With these lines above envoy would NACK the listener config without crash.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you are adding error handling/logging logic for something that is not implemented. It bloats the code base for no good reason. If you want to make sure it crashes, do RELEASE_ASSERT(..., "this is not implemented yet."). Thanks.

/wait

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, to NACK I should throw exception instead of return false.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123 Is throwing EnvoyException beating return false and RELEASE_ASSERT?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Crash, or please just close this PR and reopen when there is an implementation. It does not make any sense to add actual error handling for an unimplemented partial feature.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to crash and move to next listener config api.
And then the internal connection driven by scheduler.

Running tests...

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@htuch
Copy link
Copy Markdown
Member

htuch commented Sep 11, 2020

/lgtm api

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@lambdai
Copy link
Copy Markdown
Contributor Author

lambdai commented Sep 11, 2020

Thank you @mattklein123
This crash on unimplemented strategy make my future PRs much easier.
I am expecting more combinations are allowed in the follow up

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants