Skip to content

quic: HTTP/3 upstream initial checkin#15027

Merged
alyssawilk merged 8 commits intoenvoyproxy:mainfrom
alyssawilk:upstream_quic
Mar 3, 2021
Merged

quic: HTTP/3 upstream initial checkin#15027
alyssawilk merged 8 commits intoenvoyproxy:mainfrom
alyssawilk:upstream_quic

Conversation

@alyssawilk
Copy link
Copy Markdown
Contributor

This is working end to end for a number of the protocol integration tests, but still needs a bunch of work before it's production ready (I triaged some of the excludes but not all, watermarks not working etc).
I think given this works for the happy path it's worth checking in, and I can iterate on getting individual tests working and doing the TODOs.

Risk Level: low (minor core refactors, major changes are all behind hidden config)
Testing: integration tests, some unit tests
Docs Changes: n/a
Release Notes: n/a
#14829

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Copy Markdown
Contributor Author

cc @DavidSchinazi

Copy link
Copy Markdown
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working out this framework!

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Copy Markdown
Contributor Author

comments should be addressed, PTAL

Copy link
Copy Markdown
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review. I have some thoughts on the EnvoyQuicClientSessionWithExtra, happy to discuss. Looks good over all, but I would prefer someone who is familiar with allocateConnPool() implementation and cluster_manager_impl to review that part.

Event::Dispatcher& dispatcher, TimeSource& time_source) {
// TODO(#14829): reject the config if a raw buffer socket is configured.
auto* ssl_socket_factory =
dynamic_cast<Extensions::TransportSockets::Tls::ClientSslSocketFactory*>(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding getter for config_ in ClientSslSocketFactory(), can we use QuicClientTransportSocketFactory instead? Maybe worth to reject the config if it is configured to use anything else.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, have a TODO to reject the config already. Using the quic transport factory involves a bunch of extra workarounds due to the release assert on creating transport sockets. It means I have to fix the TODO for not creating the connection (which used the TLS config) and tossing it, but isn't too much extra cruft to get that TODO done :-)

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Copy Markdown
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, alone aside a few nits

namespace Http3 {

// TODO(#14829) the constructor of Http2::ActiveClient sets max requests per
// connection based on HTTP/2 config. Sort out the HTTP/3 config story.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quic listener configures this field in envoy.config.listener.v3.QuicProtocolOptions. I guess it's a question about where to place this config and how to plumb thru.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but given some recent discussions around whether QUIC should be an extension vs built-in (potentially with the option to compile it out) may influence our decisions here. I would suggest a short document on QUIC build and configuration so we can discuss the options?

std::vector<std::string> server_names;
auto& config_factory = Config::Utility::getAndCheckFactoryByName<
Server::Configuration::DownstreamTransportSocketConfigFactory>(
"envoy.transport_sockets.quic");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: use Extensions::TransportSockets::TransportSocketNames::get().Quic instead

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're moving away from wkn files per #7238
I used hard-coded strings to avoid further use of wkn. will let Matt comment on his preference on these ones since I do lean towards using predefined values myself.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine either way for now, but again related to why I think we need to undo all the extension stuff.

envoy::config::listener::v3::QuicProtocolOptions config;
auto& config_factory =
Config::Utility::getAndCheckFactoryByName<Server::ActiveUdpListenerConfigFactory>(
"quiche_quic_listener");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: use Envoy.Quic.QuicListenerName instead

config_helper_.addConfigModifier(
[&](envoy::config::bootstrap::v3::Bootstrap& bootstrap) -> void {
// Docker doesn't allow writing to the v6 address returned by
// Network::Utility::getLocalAddress.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a docker bug that leads us to hardcode address here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this test passes locally for me without that change, but failed on CI. I as far as I could tell the docker container was resolving a local v6 address but was cranky at sending to its non-loopback v6 address, where it was fine doing so with v4.
AFIK we have similar issues in-house with v6 docker behaving differently so I'm not terribly surprised.

namespace Envoy {
namespace Http {

// A factory to create EnvoyQuicClientConnection instance for QUIC
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: s/EnvoyQuicClientConnection/EnvoyQuicClientSession and EnvoyQuicClientConnection

virtual std::unique_ptr<Network::ClientConnection>
createQuicNetworkConnection(Network::Address::InstanceConstSharedPtr server_addr,
Network::Address::InstanceConstSharedPtr local_addr,
Network::TransportSocketFactory& transport_socket_factory,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: pass in QuicClientTransportSocketFactory instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a core http3 API and that's a QUIC class in the extension directory so I think leaving as-is makes sense?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. That makes sense! Maybe I should move Quic TransportSocketFactory into core code instead. It doesn't depends on QUICHE.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move it all at this point. See my other comments.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Copy Markdown
Contributor Author

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123 you up for a pass on the conn pool side of things?

virtual std::unique_ptr<Network::ClientConnection>
createQuicNetworkConnection(Network::Address::InstanceConstSharedPtr server_addr,
Network::Address::InstanceConstSharedPtr local_addr,
Network::TransportSocketFactory& transport_socket_factory,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a core http3 API and that's a QUIC class in the extension directory so I think leaving as-is makes sense?

std::vector<std::string> server_names;
auto& config_factory = Config::Utility::getAndCheckFactoryByName<
Server::Configuration::DownstreamTransportSocketConfigFactory>(
"envoy.transport_sockets.quic");
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're moving away from wkn files per #7238
I used hard-coded strings to avoid further use of wkn. will let Matt comment on his preference on these ones since I do lean towards using predefined values myself.

config_helper_.addConfigModifier(
[&](envoy::config::bootstrap::v3::Bootstrap& bootstrap) -> void {
// Docker doesn't allow writing to the v6 address returned by
// Network::Utility::getLocalAddress.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this test passes locally for me without that change, but failed on CI. I as far as I could tell the docker container was resolving a local v6 address but was cranky at sending to its non-loopback v6 address, where it was fine doing so with v4.
AFIK we have similar issues in-house with v6 docker behaving differently so I'm not terribly surprised.

@alyssawilk
Copy link
Copy Markdown
Contributor Author

ping!

@alyssawilk
Copy link
Copy Markdown
Contributor Author

(mac failure is the known timeout issue)

danzh2010
danzh2010 previously approved these changes Mar 2, 2021
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool. LGTM with some comments and one line that I think was meant to be deleted.

/wait

namespace Http3 {

// TODO(#14829) the constructor of Http2::ActiveClient sets max requests per
// connection based on HTTP/2 config. Sort out the HTTP/3 config story.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but given some recent discussions around whether QUIC should be an extension vs built-in (potentially with the option to compile it out) may influence our decisions here. I would suggest a short document on QUIC build and configuration so we can discuss the options?

data.host_description_->transportSocketFactory();
data.connection_ =
Config::Utility::getAndCheckFactoryByName<Http::QuicClientConnectionFactory>(
Http::QuicCodecNames::get().Quiche)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just another example of an extension leaking into core code which I don't think makes sense.

virtual std::unique_ptr<Network::ClientConnection>
createQuicNetworkConnection(Network::Address::InstanceConstSharedPtr server_addr,
Network::Address::InstanceConstSharedPtr local_addr,
Network::TransportSocketFactory& transport_socket_factory,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move it all at this point. See my other comments.

}

void FakeUpstream::onRecvDatagram(Network::UdpRecvData& data) {
std::cerr << "Got datagrams. Need to dispatch\n";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete

std::vector<std::string> server_names;
auto& config_factory = Config::Utility::getAndCheckFactoryByName<
Server::Configuration::DownstreamTransportSocketConfigFactory>(
"envoy.transport_sockets.quic");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine either way for now, but again related to why I think we need to undo all the extension stuff.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (let's discuss/track how to deal with the extension status in #12829)

@alyssawilk alyssawilk merged commit 0a58f84 into envoyproxy:main Mar 3, 2021
@alyssawilk alyssawilk mentioned this pull request Mar 3, 2021
@alyssawilk alyssawilk deleted the upstream_quic branch August 18, 2021 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants