Skip to content

quiche: support downstream using SDS#15821

Merged
snowp merged 23 commits intoenvoyproxy:mainfrom
danzh2010:sdscallback
Apr 20, 2021
Merged

quiche: support downstream using SDS#15821
snowp merged 23 commits intoenvoyproxy:mainfrom
danzh2010:sdscallback

Conversation

@danzh2010
Copy link
Copy Markdown
Contributor

Commit Message: Update quic server transport socket factory upon SDS updates. Add stats for SDS updates. Add checking of nullptr

Additional Description: fix UDP listener to register xDS with server init_manager.
Risk Level: low
Testing: enable downstream HTTP3 in sds_dynamic_integration_test.cc

Fixes #15034
Part of #12930 #2557

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
@alyssawilk
Copy link
Copy Markdown
Contributor

I'll wait until you have CI sorted.
Since you mentioned offline this is downstream only can you add a TODO(14829) somewhere appropriate for upstream support?

QuicTransportSocketFactoryStats stats_;

private:
std::unique_ptr<Ssl::ContextConfig> context_config_;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have the full picture regarding the client context. However, if you can put context_config_ to derived QuicServerTransportSocketFactory and QuicClientTransportSocketFactory, the dynamic cast can be eliminated.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I moved it from subclasses to the base class so that they can have the SDS update callback setup in the base class :)
Doing so is just to have the same behavior on both endpoints. Indeed it brings up the need of dynamic_cast in subclasses. I don't have strong preference for either way.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking about it more, I changed the ownership to subclasses as the client subclass may need to do extra work in the future. Thus avoiding the dynamic_cast. Thanks for pointing out the issue here!

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Check envoy-presubmit isn't fully completed, but will still attempt retrying.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #15821 (comment) was created by @danzh2010.

see: more, trace.

Signed-off-by: Dan Zhang <danzh@google.com>
Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main merge should fix verify_examples, sorry for the breakage

getSdsTestsParams(bool test_quic) {
std::vector<std::tuple<Network::Address::IpVersion, Grpc::ClientType, bool>> ret;
for (auto ip_version : TestEnvironment::getIpVersionsForTest()) {
for (auto grpc_type : {Grpc::ClientType::EnvoyGrpc, Grpc::ClientType::GoogleGrpc}) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a utility for this, like getIpVersionsFor test, which getsGrpcVersionsForTest based on what's compiled in.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, also updated grpc integration tests to use that.


ret.push_back({ip_version, grpc_type, false});
if (test_quic) {
#ifdef ENVOY_ENABLE_QUIC
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think we should have one for client / upstream protocols but I can take that on, since I didn't add the correct utility over in https://github.com/envoyproxy/envoy/pull/15853/files

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG

transport_socket_config.mutable_downstream_tls_context()->mutable_common_tls_context();
configInlinedCerts(common_tls_context);
transport_socket->set_name("envoy.transport_sockets.quic");
transport_socket->mutable_typed_config()->PackFrom(transport_socket_config);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we're getting the quic vs tls stuff in a lot of places and I'd like to reduce that. Can you think of something cleaner? Passing in the context to a lambda and having it fill in the right one? Setting the h2 tls context by default and having a convertIfHttp3 function which moved it over and set the correct transport socket name?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a utility function in ConfigHelper to distinguish ssl v.s quic.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use the new test helper here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should! Done

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo that one typo. Passing to Snow for second pass

}
}

void ConfigHelper::configDownstreamTransportSocketWitTls(
Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk Apr 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WitTls -> WithTls

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Signed-off-by: Dan Zhang <danzh@google.com>
Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks reasonable to me, just a few suggestions/questions

Comment on lines +48 to +95
const Ssl::ClientContextConfig& clientContextConfig() const { return *config_; }
const Ssl::ClientContextConfig& clientContextConfig() const { return config_; }
Copy link
Copy Markdown
Contributor

@snowp snowp Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an alternative way to do this would be to have this getter do

ASSERT(dynamic_cast<...> != nullptr);
return static_cast<...>(contextConfig());

That way we avoid storing a separate ref on the derived classes while still avoiding a dynamic cast in prod builds?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above.

QuicTransportSocketFactoryStats stats_;

private:
std::unique_ptr<Ssl::ContextConfig> context_config_;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be const. setSecretUpdateCallback() is non-const method. But all these stuff has been moved to subclasses.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking more about const unique_ptr, not unique_ptr<const ...>, since the pointer doesn't change

transport_socket->set_name("envoy.transport_sockets.quic");
envoy::extensions::transport_sockets::quic::v3::QuicDownstreamTransport
quic_transport_socket_config;
fill_tls_stuff(*quic_transport_socket_config.mutable_downstream_tls_context()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe make this name a bit more explicit? configureTlsContext?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

public:
SdsDynamicIntegrationBaseTest()
: HttpIntegrationTest(Http::CodecClient::Type::HTTP1, ipVersion()),
: HttpIntegrationTest(Http::CodecClient::Type::HTTP1, std::get<0>(GetParam())),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems easier to read when this was ipVersion()?

Copy link
Copy Markdown
Contributor Author

@danzh2010 danzh2010 Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ipVersion() is a base class member. We haven't construct the base class here yet. Switched to use TestParams.ip_version for readability.

: public Grpc::BaseGrpcClientIntegrationParamTest,
public HttpIntegrationTest,
public testing::TestWithParam<
std::tuple<Network::Address::IpVersion, Grpc::ClientType, bool>> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe document what the gRPC test parameter is used for? Also what the bool refer to

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I change the tuple to a struct. Hope it helps readability.

transport_socket_config.mutable_downstream_tls_context()->mutable_common_tls_context();
configInlinedCerts(common_tls_context);
transport_socket->set_name("envoy.transport_sockets.quic");
transport_socket->mutable_typed_config()->PackFrom(transport_socket_config);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use the new test helper here?

Signed-off-by: Dan Zhang <danzh@google.com>
stats_.context_config_update_by_sds_.inc();
: stats_(generateStats(store, perspective)) {
context_config.setSecretUpdateCallback([this]() {
// The callback also triggers updating config_ with the new secret.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also updates

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// TODO(danzh) Client transport socket factory may also need to update quic crypto.
stats_.context_config_update_by_sds_.inc();
: stats_(generateStats(store, perspective)) {
context_config.setSecretUpdateCallback([this]() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this ever be immediately triggered? we'd call a virtual function in the ctor in that case

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if triggerred in the callstack, it will call base class version of the virtual function which makes sense as the derived class hasn't use context_config to setup anything yet. Is this not desired in Envoy?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://google.github.io/styleguide/cppguide.html#Doing_Work_in_Constructors

It tends to lead to surprising behavior and is often not resilient to refactors, so I would avoid it if possible

Copy link
Copy Markdown
Contributor Author

@danzh2010 danzh2010 Apr 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! I added initialize() to add the callback after construction.

std::get<2>(p.param) ? "UsesQuic" : "UsesTcp");
struct TestParams {
Network::Address::IpVersion ip_version;
Grpc::ClientType grpc_type;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sds_grpc_type? to make it clear that this relates to the SDS client, not the downstream client of the integration test

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@snowp snowp added the waiting label Apr 14, 2021
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #15821 (comment) was created by @danzh2010.

see: more, trace.

@danzh2010
Copy link
Copy Markdown
Contributor Author

Ping @snowp

Signed-off-by: Dan Zhang <danzh@google.com>
Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this is looking pretty good! A few more comments

Comment on lines +79 to +84
if (!config_->isReady()) {
ENVOY_LOG(warn, "SDS hasn't finished updating Ssl context config yet.");
stats_.downstream_context_secrets_not_ready_.inc();
return {};
}
return config_->tlsCertificates();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding when would this happen? Do we not block the usage of this on the SDS config being ready?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do block. But I saw similar logic in SSL socket to create a dummy socket so might as well added this.

@@ -1,5 +1,6 @@
#include <memory>
#include <string>
#include <type_traits>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything from type_traits being introduced in this PR, can this be removed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Comment on lines +228 to +230
common_tls_context.add_alpn_protocols(downstream_protocol_ == Http::CodecClient::Type::HTTP1
? Http::Utility::AlpnNames::get().Http11
: Http::Utility::AlpnNames::get().Http3);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use test_quic instead? I think this logic just matches the logic above that uses test_quic? Not clear without that why we do this remapping

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, test_quic_ and downstream_protocol_ are consistent. Switched to use test_quic_ as other places in this test do.

Comment on lines +250 to +251
? "quic_server_transport_socket_factory.context_config_update_by_sds"
: "server_ssl_socket_factory.ssl_context_update_by_sds"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me wonder whether the stats name should match? ie make the stat introduced in this PR ssl_context_update_by_sds? Or does that not fit well with the quic naming of things?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ssl_context is a SslSocket thing. Quic transport socket factory uses its config directly instead of generating an ssl_context object. So I think use ssl_context in the stats name is confusing.

client_ssl_ctx_ = createClientSslTransportSocketFactory({}, context_manager_, *api_);
}

void configInlinedCerts(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit config -> configure

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


void configInlinedCerts(
envoy::extensions::transport_sockets::tls::v3::CommonTlsContext* common_tls_context) {
common_tls_context->add_alpn_protocols(Http::Utility::AlpnNames::get().Http11);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe pull this out or rename function to make it clear that we're also setting the ALPN?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved it out.

@snowp snowp added the waiting label Apr 19, 2021
Signed-off-by: Dan Zhang <danzh@google.com>
@antoniovicente antoniovicente self-assigned this Apr 19, 2021
Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I see @antoniovicente assigned himself so I'll let him take a look as well

@antoniovicente
Copy link
Copy Markdown
Contributor

LGTM! I see @antoniovicente assigned himself so I'll let him take a look as well

I mostly added myself as a reviewer due to Alyssa being OOO this week. I did a quick pass and think that the changes in this PR look good.

@snowp snowp merged commit d841924 into envoyproxy:main Apr 20, 2021
gokulnair pushed a commit to gokulnair/envoy that referenced this pull request May 6, 2021
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Gokul Nair <gnair@twitter.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

QuicDownstreamTransport never requests SDS certificate

6 participants