Skip to content

tls: support async cert validation#21417

Merged
alyssawilk merged 91 commits intoenvoyproxy:mainfrom
danzh2010:sslsocket
Aug 3, 2022
Merged

tls: support async cert validation#21417
alyssawilk merged 91 commits intoenvoyproxy:mainfrom
danzh2010:sslsocket

Conversation

@danzh2010
Copy link
Copy Markdown
Contributor

@danzh2010 danzh2010 commented May 24, 2022

Commit Message: change TLS transport socket to use SSL_CTX_set_custom_verify() instead of SSL_CTX_set_verify() and change CertValidator interface to support async cert validation. Also change EnvoyQuicCertVerifier to use the new async interfaces.

This change is needed for envoyproxy/envoy-mobile#1575. Envoy Mobile allows certificates to be verified by the OS-provided certificate verifier. And these verification can be very slow and so blocking the network thread while the verification happens is not an option. Instead, the verification should be performed asynchronously on a different thread. (This is how the cert verification works in Chrome, which is what we're modeling this implementation on).

Risk Level: high, change boring SSL interface used
Testing: added new unit tests and integration tests
Docs Changes: release note
Release Notes: documented tls transport changes.
Runtime guard: envoy.reloadable_features.tls_async_cert_validation

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Ssl::ValidateResult result = cert_validator_->doCustomVerifyCertChain(
*cert_chain, nullptr, extended_socket_info, transport_socket_options, SSL_get_SSL_CTX(ssl),
ech_name_override, SSL_is_server(ssl), error_details, out_alert);
if (result != Ssl::ValidateResult::Pending && extended_socket_info != nullptr) {
Copy link
Copy Markdown
Member

@soulxu soulxu May 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems extended_socket_info can't be nullptr. otherwise the code here doesn't check the nullptr https://github.com/envoyproxy/envoy/pull/21417/files#diff-2b39f212c2f760ae547351a46f92ad08c0ec84c556d678ebe9388b9d008e7313R462

but I could be wrong, still learning this part of code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing it out! I changed it into ASSERT.

Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Copy Markdown
Contributor Author

/assign @RyanTheOptimist

Signed-off-by: Dan Zhang <danzh@google.com>
@yanavlasov yanavlasov self-assigned this May 24, 2022
Copy link
Copy Markdown
Contributor

@RyanTheOptimist RyanTheOptimist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot going on in this PR, as you mentioned :) If there are any opportunities to break out refactors into smaller PRs, I would strongly recommend doing so to expedite getting this landed. Here's a first pass at reviewing source/.

@ggreenway
Copy link
Copy Markdown
Member

High level question to start, since it wasn't obvious from skimming through the code: why does this need to become async?

/wait-any

@RyanTheOptimist
Copy link
Copy Markdown
Contributor

High level question to start, since it wasn't obvious from skimming through the code: why does this need to become async?

/wait-any

Oh, great question. We should discuss that in the PR and potentially link to envoyproxy/envoy-mobile#1575. This PR lays the ground work for eventually allowing certificates to be verified by the OS-provided certificate verifier. Annoyingly, these verifications can be very slow and so blocking the network thread while the verification happens is not an option. Instead, the verification will be performed on a different thread. (This is how the cert verification works in Chrome, which is what we're modeling this implementation on). Hopefully that helps?

@yanavlasov
Copy link
Copy Markdown
Contributor

High level question to start, since it wasn't obvious from skimming through the code: why does this need to become async?
/wait-any

Oh, great question. We should discuss that in the PR and potentially link to envoyproxy/envoy-mobile#1575. This PR lays the ground work for eventually allowing certificates to be verified by the OS-provided certificate verifier. Annoyingly, these verifications can be very slow and so blocking the network thread while the verification happens is not an option. Instead, the verification will be performed on a different thread. (This is how the cert verification works in Chrome, which is what we're modeling this implementation on). Hopefully that helps?

Another use case is support for cloud HSM, that require RPC for key operations.

@ggreenway
Copy link
Copy Markdown
Member

High level question to start, since it wasn't obvious from skimming through the code: why does this need to become async?
/wait-any

Oh, great question. We should discuss that in the PR and potentially link to envoyproxy/envoy-mobile#1575. This PR lays the ground work for eventually allowing certificates to be verified by the OS-provided certificate verifier. Annoyingly, these verifications can be very slow and so blocking the network thread while the verification happens is not an option. Instead, the verification will be performed on a different thread. (This is how the cert verification works in Chrome, which is what we're modeling this implementation on). Hopefully that helps?

Another use case is support for cloud HSM, that require RPC for key operations.

This change is to validation; I think it would be a different change to do async signing.

Copy link
Copy Markdown
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

Signed-off-by: Dan Zhang <danzh@google.com>
if (error_details != nullptr) {
*error_details = error;
}
ENVOY_LOG(debug, error);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it error_details since error is fixed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using either error or error_details should be fine. What do you mean by error being fixed?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think error is not changed since:

      if (error_details != nullptr) {
        *error_details = error;
      }

if (error_details != nullptr) {
*error_details = error;
}
ENVOY_LOG(debug, error);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it error_details since error is fixed?

@danzh2010
Copy link
Copy Markdown
Contributor Author

PTAL @ggreenway

@ggreenway
Copy link
Copy Markdown
Member

/wait-any

Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Copy Markdown
Contributor Author

clang-tidy is still complaining the unrelated missing include in source/common/common/logger.h.
PTAL @ggreenway

ggreenway
ggreenway previously approved these changes Aug 2, 2022
Copy link
Copy Markdown
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I'm not sure why clang-tidy decided to look at logger.h in this PR.

@ggreenway
Copy link
Copy Markdown
Member

clang-tidy error is a tool failure; I don't understand why it's producing this error.

Writing fixes to clang-tidy-fixes.yaml ...
clang-tidy check failed, potentially fixed by clang-apply-replacements:
Diagnostics:
- BuildDirectory: /build/tmp/_bazel_envoybuild/b570b5ccd0454dc9af9f65ab1833764d/execroot/envoy
  DiagnosticMessage:
    FileOffset: 303
    FilePath: ./source/common/common/logger.h
    Message: '''source/common/common/logger_impl.h'' file not found'
    Ranges:
    - FileOffset: 303
      FilePath: ./source/common/common/logger.h
      Length: 36
    Replacements: []
  DiagnosticName: clang-diagnostic-error
  Level: Error
MainSourceFile: ''

@danzh2010
Copy link
Copy Markdown
Contributor Author

@RyanTheOptimist This is good to merge. Thanks!

@RyanTheOptimist
Copy link
Copy Markdown
Contributor

Alas, I can't merge 'cause of the spurious clang tidy error. @alyssawilk, can you?

@alyssawilk
Copy link
Copy Markdown
Contributor

alas by now it needs a main merge, but I'm happy to force merge once that's sorted

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Copy Markdown
Contributor Author

The PR is sync'ed @alyssawilk

@alyssawilk alyssawilk merged commit a95d1b7 into envoyproxy:main Aug 3, 2022
davidben added a commit to davidben/envoy that referenced this pull request Nov 4, 2022
Both things the comment says are inaccurate: After envoyproxy#23320, this
mechanism no longer uses the X509_STORE callback, but a flag.
After envoyproxy#21417, Envoy is also not (always) using
SSL_CTX_set_cert_verify_callback, but a different one.

Just remove the comment altogether, as it doesn't seem to provide any
value. All it's saying is that, because Envoy still uses `store` at the
end of the day, configuring things on `store` does stuff. But the whole
function configures things on `store`, so it's not specific to that
block anyway.

Signed-off-by: David Benjamin <davidben@google.com>
lizan pushed a commit that referenced this pull request Nov 7, 2022
Both things the comment says are inaccurate: After #23320, this
mechanism no longer uses the X509_STORE callback, but a flag.
After #21417, Envoy is also not (always) using
SSL_CTX_set_cert_verify_callback, but a different one.

Just remove the comment altogether, as it doesn't seem to provide any
value. All it's saying is that, because Envoy still uses `store` at the
end of the day, configuring things on `store` does stuff. But the whole
function configures things on `store`, so it's not specific to that
block anyway.

Signed-off-by: David Benjamin <davidben@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants