Skip to content

sds: certificate hot-reload for xDS gRPC connection#10163

Merged
lizan merged 3 commits intoenvoyproxy:masterfrom
Nordix:issue-9359
Apr 1, 2020
Merged

sds: certificate hot-reload for xDS gRPC connection#10163
lizan merged 3 commits intoenvoyproxy:masterfrom
Nordix:issue-9359

Conversation

@tsaarni
Copy link
Copy Markdown
Member

@tsaarni tsaarni commented Feb 25, 2020

Create watches for secrets pointed by path based SDS resources and trigger
hot-reload when the files change. This allows rotation of TLS certificate
and key for the xDS gRPC connection without restart.

Signed-off-by: Tero Saarni tero.saarni@est.tech

Description:
This PR adds support for certificate rotation for xDS gRPC interface. The implementation approach has been discussed in #9359.

The PR consists of following parts:

  • Add capability to SDS to make filesystem subscriptions for the certificate and key files referred to from SDS resources.
  • Adds capability to inotify WatcherImpl to make subscriptions for directories instead of only files. The change is backwards compatible.

The latter is needed to support commonly used pattern of updating files with atomic directory swap. Watching individual files is not possible, since in this pattern the files are symbolic links which point to a directory. The target of the symbolic links is a directory, which is atomically renamed (replaced) at update, not the files. Kubernetes uses this approach for secret volume mount. The details are covered in: #9359 (comment) and by the test case introduced in this PR.

The feature introduced by this PR is needed by Contour projectcontour/contour#2143 but it can be beneficial to any user of Envoy.

Risk Level: Medium
Testing: unittest
Docs Changes: To be defined: add information about the feature somewhere in the documentation
Release Notes:

Fixes #9359

@tsaarni
Copy link
Copy Markdown
Member Author

tsaarni commented Feb 25, 2020

I have added the new behavior to only inotify file watcher for Linux, but should this be added to kqueue watcher too - I guess the test case I added may fail on MacOS?

Edit: yes it did. I'll arrange compilation environment on my macbook and fix it!

@tsaarni
Copy link
Copy Markdown
Member Author

tsaarni commented Mar 4, 2020

After adding kqueue watcher implementation for macos, a different test case failed. I cannot find direct link between my change and the new failure. I see it occasionally failing on my machine too, ranging from 1 failure out of 1000 runs, to 3 out of 1000:

bazel test --runs_per_test 1000 //test/extensions/common/redis:cluster_refresh_manager_test`
...
//test/extensions/common/redis:cluster_refresh_manager_test              FAILED in 3 out of 1000 in 2.6s

This happens on master branch as well.

@stale
Copy link
Copy Markdown

stale bot commented Mar 11, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Mar 11, 2020
@tsaarni
Copy link
Copy Markdown
Member Author

tsaarni commented Mar 11, 2020

I'd appreciate review comments on this PR, thank you!

Copy link
Copy Markdown
Member

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, I'm back from OOO.

@stale
Copy link
Copy Markdown

stale bot commented Mar 23, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Mar 23, 2020
@lizan
Copy link
Copy Markdown
Member

lizan commented Mar 23, 2020

Can you merge master? thanks

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Mar 23, 2020
Create watches for secrets pointed by path based SDS resources and trigger
hot-reload when the files change.  This allows rotation of TLS certificates
and key for the xDS gRPC connection without restart.

Signed-off-by: Tero Saarni <tero.saarni@est.tech>
@tsaarni
Copy link
Copy Markdown
Member Author

tsaarni commented Mar 24, 2020

I've rebased the PR.
There was some flakiness in test environment this time...

@stale
Copy link
Copy Markdown

stale bot commented Mar 31, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Mar 31, 2020
@lizan
Copy link
Copy Markdown
Member

lizan commented Mar 31, 2020

Sorry for the delay, can you merge master again? not rebase but just merge.

Signed-off-by: Tero Saarni <tero.saarni@est.tech>
@tsaarni
Copy link
Copy Markdown
Member Author

tsaarni commented Apr 1, 2020

I merged but got some new errors with the CI https://dev.azure.com/cncf/envoy/_build/results?buildId=36546&view=logs&j=3bf5ec37-bb6f-5fb6-2cf1-5134dc6106d6&t=41d571fe-3811-578d-07de-68468f467dae&l=1567

  1. variable generic_secret should be generic_secret_: this code existed there before
  2. sys/event.h file missing: this include is in macos specific code and should not exist on linux

@lizan lizan merged commit 5105d9d into envoyproxy:master Apr 1, 2020
@lizan
Copy link
Copy Markdown
Member

lizan commented Apr 1, 2020

@tsaarni I merged this ahead and will fix clang-tidy later. Thanks.

@mattklein123
Copy link
Copy Markdown
Member

@lizan @tsaarni this change should have a release note and documentation updates about the new behavior. @tsaarni can you please add some in a follow up? Thank you!

@tsaarni
Copy link
Copy Markdown
Member Author

tsaarni commented Apr 2, 2020

I propose to:

  1. Add new example in Security / Secret discovery service (SDS) to explain how to use SDS with filesystem based config resources to achieve hot-reload for xDS gRPC.
  2. Mention that filesystem subscriptions are also set up for referred certificate and key files in documentation of v2 core.ConfigSource and config.core.v3.ConfigSource

Though I'm bit unsure if the 2) is the best place in the API documentation for this information - at least those are the places where it already mentions about filesystem subscriptions in general.

I'm preparing documentation and will be submitting a PR soon for commenting.

@mattklein123
Copy link
Copy Markdown
Member

Thank you @tsaarni we can discuss the details in the PR!

tsaarni added a commit to Nordix/envoy that referenced this pull request Apr 2, 2020
Adds documentation for envoyproxy#10163.

Signed-off-by: Tero Saarni <tero.saarni@est.tech>
mattklein123 pushed a commit that referenced this pull request Apr 2, 2020
Adds documentation for #10163.

Signed-off-by: Tero Saarni <tero.saarni@est.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Provide support for certificate rotation for xDS connection in Envoy container images

3 participants