Skip to content

upstream: update host's socket factory when metadata is updated.#16708

Merged
lizan merged 7 commits intoenvoyproxy:mainfrom
cpakulski:issue/16536
Jul 1, 2021
Merged

upstream: update host's socket factory when metadata is updated.#16708
lizan merged 7 commits intoenvoyproxy:mainfrom
cpakulski:issue/16536

Conversation

@cpakulski
Copy link
Copy Markdown
Contributor

Commit Message:
Update host's socket factory when metadata is updated.

Additional Description:
During dynamic endpoint updates via EDS, BaseDynamicClusterImpl::updateDynamicHostList method
tries to minimize deletions/insertions. If only metadata was changed, endpoint is updated in-place by calling HostDescriptionImpl::metadata, which updates metadata but not fields dependent on metadata. In particular,
if metadata's transport socket matcher was changed, socket_factory was not updated.
Risk Level: Low
Testing: added unit test
Docs Changes: no
Release Notes: no
Platform Specific Features:
Fixes #16536

cpakulski added 3 commits May 27, 2021 21:09
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
@cpakulski
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16708 (comment) was created by @cpakulski.

see: more, trace.

void metadata(MetadataConstSharedPtr new_metadata) override {
absl::WriterMutexLock lock(&metadata_mutex_);
metadata_ = new_metadata;
// Update data members dependent on metadata.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could resolveTransportSocketFactory be moved out of the lock guard?
The referenced Metadata should be immutable and valid.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lambdai I can add a scope around the lock to unlock it before calling resolveTransportSocketFactory. Why do you think it is necessary? Deadlock?

Copy link
Copy Markdown
Contributor

@lambdai lambdai Jun 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply.
It's not necessary but nice to have.
The only reason is performance. resolveTransportSocketFactory doesn't need the lock to protect

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take it or leave it. It's probably not a bottleneck.

auto factory = resolveTransportSocketFactory(address_, new_metadata.get());
{
absl::WriterMutexLock lock(&metadata_mutex_);
metadata_ = new_metadata;
socket_factory_ = factory;
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated as you suggested. There is small performance improvement to resolve factory outside of mutex protected scope. Thanks!

…tected scope.

Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
@cpakulski
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16708 (comment) was created by @cpakulski.

see: more, trace.

Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
@cpakulski
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16708 (comment) was created by @cpakulski.

see: more, trace.

HealthCheckHostMonitorPtr health_checker_;
std::atomic<uint32_t> priority_;
Network::TransportSocketFactory& socket_factory_;
std::reference_wrapper<Network::TransportSocketFactory> socket_factory_;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a ABSL_GUARDED_BY annotation as well? Should this use same mutex as metadata?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annotation added .
Yeah, after checking the code layout I believe that it is possible that metadata update and creating a new upstream connection (which needs access to transport socket factory) to happen at the same time on different threads. To protect against race conditions, a mutex around socket_factory_ has been added.

It is possible that metadata_ is updated at the same time when socket_factory_ is
used to create a new upstream connection.

Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
@cpakulski
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16708 (comment) was created by @cpakulski.

see: more, trace.

@cpakulski cpakulski requested a review from lizan June 22, 2021 00:29
lizan
lizan previously approved these changes Jun 28, 2021
Copy link
Copy Markdown
Member

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/assign-from @envoyproxy/maintainers
for non-tetrands review

@repokitteh-read-only
Copy link
Copy Markdown

@envoyproxy/maintainers assignee is @lizan

🐱

Caused by: a #16708 (review) was submitted by @lizan.

see: more, trace.

@lizan
Copy link
Copy Markdown
Member

lizan commented Jun 28, 2021

/assign-from @envoyproxy/maintainers

@repokitteh-read-only
Copy link
Copy Markdown

@envoyproxy/maintainers assignee is @lizan

🐱

Caused by: a #16708 (comment) was created by @lizan.

see: more, trace.

@lizan lizan removed their assignment Jun 28, 2021
@lizan
Copy link
Copy Markdown
Member

lizan commented Jun 28, 2021

/assign-from @envoyproxy/maintainers

@repokitteh-read-only
Copy link
Copy Markdown

@envoyproxy/maintainers assignee is @jmarantz

🐱

Caused by: a #16708 (comment) was created by @lizan.

see: more, trace.

Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some superficial comments here. I think someone that knows the semantics of this corner of the code also needs to review.

metadata_ = new_metadata;
}
{
absl::WriterMutexLock lock(&socket_factory_mutex_);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it semantically sound to have a separate lock for the socket factory? If there's a thread wakeup between these two locks will there be an inconsistent view of the data?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified it and one mutex is used to protect metadata_ and fields dependent on it - in this case socket_factory_.

@jmarantz
Copy link
Copy Markdown
Contributor

/wait

Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks much better.

This was pre-existing but most people I know including myself have seen worse performance from reader/writer locks vs just plain locks. This was pre-existing though.

@cpakulski
Copy link
Copy Markdown
Contributor Author

@jmarantz Thanks for reviewing and explaining issue with locks.

Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lizan do you want to to do a final senior maintainer pass?

@lizan lizan merged commit 329b249 into envoyproxy:main Jul 1, 2021
baojr added a commit to baojr/envoy that referenced this pull request Jul 7, 2021
* main:
  listener: match rebalancer to listener IP family type (envoyproxy#16914)
  jwt_authn: implementation of www-authenticate header (envoyproxy#16216)
  listener: reset the file event in framework instead of listener filter doing itself (envoyproxy#17227)
  Small typo fix (envoyproxy#17247)
  Doc: Clarify request/response attributes are http-only (envoyproxy#17204)
  bazel/README.md: add aspell comment (envoyproxy#17072)
  docs: Fix broken URL links in HTTP upgrades doc (envoyproxy#17225)
  remove the wrong comment on test (envoyproxy#17233)
  upstream: allow clusters to skip waiting on warmup for initialization (envoyproxy#17179)
  JwtAuthn: support completing padding on forward jwt payload header (envoyproxy#16752)
  remove support for v2 UNSUPPORTED_REST_LEGACY (envoyproxy#16968)
  metrics service: fix wrong argument arrange on MetricsServiceSink (envoyproxy#17127)
  deps: update cel-cpp to 0.6.1 (envoyproxy#16293)
  Add ability to filter ConfigDump. (envoyproxy#16774)
  examples: fix Wasm example. (envoyproxy#17218)
  upstream: update host's socket factory when metadata is updated. (envoyproxy#16708)

Signed-off-by: Garrett Bourg <bourg@squareup.com>
chrisxrepo pushed a commit to chrisxrepo/envoy that referenced this pull request Jul 8, 2021
…oyproxy#16708)

During dynamic endpoint updates via EDS, `BaseDynamicClusterImpl::updateDynamicHostList` method
tries to minimize deletions/insertions. If only metadata was changed, endpoint is updated in-place by calling `HostDescriptionImpl::metadata`, which updates metadata but not fields dependent on metadata. In particular, 
if metadata's transport socket matcher was changed, socket_factory was not updated. 
Risk Level: Low
Testing: added unit test
Docs Changes: no
Release Notes: no 
Platform Specific Features:
Fixes envoyproxy#16536

Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Signed-off-by: chris.xin <xinchuantao@qq.com>
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
…oyproxy#16708)

During dynamic endpoint updates via EDS, `BaseDynamicClusterImpl::updateDynamicHostList` method
tries to minimize deletions/insertions. If only metadata was changed, endpoint is updated in-place by calling `HostDescriptionImpl::metadata`, which updates metadata but not fields dependent on metadata. In particular, 
if metadata's transport socket matcher was changed, socket_factory was not updated. 
Risk Level: Low
Testing: added unit test
Docs Changes: no
Release Notes: no 
Platform Specific Features:
Fixes envoyproxy#16536

Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

transport_socket_matches doesn't update properly when endpoint metadata updates

4 participants